python - Scrapy, login with captcha failed
I'm using the following spider to crawl the tinyz.us website, which requires authentication.
    from scrapy.spiders import BaseSpider
    from scrapy.http import FormRequest
    import urllib2


    class start(BaseSpider):
        name = 'test'
        start_urls = ["http://tinyz.us"]

        def parse(self, response):
            user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
            headers = {'User-Agent': user_agent}
            # Download the captcha image with urllib2, outside of Scrapy
            imgrequest = urllib2.Request("http://tinyz.us/securimage/securimage_show.php", headers=headers)
            imgdata = urllib2.urlopen(imgrequest).read()
            with open('captcha.png', 'wb') as f:
                f.write(imgdata)
            captcha = raw_input("-----> enter captcha in manually :")
            return FormRequest.from_response(
                response=response,
                formdata={"login_user": "myusername",
                          "login_password": "mypass",
                          "captcha_code": captcha},
                formxpath="//*[@id='login-form']",
                callback=self.after_login,
                headers=headers)

        def after_login(self, response):
            print("after login")
            # Save the page returned after the login attempt for inspection
            with open('response.html', 'w') as f:
                f.write(response.body)
The website uses a constant URL for generating the captcha, and it seems to generate a new one each time. I'm not familiar with the respective tech, so the way I tend to get around the problem is by saving the captcha image and passing the code in manually.
The problem is that it returns a failed response, and I'm not sure whether that is because of the way Scrapy passes the data to the form or because of the captcha. I also can't find a way to debug the spider properly.
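OK, the problem here is that the captcha image request needs to carry the cookies from the actual response, and you are using urllib2 to make the captcha request, so Scrapy isn't handling that for you by default.
Use a Scrapy request to fetch the captcha instead, something like: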
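    from scrapy.http import Request  # in addition to the FormRequest import

    # These methods replace parse() in the spider class.
    def parse(self, response):
        # Fetch the captcha through Scrapy so it shares the session cookies;
        # keep the login page response around for building the form request.
        yield Request(url="http://tinyz.us/securimage/securimage_show.php",
                      callback=self.parse_captcha,
                      meta={'previous_response': response})

    def parse_captcha(self, response):
        with open('captcha.png', 'wb') as f:
            f.write(response.body)
        captcha = raw_input("-----> enter captcha in manually :")
        return FormRequest.from_response(
            response=response.meta['previous_response'],
            formdata={"login_user": "myusername",
                      "login_password": "mypass",
                      "captcha_code": captcha},
            formxpath="//*[@id='login-form']",
            callback=self.after_login)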
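To see why the shared cookie matters, here is a toy model of the server side. It is only a sketch under the assumption that the site keeps the expected captcha code in the visitor's session, which is how session-based captchas commonly work; `ToyCaptchaSite` and its methods are hypothetical names made up for illustration, not part of Scrapy or of the real site. It is plain Python 3 with no Scrapy dependency.

```python
import secrets


class ToyCaptchaSite:
    """Hypothetical stand-in for a site that stores the captcha code per session."""

    def __init__(self):
        self.sessions = {}  # session cookie -> expected captcha code (or None)

    def _session(self, cookie):
        # A missing or unknown cookie makes the server open a fresh session.
        if cookie not in self.sessions:
            cookie = secrets.token_hex(8)
            self.sessions[cookie] = None
        return cookie

    def get_login_page(self, cookie=None):
        # Returns the session cookie the client should keep sending.
        return self._session(cookie)

    def get_captcha(self, cookie=None):
        # Generates a new code and remembers it in the caller's session.
        cookie = self._session(cookie)
        code = secrets.token_hex(3)
        self.sessions[cookie] = code
        return cookie, code

    def post_login(self, cookie, captcha_code):
        # The login succeeds only if the code matches the one stored
        # in the session that belongs to this cookie.
        expected = self.sessions.get(cookie)
        return expected is not None and captcha_code == expected


site = ToyCaptchaSite()

# Case 1: the captcha is fetched by a separate client that sends no cookie
# (comparable to the urllib2 call), so its code lands in a different session.
page_cookie = site.get_login_page()
_, code_seen = site.get_captcha()
login_without_shared_cookie = site.post_login(page_cookie, code_seen)

# Case 2: the captcha request carries the login page's cookie
# (comparable to yielding a Scrapy Request from the spider).
page_cookie = site.get_login_page()
_, code_seen = site.get_captcha(page_cookie)
login_with_shared_cookie = site.post_login(page_cookie, code_seen)

print(login_without_shared_cookie, login_with_shared_cookie)  # False True
```

In the first case the code you read from the image belongs to a session the form submission never uses, so the login is rejected even though you typed it correctly. For debugging the real spider, setting `COOKIES_DEBUG = True` in the Scrapy settings makes Scrapy log the cookies it sends and receives, which helps confirm that the captcha request and the form submission share a session.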