python - Scrapy, login with captcha failed


I'm using the following spider to crawl the tinyz.us website, which requires authentication:

    from scrapy.spiders import Spider
    from scrapy.http import FormRequest
    import urllib2


    class Start(Spider):
        name = 'test'
        start_urls = ["http://tinyz.us"]

        def parse(self, response):
            user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
            headers = {'User-Agent': user_agent}
            # Download the captcha image and save it so it can be read by eye
            imgrequest = urllib2.Request("http://tinyz.us/securimage/securimage_show.php", headers=headers)
            imgdata = urllib2.urlopen(imgrequest).read()

            with open('captcha.png', 'wb') as f:
                f.write(imgdata)

            captcha = raw_input("-----> Enter the captcha manually: ")

            # Fill in and submit the login form found on the start page
            return FormRequest.from_response(
                response=response,
                formdata={"login_user": "myusername",
                          "login_password": "mypass",
                          "captcha_code": captcha},
                formxpath="//*[@id='login-form']",
                callback=self.after_login,
                headers=headers)

        def after_login(self, response):
            print("after login")
            # Save the page returned after the login attempt for inspection
            with open('response.html', 'w') as f:
                f.write(response.body)

The website uses a constant URL for generating the captcha, and it seems to generate a new one each time. I'm not familiar with the technology behind it, so my way around the problem is to save the captcha and enter it manually.

The problem is that it returns a failed response. I'm not sure whether that is because of the way Scrapy passes the data to the form or because of the captcha, and I can't find a way to debug the spider properly.
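A quick way to debug this is to check, in after_login(), whether the page that came back still contains the login form. The Python 3 sketch below assumes (not verified against tinyz.us) that a failed login re-renders the form with id="login-form", the same form the spider targets with formxpath; the helper name login_form_present is hypothetical.

```python
from html.parser import HTMLParser


class _FormFinder(HTMLParser):
    """Sets .found once any tag carrying the given id attribute is seen."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased by the parser
        if ("id", self.form_id) in attrs:
            self.found = True


def login_form_present(html, form_id="login-form"):
    """Return True if the page still shows the login form (assumed to mean the login failed)."""
    finder = _FormFinder(form_id)
    finder.feed(html)
    finder.close()
    return finder.found


failed_page = '<html><body><form id="login-form"><input name="login_user"></form></body></html>'
ok_page = "<html><body><p>Welcome back</p></body></html>"
print(login_form_present(failed_page))  # True
print(login_form_present(ok_page))      # False
```

Inside the spider you would call it on the decoded page body (for example `response.text` on Scrapy versions that provide it) and log a warning when it returns True, instead of only dumping response.html to disk.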

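OK, the problem here is that the captcha image request needs to carry the cookies from the actual page response. You are using urllib2 to make the captcha request, and Scrapy doesn't handle cookies for requests it doesn't make, so the captcha you see belongs to a different session than the one the login form is submitted in.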
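To see why the missing cookies break the login, here is a toy, in-memory Python 3 model of a session-bound captcha. It assumes, as is common for PHP captcha scripts such as Securimage, that the server stores the expected code in the visitor's session, identified by a cookie; ToyCaptchaSite and its methods are hypothetical, and no real HTTP is involved.

```python
import itertools
import random


class ToyCaptchaSite:
    """Hypothetical server: each session cookie gets its own expected captcha code."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._ids = itertools.count(1)
        self._answers = {}  # session id -> expected captcha code (None until fetched)

    def _session(self, cookie):
        # A request without a known session cookie starts a brand-new session.
        if cookie not in self._answers:
            cookie = "sess%d" % next(self._ids)
            self._answers[cookie] = None
        return cookie

    def get_login_page(self, cookie=None):
        return self._session(cookie)  # plays the role of the Set-Cookie value

    def get_captcha(self, cookie=None):
        cookie = self._session(cookie)
        code = "%04d" % self._rng.randrange(10000)
        self._answers[cookie] = code  # every hit on the URL replaces the code
        return cookie, code  # the code stands in for the image the user reads

    def login(self, cookie, captcha_code):
        return cookie in self._answers and self._answers[cookie] == captcha_code


site = ToyCaptchaSite()

# Like the question: Scrapy holds the page's cookie, but the captcha
# is fetched without it, i.e. in a different session.
page_cookie = site.get_login_page()
_, code_seen = site.get_captcha()
print(site.login(page_cookie, code_seen))  # False

# Like the answer: the captcha request carries the same session cookie.
page_cookie = site.get_login_page()
_, code_seen = site.get_captcha(page_cookie)
print(site.login(page_cookie, code_seen))  # True
```

In this model every request to the captcha URL replaces the stored code, which matches the observation that the constant URL generates a new captcha each time: only the most recently fetched image, fetched with the session cookie, is valid.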

Use a Scrapy request to fetch the captcha instead, like this:

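    from scrapy.http import Request, FormRequest

    # Replace parse() in the spider with these two methods; after_login() stays the same.
    def parse(self, response):
        # Fetch the captcha through Scrapy so it shares the login page's cookies,
        # and keep the login page around for the form submission.
        yield Request(url="http://tinyz.us/securimage/securimage_show.php",
                      callback=self.parse_captcha,
                      meta={'previous_response': response})

    def parse_captcha(self, response):
        with open('captcha.png', 'wb') as f:
            f.write(response.body)

        captcha = raw_input("-----> Enter the captcha manually: ")

        return FormRequest.from_response(
            response=response.meta['previous_response'],
            formdata={"login_user": "myusername",
                      "login_password": "mypass",
                      "captcha_code": captcha},
            formxpath="//*[@id='login-form']",
            callback=self.after_login)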
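This works because Scrapy's cookies middleware sends cookies received in earlier responses with later requests from the same spider. The same rule applies if you ever fetch the captcha outside Scrapy: the page request and the captcha request must go through one cookie store. Below is a Python 3 standard-library sketch of that idea (urllib.request and http.cookiejar, the successors of urllib2 and cookielib), a swapped-in technique rather than part of the answer. It runs against a throwaway local server; FakeSite, the PHPSESSID value, and the /captcha path are hypothetical stand-ins for the real site. On its own it would not fix the spider, because Scrapy and urllib keep separate cookie stores.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class FakeSite(http.server.BaseHTTPRequestHandler):
    """Local stand-in: '/' sets a session cookie, '/captcha' records the Cookie header it got."""

    seen = []  # Cookie header values received by /captcha, in order

    def do_GET(self):
        self.send_response(200)
        if self.path == "/":
            self.send_header("Set-Cookie", "PHPSESSID=abc123; Path=/")
        else:
            FakeSite.seen.append(self.headers.get("Cookie", ""))
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch_page_then_captcha(base, *handlers):
    # ProxyHandler({}) disables environment proxies so 127.0.0.1 is reached directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}), *handlers)
    opener.open(base + "/").read()
    opener.open(base + "/captcha").read()


server = http.server.HTTPServer(("127.0.0.1", 0), FakeSite)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

# Like the question's urllib2 call: no cookie store, so the session cookie is lost.
fetch_page_then_captcha(base)
# Shared CookieJar: the captcha request carries the page's session cookie.
jar = http.cookiejar.CookieJar()
fetch_page_then_captcha(base, urllib.request.HTTPCookieProcessor(jar))

server.shutdown()
server.server_close()
print(FakeSite.seen)  # ['', 'PHPSESSID=abc123']
```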
