python - Scrape an Entire Website for Image URLs Only
A client has retained me to collect a list of the images on their website. The database is a huge mess, and the images are stored all over the place (some in S3, some on the local server). I need to produce a list of the images to migrate to S3 for the new hosting company the website is moving to.
I've tried crawling the database dump using a regexp, but the image list it produces does not match what the site is actually using.
What I'm looking to do: unleash a Python script that crawls the entire website for image URLs. The website is WordPress, so there are a lot of .jpg?8127-style URLs and such going on. I don't care about those; I can clean up the output later.
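The later cleanup could look something like this: a minimal sketch, assuming one URL per entry and that images can be recognized by file extension. The names clean_image_urls and IMAGE_EXTENSIONS are hypothetical, not from any library. It strips query strings such as ?8127 (and #fragments) and drops duplicates while keeping first-seen order:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of extensions treated as images; adjust for the site.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg")


def clean_image_urls(urls):
    """Strip query strings/fragments (e.g. '?8127') and de-duplicate, keeping first-seen order."""
    seen = set()
    cleaned = []
    for url in urls:
        parts = urlsplit(url.strip())
        # Skip anything whose path does not look like an image file.
        if not parts.path.lower().endswith(IMAGE_EXTENSIONS):
            continue
        # Rebuild the URL from scheme, host and path only.
        bare = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        if bare not in seen:
            seen.add(bare)
            cleaned.append(bare)
    return cleaned
```

Because it accepts any iterable of strings, an open text file with one URL per line can be passed in directly.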
So, the objectives are:
- Write a Python script that follows every link on the website and parses the output for image links.
- Dump the results to a text file for cleanup and review.
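A minimal standard-library sketch of those two objectives, assuming the site serves plain HTML and that only <img src> attributes matter (images referenced from srcset, CSS backgrounds or JavaScript are missed). The names LinkAndImageParser and crawl and the max_pages cap are hypothetical; treat it as a starting point rather than a finished crawler:

```python
import sys
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.request import urlopen


class LinkAndImageParser(HTMLParser):
    """Collects raw <a href> and <img src> values from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])


def crawl(start_url, max_pages=1000):
    """Breadth-first crawl restricted to start_url's host; returns absolute image URLs."""
    host = urlsplit(start_url).netloc
    queue = deque([start_url])
    visited = set()
    images = set()
    while queue and len(visited) < max_pages:  # max_pages caps the number of fetches
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            with urlopen(url, timeout=10) as resp:
                if "text/html" not in resp.headers.get("Content-Type", ""):
                    continue  # skip PDFs, images, feeds, ...
                page_url = resp.url  # final URL after any redirects
                html = resp.read().decode("utf-8", errors="replace")
        except (OSError, ValueError):
            continue  # 404s, timeouts, malformed URLs: skip and keep crawling
        parser = LinkAndImageParser()
        parser.feed(html)
        images.update(urljoin(page_url, src) for src in parser.images)
        for href in parser.links:
            # Resolve relative links and drop #fragments so each page is fetched once.
            link, _ = urldefrag(urljoin(page_url, href))
            if urlsplit(link).netloc == host and link not in visited:
                queue.append(link)
    return images


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python crawl_images.py https://example.com/ > images.txt
    for image_url in sorted(crawl(sys.argv[1])):
        print(image_url)
```

Redirecting stdout gives the text file for review (the script name is made up). Images hosted on S3 still show up because only page links, not image URLs, are restricted to the site's own host.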
I'm looking at using https://pypi.python.org/pypi/imagescraper as part of this, since it seems to make sense.
How might I best go about this?
I think you need to check out the Scrapy project. With Scrapy you can write a crawler and use a pipeline to save either the images or the URLs of the images.