Web scraping - CSS selector in Python BeautifulSoup


I have built a simple scraper that looks at Airbnb listings. The goal is to go through a given site (i.e. this one).

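    import requests
    from bs4 import BeautifulSoup

    # Download the search results page and parse it.
    first_page = BeautifulSoup(requests.get("https://www.airbnb.com/s/copenhagen--denmark/homes?allow_override%5b%5d=&s_tag=khqeqtpz&section_offset=1").text, 'html.parser')
    # Each listing on the page is wrapped in a div with this class.
    listings = first_page.find_all('div', 'listing-card-wrapper')
    for listing in listings:
        # Selector copied from Chrome dev tools; this is the call that returns nothing.
        print(listing.select("#listing-15616363 > div.infocontainer_v72lrv > > div.ellipsized_1iurgbx > div > span:nth-child(1) > span:nth-child(1)"))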

The code correctly loops through the 18 elements on the page. However, it prints 18 empty arrays, indicating that the listing.select statement is not working. I got the CSS selector from the Chrome dev tools "Copy selector" function.

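This is because listing-15616363 is specific to a single listing (notice the format listing-{listing_id}), and no element with id = 'listing-15616363' exists among the listings you loop over. A selector copied from dev tools is anchored to that one id, so it cannot be reused for every listing.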
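To make the point above concrete, here is a minimal offline sketch. The HTML snippet, the ids listing-1 and listing-2, and the class name info are made up for illustration and are not Airbnb's real markup. It compares a selector anchored to one id with a selector written relative to each listing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two listing cards, each with its own id.
SAMPLE = """
<div class="listing-card-wrapper"><div id="listing-1"><div class="info"><span>Cozy room</span></div></div></div>
<div class="listing-card-wrapper"><div id="listing-2"><div class="info"><span>City flat</span></div></div></div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")
listings = soup.find_all("div", "listing-card-wrapper")

# Anchored to one id: only the card that contains that id can match.
absolute = [listing.select("#listing-1 > div.info > span") for listing in listings]

# Relative to each card: no id involved, so it works for every card.
titles = [listing.select_one("div.info > span").get_text() for listing in listings]

print([len(matches) for matches in absolute])
print(titles)
```

Dropping the id prefix and keeping only the part of the selector that is shared by all cards is usually enough to make a copied selector reusable.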

For instance, if you want to fetch the URL, you can do:

    listing.find('a', class_="linkcontainer_55zci1")['href']
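One caveat with this lookup: find returns None when a card has no matching anchor, and indexing None raises a TypeError. A small sketch of a guarded version follows; the helper name listing_href, the sample HTML, and the class name link are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second card has no link at all.
SAMPLE = """
<div class="listing-card-wrapper"><a class="link" href="/rooms/1">Cozy room</a></div>
<div class="listing-card-wrapper"><span>No link here</span></div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")


def listing_href(listing, link_class="link"):
    """Return the href of the first matching anchor, or None if there is none."""
    anchor = listing.find("a", class_=link_class)
    return anchor.get("href") if anchor is not None else None


hrefs = [listing_href(card) for card in soup.find_all("div", "listing-card-wrapper")]
print(hrefs)
```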

Alternatively, you can use Python's lxml, which can be an order of magnitude faster than BeautifulSoup, like this:

    import requests
    from lxml import html

    url = "https://www.airbnb.com/s/copenhagen--denmark/homes?allow_override%5b%5d=&s_tag=khqeqtpz&section_offset=1"

    response = requests.get(url)
    root = html.fromstring(response.content)
    result_list = []

    def remove_non_ascii(text):
        # Drop every character outside the ASCII range (e.g. the currency symbol).
        return ''.join([i if ord(i) < 128 else '' for i in text])

    # The currency is stated once per page, so read it outside the loop.
    currency = root.xpath('//div[@itemprop="offers"]/meta[@itemprop="pricecurrency"]/@content')[0].strip()

    for row in root.xpath('//div[contains(@class, "listing-card-wrapper")]'):
        if row:
            # The leading ".//" keeps each query relative to the current listing.
            url = row.xpath('.//a[@class="linkcontainer_55zci1"]/@href')[0].strip()
            title = row.xpath('.//div[@class="ellipsized_1iurgbx"]/span/text()')[0].strip()
            price = remove_non_ascii(row.xpath('.//div[@class="inline_g86r3e"]/span//text()')[0].strip())

            result_list.append({'url': "https://www.airbnb.com" + url,
                                'title': title, 'price': price, 'currency': currency})

    print(result_list)

This results in:

    [{'url': 'https://www.airbnb.com/rooms/5316912', 'currency': 'inr', 'price': u' 3,823', 'title': 'small city  apt. next metro'},
     {'url': 'https://www.airbnb.com/rooms/16989400', 'currency': 'inr', 'price': u' 2,347', 'title': 'cozy room close city center'},
     {'url': 'https://www.airbnb.com/rooms/17628374', 'currency': 'inr', 'price': u' 6,774', 'title': 'cosy, quiet apartment in downtown copenhagen'},
     {'url': 'https://www.airbnb.com/rooms/1206721', 'currency': 'inr', 'price': u' 4,426', 'title': 'apt.close metro, airport , chp'},
     {'url': 'https://www.airbnb.com/rooms/13813273', 'currency': 'inr', 'price': u' 3,622', 'title': 'large room in vesterbro'},
     {'url': 'https://www.airbnb.com/rooms/14083881', 'currency': 'inr', 'price': u' 9,322', 'title': 'city room'},
     {'url': 'https://www.airbnb.com/rooms/6221130', 'currency': 'inr', 'price': u' 5,365', 'title': 'cosy flat 2 min central statio'},
     {'url': 'https://www.airbnb.com/rooms/15804159', 'currency': 'inr', 'price': u' 3,823', 'title': 'cozy, central near waterfront. quality breakfast!'},
     {'url': 'https://www.airbnb.com/rooms/17266268', 'currency': 'inr', 'price': u' 3,756', 'title': 'cosy room in frederiksberg'},
     {'url': 'https://www.airbnb.com/rooms/2647233', 'currency': 'inr', 'price': u' 3,353', 'title': 'bedroom & living room frederiksberg'},
     {'url': 'https://www.airbnb.com/rooms/12083235', 'currency': 'inr', 'price': u' 5,969', 'title': 'wonderful copenhagen right here'},
     {'url': 'https://www.airbnb.com/rooms/7787976', 'currency': 'inr', 'price': u' 7,042', 'title': 'homely renovated flat garden'},
     {'url': 'https://www.airbnb.com/rooms/17556785', 'currency': 'inr', 'price': u' 1,610', 'title': u'small cosy home above our caf\xe9 ( breakfast incl )'},
     {'url': 'https://www.airbnb.com/rooms/894420', 'currency': 'inr', 'price': u' 10,261', 'title': 'wonderful apt. right in city!'},
     {'url': 'https://www.airbnb.com/rooms/17028460', 'currency': 'inr', 'price': u' 7,847', 'title': 'nyhavn 3-bed apartment families'},
     {'url': 'https://www.airbnb.com/rooms/17651114', 'currency': 'inr', 'price': u' 6,371', 'title': 'spacious place canals in heart of copenhagen'},
     {'url': 'https://www.airbnb.com/rooms/10564051', 'currency': 'inr', 'price': u' 3,420', 'title': u'\u623f\u95f4\u5728\u54e5\u672c\u54c8\u6839\u7684\u5fc3\u810f'},
     {'url': 'https://www.airbnb.com/rooms/17709435', 'currency': 'inr', 'price': u' 2,951', 'title': u'hyggelig lejlighed t\xe6t p\xe5 centrum.'}]
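The prices in this output keep a leading space (for example u' 3,823') because the text is stripped before the non-ASCII currency symbol is removed. If you need a number rather than a string, a rough sketch could look like the one below. The helper parse_price is hypothetical, and the rupee sign in the sample input is an assumption based on the 'inr' currency shown above.

```python
def remove_non_ascii(text):
    """Drop every character outside the ASCII range."""
    return ''.join(ch for ch in text if ord(ch) < 128)


def parse_price(raw):
    """Hypothetical helper: turn a scraped price string into an integer amount."""
    cleaned = remove_non_ascii(raw).replace(',', '').strip()
    return int(cleaned)


raw_price = u'\u20b9 3,823'                # assumed raw text: rupee sign, space, amount
ascii_price = remove_non_ascii(raw_price)  # the symbol is gone, the space remains
amount = parse_price(raw_price)

print(repr(ascii_price))
print(amount)
```

This assumes prices use a comma as the thousands separator and have no decimal part, as in the output above.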

You can refer to the documentation on scraping and lxml for further understanding.
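If you want to experiment with lxml without hitting the site, the sketch below runs on a made-up HTML snippet (the markup and the class name link are assumptions, and it targets Python 3). It shows why row-level XPath queries should start with ".//": a query starting with "//" searches the whole document even when called on a single row. It also builds absolute URLs with urljoin from the standard library as one possible alternative to string concatenation.

```python
from urllib.parse import urljoin

from lxml import html

# Hypothetical markup standing in for a results page.
SAMPLE = """
<div>
  <div class="listing-card-wrapper"><a class="link" href="/rooms/1"><span>Cozy room</span></a></div>
  <div class="listing-card-wrapper"><a class="link" href="/rooms/2"><span>City flat</span></a></div>
</div>
"""

root = html.fromstring(SAMPLE)
rows = root.xpath('//div[contains(@class, "listing-card-wrapper")]')

# ".//" is evaluated relative to each row, so every row yields its own link.
relative = [row.xpath('.//a/@href')[0] for row in rows]

# "//" is evaluated from the document root, so every row yields the first link.
absolute = [row.xpath('//a/@href')[0] for row in rows]

# Resolve the relative hrefs against the site root.
full_urls = [urljoin("https://www.airbnb.com", href) for href in relative]

print(relative)
print(absolute)
print(full_urls)
```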

