Python - Why doesn't findall work on the table data I pulled from JavaScript (using Python 3)?
I can do the two parts of what I want separately, but not together. I'm using the Anaconda 3 distribution.
The tables I want are dynamically loaded JavaScript tables, and I want to extract them and use them in pandas and sqlite3.
This gives me the information I want as text output, with each cell delivered on a separate line:
    import sys
    from PyQt5.QtWidgets import QApplication
    from PyQt5.QtCore import QUrl
    from PyQt5.QtWebKitWidgets import QWebPage
    import bs4 as bs
    import requests
    import pandas as pd

    class Client(QWebPage):
        def __init__(self, url):
            self.app = QApplication(sys.argv)
            QWebPage.__init__(self)
            # stop the event loop once the page has finished loading
            self.loadFinished.connect(self.on_page_load)
            self.mainFrame().load(QUrl(url))
            self.app.exec()

        def on_page_load(self):
            self.app.quit()  # only run until the page loads

    url = 'http://results.houstonmarathon.com/2017/?page=2&event=mara&pid=search&search%5bclub%5d=%25&search%5bcompany%5d=%25&search%5bnation%5d=%25&search_sort=name'
    client_response = Client(url)
    source = client_response.mainFrame().toHtml()  # HTML as rendered by the page
    soup = bs.BeautifulSoup(source, 'lxml')
    js_table = soup.find("table", {"class": "list-table"})
    js_table_content = js_table.text
    with open('hmarathont.txt', 'w') as f:
        f.write(js_table_content)
Which is great, but I imagine that taking the text file and parsing it back into the right format (currently each cell is on its own line, and I want the original table layout) is not the best way to do it when there is a way to get the table straight into pandas.
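Instead of writing the cell text to a file and re-parsing it, the rendered HTML in `source` can be turned into rows directly. A minimal sketch using only the standard library's `html.parser` (a swapped-in approach, not the question's BeautifulSoup code), assuming the results table is the first `<table>` whose class includes `list-table` and that it contains no nested tables; `ListTableParser` and `parse_list_table` are hypothetical names. With lxml installed, `pd.read_html(io.StringIO(source), attrs={'class': 'list-table'})` is another option, which returns a list of DataFrames.

```python
from html.parser import HTMLParser


class ListTableParser(HTMLParser):
    """Collect cell text, row by row, from the first <table> whose class
    attribute includes "list-table". Assumes that table has no nested tables."""

    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell strings per <tr>
        self._inside = False  # True while inside the target table
        self._done = False    # True once the target table has been closed
        self._row = None      # cells of the <tr> being read
        self._cell = None     # text fragments of the <td>/<th> being read

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table" and not self._inside:
            classes = (dict(attrs).get("class") or "").split()
            self._inside = "list-table" in classes
        elif self._inside and tag == "tr":
            self._row = []
        elif self._inside and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if not self._inside:
            return
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._inside = False
            self._done = True

    def handle_data(self, data):
        # text inside tags such as <b> or <a> still lands in the current cell
        if self._cell is not None:
            self._cell.append(data)


def parse_list_table(html):
    """Return the rows of the list-table table as lists of stripped strings."""
    parser = ListTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

With `rows = parse_list_table(source)`, something like `pd.DataFrame(rows[1:], columns=rows[0])` would give a DataFrame, assuming the header row has as many cells as the data rows.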
I have had success getting tables into pandas with code like the one I wrote below, but not with dynamically loaded tables. I already have code that takes CSV table data into sqlite3, which is why I export the DataFrame to CSV.
    rows = js_table.findall('tr')[1:]  # skip the header row; this is where the NoneType error appears
    data = {'ovrl_pl': [], 'gndr_pl': [], 'place_div': [], 'raw_name': [],
            'gender': [], 'state': [], 'age_cat': [], 'age': [],
            'time_net': [], 'time_gun': []}
    for row in rows:
        cols = row.find_all('td')
        data['ovrl_pl'].append(cols[0].get_text())
        data['gndr_pl'].append(cols[1].get_text())
        data['place_div'].append(cols[2].get_text())
        data['raw_name'].append(cols[3].get_text())
        data['gender'].append(cols[4].get_text())
        data['state'].append(cols[5].get_text())
        data['age_cat'].append(cols[6].get_text())
        data['age'].append(cols[7].get_text())
        data['time_net'].append(cols[8].get_text())
        data['time_gun'].append(cols[9].get_text())
    hmarathon = pd.DataFrame(data)
    hmarathon.to_csv("hmarathon2017test.csv")
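The ten repeated `append` calls can be driven by a list of column names, and since the CSV is only an intermediate step toward sqlite3, the columns could also be written to the database directly. A sketch under these assumptions: each data row has at least the ten cells in the order used above (as `cols[0]` to `cols[9]` assume), and `rows_to_columns`, `store_columns` and the table name `hmarathon` are hypothetical. pandas' `DataFrame.to_sql(name, con)` also accepts a sqlite3 connection, if you prefer to keep the DataFrame step.

```python
import sqlite3

# Column names from the question, in the cell order cols[0]..cols[9]
COLUMNS = ['ovrl_pl', 'gndr_pl', 'place_div', 'raw_name', 'gender',
           'state', 'age_cat', 'age', 'time_net', 'time_gun']


def rows_to_columns(rows, columns=COLUMNS):
    """Turn a list of rows (each a list of cell strings) into a dict of lists."""
    data = {name: [] for name in columns}
    for cells in rows:
        if len(cells) < len(columns):
            continue  # skip header or spacer rows without a full set of cells
        for name, value in zip(columns, cells):
            data[name].append(value)
    return data


def store_columns(con, data, table="hmarathon"):
    """Write a dict of equal-length lists into a SQLite table."""
    columns = list(data)
    # Identifiers cannot be bound as ? parameters, so they are double-quoted;
    # this assumes the names contain no double quotes themselves.
    col_sql = ", ".join('"{}"'.format(name) for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    con.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, col_sql))
    con.executemany(
        'INSERT INTO "{}" ({}) VALUES ({})'.format(table, col_sql, placeholders),
        zip(*(data[name] for name in columns)),  # one tuple per table row
    )
    con.commit()
```

Usage would look like `store_columns(sqlite3.connect("hmarathon.db"), rows_to_columns(rows))`, where `rows` is a list of lists of cell strings and the database file name is again only an example.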
The error I keep getting is at the point where I try to use findall on js_table: NoneType. I'm guessing the table existed briefly and is now gone? (Or maybe I don't understand some of this :-/ )
I managed to capture the table cell contents as text, so why can't I capture them into a dictionary?
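One possible explanation, hedged because the capitalisation in the question may have been lost: since `js_table.text` works, `js_table` itself was found. In BeautifulSoup 4, accessing an attribute that is not a real method or property, such as a lowercase `js_table.findall`, is treated as a shortcut for `js_table.find("findall")`, i.e. a search for a child `<findall>` tag, which returns `None`; calling that `None` then fails with `TypeError: 'NoneType' object is not callable`. The method is `find_all` (older alias `findAll`), as already used for `cols`. Separately, `soup.find(...)` itself returns `None` when no table matches, which is worth checking explicitly. A sketch with a hypothetical helper `table_rows`, using the built-in `"html.parser"` backend so it does not need lxml:

```python
import bs4


def table_rows(html, table_class="list-table"):
    """Return the data rows (header row skipped) of the matching table as
    lists of cell strings. table_rows is a hypothetical helper name."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": table_class})
    if table is None:
        # find() returns None when nothing matches; raise a clear error here
        # instead of a NoneType error further down
        raise ValueError("no <table class=%r> found in the page source" % table_class)
    # find_all is the BeautifulSoup 4 method name; a lowercase .findall would
    # instead be treated as table.find("findall"), which is None
    return [[td.get_text(strip=True) for td in tr.find_all("td")]
            for tr in table.find_all("tr")[1:]]
```

The returned rows have the same shape as `cols` in the question's loop, so they can fill the ten-column dictionary without the intermediate text file.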