Extracting data from table in HTML file using Python

I have the following table in an HTML file.

<table border="0" summary="" width="100%">
<a name="table.header.1"><!-- --></a>
<table border="1" cellpadding="3" cellspacing="0" summary="" width="100%">
  <tr bgcolor="#ccccff" class="tableheadingcolor">
    <th align="left" colspan="3"><font size="+2"> table.header.1</font></th>
  </tr>
  <tr bgcolor="#ccccff" class="tablesubheadingcolor">
    <th align="left">query fields</th>
    <th align="left">query orderings</th>
  </tr>
  <tr>
    <td width="50%">row1</td>
    <td>no</td>
  </tr>
  <tr>
    <td width="50%">row2</td>
    <td>yes</td>
  </tr>
  <tr>
    <td width="50%">row3</td>
    <td>no</td>
  </tr>
</table>
<br/>
<table border="0" summary="" width="100%">
<a name="table.header.2"><!-- --></a>
<table border="1" cellpadding="3" cellspacing="0" summary="" width="100%">
  <tr bgcolor="#ccccff" class="tableheadingcolor">
    <th align="left" colspan="3"><font size="+2"> table.header.2</font></th>
  </tr>
  <tr bgcolor="#ccccff" class="tablesubheadingcolor">
    <th align="left">query fields</th>
    <th align="left">query orderings</th>
  </tr>
  <tr>
    <td width="50%">row5</td>
    <td>no</td>
  </tr>
  <tr>
    <td width="50%">row6</td>
    <td>yes</td>
  </tr>
  <tr>
    <td width="50%">row7</td>
    <td>no</td>
  </tr>
  <tr>
    <td width="50%">row8</td>
    <td>yes</td>
  </tr>
</table>

What I want is to extract the table headers and their corresponding rows.

table.find_all('th', limit=160)

The above code gives me:

table.header.1 table.header.2

and I get the data under the table headers using:

tags = table.find_all('td')

But the above code gives me this data:

row1 row2 row3 row5 row6 row7 row8

But this does not help me, because I need to know which rows belong to which table header.

I want to extract them and save them in the form of a dictionary:

tablerows = {'table.header.1': ['row1', 'row2', 'row3'],
             'table.header.2': ['row5', 'row6', 'row7', 'row8']}

I am using BeautifulSoup to parse the data.

Any pointers on how I can achieve the above?
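
One possible approach, as a minimal sketch rather than a definitive answer: instead of collecting every th and td in one pass, start from each heading row and walk only the rows that follow it inside the same table, so each row stays attached to its header. The sketch assumes Beautiful Soup 4 (the bs4 package) with the built-in "html.parser", that every heading row carries class="tableheadingcolor" as in the markup above, and that the row name is always the first td of a row. The names rows_by_header and SAMPLE_HTML are made up for this illustration.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (same structure,
# including the outer tables that are never closed).
SAMPLE_HTML = """
<table border="0" summary="" width="100%">
<a name="table.header.1"><!-- --></a>
<table border="1" cellpadding="3" cellspacing="0" summary="" width="100%">
<tr bgcolor="#ccccff" class="tableheadingcolor">
<th align="left" colspan="3"><font size="+2"> table.header.1</font></th>
</tr>
<tr bgcolor="#ccccff" class="tablesubheadingcolor">
<th align="left">query fields</th> <th align="left">query orderings</th>
</tr>
<tr> <td width="50%">row1</td> <td>no</td> </tr>
<tr> <td width="50%">row2</td> <td>yes</td> </tr>
<tr> <td width="50%">row3</td> <td>no</td> </tr>
</table>
<br/>
<table border="0" summary="" width="100%">
<a name="table.header.2"><!-- --></a>
<table border="1" cellpadding="3" cellspacing="0" summary="" width="100%">
<tr bgcolor="#ccccff" class="tableheadingcolor">
<th align="left" colspan="3"><font size="+2"> table.header.2</font></th>
</tr>
<tr bgcolor="#ccccff" class="tablesubheadingcolor">
<th align="left">query fields</th> <th align="left">query orderings</th>
</tr>
<tr> <td width="50%">row5</td> <td>no</td> </tr>
<tr> <td width="50%">row6</td> <td>yes</td> </tr>
<tr> <td width="50%">row7</td> <td>no</td> </tr>
<tr> <td width="50%">row8</td> <td>yes</td> </tr>
</table>
"""


def rows_by_header(html):
    """Map each table heading to the first-cell text of the rows below it."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    # Each heading row marks the start of one logical table.
    for heading_row in soup.find_all("tr", class_="tableheadingcolor"):
        header = heading_row.th.get_text(strip=True)
        rows = []
        # Only later <tr> siblings are visited, i.e. rows of the same table.
        for tr in heading_row.find_next_siblings("tr"):
            first_cell = tr.find("td")
            # The sub-heading row has only <th> cells, so it is skipped here.
            if first_cell is not None:
                rows.append(first_cell.get_text(strip=True))
        result[header] = rows
    return result


tablerows = rows_by_header(SAMPLE_HTML)
print(tablerows)
```

Because the lookup goes through sibling rows rather than through the enclosing table tags, it should not matter that the outer tables in the markup are left unclosed. If the real file marks its heading rows differently, the class_ filter is the part to adjust.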
