Extracting data from a table in an HTML file using Python
I have the following table in an HTML file:
<table border="0" summary="" width="100%">
<a name="table.header.1"><!-- --></a>
<table border="1" cellpadding="3" cellspacing="0" summary="" width="100%">
  <tr bgcolor="#ccccff" class="tableheadingcolor"> <th align="left" colspan="3"><font size="+2"> table.header.1</font></th> </tr>
  <tr bgcolor="#ccccff" class="tablesubheadingcolor"> <th align="left">query fields</th> <th align="left">query orderings</th> </tr>
  <tr> <td width="50%">row1</td> <td>no</td> </tr>
  <tr> <td width="50%">row2</td> <td>yes</td> </tr>
  <tr> <td width="50%">row3</td> <td>no</td> </tr>
</table>
<br/>
<table border="0" summary="" width="100%">
<a name="table.header.2"><!-- --></a>
<table border="1" cellpadding="3" cellspacing="0" summary="" width="100%">
  <tr bgcolor="#ccccff" class="tableheadingcolor"> <th align="left" colspan="3"><font size="+2"> table.header.2</font></th> </tr>
  <tr bgcolor="#ccccff" class="tablesubheadingcolor"> <th align="left">query fields</th> <th align="left">query orderings</th> </tr>
  <tr> <td width="50%">row5</td> <td>no</td> </tr>
  <tr> <td width="50%">row6</td> <td>yes</td> </tr>
  <tr> <td width="50%">row7</td> <td>no</td> </tr>
  <tr> <td width="50%">row8</td> <td>yes</td> </tr>
</table>
I want to extract the table headers and their corresponding rows. I get the headers with:
table.find_all('th', limit=160)
This gives me:
table.header.1 table.header.2
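Note that find_all('th') matches every <th> cell, so it also returns the sub-heading cells ("query fields", "query orderings"). A minimal sketch, assuming the title cells always sit in rows with class "tableheadingcolor" as in the markup above, narrows the search with a CSS selector via soup.select. The HTML string is a trimmed excerpt of the question's markup, not the full file.

```python
from bs4 import BeautifulSoup

# Trimmed excerpt of the question's markup (most attributes dropped).
HTML = """
<table border="1">
  <tr class="tableheadingcolor"><th colspan="3"><font size="+2"> table.header.1</font></th></tr>
  <tr class="tablesubheadingcolor"><th>query fields</th><th>query orderings</th></tr>
  <tr><td>row1</td><td>no</td></tr>
</table>
<table border="1">
  <tr class="tableheadingcolor"><th colspan="3"><font size="+2"> table.header.2</font></th></tr>
  <tr class="tablesubheadingcolor"><th>query fields</th><th>query orderings</th></tr>
  <tr><td>row5</td><td>no</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all("th") matches every header cell, sub-headings included.
all_th = [th.get_text(strip=True) for th in soup.find_all("th")]
print(all_th)
# ['table.header.1', 'query fields', 'query orderings', 'table.header.2', 'query fields', 'query orderings']

# The selector keeps only <th> cells inside the title rows.
titles = [th.get_text(strip=True) for th in soup.select("tr.tableheadingcolor th")]
print(titles)  # ['table.header.1', 'table.header.2']
```

get_text(strip=True) also removes the leading space inside <font>, so the titles come out clean.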
And I get the data under the table headers using:
tags = table.find_all('td')
But this gives me all the data as one flat list:
row1 row2 row3 row5 row6 row7 row8
This doesn't help, because I need to know which rows belong to which table header.
I want to extract them and save them as a dictionary:
tablerows = {'table.header.1': ['row1', 'row2', 'row3'], 'table.header.2': ['row5', 'row6', 'row7', 'row8'] }
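One way to build that dictionary, sketched under the assumption that each title row has class "tableheadingcolor" and that its data rows live in the same inner <table>: find each title row, step up to its nearest enclosing <table> with find_parent, and collect the first <td> of every row in that table. The parser is pinned to "html.parser" because the markup is malformed (the outer <table border="0"> tags are never closed) and other parsers may restructure it differently. The helper rows_by_header and the condensed HTML string are illustrative, not part of the original question.

```python
from bs4 import BeautifulSoup

# Condensed version of the question's markup; the outer <table border="0">
# tags are left unclosed, as in the original.
HTML = """
<table border="0"> <a name="table.header.1"><!-- --></a>
<table border="1">
  <tr class="tableheadingcolor"><th colspan="3"><font size="+2"> table.header.1</font></th></tr>
  <tr class="tablesubheadingcolor"><th>query fields</th><th>query orderings</th></tr>
  <tr><td width="50%">row1</td><td>no</td></tr>
  <tr><td width="50%">row2</td><td>yes</td></tr>
  <tr><td width="50%">row3</td><td>no</td></tr>
</table> <br/>
<table border="0"> <a name="table.header.2"><!-- --></a>
<table border="1">
  <tr class="tableheadingcolor"><th colspan="3"><font size="+2"> table.header.2</font></th></tr>
  <tr class="tablesubheadingcolor"><th>query fields</th><th>query orderings</th></tr>
  <tr><td width="50%">row5</td><td>no</td></tr>
  <tr><td width="50%">row6</td><td>yes</td></tr>
  <tr><td width="50%">row7</td><td>no</td></tr>
  <tr><td width="50%">row8</td><td>yes</td></tr>
</table>
"""


def rows_by_header(html):
    """Map each table title to the first-column values of its data rows."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for heading_row in soup.find_all("tr", class_="tableheadingcolor"):
        title = heading_row.th.get_text(strip=True)
        # The nearest enclosing <table> holds only this title's rows.
        table = heading_row.find_parent("table")
        values = []
        for tr in table.find_all("tr"):
            cells = tr.find_all("td")
            if cells:  # title and sub-heading rows contain only <th>, so they are skipped
                values.append(cells[0].get_text(strip=True))
        result[title] = values
    return result


table_rows = rows_by_header(HTML)
print(table_rows)
```

Taking cells[0] keeps only the first column (row1, row2, ...); the yes/no values in the second column could be collected the same way if needed.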
I am using BeautifulSoup to parse the data.
Any pointers on how I can achieve this?
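If a different parser rearranges the nested tables, a variant that does not depend on table boundaries at all is to walk every <tr> in document order and attach each data row to the most recent title row seen before it. This is again a sketch: rows_in_document_order is a hypothetical helper, and it assumes the title rows are the only rows with class "tableheadingcolor".

```python
from bs4 import BeautifulSoup

# Condensed version of the question's markup (outer tables left unclosed).
HTML = """
<table border="0"> <a name="table.header.1"><!-- --></a>
<table border="1">
  <tr class="tableheadingcolor"><th colspan="3"><font size="+2"> table.header.1</font></th></tr>
  <tr class="tablesubheadingcolor"><th>query fields</th><th>query orderings</th></tr>
  <tr><td>row1</td><td>no</td></tr>
  <tr><td>row2</td><td>yes</td></tr>
  <tr><td>row3</td><td>no</td></tr>
</table> <br/>
<table border="0"> <a name="table.header.2"><!-- --></a>
<table border="1">
  <tr class="tableheadingcolor"><th colspan="3"><font size="+2"> table.header.2</font></th></tr>
  <tr class="tablesubheadingcolor"><th>query fields</th><th>query orderings</th></tr>
  <tr><td>row5</td><td>no</td></tr>
  <tr><td>row6</td><td>yes</td></tr>
  <tr><td>row7</td><td>no</td></tr>
  <tr><td>row8</td><td>yes</td></tr>
</table>
"""


def rows_in_document_order(html):
    """Assign each data row to the most recent title row that precedes it."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    current = None  # title whose rows are currently being collected
    for tr in soup.find_all("tr"):  # find_all yields tags in document order
        # BeautifulSoup returns "class" as a list, or None if absent.
        if "tableheadingcolor" in (tr.get("class") or []):
            current = tr.th.get_text(strip=True)
            result[current] = []
        elif current is not None:
            cells = tr.find_all("td")
            if cells:  # the sub-heading row has no <td>, so it is skipped
                result[current].append(cells[0].get_text(strip=True))
    return result


grouped = rows_in_document_order(HTML)
print(grouped)
```

The trade-off: this version survives any reshuffling of the <table> elements, but it would misassign rows if a data row ever appeared before its title row in the source.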