Extracting data from a table in an HTML file using Python


I have the following tables in an HTML file.

<table border="0" summary="" width="100%">
<a name="table.header.1"><!-- --></a>
<table border="1" cellpadding="3" cellspacing="0" summary="" width="100%">
  <tr bgcolor="#ccccff" class="tableheadingcolor">
    <th align="left" colspan="3"><font size="+2"> table.header.1</font></th>
  </tr>
  <tr bgcolor="#ccccff" class="tablesubheadingcolor">
    <th align="left">query fields</th>
    <th align="left">query orderings</th>
  </tr>
  <tr> <td width="50%">row1</td> <td>no</td> </tr>
  <tr> <td width="50%">row2</td> <td>yes</td> </tr>
  <tr> <td width="50%">row3</td> <td>no</td> </tr>
</table>
<br/>
<table border="0" summary="" width="100%">
<a name="table.header.2"><!-- --></a>
<table border="1" cellpadding="3" cellspacing="0" summary="" width="100%">
  <tr bgcolor="#ccccff" class="tableheadingcolor">
    <th align="left" colspan="3"><font size="+2"> table.header.2</font></th>
  </tr>
  <tr bgcolor="#ccccff" class="tablesubheadingcolor">
    <th align="left">query fields</th>
    <th align="left">query orderings</th>
  </tr>
  <tr> <td width="50%">row5</td> <td>no</td> </tr>
  <tr> <td width="50%">row6</td> <td>yes</td> </tr>
  <tr> <td width="50%">row7</td> <td>no</td> </tr>
  <tr> <td width="50%">row8</td> <td>yes</td> </tr>
</table>

What I want is to extract the table headers and their corresponding rows.

table.find_all('th', limit=160)

The above code gives me:

table.header.1 table.header.2 

And I get the data under the table headers using:

tags = table.find_all('td')

But the above code gives me this data:

row1 row2 row3 row5 row6 row7 row8 

But this does not help me, because I need to know which rows belong to which table header.

I want to extract the data and save it in the form of a dictionary:

tablerows = {'table.header.1': ['row1', 'row2', 'row3'],
             'table.header.2': ['row5', 'row6', 'row7', 'row8']}

I am using BeautifulSoup to parse the data.

Any pointers on how I can achieve the above?
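
One possible approach, offered as a minimal sketch rather than a definitive answer: instead of collecting every th and td in one pass, find each heading row first and then walk the rows that follow it inside the same table. The sketch assumes the bs4 package (BeautifulSoup 4) is installed, that every heading row carries class="tableheadingcolor" as in the markup above, and that the row name is always the first td of a row. The names SAMPLE_HTML and rows_by_header are hypothetical, and SAMPLE_HTML is a trimmed copy of the tables in the question.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: the real file
# follows the same pattern of heading row, sub-heading row, data rows).
SAMPLE_HTML = """
<table border="1">
  <tr class="tableheadingcolor"><th colspan="3"><font size="+2"> table.header.1</font></th></tr>
  <tr class="tablesubheadingcolor"><th>query fields</th><th>query orderings</th></tr>
  <tr><td width="50%">row1</td><td>no</td></tr>
  <tr><td width="50%">row2</td><td>yes</td></tr>
  <tr><td width="50%">row3</td><td>no</td></tr>
</table>
<br/>
<table border="1">
  <tr class="tableheadingcolor"><th colspan="3"><font size="+2"> table.header.2</font></th></tr>
  <tr class="tablesubheadingcolor"><th>query fields</th><th>query orderings</th></tr>
  <tr><td width="50%">row5</td><td>no</td></tr>
  <tr><td width="50%">row6</td><td>yes</td></tr>
  <tr><td width="50%">row7</td><td>no</td></tr>
  <tr><td width="50%">row8</td><td>yes</td></tr>
</table>
"""


def rows_by_header(html):
    """Map each table heading to the first-column values of its data rows."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    # Each table starts with one heading row; use it as the grouping key.
    for heading in soup.find_all("tr", class_="tableheadingcolor"):
        name = heading.get_text(strip=True)
        rows = []
        # Only rows that share a parent with the heading row are visited,
        # so rows from other tables are never mixed in.
        for tr in heading.find_next_siblings("tr"):
            first_cell = tr.find("td")
            if first_cell is not None:  # skips the sub-heading row (th only)
                rows.append(first_cell.get_text(strip=True))
        result[name] = rows
    return result


tablerows = rows_by_header(SAMPLE_HTML)
print(tablerows)
# {'table.header.1': ['row1', 'row2', 'row3'], 'table.header.2': ['row5', 'row6', 'row7', 'row8']}
```

Grouping by sibling rows keeps each header tied to its own table, which is the association that the two flat find_all calls lose. If the second column is needed as well, the same loop could collect every td in the row instead of only the first.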

