I have a string containing the source of an HTML file, fetched with the mechanize library. The HTML always contains a table like the one below. I want to convert that table to CSV format.

Several SO questions address the same problem, but their tables have a class name. My table doesn't have a class attribute, so what should I do?

<table border=1 cellPadding="2" cellSpacing="0" width="75%"  bordercolor="#000000" >

  <tr bgcolor="mediumblue">
    <td width="20%"><p align="center"><font face="Arial" color="white" size="2"><strong>SUB CODE</strong></font></p></td>
    <td width="26%"><p align="left"><font face="Arial" color="white" size="2"><strong>SUB NAME</strong></font></p></td>
    <td width="13%"><p align="left"><font face="Arial" color="white" size="2"><strong>THEORY</strong></font></p>  </td>
    <td width="10%"><p align="left"><font face="Arial" color="white" size="2"><strong>PRACTICAL</strong></font></p> </td>
    <td width="17%"><p align="left"><font face="Arial" color="white" size="2"><strong>MARKS</strong></font></p></td>
    <td width="14%"><p align="center"><font face="Arial" color="white" size="2"><strong>GRADE</strong></font></p></td>
  </tr>


  <tr bgColor="#ffffff">
    <td align="middle"><font face="Arial" size=2> 301</font></td>
    <td align="left" ><font face="Arial" size=2>ENGLISH CORE</font></td>
    <td align="left" ><font face="Arial" size=2>067</font></td>
    <td align="left" ><font face="Arial" size=2></font></td>
    <td align="left" ><font face="Arial" size=2>067&nbsp;&nbsp;&nbsp;&nbsp;</font></td>
    <td align="middle"><font face="Arial" size=2>C2</font></td>
  </tr>

  </table>
  • Is there a reason why you cannot give it a class? Commented May 27, 2015 at 9:37
  • Because I am getting the code directly from a library called mechanize. Commented May 27, 2015 at 9:38
  • If there is only going to be one table in the string that you need to convert, can you identify it by the table element tag, rather than a class name? Commented May 27, 2015 at 9:40
  • No, there is more than one table. Commented May 27, 2015 at 9:41

1 Answer

pandas has a neat way to read HTML tables: pd.read_html returns a list of DataFrames, one per table it finds.

import pandas as pd

html_data = '''
<table border=1 cellPadding="2" cellSpacing="0" width="75%"  bordercolor="#000000" >

  <tr bgcolor="mediumblue">
    <td width="20%"><p align="center"><font face="Arial" color="white" size="2"><strong>SUB CODE</strong></font></p></td>
    <td width="26%"><p align="left"><font face="Arial" color="white" size="2"><strong>SUB NAME</strong></font></p></td>
    <td width="13%"><p align="left"><font face="Arial" color="white" size="2"><strong>THEORY</strong></font></p>  </td>
    <td width="10%"><p align="left"><font face="Arial" color="white" size="2"><strong>PRACTICAL</strong></font></p> </td>
    <td width="17%"><p align="left"><font face="Arial" color="white" size="2"><strong>MARKS</strong></font></p></td>
    <td width="14%"><p align="center"><font face="Arial" color="white" size="2"><strong>GRADE</strong></font></p></td>
  </tr>


  <tr bgColor="#ffffff">
    <td align="middle"><font face="Arial" size=2> 301</font></td>
    <td align="left" ><font face="Arial" size=2>ENGLISH CORE</font></td>
    <td align="left" ><font face="Arial" size=2>067</font></td>
    <td align="left" ><font face="Arial" size=2></font></td>
    <td align="left" ><font face="Arial" size=2>067&nbsp;&nbsp;&nbsp;&nbsp;</font></td>
    <td align="middle"><font face="Arial" size=2>C2</font></td>
  </tr>

  </table>
'''

# [0] selects the first table found; print(...) with parentheses works on Python 2 and 3.
print(pd.read_html(html_data)[0].to_csv(index=False, header=False))

When there are multiple tables in the HTML, you can check each table's column names to discard the ones you don't need.
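The column-name check can be sketched without pandas, using only the standard library's html.parser and csv modules. This is a swapped-in technique, not what the answer itself uses, and it assumes Python 3, no nested tables, and that the header row is the first row of the wanted table. The names TableCollector, table_to_csv and sample are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell texts."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TD> and <td> both match.
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse whitespace; str.split() also splits on &nbsp; (U+00A0).
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def table_to_csv(html, required_header):
    """Return CSV text for the first table whose first row contains
    required_header, or None if no table matches."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    for rows in parser.tables:
        if rows and required_header in rows[0]:
            out = io.StringIO()
            csv.writer(out, lineterminator="\n").writerows(rows)
            return out.getvalue()
    return None


# Hypothetical page with an unrelated table before the one we want.
sample = """
<table><tr><td>menu</td></tr></table>
<table border=1>
  <tr><td><strong>SUB CODE</strong></td><td>SUB NAME</td><td>MARKS</td></tr>
  <tr><td> 301</td><td>ENGLISH CORE</td><td>067&nbsp;&nbsp;</td></tr>
</table>
"""
result = table_to_csv(sample, "SUB CODE")
print(result)
```

Matching on a header cell such as SUB CODE stands in for the missing class attribute: the table is identified by its content rather than by its markup.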


2 Comments

Running print pd.read_html(html_data)[0].to_csv(index=False,header=False) raises: TypeError: to_csv() takes at least 2 arguments (3 given)
@PRP I'm using Python 2 and pandas 0.16.0.
