
I have data with a fixed structure that comes in a table. I saved the HTML containing the table, and now I'm trying to extract the table from the HTML file and write it to a CSV file to organize the data (with code I found here), but I keep getting an error.

This is the code (taken from this site):

# Importing the required modules 
import os
import sys
import pandas as pd
from bs4 import BeautifulSoup

path = 'file location' + 'file.html'

# empty list
data = []

# for getting the header from
# the HTML file
list_header = []
soup = BeautifulSoup(open(path),'html.parser')
header = soup.find_all("table")[0].find("tr")

for items in header:
    try:
        list_header.append(items.get_text())
    except:
        continue

# for getting the data 
HTML_data = soup.find_all("table")[0].find_all("tr")[1:]

for element in HTML_data:
    sub_data = []
    for sub_element in element:
        try:
            sub_data.append(sub_element.get_text())
        except:
            continue
    data.append(sub_data)

# Storing the data into Pandas
# DataFrame 
dataFrame = pd.DataFrame(data = data, columns = list_header)

# Converting Pandas DataFrame
# into CSV file
dataFrame.to_csv('Geeks.csv')

but I get this error:

File, line 38, in <module>
dataFrame = pd.DataFrame(data = data, columns = list_header)
File, line 509, in __init__
arrays, columns = to_arrays(data, columns, dtype=dtype)
File, line 524, in to_arrays
return _list_to_arrays(data, columns, coerce_float=coerce_float, dtype=dtype)
File, line 567, in _list_to_arrays
raise ValueError(e) from e
ValueError: 1 columns passed, passed data had 7 columns

What am I doing wrong?

  • Just reading the error, ValueError: 1 columns passed, passed data had 7 columns, it looks like your header has 1 element but your data has 7 elements per row; or maybe it's the other way around. Print your header and the first row and see if the columns match. Commented Dec 28, 2021 at 23:28
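The check the comment suggests can be sketched like this (a self-contained example with a made-up two-column table, not the asker's actual data): compare the length of the header list against the length of the first data row before building the DataFrame. Note that `find_all(["th", "td"])` returns only the real cells, whereas iterating a row Tag directly also yields whitespace text nodes.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the asker's HTML file
html = """
<table>
  <tr><th>Name</th><th>Age</th></tr>
  <tr><td>Ada</td><td>36</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
rows = soup.find("table").find_all("tr")

# Collect only actual cells (th or td), skipping whitespace nodes
header = [c.get_text(strip=True) for c in rows[0].find_all(["th", "td"])]
first_row = [c.get_text(strip=True) for c in rows[1].find_all(["th", "td"])]

# If these lengths differ, pd.DataFrame(data, columns=...) will raise
# exactly the ValueError from the question
print(len(header), header)        # 2 ['Name', 'Age']
print(len(first_row), first_row)  # 2 ['Ada', '36']
```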

1 Answer


The big thing I see missing in your sample code is that you're not explicitly iterating the td elements inside every row element. Maybe the for sub_element in element line does the cell iteration, but it's not clear. For yourself, and for anyone else who needs to read your code (like us 🙂), I recommend being very explicit when finding and iterating elements.

I don't have BeautifulSoup (BS) or Pandas installed, but I'd like to offer the following as a template for explicitly traversing your table's hierarchy. I'm using Python's standard xml and csv modules. (I think the BS API is similar enough to ElementTree's to guide you.)

Here's a very simple HTML with a table, input.html:

<html>
<body>
<table>
    <tr><td>Col1</td><td>Col2</td><td>Col3</td></tr>
    <tr><td>Row1Col1</td><td>Row1Col2</td><td>Row1Col3</td></tr>
    <tr><td>Row2Col1</td><td>Row2Col2</td><td>Row2Col3</td></tr>
    <tr><td>Row3Col1</td><td>Row3Col2</td><td>Row3Col3</td></tr>
    <tr><td>Row4Col1</td><td>Row4Col2</td><td>Row4Col3</td></tr>
</table>
</body>
</html>

And here's the script:

import csv
from xml.etree import ElementTree as ET

# Accumulate rows from the table
all_rows = []

root = ET.parse('input.html')

# Get the table once
my_table = root.find('.//table')

# Iterate rows (tr) for that table
for elem_tr in my_table.findall('tr'):
    new_row = []

    # Iterate cells (td) per row
    for elem_td in elem_tr.findall('td'):
        # Build your row cell-by-cell
        new_row.append(elem_td.text)

    # Save finished row
    all_rows.append(new_row)

# Finally write all rows
with open('out.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(all_rows)

When I run that, here's my out.csv:

Col1,Col2,Col3
Row1Col1,Row1Col2,Row1Col3
Row2Col1,Row2Col2,Row2Col3
Row3Col1,Row3Col2,Row3Col3
Row4Col1,Row4Col2,Row4Col3
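Since the question uses BeautifulSoup, here's how the same explicit traversal might translate to the BS API (a sketch only, with the sample table inlined as a string so it runs on its own; the file-reading and CSV-writing parts are unchanged in spirit):

```python
import csv
from bs4 import BeautifulSoup

# A shortened copy of the sample input.html, inlined for self-containment
html = """
<html>
<body>
<table>
    <tr><td>Col1</td><td>Col2</td><td>Col3</td></tr>
    <tr><td>Row1Col1</td><td>Row1Col2</td><td>Row1Col3</td></tr>
</table>
</body>
</html>
"""
soup = BeautifulSoup(html, "html.parser")

# Get the table once
my_table = soup.find("table")

# Iterate rows (tr), and within each row iterate only the cells (th/td);
# iterating the row Tag directly would also yield whitespace text nodes
all_rows = []
for elem_tr in my_table.find_all("tr"):
    all_rows.append([elem_td.get_text(strip=True)
                     for elem_td in elem_tr.find_all(["th", "td"])])

# Finally write all rows
with open("out.csv", "w", newline="") as f:
    csv.writer(f).writerows(all_rows)

print(all_rows[0])  # ['Col1', 'Col2', 'Col3']
```

Because every row is built cell-by-cell with `find_all(["th", "td"])`, the header and data rows always come out the same width, which avoids the "1 columns passed, passed data had 7 columns" mismatch.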