
I am trying to scrape table data from a web page using the code below, but I get this error:

ValueError: could not convert string to float: 'False'

It is raised on this line: data = (tabulate(df[0], headers='keys', tablefmt='psql') )

import pandas as pd
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate

res = requests.get("http://rerait.telangana.gov.in/PrintPreview/PrintPreview/UHJvamVjdElEPTQmRGl2aXNpb249MSZVc2VySUQ9MjAyODcmUm9sZUlEPTEmQXBwSUQ9NSZBY3Rpb249U0VBUkNIJkNoYXJhY3RlckQ9MjImRXh0QXBwSUQ9")
soup = BeautifulSoup(res.content,'html.parser')

table_data = []

for i in range(len(soup.find_all('table'))):

    table = soup.find_all('table')[i] 
    df = pd.read_html(str(table))
    data = (tabulate(df[0], headers='keys', tablefmt='psql') )
    print (data)

df_1 = pd.DataFrame(data)
df_1.to_csv('D:/out_table.csv')

Error:

Traceback (most recent call last):

  File "<ipython-input-128-30edd695db38>", line 15, in <module>
    data = (tabulate(df[0], headers='keys', tablefmt='psql') )

  File "D:\Conda\lib\site-packages\tabulate.py", line 1286, in tabulate
    for c, ct, fl_fmt, miss_v in zip(cols, coltypes, float_formats, missing_vals)]

  File "D:\Conda\lib\site-packages\tabulate.py", line 1286, in <listcomp>
    for c, ct, fl_fmt, miss_v in zip(cols, coltypes, float_formats, missing_vals)]

  File "D:\Conda\lib\site-packages\tabulate.py", line 1285, in <listcomp>
    cols = [[_format(v, ct, fl_fmt, miss_v, has_invisible) for v in c]

  File "D:\Conda\lib\site-packages\tabulate.py", line 754, in _format
    return format(float(val), floatfmt)

ValueError: could not convert string to float: 'False'
  • please post stacktrace as well Commented Nov 5, 2018 at 9:33
  • It's what it says -- you cannot convert the string "False" to a floating point number. Commented Nov 5, 2018 at 9:33
  • Can you check if your data contains NULL? stackoverflow.com/questions/44826250/… Commented Nov 5, 2018 at 9:38
  • I have updated my question Commented Nov 5, 2018 at 9:45
  • @SilverSlash, how to solve it? Commented Nov 5, 2018 at 9:58

1 Answer

The error is self-explanatory: the string 'False' cannot be converted to a float. What you can do is force your dataframe to numeric via pd.to_numeric with errors='coerce', which replaces non-convertible values with NaN (itself a float):

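# pd.read_html returns a list of DataFrames, hence the name dfs
dfs = pd.read_html(str(table))
# coerce each column to numeric; values that cannot be parsed become NaN
dfs[0] = dfs[0].apply(pd.to_numeric, errors='coerce')
data = tabulate(dfs[0], headers='keys', tablefmt='psql')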
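
To make the effect of coercion concrete, here is a minimal pure-Python sketch of what happens to each cell. The helper name coerce_to_float is hypothetical, and it only approximates the behaviour of pd.to_numeric with errors='coerce'; it is not how pandas implements it.

```python
import math


def coerce_to_float(value):
    """Return value as a float, or NaN if it cannot be parsed.

    Hypothetical helper, only an approximation of errors='coerce'.
    """
    try:
        return float(value)
    except (TypeError, ValueError):
        # float("False") raises ValueError, float(None) raises TypeError
        return math.nan


# Made-up sample cells, not taken from the scraped page
cells = ["1.5", "42", "False", None]
coerced = [coerce_to_float(c) for c in cells]
```

Note that every cell that is not a number, including ordinary text, becomes NaN, so a table that holds mostly text will come out as mostly NaN.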

6 Comments

I get this error:
  File "<ipython-input-148-3637055db8e6>", line 17, in <module>
    df[0] = pd.to_numeric(df[0], errors='coerce')
  File "D:\Conda\lib\site-packages\pandas\core\tools\numeric.py", line 120, in to_numeric
    raise TypeError('arg must be a list, tuple, 1-d array, or Series')
TypeError: arg must be a list, tuple, 1-d array, or Series
@user10468005, See update. In general, the convention is for df to represent a single dataframe. Here, pd.read_html returns a list of dataframes (hence the name dfs), so you need to adjust the logic slightly.
No error now, but every value is NaN:
+----+-----+-----+-----+-----+
|    |   0 |   1 |   2 |   3 |
|----+-----+-----+-----+-----|
|  0 | nan | nan | nan | nan |
+----+-----+-----+-----+-----+
@user10468005, So there's something wrong with your input data. I can't help with that since you haven't provided any data. I suggest you sort this out at the source or via upstream logic.
I have shared the link I am getting the data from. I am updating the question with one table's data.
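
On the all-NaN result discussed in the comments above: that outcome is what one would expect if the table cells are text rather than numbers. One possible workaround, sketched here in plain Python under that assumption, is to convert a column only when every cell in it parses as a number and leave the other columns as strings. The names is_number and coerce_numeric_columns are hypothetical, and the sample rows are made up for illustration, not taken from the page.

```python
def is_number(text):
    """Return True if text can be parsed by float()."""
    try:
        float(text)
        return True
    except (TypeError, ValueError):
        return False


def coerce_numeric_columns(rows):
    """Convert a column to float only when every cell in it is numeric.

    Columns containing any non-numeric cell are left untouched.
    """
    columns = list(zip(*rows))  # transpose rows into columns
    converted = [
        [float(c) for c in col] if all(is_number(c) for c in col) else list(col)
        for col in columns
    ]
    return [list(row) for row in zip(*converted)]  # transpose back


# Made-up sample rows: a text column, a numeric column, a 'True'/'False' column
rows = [["Plot A", "120.5", "False"], ["Plot B", "98", "True"]]
cleaned = coerce_numeric_columns(rows)
```

Here only the middle column is converted; the text and 'True'/'False' columns stay as strings instead of turning into NaN.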