
What is the best way to avoid this error?

DataError: invalid input syntax for integer: "669068424.0"
CONTEXT: COPY sequence_raw, line 2, column id: "669068424.0"

I created a table using pgAdmin, specifying the data type for each column. I then read the data in with pandas and do some processing. I could explicitly provide a list of columns and cast them with .astype(int), but is that necessary?

I understand that the .0 after the integers appears because there are NaNs in the data, so pandas stores the columns as floats instead of integers. What is the best way to work around this? I saw in the pre-release notes for pandas 0.19 that there is better handling of sparse data; does that cover this case by any chance?
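For reference, the coercion is easy to reproduce; a minimal sketch (not from the original post):

import pandas as pd

# An integer column with a missing value is upcast to float64,
# because NumPy integer dtypes have no NaN representation.
s = pd.Series([1, 2, None])
print(s.dtype)     # float64
print(s.tolist())  # [1.0, 2.0, nan]

# Note: .astype(int) is not a direct fix here; it raises a ValueError
# because NaN cannot be converted to an integer.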

# Assumed COPY template; the original post does not show to_sql.
to_sql = "COPY %s FROM STDIN WITH CSV HEADER"

def process_file(conn, table_name, file_object):
    # Use the engine passed in rather than the global pg_engine
    raw_conn = conn.raw_connection()
    cur = raw_conn.cursor()
    cur.copy_expert(sql=to_sql % table_name, file=file_object)
    raw_conn.commit()
    cur.close()


import pandas as pd

df = pd.read_sql_query(sql=query.format(**params), con=engine)
df.to_csv('../raw/temp_sequence.csv', index=False)
# Reopen the CSV for COPY; use a new name instead of rebinding df
csv_file = open('../raw/temp_sequence.csv')
process_file(conn=pg_engine, table_name='sequence_raw', file_object=csv_file)
  • So you have a table with a float column, but you want to export it to CSV as an int column? Is that what you're asking? Commented Sep 13, 2016 at 17:00
  • They are all ints (numbers of seconds). However, there are rows with NULLs, and pandas turns those columns into floats because it doesn't support NaN in integer columns. I need to fillna with 0 for the column to be recognized as an integer, which seems wasteful; I get about 2 million rows per day and many of the rows have blanks. Commented Sep 13, 2016 at 17:03
  • It's still quite unclear what your exact situation is. Let me see if I understand correctly: you created a table manually with an int column, but when you try to export it to a CSV you somehow get a float column back? Commented Sep 13, 2016 at 18:15
  • Yes, if an integer column has a blank in it, the column is converted to float64 (pandas.pydata.org/pandas-docs/stable/gotchas.html). I am trying to find the most efficient workaround. Do I fill the blanks with 0 and then explicitly convert to int? Do I change the columns in Postgres to numeric instead? Is there a better way? Commented Sep 13, 2016 at 18:20
  • I see; it's the round trip through CSV that mangles the data. Have you tried specifying the float_format argument for to_csv to remove the decimal places? Commented Sep 13, 2016 at 18:32

1 Answer


You can use the float_format parameter for to_csv to specify the format of the floats in the CSV:

df.to_csv('../raw/temp_sequence.csv', index=False, float_format="%d")
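With float_format="%d", non-null floats are written without the decimal part, while NaN cells fall back to na_rep (the empty string by default), which PostgreSQL's COPY ... CSV loads as NULL. A minimal round-trip sketch with made-up data:

import io
import pandas as pd

# Made-up data mirroring the question: an integer id column
# with a missing value, which pandas stores as float64.
df = pd.DataFrame({'id': [669068424, None, 42]})

buf = io.StringIO()
df.to_csv(buf, index=False, float_format="%d")
print(buf.getvalue())
# id
# 669068424
#            <- NaN written as na_rep (empty), loaded as NULL by COPY ... CSV
# 42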