I have some data in a PostgreSQL table.

I am pulling the data back to a notebook via code like the following:

import numpy as np
import pandas as pd

%load_ext sql
%sql postgresql://foo:foo@localhost:5432/barbar

result_from_sql = %sql SELECT Date, Year, Score, Cost FROM MyData;
result_df = result_from_sql.DataFrame()

In the PostgreSQL table every column has the correct type, but result_df comes back with these dtypes:

result_df.dtypes

date          object
year          int64
score         object
cost          object
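
One way to see why a column shows up as object is to look at the Python type of each element. The sketch below uses made-up stand-in data (sample_df is hypothetical, not the real table) and assumes the driver handed back decimal.Decimal values, which is what psycopg2 does for NUMERIC columns; the actual column types in MyData are not shown here, so treat this as a guess.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical stand-in for result_df; the values are invented for illustration.
sample_df = pd.DataFrame({
    "year": [2019, 2020, 2021],
    "score": [Decimal("1.5"), None, Decimal("3.25")],
})

# Count the Python types stored in the object column, by type name.
type_counts = sample_df["score"].map(lambda v: type(v).__name__).value_counts()
print(type_counts)
```

If the counts show Decimal (plus NoneType for NULLs), the column holds Python objects rather than native floats, which would explain the object dtype.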

Converting the date column was fine:

result_df['date'] = pd.to_datetime(result_df['date'])

As was ensuring all None values are now NaN values:

result_df.replace([None], [np.nan], inplace=True)

But to convert the columns score & cost to numeric I need to execute the following 3 lines of code:

s = ['score', 'cost']
result_df[s] = pd.to_numeric(result_df[s].astype(str), errors = 'coerce')
result_df[s] = result_df[s].apply(pd.to_numeric, errors='coerce')

If I use only lines 1 and 2, the dtype is still object. If I use only lines 1 and 3, all the data is converted to NaN, as if none of the values could be coerced.

Why do I have to use this code and is there a more elegant solution?

1 Answer

You can use the following to convert the columns to numeric:

s = ['score', 'cost']

result_df[s] = result_df[s].astype(float)  # in case you want to parse them as floats

Let me know if this works.
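
For anyone who wants to try this without a database, here is a minimal sketch on made-up data. It assumes the object columns hold decimal.Decimal values and None for NULLs (an assumption about what the driver returned, not something confirmed in the question); sample_df and converted are hypothetical names.

```python
from decimal import Decimal

import numpy as np
import pandas as pd

# Hypothetical stand-in for result_df with object-typed numeric columns.
sample_df = pd.DataFrame({
    "score": [Decimal("1.5"), None, Decimal("3.25")],
    "cost": [Decimal("10.00"), Decimal("12.50"), None],
})

s = ["score", "cost"]

# Cast both columns to float64; None entries become NaN in the process.
converted = sample_df[s].astype(float)
print(converted.dtypes)
```

Under that assumption, the separate replace([None], [np.nan]) step is not needed for these two columns, since the cast already turns None into NaN.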

1 Comment

Hello marwen - this works perfectly - so much simpler than the convoluted rubbish I originally had!