Python Convert Pandas Float to String

Question

I have a data frame (df) that contains two columns (date, text), which is read from an Excel spreadsheet into Python/Pandas.

xl = pd.ExcelFile(dir+"file.xlsx")
df = xl.parse(xl.sheet_names[0])

    date        text                
0   2013-08-06  NaN                 
1   2013-08-06  Text with unicode
2   ...

The text contains unwanted unicode characters which I normally strip out using

df['text'] = df['text'].apply(lambda sentence: ''.join(word for word in sentence if ord(word) < 128))

However, since the text in the first row is NaN, it appears that Pandas is typing the column as "float", and the above command fails since it only operates on strings. I can't find a way to reassign the type as string, since the column contains unicode characters:

df['text'] = df['text'].astype(str)   

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-12: ordinal not in range(128)

It feels like I'm getting into a "the chicken or the egg" dilemma.

Can you not just call dropna, or do you want to replace the NaN with some value?
– EdChum, Commented Sep 4, 2014 at 13:58
@chrisaycock: I have added the line for reading the spreadsheet.
– slaw, Commented Sep 4, 2014 at 14:03
@EdChum: If I dropna, I am assuming that Pandas still treats the column as float. I still can't convert it to type string since it contains unicode.
– slaw, Commented Sep 4, 2014 at 14:07
I don't get the same as you; mine is object, which is a string. I don't understand how you can have a dtype like that. Still, after dropping the NaNs you should be able to cast it using astype(float)
– EdChum, Commented Sep 4, 2014 at 14:15

tktk · Accepted Answer · 2014-09-04 14:10:11Z

Your whole column is not typed as float; otherwise it couldn't hold strings at all. Only the NaN values, which are themselves floats, cause your method to throw an exception.
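As a rough illustration of that point, the sketch below builds a small stand-in for the text column (the sample values are made up, and dtype=object is set explicitly as an assumption about how the Excel column was read) and inspects the types involved:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the spreadsheet's text column.
# dtype=object is an assumption matching a mixed NaN/string column.
text = pd.Series([np.nan, "Text with unicode \u00e9"], dtype=object, name="text")

print(text.dtype)           # dtype of the column as a whole
print(type(text.iloc[0]))   # type of the missing value
print(type(text.iloc[1]))   # type of a real entry

# The original lambda iterates over each value, which a float does not allow.
try:
    "".join(ch for ch in text.iloc[0] if ord(ch) < 128)
    failed = False
except TypeError as exc:
    failed = True
    print("TypeError:", exc)
```

So the failure comes from a single float value inside a column of strings, not from the column's dtype.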

So you have to deal with the NaNs. How do you want your code to handle them? Convert them to the string 'NaN'?

That rather defeats the point of NaN as a special value. If you don't want the NaN values, you can use dropna. If you want some other value instead (or the string 'NaN'), you can use .fillna('NaN'). If you want to keep the NaNs for later use (which seems like the way to go to me), add a special case for them in your lambda so that they stay NaN:

from pandas import isnull
# Keep NaN as-is; strip non-ASCII characters from real strings.
df['text'] = df['text'].apply(
    lambda sentence: sentence if isnull(sentence)
    else ''.join(word for word in sentence if ord(word) < 128))
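For comparison, here is a minimal sketch of the three options side by side. The sample frame and the helper name strip_non_ascii are made up for illustration, and dtype=object is again an assumption about how the column was read:

```python
import numpy as np
import pandas as pd

def strip_non_ascii(value):
    """Drop characters with code points >= 128; pass missing values through."""
    if pd.isnull(value):
        return value
    return "".join(ch for ch in value if ord(ch) < 128)

# Hypothetical sample data shaped like the frame in the question.
df = pd.DataFrame({
    "date": ["2013-08-06", "2013-08-06"],
    "text": pd.Series([np.nan, "Caf\u00e9 text"], dtype=object),
})

# Option 1: keep NaN as a missing value and clean only real strings.
kept = df["text"].apply(strip_non_ascii)

# Option 2: drop rows whose text is missing, then clean the rest.
dropped = df.dropna(subset=["text"])["text"].apply(strip_non_ascii)

# Option 3: replace NaN with the literal string 'NaN' before cleaning.
filled = df["text"].fillna("NaN").apply(strip_non_ascii)
```

Which one fits depends on whether the missing rows still matter later. Only the first option leaves them detectable with isnull.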

edited Sep 4, 2014 at 14:10

answered Sep 4, 2014 at 14:04


2 Comments

slaw Over a year ago

As stated in the post, the text is currently typed as "float" and needs to be converted to type "string" first. However, I can't convert the text to string due to the unwanted unicode in the text.

tktk Over a year ago

@slaw How about you post some real data in the question?
