169

I have a pandas DataFrame with a mix of int and str data columns. I want to concatenate the columns within the DataFrame. To do that, I have to convert an int column to str. I've tried the following:

mtrx['X.3'] = mtrx.to_string(columns = ['X.3'])

or

mtrx['X.3'] = mtrx['X.3'].astype(str)

but in both cases it doesn't work, and I get an error saying "cannot concatenate 'str' and 'int' objects". Concatenating two str columns works perfectly fine.
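
For reference, this is roughly the concatenation I'm after (the X.2 column name is made up here; X.3 is the int column):

# X.2 is an existing str column, X.3 is the int column --
# this line is what raises the error above:
mtrx['combined'] = mtrx['X.2'] + mtrx['X.3']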

7 Answers

192
In [16]: df = DataFrame(np.arange(10).reshape(5,2),columns=list('AB'))

In [17]: df
Out[17]: 
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9

In [18]: df.dtypes
Out[18]: 
A    int64
B    int64
dtype: object

Convert a series

In [19]: df['A'].apply(str)
Out[19]: 
0    0
1    2
2    4
3    6
4    8
Name: A, dtype: object

In [20]: df['A'].apply(str)[0]
Out[20]: '0'

Don't forget to assign the result back:

df['A'] = df['A'].apply(str)

Convert the whole frame

In [21]: df.applymap(str)
Out[21]: 
   A  B
0  0  1
1  2  3
2  4  5
3  6  7
4  8  9

In [22]: df.applymap(str).iloc[0,0]
Out[22]: '0'

df = df.applymap(str)

6 Comments

I really don't understand why, but mtrx['X.3'].apply(str) does not work for me either :( dtype still shows int64. The DataFrame has 23177 rows and the X.3 column contains only numbers. In [21]: mtrx['X.3'].dtype Out[21]: dtype('int64')
pandas 0.7.0, which came with Python 2.7 on an Ubuntu system
The current version is 0.12; you should upgrade.
df['A'].apply(str) is not working, but df.column_name = df.column_name.astype(str) works. No idea why.
@DmitryKonovalov In Python, strings are immutable, so whenever you manipulate the data, you have to assign the result back to the variable.
132

Change data type of DataFrame column:

To int:

df.column_name = df.column_name.astype(np.int64)

To str:

df.column_name = df.column_name.astype(str)
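
A quick sanity check that the cast actually changed the dtype (a minimal sketch; the column name and data are made up):

import numpy as np
import pandas as pd

df = pd.DataFrame({'column_name': [1, 2, 3]})

df.column_name = df.column_name.astype(str)
print(df.column_name.dtype)    # object -- the column now holds Python str values

df.column_name = df.column_name.astype(np.int64)
print(df.column_name.dtype)    # int64 -- converted back to integers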

3 Comments

This is appealing, but it is about 4x slower than apply(str) from @Jeff, in my test using pd.Series(np.arange(1000000)).
This works for me. df['A'] = df['A'].apply(str) also works. The answer provided by @Jeff does not work for me.
Regarding @JohnZwinck's comment: using Python 3, it seems to be more like 2x as fast to use apply() instead of astype(): timeit.Timer('c.apply(str)', setup='import pandas as pd; c = pd.Series(range(1000))').timeit(1000) gives 0.41 s, while timeit.Timer('c.astype(str)', setup='import pandas as pd; c = pd.Series(range(1000))').timeit(1000) gives 0.80 s.
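
A sketch of how that comparison can be reproduced with timeit (the numbers above are the commenter's; timings will vary by machine and pandas version):

import timeit

setup = "import pandas as pd; c = pd.Series(range(1000))"
print(timeit.timeit('c.apply(str)', setup=setup, number=1000))
print(timeit.timeit('c.astype(str)', setup=setup, number=1000))
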
23

Warning: both solutions given (astype() and apply()) do not preserve NULL values in either the nan or the None form.

import pandas as pd
import numpy as np

df = pd.DataFrame([None,'string',np.nan,42], index=[0,1,2,3], columns=['A'])

df1 = df['A'].astype(str)
df2 =  df['A'].apply(str)

print(df.isnull())
print(df1.isnull())
print(df2.isnull())

I believe this is fixed by the implementation of to_string()
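
For illustration, a small sketch of what Series.to_string offers here: its na_rep parameter controls how missing values are rendered, though the result is a single display string rather than a converted column:

# renders the column as one string; missing values appear as '' instead of 'NaN'
print(df['A'].to_string(na_rep=''))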

2 Comments

to_string allows you to choose the handling of NaN, e.g. to return an empty string rather than 'NaN'.
(I wasn't disagreeing, just expanding on what you said) -- had wanted to say +1
21

There are four ways to convert a column to string:

1. astype(str)
df['column_name'] = df['column_name'].astype(str)

2. values.astype(str)
df['column_name'] = df['column_name'].values.astype(str)

3. map(str)
df['column_name'] = df['column_name'].map(str)

4. apply(str)
df['column_name'] = df['column_name'].apply(str)

Let's compare the performance of each approach:

#importing libraries
import numpy as np
import pandas as pd
import time

#creating four sample dataframes using dummy data
df1 = pd.DataFrame(np.random.randint(1, 1000, size =(10000000, 1)), columns =['A'])
df2 = pd.DataFrame(np.random.randint(1, 1000, size =(10000000, 1)), columns =['A'])
df3 = pd.DataFrame(np.random.randint(1, 1000, size =(10000000, 1)), columns =['A'])
df4 = pd.DataFrame(np.random.randint(1, 1000, size =(10000000, 1)), columns =['A'])

#applying astype(str)
time1 = time.time()
df1['A'] = df1['A'].astype(str)
print('time taken for astype(str) : ' + str(time.time()-time1) + ' seconds')

#applying values.astype(str)
time2 = time.time()
df2['A'] = df2['A'].values.astype(str)
print('time taken for values.astype(str) : ' + str(time.time()-time2) + ' seconds')

#applying map(str)
time3 = time.time()
df3['A'] = df3['A'].map(str)
print('time taken for map(str) : ' + str(time.time()-time3) + ' seconds')

#applying apply(str)
time4 = time.time()
df4['A'] = df4['A'].apply(str)
print('time taken for apply(str) : ' + str(time.time()-time4) + ' seconds')

Output

time taken for astype(str): 5.472359895706177 seconds
time taken for values.astype(str): 6.5844292640686035 seconds
time taken for map(str): 2.3686647415161133 seconds
time taken for apply(str): 2.39758563041687 seconds

If you run this multiple times, the time for each technique will vary. On average, map(str) and apply(str) take less time than the other two techniques.

Comments

16

Use the following code:

df.column_name = df.column_name.astype('str')

Comments

9

I realise this is an old question, but since it's the first thing that comes up for DataFrame string conversion, IMHO it should be kept up to date.

If you want the actual dtype to be string (rather than object), and/or you need to handle datetime conversion in your DataFrame, and/or you have NaN/None in your DataFrame, none of the above will work.

You should use:

df.astype('string')

You can compare results on this df:

import pandas as pd
import numpy as np
from datetime import datetime

# Example dataframe
min_index = datetime(2050, 5, 2, 0, 0, 0)
max_index = datetime(2050, 5, 3, 23, 59, 0)
df = pd.DataFrame(data=pd.date_range(start=min_index, end=max_index, freq = "H"), columns=["datetime"])
df["hours"] = df["datetime"].dt.hour
df["day_name"] = df["datetime"].dt.strftime("%A")
df["numeric_cat"] = [np.random.choice([0,1,2]) for a in range(df.shape[0])]

# Add missing values:
df = df.mask(np.random.random(df.shape) < 0.1)

# str 
df1 = df.astype(str)  # same problem with apply(str)
df1.isnull().sum().sum()  # returns 0, which is wrong
df1.info()  # dtype is object

# string
df2 = df.astype('string')
df2.isnull().sum().sum()  # returns the correct number of missing values
df2.info()  # dtype is string

1 Comment

Absolutely true. If you cast a column to "str" instead of "string", the result is an object dtype with possible nan values. If you then save your DataFrame to a null-sensitive format, e.g. a Parquet file, you will have a lot of headaches because of this "str". I spent a few hours finding the problem, and df['column_name'] = df['column_name'].astype("string") solved it.
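
A minimal sketch of the Parquet round-trip described in this comment (assumes pyarrow or fastparquet is installed; the column and file names are made up):

import pandas as pd

df = pd.DataFrame({'column_name': ['a', None, 'c']})

# Cast to the nullable 'string' dtype before writing, so the missing value
# survives the round-trip instead of becoming the literal text 'None'.
df['column_name'] = df['column_name'].astype('string')
df.to_parquet('example.parquet')
print(pd.read_parquet('example.parquet')['column_name'].isna().sum())  # 1
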
0

Just for an additional reference.

All of the above answers work in the case of a DataFrame column. But if you are using a row-wise function (or a lambda) while creating or modifying a column, the answers above won't work, because inside that function each value is a plain Python int rather than a pandas Series, so .astype(str) is not available. You have to use str(target_attribute) to make it a string. Please refer to the example below.

def add_zero_in_prefix(row):
    # pad single-digit hours with a leading zero
    if row['Hour'] < 10:
        return '0' + str(row['Hour'])
    return str(row['Hour'])

data['str_hr'] = data.apply(add_zero_in_prefix, axis=1)
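
For comparison, the same zero-padding can be done without a row-wise function by combining the conversion from the other answers with the pandas str.zfill method (a sketch, assuming data['Hour'] holds integers):

data['str_hr'] = data['Hour'].astype(str).str.zfill(2)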

Comments
