19

I downloaded a CSV file and read it into a pandas DataFrame. All 4 columns have dtype object, and I want to convert them to str.


The result of dtypes is as follows:

Name                      object
Position Title            object
Department                object
Employee Annual Salary    object
dtype: object

I tried to change the type with the following:

path['Employee Annual Salary'] = path['Employee Annual Salary'].astype(str)

but dtypes still returns object. I also tried specifying the column type when reading the CSV:

path = pd.read_csv("C:\\Users\\IBM_ADMIN\\Desktop\\ml-1m\\city-of-chicago-salaries.csv",dtype={'Employee Annual Salary':str})

or

path = pd.read_csv("C:\\Users\\IBM_ADMIN\\Desktop\\ml-1m\\city-of-chicago-salaries.csv",dtype=str)

but neither works. How do I change a column's type from object to str?

3
  • Possible duplicate of stackoverflow.com/questions/21018654/… Commented Dec 14, 2016 at 13:47
  • That link is helpful. Another problem: how do I remove the '$' from the column Employee Annual Salary and then convert it to float? Commented Dec 15, 2016 at 1:11
  • I found why replace failed. The correct way is: path['Employee Annual Salary'] = path['Employee Annual Salary'].str.replace('$',''). I had not put str in front of replace before. Commented Dec 15, 2016 at 1:19
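The cleanup described in the last comment can be sketched as follows. The sample salary values are made up for illustration, since the real file's contents are not shown. Passing regex=False is a precaution so that '$' is treated as a literal character rather than as the regex end-of-string anchor.

```python
import pandas as pd

# Hypothetical sample values; the real CSV contents are not shown in the question.
path = pd.DataFrame({'Employee Annual Salary': ['$88967.00', '$80778.00', '$104736.00']})

# regex=False treats '$' as a literal character instead of a regex anchor.
salary = path['Employee Annual Salary'].str.replace('$', '', regex=False)

# Convert the cleaned strings to floats and store them back in the column.
path['Employee Annual Salary'] = salary.astype(float)

print(path.dtypes)
```

If some rows contain other non-numeric text, astype(float) will raise a ValueError, so it may be worth inspecting the column first.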

5 Answers

29

Actually you can set the type of a column to string. Use .astype('string') rather than .astype(str).

Sample Data Set

import pandas as pd
df = pd.DataFrame(data={'name': ['Bla',None,'Peter']})

The column name has dtype object by default.

Single Column Solution

df.name = df.name.astype('string')

It's important to write .astype('string') rather than .astype(str), which didn't work for me: with .astype(str) the column stays as object.

Multi-Column Solution

df = df.astype(dtype={'name': 'string'})

This lets you change multiple fields at once.
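A small self-contained check of the approach above, as a sketch. It assumes pandas 1.0 or later, where the dedicated 'string' dtype (pd.StringDtype) was introduced; on older versions such as the 0.25.3 mentioned in the comments, the 'string' alias is not recognized.

```python
import pandas as pd

df = pd.DataFrame(data={'name': ['Bla', None, 'Peter']})

# 'string' selects pandas' dedicated StringDtype (assumes pandas 1.0+).
as_string = df['name'].astype('string')

print(as_string.dtype)
# The None entry is kept as a missing value, not turned into the text 'None'.
print(as_string.isna().sum())
```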


2 Comments

When I use .astype('string'), I get this error: TypeError: data type 'string' not understood (pandas version 0.25.3)
This worked great. .astype('str') worked for me, but I had a slightly different problem
27

For strings, the column type will always be 'object'. There is no need for you to convert anything; it is already doing what you require.

The types come from numpy, which has a set of numeric data types. Anything else is an object.

You might want to read http://nbviewer.jupyter.org/github/jakevdp/PythonDataScienceHandbook/blob/master/notebooks/02.01-Understanding-Data-Types.ipynb for a fuller explanation.
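To illustrate the point, here is a minimal sketch (the sample names are hypothetical) showing that the elements of an object column are already ordinary Python str values:

```python
import pandas as pd

# An object column stores references to ordinary Python objects.
s = pd.Series(['Schmoe, Joe', 'Schmoe, Jill'], dtype=object)

# Collect the Python type of every element in the column.
element_types = {type(value) for value in s}
print(element_types)
```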

3 Comments

I tried to remove '$' from the column Employee Annual Salary, but using replace directly does not work.
object is actually the dtype used for str, so there is no need to convert it to str.
But then there may be an issue when trying to df.join ("ValueError: You are trying to merge on object and int64 columns.")
9

Please use:

df = df.convert_dtypes()

It will automatically convert each column to a suitable type, and it should work.
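A minimal sketch of what convert_dtypes does, assuming pandas 1.0 or later (where the method was added). The frame and its column names are hypothetical. Note that a salary column with a leading '$' would still come out as a string column, not a numeric one.

```python
import pandas as pd

# Hypothetical frame with object columns.
df = pd.DataFrame({
    'Name': pd.Series(['Schmoe, Joe', None], dtype=object),
    'Count': pd.Series([1, 2], dtype=object),
})

# Let pandas pick the best nullable dtype for each column.
converted = df.convert_dtypes()
print(converted.dtypes)
```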

1 Comment

What a nice thing to know...
2

I think that the astype worked; it's just that you can't see the result of the change by viewing dtypes. For example,

import pandas
data = [{'Name': 'Schmoe, Joe', 'Position Title': 'Dude', 'Department': 'Zip', 'Employee Annual Salary': 200000.00},
        {'Name': 'Schmoe, Jill', 'Position Title': 'Dudette', 'Department': 'Zam', 'Employee Annual Salary': 300000.00},
        {'Name': 'Schmoe, John', 'Position Title': 'The Man', 'Department': 'Piz', 'Employee Annual Salary': 100000.00},
        {'Name': 'Schmoe, Julie', 'Position Title': 'The Woman', 'Department': 'Maz', 'Employee Annual Salary': 150000.00}]
df = pandas.DataFrame.from_records(data, columns=['Name', 'Position Title', 'Department', 'Employee Annual Salary'] )

Now if I do dtypes on df I see:

In [32]: df.dtypes
Out[32]:
Name                       object
Position Title             object
Department                 object
Employee Annual Salary    float64
dtype: object

Now if I do,

In [33]: df.astype(str)['Employee Annual Salary'].map(lambda x:  type(x))
Out[33]:
0    <type 'str'>
1    <type 'str'>
2    <type 'str'>
3    <type 'str'>
Name: Employee Annual Salary, dtype: object

I see that all of my salary values are now strings even though the dtype shows up as object.

So the bottom line is that I think that you are fine.
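The same check can be written for Python 3 roughly as follows. This is a sketch with a shortened, hypothetical frame, and it inspects the element types rather than the column dtype:

```python
import pandas as pd

df = pd.DataFrame({'Employee Annual Salary': [200000.00, 300000.00]})

# Convert the float column to strings.
converted = df['Employee Annual Salary'].astype(str)

# Look at the Python type of each element instead of the column dtype.
element_types = set(converted.map(lambda x: type(x)))
print(element_types)
```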

2 Comments

The column Employee Annual Salary contains '$' and I want to remove it, but using replace does not work.
object is actually the dtype used for str, so there is no need to convert it to str using astype.
0

I agree with the above-mentioned answers. You do not need to convert objects to string. However, if you ever need to convert many columns to another datatype (e.g. int), you can use the following code:

object_columns_list = list(df.select_dtypes(include='object').columns)

for object_column in object_columns_list:
    df[object_column] = df[object_column].astype(int)
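The loop above raises a ValueError if any object column holds text that is not an integer. One alternative sketch, with hypothetical column names, is to name the columns explicitly and pass a dict to astype:

```python
import pandas as pd

# Hypothetical frame: two object columns of digit strings and one of names.
df = pd.DataFrame({
    'a': pd.Series(['1', '2'], dtype=object),
    'b': pd.Series(['3', '4'], dtype=object),
    'name': pd.Series(['x', 'y'], dtype=object),
})

# Convert only the columns known to hold integers; 'name' is left untouched.
df = df.astype({'a': int, 'b': int})
print(df.dtypes)
```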
