
I'm trying to read a CSV file into a DataFrame with pandas, and I want the index column to be read as strings. However, since the values in the index column contain only digits, pandas parses them as integers. How can I read them as strings?

Here are my csv file and code:

[sample.csv]    
    uid,f1,f2,f3
    01,0.1,1,10
    02,0.2,2,20
    03,0.3,3,30

[code]
import pandas as pd
df = pd.read_csv('sample.csv', index_col="uid", dtype=float)
print(df.index.values)

The result is that df.index holds integers, not strings:

>>> [1 2 3]

But I want to get df.index as string:

>>> ['01', '02', '03']

And an additional condition: The rest of index data have to be numeric value and they're actually too many and I can't point them with specific column names.
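For anyone who wants to reproduce this without the file, here is a minimal sketch that inlines sample.csv as a string (t is just an illustrative name) and leaves out dtype=float so only the index inference is shown:

```python
import io

import pandas as pd

# Inline copy of sample.csv so the example runs without a file on disk
t = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

# Without a dtype hint, pandas infers 'uid' as an integer column,
# so the leading zeros are gone once it becomes the index
df = pd.read_csv(io.StringIO(t), index_col="uid")
print(df.index.dtype)
print(df.index.tolist())  # [1, 2, 3]
```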

  • I don't understand "additional condition: The rest of index data have to be numeric value and they're actually too many and I can't point them with specific column names.". Do you only want rows '01','02','03' to have string indices, and all the other rows integer? That's not supported in pandas, each column can only have one dtype, unless you want to use dtype:'object'. Why would you want to mix string and integer indices, that sounds like trouble? Commented Feb 22, 2018 at 9:05

2 Answers


Pass the dtype param to specify the dtype of the uid column:

In [159]:
import pandas as pd
import io
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
df = pd.read_csv(io.StringIO(t), dtype={'uid':str})
df.set_index('uid', inplace=True)
df.index

Out[159]:
Index(['01', '02', '03'], dtype='object', name='uid')

So in your case the following should work:

df = pd.read_csv('sample.csv', dtype={'uid':str})
df.set_index('uid', inplace=True)

The one-line equivalent doesn't work, due to a pandas bug (still outstanding at the time of writing) where the dtype param is ignored on columns that are used as the index:

df = pd.read_csv('sample.csv', dtype={'uid':str}, index_col='uid')
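If you want a single statement anyway, one workaround that avoids index_col altogether is to chain set_index onto the read_csv result. A minimal sketch, with the sample data inlined as the string t instead of reading sample.csv:

```python
import io

import pandas as pd

t = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

# Parse 'uid' as str inside read_csv, then promote it to the index in the
# same expression; set_index returns a new DataFrame, so no inplace is needed
df = pd.read_csv(io.StringIO(t), dtype={"uid": str}).set_index("uid")
print(df.index.tolist())  # ['01', '02', '03']
```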

You can do this dynamically, assuming the first column is the index column:

In [171]:
t="""uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""
cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()
index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str
df = pd.read_csv(io.StringIO(t), dtype=dtypes)
df.set_index(index_col_name, inplace=True)
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 3 entries, 01 to 03
Data columns (total 3 columns):
f1    3 non-null float64
f2    3 non-null float64
f3    3 non-null float64
dtypes: float64(3)
memory usage: 96.0+ bytes

In [172]:
df.index

Out[172]:
Index(['01', '02', '03'], dtype='object', name='uid')

Here we read only the first row (nrows=1) to get the column names:

cols = pd.read_csv(io.StringIO(t), nrows=1).columns.tolist()

We then build a dict mapping the column names to the desired dtypes:

index_col_name = cols[0]
dtypes = dict(zip(cols[1:], [float]* len(cols[1:])))
dtypes[index_col_name] = str

We take the index name, assuming it's the first entry, then build a dict that maps every remaining column to float, and finally add the index column with str as its dtype. You can then pass this dict as the dtype param to read_csv.
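A different technique that skips the extra header read: read every column as text, make the first column the index, then cast everything else to float. This is a minimal sketch, assuming (as the asker confirms in the comments) that the index is always the first column and that every other column is numeric; t is the same inline sample data:

```python
import io

import pandas as pd

t = """uid,f1,f2,f3
01,0.1,1,10
02,0.2,2,20
03,0.3,3,30"""

# Read every column as str so nothing is coerced (the zero in '01' survives)
df = pd.read_csv(io.StringIO(t), dtype=str)

# Use whatever the first column is called as the index, without naming it
df = df.set_index(df.columns[0])

# By assumption all remaining columns are numeric, so cast them in one go
df = df.astype(float)

print(df.index.tolist())  # ['01', '02', '03']
print(df.dtypes)
```

If any non-index column is not actually numeric, astype(float) will raise a ValueError, which is a reasonable way to surface a broken assumption.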


Comments

Sorry, I forgot to include some information. The other columns have to be numeric, and there are actually too many of them to list by name, so if I do it your way, the other data are also strings.
My method specifies only one specific column, so what is the problem? That you don't know the name, or that you don't want to do dtype=str?
I want to do dtype=float, because I want to force the other columns to be float.
Is the index col always the first column?
Yes, it's always the first column.

If the result is not a string, you have to convert it to a string. Try:

result = [str(i) for i in result]

or in this case:

print([str(i) for i in df.index.values])

1 Comment

Your way gives ['1', '2', '3']. What I want is ['01', '02', '03'].
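Converting with str() after the fact cannot restore leading zeros, because the integer parse has already discarded them. If, and only if, every ID is known to have the same fixed width (assumed to be 2 characters here), zero-padding can recover them. A minimal sketch on a hypothetical integer index; reading the column as str in the first place remains the more robust fix:

```python
import pandas as pd

# Hypothetical integer index, as produced when 'uid' was parsed as int
idx = pd.Index([1, 2, 3], name="uid")

# str() alone gives '1', '2', '3'; zfill(2) re-pads to an ASSUMED width of 2
padded = idx.astype(str).str.zfill(2)
print(padded.tolist())  # ['01', '02', '03']
```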
