Pandas dataframe converting specific columns from string to float

Question

I am trying to do some simple analyses on the Kenneth French industry portfolios (my first time with Pandas/Python). The data is in txt format (see the link in the code). Before I can do any computations, I want to load it into a Pandas dataframe properly, but I've been struggling with this for hours:

import urllib.request
import os.path
import zipfile
import pandas as pd
import numpy as np

# paths
url = 'http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/ftp/48_Industry_Portfolios_CSV.zip'
csv_name = '48_Industry_Portfolios.CSV'
local_zipfile = '{0}/data.zip'.format(os.getcwd())
local_file = '{0}/{1}'.format(os.getcwd(), csv_name)

# download data
if not os.path.isfile(local_file):
    print('Downloading and unzipping file!')
    urllib.request.urlretrieve(url, local_zipfile)
    zipfile.ZipFile(local_zipfile).extract(csv_name, os.path.dirname(local_file))

# read from file
df = pd.read_csv(local_file,skiprows=11)
df.rename(columns={'Unnamed: 0' : 'dates'}, inplace=True)

# build new dataframe
first_stop = df['dates'][df['dates']=='201412'].index[0]
df2 = df[:first_stop]

# convert date to datetime object
pd.to_datetime(df2['dates'], format = '%Y%m')
df2.index = df2.dates

All the columns, except dates, represent financial returns. However, due to the file formatting, these are now strings. According to Pandas docs, this should do the trick:

df2.convert_objects(convert_numeric=True)

But the columns remain strings. Other suggestions are to loop over the columns (see for example pandas convert strings to float for multiple columns in dataframe):

for d in df2.columns:
    if d != 'dates':
        df2[d] = df2[d].map(lambda x: float(x)/100)

But this gives me the following warning:

 home/<xxxx>/Downloads/pycharm-community-4.5/helpers/pydev/pydevconsole.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  try:

I have read the documentation on views vs copies, but I have difficulty understanding why it is a problem in my case but not in the code snippets in the question I linked to. Thanks

Edit:

df2=df2.convert_objects(convert_numeric=True)

Does the trick, although I receive a deprecation warning (strangely enough, that is not mentioned in the docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.convert_objects.html)

Some of df2:

     dates    Agric    Food     Soda     Beer     Smoke    Toys     Fun    \
dates                                                                           
192607  192607     2.37     0.12   -99.99    -5.19     1.29     8.65     2.50   
192608  192608     2.23     2.68   -99.99    27.03     6.50    16.81    -0.76   
192609  192609    -0.57     1.58   -99.99     4.02     1.26     8.33     6.42   
192610  192610    -0.46    -3.68   -99.99    -3.31     1.06    -1.40    -5.09   
192611  192611     6.75     6.26   -99.99     7.29     4.55     0.00     1.82

Edit2: the solution is actually simpler than I thought:

df2.index = pd.to_datetime(df2['dates'], format = '%Y%m')
df2 = df2.astype(float)/100

Silly question: did you assign back the result of convert_objects? e.g. df = df.convert_objects(convert_numeric=True)
– EdChum, Commented Oct 29, 2015 at 21:11
Can you add what your df2 looks like? Edit your question with at least a partial view of the data frame.
– under_the_sea_salad, Commented Oct 29, 2015 at 21:12
@EdChum: that works, although I do not understand why I would need to assign the result; isn't that stored? Renaming columns is stored as well.
– jvz, Commented Oct 29, 2015 at 21:24

alex314159 · Accepted Answer · 2015-10-29 21:20:44Z

2

I would try the following to force convert everything into floats:

df2=df2.astype(float)
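
A minimal sketch of this on a small made-up frame (the column names and values below are illustrative, not the actual French data). Note that astype returns a new DataFrame, so the result has to be assigned back:

```python
import pandas as pd

# Hypothetical sample resembling the returns columns, read in as strings
df2 = pd.DataFrame(
    {"Agric": ["2.37", "2.23", "-0.57"], "Food": ["0.12", "2.68", "1.58"]},
    index=["192607", "192608", "192609"],
)

# astype returns a new DataFrame; assign it back to keep the result
df2 = df2.astype(float)

print(df2.dtypes)
```

If any cell cannot be parsed as a number, astype(float) raises a ValueError instead of silently producing NaN, so this fits best when every column is expected to be clean.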

GeneralFailure · Accepted Answer · 2017-08-28 08:32:02Z

1

You can convert a specific column to float (or any numeric type, for that matter) with:

df["column_name"] = pd.to_numeric(df["column_name"])

Posting this because DataFrame.convert_objects is deprecated as of pandas 0.20.1.
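
For several columns at once, one possible pattern is to apply pd.to_numeric column-wise. The sketch below uses made-up sample data, and errors="coerce" is an optional choice (not part of the answer above) that turns unparseable values into NaN instead of raising:

```python
import pandas as pd

# Hypothetical sample: "dates" stays as-is, the other columns should become floats
df = pd.DataFrame(
    {
        "dates": ["192607", "192608"],
        "Agric": ["2.37", "2.23"],
        "Food": ["0.12", "n/a"],  # one unparseable value, for illustration
    }
)

# Pick the columns to convert (everything except "dates")
value_cols = df.columns.drop("dates")

# errors="coerce" replaces anything that cannot be parsed with NaN
df[value_cols] = df[value_cols].apply(pd.to_numeric, errors="coerce")

print(df.dtypes)
```

Leaving out errors="coerce" keeps the default behaviour, where a value that cannot be parsed raises an error. That may be preferable when bad values should be noticed rather than masked.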

EdChum · Accepted Answer · 2015-10-29 21:26:32Z

0

You need to assign the result of convert_objects, as it has no inplace param:

df2=df2.convert_objects(convert_numeric=True)

You refer to the rename method, but that one has an inplace param, which you set to True.

Most operations in pandas return a copy, and some have an inplace param; convert_objects is one that does not. This is probably because if the conversion fails, you don't want to overwrite your data with NaNs.

Also, the deprecation warning is there because the different conversion routines are being split out, presumably so you can specialise the params, e.g. the format string for datetime.
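
The difference between the two behaviours can be seen in a small sketch. The frame below is made up, and astype is used as a stand-in for convert_objects, since convert_objects is deprecated and not available in recent pandas releases:

```python
import pandas as pd

# Hypothetical one-row frame with string values
df = pd.DataFrame({"Unnamed: 0": ["192607"], "Agric": ["2.37"]})

# rename with inplace=True modifies df itself and returns None
result = df.rename(columns={"Unnamed: 0": "dates"}, inplace=True)
print(result)            # None
print(list(df.columns))  # ['dates', 'Agric']

# astype has no inplace parameter: calling it without assignment leaves df unchanged
df["Agric"].astype(float)
still_string = isinstance(df.loc[0, "Agric"], str)

# Assigning the result back is what makes the conversion stick
df["Agric"] = df["Agric"].astype(float)
now_float = isinstance(df.loc[0, "Agric"], float)

print(still_string, now_float)
```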

3 Comments

jvz Over a year ago

Thanks for your help! I've added the solution, just realized I had the same issue with converting the dates. Maybe a bit off-topic, but suppose I would only want to convert a single column from strings to floats (as I tried to do in my original question), how can I avoid the SettingWithCopyWarning?

EdChum Over a year ago

You should filter the df for the object dtypes and then cast these using astype, so something like cols = list(df); cols.remove('dates'); df[cols] = df[cols].astype(float) / 100 should work, but really convert_objects is the way to go

ipj Over a year ago

Unfortunately, convert_objects has been deprecated since version 0.21.0, so other methods should be preferred nowadays.
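
On the follow-up about converting a single column without triggering SettingWithCopyWarning: one common approach is to take an explicit copy of the slice and then assign the converted column. The sketch below assumes df2 was created by slicing df, as in the question; the sample data is made up.

```python
import pandas as pd

# Hypothetical stand-in for the frame read from the CSV
df = pd.DataFrame(
    {"dates": ["192607", "192608", "192609"], "Agric": ["2.37", "2.23", "-0.57"]}
)

# .copy() makes df2 an independent DataFrame rather than a slice of df,
# so later column assignments are unambiguous
df2 = df[:2].copy()

# Convert one column and scale from percent to a fraction, as in the question
df2["Agric"] = pd.to_numeric(df2["Agric"]) / 100

print(df2.dtypes)
```

The original df is left untouched by this, which is usually what is wanted when df2 is meant to be a separate working frame.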
