First, an MWE comprising two files. The intent is to read the CSV into a pandas DataFrame and then rescale the values in each column to the range [-1, 1].

data.csv:

Var1,Var2,Var3
2.1,6.4,5.2
7.9,2.1,1.3
5.0,6.1,6.7

mwe.py:

import pandas as pd
import sklearn.preprocessing

data = pd.read_csv("data.csv")
scaler = sklearn.preprocessing.MinMaxScaler(feature_range = (-1, 1))
number_of_columns = data.shape[1]
indices_of_feature_columns = range(0, number_of_columns)
data[indices_of_feature_columns] = scaler.fit_transform(data[indices_of_feature_columns])

When I execute this (Python 2.7.13, sklearn 0.18.1, and pandas 0.20.3), I receive an odd error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mwe.py", line 8, in <module>
    data[indices_of_feature_columns] = scaler.fit_transform(data[indices_of_feature_columns])
  File "/home/gavin/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 1958, in __getitem__
    return self._getitem_array(key)
  File "/home/gavin/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 2002, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/home/gavin/miniconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1231, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: '[0 1 2] not in index'

However, when a friend executes this code using a seemingly identical setup, the code runs correctly.

  • I cannot reproduce your error. Commented Jul 19, 2017 at 22:03
  • @DYZ, scaler.fit_transform(data[indices_of_feature_columns]) throws the exception mentioned in the question. Commented Jul 19, 2017 at 22:06

1 Answer

Try this:

import pandas as pd
import sklearn.preprocessing

data = pd.read_csv("data.csv")
scaler = sklearn.preprocessing.MinMaxScaler(feature_range = (-1, 1))

data = scaler.fit_transform(data)

Result:

In [15]: data
Out[15]:
array([[-1.        ,  1.        ,  0.44444444],
       [ 1.        , -1.        , -1.        ],
       [ 0.        ,  0.86046512,  1.        ]])

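UPDATE: if you want to keep the scaled data as a DataFrame (run this on the DataFrame returned by read_csv, before data has been overwritten with the array above):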

In [18]: data = pd.DataFrame(scaler.fit_transform(data), 
                             index=data.index, 
                             columns=data.columns)

In [19]: data
Out[19]:
   Var1      Var2      Var3
0  -1.0  1.000000  0.444444
1   1.0 -1.000000 -1.000000
2   0.0  0.860465  1.000000
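
As for the KeyError in the question: data[[0, 1, 2]] selects columns by label, and the labels here are the strings Var1, Var2, Var3, so the integers are "not in index". If you would rather keep selecting features by position, a minimal sketch along these lines should work. It assumes a reasonably recent pandas and scikit-learn, and it reads an inline copy of data.csv (the names csv_text, feature_positions and feature_columns are only illustrative) so that it runs on its own:

```python
import io

import pandas as pd
import sklearn.preprocessing

# Inline copy of data.csv so the sketch is self-contained (assumption:
# in real use you would call pd.read_csv("data.csv") instead).
csv_text = "Var1,Var2,Var3\n2.1,6.4,5.2\n7.9,2.1,1.3\n5.0,6.1,6.7\n"
data = pd.read_csv(io.StringIO(csv_text))

scaler = sklearn.preprocessing.MinMaxScaler(feature_range=(-1, 1))

# data[[0, 1, 2]] would look for columns *labelled* 0, 1, 2.
# .iloc selects by integer position instead, whatever the labels are.
feature_positions = list(range(data.shape[1]))
scaled = scaler.fit_transform(data.iloc[:, feature_positions])

# Translate positions to labels and assign back, so that the result
# stays a DataFrame with its original index and headers.
feature_columns = data.columns[feature_positions]
data[feature_columns] = scaled

print(data)
```

This also makes it easy to scale only some of the columns: shrink feature_positions and the remaining columns are left untouched.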

2 Comments

This works but has the side-effect of converting "data" from a pandas dataframe to a numpy array (which is not a show-stopper since I can convert it back, but it does mean that I lose the headings and need to find a workaround).
Thank you! This solves my problem. I am curious as to why the original version seems to work for some and not others; my friend was using the same python and module versions as I am.
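
One possible explanation for the difference, which cannot be confirmed from the details given here: if the friend's CSV had no header row, or was read with header=None, pandas would label the columns 0, 1, 2, and the original label-based lookup would then succeed. The sketch below only illustrates that behaviour; the two inline CSV strings are stand-ins for the real files.

```python
import io

import pandas as pd

csv_with_header = "Var1,Var2,Var3\n2.1,6.4,5.2\n7.9,2.1,1.3\n5.0,6.1,6.7\n"
csv_without_header = "2.1,6.4,5.2\n7.9,2.1,1.3\n5.0,6.1,6.7\n"

with_header = pd.read_csv(io.StringIO(csv_with_header))
without_header = pd.read_csv(io.StringIO(csv_without_header), header=None)

# Column labels are strings, so integer keys are not found.
try:
    with_header[[0, 1, 2]]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Column labels are the integers 0, 1, 2, so the same lookup succeeds.
subset = without_header[[0, 1, 2]]

print(lookup_failed)
print(list(without_header.columns))
```

Comparing data.columns on both machines would show quickly whether this is what happened.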
