Index-out-of-range error using pandas

Question

First, an MWE comprising two files. The intent is to read the CSV into a pandas DataFrame and then rescale the values in each column to the range (-1, 1).

data.csv:

Var1,Var2,Var3
2.1,6.4,5.2
7.9,2.1,1.3
5.0,6.1,6.7

mwe.py:

import pandas as pd
import sklearn.preprocessing

data = pd.read_csv("data.csv")
scaler = sklearn.preprocessing.MinMaxScaler(feature_range = (-1, 1))
number_of_columns = data.shape[1]
indices_of_feature_columns = range(0, number_of_columns)
data[indices_of_feature_columns] = scaler.fit_transform(data[indices_of_feature_columns])

When I execute this (Python 2.7.13, sklearn 0.18.1, and pandas 0.20.3), I receive an odd error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "mwe.py", line 8, in <module>
    data[indices_of_feature_columns] = scaler.fit_transform(data[indices_of_feature_columns])
  File "/home/gavin/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 1958, in __getitem__
    return self._getitem_array(key)
  File "/home/gavin/miniconda2/lib/python2.7/site-packages/pandas/core/frame.py", line 2002, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/home/gavin/miniconda2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1231, in _convert_to_indexer
    raise KeyError('%s not in index' % objarr[mask])
KeyError: '[0 1 2] not in index'

However, when a friend executes this code using a seemingly identical setup, the code runs correctly.

@DYZ, scaler.fit_transform(data[indices_of_feature_columns]) throws the mentioned exception...
– MaxU - stand with Ukraine, Commented Jul 19, 2017 at 22:06

MaxU - stand with Ukraine · Accepted Answer · 2017-07-19 22:17:56Z

Try passing the whole DataFrame to the scaler instead of indexing it with integer positions:

import pandas as pd
import sklearn.preprocessing

data = pd.read_csv("data.csv")
scaler = sklearn.preprocessing.MinMaxScaler(feature_range = (-1, 1))

data = scaler.fit_transform(data)

Result:

In [15]: data
Out[15]:
array([[-1.        ,  1.        ,  0.44444444],
       [ 1.        , -1.        , -1.        ],
       [ 0.        ,  0.86046512,  1.        ]])

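UPDATE: if you want to keep the scaled data as a DataFrame, wrap the result (this assumes data is still the DataFrame returned by read_csv, not the array shown above):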

In [18]: data = pd.DataFrame(scaler.fit_transform(data), 
                             index=data.index, 
                             columns=data.columns)

In [19]: data
Out[19]:
   Var1      Var2      Var3
0  -1.0  1.000000  0.444444
1   1.0 -1.000000 -1.000000
2   0.0  0.860465  1.000000
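
As for why the original line fails: indexing a DataFrame with a list of integers, as in data[[0, 1, 2]], looks the integers up as column labels, and the labels in this file are Var1, Var2 and Var3. The sketch below is one possible way to keep the positional intent, by translating the positions into labels first. The CSV text is inlined only so the snippet is self-contained, and the names positions and feature_columns are mine, not from the question.

```python
import io

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Same contents as data.csv from the question, inlined so the sketch runs on its own.
csv_text = """Var1,Var2,Var3
2.1,6.4,5.2
7.9,2.1,1.3
5.0,6.1,6.7
"""
data = pd.read_csv(io.StringIO(csv_text))

positions = list(range(data.shape[1]))  # [0, 1, 2]

# data[positions] looks for columns *labelled* 0, 1 and 2. The labels here are
# Var1, Var2 and Var3, so the lookup raises KeyError.
try:
    data[positions]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Translate the positions into labels, then scale those columns in place.
# Assigning back through the labels keeps data a DataFrame with its headings.
feature_columns = data.columns[positions]
scaler = MinMaxScaler(feature_range=(-1, 1))
data[feature_columns] = scaler.fit_transform(data[feature_columns])
```

If only some columns are features, shrinking positions to a subset (for example [0, 1]) should leave the remaining columns untouched. data.iloc[:, positions] is the purely positional way to read the same columns.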

edited Jul 19, 2017 at 22:17

answered Jul 19, 2017 at 22:02


2 Comments

Gavin Kirby Over a year ago

This works but has the side-effect of converting "data" from a pandas dataframe to a numpy array (which is not a show-stopper since I can convert it back, but it does mean that I lose the headings and need to find a workaround).

Gavin Kirby Over a year ago

Thank you! This solves my problem. I am curious as to why the original version seems to work for some and not others; my friend was using the same python and module versions as I am.
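
Nothing in the thread says what differed on the friend's machine, so the following is only a guess at one situation in which the original indexing would succeed: if the CSV has no header row and is read with header=None, pandas labels the columns 0, 1, 2, and the integer lookup then matches real labels. The headerless text below is a made-up variant of data.csv, not the file from the question.

```python
import io

import pandas as pd

# Hypothetical variant of data.csv with the header row removed.
headerless_csv = "2.1,6.4,5.2\n7.9,2.1,1.3\n5.0,6.1,6.7\n"

# With header=None, pandas assigns integer column labels starting at 0.
data = pd.read_csv(io.StringIO(headerless_csv), header=None)
column_labels = list(data.columns)

# The integers are now genuine column labels, so the same kind of lookup
# that raised KeyError in the question succeeds here.
selected = data[list(range(data.shape[1]))]
```

Comparing data.columns on both machines would be a quick way to check whether something like this is going on.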
