I know there are already several questions about this error, but I'm not sure any of them covers my case. I have the code below and I want to print the column "y" of the DataFrame df. Doing so raises the following error: TypeError: only integer scalar arrays can be converted to a scalar index

import numpy as np
import pandas as pd

labels = []
xvectors = []

for i in data:
    labels.append(i[0])
    xvectors.append(i[1])

X = np.array(xvectors)
y = np.array(labels)

feat_cols = [ 'xvec'+str(i) for i in range(X.shape[1]) ]
print(feat_cols)
df = pd.DataFrame(X,columns=[feat_cols])
df['y']= y
#df['label'] = df['y'].apply(lambda i: str(i))
print(df['y'])

X, y = None, None

Printing the whole DataFrame works. It looks like this:

        xvec0     xvec1     xvec2     xvec3     xvec4  ...   xvec508   xvec509   xvec510   xvec511        y
0    3.397163 -1.112423  0.414708  0.563083  1.371336  ...  1.201095 -0.076261 -0.620443 -1.231465  DA01_03
1    0.159473  1.884818 -1.511547 -0.153500 -0.635701  ... -1.217205 -1.922081  0.878613  0.087912  DA01_06
2    1.089404  0.331919 -1.027480  0.594129 -2.473234  ... -3.505570 -3.509632 -0.553128 -0.453307  DA01_10
3    0.183993 -1.741467 -0.142570 -3.158320  4.355789  ...  3.857311  3.142393  0.991663 -2.842322  DA01_14

This is the full error message:

    print(df['y'])
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/frame.py", line 2958, in __getitem__
    return self._get_item_cache(key)
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/generic.py", line 3270, in _get_item_cache
    values = self._data.get(item)
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py", line 960, in get
    return self.iget(loc)
  File "/usr/local/lib/python3.7/dist-packages/pandas/core/internals/managers.py", line 977, in iget
    block = self.blocks[self._blknos[i]]
TypeError: only integer scalar arrays can be converted to a scalar index

I think it has something to do with the numpy array. Thank you in advance!

  • What's in i[0]? Can you show a print of labels? Commented Apr 10, 2020 at 16:54
  • The same error occurs when printing any other column. i[0] is a string like the ones in the rightmost column. It looks like this: DA01_03 DA01_06 DA01_10 DA01_14 Commented Apr 10, 2020 at 16:59
  • print(dfA.reset_index()) maybe? Looks indeed to be due to how X is constructed. dfA.info() might give us a better hint. Commented Apr 10, 2020 at 17:07
  • Thanks for your answer. df.info() gives: <class 'pandas.core.frame.DataFrame'> RangeIndex: 109 entries, 0 to 108 Columns: 513 entries, (xvec0,) to (y,) dtypes: float32(512), object(1) memory usage: 219.0+ KB None. reset_index() only returns the df; it doesn't fix the problem. Commented Apr 10, 2020 at 17:14

1 Answer

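Ah, you pass your columns argument as a list inside a list (feat_cols is already a list). pandas reads the nested list as a list of column levels, so the column labels become tuples in a MultiIndex instead of plain strings: you can see that df.info() says the columns range from (xvec0,) to (y,) instead of from xvec0 to y.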

Passing columns=feat_cols should do the trick :-)
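
For illustration, here is a minimal sketch of the difference. The small X and y arrays are made-up stand-ins for the asker's data (the real X has 512 columns), and whether the nested version raises the same TypeError may depend on the pandas version, so the sketch only inspects the column index type rather than reproducing the error.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the question's data: 4 rows, 3 features (assumed values).
X = np.arange(12, dtype=np.float32).reshape(4, 3)
y = np.array(["DA01_03", "DA01_06", "DA01_10", "DA01_14"])

feat_cols = ["xvec" + str(i) for i in range(X.shape[1])]

# Nested list, as in the question: the labels end up in a MultiIndex.
df_nested = pd.DataFrame(X, columns=[feat_cols])
nested_is_multi = isinstance(df_nested.columns, pd.MultiIndex)

# Flat list, as suggested above: ordinary single-level column labels.
df = pd.DataFrame(X, columns=feat_cols)
df["y"] = y
flat_is_multi = isinstance(df.columns, pd.MultiIndex)

print(nested_is_multi, flat_is_multi)
print(df["y"])
```

Checking type(df.columns) or df.info() right after building the frame is a quick way to catch this before any column lookup fails.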

2 Comments

Oh my god. I didn't see that! Thank you!
Wow, I just made the exact same mistake. Thanks for having answered this!!
