AttributeError - 'numpy.ndarray' object has no attribute 'drop'

Question

I am planning to run a scikit-learn Stochastic Gradient Boosting algorithm over a CSV data set that contains numerical data.

When the script calls X = Germany.drop('Status', axis='columns'), however, I receive an AttributeError: 'numpy.ndarray' object has no attribute 'drop'.

I assume this error is related to the fact that I am converting the CSV data with pd.to_numeric, which possibly also converts the string headers. Is there any smart tweak that can make this run?

The CSV data has the following structure:

And the corresponding code looks like this:

Germany = pd.read_csv('./Germany_filtered.csv', index_col=0)
Germany = Germany.fillna("")
Germany = pd.to_numeric(Germany.columns.str, errors='coerce')
Germany.head()

X = Germany.drop('Status', axis='columns')
y = Germany['Status']

In the pd.to_numeric line, you overwrite the Germany variable with an array built from the column names. After that, Germany no longer references your dataframe.
– jkr, Commented Sep 12, 2020 at 19:09
Thanks for the input. Yes, I assumed that it is related to this line. Without the "overwriting", however, I received the error message ValueError: could not convert string to float:. This is why I included pd.to_numeric.
– Malte Susen, Commented Sep 12, 2020 at 19:10

hpaulj · Accepted Answer · 2020-09-12 20:33:12Z

In [167]: df = pd.DataFrame(np.arange(12).reshape(3,4),columns=['a','b','c','d'])

drop works fine on a dataframe:

In [168]: df.drop('c',axis='columns')
Out[168]: 
   a  b   d
0  0  1   3
1  4  5   7
2  8  9  11

Passing df.columns.str to to_numeric produces a numpy array that merely wraps the StringMethods accessor, not a dataframe:

In [169]: x = pd.to_numeric(df.columns.str,errors='coerce')
In [170]: x
Out[170]: 
array(<pandas.core.strings.StringMethods object at 0x7fef602862b0>,
      dtype=object)
In [171]: type(x)
Out[171]: numpy.ndarray

It should have complained about head, before getting to drop:

In [172]: x.head()
Traceback (most recent call last):
  File "<ipython-input-172-830ed5e65d76>", line 1, in <module>
    x.head()
AttributeError: 'numpy.ndarray' object has no attribute 'head'

In [173]: x.drop()
Traceback (most recent call last):
  File "<ipython-input-173-6d3a33341569>", line 1, in <module>
    x.drop()
AttributeError: 'numpy.ndarray' object has no attribute 'drop'

What do the to_numeric docs say? I haven't worked with this function, but clearly you don't want to pass it that df.columns.str object. Let's try passing it the dataframe:

In [176]: x = pd.to_numeric(df,errors='coerce')
Traceback (most recent call last):
  File "<ipython-input-176-d095b0166b8f>", line 1, in <module>
    x = pd.to_numeric(df,errors='coerce')
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/tools/numeric.py", line 139, in to_numeric
    raise TypeError("arg must be a list, tuple, 1-d array, or Series")
TypeError: arg must be a list, tuple, 1-d array, or Series

So let's pass a column/Series:

In [177]: x = pd.to_numeric(df['a'],errors='coerce')
In [178]: x
Out[178]: 
0    0
1    4
2    8
Name: a, dtype: int64

The resulting Series can be assigned back to the dataframe, either to the same column or to a new one:

In [179]: df['a'] = x
In [180]: df
Out[180]: 
   a  b   c   d
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

Now in my example frame there's no need to do this conversion, but it should give you something to work with.

Let's try a real string conversion:

In [195]: df['a'] = ['00','04','LS']
In [196]: df
Out[196]: 
    a  b   c   d
0  00  1   2   3
1  04  5   6   7
2  LS  9  10  11

The linked answer doesn't help:

In [197]: pd.to_numeric(df.columns.str, errors='coerce')
Out[197]: 
array(<pandas.core.strings.StringMethods object at 0x7fef602862b0>,
      dtype=object)

But my version does produce a numeric Series:

In [198]: pd.to_numeric(df['a'], errors='coerce')
Out[198]: 
0    0.0
1    4.0
2    NaN
Name: a, dtype: float64
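
To convert every feature column rather than just one, the same per-column call can be applied across the frame with DataFrame.apply, which keeps the result a dataframe so that drop still works. This is only a minimal sketch: the small frame below is a hypothetical stand-in for the CSV data, and it assumes that 'Status' is the target column and that all other columns should become numeric.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data; 'Status' is assumed to be the target.
df = pd.DataFrame({
    "a": ["00", "04", "LS"],
    "b": ["1.5", "", "3"],
    "Status": [0, 1, 0],
})

# Separate the target first, while df is still a DataFrame.
y = df["Status"]

# to_numeric only accepts 1-d input, so apply it column by column.
# errors="coerce" turns unparseable strings (including "") into NaN.
X = df.drop("Status", axis="columns").apply(pd.to_numeric, errors="coerce")

print(type(X))
print(X.dtypes)
```

Values that cannot be parsed end up as NaN, so they still need to be imputed or dropped before fitting an estimator that does not accept missing values.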

edited Sep 12, 2020 at 20:33

answered Sep 12, 2020 at 19:52


4 Comments

Malte Susen Over a year ago

Thanks a lot for this very detailed answer. Yes, correct: the error complains about head first. I had added that line to the code after the fact. Given the difference between dataframes and numpy arrays, is there any way to work around this issue? As mentioned, pd.to_numeric was necessary because I otherwise receive the error ValueError: could not convert string to float:

Malte Susen Over a year ago

As an alternative thought: could this conversion issue be avoided e.g. by feeding the data from a JSON file instead of a CSV file?

hpaulj Over a year ago

Why did you pass df.columns.str to to_numeric? Who or what told you that was a good idea?

Malte Susen Over a year ago

That had been recommended by another user: stackoverflow.com/questions/63861845/…
