
I am planning to run scikit-learn's stochastic gradient boosting algorithm over a CSV data set that contains numerical data.

However, when the script calls X = Germany.drop('Status', axis='columns'), I receive an AttributeError: 'numpy.ndarray' object has no attribute 'drop'.

I assume this error is related to converting the CSV data with pd.to_numeric, which possibly also converts the string headers. Is there any smart tweak that can make this run?

The CSV data has the following structure:

(screenshot of the CSV data, not reproduced here)

And the corresponding code looks like this:

Germany = pd.read_csv('./Germany_filtered.csv', index_col=0)
Germany = Germany.fillna("")
Germany = pd.to_numeric(Germany.columns.str, errors='coerce')
Germany.head()

X = Germany.drop('Status', axis='columns')
y = Germany['Status']
  • In the pd.to_numeric line, you overwrite the Germany variable with an array of the column names. After that, Germany no longer references your dataframe. Commented Sep 12, 2020 at 19:09
  • Thanks for the input. Yes, I assumed that it is related to this line. Without the "overwriting", however, I received the error message ValueError: could not convert string to float:. This is why I included pd.to_numeric. Commented Sep 12, 2020 at 19:10

1 Answer

In [167]: df = pd.DataFrame(np.arange(12).reshape(3,4),columns=['a','b','c','d'])

drop works fine on a dataframe:

In [168]: df.drop('c',axis='columns')
Out[168]: 
   a  b   d
0  0  1   3
1  4  5   7
2  8  9  11

to_numeric, when given that df.columns.str object, produces a numpy array rather than a dataframe:

In [169]: x = pd.to_numeric(df.columns.str,errors='coerce')
In [170]: x
Out[170]: 
array(<pandas.core.strings.StringMethods object at 0x7fef602862b0>,
      dtype=object)
In [171]: type(x)
Out[171]: numpy.ndarray

Your script should have failed at head, before ever getting to drop:

In [172]: x.head()
Traceback (most recent call last):
  File "<ipython-input-172-830ed5e65d76>", line 1, in <module>
    x.head()
AttributeError: 'numpy.ndarray' object has no attribute 'head'

In [173]: x.drop()
Traceback (most recent call last):
  File "<ipython-input-173-6d3a33341569>", line 1, in <module>
    x.drop()
AttributeError: 'numpy.ndarray' object has no attribute 'drop'

What do the to_numeric docs say? I haven't worked with this function, but clearly you don't want to pass it that df.columns.str object. Let's try passing it the dataframe:

In [176]: x = pd.to_numeric(df,errors='coerce')
Traceback (most recent call last):
  File "<ipython-input-176-d095b0166b8f>", line 1, in <module>
    x = pd.to_numeric(df,errors='coerce')
  File "/usr/local/lib/python3.6/dist-packages/pandas/core/tools/numeric.py", line 139, in to_numeric
    raise TypeError("arg must be a list, tuple, 1-d array, or Series")
TypeError: arg must be a list, tuple, 1-d array, or Series

So let's pass a column/Series:

In [177]: x = pd.to_numeric(df['a'],errors='coerce')
In [178]: x
Out[178]: 
0    0
1    4
2    8
Name: a, dtype: int64

The resulting Series can be assigned back to the dataframe, in the same column or a new one:

In [179]: df['a'] = x
In [180]: df
Out[180]: 
   a  b   c   d
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

Now in my example frame there's no need to do this conversion, but it should give you something to work with.
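When several columns need converting, the same assign-back step can be repeated in a loop. This is only a minimal sketch: the frame and its column names are made up for illustration and are not taken from the question's CSV.

```python
import pandas as pd

# Hypothetical frame; the column names are illustrative only
df = pd.DataFrame({"a": ["0", "4", "8"], "b": ["1", "5", "x"], "c": [2, 6, 10]})

# Convert each selected column on its own and assign the Series back.
# errors="coerce" turns anything unparseable (here "x") into NaN.
for col in ["a", "b"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
print(df)
```

Because each call receives a single column, to_numeric gets the 1-d input it expects, and df stays a dataframe throughout.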


Let's try a real string conversion:

In [195]: df['a'] = ['00','04','LS']
In [196]: df
Out[196]: 
    a  b   c   d
0  00  1   2   3
1  04  5   6   7
2  LS  9  10  11

The linked answer doesn't help:

In [197]: pd.to_numeric(df.columns.str, errors='coerce')
Out[197]: 
array(<pandas.core.strings.StringMethods object at 0x7fef602862b0>,
      dtype=object)

But my version does produce a numeric Series:

In [198]: pd.to_numeric(df['a'], errors='coerce')
Out[198]: 
0    0.0
1    4.0
2    NaN
Name: a, dtype: float64
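For the situation in the question, one possible approach is to apply to_numeric to every column and keep the result as a dataframe, so that drop still works afterwards. The sketch below rests on assumptions: the CSV text is a made-up stand-in for Germany_filtered.csv (only the Status column name comes from the question), and dropping rows with NaN is just one way of handling unparseable cells.

```python
import io
import pandas as pd

# Hypothetical stand-in for Germany_filtered.csv; apart from 'Status',
# the column names and values are invented for illustration.
csv_text = """id,Price,Rooms,Status
1,100.5,3,1
2,,2,0
3,abc,4,1
4,80.0,1,0
"""

germany = pd.read_csv(io.StringIO(csv_text), index_col=0)

# to_numeric only accepts 1-d input, so apply it column by column.
# The result is still a DataFrame, not a numpy array.
germany = germany.apply(pd.to_numeric, errors="coerce")

# Missing and unparseable cells are now NaN; dropping those rows is
# one option (imputing values would be another).
germany = germany.dropna()

X = germany.drop("Status", axis="columns")
y = germany["Status"]
print(X)
print(y)
```

Note that this sketch leaves out the fillna("") call from the question. Filling with empty strings may itself be what triggers "could not convert string to float:", since an empty string cannot be parsed as a number.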

4 Comments

Thanks a lot for this very detailed answer. Yes, correct: the error complains about head first. I had added this line to the code after the fact. Given the difference between dataframes and numpy arrays, is there any way to work around this issue? As mentioned, pd.to_numeric was necessary because I otherwise receive the error ValueError: could not convert string to float:
As an alternative thought: could this conversion issue be avoided, e.g. by feeding the data from a JSON file instead of a CSV file?
Why did you pass df.columns.str to to_numeric? Who or what told you that was a good idea?
That had been recommended by another user: stackoverflow.com/questions/63861845/…
