I'm trying to concatenate two numpy arrays (one float, the other int) horizontally and put the result into a pandas DataFrame.

So I tried:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = pd.DataFrame(np.concatenate((load_iris().data, np.array([load_iris().target]).T), axis=1), 
                    columns=[load_iris().feature_names+['target']])

But this automatically converts the target column to float, although it was originally int. I tried to convert it back to int with

iris.target = iris.target.astype(int)

But this raises a TypeError:

TypeError: only integer scalar arrays can be converted to a scalar index

So I have some questions.

(i) What is this error saying?

(ii) Is it even possible to change the type of a single column? (Incidentally, iris = iris.astype(int) works just fine, but it converts every column to int, which isn't what I want.)

(iii) What is the most memory-efficient way to do what I want? The code below produces what I'm trying to do:

iris = pd.concat([pd.DataFrame(load_iris().data, columns = load_iris().feature_names), 
                  pd.DataFrame(load_iris().target, columns=['target'])], axis=1)

But this goes to the trouble of creating multiple pandas DataFrames and concatenating them. Is there a better way to get the exact same output?

  • np.concatenate does make an array with a common dtype. numpy arrays don't have mixed dtypes unless they are structured arrays. A dataframe can have a different dtype for each column. Such a frame can be thought of as a collection of pandas Series. Commented Jul 14, 2020 at 6:17
  • You should call data = load_iris() once and reuse the result. By default that returns a Bunch (a dict-like object), not a dataframe; data.data or data['data'] is the feature array and data.target is the label array. Commented Jul 14, 2020 at 6:19
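
To illustrate the first comment, here is a minimal sketch of the dtype promotion. It uses small made-up arrays (features and labels are hypothetical stand-ins, not the iris data): np.concatenate returns one array with a single common dtype, while a DataFrame built column by column keeps a separate dtype for each column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the iris feature and target arrays.
features = np.array([[5.1, 3.5], [4.9, 3.0]])  # float64
labels = np.array([0, 1])                      # integer

# Concatenating float and int arrays promotes everything to float.
combined = np.concatenate((features, labels[:, None]), axis=1)
print(combined.dtype)

# A DataFrame stores each column separately, so the dtypes can differ.
frame = pd.DataFrame(features, columns=["a", "b"])
frame["target"] = labels
print(frame.dtypes)
```

The first print shows a float dtype for the whole array; the second shows float columns next to an integer target column.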

1 Answer

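I think the error occurs because the frame's columns are a MultiIndex: columns= was given a list wrapped in another list, so each column label is a one-element tuple such as ('target',), as the iris.info() output shows. You can change the dtype of that one column by selecting it with its tuple label: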

iris.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   (sepal length (cm),)  150 non-null    float64
 1   (sepal width (cm),)   150 non-null    float64
 2   (petal length (cm),)  150 non-null    float64
 3   (petal width (cm),)   150 non-null    float64
 4   (target,)             150 non-null    float64
dtypes: float64(5)
memory usage: 6.0 KB

iris[[('target',)]] = iris[[('target',)]].astype(int)
# iris.iloc[:,4] = iris.iloc[:,4].astype(int)

iris.head()

|    |   ('sepal length (cm)',) |   ('sepal width (cm)',) |   ('petal length (cm)',) |   ('petal width (cm)',) |   ('target',) |
|---:|-------------------------:|------------------------:|-------------------------:|------------------------:|--------------:|
|  0 |                      5.1 |                     3.5 |                      1.4 |                     0.2 |             0 |
|  1 |                      4.9 |                     3   |                      1.4 |                     0.2 |             0 |
|  2 |                      4.7 |                     3.2 |                      1.3 |                     0.2 |             0 |
|  3 |                      4.6 |                     3.1 |                      1.5 |                     0.2 |             0 |
|  4 |                      5   |                     3.6 |                      1.4 |                     0.2 |             0 |
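
As an alternative to working around the tuple labels, the sketch below avoids creating the MultiIndex in the first place. It assumes scikit-learn is installed and that load_iris() returns its default Bunch; the variable name data is my own choice. The feature names are passed as a flat list, load_iris() is called only once, and the target is attached as its own column so it is never promoted to float.

```python
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()  # load once and reuse the returned Bunch

# Pass feature_names as a flat list (no extra brackets) so the columns
# form an ordinary Index rather than a MultiIndex.
iris = pd.DataFrame(data.data, columns=data.feature_names)

# Adding the target as a separate column keeps its integer dtype,
# because it never passes through np.concatenate.
iris["target"] = data.target

print(iris.dtypes)
```

With flat column labels, a single column can also be converted in the usual way, for example iris["target"] = iris["target"].astype(int). Recent scikit-learn releases (0.23 and later, as far as I know) also accept load_iris(as_frame=True), whose .frame attribute holds a ready-made DataFrame including the target column.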