Concatenating numpy arrays without changing types of columns

Question

I'm trying to concatenate two NumPy arrays (one float, the other int) horizontally and pass the result to a pandas DataFrame.

So I tried:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = pd.DataFrame(np.concatenate((load_iris().data, np.array([load_iris().target]).T), axis=1),
                    columns=[load_iris().feature_names+['target']])

But this converts the target column from its original int type to float. I tried to convert it back to int with

iris.target = iris.target.astype(int)

But this raises a TypeError:

TypeError: only integer scalar arrays can be converted to a scalar index

So I have some questions.

(i) What is this error saying?

(ii) Is it even possible to change the type of a single column? (Incidentally, iris = iris.astype(int) works fine, but it converts every column to int, which isn't what I want.)
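Changing the dtype of one column is possible when the columns have ordinary string labels. Below is a minimal sketch using a small hypothetical frame. Its column names only mirror the iris data, and every column starts as float64, as it would after np.concatenate.

```python
import pandas as pd

# Hypothetical stand-in frame: all columns float64, as after np.concatenate
original = pd.DataFrame({
    "sepal length (cm)": [5.1, 4.9, 4.7],
    "target": [0.0, 0.0, 1.0],
})

# Option 1: replace only the "target" column with an int version of itself
df = original.copy()
df["target"] = df["target"].astype(int)

# Option 2: DataFrame.astype also accepts a {column: dtype} mapping;
# columns not named in the mapping keep their current dtype
df2 = original.astype({"target": "int64"})

print(df.dtypes)
print(df2.dtypes)
```

Both options leave the float feature column untouched. The dict form is convenient when several columns need different target dtypes at once.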

(iii) What is the most memory-efficient way to do this? The code below produces the result I want:

iris = pd.concat([pd.DataFrame(load_iris().data, columns = load_iris().feature_names), 
                  pd.DataFrame(load_iris().target, columns=['target'])], axis=1)

But this creates two separate pandas DataFrames and then concatenates them. Is there a better way to get exactly the same output?
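One way to avoid building two DataFrames and concatenating them is to build the frame from the float array, then add the integer array as a new column. pandas stores each column with its own dtype. The sketch below is minimal and uses small hypothetical arrays (`feature_names`, `features`, `target`) as stand-ins for `load_iris().feature_names`, `.data` and `.target`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for load_iris().feature_names, .data and .target
feature_names = ["sepal length (cm)", "sepal width (cm)"]
features = np.array([[5.1, 3.5], [4.9, 3.0], [4.7, 3.2]])
target = np.array([0, 0, 1])

# Build the float columns from the 2-D array (a flat list of names)
iris = pd.DataFrame(features, columns=feature_names)

# Assigning the int array as a new column keeps its integer dtype
iris["target"] = target

print(iris.dtypes)
```

With the real dataset, call `load_iris()` once (e.g. `data = load_iris()`) and use `data.feature_names`, `data.data` and `data.target` in place of the stand-ins. Some scikit-learn versions provide `load_iris(as_frame=True)` (the `as_frame` parameter was added in 0.23). In those versions, `load_iris(as_frame=True).frame` is already a DataFrame with an integer `target` column.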

np.concatenate makes an array with a common dtype. NumPy arrays don't have mixed dtypes unless they are structured arrays. A DataFrame can have a different dtype for each column; such a frame can be thought of as a collection of pandas Series.
– hpaulj, Commented Jul 14, 2020 at 6:17
You should call `data = load_iris()` once and reuse the result. It returns a dict-like Bunch object; `data.data` (or `data['data']`) is the feature array and `data.target` is the target array.
– hpaulj, Commented Jul 14, 2020 at 6:19
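hpaulj's point can be checked directly. `np.concatenate` produces a single common dtype, while a structured array (and a DataFrame built from it) keeps one dtype per field or column. The sketch below is minimal and uses hypothetical stand-in arrays. The field names are illustrative only, not the scikit-learn names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the iris feature and target arrays
features = np.array([[5.1, 3.5], [4.9, 3.0]])  # float64
target = np.array([0, 1], dtype=np.int64)

# A plain ndarray has a single dtype, so the int column is upcast to float64
combined = np.concatenate((features, target[:, None]), axis=1)
print(combined.dtype)  # float64

# A structured array keeps one dtype per named field
# (illustrative field names, not the sklearn feature names)
rec = np.zeros(len(target), dtype=[("sepal_length", "f8"),
                                   ("sepal_width", "f8"),
                                   ("target", "i8")])
rec["sepal_length"] = features[:, 0]
rec["sepal_width"] = features[:, 1]
rec["target"] = target

# pandas turns each field into a column with the matching dtype
df = pd.DataFrame(rec)
print(df.dtypes)
```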

r-beginners · Accepted Answer · 2020-07-14 04:55:02Z

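I think the error occurred because the DataFrame's columns are a MultiIndex. The list of names was passed to `columns=` wrapped in an extra pair of brackets, so each label became a one-element tuple, as `iris.info()` shows. So you can change the data type by selecting the column with its tuple label: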

iris.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   (sepal length (cm),)  150 non-null    float64
 1   (sepal width (cm),)   150 non-null    float64
 2   (petal length (cm),)  150 non-null    float64
 3   (petal width (cm),)   150 non-null    float64
 4   (target,)             150 non-null    float64
dtypes: float64(5)
memory usage: 6.0 KB

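iris[[('target',)]] = iris[[('target',)]].astype(int)
# Alternative, selecting the column by position. In pandas 2.0+ this assignment
# is attempted in place, so the column may stay float64:
# iris.iloc[:,4] = iris.iloc[:,4].astype(int)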

iris.head()

|    |   ('sepal length (cm)',) |   ('sepal width (cm)',) |   ('petal length (cm)',) |   ('petal width (cm)',) |   ('target',) |
|---:|-------------------------:|------------------------:|-------------------------:|------------------------:|--------------:|
|  0 |                      5.1 |                     3.5 |                      1.4 |                     0.2 |             0 |
|  1 |                      4.9 |                     3   |                      1.4 |                     0.2 |             0 |
|  2 |                      4.7 |                     3.2 |                      1.3 |                     0.2 |             0 |
|  3 |                      4.6 |                     3.1 |                      1.5 |                     0.2 |             0 |
|  4 |                      5   |                     3.6 |                      1.4 |                     0.2 |             0 |
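Pandas builds a one-level MultiIndex with tuple labels such as `('target',)` when `columns=` receives a list of lists, as in the question's `columns=[load_iris().feature_names+['target']]`. The sketch below is minimal and uses a hypothetical two-column stand-in for the concatenated array. It checks that behaviour and shows that a flat list of names gives ordinary labels, so a single column can then be converted by name.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column stand-in for the concatenated iris array
names = ["sepal length (cm)", "target"]
values = np.array([[5.1, 0.0], [4.9, 1.0]])

# A list wrapped in another list is read as the arrays of a MultiIndex,
# giving one-element tuple labels like ('target',)
nested = pd.DataFrame(values, columns=[names])
print(type(nested.columns).__name__)  # MultiIndex
print(nested.columns.tolist())

# A flat list gives an ordinary Index with string labels,
# so one column can be converted by name
flat = pd.DataFrame(values, columns=names)
flat["target"] = flat["target"].astype(int)
print(flat.dtypes)
```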
