create 1D array from data frame column

Question

I am looking for a way to get the class labels from my dataframe, which contains rows of features.

For instance, in this example:

import pandas as pd

df = pd.DataFrame([
    ['1', 'a', 'bb', '0'],
    ['1', 'a', 'cc', '0'],
    ['2', 'a', 'dd', '1'],
    ['2', 'a', 'ee', '1'],
    ['3', 'a', 'ff', '2'],
    ['3', 'a', 'gg', '2'],
    ['3', 'a', 'hh', '2']], columns=['ID', 'name', 'type', 'class'])

df 
    ID  name    type class
0   1    a      bb      0
1   1    a      cc      0
2   2    a      dd      1
3   2    a      ee      1
4   3    a      ff      2
5   3    a      gg      2
6   3    a      hh      2

My class array should be (i.e. for each ID the class value should be picked once):

class
array([0., 1., 2.,])

EDIT

df['class'].values produces array(['0', '0', '1', '1', '2', '2', '2'], dtype=object)

Expected answer:

I want array([0, 1, 2])

Which part are you having trouble with? - pandas.pydata.org/docs/user_guide/index.html
– wwii, Commented Sep 28, 2020 at 22:47
As created, the dataframe contains strings in that column, which is what values is giving you.
– hpaulj, Commented Sep 28, 2020 at 22:52

Grayrigel · Accepted Answer · 2020-09-29 00:33:13Z

1

You can use groupby + unique() as follows:

>>> df.groupby('ID')['class'].unique().astype(int).to_numpy()
array([0, 1, 2])
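
Since groupby('ID')['class'].unique() returns one array per ID, the cast to int only works when every ID has exactly one distinct class. A minimal sketch of an alternative, assuming the first class seen for each ID is the one wanted, uses first() instead (the variable name labels is only for illustration):

```python
import pandas as pd

df = pd.DataFrame([
    ['1', 'a', 'bb', '0'],
    ['1', 'a', 'cc', '0'],
    ['2', 'a', 'dd', '1'],
    ['2', 'a', 'ee', '1'],
    ['3', 'a', 'ff', '2'],
    ['3', 'a', 'gg', '2'],
    ['3', 'a', 'hh', '2']], columns=['ID', 'name', 'type', 'class'])

# One scalar per ID (the first class seen), then cast the strings to int.
labels = df.groupby('ID')['class'].first().astype(int).to_numpy()
print(labels)  # [0 1 2]
```

groupby sorts the group keys, and the IDs here are strings, so an ID of '10' would sort before '2'. Passing sort=False to groupby keeps the IDs in order of first appearance.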

For the given dataframe, you can also use the following methods:

Solution 1: Series.unique():

>>> df['class'].unique()
array(['0', '1', '2'], dtype=object)

# in case you want int outputs
>>> df['class'].unique().astype(int)
array([0, 1, 2])

Solution 2: value_counts():

>>> df['class'].value_counts(ascending=True).index.to_numpy().astype(int)
array([0, 1, 2])
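
One thing to keep in mind with Solution 2 is that value_counts() orders its result by frequency, so the order of the labels depends on how often each class occurs. A small sketch, using a made-up Series s, of how the order can change and how sorting the unique values avoids that:

```python
import numpy as np
import pandas as pd

# Hypothetical data: class '0' is the most common, class '2' the rarest.
s = pd.Series(['0', '0', '0', '1', '1', '2'])

# Ordered by ascending frequency, not by class value.
by_count = s.value_counts(ascending=True).index.to_numpy().astype(int)
print(by_count)  # [2 1 0]

# Ordered by class value, independent of the counts.
by_value = np.sort(s.astype(int).unique())
print(by_value)  # [0 1 2]
```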

edited Sep 29, 2020 at 0:33

answered Sep 28, 2020 at 22:56


1 Comment

arilwan Over a year ago

The issue with this answer is that if another ID has a class that was already listed, the value won't be included (say ID = 10 follows with class 0; it will not appear in the intended array since the class already exists).

Ehsan · Accepted Answer · 2020-09-28 23:03:58Z

0

In case multiple IDs can have the same class, you can select your 'ID' and 'class' columns, drop duplicates, and then fetch the class column. Otherwise, simply use unique as suggested in the other answer (of course, you can convert this result to ints too):

df[['ID','class']].drop_duplicates()['class'].values
#['0' '1' '2']

or, similar to @wwii's suggestion in the comments:

df.drop_duplicates('ID')['class'].values
#['0' '1' '2']
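
To illustrate the difference, here is a small sketch with made-up data in which ID '10' reuses class '0'. The names per_class and per_id are only for illustration, and the int conversion is added on the assumption that integer labels are wanted:

```python
import pandas as pd

# Hypothetical data: ID '10' has class '0', which ID '1' already uses.
df = pd.DataFrame({
    'ID':    ['1', '1', '2', '10', '10'],
    'class': ['0', '0', '1', '0', '0'],
})

# unique() collapses classes that repeat across IDs.
per_class = df['class'].astype(int).unique()
print(per_class)  # [0 1]

# drop_duplicates('ID') keeps one row per ID, so the repeated class survives.
per_id = df.drop_duplicates('ID')['class'].astype(int).to_numpy()
print(per_id)     # [0 1 0]
```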

answered Sep 28, 2020 at 23:03
