Pandas: Iteratively Extract Numpy Arrays From DataFrame

Question

I have a DataFrame with 6676 rows and 40 columns. This is a truncated version of the two columns of interest.

    user_id      pos
0   1520304915   0.3612
1   1520304915   0.0000
2   1520278540   0.0000
3   1520302105   0.4404
4   1520278547   -0.1531
5   1520303294   0.4404
6   1520278540   -0.1027
7   1522888020   0.9512
8   1520302847   0.7192
9   1523490451   0.8689

I also have a separate list of user_id's.

0    1528106864
1    1520303069
2    1520305391
3    1521519315
4    1520303294
5    1520302954
6    1520302702
7    1528108709
8    1520278540
9    1520304915

I want to iteratively extract individual numpy arrays for the 'pos' values for each 'user_id' if the 'user_id' is present in the list. This should return 10 individual arrays.

The arrays will have different lengths, because each user_id appears a different number of times in the DataFrame.

Here are two examples of what the arrays would look like, drawn from the truncated data above. This is only a visualisation aid based on the values shown.

1520304915: ([0.3612, 0.0000, ...
1520278540: ([0.0000, -0.1027, ...

EdChum · Accepted Answer · 2016-01-18 13:01:11Z

As you're specifically after np arrays, the following does what you want:

In [34]:
df[df['user_id'].isin(df1['ids'])].groupby('user_id')['pos'].apply(lambda x: x.values)

Out[34]:
user_id
1520278540    [0.0, -0.1027]
1520303294          [0.4404]
1520304915     [0.3612, 0.0]
Name: pos, dtype: object

Here is the first entry:

In [36]:
df[df['user_id'].isin(df1['ids'])].groupby('user_id')['pos'].apply(lambda x: x.values).iloc[0]

Out[36]:
array([ 0.    , -0.1027])

You can see this is a np array:

In [37]:
type(df[df['user_id'].isin(df1['ids'])].groupby('user_id')['pos'].apply(lambda x: x.values).iloc[0])

Out[37]:
numpy.ndarray
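
The three snippets above recompute the same filter and groupby each time. A minimal, self-contained sketch that builds the sample data from the question, computes the result once, and looks up a user by label follows. It assumes, as the answer does, that the ids sit in a column named 'ids' of a second frame df1; it also uses to_numpy() in place of .values, which is a substitution on my part and needs pandas 0.24 or later.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (truncated frame and id list).
df = pd.DataFrame({
    "user_id": [1520304915, 1520304915, 1520278540, 1520302105, 1520278547,
                1520303294, 1520278540, 1522888020, 1520302847, 1523490451],
    "pos": [0.3612, 0.0, 0.0, 0.4404, -0.1531,
            0.4404, -0.1027, 0.9512, 0.7192, 0.8689],
})
# Assumption: the id list is stored in a column called 'ids', as in the answer.
df1 = pd.DataFrame({
    "ids": [1528106864, 1520303069, 1520305391, 1521519315, 1520303294,
            1520302954, 1520302702, 1528108709, 1520278540, 1520304915],
})

# Filter once, group once, and keep the resulting Series of arrays.
arrays = (
    df[df["user_id"].isin(df1["ids"])]
    .groupby("user_id")["pos"]
    .apply(lambda s: s.to_numpy())
)

# Look up one user's array by user_id rather than by position.
first = arrays.loc[1520278540]
print(type(first), first)
```

Looking up by label with .loc avoids depending on the sort order of the groups, which .iloc[0] does implicitly.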

Anton Protopopov · Accepted Answer · 2016-01-18 12:44:30Z

You could use the isin method to subset your DataFrame with your list_user_id, then group by the user_id column and aggregate with tolist to convert each group's values to a list:

In [199]: df['user_id'].isin(list_user_id)
Out[199]: 
0     True
1     True
2     True
3    False
4    False
5     True
6     True
7    False
8    False
9    False
Name: user_id, dtype: bool

In [200]: df[df['user_id'].isin(list_user_id)].groupby('user_id').agg(lambda x: x.tolist())
Out[200]: 
                       pos
user_id                   
1520278540  [0.0, -0.1027]
1520303294        [0.4404]
1520304915   [0.3612, 0.0]
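
The aggregation above yields Python lists, while the question asks for NumPy arrays, and on the full 40-column frame it would aggregate every column. A hedged sketch of one way to adapt it: select 'pos' before aggregating, then convert each list with np.asarray. The name list_user_id comes from the answer; the dict named arrays is my own illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user_id": [1520304915, 1520304915, 1520278540, 1520302105, 1520278547,
                1520303294, 1520278540, 1522888020, 1520302847, 1523490451],
    "pos": [0.3612, 0.0, 0.0, 0.4404, -0.1531,
            0.4404, -0.1027, 0.9512, 0.7192, 0.8689],
})
list_user_id = [1528106864, 1520303069, 1520305391, 1521519315, 1520303294,
                1520302954, 1520302702, 1528108709, 1520278540, 1520304915]

# Restrict to the 'pos' column so the other columns are not aggregated.
lists = (
    df[df["user_id"].isin(list_user_id)]
    .groupby("user_id")["pos"]
    .agg(list)
)

# Convert each Python list to a NumPy array, keyed by user_id.
arrays = {uid: np.asarray(vals) for uid, vals in lists.items()}
print(arrays[1520303294])
```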

jezrael · Accepted Answer · 2016-01-18 13:11:08Z

You can use isin and groupby with apply np.array:

import numpy as np

print(df)
      user_id     pos
0  1520304915  0.3612
1  1520304915  0.0000
2  1520278540  0.0000
3  1520302105  0.4404
4  1520278547 -0.1531
5  1520303294  0.4404
6  1520278540 -0.1027
7  1522888020  0.9512
8  1520302847  0.7192
9  1523490451  0.8689

l = [1528106864,  1520303069, 1520305391, 1521519315, 1520303294,
     1520302954, 1520302702, 1528108709, 1520278540, 1520304915]

# keep only the rows whose user_id is in the list
g = df[df.user_id.isin(l)]
print(g)
      user_id     pos
0  1520304915  0.3612
1  1520304915  0.0000
2  1520278540  0.0000
5  1520303294  0.4404
6  1520278540 -0.1027

# collect each user's 'pos' values into a NumPy array
print(g.groupby('user_id')['pos'].apply(np.array))

user_id
1520278540    [0.0, -0.1027]
1520303294          [0.4404]
1520304915     [0.3612, 0.0]
Name: pos, dtype: object

print(type(g.groupby('user_id')['pos'].apply(np.array).iloc[0]))
<class 'numpy.ndarray'>
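
Since the question asks to extract the arrays iteratively, here is a rough sketch of looping over the groups directly instead of using apply. Only ids that actually occur in the DataFrame produce a group, so with the truncated sample the ten-element list gives three arrays, not ten. The names wanted, per_user, and missing are hypothetical and chosen for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "user_id": [1520304915, 1520304915, 1520278540, 1520302105, 1520278547,
                1520303294, 1520278540, 1522888020, 1520302847, 1523490451],
    "pos": [0.3612, 0.0, 0.0, 0.4404, -0.1531,
            0.4404, -0.1027, 0.9512, 0.7192, 0.8689],
})
wanted = [1528106864, 1520303069, 1520305391, 1521519315, 1520303294,
          1520302954, 1520302702, 1528108709, 1520278540, 1520304915]

# Keep only rows for the wanted users.
filtered = df[df["user_id"].isin(wanted)]

# Iterating a grouped column yields (key, Series) pairs, one per user_id.
per_user = {}
for uid, pos in filtered.groupby("user_id")["pos"]:
    per_user[uid] = pos.to_numpy()

# Ids from the list that never appear in the DataFrame get no array.
missing = [uid for uid in wanted if uid not in per_user]

for uid, arr in per_user.items():
    print(uid, arr)
```

If an empty array is wanted for each absent id, per_user.get(uid, np.array([])) is one way to supply it at lookup time.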
