I have a DataFrame with 6676 rows and 40 columns. This is a truncated version of the two columns of interest.

    user_id      pos
0   1520304915   0.3612
1   1520304915   0.0000
2   1520278540   0.0000
3   1520302105   0.4404
4   1520278547   -0.1531
5   1520303294   0.4404
6   1520278540   -0.1027
7   1522888020   0.9512
8   1520302847   0.7192
9   1523490451   0.8689

I also have a separate list of user_id values.

0    1528106864
1    1520303069
2    1520305391
3    1521519315
4    1520303294
5    1520302954
6    1520302702
7    1528108709
8    1520278540
9    1520304915

For each 'user_id' that is present in the list, I want to extract a separate numpy array of its 'pos' values. This should return 10 individual arrays.

The arrays would have differing lengths, as each user_id appears a different number of times in the DataFrame.

Here are two examples of what the arrays would look like, drawn from the truncated data above. They are only a visual aid based on the values I can see.

1520304915: ([0.3612, 0.0000, ...
1520278540: ([0.0000, -0.1027, ...

3 Answers

As you're specifically after np arrays, the following does what you want (here the list of ids is assumed to be held in df1['ids']):

In [34]:
df[df['user_id'].isin(df1['ids'])].groupby('user_id')['pos'].apply(lambda x: x.values)

Out[34]:
user_id
1520278540    [0.0, -0.1027]
1520303294          [0.4404]
1520304915     [0.3612, 0.0]
Name: pos, dtype: object

Here is the first entry:

In [36]:
df[df['user_id'].isin(df1['ids'])].groupby('user_id')['pos'].apply(lambda x: x.values).iloc[0]

Out[36]:
array([ 0.    , -0.1027])

You can see this is a np array:

In [37]:
type(df[df['user_id'].isin(df1['ids'])].groupby('user_id')['pos'].apply(lambda x: x.values).iloc[0])

Out[37]:
numpy.ndarray
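
For anyone who wants to reproduce this, here is a minimal self-contained sketch built from the truncated data in the question. The way `df` and `df1` are constructed is my assumption (the question does not show it), and the intermediate names `mask`, `arrays` and `first` are only illustrative. It also stores the grouped result once instead of repeating the whole chain.

```python
import numpy as np
import pandas as pd

# Truncated sample from the question (the real frame has 6676 rows, 40 columns).
df = pd.DataFrame({
    "user_id": [1520304915, 1520304915, 1520278540, 1520302105, 1520278547,
                1520303294, 1520278540, 1522888020, 1520302847, 1523490451],
    "pos": [0.3612, 0.0, 0.0, 0.4404, -0.1531,
            0.4404, -0.1027, 0.9512, 0.7192, 0.8689],
})

# Assumption: the separate list of ids lives in a column named 'ids'.
df1 = pd.DataFrame({
    "ids": [1528106864, 1520303069, 1520305391, 1521519315, 1520303294,
            1520302954, 1520302702, 1528108709, 1520278540, 1520304915],
})

# Keep only rows whose user_id is in the list, then collect 'pos' per user.
mask = df["user_id"].isin(df1["ids"])
arrays = df.loc[mask].groupby("user_id")["pos"].apply(lambda s: s.to_numpy())

# Look up one user's array by id rather than by position.
first = arrays.loc[1520278540]
print(type(first), first)
```

Indexing with `.loc[user_id]` is usually easier to read than `.iloc[0]`, since groupby sorts the keys and the positional order no longer matches the order of the original list.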

You could use the isin method to subset your DataFrame with your list_user_id, then group by the user_id column and aggregate with tolist to collect each group's values into a list:

In [199]: df['user_id'].isin(list_user_id)
Out[199]: 
0     True
1     True
2     True
3    False
4    False
5     True
6     True
7    False
8    False
9    False
Name: user_id, dtype: bool

In [200]: df[df['user_id'].isin(list_user_id)].groupby('user_id').agg(lambda x: x.tolist())
Out[200]: 
                       pos
user_id                   
1520278540  [0.0, -0.1027]
1520303294        [0.4404]
1520304915   [0.3612, 0.0]
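
Note that this gives Python lists rather than numpy arrays, and on the full 40-column frame it would aggregate every column. A possible variation, sketched under the assumption that only 'pos' is needed, selects that column first and converts each list afterwards. The DataFrame construction and the names `lists` and `arrays` are illustrative, not from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user_id": [1520304915, 1520304915, 1520278540, 1520302105, 1520278547,
                1520303294, 1520278540, 1522888020, 1520302847, 1523490451],
    "pos": [0.3612, 0.0, 0.0, 0.4404, -0.1531,
            0.4404, -0.1027, 0.9512, 0.7192, 0.8689],
})
list_user_id = [1528106864, 1520303069, 1520305391, 1521519315, 1520303294,
                1520302954, 1520302702, 1528108709, 1520278540, 1520304915]

# Restrict to the 'pos' column before aggregating, so other columns are ignored.
lists = (
    df[df["user_id"].isin(list_user_id)]
    .groupby("user_id")["pos"]
    .agg(lambda x: x.tolist())
)

# Convert each per-user list into a numpy array, keyed by user_id.
arrays = {uid: np.asarray(vals) for uid, vals in lists.items()}
print(arrays[1520304915])
```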


You can filter with isin, then use groupby and apply np.array to each group:

print(df)
      user_id     pos
0  1520304915  0.3612
1  1520304915  0.0000
2  1520278540  0.0000
3  1520302105  0.4404
4  1520278547 -0.1531
5  1520303294  0.4404
6  1520278540 -0.1027
7  1522888020  0.9512
8  1520302847  0.7192
9  1523490451  0.8689

l = [1528106864,  1520303069, 1520305391, 1521519315, 1520303294,
     1520302954, 1520302702, 1528108709, 1520278540, 1520304915]

g = df[df.user_id.isin(l)]
print(g)
      user_id     pos
0  1520304915  0.3612
1  1520304915  0.0000
2  1520278540  0.0000
5  1520303294  0.4404
6  1520278540 -0.1027

print(g.groupby('user_id')['pos'].apply(np.array))

user_id
1520278540    [0.0, -0.1027]
1520303294          [0.4404]
1520304915     [0.3612, 0.0]
Name: pos, dtype: object

print(type(g.groupby('user_id')['pos'].apply(np.array).iloc[0]))
<class 'numpy.ndarray'>
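
One thing to be aware of: only ids that actually occur in the DataFrame produce a group, so the truncated data yields 3 arrays, not 10. If you need exactly one array per id in the list, a possible sketch is to iterate over the groups into a dict and fill in an empty array for ids with no rows. Treating missing ids as empty arrays is my assumption about what is wanted, and the names `ids`, `by_user` and `per_id` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user_id": [1520304915, 1520304915, 1520278540, 1520302105, 1520278547,
                1520303294, 1520278540, 1522888020, 1520302847, 1523490451],
    "pos": [0.3612, 0.0, 0.0, 0.4404, -0.1531,
            0.4404, -0.1027, 0.9512, 0.7192, 0.8689],
})
# Same values as the list `l` above, under a more descriptive name.
ids = [1528106864, 1520303069, 1520305391, 1521519315, 1520303294,
       1520302954, 1520302702, 1528108709, 1520278540, 1520304915]

# Iterating over a grouped column yields (key, Series) pairs.
grouped = df[df["user_id"].isin(ids)].groupby("user_id")["pos"]
by_user = {uid: s.to_numpy() for uid, s in grouped}

# One entry per requested id; ids absent from df get an empty array.
per_id = {uid: by_user.get(uid, np.array([])) for uid in ids}

for uid, arr in per_id.items():
    print(uid, arr)
```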
