Subset a Dataframe in Python based on specific conditions

Question

I have a dataframe in Python which, simplified, looks similar to this:

Type  | Market |  Price
-------------------------
 1    |   A    |    2
 1    |   B    |    2
 1    |   B    |    2
-------------------------
 2    |   A    |    4
 2    |   C    |    4
 2    |   C    |    4
 2    |   B    |    8
-------------------------
 3    |   A    |    8
 3    |   B    |    7
 3    |   B    |    7
 3    |   C    |    7

(For clarity, I have divided the dataframe up by Type.)
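For anyone who wants to reproduce the problem, the table above can be rebuilt roughly as follows. This is only a sketch: the name `df` is borrowed from the answers, and the dtypes (integers for Type and Price, strings for Market) are assumptions, since the question does not state them. The helper name `market_order` is made up for illustration.

```python
import pandas as pd

# Rebuild the example table (dtypes are assumed, not given in the question).
df = pd.DataFrame({
    "Type":   [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "Market": ["A", "B", "B", "A", "C", "C", "B", "A", "B", "B", "C"],
    "Price":  [2, 2, 2, 4, 4, 4, 8, 8, 7, 7, 7],
})

# Order in which each Market value first appears within every Type.
market_order = df.groupby("Type")["Market"].unique().apply(list).to_dict()
print(market_order)
```

Looking at the order of first appearance per Type makes the condition easier to state: Type 2 is the only group where "C" shows up before "B".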

What I would like to do is subset the dataframe by Type, depending on which values appear in Market and in what order. Type "1" only has "A" and "B" in Market, so I want to keep it. Type "2" has "A", then "C", and only then "B", so I don't want to keep it. Type "3" has "A", then "B", then "C", so I do want to keep it. From this dataframe, then, I want to keep Type "1" and Type "3".

I'm having a bit of trouble implementing this, since it requires very specific conditions, and I'm not very good at programming unfortunately. What is a good way of doing this? Thanks in advance :)

There is actually a query method for dataframes, but from your question it seems the order is also important?
– MEdwin, Commented Jun 9, 2022 at 8:29
@MEdwin Yes, the order is important unfortunately. If A is followed by B and then C, I wish to keep it, but if A is followed by C and then B, I don't want to keep it.
– updownleft5134, Commented Jun 9, 2022 at 8:31
@updownleft5134 If the order is B, A for Type 1, should it be kept? If the values for Type 3 are D, A, B, B, C, should it be kept? If the values are A, B, B, C, D, should it be kept?
– jezrael, Commented Jun 9, 2022 at 9:01

ziying35 · Accepted Answer · 2022-06-09 08:26:14Z

1

Try this:

df.groupby('Type').filter(lambda g: ''.join(g.Market.unique()[:2]) == 'AB')
>>>

    Type    Market  Price
0   1       A       2
1   1       B       2
2   1       B       2
7   3       A       8
8   3       B       7
9   3       B       7
10  3       C       7
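A self-contained sketch of the same idea, for readers who want to run it. The DataFrame is rebuilt from the question with assumed dtypes, and the function name `starts_with_a_then_b` is made up for illustration. It compares a list instead of a joined string, which is a small deviation from the one-liner above.

```python
import pandas as pd

df = pd.DataFrame({
    "Type":   [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "Market": ["A", "B", "B", "A", "C", "C", "B", "A", "B", "B", "C"],
    "Price":  [2, 2, 2, 4, 4, 4, 8, 8, 7, 7, 7],
})

def starts_with_a_then_b(group):
    # unique() returns values in order of first appearance, so the first two
    # entries are the first two distinct markets seen for this Type.
    first_two = list(group["Market"].unique()[:2])
    return first_two == ["A", "B"]

# filter() keeps every row of the groups for which the function returns True.
kept = df.groupby("Type").filter(starts_with_a_then_b)
print(kept)
```

Note that this only inspects the first two distinct markets, so a group such as A, B, D would also be kept. Comparing lists rather than joined strings avoids a false match if a single market were literally named "AB".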

answered Jun 9, 2022 at 8:26


jezrael · Accepted Answer · 2022-06-09 09:08:23Z

0

If you need to keep only the Types whose unique Market values, in order of appearance, are A, B or A, B, C, use:

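# Unique Market values per Type, in order of first appearance, as tuples
s = df.drop_duplicates(['Type','Market']).groupby('Type')['Market'].agg(tuple)
# Keep the rows whose Type has one of the accepted sequences
df = df[df['Type'].isin(s.index[s.isin([('A','B'),('A','B','C')])])]
print(df)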
    Type Market  Price
0      1      A      2
1      1      B      2
2      1      B      2
7      3      A      8
8      3      B      7
9      3      B      7
10     3      C      7

Another idea:

def f(x):
    # dict.fromkeys keeps the first occurrence of each value, in order
    u = tuple(dict.fromkeys(x))
    return (u == ('A','B')) or (u == ('A','B','C'))

df = df[df.groupby('Type').Market.transform(f)]
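The ordered-unique idea can also be spelled out step by step, which may be easier to adapt if the list of accepted sequences changes. This is a sketch, not a replacement for the code above: the names `ALLOWED`, `order_by_type` and `keep_types` are hypothetical, and the DataFrame is rebuilt from the question with assumed dtypes.

```python
import pandas as pd

df = pd.DataFrame({
    "Type":   [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "Market": ["A", "B", "B", "A", "C", "C", "B", "A", "B", "B", "C"],
    "Price":  [2, 2, 2, 4, 4, 4, 8, 8, 7, 7, 7],
})

# Accepted sequences of distinct markets (assumed from the question).
ALLOWED = {("A", "B"), ("A", "B", "C")}

# Iterating a grouped column yields (group key, Series) pairs.
# dict.fromkeys drops repeats while keeping first-appearance order.
order_by_type = {
    type_id: tuple(dict.fromkeys(markets))
    for type_id, markets in df.groupby("Type")["Market"]
}

# Types whose ordered distinct markets match one of the accepted sequences.
keep_types = [t for t, order in order_by_type.items() if order in ALLOWED]

kept = df[df["Type"].isin(keep_types)]
print(kept)
```

Unlike a check on the first two values only, this rejects groups with extra markets such as A, B, D or D, A, B, C.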

edited Jun 9, 2022 at 9:08

answered Jun 9, 2022 at 9:03


Onyambu · Accepted Answer · 2022-06-09 08:24:14Z

0

df.loc[df.groupby('Type').Market.transform(lambda x: set(x) == {'A', 'B'} or all(x == sorted(x)))]

    Type Market  Price
0      1      A      2
1      1      B      2
2      1      B      2
7      3      A      8
8      3      B      7
9      3      B      7
10     3      C      7
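Since the comments under the question say that order matters, it may be worth checking how the set-or-sorted test behaves on inputs that are not in the example. The sketch below uses hypothetical data (a Type with B, A and a Type with C, D) and two made-up helper names. `set_or_sorted` mirrors the lambda above on plain lists, and `ordered_match` is the ordered-unique comparison from the other answer.

```python
import pandas as pd

# Hypothetical data, not from the question: B before A, and a C, D group.
df = pd.DataFrame({
    "Type":   [1, 1, 2, 2],
    "Market": ["B", "A", "C", "D"],
    "Price":  [2, 2, 4, 4],
})

def set_or_sorted(markets):
    # Same logic as the lambda above, written on a plain list.
    values = list(markets)
    return set(values) == {"A", "B"} or values == sorted(values)

def ordered_match(markets):
    # Distinct markets in first-appearance order must match exactly.
    return tuple(dict.fromkeys(markets)) in {("A", "B"), ("A", "B", "C")}

results = {
    type_id: (set_or_sorted(markets), ordered_match(markets))
    for type_id, markets in df.groupby("Type")["Market"]
}
print(results)  # {1: (True, False), 2: (True, False)}
```

On the question's own data both tests agree. They differ here because a set ignores order, and any alphabetically sorted group passes the `sorted` comparison. Which behaviour is wanted depends on how the unanswered questions in the comments would be resolved.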

answered Jun 9, 2022 at 8:24
