0

I have a dataframe in Python which (simplified), looks similar to this:

Type  | Market |  Price
-------------------------
 1    |   A    |    2
 1    |   B    |    2
 1    |   B    |    2
-------------------------
 2    |   A    |    4
 2    |   C    |    4
 2    |   C    |    4
 2    |   B    |    8
-------------------------
 3    |   A    |    8
 3    |   B    |    7
 3    |   B    |    7
 3    |   C    |    7

(for better clarity I divided up the dataframe based on the different Types).

What I would like to do is subset the dataframe such that, if Market for each type only has "A" and "B" (and not "C") then I want to keep it. So for example, from the dataframe above, since Type "1" only has "A" and "B" in Market, I want to keep it, but since Type "2" has "A" and "C" and then "B", then I don't want to keep it. On the other hand, since Type "3" has "A", then "B", then "C", then I want to keep it. So from this dataframe, I want to keep Type "1" and Type "3".

I'm having a bit of trouble implementing this, since it requires very specific conditions, and I'm not very good at programming unfortunately. What is a good way of doing this? Thanks in advance :)

3
  • there is actually a query method for dataframes but from your question, it seems the order is important also? Commented Jun 9, 2022 at 8:29
  • @MEdwin Yes, the order is important unfortunately - if A is followed by B and then C, I wish to keep it, but if A is followed by C and then B, then I don't want to keep it Commented Jun 9, 2022 at 8:31
  • @updownleft5134 - if order is B, A for Type 1 it is keep? If is for Type=3 values D,A,B,B,C it is keep? If values are A,B,B,C,D it is keep? Commented Jun 9, 2022 at 9:01

3 Answers 3

1

try this:

df.groupby('Type').filter(lambda g: ''.join(g.Market.unique()[:2]) == 'AB')
>>>

    Type    Market  Price
0   1       A       2
1   1       B       2
2   1       B       2
7   3       A       8
8   3       B       7
9   3       B       7
10  3       C       7
Sign up to request clarification or add additional context in comments.

Comments

0

If need keep only unique values A,B or A,B,C in order use:

s = df.drop_duplicates(['Type','Market']).groupby('Type')['Market'].agg(tuple)
df = df[df['Type'].isin(s.index[s.isin([('A','B'),('A','B','C')])])]
print (df)
    Type Market  Price
0      1      A      2
1      1      B      2
2      1      B      2
7      3      A      8
8      3      B      7
9      3      B      7
10     3      C      7

Another idea:

def f(x):
    u = tuple(dict.fromkeys(x))
    return (u  == ('A','B')) | (u  == ('A','B','C'))

df = df[df.groupby('Type').Market.transform(f)]

Comments

0
df.loc[df.groupby('Type').Market.transform(lambda x : set(x) == {'A', 'B'} or all(x==sorted(x)))]

    Type Market  Price
0      1      A      2
1      1      B      2
2      1      B      2
7      3      A      8
8      3      B      7
9      3      B      7
10     3      C      7

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.