
I have a dataframe which looks like this:

import pandas as pd

d = {'id': ['Mc','Web','G','M','F'], 'Person1':['x','x','x',None,None],'Person2':['x',None,'x','x',None], 'Person3':['x',None, None,None, None]}

df = pd.DataFrame(d)
df.set_index('id', inplace=True)

    Person1 Person2 Person3
id                         
Mc        x       x       x
Web       x    None    None
G         x       x    None
M      None       x    None
F      None    None    None

How can I get the id value and the column headers when an id appears with more than one person? For example, the above DataFrame should give the following dictionary:

{'Mc': ['Person1', 'Person2', 'Person3'], 'G': ['Person1', 'Person2']}

Any help would be very much appreciated.

4 Answers

Answer 1 (accepted)
df[df.notnull().sum(1)>1].stack().reset_index().\
     groupby('id')['level_1'].apply(list).to_dict()
Out[382]: {'G': ['Person1', 'Person2'], 'Mc': ['Person1', 'Person2', 'Person3']}
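To see what each step of the chained expression above does, here is a minimal self-contained sketch that splits it into named intermediates (`multi`, `long`, `flat` and `result` are illustrative names, not part of the answer). The `.dropna()` after `stack()` is an addition: older pandas versions drop missing values in `stack()` automatically, but newer ones may keep them, so filtering explicitly should give the same result either way.

```python
import pandas as pd

d = {'id': ['Mc', 'Web', 'G', 'M', 'F'],
     'Person1': ['x', 'x', 'x', None, None],
     'Person2': ['x', None, 'x', 'x', None],
     'Person3': ['x', None, None, None, None]}
df = pd.DataFrame(d).set_index('id')

# Step 1: keep only ids with more than one non-null entry (Mc and G here).
multi = df[df.notnull().sum(axis=1) > 1]

# Step 2: stack the columns into a long Series indexed by (id, column).
# The explicit dropna() is an added guard for pandas versions whose
# stack() keeps missing values.
long = multi.stack().dropna()

# Step 3: move the MultiIndex into columns; the unnamed column level
# becomes a column called 'level_1'.
flat = long.reset_index()

# Step 4: collect the column names per id into lists (groupby sorts ids).
result = flat.groupby('id')['level_1'].apply(list).to_dict()
print(result)
```

The printed dictionary should match the Out[382] line above.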

2 Comments

Thank you for your help, Wen! Accepting because your answer works and you were the first to provide me with a solution.
@dliv glad it helped, have a nice day :-)
Answer 2

First filter the rows and create a dictionary, then keep only the keys whose values are not None:

d = df[df.count(1) > 1].to_dict(orient='index')
print (d)
{'G': {'Person1': 'x', 'Person3': None, 'Person2': 'x'}, 
'Mc': {'Person1': 'x', 'Person3': 'x', 'Person2': 'x'}}

d1 = {k:[k1 for k1, v1 in v.items() if pd.notnull(v1)] for k,v in d.items()}
print (d1)
{'G': ['Person1', 'Person2'], 'Mc': ['Person1', 'Person3', 'Person2']}

2 Comments

You are welcome! Btw, I am a bit curious about the timings; is it possible to check them?
With your real data, I think, because all the solutions are nice ;)
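Following up on the timing question, a minimal sketch of how three of the approaches could be compared with the standard `timeit` module. The wrapper names (`via_stack`, `via_count`, `via_dropna`, `normalized`) are hypothetical, the answers are lightly rewritten to use `pd.notnull` checks, and the toy frame should be replaced with the real data, since timings on five rows say very little.

```python
import timeit

import pandas as pd

d = {'id': ['Mc', 'Web', 'G', 'M', 'F'],
     'Person1': ['x', 'x', 'x', None, None],
     'Person2': ['x', None, 'x', 'x', None],
     'Person3': ['x', None, None, None, None]}
df = pd.DataFrame(d).set_index('id')  # replace with the real data


def via_stack(frame):
    # Answer 1: filter rows, stack to (id, column) pairs, group per id.
    return (frame[frame.notnull().sum(axis=1) > 1]
            .stack().dropna().reset_index()
            .groupby('id')['level_1'].apply(list).to_dict())


def via_count(frame):
    # Answer 2: filter with count(), then keep the non-null keys per row.
    rows = frame[frame.count(axis=1) > 1].to_dict(orient='index')
    return {k: [c for c, v in row.items() if pd.notnull(v)]
            for k, row in rows.items()}


def via_dropna(frame):
    # Answer 4: dropna(thresh=2), then keep the non-null keys per row.
    rows = frame.dropna(thresh=2).to_dict(orient='index')
    return {k: [c for c, v in row.items() if pd.notnull(v)]
            for k, row in rows.items()}


def normalized(result):
    # Sort each list so results compare equal regardless of column order.
    return {k: sorted(v) for k, v in result.items()}


candidates = {'stack': via_stack, 'count': via_count, 'dropna': via_dropna}
reference = normalized(via_stack(df))
for name, func in candidates.items():
    # Check that the approaches agree before timing them.
    assert normalized(func(df)) == reference
    seconds = timeit.timeit(lambda f=func: f(df), number=100)
    print(f'{name}: {seconds:.4f} s for 100 runs')
```

The lists are sorted before comparing because the answers do not all return the columns in the same order.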
Answer 3

Use a mask:

ndf = df.where(df.isnull(), df.apply(lambda x: x.index, axis=1, result_type='broadcast'))
temp = ndf[ndf.notnull().sum(1)>=2]
   Person1  Person2  Person3
id                           
Mc  Person1  Person2  Person3
G   Person1  Person2     None

To get a dictionary, we can use:

di = { key: value[pd.notnull(value)].tolist() for key,value in zip(temp.index,temp.values)}

{'G': ['Person1', 'Person2'], 'Mc': ['Person1', 'Person2', 'Person3']}
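The same masking idea can also be sketched without `df.apply`, using NumPy's `np.where` to place each column label wherever the frame is non-null. This is a swapped-in technique rather than the answer's code; the `labels` frame and the use of `np.where` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

d = {'id': ['Mc', 'Web', 'G', 'M', 'F'],
     'Person1': ['x', 'x', 'x', None, None],
     'Person2': ['x', None, 'x', 'x', None],
     'Person3': ['x', None, None, None, None]}
df = pd.DataFrame(d).set_index('id')

# Boolean mask: True where a person is marked for that id.
mask = df.notna().to_numpy()

# Put the column label wherever the mask is True and None elsewhere;
# the 1-D array of column names is broadcast across every row.
labels = pd.DataFrame(np.where(mask, df.columns.to_numpy(), None),
                      index=df.index, columns=df.columns)

# Keep ids with at least two marked persons, then drop the gaps per row.
temp = labels[mask.sum(axis=1) >= 2]
di = {key: [v for v in row if pd.notna(v)]
      for key, row in zip(temp.index, temp.to_numpy())}
print(di)
```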

1 Comment

Thank you Bharath for another solution!
Answer 4

Late to the party, but I was curious whether this was possible with more native pandas features. I know an answer has already been accepted, but feel free to upvote if this adds another perspective :)

Got it down to two statements:

# Use dropna(thresh=2) to keep only ids with at least 2 non-null values (i.e. drop ids with fewer than 2 people)
In[1]: basic_dict = df.dropna(thresh=2, axis=0).to_dict(orient="index")
Out[1]:
{'G': {'Person1': 'x', 'Person2': 'x', 'Person3': None},
 'Mc': {'Person1': 'x', 'Person2': 'x', 'Person3': 'x'}}

# Strip the dictionary to remove any remaining `None` values
In[2]:  { k:[i for i in v if v[i] == "x"] for k,v in basic_dict.items()}
Out[2]: {'G': ['Person1', 'Person2'], 'Mc': ['Person3', 'Person1', 'Person2']}

The returned lists aren't in the same order as the columns, but I guessed that wasn't critical.
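A sketch of the two statements above that keeps each list in column order and does not depend on the marker being "x", assuming Python 3.7+ (where dicts preserve insertion order) and that any non-null value should count as a match.

```python
import pandas as pd

d = {'id': ['Mc', 'Web', 'G', 'M', 'F'],
     'Person1': ['x', 'x', 'x', None, None],
     'Person2': ['x', None, 'x', 'x', None],
     'Person3': ['x', None, None, None, None]}
df = pd.DataFrame(d).set_index('id')

# thresh=2 keeps a row only if it has at least 2 non-null values,
# i.e. the id appears with more than one person.
basic_dict = df.dropna(thresh=2, axis=0).to_dict(orient='index')

# to_dict(orient='index') builds each inner dict in column order and
# Python 3.7+ dicts keep insertion order, so the lists follow the columns.
# pd.notna treats any non-null marker as a match, not just "x".
result = {k: [col for col, val in row.items() if pd.notna(val)]
          for k, row in basic_dict.items()}
print(result)
# {'Mc': ['Person1', 'Person2', 'Person3'], 'G': ['Person1', 'Person2']}
```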

