Here is my dataset:
Unique_ID No_of_Filings Req_1 Req_2 Req_3 Req_4
RCONF045 3 Blue Red White Violet
RCONF046 3 Blue Red White Brown
RCONF047 3 Blue Red White Brown
RCONF048 3 Black Yellow Green N/A
RCONF051 4 Black Yellow Green N/A
RCONF052 4 Black Brown Green Orange
I've extracted the unique values from the last four columns (Req_1 through Req_4) with the following:
pd.unique(df1[["Req_1","Req_2","Req_3","Req_4"]].values.ravel("K"))
Out[20]: array(['Blue', 'Black', 'Red', 'Yellow', 'Brown', 'White', 'Green',
'Violet', nan, 'Orange'], dtype=object)
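The `nan` in that array probably comes from how the data was loaded, not from `pd.unique`. If it was read with `pd.read_csv`, the string "N/A" is in pandas' default list of missing-value markers, so it is turned into NaN. Here is a minimal sketch that keeps "N/A" as a literal requirement with `keep_default_na=False`. It assumes the data is whitespace-delimited text like the sample above, inlined here through `io.StringIO`. The variable names `raw`, `req_cols` and `uniques` are only illustrative.

```python
import io

import pandas as pd

# Assumption: the raw data is whitespace-delimited text like the sample above.
raw = """Unique_ID No_of_Filings Req_1 Req_2 Req_3 Req_4
RCONF045 3 Blue Red White Violet
RCONF046 3 Blue Red White Brown
RCONF047 3 Blue Red White Brown
RCONF048 3 Black Yellow Green N/A
RCONF051 4 Black Yellow Green N/A
RCONF052 4 Black Brown Green Orange
"""

# keep_default_na=False stops pandas from converting the string "N/A" to NaN
df1 = pd.read_csv(io.StringIO(raw), sep=r"\s+", keep_default_na=False)

req_cols = ["Req_1", "Req_2", "Req_3", "Req_4"]
uniques = pd.unique(df1[req_cols].values.ravel("K"))

print(sorted(uniques))
# → ['Black', 'Blue', 'Brown', 'Green', 'N/A', 'Orange', 'Red', 'Violet', 'White', 'Yellow']
```

The order returned by `ravel("K")` follows the array's memory layout, so sort the values if you need a predictable order.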
Here's the output I need. Frequency is the number of times a value appears across the last four columns (e.g., Yellow appears only twice). Number of Filings is the sum of No_of_Filings over the rows that contain that requirement. For example, Blue is in the first three rows, so its total is 3 + 3 + 3 = 9, and Brown is in the second, third, and sixth rows, so its total is 3 + 3 + 4 = 10.
Requirements Frequency Number of Filings
Blue 3 9
Black 3 11
Red 3 9
Brown 3 10
White 3 9
Green 3 11
Yellow 2 7
N/A 2 7
Violet 1 3
Orange 1 4
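One way to compute both columns is to reshape the four Req_ columns into a single long column with `melt` and then aggregate with `groupby`. The sketch below assumes the sample data above, with "N/A" stored as a literal string. The names `long`, `frequency`, `filings` and `result` are only illustrative.

```python
import pandas as pd

# Sample data from the question; "N/A" is kept as a literal string.
df1 = pd.DataFrame(
    {
        "Unique_ID": ["RCONF045", "RCONF046", "RCONF047", "RCONF048", "RCONF051", "RCONF052"],
        "No_of_Filings": [3, 3, 3, 3, 4, 4],
        "Req_1": ["Blue", "Blue", "Blue", "Black", "Black", "Black"],
        "Req_2": ["Red", "Red", "Red", "Yellow", "Yellow", "Brown"],
        "Req_3": ["White", "White", "White", "Green", "Green", "Green"],
        "Req_4": ["Violet", "Brown", "Brown", "N/A", "N/A", "Orange"],
    }
)

req_cols = ["Req_1", "Req_2", "Req_3", "Req_4"]

# Long format: one row per (original row, Req_ column) pair.
# reset_index() keeps the original row number in a column called "index".
long = df1.reset_index().melt(
    id_vars=["index", "No_of_Filings"],
    value_vars=req_cols,
    value_name="Requirements",
)

# Frequency: every occurrence in the four Req_ columns counts once.
frequency = long.groupby("Requirements").size().rename("Frequency")

# Number of Filings: add No_of_Filings once per original row containing the value.
filings = (
    long.drop_duplicates(["index", "Requirements"])
    .groupby("Requirements")["No_of_Filings"]
    .sum()
    .rename("Number of Filings")
)

result = (
    pd.concat([frequency, filings], axis=1)
    .rename_axis("Requirements")
    .sort_values(["Frequency", "Number of Filings"], ascending=False)
    .reset_index()
)
print(result)
```

The `drop_duplicates` step matters only when the same requirement can appear in two Req_ columns of one row. With the sample data it has no effect. If your N/A cells are stored as NaN, as the `nan` in your `pd.unique` output suggests, `groupby` drops them by default. In that case, run `long["Requirements"] = long["Requirements"].fillna("N/A")` before grouping, or pass `dropna=False` to both `groupby` calls.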
How can I create those two columns in my newly created dataframe of unique requirements using pandas?
Thanks