I have a dataframe that looks like this:
Col1 | Col2 | Col1 | Col3 | Col1 | Col4
a | d | | h | a | p
b | e | b | i | b | l
| l | a | l | | a
l | r | l | a | l | x
a | i | a | w | | i
| c | | i | r | c
d | o | d | e | d | o
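For reproducibility, the sample above can be built roughly as follows. This is a sketch that assumes the blank cells are true missing values (NaN); if the real data holds empty strings instead, they would need to be converted to NaN first.

```python
import numpy as np
import pandas as pd

# Assumption: blank cells in the sample are NaN, not empty strings.
rows = [
    ["a",    "d", np.nan, "h", "a",    "p"],
    ["b",    "e", "b",    "i", "b",    "l"],
    [np.nan, "l", "a",    "l", np.nan, "a"],
    ["l",    "r", "l",    "a", "l",    "x"],
    ["a",    "i", "a",    "w", np.nan, "i"],
    [np.nan, "c", np.nan, "i", "r",    "c"],
    ["d",    "o", "d",    "e", "d",    "o"],
]

# pandas allows repeated column labels, so Col1 can appear three times.
df = pd.DataFrame(rows, columns=["Col1", "Col2", "Col1", "Col3", "Col1", "Col4"])
```

With duplicated labels, `df["Col1"]` returns a three-column DataFrame rather than a single Series.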
Col1 appears multiple times in the dataframe, and each occurrence is missing some values. I need to create a new column that combines the values from every Col1 occurrence.
How can I create a column with the complete information and then delete the previous duplicate columns?
A value may be missing from more than one of the Col1 columns. The script also needs to work in the future, when there could be one, three, five, or any number of duplicated Col1 columns.
The desired output looks like this:
Col2 | Col3 | Col4 | Col5
d | h | p | a
e | i | l | b
l | l | a | a
r | a | x | l
i | w | i | a
c | i | c | r
o | e | o | d
I have been looking over this question but it is not clear to me how I could keep the desired Col1 with complete values. I could delete multiple columns of the same name but I need to first create a column with complete information.
Col1 and Col5 of the desired output? Why is Col1 of the output the same as Col4 of the sample data? groupby + first in that case.
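One way the groupby + first suggestion might be applied is sketched below. It assumes blanks are NaN and that the row index is unique; the helper name `coalesce_duplicates` and its parameters are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def coalesce_duplicates(df, name, new_name):
    """Merge every column labelled `name` into one column `new_name`.

    Hypothetical helper: takes the first non-missing value per row
    across all columns that share the label `name`.
    """
    is_dup = df.columns == name  # boolean mask over the column labels
    # Transpose so the repeated labels become index labels, group on them,
    # and let first() pick the first non-NaN value for each original row.
    merged = df.loc[:, is_dup].T.groupby(level=0).first().T[name]
    # Drop every occurrence of `name`, then attach the merged column.
    out = df.loc[:, ~is_dup].copy()
    out[new_name] = merged
    return out


# Sample data from the question, with blanks assumed to be NaN.
rows = [
    ["a",    "d", np.nan, "h", "a",    "p"],
    ["b",    "e", "b",    "i", "b",    "l"],
    [np.nan, "l", "a",    "l", np.nan, "a"],
    ["l",    "r", "l",    "a", "l",    "x"],
    ["a",    "i", "a",    "w", np.nan, "i"],
    [np.nan, "c", np.nan, "i", "r",    "c"],
    ["d",    "o", "d",    "e", "d",    "o"],
]
df = pd.DataFrame(rows, columns=["Col1", "Col2", "Col1", "Col3", "Col1", "Col4"])

result = coalesce_duplicates(df, "Col1", "Col5")
print(result)
```

Because the mask is built from the column labels, the same call should work whether Col1 appears once or any number of times.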