I have a file with many (over 1000) columns and rows, and the column names do not follow any pattern. An example is shown below:

file1.txt

IDs     AABC  ABC6    YHG.8     D78Ha 
Ellie   12            48.70    33        
Kate    98      34    21       76.36        
Joe     22      53    49                    
Van     77            40       12.1
Xavier                         88.85   

First, I have to fill the blanks with NA, so that it looks like this:

file1.txt



IDs     AABC  ABC6    YHG.8    D78Ha 
Ellie   12      NA    48.70    33        
Kate    98      34    21       76.36         
Joe     22      53    49       NA                
Van     77      NA    40       12.1
Xavier  NA      NA    NA       88.85   

Then, I am trying to get every combination of an ID with each of the other columns (AABC, ABC6, YHG.8 and D78Ha), such as:

Ellie , AABC --> 12
Ellie, ABC6 --> NA
Ellie, YHG.8 --> 48.70  ( without rounding )
Ellie, D78Ha --> 33
Kate,AABC --> 98
Kate, ABC6 --> 34
...

So the desired output should be 20 lines (4 columns x 5 IDs), as follows:

output.txt


Ellie  AABC   12
Ellie  ABC6   NA
Ellie  YHG.8  48.70
Ellie  D78Ha  33
Kate   AABC   98
Kate   ABC6   34
..

To do this, I filled the blanks manually with NA, read the file with pandas, and set IDs as the index, so that I can look up values by ID and column name.

However, I could not iterate over it. My attempt was:

import pandas as pd
tablefile = pd.read_csv('file1.txt',sep='\t')
print(tablefile)
df2=tablefile.set_index("IDs")
print("Ellie AABC " , df2.loc["Ellie", "AABC" ])
print("Kate AABC " , df2.loc["Kate", "AABC" ])
print("Xavier AABC " , df2.loc["Xavier", "AABC" ])

It prints:

('Ellie AABC ', 12.0)
('Kate AABC ', 98.0)
('Xavier AABC ', nan)

How can I fill the blanks with NA and iterate over this table without writing out each name one by one? Maybe by incrementing i in [i, i]?

2 Answers

If I understand correctly, you can use stack with dropna=False:

df.set_index('IDs').stack(dropna=False).astype(object).reset_index()

Out[915]: 
       IDs level_1      0
0    Ellie    AABC     12
1    Ellie    ABC6    NaN
2    Ellie   YHG.8   48.7
3    Ellie   D78Ha     33
4     Kate    AABC     98
5     Kate    ABC6     34
6     Kate   YHG.8     21
7     Kate   D78Ha  76.36
8      Joe    AABC     22
9      Joe    ABC6     53
10     Joe   YHG.8     49
11     Joe   D78Ha    NaN
12     Van    AABC     77
13     Van    ABC6    NaN
14     Van   YHG.8     40
15     Van   D78Ha   12.1
16  Xavier    AABC    NaN
17  Xavier    ABC6    NaN
18  Xavier   YHG.8    NaN
19  Xavier   D78Ha  88.85
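
For readers who want the explicit loop the question asks about, here is a minimal sketch. It assumes a tab-separated file with empty fields, as in the question; the inline sample string and the names raw and pairs are illustrative and not from the original post. Iterating over df2.index and df2.columns visits every ID/column pair without typing any name by hand.

```python
from io import StringIO

import pandas as pd

# Tab-separated sample with empty fields, mirroring the question's file1.txt
# (assumption: the real file is laid out the same way).
raw = (
    "IDs\tAABC\tABC6\tYHG.8\tD78Ha\n"
    "Ellie\t12\t\t48.70\t33\n"
    "Kate\t98\t34\t21\t76.36\n"
    "Joe\t22\t53\t49\t\n"
    "Van\t77\t\t40\t12.1\n"
    "Xavier\t\t\t\t88.85\n"
)

# Empty fields are read as NaN automatically.
df2 = pd.read_csv(StringIO(raw), sep="\t").set_index("IDs")

# Walk every (row label, column label) pair in file order.
pairs = []
for row_id in df2.index:
    for col in df2.columns:
        pairs.append((row_id, col, df2.at[row_id, col]))

# Show the first few triples.
for row_id, col, value in pairs[:4]:
    print(row_id, col, value)
```

On a table with over 1000 columns and rows, a Python-level double loop like this will likely be much slower than stack or melt, so it is mainly useful for understanding or for small inputs.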

12 Comments

Thank you for your reply. However, it should print the ID on every line, not only once.
@bapors This is called a MultiIndex. You can add reset_index() at the end: df.set_index('IDs').stack(dropna=False).astype(object).reset_index()
The OP needs the ints not to be converted to floats, and your solution doesn't do that.
@jezrael Did you see the astype(object)?
It is printed to a file, so no :(

Simply use melt to reshape the DataFrame:

Data

from io import StringIO 
import pandas as pd

txt = """IDs     AABC  ABC6    YHG.8    D78Ha 
Ellie   12      NA    48.70    33        
Kate    98      34    21       76.36         
Joe     22      53    49       NA                
Van     77      NA    40       12.1
Xavier  NA      NA    NA       88.8"""

# Raw string avoids the invalid "\s" escape warning in newer Python versions.
tabledf = pd.read_table(StringIO(txt), sep=r"\s+")

Melt

melted_df = pd.melt(tabledf, id_vars = "IDs").sort_values('IDs').reset_index(drop=True)
print(melted_df)

#        IDs variable  value
# 0    Ellie     AABC  12.00
# 1    Ellie     ABC6    NaN
# 2    Ellie    YHG.8  48.70
# 3    Ellie    D78Ha  33.00
# 4      Joe     AABC  22.00
# 5      Joe    D78Ha    NaN
# 6      Joe     ABC6  53.00
# 7      Joe    YHG.8  49.00
# 8     Kate     AABC  98.00
# 9     Kate     ABC6  34.00
# 10    Kate    YHG.8  21.00
# 11    Kate    D78Ha  76.36
# 12     Van     AABC  77.00
# 13     Van     ABC6    NaN
# 14     Van    D78Ha  12.10
# 15     Van    YHG.8  40.00
# 16  Xavier     ABC6    NaN
# 17  Xavier     AABC    NaN
# 18  Xavier    YHG.8    NaN
# 19  Xavier    D78Ha  88.80
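
The output above is sorted alphabetically by ID, and the columns inside each group are not in file order. If the original row and column order matters, as in the desired output in the question, one possible variant is sketched below. It assumes pandas 1.1 or later (for the ignore_index parameter of melt); the name ordered_df is illustrative.

```python
from io import StringIO

import pandas as pd

txt = """IDs     AABC  ABC6    YHG.8    D78Ha
Ellie   12      NA    48.70    33
Kate    98      34    21       76.36
Joe     22      53    49       NA
Van     77      NA    40       12.1
Xavier  NA      NA    NA       88.85"""

tabledf = pd.read_csv(StringIO(txt), sep=r"\s+")

# ignore_index=False keeps each row's original position as the index, so a
# stable sort on that index groups rows by ID in file order while keeping
# the columns in header order within each group.
ordered_df = (
    pd.melt(tabledf, id_vars="IDs", ignore_index=False)
    .sort_index(kind="mergesort")
    .reset_index(drop=True)
)
print(ordered_df.head(8))
```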

10 Comments

Thank you for your reply. It raises KeyError: 'IDs'
You are setting the index to IDs. Do not run set_index() after import for this solution.
The problem is that this is the real version. Which df did you refer to?
It works now, but the values were converted to integers. I no longer have 76.36; I have 76 instead.
Please look at the docs. You can represent missing values however you want with the na_rep argument. It defaults to the empty string: ''.
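
To tie the comments above together, here is a hedged sketch of one way to keep every value exactly as written in the file (12 stays 12, 48.70 stays 48.70) and write NA for blanks. It assumes the input is tab-separated with empty fields, as in the question, and pandas 1.1 or later; the sample string and the names raw, table, long_df and out are illustrative. Reading with dtype=str avoids any numeric conversion, and na_rep in to_csv controls how missing values are written.

```python
from io import StringIO

import pandas as pd

# Tab-separated sample with empty fields (assumed layout of file1.txt).
raw = (
    "IDs\tAABC\tABC6\tYHG.8\tD78Ha\n"
    "Ellie\t12\t\t48.70\t33\n"
    "Kate\t98\t34\t21\t76.36\n"
    "Joe\t22\t53\t49\t\n"
    "Van\t77\t\t40\t12.1\n"
    "Xavier\t\t\t\t88.85\n"
)

# dtype=str keeps each field as the text that was typed; empty fields are
# still read as missing values.
table = pd.read_csv(StringIO(raw), sep="\t", dtype=str)

# Reshape to long form, keeping the file's row order and header column order.
long_df = (
    pd.melt(table, id_vars="IDs", ignore_index=False)
    .sort_index(kind="mergesort")
)

# Write to an in-memory buffer here; a path such as "output.txt" works too.
out = StringIO()
long_df.to_csv(out, sep="\t", header=False, index=False, na_rep="NA")
lines = out.getvalue().splitlines()
print("\n".join(lines[:4]))
```

Because the values stay as strings, any later arithmetic would need an explicit conversion such as pd.to_numeric.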