Getting all possible values from an array in python

Question

I have a file with more than 1000 columns and rows, and the names do not follow any pattern. An example is shown below:

file1.txt

IDs     AABC    ABC6    YHG.8    D78Ha
Ellie   12              48.70    33
Kate    98      34      21       76.36
Joe     22      53      49
Van     77              40       12.1
Xavier                           88.85

First, I have to fill the blanks with NA, so that it looks like this:

file1.txt

IDs     AABC    ABC6    YHG.8    D78Ha
Ellie   12      NA      48.70    33
Kate    98      34      21       76.36
Joe     22      53      49       NA
Van     77      NA      40       12.1
Xavier  NA      NA      NA       88.85

Then I am trying to get every combination of an ID with each of the other columns (AABC, ABC6, YHG.8 and D78Ha), such as:

Ellie, AABC --> 12
Ellie, ABC6 --> NA
Ellie, YHG.8 --> 48.70 (without rounding)
Ellie, D78Ha --> 33
Kate, AABC --> 98
Kate, ABC6 --> 34
...

So the desired output should be 20 lines (4 columns x 5 IDs), as follows:

output.txt


Ellie  AABC   12
Ellie  ABC6   NA
Ellie  YHG.8  48.70
Ellie  D78Ha  33
Kate   AABC   98
Kate   ABC6   34
..

To do this, I filled the blanks manually with NA, read the file with pandas, and set IDs as the index, so that I can look up values by ID name and column name.

But I could not iterate over it. My attempt was:

import pandas as pd
tablefile = pd.read_csv('file1.txt',sep='\t')
print(tablefile)
df2=tablefile.set_index("IDs")
print("Ellie AABC " , df2.loc["Ellie", "AABC" ])
print("Kate AABC " , df2.loc["Kate", "AABC" ])
print("Xavier AABC " , df2.loc["Xavier", "AABC" ])

It prints:

('Ellie AABC ', 12.0)
('Kate AABC ', 98.0)
('Xavier AABC ', nan)

How can I fill the blanks with NA and iterate over this table without writing out each name one by one? Maybe by incrementing i in [i, i]?
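For reference, the iteration asked about here can be sketched with two nested loops over the index and the column labels. This is only a minimal illustration: the tab-separated sample text is a hypothetical stand-in for file1.txt, and `lines` is a name introduced just for this example. Note that columns containing blanks are read as floats, so 12 comes back as 12.0.

```python
from io import StringIO
import pandas as pd

# Hypothetical tab-separated stand-in for file1.txt; empty fields are the blanks.
rows = [
    ["IDs", "AABC", "ABC6", "YHG.8", "D78Ha"],
    ["Ellie", "12", "", "48.70", "33"],
    ["Kate", "98", "34", "21", "76.36"],
    ["Joe", "22", "53", "49", ""],
    ["Van", "77", "", "40", "12.1"],
    ["Xavier", "", "", "", "88.85"],
]
txt = "\n".join("\t".join(r) for r in rows)

# Empty fields are parsed as NaN by default, so no manual NA filling is needed.
df2 = pd.read_csv(StringIO(txt), sep="\t").set_index("IDs")

lines = []
for row_id in df2.index:          # every ID, in file order
    for col in df2.columns:       # every remaining column, in file order
        value = df2.loc[row_id, col]
        text = "NA" if pd.isna(value) else str(value)
        lines.append(f"{row_id}\t{col}\t{text}")

print("\n".join(lines))
```

With over 1000 rows and columns this loop will be slow compared with a reshaping call, so it is mainly useful for understanding what the reshaped output contains.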

BENY · Accepted Answer · 2018-03-02 14:44:15Z

2

If I understand correctly, use stack with dropna=False:

df.set_index('IDs').stack(dropna=False).astype(object).reset_index()

Out[915]: 
       IDs level_1      0
0    Ellie    AABC     12
1    Ellie    ABC6    NaN
2    Ellie   YHG.8   48.7
3    Ellie   D78Ha     33
4     Kate    AABC     98
5     Kate    ABC6     34
6     Kate   YHG.8     21
7     Kate   D78Ha  76.36
8      Joe    AABC     22
9      Joe    ABC6     53
10     Joe   YHG.8     49
11     Joe   D78Ha    NaN
12     Van    AABC     77
13     Van    ABC6    NaN
14     Van   YHG.8     40
15     Van   D78Ha   12.1
16  Xavier    AABC    NaN
17  Xavier    ABC6    NaN
18  Xavier   YHG.8    NaN
19  Xavier   D78Ha  88.85
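If the original text of each value has to survive exactly (for example 48.70 rather than 48.7), one possible variation is to read every column as a string and fill the blanks before stacking. The sketch below assumes a tab-separated input; the sample text and the names `long_df`, `column` and `value` are made up for illustration.

```python
from io import StringIO
import pandas as pd

# Hypothetical tab-separated stand-in for file1.txt; empty fields are the blanks.
rows = [
    ["IDs", "AABC", "ABC6", "YHG.8", "D78Ha"],
    ["Ellie", "12", "", "48.70", "33"],
    ["Kate", "98", "34", "21", "76.36"],
    ["Joe", "22", "53", "49", ""],
    ["Van", "77", "", "40", "12.1"],
    ["Xavier", "", "", "", "88.85"],
]
txt = "\n".join("\t".join(r) for r in rows)

# dtype=str keeps each value as the text found in the file (no float conversion);
# empty fields are still read as missing values.
df = pd.read_csv(StringIO(txt), sep="\t", dtype=str)

long_df = (
    df.set_index("IDs")
      .fillna("NA")      # fill blanks first, so stack() has nothing to drop
      .stack()
      .reset_index()
)
long_df.columns = ["IDs", "column", "value"]
print(long_df)
```

Because the blanks are filled before stacking, this version does not rely on the dropna argument at all.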

edited Mar 2, 2018 at 14:44

answered Mar 2, 2018 at 14:33

BENY


12 Comments

bapors Over a year ago

Thank you for your reply. However, it should print the ID on every line, not only once.

BENY Over a year ago

@bapors This is a so-called MultiIndex. You can add reset_index() at the end: df.set_index('IDs').stack(dropna=False).astype(object).reset_index()

jezrael Over a year ago

The OP needs the ints not to be changed to floats, and your solution doesn't do that.

BENY Over a year ago

@jezrael Did you see the astype(object)?

jezrael Over a year ago

It is printed to a file, so no :(


Parfait · Accepted Answer · 2018-03-05 14:34:52Z

2

Simply melt to reshape the dataframe:

Data

from io import StringIO 
import pandas as pd

txt = """IDs     AABC  ABC6    YHG.8    D78Ha 
Ellie   12      NA    48.70    33        
Kate    98      34    21       76.36         
Joe     22      53    49       NA                
Van     77      NA    40       12.1
Xavier  NA      NA    NA       88.8"""

tabledf = pd.read_table(StringIO(txt), sep=r"\s+")  # raw string avoids an invalid-escape warning

Melt

melted_df = pd.melt(tabledf, id_vars = "IDs").sort_values('IDs').reset_index(drop=True)
print(melted_df)

#        IDs variable  value
# 0    Ellie     AABC  12.00
# 1    Ellie     ABC6    NaN
# 2    Ellie    YHG.8  48.70
# 3    Ellie    D78Ha  33.00
# 4      Joe     AABC  22.00
# 5      Joe    D78Ha    NaN
# 6      Joe     ABC6  53.00
# 7      Joe    YHG.8  49.00
# 8     Kate     AABC  98.00
# 9     Kate     ABC6  34.00
# 10    Kate    YHG.8  21.00
# 11    Kate    D78Ha  76.36
# 12     Van     AABC  77.00
# 13     Van     ABC6    NaN
# 14     Van    D78Ha  12.10
# 15     Van    YHG.8  40.00
# 16  Xavier     ABC6    NaN
# 17  Xavier     AABC    NaN
# 18  Xavier    YHG.8    NaN
# 19  Xavier    D78Ha  88.80
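Sorting on IDs alone puts the IDs in alphabetical order and, with the default sort algorithm, does not guarantee the column order within each ID. If the file order of both IDs and columns matters, one option is a stable sort keyed on each ID's original position. This is a sketch under that assumption; `order`, `column` and `value` are names chosen only for this example.

```python
from io import StringIO
import pandas as pd

txt = """IDs     AABC    ABC6    YHG.8    D78Ha
Ellie   12      NA      48.70    33
Kate    98      34      21       76.36
Joe     22      53      49       NA
Van     77      NA      40       12.1
Xavier  NA      NA      NA       88.8"""

tabledf = pd.read_table(StringIO(txt), sep=r"\s+")

# melt() emits one block of rows per original column, in column order.
melted_df = tabledf.melt(id_vars="IDs", var_name="column", value_name="value")

# Map each ID to its row position in the input, then sort stably on that
# position so the column order inside each ID is left untouched.
order = {name: pos for pos, name in enumerate(tabledf["IDs"])}
melted_df = (
    melted_df.sort_values("IDs", key=lambda s: s.map(order), kind="mergesort")
             .reset_index(drop=True)
)
print(melted_df)
```

The rows then come out as Ellie, Kate, Joe, Van, Xavier, each with AABC, ABC6, YHG.8, D78Ha in that order.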

edited Mar 5, 2018 at 14:34

answered Mar 2, 2018 at 14:42

Parfait


10 Comments

bapors Over a year ago

Thank you for your reply. It raises: KeyError: 'IDs'

Parfait Over a year ago

You are setting the index to IDs. Do not run set_index() after import for this solution.

bapors Over a year ago

The problem is that this is the actual version. Which df did you refer to?

bapors Over a year ago

It works now, but the values were converted to integers. I don't have 76.36 anymore, I have 76.

Parfait Over a year ago

Please look at the docs. You can represent missing values with anything you want using the na_rep argument. It defaults to an empty string: ''.
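As a rough illustration of the na_rep argument mentioned above, the melted frame can be written out with DataFrame.to_csv. The sketch writes to an in-memory buffer instead of output.txt so that it is self-contained; `buffer` and `out_lines` are names made up for this example.

```python
from io import StringIO
import pandas as pd

txt = """IDs     AABC    ABC6    YHG.8    D78Ha
Ellie   12      NA      48.70    33
Kate    98      34      21       76.36
Joe     22      53      49       NA
Van     77      NA      40       12.1
Xavier  NA      NA      NA       88.8"""

tabledf = pd.read_table(StringIO(txt), sep=r"\s+")
melted_df = pd.melt(tabledf, id_vars="IDs")

# na_rep controls how missing values are written; without it they become ''.
buffer = StringIO()
melted_df.to_csv(buffer, sep="\t", index=False, header=False, na_rep="NA")

out_lines = buffer.getvalue().splitlines()
print("\n".join(out_lines))
```

Passing a file path such as "output.txt" in place of the buffer writes the same text to disk. The values here are floats, so 12 is written as 12.0; reading the input with dtype=str is one way to keep the original text instead.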
