I am having some trouble with an assigned task. I have tried to figure out where the error is, but without success. I have the following dataset (small, just as an example):

T1              T2                     V1       V2
name_1  ['name_3', 'name_4']          [1,2]   ['a','b']
name_2  []                            []      []
name_3  ['name_1']                    [1]     ['c']
name_4  ['name_1','name_x','name_13'] [12,2,4] ['c','NA','d']
name_4  ['name_1','name_x','name_13'] [12,2,4] ['c','NA','d']

Since T2, V1 and V2 hold lists, I used explode() and created two new variables, new_t2 and new_t1, to check which values from those columns are still missing and need to be checked.

df = df.explode('T2')

However, I get the following error (raised inside a function that I discuss later in this post):

TypeError: string indices must be integers

Many thanks for your help.

1 Answer

Here's a solution, based on what I understood:

First, flatten the dataframe to make the calculations easier. This can be done as follows (use ast.literal_eval only when explode doesn't work directly, i.e. when the columns hold strings rather than actual lists):

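import pandas as pd
from ast import literal_eval

for col in df.columns[1:]:
    df[col] = df[col].apply(literal_eval)  # parse "['a', 'b']" strings into real lists
# explode all list columns in parallel, keeping T1 as the row key
df = df.set_index('T1').apply(pd.Series.explode).reset_index()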

This flattens the dataframe like this:

T1      T2       V1   V2
name_1  name_3   1    a
name_1  name_4   2    b
name_2  NaN      NaN  NaN
name_3  name_1   1    c
name_4  name_1   12   c
name_4  name_x   2    NA
name_4  name_13  4    d
name_4  name_1   12   c
name_4  name_x   2    NA
name_4  name_13  4    d
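
On pandas 1.3 or later, the same flattening can probably be done in a single call, because DataFrame.explode accepts a list of columns. This is a minimal sketch, not a replacement for the approach above. It assumes the columns already hold real lists of equal length within each row, and it rebuilds the sample frame from the question's table purely for illustration (the name flat is my own):

```python
import pandas as pd

# Sample data mirroring the question's table; the cells are real lists here,
# so no literal_eval step is needed (assumption for this sketch).
df = pd.DataFrame({
    "T1": ["name_1", "name_2", "name_3", "name_4", "name_4"],
    "T2": [["name_3", "name_4"], [], ["name_1"],
           ["name_1", "name_x", "name_13"], ["name_1", "name_x", "name_13"]],
    "V1": [[1, 2], [], [1], [12, 2, 4], [12, 2, 4]],
    "V2": [["a", "b"], [], ["c"], ["c", "NA", "d"], ["c", "NA", "d"]],
})

# Explode the three list columns together (requires pandas >= 1.3).
# Empty lists become a single row of NaN values.
flat = df.explode(["T2", "V1", "V2"], ignore_index=True)
print(flat)
```

If the list lengths differ within a row, this call raises a ValueError, which can be a useful early check on the data.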

Now, if you want tuples only for those rows where the 'T2' value does not appear in 'T1', you can use apply or another approach. I used apply:

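unique_t1 = set(df['T1'])

def tuple_creation(row):
    # Return a tuple only when T2 does not occur anywhere in T1; otherwise None.
    # Note: a NaN in T2 (from an empty list) is not in unique_t1, so it also yields a tuple.
    if row['T2'] not in unique_t1:
        return (row['T2'], row['V1'], row['V2'])  # add row['T1'] here to include it
    return None

df['tuple'] = df.apply(tuple_creation, axis=1)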

Output (for display, I've converted each tuple to a '-'-separated string):

T1      T2       V1   V2   tuple
name_1  name_3   1    a
name_1  name_4   2    b
name_2  NaN      NaN  NaN  nan-nan-nan
name_3  name_1   1    c
name_4  name_1   12   c
name_4  name_x   2    NA   name_x-2-NA
name_4  name_13  4    d    name_13-4-d
name_4  name_1   12   c
name_4  name_x   2    NA   name_x-2-NA
name_4  name_13  4    d    name_13-4-d
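
For larger frames, a boolean mask with isin may be a faster alternative to a row-wise apply. The following is only a sketch under my own assumptions: the names flat, mask, missing_rows and tuples are illustrative, the frame is a hand-built subset of the flattened rows, and, unlike the apply version, rows whose T2 is NaN are deliberately skipped.

```python
import pandas as pd

# Hand-built subset of the flattened frame (illustrative data only).
flat = pd.DataFrame({
    "T1": ["name_1", "name_1", "name_2", "name_3", "name_4", "name_4", "name_4"],
    "T2": ["name_3", "name_4", None, "name_1", "name_1", "name_x", "name_13"],
    "V1": [1, 2, None, 1, 12, 2, 4],
    "V2": ["a", "b", None, "c", "c", "NA", "d"],
})

# True where T2 holds a real value that never appears in the T1 column.
mask = flat["T2"].notna() & ~flat["T2"].isin(flat["T1"])

# Keep only those rows and turn each one into a plain (T2, V1, V2) tuple.
missing_rows = flat.loc[mask, ["T2", "V1", "V2"]]
tuples = list(missing_rows.itertuples(index=False, name=None))
print(tuples)
```

Note that V1 is stored as floats here because the column contains a missing value, so the numbers in the tuples print as 2.0 and 4.0 rather than 2 and 4.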