I have a dataframe:

import pandas as pd

data = {'id': [1, 2, 3],
        'tokens': [['in', 'the', 'morning', 'cat', 'run', 'today', 'very', 'quick'],
                   ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl'],
                   ['mouse', 'hides', 'from', 'a', 'cat']]}

df = pd.DataFrame(data)

I also have a list of index lists, one per row:

lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 4]]

I want to add a new column that contains elements of each row's tokens list, selected by the indexes in the corresponding list of lst_index. The expected result is:

   id  tokens                                            new
0   1  [in, the, morning, cat, run, today, very, quick]  [cat, run, today]
1   2  [dog, eat, meat, chicken, from, bowl]             [dog, eat, meat]
2   3  [mouse, hides, from, a, cat]                      [from, a, cat]

3 Answers

Use a list comprehension that pairs each index list with its token list:

lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 4]]

df['new'] = [[tokens[i] for i in idx] for idx, tokens in zip(lst_index, df['tokens'])]

Output:

   id                                            tokens                new
0   1  [in, the, morning, cat, run, today, very, quick]  [cat, run, today]
1   2             [dog, eat, meat, chicken, from, bowl]   [dog, eat, meat]
2   3                      [mouse, hides, from, a, cat]     [from, a, cat]
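
One caveat with the comprehension above: an index that is out of range for its row raises an IndexError. The following is a minimal sketch of a guarded variant, not part of the original answer. The helper name pick_tokens and the choice to silently skip out-of-range indexes are assumptions made for illustration; raising an error may be preferable in your case.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'tokens': [
        ['in', 'the', 'morning', 'cat', 'run', 'today', 'very', 'quick'],
        ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl'],
        ['mouse', 'hides', 'from', 'a', 'cat'],
    ],
})
lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 4]]


def pick_tokens(tokens, indexes):
    # Hypothetical helper: keep only the indexes that are valid for this row
    # (negative indexes are allowed as long as they stay within the list).
    return [tokens[i] for i in indexes if -len(tokens) <= i < len(tokens)]


# Fail early if there is not exactly one index list per row.
if len(lst_index) != len(df):
    raise ValueError('lst_index must contain one list per row of df')

df['new'] = [pick_tokens(toks, idx) for toks, idx in zip(df['tokens'], lst_index)]
print(df['new'].tolist())
```

With the sample data every index is valid, so the new column matches the output shown above.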

You can loop over the dictionary's token lists and the index lists together to build the new column:

data = {'id': [1, 2, 3],
        'tokens': [['in', 'the', 'morning', 'cat', 'run', 'today', 'very', 'quick'],
                   ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl'],
                   ['mouse', 'hides', 'from', 'a', 'cat']]}
lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 4]]
l = []

# For each row i, collect the tokens at the positions listed in lst_index[i].
for i in range(len(data["tokens"])):
    l.append([])
    for j in range(len(lst_index[i])):
        l[i].append(data["tokens"][i][lst_index[i][j]])

# Store the result under a new key of the original dictionary.
data["new"] = l
print(data)

Output:

{'id': [1, 2, 3], 'tokens': [['in', 'the', 'morning', 'cat', 'run', 'today', 'very', 'quick'], ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl'], ['mouse', 'hides', 'from', 'a', 'cat']], 'new': [['cat', 'run', 'today'], ['dog', 'eat', 'meat'], ['from', 'a', 'cat']]}
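
The result above is still a plain dictionary, while the question asks for a DataFrame column. As a minimal sketch, assuming the same data and lst_index, the loop can pair each token list with its index list via zip and the dictionary can then be passed to pd.DataFrame. The names new_column and selected are illustrative only.

```python
import pandas as pd

data = {
    'id': [1, 2, 3],
    'tokens': [
        ['in', 'the', 'morning', 'cat', 'run', 'today', 'very', 'quick'],
        ['dog', 'eat', 'meat', 'chicken', 'from', 'bowl'],
        ['mouse', 'hides', 'from', 'a', 'cat'],
    ],
}
lst_index = [[3, 4, 5], [0, 1, 2], [2, 3, 4]]

new_column = []
# zip yields one (token list, index list) pair per row.
for tokens, indexes in zip(data['tokens'], lst_index):
    selected = []
    for i in indexes:
        selected.append(tokens[i])
    new_column.append(selected)

data['new'] = new_column

# Build the frame only after the new key has been added to the dictionary.
df = pd.DataFrame(data)
print(df)
```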

This may not be the most efficient solution, but it works:

df['new'] = [[token[i] for i in index] for token, index in zip(df['tokens'], lst_index)]

   id                                            tokens                new
0   1  [in, the, morning, cat, run, today, very, quick]  [cat, run, today]
1   2             [dog, eat, meat, chicken, from, bowl]   [dog, eat, meat]
2   3                      [mouse, hides, from, a, cat]     [from, a, cat]
