
I have the following series (df):

Index    Information
1        [2, A, C]
2        [3, B, C]
3        [4, C, H]
4        [5, D, H]
5        [6, E, H]
6        [7, F, H]

and I want a series that extracts and stores only the third value of each list:

Index    Information
1        [C]
2        [C]
3        [H]
4        [H]
5        [H]
6        [H]

If I try df[0][2], it correctly gives the required output [C].

However, if I try df[:][2], instead of giving

[C]
[C]
[H]
[H]
[H]
[H]

the output is

3        [4, C, H]

What should be the correct syntax for this?

  • How are you creating the dataframe? Commented May 21, 2018 at 17:16

2 Answers


pandas.Series.str

df.Information.str[2:3]

0    [C]
1    [C]
2    [H]
3    [H]
4    [H]
5    [H]
Name: Information, dtype: object
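
For anyone who wants to reproduce this, here is a minimal sketch. The DataFrame construction is an assumption (the question does not show how df was built), and the names row, as_lists and as_scalars are only for illustration. It also suggests why df[:][2] did not work: the [:] just returns the whole Series again, so the following [2] selects the row labelled 2 rather than the third item of every list.

```python
import pandas as pd

# Hypothetical reconstruction of the data in the question.
df = pd.DataFrame({
    "Index": [1, 2, 3, 4, 5, 6],
    "Information": [[2, "A", "C"], [3, "B", "C"], [4, "C", "H"],
                    [5, "D", "H"], [6, "E", "H"], [7, "F", "H"]],
})

# [:] returns the whole Series, so the following [2] picks the row
# labelled 2, not the third item inside each list.
row = df.Information[:][2]

# .str[...] applies the index or slice to every element instead.
as_lists = df.Information.str[2:3]   # one-element lists, as in the question
as_scalars = df.Information.str[2]   # bare values without the wrapping list
```

Use the slice form (2:3) if each result should stay a list, and the plain index (2) if bare values are enough.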

With assign

df.assign(Information=df.Information.str[2:3])

   Index Information
0      1         [C]
1      2         [C]
2      3         [H]
3      4         [H]
4      5         [H]
5      6         [H]

List comprehension, per @coldspeed

df.assign(Information=[l[2:3] for l in df.Information.tolist()])

   Index Information
0      1         [C]
1      2         [C]
2      3         [H]
3      4         [H]
4      5         [H]
5      6         [H]

3 Comments

You may go for the alternative [l[2:3] for l in df.Information.tolist()] as well.
What if the name of the column labelled 'Information' varies depending on the size of the data? How can I represent it?
@KrishnaAnapindi In that case [l[-1:] for l in df.Information.tolist()] would do the trick.
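
A small sketch of the l[-1:] suggestion, assuming the lists can have different lengths. The Series s and the name last_items are made up for illustration and are not from the question. Because it slices rather than indexes, an empty list yields an empty list instead of raising an IndexError.

```python
import pandas as pd

# Hypothetical data: lists of different lengths, including an empty one.
s = pd.Series([[2, "A", "C"], [3, "B"], [4, "C", "H", "Z"], []])

# Take the last item of each list, kept as a one-element list.
last_items = pd.Series([l[-1:] for l in s.tolist()], index=s.index)
```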

Another alternative:

df["new_col"] = df["Information"].apply(lambda x: x[2])
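
Note that x[2] returns the bare value rather than a one-element list. A short sketch of the difference, using a made-up two-row frame and a hypothetical second column name new_col_list:

```python
import pandas as pd

# Hypothetical two-row sample in the same shape as the question's data.
df = pd.DataFrame({"Information": [[2, "A", "C"], [4, "C", "H"]]})

# Indexing returns the bare third value of each list.
df["new_col"] = df["Information"].apply(lambda x: x[2])

# Slicing keeps the value wrapped in a list, matching the output
# requested in the question.
df["new_col_list"] = df["Information"].apply(lambda x: x[2:3])
```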
