
I have a dataframe where in one column, the data for each row is a string like this:

[[25570], [26000]]

I want each entry in the series to become a list of integers.

I.e.:

[25570, 26000]   (each element is an int)

So far I can get it to a list of strings, but retaining empty spaces:

s = s.str.replace("[", "", regex=False).str.replace("]", "", regex=False)
s = s.str.replace(" ", "", regex=False).str.split(",")
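One way to get from the bracketed strings to integers in a single pass is to pull out the digit runs with a regular expression rather than stripping characters one by one. This is only a sketch: the sample Series below is made up from values in the question, and it assumes the numbers are non-negative integers.

```python
import pandas as pd

# Sample values taken from the question, plus an empty entry.
s = pd.Series(["[[25570], [26000]]", "[[25570]]", "[[]]"])

# Pull out every run of digits, then convert each match to int.
# An entry containing no digits yields an empty list.
result = s.str.findall(r"\d+").apply(lambda parts: [int(p) for p in parts])

print(result.tolist())
```

Because the pattern only matches digits, brackets, commas and spaces never reach the conversion step, so there are no empty strings left over to clean up.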

Dict for the DataFrame (nan here is numpy.nan, e.g. via "from numpy import nan"):

f = {'chunk': {0: '[72]',
  1: '[72, 68]',
  2: '[72, 68, 65]',
  3: '[72, 68, 65, 70]',
  4: '[72, 68, 65, 70, 67]',
  5: '[72, 68, 65, 70, 67, 74]',
  6: '[68]',
  7: '[68, 65]',
  8: '[68, 65, 70]',
  9: '[68, 65, 70, 67]'},
 'chunk_completed': {0: '[25570]',
  1: '[26000]',
  2: '[26240]',
  3: '[26530]',
  4: '[26880]',
  5: '[27150]',
  6: '[26000]',
  7: '[26240]',
  8: '[26530]',
  9: '[26880]'},
 'chunk_id': {0: '72',
  1: '72-68',
  2: '72-68-65',
  3: '72-68-65-70',
  4: '72-68-65-70-67',
  5: '72-68-65-70-67-74',
  6: '68',
  7: '68-65',
  8: '68-65-70',
  9: '68-65-70-67'},
 'diffs_avg': {0: nan,
  1: 430.0,
  2: 335.0,
  3: 320.0,
  4: 327.5,
  5: 316.0,
  6: nan,
  7: 240.0,
  8: 265.0,
  9: 293.3333333333333},
 'sd': {0: nan,
  1: nan,
  2: 134.35028842544406,
  3: 98.48857801796105,
  4: 81.80260794538685,
  5: 75.3657747256671,
  6: nan,
  7: nan,
  8: 35.355339059327385,
  9: 55.075705472861024},
 'timecodes': {0: '[[25570]]',
  1: '[[25570], [26000]]',
  2: '[[25570], [26000], [26240]]',
  3: '[[25570], [26000], [26240], [26530]]',
  4: '[[25570], [26000], [26240], [26530], [26880]]',
  5: '[[25570], [26000], [26240], [26530], [26880], [27150]]',
  6: '[[26000]]',
  7: '[[26000], [26240]]',
  8: '[[26000], [26240], [26530]]',
  9: '[[26000], [26240], [26530], [26880]]'}}

1 Answer

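Try this:

import pandas as pd
# s is assumed to be a dict mapping row index -> timecode string
f = pd.DataFrame.from_dict(s, orient='index')
f.columns = ['timecodes']
# eval parses each string; "if a" skips empty inner lists such as [[]]
f['timecodes'].apply(lambda x: [a[0] for a in eval(x) if a])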

Output

Out[622]:
0                                        [25570]
1                                 [25570, 26000]
2                          [25570, 26000, 26240]
3                   [25570, 26000, 26240, 26530]
4            [25570, 26000, 26240, 26530, 26880]
5     [25570, 26000, 26240, 26530, 26880, 27150]
6                                        [26000]
7                                 [26000, 26240]
8                          [26000, 26240, 26530]
9                   [26000, 26240, 26530, 26880]
10           [26000, 26240, 26530, 26880, 27150]
11                                       [26240]
12                                [26240, 26530]
13                         [26240, 26530, 26880]
14                  [26240, 26530, 26880, 27150]
15                                       [26530]
16                                [26530, 26880]
17                         [26530, 26880, 27150]
18                                       [26880]
19                                [26880, 27150]
Name: 0, dtype: object

4 Comments

How could I make this work when there are no values in the original string, i.e. [[]]?
I don't understand this solution and can't get it to work. This is part of a larger df, FYI.
I am assuming that you have another entry like [[]] in the dict, right? If that's the case, the update should work: I added a check, "if a", to the list comprehension.
In the dataframe, yes. If my dataframe is 'f' and the column in question is 'timecodes', how should I format your answer?
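
For the case raised in the comments (an existing DataFrame with a 'timecodes' column that may contain [[]]), a small helper might look like the sketch below. The name parse_timecodes is hypothetical, and ast.literal_eval is swapped in for the answer's eval because it only parses literals and does not execute arbitrary code.

```python
import ast

def parse_timecodes(text):
    """Turn a string such as '[[25570], [26000]]' into [25570, 26000]."""
    nested = ast.literal_eval(text)  # parses Python literals only
    # Take the first element of each inner list; skip empty ones like [].
    return [int(inner[0]) for inner in nested if inner]

print(parse_timecodes("[[25570], [26000]]"))
print(parse_timecodes("[[]]"))
```

With the DataFrame from the question this would presumably be applied as f['timecodes'] = f['timecodes'].apply(parse_timecodes). Cells that are NaN or otherwise not strings would need separate handling.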
