3

I have a traffic data that looks like this. Here, each column have data in format meters:seconds. Like in row 1 column 2, 57:9 represents 57 meters and 9 seconds.

0 1 2 3 4 5 6 7 8 9
0:0 57:9 166:34 178:37 203:44 328:63 344:65 436:77 737:108 None
0:0 166:34 178:37 203:43 328:61 436:74 596:51 737:106 None None
0:0 57:6 166:30 178:33 203:40 328:62 344:64 436:74 596:91 None
0:0 203:43 328:61 None None None None None None None
0:0 57:7 166:20 178:43 203:10 328:61 None None None None

I want to extract meters values from the dataframe and store them in a list in ascending order. Then create a new dataframe in which the the column header will be the meters value (present in the list). Then it will match the meter value in the parent dataframe and add the corresponding second value beneath. The missing meters:second pair should be replaced by NaN and the current pair at the position would move to next column within same row.

The desired outcome is: list = [0,57,166,178,203,328,344,436,596,737]

dataframe:

0 57 166 178 203 328 344 436 596 737
0 9 34 37 44 63 65 77 NaN 108
0 NaN 34 37 43 61 NaN 74 51 106
0 6 30 33 40 62 64 74 91 None
0 NaN NaN NaN 43 61 None None None None
0 7 20 43 10 61 None None None None

I know I must use a loop to iterate over whole dataframe. I am new to python so I am unable to solve this. I tried using str.split() but it work only on 1 column. I have 98 columns and 290 rows. This is just one month data. I will be having 12 month data. So, need suggestions and help.

1 Answer 1

3

Try:

tmp = df1.apply(
    lambda x: dict(
        map(int, val.split(":"))
        for val in x
        if isinstance(val, str) and ":" in val
    ),
    axis=1,
).to_list()
out = pd.DataFrame(tmp)
print(out[sorted(out.columns)])

Prints:

   0    57    166   178  203  328   344   436   596    737
0    0  9.0  34.0  37.0   44   63  65.0  77.0   NaN  108.0
1    0  NaN  34.0  37.0   43   61   NaN  74.0  51.0  106.0
2    0  6.0  30.0  33.0   40   62  64.0  74.0  91.0    NaN
3    0  NaN   NaN   NaN   43   61   NaN   NaN   NaN    NaN
4    0  7.0  20.0  43.0   10   61   NaN   NaN   NaN    NaN
Sign up to request clarification or add additional context in comments.

9 Comments

@Omicron See my edit.
@Omicron Try out = out[sorted(out.columns)]. Then print(out)
@Omicron For that question I suggest to read these answers: stackoverflow.com/questions/11548005/… (in short, try out = out[sorted(out.columns)].round().astype("Int64"))
Thank you :) It did worked. However it changed NaN to <NA>. But that's not a big deal. Also, this code also worked. out = out.astype('Int64').
@Omicron I'm going through the dataframe row by row. In every cell I check if the data inside it is string and it contains :. If yes, I split the string to two and create a dictionary from every row. The collect these dictionaries to list and create new dataframe. Then sort the columns afterwards.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.