Python: Create a new data frame using rows from existing df depending on a given index

Question

I have a DataFrame and I need to create a new one in which, whenever a row has the same element in a certain column as an earlier row, the row with the second occurrence is moved directly under the row containing the first occurrence. I'm afraid this might be hard to explain, but hopefully the examples make it clearer.

I have a df such as this: (The important column is 'Direction')

    Node  |  Feature | Indicator | Value | Class | Direction
    --------------------------------------------------------
    1     |  WPS     |     <=    | 0.27  | 4     | 1 -> 2  
    --------------------------------------------------------
    2     |  ABC     |     <=    | 0.40  | 5     | 2 -> 3
    --------------------------------------------------------
    3     |  CXC     |     <=    | 0.45  | 2     | 3 -> 4
    --------------------------------------------------------
    4     |  WPS     |     <=    | 0.56  | 1     | 1 -> 5
    --------------------------------------------------------
    5     |  ABC     |     <=    | 0.30  | 3     | 2 -> 5
    --------------------------------------------------------
    6     |  CXC     |     <=    | 0.55  | 5     | 3 -> 1

When the first number in Direction occurs twice (as for nodes (1 & 4), (2 & 5) and (3 & 6)), I would like the row with the second occurrence (nodes 4, 5 and 6) to be moved directly below the row with the first occurrence.

I need the result to look like this:

    Node  |  Feature | Indicator | Value | Class | Direction
    --------------------------------------------------------
    1     |  WPS     |     <=    | 0.27  | 4     | 1 -> 2  
    --------------------------------------------------------
    4     |  WPS     |     <=    | 0.56  | 1     | 1 -> 5
    --------------------------------------------------------
    2     |  ABC     |     <=    | 0.40  | 5     | 2 -> 3
    --------------------------------------------------------
    5     |  ABC     |     <=    | 0.30  | 3     | 2 -> 5
    --------------------------------------------------------
    3     |  CXC     |     <=    | 0.45  | 2     | 3 -> 4
    --------------------------------------------------------
    6     |  CXC     |     <=    | 0.55  | 5     | 3 -> 1

I have spent so long trying to come up with a solution so I would be so grateful if anyone is able to help.

What I am trying to do at the moment:

Create a list containing the first integers from the ['Direction'] col: firstInt_ls = [1, 2, 3, 1, 2, 3]

I then try to find the indices of the first and second occurrence of each value within firstInt_ls, which I hoped to use to access the rows of the DataFrame by those indices.

    first_ind_ls = []
    second_ind_ls = []

    for i in firstInt_ls:
        # Find the indexes of the first and second occurrence
        first_ind = firstInt_ls.index(i, 0)
        second_ind = firstInt_ls.index(i, first_ind + 1)
        first_ind_ls.append(first_ind)
        second_ind_ls.append(second_ind)

This produces:

print(first_ind_ls)
>> [1, 2, 3, 1, 2, 3]
print(second_ind_ls)
>> [4, 5, 6]

I remove any duplicates from first_ind_ls so that both lists are the same size.

# Resulting lists:
>> [1, 2, 3]
>> [4, 5, 6]
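For reference, a minimal sketch of one way to collect the first and second positions in a single pass. The `positions` dictionary is a name introduced here for illustration, and the sketch assumes every value occurs exactly twice, as in the example. Note that Python positions are 0-based, so the result is `[0, 1, 2]` and `[3, 4, 5]`; the lists shown above appear to be node numbers rather than positions.

```python
firstInt_ls = [1, 2, 3, 1, 2, 3]

# Map each value to the list of positions where it occurs (hypothetical helper dict).
positions = {}
for pos, value in enumerate(firstInt_ls):
    positions.setdefault(value, []).append(pos)

# Dicts preserve insertion order, so values appear in order of first occurrence.
first_ind_ls = [p[0] for p in positions.values()]
second_ind_ls = [p[1] for p in positions.values()]

print(first_ind_ls)   # [0, 1, 2]
print(second_ind_ls)  # [3, 4, 5]
```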

Now I wanted to iterate through my DataFrame, take the row at the first index in first_ind_ls (which is 1) and add it to a new data frame, then take the row at the first index of second_ind_ls (which is 4) and add that to the new data frame, and continue until I end up with a DataFrame like the one above.

What I have already tried is not working at all so I won't bother posting the code unless requested.

I'm really having trouble figuring out how I can loop through my df and access the rows whilst at the same time looping through both lists containing the indices, then adding rows at each index to a new df...
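The looping described here can be sketched without an explicit row-by-row loop, assuming the two index lists hold 0-based row positions. The toy frame below copies only two columns from the example, and `order` and `new_df` are names chosen for illustration.

```python
import pandas as pd

# Toy frame mirroring the example (only the columns needed here).
df = pd.DataFrame({
    "Node": [1, 2, 3, 4, 5, 6],
    "Direction": ["1 -> 2", "2 -> 3", "3 -> 4", "1 -> 5", "2 -> 5", "3 -> 1"],
})

# Assumed 0-based row positions of the first and second occurrences.
first_ind_ls = [0, 1, 2]
second_ind_ls = [3, 4, 5]

# Interleave the two lists pairwise: [0, 3, 1, 4, 2, 5].
order = [pos for pair in zip(first_ind_ls, second_ind_ls) for pos in pair]

# iloc selects rows by position, in the order given, and returns a new frame.
new_df = df.iloc[order].reset_index(drop=True)

print(new_df["Node"].tolist())  # [1, 4, 2, 5, 3, 6]
```

`zip` walks both lists in step, which is what makes a simultaneous loop over two index lists unnecessary.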

I just don't know what else to do, so if anyone has any advice I'd be most appreciative. I am quite new to programming, so I guess my way of looking at the problem may be wrong.

ichafai · Accepted Answer · 2019-05-23 11:49:20Z

If I understand correctly, the only sort key is the first element in the Direction column. I assume Direction is of type string. See if this very simple, naive method works for you.

Create a key column (not strictly needed, but it makes the step explicit)

df['key'] = df['Direction'].apply(lambda x: x.split()[0])

Then sort values on this key

df.sort_values('key')

Does this work? Or am I missing something?
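A minimal end-to-end sketch of this approach, assuming a toy frame that copies three columns from the question. Two details are additions rather than part of the method as stated: the key is cast to `int` so that multi-digit node numbers sort numerically, and `kind="mergesort"` is requested because the default sort algorithm is not guaranteed to keep the original order among rows that share a key.

```python
import pandas as pd

# Toy frame copying a few columns from the question's example.
df = pd.DataFrame({
    "Node": [1, 2, 3, 4, 5, 6],
    "Feature": ["WPS", "ABC", "CXC", "WPS", "ABC", "CXC"],
    "Direction": ["1 -> 2", "2 -> 3", "3 -> 4", "1 -> 5", "2 -> 5", "3 -> 1"],
})

# First token of "a -> b", converted to int so that 10 sorts after 2.
df["key"] = df["Direction"].str.split().str[0].astype(int)

# sort_values returns a new DataFrame, so the result is assigned to a name.
# mergesort is stable: rows with equal keys keep their original relative order.
result = (
    df.sort_values("key", kind="mergesort")
      .drop(columns="key")
      .reset_index(drop=True)
)

print(result["Node"].tolist())  # [1, 4, 2, 5, 3, 6]
```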

4 Comments

codiearcher Over a year ago

It is not working unfortunately, it doesn't seem to be sorting them at all

ichafai Over a year ago

Could you tell me the type of the Direction column? I tried it in my workspace and it seemed to work.

codiearcher Over a year ago

It is type string. I think I might know what the problem is: I think I need a different variable name for the DataFrame, as maybe it is not being updated in place.

codiearcher Over a year ago

It's working! Thank you so much. I was trying to make it so much more complicated.
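The in-place issue raised in the comments can be illustrated with a small sketch. The frame and the names `unchanged` and `sorted_df` are made up for the example: `sort_values` returns a new DataFrame by default, so the result has to be assigned to a name, or `inplace=True` passed, for the sorted order to be kept.

```python
import pandas as pd

df = pd.DataFrame({"key": [2, 1, 3], "val": ["b", "a", "c"]})

# The sorted frame is returned and then discarded; df itself is not modified.
df.sort_values("key")
unchanged = df["key"].tolist()

# Option 1: keep the returned frame under a name.
sorted_df = df.sort_values("key")

# Option 2: modify df itself.
df.sort_values("key", inplace=True)

print(unchanged)                 # [2, 1, 3]
print(sorted_df["key"].tolist()) # [1, 2, 3]
print(df["key"].tolist())        # [1, 2, 3]
```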
