I have a DataFrame and need to create a new one in which, whenever a row has the same element in a certain column as an earlier row, the row with the second occurrence is moved directly under the row containing the first occurrence. This may be hard to explain, but hopefully the examples make it clearer.

I have a df such as this (the important column is 'Direction'):

    Node  |  Feature | Indicator | Value | Class | Direction
    --------------------------------------------------------
    1     |  WPS     |     <=    | 0.27  | 4     | 1 -> 2  
    --------------------------------------------------------
    2     |  ABC     |     <=    | 0.40  | 5     | 2 -> 3
    --------------------------------------------------------
    3     |  CXC     |     <=    | 0.45  | 2     | 3 -> 4
    --------------------------------------------------------
    4     |  WPS     |     <=    | 0.56  | 1     | 1 -> 5
    --------------------------------------------------------
    5     |  ABC     |     <=    | 0.30  | 3     | 2 -> 5
    --------------------------------------------------------
    6     |  CXC     |     <=    | 0.55  | 5     | 3 -> 1

When the first number in Direction occurs twice (as with nodes 1 & 4, 2 & 5, and 3 & 6), I would like the row with the second occurrence (nodes 4, 5 and 6) to be moved directly below the row with the first occurrence.

I need the result to look like this:

    Node  |  Feature | Indicator | Value | Class | Direction
    --------------------------------------------------------
    1     |  WPS     |     <=    | 0.27  | 4     | 1 -> 2  
    --------------------------------------------------------
    4     |  WPS     |     <=    | 0.56  | 1     | 1 -> 5
    --------------------------------------------------------
    2     |  ABC     |     <=    | 0.40  | 5     | 2 -> 3
    --------------------------------------------------------
    5     |  ABC     |     <=    | 0.30  | 3     | 2 -> 5
    --------------------------------------------------------
    3     |  CXC     |     <=    | 0.45  | 2     | 3 -> 4
    --------------------------------------------------------
    6     |  CXC     |     <=    | 0.55  | 5     | 3 -> 1

I have spent so long trying to come up with a solution so I would be so grateful if anyone is able to help.

What I am trying to do at the moment:

Create a list containing the first integers from the 'Direction' column: firstInt_ls = [1, 2, 3, 1, 2, 3]
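One way to build that list, as a minimal sketch: it assumes every Direction value is a string of the form 'a -> b', and the small frame below is illustrative, carrying only the two columns that matter here.

```python
import pandas as pd

# Illustrative frame with only the columns needed for this step.
df = pd.DataFrame({
    "Node": [1, 2, 3, 4, 5, 6],
    "Direction": ["1 -> 2", "2 -> 3", "3 -> 4", "1 -> 5", "2 -> 5", "3 -> 1"],
})

# Take the text before '->', strip the surrounding whitespace, convert to int.
firstInt_ls = (
    df["Direction"].str.split("->").str[0].str.strip().astype(int).tolist()
)
print(firstInt_ls)  # [1, 2, 3, 1, 2, 3]
```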

I then try to find the indices of the first and second occurrence within the first_Ints_ls, which I hoped to use to access the rows of the Dataframe by the indices.

    first_ind_ls = []
    second_ind_ls = []

    for i in firstInt_ls:
        # Find the indexes of the first and second occurrence
        first_ind = firstInt_ls.index(i, 0)
        second_ind = firstInt_ls.index(i, first_ind + 1)
        first_ind_ls.append(first_ind)
        second_ind_ls.append(second_ind)

This produces:

    print(first_ind_ls)
    >> [1, 2, 3, 1, 2, 3]
    print(second_ind_ls)
    >> [4, 5, 6]

I remove any duplicates from first_ind_ls so that both lists are the same size.

    # Resulting lists:
    >> [1, 2, 3]
    >> [4, 5, 6]

Now I wanted to iterate through my Dataframe and take the row at the first index in first_ind_ls (which is 1) and add to a new data frame, then take the row which is at the first index of second_ind_ls (which is 4) and add that to the new data frame. And continue until I end up with a Dataframe as above.
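The row-by-row copy described above can be avoided by interleaving the two index lists and passing the result to df.iloc in a single call. This is only a sketch under two assumptions: the lists hold zero-based row positions (so [0, 1, 2] and [3, 4, 5] rather than the node numbers), and the frame is a cut-down illustrative version of the one in the question.

```python
import pandas as pd

# Illustrative frame with only the columns needed for this step.
df = pd.DataFrame({
    "Node": [1, 2, 3, 4, 5, 6],
    "Direction": ["1 -> 2", "2 -> 3", "3 -> 4", "1 -> 5", "2 -> 5", "3 -> 1"],
})

first_ind_ls = [0, 1, 2]   # assumed zero-based positions of the first occurrences
second_ind_ls = [3, 4, 5]  # assumed zero-based positions of the second occurrences

# Pair up the positions and flatten the pairs: [0, 3, 1, 4, 2, 5]
order = [pos for pair in zip(first_ind_ls, second_ind_ls) for pos in pair]

# iloc accepts a list of positions and returns the rows in that order.
new_df = df.iloc[order].reset_index(drop=True)
print(new_df["Node"].tolist())  # [1, 4, 2, 5, 3, 6]
```

Building the full ordering first and indexing once is usually simpler than appending rows to a new frame inside a loop.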

What I have already tried is not working at all so I won't bother posting the code unless requested.

I'm really having trouble figuring out how I can loop through my df and access the rows whilst at the same time looping through both lists containing the indices, then adding rows at each index to a new df...

I just don't know what else to do, so if anyone has any advice I'd be most appreciative. I am quite new to programming, so my way of looking at the problem may be wrong.

1 Answer

If I understand correctly, the only sort key is the first element in the Direction column. I assume Direction is of type string. See if this very simple, naive method works for you.

Create a key column (not strictly necessary, but it makes the step explicit):

    df['key'] = df['Direction'].apply(lambda x: x.split()[0])

Then sort on this key. Note that sort_values returns a new DataFrame rather than modifying df, so assign the result:

    df = df.sort_values('key')

Does this work? Or am I missing something?
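For reference, a self-contained sketch of this approach on the question's data. Two details are assumptions beyond the answer as written: the key is converted to int so that, for example, 10 would sort after 2 rather than before it, and kind='mergesort' is passed so that rows sharing a key keep their original relative order, which the default sort algorithm does not guarantee. The result is assigned to a new name because sort_values returns a new DataFrame.

```python
import pandas as pd

# The question's example data.
df = pd.DataFrame({
    "Node": [1, 2, 3, 4, 5, 6],
    "Feature": ["WPS", "ABC", "CXC", "WPS", "ABC", "CXC"],
    "Indicator": ["<="] * 6,
    "Value": [0.27, 0.40, 0.45, 0.56, 0.30, 0.55],
    "Class": [4, 5, 2, 1, 3, 5],
    "Direction": ["1 -> 2", "2 -> 3", "3 -> 4", "1 -> 5", "2 -> 5", "3 -> 1"],
})

# Numeric key: the integer before '->' in Direction.
key = df["Direction"].str.split("->").str[0].str.strip().astype(int)

# A stable sort keeps the first occurrence above the second within each key.
result = (
    df.assign(key=key)
      .sort_values("key", kind="mergesort")
      .drop(columns="key")
      .reset_index(drop=True)
)
print(result["Node"].tolist())  # [1, 4, 2, 5, 3, 6]
```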

4 Comments

Unfortunately it is not working; it doesn't seem to be sorting them at all.
Could you tell me the type of the Direction column? I tried it in my workspace and it seemed to work.
It is type string. I think I might know what the problem is: I may need a different variable name for the DataFrame, as it may not be updating in place.
It's working! Thank you so much. I was trying to make it so much more complicated.
