I have a DataFrame and I need to create a new one in which, whenever a row has the same element in a certain column as an earlier row, the row with the second occurrence is moved directly under the row containing the first occurrence. This might be hard to explain, but hopefully the examples make it clearer.
I have a df like this (the important column is 'Direction'):
Node | Feature | Indicator | Value | Class | Direction
--------------------------------------------------------
1 | WPS | <= | 0.27 | 4 | 1 -> 2
2 | ABC | <= | 0.40 | 5 | 2 -> 3
3 | CXC | <= | 0.45 | 2 | 3 -> 4
4 | WPS | <= | 0.56 | 1 | 1 -> 5
5 | ABC | <= | 0.30 | 3 | 2 -> 5
6 | CXC | <= | 0.55 | 5 | 3 -> 1
When the first number in 'Direction' occurs twice (in the case of nodes 1 & 4, 2 & 5, and 3 & 6), I would like the row with the second occurrence (nodes 4, 5 and 6) to be moved directly below the row with the first occurrence.
I need the result to look like this:
Node | Feature | Indicator | Value | Class | Direction
--------------------------------------------------------
1 | WPS | <= | 0.27 | 4 | 1 -> 2
4 | WPS | <= | 0.56 | 1 | 1 -> 5
2 | ABC | <= | 0.40 | 5 | 2 -> 3
5 | ABC | <= | 0.30 | 3 | 2 -> 5
3 | CXC | <= | 0.45 | 2 | 3 -> 4
6 | CXC | <= | 0.55 | 5 | 3 -> 1
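One way this rule might be expressed directly in pandas is a stable sort on the source node. The sketch below makes a few assumptions: the DataFrame is rebuilt by hand from the tables above, the source node is taken to be the integer before "->" in 'Direction', and each group keeps the position of its first row. `pd.factorize` numbers each distinct source in order of first appearance, and a stable `argsort` keeps the rows inside a group in their original order.

```python
import pandas as pd

# Sample data rebuilt by hand from the tables in the question
df = pd.DataFrame({
    "Node": [1, 2, 3, 4, 5, 6],
    "Feature": ["WPS", "ABC", "CXC", "WPS", "ABC", "CXC"],
    "Indicator": ["<="] * 6,
    "Value": [0.27, 0.40, 0.45, 0.56, 0.30, 0.55],
    "Class": [4, 5, 2, 1, 3, 5],
    "Direction": ["1 -> 2", "2 -> 3", "3 -> 4", "1 -> 5", "2 -> 5", "3 -> 1"],
})

# Assumption: the source node is the integer before "->" in Direction
source = df["Direction"].str.split("->").str[0].str.strip().astype(int)

# factorize gives each distinct source a code in order of first appearance,
# so a group sorts to where its first row originally was
group_codes, _ = pd.factorize(source)

# A stable sort keeps rows with equal codes in their original order,
# which places each later occurrence directly under the earlier one
result = df.iloc[group_codes.argsort(kind="stable")].reset_index(drop=True)
print(result)
```

Because the rows are only reordered, every column (including 'Class') keeps its original values.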
I have spent a long time trying to come up with a solution, so I would be very grateful if anyone is able to help.
What I am trying to do at the moment:
First, I create a list containing the first integer from each value in the ['Direction'] column: firstInt_ls = [1, 2, 3, 1, 2, 3]
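A minimal sketch of building that list straight from the column, assuming every 'Direction' value starts with an integer followed by "->" (the regular expression is an assumption based only on the sample rows):

```python
import pandas as pd

# Only the Direction column matters for this step
df = pd.DataFrame(
    {"Direction": ["1 -> 2", "2 -> 3", "3 -> 4", "1 -> 5", "2 -> 5", "3 -> 1"]}
)

# Capture the leading digits of each value, convert to int, then to a plain list
firstInt_ls = df["Direction"].str.extract(r"^\s*(\d+)")[0].astype(int).tolist()
print(firstInt_ls)  # [1, 2, 3, 1, 2, 3]
```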
I then try to find the indices of the first and second occurrence of each value within that list, which I hoped to use to access the rows of the DataFrame.
first_ind_ls = []
second_ind_ls = []
for i in firstInt_ls:
    # Find the (0-based) positions of the first and second occurrence of i
    first_ind = firstInt_ls.index(i)
    second_ind = firstInt_ls.index(i, first_ind + 1)
    first_ind_ls.append(first_ind)
    second_ind_ls.append(second_ind)
This produces:
print(first_ind_ls)
>> [1, 2, 3, 1, 2, 3]
print(second_ind_ls)
>> [4, 5, 6]
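Note that list.index returns 0-based positions, that the loop above handles each value once per occurrence (so with [1, 2, 3, 1, 2, 3] every pair is produced twice), and that list.index(i, first_ind + 1) raises ValueError for any value that occurs only once. A sketch that records every position of each value in a single pass; the names positions and pairs are illustrative, not from the original code:

```python
from collections import defaultdict

firstInt_ls = [1, 2, 3, 1, 2, 3]

# Map each value to all of its 0-based positions, in order of appearance
positions = defaultdict(list)
for pos, value in enumerate(firstInt_ls):
    positions[value].append(pos)

# Keep (first, second) pairs only for values that occur at least twice,
# so the two lists below always stay aligned and nothing raises ValueError
pairs = [(p[0], p[1]) for p in positions.values() if len(p) >= 2]
first_ind_ls = [first for first, _ in pairs]
second_ind_ls = [second for _, second in pairs]

print(dict(positions))  # {1: [0, 3], 2: [1, 4], 3: [2, 5]}
print(first_ind_ls, second_ind_ls)  # [0, 1, 2] [3, 4, 5]
```

Adding 1 to each position gives the Node numbers used in the question.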
I then remove the duplicates from first_ind_ls so that both lists are the same length.
# Resulting lists:
>> [1, 2, 3]
>> [4, 5, 6]
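If the duplicates are removed with set(), the original order is not guaranteed. A small order-preserving alternative, shown here on the list printed above:

```python
first_ind_ls = [1, 2, 3, 1, 2, 3]

# dict keys are unique and keep insertion order (Python 3.7+),
# so each value survives once, at the place it first appeared
first_ind_ls = list(dict.fromkeys(first_ind_ls))
print(first_ind_ls)  # [1, 2, 3]
```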
Next, I want to iterate through my DataFrame: take the row at the first index in first_ind_ls (which is 1) and add it to a new DataFrame, then take the row at the first index in second_ind_ls (which is 4) and add that to the new DataFrame, continuing until I end up with the DataFrame shown above.
What I have tried so far is not working at all, so I won't post the code unless someone requests it.
I'm really having trouble figuring out how to loop through my df and access its rows while also looping through both lists of indices, and then add the row at each index to a new df.
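A sketch of looping through both lists at the same time with zip. It assumes the values in first_ind_ls and second_ind_ls are Node numbers, as in the lists printed above, rather than 0-based positions (with positions, df.iloc[[first, second]] would replace the .loc lookup). Only a few columns are rebuilt for brevity. The rows are collected in a plain list and joined once with pd.concat, since growing a DataFrame row by row is slow and DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame({
    "Node": [1, 2, 3, 4, 5, 6],
    "Feature": ["WPS", "ABC", "CXC", "WPS", "ABC", "CXC"],
    "Direction": ["1 -> 2", "2 -> 3", "3 -> 4", "1 -> 5", "2 -> 5", "3 -> 1"],
})

# Node numbers of the first and second occurrences, as listed in the question
first_ind_ls = [1, 2, 3]
second_ind_ls = [4, 5, 6]

# Index the rows by Node so the numbers above can be used as labels
by_node = df.set_index("Node")

pieces = []
# zip walks both lists in step: (1, 4), (2, 5), (3, 6)
for first, second in zip(first_ind_ls, second_ind_ls):
    # A list of labels returns a DataFrame holding both rows, in that order
    pieces.append(by_node.loc[[first, second]])

# Join all pieces once, then turn Node back into an ordinary column
new_df = pd.concat(pieces).reset_index()
print(new_df)
```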
I just don't know what else to do, so if anyone has any advice I'd be most appreciative. I am quite new to programming, so my way of looking at the problem may be wrong.