
I have a set of CSVs that I need to modify. The code below finds the places where the modification needs to happen: where the 'Markers' column has two consecutive 3s, two consecutive 4s, a 5 followed by a 3, or a 4 followed by a 3. I need to insert a 2 between the two values in any of those patterns (i.e. 3,3 should become 3,2,3; 5,3 should become 5,2,3; etc.).

The code finds those patterns by adding a copy of the Markers column, shifted down by one row, and comparing the two columns row by row:

import re
import pandas as pd

columns=['TwoThrees','TwoFours', 'FiveThree', 'FourThree']

PVTdfs=[]

def PVTscore(pdframe):
    Taskname ='PVT_'
    ID=(re.findall('\\d+', file))
    dfName = 'Scoringdf_'+str(ID)
    dfName = pd.DataFrame([[0,0,0,0]],columns=columns, index=ID)
    pdframe['ShiftedMarkers'] = pdframe.Markers.shift()
    for index, row in pdframe.iterrows():
        if row[1] == row[2]:
            if row[1]==3:
                print("looks like two threes")
                print(index, row[1],row[2])
                dfName.TwoThrees[0]+=1
            elif row[1]==4:
                print("looks like two fours")
                print(index, row[1],row[2])
                dfName.TwoFours[0]+=1
        if row[1]==3 and row[2]==5:
            print("looks like a five then a three")
            print(index, row[1],row[2])
            dfName.FiveThree[0]+=1
        if row[1]==3 and row[2]==4:
            print("looks like a four then a three")
            print(index, row[1],row[2])
            dfName.FourThree[0]+=1
    if 'post' in file:
        print('Looks like a Post')
        PrePost = 'Post_'
        dfName.columns = [Taskname+ PrePost +x for x in columns]
    elif 'pre' in file:
        print('Looks like a PRE')
        PrePost = 'Pre_'
        dfName.columns = [Taskname+ PrePost +x for x in columns]
    PVTdfs.append(dfName)
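
The same shift-based detection can probably be written without iterrows. The following is only a minimal sketch, assuming a DataFrame column of integer markers like the example below; count_patterns and needs_two are hypothetical helper names, not part of the code above.

```python
import pandas as pd

def count_patterns(markers):
    """Count the four (previous, current) marker pairs described above."""
    prev = markers.shift()  # previous row's marker; NaN on the first row
    return {
        'TwoThrees': int(((prev == 3) & (markers == 3)).sum()),
        'TwoFours': int(((prev == 4) & (markers == 4)).sum()),
        'FiveThree': int(((prev == 5) & (markers == 3)).sum()),
        'FourThree': int(((prev == 4) & (markers == 3)).sum()),
    }

def needs_two(markers):
    """Boolean mask: True on rows that should get a 2 inserted before them."""
    prev = markers.shift()
    return ((markers == 3) & prev.isin([3, 4, 5])) | ((markers == 4) & (prev == 4))

# Markers column from the example CSV (0-based positions here)
markers = pd.Series([1, 2, 5, 2, 3, 2, 3, 2, 3, 3, 3, 2, 3, 2])
counts = count_patterns(markers)
mask = needs_two(markers)
print(counts)
print(mask[mask].index.tolist())
```

The mask marks the second row of each pattern, which is the row that a new 2 has to go in front of.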

An example CSV is:

      Relative Time   Markers
1     928             1
2     1312            2
3     1364            5
4     3092            2
5     3167            3
6     5072            2
7     5147            3
8     5908            2
9     5969            3
10    7955            3   <-- these two should be amended
11    9560            3   <-- these two should be amended
12    10313           2
13    10391           3
14    11354           2

Desired output:

      Relative Time   Markers
1     928             1
2     1312            2
3     1364            5
4     3092            2
5     3167            3
6     5072            2
7     5147            3
8     5908            2
9     5969            3
10    NaN             2
11    7955            3   <-- fixed
12    NaN             2
13    9560            3   <-- fixed
14    10313           2
15    10391           3
16    11354           2

I've tried np.insert and df.loc assignments, but they just replace the existing row. I need to insert a new row and update the indexing.

2 Answers


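Here is the sample CSV I used. Note that in this version the Time column holds the marker codes, and the Markers column is a flag (1.0) on each row that needs a new row inserted before it: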

    Relative    Time    Markers
0   928     1   NaN
1   1312    2   NaN
2   1364    5   NaN
3   3092    2   NaN
4   3167    3   NaN
5   5072    2   NaN
6   5147    3   NaN
7   5908    2   NaN
8   5969    3   NaN
9   7955    3   1.0
10  9560    3   1.0
11  10313   2   NaN
12  10391   3   NaN
13  11354   2   NaN
14  12322   5   NaN
15  12377   5   1.0

And the code to work on:

import numpy as np
import pandas as pd

# get list of indices where markers are present
marked = df[~pd.isnull(df.Markers)].index.tolist()
print(marked)
# create insert template row (Time carries the 2 to be inserted)
insert = pd.DataFrame({'Relative':[np.nan],'Time':[2],'Markers':[np.nan]})
print(insert)
# loop through marked indices and insert the template row before each one
for x in marked:
    df = pd.concat([df.loc[:x-1],insert,df.loc[x:]])
# finally reset the index and spit out new df
df = df.reset_index(drop=True)
df

Gives the output:

[9, 10, 15]
   Markers  Relative Time
0      NaN       NaN    2

    Markers    Relative    Time
0   NaN     928.0       1
1   NaN     1312.0      2
2   NaN     1364.0      5
3   NaN     3092.0      2
4   NaN     3167.0      3
5   NaN     5072.0      2
6   NaN     5147.0      3
7   NaN     5908.0      2
8   NaN     5969.0      3
9   NaN     NaN     2
10  1.0     7955.0      3
11  NaN     NaN     2
12  1.0     9560.0      3
13  NaN     10313.0     2
14  NaN     10391.0     3
15  NaN     11354.0     2
16  NaN     12322.0     5
17  NaN     NaN     2
18  1.0     12377.0     5
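
For the layout in the question (one 'Relative Time' column and one 'Markers' column), the detection and the insertion could be combined without a Python loop. This is a sketch under assumptions, not a drop-in replacement: it assumes the default 0-based integer index, and insert_twos is a hypothetical helper name. The trick is to give each new row a half-step label (x - 0.5) so that sorting the index places it just before row x.

```python
import numpy as np
import pandas as pd

def insert_twos(df, marker_col='Markers'):
    """Return a new frame with a marker-2 row inserted before each flagged row.

    Assumes df has the default 0..n-1 integer index.
    """
    markers = df[marker_col]
    prev = markers.shift()
    mask = ((markers == 3) & prev.isin([3, 4, 5])) | ((markers == 4) & (prev == 4))
    # New rows: NaN everywhere except the marker column, which is set to 2.
    labels = df.index[mask.to_numpy()] - 0.5
    new_rows = pd.DataFrame(np.nan, index=labels, columns=df.columns)
    new_rows[marker_col] = 2
    # Half-step labels sort just before the row they precede.
    return pd.concat([df, new_rows]).sort_index().reset_index(drop=True)

df = pd.DataFrame({
    'Relative Time': [928, 1312, 1364, 3092, 3167, 5072, 5147, 5908,
                      5969, 7955, 9560, 10313, 10391, 11354],
    'Markers': [1, 2, 5, 2, 3, 2, 3, 2, 3, 3, 3, 2, 3, 2],
})
result = insert_twos(df)
print(result)
```

Because the mask is computed once on the original frame, the row labels never shift while rows are being added, which is the part the loop version has to handle by slicing on labels.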

1 Comment

This was the easiest way to implement for batching.

Why not use the pd.concat() method? (see the docs)

Depending on your workflow, you can slice your dataframe at the index where you want to insert the new row, and insert the row this way:

>>> d = {'col1': ['A', 'B', 'D'], 'col2': [1, 2, 4]}    
>>> df = pd.DataFrame(data=d)
>>> df
  col1  col2
0    A     1
1    B     2
2    D     4

>>> row = {'col1':['C'], 'col2': [3]}  
>>> row = pd.DataFrame(data=row)

>>> new_df = pd.concat([df.iloc[:2], row, df.iloc[2:]]).reset_index(drop=True)
>>> new_df
  col1  col2
0    A     1
1    B     2
2    C     3
3    D     4

Note: you need to pass drop=True to the chained reset_index() call; otherwise the old index will be added as a new column.
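
To make the effect of drop=True concrete, here is a small sketch that reuses the frames from the example above; the variable names stacked, kept and dropped are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'B', 'D'], 'col2': [1, 2, 4]})
row = pd.DataFrame({'col1': ['C'], 'col2': [3]})

# After concat the labels are 0, 1, 0, 2: the inserted row keeps its own label 0.
stacked = pd.concat([df.iloc[:2], row, df.iloc[2:]])

kept = stacked.reset_index()              # old labels saved in an 'index' column
dropped = stacked.reset_index(drop=True)  # old labels discarded

print(kept.columns.tolist())
print(dropped.columns.tolist())
```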

Hope this helps.
