I have a dataframe with two columns (and a lot of rows): one column contains the full sequence and the other contains a sub sequence.
I want to find the index where the sub sequence starts within the full sequence and add it as another column.
I have tried this:
df["start"] = df.sequence.index(df.sub_sequence)
But this raises: TypeError: 'RangeIndex' object is not callable
What am I doing wrong?
Here's the df and the df I wish to end up with:
Sample dataframe:
import pandas as pd
data = {"sequence": ["abcde","fghij","klmno"], "sub_sequence": ["cde", "gh", "no"]}
df = pd.DataFrame(data, columns=['sequence', 'sub_sequence'])
  sequence sub_sequence
0    abcde          cde
1    fghij           gh
2    klmno           no
Expected result:
data2 = {"sequence": ["abcde","fghij","klmno"], "sub_sequence": ["cde", "gh", "no"], "start": [2,1,3]}
df2 = pd.DataFrame(data2, columns=['sequence', 'sub_sequence', 'start'])
  sequence sub_sequence  start
0    abcde          cde      2
1    fghij           gh      1
2    klmno           no      3
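For reference, a minimal sketch of one way to get the expected column. On a Series, `.index` is the row index (here a RangeIndex), not the string method `str.index`, which is why calling it raises the TypeError. The sketch assumes every value in both columns is a plain Python string, and it uses `str.find` rather than `str.index`, so a missing sub sequence gives -1 instead of raising. It is one possible approach, not necessarily the fastest for a very large frame.

```python
import pandas as pd

data = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"]}
df = pd.DataFrame(data, columns=["sequence", "sub_sequence"])

# str.find works on one pair of strings at a time, so walk the two columns
# together row by row. It returns -1 when the sub sequence is not found.
df["start"] = [seq.find(sub) for seq, sub in zip(df["sequence"], df["sub_sequence"])]

print(df)
```

The list comprehension with `zip` is used here instead of `df.apply(..., axis=1)` only because it avoids building a Series per row; either form should produce the same column.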