I have two data frames (df1 and df2) with the same number of rows. For each row, I want to check whether the value in each column of df2 is contained as a substring in column S of df1 (in the same row), and then sum up the matches for each column of df2. Any help is appreciated.
```r
df1 <- read.table(text="R S
GG AACCTT
CC AAGGTT
CC AAGGTT
GG AACCTT
GG AACCTT
CC AAGGTT
GG AACCTT
GG AACCTT
GG AACCTT
TT AACCGG
AA CCGGTT
CC AAGGTT
TT AACCGG
AA CCGGTT
AA CCGGTT", header=TRUE, stringsAsFactors=FALSE)
df2 <- read.table(text="M1 M2 M3 M4 M5 M6 M7 M8
GG GG GG GG GG GG GG GG
CC CC CC CC CC CC CC CC
CC TT TT TT TT TT TT TT
HH TT TT TT TT HH TT TT
GG AA GG GG GG -- GG HH
CC CC CC CC CC TT CC CC
GG GG GG GG GG -- GG GG
GG GG GG GG GG -- GG GG
-- -- HH AA HH AA HH AA
TT -- HH CC HH CC HH CC
AA -- AA AA AA -- AA AA
CC CC CC CC CC CC CC CC
-- HH CC CC CC HH HH CC
AA HH GG GG GG HH HH GG
AA HH GG GG GG HH GG GG", header=TRUE, stringsAsFactors=FALSE)
```
After checking, the intermediate (logical) result is:
```
   M1    M2    M3    M4    M5    M6    M7    M8
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
FALSE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE
FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
FALSE FALSE  TRUE  TRUE  TRUE FALSE FALSE  TRUE
FALSE FALSE  TRUE  TRUE  TRUE FALSE  TRUE  TRUE
```
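A minimal base-R sketch for building the intermediate logical matrix, assuming df1 and df2 are aligned row by row as described. `grepl()` accepts only a single pattern, so `mapply()` pairs each cell of a df2 column with the S value in the same row of df1; `fixed = TRUE` makes patterns such as `--` match literally rather than as regular expressions. The helper name `substring_matches` is hypothetical.

```r
# Hypothetical helper: returns a logical matrix with one column per column of
# `patterns`; entry [i, j] is TRUE when patterns[i, j] occurs as a literal
# substring of s[i].
substring_matches <- function(s, patterns) {
  stopifnot(length(s) == nrow(patterns))
  s <- as.character(s)
  vapply(
    patterns,
    function(col) {
      # mapply() pairs col[i] with s[i]; grepl() is called once per row
      mapply(grepl, as.character(col), s,
             MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
    },
    logical(length(s))
  )
}
```

Applied to the question's data, `colSums(substring_matches(df1$S, df2))` then gives the per-column totals.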
The expected final result, obtained by summing each column with `colSums()`, is:
```
M1 M2 M3 M4 M5 M6 M7 M8
 0  3  5  7  5  4  3  7
```
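If only the totals are needed, the matches can be counted per column directly instead of keeping a full logical matrix. The sketch below, using a hypothetical helper `count_substring_matches`, loops over row indices with `vapply()` and calls `grepl(..., fixed = TRUE)` on each (pattern, S) pair from the same row; `sum()` of a logical vector counts its TRUE values.

```r
# Hypothetical helper: for each column j of `patterns`, count the rows i where
# patterns[i, j] occurs as a literal substring of s[i].
# Returns a named integer vector (one element per column).
count_substring_matches <- function(s, patterns) {
  stopifnot(length(s) == nrow(patterns))
  s <- as.character(s)
  vapply(
    patterns,
    function(col) {
      col <- as.character(col)
      # Test row i of this column against row i of s
      hits <- vapply(seq_along(s),
                     function(i) grepl(col[i], s[i], fixed = TRUE),
                     logical(1))
      sum(hits)  # TRUE counts as 1, FALSE as 0
    },
    integer(1)
  )
}
```

With the question's data, `count_substring_matches(df1$S, df2)` should return a named integer vector (M1 to M8) holding the totals listed above.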