1

I need an arrayformula solution to determine the first row number in which my search term exists. I have a column of strings which represents the data to be searched (Column E below) and I have a column of single word search terms (Column A below).

Column A Column E
1 rain All terrain, and, no pain
2 sun There's 2 much, rain, in Spain
3 blue The sky, was, blue, on, Sunday
4 Purple Clouds, are pink around here
5 spot Red sunspot at night is alarming
6 3 Yellow, sun spots
7 Purple rain
8 3 spot of rain
9 5 583334378

The result I'm looking for is:

Column A Column B
rain 2
sun 6
blue 3
Purple 7
spot 8
3 6
5 Nope

An example sheet is here.

I've tried various different methods but the most successful has been the QUERY function below.

=if(A2<>"",iferror(query({arrayformula(row($E$2:$E)-1),$E$2:$E}, "select Col1 where Col2 matches '.*\b"&A2&"\b.*' limit 1"),"Nope"),)

My problem with both VLOOKUP and MATCH is that I couldn't find a solution that provided an index on a partial match within a string, only whole cell matches.

I've also tried a TRANSPOSE/SPLIT pairing but that seemed to remove my ability to determine the originating row - which is the bit I want. A helper column in a different sheet expanded the row count exponentially and I think it broke the sheet. The strings are typically a total length of 100+ chars each (including spaces and punctuation, no line feeds).

The remaining problem with QUERY solution is that it's a 'Copy Down' solution only and I need the arrayformula version. I tried this but it doesn't traverse the array.

=ArrayFormula(if($A$2:$A<>"",iferror(query({arrayformula(row($E$2:$E)-1),$E$2:$E}, "select Col1 where Col2 matches '.*\b"&$A$2:$A&"\b.*' limit 1"),"Nope"),))

I tried this from Player0 but it provides an incomplete result in my sample sheet. I can't fathom out why it misses certain search terms in the VLOOKUP part when the data clearly includes it in its REGEXEXTRACT output. I did use a helper column in G here but it misses the last row of the sheet I don't think I can allow (can't test it). ... and it also seems to hang the sheet with the real world data.

={"ArrayFormula2"; ARRAYFORMULA(IFNA(VLOOKUP($A$2:$A,{REGEXEXTRACT(""&$F$2:$F, "\b"&TEXTJOIN("\b|\b", 1, $A$2:$A)&"\b"), $G$2:$G*1}, 2, 0)))}

The requirement for a single cell formula is that this is actually going to be the second part of an existing arrayformula. If the first expression in my combined formula fails then the formula I learn here (hopefully) will be called. In my second QUERY example above I've emulated the pass/fail component with a simple 'if not empty' test.

The data set of search terms is approximately 25K-30K rows. The column of strings is approximately 5K to 10K rows. Both arrays are dynamic in that they can have their data changed at any time and rows can be added or deleted - in other words neither row count can be determined without manual interaction - something I am keen to avoid.

4
  • If any moderator comes by... Given the answers below my title may not now be the best description for others to find this as a solution. Suggestions please. Commented Nov 16, 2023 at 19:33
  • 1
    Under what circumnstances would one ever want to return a "row" number? That's usually the middle step in an XY question? Commented Nov 16, 2023 at 20:32
  • When the correct row reference needs to be validated so when it becomes an argument to another reference the right data is pulled from another sheet. Any more expansive explanation than that would turn into a monologue of the data structures I am being given as sources. Not right for here I think. Commented Nov 16, 2023 at 23:19
  • But you don't need yo use the row number as an argument to another reference in another sheet, you can just pull the data you need, but i understand what you're saying about not wanting to rant about the structure. I'd just say that you seem to be in an XY problem. Commented Nov 17, 2023 at 2:39

2 Answers 2

1

Here's one approach you may test out:

=map(A2:A,lambda(Σ,if(Σ="",,ifna(+filter(row(F2:F)-1,--regexmatch(F2:F&"","\b"&Σ&""&"\b")=1),"Nope"))))

enter image description here

Sign up to request clarification or add additional context in comments.

2 Comments

Thank you. This worked straight out of the box. I'll tick as correct answer when I've tested all the kind responses on the real world data set - may take me a day or so, but TIA
I'm working through your logic (slowly). Would you mind clarifying a couple of points for me please. 1. REGEXMATCH naturally returns TRUE or FALSE. What is the reason for forcing a boolean output by using '--' prefix and seemingly doubling up the bool output with the '=1' suffix? It works obviously - but so does removing them it appears. 2. I can't find a reference as to why adding the '+' prefix to the FILTER forces a single value output. Do you know where an explanation of this feature is documented? Thx again.
0

Try this:

=arrayformula(LET(a,MAP($C$1:$C$9,LAMBDA(z,SUM(N(SPLIT(REGEXREPLACE(z, "_|,|;", " "), " ")=B1)))),MATCH(TRUE,a<>0,0)))

or to spill:

=arrayformula(map(B1:B9,lambda(x,(LET(a,MAP($C$1:$C$9,LAMBDA(z,SUM(N(SPLIT(REGEXREPLACE(z, "_|,|;", " "), " ")=x)))),MATCH(TRUE,a<>0,0))))))

Result:

enter image description here

Wrap it with iferror to eliminate #N/A errors. Note that row numbers are relative to dataset.

1 Comment

:-) Thank you - my first use of nested LAMBDA functions. I adjusted the spill option to fit the data and this now replicates it exactly (for those who may be following). I'll tick as correct answer when I've tested all the kind responses on the real world data set. =arrayformula(if($A$2:$A<>"",iferror(map($A$2:$A,lambda(x,(LET(a,MAP($F$2:$F,LAMBDA(z,SUM(N(SPLIT(REGEXREPLACE(z, "_|,|;", " "), " ")=x)))),MATCH(TRUE,a<>0,0))))),"Nope"),))

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.