Evening,

Is there any way to find a specific sequence of numerical values matching certain logic (in the same way a regex does) within a numpy/torch array or matrix row?

I know I could just convert the array to a string or list of strings based on conditions (certain characters for certain values) and then use a regular regex, but is there a native way in pure numpy/torch to do it directly on the numbers?

Thanks in advance.

4 Replies

Denormalize your data, storing a redundant copy.

For a df with columns df.a, .b, .c, synthesize object (string) column df.abc, perhaps as the JSON serialization of the three numeric columns.

Then use df.abc.str.contains(pattern) in the usual way. Remember to update a given df.abc entry each time you mutate any of the three values in its row.
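A minimal sketch of that idea, assuming a small made-up frame with integer columns a, b, c (the data and the pattern are only for illustration):

```python
import json

import pandas as pd

# Hypothetical example data; the column names follow the reply above.
df = pd.DataFrame({"a": [6, 50, 1], "b": [7, 2, 50], "c": [8, 3, 4]})

# Redundant string column: the JSON list of the three numeric values per row.
df["abc"] = [json.dumps(row) for row in df[["a", "b", "c"]].values.tolist()]

# Rows whose serialized form contains 50 as a whole token
# (the \b anchors keep 150 or 500 from matching).
mask = df["abc"].str.contains(r"\b50\b", regex=True)
matches = df[mask]
```

The abc column is only as fresh as the last time it was rebuilt, so it has to be regenerated whenever a, b, or c changes.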


Using a regex on numeric data is weird. Perhaps your (unstated) true goal is to compute set membership, e.g. "show me each row where at least one of the three values exactly equals 50". Then the appropriate course of action would be to tidy up and normalize, so we have columns df.row_num, df.col_index, and df.value. So representing a zero-th row (a, b, c) of (6, 7, 8) would look like

[(0, 0, 6),
 (0, 1, 7),
 (0, 2, 8),
]

Armed with that representation, it becomes trivial to query df[df.value == 50]. And since the row number comes back, you then can easily ask for all three values having that row number.
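As a rough sketch, assuming the data starts out as a 2-D numpy array (the array contents here are invented), the tidy form and the query could look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical input: three rows of (a, b, c).
arr = np.array([[6, 7, 8], [1, 50, 3], [50, 50, 9]])

# One record per cell: (row_num, col_index, value).
rows, cols = np.indices(arr.shape)
tidy = pd.DataFrame(
    {"row_num": rows.ravel(), "col_index": cols.ravel(), "value": arr.ravel()}
)

# Cells equal to 50, then every cell belonging to the rows that had a hit.
hits = tidy[tidy["value"] == 50]
matching_rows = hits["row_num"].unique()
full_rows = tidy[tidy["row_num"].isin(matching_rows)]
```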

Whatever the rule is, you can write a function to test it. For example, perhaps you want to know if the array contains a 5, followed by a 7, which is followed by a 9. This could be done using a few vectorized operations, such as those shown in the question "Numpy first occurrence of value greater than existing value". For more complicated rules, you'll have to loop through the numbers in the array, using the same kinds of logic used in regex matching. For example, maybe you want to test if a row has a 5 followed by a 7 followed by a 9, with only negative numbers between the 5 and the 7. I don't think that can be done without a loop.
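Both cases can be sketched roughly as follows. The function names are made up for illustration, and the second rule is read here as "a 5, then zero or more negatives, then a 7, then a 9 anywhere later", which is one possible interpretation. The first function needs numpy 1.20 or later for sliding_window_view.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def find_subsequence(arr, pattern):
    """Start indices where `pattern` occurs as a contiguous run in 1-D `arr`."""
    arr = np.asarray(arr)
    pattern = np.asarray(pattern)
    if arr.size < pattern.size:
        return np.array([], dtype=int)
    # Each row of `windows` is one length-len(pattern) slice of `arr`.
    windows = sliding_window_view(arr, pattern.size)
    return np.flatnonzero((windows == pattern).all(axis=1))


def has_5_negatives_7_then_9(arr):
    """Loop-based test: 5, only negatives, 7, and a 9 somewhere after the 7."""
    values = list(arr)
    n = len(values)
    for i, v in enumerate(values):
        if v != 5:
            continue
        j = i + 1
        while j < n and values[j] < 0:  # skip the run of negatives
            j += 1
        if j < n and values[j] == 7 and 9 in values[j + 1:]:
            return True
    return False
```

For instance, find_subsequence([1, 5, 7, 9, 2, 5, 7, 9], [5, 7, 9]) returns the start positions 1 and 5, and has_5_negatives_7_then_9([1, 5, -2, -3, 7, 0, 9]) is True while [5, 1, 7, 9] gives False because the 1 is not negative.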

Well, I know I can do it in other ways, but my point was to check if it can be done with pure numpy/torch using some method "equivalent" to a regex.

My goal is to filter regions in a differential field that show a particular sequence of derivative signs but with a variable shape. I can easily map positive derivatives to "1", negatives to "-1", flats to "0", or whatever, and use a regex, but I thought maybe it was achievable natively in numpy :/.
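For reference, the string-mapping fallback described here might look something like the sketch below. It assumes a 1-D signal without NaNs and uses the single characters "+", "-" and "0" instead of "1", "-1", "0", so that every sample maps to exactly one character and match positions line up with array indices. The signal and the pattern are invented for illustration.

```python
import re

import numpy as np


def sign_string(values):
    """Encode the sign of each first difference as '+', '-' or '0'."""
    signs = np.sign(np.diff(np.asarray(values, dtype=float))).astype(int)
    # Shift -1, 0, 1 to 0, 1, 2 and use that to pick one character per step.
    return "".join(np.array(["-", "0", "+"])[signs + 1])


signal = [0, 1, 2, 2, 2, 1, 0, 0, 3]
encoded = sign_string(signal)

# A rise of any length, an optional plateau, then a fall of any length.
spans = [m.span() for m in re.finditer(r"\++0*-+", encoded)]
```

Each span is a half-open range over the difference array, so position k refers to the step between signal[k] and signal[k + 1].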

Python's argparse looks for argument patterns by first converting the strings in the sys.argv list into codes, 'A' or 'O' (argument or optional flag). Then, based on the nargs parameter, it looks for regex patterns like 'OA+' or 'AA*'.
