I have a project for searching for bytecode instructions and I would like to expand it to allow the use of Regular Expressions for matching patterns.

The gist of what I want to do is have custom character classes/sets so I can have something such as ISTORE match any of the following instructions:

ISTORE ISTORE_0 ISTORE_1 ISTORE_2 ISTORE_3

And then something similar for ILOAD ... ILOAD_n etc.

ISTORE and ILOAD would be similar to metacharacters like \s where they truly stand for multiple characters.

Basically, I am just looking for a jumping-off point so I can find a way to implement my own metacharacters.

  • I don't get it, what are you trying to match with regex? Commented Sep 3, 2013 at 20:17
  • What do you have now? Are you just looking for a regex tutorial? I'm not sure what you're asking. Commented Sep 3, 2013 at 20:17
  • "I am just looking for a jumping off point so I can find a way to implement my own special characters." I want to be able to use "ISTORE" in a regular expression in Python and have it match ISTORE, ISTORE_0 ... ISTORE_n just how \s will match whitespace. I am very familiar with writing regexes, however I have never tried to mess with the engine itself. Commented Sep 3, 2013 at 20:22
  • As I understand it you want ISTORE in an expression to be treated as something like ISTORE(_\d)? or (ISTORE_0|ISTORE_1|ISTORE_2|ISTORE_3|ISTORE). You are asking how to extend the actual regex engine to handle this, rather than add a preprocessing search and replace step. Commented Sep 3, 2013 at 20:23
  • @kstev you really really really don't want to extend the regex engine. We are just saving you time by letting you know before you try it and find out for yourself. Commented Sep 3, 2013 at 20:28

1 Answer

You don't need to modify the regex engine (which would be quite difficult).

You just need a helper function that converts your flavour of regex to Python's, called in the same way that you would call re.escape:

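import re

def my_re_escape(regex):
    # Escape regex metacharacters so the input is treated literally
    regex = re.escape(regex)
    # Swap each custom token for the Python regex it stands for
    regex = regex.replace('foo', 'bar')
    # etc
    return regex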
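
To make the preprocessing idea concrete for the ISTORE/ILOAD case, here is a minimal sketch. It assumes the pattern should keep its normal regex syntax (so it does not call re.escape) and that the instructions are searched as space-separated text. The names OPCODE_CLASSES and expand_opcodes are made up for illustration, and the table only covers the two families from the question.

```python
import re

# Hypothetical table: each custom "metacharacter" maps to the regex fragment
# it stands for. Only two families are shown; the names are illustrative.
OPCODE_CLASSES = {
    "ISTORE": r"\bISTORE(?:_[0-3])?\b",
    "ILOAD": r"\bILOAD(?:_[0-3])?\b",
}

# Match a class name only when it stands alone, so an explicitly written
# ISTORE_0 or a longer mnemonic is left untouched.
_CLASS_NAME = re.compile(
    r"(?<![A-Za-z0-9_])(?:%s)(?![A-Za-z0-9_])"
    % "|".join(re.escape(name) for name in OPCODE_CLASSES)
)


def expand_opcodes(pattern):
    """Rewrite custom class names in `pattern` into plain Python regex."""
    # A function replacement is inserted verbatim, so the backslashes in the
    # fragments are not reinterpreted by re.sub.
    return _CLASS_NAME.sub(lambda m: OPCODE_CLASSES[m.group(0)], pattern)


# Usage: compile once, then search a space-separated instruction listing.
finder = re.compile(expand_opcodes(r"ILOAD\s+ISTORE"))
matches = finder.findall("ILOAD_1 ISTORE_2 ILOAD ISTORE ALOAD ISTORE_0")
```

One limitation of this sketch: a class name written directly after an escape that ends in a letter (for example \sISTORE) is not expanded, because the s counts as part of the name. Writing \s+ISTORE or wrapping the name in a group avoids that.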

5 Comments

It's not a flavor of regex though. It's metacharacters. Considering there are 205 opcodes in Java bytecode this really doesn't sound appealing to me to just have a find and replace method.
Preprocessing the regex is also recommended here: stackoverflow.com/a/1922324/2036917
So I should make it replace ISTORE with (ISTORE_0|ISTORE_1|ISTORE_2|ISTORE_3|ISTORE) every time? Sounds incredibly inefficient.
@kstev, you would replace ISTORE with something like ISTORE(_\d+)? so that the bare ISTORE still matches.
@kstev are 205 search-and-replace rules any worse than 205 rules to teach a regex engine in that respect? Concerning performance, yes, I imagine a custom regex engine could be more efficient than a string replace, but have you determined that this is so performance critical that it is worth the additional effort?
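
On the "205 rules" and efficiency worries in these comments, a rough sketch of one way around both: derive the classes from a list of mnemonics instead of writing each rule by hand, and do the expansion once per pattern rather than once per search. Everything named here (MNEMONICS, build_classes, compile_opcode_pattern) is hypothetical, and the mnemonic list is only a small excerpt chosen for illustration.

```python
import re
from collections import defaultdict
from functools import lru_cache

# Assumed excerpt of the opcode table; a real one would list every mnemonic.
MNEMONICS = [
    "ISTORE", "ISTORE_0", "ISTORE_1", "ISTORE_2", "ISTORE_3",
    "ILOAD", "ILOAD_0", "ILOAD_1", "ILOAD_2", "ILOAD_3",
    "IADD",
]


def build_classes(mnemonics):
    """Group BASE and BASE_<n> under BASE; one alternation per family."""
    families = defaultdict(list)
    for name in mnemonics:
        families[re.sub(r"_\d+$", "", name)].append(name)
    return {
        # Longest names first so ISTORE_0 is tried before the bare ISTORE.
        base: r"\b(?:%s)\b" % "|".join(
            re.escape(n) for n in sorted(names, key=len, reverse=True)
        )
        for base, names in families.items()
        if len(names) > 1  # single opcodes such as IADD need no class
    }


CLASSES = build_classes(MNEMONICS)
_CLASS_NAME = re.compile(
    r"(?<![A-Za-z0-9_])(?:%s)(?![A-Za-z0-9_])"
    % "|".join(re.escape(base) for base in CLASSES)
)


@lru_cache(maxsize=None)
def compile_opcode_pattern(pattern):
    """Expand class names and compile; repeated calls reuse the cached result."""
    expanded = _CLASS_NAME.sub(lambda m: CLASSES[m.group(0)], pattern)
    return re.compile(expanded)
```

With this shape the string replacement runs only when a pattern is first compiled, so its cost is paid once and the searches themselves run on an ordinary compiled regex.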
