I have a source file of about 10,000 lines with tons of duplication, so I read the file in as text.

Example:

    assert PyArray_TYPE(real0) == np.NPY_DOUBLE, "real0 is not double"
    assert real0.ndim == 1, "real0 has wrong dimensions"
    if not (PyArray_FLAGS(real0) & np.NPY_C_CONTIGUOUS):
        real0 = PyArray_GETCONTIGUOUS(real0)
    real0_data = <double*>real0.data

I want to replace all occurrences of this pattern with

    real0_data = _get_data(real0, "real0")

where real0 can be any variable name matching [a-z0-9]+.

Don't get confused by the source code. What the code does doesn't matter; this is purely a text processing and regex problem.

This is what I have so far:

    PATH = "func.pyx"
    source_string = open(PATH,"r").read()

    pattern = r"""
    assert PyArray_TYPE\(([a-z0-9]+)\) == np.NPY_DOUBLE, "([a-z0-9]+) is not double"
    assert ([a-z0-9]+).ndim == 1, "([a-z0-9]+) has wrong dimensions"
    if not (PyArray_FLAGS(([a-z0-9]+)) & np.NPY_C_CONTIGUOUS):
       ([a-z0-9]+) = PyArray_GETCONTIGUOUS(([a-z0-9]+))
    ([a-z0-9]+)_data = ([a-z0-9]+).data"""

    

  • are you using vim as your editor? Commented Apr 18, 2013 at 17:40
  • This question is a duplicate of the question: How to input a regex in string.replace in python? Commented Apr 18, 2013 at 17:41
  • "where real0 can be any variable name [a-z0-9]+" You mean where the variable name is not real0? If that's the case, and you have to actually decide what variable to place there, you're asking the impossible... Commented Apr 18, 2013 at 17:42
  • @perror No, it isn't. This is a much more complex problem as you have a wildcard in the middle that you have to detect and remember. Commented Apr 18, 2013 at 17:43
  • @siamii - Can you make it a little less ambiguous what pattern you want to match and what you want to replace it with? It's not at all clear at the minute. You want to replace the whole block of code with the one-line function call? Commented Apr 18, 2013 at 17:59

1 Answer

You can do this in any text editor that supports multiline regular expression search and replace.

I used Komodo IDE to test this, because it includes an excellent regular expression tester ("Rx Toolkit") for experimenting with regular expressions. I think there are also some online tools like this. The same regular expression works in the free Komodo Edit. It should also work in most other editors that support Perl-compatible regular expressions.

In Komodo, I used the Replace dialog with the Regex option checked, to find:

    assert PyArray_TYPE\((\w+)\) == np\.NPY_DOUBLE, "\1 is not double"\s*\n\s*assert \1\.ndim == 1, "\1 has wrong dimensions"\s*\n\s*if not \(PyArray_FLAGS\(\1\) & np\.NPY_C_CONTIGUOUS\):\s*\n\s*\1 = PyArray_GETCONTIGUOUS\(\1\)\s*\n\s*\1_data = <double\*>\1\.data

and replace it with:

    \1_data = _get_data(\1, "\1")

Given this test code:

    assert PyArray_TYPE(real0) == np.NPY_DOUBLE, "real0 is not double"
    assert real0.ndim == 1, "real0 has wrong dimensions"
    if not (PyArray_FLAGS(real0) & np.NPY_C_CONTIGUOUS):
        real0 = PyArray_GETCONTIGUOUS(real0)
    real0_data = <double*>real0.data

    assert PyArray_TYPE(real1) == np.NPY_DOUBLE, "real1 is not double"
    assert real1.ndim == 1, "real1 has wrong dimensions"
    if not (PyArray_FLAGS(real1) & np.NPY_C_CONTIGUOUS):
        real1 = PyArray_GETCONTIGUOUS(real1)
    real1_data = <double*>real1.data

    assert PyArray_TYPE(real2) == np.NPY_DOUBLE, "real2 is not double"
    assert real2.ndim == 1, "real2 has wrong dimensions"
    if not (PyArray_FLAGS(real2) & np.NPY_C_CONTIGUOUS):
        real2 = PyArray_GETCONTIGUOUS(real2)
    real2_data = <double*>real2.data

The result is:

    real0_data = _get_data(real0, "real0")

    real1_data = _get_data(real1, "real1")

    real2_data = _get_data(real2, "real2")

So how did I get that regular expression from your original code?

  1. Prefix all instances of (, ), ., and * with \ to escape them (an easy manual search and replace).
  2. Replace the first instance of real0 with (\w+). This matches and captures a run of word characters (letters, digits, and underscores).
  3. Replace the remaining instances of real0 with \1. This matches the text captured by (\w+).
  4. Replace each newline and the leading space on the next line with \s*\n\s*. This matches any trailing space on the line, plus the newline, plus all leading space on the next line. That way the regular expression works regardless of the nesting level of the code it's matching.

Finally, the "replace" text uses \1 where it needs the original captured text.
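The four steps above can also be applied mechanically. The following is only a rough sketch: `TEMPLATE` and `build_pattern` are hypothetical names introduced here for illustration, and the sketch assumes the placeholder name (`real0`) never appears inside another identifier in the sample block.

```python
import re

# Sample block from the question; "real0" is the placeholder variable name.
TEMPLATE = '''\
assert PyArray_TYPE(real0) == np.NPY_DOUBLE, "real0 is not double"
assert real0.ndim == 1, "real0 has wrong dimensions"
if not (PyArray_FLAGS(real0) & np.NPY_C_CONTIGUOUS):
    real0 = PyArray_GETCONTIGUOUS(real0)
real0_data = <double*>real0.data'''


def build_pattern(template, name="real0"):
    """Turn a literal sample block into a regex (hypothetical helper)."""
    # Step 1: escape regex metacharacters on each line, dropping indentation.
    lines = [re.escape(line.strip()) for line in template.splitlines()]
    # Step 4: allow any whitespace around each line break.
    pattern = r"\s*\n\s*".join(lines)
    # Step 2: the first occurrence of the name becomes a capture group.
    pattern = pattern.replace(name, r"(\w+)", 1)
    # Step 3: every remaining occurrence becomes a backreference to it.
    return pattern.replace(name, r"\1")


pattern = build_pattern(TEMPLATE)
```

`re.escape` escapes more characters than the manual step 1 does (spaces, for instance), but the resulting pattern should match the same text.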

You could of course use a similar regular expression in Python if you want to do it that way. I would suggest using \w instead of [a-z0-9] just to make it simpler (note that \w also matches uppercase letters and underscores). Also, rather than embedding the newlines and leading spaces in a multiline string, use \s*\n\s* between the lines. That way the pattern is independent of the nesting level, as I mentioned above.
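A minimal Python sketch of that suggestion, using the same pattern and replacement text as in the editor version. The function names `collapse_checks` and `rewrite_file` are made up for this example, and it is worth trying on a copy of the file first:

```python
import re

# Same regular expression as above, split across lines for readability.
PATTERN = re.compile(
    r'assert PyArray_TYPE\((\w+)\) == np\.NPY_DOUBLE, "\1 is not double"\s*\n\s*'
    r'assert \1\.ndim == 1, "\1 has wrong dimensions"\s*\n\s*'
    r'if not \(PyArray_FLAGS\(\1\) & np\.NPY_C_CONTIGUOUS\):\s*\n\s*'
    r'\1 = PyArray_GETCONTIGUOUS\(\1\)\s*\n\s*'
    r'\1_data = <double\*>\1\.data'
)
REPLACEMENT = r'\1_data = _get_data(\1, "\1")'


def collapse_checks(source):
    """Return (new_source, number_of_blocks_replaced)."""
    return PATTERN.subn(REPLACEMENT, source)


def rewrite_file(path):
    """Apply the replacement to a file in place and return the match count."""
    with open(path, "r") as f:
        source = f.read()
    new_source, count = collapse_checks(source)
    with open(path, "w") as f:
        f.write(new_source)
    return count
```

Calling `rewrite_file("func.pyx")` would then rewrite the file from the question. The returned count gives a quick sanity check against the number of blocks you expect to be replaced.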

4 Comments

This looks good, but is it possible to do this in Python or online? I would rather not install yet another IDE. I'm using Eclipse with PyDev.
Yes, I added a couple of suggestions for doing this with Python. You can probably do it directly in Eclipse as well.
And as a general note, I recommend not limiting yourself to a single editor or IDE. Komodo Edit is free; it doesn't cost you anything to use it. So are several other good text editors. I find it very helpful to have more than one editor available; if one isn't good at a particular task I can easily switch to another for that task. For example, I use Komodo IDE for most of my work, but it doesn't handle extremely large files very well. So I use other editors such as UltraEdit for very large files.
I updated the answer to correct one thing: I forgot that you can use \1 in the regular expression (and not just in the replacement string) to match the first (\w+) group. This takes care of the problem I mentioned where it wasn't checking that all of the realN strings had the same value for realN.
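
To illustrate the point about \1 in that last comment, here is a small sketch in Python using only one line of the block. The names `loose` and `strict` are chosen for this example only. A second independent group accepts mismatched names, while the backreference requires the same name both times:

```python
import re

# Two independent groups: the two names are not required to agree.
loose = re.compile(r'assert (\w+)\.ndim == 1, "(\w+) has wrong dimensions"')
# Backreference: the second name must equal whatever group 1 captured.
strict = re.compile(r'assert (\w+)\.ndim == 1, "\1 has wrong dimensions"')

consistent = 'assert real0.ndim == 1, "real0 has wrong dimensions"'
mismatched = 'assert real0.ndim == 1, "real1 has wrong dimensions"'

loose_hits_mismatch = loose.search(mismatched) is not None
strict_hits_mismatch = strict.search(mismatched) is not None
strict_hits_consistent = strict.search(consistent) is not None
```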
