Multiple regexes in Python

Question

I am creating a programming language. For this language, I am creating a program that compiles it into Python. I don't need a lexer, because most of the grammar can be converted into Python with regexes.

Here's what I have so far:

import re

infile = input()

output = open(infile + ".py","w")
input = open(infile + ".hlx")
# I'm aware the .hlx extension is already taken, but it doesn't really matter.

for line in input:
    output.write(re.sub(r'function (\S+) (\S+) =', r'def \1(\2):', line))

for line in input:
    output.write(re.sub(r'print(.+)', r'print(\1)', line))

for line in input:
    output.write(re.sub(r'call (\S+) (\S+)', r'\1(\2)', line))

# More regexes go here, eventually.

input.close()
output.close()

I had to put each regex in a separate for statement because if I put them together, it would replace each line 3 times.

The problem here is that it only performs one of the regexes, which is the first one. The order doesn't really matter here, but I still need the program to perform all of the regexes. How would I do this?

By the way, here's the code I want to replace in my language:

function hello input =
    print "Hello, ", input, "!"
hello "world"

And here's the code I want to replace it with in Python:

def hello(input):
    print("Hello, " + input + "!")
hello("world")

If you want to iterate over an open file multiple times, you need to seek() to the beginning of the file. Also, why don't you assign the output of each re.sub call to a variable, so you can apply every re.sub to the same line before writing it?
– bunji, Commented Feb 10, 2017 at 1:17
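
The point in the comment above can be sketched as follows. This is only an illustration: io.StringIO stands in for the opened .hlx file (an assumption made to keep the example self-contained), and the two sample lines are made up. A file object is exhausted after one full pass, so a second for loop over it yields nothing until seek(0) rewinds it.

```python
import io

# io.StringIO stands in for the opened .hlx file; a file object
# returned by open() behaves the same way for iteration and seek().
source = io.StringIO("function hello input =\ncall hello x\n")

first_pass = [line for line in source]    # reads both lines
second_pass = [line for line in source]   # already at end of file: empty

source.seek(0)                            # rewind to the beginning
third_pass = [line for line in source]    # reads both lines again

print(len(first_pass), len(second_pass), len(third_pass))  # prints: 2 0 2
```

This is why only the first of the three loops in the question writes anything.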

DYZ · Accepted Answer · 2017-02-10 01:26:46Z

Perform all substitutions in one loop, one after another. I also suggest keeping the regular expressions and their replacements in a separate data structure, which makes further extensions easier:

conversions = (
  (r'function (\S+) (\S+) =', r'def \1(\2):'),
  (r'print(.+)',              r'print(\1)'  ),
  (r'call (\S+) (\S+)',       r'\1(\2)'     ),
)

for line in input:
    for (pattern, sub) in conversions:
        line = re.sub(pattern, sub, line)
    output.write(line)
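
For anyone who wants to try this without creating files, here is a minimal self-contained sketch of the same loop. The helper name translate_line and the in-memory sample list are hypothetical, added only for illustration. The last sample line uses the call keyword because the third pattern requires it; the bare `hello "world"` from the question would pass through unchanged.

```python
import re

# Same (pattern, replacement) pairs as in the answer above.
CONVERSIONS = (
    (r'function (\S+) (\S+) =', r'def \1(\2):'),
    (r'print(.+)',              r'print(\1)'),
    (r'call (\S+) (\S+)',       r'\1(\2)'),
)

def translate_line(line):
    """Apply every conversion to one line, in tuple order."""
    for pattern, replacement in CONVERSIONS:
        line = re.sub(pattern, replacement, line)
    return line

# Hypothetical sample input, held in memory instead of a .hlx file.
sample = [
    'function hello input =\n',
    '    print "Hello, ", input, "!"\n',
    'call hello "world"\n',   # assumes the `call` keyword is present
]

translated = [translate_line(line) for line in sample]
print("".join(translated), end="")
```

With these patterns the print line comes out as `print( "Hello, ", input, "!")`, which is valid Python 3 but keeps the commas rather than producing the `+` concatenation shown in the question, so that rule would still need refining.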

2 Comments

SyscalineGaming Over a year ago

By any chance, do you also know how to perform regexes in a certain order? For example: do this regex, then do this one, then do this one...

DYZ Over a year ago

They are applied in the order of appearance in the tuple.
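
A small sketch of why that order can matter. The fn shorthand rule and the apply_rules helper are hypothetical, not part of the original language or answer. Each substitution sees the output of the previous one, so a rule that produces text for a later rule has to come first.

```python
import re

def apply_rules(rules, line):
    """Apply each (pattern, replacement) pair in sequence order."""
    for pattern, replacement in rules:
        line = re.sub(pattern, replacement, line)
    return line

expand_alias = (r'\bfn\b', 'function')                 # hypothetical shorthand rule
to_def = (r'function (\S+) (\S+) =', r'def \1(\2):')   # rule from the answer

line = 'fn hello input ='

alias_first = apply_rules((expand_alias, to_def), line)
def_first = apply_rules((to_def, expand_alias), line)

print(alias_first)  # def hello(input):
print(def_first)    # function hello input =
```

In the second ordering the def rule runs before the alias has been expanded, so it never matches.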
