1

I am trying to remove a block of text from an Apache configuration file, specifically the virtual hosts. I need to remove the VirtualHost containers, including the <VirtualHost> markers.

Stuff

<VirtualHost   asdfalsdkf:*> 
asldkfjasl;dkfjasldkfj
asdfljasldjf;laksdfj
a;lsdkfj;laksjdfas
asldkfjasldfkj
3495034ijfgdl9)_*)(%$
more stuff
</VirtualHost>

stuff

So far I have tried to remove the block with a regex, but the file is not changing. I am trying to update the file in place and remove the block.

This is what I have so far, which is not working:

for line in fileinput.input('/etc/apache2/apache2.conf.replace',inplace=True):
    sys.stdout.write(re.sub(r'<VirtualHost.*?>.*?</VirtualHost>','',line))
2
  • Token comment (somebody's gotta say it :) -- Python provides a bunch of markup parsers that are better suited to parsing HTML/XML. It is typically not advisable to try to parse markup with regex. Commented Aug 22, 2012 at 13:28
  • From what I see, this is not markup; it looks like an Apache configuration file. The syntax is similar, but it is not valid markup: for example, <VirtualHost ... takes a bare argument rather than an attribute, which would be invalid. Commented Aug 29, 2012 at 15:14

2 Answers

5

There are two issues here. The first is (as javex pointed out) that you need to use re.DOTALL.

But that's not enough. You're still only feeding the regex one line at a time, so it will never see both the opening and closing VirtualHost tags. AFAIK, there's no way to get a file's entire contents using fileinput, but assuming you don't need to accept input from STDIN and the files will be small enough to read into memory all at once (which should be the case for Apache config files), this should do it:

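import os
import sys
import re

for fn in sys.argv[1:]:
    # Keep the original as a backup, then write the cleaned copy in its place.
    os.rename(fn, fn + '.orig')
    with open(fn + '.orig', 'rb') as fin, open(fn, 'wb') as fout:
        data = fin.read()
        # Bytes pattern and replacement, because the file is read in binary
        # mode (a str pattern against bytes raises TypeError on Python 3).
        data = re.sub(br'<VirtualHost.*?>.*?</VirtualHost>', b'', data,
                      flags=re.DOTALL)
        fout.write(data)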

(This requires Python 2.7 or later, because I'm using the built-in syntax for multiple context managers in a single with statement; on earlier versions you can get the same functionality using contextlib.nested.)
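
To check the pattern before touching a real file, it can help to run it on a string first. The following is only a sketch: the names VHOST_BLOCK and strip_virtualhosts are made up for illustration, and the sample text is a shortened version of the one in the question.

```python
import re

# Same pattern as in the answer, compiled once.
# re.DOTALL lets "." match newlines, so one match can span several lines.
VHOST_BLOCK = re.compile(r'<VirtualHost.*?>.*?</VirtualHost>', re.DOTALL)


def strip_virtualhosts(text):
    """Return text with every <VirtualHost ...>...</VirtualHost> block removed."""
    return VHOST_BLOCK.sub('', text)


# Shortened version of the sample from the question.
sample = (
    "Stuff\n"
    "\n"
    "<VirtualHost   asdfalsdkf:*>\n"
    "asldkfjasldfkj\n"
    "more stuff\n"
    "</VirtualHost>\n"
    "\n"
    "stuff\n"
)

cleaned = strip_virtualhosts(sample)
print(cleaned)
```

Note that only the block itself is removed; the blank lines around it stay, so the result may contain a run of empty lines where the container used to be.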


1 Comment

This answer covers both issues, very good. And nice note on the with statement!
2

The dot character . will not match a newline unless re.DOTALL is specified:

for line in fileinput.input('/etc/apache2/apache2.conf.replace',inplace=True):
    sys.stdout.write(re.sub(r'<VirtualHost.*?>.*?</VirtualHost>','',line, flags=re.DOTALL))

(See Python's re documentation.)
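
A small sketch of the difference, using a made-up three-line string (the host and ServerName values are placeholders, not taken from the question):

```python
import re

# Hypothetical multi-line input held in a single string.
text = "<VirtualHost *:80>\nServerName example.invalid\n</VirtualHost>"
pattern = r'<VirtualHost.*?>.*?</VirtualHost>'

# Without the flag, "." cannot cross a newline, so the search fails.
without_flag = re.search(pattern, text)

# With re.DOTALL, "." matches newlines too and the whole block is found.
with_flag = re.search(pattern, text, flags=re.DOTALL)

print(without_flag)
print(with_flag is not None)
```

This only helps when the opening and closing tags are in the same string that is passed to the regex.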

1 Comment

I still don't think this will work, because the user is processing the file line by line (I think): for line in fileinput.input(...)
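
If the file has to be processed one line at a time, a different technique from the regex in the answers is to track whether the current line is inside a block and skip lines while it is. This is a rough sketch, not something proposed in the thread: drop_virtualhosts is a hypothetical helper, and it assumes each opening and closing tag sits on its own line, as in the question's sample.

```python
def drop_virtualhosts(lines):
    """Yield only the lines that fall outside <VirtualHost> blocks.

    Assumes the opening and closing tags each start their own line.
    """
    inside = False
    for line in lines:
        stripped = line.lstrip()
        if not inside and stripped.startswith('<VirtualHost'):
            inside = True       # opening tag: start skipping
        elif inside and stripped.startswith('</VirtualHost>'):
            inside = False      # closing tag: skip it, then resume output
        elif not inside:
            yield line


# Made-up input, one list item per line of the file.
demo_lines = [
    "Stuff\n",
    "<VirtualHost x:*>\n",
    "some directive\n",
    "</VirtualHost>\n",
    "stuff\n",
]
kept = list(drop_virtualhosts(demo_lines))
print(''.join(kept))
```

The same generator could presumably wrap fileinput.input(..., inplace=True), with each yielded line written to sys.stdout, since it only needs an iterable of lines.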
