4

I have a text file with two lines:

<BLAHBLAH>483920349<FOOFOO>
<BLAHBLAH>4493<FOOFOO>

That's the only thing in the text file. Using Python, I want to rewrite the text file so that BLAHBLAH and FOOFOO are removed from each line. It seems like a simple task, but even after brushing up on file manipulation I can't find a way to do it. Help is greatly appreciated :)

Thanks!

2
  • 4
    Is the file really XML? Or HTML? Or XHTML? If so, please update the question to be more specific on what the file really looks like. There are simple ways to do this if the file matches any of the standards. Commented Aug 30, 2011 at 22:20
  • 1
    Can you at least show us what you have tried so far? Commented Aug 30, 2011 at 22:21

3 Answers

5

If it's a text file as you say, and not HTML/XML/something else, just use replace:

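with open("input.txt") as infile, open("output.txt", "w") as outfile:
    for line in infile:
        # Use "<BLAHBLAH>" and "<FOOFOO>" here to strip the angle brackets too.
        cleaned_line = line.replace("BLAHBLAH", "").replace("FOOFOO", "")
        outfile.write(cleaned_line)

This writes each cleaned line to an output file; the file names are placeholders.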


5
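f = open(path_to_file, "r+")  # "w+" would truncate the file before it is read
cleaned = f.read().replace("<BLAHBLAH>","").replace("<FOOFOO>","")
f.seek(0)       # rewind so the write overwrites instead of appending
f.write(cleaned)
f.truncate()    # drop the leftover tail of the longer original content
f.close()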

Update (saving to another file):

f = open(path_to_input_file, "r")
output = open(path_to_output_file, "w")

output.write(f.read().replace("<BLAHBLAH>","").replace("<FOOFOO>",""))
f.close()
output.close()

2 Comments

This way, you append the corrected data to the already existing file.
In addition to glglgl, I generally consider it pretty poor form to overwrite the input file unless it's absolutely necessary. What if there's a bug somewhere in your program?
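If the input file really does have to be overwritten, one way to reduce the risk raised in the comments is to write the cleaned text to a temporary file first and only then swap it into place. The following is a minimal sketch, assuming Python 3 (for os.replace); the helper name clean_file_in_place is hypothetical and not part of the answer above.

```python
import os
import tempfile


def clean_file_in_place(path):
    """Rewrite `path` without the <BLAHBLAH> and <FOOFOO> markers.

    Hypothetical helper: the original file is only replaced after the
    cleaned text has been written out completely.
    """
    with open(path, "r") as src:
        cleaned = src.read().replace("<BLAHBLAH>", "").replace("<FOOFOO>", "")

    # Create the temporary file next to the target so the final rename
    # stays on the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(cleaned)
        os.replace(tmp_path, path)  # swap the cleaned file into place
    except BaseException:
        os.remove(tmp_path)  # clean up the temporary file on any failure
        raise
```

Calling clean_file_in_place(path_to_file) leaves the original untouched if anything fails before the final swap.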
1

Consider the regular expressions module re.

result_text = re.sub('<(.|\n)*?>',replacement_text,source_text)

The pattern matches each string enclosed in < and >, including the brackets themselves. It is non-greedy, i.e. it matches the shortest possible substring. For example, given "<1> text <2> more text", a greedy pattern would match "<1> text <2>" as a whole, whereas the non-greedy pattern matches "<1>" and "<2>" separately.

And of course, your replacement_text would be '' and source_text would be each line from the file.
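As a minimal sketch of the above, assuming the lines look exactly like the two samples in the question (the strip_tags helper and TAG_PATTERN names are hypothetical, introduced only for illustration):

```python
import re

# Non-greedy pattern: "<", then as few characters as possible, then ">".
TAG_PATTERN = re.compile(r"<(.|\n)*?>")


def strip_tags(source_text, replacement_text=""):
    """Return source_text with every <...> span replaced by replacement_text."""
    return TAG_PATTERN.sub(replacement_text, source_text)


lines = ["<BLAHBLAH>483920349<FOOFOO>", "<BLAHBLAH>4493<FOOFOO>"]
cleaned = [strip_tags(line) for line in lines]
print(cleaned)  # ['483920349', '4493']
```

Note that this removes anything between angle brackets, not just BLAHBLAH and FOOFOO, so it only fits if the file contains no other bracketed text worth keeping.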
