
I would like to convert whitespace-separated data to tab-delimited output, and also replace certain strings in the data.

input file: vi foo.txt:

Bob lives in%3a Boston
Sam lives    in Houston
Jay       lives in Ruston
Bill        lives in           Atlanta

This is what I came up with: vi foo.py:

import re

fin = open("foo.txt")
fout =  open("bar.txt", "w")
for line in fin.readlines():
    fout.write('\t'.join(line.split())+'\n') # parse data with tab delimited

for line in fin.readlines():
    fout.write(re.sub('%3a',':',line)) # substitute string with regex

vi bar.txt:

Bob lives   in%3a   Boston
Sam lives   in  Houston
Jay lives   in  Ruston
Bill    lives   in  Atlanta

Why is %3a still in output rather than ':'?

Thanks,

Rio

I don't know what you are trying to do with two loops, but you need to reset the file pointer with fin.seek(0) before the second loop. Then you will see 8 lines instead of 4: the first set will have %3a, and the second will have a ':'. Commented Sep 14, 2014 at 17:43

1 Answer


readlines() reads from the current file position to the end of the file and returns the lines as a list. The first for line in fin.readlines(): consumes the whole file and leaves the file pointer at the end. On the second call there is nothing left to read, so readlines() returns an empty list and this line: fout.write(re.sub('%3a',':',line)) is never executed.

Even if it did run, it would have written two copies of the input data to the output: one tab-delimited but still containing %3a, and one with %3a replaced but not tab-delimited.
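The behaviour is easy to reproduce in isolation. The following is a minimal sketch, assuming an in-memory io.StringIO object as a stand-in for the opened foo.txt (the variable names first, second and third are only for illustration):

```python
import io

# io.StringIO stands in for open("foo.txt") so the sketch runs without a file.
fin = io.StringIO("Bob lives in%3a Boston\nSam lives    in Houston\n")

first = fin.readlines()   # reads everything up to the end of the file
second = fin.readlines()  # the position is already at the end: nothing left

fin.seek(0)               # rewind to the start of the file
third = fin.readlines()   # the same lines are available again

print(len(first), len(second), len(third))
```

Rewinding with seek(0) makes the second loop run, but it still produces the duplicated output described above, which is why combining both steps in one loop is the simpler fix.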

What you want to do is this:

for line in fin.readlines():
    fout.write('\t'.join(re.sub('%3a',':',line).split())+'\n')

5 Comments

I did not notice that he called readlines() twice. Your answer is correct. By the way, re is too heavy for a simple string replacement like this one.
Thanks! How can I achieve it with 2 for loops as I tried before? I want the code to be more readable and I have many strings to substitute.
@stanleyxu2005 This might be true, but it's not the main issue with his code.
@Rio I don't think that more loops are more readable; on the contrary. However, if you do want more than one loop, each loop needs to take as its input the output of the loop before it.
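One way to read "each loop takes the output of the loop before it" is a chain of generator functions. This is only a sketch under that interpretation: the function names decode_colon and tab_delimit are hypothetical, and io.StringIO objects stand in for foo.txt and bar.txt.

```python
import io

def decode_colon(lines):
    # First stage: replace the escaped colon in every line.
    for line in lines:
        yield line.replace("%3a", ":")

def tab_delimit(lines):
    # Second stage: collapse runs of whitespace into single tabs.
    for line in lines:
        yield "\t".join(line.split()) + "\n"

# In-memory stand-ins for the input and output files.
fin = io.StringIO("Bob lives in%3a Boston\nBill        lives in           Atlanta\n")
fout = io.StringIO()

# The second stage consumes what the first stage produces, one line at a time.
for line in tab_delimit(decode_colon(fin)):
    fout.write(line)

result = fout.getvalue()
```

Each stage stays a separate, readable loop, yet the file is read only once and each line is written only once.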
If I have multiple patterns to replace, e.g. %3a with : and %3f with ?, how can I achieve that? Thanks
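For several replacements, one possible approach is to keep the pairs in a dictionary and apply them in turn before splitting. This is a sketch, not the answerer's code: REPLACEMENTS and clean_line are hypothetical names, and it assumes that no replacement text itself contains one of the patterns.

```python
# Hypothetical table of substitutions; extend it as needed.
REPLACEMENTS = {
    "%3a": ":",
    "%3f": "?",
}

def clean_line(line, replacements=REPLACEMENTS):
    # Apply every plain-string substitution, then re-join the fields with tabs.
    for old, new in replacements.items():
        line = line.replace(old, new)
    return "\t".join(line.split())

print(clean_line("Bob lives in%3a Boston"))
print(clean_line("Who   lives in Ruston%3f"))
```

If the strings are in fact URL percent-escapes, which is an assumption about the data, urllib.parse.unquote in Python 3 decodes all of them in one call and may replace the dictionary entirely.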
