I'm removing specific strings and empty lines from a text file; this follows on from my earlier question. I referred to some examples and solutions from SO experts, and they work well for removing the strings but not the empty lines. To make it simple to understand, I highlight the problem here.

Some parts of the text file contain lines with stringA, stringB and stringC, followed by empty lines. I want to delete those lines and only the single line directly below them.

line0
line1      stringAxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line2                stringBxxxxxxxxxxxxxxxxxxxxxxx
line3        stringCxxxxxxxxxxxxxxxxxxx 
line4
line5
line6  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line7  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line8  
line9  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line10 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line11               stringBxxxxxxxxxxxxxxxxxxxxxxx
line12       stringCxxxxxxxxxxxxxxxxxxx  
line13
line14
line15  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line16  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line17 
line18  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line19  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line20
line21  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line22  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line23 
line24  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line25  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line26               stringBxxxxxxxxxxxxxxxxxxxxxxx
line27       stringCxxxxxxxxxxxxxxxxxxx  
line28
line29
line30  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line31  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line32  

In this case I want to remove every line that contains stringA, stringB or stringC, plus the one line after it. In the example above: remove lines 1, 2, 3, 4; remove lines 11, 12, 13; remove lines 26, 27, 28.

I have tried using strip(), but it removes all empty lines. This is the script I use; it removes all the lines that contain stringA, stringB and stringC, but not the empty line after them.

import re

filename = 'raw.txt'
with open(filename, 'r') as fin:
    lines = fin.readlines()
with open('clean.txt', 'w') as fout:
    for line in lines:
        # write every line that does not start with whitespace + a marker string
        if not re.match(r"\s+(stringA|stringB|stringC)", line):
            fout.write(line)

Expected output:

line0
line5
line6  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line7  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line8  
line9  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line10 textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line14
line15  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line16  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line17 
line18  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line19  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line20
line21  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line22  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line23 
line24  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line25  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line29
line30  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line31  textxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
line32  

I appreciate your help and kind assistance. Thank you.

2 Answers

Optimized solution:

import re

with open('raw.txt', 'r') as fin, open('clean.txt', 'w') as fout:
    string_c_pat = re.compile(r'\s+stringC')
    pat = re.compile(r"\s+(stringA|stringB|stringC)")

    for line in fin:    # traversing the file as an iterator
        if string_c_pat.match(line):
            next(fin, None)   # consume the line after `stringC`; None avoids StopIteration at EOF
        if not pat.match(line):   # marker lines themselves are never written
            fout.write(line)

Using re.compile() and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program.
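To check the approach without touching any files, the same loop can be wrapped in a generator that accepts any iterable of lines. This is only a sketch: the function name `filter_lines` and the sample text are made up for illustration, and it keeps the answer's assumption that the line to skip always follows stringC.

```python
import io
import re

# Same patterns as in the answer above.
MARKER = re.compile(r"\s+(stringA|stringB|stringC)")
LAST_MARKER = re.compile(r"\s+stringC")


def filter_lines(source):
    """Yield lines from `source`, dropping marker lines and the line after stringC.

    `source` is any iterable of lines (an open file, io.StringIO, a list).
    """
    it = iter(source)
    for line in it:
        if LAST_MARKER.match(line):
            # Consume the line that follows stringC; the default value
            # avoids StopIteration when stringC is the last line.
            next(it, None)
        if not MARKER.match(line):
            yield line


# Hypothetical sample input standing in for raw.txt.
sample = io.StringIO(
    "keep 0\n"
    "   stringA xxx\n"
    "      stringB xxx\n"
    "  stringC xxx\n"
    "\n"
    "\n"
    "keep 1\n"
    "  stringC at end of file\n"
)
result = list(filter_lines(sample))
```

With a real file, `fout.writelines(filter_lines(fin))` would write the filtered lines in one call.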

8 Comments

I agree that this solution looks much more optimized, but you're assuming that it needs to skip ONLY after stringC. If the point is to delete the next empty line after stringA, stringB or stringC, then this won't work, if I'm not wrong.
@Nqsir, I don't see the crucial lines appearing in any order other than "stringA, stringB and stringC" in the OP's input; the OP might elaborate if that could be critical.
The way I understood the question made me assume that the delimiter could be any of stringA/B/C, but I totally agree that with this input, putting the delimiter on stringC is a valid option.
Hi sir, this solution works, but when I want to delete the top and bottom empty lines I get the error TypeError: 'file' object has no attribute '__getitem__'. I added: for line in fin[6:-2]
@chenoi, why are you using fin[6:-2] in that case?
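On the slicing error from the comment above: a file object is an iterator, not a sequence, so it cannot be indexed or sliced directly. A minimal sketch of two workarounds follows, assuming the goal is to drop the first 6 and last 2 lines; `io.StringIO` and the "row N" content are stand-ins for the real file.

```python
import io
from itertools import islice

# Stand-in for open('raw.txt'); io.StringIO behaves like a text file here.
fin = io.StringIO("".join("row %d\n" % i for i in range(10)))

# Option 1: read everything into a list, then slice it.
# Lists support negative indices, so [6:-2] works as intended.
lines = fin.readlines()
middle = lines[6:-2]

# Option 2: skip only the leading lines lazily.
# islice does not accept negative indices, so it cannot trim the end.
fin.seek(0)
tail = list(islice(fin, 6, None))
```

Option 1 holds the whole file in memory, which is usually fine for small text files; Option 2 stays lazy but only handles the leading lines.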
I'm pretty sure this is not the best answer, but a "flag-like" method works:

import re
filename = 'raw.txt'
with open(filename, 'r') as fin:
    lines = fin.readlines()

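flag = 0  # 1 means the previous line matched, so the next line must be skipped

with open('clean.txt', 'w') as fout:
    for line in lines:
        if not re.match(r'.*(stringA|stringB|stringC)', line):
            if not flag:
                fout.write(line)  # ordinary line, not directly after a match
            flag = 0  # a non-matching line always resets the flag
        else:
            flag = 1  # matching line: drop it and mark the next line for skipping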

Hope it helps.

3 Comments

Hi sir, your method also works. I have not used a flag before; I'd appreciate it if you could explain a bit. Thanks.
@chenoi, hi there. A flag is a figurative image: you use a value, most of the time a char or an int, to indicate that something happened. In this case I'm using an int: whenever I skip a line I raise my flag to 1; otherwise my flag stays at 0. When my flag equals 1, it means I do not want to write the next line. This is very common and useful, especially in low-level languages, though maybe not the best option here. I hope I made it clearer ^^
OK, thanks, that makes it clear to me. At least I know why now. Thanks.
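The flag idea described in the comments can also be written with a boolean and tried on an in-memory list. This is a sketch rather than the answerer's exact code: the names `drop_marked` and `skip_next` and the sample lines are invented for illustration, and `search` on the bare alternation stands in for the `.*(...)` match used above.

```python
import re

# Matches a marker string anywhere in the line.
MARKER = re.compile(r"stringA|stringB|stringC")


def drop_marked(lines):
    """Return `lines` without marker lines and without the one line after each run of them."""
    kept = []
    skip_next = False  # the "flag": True right after a marker line
    for line in lines:
        if MARKER.search(line):
            skip_next = True       # drop this line, remember to skip the next one
        elif skip_next:
            skip_next = False      # drop the single line after the marker run
        else:
            kept.append(line)      # ordinary line
    return kept


# Hypothetical sample lines standing in for raw.txt.
sample = [
    "line0\n",
    "  stringA x\n",
    "   stringB x\n",
    " stringC x\n",
    "\n",
    "\n",
    "text\n",
    "  stringB\n",
    " stringC\n",
    "\n",
    "\n",
    "end\n",
]
cleaned = drop_marked(sample)
```

Unlike the stringC-only skip, this variant drops the line after a marker run whichever marker ends it, which is the case raised in the comments on the first answer.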
