Remove all - at every end of lines in text file with cyrillic text

Question

I have a .txt file with Cyrillic text in which many lines end with a hyphen (-). I want to remove these trailing hyphens without touching hyphens anywhere else in the file.

This is what I have so far. The idea is to copy the text line by line from file f1 into f2, dropping the hyphen at the end of each line.

f2 = open('n_dim.txt','w')
with open('dim.txt','r',encoding='utf-8') as f1:
    for line in f1:
        f2.write(line.removesuffix('-'))

I get no errors and the file content is copied, but the hyphens persist. How can I remove them properly?

Side note: you should use with open() on both files, i.e. with open('dim.txt', 'r', encoding='utf-8') as f1, open('n_dim.txt', 'w') as f2:
– B Remmelzwaal, Commented Mar 2, 2023 at 22:28
You should normally use the same utf-8 encoding on both files as well. The default encoding is OS-dependent.
– Mark Tolonen, Commented Mar 2, 2023 at 23:20

dskrypa · Accepted Answer · 2023-03-02 22:55:26Z

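The reason this does not work as intended is that each line you get while iterating over a file object still includes its line ending. In text mode, Python by default translates \r\n to \n when reading, so every line (except possibly the last) ends with \n, and the hyphen is never the actual suffix. We can see this by printing the repr of each line while iterating over the file.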

I will use the following example file content for the rest of the answer:

Hello-there-
Привет--
Hello-

If we print the repr of each line, we can see:

with open('dim.txt', 'r', encoding='utf-8') as f_in:
    for line in f_in:
        print(repr(line))

->

'Hello-there-\n'
'Привет--\n'
'Hello-\n'
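The same effect can be checked directly on a string, without any file. This is only a minimal sketch; note that str.removesuffix requires Python 3.9 or later.

```python
line = 'Hello-\n'  # what iterating over the file yields

# The string ends with '\n', not '-', so removesuffix leaves it unchanged.
unchanged = line.removesuffix('-')

# Stripping the trailing newline first exposes the hyphen as the real suffix.
fixed = line.rstrip('\n').removesuffix('-')

print(repr(unchanged))  # 'Hello-\n'
print(repr(fixed))      # 'Hello'
```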

To fix this, we can strip all whitespace at the end of each line before calling removesuffix:

with open('dim.txt', 'r', encoding='utf-8') as f_in:
    with open('n_dim.txt', 'w', encoding='utf-8') as f_out:
        for line in f_in:
            f_out.write(line.rstrip().removesuffix('-') + '\n')

This results in the following:

Hello-there
Привет-
Hello

Note that if a line may have more than one trailing dash and you want to remove all of them, use rstrip('-') instead of removesuffix('-'):

with open('dim.txt', 'r', encoding='utf-8') as f_in:
    with open('n_dim.txt', 'w', encoding='utf-8') as f_out:
        for line in f_in:
            f_out.write(line.rstrip().rstrip('-') + '\n')

This results in the following:

Hello-there
Привет
Hello
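Both variants can be wrapped in a small helper and run against the example content. The sketch below is one possible arrangement, not the only way to do it: the names strip_trailing_hyphens and clean_file and the all_hyphens flag are made up for illustration, and a temporary directory stands in for the real files. It also follows the suggestions from the question's comments: one with statement for both files, and UTF-8 on both.

```python
import tempfile
from pathlib import Path


def strip_trailing_hyphens(line: str, all_hyphens: bool = False) -> str:
    """Drop trailing whitespace, then one (or all) trailing hyphens."""
    text = line.rstrip()
    return text.rstrip('-') if all_hyphens else text.removesuffix('-')


def clean_file(src: Path, dst: Path, all_hyphens: bool = False) -> None:
    # One with statement manages both files; both use UTF-8 explicitly.
    with open(src, 'r', encoding='utf-8') as f_in, \
            open(dst, 'w', encoding='utf-8') as f_out:
        for line in f_in:
            f_out.write(strip_trailing_hyphens(line, all_hyphens) + '\n')


# Demo on the example content from this answer, in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'dim.txt'
    dst = Path(tmp) / 'n_dim.txt'
    src.write_text('Hello-there-\nПривет--\nHello-\n', encoding='utf-8')

    clean_file(src, dst)
    one_removed = dst.read_text(encoding='utf-8')

    clean_file(src, dst, all_hyphens=True)
    all_removed = dst.read_text(encoding='utf-8')

print(one_removed)
print(all_removed)
```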

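If the script runs on Linux or macOS and the output must open in older Windows programs, the file needs \r\n line endings. Writing + '\r\n' yourself is fragile: when the same script runs on Windows, text mode already translates '\n' to '\r\n' on write, so '\r\n' would come out as '\r\r\n'. Passing newline='\r\n' to the open() call for the output file and continuing to write '\n' avoids this on every platform.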
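For output that must have Windows line endings no matter where the script runs, the newline parameter of open() is an option that behaves the same on every OS. A minimal sketch, using a temporary file and two of the sample lines from above as stand-in data:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / 'n_dim.txt'

    # newline='\r\n' makes text mode write every '\n' as '\r\n' on any OS.
    with open(out_path, 'w', encoding='utf-8', newline='\r\n') as f_out:
        for line in ['Hello-there-', 'Привет--']:
            f_out.write(line.rstrip('-') + '\n')

    # Read the raw bytes to see the line endings actually stored on disk.
    raw = out_path.read_bytes()

print(raw)
```

The code still writes '\n'; the translation to '\r\n' happens inside the file object, so nothing is doubled when the script itself runs on Windows.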

If the input file is small enough, another approach would be to read the whole file and use splitlines once instead of rstrip on each line. Using splitlines would preserve any other trailing whitespace, while rstrip will remove it. Example:

with open('dim.txt', 'r', encoding='utf-8') as f_in:
    with open('n_dim.txt', 'w', encoding='utf-8') as f_out:
        for line in f_in.read().splitlines():
            f_out.write(line.rstrip('-') + '\n')
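The difference between the two approaches shows up when a line has spaces after the hyphen. A small sketch with a made-up input string:

```python
text = 'Hello-  \nПривет--\n'

# rstrip() drops the trailing spaces first, so the hyphen becomes removable.
via_rstrip = [line.rstrip().rstrip('-') for line in text.splitlines()]

# splitlines() removes only the line break; the trailing spaces stay
# and shield the hyphen from rstrip('-').
via_splitlines = [line.rstrip('-') for line in text.splitlines()]

print(via_rstrip)      # ['Hello', 'Привет']
print(via_splitlines)  # ['Hello-  ', 'Привет']
```

Keep in mind that str.splitlines also splits on a few less common boundaries, such as form feed and the Unicode line separator, so it can yield more lines than iterating over the file would.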

4 Comments

Mark Tolonen Over a year ago

Re: '\r\n' and Windows. The files are opened in text mode, where '\n' is translated to '\r\n' on write, so the second-to-last paragraph is incorrect. Just write '\n' in all cases.

dskrypa Over a year ago

If it's written from Linux for Windows consumption, then it would be necessary.

Mark Tolonen Over a year ago

True, but if written from Windows \r\n will be translated to \r\r\n. Where does the OP mention Linux?

dskrypa Over a year ago

I was just trying to provide that for completeness... I would prefer to pretend \r\n doesn't exist...
