Issue with printing formatted string in Python

Question

I am trying to parse a text document line by line, and in doing so I stumbled onto some weird behavior, which I believe is caused by a symbol that looks like an ankh (☥). I cannot copy the real symbol here, so the ☥ in the sample below is a stand-in. In my code I check whether a '+' symbol is present in the first two characters of each line. To see whether this worked, I added a print statement that shows the resulting boolean and the line itself.

The relevant part of my code:

with open(file_path) as input_file:
    content = input_file.readlines()
    for line in content:
        plus = '+' in line[0:2]
        print('Plus: {0}, line: {1}'.format(plus,line))

A file I could try to parse:

+------------------------------
row 1 with some content
+------+------+-------+-------
☥+------+------+-------+------
|  col 1 | col 2 | col 3 ...
+------+------+-------+-------
|_ valu | val |    |   dsf |..
|_ valu | valu | ...

What I get as output:

Plus: True, line: +------------------------------

Plus: False, line: row 1 with some content

Plus: True, line: +------+------+-------+-------

♀+------+------+-------+------

Plus: False, line: | col 1 | col 2 | col 3 ...

Plus: True, line: +------+------+-------+-------

Plus: False, line: |_ valu | val | | dsf |..

Plus: False, line: |_ valu | valu | ...

So my question is: why does it print the line containing the symbol without the 'Plus: True/False' prefix, and how should I solve this? Thanks.

I just tried to reproduce this with the same sequence of input lines and didn't get any repeated lines.
– khelwood, Commented Mar 3, 2017 at 12:36
Maybe your lines have a \r character in them. Try printing the repr version of them.
– khelwood, Commented Mar 3, 2017 at 12:40
Mm, I did have to insert a Unicode symbol in here because I can't seem to copy the real symbol.
– spijs, Commented Mar 3, 2017 at 12:40
@spijs here you have it, \r resets caret to line beginning.
– Łukasz Rogalski, Commented Mar 3, 2017 at 12:50
You may want to process it or not, but in ASCII, '\x0c' is the code for form feed. It means that the program that has created it intended to start a new page there.
– Serge Ballesta, Commented Mar 3, 2017 at 12:59

Stephen Rauch · Accepted Answer · 2017-03-03 18:17:04Z

What you are seeing is the female sign (♀), one of the gender symbols. In the original IBM PC character set it is the glyph displayed for the byte 0x0C, which in ASCII is the form feed control character (FormFeed, Ctrl-L).

If you are parsing text data that contains these characters, they were likely inserted to tell a printer to start a new page.
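
As one of the comments suggests, printing the repr version of each line makes such hidden characters visible. Below is a minimal sketch of that check. It assumes the problem line begins with a form feed; `sample_lines` is made-up data standing in for the real file, which was not posted.

```python
# Hypothetical sample lines standing in for the file contents.
# The second one is assumed to start with a form feed (0x0C).
sample_lines = [
    "+------+------+-------+-------\n",
    "\x0c+------+------+-------+------\n",
]

for line in sample_lines:
    plus = '+' in line[0:2]
    # {1!r} formats the line with repr(), so control characters show up
    # as escapes such as '\x0c' instead of being sent raw to the terminal.
    print('Plus: {0}, line: {1!r}'.format(plus, line))

# The form feed is ASCII code 12 and can also be written as '\f'.
assert ord('\x0c') == 12
assert '\x0c' == '\f'
```

With repr, the form feed appears as the escape `\x0c` in the printed text, so the terminal no longer gets a chance to interpret it.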

From Wikipedia:

Form feed is a page-breaking ASCII control character. It forces the printer to eject the current page and to continue printing at the top of another. Often, it will also cause a carriage return. The form feed character code is defined as 12 (0xC in hexadecimal), and may be represented as control+L or ^L.
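
If the page breaks carry no meaning for your parsing, one option is to drop the form feed before testing for '+'. The sketch below is only an illustration: `clean_line` and `starts_with_plus` are hypothetical helper names, not part of the code in the question, and the sample lines are made up.

```python
FORM_FEED = '\x0c'  # ASCII 12, also written '\f'


def clean_line(line):
    """Remove form feeds and the trailing line ending from one line."""
    return line.replace(FORM_FEED, '').rstrip('\r\n')


def starts_with_plus(line):
    """Check for '+' in the first two characters, as in the question."""
    return '+' in clean_line(line)[0:2]


# Hypothetical lines; the first is assumed to start with a form feed.
for raw in ["\x0c+------+------\n", "row 1 with some content\n"]:
    print('Plus: {0}, line: {1}'.format(starts_with_plus(raw), clean_line(raw)))
```

Python also treats the form feed as whitespace, so `line.strip()` would remove a leading one as well, along with any leading or trailing spaces, which may not be what you want for column-aligned data.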

