Get the printed length of a string in terminal

Question

It seems like a fairly simple task, yet I can't find a fast and reliable solution to it.

I have strings in bash, and I want to know the number of characters that will be printed on the terminal. I need this to align the strings nicely in three columns of n characters each. For that, I need to add as many spaces as necessary so that the second and third columns always start at the same location in the terminal.

Example of problematic string length:

v='féé'

echo "${#v}"
 > # 5 (should be 3)

printf '%s' "${v}" | wc -m
 > # 5 (should be 3)

printf '%s' "${v}" | awk '{print length}'
 > # 5 (should be 3)

The best I have found is this, which works most of the time:

echo "${v}" | python3 -c 'v=input();print(len(v))'
 > # 3 (yeah!)

But sometimes I have characters that are modified by a following combining sequence. I can't copy/paste that here, but written with octal escapes it looks like this:

v=$'de\314\201tresse'
echo "${v}"
 > # détresse
echo "${v}" | python3 -c 'v=input();print(len(v))'
 > # 9 (should be 8)

I know it can be even more complicated with \r characters or ANSI sequences, but I am only going to deal with "regular" strings that are commonly found in filenames, documents and other file content written by humans. Since the string IS printed in the terminal, I guess there must be some engine that knows, or can know, the printed length of the string.

I have also considered sending ANSI sequences to get the position of the cursor in the terminal before and after printing the string, and using the difference to compute the length, but it looks like a rabbit hole I don't want to go down. Plus it will be very slow.

I don't know if this is contributing to your problems, but be aware that echo adds a newline, so the output of echo 'foo' is 4 characters long, not 3 as you might expect. You could use printf '%s' 'foo' instead, but then the output is no longer a valid text "file" since it has no terminating newline, so YMMV with what any text processing tool does with it. If you go that route, read the man page for whatever tool you use to determine the length rather than just subtracting 1.
– Ed Morton, Commented Sep 5, 2023 at 20:35
These should be helpful: "How can I get sensible results from len(), str.format() and a zero-width space?" and "How do I get the "visible" length of a combining Unicode string in Python?"
– wjandrea, Commented Sep 6, 2023 at 0:43
Be careful what you ask. "Length of a string" does not mean "apparent number of occupied columns onscreen". The "length of a string" maybe means the "number of characters", but sometimes means the number of bytes, which is a very different thing. féé looks like three characters, but those é's are not e's. It takes two bytes to print that one character, though if you print out the individual bytes, one looks like an e, and the other doesn't generally print, since it's an "overstrike". Be super careful in your terminology - and I bet I botched mine here somewhere, lol. Tricky!
– Paul Hodges, Commented Sep 6, 2023 at 4:41
The length is (5, 3, 7, 10, 12, 3), i.e. (number of characters, number of graphemes, number of bytes (UTF-8), number of bytes (utf-16-le/be), number of bytes (utf-16), number of terminal cells).
– Andj, Commented Sep 11, 2023 at 7:28
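
The figures in the comment above appear to match féé in decomposed form, where each é is an e followed by a combining accent; that reading is an assumption, not something the commenter states. A minimal Python sketch under that assumption shows how one string can have several "lengths". The function name string_lengths is made up for illustration, and the non_combining count is only a rough stand-in for a real grapheme count.

```python
import unicodedata

# 'féé' in decomposed form (NFD): each é is 'e' followed by
# U+0301 COMBINING ACUTE ACCENT. (Assumed input, see the note above.)
s = unicodedata.normalize("NFD", "f\u00e9\u00e9")

def string_lengths(text):
    """Return several different 'lengths' of the same string."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16le_bytes": len(text.encode("utf-16-le")),
        # Python's "utf-16" codec prepends a 2-byte byte order mark.
        "utf16_bytes_with_bom": len(text.encode("utf-16")),
        # Rough stand-in for graphemes: code points that are not combining marks.
        "non_combining": sum(1 for ch in text if unicodedata.combining(ch) == 0),
    }

print(string_lengths(s))
```

None of these numbers is the terminal width as such; for this particular string the non-combining count happens to coincide with the number of cells.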

Lucas Moura Gomes · Accepted Answer · 2023-09-06 15:12:17Z

2

How about

v='féé'
echo "${v}" | python3 -c 'import unicodedata as ud;v=input();print(len(ud.normalize("NFC",v)))'

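unicodedata is part of the Python standard library, so nothing needs to be installed. If you need data for a newer Unicode version than your Python ships with, the unicodedata2 package on PyPI is a drop-in alternative:

pip install unicodedata2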

Additional Notes

This normalizes the string to Unicode Normalization Form C (NFC), which composes a base character and its combining marks into a single precomposed character where one exists. If you are working with Latin text, it should work fine. However, for text that originated in pre-Unicode encodings of languages such as Arabic, Greek, Hebrew, Russian or Thai, NFC may leave the original form unchanged. Although NFC is generally the more advisable choice, you could try NFKC in those cases. The reason for preferring NFC is to avoid normalizing symbols that are compatible but not canonically equivalent. Take the single character ﬀ (U+FB00): normalized with NFC it has length 1, but normalized with NFKC it has length 2. Depending on your application that can create issues, but if you just want readable text, NFKC is fine.
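
To make the NFC/NFKC difference concrete, here is a small sketch using only the standard library. The helper name length_after is hypothetical, and the third sample (x followed by a combining acute accent) is added here to illustrate a limitation: when no precomposed character exists, normalization leaves two code points and len() still overcounts.

```python
import unicodedata as ud

def length_after(form, text):
    """Number of code points in `text` after normalizing it to `form`."""
    return len(ud.normalize(form, text))

decomposed = "de\u0301tresse"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
ligature = "\ufb00"             # U+FB00 LATIN SMALL LIGATURE FF
no_precomposed = "x\u0301"      # no single code point exists for x with acute

samples = [
    ("decomposed", decomposed),
    ("ligature", ligature),
    ("no_precomposed", no_precomposed),
]
for label, text in samples:
    # Print the length under both normalization forms side by side.
    print(label, length_after("NFC", text), length_after("NFKC", text))
```

The last sample is why normalization alone is not a general answer to "how many cells will this occupy".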

edited Sep 6, 2023 at 15:12

answered Sep 5, 2023 at 20:48


6 Comments

Mark Ransom Over a year ago

You should note the limitations of this approach.

Lucas Moura Gomes Over a year ago

I will update it with some clarifications

Slagt Over a year ago

NFKC seems to work just fine with Latin ANSI. Is there a reason to prefer NFC?

Lucas Moura Gomes Over a year ago

I've added some explanation about that. It's about Unicode equivalence.

Andj Over a year ago

@LucasMouraGomes, you do not install unicodedata via pip, it is part of your Python install.


Andj · Accepted Answer · 2023-09-06 12:02:12Z

1

To get the number of terminal cells used by a string, it is possible to use wcswidth. There is a Python implementation for wcwidth and wcswidth.

Install wcwidth with pip:

pip install wcwidth

And the Python code would be:

from wcwidth import wcswidth
v = 'féé'
print(wcswidth(v))
# 3

It will also yield the correct result for NFD:

import unicodedata as ud
v = ud.normalize("NFD", v)
print(wcswidth(v))
# 3

Additionally it will correctly handle wide characters, i.e. characters that take up 2 terminal cells per character:

v='中文'
print(wcswidth(v))
# 4

And adapting Lucas' solution above, for the terminal:

v='féé'
echo "${v}" | python3 -c 'from wcwidth import wcswidth;v=input();print(wcswidth(v))'
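
If installing wcwidth is not an option, a rough approximation is possible with the standard library alone. The sketch below is not the wcwidth algorithm: the names display_width and pad are made up for illustration, it assumes zero width for combining marks and format characters, two cells for East Asian wide and fullwidth characters, and one cell for everything else. It does not handle control characters, ANSI sequences or emoji sequences.

```python
import unicodedata

def display_width(text):
    """Approximate the number of terminal cells `text` occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn over the previous character
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # other non-spacing marks and format characters
        # East Asian wide (W) and fullwidth (F) characters take two cells.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width

def pad(text, columns):
    """Right-pad `text` with spaces so that it fills `columns` cells."""
    return text + " " * max(0, columns - display_width(text))

rows = [("féé", "de\u0301tresse", "中文"), ("a", "b", "c")]
for row in rows:
    # Each cell is padded to 12 terminal cells, so columns line up.
    print("".join(pad(cell, 12) for cell in row).rstrip())
```

For the original goal of three aligned columns, padding by cell count rather than by len() is the key point; swapping display_width for wcswidth should give more accurate results where the library is available.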

edited Sep 6, 2023 at 12:02

answered Sep 6, 2023 at 11:49


Gilles Quénot · Accepted Answer · 2023-09-05 23:22:34Z

0

With Perl:

Without modules:

perl -CSAD -E 'say length($ARGV[0])' été
3

With utf8::all module:

perl -Mutf8::all -E 'say length($ARGV[0])' été
3

edited Sep 5, 2023 at 23:22

answered Sep 5, 2023 at 22:17


1 Comment

Slagt Over a year ago

It does not seem to work well with combining characters like in de\314\201tresse (output is 9, should be 8).

Paolo · Accepted Answer · 2023-09-06 07:03:05Z

0

Using grep and wc:

$ v=$'de\314\201tresse'
$ printf "%s" "$v" | grep -o '[a-z]' | wc -l
8

$ v='féé'
$ printf "%s" "$v" | grep -o '[a-z]' | wc -l
3

answered Sep 6, 2023 at 7:03


1 Comment

Slagt Over a year ago

Thanks, that's a very simple solution without any additional tools. It does require listing all the "valid" characters one way or another, though, e.g. grep -i -o '[a-z0-9 +special characters]'. I am not sure why [a-z] matches é, though. Is that expected behavior?
