36

I have some code that pulls data from a COM port, and I want to make sure that what I got really is a printable string (i.e. ASCII, maybe UTF-8) before printing it. Is there a function for doing this? The first half dozen places I looked didn't have anything that looks like what I want. (string has printable, but I didn't see anything there, or in the string methods, to check whether every char in one string is in another.)

I am looking for a single function, not a roll-your-own solution.

Note: control characters are not printable for my purposes.

1
  • If there's no ready-made solution, you can DIY with string.printable: printables = set(string.printable); if all(char in printables for char in your_string): ... Commented Sep 3, 2010 at 15:07

10 Answers

55

As you've said, the string module has printable, so it's just a case of checking whether all the characters in your string are in printable:

>>> hello = 'Hello World!'
>>> bell = chr(7)
>>> import string
>>> all(c in string.printable for c in hello)
True
>>> all(c in string.printable for c in bell)
False

You could convert both strings to sets - so the set would contain each character in the string once - and check if the set created by your string is a subset of the printable characters:

>>> printset = set(string.printable)
>>> helloset = set(hello)
>>> bellset = set(bell)
>>> helloset
set(['!', ' ', 'e', 'd', 'H', 'l', 'o', 'r', 'W'])
>>> helloset.issubset(printset)
True
>>> set(bell).issubset(printset)
False

So, in summary, you would probably want to do this:

import string
printset = set(string.printable)
isprintable = set(yourstring).issubset(printset)
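
Note that string.printable also contains the whitespace control characters (tab, newline, carriage return, vertical tab, form feed). Since the question treats control characters as unprintable, a stricter variant might look like the minimal sketch below. The names STRICT_PRINTABLE and is_strictly_printable are made up for illustration, and keeping the plain space as acceptable is an assumption:

```python
import string

# Assumption: only the plain space is acceptable whitespace; tab, newline,
# carriage return, vertical tab and form feed count as control characters.
STRICT_PRINTABLE = (frozenset(string.printable) - frozenset(string.whitespace)) | {' '}


def is_strictly_printable(text):
    """Return True if every character of text is in STRICT_PRINTABLE."""
    # issuperset() accepts any iterable, so the string can be passed directly.
    return STRICT_PRINTABLE.issuperset(text)


print(is_strictly_printable('Hello World!'))   # True
print(is_strictly_printable('Hello\tWorld!'))  # False
print(is_strictly_printable(chr(7)))           # False
```

Like the subset check above, this only covers ASCII; any non-ASCII character makes it return False.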

6 Comments

I was kinda hoping for a non-roll your own solution. Why the heck doesn't python have this as a function?
"Why the heck doesn't python have this as a function?": this solution, and others like it, are trivial compositions of builtin python facilities. if this was given a special name, and every other useful but trivial feature was also blessed with a name, then the python namespace would be abysmally cluttered. this short composition is every bit as readable as some hypothetical stringutil.stringisprintable(myvar), except that there's no need to maintain that extra module.
Does this handle anything beyond ASCII?
Well, Python does have isalpha, isdigit, isspace, isalnum, islower, isupper and istitle. The ones it's missing (compared to C) are iscntrl, isgraph, isprint, ispunct and isxdigit. Given the C library implements them already, it's not entirely strange to assume Python would have them too.
since this post is old, python 2 does not have a str.isprint or str.isprintable builtin method or function. python 3 does. ...it's a minor annoyance that instead of following convention and style, they called it isprintable instead of isprint. py2 -> docs.python.org/2.7/library/stdtypes.html#str.isalnum ; py3 -> docs.python.org/3.6/library/stdtypes.html#str.isalnum
8

try/except seems the best way:

def isprintable(s, codec='utf8'):
    # s is a byte string (Python 2 str); this only checks that it decodes
    try:
        s.decode(codec)
    except UnicodeDecodeError:
        return False
    else:
        return True

I would not rely on string.printable, which might deem "non-printable" control characters that can commonly be "printed" for terminal control purposes (e.g., in "colorization" ANSI escape sequences, if your terminal is ANSI-compliant). But that, of course, depends on your exact purposes for wanting to check this!-)
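
For Python 3, where data read from a port arrives as bytes, the same try/except idea could be combined with a character check. This is only a sketch under assumptions: the name is_printable_bytes is hypothetical, and using str.isprintable() as the second step is a stricter choice than the function above, since it rejects every control character (so ANSI escape sequences would fail too):

```python
def is_printable_bytes(data, codec='utf-8'):
    """Return True if data decodes with codec and holds only printable characters."""
    try:
        text = data.decode(codec)
    except UnicodeDecodeError:
        # Not valid in the given encoding at all.
        return False
    # str.isprintable() is False for control characters, including tab and newline.
    return text.isprintable()


print(is_printable_bytes(b'Hello World!'))      # True
print(is_printable_bytes(b'\x00\x01\x02\x03'))  # False
print(is_printable_bytes(b'\xff\xfe'))          # False (not valid UTF-8)
```

If terminal control sequences should be allowed through, drop the isprintable() step and keep only the decode check.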

5 Comments

string.printable is well defined. "a combination of digits, letters, punctuation, and whitespace." Whitespace OTOH is a little less so: "On most systems this includes the characters space, tab, linefeed, return, formfeed, and vertical tab."
@BCS, it's basically the same concept as C's bad old isprint macro, and exhibits exactly the same failings (no control sequences / escape sequences -- but many terminals and printers can accept some control / escape sequences for cosmetic purposes such as colorization, and, depending on the app's purposes, forbidding such characters from the output may therefore prove unwise).
My concern is that whitespace could include more than those 6 chars. I know that if my data source ever contains "control chars", that I can assume they are junk.
Alex, your suggested function fails for even trivial unprintable input; for example, isprintable('\00\01\02\03') returns True, unless I am misunderstanding your intent?
Alex's function might mean "submittable to print() and to other unspecified streams (like the console and many print devices) without raising an exception", whereas string.printable loosely means "has a glyph". See Unicode category. The streams you submit a string.printable char to must agree with your definition. For example, a browser displaying SVG text may raise an exception over non-printable characters (in the Unicode category 'control'). That's what Alex means by "exact purposes": it's about printable's "ensure" assertion and the downstream "require" assertion.
6

This Python 3 string contains all kinds of special characters:

s = 'abcd\x65\x66 äüöë\xf1 \u00a0\u00a1\u00a2 漢字 \a\b\r\t\n\v\\ \231\x9a \u2640\u2642\uffff'

If you try to show it in the console (or use repr), it does a pretty good job of escaping all non-printable characters from that string:

>>> s
'abcdef äüöëñ \xa0¡¢ 漢字 \x07\x08\r\t\n\x0b\\ \x99\x9a ♀♂\uffff'

It is smart enough to recognise e.g. horizontal tab (\t) as printable, but vertical tab (\v) as not printable (shows up as \x0b rather than \v).

Every other non-printable character also shows up as either \xNN or \uNNNN in the repr. Therefore, we can use that as the test:

def is_printable(s):
    return not any(repr(ch).startswith("'\\x") or repr(ch).startswith("'\\u") for ch in s)

There may be some borderline characters, for example non-breaking white space (\xa0) is treated as non-printable here. Maybe it shouldn't be, but those special ones could then be hard-coded.


P.S.

You could do this to extract only printable characters from a string:

>>> ''.join(ch for ch in s if is_printable(ch))
'abcdef äüöëñ ¡¢ 漢字 \r\t\n\\  ♀♂'


6

In Python 3, strings have an isprintable() method:

>>> 'a, '.isprintable()
True

For Python 2.7, see David Webb's answer.
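
As a quick illustration of what the method accepts, here is a small sketch; the sample strings are arbitrary and chosen only for demonstration. It shows that the empty string and non-ASCII letters count as printable, while tab, newline and other control characters do not:

```python
# Arbitrary sample strings; results maps each one to its isprintable() value.
samples = ['a, ', '', 'tab\there', 'line\n', '漢字', 'bell\x07']
results = {sample: sample.isprintable() for sample in samples}

for sample, printable in results.items():
    print(repr(sample), printable)

# 'a, '        True
# ''           True
# 'tab\there'  False
# 'line\n'     False
# '漢字'        True
# 'bell\x07'   False
```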

2 Comments

Confusingly, str.isprintable() has a different notion of "printable" than string.printable (for example, the former does not consider \n and \t to be printable).
This function considers a string using Cyrillic characters as not printable. "Човек" returns false. Totally useless for my needs.
4
>>> # Printable
>>> s = 'test'
>>> len(s)+2 == len(repr(s))
True

>>> # Unprintable
>>> s = 'test\x00'
>>> len(s)+2 == len(repr(s))
False

3 Comments

This is just a little too clever. You probably shouldn't do this, but +1 anyway because it made me smile.
It fails for s = 'a\nb'.
Even fails for '\\'. repr('\\') = "'\\\\'"
1

The category function from the unicodedata module might suit your needs. For instance, you can use this to check whether there are any control characters in a string while still allowing non-ASCII characters.

>>> import unicodedata

>>> def has_control_chars(s):
...     return any(unicodedata.category(c) == 'Cc' for c in s)

>>> has_control_chars('Hello 世界')
False

>>> has_control_chars('Hello \x1f 世界')
True
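
The same approach can be widened from control characters to every "Other" category. The following is a sketch, not part of the unicodedata module: the function name is_printable_text is hypothetical, and treating all categories that start with 'C' (Cc control, Cf format, Cs surrogate, Co private use, Cn unassigned) as unprintable is an assumption about what you want to reject:

```python
import unicodedata


def is_printable_text(s):
    """Return True if no character of s is in a Unicode 'Other' (C*) category."""
    return not any(unicodedata.category(c).startswith('C') for c in s)


print(is_printable_text('Hello 世界'))        # True
print(is_printable_text('Hello \x1f 世界'))   # False (Cc, control)
print(is_printable_text('zero\u200bwidth'))  # False (Cf, format)
```

Unlike str.isprintable(), this keeps separators such as the non-breaking space (\xa0), because their category starts with 'Z' rather than 'C'.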


1
# Here is the full routine to display an arbitrary binary string
# Python 2

import string

ctrlchar = "\n\r| "

# ------------------------------------------------------------------------

def isprint(chh):
    if ord(chh) > 127:
        return False
    if ord(chh) < 32:
        return False
    if chh in ctrlchar:
        return False
    if chh in string.printable:
        return True
    return False


# ------------------------------------------------------------------------
# Return a hex dump formatted string

def hexdump(strx, llen = 16):
    lenx = len(strx)
    outx = ""
    for aa in range(lenx // 16):
        outx += " "
        for bb in range(16):
            outx += "%02x " % ord(strx[aa * 16 + bb])
        outx += " | "     
        for cc in range(16):
            chh = strx[aa * 16 + cc]
            if isprint(chh):
                outx += "%c" % chh
            else:
                outx += "."
        outx += " | \n"

    # Print remainder on last line
    remn = lenx % 16
    divi = lenx // 16
    if remn:
        outx += " "
        for dd in range(remn):
            outx += "%02x " % ord(strx[divi * 16 + dd])
        outx += " " * ((16 - remn) * 3) 
        outx += " | "     
        for cc in range(remn):
            chh = strx[divi * 16 + cc]
            if isprint(chh):
                outx += "%c" % chh
            else:
                outx += "."
        outx += " " * ((16 - remn)) 
        outx += " | \n"


    return outx
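
For comparison, a compact Python 3 take on the same idea might look like this. It is a sketch under assumptions rather than a port of the routine above: it takes bytes instead of a Python 2 str, treats exactly the range 0x20-0x7e as printable (so space and "|" are shown as themselves instead of "."), and the output layout is simplified:

```python
def hexdump(data, width=16):
    """Return a hex dump of the bytes in data, width bytes per line."""
    pad = width * 3 - 1  # length of a full hex column: "xx " per byte, minus the last space
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join("%02x" % b for b in chunk)
        # Show printable ASCII as-is and everything else as "."
        text_part = "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in chunk)
        lines.append(f"{hex_part:<{pad}} | {text_part}")
    return "\n".join(lines)


print(hexdump(b"Hello, world!\x00\x01\x02 more data"))
```

Slicing with a step handles the final short line without a separate remainder branch, because the hex column is padded to a fixed width.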


1

In the ASCII table, the characters in the range [\x20-\x7e] are printable.
Use a regular expression to check whether the string contains any character outside this range.
If it does not, the string is printable.

>>> import re

>>> # Printable
>>> print(re.search(r'[^\x20-\x7e]', 'test'))
None

>>> # Unprintable
>>> re.search(r'[^\x20-\x7e]', 'test\x00') is not None
True

>>> # Optional expression: also allow ASCII whitespace (tab through carriage return)
>>> pattern = r'[^\t-\r\x20-\x7e]'
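
The check can also be wrapped in a small function. This is a minimal sketch; the names PRINTABLE_ASCII and is_printable_ascii are made up for illustration. It inverts the test above: instead of searching for a character outside the range, it asks whether the whole string consists of characters inside it:

```python
import re

# Zero or more characters, all in the printable ASCII range.
PRINTABLE_ASCII = re.compile(r'[\x20-\x7e]*')


def is_printable_ascii(text):
    """Return True if text contains only characters in \\x20-\\x7e."""
    # fullmatch() requires the pattern to cover the entire string.
    return PRINTABLE_ASCII.fullmatch(text) is not None


print(is_printable_ascii('test'))      # True
print(is_printable_ascii('test\x00'))  # False
print(is_printable_ascii('a\nb'))      # False
```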

1 Comment

This would be a better answer if you explained how the code you provided answers the question.
0

Mine is a solution to get rid of any known set of characters. It might help.

non_printable_chars = set("\n\t\r ")     # Space included intentionally
# True if the string has at least one character outside non_printable_chars
is_printable = lambda string: bool(set(string) - non_printable_chars)
...
...
if is_printable(string):
    print("""do something""")

...


0
import string

ctrlchar = "\n\r| "

# ------------------------------------------------------------------------
# This will let you control what you deem 'printable'
# Clean enough to display any binary 

def isprint(chh):
    if ord(chh) > 127:
        return False
    if ord(chh) < 32:
        return False
    if chh in ctrlchar:
        return False
    if chh in string.printable:
        return True
    return False
