0

We have a large project whose source is entirely ASCII. Is it worth putting an encoding declaration at the beginning of each source file (e.g. #coding=utf-8) if the source doesn't contain any non-ASCII characters?

Thanks, --Peter

4
  • I'd certainly do it... A Python script would do it pretty quickly ;-) Commented Apr 14, 2014 at 17:25
  • 2
    No, I see no reason to specify the encoding unless it is required to run the script. Furthermore, I think that adding #coding=utf8 to the top would be confusing if there are no non-ASCII characters (not overly so, but meh). Commented Apr 14, 2014 at 17:26
  • 3
    Depends what you want to happen if somebody pastes some non-ASCII UTF-8 data into a string in one of the files. Do you want it to work, or to complain that they aren't supposed to do that? What do you want if they paste in some non-ASCII data in an encoding other than UTF-8? That is to say, are your files ASCII by policy, or are they UTF-8 by policy that happens only to contain ASCII characters? Commented Apr 14, 2014 at 17:26
  • In our case, there is no policy as to the encoding, but the files do happen to be almost universally ASCII. Python 3 will (I hope) be in our future at some point, and we get a very small benefit from the ASCII encoding: should Unicode accidentally be committed to a file, the interpreter will raise an exception. So it seems to me that the best course of action is to explicitly label the encoding as ASCII (except for any files that actually need UTF-8). Commented Apr 14, 2014 at 21:02

3 Answers

2

For portability I would declare it explicitly, especially as the default source encoding changes in Python 3 (see PEP 3120):

This PEP proposes to change the default source encoding from ASCII to UTF-8. Support for alternative source encodings continues to exist; an explicit encoding declaration takes precedence over the default.

Although it doesn't affect you with ASCII, explicit is better than implicit, so I would recommend adding the declaration to the top of each file.
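To see which encoding a given file would be read with, Python 3's standard tokenize module can report it. The following is only a sketch: source_encoding is a hypothetical helper name, and the byte strings stand in for real files.

```python
import io
import tokenize


def source_encoding(source: bytes) -> str:
    """Return the encoding that tokenize.detect_encoding reports for source."""
    # detect_encoding looks for a BOM or a coding declaration in the
    # first two lines and falls back to UTF-8 when it finds neither.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


print(source_encoding(b"x = 1\n"))                 # no declaration
print(source_encoding(b"#coding=utf-8\nx = 1\n"))  # explicit UTF-8
print(source_encoding(b"#coding=ascii\nx = 1\n"))  # explicit ASCII
```

With no declaration the reported encoding is the Python 3 default, so an ASCII-only file behaves the same with or without the utf-8 line.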

1 Comment

The source code encoding declaration is redundant for ASCII-only files.
1

ASCII is the default in Python 2. UTF-8 is the default in Python 3.

If your files are ASCII-only, you don't need to declare the source code encoding in either version (ASCII is a subset of UTF-8).

Without a declaration, a non-ASCII character leads to a SyntaxError in Python 2, so an accidental non-ASCII character won't go unnoticed and won't corrupt any data. There is no reason to declare the source code encoding for ASCII-only files.
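The subset relationship can be checked directly on bytes. This sketch only mimics the decoding step, not the interpreter itself; decodes_as_ascii and the sample strings are made up for illustration.

```python
def decodes_as_ascii(data: bytes) -> bool:
    """Return True if every byte in data is valid ASCII."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


plain = b"greeting = 'hello'\n"
accented = "greeting = 'h\u00e9llo'\n".encode("utf-8")

# ASCII-only bytes decode to the same text under either codec.
same_text = plain.decode("ascii") == plain.decode("utf-8")

print(decodes_as_ascii(plain), decodes_as_ascii(accented), same_text)
```

The accented sample fails the ASCII decode, which is the same kind of failure that surfaces as an error when a file treated as ASCII picks up a stray non-ASCII character.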

1

You should do one of two things (at least):

  • Add a hook to your repository that verifies on check-in that all Python files are still pure ASCII.
  • Put the explicit ASCII-encoding tag in the files.

You might want to check whether startup is significantly faster when the explicit tag is UTF-8, though. If it is, I would consider that a bug in the interpreter.

This way, if anyone slips and mistakenly adds some non-ASCII characters, you won't have to chase that (potential) bug. Explicitly restricting to ASCII has one advantage: you can reliably see what each string contains, and there are no distinct names that look identical.
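The check-in hook from the first option could be built around something like the following. It is a minimal sketch: find_non_ascii and check_files are hypothetical names, and how the script is wired into the repository depends on the version control system in use.

```python
def find_non_ascii(data: bytes):
    """Return (line_number, column) for each non-ASCII byte, both 1-based."""
    hits = []
    for line_number, line in enumerate(data.splitlines(), start=1):
        for column, byte in enumerate(line, start=1):
            if byte > 0x7F:  # anything above 127 is outside ASCII
                hits.append((line_number, column))
    return hits


def check_files(paths):
    """Report offending bytes; return 1 if any file has them, else 0."""
    status = 0
    for path in paths:
        with open(path, "rb") as handle:
            for line_number, column in find_non_ascii(handle.read()):
                print(f"{path}:{line_number}:{column}: non-ASCII byte")
                status = 1
    return status
```

A hook would pass the changed Python files to check_files and reject the commit when it returns a non-zero status.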
