
I’ve done some Google searches, but the results are about encoding strings or files, not source code.

Can I write my Node.js JavaScript source code in UTF-8? Can I use non-ASCII characters in comments, strings, or as variable names?

ECMA-262 seems to require UTF-16 encoding, but Node.js won’t run a UTF-16 encoded .js file. It will, however, run UTF-8 source and correctly interpret non-ASCII characters.
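To see why a UTF-16 file could trip up a loader that expects UTF-8, it helps to compare the raw bytes. This is a minimal sketch using Node's `Buffer`; it assumes (it does not confirm) that Node decodes `.js` files as UTF-8, and the sample source string is arbitrary.

```javascript
// Sketch: the same source text as UTF-8 bytes and as UTF-16LE bytes.
// Assumption: the loader decodes .js files as UTF-8, so we mimic that
// by decoding the UTF-16LE bytes with the 'utf8' encoding.
const source = 'var x = "é";'; // 12 UTF-16 code units

const utf8Bytes = Buffer.from(source, 'utf8');
const utf16Bytes = Buffer.from(source, 'utf16le');

console.log(utf8Bytes.length);  // 13: 11 ASCII bytes + 2 bytes for "é"
console.log(utf16Bytes.length); // 24: 2 bytes per UTF-16 code unit

// In UTF-16LE every ASCII character is followed by a 0x00 byte, so a UTF-8
// decoder yields NUL characters, which are not valid JavaScript whitespace.
const misread = utf16Bytes.toString('utf8');
console.log(misread.includes('\u0000')); // true
console.log(misread === source);         // false
```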

So is this by design or by “accident”? Is it specified somewhere that UTF-8 source code is supported?

  • I've never given this a second thought, but I constantly use UTF-8 for everything I do and have never had a problem. Commented Apr 12, 2012 at 14:05
  • I expect that it's not so much a Node.js thing as a V8 thing. Commented Apr 12, 2012 at 14:07
  • I was hoping someone could point to, say, Node.js or V8 documentation that says which source encodings are allowed (Python example: python.org/dev/peps/pep-0263). Yeah, I can and did futz around and see what works, but I want a more concrete answer. Commented Apr 12, 2012 at 15:12
  • You're linking to a very old version of the spec (the 3rd revision is from 1999; the 6th revision came out last June). The current version is here. The requirement is "Unicode" (with, by convention, ASCII being a subset of Unicode, since the lowest 128 code points in Unicode are the same as those the ASCII encoding specifies). Commented Sep 11, 2015 at 17:07
  • Hi @Nate, some years have passed since you asked this question. I'm looking for something like the Python example you gave in your comment. Have you found a concrete answer in the meantime? Commented Nov 11, 2021 at 12:42
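The ASCII point in the comments above can be checked directly: for text limited to U+0000 through U+007F, the ASCII and UTF-8 encodings produce identical bytes, so an ASCII-only .js file is already valid UTF-8. A minimal sketch with Node's `Buffer` (the sample strings are arbitrary):

```javascript
// ASCII-only text: one byte per character in both encodings.
const text = 'console.log("hello");';
const asAscii = Buffer.from(text, 'ascii');
const asUtf8 = Buffer.from(text, 'utf8');
console.log(asAscii.equals(asUtf8)); // true

// Outside the ASCII range the encodings diverge: UTF-8 needs 2 bytes for "é".
const accentedLength = Buffer.from('é', 'utf8').length;
console.log(accentedLength); // 2
```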

2 Answers


Reference: http://mathiasbynens.be/notes/javascript-identifiers

Non-ASCII Unicode characters (letters from other scripts, for example) are valid in JavaScript variable names. Go ahead and save your source as UTF-8.
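As a rough check of which characters may appear in identifiers: since ES2015 the grammar builds identifier names from the Unicode `ID_Start` and `ID_Continue` properties (plus `$`, `_`, ZWNJ and ZWJ). The sketch below expresses that with regex Unicode property escapes; `isIdentifierName` is a hypothetical helper, and it deliberately ignores reserved words and `\u` escape sequences.

```javascript
// Hypothetical helper: approximates the ECMAScript IdentifierName grammar.
// It does NOT reject reserved words (e.g. "class") or handle \uXXXX escapes.
const IDENTIFIER_NAME = /^[$_\p{ID_Start}][$\u200C\u200D\p{ID_Continue}]*$/u;

function isIdentifierName(name) {
  return IDENTIFIER_NAME.test(name);
}

console.log(isIdentifierName('Планета_Зямля')); // true
console.log(isIdentifierName('π'));             // true
console.log(isIdentifierName('2fast'));         // false: a digit cannot start a name
console.log(isIdentifierName('💩'));            // false: emoji are not ID_Start

// Non-ASCII identifiers work directly in UTF-8 source:
const π = Math.PI;
console.log(π > 3); // true
```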


3 Comments

Unicode characters and UTF-8 encoding are different things. The standard actually seems to require UTF-16, not UTF-8 (but that doesn’t seem to be true in practice). It’s nice to have confirmation that Unicode characters are valid variable names, though.
Although available, I can't recommend doing things like var Hͫ̆̒̐ͣ̊̄ͯ͗͏̵̗̻̰̠̬͝ͅE̴̷̬͎̱̘͇͍̾ͦ͊͒͊̓̓̐_̫̠̱̩̭̤͈̑̎̋ͮͩ̒͑̾͋͘Ç̳͕̯̭̱̲̣̠̜͋̍O̴̦̗̯̹̼ͭ̐ͨ̊̈͘͠M̶̝̠̭̭̤̻͓͑̓̊ͣͤ̎͟͠E̢̞̮̹͍̞̳̣ͣͪ͐̈T̡̯̳̭̜̠͕͌̈́̽̿ͤ̿̅̑Ḧ̱̱̺̰̳̹̘̰́̏ͪ̂̽͂̀͠ = 'Zalgo';
The standard says that the native text processing model of JavaScript is based on UTF-16 code units. That doesn't specify what byte-encoding is used to convert a source file to those units.
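The distinction drawn in the last two comments, between a string's UTF-16 code units and the bytes a file is stored in, can be illustrated with the standard `TextDecoder`: two different byte sequences decode to the same one-code-unit string. The byte values below are the UTF-8 and UTF-16LE encodings of U+041F (П).

```javascript
// "П" (U+041F) stored two different ways.
const utf8Bytes = new Uint8Array([0xd0, 0x9f]);    // UTF-8: 2 bytes
const utf16leBytes = new Uint8Array([0x1f, 0x04]); // UTF-16LE: 2 bytes, little-endian

const fromUtf8 = new TextDecoder('utf-8').decode(utf8Bytes);
const fromUtf16 = new TextDecoder('utf-16le').decode(utf16leBytes);

// Once decoded, both are the same JavaScript string: one UTF-16 code unit.
console.log(fromUtf8 === fromUtf16);              // true
console.log(fromUtf8 === '\u041f');               // true
console.log(fromUtf8.length);                     // 1
console.log(fromUtf8.charCodeAt(0).toString(16)); // 41f
```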

I can't find documentation that says that Node treats files as encoded in UTF-8, but it seems that way experimentally:

/* Check in your editor that this Javascript file was saved in UTF-8 */
var nonEscaped = "Планета_Зямля";
var escaped = "\u041f\u043b\u0430\u043d\u0435\u0442\u0430\u005f\u0417\u044f\u043c\u043b\u044f";
if (nonEscaped === escaped) {
  console.log("They match");
}

The above example prints "They match".
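Instead of relying only on the editor to confirm that the file was saved as UTF-8, the bytes can be checked programmatically. A minimal sketch, assuming a runtime with the WHATWG `TextDecoder` available globally (Node 11 and later); `isValidUtf8` is a hypothetical helper name. A decoder created with `fatal: true` throws on malformed input instead of substituting U+FFFD.

```javascript
// Hypothetical helper: true if the bytes form well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    // fatal: true makes decode() throw a TypeError on malformed sequences.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidUtf8(Buffer.from('Планета_Зямля', 'utf8'))); // true
console.log(isValidUtf8(Buffer.from('é', 'latin1')));           // false: lone 0xE9 byte

// To check a real file, pass its bytes, e.g.:
// isValidUtf8(require('fs').readFileSync('script.js'))
```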

Non-BMP note:

Note that UTF-8 can encode non-BMP code points (U+10000 and above), but JavaScript complicates things here: strings are sequences of UTF-16 code units, so such a character is automatically stored as a surrogate pair. This is part of the language:

/* Check in your editor that this Javascript file was saved in UTF-8 */
var nonEscaped = "💩"; // U+1F4A9
var escaped1 = "\ud83d\udca9";
if (nonEscaped === escaped1) {
  console.log("They match");
}
/* ES2015 and newer implementations also support this code point escape syntax: */
var escaped2 = "\u{1f4a9}";
if (nonEscaped === escaped2) {
  console.log("The second string matches");
}

This prints "They match" and "The second string matches".
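The surrogate pair in the example follows from the fixed UTF-16 formula: subtract 0x10000, then split the remaining 20 bits into a high half (added to 0xD800) and a low half (added to 0xDC00). A small sketch; `toSurrogatePair` is a hypothetical helper, while `charCodeAt`, `codePointAt`, and `String.fromCodePoint` are standard string methods.

```javascript
// Hypothetical helper: split a non-BMP code point into UTF-16 surrogates.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;    // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f4a9);
console.log(high.toString(16), low.toString(16)); // d83d dca9

const pile = '\u{1f4a9}';
console.log(pile.length);                            // 2 code units
console.log(pile.charCodeAt(0) === high);            // true
console.log(pile.codePointAt(0).toString(16));       // 1f4a9
console.log([...pile].length);                       // 1: string iteration is by code point
console.log(String.fromCodePoint(0x1f4a9) === pile); // true
```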

