Why is an ASCII-encoded file reported as UTF-8 after appending to it, and as ASCII again after the new line is removed?
user:~$ echo 'A B C | } ~' > ./file
user:~$
user:~$ file --brief --mime ./file
text/plain; charset=us-ascii
user:~$
user:~$
user:~$ echo 'ᴁ ♫ ⼌ 𝐑 🀵 🈀' >> ./file
user:~$
user:~$ file --brief --mime ./file
text/plain; charset=utf-8
user:~$
user:~$
user:~$ cat ./file
A B C | } ~
ᴁ ♫ ⼌ 𝐑 🀵 🈀
user:~$
user:~$
user:~$ sed -i '$d' ./file
user:~$
user:~$ cat ./file
A B C | } ~
user:~$
user:~$ file --brief --mime ./file
text/plain; charset=us-ascii
user:~$
In case you cannot read a character in the second echo command, they are, from first to last: U+1D01, ᴁ; U+266B, ♫; U+2F0C, ⼌; U+1D411, 𝐑; U+1F035, 🀵; U+1F200, 🈀.
The locale settings are:
user:~$ echo $LANG
en_US.UTF-8
user:~$ echo $LANGUAGE
en_US:en
user:~$ echo $LC_COLLATE
user:~$ echo $LC_CTYPE
user:~$ echo $SHELL
/bin/bash
user:~$
user:~$ ps -p $$
PID TTY TIME CMD
7537 pts/6 00:00:00 bash
user:~$
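The file utility could report UTF-8 for the second revision as well, since every pure ASCII file is also valid UTF-8, but it chooses to report the more specific charset. The file itself carries no encoding label; file only inspects the bytes it currently contains.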
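To illustrate that the reported charset depends only on the bytes currently in the file, here is a rough sketch in shell. The helpers count_non_ascii and guess_charset are hypothetical names made up for this example, and the check is far simpler than what file actually does (it does not validate UTF-8 at all, it only looks for bytes above 0x7F).

```shell
#!/bin/sh
# Hypothetical helper: count the bytes outside the 7-bit ASCII range (0x80-0xFF).
count_non_ascii() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: a crude stand-in for the charset guess made by file.
# It only separates "pure ASCII" from "anything else".
guess_charset() {
    if [ "$(count_non_ascii "$1")" -eq 0 ]; then
        echo us-ascii
    else
        echo non-ascii
    fi
}

tmp=$(mktemp)

printf 'A B C | } ~\n' > "$tmp"
guess_charset "$tmp"

# Append U+266B, which is encoded in UTF-8 as the bytes E2 99 AB.
printf '\342\231\253\n' >> "$tmp"
guess_charset "$tmp"

# Drop the last line again (a portable equivalent of sed -i '$d').
sed '$d' "$tmp" > "$tmp.new" && mv "$tmp.new" "$tmp"
guess_charset "$tmp"

rm -f "$tmp"
```

Run as is, this should print us-ascii, non-ascii, us-ascii, mirroring the three file invocations in the transcript: nothing is converted, the guess simply changes with the content.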