Questions tagged [character-encoding]
Questions about the representation of characters and character sets, such as ASCII, UTF-8, and EBCDIC. These issues often come up when moving files between operating systems that mark line endings with carriage returns, line feeds, or both.
425 questions
2
votes
1
answer
105
views
apt_auth.conf file with machine login password fails on an extra junk character that cannot be found
At work, on Ubuntu 22.04.1, I want to use the apt_auth.conf feature of apt to make it easier to get packages from Artifactory.
I've written my artifactory.conf file into /etc/apt/apt.conf.d this way:
...
3
votes
2
answers
182
views
Embedded special characters skewing sed output
The Issue
I've been parsing a file with sed, trying to tweeze out the desired data. This has worked fine for most lines in the file, but there appear to be some embedded special characters that are ...
2
votes
1
answer
112
views
Tmux pane with long-running session using wrong character set?
Today I connected to a long-running process in tmux over ssh for work, to find that the pane the process was running in seems to have started using the wrong character encoding for its output, leading ...
0
votes
1
answer
86
views
Output of echo uses different encoding than the one specified according to LANG and LC_CTYPE
It is my understanding that the LANG and LC_CTYPE environment variables define the encoding used by shell commands when writing to stdout. However, after executing
LANG=de_DE.iso88591 LC_CTYPE=de_DE....
0
votes
0
answers
109
views
Advanced CLI tool/code to determine text encoding (besides enca)
Looking for an advanced CLI tool/code to determine text codepage/language (besides enca).
Goal: Automate, as much as possible, the conversion of hundreds/thousands of 8-bit text files (including non-ASCII ...
-2
votes
1
answer
88
views
Convert subtitles so they are encoded correctly (Polish characters and even `"` get wrongly encoded)
Wrong encoding:
1
00:01:27,879 --> 00:01:31,216
No i dupa.
Koniec z darmowym wi-fi.
2
00:01:33,009 --> 00:01:34,972
- Ki-jung!
- No?
3
00:01:35,219 --> 00:01:39,183
Kobieta z góry
...
1
vote
1
answer
165
views
How can I set the character set to Latin-1 or MCS when using serial-getty?
I'd like to use my old VT420 terminal as the system console. Adding RS232 ports and setting up serial-getty are not a problem, but: For years, almost all Linux distros have been using UTF-8 as the ...
0
votes
1
answer
161
views
regex: how come the trademark symbol matches a-z?
Sorry if this is a repeat or basic question, but it is hard to search for a ™. I'm writing a script to remove weird characters from file names.
How come the trademark symbol ™ matches [^a-z]?
$ ...
4
votes
2
answers
1k
views
How can I convert full-width characters to half-width characters (and vice versa)?
Here is my simple problem: how can I convert half-width to full-width from the command line? I thought this would be built into the iconv command, but I did not find anything here:
$ iconv -l | ...
0
votes
0
answers
587
views
Strange/Buggy Characters in the Terminal
I use Debian sid with Terminator as my terminal emulator. After the most recent system update (yesterday, 2023/11/22) and a reboot, some characters in my terminal in certain commands are ...
0
votes
1
answer
203
views
How to preserve non-ASCII characters?
We have the default POSIX locale on our server, but when non-ASCII text like רקטות לגוש דן וירושלים (Hebrew) is uploaded to the server, it gets changed to רק××ת ×××ש ×× ××ר×ש×××. How can I preserve it ...
6
votes
1
answer
423
views
Converting from ISO-IR-87 to UTF-8 encoding
I am working on Debian and derivative systems. I'd like to convert an original input from ISO-IR-87 to UTF-8. Is there an easy way to do it?
For reference:
% iconv -l | grep "IR-8"
ISO-IR-8-...
-1
votes
1
answer
386
views
Incorrect shell output encoding
I'm not an expert in Linux, but I am following the development of software that runs on Linux Buildroot. The device can only use the program for the graphical interface, access the shell, or connect ...
4
votes
0
answers
677
views
macOS files to NAS with rsync --iconv repeats the sync over and over
According to this hint and similar advice, I am using the --iconv option in rsync (version 3.2.7) to sync files with umlauts (ä ü ö ...) to my Synology NAS. However, the --iconv option does not work as ...
4
votes
4
answers
549
views
Collect chars from strings and print their unicode
Context (skip, if you don't care; read, if you suspect I'm totally on the wrong track)
For an embedded system with small memory, I want to generate fonts which contain only those glyphs actually ...
1
vote
1
answer
112
views
Command similar to ascii for extended ASCII and/or for Unicode?
The ascii command in Linux is fast and great. It allows us to search for a character or for a code point and returns all relevant results for a given search. Is there something similar for extended ASCII (...
3
votes
1
answer
318
views
Different encoding/Unicode interpretation using terminal vs using shell script
I was working on a keymap script (mapping keys from one language's keyboard layout to another). After a lot of time spent trying to get everything working, I found out that different characters are ...
0
votes
1
answer
117
views
testdisk utility reports nonexistent files from an exFAT drive used with Windows - why?
I tried to recover lost files from an exFAT thumb drive with the testdisk package on Linux. It was very good at finding deleted files. However, as I went through the entries, I saw weird entries. The ...
0
votes
0
answers
160
views
Repairing mixed encoding
I got some files containing Finnish text with mixed encoding, something one would get by (echo Mäntysalo ; echo Mäntysalo | recode utf-8..iso-8859-1) > problem.txt. Is there a "right" way ...
0
votes
1
answer
728
views
How can I give a program raw binary byte input which is produced by a different file?
I have 2 programs:
x - prompts user for input from stdin.
binary - prints something to stdout, the stuff it prints is made up of various raw binary bytes which are not fully supported by my terminals ...
2
votes
1
answer
130
views
Is my text mangled beyond repair?
My mangled Czech text:
NOTE ON CZECH BIRTH NUMBER VALIDATION IN CZECH LANGUAGE;
in Czechia birth number = personal identification number
========================================================
Do ...
0
votes
1
answer
43
views
When I use tab completion with rsync and a remote PC, magical characters appear in the display
Simple.
I have the file "longname.server" on a remote PC and want to copy it to my PC, but I don't remember the name because it is long, so I use tab completion.
\rsync -avP remote:^[\\\[0\\\;...
1
vote
1
answer
71
views
Chaotic Command-line Interface Layout
When I type a long command on a command-line interface, something strange may happen in the layout. The characters I typed don't show in lines correctly. Instead, they merge into 1 line or overwrite ...
1
vote
0
answers
98
views
GNU Recode - Properly decode mixed HTML Character/Numeric encoded text?
I recently found GNU recode as something that can be used to decode HTML entities. However, when looking at a piece of malware, I noticed that it appears to use mixed HTML character/entity encoding, such ...
1
vote
1
answer
3k
views
Characters do not display properly in terminal (st)
When typing German umlauts (ä,ö,ü) into the terminal (I am using st on Arch Linux, $XTERM is st-256color), it displays only <ffffffff>. Locale seems to be set properly.
Output of locale is
...