In DuckDB, can there be proper UTF-8 output in duckbox mode to Windows console? [closed]

Question

I cannot get non-ASCII characters to display properly in the DuckDB console, even though the console application supports UTF-8. I have a sample UTF-8-encoded CSV file containing a few test strings:

Language Code,Greeting
pl,Cześć
de,Grüß dich
el,Γειά σου
ru,Привет
ar,مرحبا
he,שלום
ja,こんにちは
ko,안녕하세요
hi,नमस्ते

After starting duckdb.exe from the Windows console (chcp reports code page 852), I run the command

SELECT * FROM read_csv('hello_in_languages.csv');

and the response (in duckbox output mode) shows the non-Latin characters replaced by question marks, as expected under code page 852:

┌───────────────┬────────────┐
│ Language Code │  Greeting  │
│    varchar    │  varchar   │
├───────────────┼────────────┤
│ pl            │ Cześć      │
│ de            │ Grüß dich  │
│ el            │ ???? ???   │
│ ru            │ ??????     │
│ ar            │ ?????      │
│ he            │ ????       │
│ ja            │ ????? │
│ ko            │ ????? │
│ hi            │ ??????       │
├───────────────┴────────────┤
│ 9 rows           2 columns │
└────────────────────────────┘

Then I switch the shell's code page to UTF-8 using the Windows command chcp 65001:

.shell chcp 65001

and I see a different issue:

����������������������������Ŀ
� Language Code �  Greeting  �
�    varchar    �  varchar   �
����������������������������Ĵ
� pl            � Cze��      �
� de            � Gr�� dich  �
� el            � ???? ???   �
� ru            � ??????     �
� ar            � ?????      �
� he            � ????       �
� ja            � ????? �
� ko            � ????? �
� hi            � ??????       �
����������������������������Ĵ
� 9 rows           2 columns �
������������������������������

This happens regardless of the input file format (UTF-8 with BOM / UTF-8 without BOM).
This happens regardless of the Windows console used (old cmd console / new Terminal app).
The command COPY ( SELECT * FROM read_csv('hello_in_languages.csv') ) TO 'greetings-out.csv' produces a file identical to the input file, so all characters are fully preserved during processing and what we see is only a display issue.
The command .shell type hello_in_languages.csv shows that the console itself supports UTF-8:

Language Code,Greeting
pl,Cześć
de,Grüß dich
el,Γειά σου
ru,Привет
ar,مرحبا
he,שלום
ja,こんにちは
ko,안녕하세요
hi,नमस्ते
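
The claim above that greetings-out.csv is identical to the input file can be checked byte for byte. A minimal sketch using Python's standard library; the helper name `same_bytes` is made up for illustration, and the file names are the ones used in this question:

```python
import filecmp


def same_bytes(path_a: str, path_b: str) -> bool:
    """Return True if both files have identical content, byte for byte."""
    # shallow=False forces a comparison of the file contents instead of
    # relying only on the os.stat() signatures (type, size, mtime).
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Calling `same_bytes('hello_in_languages.csv', 'greetings-out.csv')` in the directory holding both files returns True only when no byte was altered in the round trip, which is what separates a display problem from a data problem.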

Does this mean that DuckDB cannot work properly with UTF-8 data through the console? Or is there a fix?

I flagged the question for migration to Super User SE. If anyone else also thinks it should be migrated, please use the Close feature to mark it for automatic migration.
– miroxlav, commented Nov 18 at 2:00

miroxlav · Accepted Answer · 2025-11-18 01:53:36Z

`.binary on`

After examining other possibly related options, I found the dot command above.

With it, DuckDB passes UTF-8 through as expected:

.binary on
select * from read_csv('hello_in_languages.csv');
┌───────────────┬────────────┐
│ Language Code │  Greeting  │
│    varchar    │  varchar   │
├───────────────┼────────────┤
│ pl            │ Cześć      │
│ de            │ Grüß dich  │
│ el            │ Γειά σου   │
│ ru            │ Привет     │
│ ar            │ مرحبا      │
│ he            │ שלום       │
│ ja            │ こんにちは │
│ ko            │ 안녕하세요 │
│ hi            │ नमस्ते       │
├───────────────┴────────────┤
│ 9 rows           2 columns │
└────────────────────────────┘
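
A possible explanation for the two symptoms in the question, offered as a sketch and not as a confirmed description of DuckDB's internals: the output looks as if the shell converts its UTF-8 text to the code page that was active at startup (852) before writing to the console. The Python snippet below only imitates that with standard codecs; the helper names `to_code_page` and `shown_by_utf8_console` are made up for illustration.

```python
# Assumption (not confirmed by the source): the shell re-encodes its UTF-8
# output into the legacy code page that was active when it started (cp852).


def to_code_page(text: str, code_page: str = "cp852") -> bytes:
    """Encode text for a legacy code page; unmappable characters become '?'."""
    return text.encode(code_page, errors="replace")


def shown_by_utf8_console(raw: bytes) -> str:
    """Decode bytes the way a UTF-8 console would; invalid bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for sample in ["Cześć", "Grüß dich", "Γειά σου", "こんにちは", "─┤", "─┐"]:
        raw = to_code_page(sample)
        # Left: what a cp852 console shows. Right: what a UTF-8 console shows.
        print(raw.decode("cp852"), "|", shown_by_utf8_console(raw))
```

Under that assumption, `to_code_page('Γειά σου')` yields `???? ???` (the first output), and pushing `Cześć`, `─┤` or `─┐` through both helpers yields `Cze��`, `Ĵ` and `Ŀ` (the second output, after chcp 65001). What `.binary on` changes internally is not stated here; the result above suggests that it lets the UTF-8 bytes reach the console unconverted.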
