How to handle multiple human languages in a PostgreSQL database?

Question

If the database uses UTF-8 encoding, can text from all human languages be properly stored and retrieved?

Are there any "gotchas" when dealing with non-English languages in a PostgreSQL database?

I'm working with Ruby on Rails and PostgreSQL 9.1.

Craig Ringer · Accepted Answer · 2012-08-11 07:13:00Z

4

In addition to Spidey's and Kevin's points (use UTF-8 in the client and create the database with ENCODING 'UTF8'; beware of differing collations), I strongly recommend tagging each text field with the language it is in, if at all possible.

If you ever want to use full text search or any kind of linguistic analysis, it really helps to know which language each field is in. Full text search can't do root-word analysis etc. unless it has a dictionary and suffix list for the text being indexed, and for that it needs to know the language.

Storing ISO 639 language codes is probably a reasonable choice.
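As a rough illustration of how a stored language tag could be put to use, here is a minimal Ruby sketch that maps ISO 639-1 codes to the names of PostgreSQL's built-in text search configurations. The `TS_CONFIG_BY_LANG` table and the `ts_config_for` helper are hypothetical names made up for this example, and the mapping is deliberately incomplete; check which configurations your server actually has (for instance with `\dF` in psql).

```ruby
# Hypothetical mapping from ISO 639-1 language codes to the names of
# built-in PostgreSQL text search configurations. Not exhaustive.
TS_CONFIG_BY_LANG = {
  "da" => "danish",
  "de" => "german",
  "en" => "english",
  "es" => "spanish",
  "fr" => "french",
  "it" => "italian",
  "nl" => "dutch",
  "pt" => "portuguese",
  "ru" => "russian",
  "sv" => "swedish"
}.freeze

# Returns the text search configuration name for a language tag.
# Unknown or missing tags fall back to "simple", which does no stemming.
def ts_config_for(lang_code)
  TS_CONFIG_BY_LANG.fetch(lang_code.to_s.downcase, "simple")
end

puts ts_config_for("de")
puts ts_config_for("ja")
```

The returned name can then be passed as the first argument of `to_tsvector`, for example `to_tsvector('german', body)`, where `body` stands in for whatever text column you are indexing.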


kgrittn · Accepted Answer · 2012-08-10 18:40:23Z

3

Different languages tend to order the same character strings differently, so be careful about the COLLATION when sorting.

http://www.postgresql.org/docs/current/static/collation.html
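To make the ordering issue concrete, here is a small Ruby sketch. It first shows that a plain byte-wise sort puts "ä" after "z", which is not what a German reader expects, and then builds an `ORDER BY` clause with an explicit `COLLATE`. The `order_by_collated` helper is a hypothetical name for this example, and the collation name `de_DE` is an assumption: available collations depend on the locales installed on the server's operating system (see `pg_collation`).

```ruby
# "\u00E4" is "ä". Ruby compares strings byte by byte, which for UTF-8
# is the same as code point order, so "ä" (U+00E4) sorts after "z".
words = ["zebra", "\u00E4pfel", "apple"]
puts words.sort.inspect

# Hypothetical helper: builds an ORDER BY clause with an explicit
# collation so the database, not the application, decides the order.
# Both arguments are checked against a conservative pattern because
# they are interpolated into SQL text.
def order_by_collated(column, collation)
  unless column =~ /\A[a-z_][a-z0-9_]*\z/ && collation =~ /\A[A-Za-z0-9_.@-]+\z/
    raise ArgumentError, "unexpected identifier"
  end
  %(ORDER BY #{column} COLLATE "#{collation}")
end

puts order_by_collated("name", "de_DE")
```

Per-expression `COLLATE` is available from PostgreSQL 9.1 onward, so the same table can be sorted differently for different users.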


Spidey · Accepted Answer · 2012-08-10 16:59:15Z

2

UTF-8 can encode all Unicode code points, so yes, text in any language can be stored and retrieved. You do need to set the client connection encoding to UTF-8, though, and make sure your application also reads the output as UTF-8 encoded text.
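On the application side, one way to catch encoding mistakes early is to check strings before they are sent to the database. The following is only a sketch; `safe_for_utf8_db?` is a hypothetical helper name, not part of Rails or of any PostgreSQL driver.

```ruby
# Returns true when the string is tagged as UTF-8 and its bytes are
# well-formed UTF-8.
def safe_for_utf8_db?(str)
  str.encoding == Encoding::UTF_8 && str.valid_encoding?
end

# Text in several scripts, written with escapes to keep this file ASCII.
samples = [
  "hello",
  "\u3053\u3093\u306B\u3061\u306F",          # Japanese
  "\u041F\u0440\u0438\u0432\u0435\u0442"     # Russian
]
samples.each { |s| puts "#{s}: #{safe_for_utf8_db?(s)}" }

# Bytes that arrive tagged as binary (ASCII-8BIT) fail the check until
# they are explicitly relabelled as UTF-8.
raw = "\xC3\xA9".b
puts safe_for_utf8_db?(raw)
puts safe_for_utf8_db?(raw.dup.force_encoding(Encoding::UTF_8))

# A lone 0xFF byte is never valid UTF-8, whatever the tag says.
puts safe_for_utf8_db?("\xFF".dup.force_encoding(Encoding::UTF_8))
```

On the database side, `SHOW client_encoding;` reports what the current connection is using.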

