29

I am new to postrges and want to sort varchar type columns. want to explain the problem with with below example:

table name: testsorting

   order       name
    1            b
    2            B
    3            a
    4            a1
    5            a11
    6            a2
    7            a20
    8            A
    9            a19

case sensitive sorting (which is default in postgres) gives:

select name from testsorting order by name;

    A
    B
    a
    a1
    a11
    a19
    a2
    a20
    b

case in-sensitive sorting gives:

select name from testsorting order by UPPER(name);

      A
      a
      a1
      a11
      a19
      a2
      a20
      B
      b

how can i make alphanumeric case in-sensitive sorting in postgres to get below order:

          a
          A
          a1
          a2
          a11
          a19
          a20
          b
          B

I wont mind the order for capital or small letters, but the order should be "aAbB" or "AaBb" and should not be "ABab"

Please suggest if you have any solution to this in postgres.

1
  • I'm adding a flat on: "case sensitive sorting (which is default in postgres)", that's not exactly right. default means "means the locale settings defined for the database." (Ref: PostgreSQL documentation collation.html). Example: on Windows, it may be case-insensitive. It is in my case on Windows 10. Commented Sep 6, 2019 at 2:01

6 Answers 6

11

My PostgreSQL sorts the way you want. The way PostgreSQL compares strings is determined by locale and collation. When you create database using createdb there is -l option to set locale. Also you can check how it is configured in your environment using psql -l:

[postgres@test]$ psql -l
List of databases
 Name    |  Owner   | Encoding |  Collate   |   Ctype    |   Access privileges
---------+----------+----------+------------+------------+-----------------------
 mn_test | postgres | UTF8     | pl_PL.UTF8 | pl_PL.UTF8 |

As you see my database uses Polish collation.

If you created database using other collation then you can use other collation in query just like:

SELECT * FROM sort_test ORDER BY name COLLATE "C";
SELECT * FROM sort_test ORDER BY name COLLATE "default";
SELECT * FROM sort_test ORDER BY name COLLATE "pl_PL";

You can list available collations by:

SELECT * FROM pg_collation;

EDITED:

Oh, I missed that 'a11' must be before 'a2'.

I don't think standard collation can solve alphanumeric sorting. For such sorting you will have to split string into parts just like in Clodoaldo Neto response. Another option that is useful if you frequently have to order this way is to separate name field into two columns. You can create trigger on INSERT and UPDATE that split name into name_1 and name_2 and then:

SELECT name FROM sort_test ORDER BY name_1 COLLATE "en_EN", name_2;

(I changed collation from Polish into English, you should use your native collation to sort letters like aącć etc)

Sign up to request clarification or add additional context in comments.

2 Comments

Thanks Michal. I checked psql -l but not showing me the configured locale. Using COLLATE "pl_PL" in SELECT worked and sorted the list in case in-sensitive however the problem still with the alphanumeric and "a2" is listed after "a11" and "a19". do you mean that using a proper COLLATE will solve alphanumeric sorting?
FYI, 'a2' before 'a11' is called natural sorting, and SQLite has a contrib for it: See sqlite.org/forum/info/65814bc11d873327
5

If the name is always in the 1 alpha followed by n numerics format then:

select name
from testsorting
order by
    upper(left(name, 1)),
    (substring(name from 2) || '0')::integer

4 Comments

I am getting ERROR: invalid input syntax for integer: "" with this. Do i need to check collate?? If I remove ::integer at end, i got output which is not exactly sorted. a11 and a19 listed before a2, which is wrong(A, a , a1,a11,a19,a2, a20, B, b). The correct sorted order should be A, a , a1,a2,a11,a19,a20, B, b.
Thanks Clodoaldo. I got this error because of null value, so need additional overhead to handle null value in the approach you suggested. any comments?
@Akhilesh A null value does not produce that error message. Check select (substring(null from 2) || '0')::integer. What is the exact error message?
ah atlast found what i wanted.
3

PostgreSQL uses the C library locale facilities for sorting strings. C library is provided by the host operating system. On Mac OS X or a BSD-family operating system,the UTF-8 locale definitions are broken and hence the results are as per collation "C".

image attached for collation results with ubuntu 15.04 as host OS

Check FAQ's on postgres wiki for more details : https://wiki.postgresql.org/wiki/FAQ

Comments

3

As far as I'm concerned, I have used the PostgreSQL module citext and used the data type CITEXT instead of TEXT. It makes both sort and search on these columns case insensitive.

The module can be installed with the SQL command CREATE EXTENSION IF NOT EXISTS citext;

Comments

1

I agree with Clodoaldo Neto's answer, but also don't forget to add the index

CREATE INDEX testsorting_name on testsorting(upper(left(name,1)), substring(name from 2)::integer)

3 Comments

This is the index, not the order by. It won't sort your list, Clodoaldo Neto's answer will. This will make the sort efficient.
This index probably won't be used for sorting. Only a unique index is used for sorting as far as I can see it here with 9.3. I'd happily be proven wrong, though...
Any B-tree index can be utilised for a sort, regardless of if it is unique: postgresql.org/docs/9.3/static/indexes-ordering.html. Obviously depends on an index existing that reflects the specific query being made.
1

Answer strongly inspired from this one.
By using a function it will be easier to keep it clean if you need it over different queries.

CREATE OR REPLACE FUNCTION alphanum(str anyelement)
   RETURNS anyelement AS $$
BEGIN
   RETURN (SUBSTRING(str, '^[^0-9]*'),
      COALESCE(SUBSTRING(str, '[0-9]+')::INT, -1) + 2000000);
END;
$$ LANGUAGE plpgsql IMMUTABLE;

Then you could use it this way:

SELECT name FROM testsorting ORDER BY alphanum(name);

Test:

WITH x(name) AS (VALUES ('b'), ('B'), ('a'), ('a1'),
   ('a11'), ('a2'), ('a20'), ('A'), ('a19'))
SELECT name, alphanum(name) FROM x ORDER BY alphanum(name);

 name |  alphanum   
------+-------------
 a    | (a,1999999)
 A    | (A,1999999)
 a1   | (a,2000001)
 a2   | (a,2000002)
 a11  | (a,2000011)
 a19  | (a,2000019)
 a20  | (a,2000020)
 b    | (b,1999999)
 B    | (B,1999999)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.