Sorting records from Oracle with multiple decimal points (.)

Question

UPDATE:

ORACLE VERSION 10G

I have a list of records in Oracle as follows; these are actually sections of various books.
The records are generated in the format below:

[main topic].[sub topic].[first level section] ..... .[last level section]

Sections
--------
1
7.1
6.2 
7.1
7.4
6.8.3
6.8.2
10
1.1
7.6
6.1
11
8.3
8.5
1.1.2
6.4
6.6
8.4
1.1.6
6.8.1
7.7.1
7.5
7.3

I want to order these numerically, level by level, but as the field is not a numeric datatype the sort results in something like this:

1
10
1.1
1.1.2
1.1.6
....
.....
8.5

How can I sort them? I am unable to convert them to numbers because of the multiple decimal points.

Is there any function in Oracle that supports such a sorting technique?

Is the number of '.' symbols in the string fixed (i.e. no more than two), or can it not be determined?
– Mikhail, Commented Jan 9, 2014 at 12:36
It can be more than 2, at most 6, and very rarely (almost never) more than that... I just kept the example simple.
– Sangeet Menon, Commented Jan 9, 2014 at 12:38
Ideally, long term, split these into their own columns/tables. Storing data like this is a violation of 1NF.
– Clockwork-Muse, Commented Jan 10, 2014 at 6:39

Vincent Malgrat · Accepted Answer · 2014-01-28 15:56:09Z

7

When the maximum depth is known, you can split the section into sub-sections:

SELECT SECTION FROM DATA
 ORDER BY to_number(regexp_substr(SECTION, '[^.]+', 1, 1)) NULLS FIRST,
          to_number(regexp_substr(SECTION, '[^.]+', 1, 2)) NULLS FIRST,
          to_number(regexp_substr(SECTION, '[^.]+', 1, 3)) NULLS FIRST;

SECTION
-------
1
1.1
1.1.2
1.1.6
6.1
6.2
[...]
8.5
10
11
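To see why NULLS FIRST matters in the query above, it can help to look at what REGEXP_SUBSTR returns for each level. This is only a minimal sketch using a literal value rather than the DATA table; the aliases p1 to p4 are illustrative and not part of the answer:

```sql
-- Each call extracts the n-th run of non-dot characters from the string.
-- Asking for a level the section does not have yields NULL.
SELECT regexp_substr('6.8.3', '[^.]+', 1, 1) AS p1,
       regexp_substr('6.8.3', '[^.]+', 1, 2) AS p2,
       regexp_substr('6.8.3', '[^.]+', 1, 3) AS p3,
       regexp_substr('6.8.3', '[^.]+', 1, 4) AS p4   -- NULL: there is no fourth level
  FROM dual;
```

A parent such as 6.8 has NULL where 6.8.3 has 3, so NULLS FIRST is what places the parent ahead of its children. Oracle's default for an ascending sort is NULLS LAST.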

If the maximum depth of sub-sections is unknown (and each section number is presumably less than a couple of hundred on 8-bit character databases, or less than a few thousand on ANSI-character databases), you could define a function that converts your unsortable digits into sortable characters:

CREATE OR REPLACE FUNCTION order_section (p_section VARCHAR2)
   RETURN VARCHAR2 IS
   l_result VARCHAR2(4000);
BEGIN
   -- regexp_count gives the number of dot-separated parts
   FOR i IN 1..regexp_count(p_section, '[^.]+') LOOP
      -- map each part to a single character: 1 -> 'A', 2 -> 'B', ...
      l_result := l_result
                  || CASE WHEN i > 1 THEN '.' END
                  || CHR(64+regexp_substr(p_section, '[^.]+', 1, i));
   END LOOP;
   RETURN l_result;
END;
/

Function created

SELECT SECTION, order_section(SECTION)
  FROM DATA
 ORDER BY 2;

SECTION ORDER_SECTION(SECTION)
------- -------------------------
1       A
1.1     A.A
1.1.2   A.A.B
1.1.6   A.A.F
6.1     F.A
6.2     F.B
[...]
8.5     H.E
10      J
11      K

edited Jan 28, 2014 at 15:56

answered Jan 9, 2014 at 12:39

Vincent Malgrat


6 Comments

Sangeet Menon Over a year ago

I cannot resolve regexp_count when compiling; I get PLS-00201: identifier 'REGEXP_COUNT' must be declared. I am currently working on 10g. I tried it on 11i and it worked. I forgot to mention that I am on 10g.

Sangeet Menon Over a year ago

I finally got it to work in 10g by replacing REGEXP_COUNT (which works perfectly in 11i) with something like this: (LENGTH(p_section) - length(replace(p_section,'.',null))). Please add this alternative (for 10g users) to the answer. But now it returns something like (square box).A.A for 2005.1.1. Can this create any kind of problem?

Vincent Malgrat Over a year ago

You're right, regexp_count doesn't seem to work in PL/SQL on 10g (it works in SQL though). Your alternative is fine for 10g. The second method converts a number to a character, so you might run into problems if the number maps to an invalid character. A square is probably a character that you can't display properly on your client (chr(2005) is displayed as Õ in my UTF8 db).

Sangeet Menon Over a year ago

It failed when the section number went above 191: sections above 191 are ordered before the others. Any workaround? I posted the problem in your answer...

Vincent Malgrat Over a year ago

@SangeetMenon This fails because CHR(191) seems to be unsortable. If you have to order sections with this large number, you should use a function that orders each chapter correctly instead of CHR, for instance TO_CHAR(p_section, 'fm000000').
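Putting the suggestions from these comments together, a 10g-friendly variant might look like the sketch below. It is an assumption-laden illustration rather than the answer's own code: the function name order_section_padded is hypothetical, the part count is derived from the number of dots plus one (regexp_count is not used), and each part is zero-padded with the 'fm000000' mask mentioned above, which limits each level to 999999.

```sql
CREATE OR REPLACE FUNCTION order_section_padded (p_section VARCHAR2)
   RETURN VARCHAR2 IS
   l_result VARCHAR2(4000);
   l_parts  PLS_INTEGER;
BEGIN
   -- number of parts = number of dots + 1
   -- (NVL guards against LENGTH returning NULL for empty strings)
   l_parts := NVL(LENGTH(p_section), 0)
              - NVL(LENGTH(REPLACE(p_section, '.')), 0) + 1;
   FOR i IN 1 .. l_parts LOOP
      -- pad every part to six digits so that string order equals numeric order
      l_result := l_result
                  || CASE WHEN i > 1 THEN '.' END
                  || TO_CHAR(TO_NUMBER(REGEXP_SUBSTR(p_section, '[^.]+', 1, i)),
                             'fm000000');
   END LOOP;
   RETURN l_result;
END;
/

-- usage, assuming the DATA table and SECTION column from the answer
SELECT SECTION
  FROM DATA
 ORDER BY order_section_padded(SECTION);
```

Because every part has the same width, a dot is only ever compared with another dot or with the end of the string, so a parent section sorts ahead of its children.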

ThinkJet · Answer · 2014-01-10 06:25:03Z

Solution without regexp and functions (suppose t is a table with source data):

select * from t
order by
    (
      select 
        sum(
          to_number(substr(
                   sections,
                   decode(level,
                     1,1,
                     instr(sections, '.', 1, level-1)+1
                   ),
                   decode(instr(sections, '.', 1, level),
                     0, length(sections),
                     instr(sections, '.', 1, level) 
                     - 
                     decode(level,
                       1,1,
                       instr(sections, '.', 1, level-1)+1
                     )
                   )  
          )) 
          * power(1000, 10-level)
        )
      from dual
        connect by instr(sections,'.',1,level-1) > 0
    )

SQLFiddle example

The main idea is to calculate a number which indicates the priority of each row. Suppose we have the value 33.17.21.2. This string may be treated as a number in a hypothetical numeral system with base Q, much like the way an IPv4 address can be represented as a single number, and then converted to a numeric representation:
33*(Q^3) + 17*(Q^2) + 21*(Q^1) + 2*(Q^0)

For example, if Q=100 then the number from the example is

33*1000000 + 17*10000 + 21*100 + 2*1 = 33172102

The first trouble with this approach is that the number at each level must be less than the chosen Q value. This is by design and can't be eliminated.

The next is that we don't know how many levels there are in total: we have 7.1 and 2.2.2.2.2.2, and the shorter one must come first. Therefore the calculation starts from some fixed power N and then decreases the power of Q, so in the case of Q=100 and N=3 the sequence of multipliers starts with these numbers: 1000000, 10000, 100, 1, 1/100, 1/10000, 1/1000000, ...
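As a quick check of the arithmetic, the keys for a few values can be computed by hand with Q=100 and N=3. This is only a sketch with literal values; the aliases sec and sort_key are illustrative:

```sql
-- Q = 100, first multiplier Q^3; each further level uses the next lower power
SELECT '7.1' AS sec,
       7*POWER(100,3) + 1*POWER(100,2) AS sort_key                    -- 7010000
  FROM dual
UNION ALL
SELECT '7.1.1',
       7*POWER(100,3) + 1*POWER(100,2) + 1*POWER(100,1)               -- 7010100
  FROM dual
UNION ALL
SELECT '33.17.21.2',
       33*POWER(100,3) + 17*POWER(100,2) + 21*POWER(100,1) + 2*POWER(100,0)  -- 33172102
  FROM dual
 ORDER BY sort_key;
```

The shorter value 7.1 gets the smaller key because the missing third level contributes nothing to the sum, which is what puts it ahead of 7.1.1.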

In the code above Q=1000 and the multiplier is power(1000, 10-level), so the first level is multiplied by 1000^9, but this may be changed depending on the required parameters. The number of levels is limited by the chosen Q value and the precision of the Oracle NUMBER type. Theoretically it's possible to build an expression for longer strings by splitting the string into parts.

The rest of the code is just a hierarchical query that splits the string into a sequence of numbers.
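The splitting step can be looked at in isolation. The sketch below is not the answer's own expression: for brevity it swaps the INSTR/SUBSTR arithmetic for REGEXP_SUBSTR, and it works on a literal instead of the sections column.

```sql
-- CONNECT BY on DUAL generates one row per level;
-- generation stops once there is no further dot-separated part.
SELECT level AS lvl,
       regexp_substr('33.17.21.2', '[^.]+', 1, level) AS part
  FROM dual
CONNECT BY regexp_substr('33.17.21.2', '[^.]+', 1, level) IS NOT NULL;
```

This returns four rows, one for each of 33, 17, 21 and 2, which the outer expression then weights and sums.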

Update

The same approach can easily be used with strings alone: '20' comes before '8' because information about the second digit is missing. If we pad both values to some fixed length they are ordered as expected: '008' < '020', so it's possible to deal with strings only:

select * from t order by 
  (
    select
      listagg(
        lpad(
          substr(
            sections,
            decode( level,
              1,1,
              instr(sections, '.', 1, level-1)+1
            ),
            decode(instr(sections, '.', 1, level),
              0, length(sections),
              instr(sections, '.', 1, level)
              -
              decode(level,
                1, 1,
                instr(sections, '.', 1, level-1)+1
              )
            )
          ),
          8,'0'
        ),
        '-'
      ) within group (order by level)
    from dual
    connect by instr(sections,'.',1,level-1) > 0
  )

With the string length limit of 4000 chars, 9 digits on each level and a single separator symbol ('-' in the example above), it's possible to handle 400 levels of hierarchy.
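The effect of the padding is easy to try on its own. A minimal sketch with two literal values (the aliases val and padded are illustrative):

```sql
-- Unpadded, '20' sorts before '8' as a string; padded, '008' sorts before '020'.
SELECT val,
       lpad(val, 3, '0') AS padded
  FROM (SELECT '8' AS val FROM dual
        UNION ALL
        SELECT '20' FROM dual)
 ORDER BY padded;
```

Note that LISTAGG, used in the query above, was introduced in Oracle 11g Release 2, so on 10g the padded parts would have to be concatenated some other way, for example in a PL/SQL function.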

The main disadvantages of this method are memory consumption and comparison speed. On the other hand, the lack of a conversion to a number makes it compatible even with mixed chapter numbering, such as '13.3.a.vii' or 'III.A.13.2' (oops... Roman numerals are handled improperly).

If the numbering is decimal-only, the string variant can be compacted by translating the numbers to hexadecimal representation. With 4 hex symbols it's possible to handle 65536 numbers on each level, and with 8 symbols a full 32-bit number, which is more than enough for most applications.

select * from t order by 
  (
    select
      listagg(
        lpad(
          trim(to_char(
            to_number(substr(
              sections,
              decode( level,
                1,1,
                instr(sections, '.', 1, level-1)+1
              ),
              decode(instr(sections, '.', 1, level),
                0, length(sections),
                instr(sections, '.', 1, level)
                -
                decode(level,
                  1, 1,
                  instr(sections, '.', 1, level-1)+1
                )
              )
            )),
            'XXXXXXXX'
          )),
          4,'0'
        ),
        '-'
      ) within group (order by level)
    from dual
    connect by instr(sections,'.',1,level-1) > 0
  )
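The conversion at the heart of the query above can be checked on a few literals. This is just a sketch; the column aliases are illustrative:

```sql
-- TO_CHAR with an X mask returns the hex digits with leading blanks,
-- TRIM removes the blanks, and LPAD pads to a fixed width of 4.
SELECT lpad(trim(to_char(255,   'XXXXXXXX')), 4, '0') AS hex_255,    -- 00FF
       lpad(trim(to_char(4096,  'XXXXXXXX')), 4, '0') AS hex_4096,   -- 1000
       lpad(trim(to_char(65535, 'XXXXXXXX')), 4, '0') AS hex_65535   -- FFFF
  FROM dual;
```

One thing to watch: LPAD also truncates strings that are longer than the requested width, so with a width of 4 any value above 65535 would silently lose its last hex digits. A wider pad (up to 8) avoids that at the cost of longer keys.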

P.S. Of course, it's possible to use all of the expressions above in the select list to examine the calculated values instead of using them in the order by.

Wernfried Domscheit · Answer · 2014-01-09 13:12:02Z

2

In case the number of levels is fixed (e.g. max. 4) you can use this one:

ORDER BY 
    TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 1)) NULLS FIRST, 
    TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 2)) NULLS FIRST, 
    TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 3)) NULLS FIRST, 
    TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 4)) NULLS FIRST

edited Jan 9, 2014 at 13:12

answered Jan 9, 2014 at 12:37

Wernfried Domscheit

1 Comment

Wernfried Domscheit Over a year ago

Corrected it with NULLS FIRST

Mikhail · Answer · 2014-01-09 12:52:15Z

0

Here is the solution I ended up with for the general case (when the number of dots is not known): leave the first dot as it is and replace all the others with zeroes, so that you have just a floating-point number on which you can apply ORDER BY:

SELECT SECTIONS FROM T_TABLE
ORDER BY TO_NUMBER(SUBSTR(SECTIONS,0,DECODE(INSTR(SECTIONS,'.'),0,LENGTH(SECTIONS)+1,INSTR(SECTIONS,'.'))) ||
REPLACE(SUBSTR(SECTIONS,DECODE(INSTR(SECTIONS,'.'),0,LENGTH(SECTIONS)+1,INSTR(SECTIONS,'.'))),'.','0'))

It can be rewritten more elegantly using regular expressions, but I'm not really familiar with them, so I just used basic Oracle functions :)

answered Jan 9, 2014 at 12:52

Mikhail

Suyash Khandwe · Answer · 2014-01-09 13:52:55Z

0

You can try this out -

Best part - Don't need to worry about the depth of levels

SELECT
Section
FROM SectionData
ORDER BY
CAST (CASE WHEN CHARINDEX('.',Section) > 0
THEN SUBSTRING(Section,0,CHARINDEX('.',Section))
ELSE Section END AS INT)
,REPLACE(Section,'0',':')

How it works:-

So sort first based on the integer before the first dot.

Since your sections are of string type, the sorting is done based on ASCII codes. Also, the most significant part of your section is the first set of digits before the first dot.

This is your second sorting criterion.

Now, it's the 0 which would create all the problems, so replace the '0' with anything which has an ASCII value higher than that of 9.

I've tested it with some basic combinations (including higher depths) - go ahead and test it properly before using it.

answered Jan 9, 2014 at 13:52

Suyash Khandwe

2 Comments

Sangeet Menon Over a year ago

I am not sure on what data you tested it.... But I get ORA-01722: invalid number in Oracle. I have replaced SUBSTRING with SUBSTR and CHARINDEX with INSTR.

Suyash Khandwe Over a year ago

I tested with the same data as in the original question plus some dummy data which I entered. I believe some changes might be required for Oracle because I tested on SQL Server. Anyway, try putting the expressions below in the SELECT clause one by one to see what is being fetched by the substring:
CHARINDEX('.',Section)
SUBSTRING(Section,0,CHARINDEX('.',Section))
CAST (CASE WHEN CHARINDEX('.',Section) > 0 THEN SUBSTRING(Section,0,CHARINDEX('.',Section)) ELSE Section END AS INT)

Art · Answer · 2014-01-09 15:50:41Z

0

Simplest I think... Copy and run to see the output:

SELECT val FROM  --,to_number(trim(BOTH '.' FROM substr(val, 1, 2))) num_val,
(
 SELECT '1' val FROM dual
 UNION ALL
 SELECT '7.1' FROM dual
 UNION ALL
 SELECT '6.2' FROM dual
 UNION ALL
 SELECT '7.1' FROM dual
 UNION ALL
 SELECT '7.4' FROM dual
 UNION ALL
 SELECT '6.8.3' FROM dual
 UNION ALL
 SELECT '6.8.2' FROM dual
 UNION ALL
 SELECT '10' FROM dual
 UNION ALL
 SELECT '1.1' FROM dual
 UNION ALL
 SELECT '7.6' FROM dual
 UNION ALL
 SELECT '6.1' FROM dual
 UNION ALL
 SELECT '11' FROM dual
 UNION ALL
 SELECT '8.3' FROM dual
 UNION ALL
 SELECT '8.5' FROM dual
 UNION ALL
 SELECT '1.1.2' FROM dual
 UNION ALL
 SELECT '6.4' FROM dual
 UNION ALL
 SELECT '6.6' FROM dual
 UNION ALL
 SELECT '8.4' FROM dual
 UNION ALL
 SELECT '1.1.6' FROM dual
 UNION ALL
 SELECT '6.8.1' FROM dual
 UNION ALL
 SELECT '7.7.1' FROM dual
 UNION ALL
 SELECT '7.5' FROM dual
 UNION ALL
 SELECT '7.3' FROM dual
)
ORDER BY to_number(trim(BOTH '.' FROM substr(val, 1, 2)))

answered Jan 9, 2014 at 15:50

Art

4 Comments

ThinkJet Over a year ago

What about '123.17.32.11.8' ? And second level not ordered.

Art Over a year ago

I do not see '123.17.32.11.8' in the input. I went with what I see, not ip address numbers or such. If he puts it in example then answer would be different.

ThinkJet Over a year ago

As stated in the question, the source data is sections of various books, so it's possible to have more than 99 parts in a book. E.g. "Moby-Dick; or, The Whale" has 135 chapters ... Again, what about the second level?

Art Over a year ago

It is just an example. And again, it is what I see, not read. Not sure what is second level, sorry.
