9

UPDATE:

ORACLE VERSION 10G

I have a list of records in Oracle as follows; these are actually sections of various books.
The records are generated in the following format:

[main topic].[sub topic].[first level section] ..... .[last level section]

Sections
--------
1
7.1
6.2 
7.1
7.4
6.8.3
6.8.2
10
1.1
7.6
6.1
11
8.3
8.5
1.1.2
6.4
6.6
8.4
1.1.6
6.8.1
7.7.1
7.5
7.3

I want to order them as follows:

 1
 1.1
 1.1.2
 1.1.6
 6.2    
 6.4    
 6.5    
 6.6    
 6.7    
 6.8.1    
 6.8.2    
 6.8.3    
 7.2    
 7.3    
 7.4    
 7.5    
 7.6    
 7.7.1    
 7.7.2    
 8.3    
 8.4    
 8.5
 10

But as the field is not a numeric data type, sorting results in something like this:

1
10
1.1
1.1.2
1.1.6
....
.....
8.5

How can I sort them? I am unable to convert them to numbers because they contain more than one decimal point.

Is there any function in Oracle that supports such a sorting technique?

3
  • Is the number of '.' symbols in the string fixed (i.e. no more than two), or can it not be determined? Commented Jan 9, 2014 at 12:36
  • It can be more than 2, at most 6, and very rarely (almost never) more than that... I just kept the example simple. Commented Jan 9, 2014 at 12:38
  • Ideally, long term, split these into their own columns/tables. Storing data like this is a violation of 1NF. Commented Jan 10, 2014 at 6:39

6 Answers

7

When the maximum depth is known, you can split the section into sub-sections:

SELECT SECTION FROM DATA
 ORDER BY to_number(regexp_substr(SECTION, '[^.]+', 1, 1)) NULLS FIRST,
          to_number(regexp_substr(SECTION, '[^.]+', 1, 2)) NULLS FIRST,
          to_number(regexp_substr(SECTION, '[^.]+', 1, 3)) NULLS FIRST;

SECTION
-------
1
1.1
1.1.2
1.1.6
6.1
6.2
[...]
8.5
10
11
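To see why this works, here is a minimal Python sketch of the same sort key (Python is used only for illustration, and `section_key` is a hypothetical name): each section is split on '.' and the pieces are compared as integers. Python compares tuples element by element and puts a shorter prefix first, which is the role NULLS FIRST plays in the query.

```python
def section_key(section: str) -> tuple[int, ...]:
    """Split '6.8.3' into (6, 8, 3) so the components compare numerically."""
    return tuple(int(part) for part in section.split("."))


sections = ["1", "7.1", "6.2", "10", "1.1", "6.8.3", "6.8.2", "11", "1.1.2", "8.5"]
# A tuple that is a prefix of another sorts first: (1,) < (1, 1) < (1, 1, 2).
print(sorted(sections, key=section_key))
```

Unlike the fixed three-column ORDER BY, the tuple key is not limited to a known depth.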

If the maximum depth of sub-sections is unknown, you could define a function that converts your unsortable digits into sortable characters. This works as long as each section number stays below a couple hundred on databases with an 8-bit character set, or below a few thousand on multibyte (e.g. UTF8) databases:

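CREATE OR REPLACE FUNCTION order_section (p_section VARCHAR2)
   RETURN VARCHAR2 IS
   l_result VARCHAR2(4000);
BEGIN
   -- REGEXP_COUNT requires 11g; on 10g count the components with:
   -- LENGTH(p_section) - LENGTH(REPLACE(p_section, '.', NULL)) + 1
   FOR i IN 1..regexp_count(p_section, '[^.]+') LOOP
      -- Map each component n to CHR(64 + n): 1 -> 'A', 2 -> 'B', ...
      l_result := l_result
                  || CASE WHEN i > 1 THEN '.' END
                  || CHR(64 + TO_NUMBER(regexp_substr(p_section, '[^.]+', 1, i)));
   END LOOP;
   RETURN l_result;
END;
/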

Function created

SELECT SECTION, order_section(SECTION)
  FROM DATA
 ORDER BY 2;

SECTION ORDER_SECTION(SECTION)
------- -------------------------
1       A
1.1     A.A
1.1.2   A.A.B
1.1.6   A.A.F
6.1     F.A
6.2     F.B
[...]
8.5     H.E
10      J
11      K
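A Python sketch of the same mapping, under stated assumptions: `order_section` here only mirrors the PL/SQL function, Python's string comparison by code point stands in for a binary (non-linguistic) sort, and the optional `single_byte` flag models, as an assumption, CHR wrapping modulo 256 on a single-byte database character set. It reproduces the ordering shown above.

```python
def order_section(section: str, single_byte: bool = False) -> str:
    """Map each component n to the character with code 64 + n ('1' -> 'A', '2' -> 'B').

    With single_byte=True the code is reduced modulo 256, an assumed model of
    CHR on a single-byte database character set.
    """
    chars = []
    for part in section.split("."):
        code = 64 + int(part)
        if single_byte:
            code %= 256
        chars.append(chr(code))
    return ".".join(chars)


sections = ["10", "1.1.6", "6.1", "1", "11", "1.1", "8.5", "1.1.2", "6.2"]
# Python compares str by code point, like a binary (non-linguistic) sort.
print(sorted(sections, key=order_section))

# Component 192 gives code 256, which wraps to 0 and sorts before 'A'.
print(sorted(["191", "192", "1"], key=lambda s: order_section(s, single_byte=True)))
```

The second call shows why a wrap-around would push sections above 191 to the front.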

6 Comments

I cannot resolve regexp_count when compiling; I get PLS-00201: identifier 'REGEXP_COUNT' must be declared. I am currently working on 10g; I tried it on 11g and it was working. I forgot to mention that I am on 10g.
I finally got it to work on 10g: I replaced REGEXP_COUNT (which works perfectly on 11g) with something like (LENGTH(p_section) - LENGTH(REPLACE(p_section, '.', NULL)) + 1). Do add this alternative (for 10g users) to the answer. But now it returns something like (square box).A.A for 2005.1.1. Can this create any kind of problem?
You're right, REGEXP_COUNT only exists from 11g onward, so it isn't available on 10g. Your alternative is fine for 10g. The second method converts a number to a character, so you might run into problems if the number does not map to a valid character. A square is probably a character that you can't display properly on your client (chr(2005) is displayed as Õ in my UTF8 db).
It failed when a section number went above 191: sections above 191 get ordered first. Any workaround? I posted the problem in your answer...
@SangeetMenon This most likely fails because 64 + 192 = 256 no longer fits in a single byte: on single-byte character sets CHR wraps around modulo 256, so section 192 becomes CHR(0) and sorts first. If you have to order sections with numbers this large, encode each component with something that sorts correctly instead of CHR, for instance TO_CHAR(component, 'fm000000').
4

A solution without regular expressions or user-defined functions (assuming t is a table holding the source data):

select * from t
order by
    (
      select 
        sum(
          to_number(substr(
                   sections,
                   decode(level,
                     1,1,
                     instr(sections, '.', 1, level-1)+1
                   ),
                   decode(instr(sections, '.', 1, level),
                     0, length(sections),
                     instr(sections, '.', 1, level) 
                     - 
                     decode(level,
                       1,1,
                       instr(sections, '.', 1, level-1)+1
                     )
                   )  
          )) 
          * power(1000, 10-level)
        )
      from dual
        connect by instr(sections,'.',1,level-1) > 0
    ) 

SQLFiddle example

The main idea is to calculate a number that indicates the priority of each row. Suppose we have the value 33.17.21.2. This string may be treated as a number in a hypothetical numeral system with base Q, much like dotted-decimal notation writes an IPv4 address as four base-256 digits, and then converted to a numeric representation:
33*(Q^3) + 17*(Q^2) + 21*(Q^1) + 2*(Q^0)

For example, if Q=100, the number for the example is

33*1000000 + 17*10000 + 21*100 + 2*1 = 33172102

The first problem with this approach is that the number at each level must be less than the chosen Q value. This is inherent in the design and cannot be eliminated.

The next problem is that we don't know how many levels there are: we may have both 7.1 and 2.2.2.2.2.2, and the first level must carry the same weight in both, so that a shorter string compares correctly against a longer one. Therefore the calculation starts from a fixed power N for the first level and then lowers the power of Q by one at each level; with Q=100 and N=3 the sequence of multipliers is 1000000, 10000, 100, 1, 1/100, 1/10000, 1/1000000, ...

In the code above Q=1000 and N=9 (the exponent is 10-level, and level starts at 1), but these may be changed depending on the requirements. The number of levels is limited by the chosen Q value and the precision of Oracle's NUMBER type. Theoretically it's possible to build an expression for longer strings by splitting the string into parts.

The rest of the code is just a hierarchical query that splits the string into a sequence of numbers.
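The weighting scheme can be sketched in Python (illustration only; `positional_key` is a hypothetical name). The defaults q=1000 and n=9 mirror power(1000, 10-level) in the query, and `fractions.Fraction` keeps the fractional weights of deep levels exact, whereas the query relies on Oracle's NUMBER precision.

```python
from fractions import Fraction


def positional_key(section: str, q: int = 1000, n: int = 9) -> Fraction:
    """Weight level 1 by q**n, level 2 by q**(n - 1), and so on.

    Weights become fractions (1/q, 1/q**2, ...) once the exponent goes
    negative, so deep levels still contribute to the ordering. Every
    component must be smaller than q, otherwise neighbouring levels overlap.
    """
    total = Fraction(0)
    for level, part in enumerate(section.split("."), start=1):
        value = int(part)
        if value >= q:
            raise ValueError(f"component {value} is not below base {q}")
        # Exponent n - (level - 1): with n = 9 this is 10 - level, as in the query.
        total += value * Fraction(q) ** (n - (level - 1))
    return total


# Worked example from the text: Q = 100, N = 3.
print(positional_key("33.17.21.2", q=100, n=3))  # 33172102

sections = ["1", "10", "1.1", "8.5", "1.1.2", "6.8.3", "11", "6.2"]
print(sorted(sections, key=positional_key))
```

The ValueError makes the first limitation (every component below Q) explicit instead of silently mis-ordering.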

Update

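The same approach can easily be applied to plain strings: '20' sorts before '8' because the strings carry no information about the position of each digit. If we left-pad both values to a fixed length, they sort as expected ('008' < '020'), so it's possible to work with strings only (note that LISTAGG is available only from Oracle 11g Release 2, so this query and the next one will not run on 10g):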

select * from t order by 
  (
    select
      listagg(
        lpad(
          substr(
            sections,
            decode( level,
              1,1,
              instr(sections, '.', 1, level-1)+1
            ),
            decode(instr(sections, '.', 1, level),
              0, length(sections),
              instr(sections, '.', 1, level)
              -
              decode(level,
                1, 1,
                instr(sections, '.', 1, level-1)+1
              )
            )
          ),
          8,'0'
        ),
        '-'
      ) within group (order by level)
    from dual
    connect by instr(sections,'.',1,level-1) > 0
  )

With the 4000-character string limit and a single separator symbol ('-' in the example above), padding each level to 9 digits makes it possible to handle 400 levels of hierarchy; with the 8-digit padding used in the query, about 444 levels.

The main disadvantages of this method are memory consumption and comparison speed. On the other hand, because nothing is converted to a number, it also works with mixed chapter numbering such as '13.3.a.vii' or 'III.A.13.2' (oops... Roman numerals are not ordered properly).
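A Python sketch of the padded-string key (illustration only; `padded_key` is a hypothetical name). `str.rjust(width, '0')` plays the role of LPAD(..., 8, '0') and the join plays the role of LISTAGG with a '-' separator. Because nothing is converted to a number, mixed numbering is accepted as plain text.

```python
def padded_key(section: str, width: int = 8, sep: str = "-") -> str:
    """Left-pad every component with '0' to a fixed width and join with sep.

    rjust plays the role of LPAD(component, 8, '0') and the join the role of
    LISTAGG(..., '-'). Unlike Oracle's LPAD, rjust never truncates a
    component that is longer than width.
    """
    return sep.join(part.rjust(width, "0") for part in section.split("."))


sections = ["10", "8", "1.1.6", "20", "1.1", "1", "6.8.3", "1.1.2"]
print(sorted(sections, key=padded_key))

# No numeric conversion, so mixed numbering is accepted as plain text.
print(padded_key("13.3.a.vii"))
```

Since every component has the same width, the separator is only ever compared against another separator or against the end of a shorter string, which is why prefixes sort first.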

If the numbering is purely decimal, the string variant can be made more compact by converting the numbers to hexadecimal. With 4 hex digits it's possible to handle 65536 values (0 to 65535) on each level, and with 8 digits a full 32-bit unsigned number, which is more than enough for most applications.

select * from t order by 
  (
    select
      listagg(
        lpad(
          trim(to_char(
            to_number(substr(
              sections,
              decode( level,
                1,1,
                instr(sections, '.', 1, level-1)+1
              ),
              decode(instr(sections, '.', 1, level),
                0, length(sections),
                instr(sections, '.', 1, level)
                -
                decode(level,
                  1, 1,
                  instr(sections, '.', 1, level-1)+1
                )
              )
            )),
            'XXXXXXXX'
          )),
          4,'0'
        ),
        '-'
      ) within group (order by level)
    from dual
    connect by instr(sections,'.',1,level-1) > 0
  ) 

P.S. Of course, any of the expressions above can also be used in the select list to examine the calculated values, instead of using them in ORDER BY.
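The hexadecimal variant can be sketched in Python as well (illustration only; `hex_key` is a hypothetical name). Fixed-width uppercase hex sorts like the numbers themselves because '0'-'9' come before 'A'-'F' in ASCII; the sketch raises an error for values that do not fit, where the SQL version would let LPAD cut the string to 4 characters.

```python
def hex_key(section: str, width: int = 4, sep: str = "-") -> str:
    """Encode each decimal component as zero-padded uppercase hex and join with sep."""
    parts = []
    for part in section.split("."):
        value = int(part)
        if value >= 16 ** width:
            # The SQL version would silently truncate here (LPAD cuts to width).
            raise ValueError(f"{value} does not fit in {width} hex digits")
        parts.append(format(value, f"0{width}X"))
    return sep.join(parts)


print(hex_key("255.16"))
print(sorted(["4096", "9", "255", "10", "1.10", "1.9"], key=hex_key))
```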


2

If the number of levels is fixed (e.g. at most 4), you can use this:

ORDER BY 
    TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 1)) NULLS FIRST, 
    TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 2)) NULLS FIRST, 
    TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 3)) NULLS FIRST, 
    TO_NUMBER(REGEXP_SUBSTR(Sections, '\d+', 1, 4)) NULLS FIRST

1 Comment

Corrected it with NULLS FIRST
0

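Here is the solution I ended up with for the general case (when the number of dots is not known): keep the integer part and the first dot, then append the rest of the string starting at that dot with every dot replaced by a zero (so 1.1.2 becomes 1.0102). This gives you a plain decimal number you can ORDER BY. Note that it only orders correctly while every component after the first is a single digit; for example, 1.1 and 1.10 produce the same number: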

SELECT SECTIONS FROM T_TABLE
ORDER BY TO_NUMBER(SUBSTR(SECTIONS,0,DECODE(INSTR(SECTIONS,'.'),0,LENGTH(SECTIONS)+1,INSTR(SECTIONS,'.'))) ||
REPLACE(SUBSTR(SECTIONS,DECODE(INSTR(SECTIONS,'.'),0,LENGTH(SECTIONS)+1,INSTR(SECTIONS,'.'))),'.','0'))

It could be rewritten more elegantly using regular expressions, but I'm not really familiar with them, so I just used basic Oracle functions :)
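A Python sketch that reproduces the transformation so the resulting numbers can be inspected (illustration only; `dots_to_zeros` is a hypothetical name, and `decimal.Decimal` stands in for Oracle's TO_NUMBER). It mirrors SUBSTR/INSTR in the query: the text up to and including the first dot, followed by the remainder starting at that dot with every '.' turned into '0'.

```python
from decimal import Decimal


def dots_to_zeros(section: str) -> Decimal:
    """Keep the text up to and including the first dot, then append the rest
    of the string (starting at that dot) with every '.' turned into '0'."""
    first = section.find(".")
    if first == -1:
        return Decimal(section)  # no dot: the whole string is the number
    return Decimal(section[: first + 1] + section[first:].replace(".", "0"))


print(dots_to_zeros("1.1.2"))   # 1.0102
print(dots_to_zeros("1.1") == dots_to_zeros("1.10"))  # True: 1.01 == 1.010
```

The second line shows the single-digit limitation: two different sections collapse to the same value.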


0

You can try this out.

Best part: you don't need to worry about the depth of levels.

SELECT
Section
FROM SectionData
ORDER BY
CAST (CASE WHEN CHARINDEX('.',Section) > 0
THEN SUBSTRING(Section,0,CHARINDEX('.',Section))
ELSE Section END AS INT)
,REPLACE(Section,'0',':')

How it works:

First, sort on the integer before the first dot: that is the most significant part of your section.

Second, sort on the section string itself. Since your Sections are strings, this comparison is done character by character on their ASCII codes.

Here it is the 0 that creates all the problems, so replace '0' with a character whose ASCII value is higher than that of '9' (':' is the next one).

I've tested it with some basic combinations (including higher depths) - go ahead and test it properly before using it.
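A Python sketch of this two-part key (illustration only; `two_part_key` is a hypothetical name, and Python's code-point string comparison is assumed to stand in for the column collation, which in SQL Server may differ). The last line shows that lower levels are still compared as text, so 1.10 lands before 1.9.

```python
def two_part_key(section: str) -> tuple[int, str]:
    """First the integer before the first dot, then the whole string with '0' -> ':'."""
    head = section.split(".", 1)[0]
    # ':' (ASCII 58) is the character right after '9' (ASCII 57).
    return int(head), section.replace("0", ":")


data = ["1", "7.1", "6.2", "10", "1.1", "6.8.3", "11", "1.1.2", "7.7.1", "8.5"]
print(sorted(data, key=two_part_key))

# Second-level numbers are compared as text, so 1.10 lands before 1.9.
print(sorted(["1.9", "1.10"], key=two_part_key))
```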

2 Comments

I am not sure what data you tested it on... but I get ORA-01722: invalid number in Oracle. I replaced SUBSTRING with SUBSTR and CHARINDEX with INSTR.
I tested it with the same data as in the original question plus some dummy data I entered. I believe some changes might be required for Oracle, because I tested on SQL Server. Anyway, try putting the expressions below in the SELECT clause one by one to see what the substring fetches: CHARINDEX('.',Section); SUBSTRING(Section,0,CHARINDEX('.',Section)); CAST (CASE WHEN CHARINDEX('.',Section) > 0 THEN SUBSTRING(Section,0,CHARINDEX('.',Section)) ELSE Section END AS INT)
0

Simplest I think... Copy and run to see the output:

SELECT val FROM  --,to_number(trim(BOTH '.' FROM substr(val, 1, 2))) num_val,
(
 SELECT '1' val FROM dual
 UNION ALL
 SELECT '7.1' FROM dual
 UNION ALL
 SELECT '6.2' FROM dual
 UNION ALL
 SELECT '7.1' FROM dual
 UNION ALL
 SELECT '7.4' FROM dual
 UNION ALL
 SELECT '6.8.3' FROM dual
 UNION ALL
 SELECT '6.8.2' FROM dual
 UNION ALL
 SELECT '10' FROM dual
 UNION ALL
 SELECT '1.1' FROM dual
 UNION ALL
 SELECT '7.6' FROM dual
 UNION ALL
 SELECT '6.1' FROM dual
 UNION ALL
 SELECT '11' FROM dual
 UNION ALL
 SELECT '8.3' FROM dual
 UNION ALL
 SELECT '8.5' FROM dual
 UNION ALL
 SELECT '1.1.2' FROM dual
 UNION ALL
 SELECT '6.4' FROM dual
 UNION ALL
 SELECT '6.6' FROM dual
 UNION ALL
 SELECT '8.4' FROM dual
 UNION ALL
 SELECT '1.1.6' FROM dual
 UNION ALL
 SELECT '6.8.1' FROM dual
 UNION ALL
 SELECT '7.7.1' FROM dual
 UNION ALL
 SELECT '7.5' FROM dual
 UNION ALL
 SELECT '7.3' FROM dual
)
ORDER BY to_number(trim(BOTH '.' FROM substr(val, 1, 2)))

4 Comments

What about '123.17.32.11.8'? And the second level is not ordered.
I do not see '123.17.32.11.8' in the input. I went with what I see, not IP addresses or such. If he puts it in the example, then the answer would be different.
As stated in the question, the source data are sections of various books, so it's possible to have more than 99 parts in a book. E.g. "Moby-Dick; or, The Whale" has 135 chapters... Again, what about the second level?
It is just an example. And again, I went with what I see, not what I read. Not sure what the second level is, sorry.
