This might be related to this question, but in my current SAS Viya Stable 2024.12 environment I get an empty error message and the SAS Studio session context breaks when I run a SELECT DISTINCT in a PROC SQL step.


My table was loaded from a Postgres DB into WORK with SELECT *. It has 1.18 million rows; Col1 and Col3 are of type varchar(64), Col2 is of type varchar(256), and the other 27 columns are not of interest. (The .sas7bdat file exported from WORK is about 283 MB.)


If I run

PROC SQL _method;
   CREATE TABLE WORK.OUTPUT AS SELECT
      t1.col1, t1.col2, t1.col3
   FROM WORK.INPUT AS t1;
QUIT;

it works like a charm and takes only a few seconds, with the following log:

NOTE: SQL execution methods chosen are:
      sqxcrta
          sqxsrc( WORK.INPUT(alias = T1) )
NOTE: Compressing data set WORK.OUTPUT decreased size by 95.76 percent. 
      Compressed is 1,192 pages; un-compressed would require 28,144 pages.
NOTE: Table WORK.OUTPUT created, with 1,182,028 rows and 3 columns.
NOTE: PROCEDURE SQL used (Total process time):
      real time           1.55 Seconds
      user cpu time       1.47 Seconds
      system cpu time     0.12 Seconds
      memory               6,032.03k
      OS Memory           30,448.00k
      Timestamp           ...
      Step Count                        12  Switch Count  0
      Page Faults                             0
      Page Reclaims                         244
      Page Swaps                              0
      Voluntary Context Switches             29
      Involuntary Context Switches            8
      Block Input Operations                  0
      Block Output Operations           153,360

But if I use "SELECT DISTINCT", it raises an empty error and the log stops at:

NOTE: SQL execution methods chosen are:
      sqxcrta
          sqxunqs
              sqxsrc( WORK.INPUT(alias = T1) )

This also happens if I SELECT DISTINCT only on t1.Col2 (or on a combination of four other varchar(64) columns...).


Apparently, the loaded varchar columns have quadrupled in size (256 → 1024), and I observed the following behavior:

  • SELECT DISTINCT t1.col2 - Fails
  • SELECT DISTINCT PUT(t1.col2, $256.) - Succeeds
  • SELECT DISTINCT PUT(t1.col2, $1024.) - Fails
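
The stored lengths can be checked against the dictionary tables. This is a minimal sketch, assuming the table is WORK.INPUT and the columns are named col1 to col3 as in the example above:

```sas
/* List the stored type, length and format of the three columns of interest. */
/* Table and column names are assumptions taken from the example above.      */
proc sql;
   select name, type, length, format
   from dictionary.columns
   where libname = 'WORK'
     and memname = 'INPUT'
     and upcase(name) in ('COL1', 'COL2', 'COL3');
quit;
```

If col2 is reported with a length of 1024 here, that would match the quadrupling described above.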

I tried monitoring and raising the memory limits, as shown below, and also ran the step on an older preliminary environment with Viya Stable 2024.08, where it works:

Environment         DISTINCT   MEMSIZE          MAXMEMQUERY
PreEnv Viya         Succeeds    2,147,483,648   268,435,456
Current Viya Dev    Fails       4,294,967,296   268,435,456
Current Viya Test   Fails      21,474,836,480   268,435,456

Log of the successful run:

NOTE: SQL execution methods chosen are:
      sqxcrta
          sqxunqs
              sqxsrc( WORK.INPUT(alias = T1) )
NOTE: Table WORK.OUTPUT created, with 158457 rows and 3 columns.
NOTE: PROCEDURE SQL used (Total process time):
      real time           38.33 Seconds
      user cpu time       3.20 Seconds
      system cpu time     7.98 Seconds
      memory              1,052,242.14k
      OS Memory           1,082,636.00k
      Timestamp           ...
      Step Count                        16  Switch Count  0
      Page Faults                                0
      Page Reclaims                        309,992
      Page Swaps                                 0
      Voluntary Context Switches            55,732
      Involuntary Context Switches           4,160
      Block Input Operations            10,254,992
      Block Output Operations            5,322,656
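
The MEMSIZE and MAXMEMQUERY values in the table can be read in each environment with PROC OPTIONS. This is only a sketch; SORTSIZE and COMPRESS are included on the assumption that they might matter for the sort behind DISTINCT and for the compression note in the first log:

```sas
/* Print the current value of each option and how it was set. */
/* SORTSIZE and COMPRESS are added as a guess, not a finding. */
proc options option=(memsize maxmemquery sortsize compress) value;
run;
```

Running this in both the working and the failing environment would show whether any of these settings differ.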

So it does not seem to be a memory issue; maybe it is the compression of the data? I'm not an admin, so I have limited insight into the environment configuration.


I know I could just use DATA steps or similar, but this is only a minimal example; the error happens with different tables/flows/steps. In addition, we normally use SAS Studio flows to 'program', which generate their code automatically. However, that code also fails when copied into a separate SAS program, which is what the example shows.

We also expect the data size to keep growing, so this issue needs to be solved.


Does anyone have an idea which setting/config/property could be causing this issue and what can be done to prevent or eliminate it?

  • What does the SAS log show after you run this code? Commented Mar 24 at 18:09
  • Looks like PostgreSQL is having trouble sorting such a large number of long strings. Try copying the table to SAS first and then running the DISTINCT selection there. You will probably get better performance using PROC SORT instead of PROC SQL. Commented May 4 at 14:54