SQL: count number of distinct values in every column

Question

I need a query that will return a table where each column is the count of distinct values in the columns of another table.

I know how to count the distinct values in one column:

select count(distinct columnA) from table1;

I suppose that I could just make this a really long select clause:

select count(distinct columnA), count(distinct columnB), ... from table1;

but that isn't very elegant and it's hardcoded. I'd prefer something more flexible.

This is the elegant and simplest solution... do you mean "given a table name, give me distinct counts for each column in that table"? — gbn
– gbn, Commented Sep 1, 2009 at 17:09
Possible duplicate: stackoverflow.com/questions/1330692/distinct-pair-of-values-sql — Will Bickford
– Will Bickford, Commented Sep 1, 2009 at 17:12

Community · Accepted Answer · 2014-06-26 15:08:43Z

5

This code should give you all the columns in 'table1' with the respective distinct value count for each one as data.

DECLARE @TableName VarChar (Max) = 'table1'
DECLARE @SqlString VarChar (Max)

set @SqlString = (
  SELECT DISTINCT
    'SELECT ' + 
        RIGHT (ColumnList, LEN (ColumnList)-1) + 
      ' FROM ' + Table_Name
    FROM INFORMATION_SCHEMA.COLUMNS COL1
      CROSS AppLy (
        SELECT ', COUNT (DISTINCT [' + COLUMN_NAME + ']) AS ' + '''' + COLUMN_NAME + ''''
          FROM INFORMATION_SCHEMA.COLUMNS COL2
          WHERE COL1.TABLE_NAME = COL2.TABLE_NAME
          FOR XML PATH ('')
      ) TableColumns (ColumnList)
    WHERE
      1=1 AND 
      COL1.TABLE_NAME = @TableName
)

EXECUTE (@SqlString)

edited Jun 26, 2014 at 15:08

CommunityBot

11 silver badge

answered Sep 1, 2009 at 17:09

Raj More

48.2k34 gold badges139 silver badges199 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Galen Over a year ago

Why WHERE 1=1?

Raj More Over a year ago

@galen It keeps all my AND clauses at the same level of indentation. Purely aesthetic.

KM. · Accepted Answer · 2009-09-01 18:16:46Z

4

try this (sql server 2005 syntax):

DECLARE @YourTable table (col1  varchar(5)
                         ,col2  int
                         ,col3  datetime
                         ,col4  char(3)
                         )

insert into @YourTable values ('abcdf',123,'1/1/2009','aaa')
insert into @YourTable values ('aaaaa',456,'1/2/2009','bbb')
insert into @YourTable values ('bbbbb',789,'1/3/2009','aaa')
insert into @YourTable values ('ccccc',789,'1/4/2009','bbb')
insert into @YourTable values ('aaaaa',789,'1/5/2009','aaa')
insert into @YourTable values ('abcdf',789,'1/6/2009','aaa')


;with RankedYourTable AS
(
SELECT
    ROW_NUMBER() OVER(PARTITION by col1 order by col1) AS col1Rank
        ,ROW_NUMBER() OVER(PARTITION by col2 order by col2) AS col2Rank
        ,ROW_NUMBER() OVER(PARTITION by col3 order by col3) AS col3Rank
        ,ROW_NUMBER() OVER(PARTITION by col4 order by col4) AS col4Rank
    FROM @YourTable
)
SELECT
    SUM(CASE WHEN      col1Rank=1 THEN 1 ELSE 0 END) AS col1DistinctCount
        ,SUM(CASE WHEN col2Rank=1 THEN 1 ELSE 0 END) AS col2DistinctCount
        ,SUM(CASE WHEN col3Rank=1 THEN 1 ELSE 0 END) AS col3DistinctCount
        ,SUM(CASE WHEN col4Rank=1 THEN 1 ELSE 0 END) AS col4DistinctCount
    FROM RankedYourTable

OUTPUT:

col1DistinctCount col2DistinctCount col3DistinctCount col4DistinctCount
----------------- ----------------- ----------------- -----------------
4                 3                 6                 2

(1 row(s) affected)

answered Sep 1, 2009 at 18:16

KM.

104k34 gold badges183 silver badges214 bronze badges

1 Comment

Ludovic Aubert Over a year ago

Simple query to help generate yours painlessly:SELECT 'ROW_NUMBER() OVER(PARTITION BY ' + COLUMN_NAME + ' ORDER BY ' + COLUMN_NAME + ') AS ' + COLUMN_NAME + 'Rank,', 'SUM(CASE WHEN ' + COLUMN_NAME + 'Rank=1 THEN 1 ELSE 0 END) AS ' + COLUMN_NAME + 'DistinctCount,' FROM information_schema.COLUMNS WHERE TABLE_NAME='YourTable';

Amy B · Accepted Answer · 2009-09-01 17:14:35Z

0

and it's hardcoded.

It is not hardcoding to provide a field list for a sql statement. It's common and acceptable practice.

answered Sep 1, 2009 at 17:14

Amy B

110k21 gold badges140 silver badges191 bronze badges

4 Comments

ijw Over a year ago

...As is programmatically creating the SQL, as far as it goes (providing your users never supply the values you put in it - fine in this question, you have the column list somewhere).

Amy B Over a year ago

if I was going to code-gen some SQL, and if I was on MSSqlServer, I'd check out sysobjects and syscolumns.

Kibbee Over a year ago

If you want to write the code in a more portable way, you should select from Information_Schema.Tables and Information_Schema.Columns, rather than selecting from sysobjects and syscolumns

Ryan Over a year ago

What I would like to do is find the number of distinct values in each column without knowledge of the column names. I can see how using information_schema.tables would be useful here (I'm using MySQL.) So you would generate the query based on looking at the info_schema.tables table rather than write a query that finds the column names, groups by them, and then counts the values? That should work well enough.

MartW · Accepted Answer · 2009-09-01 18:01:02Z

0

This won't necessarily be possible for every field in a table. For example, you can't do a DISTINCT against a SQL Server ntext or image field unless you cast them to other data types and lose some precision.

answered Sep 1, 2009 at 18:01

MartW

12.6k3 gold badges46 silver badges68 bronze badges

1 Comment

Ryan Over a year ago

Good point. I shouldn't have to worry about this. The fields will only be text or numbers.

Ryan · Accepted Answer · 2009-09-03 15:20:23Z

0

I appreciate all of the responses. I think the solution that will work best for me in this situation (counting the number of distinct values in each column of a table from an external program that has no knowledge of the table except its name) is as follows:

Run "describe table1" and pull out the column names from the result.

Loop through the column names and create the query to count the distinct values in each column. The query will look something like "select count(distinct columnA), count(distinct columnB), ... from table1".

answered Sep 3, 2009 at 15:20

Ryan

4,7778 gold badges32 silver badges35 bronze badges

Comments

SeanLi · Accepted Answer · 2022-06-27 04:55:46Z

Raj More's answer works well if you don't need to consider null as a value as count(distinct...) does not count null. Here is a modification to count values including null by converting values to a string and replacing null with "NULL AS SOME IMPOSSIBLE STRING":

DECLARE @TableName VarChar (1024) = 'tableName'
DECLARE @SqlString VarChar (Max)

set @SqlString = (
  SELECT DISTINCT
    'SELECT ' + 
        RIGHT (ColumnList, LEN (ColumnList)-1) + 
      ' FROM ' + Table_Name
    FROM INFORMATION_SCHEMA.COLUMNS COL1
      CROSS AppLy (
        SELECT ', COUNT (DISTINCT coalesce(cast([' + COLUMN_NAME + '] as varchar),
          ''NULL AS SOME IMPOSSIBLE STRING'')) AS ' + '''' + COLUMN_NAME + ''''
          FROM INFORMATION_SCHEMA.COLUMNS COL2
          WHERE COL1.TABLE_NAME = COL2.TABLE_NAME
          FOR XML PATH ('')
      ) TableColumns (ColumnList)
    WHERE
      COL1.TABLE_NAME = @TableName
)

EXECUTE (@SqlString)

Cristian Cotovan · Accepted Answer · 2009-09-01 17:47:11Z

-2

DISTINCT is evil. Do COUNT/GROUP BY

answered Sep 1, 2009 at 17:47

Cristian Cotovan

1,0901 gold badge13 silver badges23 bronze badges

4 Comments

Kibbee Over a year ago

Please qualify this with more information. How is using distinct evil when compred to doing COUNT/GROUP BY?

Cristian Cotovan Over a year ago

DISTINCT behaves erratically with larger datasets and from platform to platform. At least in my experience. I find grouping results to be more predictable, particularly if you deal with differently encoded data, UTF, etc.

Ryan Over a year ago

I'll have to look into using group by.

Shannon Severance Over a year ago

count/group by would only get distict count for a single column. For columns A and B you would end up with two selects, because select A, B, count(*) from ... group by A, B would give you counts of the distinct pair (A, B), not distinct A and distinct B. The OP is on the right track with count(distinct A), count(distinct B)

Collectives™ on Stack Overflow

SQL: count number of distinct values in every column

7 Answers 7

2 Comments

1 Comment

4 Comments

1 Comment

Comments

Comments

4 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

7 Answers 7

2 Comments

1 Comment

4 Comments

1 Comment

Comments

Comments

4 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related