I have the following database schema. It looks at shoppers and how many orders they have made from three websites in a network.

ID   Name    Country  website1_Orders  website2_Orders  website3_Orders
123  JOHNC   USA      null             1                null
456  KAYLAB  USA      5                null             null
789  LAURAT  USA      2                6                3
999  RONR    CA       1                null             16
017  MATTE   CA       7                null             4
767  JROB    MX       null             1                null
224  TINAS   MX       null             null             null
670  TOMR    MX       null             8                null
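
For anyone who wants to reproduce this, the sample rows above could be loaded with something like the following sketch. The column types are assumptions on my part (the real table may differ); ID is kept as text so that 017 retains its leading zero, and the table name my_database is the one used in the queries further down.

```sql
-- Hypothetical setup for the sample data; column types are assumed, not taken from the real table.
CREATE TABLE my_database (
    id              varchar(10),   -- text, so '017' keeps its leading zero
    name            varchar(50),
    country         varchar(10),
    website1_orders integer,
    website2_orders integer,
    website3_orders integer
);

INSERT INTO my_database
    (id, name, country, website1_orders, website2_orders, website3_orders)
VALUES
    ('123', 'JOHNC',  'USA', NULL, 1,    NULL),
    ('456', 'KAYLAB', 'USA', 5,    NULL, NULL),
    ('789', 'LAURAT', 'USA', 2,    6,    3),
    ('999', 'RONR',   'CA',  1,    NULL, 16),
    ('017', 'MATTE',  'CA',  7,    NULL, 4),
    ('767', 'JROB',   'MX',  NULL, 1,    NULL),
    ('224', 'TINAS',  'MX',  NULL, NULL, NULL),
    ('670', 'TOMR',   'MX',  NULL, 8,    NULL);
```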

What I want my SQL output to look like is as follows:

Country Websites_Avail
USA 3
CA 2
MX 1

The logic is that if no customer in a country has made an order from a given website (website1, website2, or website3), then that website does not service that country at this time.

So basically, across multiple columns, I need to figure out how to properly aggregate and show the correct number of results, broken out by country. This is a very simple sample of the database - which is much larger.

with count as 
(
    select
        country,
        case  
            when website1_orders is not null 
                then 'Web 1' 
        end as Web_One,
        case 
            when website2_orders is not null 
                then 'Web 2' 
        end as Web_Two,
        case 
            when website3_orders is not null 
                then 'Web 3' 
        end as Web_Three 
    from 
        my_database
)
select 
    country, 
    COUNT(DISTINCT Web_One) + COUNT(DISTINCT Web_Two) + COUNT(DISTINCT Web_Three) as total_count
from 
    count
group by 
    1

This is a very simplified version (there are 20 sites in total), and it works in theory if I only look at these 8 or so rows. But it is not scaling, and I'm not sure why. I also don't think this is the best way to aggregate across the columns, but it's all I can think of at the moment.

I would prefer not to do anything like normalizing in a new temp table, but if that's the way to go I'm open to trying to figure out how. But I was hoping within a CTE I could get the correct counts.

Essentially, if any customer in a country has made an order from a given site, then 1 is added to that country's total_count. No country's total can exceed 20 (which would mean that, for each of the 20 sites, at least one customer from that country has ordered from it at some point). But I'm getting values in the thousands.

Is there an optimal way of looking at this? It's just Postgres SQL in Snowflake.

3 Answers

I would avoid SUM or COUNT here because you don't really want to count or sum anything. You simply want to check, per country, whether each column has at least one row with a value that is NOT NULL. The exact number of such rows, and even the values in them, don't matter at all.

That's a good use case for BOOL_OR, whose purpose is to apply exactly this kind of logic.

You just need to convert its result from boolean to int to sum it up:

SELECT
  country,
  BOOL_OR(website1_Orders IS NOT NULL)::int +
  BOOL_OR(website2_Orders IS NOT NULL)::int +
  BOOL_OR(website3_Orders IS NOT NULL)::int AS Websites_Avail
FROM my_database
GROUP BY country;

See this db<>fiddle with your sample data.
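
Since the question mentions Snowflake: as far as I know, Snowflake names the equivalent aggregate BOOLOR_AGG rather than BOOL_OR, so the query above may need adjusting there. A minimal sketch that avoids boolean aggregates entirely, and that I would expect to run on both PostgreSQL and Snowflake, takes the MAX of a 0/1 flag per column instead:

```sql
-- Portable variant (assumption: same table and column names as in the question).
-- Each MAX(...) is 1 if at least one row in the country has a non-NULL value, else 0.
SELECT
  country,
  MAX(CASE WHEN website1_Orders IS NOT NULL THEN 1 ELSE 0 END) +
  MAX(CASE WHEN website2_Orders IS NOT NULL THEN 1 ELSE 0 END) +
  MAX(CASE WHEN website3_Orders IS NOT NULL THEN 1 ELSE 0 END) AS Websites_Avail
FROM my_database
GROUP BY country;
```

Because every term is either 0 or 1, the result can never exceed the number of website columns, no matter how many rows the table has.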

2 Comments

This answer seemed to work the best for me at this point. Thank you for your help. If we were to build on this and ask to calculate the average number as well as the total, would this be something we could build upon in the existing CTE or would it require its own? I feel like the logic would be similar.
Great to hear my solution helps you. Please read "What to do if someone answers my question?". I don't understand the question in your comment; it seems to be a separate question. It would be best to create a new question and show sample input data and the expected result for this new use case. You can write a comment with the URL of your new question. That way it gets the attention it deserves, and you will get far better answers than by asking in a comment that most people will never see.

For the example shown, even if you increase the number of sites to 20, you can do it without unpivot or dynamic SQL.
The main thing is to immediately group by country. This will significantly reduce the size of the calculated part.

select country
  ,case when sum(website1_orders) is null then 0 else 1 end
  +case when sum(website2_orders) is null then 0 else 1 end
  +case when sum(website3_orders) is null then 0 else 1 end
    as total_count
from my_database
group by country

country  total_count
USA      3
CA       2
MX       1

Or use another aggregate function, such as min:

select country
  ,case when min(website1_orders) is null then 0 else 1 end
  +case when min(website2_orders) is null then 0 else 1 end
  +case when min(website3_orders) is null then 0 else 1 end
    as total_count
from my_database
group by country

For reference, the per-country sums look like this:

select country
  ,sum(website1_orders) w1s
  ,sum(website2_orders) w2s
  ,sum(website3_orders) w3s
from my_database
group by country

country  w1s   w2s   w3s
USA      7     7     3
CA       8     null  20
MX       null  9     null

fiddle

Essentially, you are asking for the number of distinct websites ordered from, per country of the customer who placed the order. Try it like this:

SELECT country, Count(Distinct websites) AS websites
FROM  ( Select country, 
               unnest(array['website1_orders','website2_orders','website3_orders'])  AS websites, 
               unnest(array[website1_orders,website2_orders,website3_orders]) AS orders
        From my_database
      ) AS t  -- subquery alias is required by PostgreSQL versions before 16
WHERE orders Is Not Null
GROUP BY country
ORDER BY websites Desc

country websites
USA 3
CA 2
MX 1

fiddle
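
The parallel unnest calls in the SELECT list are PostgreSQL-specific, so that form may not carry over to Snowflake as written. A sketch of the same unpivot idea using only UNION ALL, which I would expect to work on both systems, could look like this (the labels 'website1' etc. and the alias unpivoted are just illustrative names):

```sql
-- Unpivot each website column into (country, website) rows, keeping only rows with an order count.
SELECT country, COUNT(DISTINCT website) AS websites
FROM (
    SELECT country, 'website1' AS website FROM my_database WHERE website1_orders IS NOT NULL
    UNION ALL
    SELECT country, 'website2' AS website FROM my_database WHERE website2_orders IS NOT NULL
    UNION ALL
    SELECT country, 'website3' AS website FROM my_database WHERE website3_orders IS NOT NULL
) AS unpivoted
GROUP BY country
ORDER BY websites DESC;
```

As with the unnest version, a country whose customers have no orders on any site does not appear in the result at all, rather than showing up with a count of 0.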
