Postgres: need to get the count of rows by uniqueness

Question

I have a simple table with lat, long, and time columns. I want my query to return something like this:

lat,long,hourwindow,count

I can't figure out how to do this. I've tried so many things that I can't keep them straight. Here's what I've got so far:

WITH all_lat_long_by_time AS (
    SELECT
        trunc(cast(lat AS NUMERIC), 4) AS lat,
        trunc(cast(long AS NUMERIC), 4) AS long,
        date_trunc('hour', time :: TIMESTAMP WITHOUT TIME ZONE) AS hourWindow
    FROM my_table
),
unique_lat_long_by_time AS (
    SELECT DISTINCT * FROM all_lat_long_by_time
),
all_with_counts AS (
    -- what do I do here?
)
SELECT * FROM all_with_counts;

Please explain how "count of rows by uniqueness" is defined exactly. Do you mean a count of unique rows (after truncating the numbers)? So, the number of distinct (lat, long) pairs per hour? The Postgres version and table definition are always helpful, too. time :: TIMESTAMP WITHOUT TIME ZONE looks suspicious.
– Erwin Brandstetter, Commented Mar 20, 2019 at 22:14

Gordon Linoff · Accepted Answer · 2019-03-20 18:44:32Z

I think this is a pretty basic aggregation query:

SELECT date_trunc('hour', time :: TIMESTAMP WITHOUT TIME ZONE) AS hourWindow,
       trunc(cast(lat AS NUMERIC), 4) AS lat,
       trunc(cast(long AS NUMERIC), 4) AS long,
       COUNT(*)
FROM my_table
-- Repeat the expressions: in GROUP BY, a bare lat/long would resolve to the input columns, not the aliases
GROUP BY hourWindow, trunc(cast(lat AS NUMERIC), 4), trunc(cast(long AS NUMERIC), 4)
ORDER BY hourWindow;
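To check the semantics outside the database, here is a minimal Python model (not Postgres code) of what the query above computes. It assumes rows arrive as (lat, long, time) tuples with float coordinates and naive datetime values; the helper names truncate_4, hour_window and count_per_group and the sample data are hypothetical. Decimal with ROUND_DOWN stands in for trunc(numeric, 4), which cuts toward zero, and datetime.replace stands in for date_trunc('hour', ...).

```python
from collections import Counter
from datetime import datetime
from decimal import Decimal, ROUND_DOWN


def truncate_4(x: float) -> Decimal:
    # Mimics trunc(x::numeric, 4): keep 4 decimals, cutting toward zero.
    return Decimal(str(x)).quantize(Decimal("0.0001"), rounding=ROUND_DOWN)


def hour_window(ts: datetime) -> datetime:
    # Mimics date_trunc('hour', ts): zero out minutes, seconds, microseconds.
    return ts.replace(minute=0, second=0, microsecond=0)


def count_per_group(rows):
    # GROUP BY (hour window, truncated lat, truncated long), then COUNT(*) per group.
    # "lng" stands in for the long column.
    return Counter(
        (hour_window(ts), truncate_4(lat), truncate_4(lng)) for lat, lng, ts in rows
    )


# Hypothetical sample rows: (lat, long, time).
rows = [
    (40.71289, -74.00601, datetime(2019, 3, 20, 18, 5)),
    (40.71281, -74.00609, datetime(2019, 3, 20, 18, 40)),  # same cell after truncation
    (40.71500, -74.00601, datetime(2019, 3, 20, 18, 59)),
    (40.71289, -74.00601, datetime(2019, 3, 20, 19, 1)),   # next hour window
]

for (hour, lat, lng), n in sorted(count_per_group(rows).items()):
    print(lat, lng, hour, n)
```

The first two sample rows fall into the same (hour, lat, long) cell once truncated, so that group's count is 2 and the other two groups count 1 each.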

Ha, when you stare at a problem space for so long that you forget how to SQL. Thanks.
– Busch

Erwin Brandstetter · Accepted Answer · 2019-03-20 22:36:54Z

If "count of rows by uniqueness" is meant to count distinct coordinates per hour (after truncating the numbers), count(DISTINCT (lat,long)) does the job:

SELECT date_trunc('hour', time::timestamp) AS hour_window
     , count(DISTINCT (trunc( lat::numeric, 4)
                     , trunc(long::numeric, 4))) AS count_distinct_coordinates
FROM   tbl
GROUP  BY 1
ORDER  BY 1;

Details are in the manual.
(lat,long) is a ROW value, shorthand for ROW(lat,long). The manual's section on row constructors has more.
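The effect of count(DISTINCT (lat,long)) per hour can be modeled in a few lines of Python, treating each (lat, long) tuple as the row value. This is a sketch, not Postgres code: it assumes the coordinates are already truncated and the timestamps already cut to the hour, and the sample values and the function name count_distinct_per_hour are made up for illustration.

```python
from collections import defaultdict


def count_distinct_per_hour(rows):
    # Model of: SELECT hour, count(DISTINCT (lat, long)) FROM rows GROUP BY hour
    # A Python tuple plays the role of the ROW value (lat, long).
    coords_by_hour = defaultdict(set)
    for hour, lat, lng in rows:
        coords_by_hour[hour].add((lat, lng))
    return {hour: len(coords) for hour, coords in sorted(coords_by_hour.items())}


# Hypothetical rows, already truncated and bucketed: (hour, lat, long).
rows = [
    ("18:00", "40.7128", "-74.0060"),
    ("18:00", "40.7128", "-74.0060"),  # duplicate coordinate in the same hour
    ("18:00", "40.7150", "-74.0060"),
    ("19:00", "40.7128", "-74.0060"),
]

print(count_distinct_per_hour(rows))  # {'18:00': 2, '19:00': 1}
```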

But count(DISTINCT ...) is typically slow, so a subquery should be faster in your case:

SELECT hour_window, count(*) AS count_distinct_coordinates
FROM  (
   SELECT date_trunc('hour', time::timestamp) AS hour_window
        , trunc( lat::numeric, 4) AS lat
        , trunc(long::numeric, 4) AS long
   FROM   tbl
   GROUP  BY 1, 2, 3
   ) sub
GROUP  BY 1
ORDER  BY 1;

Or:

SELECT hour_window, count(*) AS count_distinct_coordinates
FROM  (
   SELECT DISTINCT
          date_trunc('hour', time::timestamp) AS hour_window
        , trunc( lat::numeric, 4) AS lat
        , trunc(long::numeric, 4) AS long
   FROM   tbl
   ) sub
GROUP  BY 1
ORDER  BY 1;

After the subquery folds duplicates, the outer SELECT can use a plain count(*).
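The two-step shape of the subquery versions (fold duplicates first, then count) can be sketched in Python too. This is only a model, not Postgres code: the inputs are hypothetical (hour, lat, long) tuples that are already truncated and bucketed, a set stands in for DISTINCT or GROUP BY 1, 2, 3, and a Counter stands in for the outer count(*). The function name count_after_dedup is illustrative.

```python
from collections import Counter


def count_after_dedup(rows):
    # Inner subquery: SELECT DISTINCT hour, lat, long (or GROUP BY 1, 2, 3).
    unique_rows = set(rows)
    # Outer query: GROUP BY hour with a plain count(*) over the deduplicated rows.
    per_hour = Counter(hour for hour, _lat, _lng in unique_rows)
    return dict(sorted(per_hour.items()))


# Hypothetical rows, already truncated and bucketed: (hour, lat, long).
rows = [
    ("18:00", "40.7128", "-74.0060"),
    ("18:00", "40.7128", "-74.0060"),  # folded away by the inner DISTINCT
    ("18:00", "40.7150", "-74.0060"),
    ("19:00", "40.7128", "-74.0060"),
]

print(count_after_dedup(rows))  # {'18:00': 2, '19:00': 1}
```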