
I have a table called work with the following columns:

CREATE TABLE work ("user" text, user_type text, medium text,
    docs_read int, on_date timestamp with time zone);

I want to create buckets (0-99, 100-199, etc.) of the number of documents read per day, and calculate the average, min, and max productivity of each combination of user_type and medium across days.

I can calculate the sum of docs_read grouped by on_date to get the number of docs read per day using:

SELECT on_date::date as day, sum(docs_read) as total_docs_read 
FROM work GROUP BY day;

Now I have to group total_docs_read per day into buckets of size 100 and calculate the average, min, and max productivity of each user_type and medium within each of those buckets.

Productivity = (sum of docs_read in a day) / (number of users working that day)

Basically, we have different types of users (Prof, Asst Prof, etc.) reading docs in different languages, and we want to know how many docs they read per day per user. So for each workload bucket, user_type, and medium, I want the average, max, and min of the per-day average productivity over the days that fall within that bucket.
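For concreteness, the per-day productivity of each (user_type, medium) combination can be computed with something like the sketch below (assuming PostgreSQL and the work table above; the ::numeric cast avoids integer division, and "user" is quoted because it is a reserved word):

```sql
-- Per-day productivity for each (user_type, medium):
-- total docs read that day divided by the number of distinct users active that day.
SELECT on_date::date AS day,
       user_type,
       medium,
       sum(docs_read)::numeric / count(DISTINCT "user") AS productivity
FROM work
GROUP BY day, user_type, medium;
```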

Sample output should be:

docs_read_bucket   user_type   medium    avg_prod  max_prod  min_prod
0-99               A           English     30       50         15

2 Answers


Let's define bucket indices 0, 1, 2, 3, ... corresponding to buckets '0-99', '100-199', '200-299', '300-399', ... respectively. Mathematically, bucket_index = floor(total_docs_read / 100).
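As a quick sanity check of that formula (assuming PostgreSQL, with a few made-up daily totals):

```sql
-- bucket_index = floor(total_docs_read / 100)
SELECT total_docs_read,
       floor(total_docs_read / 100) AS bucket_index
FROM (VALUES (0), (99), (100), (250), (399)) AS t(total_docs_read);
-- 0 and 99 land in bucket 0 ('0-99'), 100 in bucket 1, 250 in bucket 2, 399 in bucket 3.
```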

Check if the query below works for you.

Summary of the solution: we first build a derived table with the productivity of each user_type and medium on each day, and another with total_docs_read for each day. We then join the two on day and aggregate the result on bucket_index, user_type, and medium.

SELECT
    bucket_index, user_type, medium,
    AVG(productivity) AS avg_prod,
    MAX(productivity) AS max_prod,
    MIN(productivity) AS min_prod
FROM
    (SELECT
        floor(t1.total_docs_read / 100) AS bucket_index,
        t2.user_type AS user_type,
        t2.medium AS medium,
        t2.productivity AS productivity
    FROM
        (SELECT
            on_date::date AS day, sum(docs_read) AS total_docs_read
        FROM work
        GROUP BY day) AS t1
    JOIN
        (SELECT
            on_date::date AS day, user_type, medium,
            -- cast to numeric so the division is not truncated to an integer
            sum(docs_read)::numeric / count(DISTINCT "user") AS productivity
        FROM work
        GROUP BY day, user_type, medium) AS t2
        ON t1.day = t2.day) AS t3
GROUP BY bucket_index, user_type, medium;

1 Comment

Why use sum(docs_read)/count(distinct user) instead of AVG(docs_read)? I think your query is more generic, but since each user has only one entry per day, both should give the same answer?
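They agree only when each user contributes exactly one row per (day, user_type, medium). If a user has several rows for the same day, the two diverge, as this hypothetical example shows (made-up users and values, assuming PostgreSQL):

```sql
-- Suppose 'alice' read 10 docs in the morning and 20 in the afternoon
-- (two rows), while 'bob' read 30 docs (one row), all on the same day.
-- AVG(docs_read) averages over rows:              (10 + 20 + 30) / 3 = 20
-- sum/count(DISTINCT user) averages over users:        60 / 2        = 30
SELECT avg(docs_read)                              AS per_row_avg,
       sum(docs_read)::numeric / count(DISTINCT u) AS per_user_avg
FROM (VALUES ('alice', 10), ('alice', 20), ('bob', 30)) AS t(u, docs_read);
```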

You want two levels of aggregation. If I understand correctly, you want:

SELECT floor(total_docs_read / 100) AS grp,
       day, user_type, medium,
       AVG(total_docs_read) AS avg_prod,
       MAX(total_docs_read) AS max_prod,
       MIN(total_docs_read) AS min_prod
FROM (SELECT "user", user_type, medium, on_date::date AS day,
             sum(docs_read) AS total_docs_read
      FROM work
      GROUP BY "user", user_type, medium, day
     ) w
GROUP BY grp, day, user_type, medium;

I'm not 100% sure this matches your definition of "productivity". However, it does seem like a sensible result.

2 Comments

I have updated the definition of productivity. Please have a look.
Your total_docs_read would be split across user, user_type, medium, and day, but I want the grouping to be based on the total amount of work on that day. It would represent a high-, medium-, or low-work day, and then we can see how different user_types and mediums behave under different daily loads.
