3

I have a list of students and parents and would like to group them into families using the student id's. Parents who share common student id's can be considered to be a family while also students who share common parent id's can be considered to be a family. This is a sample table:

 p_id | parent_name | s_id | student_name |
------------------------------------------|
 1    | John Doe    | 100 |  Mike Doe     |
 3    | Jane Doe    | 100 |  Mike Doe     |
 3    | Jane Doe    | 105 |  Lisa Doe     |
 5    | Will Willy  | 108 |  William Son  |

I'd like to end up with something like:

     parents       |       students         |
-------------------|------------------------|
John Doe, Jane Doe |  Mike Doe, Lisa Doe    |
Will Willy         |  William Son           |

To achieve this I'm currently using:

SELECT array_agg(parents) AS parents FROM (
    SELECT array_agg(p_id) AS par_ids, array_agg(parent_name) AS parents, student_name, s_id 
    FROM (
        /* sub query */
    )b
    GROUP BY s_id, student_name
    ORDER BY parents ASC
)c
GROUP BY unnest(par_ids)
ORDER BY parents ASC

But I get an error: ERROR: cannot accumulate arrays of different dimensionality. SQL state: 2202E

How can I attain the desired results? The inner query from the above statement returns:

| par_ids |       parents       |  student_name   | s_id |
--------------------------------|------------------------|
| {1,3}   | {John Doe, Jane Doe}|  Mike Doe       | 100  |
| {3}     | {Jane Doe}          |  Lisa Doe       | 105  |
| {5}     | {Will Willy}        |  William Son    | 108  |

Grouping these students now to the parents is where I'm stuck.

1 Answer 1

5

I did something similar (but a bit more complex) already here: https://stackoverflow.com/a/53129510/3984221


step-by-step demo:db<>fiddle

SELECT
    array_agg(parent_name) as parents,                -- 4
    array_agg(student_name) as students
FROM (
    SELECT DISTINCT ON (t.s_id)                       -- 3
        *
    FROM (
        SELECT
            s_id,
            array_agg(p_id) as parents                -- 1
        FROM mytable
        GROUP BY s_id
    ) s JOIN mytable t ON t.p_id = ANY(s.parents)     -- 2
    ORDER BY t.s_id, CARDINALITY(parents) DESC        -- 3
) s
GROUP BY parents
  1. Aggregate the p_id values into an array:

    s_id parents
    108 {5}
    105 {3}
    100 {1,3}
  2. Self-join the original table on this array:

    s_id parents p_id parent_name s_id student_name
    100 {1,3} 1 John Doe 100 Mike Doe
    105 {3} 3 Jane Doe 100 Mike Doe
    100 {1,3} 3 Jane Doe 100 Mike Doe
    105 {3} 3 Jane Doe 105 Lisa Doe
    100 {1,3} 3 Jane Doe 105 Lisa Doe
    108 {5} 5 Will Willy 108 William Son
  3. Remove all duplicate student records. The remaining ones should be the records with the most complete p_id array. This can be done using DISTINCT ON(s_id) on a descending order by the array length:

    s_id parents p_id parent_name s_id student_name
    100 {1,3} 1 John Doe 100 Mike Doe
    100 {1,3} 3 Jane Doe 105 Lisa Doe
    108 {5} 5 Will Willy 108 William Son
  4. Finally you can group by the p_id array and aggregate the two name columns:

    parents students
    {"John Doe","Jane Doe"} {"Mike Doe","Lisa Doe"}
    {"Will Willy"} {"William Son"}

If you don't want to get an array, but a string list, you can use string_agg(name_colum, ',') instead of array_agg(name_column)

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks. This really helped me out. Couldn't figure a way around it.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.