[Image: ER diagram]

Here is my ER diagram. If I know the user_id, how can I get a single user's user data=>projects=>boards=>posts using a single SQL query?

I have found out about recursive CTEs, but all of the examples I can find have all of the data stored in a single table. I have my data split across 4 tables. Is there a way to get all of the user's data here?

I don't have any SQL to show what I've tried, because honestly I don't even know where to begin. I thought of just adding a user_id field to every table, but that doesn't seem like the correct solution.


EDIT: Is this a good way to deal with the redundant data from the joins?

I have 3 ideas:

1-> Get all of the data with duplicates. No duplicate data in database and one query, but sending more data than I need to.

2-> Split the query into four separate queries for each table. No duplicate data, but then I am running four separate queries.

3-> Add 'user_id' to every table, and query each table directly. Then I have redundant data in my database.

4-> ?? A better option? Or more concise query?

Here is my idea for splitting the queries (option 2):

-- user data
SELECT nickname, theme FROM users WHERE user_id = 'exampleid';

-- project data
SELECT
    pr.project_id,
    pr.time_created,
    pr.time_last_modified,
    pr.title 
FROM users u
    INNER JOIN projects pr
        ON u.user_id = pr.fk_projects_users 
WHERE u.user_id = 'exampleid';

-- board data
SELECT
    b.board_id,
    b.fk_boards_projects,
    b.title,
    b.order_position,
    b.color
FROM users u
    INNER JOIN projects pr
        ON u.user_id = pr.fk_projects_users 
    INNER JOIN boards b
        ON pr.project_id = b.fk_boards_projects
WHERE u.user_id = 'exampleid';

-- post data
SELECT
    po.post_id,
    po.fk_posts_boards,
    po.time_created,
    po.title,
    po.priority,
    po.time_due,
    po.body
FROM users u
    INNER JOIN projects pr
        ON u.user_id = pr.fk_projects_users 
    INNER JOIN boards b
        ON pr.project_id = b.fk_boards_projects
    INNER JOIN posts po
        ON b.board_id = po.fk_posts_boards
WHERE u.user_id = 'exampleid';
  • One of the most fundamental concepts in relational databases is that joining two tables conceptually produces ALL combinations of the rows in each table. The ON clause that follows and the WHERE clause below it filter out the unwanted combinations. By saying users u INNER JOIN projects pr you ask for ALL users paired with ALL projects, whether the user worked on the project or not. By adding ON u.user_id = pr.fk_projects_users you throw out every pairing where the user did not work on the project and keep only the list of users and the projects they worked on. Commented Nov 27, 2022 at 15:23
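
The comment's point can be checked directly. The sketch below is an illustration only: it assumes PostgreSQL and uses inline VALUES lists with made-up ids as stand-ins for the real users and projects tables. A CROSS JOIN filtered by WHERE and an INNER JOIN with the same condition in ON should return the same rows.

```sql
-- Hypothetical sample data; the inline VALUES lists stand in for the real tables.
with users(user_id) as (values (1), (2)),
     projects(project_id, fk_projects_users) as (values (11, 1), (12, 1), (13, 2))
-- Every user paired with every project (2 x 3 = 6 combinations),
-- then filtered down to the pairs that belong together.
select u.user_id, pr.project_id
from users u
    cross join projects pr
where u.user_id = pr.fk_projects_users
order by u.user_id, pr.project_id;

-- Same pairing condition, this time written in the ON clause.
with users(user_id) as (values (1), (2)),
     projects(project_id, fk_projects_users) as (values (11, 1), (12, 1), (13, 2))
select u.user_id, pr.project_id
from users u
    inner join projects pr
        on u.user_id = pr.fk_projects_users
order by u.user_id, pr.project_id;

-- Both queries return the pairs (1,11), (1,12), (2,13).
```

In practice the planner does not materialize all combinations first; the cross product is only a way to reason about what the join returns.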

1 Answer


A series of simple joins will be sufficient.

As the PostgreSQL documentation puts it: "Queries that access multiple tables (or multiple instances of the same table) at one time are called join queries. They combine rows from one table with rows from a second table, with an expression specifying which rows are to be paired."

Demo:

create table users   (user_id int);
create table projects(project_id int, fk_projects_users int);
create table boards  (board_id int,   fk_boards_projects int);
create table posts   (post_id int,    fk_posts_boards int);

insert into users    values (1),       (2),        (3);
insert into projects values (11,1),    (12,1),     (13,2);
insert into boards   values (101,11),  (102,11),   (103,13);
insert into posts    values (1001,101),(1002,101), (1003,102),(1004,103);

select  po.post_id 
from    users u 
    inner join projects pr 
        on u.user_id=pr.fk_projects_users 
    inner join boards b 
        on pr.project_id=b.fk_boards_projects 
    inner join posts po 
        on b.board_id=po.fk_posts_boards
where u.user_id=1;

-- post_id
-----------
--    1001
--    1002
--    1003
--(3 rows)
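
One caveat worth keeping in mind: inner joins only return rows that have a match at every level, so a project without boards, or a board without posts, disappears from the result. If those should still show up, LEFT JOIN is the usual alternative. A minimal sketch, reusing the demo tables above; whether you want the extra NULL rows depends on what the consumer of the data expects:

```sql
-- Same chain of joins, but each level is kept even when the next one is empty.
select  u.user_id,
        pr.project_id,
        b.board_id,
        po.post_id
from    users u
    left join projects pr
        on u.user_id = pr.fk_projects_users
    left join boards b
        on pr.project_id = b.fk_boards_projects
    left join posts po
        on b.board_id = po.fk_posts_boards
where u.user_id = 1
order by pr.project_id, b.board_id, po.post_id;

-- With the demo data this returns 4 rows instead of 3:
-- project 12 has no boards, so it appears once with NULL board_id and post_id.
```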

Your next logical step would be to start using aggregate functions that let you collect stats per user.

select  
  u.user_id,
  array_agg(po.post_id) as "array of all their post ids",
  count(po.post_id)     as "how many posts this user has",
  max(po.post_id)       as "latest post of this user (by id)"
from    users u 
    inner join projects pr 
        on u.user_id=pr.fk_projects_users 
    inner join boards b 
        on pr.project_id=b.fk_boards_projects 
    inner join posts po 
        on b.board_id=po.fk_posts_boards
group by u.user_id;

-- user_id | array of all their post ids | how many posts this user has | latest post of this user (by id)
-----------+-----------------------------+------------------------------+----------------------------------
--       2 | {1004}                      |                            1 |                             1004
--       1 | {1001,1002,1003}            |                            3 |                             1003
--(2 rows)
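
Aggregates can also build the nested users => projects => boards => posts shape in a single query, which avoids repeating parent columns on every row. The following is only a sketch under assumptions: it targets PostgreSQL 9.4 or later (for json_build_object and json_agg), reuses the demo tables above, and the JSON key names and the user_tree alias are placeholders chosen for illustration. Add your real columns (title, color, and so on) to each json_build_object call.

```sql
-- One row, one JSON document per user. Each level is a correlated subquery,
-- and coalesce turns "no children" into an empty array instead of NULL.
select json_build_object(
    'user_id',  u.user_id,
    'projects', coalesce((
        select json_agg(json_build_object(
            'project_id', pr.project_id,
            'boards', coalesce((
                select json_agg(json_build_object(
                    'board_id', b.board_id,
                    'posts', coalesce((
                        select json_agg(json_build_object('post_id', po.post_id))
                        from   posts po
                        where  po.fk_posts_boards = b.board_id
                    ), '[]'::json)
                ))
                from   boards b
                where  b.fk_boards_projects = pr.project_id
            ), '[]'::json)
        ))
        from   projects pr
        where  pr.fk_projects_users = u.user_id
    ), '[]'::json)
) as user_tree
from  users u
where u.user_id = 1;
```

The trade-off is readability: the query is harder to maintain than four simple ones, so it is worth measuring before assuming a single round trip is the better choice.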


3 Comments

This is what I needed to know! But now, how do I deal with duplicate data from the joins (e.g. repeated user data for each post)? I essentially need to dump all the data into the front end as a JSON object. I edited my post with my ideas about what to do. Do you think I should query everything and deal with duplicates on the back end, or query each table separately and get exactly the data I need, but in 4 queries?
Your edit adds another question - it's better to open a separate one and show exactly what query you used, what part of its results you're trying to eliminate, what you've tried so far and why it doesn't work. Without examples I'm not sure what you're getting and what exactly you consider duplicate and redundant. You can copy and modify the sample DDLs from my answer, just adding some columns and sample data to them, to reflect your setup as closely as possible.
I have made a new question. If you would like to check it, I'd appreciate it!
