1

I have a table of posts, a table of hashtags, and a table linking posts to hashtags, like so:

CREATE TABLE posts(
id SERIAL PRIMARY KEY,
post_data VARCHAR(128) NOT NULL
);

CREATE TABLE hashtags (
id SERIAL PRIMARY KEY,
value VARCHAR(128) NOT NULL
);

CREATE TABLE post_hashtags(
id SERIAL PRIMARY KEY,
post_id INTEGER NOT NULL REFERENCES posts(id),
hashtag_id INTEGER NOT NULL REFERENCES hashtag(id)
);

INSERT INTO posts(post_data) VALUES ('post1');
INSERT INTO hashtags (value) VALUES ('hashtag1'), ('hashtag2'), ('hashtag3');

-- At this point I might want to add links between post1 and existing hashtags as well as possibly new ones

Whenever a user makes a post using several hash tags, I want to:

  1. Create a new row in posts and get the ID
  2. Create new rows in hashtags for hashtags that don't exist and get their IDs
  3. For each hashtag, create a row in post_hashtags with the post and hashtag ids

Right now, I'm capable of handling it on the server side but that's terrible performance, obviously: insert post and get ID; for each hashtag, insert into hashtags if it doesn't exist, get the ID; then insert into the post_hashtags. Lots of calls to DB that I'm guessing could be simplified, but my SQL skills are lacking at the moment.

3
  • At least provide proper table definitions, your Postgres version, and sample input. Also, whatever have so far, even if it's not working. Commented Jun 23, 2024 at 5:30
  • Ok I added SQL code that can be run to setup the state. I'd be interested in any query, regardless of Postgres version. As I mentioned in the post, I only have server-side code, no SQL yet Commented Jun 23, 2024 at 12:44
  • 1
    The id column in post_hashtags isn't necessary; however, there should be a unique constraint on (post_id, hashtag_id) to preclude redundant rows. Commented Jun 23, 2024 at 18:30

1 Answer 1

1

You can do this safely and efficiently with a single query using data-modifying CTEs, and an UPSERT for the tags.

Of course, you need a UNIQUE or PRIMARY KEY constraint on hashtags (value). (And you should have one on post_hashtags (post_id, hashtag_id).)

WITH input(post_data, tags) AS (  -- provide single data row with array of tags
   VALUES ('post2', '{hashtag1, hashtag2, hashtag4}'::text[])  -- single post!
   )
, tag_set AS (  -- unnest tags - may be empty/missing (?)
   SELECT unnest(i.tags) AS value
   FROM   input i
   )
, ins_p AS (
   INSERT INTO posts (post_data)
   SELECT i.post_data
   FROM   input i
   RETURNING id AS post_id
   )
, ins_h AS (
   INSERT INTO hashtags (value)
   SELECT t.value
   FROM   tag_set t
   WHERE  NOT EXISTS (SELECT FROM hashtags h WHERE h.value = t.value)  -- optional to avoid burning lots of serial IDs
   ON CONFLICT (value) DO NOTHING
   RETURNING id AS hashtag_id
   )
INSERT INTO post_hashtags
      (  post_id,   hashtag_id)
SELECT p.post_id, t.hashtag_id
FROM   ins_p p
CROSS  JOIN (  -- only if actual tags were entered
   TABLE ins_h  -- new tags
   UNION  ALL
   SELECT h.id AS hastag_id  -- pre-existing tags
   FROM   tag_set  t
   JOIN   hashtags h USING (value)
   ) t
RETURNING *;

fiddle

This is safe, even under heavy concurrent write load.
Two possible corner cases remain:

  1. A concurrent transaction might create the same new hashtag, but then roll back, resulting in a missing entry in table hashtags for this query, raising an exception.

  2. A pre-existing hashtag might be deleted by a concurrent transaction between looking it up here, and inserting into post_hashtags. Again, raising an exception. Extremely unlikely, but possible.

You might just re-run this query if one of these two unlikely issues arises.
Or you cover both corner cases in the same query. I did not bother to go that far. See instructions and detailed explanation here:

Related:

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.