I currently have a MongoDB database with the following schema:

Image: { name: String, src: String, category: String, tags: [String] }

I'd like to migrate this to Postgres, and for that I'd have 4 tables:

image (id, src, name, category_id)
tag (id, name)
image_tag (image_id, tag_id)
category (id, name)

Every image insert may bring new tags, so when using a CTE I need to select all the tags (and only insert the ones that don't exist yet). I was thinking about using a cache (Redis) to store the already inserted tags, so I don't need to select them from the database.

So my question is: should I go with a CTE using INSERT INTO tags ... WHERE NOT EXISTS statements, or with a CTE plus Redis, only inserting a tag when it could not be found in the cache?

  • What is your PG version? Commented Mar 28, 2016 at 11:35
  • If you're on version 9.5, you can use the new upsert syntax: INSERT INTO tags ... ON CONFLICT (name) DO NOTHING (you need a unique constraint on the name column). Commented Mar 28, 2016 at 12:11

1 Answer

Here is a small statement that inserts an image with a category and multiple tags into multiple tables of a Postgres database. The statement assumes that the name column in the tables category and tag has a unique constraint defined. For completeness I also created a statement without that constraint (see the Example section).
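
For reference, the unique constraints mentioned above could be declared roughly as follows. This is only a sketch: the table and column names come from the question, while the column types, the serial keys, the NOT NULL markers and the foreign keys are assumptions.

```sql
-- Assumed schema; only the table and column names are taken from the question.
CREATE TABLE category (
  id   serial PRIMARY KEY,
  name text NOT NULL UNIQUE   -- unique constraint used by ON CONFLICT (name)
);

CREATE TABLE tag (
  id   serial PRIMARY KEY,
  name text NOT NULL UNIQUE   -- unique constraint used by ON CONFLICT (name)
);

CREATE TABLE image (
  id          serial PRIMARY KEY,
  src         text NOT NULL,
  name        text NOT NULL,
  category_id integer REFERENCES category(id)
);

CREATE TABLE image_tag (
  image_id integer NOT NULL REFERENCES image(id),
  tag_id   integer NOT NULL REFERENCES tag(id),
  PRIMARY KEY (image_id, tag_id)  -- one row per image/tag pair
);
```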

Postgres statement

WITH image_values(image_name, src, category) AS (
  VALUES 
  ('Goldkraut', 'goldkraut.jpg', 'logo')
),
tag_values(tag_name) AS (
  VALUES
  ('music'), ('band')
),
category_select AS (
  SELECT id, name FROM category
  WHERE name IN (SELECT category FROM image_values) 
),
category_insert AS (
  INSERT INTO category(name) 
  SELECT category FROM image_values
  ON CONFLICT (name) DO NOTHING 
  RETURNING id, name
),
category_created AS (
  SELECT id, name FROM category_select
  UNION ALL
  SELECT id, name FROM category_insert
),
tag_select AS (
  SELECT id, name FROM tag
  WHERE name IN (SELECT tag_name FROM tag_values) 
),
tag_insert AS (
  INSERT INTO tag(name) 
  SELECT tag_name FROM tag_values
  ON CONFLICT (name) DO NOTHING 
  RETURNING id, name
),
tag_created AS (
  SELECT id, name FROM tag_select
  UNION ALL
  SELECT id, name FROM tag_insert
),
image_insert AS (
  INSERT INTO image(src, name, category_id)
  SELECT src, image_name, category_created.id 
  FROM image_values
  LEFT JOIN category_created ON(image_values.category=category_created.name)
  RETURNING id, src, name, category_id
),
image_tag_insert AS (
  INSERT INTO image_tag(image_id, tag_id)
  SELECT image_insert.id, tag_created.id FROM image_insert
  CROSS JOIN tag_created
  RETURNING image_id, tag_id
)
SELECT image_insert.*, category_created.name as category_name, image_tag_insert.*, tag_created.name as "tag.name"
  FROM image_tag_insert 
  LEFT JOIN image_insert ON (image_id = image_insert.id) 
  LEFT JOIN category_created ON (category_created.id = image_insert.category_id) 
  LEFT JOIN tag_created ON (tag_created.id = tag_id)

Explanation of the statement

In the first common table expression (CTE), image_values, you define all values that have a 1:1 relation to the image. In the next expression, tag_values, all tag names for that image are defined.

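Now let's start with the categories. To find out whether a category with the given name already exists, you query for it in category_select. The expression category_insert creates a new entry for the category if it does not already exist: ON CONFLICT (name) DO NOTHING skips a name that is already present, and RETURNING yields only the rows that were actually inserted. A variant without the unique constraint would instead check against category_select to find out whether we already have a category with this name, which avoids querying the table a second time. To store the category id in the image table we need the category entry, whether it is the existing one (from category_select) or the newly inserted one (from category_insert), so we union these two expressions in category_created. Both expressions work on the same snapshot of the table, so a category never appears in both of them.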

Now we use the same pattern for the tags: query for existing tags in tag_select, insert the missing tags in tag_insert, and union these entries in tag_created.

Next we insert the image in image_insert. To do so, we select the values from the expression image_values and join the expression category_created to get the id of the category. To insert the image-to-tag relations we need the id of the inserted image, so we return this value. The other returned values are not strictly necessary, but we use them to get a nicer result set in the final query.

Now we have the primary key of the inserted image and we can store the associations of the image to the tags. In the expression image_tag_insert we select the id of the inserted image and cross join this with every tag id we selected or inserted.

For the final statement it would be enough to just do SELECT * FROM image_tag_insert to execute all the expressions. But to give an overview of what was stored in the database, I joined all the relations. So the result will look like this:

Joined result

| id |           src |      name | category_id | category_name | image_id | tag_id | tag.name |
|----|---------------|-----------|-------------|---------------|----------|--------|----------|
|  1 | goldkraut.jpg | Goldkraut |           2 |          logo |        1 |      3 |     band |
|  1 | goldkraut.jpg | Goldkraut |           2 |          logo |        1 |      1 |    music |
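
The joined result repeats the image once per tag. If one row per image is preferred, the tags can be collapsed into a list afterwards. The following is a minimal sketch that runs as a separate query against the four tables once the insert has finished; the output column name tags is my own choice, and images without any tag are left out because of the inner joins.

```sql
-- Separate follow-up query: one row per image, with its tag names as an array.
SELECT image.id,
       image.name,
       image.src,
       category.name AS category_name,
       array_agg(tag.name ORDER BY tag.name) AS tags  -- "tags" is an assumed alias
FROM image
LEFT JOIN category ON category.id = image.category_id
JOIN image_tag     ON image_tag.image_id = image.id
JOIN tag           ON tag.id = image_tag.tag_id
GROUP BY image.id, image.name, image.src, category.name;
```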

Example

On this sqlfiddle you will see the given query in action. In another sqlfiddle I have added some extras to the last statement to format all inserted tags as a list. If you have not added a unique constraint to the name column in the tables tag and category, you can use this example.
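
Without a unique constraint, ON CONFLICT (name) cannot be used, because it needs a unique index to detect the conflict. A minimal sketch of how the tag part might then look with WHERE NOT EXISTS, as mentioned in the question, is shown below. It is an illustration only and not necessarily identical to the linked example; the category part would follow the same pattern.

```sql
WITH tag_values(tag_name) AS (
  VALUES ('music'), ('band')
),
tag_select AS (
  -- tags that already exist before this statement runs
  SELECT id, name FROM tag
  WHERE name IN (SELECT tag_name FROM tag_values)
),
tag_insert AS (
  -- insert only the names that tag_select did not find
  INSERT INTO tag(name)
  SELECT tag_name FROM tag_values
  WHERE NOT EXISTS (
    SELECT 1 FROM tag_select WHERE tag_select.name = tag_values.tag_name
  )
  RETURNING id, name
)
-- existing and newly inserted tags together
SELECT id, name FROM tag_select
UNION ALL
SELECT id, name FROM tag_insert;
```

Note that without the unique constraint nothing stops two concurrent transactions from inserting the same tag name twice, so this variant is best suited to a single writer.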
