
My context: I'm in node.js using the 'mssql' npm package. My SQL Server is Microsoft SQL Server 2014.

I have a record that may or may not exist in a table already -- if it exists I want to update it, otherwise I want to insert it. I'm not sure what the optimal SQL is, or if there's some kind of 'transaction' I should be running in SQL Server. I've found some options that seem good, but I'm not sure about any of them:

Option 1: how to update if exists or insert

The problem is that I'm not even sure this is valid syntax in SQL Server. I do like it, though, and it seems to support multiple rows at once, which I like.

INSERT INTO table (id, user, date, points)
    VALUES (1, 1, '2017-03-03', 25),
           (2, 1, '2017-03-04', 25),
           (3, 2, '2017-03-03', 100),
           (4, 2, '2017-03-04', 150)
    ON DUPLICATE KEY UPDATE points = VALUES(points)

Option 2:

I don't know if there's any problem with this one; I'm just not sure it's optimal. It doesn't seem to support multiple rows at once.

update test set name='john' where id=3012
IF @@ROWCOUNT=0
   insert into test(name) values('john');

Option 3: Merge, https://dba.stackexchange.com/questions/89696/how-to-insert-or-update-using-single-query

Some people say this is a bit buggy or something? This also apparently supports multiple at once which I like.

MERGE dbo.Test WITH (SERIALIZABLE) AS T
USING (VALUES (3012, 'john')) AS U (id, name)
    ON U.id = T.id
WHEN MATCHED THEN 
    UPDATE SET T.name = U.name
WHEN NOT MATCHED THEN
    INSERT (id, name) 
    VALUES (U.id, U.name);
  • MERGE has "features" in older versions of SQL Server. There are still some "features", but some have been fixed. With a simple statement like this, you should be fine. I'm, however, surprised you didn't also go for a simple UPSERT process: Run an UPDATE using the data, and then an INSERT with a NOT EXISTS. Commented Sep 25, 2020 at 9:15
  • Read this: michaeljswart.com/2017/07/… and this: sqlperformance.com/2020/09/locking/upsert-anti-pattern plus the comments on both Commented Sep 25, 2020 at 9:19
  • Option 2 throws an error: IF is not valid. How do I handle this? Commented Apr 25, 2023 at 7:28
  • @DaleK please don't make unnecessary edits and bring a 5 years old thread back. Commented Jul 1 at 7:09
  • @DaleK, what do you think you really improved? Commented Jul 1 at 15:32

3 Answers

17

If your system is highly concurrent and performance is important, you can try the following pattern when updates are more common than inserts:

BEGIN TRANSACTION;
 
UPDATE dbo.t WITH (UPDLOCK, SERIALIZABLE) SET val = @val WHERE [key] = @key;
 
IF @@ROWCOUNT = 0
BEGIN
  INSERT dbo.t([key], val) VALUES(@key, @val);
END
 
COMMIT TRANSACTION;

Reference: https://sqlperformance.com/2020/09/locking/upsert-anti-pattern

Also read: https://michaeljswart.com/2017/07/sql-server-upsert-patterns-and-antipatterns/

If inserts are more common:

BEGIN TRY
  INSERT INTO dbo.AccountDetails (Email, Etc) VALUES (@Email, @Etc);
END TRY
BEGIN CATCH
  -- On a duplicate key error (2601, 2627), update instead; rethrow anything else.
  IF ERROR_NUMBER() IN (2601, 2627)
  BEGIN
    UPDATE dbo.AccountDetails
       SET Etc = @Etc
     WHERE Email = @Email;
  END
  ELSE
  BEGIN
    THROW;
  END
END CATCH
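
Since the question is asked from node.js, the same insert-first idea can also be sketched on the client side. This is only an illustration: `insertOrUpdate`, `runInsert` and `runUpdate` are hypothetical names, the two callbacks stand in for whatever code actually executes the INSERT and UPDATE, and it assumes the driver exposes the SQL Server error number as `err.number` (check this against the version of the package you use).

```javascript
// SQL Server error numbers for duplicate keys:
// 2601 = duplicate row in a unique index, 2627 = unique/primary key constraint violation.
const DUPLICATE_KEY_ERRORS = new Set([2601, 2627]);

// Assumption: the driver puts the SQL Server error number on err.number.
function isDuplicateKeyError(err) {
  return Boolean(err) && DUPLICATE_KEY_ERRORS.has(err.number);
}

// runInsert and runUpdate are hypothetical async callbacks supplied by the caller.
// Try the insert first; fall back to the update only on a duplicate key error.
async function insertOrUpdate(runInsert, runUpdate) {
  try {
    await runInsert();
    return "inserted";
  } catch (err) {
    if (!isDuplicateKeyError(err)) {
      throw err; // anything that is not a duplicate key error is rethrown
    }
    await runUpdate();
    return "updated";
  }
}
```

This costs two round trips whenever the row already exists, so it presumably only pays off when inserts clearly dominate, as with the T-SQL version above.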

I wouldn't use MERGE: while most of the bugs have apparently been fixed, we have had major issues with it in production.


The examples above handle single rows. For multiple rows you'd do something like this; the idea behind the locking is the same:

BEGIN TRANSACTION;
 
  UPDATE t WITH (UPDLOCK, SERIALIZABLE) 
    SET val = tvp.val
  FROM dbo.t AS t
  INNER JOIN @tvp AS tvp
    ON t.[key] = tvp.[key];
 
  INSERT dbo.t([key], val)
    SELECT [key], val FROM @tvp AS tvp
    WHERE NOT EXISTS (SELECT 1 FROM dbo.t WHERE [key] = tvp.[key]);
 
COMMIT TRANSACTION;
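
If a table-valued parameter is not convenient from node.js, one possible alternative is to generate the same two-statement batch with a parameterized VALUES list. The following is a minimal sketch, not a tested production helper: the function name `buildUpsertBatch`, the `@k0`/`@v0` parameter naming and the row cap are all assumptions, and it assumes the keys in `rows` are unique.

```javascript
// Builds the UPDATE-then-INSERT batch from the answer above for an array of
// { key, val } rows, using one pair of parameters per row instead of a TVP.
// dbo.t, [key] and val come from the answer; everything else is hypothetical.
const MAX_ROWS = 1000; // 2 parameters per row keeps us under SQL Server's 2100-parameter limit

function buildUpsertBatch(rows) {
  if (!Array.isArray(rows) || rows.length === 0) {
    throw new Error("rows must be a non-empty array");
  }
  if (rows.length > MAX_ROWS) {
    throw new Error(`at most ${MAX_ROWS} rows per batch`);
  }

  const params = {};
  const tuples = rows.map((row, i) => {
    params[`k${i}`] = row.key;
    params[`v${i}`] = row.val;
    return `(@k${i}, @v${i})`;
  });
  // Derived table that plays the role of @tvp in the answer.
  const source = `(VALUES ${tuples.join(", ")}) AS src ([key], val)`;

  const text = [
    "BEGIN TRANSACTION;",
    "UPDATE t WITH (UPDLOCK, SERIALIZABLE)",
    "  SET val = src.val",
    "  FROM dbo.t AS t",
    `  INNER JOIN ${source}`,
    "    ON t.[key] = src.[key];",
    "INSERT dbo.t ([key], val)",
    "  SELECT src.[key], src.val",
    `  FROM ${source}`,
    "  WHERE NOT EXISTS (SELECT 1 FROM dbo.t AS existing WHERE existing.[key] = src.[key]);",
    "COMMIT TRANSACTION;",
  ].join("\n");

  return { text, params };
}
```

With the 'mssql' package, each entry of `params` would presumably be bound with `request.input(name, value)` before calling `request.query(text)`; only values are parameterized here, so no user data ends up in the SQL text itself.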

6 Comments

This does, however, only handle single rows. With lots of rows that means lots of INSERT/UPDATE statements, which is actually going to be worse for a table that needs high availability: many short locks will take longer in total than a single lock that is held somewhat longer than any one of the short locks (each of which updates a single row).
The second example, in fact, would even produce incorrect results. Imagine if you were inserting 50 rows, and 49 of them were new, and 1 an existing row. The one duplicate would fail, and then the UPDATE would be run, updating the single row. Meaning the other 49 rows are lost.
@Larnu - Yes, they were single-row examples to show the idea. I've added an edit with a multi-row version; the idea behind the locking is the same. In highly concurrent scenarios your example, without the hints, could produce deadlocks, key violation errors, etc. That's probably not a concern for most people, but in larger distributed systems it has a big impact on overall performance.
@Larnu . . . The OP's question only has single row inserts.
The examples, yes, @GordonLinoff; however, they mention multiple times that they like specific solutions because they support multiple rows, and that they don't like option 2 because it doesn't. Recommending a solution similar to one the OP has already said they dislike (because it only supports one row) doesn't really meet the OP's request for alternatives and clarification.
8

Each of them has a different purpose, with its own pros and cons.

Option 1 is good for multi-row inserts/updates. However, it only checks primary key and unique constraints, and ON DUPLICATE KEY UPDATE is MySQL syntax that SQL Server does not accept.

Option 2 is good for small sets of data, i.e. single-record inserts/updates. It reads more like a script.

Option 3 is best for big queries. Let's say you are reading from one table and inserting/updating into another accordingly. You can define which conditions must be satisfied for the insert and/or the update. You are not limited to primary key/unique constraints.
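
For the multi-row case from node.js, a MERGE like the one in Option 3 could be generated with one parameter pair per row. This is a hedged sketch only: `buildMergeStatement` and the `@id0`/`@name0` parameter names are made up for illustration, the table and columns are taken from the question, and the known caveats about MERGE still apply.

```javascript
// Builds the Option 3 MERGE from the question for an array of { id, name } rows.
// dbo.Test, id and name come from the question; the function and parameter
// names are hypothetical. Assumes the ids in `rows` are unique, since MERGE
// fails if the source matches the same target row more than once.
function buildMergeStatement(rows) {
  if (!Array.isArray(rows) || rows.length === 0) {
    throw new Error("rows must be a non-empty array");
  }

  const params = {};
  const tuples = rows.map((row, i) => {
    params[`id${i}`] = row.id;
    params[`name${i}`] = row.name;
    return `(@id${i}, @name${i})`;
  });

  const text = [
    "MERGE dbo.Test WITH (SERIALIZABLE) AS T",
    `USING (VALUES ${tuples.join(", ")}) AS U (id, name)`,
    "    ON U.id = T.id",
    "WHEN MATCHED THEN",
    "    UPDATE SET T.name = U.name",
    "WHEN NOT MATCHED THEN",
    "    INSERT (id, name)",
    "    VALUES (U.id, U.name);",
  ].join("\n");

  return { text, params };
}
```

Because the whole set goes through a single statement, the matching condition in the ON clause is the only thing that decides between update and insert for each row.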

2 Comments

Thanks for this answer. I liked Option 1 the most, however I've confirmed that this syntax doesn't work on my SQL server, so I'm leaning towards option 3
I'm fine with being limited to primary key constraints though, that's what I'll be going with anyway. Wish there was an option 1 version for MSSQL
1

Extending my comment here. There are known problems with MERGE in SQL Server, however, for what you're doing here you will likely be ok. Aaron Bertrand has an article on the subject which you can find here: Use Caution with SQL Server's MERGE Statement.

An alternative here would be an "UPSERT": UPDATE the existing rows, and then INSERT the ones that don't exist. This involves two separate statements, but it was the method used before MERGE existed:

UPDATE T
SET T.Name = U.Name
FROM dbo.Test T
     JOIN (VALUES (3012, 'john')) AS U (id, name) ON T.id = U.id;

INSERT INTO dbo.Test (Name) --I'm assuming ID is an `IDENTITY` here
SELECT U.name
FROM (VALUES (3012, 'john')) AS U (id, name)
WHERE NOT EXISTS (SELECT 1
                  FROM dbo.Test T
                  WHERE T.ID = U.ID);

Note I have not declared any locking or transactions in this example, but you should in any implemented solution.

2 Comments

Thanks. I looked at your Caution link, and I don't think my use case will have those problems. The tables I'm inserting/updating into will NOT be 'concurrently used' while I'm doing the updates -- they'll just be dormant apart from my updates. If that's the case, I feel like I should just go with the MERGE.
Fortunately, you're not using SQL Server 2008, @TKoL , where MERGE should simply be avoided.
