0

I have built a stored procedure that aims to identify duplicates in a table and to display the duplicated rows in a meaningful order. It looks like this:

CREATE PROCEDURE [dbo].[spFindDuplicates] 
    @tableName nvarchar(255), 
    @field1 nvarchar(255), 
    @field2 nvarchar(255) = '1', 
    @field3 nvarchar(255) = '2', 
    @field4 nvarchar(255) = '3', 
    @field5 nvarchar(255) = '4'

AS

BEGIN

DECLARE @query AS nvarchar(MAX);

SET @query = '
SELECT *
FROM ' + @tableName + '
WHERE CAST(' + @field1 + ' AS nvarchar(255)) + CAST(' + @field2 + ' AS nvarchar(255)) + CAST(' + @field3 + ' AS nvarchar(255)) + CAST(' + @field4 + ' AS nvarchar(255)) + CAST(' + @field5 + ' AS nvarchar(255)) 
IN 
(
    SELECT CAST(' + @field1 + ' AS nvarchar(255)) + CAST(' + @field2 + ' AS nvarchar(255)) + CAST(' + @field3 + ' AS nvarchar(255)) + CAST(' + @field4 + ' AS nvarchar(255)) + CAST(' + @field5 + ' AS nvarchar(255))
    FROM ' + @tableName + '
    GROUP BY CAST(' + @field1 + ' AS nvarchar(255)) + CAST(' + @field2 + ' AS nvarchar(255)) + CAST(' + @field3 + ' AS nvarchar(255)) + CAST(' + @field4 + ' AS nvarchar(255)) + CAST(' + @field5 + ' AS nvarchar(255))
    HAVING COUNT(*) > 1
)
ORDER BY ' + @field1 + ', ' + @field2 + ', ' + @field3 + ', ' + @field4 + ', ' + @field5

EXECUTE(@query);

END

GO

--Example:

EXEC spFindDuplicates @tableName = 'someRandomTable', @field1 = 'firstField', @field2 = 'secondField', @field3 = 'thirdField'

As you can see, I can use at most 5 different fields that I concatenate in order for me to get a key used to determine whether we have a duplicate or not. Please note that I use the CAST function to be able to concatenate fields with various datatypes (varchar, int, dates, etc.).

When I execute the above stored procedure with 5 different fields, it works fine. But I would like to be able to run it with a variable number of fields (from 1 to 5), which is why I provided default values for @field2 to @field5.

But when I execute it with the above example (3 fields provided), I get the following error message:

A column has been specified more than once in the order by list. Columns in the order by list must be unique.

QUESTION: How can I keep ordering the resulting table without getting an error?

BONUS QUESTION: If you find a dynamic way to use that stored procedure with any number of fields (4, 17, or whatever), that'd be even more useful to me.

9
  • you can do that dynamically, by using single column param where you will pass comma separated columns or with table types Commented Jan 24, 2019 at 13:47
  • 1
    What you have there has a huge security flaw; it's wide open to SQL Injection. You should be parameterising your query, by using sp_executesql and quoting your objects by using QUOTENAME. Right now you just have a vulnerability waiting to be exploited. Commented Jan 24, 2019 at 13:49
  • 1
    "Larnu Let's consider I don't mind about SQL injection" Impossible. you do. If you don't, then have a rethink, change your mind, and then fix the problem. Commented Jan 24, 2019 at 14:28
  • 2
    If this isn't an xy problem I don't know what is. Commented Jan 24, 2019 at 14:34
  • 1
    So, you want to allow duplicate rows in many of your tables but you also need to identify those duplicates on a regular basis? Are you sure you're solving the right problem? Because more normally, it indicates you need some unique constraints to stop the duplicates from existing in the first place. Commented Jan 24, 2019 at 14:50

2 Answers 2

3

Like I said in the comments, injection is a huge problem here, and you need to consider it. Saying "Let's consider I don't mind about injection" is naïve and you need to change that attitude. Always make your SQL safe; then there are no excuses and chances for your application being compromised.

As what you are after, I suspect this achieves the goal. There's no need for the subquery to scan your table with an IN here, you can make use of COUNT and the OVER clause within a CTE.

CREATE PROCEDURE [dbo].[FindDuplicates] --I've removed te sp prefix, as sp_ is reserved by MS
    @tableName sysname, 
    @field1 sysname, 
    @field2 sysname = NULL, 
    @field3 sysname = NULL, 
    @field4 sysname = NULL, 
    @field5 sysname = NULL

AS BEGIN

    DECLARE @query AS nvarchar(MAX);

    SET @query = N'WITH CTE AS(' + NCHAR(10) +
                 N'    SELECT *' + NCHAR(10) + 
                 N'           COUNT(*) OVER (PARTITION BY ' + STUFF(CONCAT(N',' + QUOTENAME(@field1),N',' + QUOTENAME(@field2),N',' + QUOTENAME(@field3),N',' + QUOTENAME(@field4),N',' + QUOTENAME(@field5)),1,1,N'') + N' AS RowCount' + NCHAR(10) +
                 N'    FROM ' + QUOTENAME(@tableName) + N')' + NCHAR(10) +
                 N'SELECT *' + NCHAR(10) +
                 N'FROM CTE' + NCHAR(10) +
                 N'WHERE RowCount > 1' + NCHAR(10) + 
                 N'ORDER BY ' + STUFF(CONCAT(N',' + QUOTENAME(@field1),N',' + QUOTENAME(@field2),N',' + QUOTENAME(@field3),N',' + QUOTENAME(@field4),N',' + QUOTENAME(@field5)),1,1,N'') + N';';

    PRINT @query;
    --EXEC sys.sp_executesql @query; --Uncomment to rrun the actual query
END
GO

For the command you gave us EXEC dbo.FindDuplicates @tableName = 'someRandomTable', @field1 = 'firstField', @field2 = 'secondField', @field3 = 'thirdField';, this returns the SQL:

WITH CTE AS(
    SELECT *
           COUNT(*) OVER (PARTITION BY [firstField],[secondField],[thirdField] AS RowCount
    FROM [someRandomTable])
SELECT *
FROM CTE
WHERE RowCount > 1
ORDER BY [firstField],[secondField],[thirdField];

Which, I believe gives you the behaviour you are after.

Sign up to request clarification or add additional context in comments.

3 Comments

Well done. Took me a second to figure out how you handled NULLs but that is pretty clever.
CONCAT and STUFF works a dream for NULLable parameters, and similar, @SeanLange. To explain for others (maybe i should put in the answer), CONCAT (unlike normal string concatenation) doesn't return NULL when one of the expressions has the value NULL. So 'a' + NULL would be NULL, however, CONCAT('a',NULL, NULL, 'b') would be 'ab'. I use that to "my" advantage, meaning that the NULL value parameters are effectively "discarded" when building the PARTITION BY and ORDER BY clauses
Yeah I just had never considered using like this. Pretty cool.
-2

Edited the code to check if the column list exists on the sys.columns there by making sure we get only the columns which are appropriate.

CREATE FUNCTION dbo.fn_SplitString
(
   @List       NVARCHAR(MAX),
   @Delimiter  NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
   RETURN 
   (  
      SELECT Item = y.i.value('(./text())[1]', 'nvarchar(4000)')
      FROM 
      ( 
        SELECT x = CONVERT(XML, '<i>' 
          + REPLACE(@List, @Delimiter, '</i><i>') 
          + '</i>').query('.')
      ) AS a CROSS APPLY x.nodes('i') AS y(i)
   );
GO
ALTER PROCEDURE [dbo].[spFindDuplicates] 
    @tableName nvarchar(255), 
    @columnlist nvarchar(max)  

AS

BEGIN

DECLARE @query AS nvarchar(MAX);

SET @columnlist = (SELECT STUFF((SELECT ','+'['+[name]+']'
FROM SYS.columns
WHERE object_id = object_id(@tableName)
AND [Name] IN
(
   SELECT Item
   FROM dbo.fn_SplitString(@columnlist,',')
)
FOR XML PATH('')
)
,1,1,''))

PRINT @columnlist

SET @query = 'SELECT * FROM (SELECT '+CAST(@columnlist AS NVARCHAR(MAX))+'
FROM '+CAST(@tableName AS nvarchar(MAX))+'
GROUP BY '+CAST(@columnlist AS NVARCHAR(MAX))+'
HAVING COUNT(*) > 1)Res1
ORDER BY '+@columnlist


EXEC SP_EXECUTESQL @query;

END

GO

9 Comments

This is far from going to solve the injection issue. In fact, ORDER BY '+@columnlist makes it even worse; as that is impossible to "make safe".
@Dheerendra Your suggestion is almost good but the 'SELECT *' is important to me, which is why I need to use a subquery and to concatenate fields. And in that case, your code is failing when it comes to concatenating
@Guillaume - I have edited the code. An i think to eliminate the sql injection, we can split the columns from the column list and match with the columns of the sys.columns and consider only the one's which are being match and update the comma seperated list as an enhancement to the procedure
The edit does not make this safe from sql injection. It is wide open.
Using a while loop to split strings is just awful. I realize you copy and pasted this. Here are several much better ways to deal with splitting strings. An even better way would be to use table valued parameters instead so you don't have to parse it at all.
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.