I have created a recursive SQL Server scalar-valued function that converts XML data to a JSON string. The function works well for most cases, including nested elements and handling of arrays (using a json:Array attribute).

CREATE OR ALTER FUNCTION dbo.XmlToJson(@XmlData xml)  
RETURNS nvarchar(max)
WITH RETURNS NULL ON NULL INPUT
AS  
BEGIN  
    DECLARE @m nvarchar(max);

    WITH XMLNAMESPACES (N'http://james.newtonking.com/projects/json' AS json)
    SELECT @m = '{' + STRING_AGG(
  '"' + STRING_ESCAPE(name, 'json') + '":' + value,
  ','
) + '}'
    FROM 
        (SELECT
             v.name,
             CONCAT(CASE WHEN COUNT(*) > 1 OR MAX(isArray) = 1 THEN '[' END,
                    STRING_AGG(ISNULL('"' + REPLACE(STRING_ESCAPE(x.a.value('text()[1]', 'nvarchar(max)'), 'json'), '\', '\\') + '"', dbo.XmlToJson(x.a.query('./*'))), ','),
                    CASE WHEN COUNT(*) > 1 OR MAX(isArray) = 1 THEN ']' END
                   ) AS value
         FROM @XmlData.nodes('./*') x(a)
         CROSS APPLY 
             (SELECT
                  x.a.value('local-name(.)', 'nvarchar(4000)') AS name,
                  x.a.value('xs:int(xs:boolean(@json:Array))', 'int') AS isArray) v
         GROUP BY
             v.name) grouped;

    SET @m = ISNULL(@m, 'null');
    SET @m = REPLACE(@m, '\/', '/');

    RETURN @m;
END;

However, I'm facing an issue with escaping backslashes in text content. Specifically, when an XML element's text content ends with a backslash (\), my current logic produces an extra backslash escape in the final JSON output.

The desired output for a path like C:\Books\Book1\Book1.pdf\ should be "C:\\Books\\Book1\\Book1.pdf\\". My current output is producing "C:\\\\Books\\\\Book1\\\\Book1.pdf\\\\".

For this input

DECLARE @xml xml = N'<root>
    <Book>Book1</Book>
    <TransactionId  xmlns:json="http://james.newtonking.com/projects/json" json:Array="true">abc123</TransactionId>
    <Publisher>Amazon</Publisher>
    <Edition  xmlns:json="http://james.newtonking.com/projects/json" json:Array="true">
        <Name>Ed1</Name>
        <Color>Red</Color>
        <Price>100</Price>
        <file>C:\Books\Book1\Book1.pdf\</file>
    </Edition>
    <PublisherId>1</PublisherId>
    <UserId>1234</UserId>
    <Release />
</root>
';

I get this output:

{"Book":"Book1","Edition":[{"Color":"Red","file":"C:\\Books\\Book1\\Book1.pdf\\","Name":"Ed1","Price":"100"}],"Publisher":"Amazon","PublisherId":"1","Release":null,"TransactionId":["abc123"],"UserId":"1234"}

The issue seems to stem from a conflict between STRING_ESCAPE and a manual REPLACE I'm using to handle general backslashes within the string, and how this interacts when the character is at the very end of the text.

Attached the DB fiddle for reference: https://dbfiddle.uk/rUlklVK8

However, I cannot reproduce in the fiddle the issue I'm facing on my own SQL Server.

Details:

Microsoft SQL Server 2019 (RTM-CU22-GDR) (KB5029378) - 15.0.4326.1 (X64) 
Copyright (C) 2019 Microsoft Corporation 
Developer Edition (64-bit) on Windows Server 2019 Standard 10.0 <X64> (Build 17763: ) (Hypervisor) 
  • The question is missing the important parts despite its verbosity: the input XML, the expected output and the SQL Server version all belong in the question itself, along with an explanation of why any of this is used. SQL Server has had XML support since 2005 and JSON support since 2016. You posted only client tool and library versions, which aren't relevant. If you can use STRING_AGG, though, you can use FOR JSON too. Commented Nov 6 at 8:01
  • "a conflict between STRING_ESCAPE and a manual REPLACE": there's no such thing, but the overcomplicated query and the unnecessary string manipulations mean you don't really know what's getting escaped. And without the input, we can't guess either. At the very least, extract the parts of the query that cause issues into separate queries and check the values before applying functions. How do they look? How do they look after you apply STRING_ESCAPE? After applying REPLACE(STRING_ESCAPE(...))? You're either escaping or replacing the wrong things. Commented Nov 6 at 8:05
  • Besides, are you sure there's any problem to begin with? Where did you see \\\\ instead of \\? In SSMS or a debugger? Debuggers display special characters in their escaped form. Commented Nov 6 at 8:07
  • Post the actual data and query in the question itself instead of describing it. And once again, SSMS is just the client tool, not SQL Server. Besides, you can generate nested JSON with FOR JSON PATH instead of FOR JSON AUTO, as this fiddle shows. Have you tried that? Commented Nov 6 at 8:45
  • PS: NOLOCK is a serious bug. It's not a "go fast" switch; it actually causes problems (like duplicate rows or random crashes) and extra (schema-level) locking. To make a query go faster and avoid blocking, use appropriate indexes and write a better query. For reporting queries you can use snapshot isolation. If your query has to scan an entire table to find matching rows, other queries won't be able to make modifications to that table. If the ID columns and Bookcode are covered by indexes, though, only the rows you want will be read and locked. Commented Nov 6 at 8:50

3 Answers

You shouldn't use REPLACE at all here: STRING_ESCAPE already escapes backslashes correctly. DBFiddle has a display problem with double backslashes: it shows them as single ones, but the actual data is correct.

CREATE OR ALTER FUNCTION dbo.XmlToJson(@XmlData xml)  
RETURNS nvarchar(max)
WITH RETURNS NULL ON NULL INPUT
AS  
BEGIN  
    DECLARE @m nvarchar(max);

    WITH XMLNAMESPACES (
        N'http://james.newtonking.com/projects/json' AS json
    )
    SELECT @m = '{' + STRING_AGG(
      '"' + STRING_ESCAPE(name, 'json') + '":' + value,
      ','
    ) + '}'
    FROM (
        SELECT
            v.name,
            CONCAT(
              CASE WHEN COUNT(*) > 1 OR MAX(isArray) = 1 THEN '[' END,
              STRING_AGG(
                ISNULL(
                  '"' + STRING_ESCAPE(x.a.value('text()[1]', 'nvarchar(max)'), 'json') + '"',
                  dbo.XmlToJson(x.a.query('./*'))
                ),
                ','
              ),
              CASE WHEN COUNT(*) > 1 OR MAX(isArray) = 1 THEN ']' END
           ) AS value
        FROM @XmlData.nodes('./*') x(a)
        CROSS APPLY (SELECT
            x.a.value('local-name(.)', 'nvarchar(4000)') AS name,
            x.a.value('xs:int(xs:boolean(@json:Array))', 'int') AS isArray
        ) v
        GROUP BY
            v.name
    ) grouped;
    SET @m = ISNULL(@m, 'null');
    RETURN @m;
END;

db<>fiddle
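
To see the double escaping in isolation, here is a minimal sketch (not part of the original function; the variable name and column aliases are made up for illustration). It escapes the path from the question once with STRING_ESCAPE and once more with the extra REPLACE, then reads both values back with JSON_VALUE. Only the single-escaped value should round-trip to the original path.

```sql
-- Hypothetical sample value: the path from the question, ending in a backslash.
DECLARE @s nvarchar(100) = N'C:\Books\Book1\Book1.pdf\';

SELECT
    -- STRING_ESCAPE alone: every \ becomes \\
    STRING_ESCAPE(@s, 'json') AS escaped_once,
    -- STRING_ESCAPE followed by REPLACE: every \ becomes \\\\
    REPLACE(STRING_ESCAPE(@s, 'json'), '\', '\\') AS escaped_twice,
    -- Parsing the single-escaped value gives back the original path
    JSON_VALUE('{"f":"' + STRING_ESCAPE(@s, 'json') + '"}', '$.f') AS round_trip_once,
    -- Parsing the double-escaped value gives a path with doubled backslashes
    JSON_VALUE('{"f":"' + REPLACE(STRING_ESCAPE(@s, 'json'), '\', '\\') + '"}', '$.f') AS round_trip_twice;
```

Comparing round_trip_once and round_trip_twice against @s does not depend on how the client tool renders backslashes in the escaped columns.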

In SQL Server 2025 and Azure SQL, you can use JSON_ARRAYAGG and JSON_OBJECTAGG, which make this much easier.

CREATE OR ALTER FUNCTION dbo.XmlToJson(@XmlData xml)  
RETURNS nvarchar(max)
WITH RETURNS NULL ON NULL INPUT
AS  
BEGIN  
    DECLARE @m nvarchar(max);

    WITH XMLNAMESPACES (
        N'http://james.newtonking.com/projects/json' AS json
    )
    SELECT @m = JSON_OBJECTAGG(name : value)
    FROM (
        SELECT
            v.name,
            CASE WHEN COUNT(*) > 1 OR MAX(isArray) = 1 THEN
              JSON_ARRAYAGG(
                ISNULL(
                  x.a.value('text()[1]', 'nvarchar(max)'),
                  dbo.XmlToJson(x.a.query('./*'))
                )
              )
            ELSE
              ISNULL(
                  x.a.value('text()[1]', 'nvarchar(max)'),
                  dbo.XmlToJson(x.a.query('./*'))
                )
            END AS value
        FROM @XmlData.nodes('./*') x(a)
        CROSS APPLY (SELECT
            x.a.value('local-name(.)', 'nvarchar(4000)') AS name,
            x.a.value('xs:int(xs:boolean(@json:Array))', 'int') AS isArray
        ) v
        GROUP BY
            v.name
    ) grouped;
    SET @m = ISNULL(@m, 'null');
    RETURN @m;
END;

This function isn't needed at all. You can generate JSON from query results in various forms using FOR JSON and produce nested results if needed.

For example, this query (fiddle)

select 
    root.value('Book[1]','nvarchar(50)') as Book,
    root.value('Publisher[1]','varchar(50)') as Publisher,
    root.value('PublisherId[1]','varchar(50)') as PublisherId,
    root.value('Release[1]','varchar(50)') as Release,
    root.value('UserId[1]','varchar(50)') as UserId    ,
    ed.value('Name[1]','varchar(20)')  as [Edition.Name],
    ed.value('Color[1]','varchar(20)') as [Edition.Color],
    ed.value('Price[1]','varchar(20)') as [Edition.Price],
    ed.value('file[1]','varchar(260)') as [Edition.File],
    TransactionId.value('.','varchar(20)') as [TransactionId.TransactionId]
from 
@xml.nodes('/root') x(root)
outer apply root.nodes('Edition') q(ed)
outer apply root.nodes('TransactionId') t(TransactionId)
for JSON PATH,WITHOUT_ARRAY_WRAPPER

Produces this result:

{"Book":"Book1","Publisher":"Amazon","PublisherId":"1","Release":"","UserId":"1234","Edition":{"Name":"Ed1","Color":"Red","Price":"100","File":"C:\\Books\\Book1\\Book1.pdf\\"},"TransactionId":{"TransactionId":"abc123"}}

or, pretty-printed:

{
  "Book": "Book1",
  "Publisher": "Amazon",
  "PublisherId": "1",
  "Release": "",
  "UserId": "1234",
  "Edition": {
    "Name": "Ed1",
    "Color": "Red",
    "Price": "100",
    "File": "C:\\Books\\Book1\\Book1.pdf\\"
  },
  "TransactionId": {
    "TransactionId": "abc123"
  }
}

The Fiddle output displays the unescaped strings, which can lead to confusion.

The data doesn't contain any arrays, so it's unclear why Edition and TransactionId should appear as arrays. That json:Array="true" attribute means nothing to SQL Server. http://james.newtonking.com/projects/json is the namespace JSON.NET uses for its XML conversion attributes.

If the XML actually contained an array of objects, eg :

DECLARE @xml xml = N'<root>
    <Book>Book1</Book>
    <TransactionId  xmlns:json="http://james.newtonking.com/projects/json" json:Array="true">abc123</TransactionId>
    <Publisher>Amazon</Publisher>
    <Editions>
        <Edition>
            <Name>Ed1</Name>
            <Color>Red</Color>
            <Price>100</Price>
            <file>C:\Books\Book1\Book1.pdf\</file>
        </Edition>
        <Edition>
            <Name>Ed2</Name>
            <Color>Green</Color>
            <Price>200</Price>
            <file>C:\Books\Book1\Book2.pdf\</file>
        </Edition>
    </Editions>
    <PublisherId>1</PublisherId>
    <UserId>1234</UserId>
    <Release />
</root>
';

You could generate an array of Editions in the output with :

select 
    root.value('Book[1]','nvarchar(50)') as Book,
    root.value('Publisher[1]','varchar(50)') as Publisher,
    root.value('PublisherId[1]','varchar(50)') as PublisherId,
    root.value('Release[1]','varchar(50)') as Release,
    root.value('UserId[1]','varchar(50)') as UserId    ,
    (
     select
        ed.value('Name[1]','varchar(20)')  as Name,
        ed.value('Color[1]','varchar(20)') as Color,
        ed.value('Price[1]','varchar(20)') as Price,
        ed.value('file[1]','varchar(260)') as [File]
        from root.nodes('Editions/Edition') q(ed) FOR JSON PATH
    ) as Editions,
    TransactionId.value('.','varchar(20)') as TransactionId
from 
@xml.nodes('/root') x(root)
outer apply root.nodes('TransactionId') t(TransactionId)
for json auto,WITHOUT_ARRAY_WRAPPER;

This produces:

{
  "Book": "Book1",
  "Publisher": "Amazon",
  "PublisherId": "1",
  "Release": "",
  "UserId": "1234",
  "Editions": [
    {
      "Name": "Ed1",
      "Color": "Red",
      "Price": "100",
      "File": "C:\\Books\\Book1\\Book1.pdf\\"
    },
    {
      "Name": "Ed2",
      "Color": "Green",
      "Price": "200",
      "File": "C:\\Books\\Book1\\Book2.pdf\\"
    }
  ],
  "TransactionId": "abc123"
}

The results of the subquery become the Editions array property.
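
The subquery result is embedded as JSON because it comes directly from FOR JSON. A JSON string that arrives as plain nvarchar text (from a variable or a scalar function, for instance) is treated as an ordinary string and escaped, unless it is wrapped in JSON_QUERY. The following is a minimal sketch of that difference; the variable and column names are made up for illustration.

```sql
-- Hypothetical JSON fragment held in a plain nvarchar variable.
DECLARE @inner nvarchar(max) = N'[{"Name":"Ed1"},{"Name":"Ed2"}]';

SELECT
    -- Plain text: FOR JSON escapes the quotes and emits a JSON string
    @inner AS AsText,
    -- JSON_QUERY marks the value as JSON, so it is embedded as an array
    JSON_QUERY(@inner) AS AsJson
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER;
```

The result should be {"AsText":"[{\"Name\":\"Ed1\"},{\"Name\":\"Ed2\"}]","AsJson":[{"Name":"Ed1"},{"Name":"Ed2"}]}, which shows the same text once as an escaped string and once as a nested array.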

3 Comments

I want to use this function dynamically across all the data in the DB. There are many tables with different XML data, so it would be easier to write a stored procedure around this function than to write lengthy SELECT queries. So basically, I need to replace the '\' at the end of the string with '\\' so that it becomes a valid JSON escape sequence.
Don't use SQL then; it's absolutely the wrong language for this. Any client language can do this faster. Besides, it's your own code that calls REPLACE( .....,'\','\\'), even after using STRING_ESCAPE.
I concur with @PanagiotisKanavos. XSLT 3.0 and later has a generic built-in function, xml-to-json(): saxonica.com/html/documentation12/functions/fn/xml-to-json.html

I think the problem is that you don't need the extra replace, but that DbFiddle is fooling you:

See:

https://dbfiddle.uk/CWdL_2BU

When it runs, the second column seems to be missing a \ in the displayed result, even though LEN returns 6.
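
A way to check the stored value without trusting how the client renders it is to count the backslashes and look at the raw bytes. This is a minimal sketch using the path from the question; the variable name and column aliases are made up for illustration.

```sql
-- Build a small JSON document from the question's path, escaped once.
DECLARE @json nvarchar(max) =
    N'{"file":"' + STRING_ESCAPE(N'C:\Books\Book1\Book1.pdf\', 'json') + N'"}';

SELECT
    -- Number of backslash characters actually stored in the string
    LEN(@json) - LEN(REPLACE(@json, N'\', N'')) AS backslash_count,
    -- 1 when the string parses as JSON
    ISJSON(@json) AS is_valid_json,
    -- Raw UTF-16 bytes of the last four characters: \ \ " }
    CONVERT(varbinary(8), RIGHT(@json, 4)) AS last_four_chars_as_bytes;
```

The original path contains 4 backslashes, so a correctly escaped value should report 8, and the byte column should read 0x5C005C0022007D00. A count of 16 would mean the value was escaped twice.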

When running your function code on my local computer but without the REPLACE(..., '\', '\\'), I get proper results:

CREATE OR ALTER FUNCTION dbo.XmlToJson(@XmlData xml)  
RETURNS nvarchar(max)
WITH RETURNS NULL ON NULL INPUT
AS  
BEGIN  
    DECLARE @m nvarchar(max);

    WITH XMLNAMESPACES (
        N'http://james.newtonking.com/projects/json' AS json
    )
    SELECT @m = '{' + STRING_AGG(
      '"' + STRING_ESCAPE(name, 'json') + '":' + value,
      ','
    ) + '}'
    FROM (
        SELECT
            v.name,
            CONCAT(
              CASE WHEN COUNT(*) > 1 OR MAX(isArray) = 1 THEN '[' END,
              STRING_AGG(
                ISNULL(
                  '"' + STRING_ESCAPE(x.a.value('text()[1]', 'nvarchar(max)'), 'json') + '"',
                  dbo.XmlToJson(x.a.query('./*'))
                ),
                ','
              ),
              CASE WHEN COUNT(*) > 1 OR MAX(isArray) = 1 THEN ']' END
           ) AS value
        FROM @XmlData.nodes('./*') x(a)
        CROSS APPLY (SELECT
            x.a.value('local-name(.)', 'nvarchar(4000)') AS name,
            x.a.value('xs:int(xs:boolean(@json:Array))', 'int') AS isArray
        ) v
        GROUP BY
            v.name
    ) grouped;
    SET @m = ISNULL(@m, 'null');
    SET @m = REPLACE(@m, '\/', '/');
    RETURN @m;
END;

GO

...

SELECT dbo.XmlToJson(@xml.query('/root/*'));

Outputs:

{"Book":"Book1","Edition":[{"Color":"Red","file":"C:\\Books\\Book1\\Book1.pdf\\","Name":"Ed1","Price":"100"}],"Publisher":"Amazon","PublisherId":"1","Release":null,"TransactionId":["abc123"],"UserId":"1234"}
