
I have a JSON file that is generated each day. Its structure can vary depending on the source data (each file can have more or fewer attributes than the last). For example: File 1 has the attributes productCode, productType, sellerName, and sellerCountry; File 2 has all of the above plus sellerState; the next file could have more or fewer attributes than File 2.

How could I flatten this kind of JSON file into a SQL table dynamically?

File1:

[
  {
    "productCode": "00001",
    "productType": "Food",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "UK"
      }
    ]
  },
  {
    "productCode": "00002",
    "productType": "Clothing"
  }
]

File2:

[
  {
    "productCode": "00001",
    "productType": "Food",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "UK"
      }
    ]
  },
  {
    "productCode": "00002",
    "productType": "Clothing"
  },
  {
    "productCode": "00003",
    "productType": "Toy",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "AUS",
        "sellerState": "VIC"
      }
    ]
  }
]
  • There is no automatic normalization process (excepting ChatGPT basically doing your job for you). You'll need to figure it out based on your knowledge of the problem domain.
  • Why should I "tag my RDBMS"?
  • What happens if a property doesn't exist in the table? How dynamic is your "dynamic"?
  • It can be helpful to also include a manually constructed example of the expected output.

1 Answer


It is possible to do this dynamically using dynamic SQL generation, but in most cases that is not necessary.
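For reference, here is a rough sketch of the fully dynamic route, assuming SQL Server 2017+ (for STRING_AGG) and a hypothetical @json variable holding one file's contents. It discovers the top-level property names, builds a WITH clause that treats every property as varchar(100), and executes the resulting statement:

-- Build a column list from the distinct top-level property names
-- (OPENJSON type 4 = array, so nested arrays such as sellerDetails are skipped)
DECLARE @cols nvarchar(max) =
(
    SELECT STRING_AGG(QUOTENAME([key]) + ' varchar(100)', ', ')
    FROM (
        SELECT DISTINCT props.[key]
        FROM OPENJSON(@json) items
        CROSS APPLY OPENJSON(items.[value]) props
        WHERE props.[type] <> 4
    ) k
);

-- Generate and execute the OPENJSON statement
DECLARE @sql nvarchar(max) =
    N'SELECT * FROM OPENJSON(@json) WITH (' + @cols + N')';
EXEC sp_executesql @sql, N'@json nvarchar(max)', @json = @json;

As explained below, though, this is usually overkill.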

In JSON files it is common to omit a property from an individual entity when its value is null. This means that although the output looks dynamic, systems that generate these files will usually conform to a standard schema: there is a maximal structure that is possible. If you do not know this schema, you can determine it by comparing files.
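If you want to compare the files programmatically, a minimal sketch (using the @json1 and @json2 variables declared at the end of this answer) lists the top-level properties and counts how many files each appears in; nested properties such as those inside sellerDetails would need a further OPENJSON level:

SELECT props.[key], COUNT(DISTINCT input.src) AS files_seen_in
FROM (
    SELECT 1 AS src, @json1 AS json
    UNION ALL
    SELECT 2, @json2
) input
CROSS APPLY OPENJSON(input.json) items     -- one row per array element
CROSS APPLY OPENJSON(items.[value]) props  -- one row per property
GROUP BY props.[key];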

Once you have a fixed schema, you can read all files according to that schema, and the missing properties will be imported as NULL.

A complicating factor in this input is that sellerDetails is a nested array; in SQL this is a new dimension (a child table), so we need to join its contents to the results of extracting the root-level properties, which flattens the data.

This is described in "How to use OPENJSON on multiple rows".

Let us therefore define the maximal schema, assuming that sellerDetails has three properties: sellerName, sellerCountry, and sellerState. If you identify more fields, add them to the WITH declaration.

This fiddle demonstrates a solution in MS SQL Server: http://sqlfiddle.com/#!18/7d5fd/2

I have unioned the two inputs into the same result set so you can compare the output:

SELECT DISTINCT product.productCode, product.productType, seller.*
FROM
(
  -- Union both files so the outputs can be compared side by side
  SELECT @json1 AS json
  UNION
  SELECT @json2
) input
-- Extract the root-level properties; sellerDetails is kept as raw JSON
CROSS APPLY OPENJSON (json)
WITH (
    productCode varchar(10),
    productType varchar(50),
    sellerDetails NVARCHAR(MAX) AS JSON
) product
-- OUTER APPLY keeps products that have no sellerDetails array
OUTER APPLY OPENJSON (product.sellerDetails)
WITH (
  sellerName varchar(50),
  sellerCountry varchar(50),
  sellerState varchar(50)
) seller
productCode  productType  sellerName  sellerCountry  sellerState
-----------  -----------  ----------  -------------  -----------
00001        Food         Cosco       UK             NULL
00002        Clothing     NULL        NULL           NULL
00003        Toy          Cosco       AUS            VIC

Notice that the code handles the missing sellerState in the first file by filling it in with NULL.

UPDATE:

Thank you to @Charlieface for pointing out that we should use OUTER APPLY OPENJSON (product.sellerDetails) for the nested OPENJSON call. CROSS APPLY behaves like an INNER JOIN, so using it here would filter out product records that have no sellerDetails records, such as 'Clothing' in this example.
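To illustrate the difference, here is a minimal sketch of the same nested extraction using CROSS APPLY against @json1 alone. It returns only product 00001, because 00002 has no sellerDetails array and its row is discarded:

SELECT product.productCode, seller.sellerName
FROM OPENJSON(@json1)
WITH (
    productCode varchar(10),
    sellerDetails NVARCHAR(MAX) AS JSON
) product
-- CROSS APPLY drops products whose sellerDetails is missing/NULL
CROSS APPLY OPENJSON(product.sellerDetails)
WITH (sellerName varchar(50)) seller;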

DISTINCT is also added to the query to remove duplicate entries; product 00002 (Clothing), for instance, appears in both files.


These are the variable declarations:

DECLARE @json1 varchar(max) = '[
  {
    "productCode": "00001",
    "productType": "Food",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "UK"
      }
    ]
  },
  {
    "productCode": "00002",
    "productType": "Clothing"
  }
]
';
DECLARE @json2 varchar(max) = '[
  {
    "productCode": "00001",
    "productType": "Food",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "UK"
      }
    ]
  },
  {
    "productCode": "00002",
    "productType": "Clothing"
  },
  {
    "productCode": "00003",
    "productType": "Toy",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "AUS",
        "sellerState": "VIC"
      }
    ]
  }
]
';

5 Comments

  • Use OUTER APPLY OPENJSON (product.sellerDetails); that way you get the rows without that whole property.
  • I'm not sure what you mean @Charlieface, that is what I'm doing.
  • OUTER, not CROSS.
  • You're absolutely right... sorry, it's late at night here ;)
  • Thank you Chris, there is so much knowledge in your detailed solution. I like the SQL Fiddle tool as well. I actually need to use ADF to implement the solution; I didn't ask my question properly. That's fine, I got directions for what I need to do.
