
I have a JSON file that is generated each day. Its structure can vary depending on the source data (each file can have more or fewer attributes than the last). For example: File 1 has the attributes productCode, productType, sellerName, and sellerCountry; File 2 has all of the above plus sellerState; the next file could have more or fewer attributes than File 2.

How could I flatten this kind of JSON file into a SQL table dynamically?

File1:

[
  {
    "productCode": "00001",
    "productType": "Food",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "UK"
      }
    ]
  },
  {
    "productCode": "00002",
    "productType": "Clothing"
  }
]

File2:

[
  {
    "productCode": "00001",
    "productType": "Food",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "UK"
      }
    ]
  },
  {
    "productCode": "00002",
    "productType": "Clothing"
  },
  {
    "productCode": "00003",
    "productType": "Toy",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "AUS",
        "sellerState": "VIC"
      }
    ]
  }
]
  • There is no automatic normalization process (excepting ChatGPT basically doing your job for you). You'll need to figure it out based on your knowledge of the problem domain.
  • Why should I "tag my RDBMS"?
  • What happens if a property doesn't exist in the table? How dynamic is your "dynamic"?
  • It can be helpful to also include a manually constructed example of the expected output.

1 Answer


It is possible to do this dynamically using dynamic SQL generation, but in most cases that is not necessary.
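For reference, here is a rough sketch of the fully dynamic route, assuming SQL Server 2017+ (for STRING_AGG) and a hypothetical @json variable holding one file's contents. It discovers the top-level property names, builds a WITH clause that treats every property as varchar(100), and executes the resulting statement:

-- Build a column list from the distinct top-level property names
-- (OPENJSON type 4 = array, so nested arrays such as sellerDetails are skipped)
DECLARE @cols nvarchar(max) =
(
    SELECT STRING_AGG(QUOTENAME([key]) + ' varchar(100)', ', ')
    FROM (
        SELECT DISTINCT props.[key]
        FROM OPENJSON(@json) items
        CROSS APPLY OPENJSON(items.[value]) props
        WHERE props.[type] <> 4
    ) k
);

-- Generate and execute the OPENJSON statement
DECLARE @sql nvarchar(max) =
    N'SELECT * FROM OPENJSON(@json) WITH (' + @cols + N')';
EXEC sp_executesql @sql, N'@json nvarchar(max)', @json = @json;

As explained below, though, this is usually overkill.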

In JSON files it is common to omit a property from an individual entity when its value is null. This means that although the output looks dynamic, systems that generate these files will usually conform to a standard schema: there is a maximal structure that is possible. If you do not know this schema, you can determine it by comparing files.
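If you want to compare the files programmatically, a minimal sketch (using the @json1 and @json2 variables declared at the end of this answer) lists the top-level properties and counts how many files each appears in; nested properties such as those inside sellerDetails would need a further OPENJSON level:

SELECT props.[key], COUNT(DISTINCT input.src) AS files_seen_in
FROM (
    SELECT 1 AS src, @json1 AS json
    UNION ALL
    SELECT 2, @json2
) input
CROSS APPLY OPENJSON(input.json) items     -- one row per array element
CROSS APPLY OPENJSON(items.[value]) props  -- one row per property
GROUP BY props.[key];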

Once you have a fixed schema, you can read all files according to that schema, and the missing properties will be imported as NULL.

A complicating factor in this input is that sellerDetails is a nested array; in SQL this is a new dimension (a child table), so we need to join its contents to the results of extracting the root-level properties, which flattens the data.

This is described in "How to use OPENJSON on multiple rows".

Let us therefore define the maximal schema, assuming that sellerDetails has three properties: sellerName, sellerCountry, and sellerState. If you identify more fields, add them to the WITH declaration.

This fiddle demonstrates a solution in MS SQL Server: http://sqlfiddle.com/#!18/7d5fd/2

I have unioned the two inputs into the same result set so you can compare the output:

SELECT DISTINCT product.productCode, product.productType, seller.*
FROM
(
  -- Union both files so the outputs can be compared side by side
  SELECT @json1 AS json
  UNION
  SELECT @json2
) input
-- Extract the root-level properties; sellerDetails is kept as raw JSON
CROSS APPLY OPENJSON (json)
WITH (
    productCode varchar(10),
    productType varchar(50),
    sellerDetails NVARCHAR(MAX) AS JSON
) product
-- OUTER APPLY keeps products that have no sellerDetails array
OUTER APPLY OPENJSON (product.sellerDetails)
WITH (
  sellerName varchar(50),
  sellerCountry varchar(50),
  sellerState varchar(50)
) seller
productCode  productType  sellerName  sellerCountry  sellerState
-----------  -----------  ----------  -------------  -----------
00001        Food         Cosco       UK             NULL
00002        Clothing     NULL        NULL           NULL
00003        Toy          Cosco       AUS            VIC

Notice that the code handles the missing sellerState in the first file by filling it in with NULL.

UPDATE:

Thank you to @Charlieface for pointing out that we should use OUTER APPLY OPENJSON (product.sellerDetails) for the nested OPENJSON call. CROSS APPLY behaves like an INNER JOIN, so using it here would filter out product records that have no sellerDetails records, such as 'Clothing' in this example.
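To illustrate the difference, here is a minimal sketch of the same nested extraction using CROSS APPLY against @json1 alone. It returns only product 00001, because 00002 has no sellerDetails array and its row is discarded:

SELECT product.productCode, seller.sellerName
FROM OPENJSON(@json1)
WITH (
    productCode varchar(10),
    sellerDetails NVARCHAR(MAX) AS JSON
) product
-- CROSS APPLY drops products whose sellerDetails is missing/NULL
CROSS APPLY OPENJSON(product.sellerDetails)
WITH (sellerName varchar(50)) seller;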

DISTINCT is also added to the query to remove duplicate entries; product 00002 (Clothing), for instance, appears in both files.


These are the variable declarations:

DECLARE @json1 varchar(max) = '[
  {
    "productCode": "00001",
    "productType": "Food",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "UK"
      }
    ]
  },
  {
    "productCode": "00002",
    "productType": "Clothing"
  }
]
';
DECLARE @json2 varchar(max) = '[
  {
    "productCode": "00001",
    "productType": "Food",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "UK"
      }
    ]
  },
  {
    "productCode": "00002",
    "productType": "Clothing"
  },
  {
    "productCode": "00003",
    "productType": "Toy",
    "sellerDetails": [
      {
        "sellerName": "Cosco",
        "sellerCountry": "AUS",
        "sellerState": "VIC"
      }
    ]
  }
]
';

5 Comments

  • Use OUTER APPLY OPENJSON (product.sellerDetails); that way you get the rows without that whole property.
  • I'm not sure what you mean @Charlieface, that is what I'm doing.
  • OUTER, not CROSS.
  • You're absolutely right... sorry, it's late at night here ;)
  • Thank you Chris, there is so much knowledge in your detailed solution. I like the SQL Fiddle tool as well. I actually need to use ADF to implement the solution; I didn't ask my question properly. That's fine, I got directions for what I need to do.
