My solution was to first convert the XML to JSON using xml_to_json.
Then I defined:
-- converts xml from data in the spreadsheet templates table into a table via json
CREATE OR REPLACE FUNCTION public.xml_to_table(xml)
RETURNS TABLE(sheetname text, attributename text, attributevalue text)
LANGUAGE 'sql'
COST 100
VOLATILE PARALLEL UNSAFE
ROWS 1000
AS $BODY$
-- from the records returned by the subquery below, this returns records with columns sheetname,
-- attribute name (address) and attribute value e.g.:
-- Sheet1 E54 3
-- Sheet1 G23 1.1
-- Sheet1 N87 0
-- Sheet2 W32 thing
-- ...
select
    e.sheetname,
    jsonb_object_keys(e.attr) as attributename,
    e.attr ->> jsonb_object_keys(e.attr) as attributevalue
from
(
    -- removes the rows with null for the list of attributes from the results of the subquery under this, and
    -- separates each attribute into its own row e.g.:
    -- Sheet1 {"E54": "3"}
    -- Sheet1 {"G23": "1.1"}
    -- Sheet1 {"N87": "0"}
    -- Sheet2 {"W32": "thing"}
    -- ...
    select
        d.sheetname,
        jsonb_array_elements(d.exceldata) as attr
    from
    (
        -- separates each line from the subquery under this into records containing columns for sheetname and
        -- the array of attributes; this can handle xml having more than one element at the Addresses level
        -- (e.g. it can handle a NamedCells element alongside Addresses) e.g.:
        -- Sheet1 [{"E54": "3"}, ...
        -- Sheet2 [{"W32": "thing"}]
        -- ...
        select
            b.sheetname,
            b.records -> jsonb_object_keys(b.records) -> 'attr' as exceldata
        from
        (
            -- separates each line from the subquery under this into records with columns for sheetname and
            -- a row for the JSON for each of Addresses e.g.:
            -- Sheet1 {Addresses: {attr: ...
            -- Sheet2 {Addresses: {attr: ...
            -- ...
            select
                jsonb_object_keys(a.sheetjson) as sheetname,
                jsonb_array_elements(a.sheetjson -> jsonb_object_keys(a.sheetjson) -> 'childs') as records
            from
            (
                -- separates the supplied xml into json records for each sheet e.g.:
                -- {Sheet1: {attr: ...
                -- {Sheet2: {attr: ...
                -- ...
                select
                    jsonb_array_elements(xml_to_json($1) -> 'Sheets' -> 'childs') as sheetjson
            ) as a
        ) as b
    ) as d
) as e;
$BODY$;
This can be called with:
select * from xml_to_table('<Sheets>
<Sheet1>
<Addresses E54="3" G23="1.1" N87="0"/>
</Sheet1>
<Sheet2>
<Addresses W32="thing"/>
</Sheet2>
</Sheets>')
to produce:
| sheetname | attributename | attributevalue |
| --- | --- | --- |
| Sheet1 | E54 | 3 |
| Sheet1 | G23 | 1.1 |
| Sheet1 | N87 | 0 |
| Sheet2 | W32 | thing |
This function is not as general as I'd like, but it will suffice for my data clean-up needs.
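If the nesting becomes hard to extend, the same unnesting could probably be written with LATERAL joins instead of nested subqueries. The following is only a minimal sketch under stated assumptions: it does not call xml_to_json (which is not a PostgreSQL built-in), but starts from a jsonb literal whose shape is assumed from the comments in xml_to_table. The aliases (src, s, sheet, c, el, a, kv) are illustrative names only.

```sql
-- ASSUMPTION: this literal mimics what xml_to_json returns for the example XML,
-- as inferred from the comments in xml_to_table; the real output may differ.
with src(doc) as (
    select '{"Sheets": {"childs": [
        {"Sheet1": {"childs": [{"Addresses": {"attr": [{"E54": "3"}, {"G23": "1.1"}, {"N87": "0"}]}}]}},
        {"Sheet2": {"childs": [{"Addresses": {"attr": [{"W32": "thing"}]}}]}}
    ]}}'::jsonb
)
select
    sheet.key as sheetname,
    kv.key    as attributename,
    kv.value  as attributevalue
from src
-- one row per sheet object, e.g. {"Sheet1": {...}}
cross join lateral jsonb_array_elements(src.doc -> 'Sheets' -> 'childs') as s(sheetjson)
-- split the sheet object into its name (key) and its body (value)
cross join lateral jsonb_each(s.sheetjson) as sheet(key, value)
-- one row per child element of the sheet, e.g. {"Addresses": {...}}
cross join lateral jsonb_array_elements(sheet.value -> 'childs') as c(child)
-- split the child element into its name and its body
cross join lateral jsonb_each(c.child) as el(key, value)
-- one row per single-key attribute object, e.g. {"E54": "3"}
cross join lateral jsonb_array_elements(el.value -> 'attr') as a(attr)
-- finally split each attribute object into name and text value
cross join lateral jsonb_each_text(a.attr) as kv(key, value);
```

To use this inside the function, the src CTE would be replaced by xml_to_json($1). An element with no attr array should simply contribute no rows, because jsonb_array_elements returns nothing for a NULL argument, which matches how the subquery version drops those rows.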
Any comments welcome.