
I have the following nested XML, which I would like to stream-parse with Node.js into a Postgres database. The XML below is reduced to a reproducible example; the real file is large.

<MarketDocument>
    <createdDateTime>2018-02-17T16:42:28Z</createdDateTime>
    <TimeSeries>
        <Type>A01</Type>
        <Period>
            <Point><position>1</position></Point>
            <Point><position>2</position></Point>
        </Period>
    </TimeSeries>
    <TimeSeries>
        <Type>B01</Type>
        <Period>
            <Point><position>3</position></Point>
            <Point><position>4</position></Point>
        </Period>
    </TimeSeries>
</MarketDocument>

Expected output: [["A01", 1], ["A01", 2], ["B01", 3], ["B01", 4]]
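Because the rows end up in Postgres, it may be safer to send them as one parameterized multi-row INSERT than to build "('A01', '1')"-style strings by interpolation. A minimal sketch, assuming node-postgres (`pg`), whose `client.query(text, values)` accepts `$1, $2, …` placeholders; the helper `buildInsert`, the table `points`, and its columns `type` and `position` are hypothetical.

```javascript
// Hypothetical helper: turns [[type, position], ...] rows into one
// parameterized INSERT for node-postgres. The table and column names
// are placeholders for illustration only.
function buildInsert(rows) {
  if (rows.length === 0) {
    return null; // nothing to insert; "VALUES" with no tuples is invalid SQL
  }
  const values = [];
  const tuples = rows.map(function (row, i) {
    values.push(row[0], row[1]);
    // Placeholders are 1-based: row 0 -> ($1, $2), row 1 -> ($3, $4), ...
    return '($' + (2 * i + 1) + ', $' + (2 * i + 2) + ')';
  });
  return {
    text: 'INSERT INTO points (type, position) VALUES ' + tuples.join(', '),
    values: values,
  };
}

// Usage with node-postgres (not executed here):
//   const query = buildInsert(rows);
//   if (query) await client.query(query.text, query.values);
```

Postgres limits a single statement to 65,535 bind parameters, so for a really large file the rows would have to be inserted in several batches rather than one statement.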

Main problem: iterating over the <Point> elements while keeping track of the <Type> that belongs to the same <TimeSeries>. I haven't found good documentation on this. I would like to follow the approach described by forrert.

Questions:
1) How can I parse this correctly with Node.js?
2) If there is a better approach altogether, please let me know.


I basically need help with the following part:

var fs = require('fs');
var XmlStream = require('xml-stream');
var stream = fs.createReadStream('./here.xml'); // or stream directly from your online source
var xml = new XmlStream(stream);

xml.on('endElement: TimeSeries', function(item) {

    // PHP code (commented out so this file still runs): how do you do this in Node.js?
    // foreach ($item->Period->Point as $point) {
    //     $position = $point->position;
    //     $array[] = "('{$item->Type}', '$position')";
    // }

});
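One possible translation of the PHP loop, as a sketch that assumes xml-stream's `collect()` method (so that repeated <Point> children arrive as an array instead of the last one overwriting the others) and its default mapping of text-only elements such as <Type> and <position> to plain strings. The helper names `timeSeriesToRows` and `collectRows` are hypothetical; `collectRows` takes the `XmlStream` constructor as a parameter only so the flattening logic can be exercised without the module installed.

```javascript
// Hypothetical helper: flattens one <TimeSeries> object, in the shape that
// xml-stream emits for it, into [Type, position] rows.
// Assumes xml.collect('Point') was called, so item.Period.Point is an array,
// and that text-only elements arrive as strings.
function timeSeriesToRows(item) {
  return item.Period.Point.map(function (point) {
    return [item.Type, parseInt(point.position, 10)];
  });
}

// Hypothetical wiring: parses `source` (a readable stream) with the given
// XmlStream constructor and calls done(rows) once the document has been read.
function collectRows(source, XmlStream, done) {
  const xml = new XmlStream(source);
  const rows = [];

  // Without this, repeated <Point> siblings overwrite each other.
  xml.collect('Point');

  // Equivalent of the PHP foreach: one row per <Point> in this <TimeSeries>.
  xml.on('endElement: TimeSeries', function (item) {
    rows.push(...timeSeriesToRows(item));
  });

  xml.on('end', function () {
    done(rows);
  });
}
```

Called as `collectRows(fs.createReadStream('./here.xml'), require('xml-stream'), console.log)`, this should, under the assumptions above, produce the expected output from the question.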

Your help would be appreciated!

  • Have you considered using a stream parser? Commented Apr 19, 2018 at 18:33
  • Yes, and I got it working, but the main problem is the iteration part. Commented Apr 19, 2018 at 18:36
  • Not sure what problem you're having... using a stream parser, you can watch for that element type and output it as you go. If you're doing anything else, you should be querying your database after you have loaded the XML into it. XML is just a transfer format. Commented Apr 19, 2018 at 18:59
  • Sure, I will clarify. The problem is constructing the array in the expected form. I will add code. Commented Apr 19, 2018 at 19:10
  • So, your problem has nothing to do with XML parsing? You just want to do arr.push([type, pos])? Commented Apr 19, 2018 at 19:34

1 Answer


All the approaches mentioned in forrert's answer seem fine to me. If the XML is really huge, you can split it into a few chunks and work on one chunk at a time, so as not to block the whole process.
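In a streaming setup, one row-level reading of "one chunk at a time" is to buffer the parsed rows and hand them off in fixed-size batches, writing each batch before the next. A minimal sketch; `createBatcher`, the batch size, and the `flush` callback (for example, a database insert returning a Promise) are hypothetical and not part of any library mentioned here.

```javascript
// Hypothetical batcher: buffers rows and calls flush(batch) every `size` rows,
// so a large document is written in bounded chunks instead of one huge array.
// flush may return a Promise; batches are chained so they are written one at
// a time and in order. If a flush rejects, later batches are skipped and the
// Promise returned by end() rejects.
function createBatcher(size, flush) {
  let batch = [];
  let pending = Promise.resolve();

  function send() {
    const current = batch;
    batch = [];
    pending = pending.then(function () {
      return flush(current);
    });
    return pending;
  }

  return {
    push: function (row) {
      batch.push(row);
      if (batch.length >= size) {
        send();
      }
    },
    // Call at the end of the document to write any remaining rows; the
    // returned Promise settles after every batch has been flushed.
    end: function () {
      return batch.length > 0 ? send() : pending;
    },
  };
}
```

With a stream parser, `push(row)` would go in the per-element handler and `end()` in the end-of-document handler. If the parser produces rows faster than the database accepts them, unflushed batches can still pile up, so pausing the parser while a batch is written may be worth adding if your parser supports it.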


4 Comments

Thanks for your comment. But how do you iterate over, e.g., <Type>A01</Type>?
The built-in XML parser should be fine, though for heavy parsing it might be better to do it on the server and then convert it to JSON. I thought the question was more about approaches to parsing large text; do you need help understanding how to parse it in the first place?
That's exactly right. I would like to do large text parsing on the server side. I have already written similar code in PHP, but would like to get it working in Node.js. Thanks, though. I have added an example.
Splitting an XML file into chunks on your own seems like overkill: would you implement a stateful token parser? There are plenty of modules that can stream XML, such as sax with its stream interface.
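For comparison, this is roughly what the stateful part looks like on top of SAX-style events: the handler remembers the current <Type> and which element's text it is collecting, and emits a row whenever a <position> closes. A sketch, assuming sax's `opentag` (a node with a `name`), `text`, and `closetag` (the tag name) events in strict mode, which keeps tag names in their original case; `createRowHandler` is hypothetical, and the handler is plain enough to be fed events directly without sax installed.

```javascript
// Hypothetical SAX-style handler: tracks just enough state to pair each
// <position> with the <Type> of its enclosing <TimeSeries>.
function createRowHandler(onRow) {
  let currentType = null; // text of the most recent <Type>
  let open = null;        // element whose text is being collected, if any
  let buffer = '';        // text collected for that element

  return {
    opentag: function (node) {
      if (node.name === 'Type' || node.name === 'position') {
        open = node.name;
        buffer = '';
      }
    },
    text: function (chunk) {
      // Be defensive: text may be split across several events.
      if (open !== null) buffer += chunk;
    },
    closetag: function (name) {
      if (name === 'Type') {
        currentType = buffer.trim();
      } else if (name === 'position') {
        onRow([currentType, parseInt(buffer, 10)]);
      } else if (name === 'TimeSeries') {
        currentType = null; // do not leak a Type into the next series
      }
      if (name === open) open = null;
    },
  };
}

// Wiring to sax (not executed here; assumes `npm install sax`):
//   const sax = require('sax');
//   const handler = createRowHandler(function (row) { console.log(row); });
//   const parser = sax.createStream(true); // strict mode keeps tag case
//   parser.on('opentag', handler.opentag);
//   parser.on('text', handler.text);
//   parser.on('closetag', handler.closetag);
//   fs.createReadStream('./here.xml').pipe(parser);
```

The state is small because only two element names matter; everything else (whitespace between tags, <Period>, <Point>) is ignored.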
