I generate a ~200,000-element array of objects (using object literal notation inside map rather than new Constructor()) and save a JSON.stringify'd version of it to disk, where it takes up 31 MB, including newlines and one space per indentation level (JSON.stringify(arr, null, 1)).
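For concreteness, the generation-and-save step looks roughly like this (the fields below are made-up stand-ins; my real objects come from JMdict entries):

var fs = require('fs');
// Stand-in for the real data: ~200,000 object literals built inside map, then
// serialized with newlines and one-space indentation, like the real file.
var ids = [];
for (var i = 0; i < 200000; i++) { ids.push(i); }
var arr = ids.map(function (i) {
  return {kanji: 'k' + i, kana: 'r' + i, senses: ['gloss ' + i]}; // hypothetical fields
});
fs.writeFileSync('JMdict-all.json', JSON.stringify(arr, null, 1));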
Then, in a new node process, I read the entire file into a UTF-8 string and pass it to JSON.parse:
var fs = require('fs');
var arr1 = JSON.parse(fs.readFileSync('JMdict-all.json', {encoding : 'utf8'}));
Node memory usage is about 1.05 GB according to Mavericks' Activity Monitor! Even typing into a Terminal feels laggier on my ancient 4 GB RAM machine.
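(Activity Monitor measures the whole process; for a number from inside node itself, process.memoryUsage() reports the resident set size and V8 heap usage, e.g.:)

// Rough in-process check to go alongside Activity Monitor: RSS and V8 heap, in MB.
function reportMemory(label) {
  var m = process.memoryUsage();
  console.log(label + ': rss ' + Math.round(m.rss / 1048576) + ' MB, heapUsed ' +
              Math.round(m.heapUsed / 1048576) + ' MB');
}
reportMemory('after JSON.parse of the whole file');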
But if, in a new node process, I load the file's contents into a string, chop it up at element boundaries, and JSON.parse each element individually, ostensibly getting the same object array:
var fs = require('fs');
// drop the leading "[" and trailing "}\n]", split at the "\n }," element boundaries,
// then re-attach each closing brace and parse
var arr2 = fs.readFileSync('JMdict-all.json', {encoding : 'utf8'}).trim().slice(1,-3).split('\n },').map(function(s) {return JSON.parse(s+'}');});
node uses just ~200 MB of memory, with no noticeable system lag. This pattern persists across many restarts of node: JSON.parse-ing the whole array takes a gigabyte of memory, while parsing it element-wise is much more memory-efficient.
Why is there such a huge disparity in memory usage? Is this a problem with JSON.parse preventing efficient hidden class generation in V8? How can I get good memory performance without slicing-and-dicing strings? Must I use a streaming JSON parse 😭?
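(For reference, here's a rough sketch of what I mean by "streaming", using only core modules and leaning on the same one-space-indentation layout from JSON.stringify(arr, null, 1); a real streaming JSON parser would be more robust:)

var fs = require('fs');
var readline = require('readline');

// Hand-rolled "streaming" sketch: read the file line by line, accumulate one
// top-level element at a time, and parse it, so the whole 31 MB string never
// has to sit in memory next to the parsed array. It assumes the one-space-indent
// layout, where top-level elements close with a line of exactly " }," or " }".
var arr3 = [];
var buf = [];
var rl = readline.createInterface({input: fs.createReadStream('JMdict-all.json')});
rl.on('line', function (line) {
  if (line === '[' || line === ']') { return; }  // skip the enclosing array brackets
  buf.push(line);
  if (line === ' },' || line === ' }') {         // end of one top-level element
    var text = buf.join('\n');
    if (text.charAt(text.length - 1) === ',') { text = text.slice(0, -1); }
    arr3.push(JSON.parse(text));
    buf = [];
  }
});
rl.on('close', function () { console.log(arr3.length + ' elements parsed'); });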
For ease of experimentation, I've put the JSON file in question in a Gist; please feel free to clone it: git clone https://gist.github.com/909090f86ab5d9e12985.git. Or if you just want to look at a bit of the JSON file, GitHub will show a few thousand lines of it: gist.github.com/fasiha/909090f86ab5d9e12985/revisions

Edit: if I start node with --expose-gc, run the first code snippet (using up 1 GB of memory), and then call global.gc() about fifty times, node's memory usage slowly drops to ~100 MB. The implications of that are striking.
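Spelled out, that experiment looks roughly like this in the REPL:

$ node --expose-gc
> var fs = require('fs');
> var arr1 = JSON.parse(fs.readFileSync('JMdict-all.json', {encoding : 'utf8'}));
> global.gc();  // repeat roughly fifty times, watching memory fall from ~1 GB toward ~100 MB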