
I have an array in the form:

[
{id:'4704',sub_elements:['123456','7890123','5566778']},
{id:'4706',sub_elements:['223344','7890123','1111111']},
....
]

The array has about 2,500 elements, and each sub-array can itself have hundreds of elements.

I wish to transform this array so that the mapping is inverted, i.e.

[
'123456': ['4704'],
'7890123': ['4704','4706'],
'5566778': ['4704'],
'223344' : ['4706'],
'1111111' : ['4706']
]

One way is to take all the sub-elements from the first array, push them onto a single array, and then de-dupe that array.

So you end up with something like:

var sub_elements = ['123456','7890123','5566778', '223344','7890123','1111111', ....] //50,000+ elements
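For reference, the flatten-and-de-dupe step itself can be done in one pass with a Set (a sketch using the sample data from the question; the variable names are assumptions):

```javascript
// Assumed sample data in the shape described above
var original = [
  { id: '4704', sub_elements: ['123456', '7890123', '5566778'] },
  { id: '4706', sub_elements: ['223344', '7890123', '1111111'] }
];

// Flatten every sub_elements list into one array
var all = [];
original.forEach(function (entry) {
  all = all.concat(entry.sub_elements);
});

// A Set drops duplicates in a single pass, preserving insertion order
var sub_elements = Array.from(new Set(all));
```

That removes the duplicate '7890123' without a separate de-dupe loop.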

Then just iterate over that (pretty massive list). Pseudocode:

var original = [
{id:'4704',sub_elements:['123456','7890123','5566778']},
{id:'4706',sub_elements:['223344','7890123','1111111']},
....
]; //2,000+ elements
var elements = ['123456','7890123','5566778', '223344','7890123','1111111'];

var result = {};

// For every unique sub-element, scan the entire original array
for (var e in elements) {
    var ids = [];
    for (var x in original) {
        if (original[x].sub_elements.indexOf(elements[e]) >= 0) {
            ids.push(original[x].id);
        }
    }
    result[elements[e]] = ids;
}

The problem is that with so many entries in the de-duped array this takes an absolute age in Node. In reality elements has 50K+ entries, so the code iterates over all 50K of them and, for each one, scans the whole 2,500-element original array (with a linear .indexOf() search inside). There has to be a more efficient way of doing this.

Can't see the wood for the trees at the moment; maybe someone here has done something like this already.

  • The thing you say you want is not valid JavaScript; do you want an array as a result, or just an object? Commented Sep 15, 2015 at 22:21
  • Yea the result would be an array. Commented Sep 15, 2015 at 22:37
  • Please edit your question to make the 2nd code block into legal Javascript. It is unclear what you want. Commented Sep 16, 2015 at 3:22

3 Answers


Assuming what you really want is an object that maps from the "sub element" values onto the "id" values in the original, you can do that in no time flat I think:

var transformed = original.reduce(function(rv, entry) {
  entry.sub_elements.forEach(function(subelem) {
    if (rv[subelem])
      rv[subelem].push(entry.id);
    else
      rv[subelem] = [entry.id];
  });
  return rv;
}, {});

That'll give you an object where the property names are the collection of sub-element values, and the value of each property is the array of ids (possibly with duplicates) where the sub-element values appeared in the original.

The code works by iterating over the original array with .reduce(), starting from a fresh empty object. For each entry, the sub_elements list is iterated in turn, appending that entry's id to the array stored in the result under each sub-element key.
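For instance, running that reduce over the sample data from the question (a self-contained sketch; the data is taken from the question, the code is the answer's verbatim):

```javascript
// Sample data from the question
var original = [
  { id: '4704', sub_elements: ['123456', '7890123', '5566778'] },
  { id: '4706', sub_elements: ['223344', '7890123', '1111111'] }
];

// Invert the mapping in a single pass over `original`
var transformed = original.reduce(function (rv, entry) {
  entry.sub_elements.forEach(function (subelem) {
    if (rv[subelem])
      rv[subelem].push(entry.id);
    else
      rv[subelem] = [entry.id];
  });
  return rv;
}, {});

// Each sub-element key now maps to the ids it appeared under:
// transformed['7890123'] → ['4704', '4706']
```

Because this touches each sub-element exactly once, the cost is linear in the total number of sub-elements rather than the 50K × 2,500 nested scan in the question.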

edit — If it's really important that the result should be a real Array instance and not an object (which doesn't seem like it'd make a difference, because it's not clear why you'd care about what .length would tell you in that case), you can do that with the same code as above except:

var transformed = original.reduce(function(rv, entry) {
  // same same
}, []); // <--- [] instead of {} to initialize the process

4 Comments

The idea is that the final array is keyed with the sub-elements and referencing the array by key returns a list of the ids. So when it's finally used I can do something like: var ids = result['7890123'] would return: ['4704','4706']
@Vinnie yes, and that's exactly what the code above does.
@Vinnie Do you want the final result to specifically be an Array instance?
Sorry for not getting back to you earlier. Answer is perfect thanks a million.

By default, Node.js runs your JavaScript in a single thread, so while the request queue (event pool) is handled asynchronously, your array-parsing code (50K+ items) is synchronous and will block every incoming request in that queue, costing a lot of performance. To parse a big array like this you should use Node's tick functions, something like this:

var result = {};
var original = [{id: 1, subElements: [1, 2, 3]}, {...}, {...}, ...];

var processBigArray = function() {
    if (original.length !== 0) {
        process.nextTick(function processItemFromArray() {
            var item = original.shift();
            // perform the operations on `item` here
            // .... doing operations ....

            // if there are more elements, process the next one on a later tick
            if (original.length !== 0) {
                processBigArray();
            }
        });
    }
};

processBigArray();

Using process.nextTick, if there are incoming requests (or other events waiting in the pool), Node can process the next event and then the next item of your array; this way, incoming events are never blocked for the whole duration of the loop.

1 Comment

Welcome @Vinnie :)) ... btw, I wrote this code a looong time ago, so nextTick is not really good for this example; please use setImmediate instead (nextTick schedules processBigArray to run immediately, before pending tasks, which still blocks them, while setImmediate lets waiting tasks run first and then executes the given function)
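As a sketch of that suggestion: a chunked loop driven by setImmediate, so queued I/O callbacks get a chance to run between chunks (the function name and chunk size here are illustrative, not from the answer):

```javascript
// Process `items` in fixed-size chunks, yielding to the event loop
// between chunks via setImmediate so pending events are not starved.
function processInChunks(items, handleItem, done) {
  var CHUNK_SIZE = 1000; // illustrative; tune for your workload
  var i = 0;
  function step() {
    var end = Math.min(i + CHUNK_SIZE, items.length);
    for (; i < end; i++) {
      handleItem(items[i]); // synchronous work on one item
    }
    if (i < items.length) {
      setImmediate(step); // let pending I/O run, then continue
    } else {
      done();
    }
  }
  step();
}
```

Chunking amortizes the scheduling overhead: calling setImmediate once per item (as shift()-per-tick does) costs far more than once per thousand items.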

update: don't use for ... in on real arrays. The keys come back in arbitrary order.

for ... in

6 Comments

It's true that one shouldn't use for ... in on real arrays, but the reasons have nothing at all to do with performance. The iteration will be essentially exactly as fast one way or the other. What's making the OP's code slow is the nested .indexOf() search.
Yes, .indexOf() has a high cost, but don't underestimate the impact of for ... in on large arrays!
There is almost zero impact of for ... in. Try it in jsperf if you don't believe me; it won't make any significant difference, especially compared to the gross algorithmic inefficiency in the OP code.
The only performance issue with for..in comes from iterating over enumerable inherited properties, so a hasOwnProperty check should be used. If it's known that there are no such properties, the check can be omitted and, as Pointy says, there should be no performance issue. There is a semantic issue though, as properties may not be returned in any particular order and for arrays, only numeric properties are usually sought and they are usually visited in numeric order.
Yes definitely, don't use for ... in on real arrays (unless you really know what you're doing and why), but it's not a performance thing.
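To illustrate the hasOwnProperty guard mentioned in the comments above (a sketch; the injected property simulates a polluted prototype):

```javascript
var arr = ['a', 'b', 'c'];
Object.prototype.injected = 'oops'; // simulate an inherited enumerable property

var keys = [];
for (var key in arr) {
  if (arr.hasOwnProperty(key)) { // skips 'injected'
    keys.push(key);
  }
}
delete Object.prototype.injected; // clean up

// keys holds only the own index keys, and note they are strings: '0', '1', '2'
```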
