
I have an array in the form:

[
{id:'4704',sub_elements:['123456','7890123','5566778']},
{id:'4706',sub_elements:['223344','7890123','1111111']},
....
]

The array has about 2,500 elements, and each sub-array can itself have hundreds of elements.

I wish to transform this array so that the mapping is inverted, i.e.

[
'123456': ['4704'],
'7890123': ['4704','4706'],
'5566778': ['4704'],
'223344' : ['4706'],
'1111111' : ['4706']
]

One way is to take all the sub-elements from the first array, push them onto a single array, and then de-dupe that array.

So you end up with something like:

var sub_elements = ['123456','7890123','5566778', '223344','7890123','1111111', ....] //50,000+ elements
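For reference, the flatten-and-de-dupe step itself can be done in one pass with a Set (a sketch using the sample data from the question; the variable names are assumptions):

```javascript
// Assumed sample data in the shape described above
var original = [
  { id: '4704', sub_elements: ['123456', '7890123', '5566778'] },
  { id: '4706', sub_elements: ['223344', '7890123', '1111111'] }
];

// Flatten every sub_elements list into one array
var all = [];
original.forEach(function (entry) {
  all = all.concat(entry.sub_elements);
});

// A Set drops duplicates in a single pass, preserving insertion order
var sub_elements = Array.from(new Set(all));
```

That removes the duplicate '7890123' without a separate de-dupe loop.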

Then just iterate over that (pretty massive list). Pseudocode:

var original = [
{id:'4704',sub_elements:['123456','7890123','5566778']},
{id:'4706',sub_elements:['223344','7890123','1111111']},
....
]; //2,000+ elements
var elements = ['123456','7890123','5566778', '223344','7890123','1111111'];

var result = {};

// For every unique sub-element, scan the entire original array
for (var e in elements) {
    var ids = [];
    for (var x in original) {
        if (original[x].sub_elements.indexOf(elements[e]) >= 0) {
            ids.push(original[x].id);
        }
    }
    result[elements[e]] = ids;
}

The problem is that with so many entries in the de-duped array this takes an absolute age in Node. In reality elements has 50K+ entries, so the code iterates over all 50K of them and, for each one, scans the whole 2,500-element original array (with a linear .indexOf() search inside). There has to be a more efficient way of doing this.

Can't see the wood for the trees at the moment; maybe someone here has done something like this already.

  • The thing you say you want is not valid JavaScript; do you want an array as a result, or just an object? Commented Sep 15, 2015 at 22:21
  • Yea the result would be an array. Commented Sep 15, 2015 at 22:37
  • Please edit your question to make the 2nd code block into legal Javascript. It is unclear what you want. Commented Sep 16, 2015 at 3:22

3 Answers


Assuming what you really want is an object that maps from the "sub element" values onto the "id" values in the original, you can do that in no time flat I think:

var transformed = original.reduce(function(rv, entry) {
  entry.sub_elements.forEach(function(subelem) {
    if (rv[subelem])
      rv[subelem].push(entry.id);
    else
      rv[subelem] = [entry.id];
  });
  return rv;
}, {});

That'll give you an object where the property names are the collection of sub-element values, and the value of each property is the array of ids (possibly with duplicates) where the sub-element values appeared in the original.

The code works by iterating over the original array with .reduce(), starting from a fresh empty object. For each entry, the sub_elements list is iterated in turn, appending that entry's id to the array stored in the result under each sub-element key.
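For instance, running that reduce over the sample data from the question (a self-contained sketch; the data is taken from the question, the code is the answer's verbatim):

```javascript
// Sample data from the question
var original = [
  { id: '4704', sub_elements: ['123456', '7890123', '5566778'] },
  { id: '4706', sub_elements: ['223344', '7890123', '1111111'] }
];

// Invert the mapping in a single pass over `original`
var transformed = original.reduce(function (rv, entry) {
  entry.sub_elements.forEach(function (subelem) {
    if (rv[subelem])
      rv[subelem].push(entry.id);
    else
      rv[subelem] = [entry.id];
  });
  return rv;
}, {});

// Each sub-element key now maps to the ids it appeared under:
// transformed['7890123'] → ['4704', '4706']
```

Because this touches each sub-element exactly once, the cost is linear in the total number of sub-elements rather than the 50K × 2,500 nested scan in the question.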

edit — If it's really important that the result should be a real Array instance and not an object (which doesn't seem like it'd make a difference, because it's not clear why you'd care about what .length would tell you in that case), you can do that with the same code as above except:

var transformed = original.reduce(function(rv, entry) {
  // same same
}, []); // <--- [] instead of {} to initialize the process

4 Comments

The idea is that the final array is keyed with the sub-elements and referencing the array by key returns a list of the ids. So when it's finally used I can do something like: var ids = result['7890123'] would return: ['4704','4706']
@Vinnie yes, and that's exactly what the code above does.
@Vinnie Do you want the final result to specifically be an Array instance?
Sorry for not getting back to you earlier. Answer is perfect thanks a million.

By default, Node.js runs your JavaScript in a single thread, so while the request queue (event pool) is handled asynchronously, your array-parsing code (50K+ items) is synchronous and will block every incoming request in that queue, costing a lot of performance. To parse a big array like this you should use Node's tick functions, something like this:

var result = {};
var original = [{id: 1, subElements: [1, 2, 3]}, {...}, {...}, ...];

var processBigArray = function() {
    if (original.length !== 0) {
        process.nextTick(function processItemFromArray() {
            var item = original.shift();
            // perform the operations on `item` here
            // .... doing operations ....

            // if there are more elements, process the next one on a later tick
            if (original.length !== 0) {
                processBigArray();
            }
        });
    }
};

processBigArray();

Using process.nextTick, if there are incoming requests (or other events waiting in the pool), Node can process the next event and then the next item of your array; this way, incoming events are never blocked for the whole duration of the loop.

1 Comment

Welcome @Vinnie :)) ... btw, I wrote this code a looong time ago, so nextTick is not really good for this example; please use setImmediate instead (nextTick schedules processBigArray to run immediately, before pending tasks, which still blocks them, while setImmediate lets waiting tasks run first and then executes the given function)
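As a sketch of that suggestion: a chunked loop driven by setImmediate, so queued I/O callbacks get a chance to run between chunks (the function name and chunk size here are illustrative, not from the answer):

```javascript
// Process `items` in fixed-size chunks, yielding to the event loop
// between chunks via setImmediate so pending events are not starved.
function processInChunks(items, handleItem, done) {
  var CHUNK_SIZE = 1000; // illustrative; tune for your workload
  var i = 0;
  function step() {
    var end = Math.min(i + CHUNK_SIZE, items.length);
    for (; i < end; i++) {
      handleItem(items[i]); // synchronous work on one item
    }
    if (i < items.length) {
      setImmediate(step); // let pending I/O run, then continue
    } else {
      done();
    }
  }
  step();
}
```

Chunking amortizes the scheduling overhead: calling setImmediate once per item (as shift()-per-tick does) costs far more than once per thousand items.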

update: don't use for ... in on real arrays. The keys come back in arbitrary order.

for ... in

6 Comments

It's true that one shouldn't use for ... in on real arrays, but the reasons have nothing at all to do with performance. The iteration will be essentially exactly as fast one way or the other. What's making the OP's code slow is the nested .indexOf() search.
Yes, .indexOf() has a high cost, but don't underestimate the impact of for ... in on large arrays!
There is almost zero impact of for ... in. Try it in jsperf if you don't believe me; it won't make any significant difference, especially compared to the gross algorithmic inefficiency in the OP code.
The only performance issue with for..in comes from iterating over enumerable inherited properties, so a hasOwnProperty check should be used. If it's known that there are no such properties, the check can be omitted and, as Pointy says, there should be no performance issue. There is a semantic issue though, as properties may not be returned in any particular order and for arrays, only numeric properties are usually sought and they are usually visited in numeric order.
Yes definitely, don't use for ... in on real arrays (unless you really know what you're doing and why), but it's not a performance thing.
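To illustrate the hasOwnProperty guard mentioned in the comments above (a sketch; the injected property simulates a polluted prototype):

```javascript
var arr = ['a', 'b', 'c'];
Object.prototype.injected = 'oops'; // simulate an inherited enumerable property

var keys = [];
for (var key in arr) {
  if (arr.hasOwnProperty(key)) { // skips 'injected'
    keys.push(key);
  }
}
delete Object.prototype.injected; // clean up

// keys holds only the own index keys, and note they are strings: '0', '1', '2'
```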
