unique() for arrays in JavaScript [duplicate]

Question

As everybody knows, there isn't any built-in function to remove duplicates from an array in JavaScript. I've noticed this is also lacking in jQuery (which has a unique function for arrays of DOM elements only), and the most common snippet I found checks the entire array and a subset of it for each element (not very efficient, I think), like:

for (var i = 0; i < arr.length; i++)
    for (var j = i + 1; j < arr.length; j++)
        if (arr[i] === arr[j]) {
            // arr[j] duplicates arr[i]; handle it here
        }

so I made my own:

function unique (arr) {
    var hash = {}, result = [];
    for (var i = 0; i < arr.length; i++)
        if (!(arr[i] in hash)) { // It works with objects! In Firefox, at least
            hash[arr[i]] = true;
            result.push(arr[i]);
        }
    return result;
}

I wonder whether there's another algorithm generally accepted as the best for this case (or whether you see any obvious flaw that could be fixed). Alternatively, what do you do when you need this in JavaScript? (I'm aware that jQuery is not the only framework and some others may already have this covered.)

Do these arrays contain only scalar values, or is there a chance that they will contain objects and arrays? – Justin Johnson, commented Dec 11, 2009 at 19:23

Justin Johnson · Accepted Answer · 2009-12-11 22:23:19Z

33

Using the object literal is exactly what I would do. Many people miss this technique, opting instead for typical array walks like the original code you showed. The only optimization would be to avoid looking up arr.length on each iteration. Other than that, O(n) (assuming constant-time property lookups) is about as good as you get for uniqueness, and it is much better than the original O(n^2) example.

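function unique(arr) {
    var hash = {}, result = [];
    for ( var i = 0, l = arr.length; i < l; ++i ) {
        // Call hasOwnProperty via Object.prototype so an array element named
        // "hasOwnProperty" cannot shadow it on the hash object.
        // Note: keys are coerced to strings, so distinct objects (all
        // "[object Object]") and values like 1 and "1" collide.
        if ( !Object.prototype.hasOwnProperty.call(hash, arr[i]) ) {
            hash[ arr[i] ] = true;
            result.push(arr[i]);
        }
    }
    return result;
}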

// * Edited to use hasOwnProperty per comments
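Because the hash keys are strings, 1 and "1" (or true and "true") count as the same value. A minimal sketch of one way around that, assuming the array holds only primitives: prefix each key with its typeof. The name uniqueTyped is hypothetical, and objects would still collide with each other.

```javascript
// Hypothetical variant of unique() that keeps values of different
// primitive types apart by prefixing each hash key with its typeof.
// Assumes the array contains only primitives; objects would still
// collide, since they all stringify to "[object Object]".
function uniqueTyped(arr) {
    var hash = {}, result = [];
    for (var i = 0, l = arr.length; i < l; ++i) {
        var key = typeof arr[i] + ":" + arr[i];   // e.g. "number:1" vs "string:1"
        if (!Object.prototype.hasOwnProperty.call(hash, key)) {
            hash[key] = true;
            result.push(arr[i]);
        }
    }
    return result;
}

console.log(uniqueTyped([1, "1", 1, true, "true"])); // [ 1, '1', true, 'true' ]
```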

To summarize the time complexities:

function | unsorted | sorted | handles objects | handles scalars | library
___________________________________________________________________________
unique   |   O(n)   |  O(n)  |       no        |       yes       |    n/a
original |  O(n^2)  | O(n^2) |       yes       |       yes       |    n/a
uniq     |  O(n^2)  |  O(n)  |       yes       |       yes       | Prototype
_.uniq   |  O(n^2)  |  O(n)  |       yes       |       yes       | Underscore

As with most algorithms, there are trade-offs. If you are only deduplicating scalar values, your modifications to the original algorithm give the best solution. However, if you need to deduplicate non-scalar values, then using or mimicking the uniq method of either of the libraries discussed would be your best choice.

edited Dec 11, 2009 at 22:23

answered Dec 11, 2009 at 19:10

Justin Johnson



5 Comments

Chetan S Over a year ago

It is better to use hash.hasOwnProperty(arr[i]). The in operator returns true for inherited properties like toString. ("toString" in {}) => true
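A small demonstration of the difference described in this comment, assuming an array that happens to contain the string "toString": with `in`, the inherited Object.prototype.toString makes that value look as if it had already been seen, so it is silently dropped. Both function names are hypothetical.

```javascript
// Two otherwise identical hash-based dedupers that differ only in the
// membership test used on the hash object.
function uniqueWithIn(arr) {
    var hash = {}, result = [];
    for (var i = 0; i < arr.length; i++) {
        if (!(arr[i] in hash)) {          // also sees inherited properties
            hash[arr[i]] = true;
            result.push(arr[i]);
        }
    }
    return result;
}

function uniqueWithOwn(arr) {
    var hash = {}, result = [];
    for (var i = 0; i < arr.length; i++) {
        if (!Object.prototype.hasOwnProperty.call(hash, arr[i])) { // own keys only
            hash[arr[i]] = true;
            result.push(arr[i]);
        }
    }
    return result;
}

console.log(uniqueWithIn(["a", "toString", "a"]));  // [ 'a' ]
console.log(uniqueWithOwn(["a", "toString", "a"])); // [ 'a', 'toString' ]
```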

Xavi Over a year ago

Wouldn't the unique function have O(n) complexity for sorted lists as well?

Justin Johnson Over a year ago

Sorry, yes. Copy+paste got the best of me.

Tobbe Over a year ago

According to this answer to another question, using result[result.length] = arr[i]; might be better than push().

Justin Johnson Over a year ago

That's perfectly valid, but it is a micro-optimization. The optimizations we are looking at here are in terms of order of complexity.

Fabien Ménager · Accepted Answer · 2009-12-11 19:10:06Z

5

I think your version won't work when the array contains objects or functions whose string representation is something like [object Object], because object keys can only be strings (in the "hash" object here). You'll need to loop over the result array to check whether the new entry already exists. That will still be faster than the first method.
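A minimal sketch of a variation on this suggestion, assuming a mix of primitives and objects: primitives keep using a (type-prefixed) hash, while objects and functions are compared by identity against the objects kept so far. The name uniqueMixed and the split into two paths are illustrative choices, not taken from the answer; the worst case is still O(n^2) when most elements are distinct objects.

```javascript
// Hypothetical hybrid: primitives go through a string-keyed hash,
// objects and functions are checked by identity (===) against the
// objects already kept, by scanning that list.
function uniqueMixed(arr) {
    var hash = {}, seenObjects = [], result = [];
    for (var i = 0, l = arr.length; i < l; ++i) {
        var x = arr[i];
        var isObject = (typeof x === "object" && x !== null) || typeof x === "function";
        if (isObject) {
            if (seenObjects.indexOf(x) === -1) {   // indexOf compares with ===
                seenObjects.push(x);
                result.push(x);
            }
        } else {
            var key = typeof x + ":" + x;          // keeps 1 and "1" apart
            if (!Object.prototype.hasOwnProperty.call(hash, key)) {
                hash[key] = true;
                result.push(x);
            }
        }
    }
    return result;
}

var a = {}, b = {};
console.log(uniqueMixed([a, 1, b, "1", a, 1]).length); // 4
```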

Prototype JS has a "uniq" method; you may get inspiration from it.

answered Dec 11, 2009 at 19:10

Fabien Ménager


6 Comments

Justin Johnson Over a year ago

Good point, I didn't consider the toString issue.

Roatin Marth Over a year ago

The first method doesn't work with objects either, though, if I understand you correctly; IOW, === doesn't work on objects. So, presuming the array will only contain "scalars" that can be compared directly with == or === (e.g. ints, floats, bools, strings), do you still think the second one won't work?

Roatin Marth Over a year ago

er, wait. I guess == works fine on object references. nm then!

Fabien Ménager Over a year ago

With the second method, all the objects being compared will be considered equal, which is not the case with the first method. But the second method will work, and will be much faster, if the array contains only scalar values.

Roatin Marth Over a year ago

fabien: IOW the first one is better in the general case, while the second one is better in the "scalar" case.


Manav · Accepted Answer · 2012-08-09 16:42:24Z

5

fun with fun (ctional)

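// Number-only dedupe: object keys absorb the repeats, and Number turns
// the string keys back into numbers. The result follows Object.keys
// order (integer-like keys ascending), not necessarily the input order.
function uniqueNum(arr) {
    return Object.keys(arr.reduce(
        function(o, x) {o[x]=1; return o;}, {})).map(Number);
}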
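As a point of comparison that does not come from the answers here (which predate ES2015): the object-key trick in uniqueNum does not keep the input order; for example, uniqueNum([3, 1, 3, 2]) returns [1, 2, 3]. In engines with ES2015 support, a Set iterates in insertion order and compares values with SameValueZero, so a minimal sketch like the following (uniqueSet is a hypothetical name) keeps first occurrences, keeps 1 and "1" apart, and compares objects by identity.

```javascript
// ES2015+ alternative: a Set keeps one copy of each value, using
// SameValueZero equality (NaN matches NaN, 1 and "1" stay distinct,
// objects are compared by identity), and iterates in insertion order.
function uniqueSet(arr) {
    return Array.from(new Set(arr));
}

console.log(uniqueSet([3, 1, 3, "3", 2, 1])); // [ 3, 1, '3', 2 ]
```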

answered Aug 9, 2012 at 16:42

Manav



jeremyosborne · Accepted Answer · 2009-12-11 19:12:09Z

2

I'm not an algorithm expert by any means, but I've been keeping an eye on underscore.js. They have this as a function called uniq:

http://documentcloud.github.com/underscore/#uniq

I looked at the code in their library, and copied it here for reference (not my code, this code belongs to underscore.js):

// Produce a duplicate-free version of the array. If the array has already
// been sorted, you have the option of using a faster algorithm.
_.uniq = function(array, isSorted) {
    return _.reduce(array, [], function(memo, el, i) {
        if (0 == i || (isSorted === true ? _.last(memo) != el : !_.include(memo, el))) memo.push(el);
        return memo;
    });
};

EDIT: This snippet depends on other underscore.js functions (_.reduce, _.last, _.include), so you would need to walk through the rest of the library's code as well; I almost took it out because of that. I left the snippet in just in case it is still useful.
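Since the snippet relies on Underscore helpers of that era, here is a dependency-free sketch of roughly the same logic using the native ES5 Array.prototype.reduce and indexOf. The name uniqLike is hypothetical, and this is not Underscore's actual or current implementation; it only follows the shape of the snippet above, including the loose != comparison in the sorted branch.

```javascript
// Hypothetical stand-alone version of the _.uniq snippet above.
// isSorted === true: compare each element only with the last one kept, O(n).
// Otherwise: search everything kept so far with indexOf (===), O(n^2) worst case.
function uniqLike(array, isSorted) {
    return array.reduce(function (memo, el, i) {
        var isNew = (isSorted === true)
            ? memo[memo.length - 1] != el        // loose comparison, as in the snippet
            : memo.indexOf(el) === -1;
        if (i === 0 || isNew) memo.push(el);
        return memo;
    }, []);
}

console.log(uniqLike([1, 1, 2, 3, 3], true)); // [ 1, 2, 3 ]
console.log(uniqLike([3, 1, 3, 2]));          // [ 3, 1, 2 ]
```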

answered Dec 11, 2009 at 19:12

jeremyosborne


4 Comments

Roatin Marth Over a year ago

I'm sure !_.include iterates the array from scratch too.

Justin Johnson Over a year ago

I hadn't heard of this library before, so I took a walk through the code, looking specifically at _.include and _.last. It looks like sorted arrays will take O(n) and unsorted arrays O(n^2), so it's not an improvement in every case.

Roatin Marth Over a year ago

Justin: good sleuthing. The OP's code sample (the first one) looks to be assuming the array is sorted. It starts the inner loop from the current index + 1.

Justin Johnson Over a year ago

Turns out, this is implemented the same way Prototype implements uniq

Rafał Dowgird · Accepted Answer · 2009-12-11 22:12:55Z

Unfortunately, JS objects have no identity accessible from the language. As other posters have mentioned, using objects as keys in a dictionary will fail when different objects have equal string representations, and there is no id() function in the language.

There is a way to avoid the O(n^2) all-pairs check for === identity if you can modify the objects. Pick a random string and walk the array once to check that no object has a property by that name. Then walk it again, setting arr[i][randomPropertyName]=1 for each i; if an object already has that property when you reach it, it is a duplicate.

Unfortunately, the above will only work for modifiable objects. It fails for array values that don't allow property setting (e.g. integers, 42['random']=1 just doesn't work :( )
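A minimal sketch of the marker-property idea, assuming every element is an extensible object (not a primitive, not frozen or sealed). The marker prefix, the retry loop for picking the name, and the final cleanup pass that deletes the markers are illustrative additions, not part of the answer; uniqueObjects is a hypothetical name.

```javascript
// Dedupe an array of objects by identity in O(n) by tagging each object
// the first time it is seen. Assumes all elements are extensible objects.
function uniqueObjects(arr) {
    var mark, i, result = [];
    // Pick a random property name that no element already has.
    do {
        mark = "__uniq_" + Math.random().toString(36).slice(2);
        for (i = 0; i < arr.length; i++) {
            if (Object.prototype.hasOwnProperty.call(arr[i], mark)) break;
        }
    } while (i < arr.length);

    for (i = 0; i < arr.length; i++) {
        if (!Object.prototype.hasOwnProperty.call(arr[i], mark)) {
            arr[i][mark] = 1;      // first time we see this object
            result.push(arr[i]);
        }                          // otherwise: same object seen earlier, skip it
    }
    // Remove the markers so the objects are left as they were.
    for (i = 0; i < result.length; i++) {
        delete result[i][mark];
    }
    return result;
}

var x = { id: 1 }, y = { id: 1 };
var deduped = uniqueObjects([x, y, x]);
console.log(deduped.length, deduped[0] === x, deduped[1] === y); // 2 true true
```

Note that x and y are kept separately even though they look alike, because the check is by identity rather than by content.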
