9

JavaScript has no built-in function for removing duplicates from an array. jQuery lacks one too (its unique function works on DOM selections only), and the most common snippet I found compares every element against the rest of the array, which doesn't seem very efficient to me:

for (var i = 0; i < arr.length; i++) {
    for (var j = i + 1; j < arr.length; j++) {
        if (arr[i] === arr[j]) {
            // arr[j] duplicates arr[i]: whatever
        }
    }
}

so I made my own:

function unique (arr) {
    var hash = {}, result = [];
    for (var i = 0; i < arr.length; i++)
        if (!(arr[i] in hash)) { // It works with objects! In Firefox, at least
            hash[arr[i]] = true;
            result.push(arr[i]);
        }
    return result;
}

Is there another algorithm generally accepted as the best for this case, or do you see any obvious flaw that could be fixed? Otherwise, what do you do when you need this in JavaScript? (I'm aware that jQuery is not the only framework, and some others may already have this covered.)

2
  • 1
    Do these arrays contain only scalar values, or is there a chance that they will contain objects and arrays? Commented Dec 11, 2009 at 19:23
  • And is there the assumption of sorted or not? Commented Dec 11, 2009 at 19:26

5 Answers 5

33

Using an object literal as a hash is exactly what I would do. Many people overlook this technique and opt instead for nested array walks like the original code you showed. The only optimization I would add is caching arr.length instead of looking it up on every iteration. Other than that, O(n) is about as good as you can get for uniqueness, and it is much better than the original O(n^2) example.

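function unique(arr) {
    var hash = {}, result = [];
    for ( var i = 0, l = arr.length; i < l; ++i ) {
        // Keys are coerced to strings, so this is only reliable for scalar values.
        // Calling through Object.prototype stays safe even if "hasOwnProperty" is itself a value in arr.
        if ( !Object.prototype.hasOwnProperty.call(hash, arr[i]) ) {
            hash[ arr[i] ] = true;
            result.push(arr[i]);
        }
    }
    return result;
}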

// * Edited to use hasOwnProperty per comments
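As a rough illustration of why that edit matters, the sketch below shows how the in operator sees inherited properties and how an in-based version silently drops a value such as "toString". The name uniqueWithIn is hypothetical and only labels the in-based variant from the question.

```javascript
// Illustration only: uniqueWithIn is a hypothetical name for the `in`-based variant.
var empty = {};
var viaIn = "toString" in empty;  // true: found on Object.prototype
var viaOwn = Object.prototype.hasOwnProperty.call(empty, "toString");  // false: not an own property

function uniqueWithIn(arr) {
    var hash = {}, result = [];
    for (var i = 0, l = arr.length; i < l; ++i) {
        if (!(arr[i] in hash)) {  // inherited names look "already seen"
            hash[arr[i]] = true;
            result.push(arr[i]);
        }
    }
    return result;
}

// "toString" never reaches the result, even though it appears only once.
var dropped = uniqueWithIn(["toString", "a", "a"]);
```

The same applies to any other name inherited from Object.prototype, such as "constructor" or "valueOf".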

To summarize the time complexities:

  f()    | unsorted | sorted | objects | scalar | library
____________________________________________________________
unique   |   O(n)   |  O(n)  |   no    |  yes   |    n/a
original |  O(n^2)  | O(n^2) |   yes   |  yes   |    n/a
uniq     |  O(n^2)  |  O(n)  |   yes   |  yes   | Prototype
_.uniq   |  O(n^2)  |  O(n)  |   yes   |  yes   | Underscore

As with most algorithms, there are trade-offs. If you are only deduplicating scalar values, your modifications to the original algorithm give the best solution. However, if you need to handle non-scalar values, then using or mimicking the uniq method of either of the libraries discussed would be your best choice.


5 Comments

It is better to use hash.hasOwnProperty(arr[i]). The in operator returns true for inherited properties like toString. ("toString" in {}) => true
Wouldn't the unique function have O(n) complexity for sorted lists as well?
Sorry, yes. Copy+paste got the best of me.
According to this answer to another question, using result[result.length] = arr[i]; might be better than push()
That's perfectly valid, but it is a micro-optimization. The optimizations that we are looking at here are in terms of orders of complexity.
5

I think your version won't work when the array contains objects or functions that share a string representation, like [object Object], because object keys (in the "hash" object here) can only be strings. You'll need to loop over the result array to check whether the new entry already exists. It will still be faster than the first method.

Prototype JS has a "uniq" method, you may get inspiration from it.
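A minimal sketch of both points, under the assumption that Array.prototype.indexOf is available (older browsers may need a manual loop). The name uniqueAny is hypothetical; this is not Prototype's actual implementation.

```javascript
// Two distinct objects map to the same key, "[object Object]".
var a = { id: 1 }, b = { id: 2 };
var hash = {};
hash[a] = true;
var collides = Object.prototype.hasOwnProperty.call(hash, b);  // true, although b was never stored

// Hypothetical fallback: search the result array instead of a hash.
// indexOf compares with ===, so objects are matched by reference.
// Caveat: NaN is never === NaN, so repeated NaN values are all kept.
function uniqueAny(arr) {
    var result = [];
    for (var i = 0, l = arr.length; i < l; ++i) {
        if (result.indexOf(arr[i]) === -1) {
            result.push(arr[i]);
        }
    }
    return result;
}

var mixed = uniqueAny([a, b, a, 1, "1", 1]);  // keeps a, b, 1 and "1" as separate entries
```

This is still O(n^2) in the worst case, but the inner search only covers the unique values found so far.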

6 Comments

Good point, I didn't consider the toString issue.
The first method doesn't work with Objects either though, if I understand you correctly. IOW === doesn't work on objects. So presuming the array will only contain "scalars" that can be compared directly with == or === (eg ints, floats, bools, strings) do you still think the second one won't work?
er, wait. I guess == works fine on object references. nm then!
With the second method, all the objects that are compared will be considered equal, which is not the case with the first method. But the second method will work, and will be much faster, if the array contains only scalar values.
fabien: IOW the first one is better in the general case, while the second one is better in the "scalar" case.
5

fun with fun (ctional)

function uniqueNum(arr) {
    return Object.keys(arr.reduce(
        function(o, x) {o[x]=1; return o;}, {})).map(Number);
}  
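A quick usage sketch of the function above, repeated here so the block runs on its own. As the name suggests, it only suits arrays of numbers: values pass through string keys and come back via Number, so non-numeric strings turn into NaN, and the result follows the object's key order rather than the input order.

```javascript
function uniqueNum(arr) {
    return Object.keys(arr.reduce(
        function(o, x) {o[x]=1; return o;}, {})).map(Number);
}

// Duplicates are removed; sort explicitly if a particular order matters.
var nums = uniqueNum([3, 1, 3, 2, 1]).sort(function (x, y) { return x - y; });

// Non-numeric input does not survive the round trip through Number.
var broken = uniqueNum(["a"]);
```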

Comments

2

I'm not an algorithm expert by any means, but I've been keeping an eye on underscore.js. They have this as a function called uniq:

http://documentcloud.github.com/underscore/#uniq

I looked at the code in their library, and copied it here for reference (not my code, this code belongs to underscore.js):

// Produce a duplicate-free version of the array. If the array has already
// been sorted, you have the option of using a faster algorithm.
_.uniq = function(array, isSorted) {
    return _.reduce(array, [], function(memo, el, i) {
        if (0 == i || (isSorted === true ? _.last(memo) != el : !_.include(memo, el))) memo.push(el);
        return memo;
    });
};

EDIT: To follow this snippet you need to walk through the rest of the underscore.js code, and I almost took it out because of that. I left the code snippet in just in case it is still useful.
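For readers who don't want to chase the helpers, here is a standalone sketch of the same idea. It is an approximation, not underscore's code: the function name uniq is reused for illustration, Array.prototype.indexOf stands in for _.include, and strict !== replaces the loose != of the original.

```javascript
// Sketch: with isSorted === true, only the last kept element is compared (O(n));
// otherwise every kept element is searched (O(n^2) in the worst case).
function uniq(array, isSorted) {
    var result = [];
    for (var i = 0, l = array.length; i < l; ++i) {
        var el = array[i];
        var isNew = isSorted === true
            ? (result.length === 0 || result[result.length - 1] !== el)
            : result.indexOf(el) === -1;
        if (isNew) {
            result.push(el);
        }
    }
    return result;
}

var fromSorted = uniq([1, 1, 2, 3, 3], true);
var fromUnsorted = uniq([3, 1, 3, 2, 1]);
```

Passing isSorted for an array that is not actually sorted only removes adjacent duplicates.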

4 Comments

I'm sure !_.include iterates the array from scratch too.
I hadn't heard of this library before, so I took a walk through the code looking specifically at _.include and _.last. It looks like sorted arrays will take O(n) and unsorted will be O(n^2), so it's not a constant improvement.
Justin: good sleuthing. The OP's code sample (first one) looks to be assuming the array is sorted. It starts the inner loop from the current index + 1.
Turns out, this is implemented the same way Prototype implements uniq
1

Unfortunately JS objects have no identity accessible from the language - as other posters have mentioned, using objects as keys in a dictionary will fail when different objects have equal string representations and there is no id() function in the language.

There is a way to avoid the O(n^2) all-pairs check for === identity if you can modify the objects. Pick a random string, walk the array once to check that no object has a property by that name, then just do arr[i][randomPropertyName]=1 for each i. If the next object in the array already has that property, then it is a duplicate.

Unfortunately, the above will only work for modifiable objects. It fails for array values that don't allow property setting (e.g. integers, 42['random']=1 just doesn't work :( )
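A minimal sketch of the marking idea, assuming every element is an extensible (non-frozen) object. The function name uniqueObjects and the marker prefix are hypothetical. Unlike the description above, it also deletes the markers afterwards so the objects are left unchanged.

```javascript
// Sketch only: assumes every element of arr is an object that accepts new properties.
function uniqueObjects(arr) {
    // Pick a random property name and retry until no element already has it.
    var marker;
    do {
        marker = "__seen_" + Math.random().toString(36).slice(2);
    } while (arr.some(function (obj) { return marker in obj; }));

    var result = [];
    for (var i = 0, l = arr.length; i < l; ++i) {
        if (arr[i][marker] !== true) {  // not marked yet: first visit
            arr[i][marker] = true;
            result.push(arr[i]);
        }
    }

    // Clean up the markers; every marked object is in result exactly once.
    for (var j = 0; j < result.length; ++j) {
        delete result[j][marker];
    }
    return result;
}

var first = {}, second = {};
var distinct = uniqueObjects([first, second, first]);
```

Each object is visited a constant number of times, so the pass stays O(n) while still comparing by identity.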

Comments
