
I am looping through a large dataset (a multidimensional associative array, $values in this example) with many duplicate index values, with the goal of producing an array containing only the unique values at a given index, 'data'.

Currently I am doing it like this:

foreach ($values as $value) {
   $unique[$value['data']] = true;
}

This accomplishes the objective, because duplicate array keys simply get overwritten. But it feels a bit odd, since the data ends up in the keys and the values themselves are just placeholders.

It was suggested that I build the array first and then use array_unique() to remove duplicates. I'm inclined to stick with the former method, but I'm wondering: are there pitfalls or problems I should be aware of with this approach? Or any benefits to using array_unique() instead?
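For illustration, a minimal sketch of the key-based approach described above, using a made-up $values array (the 'apple'/'banana' data is hypothetical). One real pitfall worth knowing: PHP casts integer-like string keys (e.g. '8') to integer keys, so the key trick can silently change value types in the result, whereas array_unique() preserves the original values.

```php
<?php
// Hypothetical sample data shaped like the question's $values array.
$values = [
    ['data' => 'apple',  'other' => 1],
    ['data' => 'banana', 'other' => 2],
    ['data' => 'apple',  'other' => 3],
];

// The question's approach: use the value as an array key so that
// duplicates simply overwrite each other...
$seen = [];
foreach ($values as $value) {
    $seen[$value['data']] = true;
}

// ...then pull the keys back out to get a flat list of unique values.
$unique = array_keys($seen); // $unique is ['apple', 'banana']
```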

  • have you tried doing a unique selection instead? Commented Oct 24, 2014 at 0:44
  • As you already have an array, $unique = array_unique($values) is enough. If you need to reindex them too, you can do $unique = array_values(array_unique($values)) Commented Oct 24, 2014 at 0:53
  • @rjdown, the original array $values is just an example. In reality it is a huge multidimensional associative array which needs to remain untouched so can't be operated upon like that. Sorry I didn't make that clear in my question - I've updated the question for clarification. Commented Oct 24, 2014 at 2:32
  • What you are doing is the fastest way for large arrays, as it uses a lookup by a 'hash' key for each entry. Using in_array would be slow, as it has to scan the array sequentially and will therefore tend to search half the array on average, assuming a random distribution of values. Commented Oct 24, 2014 at 15:31

1 Answer

I would do it like this.

$unique = array();
foreach ($values as $value) {
    if (!in_array($value, $unique)) {
        $unique[] = $value;
    }
}

3 Comments

Thanks, I had considered this as well; however, this doesn't really answer my question. That being said, can you offer any reason why this is superior to the current method outlined above? The question is not "How do I do this?", but rather, "What are the benefits or problems that could arise from the method as outlined?"
This will use less memory per instance, instead of loading it all and then removing the duplicates.
Can you explain how your suggestion is an improvement over what I'm currently doing, which does not involve array_unique()?
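For completeness, the array_unique() route the question mentions can be written without touching the original array, by first extracting the 'data' column with array_column() (available since PHP 5.5). This is a sketch on made-up data, not the answerer's suggestion:

```php
<?php
// Hypothetical sample rows.
$values = [
    ['data' => 'x'],
    ['data' => 'y'],
    ['data' => 'x'],
];

// array_column() pulls one index out of every row, array_unique()
// drops duplicates (keeping the first occurrence's key), and
// array_values() reindexes the result from 0.
$unique = array_values(array_unique(array_column($values, 'data')));
// $unique is ['x', 'y']
```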
