180

How to remove duplicates from an Array<String?> in Kotlin?

1
  • If someone is looking for consecutive characters to remove then visit handyopinion.com/… Commented Apr 20, 2020 at 22:08

2 Answers

350

Use the distinct extension function:

val a = arrayOf("a", "a", "b", "c", "c")
val b = a.distinct() // ["a", "b", "c"]
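
Since the question is about an Array<String?>, here is a minimal sketch of how distinct behaves with nulls and how to get an array back. distinct returns a List and treats null as an ordinary value, keeping its first occurrence. The helper names dedupe and dedupeNonNull are hypothetical, chosen only for illustration.

```kotlin
// Hypothetical helper: deduplicate and return an array again.
// distinct() returns a List, so toTypedArray() converts it back.
fun dedupe(input: Array<String?>): Array<String?> =
    input.distinct().toTypedArray()

// Hypothetical helper: drop nulls first, then deduplicate.
fun dedupeNonNull(input: Array<String?>): List<String> =
    input.filterNotNull().distinct()

fun main() {
    val a: Array<String?> = arrayOf("a", null, "a", "b", null)
    println(dedupe(a).toList()) // [a, null, b]
    println(dedupeNonNull(a))   // [a, b]
}
```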

There's also the distinctBy function, which lets you specify the key used to tell items apart (the first item seen for each key is kept):

val a = listOf("a", "b", "ab", "ba", "abc")
val b = a.distinctBy { it.length } // ["a", "ab", "abc"]
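
As a further illustration, a sketch of a case-insensitive de-duplication over a nullable array. It assumes Kotlin 1.5 or later for lowercase(), and dedupeIgnoringCase is a hypothetical name. The safe call keeps null as its own key.

```kotlin
// Hypothetical helper: case-insensitive de-duplication that tolerates nulls.
// distinctBy keeps the first element seen for each key.
fun dedupeIgnoringCase(input: Array<String?>): List<String?> =
    input.distinctBy { it?.lowercase() }

fun main() {
    val a: Array<String?> = arrayOf("Kotlin", "kotlin", null, "JAVA", "java", null)
    println(dedupeIgnoringCase(a)) // [Kotlin, null, JAVA]
}
```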

As @mfulton26 suggested, you can also use toSet, toMutableSet and, if you don't need the original ordering to be preserved, toHashSet. These functions produce a Set instead of a List and should be slightly more efficient than distinct, because they skip the final conversion to a List.
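
A short sketch of the Set-based variants, under the assumption that a Set is an acceptable result type. The helper names uniqueInOrder and uniqueUnordered are hypothetical.

```kotlin
// Hypothetical helper: toSet() keeps the iteration order of the original array.
fun uniqueInOrder(input: Array<String?>): Set<String?> = input.toSet()

// Hypothetical helper: toHashSet() makes no promise about iteration order.
fun uniqueUnordered(input: Array<String?>): HashSet<String?> = input.toHashSet()

fun main() {
    val a: Array<String?> = arrayOf("c", "a", null, "c", "b", "a")
    println(uniqueInOrder(a))        // [c, a, null, b]
    println(uniqueUnordered(a).size) // 4
    // distinct() yields the same elements in the same order, but as a List.
    println(a.toMutableSet().toList() == a.distinct()) // true
}
```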


10 Comments

You can also use toSet or toMutableSet which have less overhead than distinct and if ordering does not matter you can use toHashSet.
@Buckstabue if you only need a Collection back (and it doesn't matter if it is a List or a Set) then using a Collection optimized for unique elements will be more efficient. The current implementation of distinct calls toMutableSet() internally and then converts the result to a List, so by using toSet et al. directly you avoid the extra intermediary Collection instance (kotlin/_Arrays.kt:9145-9155 at master · JetBrains/kotlin).
@Buckstabue I see, I believe we're talking about two different issues: 1) to*Set is more efficient (space & time) than distinct[By] because it returns the Set directly instead of using a Set internally and converting it to a List as its return value and 2) distinctBy can be more efficient than distinct simply because you can avoid full object equality comparison. Both are valid points. I ran with your statement that "certainly it doesn't always have overhead" and I was replying to that and overlooked that you were comparing distinct with distinctBy (and not with to*Set).
@mfulton26, you are correct. I mostly meant that sometimes it's better to use List + distinctBy than Set, because Set makes heavy use of equals/hashCode, which might be expensive to call
At time of writing, Iterable.distinct actually does toMutableSet().toList() internally. So don't worry about performance :-)
1

Algorithm

If you need to remove duplicates in place, and you don't mind the array being sorted as a side effect, use the following extension function:

fun <T : Comparable<T>> Array<T?>.distinctInPlace(): Int {
    this.sortBy { it }
    var placed = 1
    var removed = 0
    var i = 1
    while (i < size) {
        if (this[i] == this[i - 1])
            removed++
        else {
            this[placed] = this[i]
            placed++
        }
        i++
    }
    for (iter in size - removed..lastIndex)
        this[iter] = null
    return size - removed
}

This method returns the number of unique elements and runs in O(n log n) time. The unique elements end up sorted at the front of the array; the final for loop sets the remaining slots to null.

Note: if the array contained a null element, it will be placed at index 0, so you can tell an original null apart from the nulls written as padding at the end.

Examples

fun main() {
    val arr = arrayOf("a", null, "b", null, "c", "ab", "ab")
    arr.distinctInPlace() // returns 5, arr is now [null, "a", "ab", "b", "c", null, null]

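    // The element type must be nullable: Array is invariant, so an Array<String> is not an Array<String?>.
    val withoutNulls = arrayOf<String?>("a", "a", "aa", "aaa", "aa")
    withoutNulls.distinctInPlace() // returns 3, withoutNulls is now ["a", "aa", "aaa", null, null]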
}
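
Because the array keeps its original length, the returned count is what marks the end of the unique prefix. Below is a minimal sketch of trimming the padding with copyOf. The extension function is repeated so the snippet compiles on its own, and uniquePrefix is a hypothetical helper name.

```kotlin
// Same algorithm as the extension function above, repeated for self-containment.
fun <T : Comparable<T>> Array<T?>.distinctInPlace(): Int {
    this.sortBy { it }            // nulls sort first
    var placed = 1                // next slot for a unique element
    var removed = 0               // number of duplicates seen
    for (i in 1 until size) {
        if (this[i] == this[i - 1]) removed++
        else this[placed++] = this[i]
    }
    for (iter in size - removed..lastIndex) this[iter] = null
    return size - removed
}

// Hypothetical helper: deduplicate in place, then copy only the unique prefix.
fun <T : Comparable<T>> Array<T?>.uniquePrefix(): Array<T?> =
    copyOf(distinctInPlace())

fun main() {
    val arr: Array<String?> = arrayOf("b", "a", "b", null, "a")
    println(arr.uniquePrefix().toList()) // [null, a, b]
}
```

Note that copyOf allocates a new array, so only the de-duplication step itself is in place.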
