How to get unicode code point(s) representation of character/string in Swift?

Question

As a generic solution, how can we get the Unicode code point(s) of a character or a string in Swift?

Consider the following:

let A: Character = "A"     // "\u{0041}"
let Á: Character = "Á"     // "\u{0041}\u{0301}"

let sparklingHeart = "💖"  // "\u{1F496}"
let SWIFT = "SWIFT"        // "\u{0053}\u{0057}\u{0049}\u{0046}\u{0054}"

If I am not mistaken, the desired function might return an array of strings, for instance:

extension Character {
    func getUnicodeCodePoints() -> [String] {
        //...
    }
}

A.getUnicodeCodePoints()
// the output should be: ["\u{0041}"]

Á.getUnicodeCodePoints()
// the output should be: ["\u{0041}", "\u{0301}"]

sparklingHeart.getUnicodeCodePoints()
// the output should be: ["\u{1F496}"]

SWIFT.getUnicodeCodePoints()
// the output should be: ["\u{0053}", "\u{0057}", "\u{0049}", "\u{0046}", "\u{0054}"]

Any suggestion for a more elegant approach would be appreciated.

Martin R · Accepted Answer · 2018-08-16 20:20:12Z


Generally, the unicodeScalars property of a String returns a collection of its Unicode scalar values. (A Unicode scalar value is any Unicode code point except high-surrogate and low-surrogate code points.)

Example:

print(Array("Á".unicodeScalars))  // ["A", "\u{0301}"] if the literal is decomposed; ["\u{00C1}"] if it is precomposed
print(Array("💖".unicodeScalars)) // ["\u{0001F496}"]
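
To illustrate the remark about surrogates, here is a minimal sketch that compares the Unicode scalar view of a string with its UTF-16 view. The variable names are only illustrative; the hexadecimal formatting uses the same String(_:radix:uppercase:) initializer as the rest of this answer.

```swift
let heart = "\u{1F496}"  // the sparkling heart emoji

// Unicode scalar view: a single scalar value, U+1F496.
let scalarHex = heart.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }
print(scalarHex)  // ["1F496"]

// UTF-16 view: the same character is encoded as a surrogate pair,
// i.e. two code units that are not Unicode scalar values themselves.
let utf16Hex = heart.utf16.map { String($0, radix: 16, uppercase: true) }
print(utf16Hex)   // ["D83D", "DC96"]
```

The surrogate code units appear only in the utf16 view, which is why unicodeScalars is the view to use when code points are wanted.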

Up to Swift 3, there is no way to access the Unicode scalar values of a Character directly; it has to be converted to a String first (for the Swift 4 status, see below).

If you want to see all Unicode scalar values as hexadecimal numbers then you can access the value property (which is a UInt32 number) and format it according to your needs.

Example (using the U+NNNN notation for Unicode values):

extension String {
    func getUnicodeCodePoints() -> [String] {
        return unicodeScalars.map { "U+" + String($0.value, radix: 16, uppercase: true) }
    }
}

extension Character {
    func getUnicodeCodePoints() -> [String] {
        return String(self).getUnicodeCodePoints()
    }
}


print("A".getUnicodeCodePoints())     // ["U+41"]
print("Á".getUnicodeCodePoints())     // ["U+41", "U+301"]
print("💖".getUnicodeCodePoints())    // ["U+1F496"]
print("SWIFT".getUnicodeCodePoints()) // ["U+53", "U+57", "U+49", "U+46", "U+54"]
print("🇯🇴".getUnicodeCodePoints())    // ["U+1F1EF", "U+1F1F4"]
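
If the zero-padded form from the question (for example 0041 instead of 41) is preferred, the hex string can be left-padded by hand. This is only a sketch: paddedUnicodeCodePoints is a hypothetical name, not part of the code above, and the four-digit minimum is an assumption based on the usual U+NNNN convention.

```swift
extension String {
    /// Hypothetical helper: formats each Unicode scalar as "U+" followed by
    /// at least four uppercase hexadecimal digits.
    func paddedUnicodeCodePoints() -> [String] {
        return unicodeScalars.map { scalar in
            let hex = String(scalar.value, radix: 16, uppercase: true)
            // Left-pad with zeros up to four digits; longer values are kept as-is.
            let padding = String(repeating: "0", count: max(0, 4 - hex.count))
            return "U+" + padding + hex
        }
    }
}

print("A".paddedUnicodeCodePoints())          // ["U+0041"]
print("A\u{0301}".paddedUnicodeCodePoints())  // ["U+0041", "U+0301"]
print("\u{1F496}".paddedUnicodeCodePoints())  // ["U+1F496"]
```

Padding by hand avoids a dependency on Foundation; with Foundation imported, String(format:) would be an alternative.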

Update for Swift 4:

As of Swift 4, the unicodeScalars of a Character can be accessed directly (see SE-0178, "Add unicodeScalars property to Character"). This makes the conversion to a String unnecessary:

let c: Character = "🇯🇴"
print(Array(c.unicodeScalars)) // ["\u{0001F1EF}", "\u{0001F1F4}"]
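
With that property available, the Character extension can be written without going through String. The following is a sketch assuming Swift 4 or later; scalarCodePoints is a hypothetical name chosen here to avoid clashing with getUnicodeCodePoints above.

```swift
extension Character {
    /// Hypothetical variant that reads the scalars of the Character directly
    /// instead of converting it to a String first.
    func scalarCodePoints() -> [String] {
        return unicodeScalars.map { "U+" + String($0.value, radix: 16, uppercase: true) }
    }
}

let flag: Character = "🇯🇴"
print(flag.scalarCodePoints())  // ["U+1F1EF", "U+1F1F4"]
```

A single Character can contain several scalars, as the flag shows, so the return type stays an array.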

edited Aug 16, 2018 at 20:20

answered Jul 9, 2017 at 9:59

Martin R


2 Comments

Ahmad F Over a year ago

Thank you for your answer. Note that the output I got from print(Array("Á".unicodeScalars)) was ["\u{00C1}"] rather than ["A", "\u{0301}"]. Similarly, print(Array("é".unicodeScalars)) printed ["\u{00E9}"] rather than ["e", "\u{0301}"]. I know the two forms compare as equal, but I wonder what the reason for this is...

Martin R Over a year ago

@AhmadF: That's because there is a "precomposed" and a "decomposed" representation of characters with combining diacritical marks. Try "Á".precomposedStringWithCanonicalMapping.getUnicodeCodePoints() and "Á".decomposedStringWithCanonicalMapping.getUnicodeCodePoints()
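
The difference between the two representations can be made visible with explicit escapes, so the result does not depend on how the literal was typed. This is a minimal sketch; it assumes Foundation is available, since the two normalization properties mentioned in the comment come from Foundation rather than the Swift standard library. The variable names are illustrative.

```swift
import Foundation  // provides the canonical-mapping properties on String

let decomposed = "A\u{0301}"   // "A" followed by COMBINING ACUTE ACCENT
let precomposed = "\u{00C1}"   // LATIN CAPITAL LETTER A WITH ACUTE

// Swift compares strings by canonical equivalence, so both forms are equal
// even though their scalars differ.
print(decomposed == precomposed)  // true

// Normalizing converts one representation into the other.
let composedHex = decomposed.precomposedStringWithCanonicalMapping
    .unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }
let decomposedHex = precomposed.decomposedStringWithCanonicalMapping
    .unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }

print(composedHex)    // ["C1"]
print(decomposedHex)  // ["41", "301"]
```

Which form a string literal contains depends on the editor or input method that produced it, which is why the same-looking "Á" can print different scalars.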
