As a generic solution, how can we get the Unicode code point(s) of a character or a string in Swift?

Consider the following:

let A: Character = "A"     // "\u{0041}"
let Á: Character = "Á"     // "\u{0041}\u{0301}"

let sparklingHeart = "💖"  // "\u{1F496}"
let SWIFT = "SWIFT"        // "\u{0053}\u{0057}\u{0049}\u{0046}\u{0054}"

If I am not mistaken, the desired function could return an array of strings, for instance:

extension Character {
    func getUnicodeCodePoints() -> [String] {
        //...
    }
}

A.getUnicodeCodePoints()
// the output should be: ["\u{0041}"]

Á.getUnicodeCodePoints()
// the output should be: ["\u{0041}", "\u{0301}"]

sparklingHeart.getUnicodeCodePoints()
// the output should be: ["\u{1F496}"]

SWIFT.getUnicodeCodePoints()
// the output should be: ["\u{0053}", "\u{0057}", "\u{0049}", "\u{0046}", "\u{0054}"]

Any more elegant approach would be appreciated.

1 Answer

Generally, the unicodeScalars property of a String returns a collection of its Unicode scalar values. (A Unicode scalar value is any Unicode code point except high-surrogate and low-surrogate code points.)

Example:

print(Array("Á".unicodeScalars))  // ["A", "\u{0301}"] if the literal is in decomposed form
print(Array("💖".unicodeScalars)) // ["\u{0001F496}"]

Up to Swift 3, there is no way to access the Unicode scalar values of a Character directly; it has to be converted to a String first (for the Swift 4 status, see below).

If you want to see all Unicode scalar values as hexadecimal numbers, you can access the value property of each scalar (a UInt32) and format it according to your needs.

Example (using the U+NNNN notation for Unicode values):

extension String {
    func getUnicodeCodePoints() -> [String] {
        return unicodeScalars.map { "U+" + String($0.value, radix: 16, uppercase: true) }
    }
}

extension Character {
    func getUnicodeCodePoints() -> [String] {
        return String(self).getUnicodeCodePoints()
    }
}


print("A".getUnicodeCodePoints())     // ["U+41"]
print("Á".getUnicodeCodePoints())     // ["U+41", "U+301"]
print("💖".getUnicodeCodePoints())    // ["U+1F496"]
print("SWIFT".getUnicodeCodePoints()) // ["U+53", "U+57", "U+49", "U+46", "U+54"]
print("🇯🇴".getUnicodeCodePoints())    // ["U+1F1EF", "U+1F1F4"]
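The output above drops leading zeros ("U+41"), whereas the question writes code points with at least four hex digits ("\u{0041}"). A minimal sketch of a zero-padded variant follows; paddedCodePoint and paddedUnicodeCodePoints are hypothetical names chosen for illustration, and the Unicode.Scalar spelling assumes Swift 4 or later.

```swift
extension Unicode.Scalar {
    /// Hypothetical helper: "U+XXXX", zero-padded to at least four hex digits.
    var paddedCodePoint: String {
        let hex = String(value, radix: 16, uppercase: true)
        // Pad only when the hex string is shorter than four digits.
        let padding = String(repeating: "0", count: max(0, 4 - hex.count))
        return "U+" + padding + hex
    }
}

extension String {
    /// Hypothetical helper, not part of the standard library.
    func paddedUnicodeCodePoints() -> [String] {
        return unicodeScalars.map { $0.paddedCodePoint }
    }
}

print("A".paddedUnicodeCodePoints())                 // ["U+0041"]
print("\u{0041}\u{0301}".paddedUnicodeCodePoints())  // ["U+0041", "U+0301"]
print("\u{1F496}".paddedUnicodeCodePoints())         // ["U+1F496"]
```

The string literals use explicit escapes so the result does not depend on how the source file encodes accented characters.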

Update for Swift 4:

As of Swift 4, the unicodeScalars of a Character can be accessed directly; see SE-0178 Add unicodeScalars property to Character. This makes the conversion to a String unnecessary:

let c: Character = "🇯🇴"
print(Array(c.unicodeScalars)) // ["\u{0001F1EF}", "\u{0001F1F4}"]
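
Building on that, the Character extension from the earlier example could skip the String round trip. This is only a sketch, assuming Swift 4 or later; scalarCodePoints is a hypothetical name used to avoid clashing with the extension shown above.

```swift
extension Character {
    /// Hypothetical helper: reads Character.unicodeScalars directly (Swift 4+),
    /// without converting the character to a String first.
    func scalarCodePoints() -> [String] {
        return unicodeScalars.map { "U+" + String($0.value, radix: 16, uppercase: true) }
    }
}

// Two regional indicator scalars that form a single flag character.
let flag: Character = "\u{1F1EF}\u{1F1F4}"
print(flag.scalarCodePoints()) // ["U+1F1EF", "U+1F1F4"]
```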

2 Comments

Thank you for your answer. Please note that the output I got from print(Array("Á".unicodeScalars)) was ["\u{00C1}"] and not ["A", "\u{0301}"]. Similarly, I tried print(Array("é".unicodeScalars)) and the output was ["\u{00E9}"] and not ["e", "\u{0301}"]. I know the two forms should compare as equal, but I wonder what the reason for this is...
@AhmadF: That's because there is a "precomposed" and a "decomposed" representation of characters with combining diacritical marks. Try "Á".precomposedStringWithCanonicalMapping.getUnicodeCodePoints() and "Á".decomposedStringWithCanonicalMapping.getUnicodeCodePoints()
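
To illustrate the point in the comment above, here is a small sketch that builds both forms from explicit escapes and converts between them. It assumes Foundation is available, since the two normalization properties come from there; the variable names are made up for the example.

```swift
import Foundation

let precomposed = "\u{00C1}"         // Á as one scalar
let decomposed = "\u{0041}\u{0301}"  // A followed by a combining acute accent

// Swift compares strings by canonical equivalence, so the two are equal
// even though their scalar counts differ.
print(precomposed == decomposed)         // true
print(precomposed.unicodeScalars.count)  // 1
print(decomposed.unicodeScalars.count)   // 2

// Convert each form to the other one.
let toPrecomposed = decomposed.precomposedStringWithCanonicalMapping
let toDecomposed = precomposed.decomposedStringWithCanonicalMapping

print(toPrecomposed.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) }) // ["C1"]
print(toDecomposed.unicodeScalars.map { String($0.value, radix: 16, uppercase: true) })  // ["41", "301"]
```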
