Check range of Unicode Value of a character
In Objective-C...
If I have a character like "∆", how can I get its Unicode value and then determine whether it is in a certain range of values?
For example, if I want to know whether a certain character is in the Unicode range of U+1F300 to U+1F6FF.
Answer:
NSString uses UTF-16 to store codepoints internally, so those in the range you're looking for (U+1F300 to U+1F6FF) will be stored as a surrogate pair (two 16-bit code units, four bytes in total). Despite its name, characterAtIndex: (and unichar) doesn't know about codepoints and will give you the single 16-bit code unit that it sees at the index you give it (the 55357 you're seeing is the lead surrogate of the codepoint in UTF-16).
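To make the surrogate-pair point concrete, here is a minimal sketch in plain C (which also compiles unchanged inside an Objective-C file) of how UTF-16 splits a codepoint above U+FFFF into two code units. The function name to_surrogates is made up for illustration; it is not part of Foundation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split a supplementary-plane codepoint
   (U+10000..U+10FFFF) into its UTF-16 lead and trail surrogates. */
static void to_surrogates(uint32_t cp, uint16_t *lead, uint16_t *trail) {
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    uint32_t v = cp - 0x10000;                  /* 20 bits remain */
    *lead  = (uint16_t)(0xD800 + (v >> 10));    /* top 10 bits */
    *trail = (uint16_t)(0xDC00 + (v & 0x3FF));  /* low 10 bits */
}
```

For U+1F300 this gives 0xD83C and 0xDF00, and for U+1F6FF it gives 0xD83D (55357) and 0xDEFF, so every codepoint in the range starts with a lead surrogate of 0xD83C or 0xD83D.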
To examine the raw codepoints, you'll want to convert the string/characters into UTF-32 (which encodes them directly). To do this, you have a few options:
- Get all the UTF-16 code units that make up the codepoint, and use either the standard surrogate-pair decoding algorithm or CFStringGetLongCharacterForSurrogatePair to convert the surrogate pair to UTF-32.
- Use either dataUsingEncoding: or getBytes:maxLength:usedLength:encoding:options:range:remainingRange: to convert the NSString to UTF-32, and interpret the raw bytes as a uint32_t.
- Use a library like ICU.
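As a rough illustration of the first option, the sketch below decodes a codepoint from a buffer of UTF-16 code units by hand and then does the range check. It is plain C, so it should also compile as-is in an Objective-C file. The names codepoint_at and in_range are hypothetical helpers, not Foundation or Core Foundation API, and returning an unpaired surrogate unchanged is just one possible policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode the codepoint that starts at units[i]
   in a UTF-16 buffer of len code units. An unpaired surrogate is
   returned unchanged (an assumption; callers may prefer an error). */
static uint32_t codepoint_at(const uint16_t *units, size_t len, size_t i) {
    assert(i < len);
    uint16_t lead = units[i];
    if (lead >= 0xD800 && lead <= 0xDBFF && i + 1 < len) {
        uint16_t trail = units[i + 1];
        if (trail >= 0xDC00 && trail <= 0xDFFF) {
            /* Combine the two 10-bit halves and restore the offset. */
            return 0x10000u + (((uint32_t)(lead - 0xD800) << 10)
                               | (uint32_t)(trail - 0xDC00));
        }
    }
    return lead; /* BMP character or unpaired surrogate */
}

/* Inclusive range check on a decoded codepoint. */
static bool in_range(uint32_t cp, uint32_t lo, uint32_t hi) {
    return cp >= lo && cp <= hi;
}
```

With an NSString, the buffer can be filled with getCharacters:range:, since unichar is a 16-bit unsigned type. A call such as in_range(codepoint_at(buf, len, 0), 0x1F300, 0x1F6FF) then answers the original question for the first character.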
http://stackoverflow.com/questions/14822793/check-range-of-unicode-value-of-a-character?rq=1