objective-c - How to convert to "combining diacritical marks" on iOS

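In my app I have characters followed by their "modifier diacritic" (e.g. "oˆ", where "ˆ" is Unicode 0x02c6), and I want to convert them into fully precomposed characters (e.g. "ô", Unicode 0x00f4). I tried the NSString method precomposedStringWithCanonicalMapping, but after a few hours of banging my head against the wall trying to figure out why it wasn't working, I found that it only converts "combining diacritical marks" (http://www.unicode.org/charts/PDF/U0300.pdf) into precomposed characters. OK, so all I need to do is convert all my "modifier diacritics" to "combining diacritical marks", then call precomposedStringWithCanonicalMapping on the resulting string, and I'm done. This does work, but I'm wondering whether there is a less tedious and error-prone way to do it. Here is my NSString category method, which seems to fix most characters: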

- (NSString *)combineDiacritics
{
    static NSDictionary<NSNumber *, NSNumber *> *sDiacriticalSubstDict; //unichar of diacritic -> unichar of combining diacritic
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        //http://www.unicode.org/charts/PDF/U0300.pdf
        sDiacriticalSubstDict = @{ @(0x02cb) : @(0x0300), @(0x00b4) : @(0x0301), @(0x02c6) : @(0x0302), @(0x02dc) : @(0x0303), @(0x02c9) : @(0x0304),   //Grave, Acute, Circumflex, Tilde, Macron
                                   @(0x00af) : @(0x0305), @(0x02d8) : @(0x0306), @(0x02d9) : @(0x0307), @(0x00a8) : @(0x0308), @(0x02c0) : @(0x0309),   //Overline, Breve, Dot above, Diaeresis, Hook above
                                   @(0x00b0) : @(0x030a), @(0x02da) : @(0x030a), @(0x02dd) : @(0x030b), @(0x02c7) : @(0x030c), @(0x02c8) : @(0x030d), @(0x02bb) : @(0x0312),   //Ring above (from degree sign and from U+02DA), Double Acute (U+02DD), Caron, Vertical line above, Cedilla above
                                   @(0x02bc) : @(0x0313), @(0x02bd) : @(0x0314), @(0x02b2) : @(0x0321), @(0x02d4) : @(0x0323), @(0x02b1) : @(0x0324),   //Comma above, Reversed comma above, Palatalized hook below, Dot below, Diaeresis below
                                   @(0x00b8) : @(0x0327), @(0x02db) : @(0x0328), @(0x02cc) : @(0x0329), @(0x02b7) : @(0x032b), @(0x02cd) : @(0x0331),   //Cedilla, Ogonek, Vert line below, Inverted double arch below, Macron below
                                   };
    });
    NSMutableString *buffer = [NSMutableString stringWithCapacity:self.length];   //Mutated, never reassigned, so __block is not needed
    [self enumerateSubstringsInRange:NSMakeRange(0, self.length) options:NSStringEnumerationByComposedCharacterSequences usingBlock: ^(NSString* substring, NSRange substringRange, NSRange enclosingRange, BOOL* stop) {
                          NSString *newString = nil;
                          if (substring.length == 1)    //The diacriticals are all Unicode BMP.
                          {
                              unichar uniChar = [substring characterAtIndex:0];
                              unichar newUniChar = [sDiacriticalSubstDict[@(uniChar)] unsignedShortValue];   //0 when there is no mapping (message to nil)
                              if (newUniChar != 0)
                              {
                                  NSLog(@"Unichar %04x => %04x", uniChar, newUniChar);
                                  newString = [NSString stringWithCharacters:&newUniChar length:1];
                              }
                          }
                          if (newString)
                              [buffer appendString:newString];
                          else
                              [buffer appendString:substring];
                      }];

    NSString *precomposedStr = [buffer precomposedStringWithCanonicalMapping];
    return precomposedStr;
}

Does anyone know of a more built-in way to do this conversion?

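Best answer: There is no built-in method to perform this conversion, because the characters in the Spacing Modifier Letters block (U+02B0..U+02FF) are not intended to be used as diacritics. From section 7.8 of the Unicode Standard: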

They are not formally combining marks (gc=Mn or gc=Mc) and do not graphically combine with the base letter that they modify. They are base characters in their own right.

Spacing Clones of Diacritics. Some corporate standards explicitly specify spacing and nonspacing forms of combining diacritical marks, and the Unicode Standard provides matching codes for these interpretations when practical.
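
A small check makes the distinction concrete. This is a minimal sketch, assuming a Foundation command-line target; the helper name ComposesToSingleUnit is hypothetical and not part of any API. Only the sequence that uses the combining mark U+0302 should collapse under canonical composition, because U+02C6 is a base character with no decomposition.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (illustration only): YES if canonical composition (NFC)
// collapses `s` into a single UTF-16 code unit.
static BOOL ComposesToSingleUnit(NSString *s) {
    return [s precomposedStringWithCanonicalMapping].length == 1;
}

int main(void) {
    @autoreleasepool {
        // "o" + COMBINING CIRCUMFLEX ACCENT (U+0302): composes to U+00F4.
        NSLog(@"o + U+0302 composes: %d", ComposesToSingleUnit(@"o\u0302"));
        // "o" + MODIFIER LETTER CIRCUMFLEX ACCENT (U+02C6): left as two characters.
        NSLog(@"o + U+02C6 composes: %d", ComposesToSingleUnit(@"o\u02C6"));
    }
    return 0;
}
```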

If you want to convert them to combining forms, you will need to build a table from the cross-references in the Spacing Modifier Letters code chart (as you are already doing).
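
Part of such a table can be derived rather than typed by hand. The sketch below relies on one assumption beyond the answer above: the "spacing clone" characters (for example U+00A8, U+00B4, U+02D8 to U+02DD) carry a compatibility decomposition of SPACE followed by the matching combining mark, so decomposedStringWithCompatibilityMapping exposes the mapping. Modifier letters such as U+02C6 have no decomposition and still need manual entries. The function name CombiningMarkForSpacingClone is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (illustration only): returns the combining mark that
// `spacing` is a spacing clone of, or 0 if its compatibility decomposition
// is not exactly <SPACE, one mark in U+0300..U+036F>.
static unichar CombiningMarkForSpacingClone(unichar spacing) {
    NSString *s = [NSString stringWithCharacters:&spacing length:1];
    NSString *nfkd = [s decomposedStringWithCompatibilityMapping];
    if (nfkd.length == 2 && [nfkd characterAtIndex:0] == 0x0020) {
        unichar mark = [nfkd characterAtIndex:1];
        if (mark >= 0x0300 && mark <= 0x036F) {
            return mark;
        }
    }
    return 0;
}

int main(void) {
    @autoreleasepool {
        const unichar candidates[] = { 0x00A8, 0x00B4, 0x00B8, 0x02C6, 0x02D8, 0x02DA, 0x02DD };
        for (size_t i = 0; i < sizeof(candidates) / sizeof(candidates[0]); i++) {
            unichar mark = CombiningMarkForSpacingClone(candidates[i]);
            if (mark != 0) {
                NSLog(@"U+%04X -> U+%04X", candidates[i], mark);
            } else {
                NSLog(@"U+%04X has no usable decomposition; add it to the table by hand", candidates[i]);
            }
        }
    }
    return 0;
}
```

Entries found this way could seed the substitution dictionary, with the remaining modifier letters filled in from the code chart cross-references.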
