Character properties this module is based on are defined by Unicode Standards version 6.1.0.
Character(s) assigned to CB are not resolved.
Characters assigned to CJ are always resolved to NS. More flexible tailoring mechanism is provided.
When word segmentation for South East Asian writing systems is not supported, characters assigned to SA are resolved to AL, except that characters that have Grapheme_Cluster_Break property value Extend or SpacingMark be resolved to CM.
Characters assigned to SG or XX are resolved to AL.
Code points of following UCS ranges are given fixed property values even if they have not been assigned any characers.
Ranges |
UAX #14 |
UAX #11 |
Description |
---|---|---|---|
U+3400..U+4DBF |
ID |
W |
CJK ideographs |
U+4E00..U+9FFF |
ID |
W |
CJK ideographs |
U+D800..U+DFFF |
AL (SG) |
N |
Surrogates |
U+E000..U+F8FF |
AL (XX) |
F or N (A) |
Private use |
U+F900..U+FAFF |
ID |
W |
CJK ideographs |
U+20000..U+2FFFD |
ID |
W |
CJK ideographs |
U+30000..U+3FFFD |
ID |
W |
Old hanzi |
U+F0000..U+FFFFD |
AL (XX) |
F or N (A) |
Private use |
U+100000..U+10FFFD |
AL (XX) |
F or N (A) |
Private use |
Other unassigned |
AL (XX) |
N |
Unassigned, reserved or noncharacters |
Characters belonging to General Category Mn, Me, Cc, Cf, Zl or Zp have the property value Z (nonspacing) defined by this module, regardless of East_Asian_Width property values assigned by [UAX11].