Implementation Notes

The character properties this module relies on are those defined by the Unicode Standard, version 6.1.0.

UAX #14 and UAX #11

  • Characters assigned to CB are not resolved.

  • Characters assigned to CJ are always resolved to NS. A more flexible tailoring mechanism is provided.

  • When word segmentation for South East Asian writing systems is not supported, characters assigned to SA are resolved to AL, except that characters whose Grapheme_Cluster_Break property value is Extend or SpacingMark are resolved to CM.

  • Characters assigned to SG or XX are resolved to AL.

  • Code points in the following UCS ranges are given fixed property values even if no characters have been assigned to them.

    Ranges                UAX #14    UAX #11       Description
    --------------------  ---------  ------------  -------------------------------------
    U+3400..U+4DBF        ID         W             CJK ideographs
    U+4E00..U+9FFF        ID         W             CJK ideographs
    U+D800..U+DFFF        AL (SG)    N             Surrogates
    U+E000..U+F8FF        AL (XX)    F or N (A)    Private use
    U+F900..U+FAFF        ID         W             CJK ideographs
    U+20000..U+2FFFD      ID         W             CJK ideographs
    U+30000..U+3FFFD      ID         W             Old hanzi
    U+F0000..U+FFFFD      AL (XX)    F or N (A)    Private use
    U+100000..U+10FFFD    AL (XX)    F or N (A)    Private use
    Other unassigned      AL (XX)    N             Unassigned, reserved or noncharacters

  • Characters of General Category Mn, Me, Cc, Cf, Zl or Zp are given the property value Z (nonspacing), which is defined by this module, regardless of the East_Asian_Width property values assigned by [UAX11].
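
The resolution rules and fixed ranges listed above can be sketched in Python as follows. This is only an illustration, not the module's implementation or API: the names resolve_lb, fixed_properties, width_property and FIXED_RANGES are hypothetical, the raw line breaking class and Grapheme_Cluster_Break value are assumed to be supplied by the caller, and Python's unicodedata follows a newer Unicode version than 6.1.0, so individual values may differ. The "Other unassigned" row of the table is not modeled.

```python
import unicodedata

# (first, last, resolved UAX #14 class, UAX #11 width) for the fixed ranges.
# "A" (ambiguous) stands for the "F or N (A)" entries; choosing F or N is
# left to the caller's context.
FIXED_RANGES = [
    (0x3400, 0x4DBF, "ID", "W"),      # CJK ideographs
    (0x4E00, 0x9FFF, "ID", "W"),      # CJK ideographs
    (0xD800, 0xDFFF, "AL", "N"),      # surrogates (SG resolved to AL)
    (0xE000, 0xF8FF, "AL", "A"),      # private use (XX resolved to AL)
    (0xF900, 0xFAFF, "ID", "W"),      # CJK ideographs
    (0x20000, 0x2FFFD, "ID", "W"),    # CJK ideographs
    (0x30000, 0x3FFFD, "ID", "W"),    # old hanzi
    (0xF0000, 0xFFFFD, "AL", "A"),    # private use
    (0x100000, 0x10FFFD, "AL", "A"),  # private use
]

# General categories that receive the module-defined width Z (nonspacing).
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cc", "Cf", "Zl", "Zp"}


def resolve_lb(lb_class, gcb=None, sea_supported=False):
    """Resolve a raw UAX #14 class following the notes above."""
    if lb_class == "CJ":
        return "NS"
    if lb_class == "SA" and not sea_supported:
        return "CM" if gcb in ("Extend", "SpacingMark") else "AL"
    if lb_class in ("SG", "XX"):
        return "AL"
    return lb_class  # CB and all other classes are left unchanged


def fixed_properties(cp):
    """Return (line breaking class, width) for a fixed range, else None."""
    for first, last, lb, eaw in FIXED_RANGES:
        if first <= cp <= last:
            return lb, eaw
    return None


def width_property(ch):
    """Width property with the Z override applied before anything else."""
    if unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES:
        return "Z"
    fixed = fixed_properties(ord(ch))
    if fixed is not None:
        return fixed[1]
    return unicodedata.east_asian_width(ch)
```

For instance, resolve_lb("SA", gcb="Extend") yields "CM" while resolve_lb("SA") yields "AL", and width_property applied to U+0301 (a combining mark of category Mn) yields "Z" whatever its East_Asian_Width value is.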

UAX #29

  • This module implements the default algorithm for determining grapheme cluster boundaries. A tailoring mechanism is not supported yet.
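
To give a feel for what the default grapheme cluster rules do, here is a deliberately simplified Python sketch. It is not this module's code: it covers only rules GB3 to GB5, GB9, GB9a and GB10 of UAX #29, omits the Hangul rules (GB6 to GB8) and Prepend (GB9b), and approximates the Grapheme_Cluster_Break property from General_Category, which does not match the real property data exactly. The function names gcb_approx and grapheme_clusters are hypothetical.

```python
import unicodedata


def gcb_approx(ch):
    """Rough stand-in for Grapheme_Cluster_Break (an approximation)."""
    if ch == "\r":
        return "CR"
    if ch == "\n":
        return "LF"
    if ch in ("\u200c", "\u200d"):  # ZWNJ and ZWJ count as Extend
        return "Extend"
    cat = unicodedata.category(ch)
    if cat in ("Mn", "Me"):
        return "Extend"
    if cat == "Mc":
        return "SpacingMark"
    if cat in ("Cc", "Cf", "Zl", "Zp"):
        return "Control"
    return "Other"


def grapheme_clusters(text):
    """Split text into clusters using the simplified rules named above."""
    clusters = []
    prev = None
    for ch in text:
        cur = gcb_approx(ch)
        if prev is None:
            join = False  # GB1: always break at the start of text
        elif prev == "CR" and cur == "LF":
            join = True   # GB3: never break inside CR LF
        elif prev in ("CR", "LF", "Control") or cur in ("CR", "LF", "Control"):
            join = False  # GB4, GB5: break around controls
        else:
            # GB9, GB9a: no break before Extend or SpacingMark; else GB10
            join = cur in ("Extend", "SpacingMark")
        if join:
            clusters[-1] += ch
        else:
            clusters.append(ch)
        prev = cur
    return clusters
```

With this sketch, a base letter followed by a combining accent stays in one cluster, and a CR LF pair is kept together as a single cluster.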
