Glossary

alphabetic character
Characters between which line breaks are usually not allowed, unless other characters provide break opportunities (the term is inaccurate from the viewpoint of grammatology). [UAX14] classifies most alphabetic characters into line breaking class AL. See also ideographic character.
ambiguous quotation mark
To be written
complex breaking
Heuristic, dictionary-based line breaking for several scripts in which breaking positions cannot be determined from individual characters. [UAX14] classifies characters of several South East Asian scripts that need complex breaking into line breaking class SA.
direct break
A line break opportunity exists between two adjacent characters. See also indirect break, mandatory break.
East_Asian_Width
An informative property of Unicode characters defined by Unicode Standard Annex #11 ([UAX11]). It corresponds to the “width” (glyph spacing) of each character in implementations of East Asian encodings. See also number of columns.
grapheme cluster
A concept defined by Unicode Standard Annex #29 ([UAX29]). A grapheme cluster is a sequence of one or more Unicode characters that consists of one grapheme base and optional grapheme extender and/or “prepend” characters. It is close to what people consider a “character”.
hangul
A syllabary used for the Korean language. In the traditional sense, hangul characters behave as ideographic characters, while each character consists of a few jamo that represent features of pronunciation.
ideographic character
Characters that usually allow line breaks both before and after themselves (the term is inaccurate from the viewpoint of grammatology). [UAX14] classifies most ideographic characters into line breaking class ID. See also alphabetic character.
indirect break
A line break opportunity exists between two characters only if they are separated by one or more spaces. See also direct break, mandatory break.
line breaking class
Classification of Unicode characters defined by Unicode Standard Annex #14 ([UAX14]).
mandatory break
Obligatory line breaking behavior defined by core rules and performed regardless of surrounding characters. See also direct break, indirect break.
non-starter
A character that cannot be placed at the beginning of a line. [UAX14] classifies non-starters into line breaking class NS or CJ. They include small hiragana, small katakana and some punctuation marks.
number of columns
The number of columns of a string is not always equal to the number of characters it contains. Each character is either wide, narrow or nonspacing, occupying 2, 1 or 0 columns, respectively. Some characters may be either wide or narrow depending on the context in which they are used. Customization may give characters other widths as well.
virama sign
A sign found in many Brahmi-derived abugidas of South Asia and South East Asia. Its primary use is to cancel the inherent vowel of a consonant. In several writing systems, it is also used to form consonant clusters. In the Unicode Standard, some virama characters are also used to represent transformations of ligated character sequences.
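The entries for East_Asian_Width and number of columns can be illustrated with a rough sketch. It uses only Python's standard `unicodedata` module, not the pytextseg API; the function name `columns` and the parameter `ambiguous_wide` are hypothetical names chosen for this example. It assumes that wide (W) and fullwidth (F) characters occupy 2 columns, that combining marks and format characters occupy 0, and that everything else occupies 1. Control characters and other special cases are not handled.

```python
import unicodedata


def columns(text, ambiguous_wide=False):
    """Estimate the number of columns occupied by text (simplified sketch)."""
    total = 0
    for ch in text:
        # Nonspacing: combining marks and format characters take 0 columns.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        eaw = unicodedata.east_asian_width(ch)
        # Wide and fullwidth characters take 2 columns. Ambiguous (A)
        # characters take 2 only when the caller assumes an East Asian context.
        if eaw in ("W", "F") or (eaw == "A" and ambiguous_wide):
            total += 2
        else:
            total += 1
    return total


print(columns("abc"))                       # 3: three narrow characters
print(columns("日本語"))                     # 6: three wide characters
print(columns("e\u0301"))                   # 1: base letter plus nonspacing accent
print(columns("α"), columns("α", True))     # 1 2: ambiguous width depends on context
```

The `ambiguous_wide` switch reflects the point that some characters may be either wide or narrow depending on context; a real implementation would likely also work on grapheme clusters rather than single code points.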
