.. pytextseg documentation glossary

.. _glossary:

Glossary
========

.. glossary::
   :sorted:

   mandatory break
      Obligatory line breaking behavior that is defined by the core rules and
      performed regardless of the surrounding characters.
      See also :term:`direct break`, :term:`indirect break`.

   direct break
      A line break opportunity that exists between two adjacent characters.
      See also :term:`indirect break`, :term:`mandatory break`.

   indirect break
      A line break opportunity that exists between two characters only if
      they are separated by one or more spaces.
      See also :term:`direct break`, :term:`mandatory break`.

   alphabetic character
      Characters between pairs of which line breaks are usually not allowed,
      unless other characters provide break opportunities
      (the term is inaccurate from the viewpoint of grammatology).
      [UAX14]_ classifies most alphabetic characters as
      :term:`line breaking class` AL.
      See also :term:`ideographic character`.

   ideographic character
      Characters that usually allow line breaks both before and after
      themselves
      (the term is inaccurate from the viewpoint of grammatology).
      [UAX14]_ classifies most ideographic characters as
      :term:`line breaking class` ID.
      See also :term:`alphabetic character`.

   complex breaking
      Heuristic, dictionary-based line breaking for several scripts in which
      breaking positions are not obvious from the individual characters.
      [UAX14]_ classifies characters of several Southeast Asian scripts that
      need complex breaking as :term:`line breaking class` SA.

   number of columns
      The number of columns of a string is not always equal to the number of
      characters it contains:
      each character is either *wide*, *narrow* or nonspacing, and occupies
      2, 1 or 0 columns, respectively.
      Some characters may be either wide or narrow depending on the context
      in which they are used.
      Customization may assign characters other widths.

   grapheme cluster
      A concept defined by Unicode Standard Annex #29 ([UAX29]_).
      A grapheme cluster is a sequence of one or more Unicode characters that
      consists of one *grapheme base* and optional *grapheme extender*
      and/or *"prepend" character*.
      It is close to what people consider a "character".

   line breaking class
      Classification of Unicode characters defined by
      Unicode Standard Annex #14 ([UAX14]_).

   East_Asian_Width
      An informative property of Unicode characters defined by
      Unicode Standard Annex #11 ([UAX11]_).
      It corresponds to the "width" (glyph spacing) of each character in
      implementations of East Asian encodings.
      See also :term:`number of columns`.

   non-starter
      A character that cannot be placed at the beginning of a line.
      [UAX14]_ classifies non-starters as :term:`line breaking class` NS or CJ.
      They include small hiragana, small katakana and some punctuation marks.

   ambiguous quotation mark
      *To be written*

   virama sign
      A sign found in many Brahmi-derived *abugida*\ s of South Asia and
      Southeast Asia.
      Its primary use is to cancel the inherent vowel of a consonant.
      In several writing systems it is also used to form consonant clusters.
      In the Unicode Standard, some virama characters are also used to
      represent transformations of ligated character sequences.

   hangul
      The writing system used for the Korean language, written in syllable
      blocks.
      In the traditional sense, hangul characters behave as
      :term:`ideographic character`\ s, although each character consists of a
      few *jamo* that represent features of pronunciation.
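
The column widths described under :term:`number of columns` and
:term:`East_Asian_Width` can be approximated with the Python standard library
alone. The following is a minimal sketch, not pytextseg's own API: the function
name ``approximate_columns`` and its ``ambiguous_wide`` parameter are
hypothetical, and the rule for nonspacing characters (general categories Mn, Me
and Cf) is a simplifying assumption. It does not handle grapheme clusters,
conjoining hangul jamo or any customization.

.. code-block:: python

   import unicodedata


   def approximate_columns(text, ambiguous_wide=False):
       """Estimate how many columns ``text`` occupies (illustrative only)."""
       total = 0
       for ch in text:
           # Assumption: marks and format characters are nonspacing (0 columns).
           if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
               continue
           width = unicodedata.east_asian_width(ch)
           # "W" (Wide) and "F" (Fullwidth) characters occupy 2 columns.
           # "A" (Ambiguous) characters are wide only in an East Asian context.
           if width in ("W", "F") or (width == "A" and ambiguous_wide):
               total += 2
           else:
               total += 1
       return total


   print(approximate_columns("abc"))     # 3
   print(approximate_columns("日本語"))  # 6

Passing ``ambiguous_wide=True`` models the case where a character such as the
Greek letter α is treated as wide rather than narrow.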