Pattern TokenizerΒΆ

A tokenizer of type pattern that can flexibly separate text into terms via a regular expression. Accepts the following settings:

Setting Description
pattern The regular expression pattern, defaults to W+.
flags The regular expression flags.
group Which group to extract into tokens. Defaults to -1 (split).

IMPORTANT: The regular expression should match the token separators, not the tokens themselves.

Previous topic

Pathhierarchy Tokenizer

Next topic

Uaxurlemail Tokenizer

This Page