ElasticSearch自带的分词类型

2020-12-11 leiting (5964阅读)

Analyzersedit

Elasticsearch ships with a wide range of built-in analyzers, which can be used in any index without further configuration:

Standard Analyzer
The standard analyzer divides text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm. It removes most punctuation, lowercases terms, and supports removing stop words.
Simple Analyzer
The simple analyzer divides text into terms whenever it encounters a character which is not a letter. It lowercases all terms.
Whitespace Analyzer
The whitespace analyzer divides text into terms whenever it encounters any whitespace character. It does not lowercase terms.
Stop Analyzer
The stop analyzer is like the simple analyzer, but also supports removal of stop words.
Keyword Analyzer
The keyword analyzer is a “noop” analyzer that accepts whatever text it is given and outputs the exact same text as a single term.
Pattern Analyzer
The pattern analyzer uses a regular expression to split the text into terms. It supports lower-casing and stop words.
Language Analyzers
Elasticsearch provides many language-specific analyzers like english or french.
Fingerprint Analyzer
The fingerprint analyzer is a specialist analyzer which creates a fingerprint which can be used for duplicate detection.

If you do not find an analyzer suitable for your needs, you can create a custom analyzer which combines the appropriate character filters, tokenizer, and token filters.

IT PHP 编程语言开发编程 Linux 科技 Elasticsearch 数据库面试 HTML/CSS/XML 网络 JAVA NoSQL 操作系统 C/C++ Golang Git 算法正则表达式 Redis 互联网 MySql 软件运维 JavaScript 国际架构设计商业 Mac OS TCP/IP Excel Windows Oracle Socket VR Vim MongoDB 运营 Python MemCache 硬件电子娱乐设计摄影 nginx 游戏 WordPress HTTP 团建数码电器 Docker 大模型