Question:
I installed Elasticsearch and set up indexing, but search requests behave strangely, and I can't understand why the search doesn't work.
I made the requests through a Ruby library and blamed it at first, but when I sniffed the request I realized the problem is in Elasticsearch itself. It's not even a problem, really, more of a feature.
prod = Post.search "ра", fields: [:title] # returns n results
prod = Post.search "разд", fields: [:title] # returns 0 results
prod = Post.search "раздел", fields: [:title] # returns n results
Most likely there is a setting somewhere that needs to be changed to fix the search.
Answer:
Most likely this is the analyzer's behavior. ES uses a slightly more involved search scheme than plain exact matching:
- When a document is indexed, all text fields are passed through an analyzer, which consists of a tokenizer and filters
- The tokenizer splits the input into individual tokens (usually words)
- Filters modify, add, and remove tokens
- The resulting tokens are written to the inverted index
- At search time the query is passed through the analyzer as well, split into tokens, and ES looks for matches between the query tokens and the document tokens
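To make the pipeline above concrete, here is a rough sketch in Ruby of what a standard-analyzer-like pipeline does (this only illustrates the idea; it is not Elasticsearch's actual implementation, and the sample document text is made up):

```ruby
# Rough sketch of an analyzer pipeline: tokenizer + lowercase filter.
# Illustration only — not what Elasticsearch actually runs internally.
def analyze(text)
  tokens = text.split(/[^[:word:]]+/).reject(&:empty?)  # tokenizer: split on non-word characters
  tokens.map(&:downcase)                                # filter: lowercase every token
end

# Hypothetical document text and the three queries from the question
doc_tokens = analyze("Новый раздел сайта")

["ра", "разд", "раздел"].each do |query|
  hit = (analyze(query) & doc_tokens).any?
  puts "#{query}: #{hit}"
end
```

Under this naive exact-token matching only "раздел" hits; prefixes like "ра" and "разд" produce tokens that simply are not in the index, which is the kind of mismatch described above.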
I suspect this is the problem: the index contains tokens that match the queries "ра" and "раздел", but no token that matches "разд".
Most likely the cause is exactly which tokens the current analyzer splits the query into. To check this, compare the tokens of a matching document with the tokens of each query:
curl -XGET <es host>:9200/<index name>/_analyze -H 'Content-Type: application/json' -d '{
  "text": "разд"
}'
curl -XGET <es host>:9200/<index name>/_analyze -H 'Content-Type: application/json' -d '{
  "text": "раздел"
}'
curl -XGET <es host>:9200/<index name>/_analyze -H 'Content-Type: application/json' -d '{
  "text": "<document text>"
}'
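For reference, the _analyze response is a list of tokens; the values below are illustrative (what the standard analyzer would return for "раздел"), and your analyzer may produce something different:

```json
{
  "tokens": [
    {
      "token": "раздел",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
```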
* If the search uses an analyzer other than the index's default one (the question most likely does not imply this), it can be specified in the "analyzer" field of the request body
* Document tokens can also be obtained directly from Elasticsearch, but that is a bit more involved
If you used all the default settings, then you have the standard analyzer, which (at least for me) produces the following tokens for the three queries:
- ра
- разд
- раздел
(Moreover, I was able to get matches for the last two queries, but even with an abnormally large fuzziness ES refused to find anything for "ра", which suggests that either you do have a non-standard analyzer after all, or I have completely forgotten Elasticsearch.)
After that, you can see exactly how the document matches or doesn't match using the explain API:
curl -XGET <es host>:9200/<index name>/<type name>/<document id>/_explain -H 'Content-Type: application/json' -d '{
  "query": {
    "query_string": {
      "query": "ра"
    }
  }
}'
* the exact request may differ and depends on your library
This way you can track down the reason for this behavior on your own; from the community's side it is nearly impossible until you provide the mapping of the corresponding index, documents that match two of the three queries, the exact request the library sends, and the ES version.
However, as far as I understand, behind this question there is really a need for autocomplete. In that case you don't want full-text search inside ES at all, but the completion suggester: a slightly different piece of ES functionality created specifically for implementing autocomplete.
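As a rough sketch of the suggester approach (the index and field names here are made up, and depending on your ES version the mapping may additionally need a type name in the path and body):

```shell
# Create an index with a completion-typed field (hypothetical names):
curl -XPUT '<es host>:9200/posts_suggest' -H 'Content-Type: application/json' -d '{
  "mappings": {
    "properties": {
      "title_suggest": { "type": "completion" }
    }
  }
}'

# Query it with a prefix — this is what autocomplete actually needs:
curl -XGET '<es host>:9200/posts_suggest/_search' -H 'Content-Type: application/json' -d '{
  "suggest": {
    "title-suggestion": {
      "prefix": "разд",
      "completion": { "field": "title_suggest" }
    }
  }
}'
```

Unlike the analyzed full-text index, the completion field is built for prefix lookups, so "ра", "разд", and "раздел" would all return the same suggestions.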