No results once implementing an analyzer in Elasticsearch

I need to ignore apostrophes in indexed text so that searching for "Johns potato" returns results for "John's potato".
The index accepts my analyzer, but now searches return no results. Does anyone see something obvious that I am missing?
$params = [
'index' => $index,
'body' => [
'settings' => [
'number_of_shards' => 5,
'number_of_replicas' => 2,
'analysis' => [
"analyzer" => [
"my_analyzer" => [
"tokenizer" => "keyword",
"char_filter" => [
"my_char_filter"
]
]
],
"char_filter" => [
"my_char_filter" => [
"type" => "mapping",
"mappings" => [
"' => "
]
]
]
]
],
'mappings' => [
$type => [
'_source' => [
'enabled' => true
],
'properties' => [
'title' => [
'type' => 'text',
'analyzer' => 'my_analyzer'
],
'content' => [
'type' => 'text',
'analyzer' => 'my_analyzer'
]
]
]
]
]
];
I found that removing the analyzer from my field mappings makes results reappear, but as soon as I add the analyzer back I get no results.
Here's an example query that I make:
{
"body": {
"query": {
"bool": {
"must": {
"multi_match": {
"query": "apples",
"fields": [
"title",
"content"
]
}
},
"filter": {
"terms": {
"site_id": [
"1351",
"1349"
]
}
},
"must_not": [
{
"match": {
"visible": "false"
}
},
{
"match": {
"locked": "true"
}
}
]
}
}
}
}

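What you probably want is the built-in english analyzer. Your custom analyzer uses the keyword tokenizer, which emits the whole field value as a single token, so a one-word query such as "apples" only matches a field whose entire content is that word. The standard analyzer, which is the default, splits on whitespace and most punctuation but leaves apostrophes inside words alone. The english analyzer can additionally stem words and remove stop words, since the language is known.
Here is the standard analyzer's output, where you can see "john's":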
POST _analyze
{
"analyzer": "standard",
"text": "John's potato"
}
{
"tokens": [
{
"token": "john's",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "potato",
"start_offset": 7,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 1
}
]
}
And here is the english analyzer's output, where you can see that the 's is removed. Stemming allows "John's", "Johns", and "John" to all match the document.
POST _analyze
{
"analyzer": "english",
"text": "John's potato"
}
{
"tokens": [
{
"token": "john",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "potato",
"start_offset": 7,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 1
}
]
}
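If you would rather keep a custom analyzer than switch to english, a minimal sketch is to keep the question's char filter but tokenize into words. The analyzer and char filter names come from the question; the standard tokenizer and the lowercase filter are my assumed changes, and buildApostropheAnalysis() is only a hypothetical helper name.

```php
<?php
// Sketch: analysis settings that strip apostrophes but still split text
// into words. Names come from the question; the "standard" tokenizer and
// the "lowercase" filter are the assumed changes.
function buildApostropheAnalysis(): array
{
    return [
        'analyzer' => [
            'my_analyzer' => [
                'type' => 'custom',
                'tokenizer' => 'standard',            // was "keyword" in the question
                'char_filter' => ['my_char_filter'],
                'filter' => ['lowercase'],
            ],
        ],
        'char_filter' => [
            'my_char_filter' => [
                'type' => 'mapping',
                'mappings' => ["' => "],              // same rule as in the question
            ],
        ],
    ];
}

$analysis = buildApostropheAnalysis();
echo json_encode($analysis, JSON_PRETTY_PRINT), PHP_EOL;
```

The returned array would go under 'settings' => ['analysis' => ...] in the create-index body from the question. To follow the suggestion above instead, set 'analyzer' => 'english' on the title and content fields and drop the custom analysis block.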

Related

How to search in elastic search in array of objects?

I have structure in my es doc like :
"urls": {
"de": [
{
"page_type": 3,
"language_id": 13,
"url": "some/watteninseln/"
},
{
"page_type": 5,
"language_id": 13,
"url": "none/watteninseln/"
}
],
"pt": [
{
"page_type": 3,
"language_id": 22,
"url": "some/west-frisian-islands/"
}
]
}
I want to be able to fetch this doc by URL and language, so I tried:
$query[] =
[
"bool" => [
"minimum_should_match" => 1,
"should" => [
[
"exists" => [
"field" => 'urls.' . $filters['lang']. $filters['url']
]
],
]
]
];
This would only work if the URLs were keys of an associative array under urls, but I need to match a value inside an array of objects.
Could someone tell me the correct way to do it?

Giving priority to prefix match in elasticsearch in php

Is there a way in Elasticsearch to give a prefix match higher priority than a match on a string that merely contains the word?
For example, if I search for ram, the results should be ordered like this:
Ram Reddy
Joy Ram Das
Kiran Ram Goel
Swati Ram Goel
Ramesh Singh
I have tried the mapping given here.
I have done it like this:
$params = [
"index" => $myIndex,
"body" => [
"settings"=> [
"analysis"=> [
"analyzer"=> [
"start_with_analyzer"=> [
"tokenizer"=> "my_edge_ngram",
"filter"=> [
"lowercase"
]
]
],
"tokenizer"=> [
"my_edge_ngram"=> [
"type"=> "edge_ngram",
"min_gram"=> 3,
"max_gram"=> 15
]
]
]
],
"mappings"=> [
"doc"=> [
"properties"=> [
"label"=> [
"type"=> "text",
"fields"=> [
"keyword"=> [
"type"=> "keyword"
],
"ngramed"=> [
"type"=> "text",
"analyzer"=> "start_with_analyzer"
]
]
]
]
]
]
]
];
$response = $client->indices()->create($params); // create an index
and I am searching like this:
$body = [
"size" => 100,
'_source' => $select,
"query"=> [
"bool"=> [
"should"=> [
[
"query_string"=> [
"query"=> "ram*",
"fields"=> [
"value"
],
"boost"=> 5
]
],
[
"query_string"=> [
"query"=> "ram*",
"fields"=> [
"value.ngramed"
],
"analyzer"=> "start_with_analyzer",
"boost"=> 2
]
]
],
"minimum_should_match"=> 1
]
]
];
$params = [
'index' => $myIndex,
'type' => $myType,
'body' => []
];
$params['body'] = $body;
$response = $client->search($params);
The JSON of the query is as follows:
{
"size": 100,
"_source": [
"label",
"value",
"type",
"sr"
],
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "ram*",
"fields": [
"value"
],
"boost": 5
}
},
{
"query_string": {
"query": "ram*",
"fields": [
"value.ngramed"
],
"analyzer": "start_with_analyzer",
"boost": 2
}
}
],
"minimum_should_match": 1,
"must_not": {
"match_phrase": {
"type": "propertyValue"
}
}
}
}
}
I am using Elasticsearch 5.3.2.
Is there any other way to sort the results for the search in the relational database using the search method in PHP?

You should not enable fielddata unless it is really required. To avoid it, you can use a sub-field.
Make the following changes to your code:
"label"=>[
"type"=>"text",
//"fielddata"=> true, ---->remove/comment this line
"analyzer"=>"whitespace",
"fields"=>[
"keyword"=>[
"type"=>"keyword"
]
]
]
To sort on the type field, use type.keyword instead. This applies to any field of type text that has a sub-field of type keyword (assuming the sub-field is named keyword). So change the sort as below:
'sort' => [
["type.keyword"=>["order"=>"asc"]],
["sr"=>["order"=>"asc"]],
["propLabels"=>["order"=>"asc"]],
["value"=>["order"=>"asc"]]
]
Update: index creation and query to get the desired output
Create the index as below:
{
"settings": {
"analysis": {
"analyzer": {
"start_with_analyzer": {
"tokenizer": "my_edge_ngram",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"my_edge_ngram": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 15
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
},
"ngramed": {
"type": "text",
"analyzer": "start_with_analyzer"
}
}
}
}
}
}
}
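To see why this analyzer favours names that start with the search term, here is a rough PHP model of what start_with_analyzer emits. It is only a sketch: the real analysis happens inside Elasticsearch, edgeNgrams() is a hypothetical helper, and it assumes ASCII input and that no token_chars are configured, so the whole value is treated as one string.

```php
<?php
// Simplified model of "start_with_analyzer": edge n-grams taken from the
// start of the whole input, lowercased (lowercasing first gives the same
// grams as lowercasing each gram afterwards).
// Assumes ASCII input; this is an illustration, not Elasticsearch itself.
function edgeNgrams(string $text, int $minGram = 3, int $maxGram = 15): array
{
    $text = strtolower($text);
    $grams = [];
    $limit = min($maxGram, strlen($text));
    for ($length = $minGram; $length <= $limit; $length++) {
        $grams[] = substr($text, 0, $length);
    }
    return $grams;
}

foreach (['Ram Reddy', 'Joy Ram Das', 'Ramesh Singh'] as $name) {
    $matches = in_array('ram', edgeNgrams($name), true);
    echo $name, ': ', $matches ? 'prefix match' : 'no prefix match', PHP_EOL;
}
```

Under these assumptions only values that begin with "ram" contain the gram "ram", so "Joy Ram Das" can match the plain name field but not the n-gram sub-field.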
Use the query below to get the desired result:
{
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "Ram",
"fields": [
"name"
],
"boost": 5
}
},
{
"query_string": {
"query": "Ram",
"fields": [
"name.ngramed"
],
"analyzer": "start_with_analyzer",
"boost": 2
}
}
],
"minimum_should_match": 1
}
}
}
In the above, the query with boost 5 increases the score for documents where Ram is present as a word in name. The other query, with boost 2, further increases the score for documents whose name starts with Ram.
Sample output:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 2.0137746,
"_source": {
"name": "Ram Reddy"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 1.4384104,
"_source": {
"name": "Joy Ram Das"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "3",
"_score": 0.5753642,
"_source": {
"name": "Ramesh Singh"
}
}
]
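Since the question uses the PHP client, the same query can be sketched as a PHP array. The function name and the $subField default are illustrative assumptions; the clauses, boosts and analyzer name are copied from the JSON query above.

```php
<?php
// Builds the two-clause bool query from the answer for any term and field.
// buildPrefixBoostQuery() and the $subField default are illustrative names.
function buildPrefixBoostQuery(string $term, string $field, string $subField = 'ngramed'): array
{
    return [
        'query' => [
            'bool' => [
                'should' => [
                    // Whole-word match on the main field, boosted higher.
                    ['query_string' => [
                        'query' => $term,
                        'fields' => [$field],
                        'boost' => 5,
                    ]],
                    // Prefix match through the edge n-gram sub-field.
                    ['query_string' => [
                        'query' => $term,
                        'fields' => [$field . '.' . $subField],
                        'analyzer' => 'start_with_analyzer',
                        'boost' => 2,
                    ]],
                ],
                'minimum_should_match' => 1,
            ],
        ],
    ];
}

$body = buildPrefixBoostQuery('Ram', 'name');
echo json_encode($body, JSON_PRETTY_PRINT), PHP_EOL;
```

The result can be passed as the 'body' of $client->search(), the same way the question builds its $params.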

ElasticSearch query with diacritics / accents in PHP

I have the following expression: "noapte bună", and I'm trying to get the same result whether I search for "bună" or "buna".
I followed the tutorial here: https://www.elastic.co/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html but without success.
This is my code:
$params = ['index' => 'asciiv3', 'body' => [
"settings" => [
"analysis" => [
"analyzer" => [
"folding" => [
"tokenizer" => "standard",
"filter" => [ "lowercase", "asciifolding" ]
]
]
]
],
"mappings" => [
"asciiv3" => [
"properties" => [
"saying" => [
"type" => "string",
"analyzer" => "standard",
"fields" => [
"folded" => [
"type" => "string",
"analyzer" => "folding"
]
]
]
]
]
]
]];
self::$instance->indices()->create($params);
and this is the query array:
'multi_match' =>
array(
"type" => "most_fields",
"query" => "bună",
"fields" => [ "saying", "saying.folded" ]
)
Does anyone know what I'm doing wrong?

It works for me. This is my setup:
PUT asciiv3
{
"settings": {
"analysis": {
"analyzer": {
"folding": {
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"asciiv3": {
"properties": {
"saying": {
"type": "string",
"analyzer": "standard",
"fields": {
"folded": {
"type": "string",
"analyzer": "folding"
}
}
}
}
}
}
}
POST /asciiv3/asciiv3/1
{
"saying":"bună ziua"
}
POST /asciiv3/asciiv3/2
{
"saying":"buna ziua"
}
GET /asciiv3/_search
{
"query": {
"multi_match": {
"type": "most_fields",
"query": "bună",
"fields": [
"saying",
"saying.folded"
]
}
}
}
With these results:
"hits": {
"total": 2,
"max_score": 0.2712221,
"hits": [
{
"_index": "asciiv3",
"_type": "asciiv3",
"_id": "1",
"_score": 0.2712221,
"_source": {
"saying": "bună ziua"
}
},
{
"_index": "asciiv3",
"_type": "asciiv3",
"_id": "2",
"_score": 0.028130025,
"_source": {
"saying": "buna ziua"
}
}
]
}

How to combine bool must and sort for elasticsearch

I am trying to sort some data. In my base skeleton the sorting does not work; if I remove the sorting, the query works fine.
So how can I add sorting to my base skeleton?
I can't just use
$params['body'] = [
'sort' => [['title' => ['order' => 'asc']]]];
$results = $client->search($params);
because I have other conditions that need the must clause.
Does anyone know how this can be solved?
Any advice would be really appreciated.
// my base skeleton
$params = array(
'index' => "myIndex",
'type' => "myType",
'body' => array(
'query' => array(
'bool' => array(
'must' => array(
// empty must clause for starters
)
)
),
'sort' => array()
)
);
// sorting is not working with bool and must
if ($request->request->get('salarySort')) {
$params['body']['query']['bool']['must'][] = array(
'sort' => array(
"title" => array('order' => 'asc')
)
);
}
This is what I get from json_encode:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1066,
"max_score": null,
"hits": [
{
"_index": "myIndex",
"_type": "myType",
"_id": "pe065319de73937aa6ef46413afd7aac26a58a611",
"_score": null,
"_source": {
"title": "Smarason trycker ",
"content": "HIF gör 2-0 mot Halmstad.",
"tag": [
"Soprts"
],
"category": [
"Sports"
]
},
"sort": [
"0"
]
},
{
"_index": "myIndex",
"_type": "myType",
"_id": "pebc44a70008f53f74f23ab23f8a1f79b2b729448",
"_score": null,
"_source": {
"title": "Anders Svenssons tips gav 1-0",
"content": "Anders Svenssons tips i halvtid Kalmar FF.",
"source": "Unknown",
"tag": [
"Soprts"
],
"category": [
"Sports"
]
},
"sort": [
"0"
]
}
]
}
}
The query in JSON:
{
"index": "myIndex",
"type": "myType",
"size": 30,
"body": {
"query": {
"match_all": []
},
"sort": [
{
"title": "asc"
}
]
}
}

You're almost there. You've correctly placed the empty sort array at the same level as your query.
The issue comes later, when you add the sort as a bool/must constraint instead of appending it to that sort array.
// append the sort clause to the top-level sort array, not to bool/must
if ($request->request->get('salarySort')) {
$params['body']['sort'][] = array( // changed: target body.sort instead of bool.must
"Salary" => 'asc' // changed: field name mapped directly to the sort order
);
}
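Put together, the idea can be sketched without the HTTP request object. addSort() is a hypothetical helper, not part of the Elasticsearch client, and the match clause on category is only an illustrative filter based on the sample documents in the question.

```php
<?php
// Sketch: conditions go into bool/must, sorting goes into the top-level
// "sort" array. addSort() is a hypothetical helper for illustration.
function addSort(array $params, string $field, string $order = 'asc'): array
{
    $params['body']['sort'][] = [$field => ['order' => $order]];
    return $params;
}

$params = [
    'index' => 'myIndex',
    'body' => [
        'query' => ['bool' => ['must' => []]],
        'sort' => [],
    ],
];

// A must clause and a sort clause live side by side, not nested.
$params['body']['query']['bool']['must'][] = ['match' => ['category' => 'Sports']];
$params = addSort($params, 'title');

echo json_encode($params['body'], JSON_PRETTY_PRINT), PHP_EOL;
```

Each further call to addSort() appends another sort key, which Elasticsearch applies in order.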

Elasticsearch Snowball Analyzer wants exact word

I have been using Elasticsearch for a project, but I find the results of the snowball analyzer a bit strange.
Below is the mapping I used.
$myTypeMapping = array(
'_source' => array(
'enabled' => true
),
'properties' => array(
'id' => array(
'type' => 'integer',
'index' => 'not_analyzed'
),
'name' => array(
'type' => 'string',
'analyzer' => 'snowball',
'boost' => 2.0
),
'food_types' => array(
'type' => 'string',
'analyzer' => 'keyword'
),
'location' => array(
'type' => 'geo_point',
"geohash_precision"=> 4
),
'city' => array(
'type' => 'string',
'analyzer' => 'keyword'
)
)
);
$indexParams['body']['mappings']['online_pizza'] = $myTypeMapping;
// Create the index
$elastic_client->indices()->create($indexParams);
On querying http://localhost:9200/online_pizza/online_pizza/_mapping I get the following result:
{
"online_pizza": {
"properties": {
"city": {
"type": "string",
"analyzer": "keyword"
},
"food_types": {
"type": "string",
"analyzer": "keyword"
},
"id": {
"type": "integer"
},
"location": {
"type": "geo_point",
"geohash_precision": 4
},
"name": {
"type": "string",
"boost": 2,
"analyzer": "snowball"
}
}
}
}
My question: I have a document whose name field is "Milano". Querying for "Milano" gives the desired result, but querying for "Milan" or "Mil" returns nothing.
{
"query": {
"query_string": {
"default_field": "name",
"query": "Milan"
}
}
}
I've also tried specifying the snowball analyzer at query time, but it did not help.
{
"query": {
"query_string": {
"default_field": "name",
"query": "Milan",
"analyzer": "snowball"
}
}
}
My second question: keyword search is case-sensitive, e.g. Pizza != pizza. How do I get around this?
Thanks,

The snowball stemmer doesn't require exact words. If you try it with jumping, it outputs jump as expected.
However, depending on the word, it may be understemmed because it doesn't match any stemmer rule.
If you use the analyze API endpoint (more info here), you will see that analyzing Milano with the snowball analyzer gives you the token milano:
GET _analyze?analyzer=snowball&text=Milano
Output :
{
"tokens": [
{
"token": "milano",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 1
}
]
}
Then, using the same snowball analyzer on Mil like this:
GET _analyze?analyzer=snowball&text=Mil
gives you this token:
{
"tokens": [
{
"token": "mil",
"start_offset": 0,
"end_offset": 3,
"type": "<ALPHANUM>",
"position": 1
}
]
}
That's why searching for 'milan' or 'mil' won't match 'Milano' documents: neither term equals the milano term stored in the index.
For your second question, you can define a custom analyzer that combines the keyword tokenizer with a lowercase token filter, which makes your keyword search case-insensitive (provided you use the same analyzer at search time):
POST index_name
{
"analysis": {
"analyzer": {
"case_insensitive_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
}
Test :
GET analyse/_analyze?analyzer=case_insensitive_keyword&text=Choo Choo
Output :
{
"tokens": [
{
"token": "choo choo",
"start_offset": 0,
"end_offset": 9,
"type": "word",
"position": 1
}
]
}
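In the question's PHP setup, this could look roughly like the sketch below. The analyzer definition is the one above and the field names and 'string' type follow the question's mapping; the helper name and wrapping the analysis block in a top-level 'settings' key are my assumptions about the create-index body.

```php
<?php
// Sketch: the case_insensitive_keyword analyzer as a PHP create-index body.
// Applying it to food_types and city follows the question's mapping; the
// helper name and the "settings" wrapper are assumptions.
function buildCaseInsensitiveKeywordIndex(string $type): array
{
    $analyzed = ['type' => 'string', 'analyzer' => 'case_insensitive_keyword'];

    return [
        'settings' => [
            'analysis' => [
                'analyzer' => [
                    'case_insensitive_keyword' => [
                        'type' => 'custom',
                        'tokenizer' => 'keyword',
                        'filter' => ['lowercase'],
                    ],
                ],
            ],
        ],
        'mappings' => [
            $type => [
                'properties' => [
                    'food_types' => $analyzed,
                    'city' => $analyzed,
                ],
            ],
        ],
    ];
}

$indexParams = [
    'index' => 'online_pizza',
    'body' => buildCaseInsensitiveKeywordIndex('online_pizza'),
];
echo json_encode($indexParams['body'], JSON_PRETTY_PRINT), PHP_EOL;
```

The resulting $indexParams would then be passed to $elastic_client->indices()->create() as in the question, with the remaining fields (id, name, location) added to the properties as before.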
I hope my explanations are clear enough :)
