How to improve relevancy in Elasticsearch? - PHP

This is how my mapping looks:
$arr = [
'index' => 'test1',
'body' => [
'settings' => [
'analysis' => [
'analyzer' => [
'name_analyzer' => [
'type' => 'custom',
'tokenizer' => 'standard',
'filter' => [
'lowercase',
'asciifolding',
'word_delimiter'
]
]
]
]
],
"mappings" => [
"info" => [
"properties" => [
"Name" => [// this field is analyzed
"type" => "string",
"fields" => [
"raw" => [ // subfield of Name is not analyzed, to avoid the known issue of buckets being split on spaces
"type" => "string",
"index" => "not_analyzed"
]
]
],
"Address" => [
"type" => "string",
"index" => "analyzed",
"analyzer" => "name_analyzer"
]
]
]
]
]
];
And this is my query:
$query['index'] = 'test1';
$query['type'] = 'info';
// this also works without the bool/should wrapper
$query['body'] = [
'query'=> [
'bool' => [
'should' => [
'query_string' => [
'fields' => ['Name'],
'query' => 'sa*',
'analyze_wildcard' => 'true'
]
]
]
],
'size'=> '0',
'aggregations' => [
'actor' => [
'terms' => [
'field' => 'Name.raw',
'size' => 10
]
]
]
];
My output is:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"actor": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Salma Hayak",
"doc_count": 1
},
{
"key": "Salman Khan",
"doc_count": 1
},
{
"key": "Salman Shaikh",
"doc_count": 1
}
]
}
}
}
Salman Khan is searched for far more often than Salma Hayak, so when a user searches for "sa" I want Salman Khan to appear first, ahead of Salma Hayak.
Can anyone please help me with this?
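The question does not say where "most searched" is stored, so the following is only a sketch of one possible approach. It assumes each document carries a numeric popularity field (the name `search_count` is hypothetical and is not part of the mapping above) and orders the terms buckets by a `max` sub-aggregation on that field instead of the default `doc_count` ordering.

```php
<?php
// Sketch only: `search_count` is a hypothetical numeric field recording how
// often an actor was searched for; it does not exist in the mapping above.
function buildPopularActorsQuery(string $prefix, string $popularityField = 'search_count'): array
{
    return [
        'index' => 'test1',
        'type'  => 'info',
        'body'  => [
            'query' => [
                'query_string' => [
                    'fields'           => ['Name'],
                    'query'            => $prefix . '*',
                    'analyze_wildcard' => true,
                ],
            ],
            'size' => 0,
            'aggregations' => [
                'actor' => [
                    'terms' => [
                        'field' => 'Name.raw',
                        'size'  => 10,
                        // Order buckets by the sub-aggregation below, highest first.
                        'order' => ['popularity' => 'desc'],
                    ],
                    'aggregations' => [
                        'popularity' => ['max' => ['field' => $popularityField]],
                    ],
                ],
            ],
        ],
    ];
}

$query = buildPopularActorsQuery('sa');
```

With this shape, each bucket in the response would also carry a `popularity` value, and the buckets would come back sorted by it. Whether the counter lives in the same index or has to be maintained separately is a design decision the original question leaves open.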

Related

Elasticsearch PHP - Exact Word Matches Before Partial Matches

So I'm trying to sort my search results to show the exact matches before all the partial matches. What I mean by this is if I have the documents with names:
Set 4/102
Set 44/102
Set 94/102
I'm searching on the term 4/102 and it returns all documents. This is fine, however, I want the Set 4/102 to show up first but it seemingly sorts them randomly. Is there a way to use script sorting or something like that to have the exact term match to show up first?
These are my mappings and settings:
$settingsParams = [
'index' => 'products',
'body' => [
'settings' => [
'analysis' => [
'analyzer' => [
'substring_analyzer' => [
'tokenizer' => 'substring_tokenizer',
'filter' => [
'lowercase'
]
],
'fullword_analyzer' => [
'tokenizer' => 'whitespace',
'filter' => [
'lowercase'
]
],
],
'tokenizer' => [
'substring_tokenizer' => [
'type' => 'nGram',
'min_gram' => 3,
'max_gram' => 12,
'token_chars' => [
'letter',
'digit',
'symbol',
'custom'
],
'custom_token_chars' => '/'
]
]
],
'max_ngram_diff' => 20
]
]
];
$mappingParams = [
'index' => 'products',
'body' => [
'_source' => [
'enabled' => true
],
'properties' => [
'name' => [
'type' => 'text',
'fields' => [
'keyword' => [
'type' => 'keyword'
]
],
'analyzer' => 'substring_analyzer',
'search_analyzer' => 'fullword_analyzer'
],
'min_price' => [
'type' => 'double'
],
'saleprice' => [
'type' => 'double'
],
'list_price' => [
'type' => 'double'
],
'root_category_rank' => [
'type' => 'integer'
],
'interest_level' => [
'type' => 'integer'
],
'root_categoryid' => [
'type' => 'integer'
]
]
]
];
Adding a working example
Index Mapping:
{
"settings": {
"analysis": {
"analyzer": {
"substring_analyzer": {
"tokenizer": "substring_tokenizer"
},
"fullword_analyzer": {
"tokenizer": "whitespace"
}
},
"tokenizer": {
"substring_tokenizer": {
"type": "ngram",
"min_gram": 3,
"max_gram": 12,
"token_chars": [
"letter",
"digit",
"custom",
"symbol"
],
"custom_token_chars": "/"
}
}
},
"max_ngram_diff": 50
},
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "substring_analyzer",
"search_analyzer": "fullword_analyzer"
}
}
}
}
Search Query:
{
"query": {
"match": {
"name": "4/102"
}
}
}
Search Result:
The document with "name": "4/102" gets a higher score than the other documents:
"hits": [
{
"_index": "66232066",
"_type": "_doc",
"_id": "1",
"_score": 0.15275992,
"_source": {
"name": "4/102" // note this
}
},
{
"_index": "66232066",
"_type": "_doc",
"_id": "2",
"_score": 0.12562492,
"_source": {
"name": "44/102"
}
},
{
"_index": "66232066",
"_type": "_doc",
"_id": "3",
"_score": 0.12562492,
"_source": {
"name": "94/102"
}
}
]
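Since the question uses the PHP client, the search query from the working example can be written as a PHP array. This is a minimal sketch: the index name `products` is taken from the question's settings, and the array is meant to be passed to the client's `search()` method, which is assumed to be set up elsewhere.

```php
<?php
// PHP equivalent of the JSON match query above.
// The search_analyzer on `name` (fullword_analyzer) is applied to the query
// text, so "4/102" is looked up as one whole token against the indexed n-grams.
$params = [
    'index' => 'products',
    'body'  => [
        'query' => [
            'match' => [
                'name' => '4/102',
            ],
        ],
    ],
];

// Print the request body to compare it with the JSON version.
echo json_encode($params['body'], JSON_UNESCAPED_SLASHES), PHP_EOL;
```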

PHP ElasticSearch compound query with has_child

When I query Elasticsearch for products from a specific manufacturer, this works:
$params = ['index' => 'products',
'type' => 'product',
'body' => ['query' =>
['match' => ['manufacturers_id' => $query],],
],
];
But when I also want to add on a condition that the product comes in color Silver, which I have added as a child record to the product record, I get a syntax error:
$params = ['index' => 'products',
'type' => 'product',
'body' => ['query' =>
['match' => ['manufacturers_id' => $query],],
['query' =>
['has_child' =>
['type' => 'attributes',
['query' =>
['color' => 'Silver'],],
],
],
],
],
];
The error is
{
"error": {
"col": 49,
"line": 1,
"reason": "Unknown key for a START_OBJECT in [0].",
"root_cause": [
{
"col": 49,
"line": 1,
"reason": "Unknown key for a START_OBJECT in [0].",
"type": "parsing_exception"
}
],
"type": "parsing_exception"
},
"status": 400
}
Also tried
$params = ['index' => 'products',
'type' => 'product',
'body' => ["query"=> [
"match"=> [
"manufacturers_id"=> [11]
],
"has_child"=> [
"type"=> "attributes",
"query"=> [
"match"=> [
"color"=> "silver"
],
],
],
],
],
];
I get "Can't get text on a START_ARRAY at 1:39."
Try this:
"query"=> [
"match"=> [
"manufacturers_id"=> [1,2,3]
],
"has_child"=> [
"type"=> "attributes",
"query"=> [
"match"=> [
"color"=> "silver"
]
]
]
]
I also recommend Sense, it's a plugin for Chrome browser which helps writing ES queries.
Finally got this to work. Big thanks to @pawle for his suggestion of Sense, which really helped.
$params = ['index' => 'products',
'type' => 'product',
'body' =>
[
"query" => [
"bool" => [
"must" => [[
"has_child" => [
"type" => "attributes",
"query" => [
"match" => [
"attributes_value" => "silver"
]
]
]
],
[
"match" => [
"manufacturers_id" => 4
]
]
]
]
]
],
];

Elasticsearch find input word and all synonyms

Using Elasticsearch, I am trying to find all items matching the word "skiing".
My mapping (PHP array):
"properties" => [
"title" => [
"type" => "string",
"boost" => 1.0,
"analyzer" => "autocomplete"
]
]
Settings:
"settings"=> [
"analysis" => [
"analyzer" => [
"autocomplete" => [
"type" => "custom",
"tokenizer" => "standard",
"filter" => ["lowercase", "trim", "synonym", "porter_stem"],
"char_filter" => ["html_strip"]
]
],
"filter" => [
"synonym" => [
"type" => "synonym",
"synonyms_path" => "analysis/synonyms.txt"
]
]
]
]
Search query:
[
"index" => "articles",
"body" => [
"query" => [
"filtered" => [
"query" => [
"bool" => [
"must" => [
"indices" => [
"indices" => ["articles"],
"query" => [
"bool" => [
"should" => [
"multi_match" => [
"query" => "skiing",
"fields" => ["title"]
]
]
]
]
]
]
]
]
]
],
"sort" => [
"_score" => [
"order" => "desc"
]
]
],
"size" => 10,
"from" => 0,
"search_type" => "dfs_query_then_fetch",
"explain" => true
];
The synonyms.txt file contains skiing => xanthic.
I want to get all items with "skiing" (because it is the input word), "ski" (via the porter_stem filter) and then "xanthic" (via the synonyms file). But I only get results containing the word "xanthic".
Can you tell me why? How do I need to configure the index?
In the synonyms file you need to have "skiing, xanthic". The way you have it now, skiing is replaced with xanthic, but you want to keep both. I think you also need to reindex the data to see the change.
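The difference between the two rule formats can be sketched with inline rules. This is an illustration only: it uses the synonym filter's `synonyms` parameter instead of `synonyms_path` so the rules are visible in the settings, and the variable names are hypothetical.

```php
<?php
// Explicit mapping: "skiing" is REPLACED by "xanthic" at analysis time.
$replaceOnly = [
    'type'     => 'synonym',
    'synonyms' => ['skiing => xanthic'],
];

// Equivalent synonyms: both "skiing" and "xanthic" are kept.
$keepBoth = [
    'type'     => 'synonym',
    'synonyms' => ['skiing, xanthic'],
];

// Same analyzer as in the question, wired to the "keep both" filter.
$settings = [
    'analysis' => [
        'analyzer' => [
            'autocomplete' => [
                'type'        => 'custom',
                'tokenizer'   => 'standard',
                'filter'      => ['lowercase', 'trim', 'synonym', 'porter_stem'],
                'char_filter' => ['html_strip'],
            ],
        ],
        'filter' => [
            'synonym' => $keepBoth,
        ],
    ],
];
```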
Thanks, but this is the solution I ended up with. I changed the mapping:
"properties" => [
"title" => [
"type" => "string",
"boost" => 1.5,
"analyzer" => "standard",
"fields" => [
"english" => [
"type" => "string",
"analyzer" => "standard",
"search_analyzer" => "english",
"boost" => 1.0
],
"synonym" => [
"type" => "string",
"analyzer" => "standard",
"search_analyzer" => "synonym",
"boost" => 0.5
]
]
]
]
Settings:
"settings"=> [
"analysis" => [
"analyzer" => [
"synonym" => [
"type" => "custom",
"tokenizer" => "standard",
"filter" => ["lowercase", "trim", "synonym"],
"char_filter" => ["html_strip"]
]
],
"filter" => [
"synonym" => [
"type" => "synonym",
"synonyms_path" => "analysis/synonyms.txt"
]
]
]
]

ElasticSearch match query multiple terms PHP

I am trying to construct a must query on multiple terms. The array looks like this:
$params = [
'body' => [
'query' => [
"bool" => [
"must" => [
"terms" => [
"categories" => [
"Seating",
],
],
"terms" => [
"attributes.Color" => [
"Black",
],
]
],
"filter" => [
"range" => [
"price" => [
"gte" => 39,
"lte" => 2999,
],
],
],
],
],
'from' => 0,
'size' => 3,
],
];
Which is represented in JSON like this:
{
"query": {
"bool": {
"must": {
"terms": {
"attributes.Color": ["Black"]
}
},
"filter": {
"range": {
"price": {
"gte": "39",
"lte": "2999"
}
}
}
}
},
"from": 0,
"size": 3
}
The problem is that JSON objects are represented as arrays in PHP, so if I use the same key twice in one array, the first value is overwritten. Do you have any idea how to create a query with multiple terms clauses in PHP?
Thanks in advance.
You need to add an additional array to enclose all your terms queries:
$params = [
'body' => [
'query' => [
"bool" => [
"must" => [
[
"terms" => [
"categories" => [
"Seating",
],
]
],
[
"terms" => [
"attributes.Color" => [
"Black",
],
]
]
],
"filter" => [
"range" => [
"price" => [
"gte" => 39,
"lte" => 2999,
],
],
],
],
],
'from' => 0,
'size' => 3,
],
];
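The effect of the extra array can be checked without Elasticsearch by encoding both shapes with `json_encode`. This is a small sketch; the variable names `$overwritten` and `$wrapped` are made up for the demonstration.

```php
<?php
// Duplicate string keys in one PHP array literal: the later value silently
// replaces the earlier one, so only the Color clause survives.
$overwritten = [
    'must' => [
        'terms' => ['categories' => ['Seating']],
        'terms' => ['attributes.Color' => ['Black']],
    ],
];

// Wrapping each clause in its own array yields a list, which encodes
// to a JSON array of two separate terms objects.
$wrapped = [
    'must' => [
        ['terms' => ['categories' => ['Seating']]],
        ['terms' => ['attributes.Color' => ['Black']]],
    ],
];

echo json_encode($overwritten), PHP_EOL;
// {"must":{"terms":{"attributes.Color":["Black"]}}}

echo json_encode($wrapped), PHP_EOL;
// {"must":[{"terms":{"categories":["Seating"]}},{"terms":{"attributes.Color":["Black"]}}]}
```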

elasticsearch return only one document of id field

My current query returns this data:
{
"id": 1,
"chantierId": 60,
"location": {
"lat": 49.508804203333,
"lon": 2.4385195366667
}
},
{
"id": 2,
"chantierId": 60,
"location": {
"lat": 49.508780168333,
"lon": 2.43844484
}
},
{
"id": 3,
"chantierId": 33,
"location": {
"lat": 49.50875823,
"lon": 2.4383772216667
}
}
This is my Elasticsearch query, which searches for points using the geo_point field:
[
"query" => [
"filtered" => [
"query" => [
"match_all" => []
],
"filter" => [
"geo_distance" => [
"distance" => "100m",
"location" => ['lat' => 49.508804203333, 'lon' => 2.4385195366667]
]
]
]
],
"sort" => [
"_geo_distance" => [
"location" => ['lat' => 49.508804203333, 'lon' => 2.4385195366667],
"order" => "asc"
]
]
]
How can I get only one document per chantierId (here 33 and 60), namely the one nearest to my location?
Thanks
You can add a size parameter before the query, set to the number of documents you want to receive. The modified query will be:
[ "size" => 1,
"query" => [
"filtered" => [
"query" => [
"match_all" => []
],
"filter" => [
"geo_distance" => [
"distance" => "100m",
"location" => ['lat' => 49.508804203333, 'lon' => 2.4385195366667]
]
]
]
],
"sort" => [
"_geo_distance" => [
"location" => ['lat' => 49.508804203333, 'lon' => 2.4385195366667],
"order" => "asc"
]
]
]
I resolved my problem with the answer to this Stack Overflow question: Remove duplicate documents from a search in Elasticsearch
So:
[
"query" => [
"filtered" => [
"query" => [
"match_all" => []
],
"filter" => [
"geo_distance" => [
"distance" => "100m",
"location" => $location
]
]
]
],
"sort" => [
"_geo_distance" => [
"location" => $location,
"order" => "asc"
]
],
"aggs" => [
"Geoloc" => [
"terms" => [
"field" => "chantierId"
],
"aggs" => [
"Geoloc_docs" => [
"top_hits" => [
"size" => 1
]
]
]
]
]
]
Thanks to @Tanu who tried to help me
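A sketch of how the per-chantierId result could be built and read back in PHP follows. Two points are assumptions rather than part of the original answer. First, `top_hits` orders its hits by score by default, not by the top-level sort, so the sketch repeats the `_geo_distance` sort inside `top_hits` to pick the nearest document. Second, an empty PHP array encodes to `[]` in JSON, so `match_all` is given an empty object instead. The function names are hypothetical.

```php
<?php
// Build the request body: one bucket per chantierId, each holding its nearest document.
function buildNearestPerChantierQuery(array $location, string $distance = '100m'): array
{
    $geoSort = ['_geo_distance' => ['location' => $location, 'order' => 'asc']];

    return [
        'query' => [
            'filtered' => [
                // new \stdClass() encodes to {} where [] would encode to a JSON array.
                'query'  => ['match_all' => new \stdClass()],
                'filter' => [
                    'geo_distance' => ['distance' => $distance, 'location' => $location],
                ],
            ],
        ],
        'sort' => [$geoSort],
        'aggs' => [
            'Geoloc' => [
                'terms' => ['field' => 'chantierId'],
                'aggs'  => [
                    'Geoloc_docs' => [
                        // Sort inside top_hits so the single hit kept is the nearest one.
                        'top_hits' => ['size' => 1, 'sort' => [$geoSort]],
                    ],
                ],
            ],
        ],
    ];
}

// Extract chantierId => nearest document from a decoded search response.
function nearestPerChantier(array $response): array
{
    $nearest = [];
    foreach ($response['aggregations']['Geoloc']['buckets'] ?? [] as $bucket) {
        $hits = $bucket['Geoloc_docs']['hits']['hits'] ?? [];
        if ($hits !== []) {
            $nearest[$bucket['key']] = $hits[0]['_source'];
        }
    }
    return $nearest;
}

$body = buildNearestPerChantierQuery(['lat' => 49.508804203333, 'lon' => 2.4385195366667]);
```

The documents to display then come from the aggregation buckets rather than from the top-level `hits` list.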
