ElasticSearch query with diacritics / accents in PHP


I have the phrase "noapte bună" and I want to get the same result whether I search for "bună" or "buna".
I followed the tutorial at https://www.elastic.co/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html, but it didn't work.
This is my code:
$params = ['index' => 'asciiv3', 'body' => [
"settings" => [
"analysis" => [
"analyzer" => [
"folding" => [
"tokenizer" => "standard",
"filter" => [ "lowercase", "asciifolding" ]
]
]
]
],
"mappings" => [
"asciiv3" => [
"properties" => [
"saying" => [
"type" => "string",
"analyzer" => "standard",
"fields" => [
"folded" => [
"type" => "string",
"analyzer" => "folding"
]
]
]
]
]
]
]];
self::$instance->indices()->create($params);
and this is the query array:
'multi_match' =>
array(
"type" => "most_fields",
"query" => "bună",
"fields" => [ "saying", "saying.folded" ]
)
Does anyone know what I'm doing wrong?

It works for me. This is my setup:
PUT asciiv3
{
"settings": {
"analysis": {
"analyzer": {
"folding": {
"tokenizer": "standard",
"filter": [
"lowercase",
"asciifolding"
]
}
}
}
},
"mappings": {
"asciiv3": {
"properties": {
"saying": {
"type": "string",
"analyzer": "standard",
"fields": {
"folded": {
"type": "string",
"analyzer": "folding"
}
}
}
}
}
}
}
POST /asciiv3/asciiv3/1
{
"saying":"bună ziua"
}
POST /asciiv3/asciiv3/2
{
"saying":"buna ziua"
}
GET /asciiv3/_search
{
"query": {
"multi_match": {
"type": "most_fields",
"query": "bună",
"fields": [
"saying",
"saying.folded"
]
}
}
}
With these results:
"hits": {
"total": 2,
"max_score": 0.2712221,
"hits": [
{
"_index": "asciiv3",
"_type": "asciiv3",
"_id": "1",
"_score": 0.2712221,
"_source": {
"saying": "bună ziua"
}
},
{
"_index": "asciiv3",
"_type": "asciiv3",
"_id": "2",
"_score": 0.028130025,
"_source": {
"saying": "buna ziua"
}
}
]
}
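To see why the folded sub-field matches both spellings, here is a rough PHP approximation of what the `folding` analyzer (standard tokenizer + `lowercase` + `asciifolding`) does to Romanian text. It is only an illustration under stated assumptions: `foldTokens()` is a hypothetical helper, the character map covers only Romanian diacritics, and the regex split is a crude stand-in for the standard tokenizer, not Elasticsearch's implementation.

```php
<?php
// Hypothetical helper: approximates lowercase + asciifolding for Romanian
// text only. Elasticsearch's asciifolding covers far more characters.
function foldTokens(string $text): array
{
    // Romanian diacritics, including both the cedilla and comma-below forms.
    $map = [
        'ă' => 'a', 'Ă' => 'A', 'â' => 'a', 'Â' => 'A',
        'î' => 'i', 'Î' => 'I',
        'ș' => 's', 'Ș' => 'S', 'ş' => 's', 'Ş' => 'S',
        'ț' => 't', 'Ț' => 'T', 'ţ' => 't', 'Ţ' => 'T',
    ];
    // Fold first, then lowercase the (now ASCII) letters.
    $folded = strtolower(strtr($text, $map));
    // Crude stand-in for the standard tokenizer: split on non-alphanumerics.
    return preg_split('/[^a-z0-9]+/', $folded, -1, PREG_SPLIT_NO_EMPTY);
}

var_dump(foldTokens('noapte bună') === foldTokens('noapte buna')); // bool(true)
```

Because "bună" and "buna" both reduce to the token `buna` on `saying.folded`, both documents match. The document whose plain `saying` field also matches the unfolded `bună` gets the higher score, since `most_fields` adds the scores of both fields.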

Related

Giving priority to prefix match in elasticsearch in php

Is there a way in Elasticsearch to give a prefix match higher priority than a string that merely contains the word?
For example, if I search for ram, the results should be ranked like this:
Ram Reddy
Joy Ram Das
Kiran Ram Goel
Swati Ram Goel
Ramesh Singh
I tried the mapping described here.
This is what I did:
$params = [
"index" => $myIndex,
"body" => [
"settings"=> [
"analysis"=> [
"analyzer"=> [
"start_with_analyzer"=> [
"tokenizer"=> "my_edge_ngram",
"filter"=> [
"lowercase"
]
]
],
"tokenizer"=> [
"my_edge_ngram"=> [
"type"=> "edge_ngram",
"min_gram"=> 3,
"max_gram"=> 15
]
]
]
],
"mappings"=> [
"doc"=> [
"properties"=> [
"label"=> [
"type"=> "text",
"fields"=> [
"keyword"=> [
"type"=> "keyword"
],
"ngramed"=> [
"type"=> "text",
"analyzer"=> "start_with_analyzer"
]
]
]
]
]
]
]
];
$response = $client->indices()->create($params); // create an index
and I search like this:
$body = [
"size" => 100,
'_source' => $select,
"query"=> [
"bool"=> [
"should"=> [
[
"query_string"=> [
"query"=> "ram*",
"fields"=> [
"value"
],
"boost"=> 5
]
],
[
"query_string"=> [
"query"=> "ram*",
"fields"=> [
"value.ngramed"
],
"analyzer"=> "start_with_analyzer",
"boost"=> 2
]
]
],
"minimum_should_match"=> 1
]
]
];
$params = [
'index' => $myIndex,
'type' => $myType,
'body' => []
];
$params['body'] = $body;
$response = $client->search($params);
The resulting JSON query is:
{
"size": 100,
"_source": [
"label",
"value",
"type",
"sr"
],
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "ram*",
"fields": [
"value"
],
"boost": 5
}
},
{
"query_string": {
"query": "ram*",
"fields": [
"value.ngramed"
],
"analyzer": "start_with_analyzer",
"boost": 2
}
}
],
"minimum_should_match": 1,
"must_not": {
"match_phrase": {
"type": "propertyValue"
}
}
}
}
}
I am using Elasticsearch 5.3.2.
Is there any other way to sort the results for the search in the relational database using the search method in php?

You should not enable fielddata on a text field unless it is really required. Instead, you can use a keyword sub-field.
Make the following changes to your code:
"label"=>[
"type"=>"text",
//"fielddata"=> true, ---->remove/comment this line
"analyzer"=>"whitespace",
"fields"=>[
"keyword"=>[
"type"=>"keyword"
]
]
]
To sort on the type field, use type.keyword instead. The same change applies to any text field that has a keyword sub-field (assuming the sub-field is named keyword). So change the sort as follows:
'sort' => [
["type.keyword"=>["order"=>"asc"]],
["sr"=>["order"=>"asc"]],
["propLabels"=>["order"=>"asc"]],
["value"=>["order"=>"asc"]]
]
Update: index creation and query to get the desired output
Create the index as below:
{
"settings": {
"analysis": {
"analyzer": {
"start_with_analyzer": {
"tokenizer": "my_edge_ngram",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"my_edge_ngram": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 15
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
},
"ngramed": {
"type": "text",
"analyzer": "start_with_analyzer"
}
}
}
}
}
}
}
Use the query below to get the desired result:
{
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "Ram",
"fields": [
"name"
],
"boost": 5
}
},
{
"query_string": {
"query": "Ram",
"fields": [
"name.ngramed"
],
"analyzer": "start_with_analyzer",
"boost": 2
}
}
],
"minimum_should_match": 1
}
}
}
In the query above, the clause with boost 5 increases the score of documents whose name contains the word Ram. The clause with boost 2 further increases the score of documents whose name starts with Ram.
Sample output:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 2.0137746,
"_source": {
"name": "Ram Reddy"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 1.4384104,
"_source": {
"name": "Joy Ram Das"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "3",
"_score": 0.5753642,
"_source": {
"name": "Ramesh Singh"
}
}
]
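The same request can be assembled from PHP. This sketch assumes the elasticsearch-php client and the answer's field names (`name`, `name.ngramed`); `buildPrefixBoostQuery()` is a hypothetical helper that only builds the array and does not talk to the cluster.

```php
<?php
// Hypothetical helper: builds the boosted bool/should query from the answer
// for an arbitrary term. Field and analyzer names follow the answer's mapping.
function buildPrefixBoostQuery(string $term): array
{
    return [
        'query' => [
            'bool' => [
                'should' => [
                    // Term appears anywhere in the name: boost 5.
                    ['query_string' => [
                        'query'  => $term,
                        'fields' => ['name'],
                        'boost'  => 5,
                    ]],
                    // Term matches an edge n-gram of the whole name,
                    // i.e. the name starts with the term: boost 2.
                    ['query_string' => [
                        'query'    => $term,
                        'fields'   => ['name.ngramed'],
                        'analyzer' => 'start_with_analyzer',
                        'boost'    => 2,
                    ]],
                ],
                'minimum_should_match' => 1,
            ],
        ],
    ];
}

$params = [
    'index' => 'test', // index name taken from the sample output
    'body'  => buildPrefixBoostQuery('Ram'),
];
echo json_encode($params['body'], JSON_PRETTY_PRINT), "\n";
// With elasticsearch-php you would then call $client->search($params).
```

Note that `query_string` interprets its own query syntax (wildcards, `AND`, quotes), so if the term comes straight from user input, a `match` query may be a safer choice.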

Elasticsearch OR query for comma-separated values

I store IDs in the database as a comma-separated string and index the same value into Elasticsearch. Now I need to retrieve the rows whose user_ids contain a given user ID.
For example, the user_ids column (varchar(500) in the database, text in Elasticsearch) is indexed like this:
8938,8936,8937
$userId = 8936; // For example expecting to return that row
$whereCondition = [];
$whereCondition[] = [
"query_string" => [
"query"=> $userId,
"default_field" => "user_ids",
"default_operator" => "OR"
]
];
$searchParams = [
'query' => [
'bool' => [
'must' => [
$whereCondition
],
'must_not' => [
['exists' => ['field' => 'deleted_at']]
]
]
],
"size" => 10000
];
User::search($searchParams);
JSON query:
{
"query": {
"bool": {
"must": [
[{
"query_string": {
"query": 8936,
"default_field": "user_ids",
"default_operator": "OR"
}
}]
],
"must_not": [
[{
"exists": {
"field": "deleted_at"
}
}]
]
}
},
"size": 10000
}
Mapping details
{
"user_details_index": {
"aliases": {},
"mappings": {
"test_type": {
"properties": {
"created_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"deleted_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"updated_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"user_ids": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
},
"settings": {
"index": {
"creation_date": "1546404165500",
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "krpph26NTv2ykt6xE05klQ",
"version": {
"created": "6020299"
},
"provided_name": "user_details_index"
}
}
}
}
I am trying the logic above, but it returns no results. Can someone help with this?

Since user_ids is a text field with no analyzer specified, it uses the standard analyzer by default. The standard analyzer does not split 8938,8936,8937 into the terms 8938, 8936 and 8937 (a comma between digits is treated as part of the number), so the ID can't match.
To solve this, I suggest storing an array of IDs in the user_ids field instead of a CSV string. When indexing, your JSON input should look like this:
{
...
"user_ids": [
8938,
8936,
8937
]
...
}
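If the IDs still come out of the database as a comma-separated string, they can be split before indexing. A minimal sketch in plain PHP; `csvToIds()` is a hypothetical helper, and the `$doc` shape is only illustrative.

```php
<?php
// Hypothetical helper: converts "8938,8936,8937" into [8938, 8936, 8937].
function csvToIds(string $csv): array
{
    $parts = array_map('trim', explode(',', $csv));
    // Drop empty entries, e.g. from an empty string or a trailing comma.
    $parts = array_filter($parts, function ($p) {
        return $p !== '';
    });
    return array_values(array_map('intval', $parts));
}

$doc = ['user_ids' => csvToIds('8938,8936,8937')];
echo json_encode($doc), "\n"; // {"user_ids":[8938,8936,8937]}
```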
Since user IDs are integers, change the mapping as follows:
{
"user_ids": {
"type": "integer"
}
}
The query now becomes:
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "user_ids": [
              8936
            ]
          }
        }
      ],
      "must_not": [
        {
          "exists": {
            "field": "deleted_at"
          }
        }
      ]
    }
  },
  "size": 10000
}
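In the question's PHP style, the corrected query might be built like this. `buildUserIdQuery()` is a hypothetical helper; the resulting array is what you would pass to `User::search()` from the question, assuming that method forwards the array as the request body. Note that each clause list holds the query objects directly, without an extra array level around them.

```php
<?php
// Hypothetical helper: terms filter on the integer user_ids field,
// excluding soft-deleted rows.
function buildUserIdQuery(int $userId): array
{
    return [
        'query' => [
            'bool' => [
                'filter' => [
                    ['terms' => ['user_ids' => [$userId]]],
                ],
                'must_not' => [
                    ['exists' => ['field' => 'deleted_at']],
                ],
            ],
        ],
        'size' => 10000,
    ];
}

echo json_encode(buildUserIdQuery(8936)), "\n";
```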

No results once implementing an analyzer in Elasticsearch

I need to ignore apostrophes in the indexed text so that searching for "Johns potato" returns results for "John's potato".
I was able to get the analyzer accepted, but now my searches return no results. Does anyone see something obvious that I am missing?
$params = [
'index' => $index,
'body' => [
'settings' => [
'number_of_shards' => 5,
'number_of_replicas' => 2,
'analysis' => [
"analyzer" => [
"my_analyzer" => [
"tokenizer" => "keyword",
"char_filter" => [
"my_char_filter"
]
]
],
"char_filter" => [
"my_char_filter" => [
"type" => "mapping",
"mappings" => [
"' => "
]
]
]
]
],
'mappings' => [
$type => [
'_source' => [
'enabled' => true
],
'properties' => [
'title' => [
'type' => 'text',
'analyzer' => 'my_analyzer'
],
'content' => [
'type' => 'text',
'analyzer' => 'my_analyzer'
]
]
]
]
]
];
I found that removing the analyzer from my field mappings makes the results reappear, but as soon as I add the analyzer I get no results.
Here's an example query that I make.
{
"body": {
"query": {
"bool": {
"must": {
"multi_match": {
"query": "apples",
"fields": [
"title",
"content"
]
}
},
"filter": {
"terms": {
"site_id": [
"1351",
"1349"
]
}
},
"must_not": [
{
"match": {
"visible": "false"
}
},
{
"match": {
"locked": "true"
}
}
]
}
}
}
}

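What you probably want is the built-in english analyzer. Your my_analyzer uses the keyword tokenizer, which emits the entire field value as a single token, so a query for "apples" only matches a title or content that is exactly "apples"; that is why the results disappear. The standard analyzer, which is the default, splits text on word boundaries (whitespace and most punctuation) but keeps apostrophes inside words. The english analyzer also stems words and removes stop words, since the language is known.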
Here is the standard analyzer's output, where you can see "john's":
POST _analyze
{
"analyzer": "standard",
"text": "John's potato"
}
{
"tokens": [
{
"token": "john's",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "potato",
"start_offset": 7,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 1
}
]
}
And here is the english analyzer's output, where you can see that the 's has been removed. Because of stemming, "John's", "Johns", and "John" will all match the document.
POST _analyze
{
"analyzer": "english",
"text": "John's potato"
}
{
"tokens": [
{
"token": "john",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "potato",
"start_offset": 7,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 1
}
]
}
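A sketch of the question's index-creation params with the english analyzer instead of my_analyzer, assuming the elasticsearch-php client. `buildIndexParams()` is a hypothetical helper, and `my_index`/`my_type` are placeholders standing in for the question's `$index` and `$type`.

```php
<?php
// Hypothetical helper: index params using the built-in "english" analyzer
// on title and content; shard/replica settings copied from the question.
function buildIndexParams(string $index, string $type): array
{
    return [
        'index' => $index,
        'body'  => [
            'settings' => [
                'number_of_shards'   => 5,
                'number_of_replicas' => 2,
            ],
            'mappings' => [
                $type => [
                    'properties' => [
                        'title'   => ['type' => 'text', 'analyzer' => 'english'],
                        'content' => ['type' => 'text', 'analyzer' => 'english'],
                    ],
                ],
            ],
        ],
    ];
}

$params = buildIndexParams('my_index', 'my_type');
echo json_encode($params['body']['mappings'], JSON_PRETTY_PRINT), "\n";
// With elasticsearch-php: $client->indices()->create($params);
```

The analyzer of an existing field cannot be changed in place, so you would need to create a new index with this mapping and reindex your documents into it.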

Change the JSON format

I am working with Drupal 8 and trying to get the JSON of all nodes of a content type. I get the JSON shown below, but I want to change the following JSON
[
{
"nid": [
{
"value": "17"
}
],
"uuid": [
{
"value": "3614e0c8-88d4-4e8d-a732-5089698556d5"
}
],
"vid": [
{
"value": "17"
}
],
"type": [
{
"target_id": "resume_creator"
}
],
"langcode": [
{
"value": "en"
}
],
"title": [
{
"value": "uyi"
}
],
"uid": [
{
"target_id": "1"
}
],
"status": [
{
"value": "1"
}
],
"created": [
{
"value": "1452060690"
}
],
"changed": [
{
"value": "1452060709"
}
],
"promote": [
{
"value": "1"
}
],
"sticky": [
{
"value": "0"
}
],
"revision_timestamp": [
{
"value": "1452060709"
}
],
"revision_uid": [
{
"target_id": "1"
}
],
"revision_log": [],
"revision_translation_affected": [
{
"value": "1"
}
],
"default_langcode": [
{
"value": "1"
}
],
"path": [],
"field_communication_address": [
{
"value": "rtyrtytr\r\nuu;\r\nsdgfdh"
}
],
"field_education": [
{
"value": "ytutyuii"
}
],
"field_emails": [
{
"value": "gtf#fgfg.com"
}
],
"field_experiece": [
{
"value": "fghtutyu"
}
],
"field_name": [
{
"value": "ytt"
}
]
}
]
to this format:
[
{
"nid":"17",
"uuid":"3614e0c8-88d4-4e8d-a732-5089698556d5",
"vid": "17",
"type":"resume_creator",
"langcode":"en",
"title":"uyi",
"uid":"1",
"status":"1",
"created":"1452060690",
"changed":"1452060709",
"promote":"1",
"sticky":"0",
"revision_timestamp":"1452060709",
"revision_uid":"1",
"revision_log": [],
"path":[],
"field_communication_address":"rtyrtytr\r\nuu;\r\nsdgfdh",
"field_education":"ytutyuii",
"field_emails":"gtf#fgfg.com",
"field_experiece":"fghtutyu",
"field_name":"ytt"
}
]
using PHP. Only then can I handle it in an AngularJS form. Thanks in advance.

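Try this. Decode the JSON into an array, then replace each field that holds a single item with that item's value or target_id: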
$json = '{
"nid": [
{
"value": "17"
}
],
"uuid": [
{
"value": "3614e0c8-88d4-4e8d-a732-5089698556d5"
}
],
"vid": [
{
"value": "17"
}
],
"type": [
{
"target_id": "resume_creator"
}
],
"langcode": [
{
"value": "en"
}
],
"title": [
{
"value": "uyi"
}
],
"uid": [
{
"target_id": "1"
}
],
"status": [
{
"value": "1"
}
],
"created": [
{
"value": "1452060690"
}
],
"changed": [
{
"value": "1452060709"
}
],
"promote": [
{
"value": "1"
}
],
"sticky": [
{
"value": "0"
}
],
"revision_timestamp": [
{
"value": "1452060709"
}
],
"revision_uid": [
{
"target_id": "1"
}
],
"revision_log": [],
"revision_translation_affected": [
{
"value": "1"
}
],
"default_langcode": [
{
"value": "1"
}
],
"path": [],
"field_communication_address": [
{
"value": "rtyrtytr\r\nuu;\r\nsdgfdh"
}
],
"field_education": [
{
"value": "ytutyuii"
}
],
"field_emails": [
{
"value": "gtf#fgfg.com"
}
],
"field_experiece": [
{
"value": "fghtutyu"
}
],
"field_name": [
{
"value": "ytt"
}
]
}';
$json = json_decode($json, true);
foreach ($json as $key => $value) {
    // Replace a single-item field with its "value" or "target_id".
    if (isset($value[0]['value'])) {
        $json[$key] = $value[0]['value'];
    } elseif (isset($value[0]['target_id'])) {
        $json[$key] = $value[0]['target_id'];
    }
}
$json = json_encode($json);
echo $json;
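The answer above flattens a single node, while the question's input and desired output are arrays of nodes. Here is a sketch that walks every node; `flattenNodes()` is a hypothetical helper, and it assumes that only fields holding exactly one single-key item should be flattened, leaving empty lists and multi-item fields as they are.

```php
<?php
// Hypothetical helper: flattens every node in a list of Drupal-style nodes.
function flattenNodes(string $json): string
{
    $nodes = json_decode($json, true);
    if (!is_array($nodes)) {
        throw new InvalidArgumentException('Invalid JSON input');
    }
    foreach ($nodes as $i => $node) {
        foreach ($node as $field => $items) {
            // Only flatten fields with exactly one item that has one key,
            // e.g. [{"value": "17"}] or [{"target_id": "1"}].
            if (is_array($items) && count($items) === 1
                && is_array($items[0] ?? null) && count($items[0]) === 1) {
                $nodes[$i][$field] = array_values($items[0])[0];
            }
        }
    }
    return json_encode($nodes);
}

$input = '[{"nid":[{"value":"17"}],"type":[{"target_id":"resume_creator"}],"path":[]}]';
echo flattenNodes($input), "\n"; // [{"nid":"17","type":"resume_creator","path":[]}]
```

Taking the only value regardless of its key avoids having to list `value` and `target_id` separately; a field whose item has several keys (for example a value plus a format) is left untouched rather than losing data.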

It is simple.
<?php
$arr = array('nid' => 17, 'uuid' => '3614e0c8-88d4-4e8d-a732-5089698556d5', ...);
echo json_encode($arr);
?>
If anything is unclear, just ask.

Elasticsearch. Nested query for nested in nested

My mapping is (part of it):
$index = [
"mappings" => [
"goods" => [
"dynamic_templates"=> [
[
"iattribute_id"=> [
"match_mapping_type"=> "string",
"match"=> "attribute_id",
"mapping"=> [
"type"=> "integer"
]
]
],
[
"iattribute_value"=> [
"match_mapping_type"=> "string",
"match"=> "attribute_value",
"mapping"=> [
"type"=> "string",
"index" => "not_analyzed"
]
]
]
],
"properties" => [
...
"individual_attributes" => [
"type" => "nested",
"properties" => [
"template_id" => ["type" => "integer"],
"attributes_set" => [
"type" => "nested",
"properties" => [
"attribute_id" => ["type" => "integer"],
"attribute_value" => ["type" => "string", "index" => "not_analyzed"]
]
]
]
]
...
]
]
]
];
How can I query attribute_id and attribute_value? They are nested inside another nested field, and I don't understand how to specify the path to these fields.
I've composed the following query, but it doesn't work:
GET /index/type/_search
{
"query" : {
"nested" : {
"path" : "individual_attributes.attributes_set",
"score_mode" : "none",
"filter": {
"bool": {
"must": [
{
"term" : {
"individual_attributes.attributes_set.attribute_id": "20"
}
},
{
"term" : {
"individual_attributes.attributes_set.attribute_value": "commodi"
}
}
]
}
}
}
}
}

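Since attributes_set is a nested field inside the nested field individual_attributes, wrap one nested query inside another: the outer query uses the path individual_attributes, and the inner one uses the full path individual_attributes.attributes_set. Try this: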
{
"query": {
"nested": {
"path": "individual_attributes",
"score_mode": "none",
"query": {
"nested": {
"path": "individual_attributes.attributes_set",
"query": {
"bool": {
"must": [
{
"term": {
"individual_attributes.attributes_set.attribute_id": "20"
}
},
{
"term": {
"individual_attributes.attributes_set.attribute_value": "commodi"
}
}
]
}
}
}
}
}
}
}
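From PHP, the nested-in-nested query can be built as an array like this. `buildAttributeQuery()` is a hypothetical helper; it uses `query` as the key inside each `nested` clause and the full dotted paths from the mapping in the question.

```php
<?php
// Hypothetical helper: outer nested query on individual_attributes,
// inner nested query on individual_attributes.attributes_set.
function buildAttributeQuery(int $attributeId, string $attributeValue): array
{
    $inner = 'individual_attributes.attributes_set';
    return [
        'query' => [
            'nested' => [
                'path'       => 'individual_attributes',
                'score_mode' => 'none',
                'query'      => [
                    'nested' => [
                        'path'  => $inner,
                        'query' => [
                            'bool' => [
                                'must' => [
                                    // Field names must use the full path.
                                    ['term' => [$inner . '.attribute_id' => $attributeId]],
                                    ['term' => [$inner . '.attribute_value' => $attributeValue]],
                                ],
                            ],
                        ],
                    ],
                ],
            ],
        ],
    ];
}

echo json_encode(buildAttributeQuery(20, 'commodi'), JSON_PRETTY_PRINT), "\n";
```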

We Keep Coding
