Elasticsearch OR query for comma-separated values - PHP

I am saving IDs in the database as comma-separated values and indexing the same values into Elasticsearch. Now I need to retrieve the rows where a given user_id matches one of those values.
For example, the user_ids column is indexed like this (the database type is varchar(500); in Elasticsearch it is text):
8938,8936,8937
$userId = 8936; // For example expecting to return that row
$whereCondition = [];
$whereCondition[] = [
"query_string" => [
"query"=> $userId,
"default_field" => "user_ids",
"default_operator" => "OR"
]
];
$searchParams = [
'query' => [
'bool' => [
'must' => [
$whereCondition
],
'must_not' => [
['exists' => ['field' => 'deleted_at']]
]
]
],
"size" => 10000
];
User::search($searchParams);
JSON query:
{
"query": {
"bool": {
"must": [
[{
"query_string": {
"query": 8936,
"default_field": "user_ids",
"default_operator": "OR"
}
}]
],
"must_not": [
[{
"exists": {
"field": "deleted_at"
}
}]
]
}
},
"size": 10000
}
Mapping details
{
"user_details_index": {
"aliases": {},
"mappings": {
"test_type": {
"properties": {
"created_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"deleted_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"updated_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"user_ids": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
},
"settings": {
"index": {
"creation_date": "1546404165500",
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "krpph26NTv2ykt6xE05klQ",
"version": {
"created": "6020299"
},
"provided_name": "user_details_index"
}
}
}
}
I am trying the above logic, but I am unable to retrieve the row. Can someone help with this?

Since the field user_ids is of type text and no analyzer is specified for it, it uses the standard analyzer by default. The standard analyzer keeps digits joined by commas together as one token, so it won't break 8938,8936,8937 into the terms 8938, 8936 and 8937, and hence the id can't match.
To solve this, I would suggest storing an array of ids in the user_ids field instead of CSV. So while indexing, your JSON input should look as below:
{
...
"user_ids": [
8938,
8936,
8937
]
...
}
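Converting the stored comma-separated value into such an array can be done in PHP while building the document. A minimal sketch, assuming the value arrives as a plain string such as 8938,8936,8937; csvToUserIds is a hypothetical helper name, not part of any Elasticsearch client:

```php
<?php
// Hypothetical helper: converts a stored CSV string such as "8938,8936,8937"
// into an integer array suitable for an integer-mapped user_ids field.
function csvToUserIds(string $csv): array
{
    // Split on commas and strip surrounding whitespace from each part.
    $parts = array_map('trim', explode(',', $csv));
    // Drop empty parts produced by ",," or a trailing comma.
    $parts = array_filter($parts, function ($part) {
        return $part !== '';
    });
    // Cast to int and reindex so json_encode emits a JSON array, not an object.
    return array_values(array_map('intval', $parts));
}

$document = ['user_ids' => csvToUserIds('8938,8936,8937')];
echo json_encode($document), PHP_EOL; // {"user_ids":[8938,8936,8937]}
```

The array_values call matters: array_filter keeps the original keys, and a PHP array with gaps in its keys would be encoded as a JSON object instead of a list.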
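Since user ids are integer values, the following change should be made in the mapping (the type of an existing field cannot be changed in place, so this requires creating a new index and reindexing):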
{
"user_ids": {
"type": "integer"
}
}
The query will now be as follows:
{
"query": {
"bool": {
"filter": [
{
"terms": {
"user_ids": [
8936
]
}
}
],
"must_not": [
{
"exists": {
"field": "deleted_at"
}
}
]
}
},
"size": 10000
}
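If you build the request in PHP as in the question, the body can be produced by a small function. This is a sketch assuming user_ids has been remapped to integer and the documents reindexed with arrays; buildUserIdsQuery is a hypothetical helper, and the returned array is what you would pass where the question currently passes $searchParams. A terms query on an array field matches a document when any element of the array equals one of the given terms.

```php
<?php
// Hypothetical helper: builds the search body for a user_ids terms filter.
function buildUserIdsQuery(int $userId): array
{
    return [
        'query' => [
            'bool' => [
                // A plain list of clauses; wrapping it in a second array
                // would produce [[{...}]] in the JSON.
                'filter' => [
                    ['terms' => ['user_ids' => [$userId]]],
                ],
                'must_not' => [
                    ['exists' => ['field' => 'deleted_at']],
                ],
            ],
        ],
        'size' => 10000,
    ];
}

echo json_encode(buildUserIdsQuery(8936), JSON_PRETTY_PRINT), PHP_EOL;
```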


Giving priority to prefix match in Elasticsearch in PHP

Is there a way in Elasticsearch to give a prefix match higher priority than a string that merely contains the word?
For example, if I search for ram, the results should be prioritized like this:
Ram Reddy
Joy Ram Das
Kiran Ram Goel
Swati Ram Goel
Ramesh Singh
I have tried the mapping given here.
I have done it like this:
$params = [
"index" => $myIndex,
"body" => [
"settings"=> [
"analysis"=> [
"analyzer"=> [
"start_with_analyzer"=> [
"tokenizer"=> "my_edge_ngram",
"filter"=> [
"lowercase"
]
]
],
"tokenizer"=> [
"my_edge_ngram"=> [
"type"=> "edge_ngram",
"min_gram"=> 3,
"max_gram"=> 15
]
]
]
],
"mappings"=> [
"doc"=> [
"properties"=> [
"label"=> [
"type"=> "text",
"fields"=> [
"keyword"=> [
"type"=> "keyword"
],
"ngramed"=> [
"type"=> "text",
"analyzer"=> "start_with_analyzer"
]
]
]
]
]
]
]
];
$response = $client->indices()->create($params); // create an index
and I search like this:
$body = [
"size" => 100,
'_source' => $select,
"query"=> [
"bool"=> [
"should"=> [
[
"query_string"=> [
"query"=> "ram*",
"fields"=> [
"value"
],
"boost"=> 5
]
],
[
"query_string"=> [
"query"=> "ram*",
"fields"=> [
"value.ngramed"
],
"analyzer"=> "start_with_analyzer",
"boost"=> 2
]
]
],
"minimum_should_match"=> 1
]
]
];
$params = [
'index' => $myIndex,
'type' => $myType,
'body' => []
];
$params['body'] = $body;
$response = $client->search($params);
The JSON of the query is as follows:
{
"size": 100,
"_source": [
"label",
"value",
"type",
"sr"
],
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "ram*",
"fields": [
"value"
],
"boost": 5
}
},
{
"query_string": {
"query": "ram*",
"fields": [
"value.ngramed"
],
"analyzer": "start_with_analyzer",
"boost": 2
}
}
],
"minimum_should_match": 1,
"must_not": {
"match_phrase": {
"type": "propertyValue"
}
}
}
}
}
I am using Elasticsearch 5.3.2.
Is there any other way to sort the results for the search in the relational database using the search method in php?
You should not enable fielddata unless it is really required. Instead, you can sort on a keyword sub-field.
Make the following changes to your code:
"label"=>[
"type"=>"text",
//"fielddata"=> true, ---->remove/comment this line
"analyzer"=>"whitespace",
"fields"=>[
"keyword"=>[
"type"=>"keyword"
]
]
]
To sort on the type field, use type.keyword instead. This applies to any field of type text that has a sub-field of type keyword (assuming the sub-field is named keyword). So change the sort as below:
'sort' => [
["type.keyword"=>["order"=>"asc"]],
["sr"=>["order"=>"asc"]],
["propLabels"=>["order"=>"asc"]],
["value"=>["order"=>"asc"]]
]
Update: index creation and query to get the desired output
Create the index as below:
{
"settings": {
"analysis": {
"analyzer": {
"start_with_analyzer": {
"tokenizer": "my_edge_ngram",
"filter": [
"lowercase"
]
}
},
"tokenizer": {
"my_edge_ngram": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 15
}
}
}
},
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
},
"ngramed": {
"type": "text",
"analyzer": "start_with_analyzer"
}
}
}
}
}
}
}
Use the query below to get the desired result:
{
"query": {
"bool": {
"should": [
{
"query_string": {
"query": "Ram",
"fields": [
"name"
],
"boost": 5
}
},
{
"query_string": {
"query": "Ram",
"fields": [
"name.ngramed"
],
"analyzer": "start_with_analyzer",
"boost": 2
}
}
],
"minimum_should_match": 1
}
}
}
In the above, the query with boost 5 increases the score of documents where Ram is present in name. The other query, with boost 2, further increases the score of documents where name starts with Ram.
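To see why the second clause only boosts names that start with Ram, here is a rough PHP emulation of start_with_analyzer. It is an approximation, assuming name.ngramed is analyzed with start_with_analyzer at index time (as the ngramed sub-field in the question is): an edge_ngram tokenizer with min_gram 3 and max_gram 15 and no token_chars, so the whole value including spaces is treated as one stream, followed by the lowercase filter. edgeNgrams is a hypothetical helper and handles ASCII only.

```php
<?php
// Rough, ASCII-only emulation of start_with_analyzer (not an Elasticsearch API):
// every prefix of the whole string between $min and $max characters, lowercased.
function edgeNgrams(string $text, int $min = 3, int $max = 15): array
{
    $grams = [];
    $len = strlen($text);
    for ($n = $min; $n <= min($max, $len); $n++) {
        $grams[] = strtolower(substr($text, 0, $n));
    }
    return $grams;
}

// The search text "Ram" also goes through start_with_analyzer, yielding only "ram".
foreach (['Ram Reddy', 'Joy Ram Das', 'Ramesh Singh'] as $name) {
    $hit = in_array('ram', edgeNgrams($name), true) ? 'matches' : 'no match';
    echo "$name: name.ngramed $hit", PHP_EOL;
}
```

Because no token_chars are set, "Joy Ram Das" only produces prefixes of the whole string ("joy", "joy ", "joy r", ...), so it gets no boost from the ngramed clause, while "Ramesh Singh" does even though the plain name clause does not match it.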
Sample output:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 2.0137746,
"_source": {
"name": "Ram Reddy"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 1.4384104,
"_source": {
"name": "Joy Ram Das"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "3",
"_score": 0.5753642,
"_source": {
"name": "Ramesh Singh"
}
}
]

Sort parent user list by inner hits elasticsearch

I have 200K users in Elasticsearch and each user has their own inbox. Now suppose there are three users: A, B and C. Users A and C send messages to user B. So when user B fetches the user list from Elasticsearch, users A and C should be at the top of the list, because A and C most recently sent messages to user B. I wrote the Elasticsearch query given below:
{
"_source": [
"db_id",
"username",
"message_privacy"
],
"from": "0",
"size": "40",
"sort": [{"messages_received.created_at" : "desc"}],
"query": {
"bool": {
"must": [
{
"term":{
"type":"user"
}
},
{
"has_child": {
"type": "messages_received",
"inner_hits": {
"sort": [
{
"created_at": "desc"
}
],
"size": 1,
"_source": [
"id",
"user_id",
"object_id",
"created_at"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"object_id": "u-5"
}
}
]
}
}
}
}
]
}
}
}
But when I run the query, it gives me this error:
{ "error": {
"root_cause": [
{
"type": "query_shard_exception",
"reason": "No mapping found for [messages_received.created_at] in order to sort on",
"index_uuid": "5jsM1khYRrC0cjWbRjsx5A",
"index": "trending"
}
],
I searched for this problem on Google but found no useful solution for my scenario.
Mapping
{
"type": {
"type": "join",
"eager_global_ordinals": true,
"relations": {
"post": [
"comments",
"place",
"media",
"views",
"likes",
"post_box"
],
"box": "posts",
"user": [
"user_views",
"user_likes",
"followers",
"post",
"blocked",
"followings",
"box",
"block",
"notifications",
"messages_received",
"messages_sent"
],
"posts": "posts_views"
}
}}

Number format exception for string type

I have a mapping like this
{
"settings": {
"analysis": {
"filter": {
"nGramFilter": {
"type": "nGram",
"min_gram": 3,
"max_gram": 20,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
},
"email" : {
"type" : "pattern_capture",
"preserve_original" : 1,
"patterns" : [
"([^#]+)",
"(\\p{L}+)",
"(\\d+)",
"#(.+)"
]
},
"number" : {
"type" : "pattern_capture",
"preserve_original" : 1,
"patterns" : [
"([^+-]+)",
"(\\d+)"
]
},
"edgeNGramFilter": {
"type": "nGram",
"min_gram": 1,
"max_gram": 10,
"token_chars": [
"letter",
"digit",
"punctuation",
"symbol"
]
}
},
"analyzer": {
"nGramAnalyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"nGramFilter"
]
},
"whitespaceAnalyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase"
]
},
"email" : {
"tokenizer" : "uax_url_email",
"filter" : [
"email",
"lowercase",
"unique"
]
},
"number" : {
"tokenizer" : "whitespace",
"filter" : [ "number", "unique" ]
},
"edgeNGramAnalyzer": {
"type": "custom",
"tokenizer": "whitespace",
"filter": [
"lowercase",
"edgeNGramFilter"
]
}
}
}
},
"users": {
"mappings": {
"user_profiles": {
"properties": {
"firstName": {
"type": "string",
"analyzer": "nGramAnalyzer",
"search_analyzer": "whitespaceAnalyzer"
},
"lastName": {
"type": "string",
"analyzer": "nGramAnalyzer",
"search_analyzer": "whitespaceAnalyzer"
},
"email": {
"type": "string",
"analyzer": "email",
"search_analyzer": "whitespaceAnalyzer"
},
"score" : {
"type": "string"
},
"homeLandline": {
"type": "string",
"analyzer": "number",
"search_analyzer": "whitespaceAnalyzer"
},
"dob": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"mobile": {
"type": "integer"
},
"residenceCity": {
"type": "string",
"analyzer": "edgeNGramAnalyzer",
"search_analyzer": "whitespaceAnalyzer"
},
"created_at": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
}
}
}
}
}
I can get the score as an integer as well as "NA", so I mapped the type as string, but while posting data to the index I am getting a NumberFormatException.
For example:
if I post an integer first, followed by "NA", I get this exception.
In my log file I see this error:
[2016-08-29 15:19:01] elasticlog.WARNING: Response ["{\"error\":{\"root_cause\":[{\"type\":\"mapper_parsing_exception\",\"reason\":\"failed to parse [score]\"}],\"type\":\"mapper_parsing_exception\",\"reason\":\"failed to parse [score]\",\"caused_by\":{\"type\":\"number_format_exception\",\"reason\":\"For input string: \"NH\"\"}},\"status\":400}"] []
Your mapping is incorrect. It should be as follows, assuming users is the index name and user_profiles is the type:
{
"users": {
"mappings": {
"user_profiles": {
"properties": {
"score": {
"type": "string"
}
}
}
}
}
}
You are missing mappings before user_profiles.
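When the index is created from PHP, the index name goes in the request parameters and mappings sits directly inside the body; it is not nested under a "users" key. A sketch of create-index params for the elasticsearch-php client, assuming Elasticsearch 2.x (the string type; 5.x and later would use text or keyword), the index name users and the type user_profiles; buildUsersIndexParams is a hypothetical helper and the question's analysis settings are omitted for brevity. If the mapping is not applied, dynamic mapping maps a first numeric score as long, and a later "NA" then fails with number_format_exception. An existing field's type cannot be changed in place, so the old index has to be deleted or replaced first.

```php
<?php
// Hypothetical helper: create-index params in the shape used by
// $client->indices()->create($params) in elasticsearch-php (pre-7.x types).
function buildUsersIndexParams(): array
{
    return [
        'index' => 'users',
        'body'  => [
            // The question's "settings" => ["analysis" => ...] would sit here, beside "mappings".
            'mappings' => [
                'user_profiles' => [
                    'properties' => [
                        // Explicit string mapping: both 42 and "NA" are accepted.
                        'score' => ['type' => 'string'],
                    ],
                ],
            ],
        ],
    ];
}

$params = buildUsersIndexParams();
// $client->indices()->create($params); // needs a client and a running cluster
echo json_encode($params['body']), PHP_EOL;
```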

PHP and JSON. How to get a specific element? //Multidimensional Array

My JSON file
{
"shopId": 29,
"last": 46977914,
"freshfood": [
{
"freshfood_id": 2629,
"food": [
{
"food_id": 1740851,
"type": "fruit",
"status": 1
},
{
"food_id": 1730905,
"type": "vegetable",
"status": 1
},
]
}
]
}
I need to get the second food_id (1730905).
I tried this, but it does not work:
$string = file_get_contents("food.json");
$json_a=json_decode($string,true);
echo $GetFreshFoodId = $json_a['freshfood'][1]['freshfood_id'];
$json_a['freshfood']['food'][1]['food_id'];
It's a syntax error in your JSON file.
Your last entry in "food": [ ... ] has a trailing comma.
That's the reason why you get NULL when you run json_decode.
{
"shopId": 29,
"last": 46977914,
"freshfood": [
{
"freshfood_id": 2629,
"food": [
{
"food_id": 1740851,
"type": "fruit",
"status": 1
},
{
"food_id": 1730905,
"type": "vegetable",
"status": 1
}
]
}
]
}
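Once the JSON parses, the second food_id is reached through index 0 of "freshfood" (it holds a single object) and index 1 of "food" inside that object. A sketch, assuming the valid JSON is inlined as a string here instead of being read with file_get_contents("food.json"):

```php
<?php
// The JSON without the trailing comma, inlined for a self-contained example.
$string = '{
  "shopId": 29,
  "last": 46977914,
  "freshfood": [
    {
      "freshfood_id": 2629,
      "food": [
        {"food_id": 1740851, "type": "fruit", "status": 1},
        {"food_id": 1730905, "type": "vegetable", "status": 1}
      ]
    }
  ]
}';

$json_a = json_decode($string, true);
if ($json_a === null) {
    // A trailing comma (or any other syntax error) ends up here.
    die('Invalid JSON: ' . json_last_error_msg() . PHP_EOL);
}

// "freshfood" is a list with one element, so its index is 0;
// "food" lives inside that element.
$freshFoodId = $json_a['freshfood'][0]['freshfood_id'];
$foodId = $json_a['freshfood'][0]['food'][1]['food_id'];
echo $foodId, PHP_EOL; // 1730905
```

Checking the return value of json_decode (or json_last_error_msg) is what reveals the syntax error; otherwise the failure only shows up later as a NULL lookup.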

ElasticSearch match combination in array

I'm implementing Elasticsearch in my Laravel application using the PHP package from Elasticsearch.
My application is a small jobboard and currently my job document is looking like this:
{
"_index":"jobs",
"_type":"job",
"_id":"19",
"_score":1,
"_source":{
"0":"",
"name":"Programmer",
"description":"This is my first job! :)",
"text":"Programming is awesome",
"networks":[
{
"id":1,
"status":"PRODUCTION",
"start":"2015-02-26",
"end":"2015-02-26"
},
{
"id":2,
"status":"PAUSE",
"start":"2015-02-26",
"end":"2015-02-26"
}
]
}
}
As you can see a job can be attached to multiple networks. In my search query I would like to include WHERE network.id == 1 AND network.status == PRODUCTION.
My current query looks like this; however, it returns documents that have a network with id 1 as long as they have any network with status PRODUCTION. Is there any way I can enforce both conditions to be true within one network?
$query = [
'index' => $this->index,
'type' => $this->type,
'body' => [
'query' => [
'bool' => [
'must' => [
['networks.id' => 1],
['networks.status' => 'PRODUCTION'],
],
'should' => [
['match' => ['name' => $query]],
['match' => ['text' => $query]],
['match' => ['description' => $query]],
],
],
],
],
];
You need to specify that the objects in the networks array should be stored as individual objects in the index. By default an array of objects is flattened, so the id and status values of different networks lose their association; storing them individually allows you to search within each network object. You can do so using the nested type in Elasticsearch.
Also, if you are doing exact matches, it is better to use a filter rather than a query, as filters can be cached and skip scoring, which generally gives better performance than a query.
Create your index with a new mapping. Use the nested type for the networks array.
POST /test
{
"mappings": {
"job": {
"properties": {
"networks": {
"type": "nested",
"properties": {
"status": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
}
Add a document:
POST /test/job/1
{
"0": "",
"name": "Programmer",
"description": "This is my first job! :)",
"text": "Programming is awesome",
"networks": [
{
"id": 1,
"status": "PRODUCTION",
"start": "2015-02-26",
"end": "2015-02-26"
},
{
"id": 2,
"status": "PAUSE",
"start": "2015-02-26",
"end": "2015-02-26"
}
]
}
As you have a nested type you will need to use a nested filter.
POST /test/job/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "networks",
"filter": {
"bool": {
"must": [
{
"term": {
"networks.id": "1"
}
},
{
"term": {
"networks.status.raw": "PRODUCTION"
}
}
]
}
}
}
}
}
}
}
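Since the question builds its request in PHP, the same nested filter can be combined with the original full-text should clauses in one params array. This is a sketch, assuming Elasticsearch 2.x or later (the filtered query was deprecated in 2.0 in favour of bool with a filter clause, and removed in 5.0), pre-7.x mapping types as in the question, and the networks.status.raw sub-field from the mapping above; buildJobSearch is a hypothetical helper.

```php
<?php
// Hypothetical helper: elasticsearch-php search params with a nested filter
// so that id and status must match inside the same network object.
function buildJobSearch(string $index, string $type, string $text): array
{
    return [
        'index' => $index,
        'type'  => $type,
        'body'  => [
            'query' => [
                'bool' => [
                    'filter' => [
                        [
                            'nested' => [
                                'path'  => 'networks',
                                // Both terms are evaluated against one network at a time.
                                'query' => [
                                    'bool' => [
                                        'must' => [
                                            ['term' => ['networks.id' => 1]],
                                            ['term' => ['networks.status.raw' => 'PRODUCTION']],
                                        ],
                                    ],
                                ],
                            ],
                        ],
                    ],
                    // Optional here (a filter clause is present): they only affect scoring.
                    'should' => [
                        ['match' => ['name' => $text]],
                        ['match' => ['text' => $text]],
                        ['match' => ['description' => $text]],
                    ],
                ],
            ],
        ],
    ];
}

// $response = $client->search(buildJobSearch('jobs', 'job', 'programmer'));
echo json_encode(buildJobSearch('jobs', 'job', 'programmer')['body']), PHP_EOL;
```

As in the question's original query, the should clauses only raise the score; add 'minimum_should_match' => 1 to the bool if at least one of them must match.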
