Elasticsearch range filter with year only - PHP

I need to filter my data by year only using Elasticsearch. I am using PHP to fetch and show the results. Here is my data in JSON format:
{
  "loc_cityname": "New York",
  "location_countryname": "US",
  "location_primary": "North America",
  "admitted_date": "1994-12-10"
},
{
  "loc_cityname": "New York",
  "location_countryname": "US",
  "location_primary": "North America",
  "admitted_date": "1995-12-10"
},
I am using the code below to filter the values by year:
$options = '{
  "query": {
    "range": {
      "admitted_date": {
        "gte": 1994,
        "lte": 2000
      }
    }
  },
  "aggs": {
    "citycount": {
      "cardinality": {
        "field": "loc_cityname",
        "precision_threshold": 100
      }
    }
  }
}';
How can I filter the results by year only? Can somebody please help me fix this?
Thanks in advance,

You simply need to add the format parameter to your range query like this:
$options = '{
  "query": {
    "range": {
      "admitted_date": {
        "gte": 1994,
        "lte": 2000,
        "format": "yyyy"          <--- add this line
      }
    }
  },
  "aggs": {
    "citycount": {
      "cardinality": {
        "field": "loc_cityname",
        "precision_threshold": 100
      }
    }
  }
}';
UPDATE
Note that the above solution only works for ES 1.5 and above. With previous versions of ES, you could use a script filter instead:
$options = '{
  "query": {
    "filtered": {
      "filter": {
        "script": {
          "script": "(min..max).contains(doc.admitted_date.date.year)",
          "params": {
            "min": 1994,
            "max": 2000
          }
        }
      }
    }
  },
  "aggs": {
    "citycount": {
      "cardinality": {
        "field": "loc_cityname",
        "precision_threshold": 100
      }
    }
  }
}';
In order to be able to run this script filter, you need to make sure that you have enabled scripting in elasticsearch.yml:
script.disable_dynamic: false
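For completeness, since the question mentions fetching the results with PHP: you can pass either of the $options JSON strings above directly as the request body of the official elasticsearch-php client. The snippet below is only a sketch under assumptions not present in the question (the client setup and an index named my_index); adjust it to your environment.

<?php
require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

// Hypothetical client setup; point it at your own cluster.
$client = ClientBuilder::create()->setHosts(['localhost:9200'])->build();

$response = $client->search([
    'index' => 'my_index',   // assumed index name
    'body'  => $options,     // the JSON query string built above
]);

// Number of distinct cities among documents admitted between 1994 and 2000,
// plus the matching documents themselves.
$cityCount = $response['aggregations']['citycount']['value'];
$hits      = $response['hits']['hits'];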

Related

Searching for exact phrase with synonyms

I am trying to build a query that uses exact phrase matching together with synonyms, and I can't figure it out. Also, when using the wildcard approach, I don't know how to use fuzziness. Is it even possible with wildcards? It would be great to get the same results for the terms "call of duty", "cod" or "call of dutz".
I have created this index:
PUT exact_search
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "0",
      "analysis": {
        "analyzer": {
          "analyzer_exact": {
            "type": "custom",
            "tokenizer": "keyword",
            "filter": [
              "lowercase",
              "icu_folding",
              "synonyms"
            ]
          }
        },
        "filter": {
          "synonyms": {
            "type": "synonym",
            "synonyms_path": "synonyms.txt"
          }
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword",
        "fields": {
          "analyzer_exact": {
            "type": "text",
            "analyzer": "analyzer_exact"
          }
        }
      }
    }
  }
}
And I fill it with these items:
POST exact_search/_doc/1
{
"name": "Hoodie Call of Duty"
}
POST exact_search/_doc/2
{
"name": "Call of Duty 2"
}
POST exact_search/_doc/3
{
"name": "Call of Duty: Modern Warfare 2"
}
POST exact_search/_doc/4
{
"name": "COD: Modern Warfare 2"
}
POST exact_search/_doc/5
{
"name": "Call of duty"
}
POST exact_search/_doc/6
{
"name": "Call of the sea"
}
POST exact_search/_doc/7
{
"name": "Heavy Duty"
}
synonyms.txt looks like this:
cod,call of duty
And what I am trying to achieve is to get all the results (except "Call of the sea" and "Heavy Duty") when I search for "call of duty" or "cod".
So far I have constructed this query, but it does not work as expected with the "cod" search term (the term "call of duty" works fine):
GET exact_search/_search
{
  "explain": false,
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "name.analyzer_exact": {
              "value": "*cod*"
            }
          }
        }
      ]
    }
  }
}
But the result is only two items:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "exact_search",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "name" : "COD: Modern Warfare 2"
        }
      },
      {
        "_index" : "exact_search",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "name" : "Call of duty"
        }
      }
    ]
  }
}
It looks like the synonyms are working, because it returns the "Call of duty" game, but it ignores the wildcards - it won't return "Call of Duty 2", for example.
I need to match the exact phrase, because I don't want to get results like "Heavy Duty" or "Call of the sea" (where the words "call" and "duty" match on their own).
Thank you for pointing me in the right direction.
I have my doubts that the analyzer_exact analyzer would generate the synonym tokens with "tokenizer": "keyword", since the keyword tokenizer emits the whole name as a single token, so the synonym rule cod,call of duty cannot match inside longer names.
I would change a few things to make it work.
First, change the tokenizer from keyword to standard:
"analyzer_exact": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"synonyms"
]
}
Then I would use a match_phrase query to eliminate names other than "call of duty" and "cod":
{
"match_phrase": {
"name.analyzer_exact": "cod"
}
}
Response after changes
{
  "hits": {
    "hits": [
      {
        "_source": {
          "name": "Call of duty"
        }
      },
      {
        "_source": {
          "name": "COD: Modern Warfare 2"
        }
      },
      {
        "_source": {
          "name": "Call of Duty 2"
        }
      },
      {
        "_source": {
          "name": "Hoodie Call of Duty"
        }
      },
      {
        "_source": {
          "name": "Call of Duty: Modern Warfare 2"
        }
      }
    ]
  }
}

Multiple search fields with elasticsearch-php

Hello, I want to do something like the search form in my screenshot (not reproduced here) with Elasticsearch.
I already have some knowledge of Elasticsearch, but I can't understand how to do this kind of multiple-field search.
You can use a combination of bool must/should clauses to combine multiple conditions:
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "tag"
          }
        },
        {
          "match": {
            "answers": 0
          }
        },
        {
          "match": {
            "user": 1234
          }
        },
        {
          "multi_match": {
            "query": "words here",
            "type": "phrase"
          }
        },
        {
          "match": {
            "score": 3
          }
        },
        {
          "match": {
            "isaccepted": "yes"
          }
        }
      ]
    }
  }
}
If you want to search across multiple fields, you can use the multi_match query. As the documentation notes: if no fields are provided, the multi_match query defaults to the index.query.default_field index setting, which in turn defaults to *. This extracts all fields in the mapping that are eligible for term queries and filters out the metadata fields; all extracted fields are then combined to build the query.
Adding a working example with index data, search query, and search result
Index Data:
{
  "answers": 0,
  "isaccepted": "no"
}
{
  "answers": 0,
  "isaccepted": "yes"
}
Search Query:
{
  "query": {
    "multi_match": {
      "query": "yes"
    }
  }
}
Search Result:
"hits": [
{
"_index": "67542669",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"answers": 0,
"isaccepted": "yes"
}
}
]
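Since the question is about elasticsearch-php, here is a rough sketch of how the bool/should query above could be sent with the official client. The client setup and the index name my_index are assumptions, not part of the question.

<?php
require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()->build(); // assumes a local node on :9200

$response = $client->search([
    'index' => 'my_index',   // assumed index name
    'body'  => [
        'query' => [
            'bool' => [
                'should' => [
                    ['multi_match' => ['query' => 'tag']],
                    ['match' => ['answers' => 0]],
                    ['match' => ['user' => 1234]],
                    ['multi_match' => ['query' => 'words here', 'type' => 'phrase']],
                    ['match' => ['score' => 3]],
                    ['match' => ['isaccepted' => 'yes']],
                ],
            ],
        ],
    ],
]);

foreach ($response['hits']['hits'] as $hit) {
    print_r($hit['_source']); // each hit's _source is the original document
}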

Elasticsearch: How to use multiple filters and calculations in aggregations?

I'm trying to do the following in Kibana.
I have an index of orders with these fields:
datetime1, datetime2 with format: yyyy-MM-dd HH:mm
First, I have to check that datetime1 exists.
Second, I have to compute the difference between these two dates: datetime2 - datetime1.
Finally, I have to put the result into different aggregation buckets depending on whether the difference is:
less than 24h
between 24 and 48h
48 - 72
....
What I tried :
GET orders/_search
{
  "size": 0,
  "aggs": {
    "test1": {
      "filters": {
        "filters": {
          "exist_datetime1": {
            "exists": {
              "field": "datetime1"
            }
          },
          "24_hours": {
            "script": {
              "script": {
                "source": "doc['datetime2'].value - doc['datetime1'].value < 24",
                "lang": "painless"
              }
            }
          }
        }
      }
    }
  }
}
How can I apply multiple filters and compute the difference between dates?
Thanks for your help :)
That's a good start; however, I think you need something slightly different. Here is an attempt at producing the ranges you need using the range aggregation powered by your script.
You need to make sure both date fields have values (the query part), and then you can define the buckets you need (< 24h, 24h-48h, etc.):
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "datetime1"
          }
        },
        {
          "exists": {
            "field": "datetime2"
          }
        }
      ]
    }
  },
  "aggs": {
    "ranges": {
      "range": {
        "script": {
          "lang": "painless",
          "source": "(doc['datetime2'].value.millis - doc['datetime1'].value.millis) / 3600000"
        },
        "ranges": [
          {
            "to": 24,
            "key": "< 24h"
          },
          {
            "from": 24,
            "to": 48,
            "key": "24h-48h"
          },
          {
            "from": 48,
            "key": "> 48h"
          }
        ]
      }
    }
  }
}
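This is primarily a Kibana question, but if you ever need the same breakdown from code (as in the other PHP questions on this page), reading the buckets out of the response is straightforward. A minimal sketch with the official elasticsearch-php client, assuming the client is already configured and $body holds the request body shown above:

<?php
use Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()->build(); // assumed local cluster

$response = $client->search([
    'index' => 'orders',
    'body'  => $body,   // the query + range aggregation from above
]);

// Each bucket carries the key defined above ("< 24h", "24h-48h", "> 48h") and a doc_count.
foreach ($response['aggregations']['ranges']['buckets'] as $bucket) {
    printf("%s: %d orders\n", $bucket['key'], $bucket['doc_count']);
}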

How to calculate average minimal times in Elasticsearch?

The Elasticsearch version in use is 5.4.2.
I'd like to build an Elasticsearch query that satisfies three conditions:
1. filter by championId
2. get the minimal time to buy each item per game
3. calculate the average minimal time to buy each item across all games.
I did 1 and 2, but I could not figure out 3. Is it possible to do 1 to 3 in a single query? Just in case it matters, I will use the result in Laravel 5.4, a PHP framework.
My data format is the following:
"_index": "timelines",
"_type": "timeline"
"_source": {
"gameId": 152735348,
"participantId": 3,
"championId": 35,
"role": "NONE",
"lane": "JUNGLE",
"win": 1,
"itemId": 1036,
"timestamp": 571200
}
My current Elasticsearch query is this
GET timelines/_search?size=0&pretty
{
  "query": {
    "bool": {
      "must": [
        { "match": { "championId": 22 }}
      ]
    }
  },
  "aggs": {
    "games": {
      "terms": {
        "field": "gameId"
      },
      "aggs": {
        "items": {
          "terms": {
            "field": "itemId",
            "order": { "min_buying_time": "asc" }
          },
          "aggs": {
            "min_buying_time": {
              "min": {
                "field": "timestamp"
              }
            }
          }
        }
      }
    }
  }
}
As @Sönke Liebau said, pipeline aggregations are the key, but if you want the average minimal time across all games per item, you should first aggregate by itemId. The following code should help:
POST misko/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "championId": 22 }}
      ]
    }
  },
  "aggs": {
    "items": {
      "terms": {
        "field": "itemId"
      },
      "aggs": {
        "games": {
          "terms": {
            "field": "gameId"
          },
          "aggs": {
            "min_buying_time": {
              "min": {
                "field": "timestamp"
              }
            }
          }
        },
        "avg_min_time": {
          "avg_bucket": {
            "buckets_path": "games>min_buying_time"
          }
        }
      }
    }
  }
}
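Since you mentioned consuming the result from Laravel (PHP), this is roughly how the per-item averages could be read out of the response with the official elasticsearch-php client. It is only a sketch: the client setup is assumed, and $query stands for the request body shown above.

<?php
use Elasticsearch\ClientBuilder;

$client = ClientBuilder::create()->build(); // assumed client configuration

$response = $client->search([
    'index' => 'misko',   // index name taken from the answer above
    'body'  => $query,    // the items > games > min_buying_time request
]);

// One bucket per itemId; avg_min_time holds the average of the per-game minima.
foreach ($response['aggregations']['items']['buckets'] as $itemBucket) {
    $itemId     = $itemBucket['key'];
    $avgMinTime = $itemBucket['avg_min_time']['value'];
    echo "item {$itemId}: average minimal buy time = {$avgMinTime}\n";
}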
If I understand your objective correctly, you should be able to solve this with pipeline aggregations. More specifically for your use case, the Avg Bucket aggregation should be helpful; check out the example in the documentation, which should be very close to what you need.
Something like:
"avg_min_buying_time": {
"avg_bucket": {
"buckets_path": "games>min_buying_time"
}
}

Elasticsearch - Distinct Values, Not Counts

I am trying to do something similar to this SQL query:
SELECT * FROM table WHERE fileContent LIKE '%keyword%' AND company_id = '1' GROUP BY email
Having read similar posts, I have come up with this:
{
  "query": {
    "bool": {
      "must": [{
        "match": {
          "fileContent": {
            "query": "keyword"
          }
        }
      }],
      "filter": [{
        "terms": {
          "company_id": [1]
        }
      }]
    }
  },
  "aggs": {
    "group_by_email": {
      "terms": {
        "field": "email",
        "size": 1000
      }
    }
  },
  "size": 0
}
Field mappings are:
{
  "cvs" : {
    "mappings" : {
      "application" : {
        "_meta" : {
          "model" : "Acme\\AppBundle\\Entity\\Application"
        },
        "dynamic_date_formats" : [ ],
        "properties" : {
          "email" : {
            "type" : "keyword"
          },
          "fileContent" : {
            "type" : "text"
          },
          "company_id" : {
            "type" : "text"
          }
        }
      }
    }
  }
}
... which are generated from Symfony config.yml:
fos_elastica:
    clients:
        default:
            host: "%elastica.host%"
            port: "%elastica.port%"
    indexes:
        cvs:
            client: default
            types:
                application:
                    properties:
                        fileContent: ~
                        email:
                            index: not_analyzed
                        company_id: ~
                    persistence:
                        driver: orm
                        model: Acme\AppBundle\Entity\Application
                        provider: ~
                        finder: ~
The filter works fine, but I am finding that hits:hits returns no items (or all results matching the search if I remove size:0) and aggregations:group_by_email:buckets has a count of the groups but not the records themselves. The records that were grouped aren't returned and it's these that I need.
I have also tried with FOSElasticBundle using the query builder if this is your preferred flavour (this works but doesn't have the grouping/aggregation):
$boolQuery = new \Elastica\Query\BoolQuery();
$filterKeywords = new \Elastica\Query\Match();
$filterKeywords->setFieldQuery('fileContent', 'keyword');
$boolQuery->addMust($filterKeywords);
$filterUser = new \Elastica\Query\Terms();
$filterUser->setTerms('company_id', array('1'));
$boolQuery->addFilter($filterUser);
$finder = $this->get('fos_elastica.finder.cvs.application');
Thanks.
For this you need a top_hits aggregation inside the terms one you are already using:
"aggs": {
"group_by_email": {
"terms": {
"field": "email",
"size": 1000
},
"aggs": {
"sample_docs": {
"top_hits": {
"size": 100
}
}
}
}
}
top_hits:{size:1} appears to be what I need, having played around with Andrei's answer. This will return one record for each bucket in the aggregation:
"aggs": {
"group_by_email": {
"terms": {
"field": "email",
"size": 1000
},
"aggs": {
"sample_docs": {
"top_hits": {
"size": 1
}
}
}
}
}
Ref: top_hits
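If you want to stay with the Elastica/FOSElasticBundle query builder from the question, the same terms + top_hits aggregation can be attached to the existing $boolQuery roughly as follows. This is a sketch only; the exact class names and the service id depend on your Elastica/FOSElasticBundle versions.

<?php
// Continuing from the $boolQuery built in the question.
$query = new \Elastica\Query($boolQuery);
$query->setSize(0);

$termsAgg = new \Elastica\Aggregation\Terms('group_by_email');
$termsAgg->setField('email');
$termsAgg->setSize(1000);

$topHits = new \Elastica\Aggregation\TopHits('sample_docs');
$topHits->setSize(1); // one representative record per email bucket

$termsAgg->addAggregation($topHits);
$query->addAggregation($termsAgg);

// Search via the index/type object (not the finder) so the aggregations are returned.
$resultSet   = $this->get('fos_elastica.index.cvs.application')->search($query); // service id is an assumption
$aggregation = $resultSet->getAggregation('group_by_email');

foreach ($aggregation['buckets'] as $bucket) {
    $record = $bucket['sample_docs']['hits']['hits'][0]['_source'];
    // $record is the grouped document for this email address.
}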
top_hits helped me too. I had some trouble at first, but eventually figured out how to resolve it, so here is my solution:
{
  "query": {
    "nested": {
      "path": "placedOrders",
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "placedOrders.ownerId": "0a9fdef0-4508-4f9c-aa8c-b3984e39ad1e"
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "custom_name1": {
      "nested": {
        "path": "placedOrders"
      },
      "aggs": {
        "custom_name2": {
          "terms": {
            "field": "placedOrders.propertyId"
          },
          "aggs": {
            "custom_name3": {
              "top_hits": {
                "size": 1,
                "sort": [
                  {
                    "placedOrders.propertyId": {
                      "order": "desc"
                    }
                  }
                ]
              }
            }
          }
        }
      }
    }
  }
}
