Match closest number value with ElasticsearchDSL (PHP)

I'm having trouble finding an answer on SO, in the Elasticsearch docs, or on Google for my use case:
Find the closest number to input X that is still lower than X.
I have a mapping that looks like this:
{
"rule": {
"properties": {
"price": { "type": "long" },
"from": { "type": "long" }
}
}
}
What I need is the rule with the closest matching from that is less than the input value.
So for example I have these rules:
[
{ "rule": {"from": 1, "price": 5} },
{ "rule": {"from": 50, "price": 4} },
{ "rule": {"from": 100, "price": 3} },
{ "rule": {"from": 150, "price": 2} }
]
If I search for from with a value of 75, I'd want the rule with "from": 50.
Most of the answers I found were related to geo/IP or text, and I could not find an example that made it click for me.

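A range query can be used to get all rules whose from is less than or equal to the input value (use lt instead of lte if an exact match should be excluded). Sorting those rules by from in descending order and returning only the top document (size 1) gives the closest lower rule.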
Query:
{
"query": {
"range": {
"rule.from": {
"lte": 75
}
}
},
"size": 1,
"sort": [
{
"rule.from": {
"order": "desc"
}
}
]
}
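To make the logic concrete, here is a small Python sketch. build_closest_lower_query is a hypothetical helper that returns the same request body as a dict, and closest_lower mimics on in-memory data what Elasticsearch does with that body: keep rules with from <= X, sort descending, return the first. It is an illustration under those assumptions, not client code for any particular library.

```python
from typing import List, Optional


def build_closest_lower_query(value: int, field: str = "rule.from") -> dict:
    """Hypothetical helper: the range + sort + size 1 request body from the answer."""
    return {
        "query": {"range": {field: {"lte": value}}},
        "size": 1,
        "sort": [{field: {"order": "desc"}}],
    }


def closest_lower(rules: List[dict], value: int) -> Optional[dict]:
    """In-memory equivalent: keep rules with from <= value, return the one with the largest from."""
    candidates = [r for r in rules if r["from"] <= value]
    if not candidates:
        return None  # Elasticsearch would return zero hits here
    return max(candidates, key=lambda r: r["from"])


rules = [
    {"from": 1, "price": 5},
    {"from": 50, "price": 4},
    {"from": 100, "price": 3},
    {"from": 150, "price": 2},
]

print(closest_lower(rules, 75))  # {'from': 50, 'price': 4}
```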

Related

Elasticsearch : How to use multiple filter and calculation in aggregations?

I'm trying to implement this in Kibana.
I have an index of orders with these fields:
datetime1 and datetime2, with format yyyy-MM-dd HH:mm
First, I have to check whether datetime1 exists.
Second, I have to compute the difference between the two dates (datetime2 - datetime1).
Finally, I have to put each result into a different bucket depending on whether the difference is:
less than 24h
between 24h and 48h
between 48h and 72h
....
What I tried:
GET orders/_search
{
"size": 0,
"aggs": {
"test1": {
"filters": {
"filters": {
"exist_datetime1": {
"exists": {
"field": "datetime1"
}
},
"24_hours": {
"script": {
"script": {
"source": "doc['datetime2'].value - doc['datetime1'].value < 24",
"lang": "painless"
}
}
}
}
}
}
}
}
How can I apply multiple filters and subtract one date from another?
Thanks for your help :)
That's a good start; however, I think you need something slightly different. Here is an attempt at providing the ranges you need, using a range aggregation powered by a script.
You need to make sure both date fields have values (the query part), and then you can define the buckets you need (< 24h, 24h - 48h, etc.).
{
"size": 0,
"query": {
"bool": {
"filter": [
{
"exists": {
"field": "datetime1"
}
},
{
"exists": {
"field": "datetime2"
}
}
]
}
},
"aggs": {
"ranges": {
"range": {
"script": {
"lang": "painless",
"source": "(doc['datetime2'].value.millis - doc['datetime1'].value.millis) / 3600000"
},
"ranges": [
{
"to": 24,
"key": "< 24h"
},
{
"from": 24,
"to": 48,
"key": "24h-48h"
},
{
"from": 48,
"key": "> 48h"
}
]
}
}
}
}
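In a range aggregation, from is inclusive and to is exclusive, and a document is counted in every range it falls into. The Python sketch below reproduces that bucketing locally so the boundaries can be checked before running the query. The sample orders are invented, and dates are parsed with the yyyy-MM-dd HH:mm format from the question.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

FMT = "%Y-%m-%d %H:%M"  # Python equivalent of yyyy-MM-dd HH:mm

# Same buckets as the aggregation: (key, from, to); None means unbounded.
RANGES: List[Tuple[str, Optional[int], Optional[int]]] = [
    ("< 24h", None, 24),
    ("24h-48h", 24, 48),
    ("> 48h", 48, None),
]


def hours_between(datetime1: str, datetime2: str) -> int:
    """Mirror the script: (datetime2 - datetime1) in millis, divided by 3600000."""
    d1 = datetime.strptime(datetime1, FMT)
    d2 = datetime.strptime(datetime2, FMT)
    millis = (d2 - d1) // timedelta(milliseconds=1)
    # Painless long division truncates toward zero; Python's // floors, so handle the sign.
    hours = abs(millis) // 3_600_000
    return hours if millis >= 0 else -hours


def bucket_counts(orders: List[dict]) -> dict:
    counts = {key: 0 for key, _, _ in RANGES}
    for order in orders:
        # The bool/exists filter: skip documents missing either date.
        if order.get("datetime1") is None or order.get("datetime2") is None:
            continue
        hours = hours_between(order["datetime1"], order["datetime2"])
        for key, lo, hi in RANGES:
            # from is inclusive, to is exclusive; a value lands in every matching range.
            if (lo is None or hours >= lo) and (hi is None or hours < hi):
                counts[key] += 1
    return counts


orders = [
    {"datetime1": "2019-01-01 08:00", "datetime2": "2019-01-01 20:00"},  # 12h
    {"datetime1": "2019-01-01 08:00", "datetime2": "2019-01-02 08:00"},  # exactly 24h
    {"datetime1": "2019-01-01 08:00", "datetime2": "2019-01-04 09:30"},  # 73.5h -> 73
    {"datetime2": "2019-01-02 10:00"},  # no datetime1, filtered out
]

print(bucket_counts(orders))  # {'< 24h': 1, '24h-48h': 1, '> 48h': 1}
```

Note that the order with a difference of exactly 24h lands in 24h-48h rather than < 24h, because to is exclusive.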

Aggs multiple buckets with nested documents in Elasticsearch

I'm currently working on an Elasticsearch project. I want to aggregate data from our existing documents.
The (simplified) structure is as follows:
{
"products" : {
"mappings" : {
"product" : {
"properties" : {
"created" : {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss"
},
"description" : {
"type" : "text"
},
"facets" : {
"type" : "nested",
"properties" : {
"facet_id" : {
"type" : "long"
},
"name_slug" : {
"type" : "keyword"
},
"value_slug" : {
"type" : "keyword"
}
}
}
}
}
}
}
}
What I want to achieve with one query:
Select the unique facet_name values
Under each facet_name, I want all corresponding facet_values
Something like this:
- facet_name
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)
- facet_name
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)
Can you guys point me in the right direction? I've looked at aggregations, but the documentation isn't clear enough for me to work out how to do this.
You'll be using nested terms aggregations. Since the facet names & values are under the same path, you can try this:
GET products/_search
{
"size": 0,
"aggs": {
"by_facet_names_parent": {
"nested": {
"path": "facets"
},
"aggs": {
"by_facet_names_nested": {
"terms": {
"field": "facets.name_slug",
"size": 10
},
"aggs": {
"by_facet_subvalues": {
"terms": {
"field": "facets.value_slug",
"size": 10
}
}
}
}
}
}
}
}
Your response should look something along these lines:
{
"took": 26,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 30,
"max_score": 0,
"hits": []
},
"aggregations": {
"by_facet_names_parent": {
"doc_count": 90,
"by_facet_names_nested": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 80,
"buckets": [
{
"key": "0JDcya7Y7Y", <-------- your facet name keyword
"doc_count": 4,
"by_facet_subvalues": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "3q4E9R6h5k", <-------- one of the facet values + its count
"doc_count": 3
},
{
"key": "1q4E9R6h5k", <-------- another facet value & count
"doc_count": 1
}
]
}
},
{
"key": "0RyRKWugU1",
"doc_count": 1,
"by_facet_subvalues": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Af7qeCsXz6",
"doc_count": 1
}
]
}
}
.....
]
}
}
}
}
Notice that the nested doc counts can exceed the number of your actual product documents (in the response above, 90 nested docs versus 30 hits). This is because nested aggregations treat each nested sub-document as a separate document within its parent document. This takes some time to digest, but it'll make sense once you've played around with them long enough.
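The two-level terms structure can be mimicked in plain Python to see why the nested counts grow: every element of a product's facets array is counted on its own. The sample products below are invented for illustration, and the sketch ignores the size limit of 10 buckets that each terms aggregation applies.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def facet_tree(products: List[dict]) -> Dict[str, Counter]:
    """Group nested facet entries by name_slug, then count value_slug within each group.

    Roughly mirrors nested(facets) -> terms(facets.name_slug) -> terms(facets.value_slug).
    """
    tree: Dict[str, Counter] = defaultdict(Counter)
    for product in products:
        for facet in product.get("facets", []):  # each facet acts as a separate nested doc
            tree[facet["name_slug"]][facet["value_slug"]] += 1
    return dict(tree)


products = [
    {"facets": [{"name_slug": "color", "value_slug": "red"},
                {"name_slug": "size", "value_slug": "m"}]},
    {"facets": [{"name_slug": "color", "value_slug": "red"},
                {"name_slug": "color", "value_slug": "blue"}]},
]

tree = facet_tree(products)
nested_doc_count = sum(sum(values.values()) for values in tree.values())
print(len(products), nested_doc_count)  # 2 4
print(tree["color"].most_common())  # [('red', 2), ('blue', 1)]
```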

How to calculate average minimal times in Elasticsearch?

I'm using Elasticsearch version 5.4.2.
I'd like to build an Elasticsearch query that satisfies three conditions:
filter by championId
get the minimal time to buy each item per game
calculate the average minimal time to buy each item across all games.
I managed 1 and 2, but I couldn't find a way to solve 3. Is it possible to do 1 to 3 in a single query? In case it matters, I will use the result in Laravel 5.4, a PHP framework.
My data format is the following:
"_index": "timelines",
"_type": "timeline",
"_source": {
"gameId": 152735348,
"participantId": 3,
"championId": 35,
"role": "NONE",
"lane": "JUNGLE",
"win": 1,
"itemId": 1036,
"timestamp": 571200
}
My current Elasticsearch query is this:
GET timelines/_search?size=0&pretty
{
"query": {
"bool": {
"must": [
{ "match": { "championId": 22 }}
]
}
},
"aggs": {
"games": {
"terms": {
"field": "gameId"
},
"aggs": {
"items": {
"terms": {
"field": "itemId",
"order" : { "min_buying_time" : "asc" }
},
"aggs": {
"min_buying_time": {
"min": {
"field": "timestamp"
}
}
}
}
}
}
}
}
As @Sönke Liebau said, a pipeline aggregation is the key, but if you want the average minimal time across all games per item, you should aggregate by itemId first. The following code should help:
POST misko/_search
{
"query": {
"bool": {
"must": [
{ "match": { "championId": 22 }}
]
}
},
"aggs": {
"items": {
"terms": {
"field": "itemId"
},
"aggs": {
"games": {
"terms": {
"field": "gameId"
},
"aggs": {
"min_buying_time": {
"min": {
"field": "timestamp"
}
}
}
},
"avg_min_time": {
"avg_bucket": {
"buckets_path": "games>min_buying_time"
}
}
}
}
}
}
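Here the avg_bucket pipeline averages the per-game min_buying_time values inside each item bucket. The pure-Python equivalent below, run on made-up timeline events, can help check the numbers. One caveat that applies to the real query: the terms aggregation on gameId returns only 10 buckets by default, so the average covers at most 10 games per item unless its size is raised.

```python
from collections import defaultdict
from typing import Dict, List


def avg_min_buying_time(events: List[dict], champion_id: int) -> Dict[int, float]:
    """For each itemId: take the minimum timestamp per gameId, then average those minima."""
    # itemId -> gameId -> earliest timestamp, after the championId filter
    minima: Dict[int, Dict[int, int]] = defaultdict(dict)
    for e in events:
        if e["championId"] != champion_id:
            continue
        games = minima[e["itemId"]]
        games[e["gameId"]] = min(e["timestamp"], games.get(e["gameId"], e["timestamp"]))
    return {item: sum(games.values()) / len(games) for item, games in minima.items()}


events = [
    {"championId": 22, "gameId": 1, "itemId": 1036, "timestamp": 600},
    {"championId": 22, "gameId": 1, "itemId": 1036, "timestamp": 900},  # later purchase, ignored
    {"championId": 22, "gameId": 2, "itemId": 1036, "timestamp": 400},
    {"championId": 35, "gameId": 3, "itemId": 1036, "timestamp": 100},  # other champion, filtered out
]

print(avg_min_buying_time(events, 22))  # {1036: 500.0}
```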
If I understand your objective correctly, you should be able to solve this with pipeline aggregations. For your use case specifically, the Avg Bucket aggregation should be helpful; the example in its documentation should be very close to what you need, I think.
Something like:
"avg_min_buying_time": {
"avg_bucket": {
"buckets_path": "games>min_buying_time"
}
}

ElasticSearch: Relevancy score override by function_score

Below is my Elasticsearch query. I am using function_score to order my results as follows:
1. First, the query should match some fields (full-text search)
2. Order results by the user's current location (using a gauss function)
3. Give more weight to service providers that have recommendations (using a gauss function)
4. Give more preference to service providers that have been reviewed recently (using script_score)
Ordering by points 2, 3 and 4 is applied to the resulting set. The problem is that whenever I use the geo-location function, the exactly matching service provider is reordered further down the listing, and my query shows results that are near the user's location even when they match less well than other documents.
Here is my query; please help me resolve this issue. I would also welcome suggestions on the best way to optimize this scenario.
{
"from": 0,
"size": 15,
"sort": {
"_score": {
"order": "desc"
}
},
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{
"term": {
"status": "1"
}
},
{
"query_string": {
"default_field": "_all",
"query": "Parag Gadhia Parenting classes",
"fields": [
"service_prrovider_name^3",
"location^2",
"category_name^5",
"keyword"
],
"use_dis_max": true
}
},
{
"term": {
"city_id": "1"
}
}
],
"should": [
{
"term": {
"category.category_name": "parenting classes"
}
}
]
}
},
"functions": [
{
"gauss": {
"geo_location": {
"origin": {
"lat": "19.451624199999998",
"lon": "72.7966481"
},
"offset": "20km",
"scale": "3km"
}
}
},
{
"gauss": {
"likes_count": {
"origin": 3,
"offset": "5",
"scale": "20"
}
},
"weight": 2
},
{
"script_score": {
"script": "(0.08 / ((3.16*pow(10,-11)) * abs(1426072330 - doc[\"reviews.created_time\"].value) + 0.05)) + 1.0"
}
}
]
}
}
}
Yes, this is "normal": script_score will override the previous score.
You can use the _score variable inside the script to take the query score into account.
(For example: "script": "_score * (0.08 / ((3.16*pow(10,-11)) * abs(1426072330 - doc[\"reviews.created_time\"].value) + 0.05)) + 1.0")
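To get a feel for how much the recency part of the script can move a score, the expression can be evaluated outside Elasticsearch. The sketch below is a direct Python transcription of the question's formula and of the suggested _score variant; 1426072330 is the reference time hard-coded in the script, and treating created_time as a plain number on the same scale is an assumption. Because the denominator is never below 0.05, the recency factor always lies in (1.0, 2.6].

```python
REFERENCE_TIME = 1426072330  # hard-coded in the original script


def recency_factor(created_time: float, now: float = REFERENCE_TIME) -> float:
    """Original script_score expression, without _score."""
    return 0.08 / (3.16e-11 * abs(now - created_time) + 0.05) + 1.0


def suggested_score(query_score: float, created_time: float, now: float = REFERENCE_TIME) -> float:
    """Suggested variant: _score * (0.08 / (...)) + 1.0 (the + 1.0 is outside the product)."""
    return query_score * (0.08 / (3.16e-11 * abs(now - created_time) + 0.05)) + 1.0


print(round(recency_factor(REFERENCE_TIME), 3))  # 2.6: reviewed exactly at the reference time
```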

Elasticsearch: different results for URL query and JSON POST

I'm completing a search function on a big online webstore.
I have a problem with additional fields. When I search on some fields in the browser, it works, but when I post JSON using a bool filter, I get 0 results (no error is raised).
Basically: when I visit localhost:9200/search/items/_search?pretty=true&q=field-7:Diesel
it works well; in JSON, however, it doesn't.
I've been googling all day and couldn't find any help in the Elasticsearch documentation. What frustrates me even more is that some other fields in the bool query work OK, but this one doesn't.
I don't have any mapping, and ES works for me out of the box: querying on the "name" field works well, as does querying any other field, including this one, but only in the browser.
I realise that querying ES from the browser uses a so-called "query string query".
Anyway, here is an example JSON that I'm posting to Elasticsearch
(searching for all items that have "golf mk5" in their name and a diesel fuel type, by searching field-7).
{
"query": {
"filtered": {
"filter": {
"bool": {
"must_not": [
{
"term": {
"sold": "1"
}
},
{
"term": {
"user_id": "0"
}
}
],
"must": [
{
"term": {
"locked": "0"
}
},
{
"term": {
"removed": "0"
}
},
{
"terms": {
"field-7": [
"Diesel"
]
}
}
]
}
},
"query": {
"match": {
"name": {
"operator": "and",
"query": "+golf +Mk5"
}
}
}
}
},
"sort": [
{
"ordering": {
"price": "desc"
}
}
],
"from": 0,
"size": 24,
"facets": {
"category_count": {
"terms": {
"field": "category_id",
"size": 20,
"order": "count"
}
},
"price": {
"statistical": {
"field": "price"
}
}
}
}
With a query_string query, the query text is analyzed. With a term query (or term filter), it is not.
Since you're not specifying a mapping, you'll get the standard analyzer for string fields. It tokenizes, lowercases and removes stopwords.
Thus, the value Diesel will be indexed as the term diesel. Your terms filter looks up the exact term Diesel, which does not match.
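A toy inverted index makes the mismatch visible. The analyze function below is only a rough stand-in for the standard analyzer (it splits on non-alphanumeric characters and lowercases; stopword removal is omitted), not Elasticsearch's real implementation, and the two sample documents are invented.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def analyze(text: str) -> List[str]:
    """Very rough approximation of the standard analyzer: tokenize and lowercase."""
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Index-time: every value is analyzed before its terms are stored."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, value in docs.items():
        for token in analyze(value):
            index[token].add(doc_id)
    return index


def term_lookup(index: Dict[str, Set[int]], term: str) -> Set[int]:
    """Like a term query/filter: the search term is NOT analyzed."""
    return index.get(term, set())


def match_lookup(index: Dict[str, Set[int]], text: str) -> Set[int]:
    """Like query_string/match: the input is analyzed first (any token may match)."""
    hits: Set[int] = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


index = build_index({1: "Diesel", 2: "Petrol"})
print(term_lookup(index, "Diesel"))   # set()
print(term_lookup(index, "diesel"))   # {1}
print(match_lookup(index, "Diesel"))  # {1}
```

In the real query, that means either sending diesel in lowercase in the terms filter or using a match query for that clause so the input is analyzed the same way as the indexed value.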
