Aggs multiple buckets with nested documents in Elasticsearch - php

I'm currently working on an Elasticsearch project. I want to aggregate data from our existing documents.
The (simplified) structure is as follows:
{
  "products" : {
    "mappings" : {
      "product" : {
        "properties" : {
          "created" : {
            "type" : "date",
            "format" : "yyyy-MM-dd HH:mm:ss"
          },
          "description" : {
            "type" : "text"
          },
          "facets" : {
            "type" : "nested",
            "properties" : {
              "facet_id" : {
                "type" : "long"
              },
              "name_slug" : {
                "type" : "keyword"
              },
              "value_slug" : {
                "type" : "keyword"
              }
            }
          }
        }
      }
    }
  }
}
What I want to achieve with one query:
Select the unique facet name values
Under each facet name, I want all corresponding facet values
Something like this:
- facet_name
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)
- facet_name
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)
-- facet_sub_value (counter?)
Can you guys point me in the right direction? I've looked at the aggs query, but the documentation isn't clear enough for me to work this out.

You'll be using nested terms aggregations. Since the facet names & values are under the same path, you can try this:
GET products/_search
{
  "size": 0,
  "aggs": {
    "by_facet_names_parent": {
      "nested": {
        "path": "facets"
      },
      "aggs": {
        "by_facet_names_nested": {
          "terms": {
            "field": "facets.name_slug",
            "size": 10
          },
          "aggs": {
            "by_facet_subvalues": {
              "terms": {
                "field": "facets.value_slug",
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}
And your response should look like something along these lines:
{
  "took": 26,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 30,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "by_facet_names_parent": {
      "doc_count": 90,
      "by_facet_names_nested": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 80,
        "buckets": [
          {
            "key": "0JDcya7Y7Y",        <-------- your facet name keyword
            "doc_count": 4,
            "by_facet_subvalues": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "3q4E9R6h5k",  <-------- one of the facet values + its count
                  "doc_count": 3
                },
                {
                  "key": "1q4E9R6h5k",  <-------- another facet value & count
                  "doc_count": 1
                }
              ]
            }
          },
          {
            "key": "0RyRKWugU1",
            "doc_count": 1,
            "by_facet_subvalues": {
              "doc_count_error_upper_bound": 0,
              "sum_other_doc_count": 0,
              "buckets": [
                {
                  "key": "Af7qeCsXz6",
                  "doc_count": 1
                }
              ]
            }
          }
          .....
        ]
      }
    }
  }
}
Notice how the number of nested buckets can be greater than or equal to the number of your actual product docs. This is because nested aggregations treat the nested subdocuments as separate documents inside their parents. It takes some time to digest, but it'll make sense once you've played around with them long enough.
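Since the question is tagged php, here is a minimal sketch of running this aggregation and walking the buckets with the official elasticsearch/elasticsearch client; the host is an assumption, and the index name comes from the mapping above:

<?php
// Minimal sketch using the official elasticsearch/elasticsearch client.
// The host is an assumption; the index name comes from the mapping above.
require 'vendor/autoload.php';

$client = Elasticsearch\ClientBuilder::create()
    ->setHosts(['localhost:9200'])
    ->build();

$response = $client->search([
    'index' => 'products',
    'body'  => [
        'size' => 0,
        'aggs' => [
            'by_facet_names_parent' => [
                'nested' => ['path' => 'facets'],
                'aggs'   => [
                    'by_facet_names_nested' => [
                        'terms' => ['field' => 'facets.name_slug', 'size' => 10],
                        'aggs'  => [
                            'by_facet_subvalues' => [
                                'terms' => ['field' => 'facets.value_slug', 'size' => 10],
                            ],
                        ],
                    ],
                ],
            ],
        ],
    ],
]);

// Walk the buckets: each facet name, then its values with counts.
$names = $response['aggregations']['by_facet_names_parent']['by_facet_names_nested']['buckets'];
foreach ($names as $name) {
    echo $name['key'], PHP_EOL;
    foreach ($name['by_facet_subvalues']['buckets'] as $value) {
        echo '  ', $value['key'], ' (', $value['doc_count'], ')', PHP_EOL;
    }
}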

Related

Elasticsearch : How to use multiple filter and calculation in aggregations?

I'm trying to build a query in Kibana.
I have an index of orders with some fields:
datetime1, datetime2 with format: yyyy-MM-dd HH:mm
First I have to check that datetime1 exists.
Second, I have to compute the difference between these two dates: datetime2 - datetime1.
Finally, I have to put the result into different buckets depending on whether the difference is:
less than 24h
between 24h and 48h
between 48h and 72h
....
What I tried:
GET orders/_search
{
  "size": 0,
  "aggs": {
    "test1": {
      "filters": {
        "filters": {
          "exist_datetime1": {
            "exists": {
              "field": "datetime1"
            }
          },
          "24_hours": {
            "script": {
              "script": {
                "source": "doc['datetime2'].value - doc['datetime1'].value < 24",
                "lang": "painless"
              }
            }
          }
        }
      }
    }
  }
}
How can I combine multiple filters and do a subtraction between dates?
Thanks for your help :)
That's a good start; however, I think you need something slightly different. Here is an attempt at producing the ranges you need, using the range aggregation powered by a script.
You first need to make sure both date fields have values (the query part), and then you can define the buckets you need (< 24h, 24h-48h, etc.). Note that in a range aggregation, from is inclusive and to is exclusive, so a difference of exactly 24h lands in the 24h-48h bucket.
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "datetime1"
          }
        },
        {
          "exists": {
            "field": "datetime2"
          }
        }
      ]
    }
  },
  "aggs": {
    "ranges": {
      "range": {
        "script": {
          "lang": "painless",
          "source": "(doc['datetime2'].value.millis - doc['datetime1'].value.millis) / 3600000"
        },
        "ranges": [
          { "to": 24, "key": "< 24h" },
          { "from": 24, "to": 48, "key": "24h-48h" },
          { "from": 48, "key": "> 48h" }
        ]
      }
    }
  }
}
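If you later consume these buckets from PHP instead of Kibana (the theme of this thread), reading them is a simple loop. A sketch assuming the official elasticsearch/elasticsearch client, with the request above already loaded into $body as an associative array:

<?php
// Sketch: running the range aggregation above from PHP and reading the
// buckets. Assumes the official elasticsearch/elasticsearch client and
// that $body holds the request shown above as an associative array.
require 'vendor/autoload.php';

$client = Elasticsearch\ClientBuilder::create()->build();

$response = $client->search(['index' => 'orders', 'body' => $body]);

// Each bucket carries the key defined in the request and its doc count.
foreach ($response['aggregations']['ranges']['buckets'] as $bucket) {
    echo $bucket['key'], ': ', $bucket['doc_count'], ' orders', PHP_EOL;
}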

Match closest number value with Elasticsearch DSL (php)

I'm having trouble finding an answer on SO, in the Elasticsearch docs, or on Google for my use case:
Find the closest number to input X that is still lower than X.
I have a mapping that looks like this:
{
  "rule": {
    "properties": {
      "price": { "type": "long" },
      "from": { "type": "long" }
    }
  }
}
What I need is the closest matching from that is less than the input value.
So for example I have these rules:
[
  { "rule": { "from": 1, "price": 5 } },
  { "rule": { "from": 50, "price": 4 } },
  { "rule": { "from": 100, "price": 3 } },
  { "rule": { "from": 150, "price": 2 } }
]
If I search the from field with the value of 75, I'd want the rule with "from": 50.
Most of the answers I found relate to geo/IP or text; I could not find an example that made it click for me.
A range query can be used to get all rules whose from is less than or equal to the input value; the top document, sorted by from in descending order, is then the closest match.
Query:
{
  "query": {
    "range": {
      "rule.from": {
        "lte": 75
      }
    }
  },
  "size": 1,
  "sort": [
    {
      "rule.from": {
        "order": "desc"
      }
    }
  ]
}
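Since the question mentions the PHP DSL, the same query through the official elasticsearch/elasticsearch client might look like this; the index name rules is an assumption:

<?php
// Sketch using the official elasticsearch/elasticsearch client.
// The index name 'rules' is an assumption.
require 'vendor/autoload.php';

$client = Elasticsearch\ClientBuilder::create()->build();

$response = $client->search([
    'index' => 'rules',
    'body'  => [
        'query' => [
            'range' => ['rule.from' => ['lte' => 75]],
        ],
        'size' => 1,
        'sort' => [
            ['rule.from' => ['order' => 'desc']],
        ],
    ],
]);

// The single hit (if any) is the rule with the highest 'from' <= 75,
// i.e. {"from": 50, "price": 4} for the sample data.
$rule = $response['hits']['hits'][0]['_source'] ?? null;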

How to calculate average minimal times in Elasticsearch?

The Elasticsearch version in use is 5.4.2.
I'd like to build an Elasticsearch query that satisfies three conditions:
1. filter by championId
2. get the minimal time to buy each item per game
3. calculate the average minimal time to buy each item across all games
I did 1 and 2, but I could not work out how to do 3. Is it possible to do all of 1 to 3 in one query? Just in case it matters, I will use the result in Laravel 5.4, one of the PHP frameworks.
My data format is the following:
"_index": "timelines",
"_type": "timeline"
"_source": {
"gameId": 152735348,
"participantId": 3,
"championId": 35,
"role": "NONE",
"lane": "JUNGLE",
"win": 1,
"itemId": 1036,
"timestamp": 571200
}
My current Elasticsearch query is this
GET timelines/_search?size=0&pretty
{
  "query": {
    "bool": {
      "must": [
        { "match": { "championId": 22 } }
      ]
    }
  },
  "aggs": {
    "games": {
      "terms": {
        "field": "gameId"
      },
      "aggs": {
        "items": {
          "terms": {
            "field": "itemId",
            "order": { "min_buying_time": "asc" }
          },
          "aggs": {
            "min_buying_time": {
              "min": {
                "field": "timestamp"
              }
            }
          }
        }
      }
    }
  }
}
As @Sönke Liebau said, a pipeline aggregation is the key, but since you want the average minimal time across all games per item, you should aggregate by itemId first. The following should help:
POST misko/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "championId": 22 } }
      ]
    }
  },
  "aggs": {
    "items": {
      "terms": {
        "field": "itemId"
      },
      "aggs": {
        "games": {
          "terms": {
            "field": "gameId"
          },
          "aggs": {
            "min_buying_time": {
              "min": {
                "field": "timestamp"
              }
            }
          }
        },
        "avg_min_time": {
          "avg_bucket": {
            "buckets_path": "games>min_buying_time"
          }
        }
      }
    }
  }
}
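Since the result will be consumed from Laravel/PHP, here is a minimal sketch of extracting the per-item averages from this response, assuming the official elasticsearch/elasticsearch client and the request body above loaded into $query:

<?php
// Sketch: extracting each item's average minimal buying time from the
// aggregation response above. Assumes the official
// elasticsearch/elasticsearch client and that $query holds the request
// body from the answer as an associative array.
require 'vendor/autoload.php';

$client   = Elasticsearch\ClientBuilder::create()->build();
$response = $client->search(['index' => 'timelines', 'body' => $query]);

foreach ($response['aggregations']['items']['buckets'] as $item) {
    // avg_min_time is in the same unit as the 'timestamp' field.
    echo 'item ', $item['key'], ': ', $item['avg_min_time']['value'], PHP_EOL;
}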
If I understand your objective correctly, you should be able to solve this with pipeline aggregations. More specifically, the Avg Bucket aggregation should be helpful for your use case; check out the example in the documentation, which I think is very close to what you need.
Something like:
"avg_min_buying_time": {
"avg_bucket": {
"buckets_path": "games>min_buying_time"
}
}

ElasticSearch: Relevancy score override by function_score

Following is my Elasticsearch query. I am using function_score to order my results as follows:
1. The query should match some fields (full-text search)
2. Order results by the user's current location (using a gauss function)
3. Give more weight to service providers who have recommendations (using a gauss function)
4. Give more preference to service providers that have been reviewed recently (using script_score)
Points 2, 3 and 4 should only reorder the resulting set, but the problem is that whenever I use the geo-location function, exact matches get reordered and pushed down the listing, and the query favours results near the user's location even when they match the other fields less well.
Here is my query; please help me resolve this issue, and please also suggest how best to optimize this scenario.
{
  "from": 0,
  "size": 15,
  "sort": {
    "_score": {
      "order": "desc"
    }
  },
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "status": "1"
              }
            },
            {
              "query_string": {
                "default_field": "_all",
                "query": "Parag Gadhia Parenting classes",
                "fields": [
                  "service_prrovider_name^3",
                  "location^2",
                  "category_name^5",
                  "keyword"
                ],
                "use_dis_max": true
              }
            },
            {
              "term": {
                "city_id": "1"
              }
            }
          ],
          "should": [
            {
              "term": {
                "category.category_name": "parenting classes"
              }
            }
          ]
        }
      },
      "functions": [
        {
          "gauss": {
            "geo_location": {
              "origin": {
                "lat": "19.451624199999998",
                "lon": "72.7966481"
              },
              "offset": "20km",
              "scale": "3km"
            }
          }
        },
        {
          "gauss": {
            "likes_count": {
              "origin": 3,
              "offset": "5",
              "scale": "20"
            }
          },
          "weight": 2
        },
        {
          "script_score": {
            "script": "(0.08 / ((3.16*pow(10,-11)) * abs(1426072330 - doc[\"reviews.created_time\"].value) + 0.05)) + 1.0"
          }
        }
      ]
    }
  }
}
Yes, this is "normal": script_score will override the previous score.
You can use the _score variable inside the script to incorporate it, such as:
"script": "_score * (0.08 / ((3.16*pow(10,-11)) * abs(1426072330 - doc[\"reviews.created_time\"].value) + 0.05)) + 1.0"

Executing a bitmask with an Elasticsearch query

I would like to select data from an Elasticsearch index where the data returned is based on a (bitmask) evaluation of a number in the query.
Something like $x & 32 == 32
The query is as follows:
{
  "size": 1000,
  "sort": {
    "timestamp": "desc"
  },
  "fields": ["id", "timestamp", "eval_id"],
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "id": "450"
          }
        },
        {
          "term": { "eval_id": "161" }
        },
        {
          "range": {
            "timestamp": {
              "gte": 1427061600000,
              "lte": 1427147999000
            }
          }
        }
      ]
    }
  }
}
So the "eval_id" must pass a bitmap evaluation in order to be returned by the JSON result.
So eval_id can be 161 or 681 or 421 and so on..
In SQL it looks like this: SUM(If ((eval_id & 1 = 1), 1,0)) as 'EVAL_value'
Can anyone help?
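One possible direction, assuming Elasticsearch 5.6+ where Painless script queries with the "source" syntax are available: express the bitwise test as a script filter with the mask passed as a parameter. A sketch through the official elasticsearch-php client; the index name is hypothetical, and the other clauses are carried over from the query above:

<?php
// Sketch: bitwise filter via a Painless script query (Elasticsearch 5.6+
// request syntax). The index name is hypothetical; the id/timestamp
// clauses come from the question, and the mask is parameterized.
require 'vendor/autoload.php';

$client = Elasticsearch\ClientBuilder::create()->build();

$response = $client->search([
    'index' => 'events', // hypothetical index name
    'body'  => [
        'size'  => 1000,
        'query' => [
            'bool' => [
                'must' => [
                    ['term'  => ['id' => '450']],
                    ['range' => ['timestamp' => ['gte' => 1427061600000, 'lte' => 1427147999000]]],
                ],
                'filter' => [
                    'script' => [
                        'script' => [
                            'lang'   => 'painless',
                            'source' => "(doc['eval_id'].value & params.mask) == params.mask",
                            'params' => ['mask' => 32],
                        ],
                    ],
                ],
            ],
        ],
    ],
]);

This mirrors the SQL-style eval_id & mask = mask test: with mask 32, the example values 161, 681 and 421 all pass, since each has the 32 bit set.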
