Elasticsearch aggregation taking long time

I am running a value count aggregation and a cardinality aggregation on my dataset, using the following query.
GET my_index/my_type/_search
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [{
"range": {
"time": {
"gt": "2015-03-04 00:00:00",
"lt": "2015-03-04 23:59:59"
}
}
}
],
"should": [
{
"term": {
"andi.raw": "1d3d7bac8ce4c620"
}
}
]
}
}
}
},
"aggs": {
"user_count": {
"cardinality": {
"field": "andi.raw"
}
}
}
}
I run this inside a loop for various dates. Each iteration covers a one-day timeframe, and the term filter has 50 terms (andi.raw against 50 values). One iteration takes around 2.5 seconds, and I usually have 50-80 iterations, so the whole run takes a long time. Is there any way to optimize this for better performance?

Your queries look pretty intense to me. I had a similar set of queries and the multi-search API saved me a few seconds.
See multi-search API
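To illustrate the multi-search suggestion, here is a minimal Python sketch that packs one search per day into a single newline-delimited body for the `_msearch` endpoint. The helper names `build_day_query` and `build_msearch_body` are made up for this example, and `"size": 0` is an assumption (it presumes only the aggregation result is needed, not the hits). The query itself mirrors the one in the question.

```python
import json
from datetime import date, timedelta


def build_day_query(day, andi_values):
    """Build the question's search body for one day (day is a datetime.date)."""
    return {
        "size": 0,  # assumption: only the aggregation is needed, not the hits
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must": [{
                            "range": {
                                "time": {
                                    "gt": f"{day} 00:00:00",
                                    "lt": f"{day} 23:59:59",
                                }
                            }
                        }],
                        "should": [{"term": {"andi.raw": v}} for v in andi_values],
                    }
                }
            }
        },
        "aggs": {"user_count": {"cardinality": {"field": "andi.raw"}}},
    }


def build_msearch_body(index, days, andi_values):
    """Return an NDJSON body: a header line and a query line per day."""
    lines = []
    for day in days:
        lines.append(json.dumps({"index": index}))  # header line
        lines.append(json.dumps(build_day_query(day, andi_values)))  # body line
    # The multi-search body must end with a newline.
    return "\n".join(lines) + "\n"


days = [date(2015, 3, 4) + timedelta(days=i) for i in range(3)]
body = build_msearch_body("my_index", days, ["1d3d7bac8ce4c620"])
print(body)
```

Sending `body` in one request to `_msearch` replaces the 50-80 separate round trips with a single one. The responses come back in the same order as the queries.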

Related

Can you use named queries on Elasticsearch with PHP?

I know you can use named queries in Elasticsearch to test which documents matched best in Kibana, but I'm running WordPress with Jetpack Search, which uses Elasticsearch PHP (v2.4). I want to test my queries and return the named queries on each result, so I can check that my queries returned what I intended. This is how it's done in Elasticsearch (JSON):
...
"must": [
{
"match": {
"body": {
"query": "Will Smith",
"_name": "match_will_smith"
}
}
}
],
"should": [
{
"match_phrase": {
"body": {
"query": "Will Smith",
"slop": 5,
"_name": "should_match_phrase_will_smith_with_slop"
}
}
},
]...
Result:
"matched_queries" : [
"match_will_smith",
"should_match_phrase_will_smith_with_slop"
]
It would be awesome if I could get the value of the "matched_queries" object and print it on my PHP page for every result, so I can see what each article is matching. Does anyone know if this is possible?

I think the closest thing is the explain parameter in your query:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html
It shows how each document's score was calculated and which sections of your query contributed to it.
However, I would not use it in a production environment.
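As the Result snippet in the question shows, Elasticsearch attaches "matched_queries" to each hit when named queries are used. The sketch below, in Python, reads that field from an already decoded search response. The `sample_response` is hypothetical and trimmed to the fields used here, and whether Jetpack Search passes this field through to your code is an assumption you would need to check.

```python
def matched_queries_by_hit(response):
    """Map each hit's _id to the list of named queries it matched."""
    return {
        hit["_id"]: hit.get("matched_queries", [])  # absent when nothing named matched
        for hit in response["hits"]["hits"]
    }


# Hypothetical decoded response, trimmed to the fields used here.
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "matched_queries": [
                    "match_will_smith",
                    "should_match_phrase_will_smith_with_slop",
                ],
            },
            {"_id": "2", "matched_queries": ["match_will_smith"]},
            {"_id": "3"},
        ]
    }
}

matched = matched_queries_by_hit(sample_response)
for doc_id, names in matched.items():
    print(doc_id, names)
```

In PHP the same lookup would be plain array access on each entry of the decoded response's hits array.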

ElasticSearch - Exclude hit with metadata

I'm using ES 6.6 and searching for documents that are older than the current date. There are only 2 documents, but I get 3 items returned: the 2 existing documents and a third that contains the settings and mappings. I only want to get the two documents.
I tried adding an "exists" filter, but then ES does not return any documents:
GET _search
{
"query": {
"bool": {
"filter": [
{
"exists": {
"field": "products"
}
},
{
"range": {
"happening_at": {
"gte": "now"
}
}
}
]
}
}
}
When I search with only the range, I receive the 2 correct documents plus an extra "hit" that has no document, only settings and mappings.

Welcome to SO, Adrián.
You are firing a _search across all indices since you have not specified an index name. Please try GET <your_index_name>/_search { ... request body ... }.
Also, "gte": "now" will hardly return any records, since it matches dates greater than or equal to the current date. In your case, you want records older than the current date, so you could use "lt": "now" or, better still, "lt": "now/d". Rounding to the day with now/d performs better because the rounded value stays the same throughout the day, which allows caching.
Try the below:
GET <your_index_name>/_search
{
"query": {
"bool": {
"filter": [
{
"exists": {
"field": "products"
}
},
{
"range": {
"happening_at": {
"lt": "now/d"
}
}
}
]
}
}
}
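One consequence of now/d is worth spelling out: with an "lt" bound, the value is rounded down to the start of the current day, so documents dated earlier today are excluded as well. The Python sketch below only imitates that rounding to make the cutoff visible. The function names and the fixed `now` value are hypothetical, and it assumes UTC, which is the default time zone for date math.

```python
from datetime import datetime, timezone


def round_down_to_day(moment):
    """Imitate /d rounding for an `lt` bound: truncate to the start of the day."""
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


def matches_lt_now_d(happening_at, now):
    """Return True if a document date would pass {"lt": "now/d"}."""
    return happening_at < round_down_to_day(now)


now = datetime(2019, 3, 15, 14, 30, tzinfo=timezone.utc)  # hypothetical "now"
cutoff = round_down_to_day(now)

yesterday_evening = datetime(2019, 3, 14, 23, 59, tzinfo=timezone.utc)
this_morning = datetime(2019, 3, 15, 9, 0, tzinfo=timezone.utc)

print(cutoff.isoformat())
print(matches_lt_now_d(yesterday_evening, now))
print(matches_lt_now_d(this_morning, now))
```

If documents from earlier today must also match, "lt": "now" is the bound to use, at the cost of the caching benefit.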

You have to POST your query :). If you want to make a GET, please don't forget the /.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html

Custom sorting in Elasticsearch

Does anyone know if it's possible to custom sort in Elasticsearch?
I have a sort on the category field, which groups all of the records together by category. This works great.
However, could you then give the sort a list, e.g. cars, books, food?
It would then show the cars first, then books, and finally food.

You can use a function_score query, something like this:
{
"query": {
"function_score": {
"query": { "match_all": {} },
"boost": "5",
"functions": [
{
"filter": { "match": { "category": "cars" } },
"weight": 100
},
{
"filter": { "match": { "category": "books" } },
"weight": 50
},
{
"filter": { "match": { "category": "food" } },
"weight": 1
}
],
"score_mode": "max",
"boost_mode": "replace"
}
}
}
Where you, of course, put whichever query you are using now instead of the match_all query, and leave off the sort (the default is by score, which is what you want here).
This is replacing the score elasticsearch normally generates, with a custom score for each category. You could experiment with other boost_mode in order to have a reasonable ranking within the categories. In case you need to understand what is happening with the scoring, you can add "explain": true to the query at the top level.
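Since the question asks about passing a list, here is a small Python sketch that generates the function_score request from an ordered list of categories. The helper name `category_order_query` is made up for this example. The weights start at 2 on the assumption that documents matching none of the functions end up with a score of 1, so unlisted categories would sort last.

```python
def category_order_query(categories, base_query=None):
    """Build a function_score request that ranks categories in list order."""
    base_query = base_query or {"match_all": {}}
    total = len(categories)
    functions = [
        {
            "filter": {"match": {"category": name}},
            # First category gets the highest weight; the last one gets 2,
            # assuming unmatched documents score 1 and should come after it.
            "weight": total - position + 1,
        }
        for position, name in enumerate(categories)
    ]
    return {
        "query": {
            "function_score": {
                "query": base_query,
                "functions": functions,
                "score_mode": "max",
                "boost_mode": "replace",
            }
        }
    }


query = category_order_query(["cars", "books", "food"])
weights = [f["weight"] for f in query["query"]["function_score"]["functions"]]
print(weights)
```

Only the relative order of the weights matters here, because boost_mode "replace" discards the original relevance score.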

You can use a custom script for your own scoring.
More details are in the Script Based Sorting section: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-request-sort.html

Elasticsearch "Join" tables

I need to do a "join" between 2 indexes (tables) and perform a check on a specific field of documents that exist in both indexes.
I want to add a condition like "dateExpiry" below, but I get an error. Is it possible to join 2 or more indexes?
GET cache-*/_search
{
"query": {
"bool": {
"must_not": [
{
"query": {
"terms": {
"TagId": {
"index": "domain_block-2016.06",
"type": "cBlock",
"id": "57692ef6ae8c50f67e8b45",
"path": "TagId",
"range" : {
"dateExpiry" : {
"gte" : "20160705T12:00:00"
}
}
}
}
}
}
]
}
}
}

Filters within a terms query lookup are currently not supported. However, the Elasticsearch documentation covers joins and relationships in some detail.
Your best bet may be to run two queries against Elasticsearch - one to fetch the list of TagIds, then another that includes the list as an exclusion clause.
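The two-query approach could look roughly like the Python sketch below. It only builds the request bodies and processes a decoded response; sending them to Elasticsearch is left out. The function names and `sample_block_response` are hypothetical, the field names TagId and dateExpiry come from the question, and handling TagId as either a single value or a list is an assumption about the documents.

```python
def build_block_query(expiry_from):
    """Query 1: fetch TagId values from block documents that have not expired."""
    return {
        "_source": ["TagId"],
        "query": {
            "bool": {
                "filter": [{"range": {"dateExpiry": {"gte": expiry_from}}}]
            }
        },
    }


def collect_tag_ids(block_response):
    """Gather the distinct TagId values from the first query's hits."""
    tag_ids = set()
    for hit in block_response["hits"]["hits"]:
        value = hit["_source"].get("TagId", [])
        if isinstance(value, list):
            tag_ids.update(value)
        else:
            tag_ids.add(value)
    return sorted(tag_ids)


def build_exclusion_query(tag_ids):
    """Query 2: search the cache-* indexes, excluding the collected TagIds."""
    return {"query": {"bool": {"must_not": [{"terms": {"TagId": tag_ids}}]}}}


# Hypothetical decoded response from query 1, trimmed to the fields used here.
sample_block_response = {
    "hits": {
        "hits": [
            {"_source": {"TagId": ["t1", "t2"]}},
            {"_source": {"TagId": "t3"}},
            {"_source": {"TagId": ["t2"]}},
        ]
    }
}

block_query = build_block_query("20160705T12:00:00")
tag_ids = collect_tag_ids(sample_block_response)
exclusion_query = build_exclusion_query(tag_ids)
print(exclusion_query)
```

Run the first body against the domain_block index and the second against cache-*. If the list of TagIds grows very large, the terms clause may hit query size limits, so this suits moderately sized block lists.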

multiple group by in elasticsearch including missing values

I'm trying to do a group by in Elasticsearch on multiple fields. I know that nested aggregations exist, but I also want a bucket that includes the records for which the field I'm grouping by is empty.
Say that we have this kind of data structure:
SONG_ID | SONG_GENRE | SONG_ARTIST
and I want to group by genre and artist.
I would like to have a group for each possible combination, i.e.
grouping by genre gives me 5 buckets (if there are 5 genres) plus a bucket containing the songs without a genre. Grouping then by artist gives me, for each genre, a bucket per artist plus one with the songs without an artist.
Basically, I'd like to get the same results that I would get using a group by. Is that even possible?

You can approach this in different ways.
The simplest way would be to index a fixed value, say "notmentioned", in the genre field of songs that have no genre. You can do it while indexing or by defining "null_value" in your field mapping.
"SONG_GENRE": {"type": "string", "null_value": "notmentioned"},
"SONG_ARTIST": {"type": "string", "null_value": "notmentioned"},
So during the (nested) aggregation you will automatically find the count under "notmentioned" for songs that have no genre.
Another approach would be to use the missing filter as an additional aggregation alongside the normal aggregation, something like below.
{
"aggs": {
"SONG_GENRE": {
"terms": {
"field": "SONG_GENRE"
},
"aggs": {
"SONG_ARTIST": {
"terms": {
"field": "SONG_ARTIST"
}
},
"MISSING_SONG_ARTIST": {
"filter": {
"missing": {
"field": "SONG_ARTIST"
}
}
}
}
},
"MISSING_SONG_GENRE": {
"filter": {
"missing": {
"field": "SONG_GENRE"
}
},
"aggs": {
"MISSING_SONG_GENRE_SONG_ARTIST": {
"terms": {
"field": "SONG_ARTIST"
}
},
"MISSING_SONG_GENRE_MISSING_SONG_ARTIST": {
"filter": {
"missing": {
"field": "SONG_ARTIST"
}
}
}
}
}
}
}
I haven't verified the syntax; it is just to give you an idea.
Another hacky approach would be to treat the missing count (total hits minus the sum of all aggregation bucket counts) as the count for songs with no genre.
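The subtraction in that last approach can be sketched in Python as follows. The function name `missing_count` and the `sample_agg` data are hypothetical. The result is only meaningful if every song has at most one genre value; with multi-valued fields a document is counted in several buckets and the difference is off. Adding sum_other_doc_count covers buckets that the terms aggregation cut off because of its size limit.

```python
def missing_count(total_hits, terms_agg):
    """Estimate how many documents have no value for the aggregated field."""
    counted = sum(bucket["doc_count"] for bucket in terms_agg["buckets"])
    # Documents in buckets beyond the terms aggregation's size limit.
    counted += terms_agg.get("sum_other_doc_count", 0)
    return total_hits - counted


# Hypothetical terms aggregation result on SONG_GENRE.
sample_agg = {
    "sum_other_doc_count": 5,
    "buckets": [
        {"key": "rock", "doc_count": 40},
        {"key": "jazz", "doc_count": 25},
        {"key": "pop", "doc_count": 20},
    ],
}

songs_without_genre = missing_count(100, sample_agg)
print(songs_without_genre)
```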
