I'm trying to do a group by in Elasticsearch on multiple fields. I know that nested aggregations exist, but I also want a bucket that contains the records for which the field I'm grouping by is empty.
Say that we have this kind of data structure:
SONG_ID | SONG_GENRE | SONG_ARTIST
and I want to group by genre, then by artist.
I would like to have a group for each possible combination, i.e.:
grouping by genre gives me 5 buckets (if there are 5 genres) plus a bucket containing the songs without a genre. Grouping then by artist gives me, for each genre, a bucket per artist plus one with the songs without an artist.
Basically, I'd like to have the same results that I would get from a group by. Is that even possible?
There are different ways to approach this.
The simplest would be to index a fixed value, say "notmentioned", in the genre field of songs when no genre is present. You can do that at index time or by defining "null_value" in your field mapping.
"SONG_GENRE": {"type": "string", "null_value": "notmentioned"},
"SONG_ARTIST": {"type": "string", "null_value": "notmentioned"},
So during the (nested) aggregation you will automatically get the count under "notmentioned" for songs that have no genre.
Another approach would be to use the missing filter as an additional aggregation alongside the normal one. Something like the below.
{
"aggs": {
"SONG_GENRE": {
"terms": {
"field": "SONG_GENRE"
},
"aggs": {
"SONG_ARTIST": {
"terms": {
"field": "SONG_ARTIST"
}
},
"MISSING_SONG_ARTIST": {
"filter": {
"missing": {
"field": "SONG_ARTIST"
}
}
}
}
},
"MISSING_SONG_GENRE": {
"filter": {
"missing": {
"field": "SONG_GENRE"
}
},
"aggs": {
"MISSING_SONG_GENRE_SONG_ARTIST": {
"terms": {
"field": "SONG_ARTIST"
}
},
"MISSING_SONG_GENRE_MISSING_SONG_ARTIST": {
"filter": {
"missing": {
"field": "SONG_ARTIST"
}
}
}
}
}
}
}
I haven't verified the syntax; it is just to give you an idea.
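To get from that aggregation to something that reads like GROUP BY rows, the response has to be flattened on the client side. Below is a minimal Python sketch, assuming the aggregation names used in the request above and the usual response shape ("buckets" for terms aggregations, a bare "doc_count" for filter aggregations). The function names and the sample data are made up for illustration; the sample is hand-written, not real Elasticsearch output.

```python
from typing import Any, Dict, List, Optional, Tuple

# One output row: (genre, artist, count); None stands for "field is missing".
Row = Tuple[Optional[str], Optional[str], int]


def _artist_rows(genre: Optional[str], node: Dict[str, Any],
                 terms_name: str, missing_name: str) -> List[Row]:
    """Rows for one genre: one per artist bucket plus one for songs with no artist."""
    rows: List[Row] = [(genre, b["key"], b["doc_count"])
                       for b in node[terms_name]["buckets"]]
    missing = node[missing_name]["doc_count"]
    if missing:
        rows.append((genre, None, missing))
    return rows


def flatten_group_by(aggregations: Dict[str, Any]) -> List[Row]:
    """Flatten the nested aggregation response into GROUP BY style rows."""
    rows: List[Row] = []
    for bucket in aggregations["SONG_GENRE"]["buckets"]:
        rows += _artist_rows(bucket["key"], bucket,
                             "SONG_ARTIST", "MISSING_SONG_ARTIST")
    # Songs without a genre live under the separate filter aggregation.
    rows += _artist_rows(None, aggregations["MISSING_SONG_GENRE"],
                         "MISSING_SONG_GENRE_SONG_ARTIST",
                         "MISSING_SONG_GENRE_MISSING_SONG_ARTIST")
    return rows


# Hand-written sample of the "aggregations" part of a response.
sample = {
    "SONG_GENRE": {"buckets": [
        {"key": "rock", "doc_count": 3,
         "SONG_ARTIST": {"buckets": [{"key": "queen", "doc_count": 2}]},
         "MISSING_SONG_ARTIST": {"doc_count": 1}},
    ]},
    "MISSING_SONG_GENRE": {
        "doc_count": 2,
        "MISSING_SONG_GENRE_SONG_ARTIST": {
            "buckets": [{"key": "abba", "doc_count": 2}]},
        "MISSING_SONG_GENRE_MISSING_SONG_ARTIST": {"doc_count": 0},
    },
}

rows = flatten_group_by(sample)
```

Empty "missing" groups are dropped here, which matches how a GROUP BY would not emit a row with a zero count.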
Another hacky way would be to treat the missing count (total hits minus the sum of all bucket counts) as the count for songs with no genre.
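That subtraction could look roughly like the Python sketch below. It assumes each song has at most one genre (with multi-valued fields a document is counted in several buckets and the arithmetic breaks) and that you pass in the total hit count and the terms aggregation part of the response yourself; `missing_count` is a made-up helper name. Since a terms aggregation only returns the top buckets, the "sum_other_doc_count" it reports for the remaining ones is subtracted as well.

```python
from typing import Any, Dict


def missing_count(total_hits: int, terms_agg: Dict[str, Any]) -> int:
    """Documents that fell into no bucket, i.e. have no value for the field."""
    counted = sum(bucket["doc_count"] for bucket in terms_agg["buckets"])
    # Buckets beyond the requested size are only reported as a lump sum.
    counted += terms_agg.get("sum_other_doc_count", 0)
    return total_hits - counted


# Hand-written example values, not real output.
genre_agg = {
    "sum_other_doc_count": 1,
    "buckets": [
        {"key": "rock", "doc_count": 4},
        {"key": "pop", "doc_count": 3},
    ],
}
no_genre = missing_count(10, genre_agg)
```

Keep in mind that bucket counts can be approximate on multi-shard indices, so treat the result as an estimate.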
I'm using ES 6.6 and searching for documents that are older than the current date. There are only 2 documents, but I get 3 items returned: the 2 existing documents, and a third that contains the settings and mappings. I only want to get the two documents.
I tried adding a filter with "exists", but then ES does not return any documents:
GET _search
{
"query": {
"bool": {
"filter": [
{
"exists": {
"field": "products"
}
},
{
"range": {
"happening_at": {
"gte": "now"
}
}
}
]
}
}
}
When I search with only the range, I receive the 2 correct documents, plus an extra "hit" that is not a document and contains only settings and mappings.
Welcome to SO, Adrián.
You are firing a _search across all indices since you've not specified any index name. Please try GET <your_index_name>/_search { ... request body ...}.
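Also, "gte": "now" will hardly return any records since it means a date greater than or equal to the current date. In your case, you want records older than the current date, so you could use "lt": "now" or, better still, "lt": "now/d", since a rounded date is good in terms of performance and allows caching. Note that now/d rounds down to the start of the current day, so documents dated earlier today are not matched.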
Try the below:
GET <your_index_name>/_search
{
"query": {
"bool": {
"filter": [
{
"exists": {
"field": "products"
}
},
{
"range": {
"happening_at": {
"lt": "now/d"
}
}
}
]
}
}
}
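To make the difference between the three range conditions concrete, here is a small Python emulation of what each one would match. It only mimics the comparison locally and assumes that now/d means "start of the current day" in UTC; the function names are made up for illustration and nothing here talks to Elasticsearch.

```python
from datetime import datetime, timezone


def start_of_day(now: datetime) -> datetime:
    """Emulates the now/d rounding: midnight of the current day."""
    return now.replace(hour=0, minute=0, second=0, microsecond=0)


def matches_gte_now(happening_at: datetime, now: datetime) -> bool:
    """"gte": "now" -> only documents dated now or in the future."""
    return happening_at >= now


def matches_lt_now(happening_at: datetime, now: datetime) -> bool:
    """"lt": "now" -> everything strictly before this instant."""
    return happening_at < now


def matches_lt_now_d(happening_at: datetime, now: datetime) -> bool:
    """"lt": "now/d" -> everything before today's midnight."""
    return happening_at < start_of_day(now)


now = datetime(2019, 3, 10, 15, 0, tzinfo=timezone.utc)
earlier_today = datetime(2019, 3, 10, 9, 0, tzinfo=timezone.utc)
yesterday = datetime(2019, 3, 9, 23, 0, tzinfo=timezone.utc)
```

With these values, `yesterday` matches both "lt" variants, while `earlier_today` matches only "lt": "now". If documents from earlier today matter, "lt": "now" is the safer choice.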
You have to POST your query :). If you want to make a GET request, please don't forget the /.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html
I'm working with Elasticsearch but I'm not an expert at writing Elasticsearch queries. My SQL query is below; is it possible to convert it into an Elasticsearch query? Thanks in advance.
SELECT
`currency`.`id` AS `cur_id`,
`currency`.`currency_name` AS `cur_name`,
`currency`.`currency_code` AS `cur_code`,
`currency`.`currency_slug` AS `cur_slug`,
`currency`.`logo` AS `cur_logo`,
`currency`.`added_date` AS `cur_added_date`,
`currency`.`mineable_or_not` AS `mineable_or_not`,
`currency`.`market_cap` AS `cur_market_cap`,
`currency`.`circulating_supply` AS `cur_circulating_supply`,
`currency`.`max_supply` AS `cur_max_supply`,
`currency`.`total_supply` AS `cur_total_supply`,
`currency`.`market_cap` AS `ng_cur_market_cap`,
`currency`.`added_date` AS `ng_cur_added_date`,
`currency`.`circulating_supply` AS `ng_cur_circulating_supply`,
`calculations`.`volume_1hour` AS `cal_volume_1hour`,
`calculations`.`volume_24hour` AS `cal_volume_24hour`,
`calculations`.`volume_168hour` AS `cal_volume_168hour`,
`calculations`.`volume_720hour` AS `cal_volume_720hour`,
`calculations`.`volume_24hour_btc` AS `cal_volume_24hour_btc`,
`calculations`.`current_price` AS `cal_current_price`,
`calculations`.`percentage_change` AS `cal_percentage_change_24h`,
`calculations`.`percentage_change_1h` AS `cal_percentage_change_1h`,
`calculations`.`percentage_change_168h` AS `cal_percentage_change_168h`,
`calculations`.`volume_24hour` AS `ng_cal_volume_24hour`,
`calculations`.`current_price` AS `ng_cal_current_price`
FROM `currency`
JOIN `calculations` ON `calculations`.`currency_id` = `currency`.`id`
WHERE `calculations`.`update_status` = 1 AND `currency`.`currency_type` != 3 AND `calculations`.`update_status` = 1 AND `currency`.`status` = 1
ORDER BY `market_cap` DESC
LIMIT 100
As eliasah commented, there is no join operation in Elasticsearch.
Joining queries
In general you can't really perform join queries in ES. You can have a parent/child relationship between documents that are in the same index, but that is something I would not opt into. My best advice is to denormalize your data and make each document as 'self-contained' as possible. In this specific example, one possible solution is to store the calculations inside the currency document; you would end up with a query like:
{
  "_source": ["id", "logo", ..., "calculations.volume_1h", "calculations.volume_24h", ...],
  "query": {
    "bool": {
      "must": [
        { "match": { "calculations.update_status": 1 } },
        { "match": { "status": 1 } }
      ],
      "must_not": [
        { "match": { "currency_type": 3 } }
      ]
    }
  },
  "sort": [
    { "market_cap": { "order": "desc" } }
  ],
  "size": 100
}
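The denormalization itself happens before indexing. A minimal Python sketch of that step is below, assuming the rows of both tables have already been read into dictionaries keyed by the SQL column names and that there is one calculations row per currency; `denormalize` is a made-up helper, not part of any Elasticsearch client.

```python
from typing import Any, Dict, List


def denormalize(currencies: List[Dict[str, Any]],
                calculations: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Embed each currency's calculations row under a "calculations" key."""
    # Assumption: at most one calculations row per currency_id.
    calc_by_currency = {row["currency_id"]: row for row in calculations}
    docs = []
    for currency in currencies:
        calc = calc_by_currency.get(currency["id"])
        if calc is None:
            continue  # same effect as the inner JOIN: skip unmatched rows
        doc = dict(currency)
        doc["calculations"] = {
            key: value for key, value in calc.items() if key != "currency_id"
        }
        docs.append(doc)
    return docs


# Hand-written example rows.
currencies = [
    {"id": 1, "currency_name": "Bitcoin", "status": 1, "market_cap": 100},
    {"id": 2, "currency_name": "Orphan", "status": 1, "market_cap": 5},
]
calculations = [
    {"currency_id": 1, "update_status": 1, "current_price": 42.0},
]
docs = denormalize(currencies, calculations)
```

Each resulting document can then be indexed as-is, and fields such as calculations.update_status become directly queryable.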
Does anyone know if it's possible to do a custom sort in Elasticsearch?
I have a sort on the category field, which groups all of the records together by category. This works great.
However, could you then give the sort a list, e.g. cars, books, food,
so that it shows the cars first, then books and finally food?
You can use a function_score query, something like this:
{
"query": {
"function_score": {
"query": { "match_all": {} },
"boost": "5",
"functions": [
{
"filter": { "match": { "category": "cars" } },
"weight": 100
},
{
"filter": { "match": { "category": "books" } },
"weight": 50
},
{
"filter": { "match": { "category": "food" } },
"weight": 1
}
],
"score_mode": "max",
"boost_mode": "replace"
}
}
}
Where you, of course, put whichever query you are using now instead of the match_all query, and leave off the sort (the default is by score, which is what you want here).
This is replacing the score elasticsearch normally generates, with a custom score for each category. You could experiment with other boost_mode in order to have a reasonable ranking within the categories. In case you need to understand what is happening with the scoring, you can add "explain": true to the query at the top level.
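If the category order changes often, the functions list can be generated rather than written by hand. Here is a rough Python sketch that builds the same kind of request body from an ordered list; `category_order_query` is a made-up name, and the weights are simply the reversed positions, which is one choice among many.

```python
from typing import Any, Dict, List, Optional


def category_order_query(categories: List[str],
                         base_query: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a function_score body that ranks categories in the given order."""
    count = len(categories)
    functions = [
        {
            "filter": {"match": {"category": category}},
            # Earlier categories get a higher weight, hence a higher score.
            "weight": count - position,
        }
        for position, category in enumerate(categories)
    ]
    return {
        "query": {
            "function_score": {
                "query": base_query or {"match_all": {}},
                "functions": functions,
                "score_mode": "max",
                "boost_mode": "replace",
            }
        }
    }


body = category_order_query(["cars", "books", "food"])
```

The resulting dictionary can be serialized with `json.dumps` and sent as the search body.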
You can use a custom script for your own scoring.
More details are in the Script Based Sorting section: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-request-sort.html
I need to do a "join" between 2 indexes (tables) and perform a check on a specific field of documents that exist in both indexes.
I want to add a condition like "dateExpiry" below, but I get an error. Is it possible to join 2 or more indexes?
GET cache-*/_search
{
"query": {
"bool": {
"must_not": [
{
"query": {
"terms": {
"TagId": {
"index": "domain_block-2016.06",
"type": "cBlock",
"id": "57692ef6ae8c50f67e8b45",
"path": "TagId",
"range" : {
"dateExpiry" : {
"gte" : "20160705T12:00:00"
}
}
}
}
}
]
}
}
}
Filters within a Terms Query Lookup are currently not supported. However, the Elasticsearch documentation covers joins and relationships in some detail.
Your best bet may be to run two queries against Elasticsearch - one to fetch the list of TagIds, then another that includes the list as an exclusion clause.
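A rough Python sketch of the two-step approach is below. It only builds the request bodies and extracts the ids from a response; sending them is left to whatever client you use. The field names TagId and dateExpiry come from the question, while the function names are made up and TagId is assumed to hold a single value per document.

```python
from typing import Any, Dict, List


def blocked_tags_query(expiry_from: str) -> Dict[str, Any]:
    """Step 1: find block entries that have not expired yet."""
    return {
        "_source": ["TagId"],
        "query": {
            "bool": {
                "filter": [{"range": {"dateExpiry": {"gte": expiry_from}}}]
            }
        },
    }


def tag_ids_from_hits(response: Dict[str, Any]) -> List[str]:
    """Collect the distinct TagId values from a search response."""
    return sorted({
        hit["_source"]["TagId"]
        for hit in response["hits"]["hits"]
        if "TagId" in hit["_source"]
    })


def exclusion_query(tag_ids: List[str]) -> Dict[str, Any]:
    """Step 2: search the cache indices, excluding the collected TagIds."""
    return {"query": {"bool": {"must_not": [{"terms": {"TagId": tag_ids}}]}}}


# Hand-written stand-in for the response of the first query.
first_response = {"hits": {"hits": [
    {"_source": {"TagId": "t2"}},
    {"_source": {"TagId": "t1"}},
    {"_source": {"TagId": "t2"}},
]}}
second_body = exclusion_query(tag_ids_from_hits(first_response))
```

The first body would go to the domain_block index and the second to cache-*. If the list of ids gets very large, it may need to be split into batches.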
I am relatively new to Elasticsearch. I am using it as a search platform for PDF documents. I break the PDFs into text pages and enter each one as an Elasticsearch record with its corresponding page ID, parent info, etc.
What I'm finding difficult is matching a given query not just against a single document in ES, but against any document with the same parent ID. So if two terms are searched and they appear on pages 1 and 7 of the actual PDF document (2 separate entries in ES), I want this to count as a match.
Essentially my goal is to be able to search through the multiple pages of a single PDF, with matching happening on any of the document pages in the PDF, and to return a list of matching PDF documents for the search result, instead of matching "pages".
You will need to use the "has_child" query on pages. I'm assuming that you've already defined the mapping for the parent/child relationship between documents and pages. Then you can write a "has_child" query that searches on pages (child type) but returns PDF documents (parent type):
{
"query": {
"has_child": {
"type": "your_pages_type",
"score_type": "max", // read document for more
"query": {
"query_string": {
"query": "some text to search",
"fields": [
"your_pages_body"
],
"default_operator": "and" // "and" if you want to search all words, "or" if you want to search any of words in query
}
}
}
}
}
It's somewhat tricky. First of all, you will have to split your query into terms yourself. Given a list of terms (let's say foo, bar and baz), you can create a bool query against the type representing PDFs (parent type) that would look like this:
{
"bool" : {
"must" : [{
"has_child" : {
"type": "page",
"query": {
"match": {
"page_body": "foo"
}
}
}
}, {
"has_child" : {
"type": "page",
"query": {
"match": {
"page_body": "bar"
}
}
}
}, {
"has_child" : {
"type": "page",
"query": {
"match": {
"page_body": "baz"
}
}
}
}]
}
}
This query will find you all PDFs that contain at least one page with each term.
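Since the number of has_child clauses depends on the number of terms, it is easier to generate the query than to write it by hand. A minimal Python sketch follows, assuming a naive whitespace split of the user's input and the same type and field names as above ("page" and "page_body"); the function name is made up for illustration.

```python
from typing import Any, Dict, List


def pdfs_containing_all(terms: List[str],
                        child_type: str = "page",
                        field: str = "page_body") -> Dict[str, Any]:
    """One has_child clause per term, all of them required."""
    return {
        "bool": {
            "must": [
                {
                    "has_child": {
                        "type": child_type,
                        "query": {"match": {field: term}},
                    }
                }
                for term in terms
            ]
        }
    }


# Naive tokenization; a real application may want something smarter.
user_input = "foo bar baz"
query = pdfs_containing_all(user_input.split())
```

Each term may match on a different page, which is exactly the behaviour asked for in the question.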