ElasticSearch - Exclude hit with metadata

I'm using ES 6.6 and searching for documents that are older than the current date. There are only 2 documents, but 3 hits are returned: the 2 existing documents plus a third hit that contains only the settings and mappings. I only want to get the two documents.
I tried adding an "exists" filter, but then ES does not return any documents:
GET _search
{
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "products"
          }
        },
        {
          "range": {
            "happening_at": {
              "gte": "now"
            }
          }
        }
      ]
    }
  }
}
When I search with only the range filter, I get the 2 correct documents, plus an extra hit that has no document, only settings and mappings.

Welcome to SO, Adrián.
You are firing a _search across all indices because you have not specified an index name. Please try GET <your_index_name>/_search { ... request body ... }.
Also, "gte": "now" matches dates greater than or equal to the current time, so it will hardly return the records you expect. In your case, you want records older than the current date, so you could use "lt": "now" or, better still, "lt": "now/d": rounding to the day with now/d is good for performance because it allows caching. Keep in mind that "lt": "now/d" only matches dates before the start of the current day.
Try the below:
GET <your_index_name>/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "exists": {
            "field": "products"
          }
        },
        {
          "range": {
            "happening_at": {
              "lt": "now/d"
            }
          }
        }
      ]
    }
  }
}
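The request above can also be assembled programmatically. The following is a minimal Python sketch, not the answerer's code: the helper names build_past_events_query and search_path and the index name my_index are made up for illustration, and the sketch only builds the path and body without sending anything to a cluster.

```python
import json


def build_past_events_query(date_field: str, required_field: str) -> dict:
    """Build a search body that keeps documents which have `required_field`
    and whose `date_field` is before the start of the current day."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"exists": {"field": required_field}},
                    # "now/d" rounds the current time to the day.
                    {"range": {date_field: {"lt": "now/d"}}},
                ]
            }
        }
    }


def search_path(index_name: str) -> str:
    """Return a _search path scoped to one index instead of all indices."""
    return f"/{index_name}/_search"


if __name__ == "__main__":
    # "my_index" is a placeholder for <your_index_name>.
    print("GET", search_path("my_index"))
    print(json.dumps(build_past_events_query("happening_at", "products"), indent=2))
```

Scoping the path to a single index is what keeps hits from other indices out of the result.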

You have to POST your query :). If you want to make a GET request, please don't forget the /.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-body.html

Related

Custom sorting in Elasticsearch

Does anyone know if it's possible to do a custom sort in Elasticsearch?
I have a sort on the category field, which groups all of the records together by category. This works great.
However, could you then give the sort a list, e.g. cars, books, food,
so that it shows the cars first, then books, and finally food?

You can use a function_score query, something like this:
{
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "boost": "5",
      "functions": [
        {
          "filter": { "match": { "category": "cars" } },
          "weight": 100
        },
        {
          "filter": { "match": { "category": "books" } },
          "weight": 50
        },
        {
          "filter": { "match": { "category": "food" } },
          "weight": 1
        }
      ],
      "score_mode": "max",
      "boost_mode": "replace"
    }
  }
}
Here you would, of course, put whichever query you are using now in place of the match_all query, and leave off the sort (the default is by score, which is what you want here).
This replaces the score Elasticsearch normally generates with a custom score for each category. You could experiment with other boost_mode values to get a reasonable ranking within each category. If you need to understand what is happening with the scoring, you can add "explain": true at the top level of the request.
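If the category order comes from a list, the functions array can be generated rather than written by hand. This is a rough Python sketch under stated assumptions: the helper name build_category_order_query is hypothetical, and the weights are simply descending integers instead of the 100/50/1 used above. The weights start at 2 on the assumption that documents matching none of the filters end up with a neutral score of 1, so listed categories should stay above unlisted ones.

```python
def build_category_order_query(categories, base_query=None, field="category"):
    """Build a function_score query that ranks documents by the position
    of their category in `categories` (first entry ranks highest)."""
    if base_query is None:
        base_query = {"match_all": {}}
    count = len(categories)
    functions = [
        {
            "filter": {"match": {field: name}},
            # Earlier categories get larger weights; the last one gets 2.
            "weight": count - position + 1,
        }
        for position, name in enumerate(categories)
    ]
    return {
        "query": {
            "function_score": {
                "query": base_query,
                "functions": functions,
                "score_mode": "max",
                "boost_mode": "replace",
            }
        }
    }


if __name__ == "__main__":
    query = build_category_order_query(["cars", "books", "food"])
    for function in query["query"]["function_score"]["functions"]:
        print(function["filter"]["match"]["category"], function["weight"])
```

Any strictly descending weights would do; only their relative order matters when boost_mode is replace.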

You can use a custom script for your own scoring.
More details are in the Script Based Sorting section: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-request-sort.html

Elasticsearch "Join" tables

I need to do a "join" between 2 indexes (tables) and perform a check on a specific field of documents that exist in both indexes.
I want to add a condition like "dateExpiry" below, but I get an error. Is it possible to join 2 or more indexes?
GET cache-*/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "query": {
            "terms": {
              "TagId": {
                "index": "domain_block-2016.06",
                "type": "cBlock",
                "id": "57692ef6ae8c50f67e8b45",
                "path": "TagId",
                "range" : {
                  "dateExpiry" : {
                    "gte" : "20160705T12:00:00"
                  }
                }
              }
            }
          }
        }
      ]
    }
  }
}

Filters within a Terms Query Lookup are currently not supported. However, Elasticsearch has some great documentation on joins / relationships.
Your best bet may be to run two queries against Elasticsearch - one to fetch the list of TagIds, then another that includes the list as an exclusion clause.
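The two-query approach could look roughly like the Python sketch below. It only builds request bodies and parses a response; the function names are hypothetical, the sample response is made-up data shaped like a search response, and it assumes the blocking documents store their tags in a TagId field next to dateExpiry, as in the question.

```python
def build_blocked_tags_query(expiry_from: str) -> dict:
    """First query: find blocking documents that have not yet expired."""
    return {
        "_source": ["TagId"],
        "query": {
            "bool": {
                "filter": [{"range": {"dateExpiry": {"gte": expiry_from}}}]
            }
        },
    }


def extract_tag_ids(response: dict) -> list:
    """Collect the distinct TagId values from the first query's hits."""
    ids = []
    for hit in response["hits"]["hits"]:
        value = hit["_source"].get("TagId")
        if value is None:
            continue
        # TagId may be a single value or a list of values.
        ids.extend(value if isinstance(value, list) else [value])
    return list(dict.fromkeys(ids))  # de-duplicate, keep first-seen order


def build_exclusion_query(tag_ids: list) -> dict:
    """Second query: exclude documents carrying any of the collected tags."""
    return {"query": {"bool": {"must_not": [{"terms": {"TagId": tag_ids}}]}}}


if __name__ == "__main__":
    # Made-up response for illustration only.
    sample_response = {
        "hits": {
            "hits": [
                {"_source": {"TagId": ["a1", "b2"]}},
                {"_source": {"TagId": "b2"}},
            ]
        }
    }
    print(build_blocked_tags_query("20160705T12:00:00"))
    print(build_exclusion_query(extract_tag_ids(sample_response)))
```

The first body would be sent to the domain_block index and the second to cache-*, using whichever client you already have.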

multiple group by in elasticsearch including missing values

I'm trying to do a group by in Elasticsearch on multiple fields. I know that nested aggregations exist, but I also want a bucket that contains the records for which the field I'm grouping by is empty.
Say that we have this kind of data structure:
SONG_ID | SONG_GENRE | SONG_ARTIST
and I want to group by genre, then artist.
I would like to have a group for each possible combination, i.e.
grouping by genre gives me 5 buckets (if there are 5 genres) plus a bucket containing the songs without a genre. Grouping then by artist gives me, for each genre, a bucket per artist plus one with the songs without an artist.
Basically, I'd like to have the same results that I would get using a group by. Is that even possible?

There are different ways to approach this.
The simplest way would be to index a fixed value, say "notmentioned", in the genre field of songs whose genre is not present. You can do this at index time or by defining "null_value" in your field mapping.
"SONG_GENRE": {"type": "string", "null_value": "notmentioned"},
"SONG_ARTIST": {"type": "string", "null_value": "notmentioned"},
Then, during the (nested) aggregation, you will automatically get the count under "notmentioned" for songs that have no genre.
Another approach would be to use the missing filter as an additional aggregation alongside the normal aggregation, something like the below.
{
  "aggs": {
    "SONG_GENRE": {
      "terms": {
        "field": "SONG_GENRE"
      },
      "aggs": {
        "SONG_ARTIST": {
          "terms": {
            "field": "SONG_ARTIST"
          }
        },
        "MISSING_SONG_ARTIST": {
          "filter": {
            "missing": {
              "field": "SONG_ARTIST"
            }
          }
        }
      }
    },
    "MISSING_SONG_GENRE": {
      "filter": {
        "missing": {
          "field": "SONG_GENRE"
        }
      },
      "aggs": {
        "MISSING_SONG_GENRE_SONG_ARTIST": {
          "terms": {
            "field": "SONG_ARTIST"
          }
        },
        "MISSING_SONG_GENRE_MISSING_SONG_ARTIST": {
          "filter": {
            "missing": {
              "field": "SONG_ARTIST"
            }
          }
        }
      }
    }
  }
}
I haven't verified the syntax; it is just to give you an idea.
Another hacky way could be to treat the missing count (total hits minus the sum of all aggregation bucket counts) as the count for songs with no genre.
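The subtraction in that last approach can be sketched in a few lines of Python. The function name missing_count and the sample numbers are made up for illustration. The sketch assumes each song has at most one genre (otherwise a document is counted in several buckets) and adds sum_other_doc_count, because a terms aggregation only returns the top buckets.

```python
def missing_count(total_hits: int, terms_agg: dict) -> int:
    """Estimate how many documents have no value for the aggregated field.

    `terms_agg` is the terms aggregation part of a search response,
    e.g. response["aggregations"]["SONG_GENRE"].
    """
    counted = sum(bucket["doc_count"] for bucket in terms_agg["buckets"])
    # Documents in buckets that were cut off by the aggregation's size limit.
    counted += terms_agg.get("sum_other_doc_count", 0)
    return total_hits - counted


if __name__ == "__main__":
    # Made-up aggregation result: 10 songs in total, 8 of them with a genre.
    sample_agg = {
        "sum_other_doc_count": 1,
        "buckets": [
            {"key": "rock", "doc_count": 4},
            {"key": "jazz", "doc_count": 3},
        ],
    }
    print(missing_count(10, sample_agg))
```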

Elasticsearch aggregation taking long time

I am running a value count aggregation and a cardinality aggregation on my dataset using the following query.
GET my_index/my_type/_search
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "time": {
                  "gt": "2015-03-04 00:00:00",
                  "lt": "2015-03-04 23:59:59"
                }
              }
            }
          ],
          "should": [
            {
              "term": {
                "andi.raw": "1d3d7bac8ce4c620"
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "user_count": {
      "cardinality": {
        "field": "andi.raw"
      }
    }
  }
}
I am running this inside a loop for various dates. Each iteration covers a timeframe of one day, and the term filter has 50 terms (andi.raw against 50 values). One such iteration takes around 2.5 seconds, and I usually have 50-80 iterations, so it takes a lot of time. Is there any way to optimize this to improve performance?

Your queries look pretty intense to me. I had a similar set of queries, and the multi-search API saved me a few seconds.
See the multi-search API (_msearch).
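With multi-search, the per-day requests from the loop are sent together in one newline-delimited body, where each search is a header line followed by a body line. The Python sketch below only builds that payload. The helper names are hypothetical, and two details are assumptions rather than part of the question: it uses a single terms filter in place of the should clause of term filters, and it sets "size": 0 because only the aggregation result is needed.

```python
import json
from datetime import date, timedelta


def day_search_body(day: date, values: list) -> dict:
    """Build the one-day search body, mirroring the query in the question."""
    return {
        "size": 0,  # assumption: hits are not needed, only the aggregation
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must": [
                            {"range": {"time": {
                                "gt": f"{day.isoformat()} 00:00:00",
                                "lt": f"{day.isoformat()} 23:59:59",
                            }}},
                            # assumption: one terms filter instead of many term filters
                            {"terms": {"andi.raw": values}},
                        ]
                    }
                }
            }
        },
        "aggs": {"user_count": {"cardinality": {"field": "andi.raw"}}},
    }


def build_msearch_body(index_name: str, bodies: list) -> str:
    """Serialize search bodies as header/body line pairs for _msearch."""
    lines = []
    for body in bodies:
        lines.append(json.dumps({"index": index_name}))  # header line
        lines.append(json.dumps(body))                   # body line
    return "\n".join(lines) + "\n"  # the payload must end with a newline


if __name__ == "__main__":
    first_day = date(2015, 3, 4)
    days = [first_day + timedelta(days=offset) for offset in range(3)]
    bodies = [day_search_body(day, ["1d3d7bac8ce4c620"]) for day in days]
    print(build_msearch_body("my_index", bodies))
```

The resulting string would be sent to the _msearch endpoint, and the responses come back in the same order as the searches.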

Matching across Multiple documents with ElasticSearch

I am relatively new to ElasticSearch. I am using it as a search platform for PDF documents. I break the PDFs into text pages and enter each one as an ElasticSearch record with its corresponding page ID, parent info, etc.
What I'm finding difficult is matching a given query not only against a single document in ES, but against any document with the same parent ID. So if two terms are searched and they exist on pages 1 and 7 of the actual PDF document (2 separate entries in ES), I want this to count as a match.
Essentially, my goal is to search through the multiple pages of a single PDF, with matching happening on any of the document pages in the PDF, and to return a list of matching PDF documents as the search result instead of matching "pages".

You will need to use the "has_child" query on pages. I'm assuming that you have already defined the mapping for the parent/child relationship between documents and pages. Then you can write a "has_child" query that searches on pages (child type) but returns PDF documents (parent type):
{
  "query": {
    "has_child": {
      "type": "your_pages_type",
      "score_type": "max", // read document for more
      "query": {
        "query_string": {
          "query": "some text to search",
          "fields": [
            "your_pages_body"
          ],
          "default_operator": "and" // "and" if you want to search all words, "or" if you want to search any of words in query
        }
      }
    }
  }
}

It's somewhat tricky. First of all, you will have to split your query into terms yourself. Given a list of terms (let's say foo, bar and baz), you can create a bool query against the type representing PDFs (parent type) that would look like this:
{
  "bool" : {
    "must" : [{
      "has_child" : {
        "type": "page",
        "query": {
          "match": {
            "page_body": "foo"
          }
        }
      }
    }, {
      "has_child" : {
        "type": "page",
        "query": {
          "match": {
            "page_body": "bar"
          }
        }
      }
    }, {
      "has_child" : {
        "type": "page",
        "query": {
          "match": {
            "page_body": "baz"
          }
        }
      }
    }]
  }
}
This query will find you all PDFs that contain at least one page with each term.
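Since the query has one has_child clause per term, it can be generated from the split search string. A minimal Python sketch, assuming whitespace splitting is good enough for your input; the function name build_all_terms_query is hypothetical, and the child type page and field page_body are taken from the example above.

```python
def build_all_terms_query(search_text: str, child_type: str = "page",
                          field: str = "page_body") -> dict:
    """Build a bool query requiring, for every term in `search_text`,
    at least one child page that matches that term."""
    terms = search_text.split()  # naive whitespace split
    return {
        "bool": {
            "must": [
                {
                    "has_child": {
                        "type": child_type,
                        "query": {"match": {field: term}},
                    }
                }
                for term in terms
            ]
        }
    }


if __name__ == "__main__":
    query = build_all_terms_query("foo bar baz")
    for clause in query["bool"]["must"]:
        print(clause["has_child"]["query"]["match"]["page_body"])
```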
