Elasticsearch - Distinct Values, Not Counts - php

I am trying to do something similar to this SQL query:
SELECT * FROM table WHERE fileContent LIKE '%keyword%' AND company_id = '1' GROUP BY email
Having read posts similar to this I have this:
{
"query": {
"bool": {
"must": [{
"match": {
"fileContent": {
"query": "keyword"
}
}
}],
"filter": [{
"terms": {
"company_id": [1]
}
}]
}
},
"aggs": {
"group_by_email": {
"terms": {
"field": "email",
"size": 1000
}
}
},
"size": 0
}
Field mappings are:
{
"cvs" : {
"mappings" : {
"application" : {
"_meta" : {
"model" : "Acme\\AppBundle\\Entity\\Application"
},
"dynamic_date_formats" : [ ],
"properties" : {
"email" : {
"type" : "keyword"
},
"fileContent" : {
"type" : "text"
},
"company_id" : {
"type" : "text"
}
}
}
}
}
}
... which are generated from Symfony config.yml:
fos_elastica:
clients:
default:
host: "%elastica.host%"
port: "%elastica.port%"
indexes:
cvs:
client: default
types:
application:
properties:
fileContent: ~
email:
index: not_analyzed
company_id: ~
persistence:
driver: orm
model: Acme\AppBundle\Entity\Application
provider: ~
finder: ~
The filter works fine, but I am finding that hits:hits returns no items (or all results matching the search if I remove size:0) and aggregations:group_by_email:buckets has a count of the groups but not the records themselves. The records that were grouped aren't returned and it's these that I need.
I have also tried with FOSElasticBundle using the query builder if this is your preferred flavour (this works but doesn't have the grouping/aggregation):
$boolQuery = new \Elastica\Query\BoolQuery();
$filterKeywords = new \Elastica\Query\Match();
$filterKeywords->setFieldQuery('fileContent', 'keyword');
$boolQuery->addMust($filterKeywords);
$filterUser = new \Elastica\Query\Terms();
$filterUser->setTerms('company_id', array('1'));
$boolQuery->addFilter($filterUser);
$finder = $this->get('fos_elastica.finder.cvs.application');
Thanks.

For this you need top_hits aggregation inside the terms one you are already using:
"aggs": {
"group_by_email": {
"terms": {
"field": "email",
"size": 1000
},
"aggs": {
"sample_docs": {
"top_hits": {
"size": 100
}
}
}
}
}

top_hits:{size:1} appears to be what I need, having played around with Andrei's answer. This will return one record for each bucket in the aggregation
"aggs": {
"group_by_email": {
"terms": {
"field": "email",
"size": 1000
},
"aggs": {
"sample_docs": {
"top_hits": {
"size": 1
}
}
}
}
}
Ref: top_hits

top_hits helped me too. I had some trouble too, but eventually figured out how to resolve it. So here is my solution:
{
"query": {
"nested": {
"path": "placedOrders",
"query": {
"bool": {
"must": [
{
"term": {
"placedOrders.ownerId": "0a9fdef0-4508-4f9c-aa8c-b3984e39ad1e"
}
}
]
}
}
}
},
"aggs": {
"custom_name1": {
"nested": {
"path": "placedOrders"
},
"aggs": {
"custom_name2": {
"terms": {
"field": "placedOrders.propertyId"
},
"aggs": {
"custom_name3": {
"top_hits": {
"size": 1,
"sort": [
{
"placedOrders.propertyId": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}

Related

Multiple search field elasticsearphp

Hello i want to do something like that with elasticsearch enter image description here
I already have some knowledge in elasticsearch but I can't understand how can I do this , multiple search
You can use a combination of bool/must/should clause to combine multiple conditions
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "tag"
}
},
{
"match": {
"answers": 0
}
},
{
"match": {
"user": 1234
}
},
{
"multi_match": {
"query": "words here",
"type": "phrase"
}
},
{
"match": {
"score": 3
}
},
{
"match": {
"isaccepted": "yes"
}
}
]
}
}
}
If you want to search on multiple fields then you can use multi_match query
If no fields are provided, the multi_match query defaults to the
index.query.default_field index settings, which in turn defaults to *.
This extracts all fields in the mapping that are eligible to term queries and filters the metadata fields. All extracted fields are then
combined to build a query.
Adding a working example with index data, search query, and search result
Index Data:
{
"answers": 0,
"isaccepted": "no"
}
{
"answers": 0,
"isaccepted": "yes"
}
Search Query:
{
"query": {
"multi_match" : {
"query" : "yes"
}
}
}
Search Result:
"hits": [
{
"_index": "67542669",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"answers": 0,
"isaccepted": "yes"
}
}
]

Elastisearch Failed to JSON encode issue

I working on elastic search and I have 1K phone numbers when I pass this phone numbers array to elastic search to search users through phone numbers it gives me exception
Failed to JSON encode /var/app/current/vendor/elasticsearch/elasticsearch/src/Elasticsearch/Serializers/SmartSerializer.php
Below is my Elasticsearch client initializing
$client = ClientBuilder::create()->setHosts([$host])->build();
And my working query in Elasticsearch
{
"_source": [
"id"
],
"query": {
"bool": {
"must": [
{
"term": {
"type": "user"
}
},
{
"bool": {
"should": [
{
"prefix": {
"phone": {
"value": "923047698099"
}
}
},
{
"prefix": {
"phone": {
"value": "92313730320"
}
}
},
.
.
.
]
}
}
],
"must_not": [
{
"has_child": {
"type": "blocked",
"query": {
"term": {
"user_id": "u-2"
}
}
}
},
{
"has_child": {
"type": "block",
"query": {
"term": {
"user_id": "u-2"
}
}
}
},
{
"term": {
"db_id": 2
}
}
]
}
}
}
I don't know that where I doing mistake. Either at client initializing or writing elasticserch query. I searched this issue but not usefull solution found or might be I did't understand clearly. But still I am stucked on this issue that how to solve this problem. Suggest any usefull link or solution.
Thanks

How to calculate average minimal times in Elasticsearch?

Use Elasticsearch version is 5.4.2
I'd like to build an Elasticsearch query to satisfy three conditions.
filter by championId
get minimal time to buy various item per game
calculate avg minimal time to buy each item in all games.
I did 1 and 2. But I could not find solving 3. Is it possible to execute 1 to 3 in the query? Just in case, I will use the result on Laravel 5.4, one of PHP frameworks.
My data format is the following:
"_index": "timelines",
"_type": "timeline"
"_source": {
"gameId": 152735348,
"participantId": 3,
"championId": 35,
"role": "NONE",
"lane": "JUNGLE",
"win": 1,
"itemId": 1036,
"timestamp": 571200
}
My current Elasticsearch query is this
GET timelines/_search?size=0&pretty
{
"query": {
"bool": {
"must": [
{ "match": { "championId": 22 }}
]
}
},
"aggs": {
"games": {
"terms": {
"field": "gameId"
},
"aggs": {
"items": {
"terms": {
"field": "itemId",
"order" : { "min_buying_time" : "asc" }
},
"aggs": {
"min_buying_time": {
"min": {
"field": "timestamp"
}
}
}
}
}
}
}
}
As #Sönke Liebau said pipeline aggregation is the key, but if you want to count average minimal time of all games per item you should first aggregate by itemID. Following code should help:
POST misko/_search
{
"query": {
"bool": {
"must": [
{ "match": { "championId": 22 }}
]
}
},
"aggs": {
"items": {
"terms": {
"field": "itemId"
},
"aggs": {
"games": {
"terms": {
"field": "gameId"
},
"aggs": {
"min_buying_time": {
"min": {
"field": "timestamp"
}
}
}
},
"avg_min_time": {
"avg_bucket": {
"buckets_path": "games>min_buying_time"
}
}
}
}
}
}
If I understand your objective correctly you should be able to solve this with pipeline aggregations. More specifically to your use case, the Avg Bucket aggregation should be helpful, check out the example in the documentation, that should be very close to what you need I think.
Something like:
"avg_min_buying_time": {
"avg_bucket": {
"buckets_path": "games>min_buying_time"
}
}

ElasticSearch query to search parent docs and consider that at least has one child

I have implemented parent child relation ship in elasticsearch, parent type name is participant_tests and its child type name is participant_test_question_answers, I have some search conditions for parent type as bellow:
query": {
"bool":{
"should":[
{
"range":{
"created_at":{
"gte":"2016-01-01",
"lte":"2016-05-11",
"format":"YYYY-MM-dd"
}
}
},
{
"terms":{
"created_by.id":["1000001"]
}
},
{
"multi_match":{
"query":"test human time",
"fields":[
"course_tests.course_id.course_no",
"participant_test_questions.answer_text",
"participants.first_name",
"participants.last_name",
"participants.promote_unique_user_id","result"
],
"operator":"OR"
}
}
]
}
}
Above query return result but if I want to check has_child no result returned.
I have added this lines after bool class.
"has_child": {
"type": "participant_test_question_answers",
"min_children": 1,
"query": {
"match_all": {}
}
}
Try this query:
{
"query": {
"bool": {
"should": [
{
"range": {
"created_at": {
"gte": "2016-01-01",
"lte": "2016-05-11",
"format": "YYYY-MM-dd"
}
}
},
{
"terms": {
"created_by.id": [
"1000001"
]
}
},
{
"multi_match": {
"query": "test human time",
"fields": [
"course_tests.course_id.course_no",
"participant_test_questions.answer_text",
"participants.first_name",
"participants.last_name",
"participants.promote_unique_user_id",
"result"
],
"operator": "OR"
}
}
],
"must": [
{
"has_child": {
"type": "participant_test_question_answers",
"min_children": 1,
"query": {
"match_all": {}
}
}
}
]
}
}
}

Same aggregation on multiple metrics Elasticsearch

I have setup snowplow with Elasticsearch.
When I want to get the data out I just do normal queries and use aggregates to get them by day, country etc.
So I want to figure out clickthru rate for these aggregations, I have 2 kind of events: page views and clicks.
Currently I do 2 queries:
Page Views:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"event": "page_view"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
},
"aggs": {
"dates": {
"date_histogram": {
"field": "collector_tstamp",
"interval": "day"
}
}
}
}
Clicks:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"event": "struct"
}
},
{
"term": {
"se_action": "click"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
},
"aggs": {
"dates": {
"date_histogram": {
"field": "collector_tstamp",
"interval": "day"
}
}
}
}
I format the response to something easier to use and then merge them in PHP using something like this.
function merge_metrics($pv,$c){
$r = array();
if(count($pv) > 0){
foreach ($pv as $key => $value) {
$r[$value['name']]['page_views'] += $value['count'];
}
}
if(count($c) > 0){
foreach ($c as $key => $value) {
$r[$value['name']]['clicks'] += $value['count'];
}
}
$rf = array();
foreach ($r as $key => $value) {
$tmp_clicks = isset($value['clicks']) ? $value['clicks'] : 0;
$tmp_page_views = isset($value['page_views']) ? isset($value['page_views']) : 0;
$rf[] = array(
'name' => $key,
'page_views' => $tmp_page_views,
'clicks' => $tmp_clicks,
'ctr' => ctr($tmp_clicks,$tmp_page_views)
);
}
return $rf;
}
Both $pv and $c are arrays that contain the aggregates that result from querying Elasticsearch and I do some formatting for ease of use.
My question is:
Is it possible get multiple metrics(in my case page views and clicks, these are specific filters) and perform same aggregations on both ? then returning the aggregations something like :
{
"data": [
{
"day": "2015-10-13",
"page_views": 61,
"clicks": 0,
},
{
"day": "2015-10-14",
"page_views": 135,
"clicks": 1,
},
{
"day": "2015-10-15",
"page_views": 39,
"clicks": 0,
}
]
}
But without me having to manually merge them ?
Yes, it is definitely possible if you merge your aggregations into one single query. For instance, I suppose you have one query like this for page views:
{
"query": {...}
"aggregations": {
"by_day": {
"date_histogram": {
"field": "day",
"interval": "day"
},
"aggs": {
"page_views_per_day": {
"sum": {
"field": "page_views"
}
}
}
}
}
}
And another query like this for clicks:
{
"query": {...}
"aggregations": {
"by_day": {
"date_histogram": {
"field": "day",
"interval": "day"
},
"aggs": {
"clicks_per_day": {
"sum": {
"field": "clicks"
}
}
}
}
}
}
Provided you have the same constraints in your query, you can definitely merge them together at the date_histogram level, like this:
{
"query": {...}
"aggregations": {
"by_day": {
"date_histogram": {
"field": "day",
"interval": "day"
},
"aggs": {
"page_views_per_day": {
"sum": {
"field": "page_views"
}
},
"clicks_per_day": {
"sum": {
"field": "clicks"
}
}
}
}
}
}
UPDATE
Since your queries are different for each of your aggregations, we need to do it slightly differently, i.e. by using an additional filters aggregation, like this:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"terms": {
"event": [
"page_view",
"struct"
]
}
}
],
"should": {
"term": {
"se_action": "click"
}
},
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
},
"aggs": {
"dates": {
"date_histogram": {
"field": "collector_tstamp",
"interval": "day"
},
"aggs": {
"my_filters": {
"filters": {
"filters": {
"page_views_filter": {
"bool": {
"must": [
{
"term": {
"event": "page_view"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
},
"clicks_filter": {
"bool": {
"must": [
{
"term": {
"event": "struct"
}
},
{
"term": {
"se_action": "click"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
}
}
}
}
}
}
Now for each daily bucket, you're going to end up with two sub-buckets, one for the count of page views and another for the count of clicks.

Categories