How to use highlight_query with Elastica / FOSElasticaBundle? - php

I'm new to Elastica and I'm looking for a way to highlight nested query results with highlight_query. I checked the code, and $query->setHighlight() accepts only an array as a parameter. Maybe there's another way to achieve this with Elastica.
This is the JSON query I'm trying to translate to Elastica:
{
"query": {
"bool": {
"must": [
{
"match": {
"publishAt": "2016"
}
},
{
"nested": {
"path": "translations",
"query": {
"multi_match": {
"query": "leadership",
"fields": [
"translations.*"
]
}
}
}
},
{
"nested": {
"path": "translations",
"query": {
"bool": {
"must": [
{
"match": {
"translations.locale": "fr"
}
}
]
}
}
}
}
]
}
},
"highlight": {
"highlight_query": {
"match": {
"translations.*": "leadership"
}
},
"fields": {
"translations.*": {}
}
}
}
I'm using FOSElasticaBundle and I have this query without the highlight:
$query = new Query();
$bool = new Bool();
$yearQuery = new Match();
$yearQuery->setField('publishAt', 2016);
$bool->addMust($yearQuery);
$nestedQuery = new Query\Nested();
$nestedQuery->setPath('translations');
$multiMatch = new Query\MultiMatch();
$multiMatch->setQuery($string);
$multiMatch->setFields('translations.*');
$nestedQuery->setQuery($multiMatch);
$nestedQuery2 = new Query\Nested();
$nestedQuery2->setPath('translations');
$nestedBool = new Bool();
$localeQuery = new Match();
$localeQuery->setField('translations.locale', $request->getLocale());
$nestedBool->addMust($localeQuery);
$nestedQuery2->setQuery($nestedBool);
$bool->addMust($nestedQuery);
$bool->addMust($nestedQuery2);
$query->setQuery($bool);
$results = $finder->findHybrid($query);
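Since the question notes that setHighlight() takes an array, one option worth trying is to describe the whole highlight section, highlight_query included, as a plain PHP array. The sketch below only builds that array and prints it as JSON so it can be compared with the request above. $searchTerm is a hypothetical stand-in for $string, and whether Elastica forwards the array unchanged is an assumption that has not been verified here.

```php
<?php
// Hypothetical stand-in for the $string variable used in the question.
$searchTerm = 'leadership';

// The "highlight" section of the JSON request, written as a plain array.
$highlight = [
    'highlight_query' => [
        'match' => ['translations.*' => $searchTerm],
    ],
    'fields' => [
        // stdClass is encoded as {} (an empty array would be encoded as []).
        'translations.*' => new stdClass(),
    ],
];

echo json_encode(['highlight' => $highlight], JSON_PRETTY_PRINT), PHP_EOL;

// Assumption: with Elastica loaded, the array could be handed over as-is:
// $query->setHighlight($highlight);
```

The empty-object detail matters: json_encode() turns an empty PHP array into [], which is not the {} that the fields entry in the request uses.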

Related

Elasticsearch Failed to JSON encode issue

I'm working with Elasticsearch and have about 1K phone numbers. When I pass this array of phone numbers to Elasticsearch to search for users by phone number, it throws an exception:
Failed to JSON encode /var/app/current/vendor/elasticsearch/elasticsearch/src/Elasticsearch/Serializers/SmartSerializer.php
Below is my Elasticsearch client initialization:
$client = ClientBuilder::create()->setHosts([$host])->build();
And my working query in Elasticsearch
{
"_source": [
"id"
],
"query": {
"bool": {
"must": [
{
"term": {
"type": "user"
}
},
{
"bool": {
"should": [
{
"prefix": {
"phone": {
"value": "923047698099"
}
}
},
{
"prefix": {
"phone": {
"value": "92313730320"
}
}
},
.
.
.
]
}
}
],
"must_not": [
{
"has_child": {
"type": "blocked",
"query": {
"term": {
"user_id": "u-2"
}
}
}
},
{
"has_child": {
"type": "block",
"query": {
"term": {
"user_id": "u-2"
}
}
}
},
{
"term": {
"db_id": 2
}
}
]
}
}
}
I don't know where I'm making the mistake: in initializing the client or in writing the Elasticsearch query. I searched for this issue but found no useful solution, or perhaps I didn't understand the ones I found. I'm still stuck on how to solve this problem. Please suggest any useful link or solution.
Thanks

Elasticsearch - Distinct Values, Not Counts

I am trying to do something similar to this SQL query:
SELECT * FROM table WHERE fileContent LIKE '%keyword%' AND company_id = '1' GROUP BY email
Having read posts similar to this I have this:
{
"query": {
"bool": {
"must": [{
"match": {
"fileContent": {
"query": "keyword"
}
}
}],
"filter": [{
"terms": {
"company_id": [1]
}
}]
}
},
"aggs": {
"group_by_email": {
"terms": {
"field": "email",
"size": 1000
}
}
},
"size": 0
}
Field mappings are:
{
"cvs" : {
"mappings" : {
"application" : {
"_meta" : {
"model" : "Acme\\AppBundle\\Entity\\Application"
},
"dynamic_date_formats" : [ ],
"properties" : {
"email" : {
"type" : "keyword"
},
"fileContent" : {
"type" : "text"
},
"company_id" : {
"type" : "text"
}
}
}
}
}
}
... which are generated from Symfony config.yml:
fos_elastica:
clients:
default:
host: "%elastica.host%"
port: "%elastica.port%"
indexes:
cvs:
client: default
types:
application:
properties:
fileContent: ~
email:
index: not_analyzed
company_id: ~
persistence:
driver: orm
model: Acme\AppBundle\Entity\Application
provider: ~
finder: ~
The filter works fine, but I am finding that hits:hits returns no items (or all results matching the search if I remove size:0) and aggregations:group_by_email:buckets has a count of the groups but not the records themselves. The records that were grouped aren't returned and it's these that I need.
I have also tried with FOSElasticaBundle using the query builder, if that is your preferred flavour (this works but doesn't have the grouping/aggregation):
$boolQuery = new \Elastica\Query\BoolQuery();
$filterKeywords = new \Elastica\Query\Match();
$filterKeywords->setFieldQuery('fileContent', 'keyword');
$boolQuery->addMust($filterKeywords);
$filterUser = new \Elastica\Query\Terms();
$filterUser->setTerms('company_id', array('1'));
$boolQuery->addFilter($filterUser);
$finder = $this->get('fos_elastica.finder.cvs.application');
Thanks.
For this you need a top_hits aggregation inside the terms aggregation you are already using:
"aggs": {
"group_by_email": {
"terms": {
"field": "email",
"size": 1000
},
"aggs": {
"sample_docs": {
"top_hits": {
"size": 100
}
}
}
}
}
top_hits:{size:1} appears to be what I need, having played around with Andrei's answer. This returns one record for each bucket in the aggregation:
"aggs": {
"group_by_email": {
"terms": {
"field": "email",
"size": 1000
},
"aggs": {
"sample_docs": {
"top_hits": {
"size": 1
}
}
}
}
}
Ref: top_hits
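To get at the grouped records on the PHP side, the documents sit under each bucket's top_hits sub-aggregation in the response. A minimal sketch of pulling one document per bucket out of the decoded response array follows. The function name firstHitPerBucket and the sample response are made up for illustration, and the example assumes the response has already been decoded into a PHP array.

```php
<?php
/**
 * Collect the first top_hits document of every terms bucket, keyed by bucket key.
 */
function firstHitPerBucket(array $response, string $termsAgg, string $topHitsAgg): array
{
    $docs = [];
    foreach ($response['aggregations'][$termsAgg]['buckets'] ?? [] as $bucket) {
        $hits = $bucket[$topHitsAgg]['hits']['hits'] ?? [];
        if ($hits !== []) {
            $docs[$bucket['key']] = $hits[0]['_source'];
        }
    }
    return $docs;
}

// Made-up response fragment in the shape produced by terms > top_hits.
$response = ['aggregations' => ['group_by_email' => ['buckets' => [
    ['key' => 'a@example.com', 'doc_count' => 2, 'sample_docs' => ['hits' => ['hits' => [
        ['_id' => '1', '_source' => ['email' => 'a@example.com', 'company_id' => '1']],
    ]]]],
    ['key' => 'b@example.com', 'doc_count' => 1, 'sample_docs' => ['hits' => ['hits' => [
        ['_id' => '7', '_source' => ['email' => 'b@example.com', 'company_id' => '1']],
    ]]]],
]]]];

$docs = firstHitPerBucket($response, 'group_by_email', 'sample_docs');
print_r($docs);
```

With top_hits size set to 1 each bucket carries exactly one hit, so taking index 0 is enough; with a larger size the inner hits array would need to be looped over instead.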
top_hits helped me too. I had some trouble too, but eventually figured out how to resolve it. So here is my solution:
{
"query": {
"nested": {
"path": "placedOrders",
"query": {
"bool": {
"must": [
{
"term": {
"placedOrders.ownerId": "0a9fdef0-4508-4f9c-aa8c-b3984e39ad1e"
}
}
]
}
}
}
},
"aggs": {
"custom_name1": {
"nested": {
"path": "placedOrders"
},
"aggs": {
"custom_name2": {
"terms": {
"field": "placedOrders.propertyId"
},
"aggs": {
"custom_name3": {
"top_hits": {
"size": 1,
"sort": [
{
"placedOrders.propertyId": {
"order": "desc"
}
}
]
}
}
}
}
}
}
}
}

Multi indices search with nested fields

I have two indices:
The first, questions, has a nested field answers. The second, articles, does not have this field.
I try to search across both indices:
{
"index": "questions, articles",
"body":{
"query":{
"bool":{
"must":{
"nested":{
"path": "answer",
...
}
}
}
}
}
}
and get the error "query_parsing_exception: [nested] failed to find nested object under path [answer]".
How can I search without errors when one index has the nested field but the other does not?
I think you need to use the indices query and to use a different query for each index. Something like this:
GET /questions,articles/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"indices": {
"indices": [
"questions"
],
"query": {
"nested": {
"path": "answer",
"query": {
"term": {
"text": "bla"
}
}
}
}
}
},
{
"match_all": {}
}
]
}
},
{
"term": {
"some_common_field": {
"value": "whatever"
}
}
}
]
}
}
}

Elasticsearch query to search parent docs that have at least one child

I have implemented a parent-child relationship in Elasticsearch. The parent type is participant_tests and its child type is participant_test_question_answers. I have some search conditions for the parent type, as below:
"query": {
"bool":{
"should":[
{
"range":{
"created_at":{
"gte":"2016-01-01",
"lte":"2016-05-11",
"format":"YYYY-MM-dd"
}
}
},
{
"terms":{
"created_by.id":["1000001"]
}
},
{
"multi_match":{
"query":"test human time",
"fields":[
"course_tests.course_id.course_no",
"participant_test_questions.answer_text",
"participants.first_name",
"participants.last_name",
"participants.promote_unique_user_id","result"
],
"operator":"OR"
}
}
]
}
}
The above query returns results, but when I add a has_child check, no results are returned.
I added these lines after the bool clause:
"has_child": {
"type": "participant_test_question_answers",
"min_children": 1,
"query": {
"match_all": {}
}
}
Try this query:
{
"query": {
"bool": {
"should": [
{
"range": {
"created_at": {
"gte": "2016-01-01",
"lte": "2016-05-11",
"format": "YYYY-MM-dd"
}
}
},
{
"terms": {
"created_by.id": [
"1000001"
]
}
},
{
"multi_match": {
"query": "test human time",
"fields": [
"course_tests.course_id.course_no",
"participant_test_questions.answer_text",
"participants.first_name",
"participants.last_name",
"participants.promote_unique_user_id",
"result"
],
"operator": "OR"
}
}
],
"must": [
{
"has_child": {
"type": "participant_test_question_answers",
"min_children": 1,
"query": {
"match_all": {}
}
}
}
]
}
}
}
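The query above moves has_child into a must clause next to the existing should clauses. A minimal PHP sketch of that restructuring follows, using plain arrays rather than any client API. The helper name requireChild and the shortened clause list are illustrative assumptions. One thing to keep in mind: in a bool query that contains a must clause, should clauses become optional by default, so the sketch accepts an optional minimum_should_match value for cases where at least one of the original conditions must still hold.

```php
<?php
/**
 * Add a has_child requirement to an existing bool query array.
 * Passing $minimumShouldMatch keeps the should clauses mandatory.
 */
function requireChild(array $boolQuery, string $childType, ?int $minimumShouldMatch = null): array
{
    $boolQuery['bool']['must'][] = [
        'has_child' => [
            'type' => $childType,
            'min_children' => 1,
            // stdClass is encoded as {} so match_all stays an empty object.
            'query' => ['match_all' => new stdClass()],
        ],
    ];
    if ($minimumShouldMatch !== null) {
        $boolQuery['bool']['minimum_should_match'] = $minimumShouldMatch;
    }
    return $boolQuery;
}

// Shortened version of the parent conditions from the question.
$parentQuery = ['bool' => ['should' => [
    ['terms' => ['created_by.id' => ['1000001']]],
]]];

$query = requireChild($parentQuery, 'participant_test_question_answers', 1);
echo json_encode(['query' => $query], JSON_PRETTY_PRINT), PHP_EOL;
```

Leaving the third argument out reproduces the structure of the answer exactly; whether the should clauses ought to stay mandatory depends on the intended search behaviour.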

Same aggregation on multiple metrics Elasticsearch

I have setup snowplow with Elasticsearch.
When I want to get the data out I just do normal queries and use aggregates to get them by day, country etc.
I want to work out the click-through rate for these aggregations. I have two kinds of events: page views and clicks.
Currently I do 2 queries:
Page Views:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"event": "page_view"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
},
"aggs": {
"dates": {
"date_histogram": {
"field": "collector_tstamp",
"interval": "day"
}
}
}
}
Clicks:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"event": "struct"
}
},
{
"term": {
"se_action": "click"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
},
"aggs": {
"dates": {
"date_histogram": {
"field": "collector_tstamp",
"interval": "day"
}
}
}
}
I format the response to something easier to use and then merge them in PHP using something like this.
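function merge_metrics($pv, $c) {
// Accumulate page views and clicks per bucket name.
$r = array();
if (count($pv) > 0) {
foreach ($pv as $value) {
// Initialise the counter before adding to it to avoid undefined index notices.
if (!isset($r[$value['name']]['page_views'])) {
$r[$value['name']]['page_views'] = 0;
}
$r[$value['name']]['page_views'] += $value['count'];
}
}
if (count($c) > 0) {
foreach ($c as $value) {
if (!isset($r[$value['name']]['clicks'])) {
$r[$value['name']]['clicks'] = 0;
}
$r[$value['name']]['clicks'] += $value['count'];
}
}
$rf = array();
foreach ($r as $key => $value) {
$tmp_clicks = isset($value['clicks']) ? $value['clicks'] : 0;
// Use the stored count here; the original assigned the boolean returned by isset().
$tmp_page_views = isset($value['page_views']) ? $value['page_views'] : 0;
$rf[] = array(
'name' => $key,
'page_views' => $tmp_page_views,
'clicks' => $tmp_clicks,
// ctr() is the author's own helper and is not shown in the question.
'ctr' => ctr($tmp_clicks, $tmp_page_views)
);
}
return $rf;
}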
Both $pv and $c are arrays that contain the aggregates that result from querying Elasticsearch and I do some formatting for ease of use.
My question is:
Is it possible to get multiple metrics (in my case page views and clicks, which are specific filters) and perform the same aggregations on both, then return the aggregations as something like:
{
"data": [
{
"day": "2015-10-13",
"page_views": 61,
"clicks": 0
},
{
"day": "2015-10-14",
"page_views": 135,
"clicks": 1
},
{
"day": "2015-10-15",
"page_views": 39,
"clicks": 0,
}
]
}
But without me having to manually merge them?
Yes, it is definitely possible if you merge your aggregations into one single query. For instance, I suppose you have one query like this for page views:
{
"query": {...},
"aggregations": {
"by_day": {
"date_histogram": {
"field": "day",
"interval": "day"
},
"aggs": {
"page_views_per_day": {
"sum": {
"field": "page_views"
}
}
}
}
}
}
And another query like this for clicks:
{
"query": {...},
"aggregations": {
"by_day": {
"date_histogram": {
"field": "day",
"interval": "day"
},
"aggs": {
"clicks_per_day": {
"sum": {
"field": "clicks"
}
}
}
}
}
}
Provided you have the same constraints in your query, you can definitely merge them together at the date_histogram level, like this:
{
"query": {...},
"aggregations": {
"by_day": {
"date_histogram": {
"field": "day",
"interval": "day"
},
"aggs": {
"page_views_per_day": {
"sum": {
"field": "page_views"
}
},
"clicks_per_day": {
"sum": {
"field": "clicks"
}
}
}
}
}
}
UPDATE
Since your queries are different for each of your aggregations, we need to do it slightly differently, i.e. by using an additional filters aggregation, like this:
{
"size": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"terms": {
"event": [
"page_view",
"struct"
]
}
}
],
"should": {
"term": {
"se_action": "click"
}
},
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
},
"aggs": {
"dates": {
"date_histogram": {
"field": "collector_tstamp",
"interval": "day"
},
"aggs": {
"my_filters": {
"filters": {
"filters": {
"page_views_filter": {
"bool": {
"must": [
{
"term": {
"event": "page_view"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
},
"clicks_filter": {
"bool": {
"must": [
{
"term": {
"event": "struct"
}
},
{
"term": {
"se_action": "click"
}
}
],
"must_not": {
"term": {
"br_family": "Robot"
}
}
}
}
}
}
}
}
}
}
}
Now for each daily bucket, you're going to end up with two sub-buckets, one for the count of page views and another for the count of clicks.
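On the PHP side, those two sub-buckets can be flattened straight into the shape asked for in the question, with no separate merge step. The sketch below assumes the response has been decoded into an array, that the aggregation names match the query above (dates, my_filters, page_views_filter, clicks_filter), and that key_as_string begins with an ISO date. The function name flattenDailyMetrics and the sample response are hypothetical.

```php
<?php
/**
 * Flatten a date_histogram > filters aggregation response into one row per day.
 */
function flattenDailyMetrics(array $response): array
{
    $rows = [];
    foreach ($response['aggregations']['dates']['buckets'] ?? [] as $day) {
        $filters = $day['my_filters']['buckets'];
        $rows[] = [
            // Assumption: key_as_string starts with YYYY-MM-DD.
            'day' => substr($day['key_as_string'], 0, 10),
            'page_views' => $filters['page_views_filter']['doc_count'],
            'clicks' => $filters['clicks_filter']['doc_count'],
        ];
    }
    return ['data' => $rows];
}

// Hand-written response fragment using the counts from the question's desired output.
$response = ['aggregations' => ['dates' => ['buckets' => [
    ['key_as_string' => '2015-10-13T00:00:00.000Z', 'my_filters' => ['buckets' => [
        'page_views_filter' => ['doc_count' => 61],
        'clicks_filter' => ['doc_count' => 0],
    ]]],
    ['key_as_string' => '2015-10-14T00:00:00.000Z', 'my_filters' => ['buckets' => [
        'page_views_filter' => ['doc_count' => 135],
        'clicks_filter' => ['doc_count' => 1],
    ]]],
    ['key_as_string' => '2015-10-15T00:00:00.000Z', 'my_filters' => ['buckets' => [
        'page_views_filter' => ['doc_count' => 39],
        'clicks_filter' => ['doc_count' => 0],
    ]]],
]]]];

$out = flattenDailyMetrics($response);
echo json_encode($out, JSON_PRETTY_PRINT), PHP_EOL;
```

A click-through rate could be added per row from the two counts, with a guard for days that have zero page views.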
