ElasticSearch sort on multiple fields with summation - php

I am running a search using a "filtered" query and then want to sort the results by the sum of 3 fields.
For example:
"query": {
...
},
"sort": {
view_count + comment_count + like_count
order: DESC
}
The result should be in descending order of the sum of the 3 counts above.
How can I sum the fields and then order the results by that sum?
Any help is appreciated.

Use script sorting if you can't change the data, don't have control over the mapping, or the three fields keep changing (i.e. they are not insert-and-forget).
{
"sort": {
"_script": {
"type": "number",
"script": "return doc['view_count'].value + doc['comment_count'].value + doc['like_count'].value",
"lang": "groovy",
"order": "desc"
}
}
}
If you update the documents frequently, you can also add another field (let's call it sum) in which you index the precomputed sum of the three fields. Then you simply sort on the sum field.
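As a rough sketch of the precomputed-field idea, assuming documents are assembled in application code before they are sent to Elasticsearch: the helper name withSum and the field name sum are hypothetical, and treating a missing counter as 0 is an assumption.

```javascript
// Hypothetical helper: returns a copy of the document with a precomputed
// "sum" field. Missing or non-numeric counters count as 0 (an assumption).
function withSum(doc) {
  const fields = ["view_count", "comment_count", "like_count"];
  const sum = fields.reduce(
    (total, name) => total + (Number(doc[name]) || 0),
    0
  );
  return { ...doc, sum };
}

// With the field indexed, the sort clause becomes a plain field sort.
const sortClause = { sort: [{ sum: { order: "desc" } }] };

const indexed = withSum({ view_count: 10, comment_count: 2, like_count: 5 });
console.log(indexed.sum, JSON.stringify(sortClause));
```

The trade-off is that every update to one of the counters must also rewrite sum.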

In my case (Elasticsearch 5.6), the code below sorts by the sum of multiple fields.
GET /_search
{
"query" : {
"term" : { "user" : "kimchy" }
},
"sort" : {
"_script" : {
"type" : "number",
"script" : {
"lang": "painless",
"source": "doc['retweet_count'].value + doc['favorite_count'].value"
},
"order" : "desc"
}
}}
Source: Script-based Sorting

Related

Custom sorting in Elasticsearch

Does anyone know if it's possible to apply a custom sort order in Elasticsearch?
I have a sort on the category field, which groups all of the records together by category. This works great.
However, could you then give the sort a list, e.g. cars, books, food,
so that it shows the cars first, then books, and finally food?
You can use a function_score query, something like this:
{
"query": {
"function_score": {
"query": { "match_all": {} },
"boost": "5",
"functions": [
{
"filter": { "match": { "category": "cars" } },
"weight": 100
},
{
"filter": { "match": { "category": "books" } },
"weight": 50
},
{
"filter": { "match": { "category": "food" } },
"weight": 1
}
],
"score_mode": "max",
"boost_mode": "replace"
}
}
}
Where you, of course, put whichever query you are using now instead of the match_all query, and leave off the sort (the default is by score, which is what you want here).
This replaces the score Elasticsearch normally generates with a custom score for each category. You could experiment with other boost_mode values in order to get a reasonable ranking within the categories. If you need to understand how the score is computed, you can add "explain": true at the top level of the request.
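A minimal sketch of generating that request body from an ordered category list, in plain JavaScript. The helper name categoryOrderQuery is hypothetical, and deriving the weights from list position (instead of the hand-picked 100, 50, 1 above) is an assumption; any strictly descending weights should give the same ordering.

```javascript
// Hypothetical helper: turns an ordered category list into function_score
// "functions", giving earlier categories larger weights.
function categoryOrderQuery(baseQuery, orderedCategories) {
  const functions = orderedCategories.map((category, position) => ({
    filter: { match: { category } },
    // First category gets the highest weight, last category gets 1.
    weight: orderedCategories.length - position,
  }));
  return {
    query: {
      function_score: {
        query: baseQuery,
        functions,
        score_mode: "max",
        boost_mode: "replace",
      },
    },
  };
}

const body = categoryOrderQuery({ match_all: {} }, ["cars", "books", "food"]);
console.log(JSON.stringify(body, null, 2));
```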
You can also use a custom script for your own scoring.
More details are in the Script Based Sorting section: https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-request-sort.html

Sub-queries with "union" in elasticsearch

I'm currently busy working on a project in which we chose to use Elasticsearch as the search engine for a classifieds website.
Currently, I have the following business rule:
List 25 adverts per page. Of these 25, 10 of the displayed adverts must be "Paid Adverts", and the other 15 must be "Free". All 25 must be relevant to the search performed (i.e. Keywords, Region, Price, Category, etc.)
I know I can do this using two separate queries, but this seems like an immense waste of resources. Is it possible to do "sub-queries" (if you can call them that) and union the results into a single result set, fetching only 10 "Paid" adverts and 15 "Free" ones from Elasticsearch in one single query? This assumes, of course, that there are enough adverts to make the requirement possible.
Thanks for any help!
edit - Just adding my mapping information for more clarity.
"properties": {
"advertText": {
"type": "string",
"boost": 2,
"store": true,
"analyzer": "snowball"
},
"canonical": {
"type": "string",
"store": true
},
"category": {
"properties": {
"id": {
"type": "string",
"store": true
},
"name": {
"type": "string",
"store": true
},
"parentCategory": {
"type": "string",
"store": true
}
}
},
"contactNumber": {
"type": "string",
"index": "not_analyzed",
"store": true
},
"emailAddress": {
"type": "string",
"store": true,
"analyzer": "url_email_analyzer"
},
"advertType": {
"type": "string",
"index": "not_analyzed"
},
...
}
What I want then is to be able to query this and get 10 results where "advertType": "Paid" and 15 where "advertType": "Free"...
There are a couple of approaches you can take.
First, you can try using the multi-search API:
Multi Search API
The multi search API allows to execute several search requests within
the same API. The endpoint for it is _msearch.
The format of the request is similar to the bulk API format
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-multi-search.html
A basic example:
curl -XGET 'http://127.0.0.1:9200/advertising_index/_msearch?pretty=1' -d '
{}
{"query" : {"match" : {"Paid_Ads" : "search terms"}}, "size" : 10}
{}
{"query" : {"match" : {"Free" : "search terms"}}, "size" : 15}
'
I've made up the fields and query, but overall you should get the idea: you hit the _msearch endpoint and pass it a series of queries, each preceded by a header line (here the empty object {}). For Paid I've set size to 10 and for Free I've set size to 15.
Subject to the details of your own implementation you should be able to use something like this.
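The request body above can also be assembled programmatically. The sketch below is plain JavaScript and sends nothing over the network. The helper names buildMsearchBody and advertSearch are hypothetical; the advertText and advertType fields come from the mapping in the question, while the bool/must query shape is an assumption about how the search would be written.

```javascript
// Hypothetical helper: builds the newline-delimited body expected by _msearch.
// Each search contributes a header line followed by a body line.
function buildMsearchBody(searches) {
  const lines = [];
  for (const { header = {}, body } of searches) {
    lines.push(JSON.stringify(header));
    lines.push(JSON.stringify(body));
  }
  // The body must end with a newline character.
  return lines.join("\n") + "\n";
}

// Assumed query shape: match the search terms against advertText and
// restrict the results to one advertType.
function advertSearch(advertType, terms, size) {
  return {
    body: {
      query: {
        bool: {
          must: [
            { match: { advertText: terms } },
            { term: { advertType } },
          ],
        },
      },
      size,
    },
  };
}

const ndjson = buildMsearchBody([
  advertSearch("Paid", "search terms", 10),
  advertSearch("Free", "search terms", 15),
]);
console.log(ndjson);
```

The response contains one entry per search, in the same order, so the application can concatenate the 10 paid and 15 free hits itself.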
If that does not work for whatever reason you can also try using a limit filter:
Limit Filter
A limit filter limits the number of documents (per shard) to execute
on. For example:
{
"filtered" : {
"filter" : {
"limit" : {"value" : 100}
},
"query" : {
"term" : { "name.first" : "shay" }
}
}
}
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-limit-filter.html
Note that the limits are per shard, not per index. Given a default of 5 primary shards per index, to get a total response of 10 you would set limit to 2 (2X5 == 10). Also note that this can produce incomplete results if you have multiple matches on one shard but none on another.
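The per-shard arithmetic, and the way it can fall short, can be illustrated with a few lines of plain JavaScript. Both function names and the per-shard match counts are made up for the illustration.

```javascript
// Per-shard limit needed to reach `total` documents when matches are
// spread evenly across the shards.
function perShardLimit(total, shardCount) {
  return Math.ceil(total / shardCount);
}

// Documents considered overall when every shard stops at `limit`.
// `matchesPerShard` is a hypothetical list of match counts, one per shard.
function documentsWithLimit(matchesPerShard, limit) {
  return matchesPerShard.reduce(
    (sum, matches) => sum + Math.min(matches, limit),
    0
  );
}

console.log(perShardLimit(10, 5));                    // 2
console.log(documentsWithLimit([4, 3, 5, 2, 6], 2));  // 10: evenly spread matches
console.log(documentsWithLimit([10, 0, 0, 0, 0], 2)); // 2: all matches on one shard
```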
You would then combine two filters with a bool filter:
Bool Filter
A filter that matches documents matching boolean combinations of other
queries. Similar in concept to Boolean query, except that the clauses
are other filters. Can be placed within queries that accept a filter.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-bool-filter.html
I've not fleshed this one out in any detail as it will require more information about your specific indexes, mappings, data and queries.
Try using a limit filter, which limits the number of documents returned:
{
"filtered" : {
"filter" : {
"limit" : {"value" : 10}
},
"query" : {
"term" : { "name.first" : "shay" }
}
}
}
Change value to 2 to get 10 results and to 3 to get 15 (the limit applies per shard, so this assumes the default of 5 shards).
Are you asking for a query like this:
(select * from tablename where advert = "Paid Advert" limit 10) union (select * from tablename where advert = "Free" limit 15);
or for the logic to generate the limit per page?

Percentage of OR conditions matched in mongodb

I have my data in the following format:
{
"_id" : ObjectId("534fd4662d22a05415000000"),
"product_id" : "50862224",
"ean" : "8808992479390",
"brand" : "LG",
"model" : "37LH3000",
"features" : [
{
"key" : "Screen Format",
"value" : "16:9"
}, {
"key" : "DVD Player / Recorder",
"value" : "No"
}, {
"key" : "Weight in kg",
"value" : "12.6"
}
... so on
]
}
I need to compare the features of one product with those of other products and divide the results into separate categories (100% match, 50-99% match) based on the percentage of features that match.
My initial thought was to prepare a dynamic query with an OR condition for each feature and do the percentage calculation in PHP, but that means MongoDB will also return products that have only 1 feature matching. I think nearly all products in a category will have some feature in common, so I fear I would end up processing a lot of products in PHP.
I have two questions, basically:
Are there any alternative approaches?
Is the data structure I am using good enough to support the functionality I am looking for, or should I consider changing it?
Your solution really should be MongoDB-specific; otherwise you will end up doing your calculations, and possibly the matching, on the client side, and that is not going to be good for performance.
So what you really want is a way to have that processing done on the server side:
db.products.aggregate([
// Match the documents that meet your conditions
{ "$match": {
"$or": [
{
"features": {
"$elemMatch": {
"key": "Screen Format",
"value": "16:9"
}
}
},
{
"features": {
"$elemMatch": {
"key" : "Weight in kg",
"value" : { "$gt": "5", "$lt": "8" }
}
}
},
]
}},
// Keep the document and a copy of the features array
{ "$project": {
"_id": {
"_id": "$_id",
"product_id": "$product_id",
"ean": "$ean",
"brand": "$brand",
"model": "$model",
"features": "$features"
},
"features": 1
}},
// Unwind the array
{ "$unwind": "$features" },
// Find the actual elements that match the conditions
{ "$match": {
"$or": [
{
"features.key": "Screen Format",
"features.value": "16:9"
},
{
"features.key" : "Weight in kg",
"features.value" : { "$gt": "5", "$lt": "8" }
},
]
}},
// Count those matched elements
{ "$group": {
"_id": "$_id",
"count": { "$sum": 1 }
}},
// Restore the document and divide the matched elements by the
// number of elements in the "or" condition
{ "$project": {
"_id": "$_id._id",
"product_id": "$_id.product_id",
"ean": "$_id.ean",
"brand": "$_id.brand",
"model": "$_id.model",
"features": "$_id.features",
"matched": { "$divide": [ "$count", 2 ] }
}},
// Sort by the matched percentage
{ "$sort": { "matched": -1 } }
])
Since you know the "length" of the $or condition being applied, you simply need to find out how many of the elements in the "features" array match those conditions. That is what the second $match in the pipeline is all about.
Once you have that count, you simply divide it by the number of conditions that were passed in as your $or. The beauty here is that you can now do something useful with this, like sort by that relevance and even "page" the results server side.
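Because the divisor is just the number of conditions, the pipeline above can be generated for any list of features. The following is a sketch only: buildMatchPipeline is a hypothetical helper, it carries fewer fields through the $project stages than the full example, and it merely constructs the array that would be passed to db.products.aggregate(...).

```javascript
// Hypothetical helper: builds the aggregation pipeline for any list of
// { key, value } feature conditions.
function buildMatchPipeline(conditions) {
  // Conditions for whole documents (features is still an array).
  const elemMatchClauses = conditions.map(({ key, value }) => ({
    features: { $elemMatch: { key, value } },
  }));
  // Same conditions after $unwind, when features is a single element.
  const unwoundClauses = conditions.map(({ key, value }) => ({
    "features.key": key,
    "features.value": value,
  }));
  return [
    { $match: { $or: elemMatchClauses } },
    { $project: {
        _id: { _id: "$_id", product_id: "$product_id", features: "$features" },
        features: 1,
    } },
    { $unwind: "$features" },
    { $match: { $or: unwoundClauses } },
    { $group: { _id: "$_id", count: { $sum: 1 } } },
    { $project: {
        _id: "$_id._id",
        product_id: "$_id.product_id",
        features: "$_id.features",
        // Divide by the number of conditions instead of a hard-coded 2.
        matched: { $divide: ["$count", conditions.length] },
    } },
    { $sort: { matched: -1 } },
  ];
}

const pipeline = buildMatchPipeline([
  { key: "Screen Format", value: "16:9" },
  { key: "Weight in kg", value: { $gt: "5", $lt: "8" } },
]);
console.log(JSON.stringify(pipeline, null, 2));
```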
Of course if you want some additional "categorization" of this, all you would need to do is add another $project stage to the end of the pipeline:
{ "$project": {
"product_id": 1,
"ean": 1,
"brand": 1,
"model": 1,
"features": 1,
"matched": 1,
"category": { "$cond": [
{ "$eq": [ "$matched", 1 ] },
"100",
{ "$cond": [
{ "$gte": [ "$matched", .7 ] },
"70-99",
{ "$cond": [
{ "$gte": [ "$matched", .4 ] },
"40-69",
"under 40"
]}
]}
]}
}}
Or something similar; the $cond operator can help you here.
The architecture should be fine as you have it: you can create a compound index on the "key" and "value" of the entries in your features array, and this should scale well for queries.
Of course, if you actually need something more than that, such as faceted searching and results, you can look at solutions like Solr or Elasticsearch. But a full implementation of that would be a bit lengthy for this answer.
I'm assuming that you'd like to compare the rest of the collection to a given product, which is a textbook example of aggregation:
lookingat = db.products.findOne({product_id:'50862224'})
matches = db.products.aggregate([
{ $unwind: '$features' },
{ $match: { features: { $in: lookingat.features }}},
{ $group: { _id: '$product_id', matchedfeatures: { $sum:1 }}},
{ $sort: { matchedfeatures: -1 }},
{ $limit: 5 },
{ $project: { _id:0, product_id: '$_id',
pctmatch: { $multiply: [ '$matchedfeatures',
100/lookingat.features.length ]}
}}
])
Walking through this briefly from the perspective of a product in the collection that has 6 features, and comparing it to the target product ('lookingat') which has 4 features, 3 of which match:
$unwind turns 1 document with 6 features into 6 otherwise-identical documents with 1 feature each
$match looks for that feature in the target's feature array (be aware that two documents are "equal" only if they have the same field names and values, in the same order), discards the 3 that don't match, and passes along the 3 that do
$group consumes those 3 matching documents and produces a new one that tells you there were 3 documents that matched that product_id
$sort and $limit give you the most relevant results and leave behind all those 1-feature matches you were concerned about
$project lets you rename the _id from the $group step back to product_id and also convert the number of matching features into a percentage (we avoided a $divide operation by recognizing that 2 of the 3 terms in our calculation are constants and can be divided in JS)
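The walk-through above can be mimicked in plain JavaScript, with no MongoDB involved, to check the arithmetic. The function name percentMatch and the sample features are invented for the illustration. Note one simplification: the comparison here looks only at key and value, whereas MongoDB compares whole embedded documents, including field order.

```javascript
// Plain JavaScript stand-in for the pipeline stages.
function percentMatch(target, candidate) {
  const same = (a, b) => a.key === b.key && a.value === b.value;
  // $unwind + $match: keep the candidate features found in the target.
  const matched = candidate.features.filter((feature) =>
    target.features.some((wanted) => same(feature, wanted))
  );
  // $group + $project: count the matches and scale to a percentage.
  return matched.length * (100 / target.features.length);
}

// Target with 4 features; candidate with 6 features, 3 of which match.
const target = {
  features: [
    { key: "A", value: "1" },
    { key: "B", value: "2" },
    { key: "C", value: "3" },
    { key: "D", value: "4" },
  ],
};
const candidate = {
  features: [
    { key: "A", value: "1" },
    { key: "B", value: "2" },
    { key: "C", value: "3" },
    { key: "E", value: "5" },
    { key: "F", value: "6" },
    { key: "G", value: "7" },
  ],
};

const pct = percentMatch(target, candidate);
console.log(pct); // 75
```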

Getting array element which has minimum value on a specific field on the element

I'm trying to remove an element from an array inside a collection. To remove the element, I have to look at the index (or date) field, and remove the one which has the lowest value. An example representation of my collection named "pages":
{
"_id": {
"$oid": "52e12df7e4b06e4ed65a554c"
},
"posts": [
{
"postId": {
"$oid": "52e12e5933a9fbec1100002d"
},
"date": 1390489177.267876,
"index": 1
},
{
"postId": {
"$oid": "52e12e5f33a9fb141800002c"
},
"date": 1390489183.277084,
"index": 2
}
],
"skillname": "Bilardo",
"skilltag": "Bilardo",
"currentIndex": 2
}
I need to remove this element from the posts array:
{
"postId": {
"$oid": "52e12e5933a9fbec1100002d"
},
"date": 1390489177.267876,
"index": 1
}
Whatever I tried, I could not manage to find the minimum "index" field in the array. The last PHP code I managed to write is below:
$this->db->pages->find(array( 'skilltag' => 'Bilardo'), array('posts.index' => 1))->sort( array("posts.index" => -1));
I would prefer not to rely on the "index" and "currentIndex" fields; I only included them in case the "date" field, which holds timestamp values, turns out to be unusable.
The above code returns an array with only one element, which is itself an array that holds 2 elements:
{
"postId": {
"$oid": "52e12e5933a9fbec1100002d"
},
"date": 1390489177.267876,
"index": 1
},
{
"postId": {
"$oid": "52e12e5f33a9fb141800002c"
},
"date": 1390489183.277084,
"index": 2
}
Isn't there an aggregation function that filters inside the array? I could not find any aggregation that returns the minimum value. I only found ways to return values between min and max values that must be given by the user, which does not work in my case.
Are the embedded objects in your posts array field always ordered by their date and index, ascending? Based on your schema, I would assume that new posts are added to the document via the $push update operator.
If the element to remove is always at the front of the array, you could very easily use $pop to remove it.
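A sketch of the update documents involved, in plain JavaScript. The $pop update follows the suggestion above. The fallback is an assumption that goes beyond it: when the array order is not guaranteed, the oldest post is picked in application code and removed with $pull by its postId. The helper name removeOldestPost is hypothetical, and the postId values are shown as plain strings for brevity.

```javascript
// $pop with -1 removes the first array element (1 would remove the last).
const removeFirstPost = { $pop: { posts: -1 } };

// Hypothetical fallback: find the post with the smallest date and build a
// $pull update that removes it by postId.
function removeOldestPost(page) {
  if (!page.posts || page.posts.length === 0) return null;
  const oldest = page.posts.reduce((min, post) =>
    post.date < min.date ? post : min
  );
  return { $pull: { posts: { postId: oldest.postId } } };
}

const page = {
  skilltag: "Bilardo",
  posts: [
    { postId: "52e12e5933a9fbec1100002d", date: 1390489177.267876, index: 1 },
    { postId: "52e12e5f33a9fb141800002c", date: 1390489183.277084, index: 2 },
  ],
};

const update = removeOldestPost(page);
console.log(JSON.stringify(removeFirstPost), JSON.stringify(update));
```

Either document would be passed as the update argument of an update call filtered on the page, for example on skilltag.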

How to conditionally find items by Tag in mongodb

I am new to NoSQL and MongoDB, and I am a little puzzled about what types of queries I can do and how to do them. My knowledge is limited to simpler queries.
I would like to run what I think is a complicated query within MongoDB instead of using PHP to sort the results, but I do not know if it is possible or how to do it.
I have a tag field within my collection that is an array: {tag: ["blue","red","yellow","green","violet"]}.
First-level problem: let's say I want to find all birds that have the tags blue, yellow and green, where blue is a must-have tag and the other colours are optional.
Second-level problem: I would then like to order the results so that the birds that have all the queried colours appear first.
Is it possible to create this query in MongoDB? And if it is, how could I do it?
You can use the aggregation framework. For the following dataset:
{ "_id":ObjectId(...), "bird":1, "tags":["blue","red","yellow","green","violet"]}
{ "_id":ObjectId(...), "bird":2, "tags":["red","yellow","green","violet"] }
{ "_id":ObjectId(...), "bird":3, "tags":["blue","yellow","violet"] }
{ "_id":ObjectId(...), "bird":4, "tags":["blue","yellow","red","violet"] }
{ "_id":ObjectId(...), "bird":5, "tags":["blue"] }
we can apply the following query:
colors = ["blue","red","yellow","green"];
db.birds.aggregate([
{ $match: {tags: 'blue'} },
{ $project: {_id:0, bird:1, tags:1} },
{ $unwind: '$tags' },
{ $match: {tags: {$in: colors}} },
{ $group: {_id:'$bird', score: {$sum:1}} },
{ $sort: {score:-1} },
{ $project: {bird:'$_id', score:1, _id:0} }
])
and get a result like this:
{
"result" : [
{ "score" : 4, "bird" : 1 },
{ "score" : 3, "bird" : 4 },
{ "score" : 2, "bird" : 3 },
{ "score" : 1, "bird" : 5 }
],
"ok" : 1
}
Most of this you will have to do in your application. In order to find all documents where a bird has the tag "blue", you can do this:
db.collection.find( { tag: "blue" } );
Which colours are optional doesn't matter, as you have to find by the required tag anyway.
After finding them, you need to sort them. But sorting the way you want (by their 3 colours) is not something you can do in MongoDB; you will have to do it in PHP instead.
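For comparison, the application-side sort can be sketched in a few lines. The example is written in JavaScript rather than PHP, rankBirds is a hypothetical helper, and the sample data is the dataset from the aggregation answer above.

```javascript
// Keep birds that have the required tag, score each one by how many of the
// queried colours it has, and sort by that score in descending order.
function rankBirds(birds, required, colors) {
  return birds
    .filter((b) => b.tags.includes(required))
    .map((b) => ({
      bird: b.bird,
      score: b.tags.filter((tag) => colors.includes(tag)).length,
    }))
    .sort((a, b) => b.score - a.score);
}

const birds = [
  { bird: 1, tags: ["blue", "red", "yellow", "green", "violet"] },
  { bird: 2, tags: ["red", "yellow", "green", "violet"] },
  { bird: 3, tags: ["blue", "yellow", "violet"] },
  { bird: 4, tags: ["blue", "yellow", "red", "violet"] },
  { bird: 5, tags: ["blue"] },
];

const ranked = rankBirds(birds, "blue", ["blue", "red", "yellow", "green"]);
console.log(ranked);
```

On this dataset the order is birds 1, 4, 3, 5 with scores 4, 3, 2, 1, the same as the aggregation result.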
