Elasticsearch query date sorting parent-child relation (recurring events) - php

I’m currently working on an app where we are handling events.
So, in Elasticsearch, we have a document type named Event.
Previously, we only had one kind of event (a unique event happening, say, on 13 May from 9 AM to 11 AM), so the sorting was simple: sort by start_date with an order.
We recently added a feature that allows us to create recurring events, which means we now have two levels inside Elasticsearch (a parent-child relation).
For example, we can have a parent event running from 12 May at 2 PM to 14 May at 6 PM, with daily children linked to it: 12 May 2 PM-6 PM, 13 May 2 PM-6 PM, and 14 May 2 PM-6 PM.
The problem with the current sort is that when it is 12 May at 10 PM, the recurring event appears at the top of the list, followed by the unique event.
I'd like a sort where the nearest upcoming date has the highest priority; in that case, the unique event should have come first.
To make that happen, I have indexed the children nodes on the recurring event parent, so that the parent also carries its children's start_date values.
The idea would be to get the nearest upcoming date out of the children nodes for every recurring event, and sort on that together with the start_date of every unique event.
I do not have much experience with Elasticsearch, so I'm kind of stuck. I've seen a lot of information in the documentation (parent-child, nested objects, scripts, etc.), but I don't know how to handle this case.
I hope I have explained myself clearly; if you have any questions, feel free to ask, and I'll be happy to provide additional information.

For future googlers, here's how I fixed it.
I had to use a script-based sort; here's a partial example of the request I'm using:
GET /event/_search
{
  "query": {
    "match_all": {}
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "params": {
          "currentDate": 1560230000
        },
        "source": """
          def isRecurrenceParent = params._source.is_recurrence_parent;
          def countChildren = params._source.children.length;
          def currentDate = params.currentDate;
          if (isRecurrenceParent == false) {
            return params._source.timestamp;
          }
          def nearest = 0;
          def lowestDiff = currentDate;
          for (int i = 0; i < countChildren; i++) {
            def child = params._source.children[i];
            def diff = child.timestamp - currentDate;
            if (diff > 0 && diff < lowestDiff) {
              lowestDiff = diff;
              nearest = child.timestamp;
            }
          }
          return nearest;
        """
      },
      "order": "asc"
    }
  }
}
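For readers who want to sanity-check the script's logic outside Elasticsearch, here is a plain-Python restatement of it. The field names (is_recurrence_parent, children, timestamp) come from the documents described in the question; float("inf") stands in for the script's currentDate sentinel so the comparison intent is explicit.

```python
def sort_key(doc, current_date):
    """Timestamp to sort on: the event's own timestamp for unique events,
    or the nearest upcoming child's timestamp for recurring parents."""
    if not doc["is_recurrence_parent"]:
        return doc["timestamp"]
    nearest = 0  # parents with no upcoming occurrence sort first, as in the script
    lowest_diff = float("inf")
    for child in doc["children"]:
        diff = child["timestamp"] - current_date
        if 0 < diff < lowest_diff:
            lowest_diff = diff
            nearest = child["timestamp"]
    return nearest

events = [
    {"is_recurrence_parent": False, "timestamp": 150, "children": []},
    {"is_recurrence_parent": True, "timestamp": 100,
     "children": [{"timestamp": 120}, {"timestamp": 160}]},
]
# The parent started earlier (100), but its next occurrence is at 160,
# so the unique event (150) sorts first, which is the desired behaviour.
ordered = sorted(events, key=lambda d: sort_key(d, current_date=140))
```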

The first thing you should consider is that parent and child docs are saved separately. Parent-Event::1 and Child-Event::1 are saved on the same shard (ES routes a child to the shard where its parent lives, by the parent's ID hash), but they are separate documents. So you should fetch the parent and child documents with separate queries and sort each by date.
(You can issue the following queries from PHP if that works for you.)
P.S.: I had the same situation but had to implement it in Java, so I made an ES query builder (https://github.com/mashhur/java-elasticsearch-querybuilder) which also supports parent-child relationship queries; you can take a look at it for reference.
// search child events and sort by date
GET events/_search
{
  "query": {
    "has_parent": {
      "parent_type": "parent-event",
      "query": {
        "match_all": {}
      }
    }
  },
  "sort": [{ "start_date": { "order": "desc" } }]
}
// search parent events and sort by date
GET events/_search
{
  "query": {
    "has_child": {
      "type": "child-event",
      "query": {
        "match_all": {}
      }
    }
  },
  "sort": [{ "start_date": { "order": "desc" } }]
}
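If you fetch parents and children with two queries like that, the two sorted result sets still have to be combined client-side. A minimal sketch (the id and start_date fields are assumptions about the document shape; any language with a merge of sorted sequences works the same way):

```python
import heapq

# Each query returns its own list, already sorted by start_date descending;
# merge them client-side into a single newest-first list.
parents = [{"id": "p1", "start_date": 500}, {"id": "p2", "start_date": 300}]
children = [{"id": "c1", "start_date": 400}, {"id": "c2", "start_date": 200}]

merged = list(heapq.merge(parents, children,
                          key=lambda d: d["start_date"], reverse=True))
```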

Related

update value using nested element match in mongo [duplicate]

I have a document in MongoDB with a two-level-deep nested array of objects that I need to update, something like this:
{
  id: 1,
  items: [
    {
      id: 2,
      blocks: [
        {
          id: 3,
          txt: 'hello'
        }
      ]
    }
  ]
}
If there were only a one-level-deep array, I could use the positional operator to update objects in it, but for the second level the only option I've come up with is to use the positional operator together with the nested object's index, like this:
db.objects.update({'items.id': 2}, {'$set': {'items.$.blocks.0.txt': 'hi'}})
This approach works, but it seems dangerous to me: since I'm building a web service, the index has to come from the client, which could send, say, 100000 as the index, and that would force MongoDB to create an array with 100000 null entries.
Are there any other ways to update such nested objects where I can refer to an object's ID instead of its position, or ways to check whether the supplied index is out of bounds before using it in the query?
Here's the big question: do you need to leverage Mongo's $addToSet and $push operations? If you really plan to modify just individual items in the array, then you should probably build these arrays as objects.
Here's how I would structure this:
{
  id: 1,
  items: {
    "2": { "blocks": { "3": { txt: 'hello' } } },
    "5": { "blocks": { "1": { txt: 'foo' }, "2": { txt: 'bar' } } }
  }
}
This basically turns everything into JSON objects instead of arrays. You lose the ability to use $push and $addToSet, but I think this makes everything easier. For example, your update would look like this:
db.objects.update({'items.2': {$exists: true}}, {'$set': {'items.2.blocks.3.txt': 'hi'}})
You'll also notice that I've dropped the "id" fields. When you're nesting things like this, you can generally replace "id" with that number used as the key; the "id" concept is now implied.
This feature was added in MongoDB 3.6 as expressive array updates (arrayFilters):
db.objects.update(
  { id: 1 },
  { $set: { 'items.$[itm].blocks.$[blk].txt': "hi" } },
  { multi: false, arrayFilters: [{ 'itm.id': 2 }, { 'blk.id': 3 }] }
)
The ids you are using are linear numbers, so they have to come from somewhere, such as an additional field like 'max_idx' or similar. That means one lookup for the id and then the update. Using UUIDs/ObjectIds for the ids instead ensures they work with distributed CRUD as well.
Building on Gates' answer, I came up with this solution, which works with nested object arrays:
db.objects.updateOne({
    "items.id": 2
}, {
    $set: {
        "items.$.blocks.$[block].txt": "hi"
    }
}, {
    arrayFilters: [{
        "block.id": 3
    }]
});
MongoDB 3.6 added the all-positional operator $[], so if you know the id of the block that needs updating, you can do something like:
db.objects.update({'items.blocks.id': id_here}, {'$set': {'items.$[].blocks.$.txt': 'hi'}})
Or, combining $[] with a filtered positional operator:
db.col.update({ "items.blocks.id": 3 },
    { $set: { "items.$[].blocks.$[b].txt": "bonjour" } },
    { arrayFilters: [{ "b.id": 3 }] }
)
https://docs.mongodb.com/manual/reference/operator/update/positional-filtered/#update-nested-arrays-in-conjunction-with
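To see what the arrayFilters updates above actually do, here is the same logic restated in plain Python: visit every matching item, and inside it update only the blocks whose id passes the filter. The helper name set_block_txt is made up; the document shape is taken from the question.

```python
def set_block_txt(doc, item_id, block_id, txt):
    for item in doc["items"]:          # $[itm] with filter {'itm.id': item_id}
        if item["id"] != item_id:
            continue
        for block in item["blocks"]:   # $[blk] with filter {'blk.id': block_id}
            if block["id"] == block_id:
                block["txt"] = txt
    return doc

doc = {"id": 1, "items": [{"id": 2, "blocks": [{"id": 3, "txt": "hello"}]}]}
set_block_txt(doc, item_id=2, block_id=3, txt="hi")
```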
For pymongo users, the equivalent is find_one_and_update, which I searched a lot to find; note its array_filters parameter. I hope this is useful:
find_one_and_update(filter, update, projection=None, sort=None, return_document=ReturnDocument.BEFORE, array_filters=None, hint=None, session=None, **kwargs)
References and the pymongo documentation are linked in the comments.

Load specific relations in a nested eager loading in laravel

I have the following related tables:
tableA
- id
- value
tableB
- id
- tableA_id
- value
tableC
- id
- tableB_id
- value
tableD
- id
- tableC_id
- value
I normally use nested eager loading to get the tableA object from tableD, for example:
$table_d = TableD::with('TableC.TableB.TableA')->find($id);
And I get an object like this:
{
  "id": 1,
  "value": "value",
  "tableC_id": 1,
  "tablec": {
    "id": 1,
    "value": "value",
    "tableB_id": 1,
    "tableb": {
      "id": 1,
      "value": "value",
      "tableA_id": 1,
      "tablea": {
        "id": 1,
        "value": "value"
      }
    }
  }
}
What I want to achieve is to obtain only the tableD object with its related tableA object, without tableC and tableB in the final object, something like this:
{
  "id": 1,
  "value": "value",
  "tablea": {
    "id": 1,
    "value": "value"
  }
}
I tried adding this function in the model file of Table D:
public function TableA()
{
    return $this->belongsTo('App\Models\TableC', 'tableC_id')
        ->join('tableB', 'tableC.tableB_id', '=', 'tableB.id')
        ->join('tableA', 'tableB.tableA_id', '=', 'tableA.id')
        ->select('tableA.id', 'tableA.value');
}
but it does not work: when I run the following query, it returns some good objects and others with tablea = null:
$tables_d = TableD::with('TableA')->get();
Am I doing something wrong or is there another way to achieve what I want?
You may be able to skip a table with $this->hasManyThrough(), but depending on what you really want as future features, you may prefer multiple relations with whatever code suits your needs; query scopes can help as well.
One can generally use a has-many-through relationship for mapping tables when there are just two tables and a linking table between them. You have yet another join beyond that, so it won't really be much better than what you have currently.
Have you considered another mapping table from D to A directly, or a bit of denormalization? If you always need to load the data like this, you might benefit from duplicating a few foreign keys to save on the joins.
This really depends on your needs, and it is not 3NF (third normal form), maybe not even 2NF. But denormalization is like comma use: follow the rules generally, and break them for specific reasons; in this case, to reduce the number of required joins by duplicating an FK reference in a table.
https://laravel.com/docs/5.6/eloquent-relationships#has-many-through
You can try this: add a method to the TableD model:
public function table_a()
{
    return $this->TableC->TableB->TableA();
}
then use: TableD::with('table_a');

Calculate skip value for given record for sorted paging

I'm trying to calculate the skip value for a given record in a MongoDB collection using the PHP driver: taking a given record, find the index of that record within the entire collection. Is this possible?
Currently I'm selecting all records and manually looking up the record's index in the array of results.
This is called "forward paging", a concept you can use to efficiently page through results in a "forward" direction when using "sorted" results.
JavaScript logic is included (because it works in the shell), but it is not hard to translate.
The concept in general:
{ "_id": 1, "a": 3 },
{ "_id": 2, "a": 3 },
{ "_id": 3, "a": 3 },
{ "_id": 4, "a": 2 },
{ "_id": 5, "a": 1 },
{ "_id": 6, "a": 0 }
Consider those "already sorted" documents (for convenience) as an example of results we want to page through, two items per page.
In the first instance you do something like this:
var lastVal = null,
    lastSeen = [];

db.collection.find().sort({ "a": -1 }).limit(2).forEach(function(doc) {
    if ( lastVal != doc.a ) {
        lastSeen = [];
    }
    lastVal = doc.a;
    lastSeen.push( doc._id );
    // do something useful with each document matched
});
Those lastVal and lastSeen values are something you store in, say, a "session variable" that can be accessed on the next request in a web application, or in an equivalent place otherwise.
They should contain the very last value you were sorting on and the list of "unique" _id values seen while that value did not change. Hence:
lastVal = 3,
lastSeen = [1,2];
The point is that when the request for the "next page" comes around then you want to use those variables for something like this:
var lastVal = 3,
    lastSeen = [1,2];

db.collection.find({
    "_id": { "$nin": lastSeen },
    "a": { "$lte": lastVal }
}).sort({ "a": -1 }).limit(2).forEach(function(doc) {
    if ( lastVal != doc.a ) {
        lastSeen = [];
    }
    lastVal = doc.a;
    lastSeen.push( doc._id );
    // do something useful with each document matched
});
What that does is exclude all the _id values recorded in lastSeen from the list of results, and make sure that all results are "less than or equal to" (descending order) the lastVal recorded for the sort field "a".
This yields the next two results in the collection:
{ "_id": 3, "a": 3 },
{ "_id": 4, "a": 2 },
But after processing our values now look like this:
lastVal = 2,
lastSeen = [4];
So now the logic follows that you don't need to exclude the other _id values seen before, since you are only looking for values of "a" that are "less than or equal to" lastVal; and since only "one" _id value was seen at that value, only that one needs to be excluded.
This of course yields the next page on using the same code as just above:
{ "_id": 5, "a": 1 },
{ "_id": 6, "a": 0 }
That is the most efficient way to "forward page" through results in general, and it is particularly useful for efficient paging of "sorted" results.
If however you want to "jump" to page 20 or similar action at any stage then this is not for you. You are stuck with the traditional .skip() and .limit() approach to be able to do this by "page number" since there is no other rational way to "calculate" this.
So it all depends on how your application is implementing "paging" and what you can live with. The .skip() and .limit() approach suffers the performance of "skipping" and can be avoided by using the approach here.
On the other hand, if you want "jump to page" then "skipping" is your only real option unless you want to build a "cache" of results. But that's another issue entirely.
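The bookkeeping above can be simulated entirely in memory to convince yourself the exclusion logic pages cleanly. This is plain Python standing in for the shell, and next_page is a made-up helper; docs plays the role of the pre-sorted collection.

```python
def next_page(docs, last_val, last_seen, page_size=2):
    """Serve the next page from docs (pre-sorted by 'a' descending),
    mimicking the $nin/$lte query, then recompute last_val/last_seen."""
    page = []
    for doc in docs:
        if doc["_id"] in last_seen:            # "_id": { "$nin": lastSeen }
            continue
        if last_val is not None and doc["a"] > last_val:
            continue                            # "a": { "$lte": lastVal }
        page.append(doc)
        if len(page) == page_size:
            break
    for doc in page:                            # same forEach bookkeeping
        if doc["a"] != last_val:
            last_seen = []
        last_val = doc["a"]
        last_seen.append(doc["_id"])
    return page, last_val, last_seen

docs = [{"_id": i, "a": a} for i, a in [(1, 3), (2, 3), (3, 3), (4, 2), (5, 1), (6, 0)]]
p1, v, s = next_page(docs, None, [])   # ids 1, 2
p2, v, s = next_page(docs, v, s)       # ids 3, 4
p3, v, s = next_page(docs, v, s)       # ids 5, 6
```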

PHP MongoDB selecting fields by key (distinct)

I have a database with lottery games across the world.
This is what each document looks like:
Each game can appear under a different country_code or state_code if the country has states (Canada, USA).
Selecting all game_ids, and then all the countries and/or states each belongs to, is done like this:
// get all games
// $colCurrent = MongoCollection Object
$gamesRes = $colCurrent->distinct('game_id');
foreach ($gamesRes as $gameId) {
    $disCountries = $colCurrent->distinct('country_code', array('game_id' => $gameId));
    $disStates = $colCurrent->distinct('state_code', array('game_id' => $gameId));
}
I believe this is an inefficient way to do it, as it issues a lot of queries to the database.
I've tried the aggregate function, but it only selects one field, like distinct.
Can anyone help optimize this query?
Thanks a lot!
Depending on what you are trying to achieve and the size of your data set, there are a few different approaches you can take.
Some examples using the Aggregation Framework in the mongo shell (MongoDB 2.2+):
1) Find all games and for each game create the set of unique country_code and state_code values:
db.games.aggregate(
    { $group: {
        _id: { gameId: "$game_id" },
        countries: { $addToSet: "$country_code" },
        states: { $addToSet: "$state_code" }
    }}
)
2) Find all games, and group by the unique combination of gameId, country_code, and state_code including a count:
db.games.aggregate(
    { $group: {
        _id: {
            gameId: "$game_id",
            country_code: "$country_code",
            state_code: "$state_code"
        },
        total: { $sum: 1 }
    }}
)
In this second example, note that the _id used for grouping can include multiple fields.
If you don't want to group on all the documents in the collection, you could make these aggregations more efficient by starting with the $match operator to limit the pipeline to the data you need ($match can also take advantage of a suitable index).
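For intuition, the first $group/$addToSet pipeline amounts to a single pass over the documents, collecting a set of countries and a set of states per game. A plain-Python equivalent (group_games is a made-up name; a None state_code stands in for documents without states):

```python
from collections import defaultdict

def group_games(docs):
    """One pass: per game_id, accumulate the distinct country and state codes,
    mirroring $group with $addToSet."""
    games = defaultdict(lambda: {"countries": set(), "states": set()})
    for d in docs:
        g = games[d["game_id"]]
        g["countries"].add(d["country_code"])
        if d.get("state_code") is not None:
            g["states"].add(d["state_code"])
    return dict(games)

docs = [
    {"game_id": "lotto", "country_code": "US", "state_code": "NY"},
    {"game_id": "lotto", "country_code": "US", "state_code": "CA"},
    {"game_id": "lotto", "country_code": "CA", "state_code": "ON"},
    {"game_id": "euro", "country_code": "FR", "state_code": None},
]
result = group_games(docs)
```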
Assuming you mean the total number of distinct countries and the total number of distinct states for each game_name (assuming one name per game_id, and that the name is more readable; interchange them if needed).
Posting as mongo shell for general clarity; adapt to your driver and language as required:
db.lottery.aggregate([
    { $project: { country_code: 1, state_code: 1, game_name: 1 } },
    { $group: {
        _id: "$game_name",
        countries: { $addToSet: "$country_code" },
        states: { $addToSet: "$state_code" }
    }},
    { $unwind: "$countries" },
    { $group: { _id: { id: "$_id", states: "$states" }, country_count: { $sum: 1 } } },
    { $project: { _id: 0, game: "$_id.id", countries: "$country_count", states: "$_id.states" } },
    { $unwind: "$states" },
    { $group: { _id: { id: "$game", countries: "$countries" }, state_count: { $sum: 1 } } },
    { $project: { _id: 0, game: "$_id.id", countries: "$_id.countries", states: "$state_count" } },
    { $sort: { game: 1 } }
])
So there are a few fancy stages here:
- Project only the fields that are needed.
- Group on the game and push each country and state into an array of its own.
- Unwind the countries to get one record per country.
- Group a sum of the countries while retaining the game and the states array.
- Project (optional) to make the records appear more natural, since $group messes with _id.
- Unwind the states to get one record per state.
- Group a sum of the states while retaining the game and the countries count.
- Project into something more natural.
- Sort (optional) by whatever you like; in this case, the name of the game.
Phew! A reasonably hefty aggregate, but it does show a way to work out the problem.
DISCLAIMER:
I have made the huge assumption here that your data already makes some sense and that there are not multiple records of games per country and/or per state. The additional "I didn't do it" part is that your code did not discern states within countries, so "I didn't do it either" :-P
You can add in $group stages to do that though. Part of the fun of programming is learning and working out how to do things by yourself. So this should be a good place to start if not a perfect fit.
The reference is a really good place for learning how to apply all the operators used here. Apply one stage at a time ( data size permitting ) to get a good idea of what is going on in each step.

Elasticsearch - Create report filters using Bool (MUST & AND) DSL query

I am trying to create some report filters where the user can search for profiles using any fields on the report. For example: search for any profile with a firstname that starts with "ann" and a grade that starts with "vi", etc.
Here is the query I have written so far:
{
  from: 20,
  size: 20,
  query: {
    filtered: {
      query: {
        match_all: {}
      },
      filter: {
        bool: {
          must: [
            {
              prefix: {
                firstname: "ann"
              }
            },
            {
              prefix: {
                grade: "vi"
              }
            }
          ]
        }
      }
    }
  },
  sort: {
    grade: {
      order: "asc"
    }
  }
}
If I remove one child of must (in the bool filter), it works, but it doesn't return any results once I use more than one filter, and I need to be able to use any number of entries in there.
Also, if I use should instead of must, it works. I'm not sure if I'm misunderstanding the logic, but to my understanding, must should (in this case) return ONLY results whose firstname starts with "ann" and whose grade starts with "vi".
Such documents do exist, but this query just doesn't find them.
Am I missing something here?
Thanks
Since I cannot post comments yet, I'm answering with some assumptions.
First of all, I'm using ES version 0.90.2, and your query works fine for my inputs. However, depending on your input size and the platform where you ran the query, my answer may not be the right one.
Assumption: the number of documents in the index is less than 20.
I've added following inputs to my index:
'{"name": "ann", "grade": "vi"}'
'{"name": "ann", "grade": "ii"}'
'{"name": "johan", "grade": "vi"}'
'{"name": "johan", "grade": "ii"}'
My test query was the same as yours, and here is the result:
"hits" : {
  "total" : 2,
  "max_score" : null,
  "hits" : [ ] // <-- see this part is blank
}
As you can see, it reports two hits but lists none. That's because of the from: 20 part: the first 20 results are skipped, and there are only 2 matches. If you lower that value, you'll see results; if you want all results from the first one, just delete that part.
Note: if this is not the case, sorry for bothering :(
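In other words, from/size is just an offset into the matched results, so an offset past the end yields a non-zero total but an empty hits list. A tiny sketch of that behaviour (paginate is a made-up stand-in for Elasticsearch's result windowing):

```python
def paginate(hits, from_=0, size=10):
    """Mimic Elasticsearch result windowing: total counts every match,
    but only the [from_, from_ + size) slice is returned as hits."""
    return {"total": len(hits), "hits": hits[from_:from_ + size]}

matches = [{"name": "ann", "grade": "vi"}, {"name": "ann", "grade": "ii"}]
page = paginate(matches, from_=20, size=20)   # offset past the end: no hits
first = paginate(matches, from_=0, size=20)   # both hits returned
```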
