Mongodb and sorting sub array

Not sure if this can be done, so thought I would ask.
I have the following MongoDB documents:
{
"store":"abc",
"offers":[{
"spend":"100.00",
"cashback":"10.00",
"percentage":"0.10"
},{
"spend":"50.00",
"cashback":"5.00",
"percentage":"0.10"
}]
}
and
{
"store":"def",
"offers":[{
"spend":"50.00",
"cashback":"2.50",
"percentage":"0.05"
},{
"spend":"20.00",
"cashback":"1.00",
"percentage":"0.05"
}]
}
and
{
"store":"ghi",
"offers":[{
"spend":"50.00",
"cashback":"5.00",
"percentage":"0.10"
},{
"spend":"20.00",
"cashback":"2.00",
"percentage":"0.10"
}]
}
The sort needs to be by percentage.
I am not sure if I would have to use usort or another PHP function to do it, or if MongoDB is smart enough to do what I want.

Amazingly, yes, mongodb can do this:
// Sort ascending, by minimum percentage value in the docs' offers array.
db.collection.find({}).sort({ 'offers.percentage': 1 });
// Sort descending, by maximum percentage value in the docs' offers array.
db.collection.find({}).sort({ 'offers.percentage': -1 });
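To make the rule in those comments concrete, here is a minimal in-memory sketch in plain JavaScript. It does not talk to MongoDB; it only imitates the rule that an ascending sort uses each document's smallest array value and a descending sort uses the largest. The helper name sortByOfferPercentage is made up for illustration. One assumption to flag: the sketch parses the percentages as numbers, whereas the question stores them as strings, so MongoDB would compare them as strings (the order is the same for these equally padded values).

```javascript
// Sample documents reduced to the fields that matter for the sort.
const docs = [
  { store: 'abc', offers: [{ percentage: '0.10' }, { percentage: '0.10' }] },
  { store: 'def', offers: [{ percentage: '0.05' }, { percentage: '0.05' }] },
  { store: 'ghi', offers: [{ percentage: '0.10' }, { percentage: '0.10' }] },
];

// Hypothetical helper: direction is 1 (ascending) or -1 (descending).
function sortByOfferPercentage(documents, direction) {
  // Ascending uses the minimum array value, descending uses the maximum.
  const key = (doc) => {
    const values = doc.offers.map((offer) => parseFloat(offer.percentage));
    return direction === 1 ? Math.min(...values) : Math.max(...values);
  };
  // Copy first so the input array is left untouched.
  return [...documents].sort((a, b) => direction * (key(a) - key(b)));
}

const ascending = sortByOfferPercentage(docs, 1).map((d) => d.store);
const descending = sortByOfferPercentage(docs, -1).map((d) => d.store);
console.log(ascending);
console.log(descending);
```

With the sample data, "def" comes first in the ascending result and last in the descending one.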

Given your data structure of arrays within documents, I don't think it makes sense to do this sort in MongoDB -- Mongo will be returning entire documents (not arrays).
If you are trying to compare offers it would probably make more sense to have a separate collection instead of an embedded array. For example, you could then find offers matching a cashback of at least $5 sorted by spend or percentage discount.
If you are just trying to order the offers within a single document, you could do this in PHP with a usort().
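For the last case, ordering the offers inside one document on the client, the idea is the same as PHP's usort(). Here is a rough sketch in JavaScript, using the "abc" document from the question. The helper name sortOffers is hypothetical, and it assumes the stored strings should be compared as numbers.

```javascript
const storeDoc = {
  store: 'abc',
  offers: [
    { spend: '100.00', cashback: '10.00', percentage: '0.10' },
    { spend: '50.00', cashback: '5.00', percentage: '0.10' },
  ],
};

// Hypothetical helper: returns a copy of the document with its offers
// sorted ascending by the given field, comparing the values as numbers.
function sortOffers(doc, field) {
  const offers = [...doc.offers].sort(
    (a, b) => parseFloat(a[field]) - parseFloat(b[field])
  );
  return { ...doc, offers };
}

const bySpend = sortOffers(storeDoc, 'spend');
console.log(bySpend.offers.map((offer) => offer.spend));
```

Passing 'percentage' or 'cashback' instead of 'spend' sorts by that field.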

Related

Updating field in nested documents in mongodb php

I use MongoDB with PHP driver, so for convenience I will write the query with this syntax,
I would like to find a more elegant solution than the one I found today for the following problem.
I have this collection "Story" with nested document:
Collection Story:
{
"_id":"Story1",
"title":null,
"slug":null,
"sections":[
{
"id_section":"S1",
"index":0,
"type":"0",
"elements":[
{
"id_element":"001",
"text":"img",
"layout":1
}
]
},
{
"id_section":"S2",
"index":0,
"type":"0",
"elements":[
{
"id_element":"001",
"text":"hello world",
"layout":1
},
{
"id_element":"002",
"text":"default text",
"layout":1
},
{
"id_element":"003",
"text":"hello world 3",
"layout":"2"
}
]
}
]
}
Assume you want to change the text of the element with id_element => 002 in the section with id_section => S2 of the Story with _id => Story1.
The solution I've found so far is to find the "position" of element 002 and then do the following:
1]
$r=$m->db->story->findOne(array("_id" => 'Story1',
"sections.id_section"=>'S2'),
array('_id'=>false,'sections.$.elements'=>true));
2]
foreach($r['sections'][0]['elements'] as $key=>$value){
    if($value['id_element']=='002'){
        $position=$key;
        break;
    }
}
3]
$m->db->story->update(array('_id'=>'Story1','sections.id_section'=>'S2','sections.elements.id_element'=>'002'),
array('$set'=>array('sections.$.elements.'.$position.'.text'=>'NEW TEXT')),
array('w'=>1));
I repeat that I do not think this is an elegant solution, and I have noticed that it is a common problem.
Thank you for your help
S.

You can't use $ to match multiple levels of nested arrays. This is why it's not a good idea to nest arrays in MongoDB if you anticipate searching on properties anywhere deeper than the top level array. The alternatives for a fixed document structure are to know which positions in all but one of the arrays you want to update at (or to retrieve the document and find out the indexes, as you are doing) or to retrieve the document, update it in the client, and reinsert it.
The other option is to rethink how the data is modeled as documents in MongoDB so that nested arrays don't happen. In your case, a story is a collection of sections which are collections of elements. Instead of making a story document, you could have a story be represented by multiple section documents. The section documents would share some common field value to indicate they belong to the same story. The above update would then be possible as an update on one section document using the $ positional operator to match and update the correct element.
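As a sketch of the "retrieve the document and find out the indexes" approach, the plain JavaScript below works on an in-memory copy of the Story document and builds a $set update that uses explicit array positions for both levels. The function name buildElementUpdate is made up for illustration. Note the assumption that the array positions do not change between reading the document and sending the update.

```javascript
// In-memory copy of the Story document, trimmed to the relevant fields.
const story = {
  _id: 'Story1',
  sections: [
    { id_section: 'S1', elements: [{ id_element: '001', text: 'img' }] },
    {
      id_section: 'S2',
      elements: [
        { id_element: '001', text: 'hello world' },
        { id_element: '002', text: 'default text' },
        { id_element: '003', text: 'hello world 3' },
      ],
    },
  ],
};

// Hypothetical helper: finds both array positions and returns an update
// document that uses dot notation with explicit indexes, or null if the
// section or the element is not found.
function buildElementUpdate(doc, sectionId, elementId, newText) {
  const s = doc.sections.findIndex((sec) => sec.id_section === sectionId);
  if (s === -1) return null;
  const e = doc.sections[s].elements.findIndex(
    (el) => el.id_element === elementId
  );
  if (e === -1) return null;
  return { $set: { [`sections.${s}.elements.${e}.text`]: newText } };
}

const update = buildElementUpdate(story, 'S2', '002', 'NEW TEXT');
console.log(JSON.stringify(update));
```

The resulting object can be passed as the update argument together with a filter on _id.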

Mongo DB MapReduce in PHP

First of all it's my first time in Mongo...
Concept:
A user is able to describe an image in natural language.
Divide the user input and store the words he described in a Collection called
words.
Users must be able to go through the most used words and add those words to their description.
The system will use the most used words (for all users) and use
those words to describe the image.
My words document (currently) is as follows (example)
{
"date": "date it was inserted",
"reported": 0,
"image_id": "image id",
"image_name": "image name",
"user": "user _id",
"word": "awesome"
}
The words will be duplicated so that each word can be associated to a user...
Problem: I need to perform a Mongo query to allow me to know the most used words (to describe an image) that were not created by a given user. (to meet point 3. above)
I've seen MapReduce algorithm, but from what I read there are a couple of issues with it:
Can't sort results (I can't order them from most used to least used)
With millions of documents it can take a long time to process.
Can't limit the number of results returned
I've thought about running a task at a given time each day to store in a document (in a different collection) the ranked list of words that a given user hasn't used to describe the given image. I would have to limit this to 300 results or something (any idea on a proper limit?). Something like:
{
user_id: "the user id",
words: [
{word: "test", count: 1000},
{word: "test2", count: 980},
{word: "etc", count: 300}
]
}
Problems I see with this solution are:
Results would have quite a delay which is not desirable.
Server load can spike while generating these documents for all users (I actually know very little about this in Mongo, so this is just an assumption)
Maybe my approach doesn't make any sense... And maybe my lack of experience in Mongo is pointing me at the wrong "schema design".
Any idea of what could be a good approach for this kind of problem?
Sorry for the big post and thanks for your time and help!
João

As already mentioned you could use the group command which is easy to use, but you will need to sort the result on the client side. Also the result is returned as a single BSON object and for this reason must be fairly small – less than 10,000 keys, else you will get an exception.
Code example based on your data structure:
db.words.group({
key : {"word" : true},
initial: {count : 0},
reduce: function(obj, prev) { prev.count++},
cond: {"user" :{ $ne : "USERNAME_TO_IGNORE"}}
})
Another option is to use the new Aggregation framework, which will be released in version 2.2. Each stage has to be its own object in the pipeline. Something like this should work:
db.words.aggregate([
{ $match : { "user" : { "$ne" : "USERNAME_TO_IGNORE"} } },
{ $group : {
_id : "$word",
count: { $sum : 1}
} }
])
Or you can still use MapReduce. Actually you can limit and sort the output, because the result is a collection. Just use .sort() and .limit() on the output. You can also use the incremental map-reduce output option, which will help you solve your performance issues. Have a look at the out parameter in MapReduce.
Below is an example that uses the incremental feature to merge the existing collection with new data in a words_usage collection:
m = function() {
emit(this.word, {count: 1});
};
r = function( key , values ){
var sum = 0;
values.forEach(function(doc) {
sum += doc.count;
});
return {count: sum};
};
db.runCommand({
mapreduce : "words",
map : m,
reduce : r,
out : { reduce: "words_usage"},
query : <query filter object>
})
// retrieve the top 10 words
db.words_usage.find().sort({"value.count" : -1}).limit(10)
I guess you can run the above MapReduce command from cron every few minutes or hours, depending on how accurate you need the results to be. For the query criteria of each incremental update, you can use the creation date of the words documents.
Once you have the system top words collection you can build per user top words or just compute them in real time (depends on the system size).
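To show the overall flow (filter out one user, count per word, sort, limit) without a database, here is a small plain JavaScript sketch. It is not MongoDB code; the sample data and the name topWords are invented for illustration. The counting loop plays the role of the map and reduce functions above.

```javascript
// Invented sample rows shaped like the words documents in the question.
const words = [
  { user: 'u1', word: 'awesome' },
  { user: 'u2', word: 'awesome' },
  { user: 'u3', word: 'awesome' },
  { user: 'u4', word: 'awesome' },
  { user: 'u2', word: 'sunny' },
  { user: 'u3', word: 'sunny' },
  { user: 'u3', word: 'beach' },
];

// Hypothetical helper: most used words not created by excludedUser.
function topWords(docs, excludedUser, limit) {
  const counts = new Map();
  for (const doc of docs) {
    if (doc.user === excludedUser) continue; // the query filter
    // Equivalent of emit(word, 1) followed by summing in reduce.
    counts.set(doc.word, (counts.get(doc.word) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([word, count]) => ({ word, count }))
    .sort((a, b) => b.count - a.count) // most used first
    .slice(0, limit); // keep only the top entries
}

const top = topWords(words, 'u1', 2);
console.log(top);
```

With this data, excluding u1 leaves "awesome" with 3 uses and "sunny" with 2, so those are the two entries returned.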

The group function is supposed to be a simpler version of MapReduce. You could use it like this to get a sum for each word:
db.coll.group(
{key: { a:true, b:true },
cond: { active:1 },
reduce: function(obj,prev) { prev.csum += obj.c; },
initial: { csum: 0 }
});

Mongodb like statement with array

I am trying to save some db action by replacing a looped bit of code with a single query. Before, I was simply adding to the like statements in a loop before firing off the query, but I can't get the same idea going in Mongo. I'd appreciate any ideas.
I am basically trying to do a like, but with the value as an array
('app' replaces 'mongodb' due to my CI setup)
Here's how I was doing it pre mongofication:
foreach ($workids as $workid):
$this->ci->app->or_like('work',$workid) ;
endforeach;
$query = $this->ci->db->get("who_users");
$results = $query->result();
print_r($results);
And this is how I was hoping I could get it to work, but no joy here: that function is only designed to accept strings.
$query = $this->ci->app->like('work',$workids,'.',TRUE,TRUE)->get("who_users");
print_r($query);
If anyone can think of a cunning way to get my returned array with a single call again, it would be great. I've not found any documentation on this sort of query. The only way I can think of is to loop over the query and push the results into a new array, but that is really going to hurt if my app scales up.

Are you using codeigniter-mongodb-library? Based on the existing or_like() documentation, it looks like CI wraps each match with % wildcards. The equivalent query in Mongo would be a series of regex matches in an $or clause:
db.who_users.find({
$or: [
{ work: /.*workIdA.*/ },
{ work: /.*workIdB.*/ },
...
]});
Unfortunately, this is going to be quite inefficient unless (1) the work field is indexed and (2) your regexes are anchored with some constant value (e.g. /^workId.*/). This is described in more detail in Mongo's regex documentation.
Based on your comments to the OP, it looks like you're storing multiple IDs in the work field as a comma-delimited string. To take advantage of Mongo's schema, you should model this as an array of strings. Thereafter, when you query on the work field, Mongo will consider all values in the array (discussed here).
db.who_users.find({
work: "workIdA"
});
This query would match a record whose work value was ["workIdA", "workIdB"]. And if we need to search for one of a set of ID's (taking this back to your OR query), we can extend this example with the $in operator:
db.who_users.find({
work: { $in: ["workIdA", "workIdB", ...] }
});
If that meets your needs, be sure to index the work field as well.
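If the list of IDs comes from a variable, both query shapes can be built from an array. The sketch below is plain JavaScript that only constructs the filter objects; the names escapeRegex, buildLikeQuery and buildInQuery are hypothetical. Escaping the IDs before turning them into regexes is an added precaution, in case an ID contains characters that are special in a regex.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: "LIKE %id%" for every id, combined with $or.
function buildLikeQuery(field, ids) {
  return {
    $or: ids.map((id) => ({ [field]: new RegExp(escapeRegex(id)) })),
  };
}

// Hypothetical helper: exact match on any id, for an array-valued field.
function buildInQuery(field, ids) {
  return { [field]: { $in: ids } };
}

const likeQuery = buildLikeQuery('work', ['workIdA', 'work.IdB']);
const inQuery = buildInQuery('work', ['workIdA', 'workIdB']);
console.log(likeQuery.$or.map((clause) => clause.work.source));
console.log(JSON.stringify(inQuery));
```

The $in version is the one that can use an index on the work field once the IDs are stored as an array.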

MongoDB - Is it possible to query by associative array key?

I need to store some data that is essentially just an array of key-value pairs of date/ints, where the dates will always be unique.
I'd like to be able to store it like an associative array:
array(
"2012-02-26" => 5,
"2012-02-27" => 2,
"2012-02-28" => 17,
"2012-02-29" => 4
)
but I also need to be able to query the dates (ie. get everything where date > 2012-02-27), and so suspect that I'll need to use a schema more like:
array(
array("date"=>"2012-02-26", "value"=>5),
array("date"=>"2012-02-27", "value"=>2),
array("date"=>"2012-02-28", "value"=>17),
array("date"=>"2012-02-29", "value"=>4),
)
Obviously the former is much cleaner and more concise, but will I be able to query it in the way that I am wanting, and if not are there any other schemas that may be more suitable?

You've described two methods, let me break them down.
Method #1 - Associative Array
The key tool for querying by "associative array" is the $exists operator. Here are details on the operator.
So you can definitely run a query like the following:
db.coll.find( { 'field.2012-02-27': { $exists: true } } );
Based on your description you are looking for range queries, which do not match up well with the $exists operator. The "associative array" version is also difficult to index.
Method #2 - Array of objects
This definitely has better querying functionality:
db.coll.find( { 'field.date': { $gt: '2012-02-27' } } );
It can also be indexed
db.coll.ensureIndex( { 'field.date': 1 } );
However, there is a trade-off on updating. If you want to increment the value for a specific date you have to use this unwieldy $ positional operator. This works for an array of objects, but it fails for anything with further nesting.
Other issues
One issue with either of these methods is the long-term growth of data. As you expand the object size it will take more space on disk and in memory. If you have an object with two years worth of data that entire array of 700 items will need to be in memory for you to update data for today. This may not be an issue for your specific data, but it should be considered.
In the same vein, MongoDB queries always return the top-level object. Again, if you have an array of 700 items, you will get all of them for each document that matches. There are ways to filter out the fields that are returned, but they don't work for "arrays of objects".
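For reference, converting the first layout into the second is mechanical. The plain JavaScript sketch below does the conversion and then filters the entries by date on the client, which is one way to get only the matching array elements, since the query itself returns whole documents. The names toEntries and entriesAfter are made up. It relies on the dates being ISO-style strings (YYYY-MM-DD), which compare correctly as plain strings.

```javascript
// The "associative array" layout from the question.
const byDate = {
  '2012-02-26': 5,
  '2012-02-27': 2,
  '2012-02-28': 17,
  '2012-02-29': 4,
};

// Hypothetical helper: turn { date: value } into [{ date, value }, ...],
// ordered by date.
function toEntries(map) {
  return Object.keys(map)
    .sort()
    .map((date) => ({ date, value: map[date] }));
}

// Hypothetical helper: keep only entries strictly after the given date.
function entriesAfter(entries, date) {
  return entries.filter((entry) => entry.date > date);
}

const entries = toEntries(byDate);
const recent = entriesAfter(entries, '2012-02-27');
console.log(recent);
```

Here recent holds the entries for 2012-02-28 and 2012-02-29.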

Can MongoDB and its drivers preserve the ordering of document elements

I am considering using MongoDB to store documents that include a list of key/value pairs. The safe but ugly and bloated way to store this is as
[ ['k1' : 'v1'] , ['k2' : 'v2'], ...]
But document elements are inherently ordered within the underlying BSON data structure, so in principle:
{k1 : 'v1',
k2 : 'v2', ...}
should be enough. However I expect most language bindings will interpret these as associative arrays, and thus potentially scramble the ordering. So what I need to know is:
Does MongoDB itself promise to preserve the item ordering of the second form?
Do language bindings have some API which can extract it in ordered form, even if the usual "convenient" API returns an associative array?
I am mostly interested in Javascript and PHP here, but I would also like to know about other languages. Any help is appreciated, or just a link to some documentation where I can go RTM.

From version 2.6 on, MongoDB preserves the order of fields where possible. However, the _id field always comes first, and renaming fields can lead to re-ordering. I'd generally try not to rely on details like this. As the original question mentions, there are also additional layers to consider, each of which must provide some sort of guarantee for the stability of the order...
Original Answer:
No, MongoDB does not make guarantees about the ordering of fields:
"There is no guarantee that the field order will be consistent, or the same, after an update."
In particular, in-place updates that change the document size will usually change the ordering of fields. For example, if you $set a field whose old value was of type number and the new value is NumberLong, fields usually get re-ordered.
However, arrays preserve ordering correctly:
[ {'key1' : 'value1'}, {'key2' : 'value2'}, ... ]
I don't see why this is "ugly" and "bloated" at all. Storing a list of complex objects couldn't be easier. However, abusing objects as lists is definitely ugly: Objects have associative array semantics (i.e. there can only be one field of a given name), while lists/arrays don't:
// not ok:
db.foo2.insert({"foo" : "bar", "foo" : "lala" });
db.foo2.find();
{ "_id" : ObjectId("4ef09cd9b37bc3cdb0e7fb26"), "foo" : "lala" }
// a list can do that
db.foo2.insert({ 'array' : [ {'foo' : 'bar'}, { 'foo' : 'lala' } ]});
db.foo2.find();
{ "_id" : ObjectId("4ef09e01b37bc3cdb0e7fb27"), "array" :
[ { "foo" : "bar" }, { "foo" : "lala" } ] }
Keep in mind that MongoDB is an object database, not a key/value store.
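Since the question asks about JavaScript, here is a small sketch of how plain JavaScript objects behave on the client side, independent of MongoDB or any driver. It shows two things: a repeated key in an object literal keeps only the last value, and keys that look like non-negative integers are listed before other keys regardless of insertion order. An array of single-pair objects avoids both effects.

```javascript
// A repeated key in an object literal: the last value wins.
const asObject = { foo: 'bar', foo: 'lala' };

// Integer-like keys are enumerated first, in ascending numeric order;
// the remaining string keys follow in insertion order.
const mixedKeys = { b: 1, 2: 1, a: 1, 1: 1 };

// An array keeps every pair, and keeps them in the order given.
const asList = [{ foo: 'bar' }, { foo: 'lala' }, { 2: 'x' }, { 1: 'y' }];

console.log(asObject);
console.log(Object.keys(mixedKeys));
console.log(asList.map((pair) => Object.keys(pair)[0]));
```

So if keys can be numeric strings or can repeat, the list form is the safer choice on the JavaScript side as well.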

As of Mongo 2.6.1, it DOES keep the order of your fields:
MongoDB preserves the order of the document fields following write operations except for the following cases:
The _id field is always the first field in the document.
Updates that include renaming of field names may result in the reordering of fields in the document.
http://docs.mongodb.org/manual/release-notes/2.6/#insert-and-update-improvements

One of the pain points of this is comparing documents to one another in the shell.
I've created a project that creates a custom mongorc.js which sorts the document keys by default for you when they are printed out so at least you can see what is going on clearly in the shell. It's called Mongo Hacker if you want to give it a whirl.

Though it's true that, as of Mongo 2.6.1, it does preserve order, one should still be careful with update operations.
mattwad makes the point that updates can reorder things, but there's at least one other concern I can think of.
For example $addToSet:
https://docs.mongodb.com/manual/reference/operator/update/addToSet/
$addToSet when used on embedded documents in an array is discussed / exemplified here:
https://stackoverflow.com/a/21578556/3643190
In the post, mnemosyn explains how $addToSet matches elements with a deep value-by-value comparison. According to the $addToSet documentation linked above, field order is part of that comparison.
($addToSet only adds records when they're unique)
This is relevant if one decided to structure data like this:
[{key1: v1, key2: v2}, {key1: v3, key2: v4}]
With an update like this (notice the different order on the embedded doc):
db.collection.update({_id: "id"},{$addToSet: {field:
{key2: v2, key1: v1}
}});
Mongo will not see this as a duplicate, because the field order differs, and will add the object to the array even though the same key/value pairs are already there.
