I have a pretty big MongoDB document that holds all kinds of data. I need to identify the fields that are of type array in a collection so I can remove them from the displayed fields in the grid that I will populate.
My current method consists of retrieving all the field names in the collection with the map-reduce job below, which was taken from the answer posted at "MongoDB Get names of all keys in collection":
mr = db.runCommand({
"mapreduce" : "Product",
"map" : function() {
for (var key in this) { emit(key, null); }
},
"reduce" : function(key, stuff) { return null; },
"out": "things" + "_keys"
})
db[mr.result].distinct("_id")
Then, for each of the fields, I run a query like this one:
db.Product.find( { $where : "Array.isArray(this.Orders)" } ).count()
If there's anything retrieved the field is considered an array.
I don't like that I need to run n+2 queries (n being the number of different fields in my collection), and I wouldn't like to hardcode the fields in the model; that would defeat the whole purpose of using MongoDB.
Is there a better way of doing this?
I made a couple of slight modifications to the code you provided above:
mr = db.runCommand({
"mapreduce" : "Product",
"map" : function() {
for (var key in this) {
if (Array.isArray(this[key])) {
emit(key, 1);
} else {
emit(key, 0);
}
}
},
"reduce" : function(key, stuff) { return Array.sum(stuff); },
"out": "Product" + "_keys"
})
Now, the mapper will emit a 1 for keys that contain arrays, and a 0 for any that do not. The reducer will sum these up, so that when you check your end result:
db[mr.result].find()
You will see your field names with the number of documents in which they contain Array values (and a 0 for any that are never arrays).
So this should give you which fields contain Array types with just the map-reduce job.
--
Just to see it with some data:
db.Product.insert({"a":[1,2,3], "c":[1,2]})
db.Product.insert({"a":1, "b":2})
db.Product.insert({"a":1, "c":[2,3]})
(now run the "mr =" code above)
db[mr.result].find()
{ "_id" : "_id", "value" : 0 }
{ "_id" : "a", "value" : 1 }
{ "_id" : "b", "value" : 0 }
{ "_id" : "c", "value" : 2 }
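For readers who want to check the counting logic without a server, here is a minimal in-memory sketch in plain JavaScript. It only mimics what the map and reduce functions above do; the names `products`, `mapDoc` and `countArrayFields` are made up for illustration, and the sample documents carry no `_id` because nothing is inserted into a collection.

```javascript
// In-memory stand-in for the Product collection (same three documents as above).
const products = [
  { a: [1, 2, 3], c: [1, 2] },
  { a: 1, b: 2 },
  { a: 1, c: [2, 3] },
];

// Same logic as the map function: emit 1 if the field holds an array, else 0.
function mapDoc(doc, emit) {
  for (const key of Object.keys(doc)) {
    emit(key, Array.isArray(doc[key]) ? 1 : 0);
  }
}

// Same logic as the reduce function: sum the emitted values per key.
function countArrayFields(docs) {
  const totals = {};
  for (const doc of docs) {
    mapDoc(doc, (key, value) => {
      totals[key] = (totals[key] || 0) + value;
    });
  }
  return totals;
}

const totals = countArrayFields(products);
// Fields to hide from the grid: every key whose count is above zero.
const arrayFields = Object.keys(totals).filter((k) => totals[k] > 0);
console.log(totals, arrayFields);
```

The same filter (count above zero) can be applied to the documents returned by db[mr.result].find() to get the list of fields to hide.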
I have a document in mongodb with 2 level deep nested array of objects that I need to update, something like this:
{
id: 1,
items: [
{
id: 2,
blocks: [
{
id: 3,
txt: 'hello'
}
]
}
]
}
If the array were only one level deep I could use the positional operator to update objects in it, but for the second level the only option I've come up with is to use the positional operator together with the nested object's index, like this:
db.objects.update({'items.id': 2}, {'$set': {'items.$.blocks.0.txt': 'hi'}})
This approach works, but it seems dangerous to me: I'm building a web service, and the index would come from the client, which could send, say, 100000 as the index and force MongoDB to create an array with 100000 entries filled with null values.
Are there any other ways to update such nested objects where I can refer to an object's ID instead of its position, or ways to check whether the supplied index is out of bounds before using it in the query?
Here's the big question: do you need to leverage Mongo's "$addToSet" and "$push" operations? If you really plan to modify just individual items in the array, then you should probably build these arrays as objects.
Here's how I would structure this:
{
id: 1,
items:
{
"2" : { "blocks" : { "3" : { txt : 'hello' } } },
"5" : { "blocks" : { "1" : { txt : 'foo'}, "2" : { txt : 'bar'} } }
}
}
This basically transforms everything into JSON objects instead of arrays. You lose the ability to use $push and $addToSet, but I think this makes everything easier. For example, your query would look like this:
db.objects.update({'items.2': {$exists:true} }, {'$set': {'items.2.blocks.3.txt': 'hi'}})
You'll also notice that I've dumped the "IDs". When you're nesting things like this you can generally replace "ID" with simply using that number as an index. The "ID" concept is now implied.
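As a rough sketch of how an existing document could be converted to this shape, the plain JavaScript below re-keys the arrays by id. The helper name `keyById` is hypothetical, and the code assumes every item and block has a unique `id` within its parent.

```javascript
// Hypothetical helper: re-key the "items" and "blocks" arrays by their id.
function keyById(doc) {
  const items = {};
  for (const item of doc.items) {
    const blocks = {};
    for (const block of item.blocks) {
      const { id, ...rest } = block; // the id becomes the key, the rest is kept
      blocks[id] = rest;
    }
    items[item.id] = { blocks };
  }
  return { id: doc.id, items };
}

const original = {
  id: 1,
  items: [{ id: 2, blocks: [{ id: 3, txt: "hello" }] }],
};
const keyed = keyById(original);
console.log(JSON.stringify(keyed));
// With ids as keys, a block is addressed by path, e.g. "items.2.blocks.3.txt".
```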
This feature was added in MongoDB 3.6 with the filtered positional operator $[<identifier>] and the arrayFilters option.
db.objects.update( {id: 1 }, { $set: { 'items.$[itm].blocks.$[blk].txt': "hi", } }, { multi: false, arrayFilters: [ { 'itm.id': 2 }, { 'blk.id': 3} ] } )
The ids you are using are sequential numbers, so the next one has to come from somewhere, such as an additional field like 'max_idx' or something similar.
This means one lookup for the id and then an update. UUIDs or ObjectIds can be used for the ids instead, which will ensure that you can use distributed CRUD as well.
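To make the effect of the two array filters concrete, here is an in-memory illustration in plain JavaScript. It is not how MongoDB implements the update; the function name `setBlockTxt` is made up, and the sketch only mirrors which elements 'items.$[itm].blocks.$[blk].txt' would touch for a given pair of ids.

```javascript
// In-memory illustration only: mimics the effect of
// 'items.$[itm].blocks.$[blk].txt' with arrayFilters on a plain object.
function setBlockTxt(doc, itemId, blockId, txt) {
  let modified = 0;
  for (const itm of doc.items) {
    if (itm.id !== itemId) continue; // arrayFilters: { 'itm.id': itemId }
    for (const blk of itm.blocks) {
      if (blk.id !== blockId) continue; // arrayFilters: { 'blk.id': blockId }
      blk.txt = txt;
      modified += 1;
    }
  }
  return modified; // 0 means no element matched, so nothing was changed or created
}

const doc = {
  id: 1,
  items: [{ id: 2, blocks: [{ id: 3, txt: "hello" }] }],
};
const changed = setBlockTxt(doc, 2, 3, "hi");
const missed = setBlockTxt(doc, 2, 100000, "oops");
console.log(changed, missed, JSON.stringify(doc));
```

Because elements are selected by id rather than by index, an id that does not exist simply matches nothing, which avoids the padding problem described in the question.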
Building on Gates' answer, I came up with this solution which works with nested object arrays:
db.objects.updateOne({
["items.id"]: 2
}, {
$set: {
"items.$.blocks.$[block].txt": "hi",
},
}, {
arrayFilters: [{
"block.id": 3,
}],
});
MongoDB 3.6 added the all positional operator $[], so if you know the id of the block that needs updating, you can do something like:
db.objects.update({'items.blocks.id': id_here}, {'$set': {'items.$[].blocks.$.txt': 'hi'}})
db.col.update({"items.blocks.id": 3},
{ $set: {"items.$[].blocks.$[b].txt": "bonjour"}},
{ arrayFilters: [{"b.id": 3}] }
)
https://docs.mongodb.com/manual/reference/operator/update/positional-filtered/#update-nested-arrays-in-conjunction-with
This is the pymongo function find_one_and_update. I searched a lot to find the pymongo function, so I hope this will be useful:
find_one_and_update(filter, update, projection=None, sort=None, return_document=ReturnDocument.BEFORE, array_filters=None, hint=None, session=None, **kwargs)
Added reference and pymongo documentation in comments
I've got a problem updating an array element in MongoDB. This is the structure of a document:
{
"_id" : ObjectId("57e2645e11c979157400046e"),
"site" : "BLABLA",
"timestamp_hour" : 1473343200,
"values" : [
{
"1473343200" : 66
},
{
"1473344100" : 230
},
{
"1473345000" : 479
},
{
"1473345900" : 139
}
]
}
Now I want to update the element with key "1473345900". How can I do this? I've tried:
db.COLLECTIONNAME.update({"values.1473345900": {$exists:true}}, {$set: {"values.$": 0}})
But after that the document looks like:
{
"_id" : ObjectId("57e2645e11c979157400046e"),
"site" : "BLABLA",
"timestamp_hour" : 1473343200,
"values" : [
{
"1473343200" : 66
},
{
"1473344100" : 230
},
{
"1473345000" : 479
},
0
]
}
What am I doing wrong? I only want to update the value of 1473345900 to some other value... I don't want to replace the complete element...
Thanks a lot!!!
You need to add an additional query in your update that matches the array element you want to update. A typical query would involve checking for the element's value not equal to the one being updated.
The final example in this answer shows this: the $ positional operator identifies the index position of the array element that holds the key, { "1473345900": 139 }. If you try to run the update operation without the $ positional operator:
db.COLLECTIONNAME.update(
{ "values.1473345900": { "$exists": true } },
{ "$set": { "values.1473345900": 0 } }
)
MongoDB will treat the timestamp 1473345900 as an index position, and thus you will get the error
can't backfill array to larger than 1500000 elements
Thus the correct way should be:
var val = 32;
db.COLLECTIONNAME.update(
{ "values.1473345900": { "$ne": val, "$exists": true } },
{ "$set": { "values.$.1473345900": val } }
)
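The difference between "values.$" and "values.$.1473345900" can be seen on a plain array. The sketch below is an in-memory illustration in JavaScript, not MongoDB code; `makeDoc` and `matchedIndex` are hypothetical helpers, and `matchedIndex` stands in for the position that $ resolves to.

```javascript
const key = "1473345900";

// Fresh copy of the "values" array from the question.
function makeDoc() {
  return {
    values: [
      { "1473343200": 66 },
      { "1473344100": 230 },
      { "1473345000": 479 },
      { "1473345900": 139 },
    ],
  };
}

// Index of the first element that has the key: the position "$" stands for.
function matchedIndex(doc, k) {
  return doc.values.findIndex(
    (el) => el !== null && typeof el === "object" && k in el
  );
}

// Effect of { $set: { "values.$": 0 } }: the whole element is replaced.
const replaced = makeDoc();
replaced.values[matchedIndex(replaced, key)] = 0;

// Effect of { $set: { "values.$.1473345900": 0 } }: only the field changes.
const updated = makeDoc();
updated.values[matchedIndex(updated, key)][key] = 0;

console.log(JSON.stringify(replaced.values[3]), JSON.stringify(updated.values[3]));
```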
I'm new to the map reduce concept and even though I'm making some slow progress, I'm finding some issues that I need some help with.
I have a simple collection consisting of an id, a city and a destination, something like this:
{ "_id" : "5230e7e00000000000000000", "city" : "Boston", "to" : "Chicago" },
{ "_id" : "523fe7e00000000000000000", "city" : "New York", "to" : "Miami" },
{ "_id" : "5240e1e00000000000000000", "city" : "Boston", "to" : "Miami" },
{ "_id" : "536fe4e00000000000000000", "city" : "Washington D.C.", "to" : "Boston" },
{ "_id" : "53ffe7e00000000000000000", "city" : "New York", "to" : "Boston" },
{ "_id" : "5740e1e00000000000000000", "city" : "Boston", "to" : "Miami" },
...
(Please do note that this data is just made up for example purposes)
I'd like to group by city the destinations including a count:
{ "city" : "Boston", values : [{"Chicago",1}, {"Miami",2}] }
{ "city" : "New York", values : [{"Miami",1}, {"Boston",1}] }
{ "city" : "Washington D.C.", values : [{"Boston", 1}] }
For this I've started playing with this map function:
function() {
emit(this.city, this.to);
}
which performs the expected grouping. My reduce function is this:
function(key, values) {
var reduced = {"to":[]};
for (var i in values) {
var item = values[i];
reduced.to.push(item);
}
return reduced;
}
which gives somewhat an expected output:
{ "_id" : ObjectId("522f8a9181f01e671a853adb"), "value" : { "to" : [ "Boston", "Miami" ] } }
{ "_id" : ObjectId("522f933a81f01e671a853ade"), "value" : { "to" : [ "Chicago", "Miami", "Miami" ] } }
{ "_id" : ObjectId("5231f0ed81f01e671a853ae0"), "value" : "Boston" }
As you can see, I still haven't counted the repeated cities, but as can be seen above, for some reason the last result in the output doesn't look good. I'd expected it to be
{ "_id" : ObjectId("5231f0ed81f01e671a853ae0"), "value" : { "to" : ["Boston"] } }
Has this anything to do with the fact that there is a single item? Is there any way to obtain this?
Thank you.
I see you are asking about a PHP issue, but you are using JavaScript to ask, so I’m assuming a JavaScript answer will help you move things along. As such, here is the JavaScript needed in the shell to run your aggregation. I strongly suggest getting your aggregation working in the shell (or some other JavaScript editor) first and then translating it into the language of your choice. It is a lot easier to see what is going on, and you get there faster, using this method. You can then run:
use admin
db.runCommand( { setParameter: 1, logLevel: 2 } )
to check the BSON output of your selected language against what the shell produces. This will appear in the terminal if mongo is in the foreground; otherwise you’ll have to look in the logs.
Summing the routes in the aggregation framework [AF] with Mongo is fairly straightforward. The AF is faster and easier to use than map-reduce [MR]. In this case, though, both have a similar issue: simply pushing to an array won’t yield a count in and of itself (in MR you either need more logic in your reduce function or a finalize function).
With the AF using the example data provided this pipeline is useful:
db.agg1.aggregate([
{$group:{
_id: { city: "$city", to: "$to" },
count: { $sum: 1 }
}},
{$group: {
_id: "$_id.city",
to:{ $push: {to: "$_id.to", count: "$count"}}
}}
]);
The aggregation framework can only operate on known fields, but a pipeline can contain many stages, so a problem needs to be broken down with that as a consideration.
Above, the first stage calculates the numbers needed, for which there are 3 fixed fields: the source, the destination, and the count.
The second stage has 2 fixed fields, one of which is an array that is only being pushed to (all the data for the final form is there).
For MR you can do this:
var map = function() {
var key = {source:this.city, dest:this.to};
emit(key, 1);
};
var reduce = function(key, values) {
return Array.sum(values);
};
A separate function will have to prettify the output, however.
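A minimal sketch of such a post-processing step, in plain JavaScript, might look like the following. It assumes the map-reduce output has one row per { source, dest } key with the count in `value`, as the map and reduce functions above would produce; the `mrOutput` rows are written out by hand from the sample data, and `groupByCity` is a hypothetical name.

```javascript
// Hand-written stand-in for the map-reduce output: one row per (source, dest) pair.
const mrOutput = [
  { _id: { source: "Boston", dest: "Chicago" }, value: 1 },
  { _id: { source: "Boston", dest: "Miami" }, value: 2 },
  { _id: { source: "New York", dest: "Miami" }, value: 1 },
  { _id: { source: "New York", dest: "Boston" }, value: 1 },
  { _id: { source: "Washington D.C.", dest: "Boston" }, value: 1 },
];

// Regroup the flat rows into one entry per city with its destinations and counts.
function groupByCity(rows) {
  const byCity = new Map();
  for (const row of rows) {
    const { source, dest } = row._id;
    if (!byCity.has(source)) byCity.set(source, []);
    byCity.get(source).push({ to: dest, count: row.value });
  }
  return Array.from(byCity, ([city, values]) => ({ city, values }));
}

const grouped = groupByCity(mrOutput);
console.log(JSON.stringify(grouped, null, 2));
```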
If you have any additional questions please don’t hesitate to ask.
Best,
Charlie
I am new to NoSQL and MongoDB, and I am a little puzzled about what types of queries I can do and how to do them. My knowledge is limited to simpler queries.
I would like to run what I think is a complicated query within MongoDB instead of using PHP to sort the results, but I do not know if it is possible or how to do it.
I have a tag field within my collection that is an array: {tag: ["blue","red","yellow","green","violet"]}.
First-level problem: let's say I want to find all birds that have the tags blue, yellow and green, where blue is a must-have tag and the other colours are optional.
Second-level problem: I would then like to order the results so that the birds that have all the queried colours appear first.
Is it possible to create this query in MongoDB? And if it is, how could I do it?
You can use the aggregation framework. For the following dataset:
{ "_id":ObjectId(...), "bird":1, "tags":["blue","red","yellow","green","violet"]}
{ "_id":ObjectId(...), "bird":2, "tags":["red","yellow","green","violet"] }
{ "_id":ObjectId(...), "bird":3, "tags":["blue","yellow","violet"] }
{ "_id":ObjectId(...), "bird":4, "tags":["blue","yellow","red","violet"] }
{ "_id":ObjectId(...), "bird":5, "tags":["blue"] }
we can apply the following query:
colors = ["blue","red","yellow","green"];
db.birds.aggregate([
{ $match: {tags: 'blue'} },
{ $project: {_id:0, bird:1, tags:1} },
{ $unwind: '$tags' },
{ $match: {tags: {$in: colors}} },
{ $group: {_id:'$bird', score: {$sum:1}} },
{ $sort: {score:-1} },
{ $project: {bird:'$_id', score:1, _id:0} }
])
and get a result like this:
{
"result" : [
{ "score" : 4, "bird" : 1 },
{ "score" : 3, "bird" : 4 },
{ "score" : 2, "bird" : 3 },
{ "score" : 1, "bird" : 5 }
],
"ok" : 1
}
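The scoring idea behind that pipeline can be checked in memory with plain JavaScript. This is only an illustration of the logic, not a replacement for the aggregation; `scoreBirds` is a hypothetical helper, and the comments map each step to the pipeline stage it imitates.

```javascript
const birds = [
  { bird: 1, tags: ["blue", "red", "yellow", "green", "violet"] },
  { bird: 2, tags: ["red", "yellow", "green", "violet"] },
  { bird: 3, tags: ["blue", "yellow", "violet"] },
  { bird: 4, tags: ["blue", "yellow", "red", "violet"] },
  { bird: 5, tags: ["blue"] },
];
const colors = ["blue", "red", "yellow", "green"];

function scoreBirds(docs, required, wanted) {
  return docs
    .filter((d) => d.tags.includes(required)) // first $match: required tag
    .map((d) => ({
      // $unwind + second $match + $group: count the tags that are wanted
      score: d.tags.filter((t) => wanted.includes(t)).length,
      bird: d.bird,
    }))
    .sort((x, y) => y.score - x.score); // $sort: best match first
}

const ranked = scoreBirds(birds, "blue", colors);
console.log(ranked);
```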
Most of this you will have to do in your application. In order to find all documents where a bird has the tag "blue", you can do this:
db.collection.find( { tag: "blue" } );
Which colours are optional doesn't matter, as you have to find by the required tag anyway.
After finding them, you need to do a sort. But sorting like you want (by their 3 colours) is not something you can do in MongoDB, and something you will have to do in PHP instead.
Is it possible to sort data in a sub-array in a MongoDB database?
{ "_id" : ObjectId("4e3f8c7de7c7914b87d2e0eb"),
"list" : [
{
"id" : ObjectId("4e3f8d0be62883f70c00031c"),
"datetime" : 1312787723,
"comments" : [
{
"id" : ObjectId("4e3f8d0be62883f70c00031d"),
"datetime": 1312787723
},
{
"id" : ObjectId("4e3f8d0be62883f70c00031d"),
"datetime": 1312787724
},
{
"id" : ObjectId("4e3f8d0be62883f70c00031d"),
"datetime": 1312787725
}
]
}
],
"user_id" : "3" }
For example, I want to sort the comments by the field "datetime". Or is the only option to select all the data and sort it in PHP code? My query relies on limit in Mongo... Thanks.
With MongoDB, you can sort the documents or select only some parts of the documents, but you can't modify the documents returned by a search query.
If the current order of your comments can be changed, then the best solution would be to sort them in the MongoDB documents (find(), then for each doc, sort its comments and update()). If you want to keep the current internal order of comments, then you'll have to sort each document after each query.
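In both cases, the sort will be done with PHP. Something like:
// iterate by reference so the sorted comments are written back into $doc
foreach ($doc['list'] as &$list) {
// uses a lambda function, PHP 5.3 required
usort($list['comments'], function($a, $b) { return $a["datetime"] < $b["datetime"] ? -1 : 1; });
}
unset($list); // break the reference left over from the loop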
If you can't use PHP 5.3, replace the lambda function by a normal one. See usort() examples.
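For comparison, the same client-side sort can be sketched in JavaScript (the language used for the shell examples elsewhere on this page). This is an in-memory illustration under the assumption that "comments" is an array of objects with a numeric "datetime"; `sortComments` is a hypothetical helper and the sample data is shortened.

```javascript
// Shortened stand-in for the document from the question.
const doc = {
  user_id: "3",
  list: [
    {
      datetime: 1312787723,
      comments: [
        { datetime: 1312787725 },
        { datetime: 1312787723 },
        { datetime: 1312787724 },
      ],
    },
  ],
};

// Sort each entry's comments in place, oldest first.
function sortComments(d) {
  for (const entry of d.list) {
    entry.comments.sort((a, b) => a.datetime - b.datetime);
  }
  return d;
}

sortComments(doc);
console.log(doc.list[0].comments.map((c) => c.datetime));
```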