**UPDATE**

I posted an answer, as it's been confirmed to be an issue.

**ORIGINAL**
First, I apologize -- I just started using MongoDB yesterday, and I am still pretty new at this. I have a pretty simple query, and using PHP my findings are as follows:
Mongo version is 2.0.4, running on CentOS 6.2 (Final) x64
$start = microtime(true);
$totalactive = $db->people->count(array('items'=> array('$gt' => 1)));
$end = microtime(true);
printf("Query lasted %.2f seconds\n", $end - $start);
Without index, it returns:
Query lasted 0.15 seconds
I have 280,000 records in the people collection. So I thought adding an index on "items" would help, because I query this data a lot. But to my disbelief, after adding the index I get this:
Query lasted 0.25 seconds
Am I doing anything wrong?
Instead of count I used find to get the explain output:
> db.people.find({ 'items' : { '$gte' : 1 } }).explain();
{
    "cursor" : "BtreeCursor items_1",
    "nscanned" : 206396,
    "nscannedObjects" : 206396,
    "n" : 206396,
    "millis" : 269,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "items" : [
            [
                1,
                1.7976931348623157e+308
            ]
        ]
    }
}
If I change my query to "$ne" 0, it takes 10 ms more!
Here are the collection stats:
> db.people.stats()
{
    "ns" : "stats.people",
    "count" : 281207,
    "size" : 23621416,
    "avgObjSize" : 84.00009957077881,
    "storageSize" : 33333248,
    "numExtents" : 8,
    "nindexes" : 2,
    "lastExtentSize" : 12083200,
    "paddingFactor" : 1,
    "flags" : 0,
    "totalIndexSize" : 21412944,
    "indexSizes" : {
        "_id_" : 14324352,
        "items_1" : 7088592
    },
    "ok" : 1
}
I have 1GB of ram free, so I believe the index fits in memory.
Here's the people index, as requested:
> db.people.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "stats.people",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "items" : 1
        },
        "ns" : "stats.people",
        "name" : "items_1"
    }
]
Having an index can be beneficial for two reasons:

- when accessing only a small part of the collection (because of a restrictive filter that can be satisfied by the index). The rule of thumb is less than 10%.
- when the collection does not need to be accessed at all (because all the necessary data is in the index, both for the filtering and for the result set). This is indicated by "indexOnly = true".

For the "find" query, neither of these holds: you are accessing almost the whole collection (206,396 of 281,207 documents) and you need all the fields. So you go through the index first, and then through almost the whole collection anyway, defeating the purpose of the index. Just reading the whole collection would have been faster.
I would have expected the "count" query to perform better (because it can be satisfied by just going through the index). Can you get an explain for that, too?
Look at this:
http://www.mongodb.org/display/DOCS/Indexing+Advice+and+FAQ#IndexingAdviceandFAQ-5.MongoDB%27s%24neor%24ninoperator%27saren%27tefficientwithindexes.
That made me consider the following solution. How about this?
$totalactive = $db->people->count() - $db->people->count(array('items' => 1));
This was confirmed to be a bug, or at least something that needed optimization, in the MongoDB engine. I posted this to the mongo mailing list, and the response I received from Eliot Horowitz was:
That's definitely a bug, or at least a path that could be way better
optimized. Made a case: https://jira.mongodb.org/browse/SERVER-5607
Priority: Major
Fix Version/s: 2.3 desired
Type: Bug
Thanks to those who helped confirm this was a bug =)
Can you please provide an example of an object in this collection? Is the "items" field an array? If so, I would recommend adding a new field "itemCount" and putting an index on it. Doing $gt on this field will be extremely fast.
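A hedged sketch of that idea in the shell; the document shape and the someId/newItem placeholders are assumptions, not from the question:

// Keep the counter in step with the array: push the item and bump the count together
// (someId and newItem are hypothetical placeholders)
db.people.update(
    { _id: someId },
    { $push: { items: newItem }, $inc: { itemCount: 1 } }
)

// Index the counter once; range queries on it are then cheap
db.people.ensureIndex({ itemCount: 1 })
db.people.count({ itemCount: { $gt: 1 } })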
This is because your queries are near-full collection scans. The query optimizer is picking the index when it should not use it for optimum performance. It's counterintuitive, yes, but it's because the cursor is walking the index b-tree and fetching the documents that the tree points to, which is slower than just walking the collection if it has to scan almost the whole tree.
If you really need to do this kind of query, and you want to keep that index for other things, like sorting, you can use .hint({$natural: 1}) to tell the query not to use the index.
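For example (a sketch reusing the query from the question):

// Force a natural-order collection scan instead of the items_1 index
db.people.find({ items: { $gte: 1 } }).hint({ $natural: 1 }).explain()
// The explain output should now show "cursor" : "BasicCursor"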
Coincidentally, I posted about a similar issue in a blog post recently: http://wes.skeweredrook.com/testing-with-mongodb-part-1/
Related
I have a document in MongoDB with a two-level-deep nested array of objects that I need to update, something like this:
{
    id: 1,
    items: [
        {
            id: 2,
            blocks: [
                {
                    id: 3,
                    txt: 'hello'
                }
            ]
        }
    ]
}
If there were only a one-level-deep array I could use the positional operator to update objects in it, but for the second level the only option I've come up with is to use the positional operator together with the nested object's index, like this:
db.objects.update({'items.id': 2}, {'$set': {'items.$.blocks.0.txt': 'hi'}})
This approach works, but it seems dangerous to me: since I'm building a web service, the index number would come from the client, which could send, say, 100000 as the index, and this would force MongoDB to create an array with 100,000 null entries.
Are there any other ways to update such nested objects where I can refer to an object's ID instead of its position, or maybe ways to check whether a supplied index is out of bounds before using it in the query?
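(For the bounds question, one illustrative guard, not from the answers below: make the element's existence part of the match, so an oversized client-supplied index simply matches nothing instead of padding the array with nulls. A sketch, using the index 0 from the example above:)

// Only update when blocks[0] actually exists on the matched items element
db.objects.update(
    { items: { $elemMatch: { id: 2, 'blocks.0': { $exists: true } } } },
    { $set: { 'items.$.blocks.0.txt': 'hi' } }
)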
Here's the big question: do you need to leverage Mongo's "addToSet" and "push" operations? If you really plan to modify just individual items in the array, then you should probably build these arrays as objects.
Here's how I would structure this:
{
    id: 1,
    items: {
        "2" : { "blocks" : { "3" : { txt : 'hello' } } },
        "5" : { "blocks" : { "1" : { txt : 'foo' }, "2" : { txt : 'bar' } } }
    }
}
This basically transforms everything into JSON objects instead of arrays. You lose the ability to use $push and $addToSet, but I think this makes everything easier. For example, your query would look like this:
db.objects.update({'items.2': {$exists:true} }, {'$set': {'items.2.blocks.0.txt': 'hi'}})
You'll also notice that I've dumped the "IDs". When you're nesting things like this, you can generally replace "ID" with simply using that number as the key. The "ID" concept is now implied.
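One consequence worth sketching (same assumed structure as above): the equivalent of a push is just a $set on a fresh key.

// "Push" a new block by setting a new object key; "7" is a hypothetical new key
db.objects.update(
    { 'items.2': { $exists: true } },
    { $set: { 'items.2.blocks.7': { txt: 'new block' } } }
)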
This feature was added in MongoDB 3.6 with expressive updates.
db.objects.update(
    { id: 1 },
    { $set: { 'items.$[itm].blocks.$[blk].txt': "hi" } },
    { multi: false, arrayFilters: [ { 'itm.id': 2 }, { 'blk.id': 3 } ] }
)
The ids you are using are linear numbers, so they have to come from somewhere, such as an additional field like 'max_idx' or something similar. That means one lookup for the id and then the update. A UUID/ObjectId can be used for the ids instead, which also ensures that distributed CRUD works.
Building on Gates' answer, I came up with this solution which works with nested object arrays:
db.objects.updateOne(
    { "items.id": 2 },
    { $set: { "items.$.blocks.$[block].txt": "hi" } },
    { arrayFilters: [ { "block.id": 3 } ] }
);
MongoDB 3.6 added the all-positional operator $[], so if you know the id of the block that needs updating, you can do something like:

db.objects.update(
    { 'items.blocks.id': id_here },
    { $set: { 'items.$[].blocks.$.txt': 'hi' } }
)
db.col.update(
    { "items.blocks.id": 3 },
    { $set: { "items.$[].blocks.$[b].txt": "bonjour" } },
    { arrayFilters: [ { "b.id": 3 } ] }
)
https://docs.mongodb.com/manual/reference/operator/update/positional-filtered/#update-nested-arrays-in-conjunction-with
This is the pymongo function find_one_and_update. I searched a lot to find it; hope this will be useful:
find_one_and_update(filter, update, projection=None, sort=None, return_document=ReturnDocument.BEFORE, array_filters=None, hint=None, session=None, **kwargs)
I added the reference and the pymongo documentation in the comments.
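For comparison, a minimal shell sketch of the same call shape (the filter and update reuse the earlier examples; this is not the asker's actual code):

db.objects.findOneAndUpdate(
    { 'items.id': 2 },
    { $set: { 'items.$[itm].blocks.$[blk].txt': 'hi' } },
    { arrayFilters: [ { 'itm.id': 2 }, { 'blk.id': 3 } ], returnNewDocument: true }
)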
I am writing a PHP MongoClient model which accesses a MongoDB database that stores deploy logs with GitLab information, server hosts, and Zend restart instructions. I have a Mongo collection called deployAppConfigs. Its document structure looks like this:
{
    "_id" : ObjectId("54de193790ded22d1cd24c36"),
    "app_name" : "ai2_api",
    "name" : "AI2 Admin API",
    "app_directory" : "path_to_app",
    "app_owner" : "www-data:deployers",
    "directories" : [],
    "vcs" : {
        "type" : "git",
        "name" : "input/ai2-api"
    },
    "environments" : {
        "development" : {
            ...
        },
        "qa" : {
            ...
        },
        "staging" : {
            ...
        },
        "production" : {
            ...
        },
        "actions" : {
            "post_checkout" : [
                "composer_install"
            ]
        }
    }
}
Because there are many documents in this collection, I would like to query the entire collection for only the "vcs" sub document and the "app_name". I am able to execute this command in Robomongo's mongo shell with the following find() query:
db.deployAppConfigs.find({}, {"vcs": 1, "app_name": 1})
This returns exactly what I want for each document in the collection:
{
    "_id" : ObjectId("54de193790ded22d1cd24c36"),
    "app_name" : "ai2_api",
    "vcs" : {
        "type" : "git",
        "name" : "input/ai2-api"
    }
}
I am having a problem writing a PHP MongoClient equivalent to that mongo shell command. I basically want to make a PHP MongoClient version of this mongo docs example on Limit Fields to Return from a Query
I have tried using an empty array to replace the "{}" in the mongo shell command, like this, but it hasn't worked:
$query = array(
    array(),
    array("vcs" => 1, "app_name" => 1)
);
All the documents share vcs.type = "git", so I tried writing a query that selects every document based on that shared value. It looks like this:
$query = array(
    "vcs.type" => "git"
);
But this returns the entire document, which is what I want to avoid.
The alternative could be to do a limit projection find() for the first document in the collection and then use the MongoCursor to iterate through the whole collection, but I'd rather not have to do the extra loop if possible.
Essentially, I am asking how to limit the return fields of a find() query to only one subdocument of each document in the entire collection.
Looks like I was able to find the solution. I will answer the question and leave it up in case it ends up being useful to anyone else.
What I ended up having to do was alter my MongoClient custom class find() function, which calls the $collection->find() query, to include a $fields parameter.
Now, the MongoClient->find() query looks like this:
$collection->find(
    array("vcs.type" => "git"),
    array("vcs" => 1, "app_name" => 1)
);
Found the answer in the MongoClient cursor find() documentation: here
Querying with $gt does not work as expected if the dates are the same. It behaves more like $gte.
But if I add 1 second to the query param, then it works.
Here is the sample query.
I have a document whose creation_date is the timestamp 1367414837.
db.collection.find({creation_date : {'$gt' : new Date(1367414837000)}});
This query matches the document whose date is 1367414837.
If I increment the query timestamp by just one, to 1367414838, it works as expected.
I'm using the mongo console, but I have the same problem in PHP with MongoDate.
edit: output of query
db.collection.findOne({creation_date : {'$gt' : new Date(1367414837000)}});
{
    "_id" : ObjectId("5181183543c51695ce000000"),
    "action" : {
        "type" : "comment",
        "comment" : {
            "id" : 74,
            "post_id" : "174",
            "owner_id" : "5",
            "text" : "ne diyeyim lae :D",
            "creation_date" : "2013-05-01 16:27:17"
        }
    },
    "creation_date" : ISODate("2013-05-01T13:27:17.336Z"),
    "owner" : {
        "id" : "5",
        "username" : "tylerdurden"
    }
}
edit2: The problem is the PHP extension for Mongo. It's documented: "any precision beyond milliseconds will be lost when the document is sent to/from the database."
http://php.net/manual/en/class.mongodate.php
I incremented the query param by one second as a workaround.
Dates in BSON are UNIX dates equal to milliseconds since epoch; they're accurate down to the millisecond. If the times you're inserting (and trying to match against) are accurate to the millisecond level, the element you're trying to match is possibly just a few milliseconds later than the timestamp you're querying, and $gt is likely working as expected. (2013-05-01T13:27:17.001Z is indeed later than 2013-05-01T13:27:17Z, for example.)
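A small shell sketch of that explanation; the dates collection is a throwaway assumption, and the millisecond values mirror the document above:

// Two documents: one exactly on the whole second, one 336 ms later
db.dates.insert({ _id: 1, creation_date: new Date(1367414837000) })  // ...17.000Z
db.dates.insert({ _id: 2, creation_date: new Date(1367414837336) })  // ...17.336Z

// $gt on the whole-second value still matches the .336Z document:
db.dates.find({ creation_date: { $gt: new Date(1367414837000) } })
// returns only _id 2, because 17.336Z really is greater than 17.000Z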
Maybe this is a silly question, but I still have a doubt.
Please take a look at this query:
db.posts.find({
    "blog": "myblog",
    "post_author_id": 649,
    "shares.total": { "$gt": 0 }
})
.limit(10)
.skip(1750)
.sort({ "shares.total": -1, "tstamp_published": -1 });
Actually, I see this report in the MongoDB profiler:
mongos> db.system.profile.find({ nreturned : { $gt : 1000 } }).limit(10).sort( { millis : 1 } ).pretty();
{
    "ts" : ISODate("2013-04-04T13:28:08.906Z"),
    "op" : "query",
    "ns" : "mydb.posts",
    "query" : {
        "$query" : {
            "blog" : "myblog",
            "post_author_id" : 649,
            "shares.total" : {
                "$gt" : 0
            }
        },
        "$orderby" : {
            "shares.total" : -1,
            "tstamp_published" : -1
        }
    },
    "ntoreturn" : 1760,
    "nscanned" : 12242,
    "scanAndOrder" : true,
    "nreturned" : 1760,
    "responseLength" : 7030522,
    "millis" : 126,
    "client" : "10.0.232.69",
    "user" : ""
}
Now the question is: why is MongoDB returning 1760 documents when I have explicitly asked it to skip 1750?
This is my current MongoDB version, in a cluster/sharding setup.
mongos> db.runCommand("buildInfo")
{
    "version" : "2.0.2",
    "gitVersion" : "514b122d308928517f5841888ceaa4246a7f18e3",
    "sysInfo" : "Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41",
    "versionArray" : [
        2,
        0,
        2,
        0
    ],
    "bits" : 64,
    "debug" : false,
    "maxBsonObjectSize" : 16777216,
    "ok" : 1
}
Now the question is: why is MongoDB returning 1760 documents when I have explicitly asked it to skip 1750?
Because the server side skip() does exactly that: it iterates over the first 1750 results and then gets 10 more (according to the limit).
As #devesh says, this is why very large pagination offsets should be avoided, since MongoDB does not make effective use of an index for skip() or limit().
I think you have hit the bullseye; this is the reason the MongoDB documentation asks us to avoid large skips: http://docs.mongodb.org/manual/reference/method/cursor.skip/. Please have a look there. Using some other key with the $gt operator will be much faster, e.g. take the datetime stamp of the last item on page 1 and then use $gt on that datetime.
The cursor.skip() method is often expensive because it requires the server to walk from the beginning of the collection or index to get to the offset or skip position before beginning to return results.
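A sketch of that keyset approach, reusing the query from the question (lastShares and lastTstamp are assumed values saved from the previous page, not from the question):

// Page 1: plain sort + limit, no skip
db.posts.find({ blog: "myblog", post_author_id: 649, "shares.total": { $gt: 0 } })
    .sort({ "shares.total": -1, "tstamp_published": -1 })
    .limit(10)

// Remember the sort keys of the last document on the page, e.g.:
var lastShares = 42
var lastTstamp = ISODate("2013-04-01T00:00:00Z")

// Next page: range on the sort keys instead of skip()
db.posts.find({
    blog: "myblog",
    post_author_id: 649,
    $or: [
        { "shares.total": { $lt: lastShares, $gt: 0 } },
        { "shares.total": lastShares, "tstamp_published": { $lt: lastTstamp } }
    ]
}).sort({ "shares.total": -1, "tstamp_published": -1 }).limit(10)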
I was using a PHP mongo command:
$db->command(array("create" => $name, "size" => $size, "capped" => true, "max" => $max));
And my collections grew way past their supposed capped limits. I applied a fix:
$db->createCollection($name, true, $size, $max);
Currently, the counts are so low I can't tell whether the 'fix' worked.
How can you tell if a collection is capped, either from the shell or PHP? I wasn't able to find this information in the system.namespaces.
Turns out there's also the isCapped() function.
db.foo.isCapped()
In the shell, use db.collection.stats(). If a collection is capped:
> db.my_collection.stats()["capped"]
1
If a collection is not capped, the "capped" key will not be present.
Below are example results from stats() for a capped collection:
> db.my_coll.stats()
{
    "ns" : "my_db.my_coll",
    "count" : 221,
    "size" : 318556,
    "avgObjSize" : 1441.4298642533936,
    "storageSize" : 1000192,
    "numExtents" : 1,
    "nindexes" : 0,
    "lastExtentSize" : 1000192,
    "paddingFactor" : 1,
    "flags" : 0,
    "totalIndexSize" : 0,
    "indexSizes" : {
    },
    "capped" : 1,
    "max" : 2147483647,
    "ok" : 1
}
This is with MongoDB 1.7.4.
From the shell:
db.system.namespaces.find()
You'll see a list of all collections and indexes for the given db. If a collection is capped, that will be indicated.
For PHP:
$collection = $db->selectCollection($name);
$result = $collection->validate();       // validate() returns the collection details
$isCapped = isset($result['capped']);    // the 'capped' key is only present for capped collections