How can you tell if a collection is capped? - php

I was using a PHP mongo command:
$db->command(array("create" => $name, "size" => $size, "capped" => true, "max" => $max));
And my collections grew way past their supposed capped limits. I applied a fix:
$db->createCollection($name, true, $size, $max);
Currently, the counts are so low I can't tell whether the 'fix' worked.
How can you tell if a collection is capped, either from the shell or PHP? I wasn't able to find this information in the system.namespaces.

Turns out there's also the isCapped() function.
db.foo.isCapped()

In the shell, use db.collection.stats(). If a collection is capped:
> db.my_collection.stats()["capped"]
1
If a collection is not capped, the "capped" key will not be present.
Below are example results from stats() for a capped collection:
> db.my_coll.stats()
{
"ns" : "my_db.my_coll",
"count" : 221,
"size" : 318556,
"avgObjSize" : 1441.4298642533936,
"storageSize" : 1000192,
"numExtents" : 1,
"nindexes" : 0,
"lastExtentSize" : 1000192,
"paddingFactor" : 1,
"flags" : 0,
"totalIndexSize" : 0,
"indexSizes" : {
},
"capped" : 1,
"max" : 2147483647,
"ok" : 1
}
This is with MongoDB 1.7.4.

From the shell:
db.system.namespaces.find()
You'll see a list of all collections and indexes for the given db. If a collection is capped, its entry's options will indicate it.

For PHP:
$collection = $db->selectCollection($name);
$result = $collection->validate();
$isCapped = isset($result['capped']);
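Whichever interface produces the stats document (stats() in the shell, validate() or a collStats command in PHP), the check itself reduces to testing for a truthy "capped" key. A minimal JavaScript sketch (runnable under Node) using a hard-coded sample from the shell output above:

```javascript
// Stats document as returned for a capped collection (sample from above).
var stats = { ns: "my_db.my_coll", count: 221, capped: 1, max: 2147483647, ok: 1 };
var isCapped = !!stats.capped; // a missing key yields false => not capped
console.log(isCapped); // true

// For an uncapped collection the key is simply absent.
var plainStats = { ns: "my_db.plain_coll", count: 5, ok: 1 };
console.log(!!plainStats.capped); // false
```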

Related

PHP - mongodb client - skip and limit usage

Maybe this is a silly question, but I have a doubt anyway.
Please take a look at this query:
db.posts.find({ "blog": "myblog",
"post_author_id": 649,
"shares.total": { "$gt": 0 } })
.limit(10)
.skip(1750)
.sort({ "shares.total": -1, "tstamp_published": -1 });
This is what I actually see in the MongoDB profiler:
mongos> db.system.profile.find({ nreturned : { $gt : 1000 } }).limit(10).sort( { millis : 1 } ).pretty();
{
"ts" : ISODate("2013-04-04T13:28:08.906Z"),
"op" : "query",
"ns" : "mydb.posts",
"query" : {
"$query" : {
"blog" : "myblog",
"post_author_id" : 649,
"shares.total" : {
"$gt" : 0
}
},
"$orderby" : {
"shares.total" : -1,
"tstamp_published" : -1
}
},
"ntoreturn" : 1760,
"nscanned" : 12242,
"scanAndOrder" : true,
"nreturned" : 1760,
"responseLength" : 7030522,
"millis" : 126,
"client" : "10.0.232.69",
"user" : ""
}
Now the question is: why is MongoDB returning 1760 documents when I have explicitly asked it to skip 1750?
This is my current MongoDB version, running in a sharded cluster:
mongos> db.runCommand("buildInfo")
{
"version" : "2.0.2",
"gitVersion" : "514b122d308928517f5841888ceaa4246a7f18e3",
"sysInfo" : "Linux bs-linux64.10gen.cc 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_41",
"versionArray" : [
2,
0,
2,
0
],
"bits" : 64,
"debug" : false,
"maxBsonObjectSize" : 16777216,
"ok" : 1
}
Now the question is: why is MongoDB returning 1760 documents when I have explicitly asked it to skip 1750?
Because the server side skip() does exactly that: it iterates over the first 1750 results and then gets 10 more (according to the limit).
As @devesh says, this is why very large pagination offsets should be avoided, since MongoDB does not make effective use of an index for skip() or limit().
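The arithmetic matches the profiler output above. A small in-memory JavaScript sketch of what the server does with skip() and limit():

```javascript
// The cursor walks skip + limit matching documents before the client sees
// anything, which is exactly the ntoreturn/nreturned = 1760 (1750 + 10)
// shown in the profiler output.
var skip = 1750, limit = 10;
var matching = [];
for (var i = 1; i <= 12242; i++) matching.push(i); // stand-in for the matched docs
var walked = matching.slice(0, skip + limit);      // the server iterates all of these
var returned = walked.slice(skip);                 // the client only receives these
console.log(walked.length);   // 1760
console.log(returned.length); // 10
```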
I think you have hit the bullseye. This is why the MongoDB documentation asks us to avoid large skips: http://docs.mongodb.org/manual/reference/method/cursor.skip/ . Please have a look there; it explains this outcome. Using some other key with the $gt operator will be much faster, for example the datetime stamp of the last document on page 1, then querying with $gt on that datetime.
The cursor.skip() method is often expensive because it requires the server to walk from the beginning of the collection or index to the offset (skip) position before it can begin returning results.
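That keyset ("range-based") pagination idea can be sketched in memory with plain JavaScript; the tstamp_published field name is taken from the question's sort, and the filter stands in for a find({tstamp_published: {$lt: lastSeen}}) query:

```javascript
// Build 50 fake posts with ascending timestamps, then sort newest-first,
// i.e. the equivalent of sort({tstamp_published: -1}).
var posts = [];
for (var i = 1; i <= 50; i++) posts.push({ _id: i, tstamp_published: 1000 + i });
posts.sort(function (a, b) { return b.tstamp_published - a.tstamp_published; });

// Page 1: newest 10 posts, i.e. find().sort(...).limit(10).
var page1 = posts.slice(0, 10);
var lastSeen = page1[page1.length - 1].tstamp_published; // remember the last sort key

// Page 2: instead of skip(10), filter on the remembered key, i.e.
// find({tstamp_published: {$lt: lastSeen}}).sort(...).limit(10).
var page2 = posts
  .filter(function (p) { return p.tstamp_published < lastSeen; })
  .slice(0, 10);
console.log(page2[0].tstamp_published); // 1040
```

With an index on the sort key, this lets the server seek directly to the page boundary instead of walking all skipped documents.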

Cannot remove document in MongoDB using PHP driver

I am saving an "Article" in MongoDB as below, with integer _id values.
When I want to delete the article with the _id in php, nothing happens.
The code I use is:
$result = $db->arcitle->remove(
array("_id" =>intVal(41)),
array('safe' => true)
);
I have tried both with and without using "safe" option, neither works. When I echo $result, it is bool(true).
Any suggestion is much appreciated!
{ "_id" : 41,
"when" : Date( 1333318420855 ),
"publisher" : "5",
"title" : "10 Steps To The Perfect Portfolio Website",
"raw" : "",
"preview" : "",
"thumbnail" : "",
"content" : [
"{}" ],
"tags" : null,
"votes" : 0,
"voters" : [],
"comments" : [] }
You've got a spelling mistake in the collection name.
$result = $db->arcitle->remove(
Should probably be:
$result = $db->article->remove(array("_id" => 41));
The safe option will not confirm something was deleted, only that there was no error. Remove will not trigger an error deleting something that doesn't exist.
> db.foo.remove({_id: "I don't exist"})
> db.getLastError()
null
Note that you don't need to recast an integer as an integer. If you do need to cast input as an integer, use a cast statement:
$string = "42";
$int = (int) $string; // $int === 42
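The "no error on a no-op remove" behaviour can be sketched in memory with plain JavaScript (the n field mirrors what getLastError reports):

```javascript
// A "remove" that matches nothing is not an error; it just reports n = 0
// documents removed. That is why checking for an error alone (the
// safe/getLastError path) cannot confirm that a deletion happened.
var articles = [{ _id: 41, title: "10 Steps To The Perfect Portfolio Website" }];

function removeById(coll, id) {
  var before = coll.length;
  var kept = coll.filter(function (d) { return d._id !== id; });
  return { err: null, n: before - kept.length };
}

console.log(removeById(articles, 41).n);           // 1 (matched and removed)
console.log(removeById(articles, "no such id").n); // 0 (no match, and no error)
```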

MapReduce for MongoDB on PHP?

I'm new to MongoDB and this is my first use of MapReduce ever.
I have two collections: Shops and Products with the following schema
Products
{'_id', 'type': 'first', 'enabled': 1, 'shop': $SHOP_ID }
{'_id', 'type': 'second', 'enabled': 0, 'shop': $SHOP_ID }
{'_id', 'type': 'second', 'enabled': 1, 'shop': $SHOP_ID }
And
Shops
{'_id', 'name':'L', ... }
{'_id', 'name':'M', ... }
I'm looking for a GROUP BY-like statement for MongoDB, using MapReduce, to retrieve the Shops with name 'L' that have Products with 'enabled' => 1.
How can I do it? Thank you.
It should be possible to retrieve the desired information without a Map Reduce operation.
You could first query the "Products" collection for documents that match {'enabled': 1}, and then take the list of $SHOP_IDs from that query (which I imagine correspond to the _id values in the "Shops" collection), put them in an array, and perform an $in query on the "Shops" collection, combined with the query on "name".
For example, given the two collections:
> db.products.find()
{ "_id" : 1, "type" : "first", "enabled" : 1, "shop" : 3 }
{ "_id" : 2, "type" : "second", "enabled" : 0, "shop" : 4 }
{ "_id" : 3, "type" : "second", "enabled" : 1, "shop" : 5 }
> db.shops.find()
{ "_id" : 3, "name" : "L" }
{ "_id" : 4, "name" : "L" }
{ "_id" : 5, "name" : "M" }
>
First find all of the documents that match {"enabled" : 1}
> db.products.find({"enabled" : 1})
{ "_id" : 1, "type" : "first", "enabled" : 1, "shop" : 3 }
{ "_id" : 3, "type" : "second", "enabled" : 1, "shop" : 5 }
From the above query, generate a list of _ids:
> var c = db.products.find({"enabled" : 1})
> shop_ids = []
[ ]
> c.forEach(function(doc){shop_ids.push(doc.shop)})
> shop_ids
[ 3, 5 ]
Finally, query the shops collection for documents with _id values in the shop_ids array that also match {name:"L"}.
> db.shops.find({_id:{$in:shop_ids}, name:"L"})
{ "_id" : 3, "name" : "L" }
>
Similar questions regarding doing the equivalent of a join operation with Mongo have been asked before. This question provides some links which may provide you with additional guidance:
How to join MongoDB collections in Python?
If you would like to experiment with Map Reduce, here is a link to a blog post from a user who used an incremental Map Reduce operation to combine values from two collections.
http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/
Hopefully the above will allow you to retrieve the desired information from your collections.
Short answer: you can't do that (with a single MapReduce command).
Long answer: MapReduce jobs in MongoDB run only on a single collection and cannot reference other collections in the process. So, JOIN/GROUP BY-like behaviour of SQL is not available here. The new Aggregation Framework also operates on a single collection only.
I propose a two-part solution:
Get all shops with name "L".
Compose and run map-reduce command that will check every product document against this pre-computed list of shops.
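A minimal in-memory sketch of that two-part plan, using plain JavaScript stand-ins for the map/reduce steps (with a real mapReduce command, the precomputed shop list would be passed in via the command's scope option):

```javascript
// Collections from the question (sample data).
var shops = [{ _id: 3, name: "L" }, { _id: 4, name: "L" }, { _id: 5, name: "M" }];
var products = [
  { _id: 1, type: "first",  enabled: 1, shop: 3 },
  { _id: 2, type: "second", enabled: 0, shop: 4 },
  { _id: 3, type: "second", enabled: 1, shop: 5 }
];

// Part 1: precompute the ids of shops named "L".
var shopIds = shops
  .filter(function (s) { return s.name === "L"; })
  .map(function (s) { return s._id; }); // [3, 4]

// Part 2: "map" over products, emitting (shop, 1) for enabled products whose
// shop is in the precomputed list; the "reduce" step sums the emitted counts.
var emitted = {};
products.forEach(function (p) {
  if (p.enabled === 1 && shopIds.indexOf(p.shop) !== -1) {
    emitted[p.shop] = (emitted[p.shop] || 0) + 1;
  }
});
console.log(emitted); // only shop 3 has an enabled product
```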

How can an index in MongoDB render a query slower?

UPDATE: I posted an answer below, as this has been confirmed to be an issue.
ORIGINAL:
First, I apologize; I just started using MongoDB yesterday, and I am still pretty new at this. I have a pretty simple query, and using PHP my findings are as follows:
Mongo version is 2.0.4, running on CentOS 6.2 (Final) x64
$start = microtime(true);
$totalactive = $db->people->count(array('items'=> array('$gt' => 1)));
$end = microtime(true);
printf("Query lasted %.2f seconds\n", $end - $start);
Without an index, it returns:
Query lasted 0.15 seconds
I have 280,000 records in the people collection. So I thought adding an index on "items" should help, because I query this data a lot. But to my disbelief, after adding the index I get this:
Query lasted 0.25 seconds
Am I doing anything wrong?
Instead of count, I used find to get the explain output:
> db.people.find({ 'items' : { '$gte' : 1 } }).explain();
{
"cursor" : "BtreeCursor items_1",
"nscanned" : 206396,
"nscannedObjects" : 206396,
"n" : 206396,
"millis" : 269,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" : {
"items" : [
[
1,
1.7976931348623157e+308
]
]
}
}
If I change my query to be "$ne" 0, it takes 10ms more!
Here are the collection stats:
> db.people.stats()
{
"ns" : "stats.people",
"count" : 281207,
"size" : 23621416,
"avgObjSize" : 84.00009957077881,
"storageSize" : 33333248,
"numExtents" : 8,
"nindexes" : 2,
"lastExtentSize" : 12083200,
"paddingFactor" : 1,
"flags" : 0,
"totalIndexSize" : 21412944,
"indexSizes" : {
"_id_" : 14324352,
"items_1" : 7088592
},
"ok" : 1
}
I have 1 GB of RAM free, so I believe the index fits in memory.
Here's the people index, as requested:
> db.people.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "stats.people",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"items" : 1
},
"ns" : "stats.people",
"name" : "items_1"
}
]
Having an index can be beneficial for two reasons:
when accessing only a small part of the collection (because of a restrictive filter that can be satisfied by the index). Rule of thumb is less than 10%.
when the collection does not need to be accessed at all (because all necessary data is in the index, both for the filtering, and for the result set). This will be indicated by "indexOnly = true".
For the "find" query, neither of these is true: you are accessing almost the whole collection (206396 of 281207 documents) and you need all the fields. So you go through the index first, and then through almost the whole collection anyway, defeating the purpose of the index. Just reading the whole collection would have been faster.
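For scale, the selectivity implied by the explain and stats output above can be computed directly:

```javascript
// 206396 documents scanned out of 281207 total: far beyond the ~10% rule
// of thumb for an index to be worthwhile.
var nscanned = 206396;
var count = 281207;
var selectivityPct = (100 * nscanned / count).toFixed(1) + "%";
console.log(selectivityPct); // "73.4%"
```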
I would have expected the "count" query to perform better (because that can be satisfied by just going through the index). Can you get an explain for that, too?
Look at this:
http://www.mongodb.org/display/DOCS/Indexing+Advice+and+FAQ#IndexingAdviceandFAQ-5.MongoDB%27s%24neor%24ninoperator%27saren%27tefficientwithindexes.
Which made me consider this solution. How about this?
$totalactive = $db->people->count() - $db->people->count(array('items' => 1)); // plain equality; MongoDB 2.0 has no $eq operator
This was confirmed to be a bug, or at least something that needed optimization, in the MongoDB engine. I posted this on the mongo mailing list, and the response I received from Eliot Horowitz was:
That's definitely a bug, or at least a path that could be way better
optimized. Made a case: https://jira.mongodb.org/browse/SERVER-5607
Priority: Major
Fix Version/s: 2.3 desired
Type: Bug
Thanks for those who helped confirming this was a bug =)
Can you please provide an example of a document in this collection? Is the "items" field an array? If so, I would recommend you add a new field "itemCount" and put an index on that. Doing $gt on this field will be extremely fast.
This is because your queries are near-full collection scans. The query optimizer is picking the index when it should not, for optimum performance. It's counterintuitive, yes, but the cursor is walking the index b-tree and fetching the documents that the tree points to, which is slower than just walking the collection when it has to scan almost the whole tree.
If you really need to do this kind of query, and you want to keep that index for other things, like sorting, you can use .hint({$natural: 1}) to tell the query not to use the index.
Coincidentally, I posted about a similar issue in a blog post recently: http://wes.skeweredrook.com/testing-with-mongodb-part-1/

MongoDB limiting the amount of concurrency

I am creating an application that has several servers running at the same time, with several processes on each server, all of them processing data: making queries, updates, and inserts. So a total of 35+ concurrent connections are being made at all times. These servers all send their data to a single mongodb server (mongod). I am not sharding my database at the moment.
The problem is that I am being limited by my mongodb server. Whenever I add more servers, the queries/updates/inserts run slower (they take more time). I was running this on mongohq.com, then I recently created my own Amazon server for mongod, but I am still getting nearly the same result.
Listed below is my db.serverStatus({}). I am somewhat new to mongodb, but basically I need to know how to speed things up given the number of concurrent operations hitting my mongo server. It needs to be able to handle a lot of requests. I know sharding is a possible way around this, but if at all possible, please list some other available solutions. Thanks.
> db.serverStatus({})
{
"host" : "ip-10-108-245-21:28282",
"version" : "2.0.1",
"process" : "mongod",
"uptime" : 11380,
"uptimeEstimate" : 11403,
"localTime" : ISODate("2011-12-13T22:27:56.865Z"),
"globalLock" : {
"totalTime" : 11380429167,
"lockTime" : 86138670,
"ratio" : 0.007569017717695356,
"currentQueue" : {
"total" : 0,
"readers" : 0,
"writers" : 0
},
"activeClients" : {
"total" : 35,
"readers" : 35,
"writers" : 0
}
},
"mem" : {
"bits" : 64,
"resident" : 731,
"virtual" : 6326,
"supported" : true,
"mapped" : 976,
"mappedWithJournal" : 1952
},
"connections" : {
"current" : 105,
"available" : 714
},
"extra_info" : {
"note" : "fields vary by platform",
"heap_usage_bytes" : 398656,
"page_faults" : 1
},
"indexCounters" : {
"btree" : {
"accesses" : 798,
"hits" : 798,
"misses" : 0,
"resets" : 0,
"missRatio" : 0
}
},
"backgroundFlushing" : {
"flushes" : 189,
"total_ms" : 29775,
"average_ms" : 157.53968253968253,
"last_ms" : 185,
"last_finished" : ISODate("2011-12-13T22:27:16.651Z")
},
"cursors" : {
"totalOpen" : 34,
"clientCursors_size" : 34,
"timedOut" : 0,
"totalNoTimeout" : 34
},
"network" : {
"bytesIn" : 89743967,
"bytesOut" : 59379407,
"numRequests" : 840133
},
"opcounters" : {
"insert" : 5437,
"query" : 8957,
"update" : 4312,
"delete" : 0,
"getmore" : 76,
"command" : 821388
},
"asserts" : {
"regular" : 0,
"warning" : 0,
"msg" : 0,
"user" : 0,
"rollovers" : 0
},
"writeBacksQueued" : false,
"dur" : {
"commits" : 29,
"journaledMB" : 0.147456,
"writeToDataFilesMB" : 0.230233,
"compression" : 0.9999932183619632,
"commitsInWriteLock" : 0,
"earlyCommits" : 0,
"timeMs" : {
"dt" : 3031,
"prepLogBuffer" : 0,
"writeToJournal" : 29,
"writeToDataFiles" : 2,
"remapPrivateView" : 0
}
},
"ok" : 1
}
What is surprising about more load generating higher response times from mongod? There are a few possible reasons for degradation of performance.
For example, every write to mongod takes a process-wide write lock. So the more servers you add, the more updates will be attempted (assuming per-server update load is roughly stable), and thus the longer the process will spend holding the write lock. You can keep an eye on this through mongostat's "locked %" field.
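The globalLock figures in the serverStatus output above yield that "locked %" number directly:

```javascript
// lockTime / totalTime from the serverStatus output, expressed as a
// percentage: this matches the reported ratio of 0.007569.
var totalTime = 11380429167;
var lockTime = 86138670;
var lockedPct = (100 * lockTime / totalTime).toFixed(4) + "%";
console.log(lockedPct); // "0.7569%"
```

At well under 1%, write-lock contention alone does not explain the slowdown here, which is why the answer goes on to ask for more specific numbers.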
Additionally, if you use JS-powered functionality (map/reduce, db.eval(), etc.), those operations cannot be executed concurrently by mongod, because each mongod has a single JavaScript context (which is single-threaded).
If you want a more specific analysis then you might want to consider posting exact numbers. How many reads and writes per second, what are the query plans for the queries you execute, what effect does adding an additional app server have on your overall database performance, etc.
