Sorry for my English; I need help with MongoDB indexes. I have a capped collection (size: 10 GB) with some fields for my application logs.
Example structure: Logs[_id, userId, sum, type, time, response, request]. I have created a compound index: [userId, time, type]. I retrieve two arrays of records grouped by userId for today: one where 'type' is not null and one where 'type' is 1. Here are my two example queries:
$group = array(
    array(
        '$match' => array(
            'userId' => $userId,
            'time' => array(
                '$gt' => date("Y-m-d")
            ),
            'type' => array('$ne' => null)
        )
    ),
    array(
        '$group' => array(
            "_id" => '$userId',
            "total" => array('$sum' => '$sum'),
            "count" => array('$sum' => 1)
        ),
    )
);
$results = $collections->aggregate($group);
$group = array(
    array(
        '$match' => array(
            'userId' => $userId,
            'time' => array(
                '$gt' => date("Y-m-d")
            ),
            'type' => 1
        )
    ),
    array(
        '$group' => array(
            "_id" => '$userId',
            "count" => array('$sum' => 1)
        ),
    )
);
$results2 = $collections->aggregate($group);
If the current user has more than 100,000 documents in the collection for today, the query is very slow (more than 10 seconds). Please give me some advice on creating the right index :) Thanks.
Based on the explain that you posted, the correct index is being used (BtreeCursor), it is using only the index (i.e. it is a covered index query - indexOnly is true) and nothing is being matched (n = 0) in this case. So, that all checks out generally, though $ne as a clause in the first example is not going to be very efficient.
However, the main issue based on the explain output is likely that the index does not appear to be fully in memory. There are 13 yields listed, and the most common reason for a query like this to yield is that it has to fault to disk to page something in. Since, as mentioned previously, it is only using the index, those yields imply faults to disk for the index and hence indicate that the whole index is not in memory.
If you re-run the query immediately after this it should be faster (assuming the index can actually fit into available memory) because the index will have been paged in by the first run. If it is still slow on the second run and showing yields, then you either don't have enough memory to hold the index in memory or something else is evicting it from memory and you essentially have memory contention causing performance problems.
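One way to sanity-check this is to compare the total index size against the RAM you have available. Below is a minimal PHP sketch (not from the original answer), assuming the legacy driver, a database handle $db, and the collection name 'Logs' from the question:
// Hedged sketch: report index sizes via the collStats command so they can be
// compared against available memory; $db and the 'Logs' collection name are
// assumptions based on the question.
$stats = $db->command(array('collStats' => 'Logs'));

echo 'totalIndexSize: ' . $stats['totalIndexSize'] . " bytes\n";
foreach ($stats['indexSizes'] as $name => $bytes) {
    echo $name . ': ' . $bytes . " bytes\n";
}
If totalIndexSize is close to (or larger than) the memory you can spare, the yields you are seeing are the expected symptom.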
Related
I have a field in my MongoDB collection that holds two types of data. In some documents the field has an integer value, e.g.
"campaign_code" : NumberLong(100097)
And in other documents the field has an array value, e.g.
"campaign_code" : [NumberLong(100087), NumberLong(100136), NumberLong(100137), NumberLong(100138), NumberLong(100135)]
Previously I was grouping my results by "campaign_code", and at that time it held only integer values. Now the field holds both types of values. Is the PHP MongoDB driver intelligent enough to perform the same grouping, or do I need to change my code?
My previous PHP code:
$pipeline = array(
    array('$match' => array('impression.affiliate_id' => $affiliate_id)),
    array(
        '$group' => array(
            '_id' => array(
                'impression.campaign_code' => '$impression.campaign_code'
            ),
            'count' => array('$sum' => 1)
        )
    ),
    // sort
    array('$sort' => array('count' => -1))
);
I did make some changes and added the following line of code:
array('$unwind' => '$impression.campaign_code')
But this throws an exception:
exception: Value at end of $unwind field path '$impression.campaign_code' must be an Array, but is a NumberLong64
The exception is quite valid, because some documents have only an integer value in that field. How can I resolve this issue?
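One possible approach (a sketch, not from the original thread) is to normalise the stored data first so that impression.campaign_code is always an array, which makes $unwind safe to use. $collection is an assumed handle to the collection holding the documents shown above:
// Hedged sketch: wrap scalar campaign codes in a one-element array so that
// every document stores an array. $collection is an assumption.
$cursor = $collection->find(array(), array('impression.campaign_code' => 1));
foreach ($cursor as $doc) {
    if (!isset($doc['impression']['campaign_code'])) {
        continue; // field missing, nothing to normalise
    }
    $code = $doc['impression']['campaign_code'];
    if (!is_array($code)) {
        // convert the scalar value into a single-element array
        $collection->update(
            array('_id' => $doc['_id']),
            array('$set' => array('impression.campaign_code' => array($code)))
        );
    }
}
After that, the $unwind stage only ever sees arrays and should no longer throw.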
After quite a bit of trial and error and web research, I am going crazy with this query. I want to build a query in PHP on MongoDB for 'Clubs' around a geo point (maximum distance 500 meters).
But when I run the query, it ignores the distance limit and shows all clubs in the database, BUT sorted by distance.
Here is my dataset (2dsphere index on geoLoc):
{"_id":ObjectId("547c649e30afe32c23000048"),"name":"Club Ritzz","category":"Club","category_list":[{"id":"191478144212980","name":"Night Club"}],"location":{"city":"Mannheim"},"geoLoc":{"type":"Point","coordinates":[8.473665839156,49.484065272756]}}
{"_id":ObjectId("547c649f30afe32c2300004a"),"name":"Das Zimmer Mannheim","category":"Club","category_list":[{"id":"191478144212980","name":"Night Club"}],"geoLoc":{"type":"Point","coordinates":[8.4709362941178,49.487260552592]}}
{"_id":ObjectId("547c64ab30afe32c23000063"),"name":"Nationaltheater Mannheim","category":"Arts/entertainment/nightlife","category_list":[{"id":"173883042668223","name":"Theatre"}],"geoLoc":{"type":"Point","coordinates":[8.4776534992592,49.48782606969]}}
{"_id":ObjectId("547c64a130afe32c2300004f"),"name":"SOHO Bar Club Lounge","category":"Club","category_list":[{"id":"191478144212980","name":"Night Club"},{"id":"164049010316507","name":"Gastropub"}],"geoLoc":{"type":"Point","coordinates":[8.4630844501277,49.49385193591]}}
{"_id":ObjectId("547c64a730afe32c2300005a"),"name":"Loft Club","category":"Club","category_list":[{"id":"191478144212980","name":"Night Club"},{"id":"176139629103647","name":"Dance Club"}],"geoLoc":{"type":"Point","coordinates":[8.4296300196465,49.484211928258]}}
And here is my PHP code (updated Dec 2):
$qry = $pub->find(
    array('$and' =>
        array(
            array('geoLoc' =>
                array('$nearSphere' =>
                    array(
                        '$geometry' => array(
                            'type' => 'Point',
                            'coordinates' => array(
                                floatval($sLon), floatval($sLat)
                            )
                        ),
                        'maxDistance' => 500
                    )
                )
            ),
            array('$or' =>
                array(
                    array('name' => new MongoRegex("/.*club/i")),
                    array('name' => new MongoRegex("/.*zimm/i"))
                )
            ),
            array('$or' =>
                array(
                    array('category_list.name' => 'Night Club'),
                    array('category_list.name' => 'Dance Club'),
                    array('category' => 'Club')
                )
            )
        )
    ),
    array('id' => 1, 'name' => 1, '_id' => 0)
);
Anyone know why the results are not limited to the specified maxDistance?
I found a similar issue on StackOverflow which outlines that one has to use radians for the maxDistance parameter.
See https://dba.stackexchange.com/questions/23869/nearsphere-returns-too-many-data-what-am-i-missing-am-i-wrong-is-it-a-bug-d
It would also probably help to test the query in the mongo shell first, without the PHP API, just to see whether the query generally works; append '.explain()' to it to see what happens inside the database.
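Separately, one detail worth checking (a sketch, not part of the linked answer): with the GeoJSON $geometry form of $nearSphere the option key is $maxDistance, with the leading dollar sign, and the value is given in metres, whereas the query above uses maxDistance without it. A minimal PHP sketch of that form, reusing the variable names from the question:
// Hedged sketch: GeoJSON $nearSphere with '$maxDistance' (note the $) in metres.
$qry = $pub->find(
    array(
        'geoLoc' => array(
            '$nearSphere' => array(
                '$geometry' => array(
                    'type' => 'Point',
                    'coordinates' => array(floatval($sLon), floatval($sLat))
                ),
                '$maxDistance' => 500 // metres
            )
        )
    ),
    array('id' => 1, 'name' => 1, '_id' => 0)
);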
I'm trying to verify my data by counting matching documents, but this code runs very slowly. How can I optimize it? Is there a better way to count?
$find = $conn_stok->distinct("isbn");

for ($i = 0; $i <= 25; $i++) {
    $isbn = $find[$i];
    $countit = $conn_kit->find(array('isbn' => $isbn))->count();
    if ($countit > 0) {
        echo "ok<br>";
    } else {
        echo "error<br>";
    }
}
Looks like you are trying to do a simple count(*) with GROUP BY, in old SQL speak. In MongoDB you would use the aggregation framework to have the database do the work for you, instead of doing it in your code.
Here is what the aggregation framework pipeline would look like:
db.collection.aggregate({$group:{_id:"$isbn", count:{$sum:1}}})
I will let you translate that to PHP; if you need help, there are plenty of examples available.
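For reference, a minimal PHP sketch of that pipeline (assuming the legacy MongoCollection API and that $conn_kit from the question is the collection being counted):
// Hedged sketch: the same $group pipeline issued through the legacy PHP driver.
$result = $conn_kit->aggregate(array(
    array('$group' => array('_id' => '$isbn', 'count' => array('$sum' => 1)))
));

// The grouped documents come back under the 'result' key.
foreach ($result['result'] as $row) {
    echo $row['_id'] . ': ' . $row['count'] . "<br>";
}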
It looks like you're trying to find the 25 most used ISBNs and count how often each has been used. In PHP, you would run the following two queries: the first finds all ISBNs, and the second is an aggregation command that does the grouping.
$find = $conn_stok->distinct( 'isbn' );
$aggr = $conn_kit->aggregate(
    // find all ISBNs
    array( '$match' => array( 'isbn' => array( '$in' => $find ) ) ),
    // group those
    array( '$group' => array( '_id' => '$isbn', 'count' => array( '$sum' => 1 ) ) ),
    // sort by the count, highest first
    array( '$sort' => array( 'count' => -1 ) ),
    // limit to the first 25 items (ie, the 25 most used ISBNs)
    array( '$limit' => 25 )
);
(You're a bit vague as to what $conn_stok and $conn_kit contain and what you want as an answer. If you can update your question with that, I can update this answer.)
I have a MongoDB aggregate in PHP defined as:
$results = $c->aggregate(array(
    array(
        '$project' => array(
            'year' => array('$year' => array('$add' => array('$executed.getTime()', 3600))),
            'month' => array('$month' => array('$add' => array('$executed.getTime()', 3600))),
            'day' => array('$dayOfMonth' => array('$add' => array('$executed.getTime()', 3600)))
        ),
    ),
    array(
        '$group' => array(
            '_id' => array('year' => '$year', 'month' => '$month', 'day' => '$day'),
            'count' => array('$sum' => 1)
        ),
    ),
    array(
        '$sort' => array(
            '_id' => 1
        ),
    ),
    array(
        '$limit' => 30
    )
));
The problem is that the $add expression in the $project stage is not working; I get the following exception:
exception: the $year operator does not accept an object as an operand
What is the correct way to add an arbitrary number of seconds to the date/time field $executed?
Thanks.
The issue you're seeing is a bug in MongoDB, which I've reported in SERVER-9289. A work-around for this entails wrapping the argument to the date operator in an array, as in the following shell example:
> db.foo.drop()
> db.foo.insert({x:ISODate()})
> db.foo.aggregate({$project: {x:1, y: {$year: {$add:['$x',1000]}}}})
Error: Printing Stack Trace
at printStackTrace (src/mongo/shell/utils.js:37:7)
at DBCollection.aggregate (src/mongo/shell/collection.js:897:1)
at (shell):1:8
Mon Apr 8 18:15:15.198 JavaScript execution failed: aggregate failed: {
"errmsg" : "exception: the $year operator does not accept an object as an operand",
"code" : 16021,
"ok" : 0
} at src/mongo/shell/collection.js:L898
> db.foo.aggregate({$project: {x:1, y: {$year: [{$add:['$x',1000]}]}}})
{
"result" : [
{
"_id" : ObjectId("516341333512acfb2d33f156"),
"x" : ISODate("2013-04-08T22:14:11.665Z"),
"y" : 2013
}
],
"ok" : 1
}
It should be trivial to port that over to PHP.
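For instance, a rough PHP sketch of the work-around (the getTime() call is dropped per the note below, and 3600000 is an assumption that the numeric operand is interpreted as milliseconds, i.e. one hour; the $group, $sort and $limit stages stay as in the question):
// Hedged sketch: wrap each $add expression in an array so that
// $year/$month/$dayOfMonth receive an array operand instead of an object.
$results = $c->aggregate(array(
    array(
        '$project' => array(
            'year'  => array('$year'       => array(array('$add' => array('$executed', 3600000)))),
            'month' => array('$month'      => array(array('$add' => array('$executed', 3600000)))),
            'day'   => array('$dayOfMonth' => array(array('$add' => array('$executed', 3600000))))
        ),
    ),
    // ... $group, $sort and $limit stages unchanged from the question
));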
Having said that, your original code does have a bug in the reference to $executed. Per the $project documentation, you can refer to fields in the BSON document by name (or a dotted path to a field within objects/arrays), but there is no support for invoking JavaScript methods on those fields. Along those lines, the aggregation pipeline is operating on the raw BSON documents, so those types are never translated into their JavaScript representations over the course of the pipeline (e.g. the BSON date never becomes an ISODate).
Thankfully, calling $executed.getTime() should not even be necessary with MongoDB 2.4. SERVER-6239 improved support for BSON date handling in $add and $subtract. You can see that ticket for more details, such as the expected result for subtracting two dates, or adding a date and a number.
I'm working on a rating system. When a user rates, $inc increments the field and $addToSet adds the user_id, so that the user can only rate once. At the moment I check whether the user_id is already in the x field before updating, but that is an extra query I'd rather avoid. Can I achieve this without writing another query? Since $addToSet only adds the value if it is not already present, can I get the number of affected rows instead? Can you suggest other queries?
Thank you!
..->update(
    array("_id" => $idob),
    array(
        '$inc' => array($type => (int) 1),
        '$addToSet' => array("x" => (int) $user_id)
    )
);
Ok I see the problem.
..->update(
    array("_id" => $idob),
    array(
        '$inc' => array($type => (int) 1),
        '$addToSet' => array("x" => (int) $user_id)
    )
);
The problem is that you need a conditional $inc there, so that it only increments if the $addToSet actually adds the value. This is not possible with a unique index, since unique indexes currently work from the root of the document. Also, you probably want to keep the $inc as a form of pre-aggregation.
One method could be:
update(
    array('_id' => $idob, 'x' => array('$nin' => array($user_id))),
    array(
        '$inc' => array($type => 1),
        '$push' => array('x' => (int) $user_id)
    )
)
This will only do the update if that user_id does not already exist in x.
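As for the "affected rows" part of the question, a hedged sketch: with an acknowledged write ('w' => 1) the legacy driver's update() returns an array whose 'n' field tells you how many documents matched, so you can detect a repeat rating without a separate query. $collection stands in for the collection object elided as '..' in the question:
// Hedged sketch: check how many documents the conditional update matched.
// $collection is an assumed name for the collection handle from the question.
$result = $collection->update(
    array('_id' => $idob, 'x' => array('$nin' => array((int) $user_id))),
    array(
        '$inc' => array($type => 1),
        '$push' => array('x' => (int) $user_id)
    ),
    array('w' => 1)
);

if (isset($result['n']) && $result['n'] == 0) {
    // the user had already rated this document; nothing was incremented
}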