How can I write the query below in MongoDB?
select max(priority) as max, min(priority) as min from queue group by user
I'd highly appreciate a solution in PHP.
Thank you
Queries like this are performed with the aggregation framework and the .aggregate() method. They use a $group pipeline stage with the $min and $max operators:
db.collection.aggregate([
{ "$group": {
"_id": "$user",
"max": { "$max": "$priority" },
"min": { "$min": "$priority" }
}}
])
Or in PHP syntax:
$collection->aggregate(array(
array(
'$group' => array (
'_id' => '$user',
'max' => array( '$max' => '$priority' ),
'min' => array( '$min' => '$priority' )
)
)
));
Also see the SQL to Aggregation Mapping Chart in the documentation.
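As a rough illustration of what the $group stage computes, here is a plain JavaScript sketch that runs without MongoDB. The sample rows and the helper name minMaxByUser are made up for this example; only the user and priority field names come from the question.

```javascript
// Hypothetical sample rows standing in for the "queue" collection.
const queue = [
  { user: "alice", priority: 3 },
  { user: "alice", priority: 7 },
  { user: "bob", priority: 5 },
];

// Mimics { $group: { _id: "$user", max: { $max: ... }, min: { $min: ... } } }.
function minMaxByUser(rows) {
  const groups = new Map();
  for (const { user, priority } of rows) {
    const g = groups.get(user);
    if (g === undefined) {
      // First row for this user starts the group.
      groups.set(user, { _id: user, max: priority, min: priority });
    } else {
      g.max = Math.max(g.max, priority);
      g.min = Math.min(g.min, priority);
    }
  }
  return [...groups.values()];
}

const grouped = minMaxByUser(queue);
console.log(grouped);
```

Each output object has the same _id, max and min keys that the pipeline above would produce for one user.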
I have a query like this:
'aggs' => [
'deadline' => [
'date_histogram' => [
'field' => 'deadline',
'interval' => 'month',
'keyed' => true,
'format' => 'MMM'
]
]
]
The results I am getting are buckets with month names as keys.
The problem is that a bucket keyed by a month name from one year is overwritten by the same month of the next year (because the key is the same).
I want results where the doc_count of each overwritten bucket is merged with the doc_count of the bucket that replaced it.
You can either add a separate month field during indexing and aggregate on it, or use the script below:
{
"size": 0,
"aggs": {
"deadline": {
"histogram": {
"script": { "inline" : "return doc['deadline'].value.getMonthOfYear()" },
"interval": 1
}
}
}
}
Creating a separate month field will give better performance.
Change the format from MMM to YYYY-MMM as below:
'aggs' => [
'deadline' => [
'date_histogram' => [
'field' => 'deadline',
'interval' => 'month',
'keyed' => true,
'format' => 'YYYY-MMM'
]
]
]
After this you can handle the merging at the application level.
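One way to do that application-level merge, sketched in plain JavaScript. It assumes the keyed response has the shape { "2015-Jan": { doc_count: 3 }, ... }; that shape, the sample counts and the function name mergeByMonth are assumptions for illustration, not something taken from the question.

```javascript
// Assumed shape of the keyed "deadline" buckets after switching to YYYY-MMM.
const buckets = {
  "2015-Jan": { doc_count: 3 },
  "2015-Feb": { doc_count: 1 },
  "2016-Jan": { doc_count: 4 },
};

// Sum doc_count across years for each month name.
function mergeByMonth(keyedBuckets) {
  const merged = {};
  for (const [key, bucket] of Object.entries(keyedBuckets)) {
    const month = key.slice(key.indexOf("-") + 1); // "2015-Jan" -> "Jan"
    merged[month] = (merged[month] || 0) + bucket.doc_count;
  }
  return merged;
}

const mergedCounts = mergeByMonth(buckets);
console.log(mergedCounts);
```

The same loop translates directly to a PHP foreach over the decoded response.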
Continuing on my project, I need to translate some SQL statements to MongoDB.
My SQL statement is:
Delete from 'table' where proc_id = $xxx and (day_id < $day OR day_id > $anotherDay)
My condition array currently looks like this:
$condicion = array(
'proc_id' => $xxx,
'$or' => array(
'day_id' => array(
'$lt' => $day,
'$gt' => $anotherDay
)
)
);
The function I use to delete from Mongo collections reports that it cannot delete...
Some help please?
Each "day_id" condition should be its own $or argument:
$query = array(
'proc_id' => $xxx,
'$or' => array(
array( 'day_id' => array ( '$lt' => $day ) ),
array( 'day_id' => array ( '$gt' => $anotherDay ) ),
)
);
That is how $or conditions work as a "list" of possible expressions.
The JSON syntax is clearer to visualise:
{
"proc_id": $xxx,
"$or": [
{ "day_id": { "$lt": $day } },
{ "day_id": { "$gt": $anotherDay }}
]
}
There is a very clear distinction here between a "list" and an "object" definition. $or conditions are "lists" of "objects", which means you write out each full condition, field name included, just as if it were a query in itself, because this is not being called within an $elemMatch.
And of course the "DELETE" part is the .remove() method:
$collection->remove($query);
There are general examples and resources in the SQL to MongoDB Mapping Chart in the core documentation; if the examples there do not immediately help, the linked articles and presentations should.
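To see why the two shapes behave differently, the sketch below is a toy JavaScript matcher (not MongoDB's implementation) that supports only $lt, $gt and $or. The values standing in for $xxx, $day and $anotherDay, the sample documents and the function names are invented for illustration.

```javascript
// Toy matcher: supports only equality, $lt, $gt and $or lists.
function matchesField(value, condition) {
  if (condition === null || typeof condition !== "object") {
    return value === condition; // plain equality, e.g. proc_id: 1
  }
  // Several operators under one field must ALL hold.
  return Object.entries(condition).every(([op, operand]) => {
    if (op === "$lt") return value < operand;
    if (op === "$gt") return value > operand;
    throw new Error("unsupported operator: " + op);
  });
}

function matches(doc, query) {
  return Object.entries(query).every(([key, condition]) => {
    // $or holds when at least one complete sub-query matches.
    if (key === "$or") return condition.some((sub) => matches(doc, sub));
    return matchesField(doc[key], condition);
  });
}

// Hypothetical values: proc_id 1, day 10, anotherDay 20.
const query = {
  proc_id: 1,
  $or: [{ day_id: { $lt: 10 } }, { day_id: { $gt: 20 } }],
};

// Operators combined under one field are ANDed, so this can never match.
const andedRange = { proc_id: 1, day_id: { $lt: 10, $gt: 20 } };

const docs = [
  { proc_id: 1, day_id: 5 },
  { proc_id: 1, day_id: 15 },
  { proc_id: 1, day_id: 25 },
  { proc_id: 2, day_id: 5 },
];

const toDelete = docs.filter((d) => matches(d, query)).map((d) => d.day_id);
const andedHits = docs.filter((d) => matches(d, andedRange)).length;
console.log(toDelete, andedHits);
```

Only the list-of-objects form selects the rows outside the range while still requiring the proc_id match.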
I want to use aggregation to return this array with only those tickets whose start field is after 2015-06-16. Can someone help me with the pipeline?
{
"name" : "array",
"tickets" : [
{
"id" : 1,
"sort" : true,
"start" : ISODate("2015-06-15T22:00:00.000Z")
},
{
"id" : 2,
"sort" : true,
"start" : ISODate("2015-06-16T22:00:00.000Z")
},
{
"id" : 3,
"sort" : true,
"start" : ISODate("2015-06-17T22:00:00.000Z")
}
]
}
It's true that the "standard projection" operations available to MongoDB methods such as .find() will return at most a "single matching element" from the queried array, whether through the positional $ operator, which relies on the "query" portion, or through $elemMatch in the "projection" portion.
In order to do this sort of "ranged" operation, you need the aggregation framework which has greater "manipulation" and "filtering" capabilities on arrays:
$collection->aggregate(
array(
# First match the "document" to reduce the pipeline
array(
'$match' => array(
'tickets.start' => array(
'$gte' => new MongoDate(strtotime('2015-06-16 00:00:00'))
)
)
),
# Then unwind the array
array( '$unwind' => '$tickets' ),
# Match again on the "unwound" elements to filter
array(
'$match' => array(
'tickets.start' => array(
'$gte' => new MongoDate(strtotime('2015-06-16 00:00:00'))
)
)
),
# Group back to original structure per document
array(
'$group' => array(
'_id' => '$_id',
'name' => array( '$first' => '$name' ),
'tickets' => array(
'$push' => '$tickets'
)
)
)
)
);
Or, with MongoDB 2.6 or greater, you can possibly simplify this with the $redact operator, which takes a $cond-style expression as its input:
$collection->aggregate(
array(
# First match the "document" to reduce the pipeline
array(
'$match' => array(
'tickets.start' => array(
'$gte' => new MongoDate(strtotime('2015-06-16 00:00:00'))
)
)
),
# Redact entries from the array
array(
'$redact' => array(
'$cond' => array(
'if' => array(
'$gte' => array(
# Levels without a "start" field fall back to the cutoff, so they pass
array( '$ifNull' => array(
'$start',
new MongoDate(strtotime('2015-06-16 00:00:00'))
)),
new MongoDate(strtotime('2015-06-16 00:00:00'))
)
),
'then' => '$$DESCEND',
'else' => '$$PRUNE'
)
)
)
)
);
So both examples do the "same thing" in "filtering" the elements from the array that "do not" match the conditions specified and return "more than one" element, which is something basic projection cannot do.
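As a sanity check on what either pipeline should return for the sample document, here is the same filtering done in plain JavaScript, without MongoDB. The function name filterTickets is made up, and the cutoff is written in UTC as an assumption, whereas strtotime() in the PHP code uses PHP's default timezone.

```javascript
// Sample document from the question, with ISODate written as a JS Date.
const doc = {
  name: "array",
  tickets: [
    { id: 1, sort: true, start: new Date("2015-06-15T22:00:00.000Z") },
    { id: 2, sort: true, start: new Date("2015-06-16T22:00:00.000Z") },
    { id: 3, sort: true, start: new Date("2015-06-17T22:00:00.000Z") },
  ],
};

// Assumed cutoff in UTC.
const cutoff = new Date("2015-06-16T00:00:00.000Z");

// Keep only the tickets at or after the cutoff, leaving other fields untouched.
function filterTickets(document, from) {
  return {
    ...document,
    tickets: document.tickets.filter((t) => t.start >= from),
  };
}

const filtered = filterTickets(doc, cutoff);
console.log(filtered.tickets.map((t) => t.id));
```

More than one array element survives the filter, which is the part a basic projection cannot express.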
You should use aggregation to get this output, with the following query:
db.collection.aggregate([{
$match: {
name: "array"
}
}, {
$unwind: "$tickets"
}, {
$match: {
"tickets.start": {
$gt: ISODate("2015-06-16")
}
}
}, {
$group: {
"_id": "$name",
"tickets": {
$push: "$tickets"
}
}
}])
I'm using MongoDB to store server statistics that are captured every 15 seconds (so 4 rows get inserted each minute per server) and am trying to plot this data on a graph for everything within a given timestamp range.
For example, the following query can be used:
$tbl->find(
array(
"timestamp" => array('$gte' => '1396310400', '$lte' => '1396915200'),
"service" => 'a715feac3db42f54edbc50ef6fa057b3'
),
array("timestamp" => 1, "system" => 1)
);
Which spits out a bunch of rows that look like this:
Array
(
[53933ad8532965621d97dd3b] => Array
(
[_id] => MongoId Object
(
[$id] => 53933ad8532965621d97dd3b
)
[system] => Array
(
[load] => 0.55
[uptime] => 1171204.47
[processes] => 222
)
[timestamp] => 1396310403
)
)
This works fine for small data ranges, as I can pass this data directly into Flot or HighCharts and let it prettify the time scales itself. However this doesn't work for large data sets (for example querying over a month).
What I'm trying to do is group the data by hour (or 15 minutes), and return the average values (in this example, it's system.load that I'm plotting) for that given time period.
I know that the aggregate function is what I need to be using, but despite my best efforts I've been unable to get this working.
Right now I'm letting PHP do all of the work (grouping the results by timestamp and working out the averages) but it's extremely slow and I know MongoDB would handle it better.
Any insight would be greatly appreciated!
Edit:
I've been trying to follow the answer posted here but am still struggling - MongoDB Aggregation PHP, Group by Hours
I'm looking at your initial query at the top of your question and it immediately tells me that your "timestamp" values are actually strings. So no doubt that when you are reading this information and doing your "manual aggregation" you are actually casting these values, and possibly others into types that you can manipulate, sum and average.
So the first part here is to fix your data, which looks like it has come from a logging source without the values ever being converted. I consider it reasonably likely that this applies not just to the timestamp values but also to your metrics under system.
This leaves you with a choice of how to store your timestamp. You can either just keep that as a timestamp number as it currently is in string form, or you can opt for converting to a BSON date type. The first one will be a simple integer cast and save back, the other you should be able to feed to the Date type that is supported by the driver and again save back the data.
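A minimal sketch of that conversion step in plain JavaScript, assuming the stored values really are strings as suspected above. The sample document and the function name normalize are hypothetical; in practice you would read each document, apply the cast, and save it back through your driver.

```javascript
// Hypothetical stored document where numeric values were saved as strings.
const raw = {
  timestamp: "1396310403",
  system: { load: "0.55", uptime: "1171204.47", processes: "222" },
};

// Cast the timestamp to an integer and every system metric to a number.
function normalize(doc) {
  const system = {};
  for (const [name, value] of Object.entries(doc.system)) {
    system[name] = Number(value);
  }
  return { ...doc, timestamp: parseInt(doc.timestamp, 10), system };
}

const fixed = normalize(raw);

// Alternative: store a real date instead (JavaScript dates count milliseconds).
const asDate = new Date(fixed.timestamp * 1000);
console.log(fixed, asDate.toISOString());
```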
Once you have done this, you can happily use the aggregation functions. For example, if you choose to keep the timestamp as a number, you just apply date math to get the grouping boundaries:
db.collection.aggregate([
// Match documents on the range you want
{ "$match": {
"timestamp": {
"$gte": 1396310400, "$lte": 1396915200
},
"service": "a715feac3db42f54edbc50ef6fa057b3"
}},
// Group on the time intervals, 15 minutes here
{ "$group": {
"_id": {
"service": "$service",
"time": {
"$subtract": [
"$timestamp",
{ "$mod": [ "$timestamp", 60 * 15 ] }
]
}
},
"load": { "$avg": "$system.load" }
}},
// Project to the output form you want
{ "$project": {
"service": "$_id.service",
"time" : "$_id.time",
"load": 1
}}
])
Or, to be PHP-specific:
$tbl->aggregate(array(
array(
'$match' => array(
'timestamp' => array(
'$gte' => 1396310400, '$lte' => 1396915200
),
'service' => 'a715feac3db42f54edbc50ef6fa057b3'
)
),
array(
'$group' => array(
'_id' => array(
'service' => '$service',
'time' => array(
'$subtract' => array(
'$timestamp',
array( '$mod' => array('$timestamp', 60 * 15 ) )
)
)
),
'load' => array( '$avg' => '$system.load' )
)
),
array(
'$project' => array(
'service' => '$_id.service',
'time' => '$_id.time',
'load' => 1
)
)
));
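The $subtract / $mod pair in the $group stage is ordinary "round down to the interval" arithmetic. The plain JavaScript sketch below shows the same calculation outside MongoDB; the sample rows and function names are invented, and load is flattened out of system for brevity.

```javascript
const INTERVAL = 60 * 15; // 15 minutes, in seconds

// Same arithmetic as { $subtract: [ts, { $mod: [ts, INTERVAL] }] }.
function bucketStart(timestamp) {
  return timestamp - (timestamp % INTERVAL);
}

// Hypothetical samples; timestamps are epoch seconds.
const samples = [
  { timestamp: 1396310403, load: 0.5 },
  { timestamp: 1396310418, load: 1.5 },
  { timestamp: 1396311300, load: 2.0 },
];

// Average load per bucket, like { $avg: "$system.load" }.
function averageLoad(rows) {
  const sums = new Map();
  for (const { timestamp, load } of rows) {
    const key = bucketStart(timestamp);
    const s = sums.get(key) || { total: 0, count: 0 };
    s.total += load;
    s.count += 1;
    sums.set(key, s);
  }
  return [...sums].map(([time, s]) => ({ time, load: s.total / s.count }));
}

const averages = averageLoad(samples);
console.log(averages);
```

Changing INTERVAL to 60 * 60 gives hourly buckets with no other changes.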
Otherwise if you choose to convert to BSON dates then you can use the date aggregation operators instead:
db.collection.aggregate([
{ "$match": {
"timestamp": {
"$gte": new Date("2014-04-01"), "$lte": new Date("2014-04-08")
},
"service": "a715feac3db42f54edbc50ef6fa057b3"
}},
{ "$group": {
"_id": {
"service": "$service",
"time": {
"dayOfYear": { "$dayOfYear": "$timestamp" },
"hour": { "$hour": "$timestamp" },
"minute": {
"$subtract": [
{ "$minute": "$timestamp" },
{
"$mod": [
{ "$minute": "$timestamp" },
15
]
}
]
}
}
},
"load": { "$avg": "$system.load" }
}},
{ "$project": {
"service": "$_id.service",
"time": "$_id.time",
"load": 1
}}
])
So there you have the help of the date aggregation operators to break up the parts of the date you have, while still using the same modulo operation to get interval values.
If you prefer the date math approach, you can still use it with date objects, because subtracting one date object from another yields the difference in milliseconds. So moving a BSON date to an epoch timestamp (in milliseconds) is just a matter of:
{
"$subtract": [
"$dateObjectField",
new Date("1970-01-01")
]
}
So any "date" values you pass in to the pipeline here you can cast using the native type methods of your driver and it will be serialized correctly when the request is sent to MongoDB. The other advantage is the same is true when you read them back, so there is no more need for conversion in client processing.
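JavaScript dates behave the same way as the $subtract expression above, which makes the unit easy to check locally. This is a small sketch with an arbitrary example date; note that the interval is expressed in milliseconds here, not seconds.

```javascript
// Subtracting two Date objects yields milliseconds, the same unit $subtract returns.
const epoch = new Date("1970-01-01");
const stamp = new Date("2014-04-01T00:07:30Z");

const millis = stamp - epoch;

// Round down to the 15-minute boundary, now with the interval in milliseconds.
const INTERVAL_MS = 1000 * 60 * 15;
const bucketMillis = millis - (millis % INTERVAL_MS);

console.log(millis, new Date(bucketMillis).toISOString());
```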
I have a MongoDB collection of documents containing several fields. One of the columns/fields should be numeric only, but some of these fields contain non-numerical (corrupt) data as string values. I need to find the highest numerical value of this column, excluding the corrupt, non-numerical data. I am aware of the question Getting the highest value of a column in MongoDB, but AFAIK, this extended case was not covered.
The example below depicts the issue. For the highest value, the document with "age": 70 should be returned:
[
{
"id": "aa001",
"age": "90"
},
{
"id": "bb002",
"age": 70
},
{
"id": "cc003",
"age": 20
}
]
Providing a PHP example for the find() / findOne() query would be of much help. Thanks a lot!
JohnnyHK came up with the perfect solution. Here's the working PHP code:
$cursor = $collection->find(array('age' => array('$not' => array('$type' => 2))), array('age' => 1));
$cursor->sort(array('age' => -1))->limit(1);
You can use the $type operator with $not in your query to exclude docs where age is a string. In the shell your query would look like:
db.test.find({age: {$not: {$type: 2}}}).sort({age: -1}).limit(1)
Or in PHP from Martti:
$cursor = $collection->find(array('age' => array('$not' => array('$type' => 2))), array('age' => 1));
$cursor->sort(array('age' => -1))->limit(1);
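For a quick check of the expected result against the sample data, here is the same logic in plain JavaScript, without MongoDB. The function name highestNumericAge is made up, and the sketch assumes every document has an age field.

```javascript
// Sample documents from the question; "90" is the corrupt string value.
const people = [
  { id: "aa001", age: "90" },
  { id: "bb002", age: 70 },
  { id: "cc003", age: 20 },
];

// Mimics find({ age: { $not: { $type: 2 } } }).sort({ age: -1 }).limit(1).
function highestNumericAge(docs) {
  // filter() returns a new array, so sorting it leaves the input untouched.
  const numeric = docs.filter((d) => typeof d.age !== "string");
  numeric.sort((a, b) => b.age - a.age);
  return numeric.length > 0 ? numeric[0] : null;
}

const oldest = highestNumericAge(people);
console.log(oldest);
```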
With the PHP driver (mongodb), using findOne():
$filter = []; // empty filter: matches every document, string ages included
$options = ['sort' => ['age' => -1]]; // -1 is for DESC
$result = $collection->findOne($filter, $options);
$maxAge = $result['age'];
You can use the aggregate function to get the maximum value from a collection, like this:
$data = $collection->aggregate(array(
array(
'$group' => array(
'_id' => null,
'age' => array('$max' => '$age')
)
)
));
This works for me:
$options = ['limit' => 100,'skip' => 0, 'projection' => ['score' => ['$meta' => 'textScore']], 'sort' => ['score' => ['$meta' => 'textScore']]];