MongoDB: using aggregate combined with a filter? - php

I'm trying to convert a PHP script that is based on a MySQL database so that it runs on a MongoDB database. I have resolved the major queries except one.
Imagine I have a library (this is the collection) in which every document is a book entry. I need to know how many distinct authors in the library have written in a certain language (another field).
At the moment I have this code and I don't know how to continue:
$test = array(
array(
'$group' => array(
'_id' => array('author' => '$author' )
)
)
);
$out = $db->$collection->aggregate($test);
Thanks.

If you want to add a filter, you need to start the pipeline with a $match stage. The command will be:
db.collection.aggregate([
{ "$match" : {lang : "en"} },
{"$group":{"_id": {"author" : "$author"} , "total" : {"$sum" : 1} }}
])
If you convert that to PHP it will be:
$test = array(
array('$match' => array("lang" => "en")),
array(
'$group' => array(
"_id" => array('author' => '$author'),
"total" => array('$sum' => 1)
),
),
);
In this case you have to pass the language in the first stage (match/filter).
If you want to count for "all languages", you just need to add the language to the _id of the $group stage:
db.collection.aggregate([
{"$group":{"_id": {"author" : "$author", lang : "$lang"} , "total" : {"$sum" : 1} }}
])
I'll leave translating this to PHP to you.
You can find more information about $match here:
http://docs.mongodb.org/manual/reference/operator/aggregation/match/
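Note that the $match plus $group pipeline above yields one document per author, with the number of that author's books. To get the number of distinct authors as a single figure, one option is a second $group stage that counts those per-author documents. Below is a minimal sketch in shell-style JavaScript. The helper names distinctAuthorsPipeline and countDistinctAuthors are hypothetical, the field names are taken from the question, and the in-memory function is there only to illustrate what the pipeline is expected to compute:

```javascript
// Hypothetical helper: build a pipeline that counts distinct authors
// for one language. Field names (lang, author) follow the question.
function distinctAuthorsPipeline(lang) {
  return [
    { $match: { lang: lang } },                       // filter first
    { $group: { _id: "$author" } },                   // one document per author
    { $group: { _id: null, authors: { $sum: 1 } } },  // count those documents
  ];
}

// In-memory equivalent, for illustration only (no database involved).
function countDistinctAuthors(books, lang) {
  const authors = new Set(
    books.filter((b) => b.lang === lang).map((b) => b.author)
  );
  return authors.size;
}

const books = [
  { author: "A", lang: "en" },
  { author: "A", lang: "en" },
  { author: "B", lang: "en" },
  { author: "C", lang: "fr" },
];
console.log(JSON.stringify(distinctAuthorsPipeline("en")));
console.log(countDistinctAuthors(books, "en")); // 2
```

In the shell, the pipeline would be passed as db.collection.aggregate(distinctAuthorsPipeline("en")); the PHP array form follows the same nesting.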

Related

How to aggregate by dynamic or unknown fields in Elasticsearch 6.x

I'm fairly new to Elasticsearch, currently using v6.2, and I seem to have run into a problem while trying to add some aggregations to a query. I'm trying to wrap my head around the various types of aggregation, as well as the best ways to store the data.
When the query runs, I have some variable attributes that I would like to aggregate and then return as filters to the user. For example, one character may have attributes for "size", "shape" and "colour", while another only has "shape" and "colour".
The full list of attributes is unknown so I don't think I would be able to construct the query that way.
My data is currently structured like this:
{
    id : 1,
    title : 'New Character 1',
    group : 1,
    region : 1,
    attrs : {
        moves : 2,
        # These would be dynamic, would only apply to some rows, not others.
        var_colours : ['Blue', 'Green', 'Red'],
        var_shapes : ['Round', 'Square', 'Etc'],
        effects : [
            { id : 1, value: 20},
            { id : 2, value: 60},
            { id : 3, value: 10},
        ]
    }
}
I currently have an aggregation of groups and regions that looks like this. It seems to be working wonderfully and I would like to add something similar for the attributes.
[
'aggs' => [
'group_ids' => [
'terms' => [
'field' => 'group',
'order' => [ '_count' => 'desc' ]
]
],
'region_ids' => [
'terms' => [
'field' => 'region',
'order' => [ '_count' => 'desc' ]
]
]
]
]
I'm hoping to get a result that looks like the one below. I am also not sure whether the data structure is set up in the best way; I can make changes there if necessary.
[aggregations] => [
[groups] => [
[doc_count_error_upper_bound] => 0
[sum_other_doc_count] => 0
[buckets] => [
[0] => [
[key] => 5
[doc_count] => 27
],
[1] => [
[key] => 2
[doc_count] => 7
]
]
],
[var_colours] => [
[doc_count_error_upper_bound] => 0
[sum_other_doc_count] => 0
[buckets] => [
[0] => [
[key] => 'Red'
[doc_count] => 27
],
[1] => [
[key] => 'Blue'
[doc_count] => 7
]
]
],
[var_shapes] => [
[doc_count_error_upper_bound] => 0
[sum_other_doc_count] => 0
[buckets] => [
[0] => [
[key] => 'Round'
[doc_count] => 27
],
[1] => [
[key] => 'Polygon'
[doc_count] => 7
]
]
]
// ...
]
Any insight that anyone could provide would be extremely appreciated.
You should do this within your PHP script.
I can think of the following:
1. Use dynamic field mapping for your index.
   By default, when a previously unseen field is found in a document, Elasticsearch will add the new field to the type mapping. This behaviour can be disabled, both at the document and at the object level, by setting the dynamic parameter to false (to ignore new fields) or to strict (to throw an exception if an unknown field is encountered).
2. Get all the existing fields in your index. Use the Get mapping API for this.
3. Loop over the results of step 2 so you can collect all the existing fields in your index. You can store them in a list (or array), for example.
4. Create a PHP Elasticsearch terms aggregation for each of the fields in your list (or array). That is: create an empty or base query with no terms aggregation and add one terms aggregation for each element you got from step 3.
5. Add the missing parameter, set to an empty string (""), to each terms aggregation.
That's it. By following these steps, you create a query in such a way that, no matter what index you're searching, you'll get a terms aggregation for every existing field in it.
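The steps of looping over the mapping and adding one terms aggregation per field can be sketched as follows. This is only an illustration in JavaScript under stated assumptions: buildTermsAggs is a hypothetical helper, its input is assumed to be the "properties" object taken from a Get mapping response, nested-type fields are not handled, and the missing parameter is only added for string fields because an empty string would not parse for numeric fields. Dots in field paths are replaced with underscores in the aggregation names as a precaution.

```javascript
// Hypothetical helper: turn the "properties" section of a Get mapping
// response into one terms aggregation per leaf field.
function buildTermsAggs(properties, prefix = "", aggs = {}) {
  for (const [name, def] of Object.entries(properties)) {
    const path = prefix + name;
    const aggName = path.replace(/\./g, "_");
    if (def.properties) {
      // Object field: recurse using a dotted path.
      buildTermsAggs(def.properties, path + ".", aggs);
    } else if (def.type === "keyword") {
      aggs[aggName] = { terms: { field: path, missing: "" } };
    } else if (def.type === "text" && def.fields && def.fields.keyword) {
      // Dynamically mapped strings usually get a ".keyword" sub-field.
      aggs[aggName] = { terms: { field: path + ".keyword", missing: "" } };
    } else if (def.type !== "text") {
      // Numbers, dates, etc.: no "missing" value, since "" would not parse.
      aggs[aggName] = { terms: { field: path } };
    }
  }
  return aggs;
}

// Made-up mapping fragment resembling the document in the question.
const sampleProperties = {
  group: { type: "long" },
  attrs: {
    properties: {
      moves: { type: "long" },
      var_colours: { type: "text", fields: { keyword: { type: "keyword" } } },
    },
  },
};

const aggs = buildTermsAggs(sampleProperties);
console.log(JSON.stringify({ aggs: aggs }, null, 2));
```

The resulting object would go under the aggs key of the search body, next to the existing group and region aggregations.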
Advantages:
Your terms aggregations will be generated dynamically with all the existing fields.
For each document that does not contain one of the fields, an empty string will be shown.
Disadvantages:
Looping through the GET mapping API's result could be a little frustrating (but I trust you).
Performance (time & resources) will be affected for every new field you find in your mappings.
I hope this is helpful! :D

how to aggregate mongodb collection data in laravel

I have a collection like this:
{
"wl_total" : 380,
"player_id" : 1241,
"username" : "Robin",
"hand_id" : 292656,
"time" : 1429871584
}
{
"wl_total" : -400,
"player_id" : 1243,
"username" : "a",
"hand_id" : 292656,
"time" : 1429871584
}
As both documents have the same hand_id, I want to aggregate them on the basis of hand_id.
I want the combined result to look like this:
data=array(
'hand_id'=>292656,
'wl_total'=>
{
0=>380,
1=>-400
},
'username'=>
{
0=>"Robin",
1=>"a"
},
"time"=>1429871584
)
You basically want to $group by the "hand_id" common to all players, $push the values into separate arrays in the document, and also do something with "time"; I took $max. It needs to be an accumulator of some sort at any rate.
Also not sure what your underlying collection name is, but you can call this in laravel with a construct like this:
$result = DB::collection('collection_name')->raw(function($collection)
{
return $collection->aggregate(array(
array(
'$group' => array(
'_id' => '$hand_id',
'wl_total' => array(
'$push' => '$wl_total'
),
'username' => array(
'$push' => '$username'
),
'time' => array(
'$max' => '$time'
)
)
)
));
});
Which returns output ( shown in json ) like this:
{
"_id" : 292656,
"wl_total" : [
380,
-400
],
"username" : [
"Robin",
"a"
],
"time" : 1429871584
}
Personally I would have gone for a single array with all the information in it for the grouped "hand", but I suppose you have your reasons for wanting it this way.
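For comparison, the single-array alternative mentioned above could look roughly like this. It is a sketch in shell-style JavaScript, not the answer's own code: the output field name players and the helper groupByHand are hypothetical, and the in-memory function only illustrates the shape the $group stage is expected to produce.

```javascript
// Alternative $group stage: one array of player sub-documents per hand.
const groupStage = {
  $group: {
    _id: "$hand_id",
    players: { $push: { username: "$username", wl_total: "$wl_total" } },
    time: { $max: "$time" },
  },
};

// In-memory equivalent, for illustration only (no database involved).
function groupByHand(docs) {
  const hands = new Map();
  for (const d of docs) {
    if (!hands.has(d.hand_id)) {
      hands.set(d.hand_id, { _id: d.hand_id, players: [], time: d.time });
    }
    const hand = hands.get(d.hand_id);
    hand.players.push({ username: d.username, wl_total: d.wl_total });
    hand.time = Math.max(hand.time, d.time);
  }
  return [...hands.values()];
}

// The two documents from the question.
const docs = [
  { wl_total: 380, player_id: 1241, username: "Robin", hand_id: 292656, time: 1429871584 },
  { wl_total: -400, player_id: 1243, username: "a", hand_id: 292656, time: 1429871584 },
];
console.log(JSON.stringify(groupStage));
console.log(JSON.stringify(groupByHand(docs), null, 2));
```

Keeping each player's values together in one sub-document avoids relying on matching positions across separate arrays.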

Complex statement in MongoDB AND & OR conditions

Continuing on my project, I need to translate some SQL statements to MongoDB.
My SQL statement is:
Delete from 'table' where proc_id = $xxx and (day_id < $day OR day_id > $anotherDay)
Now my condition array is this:
$condicion = array(
'proc_id' => $xxx,
'$or' => array(
'day_id' => array(
'$lt' => $day,
'$gt' => $anotherDay
)
)
);
The function I wrote for deleting from Mongo collections returns "cannot delete"...
Some help please?
Each "day_id" condition needs to be in its own $or argument:
$query = array(
    'proc_id' => $xxx,
    '$or' => array(
        array( 'day_id' => array( '$lt' => $day ) ),
        array( 'day_id' => array( '$gt' => $anotherDay ) ),
    )
);
That is how $or conditions work as a "list" of possible expressions.
The JSON syntax is clearer to visualise:
{
"proc_id": $xxx,
"$or": [
{ "day_id": { "$lt": $day } },
{ "day_id": { "$gt": $anotherDay }}
]
}
There is a very clear distinction between a "list" and an "object" definition. $or conditions are "lists" of "objects", which means you write out each full condition just as if it were a query in itself, since this is not called within an $elemMatch.
And of course the "DELETE" part is the .remove() method:
$collection->remove($query)
There are general examples and resources in the core documentation SQL to MongoDB Mapping Chart, where if the examples there do not immediately help, the linked articles and presentations should.
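To make the list-of-objects shape concrete, here is a small sketch in JavaScript. The helpers buildDeleteFilter and wouldDelete are hypothetical names introduced for illustration, and the sample values are made up. The second function is a plain predicate with the same meaning as the SQL WHERE clause, so the two forms can be compared side by side.

```javascript
// Hypothetical helper: build the filter for
// proc_id = procId AND (day_id < day OR day_id > anotherDay).
function buildDeleteFilter(procId, day, anotherDay) {
  return {
    proc_id: procId,
    // $or takes a list; each element is a complete condition object.
    $or: [{ day_id: { $lt: day } }, { day_id: { $gt: anotherDay } }],
  };
}

// Plain JavaScript predicate with the same meaning, for illustration only.
function wouldDelete(doc, procId, day, anotherDay) {
  return doc.proc_id === procId && (doc.day_id < day || doc.day_id > anotherDay);
}

const filter = buildDeleteFilter(7, 10, 20);
console.log(JSON.stringify(filter));
console.log(wouldDelete({ proc_id: 7, day_id: 5 }, 7, 10, 20));  // true
console.log(wouldDelete({ proc_id: 7, day_id: 15 }, 7, 10, 20)); // false
```

The top-level keys (proc_id and $or) are combined with an implicit AND, which is why no explicit $and is needed here.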

Count too slow in MongoDB with PHP

I'm trying to check my data with counts, but this code runs very slowly. How can I optimize this code? Is there another way to count?
$find = $conn_stok->distinct("isbn");
for($i=0;$i<=25; $i++) {
$isbn = $find[$i];
$countit= $conn_kit->find(array('isbn'=>$isbn))->count();
if($countit> 0){
echo "ok<br>";
} else {
echo "error<br>";
}
}
Looks like you are trying to do a simple count(*) group by in the old SQL speak. In MongoDB you would use the aggregation framework to have the database do the work for you instead of doing it in your code.
Here is what the aggregation framework pipeline would look like:
db.collection.aggregate({$group:{_id:"$isbn", count:{$sum:1}}})
I will let you translate that to PHP; if you need help, there are plenty of examples available.
It looks like you're trying to find the 25 most used ISBNs and count how often each has been used. In PHP, you would run the following queries. The first one finds all ISBNs, and the second is an aggregation command that does the grouping.
$find = $conn_stok->distinct( 'isbn' );
$aggr = $conn_kit->aggregate( array(
    // find all ISBNs
    array( '$match' => array( 'isbn' => array( '$in' => $find ) ) ),
    // group those
    array( '$group' => array( '_id' => '$isbn', 'count' => array( '$sum' => 1 ) ) ),
    // sort by the count, highest first
    array( '$sort' => array( 'count' => -1 ) ),
    // limit to the first 25 items (ie, the 25 most used ISBNs)
    array( '$limit' => 25 ),
) );
(You're a bit vague as to what $conn_stok and $conn_kit contain and what you want as answer. If you can update your question with that, I can update the answer).
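As a rough illustration of what such a pipeline computes, here is a sketch in JavaScript that builds the stages, sorted in descending order so that the most used ISBNs come first, and runs an equivalent calculation in memory. The helper names isbnCountPipeline and countByIsbn and the sample data are hypothetical, and the in-memory version is not how the database executes the query.

```javascript
// Hypothetical helper: pipeline counting documents per ISBN, most used first.
function isbnCountPipeline(isbns, limit) {
  return [
    { $match: { isbn: { $in: isbns } } },
    { $group: { _id: "$isbn", count: { $sum: 1 } } },
    { $sort: { count: -1 } },
    { $limit: limit },
  ];
}

// In-memory equivalent, for illustration only (no database involved).
function countByIsbn(docs, isbns, limit) {
  const wanted = new Set(isbns);
  const counts = new Map();
  for (const d of docs) {
    if (wanted.has(d.isbn)) {
      counts.set(d.isbn, (counts.get(d.isbn) || 0) + 1);
    }
  }
  return [...counts.entries()]
    .map(([isbn, count]) => ({ _id: isbn, count: count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}

// Made-up sample data: "444" is in the ISBN list but has no documents.
const kit = [{ isbn: "111" }, { isbn: "222" }, { isbn: "111" }, { isbn: "333" }];
const wantedIsbns = ["111", "222", "444"];
console.log(JSON.stringify(isbnCountPipeline(wantedIsbns, 25)));
console.log(JSON.stringify(countByIsbn(kit, wantedIsbns, 25)));
```

An ISBN with no matching documents (such as "444" here) simply does not appear in the result, so the "error" case from the question would have to be detected by comparing the result against the original ISBN list.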

MongoDB aggregate in PHP, adding seconds to date/time

I have a MongoDB aggregate in PHP defined as:
$results = $c->aggregate(array(
array(
'$project' => array(
'year' => array('$year' => array('$add' => array('$executed.getTime()', 3600))),
'month' => array('$month' => array('$add' => array('$executed.getTime()', 3600))),
'day' => array('$dayOfMonth' => array('$add' => array('$executed.getTime()', 3600)))
),
),
array(
'$group' => array(
'_id' => array('year' => '$year', 'month' => '$month', 'day' => '$day'),
'count' => array('$sum' => 1)
),
),
array(
'$sort' => array(
'_id' => 1
),
),
array(
'$limit' => 30
)
));
The problem is that the $add aggregate function in $project is not working.
exception: the $year operator does not accept an object as an operand
What is the correct way to add an arbitrary number of seconds to the date/time field $executed?
Thanks.
The issue you're seeing is a bug in MongoDB, which I've reported in SERVER-9289. A work-around for this entails wrapping the argument to the date operator in an array, as in the following shell example:
> db.foo.drop()
> db.foo.insert({x:ISODate()})
> db.foo.aggregate({$project: {x:1, y: {$year: {$add:['$x',1000]}}}})
Error: Printing Stack Trace
at printStackTrace (src/mongo/shell/utils.js:37:7)
at DBCollection.aggregate (src/mongo/shell/collection.js:897:1)
at (shell):1:8
Mon Apr 8 18:15:15.198 JavaScript execution failed: aggregate failed: {
"errmsg" : "exception: the $year operator does not accept an object as an operand",
"code" : 16021,
"ok" : 0
} at src/mongo/shell/collection.js:L898
> db.foo.aggregate({$project: {x:1, y: {$year: [{$add:['$x',1000]}]}}})
{
"result" : [
{
"_id" : ObjectId("516341333512acfb2d33f156"),
"x" : ISODate("2013-04-08T22:14:11.665Z"),
"y" : 2013
}
],
"ok" : 1
}
It should be trivial to port that over to PHP.
Having said that, your original code does have a bug in the reference to $executed. Per the $project documentation, you can refer to fields in the BSON document by name (or a dotted path to a field within objects/arrays), but there is no support for invoking JavaScript methods on those fields. Along those lines, the aggregation pipeline is operating on the raw BSON documents, so those types are never translated into their JavaScript representations over the course of the pipeline (e.g. the BSON date never becomes an ISODate).
Thankfully, calling $executed.getTime() should not even be necessary with MongoDB 2.4. SERVER-6239 improved support for BSON date handling in $add and $subtract. You can see that ticket for more details, such as the expected result for subtracting two dates, or adding a date and a number.
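Putting both points together, the $project stage might be built along these lines. This is a sketch in JavaScript with hypothetical helper names (shiftedDateParts, shiftedParts); it keeps the array wrapping from the workaround above and refers to the field as "$executed" without getTime(). One thing to watch: when $add combines a date and a number, the number is treated as milliseconds, so an offset given in seconds has to be multiplied by 1000.

```javascript
// Hypothetical helper: $project stage extracting date parts from a date
// field shifted by a number of seconds. $add treats numbers added to a
// date as milliseconds, hence the multiplication by 1000.
function shiftedDateParts(field, seconds) {
  const shifted = { $add: ["$" + field, seconds * 1000] };
  return {
    $project: {
      // The operand is wrapped in an array, as in the workaround above.
      year: { $year: [shifted] },
      month: { $month: [shifted] },
      day: { $dayOfMonth: [shifted] },
    },
  };
}

// In-memory equivalent for a single date, for illustration only (UTC).
function shiftedParts(date, seconds) {
  const d = new Date(date.getTime() + seconds * 1000);
  return {
    year: d.getUTCFullYear(),
    month: d.getUTCMonth() + 1, // getUTCMonth() is zero-based
    day: d.getUTCDate(),
  };
}

console.log(JSON.stringify(shiftedDateParts("executed", 3600)));
console.log(shiftedParts(new Date("2013-04-08T23:30:00Z"), 3600));
// { year: 2013, month: 4, day: 9 }
```

The stage returned by shiftedDateParts would take the place of the $project stage in the question; the $group, $sort and $limit stages can stay as they are.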
