mongodb aggregation framework group by two fields


I'm querying my database with the aggregation pipeline, using two separate queries:
$groups_q = array(
    '$group' => array(
        '_id'       => '$group_name',
        'total_sum' => array('$sum' => 1)
    )
);
$statuses_q = array(
    '$group' => array(
        '_id'       => '$user_status',
        'total_sum' => array('$sum' => 1)
    )
);
$data['statuses'] = $this->mongo_db->aggregate('users', $statuses_q);
$data['groups']   = $this->mongo_db->aggregate('users', $groups_q);
And I'm getting what I want:
Array
(
    [statuses] => Array
        (
            [result] => Array
                (
                    [0] => Array
                        (
                            [_id] => Inactive
                            [total_sum] => 2
                        )
                    [1] => Array
                        (
                            [_id] => Active
                            [total_sum] => 5
                        )
                )
            [ok] => 1
        )
    [groups] => Array
        (
            [result] => Array
                (
                    [0] => Array
                        (
                            [_id] => Accounting
                            [total_sum] => 1
                        )
                    [1] => Array
                        (
                            [_id] => Administrator
                            [total_sum] => 2
                        )
                    [2] => Array
                        (
                            [_id] => Rep
                            [total_sum] => 1
                        )
                )
            [ok] => 1
        )
)
I don't want to query my database twice. Is there a better way to do it?
How can I accomplish it with one query? Should I use the $project operator?

You can't use a single aggregate() to do two grouped counts with your desired result format. Once the data has been grouped the first time, you no longer have the details needed to create the second count.
The straightforward approach is to do two queries, as you are already doing ;-).
Thoughts on alternatives
If you really want to get the information in one aggregation query, you could group on both fields and then do some manipulation in your application code. With two fields in the group _id, the results will be every combination of group_name and user_status.
Example using the mongo shell:
db.users.aggregate([
    { $group: {
        _id: { group_name: "$group_name", status: "$user_status" },
        total_sum: { $sum: 1 }
    }}
])
That doesn't seem particularly efficient and lends itself to some convoluted application code because you have to iterate the results twice to get the expected groupings.
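A minimal PHP sketch of that post-processing, assuming rows shaped like the shell example's output (_id keys group_name and status) and made-up sample values. It walks the combined rows and rebuilds both count arrays:

```php
<?php
// Split two-field $group output (one row per group_name/status
// combination) back into two separate count arrays.
// $rows mimics the aggregate() 'result' array; the values are made up.
function splitCombinedCounts(array $rows): array
{
    $groups = [];
    $statuses = [];
    foreach ($rows as $row) {
        $group  = $row['_id']['group_name'];
        $status = $row['_id']['status'];
        $n      = $row['total_sum'];
        // Each combination adds its count to both tallies.
        $groups[$group]    = ($groups[$group] ?? 0) + $n;
        $statuses[$status] = ($statuses[$status] ?? 0) + $n;
    }
    return ['groups' => $groups, 'statuses' => $statuses];
}

$rows = [
    ['_id' => ['group_name' => 'Administrator', 'status' => 'Active'],   'total_sum' => 2],
    ['_id' => ['group_name' => 'Accounting',    'status' => 'Inactive'], 'total_sum' => 1],
    ['_id' => ['group_name' => 'Rep',           'status' => 'Active'],   'total_sum' => 1],
];
print_r(splitCombinedCounts($rows));
```

Note that this returns keyed arrays (name => count) rather than the list of _id/total_sum pairs shown in the question.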
If you only wanted the unique names for each group instead of the names + counts, you could use $addToSet in a single group.
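For illustration, such a single $group stage could look like the PHP array below. The output field names group_names and statuses are hypothetical, and passing it to $this->mongo_db->aggregate() simply mirrors the call in the question:

```php
<?php
// One $group stage that collects the distinct values of both fields.
// '_id' => null puts every document into a single group.
$distinctStage = [
    '$group' => [
        '_id'         => null,
        'group_names' => ['$addToSet' => '$group_name'],
        'statuses'    => ['$addToSet' => '$user_status'],
    ],
];

// With the library from the question this would presumably be:
// $data['distinct'] = $this->mongo_db->aggregate('users', $distinctStage);
echo json_encode($distinctStage), "\n";
```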
The other obvious alternative would be to do the grouping in your application code. Do a single find() projecting only the group_name and user_status fields, and build up your count arrays as you iterate the results.
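A rough sketch of that application-side counting. It accepts any iterable of associative arrays, so a driver cursor or a plain array both work; the example find() call in the comment assumes the legacy driver's MongoCollection::find($query, $fields) signature and is only illustrative:

```php
<?php
// Count several fields in one pass over find() results.
// e.g. (legacy driver, illustrative):
//   $cursor = $db->users->find([], ['group_name' => 1, 'user_status' => 1]);
function countByFields(iterable $cursor, array $fields): array
{
    $counts = array_fill_keys($fields, []);
    foreach ($cursor as $doc) {
        foreach ($fields as $field) {
            if (!isset($doc[$field])) {
                continue; // this document has no value for the field
            }
            $value = $doc[$field];
            $counts[$field][$value] = ($counts[$field][$value] ?? 0) + 1;
        }
    }
    return $counts;
}
```

Calling countByFields($cursor, ['group_name', 'user_status']) yields one name => count array per field, the same information as the two separate aggregate() calls.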

Related

Mongo $sum slow

I was holding off on answering because I was sure some MongoDB experts would answer. However, as no one has, I will give a few hints; maybe some of them can help. Then again, I'm not a MongoDB expert, so take everything with a grain of salt.
1) Which version are you using? If you are still on 2.6, try 3.0.x (or newer) with the WiredTiger storage engine.
2) If you have a lot of data, sharding can help greatly. It increases setup complexity, but because parts of the data set can be processed in parallel, you can get significant speed gains. Be careful to choose a proper shard key.
3) Consider creating several collections that act as smaller views. For example, if you currently have 15 fields in [..], there is a good chance that many queries use only one or two of them at once, such as country. Create an additional collection that keeps the country data and skips the rest. If a query uses only the country fields and none of the other 15, query the small collection; if it uses more fields, use the big one. Queries on countries will then be much faster, because the data can be grouped more tightly. However, this is not always possible, as it adds extra complexity in building the small collections. If you process data through a queue (to insert into the big collection), you could insert into the small one too. Or you could use aggregate queries with $out to rebuild the smaller collections every X minutes.
4) Come up with a third schema. Your second schema makes it easy to put data in, but hard to get data out. You could use arrays more: that makes writes harder, but querying much easier and faster. Keep in mind that in your second schema, and in my sample third schema, documents grow over time, and MongoDB may need to move them around on disk, which is a really slow operation. Test whether that affects your setup. A small example of a potential collection schema:
{
    "user": "asd",
    [...],
    "date": ISODate("2015-07-01T00:00:00Z"), // first date of the month
    "total": 2222,
    "daily": [
        {"date": ISODate("2015-07-01T00:00:00Z"), "total": 22},
        {"date": ISODate("2015-07-11T00:00:00Z"), "total": 200},
        {"date": ISODate("2015-07-20T00:00:00Z"), "total": 2000}
    ]
}
When inserting data, you can use an update with criteria (if you are in PHP): $criteria = ['user' => 'asd', 'daily.date' => new MongoDate("...."), /* other fields */] and the update clause $update = ['$inc' => ['total' => 1, 'daily.$.total' => 1]]. Check how many documents were updated. If 0, create the entry from the same data: unset $criteria['daily.date'] and change the update to $update = ['$inc' => ['total' => 1], '$push' => ['daily' => ['date' => new MongoDate('..'), 'total' => 1]]].
Keep in mind that you can run into problems if several scripts insert data. It is better to do everything one at a time through a queue. If you do run in parallel, make sure that $push does not add several daily entries with the same date. So: you try to update, and if you can't, you insert. Because you use arrays and the positional operator, you can't use upserts; that's why the extra insert is needed. As I said, it's more complicated to get data in, but much easier to get data out.
Make sure to set up proper indexes, for example on 'daily.date', so that update queries don't need to check lots of documents. Going further, you can create a hash field that holds a hash of all the [...] fields and use it in the update. That makes it much easier to build a small index that pinpoints a particular document (you put 'daily.date', the hash field and a few more in the index, instead of all 15 [..] fields).
With such a structure you can do a lot with queries. For example, if you need full months, just query on date and the [...] fields you need, sum total, and you are done. If you need a date range (like the 1st to the 10th of the month), you can query by the [...] fields and date, $project away unnecessary fields, $unwind daily, $match again (this time on the daily.date field), $project to rename fields, and then $group and $sum. It's much more flexible than using $date.years.2015.months.07.days.03.total.
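The write path and the date-range query from hint 4 can be sketched as plain PHP arrays. This is a hypothetical helper set, not the answerer's code: $date and $monthStart stand in for MongoDate (legacy driver) or MongoDB\BSON\UTCDateTime values, plain strings are used so it runs without a MongoDB extension, and only 'user' is shown where the real criteria would also carry the [...] fields (or their hash). The '$ne' guard and the 'date' => $monthStart condition in the push step are additions aimed at the duplicate-date concern above. If the monthly document does not exist at all, both steps match nothing and a plain insert is still required.

```php
<?php
// Step 1: increment an existing daily entry via the positional operator.
function buildIncrementOp(string $user, $date): array
{
    $criteria = ['user' => $user, 'daily.date' => $date];
    $update   = ['$inc' => ['total' => 1, 'daily.$.total' => 1]];
    return [$criteria, $update];
}

// Step 2 (only if step 1 modified 0 documents): push a new daily entry.
// 'date' => $monthStart targets the right month's document; the '$ne'
// condition only matches if no daily element has this date yet, so a
// concurrent writer cannot push the same day twice.
function buildPushOp(string $user, $monthStart, $date): array
{
    $criteria = [
        'user'       => $user,
        'date'       => $monthStart,
        'daily.date' => ['$ne' => $date],
    ];
    $update = [
        '$inc'  => ['total' => 1],
        '$push' => ['daily' => ['date' => $date, 'total' => 1]],
    ];
    return [$criteria, $update];
}

// Date-range query: match, project, unwind, match again, project, group.
function dailyRangePipeline(string $user, $monthStart, $from, $to): array
{
    return [
        ['$match'   => ['user' => $user, 'date' => $monthStart]],
        ['$project' => ['user' => 1, 'daily' => 1]],
        ['$unwind'  => '$daily'],
        ['$match'   => ['daily.date' => ['$gte' => $from, '$lte' => $to]]],
        ['$project' => ['user' => 1, 'day_total' => '$daily.total']],
        ['$group'   => ['_id' => '$user', 'total' => ['$sum' => '$day_total']]],
    ];
}
```

Each [$criteria, $update] pair would go to your driver's update call; checking the modified count of step 1 decides whether step 2 runs.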
Keep in mind that all of these are just hints. Test everything on your own. Maybe only one of them will work for you, but that one can make all the difference.
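Hint 3 suggests rebuilding smaller collections with aggregate queries and $out. A hypothetical pipeline for that, written as a PHP array; 'country', 'total' and 'stats_by_country' are made-up names, and you would run it periodically (for example from cron) through your driver's aggregate call:

```php
<?php
// Rebuild a small per-country collection from the big one.
$countryPipeline = [
    // Keep only the fields the small collection needs.
    ['$project' => ['country' => 1, 'total' => 1]],
    // Pre-aggregate so later queries touch far fewer documents.
    ['$group' => ['_id' => '$country', 'total' => ['$sum' => '$total']]],
    // $out must be the last stage; it replaces the target collection's contents.
    ['$out' => 'stats_by_country'],
];

echo json_encode($countryPipeline, JSON_PRETTY_PRINT), "\n";
```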

Search in MongoDB array using PHP

I have some MongoDB documents with the following structure:
[_id] => MongoId Object (
    [$id] => 50664339b3e7a7cf1c000001
)
[uid] => 1
[name] => Alice
[words] => 1
[formatIds] => Array (
    [0] => 1
    [1] => 4
)
What I want to do is find all documents that have the value 1 in formatIds[]. I think it's possible to accomplish that. How can I do it in PHP?
UPDATE
Thanks for the help. It works fine now. Here is how I wrote the search:
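$id = $_POST['id'];
// "{$id}" casts to a string, so it only matches formatIds values stored as
// strings; use (int) $id instead if they are stored as integers.
$query = array('formatIds' => "{$id}");
$result = $stations_table->find($query); // where $stations_table = $db->stations;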

MongoDB treats queries on array values the same way as queries on standard values, as per the docs.
Querying for array('formatIds' => 1) should work.

Because MongoDB treats an array as multiple values for the same key, you can query it like a single field:
$cursor = $mongo->base->collection->find(array('formatIds' => 1));
This assumes $mongo is set up correctly and base/collection are replaced with your database and collection names.

It depends on whether you only want the documents or values that match your query or not.
If you don't mind pulling out the entire array and then searching it client side, you can of course use:
$c = $db->col->find(array('formatIds' => 1));
This works because a one-dimensional array in MongoDB can be searched like a single field.
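A minimal sketch of the client-side half, assuming $docs holds the documents returned by that find() as associative arrays. The helper name matchingFormats is hypothetical; it keeps, per document _id, only the array elements equal to the wanted value:

```php
<?php
// Keep only the formatIds elements equal to $wanted, keyed by document _id.
// (string) also works for MongoId objects, which convert to their hex id.
function matchingFormats(array $docs, int $wanted): array
{
    $result = [];
    foreach ($docs as $doc) {
        $matches = array_filter(
            $doc['formatIds'] ?? [],
            function ($v) use ($wanted) {
                return $v === $wanted; // strict: 1 and "1" differ
            }
        );
        // array_values() reindexes the filtered elements from 0.
        $result[(string) $doc['_id']] = array_values($matches);
    }
    return $result;
}
```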
To get only the array values that match your query (the find() above returns the whole array), you can use the aggregation framework:
$db->command(array(
    'aggregate' => 'col',
    'pipeline'  => array(
        // Single quotes matter: PHP would interpolate "$formatIds" as a variable.
        array('$unwind' => '$formatIds'),
        array('$match'  => array('formatIds' => 1)),
        array('$group'  => array(
            '_id'     => '$_id',
            'formats' => array('$push' => '$formatIds')
        ))
    )
));
Something like that should work.
It returns one result per matching document, with _id set to the document's _id and a formats field containing only the array elements equal to 1.

trying to get a multi table sql query into one result array

You can use an SQL JOIN to combine multiple queries into one.
Use this:
sql_join

First, a note: it looks like you have a missing condition. According to the query above, every song in the songs table will be joined with every possible result. There should probably be a condition similar to the following (column names may differ depending on your tables):
...
and mix.song_id=songs.song_id
...
As for your question: I don't know PHP, so I'll answer for MySQL alone. I don't think it is possible to do this in MySQL. MySQL returns rows in the result set, and each row can contain only a single value in each column. To put a group of values (song names) in one column, they must be concatenated (which is possible: Can I concatenate multiple MySQL rows into one field?) and later split apart again in your PHP script. This is not a good idea, because you would need to choose a separator that you know will never appear in the concatenated values. Therefore I think it's better to remove the songs table from the query and, after getting the mix id, run a second query to get all songs in that mix.

MongoDB Array Search in Query or client side

I am wondering which is better. I have pulled back a document with a query like this:
Array
(
    [_id] => MongoId Object
        (
            [$id] => 4eeedd9545c717620a000007
        )
    [field1] => ...
    [field2] => ...
    [field3] => ...
    [field4] => ...
    [field5] => ...
    [field6] => ...
    [votes] => Array
        (
            [whoVoted] => Array
                (
                    [0] => 4f98930cb1445d0a7d000001
                    [1] => 4f98959cb1445d0a7d000002
                    [2] => 4f88730cb1445d0a7d000003
                )
        )
)
Which would be faster:
Pull the entire array in one query and use in_array() to find the right ID?
Or pull everything except the votes in the first query, and then run another MongoDB query to check whether the ID exists in the array?

It depends on a lot of factors, which I suggest you test, but IMO most of the time it would be faster to just do two queries.

It depends on the size of the array being returned and searched.
Also, different servers are doing the work (MongoDB in one case, PHP in the other), so what do you mean by faster? At what scale?
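To make the two options from the question concrete, here is a small PHP sketch; the function names are hypothetical and the voter IDs are assumed to be stored as plain strings, as the print_r output suggests. Option 1 checks the already-fetched array with in_array(); option 2 builds criteria that let MongoDB do the check, to be passed to your driver's find()/findOne() or count():

```php
<?php
// Option 1: the document is already in PHP, so search the array locally.
function hasVoted(array $doc, string $voterId): bool
{
    // Strict mode avoids PHP's loose string/number comparisons.
    return in_array($voterId, $doc['votes']['whoVoted'] ?? [], true);
}

// Option 2: ask MongoDB instead. Equality on an array field matches when
// any element equals the value, so this matches the document only if
// $voterId is in votes.whoVoted.
function hasVotedCriteria($docId, string $voterId): array
{
    return ['_id' => $docId, 'votes.whoVoted' => $voterId];
}
```

Which one is faster still depends on the array size and where you want the work done, as the answers above point out.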

cakephp - sorting by a second level association in paginate

I am playing around with a quotes database relating to a ski trip I run. I am trying to list the quotes, but sort by the person who said the quote, and am struggling to get the paginate helper to let me do this.
I have four relevant tables.
quotes, trips, people and attendances. Attendances is essentially a join table for people and trips.
Relationships are as follows:
Attendance belongsTo Person; Person hasMany Attendance
Attendance belongsTo Trip; Trip hasMany Attendance
Attendance hasMany Quote; Quote belongsTo Attendance
In the QuotesController I use containable to retrieve the fields from Quote, along with the associated Attendance, and the fields from the Trip and Person associated with that Attendance.
function index() {
    $this->Quote->recursive = 0;
    $this->paginate['Quote'] = array(
        'contain' => array('Attendance.Person', 'Attendance.Trip'));
    $this->set('quotes', $this->paginate());
}
This seems to work fine, and in the view, I can echo out
foreach ($quotes as $quote) {
    echo $quote['Attendance']['Person']['first_name'];
}
without any problem.
What I cannot get to work is accessing/using the same variable as a sort field in paginate
echo $this->Paginator->sort('Name', 'Attendance.Person.first_name');
or
echo $this->Paginator->sort('Location', 'Attendance.Trip.location');
Neither works. It appears to sort by something, but I'm not sure what.
The $quotes array I am passing looks like this:
Array
(
    [0] => Array
        (
            [Quote] => Array
                (
                    [id] => 1
                    [attendance_id] => 15
                    [quote_text] => Hello
                )
            [Attendance] => Array
                (
                    [id] => 15
                    [person_id] => 2
                    [trip_id] => 7
                    [Person] => Array
                        (
                            [id] => 2
                            [first_name] => John
                            [last_name] => Smith
                        )
                    [Trip] => Array
                        (
                            [id] => 7
                            [location] => La Plagne
                            [year] => 2000
                            [modified] =>
                        )
                )
        )
)
I would be immensely grateful if someone could suggest how I might sort by the first_name of the Person associated with the Quote. I suspect my syntax is wrong, but I have not been able to find the answer. Is it not possible to sort by a second-level association in this way?
I am pretty much brand new to CakePHP, so please be gentle.
Thanks very much in advance.

I had a similar problem a while back, though not with sorting. Try putting the associated table in another array.
echo $this->Paginator->sort('Name', 'Attendance.Person.first_name');
change to:
echo $this->Paginator->sort('Name', array('Attendance' => 'Person.first_name'));
Hope this helps

In CakePHP 3 this problem can be solved by adding the 'sortWhitelist' option to $this->paginate in your controller:
$this->paginate = [
// ...
'sortWhitelist' => ['id', 'status', 'Attendance.Person.first_name']
];
And then in your view:
echo $this->Paginator->sort('Name', 'Attendance.Person.first_name');
This is noted in the docs:
"This option is required when you want to sort on any associated data, or computed fields that may be part of your pagination query."
However, that could easily be missed by tired eyes, so I hope this helps someone out there!
