First time here - please go easy… ;)
I'm starting off with MongoDB for the first time - using the official PHP driver to interact with an application.
Here's the first problem I've run into with regards to the aggregation framework.
I have a collection of documents, all of which contain an array of numbers, like in the following shortened example...
{
"_id": ObjectId("51c42c1218ef9de420000002"),
"my_id": 1,
"numbers": [
482,
49,
382,
290,
31,
126,
997,
20,
145
],
}
{
"_id": ObjectId("51c42c1218ef9de420000006"),
"my_id": 2,
"numbers": [
19,
234,
28,
962,
24,
12,
8,
643,
145
],
}
{
"_id": ObjectId("51c42c1218ef9de420000008"),
"my_id": 3,
"numbers": [
912,
18,
456,
34,
284,
556,
95,
125,
579
],
}
{
"_id": ObjectId("51c42c1218ef9de420000012"),
"my_id": 4,
"numbers": [
12,
97,
227,
872,
103,
78,
16,
377,
20
],
}
{
"_id": ObjectId("51c42c1218ef9de420000016"),
"my_id": 5,
"numbers": [
212,
237,
103,
93,
55,
183,
193,
17,
346
],
}
Using the aggregation framework and PHP (which I think is the correct way), I'm trying to work out the average number of consecutive documents in which a number doesn't appear (within the numbers array) before it appears again.
For example, the average gap for the number 20 in the above example is 1.5 (there's a gap of 2 documents, followed by a gap of 1 - add these values together, divide by the number of gaps).
I can get as far as working out if the number 20 is within the numbers array, and then using the $cond operator, passing a value based on the result. Here’s my PHP…
$unwind_results = array(
'$unwind' => '$numbers'
);
$project = array (
'$project' => array(
'my_id' => '$my_id',
'numbers' => '$numbers',
'hit' => array('$cond' => array(
array(
'$eq' => array('$numbers',20)
),
0,
1
)
)
)
);
$group = array (
'$group' => array(
'_id' => '$my_id',
'hit' => array('$min'=>'$hit'),
)
);
$sort = array(
'$sort' => array( '_id' => 1 ),
);
$avg = $c->aggregate(array($unwind_results,$project, $group, $sort));
What I was trying to achieve was to set up some kind of incremental counter that reset every time the number 20 appeared in the numbers array, and then grab all of those numbers and work out the average from there… But I'm truly stumped.
I know I could work out the average from a collection of documents on the application side, but ideally I’d like Mongo to give me the result I want so it’s more portable.
Would Map/Reduce need to get involved somewhere?
Any help/advice/pointers gratefully received!
As Asya said, the aggregation framework isn't usable for the last part of your problem (averaging gaps in "hits" between documents in the pipeline). Map/reduce also doesn't seem well-suited to this task, since you need to process the documents serially (and in a sorted order) for this computation and MR emphasizes parallel processing.
Given that the aggregation framework does process documents in a sorted order, I was brainstorming yesterday about how it might support your use case. If $group exposed access to its accumulator values during the projection (in addition to the document being processed), we might be able to use $push to collect previous values in a projected array and then inspect them during a projection to compute these "hit" gaps. Alternatively, if there was some facility to access the previous document encountered by a $group for our bucket (i.e. group key), this could allow us to determine diffs and compute the gap span as well.
I shared those thoughts with Mathias, who works on the framework, and he explained that while all of this might be possible for a single server (were the functionality implemented), it would not work at all on a sharded infrastructure, where $group and $sort operations are distributed. It would not be a portable solution.
I think your best option is to run the aggregation with the $project you have, and then process those results in your application language.
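As a rough sketch of that application-side step, the function below walks the sorted aggregation results and averages the runs of documents that do not contain the number. It assumes each row has the 'hit' field from the pipeline in the question (0 when the number is present, 1 otherwise), that misses before the first hit are ignored, and that a trailing run of misses counts as a gap, which is how the 1.5 in the question's example is reached. The name averageGap is made up for illustration.

```php
<?php
/**
 * Average length of the runs of consecutive documents without the number.
 * $rows must already be sorted (the pipeline sorts by _id).
 * Returns null when there is no gap to average.
 */
function averageGap(array $rows): ?float
{
    $gaps = [];
    $current = 0;      // length of the run of misses being counted
    $seenHit = false;  // assumption: misses before the first hit are ignored

    foreach ($rows as $row) {
        if ($row['hit'] == 0) {
            // The number is present: close the current run, if any.
            if ($current > 0) {
                $gaps[] = $current;
            }
            $seenHit = true;
            $current = 0;
        } elseif ($seenHit) {
            $current++;
        }
    }
    // Assumption: a trailing run of misses also counts as a gap.
    if ($current > 0) {
        $gaps[] = $current;
    }

    return $gaps ? array_sum($gaps) / count($gaps) : null;
}

// Rows shaped like the output of the pipeline in the question.
$rows = [
    ['_id' => 1, 'hit' => 0],
    ['_id' => 2, 'hit' => 1],
    ['_id' => 3, 'hit' => 1],
    ['_id' => 4, 'hit' => 0],
    ['_id' => 5, 'hit' => 1],
];
$average = averageGap($rows);
```

With the legacy PHP driver the rows would typically come from the 'result' key of the array returned by aggregate(), for example averageGap($avg['result']).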
Related
I am racking my brain and can't find a good solution for my problem. I am trying to design a system that I can use for batch picking in our order system.
The point is that from a set of orders I want to pick the 6 orders that are most similar to each other. In our warehouse most orders are alike, so we can save a lot of time by picking some orders at the same time.
Assume I have the following array:
<?php
$data = [
156 => [
1,
2,
7,
9,
],
332 => [
3,
10,
6
],
456 => [
1,
],
765 => [
7,
2,
10,
],
234 => [
1,
9,
3,
6,
],
191 => [
7,
],
189 => [
7,
6,
3,
],
430 => [
10,
9,
1,
],
482 => [
1,
2,
7,
],
765 => [
1,
5,
9,
]
];
?>
The array key is the order ID, and the values are the IDs of the products it contains. If I want to pick the top 3 orders that look as much like each other as possible, where do I start?
Any help would be much appreciated!
1. Step
Sort productId inside order (ASC)
2. Step
In a loop, check the difference (array_diff) between each order and every other order.
Create an array with the differences. For example:
$diff = [
'156' => [ //order id
'234' => 4, // with order 234 has 4 differences
'332' => 7, // with order 332 has 7 differences
// and so on...
],
]
3. Step
Sort $diff in ascending order and take the orders with the fewest differences.
Improvement
You could also take the total number of products in an order into account when comparing differences. For example, an order with 100 products and 10 differences is a better match than an order with 10 products and 9 differences.
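The steps above can be sketched roughly as follows. This is only an illustration: the function name differenceMatrix is made up, and it assumes a "difference" is a product that appears in one order but not in the other, counted in both directions (which reproduces the 4 and 7 in the example above). Step 1 (sorting the product IDs) is skipped here because array_diff does not depend on element order.

```php
<?php
/**
 * For every order, count the differences to every other order and
 * sort the counts ascending, so the most similar orders come first.
 */
function differenceMatrix(array $orders): array
{
    $diff = [];
    foreach ($orders as $idA => $productsA) {
        $diff[$idA] = [];
        foreach ($orders as $idB => $productsB) {
            if ($idA === $idB) {
                continue; // do not compare an order with itself
            }
            // Products only in A plus products only in B.
            $diff[$idA][$idB] = count(array_diff($productsA, $productsB))
                              + count(array_diff($productsB, $productsA));
        }
        asort($diff[$idA]); // fewest differences first, keys preserved
    }
    return $diff;
}

// Three of the orders from the question.
$orders = [
    156 => [1, 2, 7, 9],
    332 => [3, 10, 6],
    234 => [1, 9, 3, 6],
];
$diff = differenceMatrix($orders);
print_r($diff[156]);
```

This compares every pair of orders, so the cost grows quadratically with the number of orders; for a few hundred orders that is usually acceptable.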
Here is what I would do if I had this problem:
$topOrders = [];
foreach ($data as $products) {
    foreach ($products as $productId) {
        // Count how many orders contain each product ID.
        $topOrders[$productId] = ($topOrders[$productId] ?? 0) + 1;
    }
}
print_r($topOrders);
$topOrders now has each product ID as its key and the number of orders containing that product as its value. All you have to do is sort the array to get your top 3.
I'm making a gallery with sortable photos with Laravel and jQuery UI Sortable.
My function in the controller gets a nice array:
$items = [0 => 22, 1 => 25, 2 => 45];
But there will be approx. 150 - 200 photos in one gallery. Is there any way to use one DB query instead of 150 - 200? Because this is what my controller does at the moment...
<?php
foreach($photos['item'] as $position => $id){
Photo::where('id', $id)->update(['position' => $position]);
}
But it creates approx 150 - 200 DB queries, which is awful.
Edit #1
Basically I need something like this (two corresponding arrays with ids and positions):
$ids = [22, 24, 25, 34];
$positions = [0, 1, 2, 3];
Photo::where('id', $ids)->update(['position' => $positions]);
But I can't find anything about this approach.
Take a look here: Eloquent model mass update.
Basically, you are looking for a mass or bulk update.
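One common way to do such a bulk update is a single UPDATE with a CASE expression. The sketch below only builds the SQL string and its bindings; the helper name buildPositionUpdate and the table name photos are assumptions, not something from the question or from Laravel itself.

```php
<?php
/**
 * Build one UPDATE statement that sets the position of every photo.
 * $items maps position => photo id, like the $items array in the question.
 * Returns [sql, bindings].
 */
function buildPositionUpdate(array $items, string $table = 'photos'): array
{
    if ($items === []) {
        throw new InvalidArgumentException('No items to update.');
    }

    $cases = [];
    $bindings = [];
    foreach ($items as $position => $id) {
        $cases[] = 'WHEN ? THEN ?';
        $bindings[] = (int) $id;
        $bindings[] = (int) $position;
    }

    // Restrict the update to the listed ids so other rows are untouched.
    $ids = array_map('intval', array_values($items));
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));

    $sql = "UPDATE {$table} SET position = CASE id "
         . implode(' ', $cases)
         . " END WHERE id IN ({$placeholders})";

    return [$sql, array_merge($bindings, $ids)];
}

[$sql, $bindings] = buildPositionUpdate([0 => 22, 1 => 25, 2 => 45]);
echo $sql, PHP_EOL;
```

In a Laravel controller the result could then be passed to DB::update($sql, $bindings), which runs it as one query.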
I have a Mongo Collection that I'm trying to aggregate in which I need to be able to filter the results based on a sum of values from a subdocument. Each of my entries has a subdocument that looks like this
{
"_id": <MongoId>,
'clientID': 'some ID',
<Other fields I can filter on normally>
"bidCompData": [
{
"lineItemID": "210217",
"qtyBid": 3,
"priceBid": 10.25,
"qtyComp": 0,
"description": "Lawn Mowed",
"invoiceID": 23
},
{
<More similar entries>
}
]
}
What I'm trying to do is filter on the sum of qtyBid in a given record. For example, my user could specify that they only want records that have a total qtyBid across all of the bidCompData that's greater than 5. My research shows that I can't use $sum outside of the $group stage in the pipeline but I need to be able to sum just the qtyBid values for each individual record. Presently my pipeline looks like this.
array(
array('$project' => $basicProjection), //fields to project calculated earlier using the input parameters.
array('$match' => $query),
array('$group' => array(
'_id' =>
array('clientID' => '$clientID'),
'count' => array('$sum' => 1)
)
)
)
I tried having another group and an unwind before the group I presently have in my pipeline so that I could get the sum there but it doesn't let me keep my fields besides the id and the sum field. Is there a way to do this without using $where? My database is large and I can't afford the speed hit from the JS execution.
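For illustration, here is one possible shape for such a pipeline, offered as a sketch rather than a tested solution: unwind bidCompData, group back by the document _id while summing qtyBid, carry the other needed fields through with $first, and only then $match on the total. The function name buildQtyBidPipeline and the field name totalQtyBid are made up for this example, and the existing $project and $match stages would still go in front.

```php
<?php
/**
 * Build pipeline stages that keep only documents whose summed qtyBid
 * is greater than $minTotal, then count the survivors per clientID.
 */
function buildQtyBidPipeline(int $minTotal): array
{
    return [
        // One document per bidCompData entry.
        ['$unwind' => '$bidCompData'],
        // Re-assemble each original document and sum its qtyBid values.
        ['$group' => [
            '_id'         => '$_id',
            // $first carries a field through the $group unchanged;
            // add one such line for every other field that is needed later.
            'clientID'    => ['$first' => '$clientID'],
            'totalQtyBid' => ['$sum' => '$bidCompData.qtyBid'],
        ]],
        // Filter on the per-document total.
        ['$match' => ['totalQtyBid' => ['$gt' => $minTotal]]],
        // The original grouping by clientID.
        ['$group' => [
            '_id'   => ['clientID' => '$clientID'],
            'count' => ['$sum' => 1],
        ]],
    ];
}

$pipeline = buildQtyBidPipeline(5);
echo json_encode($pipeline, JSON_PRETTY_PRINT), PHP_EOL;
```

The $first accumulator is what lets fields other than the _id and the sum survive the first $group.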
I'm trying to display data on a line graph using Google Charts. The data displays fine, however I would like to set a date range to be displayed.
The data is sent from the database in a JSON literal format:
{
"cols": [
{"label": "Week", "type": "date"},
{"label": "Speed", "type": "number"},
{"type":"string","p":{"role":"tooltip"}},
{"type":"string","p":{"role":"tooltip"}},
{"type":"string","p":{"role":"tooltip"}},
{"type":"string","p":{"role":"tooltip"}}
],
"rows": [
{"c":[{"v": "Date('.$date.')"},{"v": null},{"v": null},{"v": null},{"v": null},{"v": null}]},
{"c":[{"v": "Date('.$date.')"},{"v": null},{"v": null},{"v": null},{"v": null},{"v": null}]}
]
}
Data is either displayed by week or month (null for easy reading) for example this week:
2012, 02, 06
2012, 02, 07
2012, 02, 09
Data isn't set for every day of the week, so in this example only the dates above are shown. What I would like to be shown is the start of the week (2012, 02, 06) to the end of the week (2012, 02, 12) similar to the third example here.
I managed to get the whole week showing by checking if the date exists in the database and, if not, appending an extra row with null data. This however meant the line was not continuous and the dates were not in order.
Could anyone offer any advice on how I could go about doing this?
Thanks!
Did you try leaving the missing dates missing (i.e. let the database return 2 values instead of 7)?
The continuous axis should handle missing dates; you just need to set the axis range from the start to the end of the week.
UPDATE
For an interactive line chart, the axis ranges can be set like this (as inspired by this thread):
hAxis: {...
viewWindowMode: 'explicit',
viewWindow: {min: new Date(2007,1,1),
max: new Date(2010,1,1)}
...}
see http://jsfiddle.net/REgJu/
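Since the data in the question is generated by PHP, the week boundaries for viewWindow could be computed on the server side as well. The sketch below is only one way to do it: the function names weekRange and chartDate are made up, weeks are assumed to run Monday to Sunday, and the month is written zero-based because that is how the Google Charts "Date(...)" string form counts months, like the JavaScript Date constructor.

```php
<?php
/**
 * Monday and Sunday of the week containing $date (Monday-based weeks assumed).
 */
function weekRange(string $date): array
{
    $day = new DateTimeImmutable($date);
    // format('N') is the ISO weekday: 1 for Monday through 7 for Sunday.
    $monday = $day->modify('-' . ((int) $day->format('N') - 1) . ' days');
    $sunday = $monday->modify('+6 days');
    return [$monday, $sunday];
}

/**
 * Format a date in the "Date(year, month, day)" string form, month zero-based.
 */
function chartDate(DateTimeImmutable $d): string
{
    return sprintf(
        'Date(%d, %d, %d)',
        (int) $d->format('Y'),
        (int) $d->format('n') - 1,
        (int) $d->format('j')
    );
}

[$start, $end] = weekRange('2012-02-09');
echo chartDate($start), ' to ', chartDate($end), PHP_EOL;
```

The two boundaries can then be written into the page as the min and max of viewWindow, while the rows keep only the dates that actually have data.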
"I managed to get the whole week showing by checking if the date exists in the database and if not append an extra row will null data, this however meant the line was not continuous and the dates where not in order."
I think you are on the right track; you just need to do it in a slightly different way. I use a function like the one below to make the data continuous.
$data = array(
1 => 50,
2 => 75,
4 => 65,
7 => 60,
);
$daysAgoStart = 1;
$daysAgoEnd = 14;
$continuousData = array();
for($daysAgo=$daysAgoStart ; $daysAgo<=$daysAgoEnd ; $daysAgo++){
if(array_key_exists($daysAgo, $data) == true){
$continuousData[$daysAgo] = $data[$daysAgo];
}
else{
$continuousData[$daysAgo] = 0;
}
}
$continuousData will now hold:
$continuousData = array(
1 => 50,
2 => 75,
3 => 0,
4 => 65,
5 => 0,
6 => 0,
7 => 60,
8 => 0,
9 => 0,
10 => 0,
11 => 0,
12 => 0,
13 => 0,
14 => 0,
);
in that order, and then the data can be used in the charts without any gaps.
Perhaps you can use a different chart type? Dygraphs looks like it might be helpful.
Otherwise you may have to write your own custom chart type.
I started with MongoDB and played around with random temperature data
like this:
'weather' => array(
'Air' => array(
'Jan' => 11,
'Feb' => 20,
'Mar' => 24,
'Jun' => 28,
'Jul' => 30
)
),
Now my question:
How can I query the Air array?
I know I can do something like:
$query = array('weather.Air.Jan' => 11);
Works fine...
But how can I search in the whole Air array:
$query = array('weather.Air.$' => 40);
This query doesn't work...
Can somebody help me?
Unfortunately, the query you're looking for does not exist.
As written, you're asking for "weather.Air where a key in the JSON object contains a value of 40".
MongoDB has the ability to "drill into" arrays. However, when it comes to sub-objects, you have to reach into the keys directly. There is no operator that provides a "search all keys" method. There is an outstanding JIRA request for this item right here.
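Since the keys have to be reached directly, one workaround when the set of keys is known in advance (here, the month names) is to build an $or query with one clause per key. This is a sketch under that assumption, and the helper name buildAnyKeyQuery is made up for illustration.

```php
<?php
/**
 * Build a query matching documents where any of the given keys under
 * $path equals $value. The list of keys must be known to the application.
 */
function buildAnyKeyQuery(string $path, array $keys, $value): array
{
    if ($keys === []) {
        throw new InvalidArgumentException('At least one key is required.');
    }

    $clauses = [];
    foreach ($keys as $key) {
        // e.g. array('weather.Air.Jan' => 40)
        $clauses[] = [$path . '.' . $key => $value];
    }
    return ['$or' => $clauses];
}

$query = buildAnyKeyQuery('weather.Air', ['Jan', 'Feb', 'Mar', 'Jun', 'Jul'], 40);
print_r($query);
```

The resulting array can be passed to find() like any other query. If searching by value is a frequent need, storing the readings as an array of sub-documents (for example month and value pairs) would let MongoDB drill into them directly, at the cost of changing the schema.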