I have this data in MongoDB:
{
"_id": ObjectId("542bxxxxxxxxxxxx"),
"TEMP_C": 13,
"time": ISODate("2014-08-21T05:30:00Z")
}
I want to group it by day and feed it to Highcharts, to display Temp average per day.
Like this: http://jsfiddle.net/tw7n6wxb/3/
Using MongoDB Aggregation Pipeline, I was able to do the grouping, based on some other examples and this great post: http://www.kamsky.org/stupid-tricks-with-mongodb/stupid-date-tricks-with-aggregation-framework
Side question: Why is it so complicated to group by date in MongoDB??? The most annoying part is having to re-compose the date object after splitting it into '$dayOfMonth', '$month', and '$year'.
Is there any simpler way of doing this?
In any case, I got this part working (I think). This is the result:
{
"_id" : {"sec":1409346800,"usec":0},
"avg" : 12
},
{
"_id" : {"sec":1409356800,"usec":0},
"avg" : 15
},
But, Highcharts series take arrays of value pairs as input:
Example: data: [[5, 2], [6, 3], [8, 2]].
The first value on each pair is the X value, and this value has to be a number (when X axis is configured as datetime, X values are in milliseconds).
The PROBLEM I'm having is that MongoDB returns the date as a MongoDate Object with two values inside, 'sec' and 'usec', while Highcharts is expecting one number.
Is there any way to convert a MongoDate Object to an integer in the pipeline? Using a $project, for example?
I'm using PHP, but I would like to avoid post-processing in the application (like PHP date formatting).
Or, any other ideas on how to solve this?
Thanks,
You seem to just want the timestamp values returned from the result. There is indeed a simple way to do this in the aggregation framework without using the date aggregation operators. You can use basic "date math" instead, and with a trick that can be used to extract the "timestamp" value from the date object and manipulate it:
db.collection.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$time", new Date("1970-01-01") ] },
{ "$mod": [
{ "$subtract": [ "$time", new Date("1970-01-01") ] },
1000 * 60 * 60 * 24
]}
]
},
"avg": { "$avg": "$TEMP_C" }
}}
])
So the basic "trick" there is that when you subtract one date object from another (or perform a similar operation), the result returned is a number: the "milliseconds" of the time difference between the two. So by using the "epoch" date of "1970-01-01", you get the "epoch timestamp" value for the date object as a number.
Then the basic date math is applied by subtracting from this value the modulo ( or remainder ) from the milliseconds in a day. This "rounds" the value to represent the "day" on which the entry is recorded.
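To see the arithmetic outside of MongoDB, here is a minimal sketch of the same "subtract the remainder" rounding in plain JavaScript. The names MS_PER_DAY and floorToDay are made up for this illustration, and the sample date is the one from the question:

```javascript
// Milliseconds in one day, the same constant used in the pipeline.
const MS_PER_DAY = 1000 * 60 * 60 * 24;

// Round a date down to 00:00:00 UTC of the same day, as epoch milliseconds.
// floorToDay is a hypothetical helper name, not a MongoDB or driver function.
function floorToDay(date) {
  const ms = date.getTime();      // same number as (date - new Date("1970-01-01"))
  return ms - (ms % MS_PER_DAY);  // drop the part of the day that has elapsed
}

const sample = new Date("2014-08-21T05:30:00Z");
const dayStart = floorToDay(sample);
console.log(new Date(dayStart).toISOString());
```

Note that this rounds to UTC days, exactly as the pipeline does. If you need local-time days you would have to shift the value by the timezone offset first.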
I like posting the JSON because it parses everywhere, but in a more PHP way it looks like this:
$collection->aggregate(array(
array( '$group' => array(
'_id' => array(
'$subtract' => array(
array( '$subtract' => array(
'$time', new MongoDate(0) // the epoch, independent of the PHP timezone setting
) ),
array( '$mod' => array(
array( '$subtract' => array(
'$time', new MongoDate(0)
) ),
1000 * 60 * 60 * 24
))
)
),
"avg" => array( '$avg' => '$TEMP_C' )
))
));
So that is a little cleaner than using the date aggregation operators to get to your intended result. Of course this is still not "all the way" to how you want the data to be presented where you can use it in the client.
The real thing to do here is manipulate the result so that you get the output format you want. This is probably better suited to your server code doing the manipulation before you return the response, but if you have MongoDB 2.6 or greater then it is "possible" to do this within the aggregation pipeline itself:
db.collection.aggregate([
{ "$group": {
"_id": {
"$subtract": [
{ "$subtract": [ "$time", new Date("1970-01-01") ] },
{ "$mod": [
{ "$subtract": [ "$time", new Date("1970-01-01") ] },
1000 * 60 * 60 * 24
]}
]
},
"avg": { "$avg": "$TEMP_C" }
}},
{ "$group": {
"_id": null,
"data": {
"$push": {
"$map": {
"input": { "$literal": [ 1,2 ] },
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el", 1 ] },
"$_id",
"$avg"
]
}
}
}
}
}}
])
So this is pretty sneaky really. After the initial "grouping" is done to determine the averages for each day, you get two fields in your result documents per day: _id and avg. What the $map operator does here is take an array as input (in this case, just a numbered template with a pair of values to identify position) and process each element to return an array with the same number of elements as the original.
The $cond operator here allows you to look at the value of the current element of that array and "swap it" with another value present in the current document. So for each document, the results contain something that is a paired array like:
[ 1409346800000, 12 ]
Then all that happens is all results are pushed into a single document with a "data" array that appears as follows:
{ "_id": null, "data": [ [..,..], [..,..], (...) ] }
Now your data element in that one result is an array of array pairs representing the points you want.
Of course though, operators like $map are only available from MongoDB 2.6 and onwards, so if you have that available then you can use them but otherwise just process the results in code with a similar "map" operation:
function my_combine($v) {
    return array($v["_id"], $v["avg"]);
}
$newresult = array_map( "my_combine", $result );
So this really comes down to array manipulation from whichever way you approach it, but the date manipulation trick should also save you some work in obtaining the results as the expected timestamp values as well.
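If you end up doing that last "map" step in the browser rather than in PHP, a minimal sketch could look like the following. The function names toSeriesData and mongoDateToMillis are invented for this example, and the sample documents are hypothetical values shaped like the output of the first $group stage:

```javascript
// Hypothetical result documents: _id is the day in epoch milliseconds.
const results = [
  { _id: 1408665600000, avg: 15 },
  { _id: 1408579200000, avg: 12 }
];

// Turn result documents into [x, y] pairs, sorted by x because
// output from $group is not ordered.
function toSeriesData(docs) {
  return docs
    .map(doc => [doc._id, doc.avg])
    .sort((a, b) => a[0] - b[0]);
}

// Assumption: a MongoDate that was JSON-encoded by PHP arrives as
// { sec, usec }. This converts that shape to milliseconds.
function mongoDateToMillis(d) {
  return d.sec * 1000 + Math.floor(d.usec / 1000);
}

console.log(JSON.stringify(toSeriesData(results)));
```

The array returned by toSeriesData is in the shape that the "data" option of a series takes in the question.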
I have a big chunk of data in this format:
[
{"date":"2018-11-17"},{"weather":"sunny"},{"Temp":"9"},
{"date":"2014-12-19"},{"Temp":"10"},{"weather":"rainy"},
{"date":"2018-04-10"},{"weather":"cloudy"},{"Temp":"15"},
{"date":"2017-01-28"},{"weather":"sunny"},{"Temp":"12"}
]
Is there any faster and more efficient way to organize and save it the database for future reference? Like making a comparison of the temperature for different days etc. [date,weather,Temp] are supposed to be in one set.
I've tried str_replace() but I'd like to know if there's any better way.
Taking into account your comment, it seems this is an array of objects in which every three objects make up one record (in the form date, weather and temp), so you can build that structure with the help of collections:
$string = '...'; // your JSON string
$records = collect(json_decode($string))
->chunk(3) // this creates a subset of every three items
->mapSpread(function ($date, $weather, $temp) { // this will map them
return array_merge((array) $date, (array) $weather, (array) $temp);
});
This will give you this output:
dd($records);
=> Illuminate\Support\Collection {#3465
all: [
[
"date" => "2018-11-17",
"weather" => "sunny",
"Temp" => "9",
],
[
"date" => "2014-12-19",
"Temp" => "10",
"weather" => "rainy",
],
[
"date" => "2018-04-10",
"weather" => "cloudy",
"Temp" => "15",
],
[
"date" => "2017-01-28",
"weather" => "sunny",
"Temp" => "12",
],
],
}
PS: To get the array version of this collection, just append ->all() at the end.
The Collections documentation has a good explanation of the chunk() and mapSpread() methods, as well as the rest of the available methods.
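For comparison, the same "take every three objects and merge them" idea can be sketched without the collection helpers. This is plain JavaScript, the name groupRecords is made up for the example, and the sample data is the first two records from the question:

```javascript
// Flat list where every three consecutive objects belong to one record.
const flat = [
  { date: "2018-11-17" }, { weather: "sunny" }, { Temp: "9" },
  { date: "2014-12-19" }, { Temp: "10" }, { weather: "rainy" }
];

// Walk the list in steps of `size` and merge each slice into one object.
function groupRecords(items, size = 3) {
  const records = [];
  for (let i = 0; i < items.length; i += size) {
    records.push(Object.assign({}, ...items.slice(i, i + size)));
  }
  return records;
}

console.log(groupRecords(flat));
```

Like the chunk() approach, this relies on the input really being in groups of three. It does not care about the order of the keys inside each group.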
I created a platform in PHP/MySQL and I am now migrating to MongoDB.
My old query for MySQL:
select sum(game_won) as game_won,count(id) as total,position
from games_player_stats
where position < 6 and position > 0 and user_id = :pa_id
group by position
order by total desc
The new json format looks like this:
{
"region" : "EUW",
"players" : [
{
"position" : 2,
"summoner_id" : 123456,
"game_won": 1
},
{
"position" : 1,
"summoner_id" : 123459,
"game_won": 0
},
{
"position" : 3,
"summoner_id" : 123458,
"game_won": 1
},
{
"position" : 4,
"summoner_id" : 123457,
"game_won": 0
}
]
}
Having multiple documents like this, I need to find how many times summoner_id 123456 has had position 2 or any of the other positions 1-6, and how many times he won in that position.
The index needs to be queryable on region and summoner_id.
The outcome would look like:
{
"positions" :
[
{ "position" : 1,
"total" : 123,
"won" : 65
},
{ "position" : 2,
"total" : 37,
"won" : 10
}
]
}
Would I need to use Map/Reduce for this?
The best results for this are obtained by the aggregation framework for MongoDB. It differs from mapReduce in that all operations are performed using "natively coded operators" as opposed to the JavaScript evaluation that is used by mapReduce.
This means "faster", and significantly so. Not to mention there are also certain parts of what you are looking for in a result that actually favour the "multiple group" concept that is inherently available to a "pipeline" of operations, that would otherwise be a fairly ugly accumulator using mapReduce.
Aggregation Pipeline Formats
The best approach will differ depending on the MongoDB "server" version you have available.
Ideally with MongoDB 3.2 you use $filter to "pre-filter" the array content before processing with $unwind:
var pipeline = [
// Match documents with array members matching conditions
{ "$match": {
"players": {
"$elemMatch": {
"summoner_id": 123456,
"position": { "$gte": 1, "$lte": 6 }
}
}
}},
// Filter the array content for matched conditions
{ "$project": {
"players": {
"$filter": {
"input": "$players",
"as": "player",
"cond": {
"$and": [
{ "$eq": [ "$$player.summoner_id", 123456 ] },
{ "$gte": [ "$$player.position", 1 ] },
{ "$lte": [ "$$player.position", 6 ] }
]
}
}
}
}},
// Unwind the array contents to de-normalize
{ "$unwind": "$players" },
// Group on the inner "position"
{ "$group": {
"_id": "$players.position",
"total": { "$sum": 1 },
"won": { "$sum": "$players.game_won" }
}},
// Optionally sort by total, descending, since $group output is not ordered
{ "$sort": { "total": -1 } },
// Optionally $group to a single document response with an array
{ "$group": {
"_id": null,
"positions": {
"$push": {
"position": "$_id",
"total": "$total",
"won": "$won"
}
}
}}
];
db.collection.aggregate(pipeline);
For MongoDB 2.6.x releases, still "pre-filter" but using $map and $setDifference:
var pipeline = [
// Match documents with array members matching conditions
{ "$match": {
"players": {
"$elemMatch": {
"summoner_id": 123456,
"position": { "$gte": 1, "$lte": 6 }
}
}
}},
// Filter the array content for matched conditions
{ "$project": {
"players": {
"$setDifference": [
{ "$map": {
"input": "$players",
"as": "player",
"in": {
"$cond": {
"if": {
"$and": [
{ "$eq": [ "$$player.summoner_id", 123456 ] },
{ "$gte": [ "$$player.position", 1 ] },
{ "$lte": [ "$$player.position", 6 ] }
]
},
"then": "$$player",
"else": false
}
}
}},
[false]
]
}
}},
// Unwind the array contents to de-normalize
{ "$unwind": "$players" },
// Group on the inner "position"
{ "$group": {
"_id": "$players.position",
"total": { "$sum": 1 },
"won": { "$sum": "$players.game_won" }
}},
// Optionally sort by total, descending, since $group output is not ordered
{ "$sort": { "total": -1 } },
// Optionally $group to a single document response with an array
{ "$group": {
"_id": null,
"positions": {
"$push": {
"position": "$_id",
"total": "$total",
"won": "$won"
}
}
}}
];
And for earlier versions with the aggregation framework from MongoDB 2.2, "post filter" with $match "after" the $unwind:
var pipeline = [
// Match documents with array members matching conditions
{ "$match": {
"players": {
"$elemMatch": {
"summoner_id": 123456,
"position": { "$gte": 1, "$lte": 6 }
}
}
}},
{ "$unwind": "$players" },
// Post filter the denormalized content
{ "$match": {
"players.summoner_id": 123456,
"players.position": { "$gte": 1, "$lte": 6 }
}},
// Group on the inner "position"
{ "$group": {
"_id": "$players.position",
"total": { "$sum": 1 },
"won": { "$sum": "$players.game_won" }
}},
// Optionally sort by total, descending, since $group output is not ordered
{ "$sort": { "total": -1 } },
// Optionally $group to a single document response with an array
{ "$group": {
"_id": null,
"positions": {
"$push": {
"position": "$_id",
"total": "$total",
"won": "$won"
}
}
}}
];
Walkthrough
Matching the Document: This is primarily done using $elemMatch since you are looking for "multiple" conditions within the array elements. With a "single" condition on an array element it is fine to use "dot notation":
"players.summoner_id": 12345
But for anything more than "one" condition you need to use $elemMatch. Otherwise all the statement is really asking is "does each condition match something within the array?", and that does not constrain "all" of the conditions to the same element. So even the $gte and $lte combination alone is actually "two" conditions, and therefore requires $elemMatch:
"players": {
"$elemMatch": {
"position": { "$gte": 1, "$lte": 6 }
}
}
Also note here that "1 to 6 inclusive" means "greater than or equal to" 1 for the lower bound, and likewise "less than or equal to" 6 for the upper bound.
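The difference between the two forms is easier to see with a rough emulation in plain JavaScript. This is only a sketch of the matching semantics described above, not driver code; the function names and the sample document are invented for the illustration:

```javascript
// Hypothetical document: the wanted summoner is at position 9,
// and a different player holds position 2.
const doc = {
  players: [
    { summoner_id: 123456, position: 9 },
    { summoner_id: 999, position: 2 }
  ]
};

// "Dot notation" style: every condition only has to match SOME element,
// and different conditions may be satisfied by different elements.
function matchesDotNotation(players) {
  return players.some(p => p.summoner_id === 123456)
      && players.some(p => p.position >= 1)
      && players.some(p => p.position <= 6);
}

// $elemMatch style: one single element has to satisfy all conditions.
function matchesElemMatch(players) {
  return players.some(p =>
    p.summoner_id === 123456 && p.position >= 1 && p.position <= 6);
}

console.log(matchesDotNotation(doc.players), matchesElemMatch(doc.players));
```

The document is selected by the first form even though no single player meets all of the conditions, which is exactly the false positive that $elemMatch avoids.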
"Pre-filtering": Noting here that the eventual goal is to "group" by an element within the array, being "position". This means that eventually you are going to need to $unwind the content to do that.
However, the $unwind pipeline operation is going to be quite costly, considering that it "takes apart" the array and creates a new document to process for each array member. Since you only want "some" of the members that actually match the conditions, it's desirable to "remove" any un-matched content from the array "before" you de-normalize this content.
MongoDB 3.2 has a good method for this with the $filter operator. It performs exactly as named by "filtering" the content of the array to only elements that match a particular set of conditions.
In an aggregation pipeline stage we use the "logical variants" of operators such as $gte and $lte. These return a true/false value depending on whether the condition matched. Also, within the array, the member fields can be referred to using "dot notation" on the alias given in "as", which points to the member currently being processed.
The $and here is another "logical operator" which gives the same true/false response. So this means "all" the arguments in its array of arguments must be met in order to return true. For the $filter itself, the true/false evaluated in "cond" determines whether to return the array element or not.
For MongoDB 2.6, which does not have the $filter operator, the same is represented with the combination of $map and $setDifference. Simply put, the $map looks at each element and applies an expression within "in". In this case we use $cond, which as a "ternary" operator evaluates an "if/then/else" form.
So here where the "if" returns true the expression in "then" is returned as the current array member. Where it is false, the expression in else returns, and in this case we are returning the value of false ( PHP False ).
Since all members are actually being returned by the result of $map we then emulate $filter by applying the $setDifference operator. This does a comparison to the members of the array and effectively "removes" any members where the element was returned as false from the result. So with distinct array members such as you have, the resulting "set" ( being a "set" of "unique" elements) just contains those elements where the condition was true and a non-false value was returned.
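As a rough picture of what those two operators do to the array, here is the same idea in plain JavaScript. It is an in-memory emulation under the assumption that the elements are distinct, with names chosen just for this example, and it is not how the server implements it:

```javascript
// Hypothetical array content, shaped like "players" in the question.
const players = [
  { position: 2, summoner_id: 123456, game_won: 1 },
  { position: 1, summoner_id: 123459, game_won: 0 },
  { position: 3, summoner_id: 123458, game_won: 1 }
];

const wanted = p =>
  p.summoner_id === 123456 && p.position >= 1 && p.position <= 6;

// What $filter does: keep only the elements where "cond" is true.
const filtered = players.filter(wanted);

// What $map with $cond does: same length, non-matches replaced by false.
const mapped = players.map(p => (wanted(p) ? p : false));

// What $setDifference against [false] does: drop the false placeholders.
// (The real operator also removes duplicates, since it works on sets.)
const emulated = mapped.filter(v => v !== false);

console.log(mapped.length, emulated.length);
```

Both routes end up with the same single element here. The only difference is the intermediate array full of false placeholders.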
"Post" filtering: The alternate approach, which is mandatory for server versions below MongoDB 2.6, is to "post" filter the array content. Since there are no operators in these versions that allow such actions on array content before $unwind, the simple process here is to apply another $match to the content "after" the $unwind is processed:
{ "$match": {
"players.summoner_id": 123456,
"players.position": { "$gte": 1, "$lte": 6 }
}}
Here you use "dot notation" since each array element is now actually its own document, and there is nothing else to compare to other than looking at the conditions on the specified path.
This is not ideal, since when you process $unwind all of the elements that actually don't match the conditions are still present. This ultimately means "more documents to process" and has the double cost of:
You had to create a new document for every member despite it not matching the conditions.
You now have to apply the condition across every "document" emitted as a result of $unwind.
This has a potentially huge impact on performance, and for that reason the modern MongoDB releases introduce ways to act on arrays without resorting to $unwind in order to process. You still need it for the remaining processing since you are "grouping" on a property contained within the array. But it is of course desirable to "get rid of un-matched elements first".
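To put a number on "more documents to process", here is a small in-memory sketch in plain JavaScript that counts the intermediate documents for both orders. The unwind function is only a rough stand-in for the $unwind stage, and the documents are hypothetical ones shaped like the question's data:

```javascript
const docs = [
  { region: "EUW", players: [
    { position: 2, summoner_id: 123456, game_won: 1 },
    { position: 1, summoner_id: 123459, game_won: 0 },
    { position: 3, summoner_id: 123458, game_won: 1 },
    { position: 4, summoner_id: 123457, game_won: 0 }
  ]},
  { region: "EUW", players: [
    { position: 1, summoner_id: 123456, game_won: 0 },
    { position: 2, summoner_id: 123459, game_won: 1 }
  ]}
];

const wanted = p =>
  p.summoner_id === 123456 && p.position >= 1 && p.position <= 6;

// Rough stand-in for $unwind: one output document per array member.
function unwind(input) {
  return input.flatMap(d => d.players.map(p => ({ ...d, players: p })));
}

// Post-filter: unwind everything, then test every emitted document.
const unwoundAll = unwind(docs);
const postFiltered = unwoundAll.filter(d => wanted(d.players));

// Pre-filter: trim each array first, then unwind only what is left.
const trimmed = docs.map(d => ({ ...d, players: d.players.filter(wanted) }));
const preFiltered = unwind(trimmed);

console.log(unwoundAll.length, preFiltered.length);
```

Both orders give the same final documents, but the post-filter route had to create and test every array member first, while the pre-filter route only ever unwinds the matching ones.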
Remaining Grouping: Now the elements are filtered and de-normalized, it only remains to do the actual $group condition that will total things by the "position" within each element. This is a simple matter of providing the grouping key to "_id" and using the appropriate data accumulation.
In this case you have two constructs, being:
"total": { "$sum": 1 },
"won": { "$sum": "$players.game_won" }
The basic { "$sum": 1 } is just "counting" the elements matched for each group, and the { "$sum": "$players.game_won" } actually uses the "game_won" value from your documents to accumulate a total. This is pretty standard usage for the $sum accumulator.
Of course your output shows the content within an "array", so the following stages are really "optional" since the real work of actually "grouping" is already done. So you could actually just use the results in the form provided up to this first $group, and the remaining just puts everything into a single document response rather than "one document per 'position' value", which would be the return at this point.
The first thing to note is that output from $group is not ordered. So if you want a specific order of results (i.e. by position ascending) then you must $sort after that $group stage. This will order the resulting documents of the pipeline as of the point where it is applied.
In your case you are actually asking for a sort on "total" anyway, so you would of course apply this with -1 meaning "descending" in this case. But whatever the case, you still should not presume that the output from $group is ordered in any way.
The "second" $group here is basically cosmetic in that this is what makes a "single document" response. Using null ( PHP NULL ) in the grouping key basically says "group everything" and will produce a single document in response. The $push accumulator here is what actually makes the "array" from the documents in the pipeline preceding this.
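If it helps to picture what the two $group stages and the $sort produce, here is the same accumulation written as an in-memory sketch in plain JavaScript. The summarize name and the sample input are made up for the illustration, and the input stands for the already filtered and unwound "players" entries:

```javascript
// Hypothetical unwound and filtered entries for one summoner.
const unwound = [
  { position: 2, game_won: 1 },
  { position: 2, game_won: 0 },
  { position: 1, game_won: 1 }
];

function summarize(entries) {
  // First $group: one bucket per position, counting and summing.
  const groups = new Map();
  for (const p of entries) {
    const g = groups.get(p.position) || { position: p.position, total: 0, won: 0 };
    g.total += 1;            // { "$sum": 1 }
    g.won += p.game_won;     // { "$sum": <the won flag> }
    groups.set(p.position, g);
  }
  // $sort on total, descending.
  const positions = [...groups.values()].sort((a, b) => b.total - a.total);
  // Second $group with a null key: everything pushed into one document.
  return { positions };
}

console.log(JSON.stringify(summarize(unwound)));
```

The returned object has the same outline as the "positions" document asked for in the question.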
Wrap-Up
So that's the general process in accumulating data like this:
Match the documents required to the conditions, since after all it would be a waste to apply conditions later to every document when they don't even contain array elements that would match the conditions you eventually want.
Filter the array content and de-normalize. Ideally done as a "pre-filter" where possible. This gets the documents into a form for grouping, from their original array form.
Accumulate the content using appropriate operators for the task, either $sum or $avg or $push or any other available according to needs. Noting also that depending on structure and conditions you can always use "more than one" $group pipeline stage.
PHP Translation
The initial example in PHP notation:
$pipeline = array(
    array(
        '$match' => array(
            'players' => array(
                '$elemMatch' => array(
                    'summoner_id' => 123456,
                    'position' => array( '$gte' => 1, '$lte' => 6 )
                )
            )
        )
    ),
    array(
        '$project' => array(
            'players' => array(
                '$filter' => array(
                    'input' => '$players',
                    'as' => 'player',
                    'cond' => array(
                        '$and' => array(
                            array( '$eq' => array( '$$player.summoner_id', 123456 ) ),
                            array( '$gte' => array( '$$player.position', 1 ) ),
                            array( '$lte' => array( '$$player.position', 6 ) )
                        )
                    )
                )
            )
        )
    ),
    array( '$unwind' => '$players' ),
    array(
        '$group' => array(
            '_id' => '$players.position',
            'total' => array( '$sum' => 1 ),
            'won' => array( '$sum' => '$players.game_won' )
        )
    ),
    array( '$sort' => array( 'total' => -1 ) ),
    array(
        '$group' => array(
            '_id' => NULL,
            'positions' => array(
                '$push' => array(
                    'position' => '$_id',
                    'total' => '$total',
                    'won' => '$won'
                )
            )
        )
    )
);
$result = $collection->aggregate($pipeline);
When making data structures in PHP that you are comparing to JSON, it is often useful to check your structure with something like:
echo json_encode($pipeline, JSON_PRETTY_PRINT);
Then you can see that what you are doing in PHP notation is the same as the JSON example you are following. It's a helpful tip so that you cannot really go wrong. If it looks different then you are not doing the "same" thing.
I have a multidimensional array that looks like this:
{
"groups": [
{
"__v": 0,
"_create_date": "2014-08-20T23:00:12.901Z",
"_id": "53f5287ca78473a969001827",
"_last_message_date": "2014-08-20T23:04:36.347Z",
"activity": 0,
"blocked_users": [],
"created_by": {
"_id": "53e84b0eba84943c6d0003f8",
"_last_modified": "2014-08-20T00:11:05.399Z",
"first_name": "Jegg",
"last_name": "V"
},
"curated": false,
"diversity": 0,
"featured": false,
"flagged": false,
"last_message": {
"text": "let's talk beo",
"created_by": {
"_id": "53e84b0eba84943c6d0003f8",
"first_name": "Jegg",
"last_name": "V"
},
"_id": "53f52984a78473a969001833",
"_create_date": "2014-08-20T23:04:36.347Z"
},
"member_count": 1,
"messages_count": 1,
"name": "Test",
"public": true,
"recency": 52182276.347,
"score": 52182276.347,
"tags": []
},
This structure repeats over 3000 times creating a very large multidimensional array. I think I can use array_chunk($array, 300) to break the array into smaller chunks. But I can't figure out how to access them exactly.
What I want to do is independently loop through the newly chunked arrays. So I'd like to end up with something like:
$array1 = {...}
$array2 = {...}
$array3 = {...}
$array4 = {...}
... and so on
Then I could loop through each of the newly created arrays, which are essentially smaller groups of the original array. Instead of 3000 arrays in one multidimensional array, as I have in the first place, I would end up with these smaller ones of 300 arrays each.
I hope this makes sense, I'm kinda out of my league. Help is always appreciated.
I think your array is in JSON format.
First decode it and then pass it to the array_chunk method:
$chunks = array_chunk($input_array, 300);
Then access the items as $chunks[0][0], $chunks[0][1] ... $chunks[0][299], $chunks[1][0], $chunks[1][1] ...
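The point is that you do not need separate $array1, $array2 ... variables; the chunks live in one outer array and you loop over that. Here is the idea as a small sketch in plain JavaScript, where chunk is a hand-written helper standing in for array_chunk and the 3000 sample groups are generated placeholders:

```javascript
// Split a list into consecutive pieces of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Placeholder data standing in for the 3000 decoded "groups" entries.
const groups = Array.from({ length: 3000 }, (_, i) => ({ name: "group " + i }));

const chunks = chunk(groups, 300);

// Outer loop: one pass per chunk. Inner loop: the items of that chunk.
let processed = 0;
for (const part of chunks) {
  for (const group of part) {
    processed += 1; // do the real per-group work here
  }
}
console.log(chunks.length, processed);
```

Each element of the outer array plays the role of one of the numbered arrays from the question.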
EDIT: oh, somehow I entirely misread the question. array_chunk is something worth looking into.
You could try using extract to fetch array values to the "global" variable namespace.
extract takes three arguments: the array you wish to extract, flags, and prefix if needed.
I'm not sure how non-associative arrays are extracted, but you could try
$full_array = array(
array( ... ),
array( ... ),
array( ... ),
array( ... ),
...
);
// EXTR_PREFIX_ALL prefixes all extracted keys with wanted prefix (the third param).
$extract_amount = extract( $full_array, EXTR_PREFIX_ALL, 'prefix' );
Now you should have the array extracted and available for use with variable names $prefix_0, $prefix_1, $prefix_2 and so on (extract joins the prefix and the key with an underscore).
I'm not sure how smart it is to extract an array with hundreds of available values.
Due to a bug in my PHP script, I have created multiple erroneous entries in my MongoDB. Specifically, I was using $addToSet and $each and under certain circumstances, the MongoDB object gets updated wrongly as below:
array (
'_id' => new MongoId("4fa4f815a6a54cedde000000"),
'poster' => 'alex#randommail.com',
'image' =>
array (
'0' => 'image1.jpg',
'1' => 'image2.jpg',
'2' => 'image3.png',
'3' =>
array (
'$each' => NULL,
),
),
You can see that "image.3" is different from the other array entries, and that was the incorrect entry. I have fixed the related bug in my PHP script; however, I am having difficulty tracking down all the affected entries in MongoDB in order to remove them.
Is there any way in MongoDB to check whether any entry of the image array contains another array instead of a string? Since the index of the sub-array varies, it is not practical to perform a $type check on image.3 for every entry.
Could use any suggestion. Thanks!
Since this was an error in your PHP, I suppose you now want to update all the erroneous documents.
If so, you could just find all documents where an array item is an array itself.
> db.arr.insert({values: [1, 2, 3, [4, 5]]})
> db.arr.insert({values: [6, 7, 8]})
>
> db.arr.find({values: {$type: 4}})
{ "_id" : ObjectId("4fae0750d59332f28c702618"), "values" : [ 1, 2, 3, [ 4, 5 ] ] }
Now let's fix this. To remove such entries without fetching them to PHP I offer this simple two-step operation.
First, find all documents with arrays and unset those arrays. This will leave nulls in place of them. Note, it will only match and update the first array in each document. If there are several, you will want to repeat the operation.
> db.arr.update({values: {$type: 4}}, {$unset: {'values.$': 1}}, false, true);
> db.arr.find()
{ "_id" : ObjectId("4fae0750d59332f28c702618"), "values" : [ 1, 2, 3, null ] }
Remove nulls.
> db.arr.update({values: null}, {$pull: {values: null}}, false, true);
> db.arr.find()
{ "_id" : ObjectId("4fae0750d59332f28c702618"), "values" : [ 1, 2, 3 ] }
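If you would rather double check the affected documents in application code before running the updates, the test itself is simple. The following is an in-memory sketch in plain JavaScript, not shell commands; the sample documents are invented, and it treats anything that is not a string as a bad entry, which is an assumption that covers both a nested array and an embedded document like the stray "$each" entry:

```javascript
// Hypothetical documents shaped like the ones in the question.
const docs = [
  { poster: "a", image: ["image1.jpg", "image2.jpg", "image3.png", { $each: null }] },
  { poster: "b", image: ["image4.jpg"] },
  { poster: "c", image: ["image5.jpg", ["image6.jpg"]] }
];

// Assumption: every valid entry of "image" is a plain string.
const isBadEntry = v => typeof v !== "string";

// Documents that contain at least one bad entry, at any index.
const affected = docs.filter(d => d.image.some(isBadEntry));

// Copies of the documents with the bad entries dropped.
const repaired = docs.map(d => ({
  ...d,
  image: d.image.filter(v => !isBadEntry(v))
}));

console.log(affected.map(d => d.poster), repaired[0].image);
```

Because the check looks at every element, it does not matter at which index the bad entry sits.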
All,
I have the following JSON Data. I need help writing a function in PHP which takes a categoryid and returns all URLs belonging to it in an array.
Something like this:
<?php
function returnCategoryURLs($catId)
{
//Parse the JSON data here..
return $urlArray;
}
?>
{
"jsondata": [
{
"categoryid": [
20
],
"url": "www.google.com"
},
{
"categoryid": [
20
],
"url": "www.yahoo.com"
},
{
"categoryid": [
30
],
"url": "www.cnn.com"
},
{
"categoryid": [
30
],
"url": "www.time.com"
},
{
"categoryid": [
5,
6,
30
],
"url": "www.microsoft.com"
},
{
"categoryid": [
30
],
"url": "www.freshmeat.com"
}
]
}
Thanks
What about something like this :
You first use json_decode, which is php's built-in function to decode JSON data :
$json = '{
...
}';
$data = json_decode($json);
Here, you can see what kind of PHP data (i.e. objects, arrays, ...) decoding the JSON string gave you, using, for example:
var_dump($data);
And, then, you loop over the data items, searching in each element's categoryid if the $catId you are searching for is in the list -- in_array helps doing that :
$catId = 30;
$urls = array();
foreach ($data->jsondata as $d) {
if (in_array($catId, $d->categoryid)) {
$urls[] = $d->url;
}
}
And, each time you find a match, add the url to an array...
Which means that, at the end of the loop, you have the list of URLs :
var_dump($urls);
Gives you, in this example :
array
0 => string 'www.cnn.com' (length=11)
1 => string 'www.time.com' (length=12)
2 => string 'www.microsoft.com' (length=17)
3 => string 'www.freshmeat.com' (length=17)
Up to you to build from this -- there shouldn't be much left to do ;-)
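For what it is worth, the same decode, loop and collect steps look like this when wrapped in a function. This sketch is in JavaScript rather than PHP, it passes the JSON text as an extra parameter (which the signature in the question does not have), and it uses the data from the question:

```javascript
const json = JSON.stringify({
  jsondata: [
    { categoryid: [20], url: "www.google.com" },
    { categoryid: [20], url: "www.yahoo.com" },
    { categoryid: [30], url: "www.cnn.com" },
    { categoryid: [30], url: "www.time.com" },
    { categoryid: [5, 6, 30], url: "www.microsoft.com" },
    { categoryid: [30], url: "www.freshmeat.com" }
  ]
});

// Decode the JSON, keep the items whose categoryid list contains catId,
// and return just their URLs.
function returnCategoryURLs(jsonText, catId) {
  const data = JSON.parse(jsonText);
  return data.jsondata
    .filter(item => item.categoryid.includes(catId))
    .map(item => item.url);
}

console.log(returnCategoryURLs(json, 30));
```

Calling it with a category that does not occur simply returns an empty array.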
Try the built-in json_decode function.