I have a Test database with a collection called collection:
{
"_id": "576008e5b47a6120c800418d",
"UserID": "Paul",
"Page": "A"
}
I want to record web activity and use MapReduce to get a result like this:
{
"_id": "Paul",
"value": {
"A": 1,
"B": 0,
"C": 0,
"D": 0,
"E": 0
}
}
As a start, I tried a simple MapReduce through the command interface of the PHP 7 MongoDB driver 1.1.7, but it failed to decode the document from the server:
<?php
$manager = new MongoDB\Driver\Manager("mongodb://localhost:27017");
$command = new MongoDB\Driver\Command(array(
"mapReduce" => "collection",
"map" => "function() { emit(this.UserID, 1); }",
"reduce" => "function(Users, Pages){".
"return Pages;}",
"out" => "ex"
));
try {
$cursor = $manager->executeCommand('Test.collection', $command);
$response = $cursor->toArray()[0];
} catch(MongoDB\Driver\Exception $e) {
echo $e->getMessage(), "\n";
exit;
}
var_dump($response);
?>
Any ideas will be appreciated thanks.
I'm not sure I would recommend MapReduce for this type of operation. The aggregation framework will generally perform better, since its operations run in native code rather than going through the JavaScript engine that MapReduce relies on.
With the aggregation framework, all you need is a $group pipeline stage that uses the $cond operator, which lets you transform a logical condition into a value. In this case you specify the pages as keys and their counts as values, with the documents grouped by UserID.
Consider running the following aggregation operation in mongo shell:
db.collection.aggregate([
{
"$group": {
"_id": "$UserID",
"A": {
"$sum": {
"$cond": [
{ "$eq": [ "$Page", "A" ] },
1,
0
]
}
},
"B": {
"$sum": {
"$cond": [
{ "$eq": [ "$Page", "B" ] },
1,
0
]
}
},
"C": {
"$sum": {
"$cond": [
{ "$eq": [ "$Page", "C" ] },
1,
0
]
}
},
"D": {
"$sum": {
"$cond": [
{ "$eq": [ "$Page", "D" ] },
1,
0
]
}
},
"E": {
"$sum": {
"$cond": [
{ "$eq": [ "$Page", "E" ] },
1,
0
]
}
}
}
}
])
which will produce the output:
{
"_id": "Paul",
"A": 1,
"B": 0,
"C": 0,
"D": 0,
"E": 0
}
for the above sample document.
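To make the $sum/$cond logic concrete, here is a minimal plain-JavaScript sketch that emulates the grouping outside MongoDB. The function name countPagesPerUser and the sample documents are assumptions made up for illustration; only the UserID and Page fields come from the question.

```javascript
// Plain-JavaScript emulation of the $group stage above (no MongoDB needed).
// The sample documents and the function name are illustrative assumptions.
const pages = ["A", "B", "C", "D", "E"];

function countPagesPerUser(docs) {
  const groups = {};
  for (const doc of docs) {
    // One accumulator object per UserID, every page starting at 0.
    if (!groups[doc.UserID]) {
      groups[doc.UserID] = { _id: doc.UserID };
      for (const page of pages) groups[doc.UserID][page] = 0;
    }
    for (const page of pages) {
      // Equivalent of { $sum: { $cond: [ { $eq: ["$Page", page] }, 1, 0 ] } }
      groups[doc.UserID][page] += doc.Page === page ? 1 : 0;
    }
  }
  return Object.values(groups);
}

const sampleDocs = [
  { UserID: "Paul", Page: "A" },
  { UserID: "Paul", Page: "C" },
  { UserID: "Paul", Page: "A" },
  { UserID: "Anna", Page: "B" },
];

console.log(countPagesPerUser(sampleDocs));
```

Each document contributes 1 to the counter of the page it visited and 0 to every other counter, which is exactly what the conditional sum expresses.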
For brevity, if you have the list of pages beforehand, you can generate the pipeline dynamically as follows:
var groupOperation = { "$group": { "_id": "$UserID" } },
pages = ["A", "B", "C", "D", "E"];
pages.forEach(function (page){
groupOperation["$group"][page] = {
"$sum": {
"$cond": [
{ "$eq": [ "$Page", page ] },
1,
0
]
}
};
})
db.collection.aggregate([groupOperation]);
Translated to PHP, this becomes:
<?php
$group_pipeline = [
'$group' => [
'_id' => '$UserID',
'A' => [
'$sum' => [
'$cond' => [ [ '$eq' => [ '$Page', 'A' ] ], 1, 0 ]
]
],
'B' => [
'$sum' => [
'$cond' => [ [ '$eq' => [ '$Page', 'B' ] ], 1, 0 ]
]
],
'C' => [
'$sum' => [
'$cond' => [ [ '$eq' => [ '$Page', 'C' ] ], 1, 0 ]
]
],
'D' => [
'$sum' => [
'$cond' => [ [ '$eq' => [ '$Page', 'D' ] ], 1, 0 ]
]
],
'E' => [
'$sum' => [
'$cond' => [ [ '$eq' => [ '$Page', 'E' ] ], 1, 0 ]
]
]
],
];
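// $collection is assumed to be a collection object that exposes aggregate(),
// e.g. a MongoDB\Collection from the mongodb/mongodb library
$aggregation = $collection->aggregate([ $group_pipeline ]);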
?>
If you would rather stick with MapReduce, consider changing the map and reduce functions to:
db.collection.mapReduce(
function() {
var obj = {};
["A", "B", "C", "D", "E"].forEach(function (page){ obj[page] = 0; } );
obj[this.Page] = 1;
emit(this.UserID, obj);
},
function(key, values) {
var obj = {};
values.forEach(function(value) {
Object.keys(value).forEach(function(key) {
if (!obj.hasOwnProperty(key)){
obj[key] = 0;
}
obj[key] += value[key]; // add the emitted count, not a flat 1 per key
});
});
return obj;
},
{ "out": { "inline": 1 } }
)
Which gives the output:
{
"results" : [
{
"_id" : "Paul",
"value" : {
"A" : 1,
"B" : 0,
"C" : 0,
"D" : 0,
"E" : 0
}
}
]
}
Translating the above mapReduce operation to PHP is trivial.
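As a sanity check on the map and reduce logic, the emit/reduce cycle can be emulated in plain JavaScript without a server. This is only a sketch: mapDoc, reduceCounters, runMapReduce and the sample visits are hypothetical names and data, not part of any MongoDB API. The point it illustrates is that reduce has to add up the emitted counters, because MongoDB may call reduce more than once for the same key and feed earlier results back in.

```javascript
// Emulates the emit/reduce cycle in plain JavaScript. Function names and the
// sample documents are illustrative assumptions, not part of any MongoDB API.
const PAGES = ["A", "B", "C", "D", "E"];

function mapDoc(doc) {
  const counters = {};
  PAGES.forEach((page) => { counters[page] = 0; });
  counters[doc.Page] = 1;
  return [doc.UserID, counters]; // stands in for emit(key, value)
}

function reduceCounters(key, values) {
  const total = {};
  values.forEach((value) => {
    Object.keys(value).forEach((page) => {
      // Add the emitted count, so already-reduced values can be reduced again.
      total[page] = (total[page] || 0) + value[page];
    });
  });
  return total;
}

function runMapReduce(docs) {
  const emitted = new Map();
  for (const doc of docs) {
    const [key, value] = mapDoc(doc);
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
  }
  const results = [];
  for (const [key, values] of emitted) {
    // Like MongoDB, skip reduce when a key has a single emitted value.
    const value = values.length === 1 ? values[0] : reduceCounters(key, values);
    results.push({ _id: key, value });
  }
  return results;
}

const visits = [
  { UserID: "Paul", Page: "A" },
  { UserID: "Paul", Page: "B" },
  { UserID: "Paul", Page: "A" },
];
console.log(JSON.stringify(runMapReduce(visits)));
```

Reducing a partial result together with a further emitted value gives the same totals as reducing everything at once, which is the property a reduce function needs.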
Related
I have this PHP code in my project. I believe the implementation is correct, but I ran into a small problem.
<?php echo '<script type="text/javascript">';
$data = array(
'base' => 'USD',
'alter' => 'ETH',
'data' => array()
);
foreach ($cryptos as $row) {
$sy = $row["symbol"];
$data['data'][] = array(
"$sy" => [
"rate" => 1.552000000000000,
"min" => 1.0077600000000000,
"max" => 10.077600000000000,
"code" => $row["symbol"],
"dp" => 8
],
);
}
print_r("var fxCur = " . json_encode($data));
Running the code above gives the result below. It is close to what I expect, but I want to omit the [] around data:
{
"base":"USD",
"alter":"ETH",
"data":[
{
"BTC":{
"rate": 1.552000000000000,
"min": 1.0077600000000000,
"max": 10.077600000000000,
"code":"BTC",
"dp":8
}
},
{
"ETH":{
"rate": 1.552000000000000,
"min": 1.0077600000000000,
"max": 10.077600000000000,
"code":"ETH",
"dp":8
}
}
]
}
But actually I wanted this result
{
"base":"USD",
"alter":"ETH",
"data":{
"BTC":{
"rate": 1.552000000000000,
"min": 1.0077600000000000,
"max": 10.077600000000000,
"code":"BTC",
"dp":8
},
"ETH":{
"rate": 1.552000000000000,
"min": 1.0077600000000000,
"max": 10.077600000000000,
"code":"ETH",
"dp":8
}
}
}
You're telling it to construct the data structure that way.
$data['data'][] = array(
"$sy" => [
...
]
);
That line says "append an element to $data['data'] at the next integer index and set it equal to this array", e.g. [ "BTC" => [ ... ] ].
I think what you want is:
$data['data'][$sy] = [
"rate" => 1.552000000000000,
"min" => 1.0077600000000000,
"max" => 10.077600000000000,
"code" => $row["symbol"],
"dp" => 8
];
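Since the JSON ends up in a JavaScript variable, it may help to see how the two shapes behave on the consuming side. The sketch below is an illustration only: the values are copied from the question, while the variable names appended and keyed are assumptions.

```javascript
// Two shapes of the same data as the browser would receive them.
// Values are copied from the question; the variable names are assumptions.
const entry = { rate: 1.552, min: 1.00776, max: 10.0776, code: "BTC", dp: 8 };

// Shape produced by $data['data'][] = [...]: a JSON array of one-key objects.
const appended = { base: "USD", alter: "ETH", data: [{ BTC: entry }] };

// Shape produced by $data['data'][$sy] = [...]: a JSON object keyed by symbol.
const keyed = { base: "USD", alter: "ETH", data: { BTC: entry } };

console.log(Array.isArray(appended.data)); // true
console.log(Array.isArray(keyed.data));    // false

// Direct lookup by symbol only works on the keyed shape.
console.log(appended.data.BTC);   // undefined
console.log(keyed.data.BTC.code); // "BTC"
```

With the keyed shape, a rate can be looked up directly by symbol instead of searching through an array.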
I need indexed results to ignore apostrophes, so that searching for "Johns potato" will show results for "John's potato".
I was able to get the analyzer accepted but now I return no search results. Does anyone see something obvious that I am missing?
$params = [
'index' => $index,
'body' => [
'settings' => [
'number_of_shards' => 5,
'number_of_replicas' => 2,
'analysis' => [
"analyzer" => [
"my_analyzer" => [
"tokenizer" => "keyword",
"char_filter" => [
"my_char_filter"
]
]
],
"char_filter" => [
"my_char_filter" => [
"type" => "mapping",
"mappings" => [
"' => "
]
]
]
]
],
'mappings' => [
$type => [
'_source' => [
'enabled' => true
],
'properties' => [
'title' => [
'type' => 'text',
'analyzer' => 'my_analyzer'
],
'content' => [
'type' => 'text',
'analyzer' => 'my_analyzer'
]
]
]
]
]
];
I did find out that removing the analyzer from my field mappings allowed results to reappear, but I get no results the second I add the analyzer.
Here's an example query that I make.
{
"body": {
"query": {
"bool": {
"must": {
"multi_match": {
"query": "apples",
"fields": [
"title",
"content"
]
}
},
"filter": {
"terms": {
"site_id": [
"1351",
"1349"
]
}
},
"must_not": [
{
"match": {
"visible": "false"
}
},
{
"match": {
"locked": "true"
}
}
]
}
}
}
}
What you probably want is the built-in english analyzer. The standard analyzer, which is the default, tokenizes on whitespace and some punctuation but leaves apostrophes alone. The english analyzer, because it knows the language, also stems words and removes stop words.
Here is the standard analyzer's output, where you can see "john's":
POST _analyze
{
"analyzer": "standard",
"text": "John's potato"
}
{
"tokens": [
{
"token": "john's",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "potato",
"start_offset": 7,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 1
}
]
}
And here is the english analyzer where you can see the 's is removed. The stemming will allow "John's", "Johns", and "John" to all match the document.
POST _analyze
{
"analyzer": "english",
"text": "John's potato"
}
{
"tokens": [
{
"token": "john",
"start_offset": 0,
"end_offset": 6,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "potato",
"start_offset": 7,
"end_offset": 13,
"type": "<ALPHANUM>",
"position": 1
}
]
}
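One possible explanation for the empty results with the custom analyzer, offered as an assumption rather than something confirmed in this thread: the keyword tokenizer emits the whole field value as a single token, so a one-word query can only match a field whose entire text is that word. The toy model below is not Elasticsearch code; keywordAnalyze, wordAnalyze and matches are hypothetical helpers that only mimic the two tokenization strategies.

```javascript
// Toy model of two analysis chains; this is NOT Elasticsearch code.
// keywordAnalyze mimics my_analyzer: strip apostrophes, keep one token.
function keywordAnalyze(text) {
  return [text.replace(/'/g, "")];
}

// wordAnalyze is a rough stand-in for a word-splitting analyzer:
// strip apostrophes, lowercase, split on whitespace.
function wordAnalyze(text) {
  return text.replace(/'/g, "").toLowerCase().split(/\s+/).filter(Boolean);
}

// A match succeeds when any query token equals any indexed token.
function matches(analyze, indexedText, queryText) {
  const indexed = new Set(analyze(indexedText));
  return analyze(queryText).some((token) => indexed.has(token));
}

console.log(keywordAnalyze("John's potato")); // [ 'Johns potato' ]
console.log(wordAnalyze("John's potato"));    // [ 'johns', 'potato' ]
console.log(matches(keywordAnalyze, "John's potato", "potato"));    // false
console.log(matches(wordAnalyze, "John's potato", "Johns potato")); // true
```

In this model the single-token chain only matches when the query equals the entire field, while the word-splitting chain matches on individual words.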
I'm trying to use a PHP mongo library to "aggregate" on a data structure like this:
{
"_id": 100,
"name": "Joe",
"pets":[
{
"name": "Kill me",
"animal": "Frog"
},
{
"name": "Petrov",
"animal": "Cat"
},
{
"name": "Joe",
"animal": "Frog"
}
]
},
{
"_id": 101,
"name": "Jane",
"pets":[
{
"name": "James",
"animal": "Hedgehog"
},
{
"name": "Franklin",
"animal": "Frog"
}
]
}
For example, suppose I want to get all subdocuments where the animal is a frog. Note that I do NOT want all matching "super-documents" (i.e. the ones with _id). I want to get an ARRAY that looks like this:
[
{
"name": "Kill me",
"animal": "Frog"
},
{
"name": "Joe",
"animal": "Frog"
},
{
"name": "Franklin",
"animal": "Frog"
}
]
What syntax am I supposed to use (in PHP) to accomplish this? I know it has to do with aggregate, but I couldn't find anything that matches this specific scenario.
You can use the aggregation below. The first $match finds documents whose pets array contains a Frog, and $unwind then flattens the pets array into one document per pet. The second $match keeps only the unwound pets that are frogs, and the final $group pushes the matching subdocuments into a single array.
<?php
$mongo = new MongoDB\Driver\Manager("mongodb://localhost:27017");
$pipeline =
[
[
'$match' =>
[
'pets.animal' => 'Frog',
],
],
[
'$unwind' =>'$pets',
],
[
'$match' =>
[
'pets.animal' => 'Frog',
],
],
[
'$group' =>
[
'_id' => null,
'animals' => ['$push' => '$pets'],
],
],
];
$command = new \MongoDB\Driver\Command([
'aggregate' => 'insert_collection_name',
'pipeline' => $pipeline,
// MongoDB 3.6+ requires the cursor option for the aggregate command
'cursor' => new \stdClass,
]);
$cursor = $mongo->executeCommand('insert_db_name', $command);
foreach($cursor as $key => $document) {
//do something
}
?>
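The four stages can be followed step by step with a plain-JavaScript emulation on the sample data from the question. This is only a sketch of what the pipeline computes; petsByAnimal is a hypothetical helper name, and no MongoDB driver is involved.

```javascript
// Plain-JavaScript walk-through of $match -> $unwind -> $match -> $group.
// petsByAnimal is a hypothetical helper; the data is the question's sample.
const owners = [
  {
    _id: 100,
    name: "Joe",
    pets: [
      { name: "Kill me", animal: "Frog" },
      { name: "Petrov", animal: "Cat" },
      { name: "Joe", animal: "Frog" },
    ],
  },
  {
    _id: 101,
    name: "Jane",
    pets: [
      { name: "James", animal: "Hedgehog" },
      { name: "Franklin", animal: "Frog" },
    ],
  },
];

function petsByAnimal(docs, animal) {
  return docs
    // First $match: keep owners that have at least one matching pet.
    .filter((doc) => doc.pets.some((pet) => pet.animal === animal))
    // $unwind: one entry per pet instead of one entry per owner.
    .flatMap((doc) => doc.pets)
    // Second $match: keep only the matching pets.
    .filter((pet) => pet.animal === animal);
  // The final $group with $push corresponds to returning one array.
}

console.log(petsByAnimal(owners, "Frog"));
```

The first filter is not strictly needed for the result, but in the real pipeline the equivalent $match lets the server discard whole documents before unwinding them.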
Sample records in the collection,
(doc 1)
[{
"_id": ObjectId("567941aaf0058ed6755ab3dc"),
"hash_count": NumberInt(7),
"time": [
NumberInt(1450787170),
NumberInt(1450787292),
NumberInt(1450787307),
NumberInt(1450787333),
NumberInt(1450787615)
],
"word": "batman"
},
(doc 2)
{
"_id": ObjectId("567941aaf0058ed6755ab3dc"),
"hash_count": NumberInt(7),
"time": [
NumberInt(1450787170),
NumberInt(1450787292),
NumberInt(1450787307),
NumberInt(1450787333),
NumberInt(1450787354),
NumberInt(1450787526),
NumberInt(1450787615)
],
"word": "apple"
}]
The records were stored using PHP.
For each word, I want to count the time entries that fall between 1450787307 and 1450787615.
Expected answer:
apple=5
batman=3
What should the query be?
I ran this command
{
aggregate : "hashtags",
pipeline:
[
{$match:{"time":{$gte:NumberInt(1450787307), $lte:NumberInt(1450787615)}}},
{$unwind:"$time"},
{$match:{"time":{$gte:NumberInt(1450787307), $lte:NumberInt(1450787615)}}},
{$group:{"_id":"$word","count":{$sum:1}}}
]
}
which gave this result
Response from server:
{
"result": [
],
"ok": 1
}
Since you are stuck with an older version of MongoDB, you cannot leverage the array aggregation operators introduced in 3.2.
You would have to aggregate as below:
db.collection.aggregate([
{$match:{"time":{$gte:NumberInt(1450787307), $lte:NumberInt(1450787615)}}},
{$unwind:"$time"},
{$match:{"time":{$gte:NumberInt(1450787307), $lte:NumberInt(1450787615)}}},
{$group:{"_id":"$word","count":{$sum:1}}}
])
Translated to PHP:
// NumberInt() is a mongo shell helper; in PHP, pass plain integers
$result = $c->aggregate([
[ '$match' => [ 'time' => [ '$gte' => 1450787307,
'$lte' => 1450787615 ] ] ],
[ '$unwind' => '$time' ],
[ '$match' => [ 'time' => [ '$gte' => 1450787307,
'$lte' => 1450787615 ] ] ],
[ '$group' => [ '_id' => '$word', 'count' => [ '$sum' => 1 ] ] ]
]);
In version 3.2, you could use a combination of $filter and $size to achieve the same result with less expensive operations.
db.collection.aggregate([
{$match:{"time":{$gte:NumberInt(1450787307),
$lte:NumberInt(1450787615)}}},
{$project:{"_id":0,"word":1,
"count":{$size:{$filter:
{"input":"$time",
"as":"t",
"cond":{$and:[
{$gte:["$$t",NumberInt(1450787307)]},
{$lte:["$$t",NumberInt(1450787615)]}]}
}
}
}
}}
])
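To check the expected counts against the sample records, the $filter plus $size idea can be emulated in plain JavaScript. This is a sketch under the assumption that the two sample documents are the whole collection; countInRange is a hypothetical helper name.

```javascript
// Emulates $match followed by $project with $size/$filter on the sample data.
// countInRange is a hypothetical helper; the timestamps come from the question.
const hashtags = [
  {
    word: "batman",
    time: [1450787170, 1450787292, 1450787307, 1450787333, 1450787615],
  },
  {
    word: "apple",
    time: [
      1450787170, 1450787292, 1450787307, 1450787333,
      1450787354, 1450787526, 1450787615,
    ],
  },
];

function countInRange(docs, from, to) {
  return docs
    // $match on an array field: some element satisfies $gte and some element
    // (not necessarily the same one) satisfies $lte.
    .filter((doc) => doc.time.some((t) => t >= from) && doc.time.some((t) => t <= to))
    // $project: count only the elements that fall inside the range.
    .map((doc) => ({
      word: doc.word,
      count: doc.time.filter((t) => t >= from && t <= to).length,
    }));
}

console.log(countInRange(hashtags, 1450787307, 1450787615));
```

Because no $unwind is needed, each document is processed once and the array is filtered in place.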
OK, after a lot of trial and error I came up with this answer, which is correct
for
1450787615 - lower limit
1450855155 - upper limit
db.hashtags.aggregate([
{
"$match": {
"time": {
"$gte": 1450787615, "$lte": 1450855155
}
}
},
{ "$unwind": "$time" },
{
"$match": {
"time": {
"$gte": 1450787615, "$lte": 1450855155
}
}
},
{
"$group": {
"_id": "$word",
"count": {
"$sum": 1
}
}
}
])
The result looks like this:
{
"result" : [
{
"_id" : "batman",
"count" : 3
},
{
"_id" : "dear",
"count" : 1
},
{
"_id" : "ghost",
"count" : 1
}
],
"ok" : 1
}
db.collection.find({time:{$gt: 1450787307, $lt: 1450787615}});
This first gives you a cursor over all documents that fit within the given time range. You can then iterate through the cursor, print out each name, and use some loop logic to count the number of occurrences for each one. I've only worked lightly with MongoDB, so there may be a more efficient way to do this.
reference:
https://docs.mongodb.org/v3.0/reference/method/db.collection.find/
This works the way I need:
$out = $collection->aggregate(
array(
'$match' => array('type' => 'chair')
),
array(
'$project' => array(
'chairtype' => 1,
'mijczjeqeo'=>1
)
),
array(
'$group' => array(
'_id' => '$chairtype',
'MIDDLE_mijczjeqeo' => array('$avg' => '$mijczjeqeo'),
'SUMMA__mijczjeqeo' => array('$sum' => '$mijczjeqeo')
)
)
);
my_dump($out);
But I need the values for the aggregation to come from an array inside the same documents: versions[0][content][mijczjeqeo]
Please correct my script. It does not work:
$out = $collection->aggregate(
array(
'$match' => array('type' => 'chair')
),
array(
'$project' => array(
'chairtype' => 1,
'versions.0.content.mijczjeqeo'=>1
)
),
array(
'$group' => array(
'_id' => '$chairtype',
'MIDDLEmijczjeqeo' => array('$avg' => '$versions.0.content.mijczjeqeo'),
'SUMMAmijczjeqeo' => array('$sum' => '$versions[0]["content"]["mijczjeqeo"]')
)
)
);
Neither of these approaches works:
'MIDDLEmijczjeqeo' => array('$avg' => '$versions.0.content.mijczjeqeo')
'SUMMAmijczjeqeo' => array('$sum' => '$versions[0]["content"]["mijczjeqeo"]')
I think the problem is near the .0.
I tried it in the mongo console...
db.documents.aggregate({$match:{'type':'chair'}},{$project:{'chairtype': 1, 'mijczjeqeo':1}},{$group:{'_id':'$chairtype','MID':{$avg:'$mijczjeqeo'}}})
{
"result" : [
{
"_id" : "T",
"MID" : 6.615384615384615
},
{
"_id" : "G",
"MID" : 8.310344827586206
},
{
"_id" : "E",
"MID" : 6.9523809523809526
}
],
"ok" : 1
}
db.documents.aggregate({$match:{'type':'chair'}},{$project:{'chairtype': 1, 'versions.0.content.mijczjeqeo':1}},{$group:{'_id':'$chairtype','MID':{$avg:'$versions.0.content.mijczjeqeo'}}})
{
"result" : [
{
"_id" : "T",
"MID" : 0
},
{
"_id" : "G",
"MID" : 0
},
{
"_id" : "E",
"MID" : 0
}
],
"ok" : 1
}
Well you cannot project like that in the aggregation pipeline. If you want to act on array elements within an aggregation statement you first need to $unwind the array and then either $match the required element(s) or as in your case choose the $first item using an additional $group stage.
Your question does not show the structure of a document so I'll just use a sample, as my "chairs" collection:
{
"_id": 1,
"type": "chair",
"chairtype": "A",
"versions": [
{
"revision": 1,
"content": {
"name": "ABC",
"value": 10
}
},
{
"revision": 2,
"content": {
"name": "BBB",
"value": 15
}
}
]
}
{
"_id": 2,
"type": "chair",
"chairtype": "A",
"versions": [
{
"revision": 1,
"content": {
"name": "CCC",
"value": 20
}
},
{
"revision": 2,
"content": {
"name": "BAB",
"value": 12
}
}
]
}
Minimal, but enough to get the point. Now the aggregate statement:
db.chairs.aggregate([
// Normal query matching, which is good
{ "$match": { "type": "chair" } },
// Unwind the array to de-normalize
{ "$unwind": "$versions" },
// Group by the document in order to get the "first" array element
{ "$group": {
"_id": "$_id",
"chairtype": { "$first": "$chairtype" },
"versions": { "$first": "$versions" }
}},
// Then group by "chairtype" to get your average values
{ "$group": {
"_id": "$chairtype",
"MID": {"$avg": "$versions.content.value"}
}}
])
Of course if your actual document has nested arrays then you will be "unwinding" and "matching" the required elements. But that is the general process of "narrowing down" the array contents to the elements you need.
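The narrowing-down process can be traced in plain JavaScript on the two sample documents. This is a sketch of what the pipeline computes, not driver code; averageFirstVersion is a hypothetical helper name, and it takes the first array element directly instead of unwinding and regrouping.

```javascript
// Emulates: $match type, take the first "versions" element per document,
// then average content.value per chairtype. Helper name is an assumption.
const chairs = [
  {
    _id: 1,
    type: "chair",
    chairtype: "A",
    versions: [
      { revision: 1, content: { name: "ABC", value: 10 } },
      { revision: 2, content: { name: "BBB", value: 15 } },
    ],
  },
  {
    _id: 2,
    type: "chair",
    chairtype: "A",
    versions: [
      { revision: 1, content: { name: "CCC", value: 20 } },
      { revision: 2, content: { name: "BAB", value: 12 } },
    ],
  },
];

function averageFirstVersion(docs) {
  const sums = new Map();
  for (const doc of docs) {
    // $match, plus skipping documents with no versions (as $unwind would).
    if (doc.type !== "chair" || doc.versions.length === 0) continue;
    // $unwind followed by $group with $first keeps the first array element.
    const first = doc.versions[0];
    const entry = sums.get(doc.chairtype) || { total: 0, count: 0 };
    entry.total += first.content.value;
    entry.count += 1;
    sums.set(doc.chairtype, entry);
  }
  // Final $group: one average per chairtype.
  return [...sums].map(([chairtype, { total, count }]) => ({
    _id: chairtype,
    MID: total / count,
  }));
}

console.log(averageFirstVersion(chairs));
```

For the sample data the first versions hold the values 10 and 20, so chairtype "A" averages to 15.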