MapReduce for MongoDB on PHP? - php

I'm new to MongoDB and this is my first use of MapReduce ever.
I have two collections: Shops and Products with the following schema
Products
{'_id', 'type': 'first', 'enabled': 1, 'shop': $SHOP_ID }
{'_id', 'type': 'second', 'enabled': 0, 'shop': $SHOP_ID }
{'_id', 'type': 'second', 'enabled': 1, 'shop': $SHOP_ID }
And
Shops
{'_id', 'name':'L', ... }
{'_id', 'name':'M', ... }
I'm looking for a GROUP BY-like statement for MongoDB, using MapReduce, that retrieves the Shops with name 'L' that have Products with 'enabled' => 1.
How can I do it? Thank you.

It should be possible to retrieve the desired information without a Map Reduce operation.
You could first query the "Products" collection for documents that match {'enabled': 1}, and then take the list of $SHOP_IDs from that query (which I imagine correspond to the _id values in the "Shops" collection), put them in an array, and perform an $in query on the "Shops" collection, combined with the query on "name".
For example, given the two collections:
> db.products.find()
{ "_id" : 1, "type" : "first", "enabled" : 1, "shop" : 3 }
{ "_id" : 2, "type" : "second", "enabled" : 0, "shop" : 4 }
{ "_id" : 3, "type" : "second", "enabled" : 1, "shop" : 5 }
> db.shops.find()
{ "_id" : 3, "name" : "L" }
{ "_id" : 4, "name" : "L" }
{ "_id" : 5, "name" : "M" }
>
First find all of the documents that match {"enabled" : 1}
> db.products.find({"enabled" : 1})
{ "_id" : 1, "type" : "first", "enabled" : 1, "shop" : 3 }
{ "_id" : 3, "type" : "second", "enabled" : 1, "shop" : 5 }
From the above query, generate a list of _ids:
> var c = db.products.find({"enabled" : 1})
> shop_ids = []
[ ]
> c.forEach(function(doc){shop_ids.push(doc.shop)})
> shop_ids
[ 3, 5 ]
Finally, query the shops collection for documents with _id values in the shop_ids array that also match {name:"L"}.
> db.shops.find({_id:{$in:shop_ids}, name:"L"})
{ "_id" : 3, "name" : "L" }
>
Similar questions regarding doing the equivalent of a join operation with Mongo have been asked before. This question provides some links which may provide you with additional guidance:
How to join MongoDB collections in Python?
If you would like to experiment with Map Reduce, here is a link to a blog post from a user who used an incremental Map Reduce operation to combine values from two collections.
http://tebros.com/2011/07/using-mongodb-mapreduce-to-join-2-collections/
Hopefully the above will allow you to retrieve the desired information from your collections.

Short answer: you can't do that (with a single MapReduce command).
Long answer: MapReduce jobs in MongoDB run on a single collection only and cannot refer to other collections in the process. So the JOIN/GROUP BY-like behaviour of SQL is not available here. The new Aggregation Framework also operates on a single collection only.
I propose a two-part solution:
Get all shops with name "L".
Compose and run a map-reduce command that checks every product document against this pre-computed list of shops.
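The two-part approach can be sketched in plain JavaScript. This is an in-memory simulation, not a call to MongoDB's mapReduce command: the arrays stand in for the sample collections from the first answer, and runMapReduce is a hypothetical helper that only imitates the emit/reduce cycle.

```javascript
// Sample data copied from the first answer; plain arrays stand in for collections.
const products = [
  { _id: 1, type: "first", enabled: 1, shop: 3 },
  { _id: 2, type: "second", enabled: 0, shop: 4 },
  { _id: 3, type: "second", enabled: 1, shop: 5 },
];
const shops = [
  { _id: 3, name: "L" },
  { _id: 4, name: "L" },
  { _id: 5, name: "M" },
];

// Hypothetical helper: a tiny in-memory imitation of a map-reduce run.
function runMapReduce(docs, map, reduce) {
  const groups = new Map();
  for (const doc of docs) {
    // The map function receives an emit callback that collects values per key.
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const out = [];
  for (const [key, values] of groups) {
    out.push({ _id: key, value: reduce(key, values) });
  }
  return out;
}

// Part 1: pre-compute the ids of all shops named "L".
const shopIdsNamedL = new Set(
  shops.filter((s) => s.name === "L").map((s) => s._id)
);

// Part 2: emit 1 for every enabled product whose shop is in that list,
// then sum the emitted values per shop.
const enabledPerShop = runMapReduce(
  products,
  (doc, emit) => {
    if (doc.enabled === 1 && shopIdsNamedL.has(doc.shop)) emit(doc.shop, 1);
  },
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);

console.log(enabledPerShop);
```

With the sample data, only shop 3 is both named "L" and has an enabled product. Against a real server, the pre-computed list would presumably be handed to the map function through the command's scope option rather than a closure.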

Related

How to project more than one value with doctrine mongodb query builder selectElemMatch

I have the following kind of data in my mongo database. The property "values" consists of an array of attributes. "values" is a property of a product, which also has some other properties like "normalizedData". But the structure of "values" is what gives me a headache.
"values" : [
{
"_id" : ObjectId("5a09d88c83218b814a8df57d"),
"attribute" : NumberLong("118"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"locale" : "de_AT",
"varchar" : "LED PAR-56 TCL 9x3W Short sw"
},
{
"_id" : ObjectId("5a09d88c83218b814a8df57a"),
"attribute" : NumberLong("118"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"locale" : "de_DE",
"varchar" : "LED PAR-56 TCL 9x3W Short sw"
},
{
"_id" : ObjectId("5a09d88c83218b814a8df57c"),
"attribute" : NumberLong("184"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"locale" : "de_AT",
"boolean" : false
},
{
"_id" : ObjectId("5a09d88c83218b814a8df585"),
"attribute" : NumberLong("118"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"locale" : "fr_FR",
"varchar" : "LED PAR-56 TCL 9x3W Short sw"
},
{
"_id" : ObjectId("5a09d88c83218b814a8df584"),
"attribute" : NumberLong("121"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"locale" : "fr_FR",
"varchar" : "Eurolite LED PAR-56 TCL 9x3W Short sw"
},
{
"_id" : ObjectId("5a09d88c83218b814a8df574"),
"attribute" : NumberLong("207"),
"entity" : DBRef("pim_catalog_product", ObjectId("59148ee283218bb8548b45a8"), "akeneo_pim"),
"varchar" : "51913611"
},
]
A couple of things to notice about this extract from the dataset:
attributes with their ID ("attribute") can appear multiple times, like 118 for example.
attributes do not always have the same subset of properties (see 207 and 121 for example).
if an attribute is present multiple times (like 118) it should differ in the "locale" property at least.
Now I need the doctrine mongoDB query builder to project the following result:
I want only those attributes to be present in the result that contain one of the IDs specified by the query (e.g. array(118, 184)).
If the attribute exists multiple times, I want to see it multiple times.
If the attribute exists multiple times, I want to limit the number by an array of locales given.
So an example query would be: return all attributes inside "values" that have either 118 or 184 as the "attribute" property, and (if specified) limit the results to those attributes where the locale is either "de_DE" or "it_IT".
Here is what I have tried so far:
$qb = $productRepository->createQueryBuilder();
$query = $qb
->hydrate(false)
->select(array('normalizedData.sku'))
->selectElemMatch(
'values',
$qb->expr()->field('attribute')->in(array(117, 110))->addAnd(
$qb->expr()->field('locale')->in(array('it_IT', 'de_DE'))
))
->field('_id')->in($entityIds)
->field('values')->elemMatch($qb->expr()->field('attribute')->in(array(117, 110)))
->limit($limit)
->skip($offset);
This query always returns only one attribute (no matter how many times it is present within the "values" array) per product. What am I doing wrong here?
EDIT: My MongoDB version is 2.4.9 and doctrine-mongo-odm is below 1.2. Currently I cannot update either.
You can try the aggregation query below on MongoDB 3.4. $elemMatch by design returns only the first matching element.
You will need $filter to return multiple matches.
Use $match to limit the results to documents where values has at least one element whose attribute is in [118,184] and whose locale is in ["de_DE","it_IT"], followed by $filter in a $project stage to keep only the matching elements. You can add $limit and $skip stages at the end of the aggregation pipeline, just as you did with the regular query.
db.col.aggregate([
{"$match":{
"values":{
"$elemMatch":{
"attribute":{"$in":[118,184]},
"locale":{"$in":["de_DE","it_IT"]}
}
}
}},
{"$project":{
"values":{
"$filter":{
"input":"$values",
"as":"item",
"cond":{
"$and":[
{"$in":["$$item.attribute",[118,184]]},
{"$in":["$$item.locale",["de_DE","it_IT"]]}
]
}
}
}
}}
])
You can use AggregationBuilder to write the query in doctrine.
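Since $filter is not available on older servers such as the 2.4.9 mentioned in the question, the same condition could also be applied in application code after fetching the documents. The following is a minimal sketch of that idea in plain JavaScript, not Doctrine code: filterValues is a hypothetical helper, and the sample array is a trimmed-down copy of the question's data with NumberLong values simplified to plain numbers.

```javascript
// Hypothetical helper mirroring the $filter condition on a plain array:
// keep an element only if its attribute AND its locale are in the given lists.
function filterValues(values, attributeIds, locales) {
  return values.filter(
    (item) =>
      attributeIds.includes(item.attribute) && locales.includes(item.locale)
  );
}

// Trimmed-down version of the "values" array from the question.
const values = [
  { attribute: 118, locale: "de_AT", varchar: "LED PAR-56 TCL 9x3W Short sw" },
  { attribute: 118, locale: "de_DE", varchar: "LED PAR-56 TCL 9x3W Short sw" },
  { attribute: 184, locale: "de_AT", boolean: false },
  { attribute: 118, locale: "fr_FR", varchar: "LED PAR-56 TCL 9x3W Short sw" },
  { attribute: 207, varchar: "51913611" }, // no locale at all
];

const matched = filterValues(values, [118, 184], ["de_DE", "it_IT"]);
console.log(matched);
```

Unlike $elemMatch in a projection, this keeps every matching element, not just the first one. Note that in this sketch elements without a locale are dropped; if attributes lacking a locale should be kept, the condition would need an extra branch for that case.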

Intelligent searching with mongo

I am using phalcon with mongodb. I have the following document in collection:
{
"_id" : ObjectId("547c8b6f7d30dd522b522255"),
"title" : "Test vacancy",
"slug" : "test-vacancy",
"location" : "the-netherlands",
"contract" : "fixed",
"function" : "Test vacancy",
"short_description" : "gdfsgfds",
"description" : "fdsafsdgfsdgdfa",
"promo_text" : "gfdsgdfs",
"company_name" : "gfdsgfsd",
"hits" : 36,
"updated_at" : 1.42685e+09,
}
In the controller I fetch all results for the searched phrase/query. For example, if I enter the word "example", the output is every post with that word in the description, title, short_description, etc. Everything works, but I want to sort these posts in a specific order: if the query is the same as the title, that post should come first. Right now it appears somewhere further down.
Can you help me? Thank you in advance.

Mongo and Yii -> update with $set a field in all the arrays of a subdocument

I'm having problems updating a specific field in all the arrays of a subdocument. I have the following structure in MongoDB:
{
"_id" : ObjectId("539c9e97cac5852a1b880397"),
"DocumentoDesgloseER" : [
{
"elemento" : "COSTO VENTA",
"id_rubroer" : "11",
"id_documento" : "45087",
"abreviatura" : "CV",
"orden" : "1",
"formula" : "Cuenta Contable",
"tipo_fila" : "1",
"color" : "#FFD2E9",
"sucursal" : "D",
"documentoID" : "0",
"TOTAL" : "55426.62",
},
{ ... MORE OF THE SAME ... }
],
"id_division" : "2",
"id_empresa" : "9",
"id_sucursal" : "37",
"ejercicio" : "2008",
"lastMonthNumber" : NumberLong(6),
}
I need to update the field "documentoID" to a specific value, "20" for example, in every element of the "DocumentoDesgloseER" array. How can I do this?
I tried the following (with the $ operator), but it is not working:
$querySearch = array('id_division'=>'2', 'id_empresa'=>'9', 'id_sucursal'=>'37', 'ejercicio'=>'2008');
$queryUpdate = array('$set'=>array('DocumentoDesgloseER.$.documentoID'=>'20'));
Yii::app()->edmsMongoCollection('DocumentosDesgloseER')->update($querySearch,$queryUpdate);
By the way, I'm using Yii Framework to make the connection with Mongo. Any help or advice is welcome.
Thanks ;D!
Unfortunately, you can't currently use a positional operator to update all items in an array. There is an open ticket in the MongoDB JIRA about this issue.
There are two "solutions":
Change your schema so that your embedded documents are in the separate collection (it's probably not what you want).
The best you can do, if you don't want to change your schema, is to update each subdocument in PHP and then save the whole document.
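The second option can be sketched as follows. This is plain JavaScript on an in-memory object, only meant to show the shape of the client-side update; setDocumentoID is a hypothetical helper, the sample document is a shortened version of the one in the question, and writing the result back is left to whatever save or update call your driver provides.

```javascript
// Hypothetical helper: returns a copy of the document in which every
// embedded entry of DocumentoDesgloseER has documentoID set to newValue.
function setDocumentoID(doc, newValue) {
  return {
    ...doc,
    DocumentoDesgloseER: doc.DocumentoDesgloseER.map((sub) => ({
      ...sub,
      documentoID: newValue,
    })),
  };
}

// Shortened sample document based on the question.
const ledgerDoc = {
  id_division: "2",
  id_empresa: "9",
  DocumentoDesgloseER: [
    { elemento: "COSTO VENTA", abreviatura: "CV", documentoID: "0" },
    { elemento: "OTRO", abreviatura: "OT", documentoID: "0" },
  ],
};

const updatedLedgerDoc = setDocumentoID(ledgerDoc, "20");
console.log(updatedLedgerDoc.DocumentoDesgloseER.map((sub) => sub.documentoID));
```

The helper returns a new object instead of mutating its argument, so the original document remains available for comparison until the save succeeds.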

Duplicate value inserted in the mongodb "_id" field

In MongoDB, we can assign our own value to the _id field, and the "_id" field value may be of any type other than arrays, so long as it is unique (from the docs).
But in my live database, I can see that some records are duplicated, as follows:
db.memberrecords.find().limit(2).forEach(printjson)
{
"_id" : "999783",
"Memberid" : "999783",
"Name" : "ASHEESH SHARMA",
"Gender" : "M",
}
{
"_id" : "999783",
"Memberid" : "999783",
"Name" : "Sanwal Singh Meena",
"Gender" : "M",
}
In the above records, the same _id value was inserted twice into the collection. When I tested with a local database, it did not allow inserting a record with the same _id and threw the following error:
E11000 duplicate key error index: mongoms.memberrecords.$_id_ dup key: { : "999783" }
Below are the indexes for my live memberrecords collection (for your reference):
db.memberrecords.getIndexes()
[
{
"name" : "_id_",
"ns" : "mongoms.memberrecords",
"key" : {
"_id" : 1
},
"v" : 0
},
{
"_id" : ObjectId("4f0bcdf2b1513267f4ac227c"),
"ns" : "mongoms.memberrecords",
"key" : {
"Memberid" : 1
},
"name" : "Memberid_1",
"unique" : true,
"v" : 0
}
]
Note: I have two shards for this collection.
Any suggestions on this, please?
Is your shard key the _id field? You can only have one unique index enforced across a cluster: the shard key (otherwise the server would have to check with every shard on every insert).
So: on a single shard, _id will be unique. However, if it isn't your shard key, all bets are off across multiple shards.
See http://www.mongodb.org/display/DOCS/Sharding+Limits#ShardingLimits-UniqueIndexesDOCS%3AIndexes%23UniqueIndexes.
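To see why this can happen, here is a toy model in plain JavaScript. It is not how MongoDB is implemented: ToyShard, routeInsert and the region shard key are all hypothetical, and the only point is that each shard checks uniqueness against its own documents.

```javascript
// Toy model: each shard only knows about the documents it stores.
class ToyShard {
  constructor() {
    this.docsById = new Map();
  }
  insert(doc) {
    // Uniqueness of _id is checked locally, within this shard only.
    if (this.docsById.has(doc._id)) {
      throw new Error("E11000 duplicate key: " + doc._id);
    }
    this.docsById.set(doc._id, doc);
  }
}

const toyShards = [new ToyShard(), new ToyShard()];

// Hypothetical routing rule: the shard key is `region`, not `_id`.
function routeInsert(doc) {
  const shard = doc.region === "north" ? toyShards[0] : toyShards[1];
  shard.insert(doc);
}

routeInsert({ _id: "999783", region: "north", Name: "A" });
// Accepted: the second shard has never seen this _id.
routeInsert({ _id: "999783", region: "south", Name: "B" });

// A duplicate routed to the same shard is rejected.
let rejected = false;
try {
  routeInsert({ _id: "999783", region: "north", Name: "C" });
} catch (e) {
  rejected = true;
}
console.log(rejected);
```

In this model the same _id ends up stored once on each shard, which matches the symptom in the question, while a single unsharded instance (like the local test database) would reject the second insert.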

Sort data in sub array in mongodb

Is it possible to sort data in a sub-array in MongoDB?
{ "_id" : ObjectId("4e3f8c7de7c7914b87d2e0eb"),
"list" : [
{
"id" : ObjectId("4e3f8d0be62883f70c00031c"),
"datetime" : 1312787723,
"comments" : [
{
"id" : ObjectId("4e3f8d0be62883f70c00031d"),
"datetime": 1312787723
},
{
"id" : ObjectId("4e3f8d0be62883f70c00031d"),
"datetime": 1312787724
},
{
"id" : ObjectId("4e3f8d0be62883f70c00031d"),
"datetime": 1312787725
}
]
}
],
"user_id" : "3" }
For example, I want to sort comments by the "datetime" field. Or is the only option to select all the data and sort it in PHP code? My query uses a limit in Mongo, though... Thanks.
With MongoDB, you can sort the documents or select only some parts of the documents, but you can't modify the documents returned by a search query.
If the current order of your comments can be changed, then the best solution would be to sort them in the MongoDB documents (find(), then for each doc, sort its comments and update()). If you want to keep the current internal order of comments, then you'll have to sort each document after each query.
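In both cases, the sort will be done in PHP. Something like:
foreach ($doc['list'] as &$list) { // by reference, so the sorted array is written back into $doc
// uses a lambda function, PHP 5.3 required
usort($list['comments'], function ($a, $b) {
return $a['datetime'] - $b['datetime']; // ascending by timestamp
});
}
unset($list); // break the reference left over from the loop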
If you can't use PHP 5.3, replace the lambda function by a normal one. See usort() examples.
