How do I sum multiple fields in Doctrine ODM? - php

I want to use Doctrine ODM's aggregation builder to build this query:
db.TeamStandings.aggregate(
// Pipeline
[
// Stage 1
{
$match: {
"team.$id": ObjectId("5a1643fdf5d8741a883c2aeb")
}
},
// Stage 2
{
$group: {
"_id": { "team": "team.$id" },
// This is the sum of multiple fields
"games": { $sum: { $sum: ["$wins", "$losses", "$ties"] } },
"wins": { $sum: "$wins" },
"losses": { $sum: "$losses" },
"ties": { $sum: "$ties" },
"homeWins" : { $sum: "$homeRecord.wins" },
"homeLosses" : { $sum: "$homeRecord.losses" },
"homeTies" : { $sum: "$homeRecord.ties" },
"roadWins" : { $sum: "$roadRecord.wins" },
"roadLosses" : { $sum: "$roadRecord.losses" },
"roadTies" : { $sum: "$roadRecord.ties" },
}
},
]
);
I ran this in Studio 3T and got the following result:
{
"_id" : {
"team" : "team.$id"
},
"games" : NumberInt(776),
"wins" : NumberInt(377),
"losses" : NumberInt(398),
"ties" : NumberInt(1),
"homeWins" : NumberInt(218),
"homeLosses" : NumberInt(170),
"homeTies" : NumberInt(1),
"roadWins" : NumberInt(159),
"roadLosses" : NumberInt(228),
"roadTies" : NumberInt(0)
}
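Before porting the query, it can help to confirm that the totals above are internally consistent: games should equal wins + losses + ties, and each overall count should equal its home plus road counterpart. A minimal JavaScript sketch of that check, using the numbers from the output (`checkTotals` is a hypothetical helper, not part of Studio 3T or Doctrine ODM):

```javascript
// Totals copied from the Studio 3T output above.
const totals = {
  games: 776, wins: 377, losses: 398, ties: 1,
  homeWins: 218, homeLosses: 170, homeTies: 1,
  roadWins: 159, roadLosses: 228, roadTies: 0,
};

// Hypothetical helper: true when the summed fields agree with each other.
function checkTotals(t) {
  return (
    t.games === t.wins + t.losses + t.ties &&
    t.wins === t.homeWins + t.roadWins &&
    t.losses === t.homeLosses + t.roadLosses &&
    t.ties === t.homeTies + t.roadTies
  );
}

console.log(checkTotals(totals)); // true
```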
How do I write this exact query using Doctrine ODM's aggregation builder?

This one is tricky because the documentation is not quite clear.
In theory, you create a nested sub-expression and use the sum operator there:
$builder = $this->dm->createAggregationBuilder(\Documents\BlogPost::class);
$builder->group()
->field('id')
->expression(null)
->field('games')
->sum($builder->expr()->sum('$wins', '$losses', '$ties'))
;
However, because of the way the aggregation builder is implemented, it does not understand the multi-argument $sum syntax outside a $project stage and keeps only the first argument, producing the following (wrong) result:
[{
"$group": {
"_id": null,
"games": { "$sum": { "$sum": "$wins" } }
}
}]
To work around this problem, use the $add operator instead of $sum in the nested expression:
$builder = $this->dm->createAggregationBuilder(\Documents\BlogPost::class);
$builder->group()
->field('id')
->expression(null)
->field('games')
->sum($builder->expr()->add('$wins', '$losses', '$ties'))
;
This generates the pipeline you want.
The reason this is confusing is that the documentation describes different behavior for $sum in $group and $project stages: in $group it accepts a single argument, while in $project it accepts multiple arguments. What it does not spell out is how $sum behaves in a nested expression within $group. The fact that the pipeline you posted works suggests that the sub-expression is not treated as being in the $group stage, so multiple arguments are allowed there. When I built the operator, I assumed the opposite: $sum should accept multiple arguments only inside a $project stage and fall back to the single-argument syntax everywhere else.
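To see why the generated stage returns the wrong number, here is a plain JavaScript sketch (no MongoDB involved) of what each `games` accumulator computes for one group. The sample documents are hypothetical; `brokenGames` mirrors the generated `{ $sum: { $sum: "$wins" } }` and `fixedGames` mirrors `{ $sum: { $add: ["$wins", "$losses", "$ties"] } }`:

```javascript
// Hypothetical TeamStandings documents for one team.
const standings = [
  { wins: 10, losses: 5, ties: 1 },
  { wins: 8, losses: 7, ties: 0 },
];

// Mirrors { $sum: { $sum: "$wins" } }: only the first operand survived, so only wins are counted.
function brokenGames(docs) {
  return docs.reduce((acc, d) => acc + d.wins, 0);
}

// Mirrors { $sum: { $add: ["$wins", "$losses", "$ties"] } }:
// $add runs once per document, then the $group accumulator adds up the per-document results.
function fixedGames(docs) {
  return docs.reduce((acc, d) => acc + (d.wins + d.losses + d.ties), 0);
}

console.log(brokenGames(standings)); // 18
console.log(fixedGames(standings)); // 31
```

One behavioral difference to keep in mind, as far as I know from the MongoDB operator documentation: if any of the three fields is null or missing in a document, $add yields null for that document and the $group $sum then skips it entirely, whereas the array form of $sum only ignores the missing operand.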
I'll create a ticket for this in MongoDB ODM and I'll see if this can easily be fixed.

Related

filter and sort embed collection (embedsMany relationship) mongodb db laravel [duplicate]

Suppose I have the following documents in my collection:
{
"_id":ObjectId("562e7c594c12942f08fe4192"),
"shapes":[
{
"shape":"square",
"color":"blue"
},
{
"shape":"circle",
"color":"red"
}
]
},
{
"_id":ObjectId("562e7c594c12942f08fe4193"),
"shapes":[
{
"shape":"square",
"color":"black"
},
{
"shape":"circle",
"color":"green"
}
]
}
If I run this query:
db.test.find({"shapes.color": "red"}, {"shapes.color": 1})
Or
db.test.find({shapes: {"$elemMatch": {color: "red"}}}, {"shapes.color": 1})
it returns the matched document (Document 1), but always with ALL of the array items in shapes:
{ "shapes":
[
{"shape": "square", "color": "blue"},
{"shape": "circle", "color": "red"}
]
}
However, I'd like to get the document (Document 1) with only the array element that has color=red:
{ "shapes":
[
{"shape": "circle", "color": "red"}
]
}
How can I do this?
MongoDB 2.2's new $elemMatch projection operator provides another way to alter the returned document to contain only the first matched shapes element:
db.test.find(
{"shapes.color": "red"},
{_id: 0, shapes: {$elemMatch: {color: "red"}}});
Returns:
{"shapes" : [{"shape": "circle", "color": "red"}]}
In 2.2 you can also do this using the $ projection operator, where the $ in a projection object field name represents the index of the field's first matching array element from the query. The following returns the same results as above:
db.test.find({"shapes.color": "red"}, {_id: 0, 'shapes.$': 1});
MongoDB 3.2 Update
Starting with the 3.2 release, you can use the new $filter aggregation operator to filter an array during projection, which has the benefit of including all matches, instead of just the first one.
db.test.aggregate([
// Get just the docs that contain a shapes element where color is 'red'
{$match: {'shapes.color': 'red'}},
{$project: {
shapes: {$filter: {
input: '$shapes',
as: 'shape',
cond: {$eq: ['$$shape.color', 'red']}
}},
_id: 0
}}
])
Results:
[
{
"shapes" : [
{
"shape" : "circle",
"color" : "red"
}
]
}
]
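The practical difference between the $elemMatch / positional $ projections (first matching element only) and $filter (every matching element) can be sketched in plain JavaScript. This uses the question's first document plus a hypothetical third shape; `firstMatch` and `allMatches` are illustrative helpers, not MongoDB APIs:

```javascript
// Question's first document (ObjectId shown as a string), plus a hypothetical extra red shape.
const doc = {
  _id: "562e7c594c12942f08fe4192",
  shapes: [
    { shape: "square", color: "blue" },
    { shape: "circle", color: "red" },
    { shape: "triangle", color: "red" }, // hypothetical extra element
  ],
};

const isRed = (s) => s.color === "red";

// Like { shapes: { $elemMatch: { color: "red" } } } or { "shapes.$": 1 }: first match only.
function firstMatch(d, pred) {
  const hit = d.shapes.find(pred);
  return hit ? { shapes: [hit] } : {}; // no match: the field is left out of the result
}

// Like $filter inside $project: every matching element.
function allMatches(d, pred) {
  return { shapes: d.shapes.filter(pred) };
}

console.log(JSON.stringify(firstMatch(doc, isRed)));
// {"shapes":[{"shape":"circle","color":"red"}]}
console.log(JSON.stringify(allMatches(doc, isRed)));
// {"shapes":[{"shape":"circle","color":"red"},{"shape":"triangle","color":"red"}]}
```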
The new Aggregation Framework in MongoDB 2.2+ provides an alternative to Map/Reduce. The $unwind operator can be used to separate your shapes array into a stream of documents that can be matched:
db.test.aggregate([
// Start with a $match stage, which can take advantage of an index and limit the documents processed
{ $match : {
"shapes.color": "red"
}},
{ $unwind : "$shapes" },
{ $match : {
"shapes.color": "red"
}}
])
Results in:
{
"result" : [
{
"_id" : ObjectId("504425059b7c9fa7ec92beec"),
"shapes" : {
"shape" : "circle",
"color" : "red"
}
}
],
"ok" : 1
}
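What $unwind followed by $match does to the stream of documents can be sketched in plain JavaScript. This is an in-memory illustration only: the ObjectIds are shown as strings, the leading $match is skipped because it only reduces the input (it does not change the result), and `unwind` is a hypothetical helper:

```javascript
// Question's two sample documents (ObjectIds shown as strings).
const docs = [
  { _id: "562e7c594c12942f08fe4192", shapes: [{ shape: "square", color: "blue" }, { shape: "circle", color: "red" }] },
  { _id: "562e7c594c12942f08fe4193", shapes: [{ shape: "square", color: "black" }, { shape: "circle", color: "green" }] },
];

// $unwind: one output document per array element, with the array field replaced by that element.
// Documents whose array is missing or empty produce no output, like the default $unwind
// (non-array values are not handled in this sketch).
function unwind(input, field) {
  return input.flatMap((d) =>
    (Array.isArray(d[field]) ? d[field] : []).map((el) => ({ ...d, [field]: el }))
  );
}

// $unwind followed by $match { "shapes.color": "red" }.
const result = unwind(docs, "shapes").filter((d) => d.shapes.color === "red");
console.log(JSON.stringify(result));
// [{"_id":"562e7c594c12942f08fe4192","shapes":{"shape":"circle","color":"red"}}]
```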
Caution: This answer provides a solution that was relevant at that time, before the new features of MongoDB 2.2 and up were introduced. See the other answers if you are using a more recent version of MongoDB.
The field selector parameter is limited to complete properties. It cannot be used to select part of an array, only the entire array. I tried using the $ positional operator, but that didn't work.
The easiest way is to just filter the shapes in the client.
If you really need the correct output directly from MongoDB, you can use a map-reduce to filter the shapes.
function map() {
var filteredShapes = [];
this.shapes.forEach(function (s) {
if (s.color === "red") {
filteredShapes.push(s);
}
});
emit(this._id, { shapes: filteredShapes });
}
function reduce(key, values) {
return values[0];
}
res = db.test.mapReduce(map, reduce, { query: { "shapes.color": "red" } })
db[res.result].find()
Another interesting way is to use $redact, one of the aggregation features introduced in MongoDB 2.6. With 2.6 or later you don't need an $unwind, which can cause performance problems with large arrays.
db.test.aggregate([
{ $match: {
shapes: { $elemMatch: {color: "red"} }
}},
{ $redact : {
$cond: {
if: { $or : [{ $eq: ["$color","red"] }, { $not : "$color" }]},
then: "$$DESCEND",
else: "$$PRUNE"
}
}}]);
$redact "restricts the contents of the documents based on information stored in the documents themselves", so it only operates within each document. It scans the document from the top level down and evaluates the if condition of $cond at each level: when the condition is true, the content is kept and $redact descends into it ($$DESCEND); when it is false, that level is removed ($$PRUNE).
In the example above, $match first returns the matching documents with the whole shapes array, and $redact then strips it down to the expected result.
Note that {$not: "$color"} is necessary because $redact evaluates the top-level document as well: it has no color field, so without this clause the condition would be false and the whole document would be pruned, which we don't want.
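A plain JavaScript sketch of how $redact walks a document with the condition above may make the $$DESCEND / $$PRUNE logic easier to follow. It assumes JSON-like data (the ObjectId is shown as a string), and `keepLevel` and `redact` are illustrative names, not MongoDB APIs:

```javascript
// The $cond from the answer: keep a level when color is "red" or when the level has no color.
function keepLevel(level) {
  return level.color === "red" || level.color === undefined || level.color === null;
}

function isSubdocument(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// Returns the redacted level ($$DESCEND), or undefined when the level is pruned ($$PRUNE).
function redact(level) {
  if (!keepLevel(level)) return undefined;
  const out = {};
  for (const [key, value] of Object.entries(level)) {
    if (Array.isArray(value)) {
      // Embedded documents inside arrays are evaluated one by one; scalar elements are kept.
      out[key] = value
        .map((v) => (isSubdocument(v) ? redact(v) : v))
        .filter((v) => v !== undefined);
    } else if (isSubdocument(value)) {
      const kept = redact(value);
      if (kept !== undefined) out[key] = kept;
    } else {
      out[key] = value;
    }
  }
  return out;
}

const doc = {
  _id: "562e7c594c12942f08fe4192",
  shapes: [
    { shape: "square", color: "blue" },
    { shape: "circle", color: "red" },
  ],
};
console.log(JSON.stringify(redact(doc)));
// {"_id":"562e7c594c12942f08fe4192","shapes":[{"shape":"circle","color":"red"}]}
```

Dropping the null/missing clause from `keepLevel` makes the top-level document fail the test, and `redact` then returns undefined for the whole document, which is exactly the case the note above warns about.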
Alternatively, you can query for the matching array element and return just that element with the positional $ projection:
db.test.find({"shapes.color" : "blue"}, {"shapes.$" : 1})
$slice is helpful when you know the index of the element, but sometimes you want whichever array element matched your criteria. You can return the matching element with the $ operator.
db.getCollection('aj').find({"shapes.color":"red"},{"shapes.$":1})
OUTPUTS
{
"shapes" : [
{
"shape" : "circle",
"color" : "red"
}
]
}
The syntax for find in MongoDB is
db.<collection name>.find(query, projection);
and in the second query that you wrote,
db.test.find(
{shapes: {"$elemMatch": {color: "red"}}},
{"shapes.color":1})
you used the $elemMatch operator in the query part; if you use it in the projection part instead, you will get the desired result. You can write your query as
db.test.find(
{"shapes.color":"red"},
{_id:0, shapes: {$elemMatch : {color: "red"}}})
This will give you the desired result.
Thanks to JohnnyHK.
Here I just want to add some more complex usage.
// Document
{
"_id" : 1,
"shapes" : [
{"shape" : "square", "color" : "red"},
{"shape" : "circle", "color" : "green"}
]
}
{
"_id" : 2,
"shapes" : [
{"shape" : "square", "color" : "red"},
{"shape" : "circle", "color" : "green"}
]
}
// The Query
db.contents.find({
"_id" : 1,
"shapes.color":"red"
},{
"_id": 0,
"shapes" :{
"$elemMatch":{
"color" : "red"
}
}
})
//And the Result
{"shapes":[
{
"shape" : "square",
"color" : "red"
}
]}
You just need to run this query:
db.test.find(
{"shapes.color": "red"},
{shapes: {$elemMatch: {color: "red"}}});
The output of this query is:
{
"_id" : ObjectId("562e7c594c12942f08fe4192"),
"shapes" : [
{"shape" : "circle", "color" : "red"}
]
}
As expected, it returns only the array element that matches color: 'red'.
Adding a $project stage is more appropriate; otherwise the matching elements are returned together with the other fields of the document.
db.test.aggregate([
{ "$unwind" : "$shapes" },
{ "$match" : { "shapes.color": "red" } },
{
"$project": {
"_id":1,
"shapes":1
}
}
])
Likewise, you can filter on multiple values:
db.getCollection('localData').aggregate([
// Get just the docs that contain a shapes element whose color is 'red' or 'yellow'
{$match: {'shapes.color': {$in : ['red','yellow'] } }},
{$project: {
shapes: {$filter: {
input: '$shapes',
as: 'shape',
cond: {$in: ['$$shape.color', ['red', 'yellow']]}
}}
}}
])
db.test.find( {"shapes.color": "red"}, {_id: 0})
Use the aggregation framework's $project stage to return a specific field of the document:
db.getCollection('geolocations').aggregate([ { $project : { geolocation : 1} } ])
result:
{
"_id" : ObjectId("5e3ee15968879c0d5942464b"),
"geolocation" : [
{
"_id" : ObjectId("5e3ee3ee68879c0d5942465e"),
"latitude" : 12.9718313,
"longitude" : 77.593551,
"country" : "India",
"city" : "Chennai",
"zipcode" : "560001",
"streetName" : "Sidney Road",
"countryCode" : "in",
"ip" : "116.75.115.248",
"date" : ISODate("2020-02-08T16:38:06.584Z")
}
]
}
Although this question was asked 9.6 years ago, it has been of immense help to many people, me included. Thank you everyone for all your queries, hints and answers. Building on one of the answers here, I found that the following method can also be used to project other fields of the parent document. This may be helpful to someone.
For the following document, the need was to find out if an employee (emp #7839) has his leave history set for the year 2020. Leave history is implemented as an embedded document within the parent Employee document.
db.employees.find( {"leave_history.calendar_year": 2020},
{leave_history: {$elemMatch: {calendar_year: 2020}},empno:true,ename:true}).pretty()
{
"_id" : ObjectId("5e907ad23997181dde06e8fc"),
"empno" : 7839,
"ename" : "KING",
"mgrno" : 0,
"hiredate" : "1990-05-09",
"sal" : 100000,
"deptno" : {
"_id" : ObjectId("5e9065f53997181dde06e8f8")
},
"username" : "none",
"password" : "none",
"is_admin" : "N",
"is_approver" : "Y",
"is_manager" : "Y",
"user_role" : "AP",
"admin_approval_received" : "Y",
"active" : "Y",
"created_date" : "2020-04-10",
"updated_date" : "2020-04-10",
"application_usage_log" : [
{
"logged_in_as" : "AP",
"log_in_date" : "2020-04-10"
},
{
"logged_in_as" : "EM",
"log_in_date" : ISODate("2020-04-16T07:28:11.959Z")
}
],
"leave_history" : [
{
"calendar_year" : 2020,
"pl_used" : 0,
"cl_used" : 0,
"sl_used" : 0
},
{
"calendar_year" : 2021,
"pl_used" : 0,
"cl_used" : 0,
"sl_used" : 0
}
]
}
If you want to filter, update and return the document at the same time:
let post = await Post.findOneAndUpdate(
{
_id: req.params.id,
jobs: {
$elemMatch: {
id: req.params.jobId,
date,
},
},
},
{
$set: {
'jobs.$[i].performer': performer,
'jobs.$[i].status': status,
'jobs.$[i].type': type,
},
},
{
arrayFilters: [
{
'i.id': req.params.jobId,
},
],
new: true,
}
);
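`$[i]` is the filtered positional operator: the `$set` applies to every element of `jobs` that satisfies the `arrayFilters` entry for `i` (here, `i.id` equal to the job id). A plain JavaScript sketch of that update step on an in-memory document, assuming each `jobs` element has `id`, `performer`, `status` and `type` fields (`applyJobUpdate` is a hypothetical helper, not a Mongoose API):

```javascript
// Sets the given fields on every jobs element whose id matches, like
// { $set: { "jobs.$[i].status": ... } } with arrayFilters: [{ "i.id": jobId }].
function applyJobUpdate(post, jobId, changes) {
  return {
    ...post,
    jobs: post.jobs.map((job) => (job.id === jobId ? { ...job, ...changes } : job)),
  };
}

// Hypothetical document.
const post = {
  _id: "p1",
  jobs: [
    { id: "a", status: "open", performer: null, type: "x" },
    { id: "b", status: "open", performer: null, type: "x" },
  ],
};

const updated = applyJobUpdate(post, "b", { performer: "alice", status: "assigned", type: "y" });
console.log(updated.jobs[1].status); // assigned
console.log(updated.jobs[0].status); // open
```

With `new: true`, Mongoose resolves to the document as it looks after the update rather than before it.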
This answer does not fully answer the question, but it's related, and I'm writing it down because someone closed another question as a duplicate of this one (which it is not).
In my case I only wanted to filter the array elements while still returning the matching elements in full. All of the previous answers (including the solution given in the question) gave me headaches when applying them to my particular case because:
I needed my solution to be able to return multiple matching elements of the subarray.
Using $unwind + $match + $group lost the root documents without matching array elements, which I didn't want, because I was only looking to filter out unwanted elements.
Using $project > $filter lost the rest of the fields of the root documents, or forced me to list all of them in the projection as well, which was not desirable.
So in the end I fixed all of these problems with $addFields > $filter, like this:
db.test.aggregate([
{ $match: { 'shapes.color': 'red' } },
{ $addFields: { 'shapes': { $filter: {
input: '$shapes',
as: 'shape',
cond: { $eq: ['$$shape.color', 'red'] }
} } } },
])
Explanation:
First match documents with a red coloured shape.
For those documents, add a field called shapes, which in this case replaces the original field of the same name.
To calculate the new value of shapes, $filter the elements of the original $shapes array, temporarily naming each of the array elements as shape so that later we can check if the $$shape.color is red.
Now the new shapes array only contains the desired elements.
For more details, refer to the official MongoDB reference.
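The key difference from the $project version is that $addFields keeps every other field of the document. In plain JavaScript terms it behaves roughly like an object spread that overwrites only `shapes`; the `owner` field and the `addFilteredShapes` helper below are hypothetical:

```javascript
// Like { $addFields: { shapes: { $filter: { input: "$shapes", as: "shape",
//   cond: { $eq: ["$$shape.color", color] } } } } }:
// every other field is kept, and only the shapes array is replaced by its filtered copy.
function addFilteredShapes(doc, color) {
  return { ...doc, shapes: doc.shapes.filter((shape) => shape.color === color) };
}

const doc = {
  _id: "562e7c594c12942f08fe4192",
  owner: "hypothetical extra field",
  shapes: [{ shape: "square", color: "blue" }, { shape: "circle", color: "red" }],
};

console.log(JSON.stringify(addFilteredShapes(doc, "red")));
// {"_id":"562e7c594c12942f08fe4192","owner":"hypothetical extra field","shapes":[{"shape":"circle","color":"red"}]}
```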
Suppose you have a document like this (you can have multiple documents too):
{
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c44b"
},
"results": [
{
"yearOfRelease": "2022",
"imagePath": "https://upload.wikimedia.org/wikipedia/en/d/d4/The_Kashmir_Files_poster.jpg",
"title": "The Kashmir Files",
"overview": "Krishna endeavours to uncover the reason behind his parents' brutal killings in Kashmir. He is shocked to uncover a web of lies and conspiracies in connection with the massive genocide.",
"originalLanguage": "hi",
"imdbRating": "8.3",
"isbookMark": null,
"originCountry": "india",
"productionHouse": [
"Zee Studios"
],
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c44c"
}
},
{
"yearOfRelease": "2022",
"imagePath": "https://upload.wikimedia.org/wikipedia/en/a/a9/Black_Adam_%28film%29_poster.jpg",
"title": "Black Adam",
"overview": "In ancient Kahndaq, Teth Adam was bestowed the almighty powers of the gods. After using these powers for vengeance, he was imprisoned, becoming Black Adam. Nearly 5,000 years have passed, and Black Adam has gone from man to myth to legend. Now free, his unique form of justice, born out of rage, is challenged by modern-day heroes who form the Justice Society: Hawkman, Dr. Fate, Atom Smasher and Cyclone",
"originalLanguage": "en",
"imdbRating": "8.3",
"isbookMark": null,
"originCountry": "United States of America",
"productionHouse": [
"DC Comics"
],
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c44d"
}
},
{
"yearOfRelease": "2022",
"imagePath": "https://upload.wikimedia.org/wikipedia/en/0/09/The_Sea_Beast_film_poster.png",
"title": "The Sea Beast",
"overview": "A young girl stows away on the ship of a legendary sea monster hunter, turning his life upside down as they venture into uncharted waters.",
"originalLanguage": "en",
"imdbRating": "7.1",
"isbookMark": null,
"originCountry": "United States Canada",
"productionHouse": [
"Netflix Animation"
],
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c44e"
}
},
{
"yearOfRelease": "2021",
"imagePath": "https://upload.wikimedia.org/wikipedia/en/7/7d/Hum_Do_Hamare_Do_poster.jpg",
"title": "Hum Do Hamare Do",
"overview": "Dhruv, who grew up an orphan, is in love with a woman who wishes to marry someone with a family. In order to fulfil his lover's wish, he hires two older individuals to pose as his parents.",
"originalLanguage": "hi",
"imdbRating": "6.0",
"isbookMark": null,
"originCountry": "india",
"productionHouse": [
"Maddock Films"
],
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c44f"
}
},
{
"yearOfRelease": "2021",
"imagePath": "https://upload.wikimedia.org/wikipedia/en/7/74/Shang-Chi_and_the_Legend_of_the_Ten_Rings_poster.jpeg",
"title": "Shang-Chi and the Legend of the Ten Rings",
"overview": "Shang-Chi, a martial artist, lives a quiet life after he leaves his father and the shadowy Ten Rings organisation behind. Years later, he is forced to confront his past when the Ten Rings attack him.",
"originalLanguage": "en",
"imdbRating": "7.4",
"isbookMark": null,
"originCountry": "United States of America",
"productionHouse": [
"Marvel Entertainment"
],
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c450"
}
}
],
"__v": 0
}
MongoDB query using the aggregate command:
mongomodels.movieMainPageSchema.aggregate(
[
{
$project: {
_id:0, // to suppress _id
results: {
$filter: {
input: "$results",
as: "result",
cond: { $eq: [ "$$result.yearOfRelease", "2022" ] }
}
}
}
}
]
)
For newer versions of the MongoDB Node.js driver, it's slightly different.
For collection.find, pass the projection in the second (options) parameter, under the projection key:
db.collection.find({}, {projection: {name: 1}}); // only _id and name are returned; mixing 1 and 0 (e.g. email: 0) is rejected, except for _id
You can also use the cursor's .project() method.
However, it is not a mongo shell method; it is provided by drivers such as the MongoDB Node.js driver.
db.collection.find({}).project({name: 1});
And if you want to use findOne, it works the same way as find:
db.collection.findOne({}, {projection: {name: 1}});
But findOne doesn't have a .project() method, because it returns a single document rather than a cursor.

MongoDB Ordering by average combined numbers or nested sub arrays

I'm having some issues working out the best way to do this in MongoDB. Arguably it's a relational data set, so I will probably be slated, but it's still a challenge to see if it's possible.
I currently need to order Logistics Managers by the daily average miles across the vans in their department, and also, in a separate list, by a combined weekly average.
My first setup in the database was as follows:
{
"_id" : ObjectId("555cf04fa3ed8cc2347b23d7"),
"name" : "My Manager 1",
"vans" : [
{
"name" : "van1",
"miles" : NumberLong(56)
},
{
"name" : "van2",
"miles" : NumberLong(34)
}
]
}
But I can't see how to order by a nested array value without knowing the parent array keys (these will be the standard 0-x).
So my next choice was to scrap that idea and just have the name in the first collection, and the vans in a second collection with the ID of the manager.
So, removing vans from the above example and adding this collection (vans):
{
"_id" : ObjectId("555cf04fa3ed8cc2347b23d9"),
"name" : "van1",
"miles" : NumberLong(56),
"manager_id" : "555cf04fa3ed8cc2347b23d7"
}
But because I need to show the results by manager, how do I order the average miles in this collection where id=x in a query (if possible), and then display the manager by his id?
Thanks for your help
If each manager is going to have a limited number of vans, then your first approach is better, as you do not have to make two separate calls/queries to the database to collect your information.
Then comes the question of how to calculate the average mileage per manager, and this is where the Aggregation Framework helps a lot. Here is a query that gets you the desired data:
db.manager.aggregate([
{$unwind: "$vans"},
{$group:
{_id:
{
_id: "$_id",
name: "$name"
},
avg_milage: {$avg: "$vans.miles"}
}
},
{$sort: {"avg_milage": -1}},
{$project:
{_id: "$_id._id",
name: "$_id.name",
avg_milage: "$avg_milage"
}
}
])
The first $unwind step simply unwraps the vans array and creates a separate document for each element of the array.
Then the $group stage collects all documents with the same (_id, name) pair and, in the avg_milage field, computes the average value of the miles field across those documents.
The $sort stage is obvious: it sorts the documents in descending order, using the new avg_milage field as the sort key.
And finally, the last $project step just cleans up the documents with the appropriate projections, just for beauty :)
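The same computation can be sketched in plain JavaScript to make each stage concrete. The second manager is hypothetical and `averageMileageByManager` is an illustrative helper; since each manager document already holds all of its vans, grouping by (_id, name) after $unwind reduces to averaging each document's own array:

```javascript
// Question's first data model (vans embedded in the manager); "m2" is hypothetical.
const managers = [
  { _id: "m1", name: "My Manager 1", vans: [{ name: "van1", miles: 56 }, { name: "van2", miles: 34 }] },
  { _id: "m2", name: "My Manager 2", vans: [{ name: "van3", miles: 80 }] },
];

// $unwind + $group { avg_milage: { $avg: "$vans.miles" } } + $sort { avg_milage: -1 } + $project.
function averageMileageByManager(docs) {
  return docs
    .filter((m) => Array.isArray(m.vans) && m.vans.length > 0) // $unwind drops empty or missing arrays
    .map((m) => ({
      _id: m._id,
      name: m.name,
      avg_milage: m.vans.reduce((sum, v) => sum + v.miles, 0) / m.vans.length,
    }))
    .sort((a, b) => b.avg_milage - a.avg_milage);
}

console.log(JSON.stringify(averageMileageByManager(managers)));
// [{"_id":"m2","name":"My Manager 2","avg_milage":80},{"_id":"m1","name":"My Manager 1","avg_milage":45}]
```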
A similar thing is needed for your second desired result:
db.manager.aggregate([
{$unwind: "$vans"},
{$group:
{_id:
{
_id: "$_id",
name: "$name"
},
total_milage: {$sum: "$vans.miles"}
}
},
{$sort: {"total_milage": -1}},
{$project:
{_id: "$_id._id",
name: "$_id.name",
weekly_milage: {
$multiply: [
"$total_milage",
7
]
}
}
}
])
This will produce the list of managers with their weekly mileage, sorted in descending order. So you can $limit the result and get, for instance, the manager with the highest mileage.
And in a very similar way, you can get info for your vans:
db.manager.aggregate([
{$unwind: "$vans"},
{$group:
{_id: "$vans.name",
total_milage: {$sum: "$vans.miles"}
}
},
{$sort: {"total_milage": -1}},
{$project:
{van_name: "$_id",
weekly_milage: {
$multiply: [
"$total_milage",
7
]
}
}
}
])
First, do you require average miles for a single day, average miles over a given time period, or average miles over the life of the manager? I would consider adding a timestamp field. Yes, _id has a timestamp, but this only reflects the time the document was created, not necessarily the time of the initial day's log.
Considerations for the first data model:
Does each document represent one day, or one manager?
How many "vans" do you expect to have in the array? Does this list grow over time? Do you need to consider the 16MB max doc size in a year or two from now?
Considerations for the second data model:
Can you store the manager's name as the "manager_id" field? Can this be used as a possible unique ID for a secondary meta lookup? Doing so would limit the necessity of a secondary manager meta-data lookup just to get their name.
As @n9code has pointed out, the aggregation framework is the answer in both cases.
For the first data model, assuming each document represents one day and you want to retrieve an average for a given day or a range of days:
db.collection.aggregate([
{ $match: {
name: 'My Manager 1',
timestamp: { $gte: ISODate(...), $lt: ISODate(...) }
} },
{ $unwind: '$vans' },
{ $group: {
_id: {
_id: '$_id',
name: '$name',
timestamp: '$timestamp'
},
avg_mileage: {
$avg: '$vans.miles'
}
} },
{ $sort: {
avg_mileage: -1
} },
{ $project: {
_id: '$_id._id',
name: '$_id.name',
timestamp: '$_id.timestamp',
avg_mileage: 1
} }
]);
If, for the first data model, each document represents a manager and the "vans" array grows daily, this particular data model is not ideal for two reasons:
The "vans" array may eventually grow beyond the maximum document size, although that would take a lot of data.
It is more difficult and memory-intensive to limit the query to a certain date range, since the timestamp would be nested within an item of "vans" rather than at the root of the document.
For the sake of completeness, here is the query:
/*
Assuming data model is:
{
_id: ...,
name: ...,
vans: [
{ name: ..., miles: ..., timestamp: ... }
]
}
*/
db.collection.aggregate([
{ $match: {
name: 'My Manager 1'
} },
{ $unwind: '$vans' },
{ $match: {
'vans.timestamp': { $gte: ISODate(...), $lt: ISODate(...) }
} },
{ $group: {
_id: {
_id: '$_id',
name: '$name'
},
avg_mileage: {
$avg: '$vans.miles'
}
} },
{ $sort: {
avg_mileage: -1
} },
{ $project: {
_id: '$_id._id',
name: '$_id.name',
avg_mileage: 1
} }
]);
For the second data model, aggregation is more straightforward. I'm assuming the inclusion of a timestamp:
db.collection.aggregate([
{ $match: {
manager_id: '555cf04fa3ed8cc2347b23d7', // stored as a string in the question's vans collection
timestamp: { $gte: ISODate(...), $lt: ISODate(...) }
} },
{ $group: {
_id: '$manager_id',
avg_mileage: {
$avg: '$miles'
},
names: {
$addToSet: '$name'
}
} },
{ $sort: {
avg_mileage: -1
} },
{ $project: {
manager_id: '$_id',
avg_mileage: 1,
names: 1
} }
]);
I have added an array of names (vehicles?) used during the average computation.
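For the second data model there is no $unwind: the $group runs directly over the van documents. A plain JavaScript sketch of that $group / $sort, with hypothetical readings (the timestamp $match is left out) and an illustrative `groupByManager` helper:

```javascript
// Hypothetical van documents in the second data model.
const vans = [
  { name: "van1", miles: 56, manager_id: "555cf04fa3ed8cc2347b23d7" },
  { name: "van2", miles: 34, manager_id: "555cf04fa3ed8cc2347b23d7" },
  { name: "van1", miles: 60, manager_id: "555cf04fa3ed8cc2347b23d7" },
  { name: "van3", miles: 80, manager_id: "hypothetical-manager-2" },
];

// $group by manager_id with $avg of miles and $addToSet of names, then $sort by avg_mileage descending.
function groupByManager(docs) {
  const groups = new Map();
  for (const d of docs) {
    const g = groups.get(d.manager_id) || { total: 0, count: 0, names: new Set() };
    g.total += d.miles;
    g.count += 1;
    g.names.add(d.name); // each name kept once; MongoDB's $addToSet does not guarantee order
    groups.set(d.manager_id, g);
  }
  return [...groups]
    .map(([manager_id, g]) => ({ manager_id, avg_mileage: g.total / g.count, names: [...g.names] }))
    .sort((a, b) => b.avg_mileage - a.avg_mileage);
}

console.log(JSON.stringify(groupByManager(vans)));
// [{"manager_id":"hypothetical-manager-2","avg_mileage":80,"names":["van3"]},{"manager_id":"555cf04fa3ed8cc2347b23d7","avg_mileage":50,"names":["van1","van2"]}]
```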
Relevant documentation:
$match, $unwind, $group, $sort, $project - Pipeline Aggregation Stages
$avg, $addToSet - Group Accumulator Operators
Date types
ObjectId.getTimestamp

Percentage of OR conditions matched in mongodb

I have my data in the following format:
{
"_id" : ObjectId("534fd4662d22a05415000000"),
"product_id" : "50862224",
"ean" : "8808992479390",
"brand" : "LG",
"model" : "37LH3000",
"features" : [
{
"key" : "Screen Format",
"value" : "16:9"
}, {
"key" : "DVD Player / Recorder",
"value" : "No"
}, {
"key" : "Weight in kg",
"value" : "12.6"
}
// ... and so on
]
}
I need to compare features of one product with others and divide the result into separate categories ( 100% match, 50-99 % match) based on % of feature matches..
My initial thought was to build a dynamic query with an $or condition for each feature and do the percentage calculation in PHP, but that means MongoDB would also return products with only one matching feature. And I think nearly all products in a category might have some feature in common, so I fear I might end up processing a lot of products in PHP.
I have two questions, basically:
Are there any alternative ways?
And is the data structure I am using good enough to support the functionality I am looking for, or should I consider changing it?
Well, your solution really should be MongoDB-specific; otherwise you will end up doing your calculations and possibly matching on the client side, and that is not going to be good for performance.
So what you really want is a way to do that processing on the server side:
db.products.aggregate([
// Match the documents that meet your conditions
{ "$match": {
"$or": [
{
"features": {
"$elemMatch": {
"key": "Screen Format",
"value": "16:9"
}
}
},
{
"features": {
"$elemMatch": {
"key" : "Weight in kg",
"value" : { "$gt": "5", "$lt": "8" }
}
}
},
]
}},
// Keep the document and a copy of the features array
{ "$project": {
"_id": {
"_id": "$_id",
"product_id": "$product_id",
"ean": "$ean",
"brand": "$brand",
"model": "$model",
"features": "$features"
},
"features": 1
}},
// Unwind the array
{ "$unwind": "$features" },
// Find the actual elements that match the conditions
{ "$match": {
"$or": [
{
"features.key": "Screen Format",
"features.value": "16:9"
},
{
"features.key" : "Weight in kg",
"features.value" : { "$gt": "5", "$lt": "8" }
},
]
}},
// Count those matched elements
{ "$group": {
"_id": "$_id",
"count": { "$sum": 1 }
}},
// Restore the document and divide the matched elements by the
// number of elements in the "or" condition
{ "$project": {
"_id": "$_id._id",
"product_id": "$_id.product_id",
"ean": "$_id.ean",
"brand": "$_id.brand",
"model": "$_id.model",
"features": "$_id.features",
"matched": { "$divide": [ "$count", 2 ] }
}},
// Sort by the matched percentage
{ "$sort": { "matched": -1 } }
])
Since you know the "length" of the $or condition being applied, you simply need to find out how many of the elements in the "features" array match those conditions. That is what the second $match in the pipeline is all about.
Once you have that count, you simply divide by the number of conditions that were passed in as your $or. The beauty here is that you can now do something useful with it, like sort by that relevance and even "page" the results on the server side.
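The counting idea can be sketched in plain JavaScript: count how many of the conditions are satisfied by at least one element of `features`, then divide by the number of conditions. The predicates and `matchFraction` are hypothetical. Unlike the pipeline's second $match, this counts each condition at most once per product, and it compares the weight as a number rather than as a string (the stored values are strings, so the pipeline's `$gt`/`$lt` compare them lexically):

```javascript
// Each condition is a predicate on a single { key, value } feature.
const conditions = [
  (f) => f.key === "Screen Format" && f.value === "16:9",
  (f) => f.key === "Weight in kg" && Number(f.value) > 5 && Number(f.value) < 8,
];

// Fraction (0..1) of the conditions matched by at least one feature of the product.
function matchFraction(product, conds) {
  const matched = conds.filter((cond) => product.features.some(cond)).length;
  return matched / conds.length;
}

// Features taken from the question's sample document.
const product = {
  product_id: "50862224",
  features: [
    { key: "Screen Format", value: "16:9" },
    { key: "DVD Player / Recorder", value: "No" },
    { key: "Weight in kg", value: "12.6" },
  ],
};

console.log(matchFraction(product, conditions)); // 0.5
```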
Of course, if you want some additional "categorization" of this, all you need to do is add another $project stage to the end of the pipeline:
{ "$project": {
"product_id": 1,
"ean": 1,
"brand": 1,
"model": 1,
"features": 1,
"matched": 1,
"category": { "$cond": [
{ "$eq": [ "$matched", 1 ] },
"100",
{ "$cond": [
{ "$gte": [ "$matched", .7 ] },
"70-99",
{ "$cond": [
{ "$gte": [ "$matched", .4 ] },
"40-69",
"under 40"
]}
]}
]}
}}
Or something similar; the $cond operator can help you here.
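The nested $cond above is just an if / else-if chain. The same thresholds in plain JavaScript, as a quick way to check which bucket a given `matched` value lands in (`category` is an illustrative helper):

```javascript
// Same thresholds as the nested $cond: 1 -> "100", >= .7 -> "70-99", >= .4 -> "40-69", else "under 40".
function category(matched) {
  if (matched === 1) return "100";
  if (matched >= 0.7) return "70-99";
  if (matched >= 0.4) return "40-69";
  return "under 40";
}

console.log(category(1), category(0.75), category(0.5), category(0.25));
// 100 70-99 40-69 under 40
```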
The architecture should be fine as it is, since you can have a compound index on the "key" and "value" of the entries in your features array, and this should scale well for queries.
Of course, if you need something more than that, such as faceted search and results, you can look at solutions like Solr or Elasticsearch. But a full implementation of that would be a bit lengthy for here.
I'm assuming that you'd like to compare the rest of the collection to a given product, which is a textbook example of aggregation:
lookingat = db.products.findOne({product_id:'50862224'})
matches = db.products.aggregate([
{ $unwind: '$features' },
{ $match: { features: { $in: lookingat.features }}},
{ $group: { _id: '$product_id', matchedfeatures: { $sum:1 }}},
{ $sort: { matchedfeatures: -1 }},
{ $limit: 5 },
{ $project: { _id:0, product_id: '$_id',
pctmatch: { $multiply: [ '$matchedfeatures',
100/lookingat.features.length ]}
}}
])
Walking through this briefly from the perspective of a product in the collection that has 6 features, and comparing it to the target product ('lookingat') which has 4 features, 3 of which match:
$unwind turns 1 document with 6 features into 6 otherwise-identical documents with 1 feature each
$match looks for that feature in the target's feature array (be aware that two documents are "equal" only if they have the same field names and values, in the same order), discards the 3 that don't match, and passes along the 3 that do
$group consumes those 3 matching documents and produces a new one that tells you there were 3 documents that matched that product_id
$sort and $limit give you the most relevant results and leave behind all those 1-feature matches you were concerned about
$project renames the _id from the $group step back to product_id and also converts the number of matching features into a percentage (we avoided a $divide operation by recognizing that 2 of the 3 terms in our calculation are constants and can be divided in JS)
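The same ranking can be sketched client-side in JavaScript. Because $in compares embedded documents by field names, values and field order, this sketch deliberately compares features by their key/value pair instead, so field order does not matter. The products and the `pctMatch` helper are hypothetical; the percentage mirrors `matchedfeatures * 100 / lookingat.features.length`:

```javascript
// Identity of a feature for comparison purposes: its key and value, ignoring field order.
function featureId(f) {
  return JSON.stringify([f.key, f.value]);
}

// Percentage of the target's features found in each candidate, highest first, top 5 only.
function pctMatch(target, candidates) {
  const wanted = new Set(target.features.map(featureId));
  return candidates
    .map((p) => ({
      product_id: p.product_id,
      pctmatch:
        (p.features.filter((f) => wanted.has(featureId(f))).length * 100) /
        target.features.length,
    }))
    .filter((r) => r.pctmatch > 0) // candidates with no shared feature never reach $group
    .sort((a, b) => b.pctmatch - a.pctmatch)
    .slice(0, 5); // like { $limit: 5 }
}

// Hypothetical data: the target has 4 features; p1 shares 3 of them, p2 shares 1, p3 none.
const target = {
  product_id: "target",
  features: [
    { key: "A", value: "1" }, { key: "B", value: "2" },
    { key: "C", value: "3" }, { key: "D", value: "4" },
  ],
};
const candidates = [
  { product_id: "p2", features: [{ key: "A", value: "1" }] },
  { product_id: "p1", features: [
    { key: "A", value: "1" }, { key: "B", value: "2" }, { key: "C", value: "3" },
    { key: "E", value: "5" }, { key: "F", value: "6" }, { key: "G", value: "7" },
  ] },
  { product_id: "p3", features: [{ key: "Z", value: "9" }] },
];

console.log(JSON.stringify(pctMatch(target, candidates)));
// [{"product_id":"p1","pctmatch":75},{"product_id":"p2","pctmatch":25}]
```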

Mongo+PHP Querying nested (embedded) documents excluding fields +'Where' clause [duplicate]

Suppose you have the following documents in my collection:
{
"_id":ObjectId("562e7c594c12942f08fe4192"),
"shapes":[
{
"shape":"square",
"color":"blue"
},
{
"shape":"circle",
"color":"red"
}
]
},
{
"_id":ObjectId("562e7c594c12942f08fe4193"),
"shapes":[
{
"shape":"square",
"color":"black"
},
{
"shape":"circle",
"color":"green"
}
]
}
Do query:
db.test.find({"shapes.color": "red"}, {"shapes.color": 1})
Or
db.test.find({shapes: {"$elemMatch": {color: "red"}}}, {"shapes.color": 1})
Returns matched document (Document 1), but always with ALL array items in shapes:
{ "shapes":
[
{"shape": "square", "color": "blue"},
{"shape": "circle", "color": "red"}
]
}
However, I'd like to get the document (Document 1) only with the array that contains color=red:
{ "shapes":
[
{"shape": "circle", "color": "red"}
]
}
How can I do this?
MongoDB 2.2's new $elemMatch projection operator provides another way to alter the returned document to contain only the first matched shapes element:
db.test.find(
{"shapes.color": "red"},
{_id: 0, shapes: {$elemMatch: {color: "red"}}});
Returns:
{"shapes" : [{"shape": "circle", "color": "red"}]}
In 2.2 you can also do this using the $ projection operator, where the $ in a projection object field name represents the index of the field's first matching array element from the query. The following returns the same results as above:
db.test.find({"shapes.color": "red"}, {_id: 0, 'shapes.$': 1});
MongoDB 3.2 Update
Starting with the 3.2 release, you can use the new $filter aggregation operator to filter an array during projection, which has the benefit of including all matches, instead of just the first one.
db.test.aggregate([
// Get just the docs that contain a shapes element where color is 'red'
{$match: {'shapes.color': 'red'}},
{$project: {
shapes: {$filter: {
input: '$shapes',
as: 'shape',
cond: {$eq: ['$$shape.color', 'red']}
}},
_id: 0
}}
])
Results:
[
{
"shapes" : [
{
"shape" : "circle",
"color" : "red"
}
]
}
]
The new Aggregation Framework in MongoDB 2.2+ provides an alternative to Map/Reduce. The $unwind operator can be used to separate your shapes array into a stream of documents that can be matched:
db.test.aggregate([
// Start with a $match stage, which can take advantage of an index and limit the documents processed
{ $match : {
"shapes.color": "red"
}},
{ $unwind : "$shapes" },
{ $match : {
"shapes.color": "red"
}}
])
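The effect of $unwind can be sketched in plain JavaScript: each document is expanded into one copy per array element, with the array field replaced by that single element. This is an illustration under simplifying assumptions (in-memory data, numeric _id values, and every document having a shapes array); unwind and hasRed are hypothetical names.

```javascript
// Plain-JavaScript analogue of $unwind: one output document per array element,
// with the array field replaced by that single element.
// Assumes every document has an array in `field`; real $unwind, by default,
// simply skips documents whose field is missing, null or an empty array.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    doc[field].map((element) => ({ ...doc, [field]: element }))
  );
}

const docs = [
  { _id: 1, shapes: [{ shape: "square", color: "blue" }, { shape: "circle", color: "red" }] },
  { _id: 2, shapes: [{ shape: "square", color: "black" }, { shape: "circle", color: "green" }] },
];

const hasRed = (d) => d.shapes.some((s) => s.color === "red");

// $match -> $unwind -> $match, mirroring the pipeline above
const result = unwind(docs.filter(hasRed), "shapes").filter(
  (d) => d.shapes.color === "red"
);

console.log(JSON.stringify(result));
// prints [{"_id":1,"shapes":{"shape":"circle","color":"red"}}]
```

The first $match matters for performance: it shrinks the set of documents before $unwind multiplies them.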
Results in:
{
"result" : [
{
"_id" : ObjectId("504425059b7c9fa7ec92beec"),
"shapes" : {
"shape" : "circle",
"color" : "red"
}
}
],
"ok" : 1
}
Caution: This answer provides a solution that was relevant at that time, before the new features of MongoDB 2.2 and up were introduced. See the other answers if you are using a more recent version of MongoDB.
The field selector parameter is limited to complete properties. It cannot be used to select part of an array, only the entire array. I tried using the $ positional operator, but that didn't work.
The easiest way is to just filter the shapes in the client.
If you really need the correct output directly from MongoDB, you can use a map-reduce to filter the shapes.
function map() {
var filteredShapes = []; // declared with var so it does not leak into the global scope
this.shapes.forEach(function (s) {
if (s.color === "red") {
filteredShapes.push(s);
}
});
emit(this._id, { shapes: filteredShapes });
}
function reduce(key, values) {
return values[0];
}
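res = db.test.mapReduce(map, reduce, { query: { "shapes.color": "red" }, out: "red_shapes" }) // mapReduce requires an out option; "red_shapes" is an illustrative collection name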
db[res.result].find()
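To check what the map function produces without running mapReduce, you can call it locally with a stub emit. This is a hypothetical test harness, not a MongoDB API: emit and emitted are defined here only to capture the output, and the sample documents use numeric _id values. Because each _id is emitted exactly once, MongoDB never needs to call reduce for these keys (it does not call reduce for a key with a single value), which is why reduce can simply return values[0].

```javascript
// Hypothetical local harness: collect whatever map() emits instead of sending it to MongoDB.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Same map function as above, with filteredShapes declared locally.
function map() {
  var filteredShapes = [];
  this.shapes.forEach(function (s) {
    if (s.color === "red") {
      filteredShapes.push(s);
    }
  });
  emit(this._id, { shapes: filteredShapes });
}

const docs = [
  { _id: 1, shapes: [{ shape: "square", color: "blue" }, { shape: "circle", color: "red" }] },
  { _id: 2, shapes: [{ shape: "square", color: "black" }, { shape: "circle", color: "green" }] },
];

docs
  .filter((doc) => doc.shapes.some((s) => s.color === "red")) // the `query` option
  .forEach((doc) => map.call(doc)); // mapReduce runs map with `this` bound to each document

console.log(JSON.stringify(emitted));
// prints [{"key":1,"value":{"shapes":[{"shape":"circle","color":"red"}]}}]
```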
Another interesting way is to use $redact, which is one of the new aggregation features of MongoDB 2.6. If you are using 2.6, you don't need an $unwind, which might cause performance problems if you have large arrays.
db.test.aggregate([
{ $match: {
shapes: { $elemMatch: {color: "red"} }
}},
{ $redact : {
$cond: {
if: { $or : [{ $eq: ["$color","red"] }, { $not : "$color" }]},
then: "$$DESCEND",
else: "$$PRUNE"
}
}}]);
$redact "restricts the contents of the documents based on information stored in the documents themselves", so it only operates within each document. It scans the document from the top level down and evaluates the if condition of $cond at each level: if the condition is true, it keeps that level and descends into it ($$DESCEND); otherwise it removes that level and everything below it ($$PRUNE).
In the example above, first $match returns the whole shapes array, and $redact strips it down to the expected result.
Note that {$not: "$color"} is necessary because $redact also evaluates the top-level document. The top level has no color field, so without this clause the condition would be false there and the whole document would be pruned, which we don't want.
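The level-by-level walk that $redact performs can be sketched as a small recursive function. This is a simplified plain-JavaScript model, not MongoDB's implementation: it assumes a missing or null color counts as "absent" (approximating {$not: "$color"}), it only recurses into embedded documents and arrays of documents, and the names keepLevel and redact are hypothetical.

```javascript
function isPlainObject(v) {
  return v !== null && typeof v === "object" && !Array.isArray(v);
}

// The $cond "if": keep a level when color is "red" or there is no color at all.
function keepLevel(level) {
  return level.color === "red" || level.color === undefined || level.color === null;
}

// Returns the redacted copy of `level`, or undefined when the level is pruned.
function redact(level) {
  if (!keepLevel(level)) return undefined; // $$PRUNE: drop this level and everything below it
  const out = {}; // $$DESCEND: keep this level, then evaluate embedded documents
  for (const [key, value] of Object.entries(level)) {
    if (Array.isArray(value)) {
      // Each document inside an array is evaluated on its own; other values are kept.
      out[key] = value.flatMap((v) => {
        if (!isPlainObject(v)) return [v];
        const r = redact(v);
        return r === undefined ? [] : [r];
      });
    } else if (isPlainObject(value)) {
      const r = redact(value);
      if (r !== undefined) out[key] = r;
    } else {
      out[key] = value;
    }
  }
  return out;
}

const doc = {
  _id: 1,
  shapes: [
    { shape: "square", color: "blue" },
    { shape: "circle", color: "red" },
  ],
};

console.log(JSON.stringify(redact(doc)));
// prints {"_id":1,"shapes":[{"shape":"circle","color":"red"}]}
```

If keepLevel only checked level.color === "red", the top-level document (which has no color) would be pruned immediately, which is exactly the problem the $not clause avoids.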
Better: instead of $slice, you can use the positional $ projection operator to return only the matching object in the array.
db.test.find({"shapes.color" : "blue"}, {"shapes.$" : 1})
$slice is helpful when you know the index of the element, but sometimes you want whichever array element matched your criteria. You can return the matching element with the $ operator.
db.getCollection('aj').find({"shapes.color":"red"},{"shapes.$":1})
OUTPUTS
{
"shapes" : [
{
"shape" : "circle",
"color" : "red"
}
]
}
The syntax for find in MongoDB is
db.<collection name>.find(query, projection);
In the second query that you have written,
db.test.find(
{shapes: {"$elemMatch": {color: "red"}}},
{"shapes.color":1})
you have used the $elemMatch operator in the query part, whereas if you use this operator in the projection part, you will get the desired result. You can write your query as
db.test.find(
{"shapes.color":"red"},
{_id:0, shapes: {$elemMatch : {color: "red"}}})
This will give you the desired result.
Thanks to JohnnyHK.
Here I just want to add some more complex usage.
// Document
{
"_id" : 1,
"shapes" : [
{"shape" : "square", "color" : "red"},
{"shape" : "circle", "color" : "green"}
]
}
{
"_id" : 2,
"shapes" : [
{"shape" : "square", "color" : "red"},
{"shape" : "circle", "color" : "green"}
]
}
// The Query
db.contents.find({
"_id" : 1,
"shapes.color":"red"
},{
"_id": 0,
"shapes" :{
"$elemMatch":{
"color" : "red"
}
}
})
//And the Result
{"shapes":[
{
"shape" : "square",
"color" : "red"
}
]}
You just need to run this query:
db.test.find(
{"shapes.color": "red"},
{shapes: {$elemMatch: {color: "red"}}});
The output of this query is:
{
"_id" : ObjectId("562e7c594c12942f08fe4192"),
"shapes" : [
{"shape" : "circle", "color" : "red"}
]
}
As expected, it returns only the array element that matches color: 'red'.
Using $project as well is more appropriate; otherwise the matching elements will be lumped together with the other fields of the document.
db.test.aggregate([
{ "$unwind" : "$shapes" },
{ "$match" : { "shapes.color": "red" } },
{
"$project": {
"_id":1,
"shapes":1
}
}
])
Likewise, you can filter for multiple values:
db.getCollection('localData').aggregate([
// Get just the docs that contain a shapes element where color is 'red'
{$match: {'shapes.color': {$in : ['red','yellow'] } }},
{$project: {
shapes: {$filter: {
input: '$shapes',
as: 'shape',
cond: {$in: ['$$shape.color', ['red', 'yellow']]}
}}
}}
])
db.test.find( {"shapes.color": "red"}, {_id: 0})
Use the aggregation framework with $project to get a specific field of the document:
db.getCollection('geolocations').aggregate([ { $project : { geolocation : 1} } ])
result:
{
"_id" : ObjectId("5e3ee15968879c0d5942464b"),
"geolocation" : [
{
"_id" : ObjectId("5e3ee3ee68879c0d5942465e"),
"latitude" : 12.9718313,
"longitude" : 77.593551,
"country" : "India",
"city" : "Chennai",
"zipcode" : "560001",
"streetName" : "Sidney Road",
"countryCode" : "in",
"ip" : "116.75.115.248",
"date" : ISODate("2020-02-08T16:38:06.584Z")
}
]
}
Although the question was asked 9.6 years ago, it has been of immense help to many people, me being one of them. Thank you, everyone, for all your queries, hints and answers. Picking up from one of the answers here, I found that the following method can also be used to project other fields of the parent document. This may be helpful to someone.
For the document shown below, the need was to find out whether an employee (emp #7839) has his leave history set for the year 2020. Leave history is implemented as an array of embedded documents within the parent Employee document.
db.employees.find( {"leave_history.calendar_year": 2020},
{leave_history: {$elemMatch: {calendar_year: 2020}},empno:true,ename:true}).pretty()
{
"_id" : ObjectId("5e907ad23997181dde06e8fc"),
"empno" : 7839,
"ename" : "KING",
"mgrno" : 0,
"hiredate" : "1990-05-09",
"sal" : 100000,
"deptno" : {
"_id" : ObjectId("5e9065f53997181dde06e8f8")
},
"username" : "none",
"password" : "none",
"is_admin" : "N",
"is_approver" : "Y",
"is_manager" : "Y",
"user_role" : "AP",
"admin_approval_received" : "Y",
"active" : "Y",
"created_date" : "2020-04-10",
"updated_date" : "2020-04-10",
"application_usage_log" : [
{
"logged_in_as" : "AP",
"log_in_date" : "2020-04-10"
},
{
"logged_in_as" : "EM",
"log_in_date" : ISODate("2020-04-16T07:28:11.959Z")
}
],
"leave_history" : [
{
"calendar_year" : 2020,
"pl_used" : 0,
"cl_used" : 0,
"sl_used" : 0
},
{
"calendar_year" : 2021,
"pl_used" : 0,
"cl_used" : 0,
"sl_used" : 0
}
]
}
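If you want to filter, update and return the document at the same time, you can use findOneAndUpdate with arrayFilters (this example uses a Mongoose model, Post):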
let post = await Post.findOneAndUpdate(
{
_id: req.params.id,
tasks: {
$elemMatch: {
id: req.params.jobId,
date,
},
},
},
{
$set: {
'jobs.$[i].performer': performer,
'jobs.$[i].status': status,
'jobs.$[i].type': type,
},
},
{
arrayFilters: [
{
'i.id': req.params.jobId,
},
],
new: true,
}
);
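The $[i] filtered positional operator updates every array element that satisfies the matching arrayFilters condition, not only the first one (which is what the plain positional $ operator would do). The following plain-JavaScript sketch models that update on an in-memory document; the data and the helper name applyArrayFilterSet are hypothetical, and real updates of course happen on the server.

```javascript
// Hypothetical model of $set: {'jobs.$[i].<field>': ...} with arrayFilters [{'i.id': jobId}]:
// every element whose id matches gets the changes; the original object is not mutated.
function applyArrayFilterSet(doc, jobId, changes) {
  return {
    ...doc,
    jobs: doc.jobs.map((job) =>
      job.id === jobId ? { ...job, ...changes } : job
    ),
  };
}

const post = {
  _id: "p1",
  jobs: [
    { id: "a", status: "open" },
    { id: "b", status: "open" },
  ],
};

const updated = applyArrayFilterSet(post, "b", { status: "done", performer: "alice" });
console.log(JSON.stringify(updated.jobs));
// prints [{"id":"a","status":"open"},{"id":"b","status":"done","performer":"alice"}]
```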
This answer does not fully answer the question, but it's related, and I'm writing it down because someone decided to close another question by marking it as a duplicate of this one (which it is not).
In my case, I only wanted to filter the array elements but still return the full array elements. All previous answers (including the solution given in the question) gave me headaches when applied to my particular case because:
I needed my solution to be able to return multiple results of the subarray elements.
Using $unwind + $match + $group resulted in losing root documents without matching array elements, which I didn't want in my case, because I was only looking to filter out unwanted elements.
Using $project > $filter resulted in losing the rest of the fields of the root documents, or forced me to specify all of them in the projection as well, which was not desirable.
In the end, I fixed all of these problems with $addFields > $filter, like this:
db.test.aggregate([
{ $match: { 'shapes.color': 'red' } },
{ $addFields: { 'shapes': { $filter: {
input: '$shapes',
as: 'shape',
cond: { $eq: ['$$shape.color', 'red'] }
} } } },
])
Explanation:
First match documents with a red coloured shape.
For those documents, add a field called shapes, which in this case replaces the original field of the same name.
To calculate the new value of shapes, $filter the elements of the original $shapes array, temporarily naming each of the array elements as shape so that later we can check if the $$shape.color is red.
Now the new shapes array only contains the desired elements.
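The key difference from $project is that $addFields keeps every other field of the root document. A rough plain-JavaScript analogue, using in-memory data with a hypothetical extra field owner to show that it survives (keepRedShapes is also a hypothetical name):

```javascript
// Plain-JavaScript analogue of $match + $addFields/$filter: the shapes field is
// replaced, every other field of the root document is kept as-is.
function keepRedShapes(docs) {
  return docs
    .filter((doc) => doc.shapes.some((shape) => shape.color === "red")) // $match
    .map((doc) => ({
      ...doc, // $addFields keeps all existing fields...
      shapes: doc.shapes.filter((shape) => shape.color === "red"), // ...and overwrites shapes
    }));
}

const docs = [
  { _id: 1, owner: "ann", shapes: [{ shape: "square", color: "blue" }, { shape: "circle", color: "red" }] },
  { _id: 2, owner: "bob", shapes: [{ shape: "square", color: "black" }] },
];

console.log(JSON.stringify(keepRedShapes(docs)));
// prints [{"_id":1,"owner":"ann","shapes":[{"shape":"circle","color":"red"}]}]
```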
For more details, refer to the official MongoDB reference.
Suppose you have a document like this (you can have multiple documents too):
{
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c44b"
},
"results": [
{
"yearOfRelease": "2022",
"imagePath": "https://upload.wikimedia.org/wikipedia/en/d/d4/The_Kashmir_Files_poster.jpg",
"title": "The Kashmir Files",
"overview": "Krishna endeavours to uncover the reason behind his parents' brutal killings in Kashmir. He is shocked to uncover a web of lies and conspiracies in connection with the massive genocide.",
"originalLanguage": "hi",
"imdbRating": "8.3",
"isbookMark": null,
"originCountry": "india",
"productionHouse": [
"Zee Studios"
],
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c44c"
}
},
{
"yearOfRelease": "2022",
"imagePath": "https://upload.wikimedia.org/wikipedia/en/a/a9/Black_Adam_%28film%29_poster.jpg",
"title": "Black Adam",
"overview": "In ancient Kahndaq, Teth Adam was bestowed the almighty powers of the gods. After using these powers for vengeance, he was imprisoned, becoming Black Adam. Nearly 5,000 years have passed, and Black Adam has gone from man to myth to legend. Now free, his unique form of justice, born out of rage, is challenged by modern-day heroes who form the Justice Society: Hawkman, Dr. Fate, Atom Smasher and Cyclone",
"originalLanguage": "en",
"imdbRating": "8.3",
"isbookMark": null,
"originCountry": "United States of America",
"productionHouse": [
"DC Comics"
],
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c44d"
}
},
{
"yearOfRelease": "2022",
"imagePath": "https://upload.wikimedia.org/wikipedia/en/0/09/The_Sea_Beast_film_poster.png",
"title": "The Sea Beast",
"overview": "A young girl stows away on the ship of a legendary sea monster hunter, turning his life upside down as they venture into uncharted waters.",
"originalLanguage": "en",
"imdbRating": "7.1",
"isbookMark": null,
"originCountry": "United States Canada",
"productionHouse": [
"Netflix Animation"
],
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c44e"
}
},
{
"yearOfRelease": "2021",
"imagePath": "https://upload.wikimedia.org/wikipedia/en/7/7d/Hum_Do_Hamare_Do_poster.jpg",
"title": "Hum Do Hamare Do",
"overview": "Dhruv, who grew up an orphan, is in love with a woman who wishes to marry someone with a family. In order to fulfil his lover's wish, he hires two older individuals to pose as his parents.",
"originalLanguage": "hi",
"imdbRating": "6.0",
"isbookMark": null,
"originCountry": "india",
"productionHouse": [
"Maddock Films"
],
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c44f"
}
},
{
"yearOfRelease": "2021",
"imagePath": "https://upload.wikimedia.org/wikipedia/en/7/74/Shang-Chi_and_the_Legend_of_the_Ten_Rings_poster.jpeg",
"title": "Shang-Chi and the Legend of the Ten Rings",
"overview": "Shang-Chi, a martial artist, lives a quiet life after he leaves his father and the shadowy Ten Rings organisation behind. Years later, he is forced to confront his past when the Ten Rings attack him.",
"originalLanguage": "en",
"imdbRating": "7.4",
"isbookMark": null,
"originCountry": "United States of America",
"productionHouse": [
"Marvel Entertainment"
],
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c450"
}
}
],
"__v": 0
}
MongoDB query using the aggregate command:
mongomodels.movieMainPageSchema.aggregate(
[
{
$project: {
_id:0, // to suppress _id
results: {
$filter: {
input: "$results",
as: "result",
cond: { $eq: [ "$$result.yearOfRelease", "2022" ] }
}
}
}
}
]
)
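With newer versions of the MongoDB Node.js driver, it's slightly different.
For collection.find, pass the projection in the second (options) argument, under the key projection. Note that a projection cannot mix inclusion (1) and exclusion (0), except for _id:
db.collection.find({}, {projection: {name: 1, _id: 0}});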
You can also use the .project() method.
However, it is not a mongo shell method; it's a cursor method provided by drivers such as the MongoDB Node.js driver.
db.collection.find({}).project({name: 1, _id: 0});
findOne works the same way as find:
db.collection.findOne({}, {projection: {name: 1, _id: 0}});
But findOne doesn't have a .project() method.

Mongodb nested array search

An example of the document structure:
{
"dob": "12-13-2001",
"name": "Kam",
"visits": {
"0": {
"service_date": "12-5-2011",
"payment": "40",
"chk_number": "1234455",
},
"1": {
"service_date": "12-15-2011",
"payment": "45",
"chk_number": "3461234",
},
"2": {
"service_date": "12-25-2011",
"payment": "25",
"chk_number": "9821234",
}
}
}
{
"dob": "10-01-1998",
"name": "Sam",
"visits": {
"0": {
"service_date": "12-5-2011",
"payment": "30",
"chk_number": "86786464",
},
"1": {
"service_date": "12-15-2011",
"payment": "35",
"chk_number": "45643461234",
},
"2": {
"service_date": "12-25-2011",
"payment": "20",
"chk_number": "4569821234",
}
}
}
In PHP, I want to list all the "visits" entries (and the corresponding "name") for which payment is less than "30".
I want to print only the visits with "payment" < "30", not the others. Is such a query possible, or do I have to get the entire document first and then use PHP to select those visits?
In the example document, the "payment" values are given as strings which may not work as intended with the $lt command. For this response, I have converted them to integers.
Wildcard queries are not possible with MongoDB, so with the given document structure, the key (0,1,2, etcetera) of the sub-document must be known. For instance, the following query will work:
> db.test.find({"visits.2.payment":{$lt:35}})
However,
> db.test.find({"visits.payment":{$lt:35}})
Will not work in this case, and
> db.test.find({"visits.*.payment":{$lt:35}})
will also not return any results.
In order to be able to query the embedded "visits" documents, you must change your document structure and make "visits" into an array of embedded documents, like so:
> db.test2.find().pretty()
{
"_id" : ObjectId("4f16199d3563af4cb141c547"),
"dob" : "10-01-1998",
"name" : "Sam",
"visits" : [
{
"service_date" : "12-5-2011",
"payment" : 30,
"chk_number" : "86786464"
},
{
"service_date" : "12-15-2011",
"payment" : 35,
"chk_number" : "45643461234"
},
{
"service_date" : "12-25-2011",
"payment" : 20,
"chk_number" : "4569821234"
}
]
}
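The restructuring suggested above can be sketched as a pure transformation in JavaScript. This is a hypothetical helper, not a MongoDB feature: a real migration would read each document and write the converted version back (for example with replaceOne), which is not shown here. It also converts payment from a string to a number so that $lt compares numerically.

```javascript
// Hypothetical migration helper: turns {"0": {...}, "1": {...}} into an array
// of visit sub-documents and stores payment as a number.
function normalizeVisits(person) {
  const visits = Object.keys(person.visits)
    .sort((a, b) => Number(a) - Number(b)) // keep the 0, 1, 2 ... order
    .map((key) => ({
      ...person.visits[key],
      payment: Number(person.visits[key].payment), // "30" -> 30
    }));
  return { ...person, visits };
}

const sam = {
  dob: "10-01-1998",
  name: "Sam",
  visits: {
    0: { service_date: "12-5-2011", payment: "30", chk_number: "86786464" },
    1: { service_date: "12-15-2011", payment: "35", chk_number: "45643461234" },
    2: { service_date: "12-25-2011", payment: "20", chk_number: "4569821234" },
  },
};

console.log(normalizeVisits(sam).visits.map((v) => v.payment));
// prints [ 30, 35, 20 ]
```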
Now you can query all of the embedded documents in "visits":
> db.test2.find({"visits.payment":{$lt:35}})
For more information, please refer to the Mongo documentation on dot notation:
http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29
Now on to the second part of your question: it is not possible to return only a conditional sub-set of embedded documents.
With either document format, it is not possible to return a document containing ONLY the sub-documents that match the query. If one of the sub-documents matches the query, then the entire document matches the query, and it will be returned.
As per the Mongo Document "Retrieving a subset of fields"
http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields
We can return parts of embedded documents like so:
> db.test2.find({"visits.payment":{$lt:35}},{"visits.service_date":1}).pretty()
{
"_id" : ObjectId("4f16199d3563af4cb141c547"),
"visits" : [
{
"service_date" : "12-5-2011"
},
{
"service_date" : "12-15-2011"
},
{
"service_date" : "12-25-2011"
}
]
}
But we cannot have conditional retrieval of some sub documents. The closest that we can get is the $slice operator, but this is not conditional, and you will have to first know the location of each sub-document in the array:
http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields#RetrievingaSubsetofFields-RetrievingaSubrangeofArrayElements
In order for the application to display only the embedded documents that match the query, it will have to be done programmatically.
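Doing it programmatically means fetching the matching documents and then filtering the visits in application code. Here is a minimal sketch in JavaScript (the same logic maps directly onto PHP with array_filter); cheapVisits is a hypothetical name, and the sample data is the two documents from the question, with payments still stored as strings.

```javascript
// Keep only the visits whose payment is below `limit`, tagged with the person's name.
// Object.values works for both layouts: an object keyed "0", "1", ... or an array.
function cheapVisits(people, limit) {
  return people.flatMap((person) =>
    Object.values(person.visits)
      .filter((visit) => Number(visit.payment) < limit) // payments may be stored as strings
      .map((visit) => ({ name: person.name, ...visit }))
  );
}

const people = [
  {
    dob: "12-13-2001",
    name: "Kam",
    visits: {
      0: { service_date: "12-5-2011", payment: "40", chk_number: "1234455" },
      1: { service_date: "12-15-2011", payment: "45", chk_number: "3461234" },
      2: { service_date: "12-25-2011", payment: "25", chk_number: "9821234" },
    },
  },
  {
    dob: "10-01-1998",
    name: "Sam",
    visits: {
      0: { service_date: "12-5-2011", payment: "30", chk_number: "86786464" },
      1: { service_date: "12-15-2011", payment: "35", chk_number: "45643461234" },
      2: { service_date: "12-25-2011", payment: "20", chk_number: "4569821234" },
    },
  },
];

console.log(cheapVisits(people, 30));
```

Converting with Number() before comparing avoids string comparison, where "100" < "30" would be true.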
You may try:
$results = $mongodb->find(array("visits.payment" => array('$lt' => 30)));
But I don't know if it will work, since visits is an object. By the way, judging from what you posted, it could (and probably should) be converted to an array, since numeric property names tend to cause confusion.
Try: db.test2.find({"visits.payment":"35"})
