Suppose you have the following documents in my collection:
{
"_id":ObjectId("562e7c594c12942f08fe4192"),
"shapes":[
{
"shape":"square",
"color":"blue"
},
{
"shape":"circle",
"color":"red"
}
]
},
{
"_id":ObjectId("562e7c594c12942f08fe4193"),
"shapes":[
{
"shape":"square",
"color":"black"
},
{
"shape":"circle",
"color":"green"
}
]
}
Do query:
db.test.find({"shapes.color": "red"}, {"shapes.color": 1})
Or
db.test.find({shapes: {"$elemMatch": {color: "red"}}}, {"shapes.color": 1})
Returns matched document (Document 1), but always with ALL array items in shapes:
{ "shapes":
[
{"shape": "square", "color": "blue"},
{"shape": "circle", "color": "red"}
]
}
However, I'd like to get the document (Document 1) only with the array that contains color=red:
{ "shapes":
[
{"shape": "circle", "color": "red"}
]
}
How can I do this?
MongoDB 2.2's new $elemMatch projection operator provides another way to alter the returned document to contain only the first matched shapes element:
db.test.find(
{"shapes.color": "red"},
{_id: 0, shapes: {$elemMatch: {color: "red"}}});
Returns:
{"shapes" : [{"shape": "circle", "color": "red"}]}
In 2.2 you can also do this using the $ projection operator, where the $ in a projection object field name represents the index of the field's first matching array element from the query. The following returns the same results as above:
db.test.find({"shapes.color": "red"}, {_id: 0, 'shapes.$': 1});
MongoDB 3.2 Update
Starting with the 3.2 release, you can use the new $filter aggregation operator to filter an array during projection, which has the benefit of including all matches, instead of just the first one.
db.test.aggregate([
// Get just the docs that contain a shapes element where color is 'red'
{$match: {'shapes.color': 'red'}},
{$project: {
shapes: {$filter: {
input: '$shapes',
as: 'shape',
cond: {$eq: ['$$shape.color', 'red']}
}},
_id: 0
}}
])
Results:
[
{
"shapes" : [
{
"shape" : "circle",
"color" : "red"
}
]
}
]
The new Aggregation Framework in MongoDB 2.2+ provides an alternative to Map/Reduce. The $unwind operator can be used to separate your shapes array into a stream of documents that can be matched:
db.test.aggregate(
// Start with a $match pipeline which can take advantage of an index and limit documents processed
{ $match : {
"shapes.color": "red"
}},
{ $unwind : "$shapes" },
{ $match : {
"shapes.color": "red"
}}
)
Results in:
{
"result" : [
{
"_id" : ObjectId("504425059b7c9fa7ec92beec"),
"shapes" : {
"shape" : "circle",
"color" : "red"
}
}
],
"ok" : 1
}
Caution: This answer provides a solution that was relevant at that time, before the new features of MongoDB 2.2 and up were introduced. See the other answers if you are using a more recent version of MongoDB.
The field selector parameter is limited to complete properties. It cannot be used to select part of an array, only the entire array. I tried using the $ positional operator, but that didn't work.
The easiest way is to just filter the shapes in the client.
If you really need the correct output directly from MongoDB, you can use a map-reduce to filter the shapes.
function map() {
filteredShapes = [];
this.shapes.forEach(function (s) {
if (s.color === "red") {
filteredShapes.push(s);
}
});
emit(this._id, { shapes: filteredShapes });
}
function reduce(key, values) {
return values[0];
}
res = db.test.mapReduce(map, reduce, { query: { "shapes.color": "red" } })
db[res.result].find()
Another interesing way is to use $redact, which is one of the new aggregation features of MongoDB 2.6. If you are using 2.6, you don't need an $unwind which might cause you performance problems if you have large arrays.
db.test.aggregate([
{ $match: {
shapes: { $elemMatch: {color: "red"} }
}},
{ $redact : {
$cond: {
if: { $or : [{ $eq: ["$color","red"] }, { $not : "$color" }]},
then: "$$DESCEND",
else: "$$PRUNE"
}
}}]);
$redact "restricts the contents of the documents based on information stored in the documents themselves". So it will run only inside of the document. It basically scans your document top to the bottom, and checks if it matches with your if condition which is in $cond, if there is match it will either keep the content($$DESCEND) or remove($$PRUNE).
In the example above, first $match returns the whole shapes array, and $redact strips it down to the expected result.
Note that {$not:"$color"} is necessary, because it will scan the top document as well, and if $redact does not find a color field on the top level this will return false that might strip the whole document which we don't want.
Better you can query in matching array element using $slice is it helpful to returning the significant object in an array.
db.test.find({"shapes.color" : "blue"}, {"shapes.$" : 1})
$slice is helpful when you know the index of the element, but sometimes you want
whichever array element matched your criteria. You can return the matching element
with the $ operator.
db.getCollection('aj').find({"shapes.color":"red"},{"shapes.$":1})
OUTPUTS
{
"shapes" : [
{
"shape" : "circle",
"color" : "red"
}
]
}
The syntax for find in mongodb is
db.<collection name>.find(query, projection);
and the second query that you have written, that is
db.test.find(
{shapes: {"$elemMatch": {color: "red"}}},
{"shapes.color":1})
in this you have used the $elemMatch operator in query part, whereas if you use this operator in the projection part then you will get the desired result. You can write down your query as
db.users.find(
{"shapes.color":"red"},
{_id:0, shapes: {$elemMatch : {color: "red"}}})
This will give you the desired result.
Thanks to JohnnyHK.
Here I just want to add some more complex usage.
// Document
{
"_id" : 1
"shapes" : [
{"shape" : "square", "color" : "red"},
{"shape" : "circle", "color" : "green"}
]
}
{
"_id" : 2
"shapes" : [
{"shape" : "square", "color" : "red"},
{"shape" : "circle", "color" : "green"}
]
}
// The Query
db.contents.find({
"_id" : ObjectId(1),
"shapes.color":"red"
},{
"_id": 0,
"shapes" :{
"$elemMatch":{
"color" : "red"
}
}
})
//And the Result
{"shapes":[
{
"shape" : "square",
"color" : "red"
}
]}
You just need to run query
db.test.find(
{"shapes.color": "red"},
{shapes: {$elemMatch: {color: "red"}}});
output of this query is
{
"_id" : ObjectId("562e7c594c12942f08fe4192"),
"shapes" : [
{"shape" : "circle", "color" : "red"}
]
}
as you expected it'll gives the exact field from array that matches color:'red'.
Along with $project it will be more appropriate other wise matching elements will be clubbed together with other elements in document.
db.test.aggregate(
{ "$unwind" : "$shapes" },
{ "$match" : { "shapes.color": "red" } },
{
"$project": {
"_id":1,
"item":1
}
}
)
Likewise you can find for the multiple
db.getCollection('localData').aggregate([
// Get just the docs that contain a shapes element where color is 'red'
{$match: {'shapes.color': {$in : ['red','yellow'] } }},
{$project: {
shapes: {$filter: {
input: '$shapes',
as: 'shape',
cond: {$in: ['$$shape.color', ['red', 'yellow']]}
}}
}}
])
db.test.find( {"shapes.color": "red"}, {_id: 0})
Use aggregation function and $project to get specific object field in document
db.getCollection('geolocations').aggregate([ { $project : { geolocation : 1} } ])
result:
{
"_id" : ObjectId("5e3ee15968879c0d5942464b"),
"geolocation" : [
{
"_id" : ObjectId("5e3ee3ee68879c0d5942465e"),
"latitude" : 12.9718313,
"longitude" : 77.593551,
"country" : "India",
"city" : "Chennai",
"zipcode" : "560001",
"streetName" : "Sidney Road",
"countryCode" : "in",
"ip" : "116.75.115.248",
"date" : ISODate("2020-02-08T16:38:06.584Z")
}
]
}
Although the question was asked 9.6 years ago, this has been of immense help to numerous people, me being one of them. Thank you everyone for all your queries, hints and answers. Picking up from one of the answers here.. I found that the following method can also be used to project other fields in the parent document.This may be helpful to someone.
For the following document, the need was to find out if an employee (emp #7839) has his leave history set for the year 2020. Leave history is implemented as an embedded document within the parent Employee document.
db.employees.find( {"leave_history.calendar_year": 2020},
{leave_history: {$elemMatch: {calendar_year: 2020}},empno:true,ename:true}).pretty()
{
"_id" : ObjectId("5e907ad23997181dde06e8fc"),
"empno" : 7839,
"ename" : "KING",
"mgrno" : 0,
"hiredate" : "1990-05-09",
"sal" : 100000,
"deptno" : {
"_id" : ObjectId("5e9065f53997181dde06e8f8")
},
"username" : "none",
"password" : "none",
"is_admin" : "N",
"is_approver" : "Y",
"is_manager" : "Y",
"user_role" : "AP",
"admin_approval_received" : "Y",
"active" : "Y",
"created_date" : "2020-04-10",
"updated_date" : "2020-04-10",
"application_usage_log" : [
{
"logged_in_as" : "AP",
"log_in_date" : "2020-04-10"
},
{
"logged_in_as" : "EM",
"log_in_date" : ISODate("2020-04-16T07:28:11.959Z")
}
],
"leave_history" : [
{
"calendar_year" : 2020,
"pl_used" : 0,
"cl_used" : 0,
"sl_used" : 0
},
{
"calendar_year" : 2021,
"pl_used" : 0,
"cl_used" : 0,
"sl_used" : 0
}
]
}
if you want to do filter, set and find at the same time.
let post = await Post.findOneAndUpdate(
{
_id: req.params.id,
tasks: {
$elemMatch: {
id: req.params.jobId,
date,
},
},
},
{
$set: {
'jobs.$[i].performer': performer,
'jobs.$[i].status': status,
'jobs.$[i].type': type,
},
},
{
arrayFilters: [
{
'i.id': req.params.jobId,
},
],
new: true,
}
);
This answer does not fully answer the question but it's related and I'm writing it down because someone decided to close another question marking this one as duplicate (which is not).
In my case I only wanted to filter the array elements but still return the full elements of the array. All previous answers (including the solution given in the question) gave me headaches when applying them to my particular case because:
I needed my solution to be able to return multiple results of the subarray elements.
Using $unwind + $match + $group resulted in losing root documents without matching array elements, which I didn't want to in my case because in fact I was only looking to filter out unwanted elements.
Using $project > $filter resulted in loosing the rest of the fields or the root documents or forced me to specify all of them in the projection as well which was not desirable.
So at the end I fixed all of this problems with an $addFields > $filter like this:
db.test.aggregate([
{ $match: { 'shapes.color': 'red' } },
{ $addFields: { 'shapes': { $filter: {
input: '$shapes',
as: 'shape',
cond: { $eq: ['$$shape.color', 'red'] }
} } } },
])
Explanation:
First match documents with a red coloured shape.
For those documents, add a field called shapes, which in this case will replace the original field called the same way.
To calculate the new value of shapes, $filter the elements of the original $shapes array, temporarily naming each of the array elements as shape so that later we can check if the $$shape.color is red.
Now the new shapes array only contains the desired elements.
for more details refer =
mongo db official referance
suppose you have document like this (you can have multiple document too) -
{
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c44b"
},
"results": [
{
"yearOfRelease": "2022",
"imagePath": "https://upload.wikimedia.org/wikipedia/en/d/d4/The_Kashmir_Files_poster.jpg",
"title": "The Kashmir Files",
"overview": "Krishna endeavours to uncover the reason behind his parents' brutal killings in Kashmir. He is shocked to uncover a web of lies and conspiracies in connection with the massive genocide.",
"originalLanguage": "hi",
"imdbRating": "8.3",
"isbookMark": null,
"originCountry": "india",
"productionHouse": [
"Zee Studios"
],
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c44c"
}
},
{
"yearOfRelease": "2022",
"imagePath": "https://upload.wikimedia.org/wikipedia/en/a/a9/Black_Adam_%28film%29_poster.jpg",
"title": "Black Adam",
"overview": "In ancient Kahndaq, Teth Adam was bestowed the almighty powers of the gods. After using these powers for vengeance, he was imprisoned, becoming Black Adam. Nearly 5,000 years have passed, and Black Adam has gone from man to myth to legend. Now free, his unique form of justice, born out of rage, is challenged by modern-day heroes who form the Justice Society: Hawkman, Dr. Fate, Atom Smasher and Cyclone",
"originalLanguage": "en",
"imdbRating": "8.3",
"isbookMark": null,
"originCountry": "United States of America",
"productionHouse": [
"DC Comics"
],
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c44d"
}
},
{
"yearOfRelease": "2022",
"imagePath": "https://upload.wikimedia.org/wikipedia/en/0/09/The_Sea_Beast_film_poster.png",
"title": "The Sea Beast",
"overview": "A young girl stows away on the ship of a legendary sea monster hunter, turning his life upside down as they venture into uncharted waters.",
"originalLanguage": "en",
"imdbRating": "7.1",
"isbookMark": null,
"originCountry": "United States Canada",
"productionHouse": [
"Netflix Animation"
],
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c44e"
}
},
{
"yearOfRelease": "2021",
"imagePath": "https://upload.wikimedia.org/wikipedia/en/7/7d/Hum_Do_Hamare_Do_poster.jpg",
"title": "Hum Do Hamare Do",
"overview": "Dhruv, who grew up an orphan, is in love with a woman who wishes to marry someone with a family. In order to fulfil his lover's wish, he hires two older individuals to pose as his parents.",
"originalLanguage": "hi",
"imdbRating": "6.0",
"isbookMark": null,
"originCountry": "india",
"productionHouse": [
"Maddock Films"
],
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c44f"
}
},
{
"yearOfRelease": "2021",
"imagePath": "https://upload.wikimedia.org/wikipedia/en/7/74/Shang-Chi_and_the_Legend_of_the_Ten_Rings_poster.jpeg",
"title": "Shang-Chi and the Legend of the Ten Rings",
"overview": "Shang-Chi, a martial artist, lives a quiet life after he leaves his father and the shadowy Ten Rings organisation behind. Years later, he is forced to confront his past when the Ten Rings attack him.",
"originalLanguage": "en",
"imdbRating": "7.4",
"isbookMark": null,
"originCountry": "United States of America",
"productionHouse": [
"Marvel Entertainment"
],
"_id": {
"$oid": "63b5cfbfbcc3196a2a23c450"
}
}
],
"__v": 0
}
=======
mongo db query by aggregate command -
mongomodels.movieMainPageSchema.aggregate(
[
{
$project: {
_id:0, // to supress id
results: {
$filter: {
input: "$results",
as: "result",
cond: { $eq: [ "$$result.yearOfRelease", "2022" ] }
}
}
}
}
]
)
For the new version of MongoDB, it's slightly different.
For db.collection.find you can use the second parameter of find with the key being projection
db.collection.find({}, {projection: {name: 1, email: 0}});
You can also use the .project() method.
However, it is not a native MongoDB method, it's a method provided by most MongoDB driver like Mongoose, MongoDB Node.js driver etc.
db.collection.find({}).project({name: 1, email: 0});
And if you want to use findOne, it's the same that with find
db.collection.findOne({}, {projection: {name: 1, email: 0}});
But findOne doesn't have a .project() method.
I'm using ElasticSearch's PHP client and I find really difficult to return results with scores whenever I want to search for a word that is "hidden" within a string.
This is an example:
I want to get all the documents where the field "file" has the word "anses" and files are named like this:
axx14anses19122015.zip
What I know about it
I know I should tokenize those words, can't realize how to do it.
Also I've read about aggregations but I'm really new to ES and I have to deliver a working piece ASAP.
What I've tried so far
REGEXP: using regular expressions is very expensive and does not return any scores, which is a must-to-have in order to shrink results and bring the user accurate information.
Wildcards: same thing, slow and no scores
Own script where I have a dictionary and search for critical words using regexp, if match, create a new field within that matched document with the word. The reason is to create a TOKEN so in future searches I can use regular match with scores. Negative side: the dictionary thing was totally denied by my boss so I'm here asking for any ideas.
Thanks in advance.
I suggest in your case nGram tokenizer see the example
I will create a analyzer and a mapping for a doc type
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"analysis": {
"tokenizer": {
"ngram_tokenizer": {
"type": "nGram",
"min_gram": 4,
"max_gram": 4,
"token_chars": [ "letter", "digit" ]
}
},
"analyzer": {
"ngram_tokenizer_analyzer": {
"type": "custom",
"tokenizer": "ngram_tokenizer",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"doc": {
"properties": {
"text_field": {
"type": "string",
"term_vector": "yes",
"analyzer": "ngram_tokenizer_analyzer"
}
}
}
}
}
after that I`ll insert a document using your file name
PUT /test_index/doc/1
{
"text_field": "axx14anses19122015"
}
now I`ll just will use a query match
POST /test_index/_search
{
"query": {
"match": {
"text_field": "anses"
}
}
}
and will receive a reponse like this
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.10848885,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.10848885,
"_source": {
"text_field": "axx14anses19122015"
}
}
]
}
}
What i did?
i just created a nGram tokenizer that will explode our string in 4 characters terms and will index this terms separated and they will be searched when I search a part of the string.
To see more, read this article https://qbox.io/blog/an-introduction-to-ngrams-in-elasticsearch
Hope it help!
Ok after trying -so- many times it worked. I'll share the solution just in case someone else needs it. Thank you so much to Waldemar, it was a really good approach and I still cannot see why it's not working.
curl -XPUT 'http://ipaddresshere/tokentest' -d
'{ "settings":
{ "number_of_shards": 1, "analysis" :
{ "analyzer" : { "myngram" : { "tokenizer" : "mytokenizer" } },
"tokenizer" : { "mytokenizer" : {
"type" : "nGram",
"min_gram" : "3",
"max_gram" : "5",
"token_chars" : [ "letter", "digit" ] } } } },
"mappings":
{ "doc" :
{ "properties" :
{ "field" : {
"type" : "string",
"term_vector" : "yes",
"analyzer" : "myngram" } } } } }'
Sorry for bad indentation, I'm really hurry but want to post the solution.
So, this will take any string from "field" and split it into nGrams with lenght 3 to 5. For example: "abcanses14f.zip" will result in:
abc, abca, abcan, bca, bcan, bcans, etc... until it reaches anses or a similar term which is matcheable and has a score related to it.
I have got my data in following format..
{
"_id" : ObjectId("534fd4662d22a05415000000"),
"product_id" : "50862224",
"ean" : "8808992479390",
"brand" : "LG",
"model" : "37LH3000",
"features" : [{
{
"key" : "Screen Format",
"value" : "16:9",
}, {
"key" : "DVD Player / Recorder",
"value" : "No",
},
"key" : "Weight in kg",
"value" : "12.6",
}
... so on
]
}
I need to compare features of one product with others and divide the result into separate categories ( 100% match, 50-99 % match) based on % of feature matches..
My initial thought was to prepare a dynamic query with or condition for each feature and do the percentage thing in php but then that means mongodb will return me even those product which only have 1 feature matching. And I I think nearly all products of a category might have some feature in common, so I fear I might be working on lot of products in php.
I have two questions basically.
is there any alternate ways?
And is the data structure I am using is good enough to support the functionality I am looking for, Or should I consider changing it
Well your solution really should be MongoDB specific otherwise you will end up doing your calculations and possible matching on the client side, and that is not going to be good for performance.
So of course what you really want is a way for that to have that processing on the server side:
db.products.aggregate([
// Match the documents that meet your conditions
{ "$match": {
"$or": [
{
"features": {
"$elemMatch": {
"key": "Screen Format",
"value": "16:9"
}
}
},
{
"features": {
"$elemMatch": {
"key" : "Weight in kg",
"value" : { "$gt": "5", "$lt": "8" }
}
}
},
]
}},
// Keep the document and a copy of the features array
{ "$project": {
"_id": {
"_id": "$_id",
"product_id": "$product_id",
"ean": "$ean",
"brand": "$brand",
"model": "$model",
"features": "$features"
},
"features": 1
}},
// Unwind the array
{ "$unwind": "$features" },
// Find the actual elements that match the conditions
{ "$match": {
"$or": [
{
"features.key": "Screen Format",
"features.value": "16:9"
},
{
"features.key" : "Weight in kg",
"features.value" : { "$gt": "5", "$lt": "8" }
},
]
}},
// Count those matched elements
{ "$group": {
"_id": "$_id",
"count": { "$sum": 1 }
}},
// Restore the document and divide the mated elements by the
// number of elements in the "or" condition
{ "$project": {
"_id": "$_id._id",
"product_id": "$_id.product_id",
"ean": "$_id.ean",
"brand": "$_id.brand",
"model": "$_id.model",
"features": "$_id.features",
"matched": { "$divide": [ "$count", 2 ] }
}},
// Sort by the matched percentage
{ "$sort": { "matched": -1 } }
])
So as you know the "length" of the $or condition being applied, then you simply need to find out how many of the elements in the "features" array match those conditions. So that is what the second $match in the pipeline is all about.
Once you have that count, you simply divide by the number of conditions what were passed in as your $or. The beauty here is that now you can do something useful with this like sort by that relevance and then even "page" the results server side.
Of course if you want some additional "categorization" of this, all you would need to do is add another $project stage to the end of the pipeline:
{ "$project": {
"product_id": 1
"ean": 1
"brand": 1
"model": 1,
"features": 1,
"matched": 1,
"category": { "$cond": [
{ "$eq": [ "$matched", 1 ] },
"100",
{ "$cond": [
{ "$gte": [ "$matched", .7 ] },
"70-99",
{ "$cond": [
"$gte": [ "$matched", .4 ] },
"40-69",
"under 40"
]}
]}
]}
}}
Or as something similar. But the $cond operator can help you here.
The architecture should be fine as you have it as you can have a compound index on the "key" and "value" for the entries in your features array and this should scale well for queries.
Of course if you actually need something more than that, such as faceted searching and results, you can look at solutions like Solr or elastic search. But the full implementation of that would be a bit lengthy for here.
I'm assuming that you'd like to compare the rest of the collection to a given product, which is a textbook example of aggregation:
lookingat = db.products.findOne({product_id:'50862224'})
matches = db.products.aggregate([
{ $unwind: '$features' },
{ $match: { features: { $in: lookingat.features }}},
{ $group: { _id: '$product_id', matchedfeatures: { $sum:1 }}},
{ $sort: { matchedfeatures: -1 }},
{ $limit: 5 },
{ $project: { _id:0, product_id: '$_id',
pctmatch: { $multiply: [ '$matchedfeatures',
100/lookingat.features.length ]}
}}
])
Walking through this briefly from the perspective of a product in the collection that has 6 features, and comparing it to the target product ('lookingat') which has 4 features, 3 of which match:
$unwind turns 1 document with 6 features into 6 otherwise-identical documents with 1 feature each
$match looks for that feature in the target's feature array (be aware that two documents are "equal" only if they have the same field names and values, in the same order), discards the 3 that don't match, and passes along the 3 that do
$group consumes those 3 matching documents and produces a new one that tells you there were 3 documents that matched that product_id
$sort and $limit give you the most relevant results and leave behind all those 1-feature matches you were concerned about
$project lets you rename the _id from the $group step back to product_id and also math the number of matching features into a percentage (we avoided a $divide operation by recognizing that 2 of the 3 terms in our calculation are constants and can be divided in JS)
I'm new to the map reduce concept and even though I'm making some slow progress, I'm finding some issues that I need some help with.
I have a simple collection consisting of an id, city and and destination, something like this:
{ "_id" : "5230e7e00000000000000000", "city" : "Boston", "to" : "Chicago" },
{ "_id" : "523fe7e00000000000000000", "city" : "New York", "to" : "Miami" },
{ "_id" : "5240e1e00000000000000000", "city" : "Boston", "to" : "Miami" },
{ "_id" : "536fe4e00000000000000000", "city" : "Washington D.C.", "to" : "Boston" },
{ "_id" : "53ffe7e00000000000000000", "city" : "New York", "to" : "Boston" },
{ "_id" : "5740e1e00000000000000000", "city" : "Boston", "to" : "Miami" },
...
(Please do note that this data is just made up for example purposes)
I'd like to group by city the destinations including a count:
{ "city" : "Boston", values : [{"Chicago",1}, {"Miami",2}] }
{ "city" : "New York", values : [{"Miami",1}, {"Boston",1}] }
{ "city" : "Washington D.C.", values : [{"Boston", 1}] }
For this I'm starting to playing with this function to map:
function() {
emit(this.city, this.to);
}
which performs the expected grouping. My reduce function is this:
function(key, values) {
var reduced = {"to":[]};
for (var i in values) {
var item = values[i];
reduced.to.push(item);
}
return reduced;
}
which gives somewhat an expected output:
{ "_id" : ObjectId("522f8a9181f01e671a853adb"), "value" : { "to" : [ "Boston", "Miami" ] } }
{ "_id" : ObjectId("522f933a81f01e671a853ade"), "value" : { "to" : [ "Chicago", "Miami", "Miami" ] } }
{ "_id" : ObjectId("5231f0ed81f01e671a853ae0"), "value" : "Boston" }
As you can see, I still haven't counted the repeated cities, but as can be seen above, for some reason the last result in the output doesn't look good. I'd expected it to be
{ "_id" : ObjectId("5231f0ed81f01e671a853ae0"), "value" : { "to" : ["Boston"] } }
Has this anything to do with the fact that there is a single item? Is there any way to obtain this?
Thank you.
I see you are asking about a PHP issue, but you are using javascript to ask, so I’m assuming a javascript answer will help you move things along. As such here is the javascript needed in the shell to run your aggregation. I strong suggest getting your aggregation working in the shell(or some other javascript editor) in general and then translating it into the language of your choice. It is a lot easier to see what is going on and there faster using this method. You can then run:
use admin
db.runCommand( { setParameter: 1, logLevel: 2 } )
to check the bson output of your selected language vs what the shell looks like. This will appear in the terminal if mongo is in the foreground, otherwise you’ll have ot look in the logs.
Summing the routes in the aggregation framework [AF] with Mongo is fairly strait forward. The AF is faster and easier to use then map reduce[MR]. Though in this case they both have similar issues, simply pushing to an array won’t yield a count in and of itself (in MR you either need more logic in your reduce function or to use a finalize function).
With the AF using the example data provided this pipeline is useful:
db.agg1.aggregate([
{$group:{
_id: { city: "$city", to: "$to" },
count: { $sum: 1 }
}},
{$group: {
_id: "$_id.city",
to:{ $push: {to: "$_id.to", count: "$count"}}
}}
]);
The aggregation framework can only operate on known fields, but many pipeline operations so a problem needs to broken down with that as a consideration.
Above, the 1st stage calculates the numbers need, for which there are 3 fixed fields: the source, the destination, and the count.
The second stage has 2 fixed fields, one of which is an array, which is only being pushed to (all the data for the final form is there).
For MR you can do this:
var map = function() {
var key = {source:this.city, dest:this.to};
emit(key, 1);
};
var reduce = function(key, values) {
return Array.sum(values);
};
A separate function will have to pretty it however.
If you have any additional questions please don’t hesitate to ask.
Best,
Charlie
document structure example is:
{
"dob": "12-13-2001",
"name": "Kam",
"visits": {
"0": {
"service_date": "12-5-2011",
"payment": "40",
"chk_number": "1234455",
},
"1": {
"service_date": "12-15-2011",
"payment": "45",
"chk_number": "3461234",
},
"2": {
"service_date": "12-25-2011",
"payment": "25",
"chk_number": "9821234",
}
}
}
{
"dob": "10-01-1998",
"name": "Sam",
"visits": {
"0": {
"service_date": "12-5-2011",
"payment": "30",
"chk_number": "86786464",
},
"1": {
"service_date": "12-15-2011",
"payment": "35",
"chk_number": "45643461234",
},
"2": {
"service_date": "12-25-2011",
"payment": "20",
"chk_number": "4569821234",
}
}
}
In PHP i want to list all those "visits" information (and corresponding "name" ) for which payment is less than "30".
I want to print only the visits with "payment" < "30" not others. Is such query possible, or do i have to get entire document first using search and then use PHP to select such visits??
In the example document, the "payment" values are given as strings which may not work as intended with the $lt command. For this response, I have converted them to integers.
Wildcard queries are not possible with MongoDB, so with the given document structure, the key (0,1,2, etcetera) of the sub-document must be known. For instance, the following query will work:
> db.test.find({"visits.2.payment":{$lt:35}})
However,
> db.test.find({"visits.payment":{$lt:35}})
Will not work in this case, and
> db.test.find({"visits.*.payment":{$lt:35}})
will also not return any results.
In order to be able to query the embedded "visits" documents, you must change your document structure and make "visits" into an array or embedded documents, like so:
> db.test2.find().pretty()
{
"_id" : ObjectId("4f16199d3563af4cb141c547"),
"dob" : "10-01-1998",
"name" : "Sam",
"visits" : [
{
"service_date" : "12-5-2011",
"payment" : 30,
"chk_number" : "86786464"
},
{
"service_date" : "12-15-2011",
"payment" : 35,
"chk_number" : "45643461234"
},
{
"service_date" : "12-25-2011",
"payment" : 20,
"chk_number" : "4569821234"
}
]
}
Now you can query all of the embedded documents in "visits":
> db.test2.find({"visits.payment":{$lt:35}})
For more information, please refer to the Mongo documentation on dot notation:
http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29
Now on to the second part of your question: it is not possible to return only a conditional sub-set of embedded documents.
With either document format, it is not possible to return a document containing ONLY the sub-documents that match the query. If one of the sub-documents matches the query , then the entire document matches the query, and it will be returned.
As per the Mongo Document "Retrieving a subset of fields"
http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields
We can return parts of embedded documents like so:
> db.test2.find({"visits.payment":{$lt:35}},{"visits.service_date":1}).pretty()
{
"_id" : ObjectId("4f16199d3563af4cb141c547"),
"visits" : [
{
"service_date" : "12-5-2011"
},
{
"service_date" : "12-15-2011"
},
{
"service_date" : "12-25-2011"
}
]
}
But we cannot have conditional retrieval of some sub documents. The closest that we can get is the $slice operator, but this is not conditional, and you will have to first know the location of each sub-document in the array:
http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields#RetrievingaSubsetofFields-RetrievingaSubrangeofArrayElements
In order for the application to display only the embedded documents that match the query, it will have to be done programmatically.
You may try:
$results = $mongodb->find(array("visits.payment" => array('$lt' => 30)));
But i don't know if it will work since visits is an object. BTW judging from what you posted it could be transfered to array (or should since numerical property names tends to cause confusion)
try - db.test2.find({"visits.payment":"35"})