I have a document with this structure:
{"user":{
"nice":{
"funny":"sure"
}
,
"notnice":{
"funny":"maybe"
}
}
}
I know the keys "user" and "funny" and the values "sure" and "maybe", but I don't know the middle keys "nice" and "notnice".
How do I write an optimized query that searches through many documents?
For example, to search for the value "sure" when I know the middle keys, I do:
$document = $users->findOne([
'$or' => [
['user.nice.funny' => 'sure'],
['user.notnice.funny' => 'sure']
]
]
);
But how do I do the same without knowing "nice" and "notnice"?
This should point you in the right direction:
db.collection.aggregate([{
$addFields: {
"userTransformed": {
$objectToArray: "$user" // transform "user" into an array of { k, v } pairs
}
}
}, {
$match: {
"userTransformed.v.funny": "sure" // filter on the values only, whatever the key is
}
}])
Frankly, this is not going to be fast for lots of documents, but there is no other way with this structure. The query cannot use an index, because the $match runs on a computed field. If you want it to be faster, you will need to change your document structure.
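To make the effect of $objectToArray concrete, here is a minimal in-memory sketch in plain JavaScript. It is not a MongoDB query, only an illustration of the per-document check, and it assumes documents shaped like the one in the question. The helper name hasFunnyValue and the second sample document are made up for the example.

```javascript
// In-memory illustration only: mimics what $objectToArray + $match test per document.
// `hasFunnyValue` is a hypothetical helper, not part of any MongoDB driver.
function hasFunnyValue(doc, wanted) {
  const user = doc.user;
  if (user === null || typeof user !== "object") return false;
  // Object.values plays the role of the "v" side of the { k, v } pairs.
  return Object.values(user).some(
    (v) => v !== null && typeof v === "object" && v.funny === wanted
  );
}

const userDocs = [
  { user: { nice: { funny: "sure" }, notnice: { funny: "maybe" } } },
  { user: { other: { funny: "no" } } }, // invented second document
];

const userMatches = userDocs.filter((d) => hasFunnyValue(d, "sure"));
console.log(userMatches.length); // 1
```

Every document has to be inspected this way, which is why the query cannot be served from an index.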
I have a JSON array of data that I am trying to extract particular value/keys(?) from, and would like to add them into a new array.
The array looks like this:
{ "total":2000,
"achievements":[
{
"id":6,
"achievement":{},
"criteria":{
"id":2050,
"is_completed":false
},
"completed_timestamp":1224053510000
},
{
"id":8,
"achievement":{},
"criteria":{
"id":1289,
"is_completed":true
},
"completed_timestamp":0
}
]
}
I want to search for true in is_completed, and then add the id of that entry to a new array.
Basically, find the ids of all the entries (sorry, unsure of the terminology) where is_completed is true.
I've tried something simple, like trying to find the key of an ID, but I'm struggling to get that to work. I've also seen some multi-level for loop examples, but I can't get them to work for my data.
Example:
$key = array_search('1289', array_column($array, 'id'));
As pointed out in the comments, you could combine array_filter (to filter completed events) and array_column (to extract their IDs).
$completedAchievements = array_filter(
$array->achievements,
static function (\stdClass $achievement): bool {
return $achievement->criteria->is_completed === true;
}
);
$completedAchievementsIds = array_column($completedAchievements, 'id');
print_r($completedAchievementsIds); // Array([0] => 8)
Note: the code above supposes your JSON was decoded as an object. If it was decoded as an array, just replace -> syntax with the corresponding array index access.
Let's say I have 2 collections.
first one :-
db.product_main
{
_id:123121,
source_id:"B4456dde1",
title:"test Sample",
price: 250,
quantity: 40
}
which consists of approximately 10000 documents; the unique field is source_id.
Second :-
db.product_id
{
"_id":58745633,
"product_id":"B4456dde1"
}
which consists of ~500 documents and only has the field "product_id", which is equal to the "source_id" of db.product_main.
Now I want to compare the two collections so that I only find those documents of db.product_main whose source_id doesn't exist in db.product_id.
db.product_main.aggregate({any query})
Just use the $lookup stage to find the product_id documents associated with each 'product_main' document, and then match on an empty array (i.e. records for which no product_id was found):
db.product_main.aggregate([
{
$lookup: {
from: "product_id",
localField: "source_id",
foreignField: "product_id",
as: "products_available"
}
},
{
$match: {
products_available: {
$size: 0
}
}
}
])
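The $lookup plus $size: 0 combination is effectively an anti-join. As a rough in-memory sketch of that logic in plain JavaScript (not a MongoDB query), assuming the field names from the question; the helper name missingFromProductIds and the second sample product are invented for illustration:

```javascript
// In-memory illustration of the anti-join that $lookup + { $size: 0 } performs.
// `missingFromProductIds` is a hypothetical helper name.
function missingFromProductIds(productMain, productIds) {
  // Build the set of known ids once, so each membership test is cheap.
  const known = new Set(productIds.map((p) => p.product_id));
  // Keep only the products whose source_id has no counterpart.
  return productMain.filter((p) => !known.has(p.source_id));
}

const productMain = [
  { _id: 123121, source_id: "B4456dde1", title: "test Sample" },
  { _id: 123122, source_id: "X0000000", title: "invented item" }, // made-up sample
];
const productIds = [{ _id: 58745633, product_id: "B4456dde1" }];

const missing = missingFromProductIds(productMain, productIds);
console.log(missing.map((p) => p.source_id)); // [ 'X0000000' ]
```

In the real pipeline, an index on product_id in the product_id collection would likely play the role of the Set here, since $lookup does an equality match on the foreign field.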
On write operations, you can also offload the work directly in the aggregation pipeline by using the $out stage and storing the cached result in a separate collection (product_stats, for example).
Later, in web/UI/API read operations, just use this cached collection. Of course, you can create database query methods for both cached and non-cached results.
Let's assume the return value of a search function is something like this:
// If only one record is found
$value = [
'records' => [
'record' => ['some', 'Important', 'Information']
]
];
// If multiple records are found
$value = [
'records' => [
'record' => [
0 => ['some', 'important', 'information'],
1 => ['some', 'information', 'I dont care']
]
]
];
What would be the best way to get the important information (in the case of multiple records, it is always the first one)?
Should I check something like
if (array_values($value['record']['records'])[0] == 0){//do something};
But I guess, there is a way more elegant solution.
Edit:
And by the way, this is not really a duplicate of the referenced question, which only covers the multiple-records case.
If you want the first element of an array, you should use reset. This function sets the pointer to the first element and returns it.
$firstValue = reset($value['records']['record']);
Edit: after reading your question again, it seems you don't want the first element.
You rather want this:
if (isset($value['records']['record'][0]) && is_array($value['records']['record'][0])) {
// multiple return values
} else {
// single return value
}
Doing this is kind of error-prone, and I wouldn't recommend having one function return different kinds of array structures.
Check like this:
if(is_array($value['records']['record'][0])) {
// multiple records
} else {
// single record
}
I want to find documents where the last element in an array equals some value.
Array elements may be accessed by specific array position:
// i.e. comments[0].by == "Abe"
db.example.find( { "comments.0.by" : "Abe" } )
But how do I search using the last item as criteria?
i.e.
db.example.find( { "comments.last.by" : "Abe" } )
By the way, I'm using PHP.
I know this question is old, but I found it on Google after answering a similar new question. So I thought this deserved the same treatment.
You can avoid the performance hit of $where by using aggregate instead:
db.example.aggregate([
// Use an index (which $where cannot) to narrow down the documents
{$match: { "comments.by": "Abe" }},
// De-normalize the array
{$unwind: "$comments"},
// The order of the array is maintained, so just take the $last per _id
{$group: { _id: "$_id", comments: {$last: "$comments"} }},
// Match only where that $last comment is by "Abe"
{$match: { "comments.by": "Abe" }},
// Restore the original _id order
{$sort: { _id: 1 }}
])
And that should run rings around $where since we were able to narrow down the documents that had a comment by "Abe" in the first place. As warned, $where is going to test every document in the collection and never use an index even if one is there to be used.
Of course, you can also maintain the original document using the technique described here as well, so everything would work just like a find().
Just food for thought for anyone finding this.
Update for Modern MongoDB releases
Modern releases have added the $redact pipeline stage as well as the $arrayElemAt operator (the latter as of 3.2, so that would be the minimum version here), which in combination allow a logical expression to inspect the last element of an array without processing an $unwind stage:
db.example.aggregate([
{ "$match": { "comments.by": "Abe" }},
{ "$redact": {
"$cond": {
"if": {
"$eq": [
{ "$arrayElemAt": [ "$comments.by", -1 ] },
"Abe"
]
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}}
])
The comparison works as follows: the field path "$comments.by" resolves to an array of just the "by" values, and $arrayElemAt with index -1 takes the last element of that array. This allows comparison of that single value against the required parameter, "Abe".
Or even a bit more modern using $expr for MongoDB 3.6 and greater:
db.example.find({
"comments.by": "Abe",
"$expr": {
"$eq": [
{ "$arrayElemAt": [ "$comments.by", -1 ] },
"Abe"
]
}
})
This would be by far the most performant solution for matching the last element within an array, and it is actually expected to supersede the usage of $where in most cases, and especially here.
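For readers who want to see the difference between "some element is by Abe" and "the last element is by Abe", here is a small in-memory sketch in plain JavaScript. It is only an illustration of the predicate, not a MongoDB query; the helper name lastCommentIsBy and the sample documents are invented.

```javascript
// In-memory illustration of the "last element" test that $arrayElemAt with -1 expresses.
// `lastCommentIsBy` is a hypothetical helper name.
function lastCommentIsBy(doc, author) {
  const comments = Array.isArray(doc.comments) ? doc.comments : [];
  if (comments.length === 0) return false; // nothing to compare against
  return comments[comments.length - 1].by === author;
}

const commentDocs = [
  { _id: 1, comments: [{ by: "Ford" }, { by: "Abe" }] }, // Abe is last
  { _id: 2, comments: [{ by: "Abe" }, { by: "Ford" }] }, // Abe present, but not last
  { _id: 3, comments: [] },                              // no comments at all
];

const lastByAbe = commentDocs
  .filter((d) => lastCommentIsBy(d, "Abe"))
  .map((d) => d._id);
console.log(lastByAbe); // [ 1 ]
```

Document 2 is the interesting case: a plain "comments.by": "Abe" condition matches it, which is why that condition only narrows the candidates and the last-element test is still needed.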
You can't do this in one go with this schema design. You can either store the length and do two queries, or store the last comment additionally in another field:
{
'_id': 'foo',
'comments': [
{ 'value': 'comment #1', 'by': 'Ford' },
{ 'value': 'comment #2', 'by': 'Arthur' },
{ 'value': 'comment #3', 'by': 'Zaphod' }
],
'last_comment': {
'value': 'comment #3', 'by': 'Zaphod'
}
}
Sure, you'll be duplicating some data, but at least you can set it with $set in the same update as the $push for the comment.
$comment = array(
'value' => 'comment #3',
'by' => 'Zaphod',
);
$collection->update(
array( '_id' => 'foo' ),
array(
'$set' => array( 'last_comment' => $comment ),
'$push' => array( 'comments' => $comment )
)
);
Finding the last one is easy now!
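As a rough in-memory sketch of what that combined $set and $push does to the document, written in plain JavaScript rather than against a real collection (the helper name addComment is made up for the example):

```javascript
// In-memory illustration of keeping a denormalized "last_comment" field in sync.
// `addComment` is a hypothetical helper; it returns a new object and leaves the input untouched.
function addComment(doc, comment) {
  return {
    ...doc,
    comments: [...(doc.comments || []), comment], // mirrors $push
    last_comment: comment,                        // mirrors $set
  };
}

const before = {
  _id: "foo",
  comments: [
    { value: "comment #1", by: "Ford" },
    { value: "comment #2", by: "Arthur" },
  ],
};

const after = addComment(before, { value: "comment #3", by: "Zaphod" });
console.log(after.last_comment.by); // Zaphod
```

With the field maintained this way, the lookup becomes a plain equality condition on "last_comment.by", which an ordinary index should be able to serve.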
You could do this with a $where operator:
db.example.find({ $where:
'this.comments.length && this.comments[this.comments.length-1].by === "Abe"'
})
The usual slow performance caveats for $where apply. However, you can help with this by including "comments.by": "Abe" in your query:
db.example.find({
"comments.by": "Abe",
$where: 'this.comments.length && this.comments[this.comments.length-1].by === "Abe"'
})
This way, the $where only needs to be evaluated against documents that include comments by Abe and the new term would be able to use an index on "comments.by".
I'm just doing :
db.products.find({'statusHistory.status':'AVAILABLE'},{'statusHistory': {$slice: -1}})
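This gets me products that have a statusHistory item with status='AVAILABLE', with only the last statusHistory item returned. Note that the filter matches any item in the array, so the returned last item is not guaranteed to be the one with that status.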
I am not sure why my answer above is deleted. I am reposting it. I am pretty sure without changing the schema, you should be able to do it this way.
db.example.find({ "comments:{$slice:-1}.by" : "Abe" })
// ... or
db.example.find({ "comments.by" : "Abe" })
This by default takes the last element in the array.
I have a problem that I need some help with, but I feel I'm close. It involves Lithium and MongoDB. The code looks like this:
http://pastium.org/view/0403d3e4f560e3f790b32053c71d0f2b
$db = PopularTags::connection();
$map = new \MongoCode("function() {
if (!this.saved_terms) {
return;
}
for (index in this.saved_terms) {
emit(this.saved_terms[index], 1);
}
}");
$reduce = new \MongoCode("function(previous, current) {
var count = 0;
for (index in current) {
count += current[index];
}
return count;
}");
$metrics = $db->connection->command(array(
'mapreduce' => 'users',
'map' => $map,
'reduce' => $reduce,
'out' => 'terms'
));
$cursor = $db->connection->selectCollection($metrics['result'])->find()->limit(1);
print_r($cursor);
/**
User Data In Mongo
{
"_id" : ObjectId("4e789f954c734cc95b000012"),
"email" : "example#bob.com",
"saved_terms" : [
null,
[
"technology",
" apple",
" iphone"
],
[
"apple",
" water",
" beryy"
]
] }
**/
I am having users save the terms they search on, and then I am trying to get the most popular terms,
but I keep getting errors like: Uncaught exception 'Exception' with message 'MongoDB::__construct( invalid name '. Does anyone have any idea how to do this, or some direction?
First off, I would not store this in the user object. MongoDB documents have an upper size limit of 4/16MB (depending on version). This limit is normally not a problem, but when logging inline in one object you might be able to reach it. However, a more real problem is that every time you need to act on these objects you need to load them into RAM, and that becomes resource-consuming. I don't think you want that on your user objects.
Secondly, arrays in objects are not sortable and have other limitations that might come back to bite you later.
But if you want to have it like this (a low volume of searches should not really be a problem), you can solve this most easily by using a group query.
A group query is pretty much like a GROUP BY query in SQL, so there's a slight trick, as you need to group on something most objects share (an active field on users, maybe).
So, here's a working group example that will sum the words used, based on your structure.
Just put this method in your model and do MyModel::searchTermUsage() to get a Document object back.
public static function searchTermUsage() {
$reduce = 'function(obj, prev) {
obj.terms.forEach(function(terms) {
terms.forEach(function(term) {
if (!(term in prev)) prev[term] = 0;
prev[term]++;
});
});
}';
return static::all(array(
'initial' => new \stdclass,
'reduce' => $reduce,
'group' => 'common-value-key' // Change this
));
}
There is no protection against non-array types in the terms field (you had a null value in your example). I left it out for simplicity; it's probably better to strip such values before they end up in the database.
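If stripping the bad values beforehand is not an option, the counting logic could guard against them instead. The following is a minimal in-memory sketch in plain JavaScript, not the Lithium group call itself. It assumes the saved_terms field from the question's sample data; the helper name countTerms is invented, and trimming the leading spaces is an extra assumption on my part, not something the original reduce function does.

```javascript
// In-memory illustration of term counting with guards for null / non-array entries.
// `countTerms` is a hypothetical helper; trimming whitespace is an added assumption.
function countTerms(users) {
  const counts = new Map(); // a Map avoids clashes with built-in object keys
  for (const user of users) {
    if (!Array.isArray(user.saved_terms)) continue;
    for (const terms of user.saved_terms) {
      if (!Array.isArray(terms)) continue; // skips the null entry from the example
      for (const raw of terms) {
        if (typeof raw !== "string") continue;
        const term = raw.trim();
        if (term === "") continue;
        counts.set(term, (counts.get(term) || 0) + 1);
      }
    }
  }
  return counts;
}

const users = [
  {
    saved_terms: [
      null,
      ["technology", " apple", " iphone"],
      ["apple", " water", " beryy"],
    ],
  },
];

const termCounts = countTerms(users);
console.log(termCounts.get("apple")); // 2
```

The same Array.isArray checks could be placed inside the reduce function before each forEach call.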