I need to find keywords in the name/description/tags, etc. of each document and remove them if they are found. I'm new to Mongo, so I'm following a similar script in the existing codebase. First, get the MongoCursor and only get the fields we'll be checking:
/** @var MongoCursor $products */
$products = $collection->find(
['type' => ['$in' => ['PHONES', 'TABLETS']], 'supplier.is_awful' => ['$exists' => true]],
['details.name' => true, 'details.description' => true]
);
Then, iterate through each document, then check each of the properties for the values we're interested in:
/** @var \Doctrine\ODM\MongoDB\DocumentManager $manager */
$manager = new Manager();
foreach ($products as $product) {
// Find objectionable words in the content and remove these documents
foreach (["suckysucky's", "deuce", "a z z"] as $word) {
if (false !== strpos(mb_strtolower($product['details']['name']), $word)
|| false !== strpos(mb_strtolower($product['details']['description']), $word)) {
$object = $manager->find(\App\Product::class, $product['_id']);
$manager->remove($object);
}
}
}
// Persist to DB
$manager->flush();
The problem is that the database has hundreds of thousands of records, and as I iterate over the MongoCursor the memory usage goes up and up until it runs out:
Now at (0) 20035632
Now at (100) 24446048
Now at (200) 32190312
Now at (300) 36098208
Now at (400) 42433656
Now at (500) 45204376
Now at (600) 50664808
Now at (700) 54916888
Now at (800) 59847312
Now at (900) 65145808
Now at (1000) 70764408
Is there a way for me to iterate over the MongoCursor without running out of memory (I've tried unsetting the various objects at different points, but no luck there)? Alternatively, is this a query that can be run directly in Mongo? I've looked at the docs, and I saw some hope in $text, but it looks like I need to have an index there (I don't), and there can only be one text index per collection.
You don't need a full-text index to find a substring: the right way is to use a regex and return only the "_id" value, something like:
$mongore = new MongoRegex("/suckysucky's|deuce|a z z/i");
$products = $collection->find(
['type' => ['$in' => ['PHONES', 'TABLETS']],
'supplier.is_awful' => ['$exists' => true],
'$or' => [['details.name' => $mongore],
['details.description' => $mongore]]],
['_id' => true]
);
I'm not sure about the exact PHP syntax, but the key is an inclusive $or filter with the same mongodb regex on the two fields.
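If the word list is not fixed, the pattern string can be built rather than typed by hand. Here's a minimal sketch in plain PHP, assuming the keywords should be matched literally; buildKeywordPattern() is a made-up helper name, not part of any driver. The resulting string is in the "/pattern/flags" form that MongoRegex accepts, and the check at the end only exercises PHP's own regex engine, not MongoDB.

```php
<?php
// Hypothetical helper: build one case-insensitive alternation from a word list.
function buildKeywordPattern(array $words): string
{
    // preg_quote() escapes regex metacharacters and the "/" delimiter,
    // so each keyword is matched literally.
    $escaped = array_map(function (string $word): string {
        return preg_quote($word, '/');
    }, $words);

    return '/' . implode('|', $escaped) . '/i';
}

$pattern = buildKeywordPattern(["suckysucky's", 'deuce', 'a z z']);

// Local sanity check with PHP's regex engine.
var_dump(preg_match($pattern, 'Brand new DEUCE phone') === 1);   // bool(true)
var_dump(preg_match($pattern, 'A perfectly fine tablet') === 1); // bool(false)
```

Escaping matters as soon as a keyword contains characters such as "." or "+", which would otherwise be read as regex syntax.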
Given a variable that holds this string:
$property = 'parent->requestdata->inputs->firstname';
And an object:
$obj->parent->requestdata->inputs->firstname = 'Travis';
How do I access the value 'Travis' using the string? I tried this:
$obj->{$property}
But it looks for a property called 'parent->requestdata->inputs->firstname', not the property located at $obj->parent->requestdata->inputs->firstname.
I've tried various types of concatenation, use of var_export(), and others. I can explode it into an array and then loop the array like in this question.
But the variable '$property' can hold a value that goes 16 levels deep. And, the data I'm parsing can have hundreds of properties I need to import, so looping through and returning the value at each iteration until I get to level 16 X 100 items seems really inefficient; especially given that I know the actual location of the property at the start.
How do I get the value 'Travis' given (stdClass)$obj and (string)$property?
My initial searches didn't yield many results, however, after thinking up a broader range of search terms I found other questions on SO that addressed similar problems. I've come up with three solutions. All will work, but not all will work for everyone.
Solution 1 - Looping
Using an approach similar to the question referenced in my original question or the loop proposed by @miken32 will work.
Solution 2 - anonymous function
The string can be exploded into an array. The array can then be parsed using array_reduce() to produce the result. In my case, the working code (with a check for incorrect/non-existent property names/spellings) was this (PHP 7+):
//create object - this comes from an external API in my case, but I'll include it here
//so that others can copy and paste for testing purposes
$obj = (object)[
'parent' => (object)[
'requestdata' => (object)[
'inputs' => (object)[
'firstname' => 'Travis'
]
]
]
];
//string representing the property we want to get on the object
$property = 'parent->requestdata->inputs->firstname';
$name = array_reduce(explode('->', $property), function ($previous, $current) {
return is_numeric($current) ? ($previous[$current] ?? null) : ($previous->$current ?? null); }, $obj);
var_dump($name); //outputs Travis
See this question for potentially relevant information and the code I based my answer on.
Solution 3 - symfony property access component
In my case, it was easy to use composer to require this component. It allows access to properties on arrays and objects using simple strings. You can read about how to use it on the symfony website. The main benefit for me over the other options was the included error checking.
My code ended up looking like this:
//create object - this comes from an external API in my case, but I'll include it here
//so that others can copy and paste for testing purposes
//don't forget to include the component at the top of your class
//'use Symfony\Component\PropertyAccess\PropertyAccess;'
$obj = (object)[
'parent' => (object)[
'requestdata' => (object)[
'inputs' => (object)[
'firstname' => 'Travis'
]
]
]
];
//string representing the property we want to get on the object
//NOTE: Symfony uses dot notation. I could not get standard '->' object notation to work.
$property = 'parent.requestdata.inputs.firstname';
//create symfony property access factory
$propertyAccessor = PropertyAccess::createPropertyAccessor();
//get the desired value
$name = $propertyAccessor->getValue($obj, $property);
var_dump($name); //outputs 'Travis'
All three options will work. Choose the one that works for you.
You're right that you'll have to do a loop iteration for each nested object, but you don't need to loop through "hundreds of properties" for each of them, you just access the one you're looking for:
$obj = (object)[
'parent' => (object)[
'requestdata' => (object)[
'inputs' => (object)[
'firstname' => 'Travis'
]
]
]
];
$property = "parent->requestdata->inputs->firstname";
$props = explode("->", $property);
while ($props && $obj !== null) {
$prop = array_shift($props);
$obj = $obj->$prop ?? null;
}
var_dump($obj);
Totally untested but seems like it should work and be fairly performant.
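One thing to watch with the loop above is that it overwrites $obj as it walks down. Here's a rough sketch of the same idea wrapped in a function that leaves the original object alone; getByPath() and its parameters are names I made up for illustration, and the array branch is an assumption in case the API mixes arrays into the structure.

```php
<?php
// Hypothetical helper: resolve a separator-delimited path without modifying $root.
function getByPath($root, string $path, string $separator = '->', $default = null)
{
    $current = $root;
    foreach (explode($separator, $path) as $segment) {
        if (is_object($current) && isset($current->$segment)) {
            $current = $current->$segment;      // descend into an object property
        } elseif (is_array($current) && isset($current[$segment])) {
            $current = $current[$segment];      // descend into an array key
        } else {
            return $default;                    // missing (or null) segment: stop early
        }
    }
    return $current;
}

$obj = (object)[
    'parent' => (object)[
        'requestdata' => (object)[
            'inputs' => (object)['firstname' => 'Travis'],
        ],
    ],
];

var_dump(getByPath($obj, 'parent->requestdata->inputs->firstname')); // string(6) "Travis"
var_dump(getByPath($obj, 'parent->typo->firstname', '->', 'n/a'));   // string(3) "n/a"
```

Each lookup costs one step per path segment, so a 16-level path is 16 property reads regardless of how many other properties the objects hold.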
I have a laravel collection object.
I want to use the nth model within it.
How do I access it?
Edit:
I cannot find a suitable method in the laravel documentation. I could iterate the collection in a foreach loop and break when the nth item is found:
foreach($collection as $key => $object)
{
if($key == $nth) {break;}
}
// $object is now the nth one
But this seems messy.
A cleaner way would be to perform the above loop once and create a simple array containing all the objects in the collection. But this seems like unnecessary duplication.
In the laravel collection class documentation, there is a fetch method but I think this fetches an object from the collection matching a primary key, rather than the nth one in the collection.
Seeing as Illuminate\Support\Collection implements ArrayAccess, you should be able to simply use square-bracket notation, i.e.
$collection[$nth]
This calls offsetGet internally which you can also use
$collection->offsetGet($nth)
and finally, you can use the get method which allows for an optional default value
$collection->get($nth)
// or
$collection->get($nth, 'some default value')
@Phil's answer doesn't quite obtain the nth element, since the keys may be unordered. If you've got an eloquent collection from a db query it'll work fine, but if your keys aren't sequential then you'll need to do something different.
$collection = collect([0 => 'bish', 2 => 'bash']); $collection[1] // Undefined index
Instead we can do $collection->values()[1] // string(4) bash
which uses array_values()
Or even make a macro to do this:
Collection::macro('nthElement', function($offset, $default = null) {
return $this->values()->get($offset, $default);
});
Example macro usage:
$collection = collect([0 => 'bish', 2 => 'bash']);
$collection->nthElement(1) // string(4) 'bash'
$collection->nthElement(3) // NULL (the default, since there is no 4th element)
$collection->nthElement(3, 'bosh') // string (4) bosh
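For anyone who wants to see the mechanics without Laravel, here's a small plain-PHP sketch of the same re-indexing idea. The nthElement() function below is a stand-alone illustration I wrote for this, not the macro itself.

```php
<?php
// Plain-PHP illustration: re-index with array_values(), then look up by position.
function nthElement(array $items, int $offset, $default = null)
{
    $values = array_values($items); // keys become 0, 1, 2, ...
    return array_key_exists($offset, $values) ? $values[$offset] : $default;
}

$items = [0 => 'bish', 2 => 'bash'];

var_dump(nthElement($items, 1));         // string(4) "bash"
var_dump(nthElement($items, 3));         // NULL
var_dump(nthElement($items, 3, 'bosh')); // string(4) "bosh"
```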
I am late to this question, but I thought this might be a useful solution for someone.
Collections have the slice method with the following parameters:
$items->slice(whereToStartSlice, sizeOfSlice);
Therefore, if you set the whereToStartSlice parameter at the nth item and the sizeOfSlice to 1 you retrieve the nth item.
Example:
$nthItem = $items->slice($nth, 1)->first();
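Keep in mind that a slice is still a list holding one element, with its original key, rather than the bare item. As far as I know the collection method behaves like PHP's array_slice() with key preservation switched on, so here's a plain-PHP sketch of what comes back (the sample data is made up):

```php
<?php
$items = ['a' => 'bish', 'b' => 'bash', 'c' => 'bosh'];
$nth = 1;

// Take one element starting at position $nth, keeping its original key.
$slice = array_slice($items, $nth, 1, true);

// Unwrap the single element; null if $nth was past the end.
$nthItem = $slice ? reset($slice) : null;

var_dump(array_keys($slice)); // the key 'b' is preserved
var_dump($nthItem);           // string(4) "bash"
```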
If you are having problems with the collection keeping the indices after sorting... you can make a new collection out of the values of that collection and try accessing the newly indexed collection like you would expect:
e.g. Get the second highest priced item in a collection
$items = collect(
[
"1" => ["name" => "baseball", "price" => 5],
"2" => ["name"=> "bat", "price" => 15],
"3" => ["name" => "glove", "price" => 10]
]
);
collect($items->sortByDesc("price")->values())[1]["name"];
// Result: glove
Similar to morph's answer but not the same. Simply using values() after a sort will not give you the expected results because the indices remain coupled to each item.
Credit to @howtomakeaturn for this solution on the Laravel Github:
https://github.com/laravel/framework/issues/1335
I'm creating some analytics script using PHP and MongoDB and I am a bit stuck. I would like to get the unique number of visitors per day within a certain time frame.
{
"_id": ObjectId("523768039b7e7a1505000000"),
"ipAddress": "127.0.0.1",
"pageId": ObjectId("522f80f59b7e7a0f2b000000"),
"uniqueVisitorId": "0445905a-4015-4b70-a8ef-b339ab7836f1",
"recordedTime": ISODate("2013-09-16T20:20:19.0Z")
}
The fields to filter on are uniqueVisitorId and recordedTime.
I've created a database object in PHP that opens a database connection when it is constructed; the MongoDB PHP functions are then simply mapped to public methods that use that connection.
Anyhow, so far I get the number of visitors per day with:
public function GetUniqueVisitorsDiagram() {
// MAP
$map = new MongoCode('function() {
var day = new Date(Date.UTC(this.recordedTime.getFullYear(), this.recordedTime.getMonth(), this.recordedTime.getDate()));
emit({day: day, uniqueVisitorId:this.uniqueVisitorId},{count:1});
}');
// REDUCE
$reduce = new MongoCode("function(key, values) {
var count = 0;
values.forEach(function(v) {
count += v['count'];
});
return {count: count};
}");
// STATS
$stats = $this->database->Command(array(
'mapreduce' => 'statistics',
'map' => $map,
'reduce' => $reduce,
"query" => array(
"recordedTime" =>
array(
'$gte' => $this->startDate,
'$lte' => $this->endDate
)
),
"out" => array(
"inline" => 1
)
));
return $stats;
}
How would I filter this data correctly to get unique visitors? Or would it be better to use aggregation? If so, could you be so kind as to help me out with a code snippet?
The $group operator in the aggregation framework was designed for exactly this use case and will likely be ~10 to 100 times faster. Read up on the group operator here: http://docs.mongodb.org/manual/reference/aggregation/group/
And the php driver implementation here: http://php.net/manual/en/mongocollection.aggregate.php
You can combine the $group operator with other operators to further limit your aggregations. It's probably best you do some reading up on the framework yourself to better understand what's happening, so I'm not going to post a complete example for you.
$m=new MongoClient();
$db=$m->super_test;
$db->gjgjgjg->insert(array(
"ipAddress" => "127.0.0.1",
"pageId" => new MongoId("522f80f59b7e7a0f2b000000"),
"uniqueVisitorId" => "0445905a-4015-4b70-a8ef-b339ab7836f1",
"recordedTime" => new MongoDate(strtotime("2013-09-16T20:20:19.0Z"))
));
var_dump($db->gjgjgjg->find(array('recordedTime'=>array('$lte'=>new MongoDate(),'$gte'=>new MongoDate(strtotime('-1 week')))))->count()); // Prints 1
$res=$db->gjgjgjg->aggregate(array(
array('$match'=>array('recordedTime'=>array('$lte'=>new MongoDate(),'$gte'=>new MongoDate(strtotime('-1 week'))),'uniqueVisitorId'=>array('$ne'=>null))),
array('$project'=>array('day'=>array('$dayOfMonth'=>'$recordedTime'),'month'=>array('$month'=>'$recordedTime'),'year'=>array('$year'=>'$recordedTime'))),
array('$group'=>array('_id'=>array('day'=>'$day','month'=>'$month','year'=>'$year'), 'c'=>array('$sum'=>1)))
));
var_dump($res['result']);
To answer the question entirely:
$m=new MongoClient();
$db=$m->super_test;
$db->gjgjgjg->insert(array(
"ipAddress" => "127.0.0.1",
"pageId" => new MongoId("522f80f59b7e7a0f2b000000"),
"uniqueVisitorId" => "0445905a-4015-4b70-a8ef-b339ab7836f1",
"recordedTime" => new MongoDate(strtotime("2013-09-16T20:20:19.0Z"))
));
var_dump($db->gjgjgjg->find(array('recordedTime'=>array('$lte'=>new MongoDate(),'$gte'=>new MongoDate(strtotime('-1 week')))))->count()); // Prints 1
$res=$db->gjgjgjg->aggregate(array(
array('$match'=>array('recordedTime'=>array('$lte'=>new MongoDate(),'$gte'=>new MongoDate(strtotime('-1 week'))),'uniqueVisitorId'=>array('$ne'=>null))),
array('$project'=>array('uniqueVisitorId'=>1,'day'=>array('$dayOfMonth'=>'$recordedTime'),'month'=>array('$month'=>'$recordedTime'),'year'=>array('$year'=>'$recordedTime'))), // keep uniqueVisitorId so the next $group can use it
array('$group'=>array('_id'=>array('day'=>'$day','month'=>'$month','year'=>'$year','v'=>'$uniqueVisitorId'), 'c'=>array('$sum'=>1))),
array('$group'=>array('_id'=>array('day'=>'$_id.day','month'=>'$_id.month','year'=>'$_id.year'),'c'=>array('$sum'=>1)))
));
var_dump($res['result']);
Something close to that is what you're looking for, I believe.
It will return a set of documents that have the date as the _id, plus the count of unique visitors for that day. The count is irrespective of the value of the id; each id is counted once per day, however many visits it made.
Since you want it per day, you can actually exchange the date parts for just one field of $dayOfYear, I reckon, as long as the time frame stays within a single year.
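If the two $group stages are hard to picture, here's the same two-step logic sketched on a plain PHP array instead of a collection. The sample visits are made up, and the day is derived in UTC; this only illustrates what the pipeline computes, it is not a replacement for running it in MongoDB.

```php
<?php
// Made-up sample "documents" (times in UTC).
$visits = [
    ['uniqueVisitorId' => 'a', 'recordedTime' => '2013-09-16T20:20:19Z'],
    ['uniqueVisitorId' => 'a', 'recordedTime' => '2013-09-16T21:00:00Z'],
    ['uniqueVisitorId' => 'b', 'recordedTime' => '2013-09-16T08:15:00Z'],
    ['uniqueVisitorId' => 'a', 'recordedTime' => '2013-09-17T09:30:00Z'],
];

// Step 1: one bucket per (day, visitor) pair, like the first $group.
$pairs = [];
foreach ($visits as $visit) {
    $day = gmdate('Y-m-d', strtotime($visit['recordedTime']));
    $pairs[$day][$visit['uniqueVisitorId']] = true; // repeat visits collapse here
}

// Step 2: count the buckets per day, like the second $group.
$uniquePerDay = array_map('count', $pairs);

print_r($uniquePerDay); // 2013-09-16 => 2, 2013-09-17 => 1
```

Visitor "a" shows up twice on the 16th but is only counted once, which is exactly what grouping on the visitor id first buys you.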
Usually when I search for one related ID I do it like this:
$thisSearch = $collection->find(array(
'relatedMongoID' => new MongoId($mongoIDfromSomewhereElse)
));
How would I do it if I wanted to do something like this:
$mongoIdArray = array($mongoIDfromSomewhereElseOne, $mongoIDfromSomewhereElseTwo, $mongoIDfromSomewhereElseThree);
$thisSearch = $collection->find(array(
'relatedMongoID' => array( '$in' => new MongoId(mongoIdArray)
)));
I've tried it with and without the new MongoId(). I've even tried this, with no luck:
foreach($mongoIdArray as $seprateIds){
$newMongoString .= new MongoId($seprateIds).', ';
}
$mongoIdArray = explode(',', $newMongoString).'0';
How do I search '$in' "_id" when new MongoId() needs to be run on each _id?
Hmm, you're trying to do it the SQL way:
foreach($mongoIdArray as $seprateIds){
$newMongoString .= new MongoId($seprateIds).', ';
}
$mongoIdArray = explode(',', $newMongoString).'0';
Instead try:
$_ids = array();
foreach($mongoIdArray as $seprateIds){
$_ids[] = $seprateIds instanceof MongoId ? $seprateIds : new MongoId($seprateIds);
}
$thisSearch = $collection->find(array(
'relatedMongoID' => array( '$in' => $_ids)
));
That should produce a list of ObjectIds that can be used to search that field - relatedMongoID.
This is what I am doing
Basically, as shown in the documentation ( https://docs.mongodb.org/v3.0/reference/operator/query/in/ ) the $in operator for MongoDB in fact takes an array so you need to replicate this structure in PHP since the PHP driver is a 1-1 with the documentation on most fronts (except in some areas where you need to use an additional object, for example: MongoRegex)
Now, all _ids in MongoDB are in fact ObjectIds (unless you changed your structure) so what you need to do to complete this query is make an array of ObjectIds. The ObjectId in PHP is MongoId ( http://php.net/manual/en/class.mongoid.php )
So you need to make an array of MongoIds.
First, I walk through the array (could be done with array_walk) changing the values of each array element to a MongoId with the old value encapsulated in that object:
foreach($mongoIdArray as $seprateIds){
$_ids[] = $seprateIds instanceof MongoId ? $seprateIds : new MongoId($seprateIds);
}
I use a ternary operator here to see if the value is already a MongoId encapsulated value, and if not encapsulate it.
Then I add this new array to the query object to form the $in query array as shown in the main MongoDB documentation:
$thisSearch = $collection->find(array(
'relatedMongoID' => array( '$in' => $_ids)
));
So now when the query is sent to the server it forms a structure similar to:
{relatedMongoID: {$in: [ObjectId(''), ObjectId('')]}}
Which will return results.
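One extra precaution that may be worth taking if the id strings come from user input: an ObjectId in string form is 24 hexadecimal characters, and trying to wrap anything else in an id object may throw. Here's a minimal sketch that filters the list first; isObjectIdString() is a made-up helper name and the sample values are only for illustration.

```php
<?php
// Hypothetical helper: true only for strings shaped like an ObjectId (24 hex chars).
function isObjectIdString($value): bool
{
    return is_string($value) && preg_match('/\A[0-9a-f]{24}\z/i', $value) === 1;
}

$candidates = [
    '5ae0cc7bf3dd2b8bad1f71e2',
    'not-an-id',
    '5ae0cc7cf3dd2b8bae5aaf33',
    '',
];

// Keep only well-formed strings and re-index, ready to be wrapped one by one.
$valid = array_values(array_filter($candidates, 'isObjectIdString'));

print_r($valid); // the two 24-character hex strings
```

The surviving strings can then go through the same loop as above to become id objects for the '$in' array.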
Well... I came across the same issue and the solution might not be relevant anymore since the API might have changed. I solved this one with:
$ids = [
new \MongoDB\BSON\ObjectId('5ae0cc7bf3dd2b8bad1f71e2'),
new \MongoDB\BSON\ObjectId('5ae0cc7cf3dd2b8bae5aaf33'),
];
$collection->find([
'_id' => ['$in' => $ids],
]);
I have a problem that I need some help with, but I feel I'm close. It involves Lithium and MongoDB. The code looks like this:
http://pastium.org/view/0403d3e4f560e3f790b32053c71d0f2b
$db = PopularTags::connection();
$map = new \MongoCode("function() {
if (!this.saved_terms) {
return;
}
for (index in this.saved_terms) {
emit(this.saved_terms[index], 1);
}
}");
$reduce = new \MongoCode("function(previous, current) {
var count = 0;
for (index in current) {
count += current[index];
}
return count;
}");
$metrics = $db->connection->command(array(
'mapreduce' => 'users',
'map' => $map,
'reduce' => $reduce,
'out' => 'terms'
));
$cursor = $db->connection->selectCollection($metrics['result'])->find()->limit(1);
print_r($cursor);
/**
User Data In Mongo
{
"_id" : ObjectId("4e789f954c734cc95b000012"),
"email" : "example@bob.com",
"saved_terms" : [
null,
[
"technology",
" apple",
" iphone"
],
[
"apple",
" water",
" beryy"
]
] }
**/
I am having users save the terms they search on, and then I am trying to get the most popular terms,
but I keep getting errors like: Uncaught exception 'Exception' with message 'MongoDB::__construct( invalid name '. Does anyone have any idea how to do this, or some direction?
First off, I would not store this in the user object. MongoDB objects have an upper size limit of 4/16MB (depending on version). This limit is normally not a problem, but when logging inline in one object you might be able to reach it. A more real problem, however, is that every time you need to act on these objects you have to load them into RAM, which becomes expensive. I don't think you want that on your user objects.
Secondly, arrays in objects are not sortable and have other limitations that might come back to bite you later.
But if you want to keep it like this (a low volume of searches should not really be a problem), the easiest way to solve it is a group query.
A group query is pretty much like a GROUP BY query in SQL, so there's a slight trick: you need to group on something most objects share (an active field on users, maybe).
So, here's a working group example that will sum the words used, based on your structure.
Just put this method in your model and do MyModel::searchTermUsage() to get a Document object back.
public static function searchTermUsage() {
$reduce = 'function(obj, prev) {
obj.saved_terms.forEach(function(terms) {
terms.forEach(function(term) {
if (!(term in prev)) prev[term] = 0;
prev[term]++;
});
});
}';
return static::all(array(
'initial' => new \stdclass,
'reduce' => $reduce,
'group' => 'common-value-key' // Change this
));
}
There is no protection against non-array types in the terms field (you had a null value in your example). I left it out for simplicity; it's probably better to strip such values before they end up in the database.
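To make the counting logic and the missing guard concrete, here's a rough plain-PHP sketch of the same tally, run on an in-memory array shaped like the user data in the question. countTerms() is a made-up name, the sample users are invented, and trimming the terms is my own assumption because the example data has leading spaces.

```php
<?php
// Made-up sample mirroring the question's saved_terms structure.
$users = [
    ['saved_terms' => [null, ['technology', ' apple', ' iphone'], ['apple', ' water', ' beryy']]],
    ['saved_terms' => [['apple', 'iphone']]],
    ['email' => 'someone@example.com'], // no saved_terms at all
];

// Hypothetical helper: tally every term, skipping anything that is not an array.
function countTerms(array $users): array
{
    $counts = [];
    foreach ($users as $user) {
        $saved = $user['saved_terms'] ?? [];
        if (!is_array($saved)) {
            continue; // guard against a non-array field
        }
        foreach ($saved as $terms) {
            if (!is_array($terms)) {
                continue; // guard against null entries like the one in the example
            }
            foreach ($terms as $term) {
                $term = trim((string) $term);
                if ($term !== '') {
                    $counts[$term] = ($counts[$term] ?? 0) + 1;
                }
            }
        }
    }
    arsort($counts); // most popular first
    return $counts;
}

$counts = countTerms($users);
print_r($counts); // apple => 3 and iphone => 2 come first
```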