Will a mongo cursor be affected by updates on the same collection? - php

I have some code like the following:
$query = array("vid" => "just_a_video_key_and_can_be_any_string");
$set = array('$set' => array("attr" => "attr_value")); // single quotes so PHP doesn't interpolate a $set variable
$cursor = $collection->find();
$cursor = $cursor->batchSize(500);
foreach ($cursor as $item) {
    $collection->update($query, $set);
}
I find that the loop runs only 500 times, while $collection has 20K+ documents.
The update operation only updates one document; it does not involve any delete or insert.
My question is: why does the foreach loop run only 500 times (the batch size) when there are more than 20K documents in the database?

If I'm understanding you correctly, you're confused about why MongoDB is only updating one document.
MongoDB only updates one document at a time. To update all documents that match your query, pass the multiple option via 'multiple' => true. See https://stackoverflow.com/a/15691466/2416049
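A minimal sketch of what that looks like with the legacy PHP driver (assuming $collection is a MongoCollection, with the same values as in the question):
$collection->update(
    array("vid" => "just_a_video_key_and_can_be_any_string"),
    array('$set' => array("attr" => "attr_value")),
    array("multiple" => true) // update every matching document, not just the first
);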

Your foreach only iterates 500 times because your batch size is 500. You need to call next() or getNext() to retrieve the next 500 elements.
http://php.net/manual/en/mongocursor.batchsize.php
http://php.net/manual/en/mongocursor.next.php
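For illustration, explicit iteration with the legacy MongoCursor API might look like this (a sketch; variable names follow the question):
$cursor = $collection->find();
$cursor = $cursor->batchSize(500);
while ($cursor->hasNext()) {
    $item = $cursor->getNext(); // returns the next document, fetching a new batch from the server when the current one is exhausted
    // ... process $item ...
}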

You are updating inside the foreach loop with a query that does not reference $item at all. So your query will always match the same document (vid: "just_a_video_key_and_can_be_any_string") and never the items you are iterating over. You need to build the query inside the foreach from the current item, e.g. "vid" => $item['vid'].
Regarding the "only 500" thing: LinJuuichi is right - you need to advance the cursor via next() to get the next batch of 500.
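A sketch of that fix, assuming each document carries the vid field used in the question:
foreach ($cursor as $item) {
    $collection->update(
        array("vid" => $item["vid"]),                  // target the document we are currently looking at
        array('$set' => array("attr" => "attr_value"))
    );
}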

Related

Laravel / Eloquent merging chunked queries into one large query?

I am trying to chunk a large number of database calls into many small ones. Should be simple, right?
$chunks = array_chunk($users, 50);
$data = [];
foreach ($chunks as $chunk) {
    $data[] = $this->doQuery($chunk); // Eloquent query, no joins, just a simple select with whereIn
}
This works fine if I check what is added to $data individually. However, if I try to run the code on the full set, MySQL shows the full query as if I had just called it with all the user ids, not each individual chunk.
This is unlike anything I've seen before, so any suggestions would be appreciated.

Eloquent chunk() missing half the results

I have a problem with Laravel's ORM Eloquent chunk() method.
It misses some results.
Here is a test query:
$destinataires = Destinataire::where('statut', '<', 3)
    ->where('tokenized_at', '<', $date_active)
    ->chunk($this->chunk, function ($destinataires) {
        foreach ($destinataires as $destinataire) {
            $this->i++;
        }
    });
echo $this->i;
It gives 124838 results.
But:
$num_dest = Destinataire::where('statut', '<', 3)
    ->where('tokenized_at', '<', $date_active)
    ->count();
echo $num_dest;
gives 249676, exactly TWICE the result of the first code example.
My script is supposed to edit all matching records in the database. If I launch it multiple times, it only handles half of the remaining records each time.
I tried with DB::table() instead of the Model.
I tried to add a ->take(20000) but it doesn't seem to be taken into account.
I echoed the query with ->toSql() and everything seems to be fine (the LIMIT clause is added when I add the ->take() parameter).
Any suggestions?
Imagine you are using the chunk method to delete all of the records. The table has 2,000,000 records and you are going to delete all of them in chunks of 1000.
$query->orderBy('id')->chunk(1000, function ($items) {
    foreach ($items as $item) {
        $item->delete();
    }
});
It deletes the first 1000 records after fetching them with a query like this:
SELECT * FROM table ORDER BY id LIMIT 0,1000
Then the next query issued by the chunk method is:
SELECT * FROM table ORDER BY id LIMIT 1000,1000
The problem is right here: we have just deleted 1000 records, and now we ask for results 1000 to 2000. The first 1000 remaining records are skipped, so this step misses 1000 records that should have been deleted. The same thing happens on every subsequent step: each step skips another 1000 records, and this is why chunk does not give the expected result in these situations.
I used deletion for the example because it makes the exact behavior of the chunk method easy to see.
UPDATE:
You can use chunkById() for deleting safely.
Read more here:
http://laravel.at.jeffsbox.eu/laravel-5-eloquent-builder-chunk-chunkbyid
https://laravel.com/api/5.4/Illuminate/Database/Eloquent/Builder.html#method_chunkById
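A minimal sketch of the same deletion using chunkById() (assuming the model's primary key is id):
$query->chunkById(1000, function ($items) {
    // chunkById pages on "id > last seen id" instead of an offset,
    // so rows deleted in earlier pages cannot shift later pages
    foreach ($items as $item) {
        $item->delete();
    }
});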
Quick answer: Use chunkById() instead of chunk().
When updating or deleting records while iterating over them, any change to a column that the chunk query filters or orders on (including the primary key or foreign keys) can alter which rows appear in later chunks. This can result in records never being included in the results.
The explanation can be found in the Laravel documentation:
If you are updating database records while chunking results, your chunk results could change in unexpected ways. If you plan to update the retrieved records while chunking, it is always best to use the chunkById method instead. This method will automatically paginate the results based on the record's primary key.
Example usage of chunkById():
DB::table('users')->where('active', false)
    ->chunkById(100, function ($users) {
        foreach ($users as $user) {
            DB::table('users')
                ->where('id', $user->id)
                ->update(['active' => true]);
        }
    });
(end of the update)
Below is the original answer which used the cursor() method instead of the chunk() method to solve the problem:
I had the same problem - only half of the total results were passed to the callback function of the chunk() method.
Here is the code which had the same problem - half of the transactions were not processed:
Transaction::whereNull('processed')->chunk(100, function ($transactions) {
    $transactions->each(function ($transaction) {
        $transaction->process();
    });
});
I used Laravel 5.4 and managed to solve the problem replacing the chunk() method with cursor() method and changing the code accordingly:
foreach (Transaction::whereNull('processed')->cursor() as $transaction) {
    $transaction->process();
}
Even though the answer doesn't address the problem itself, it provides a valuable solution.
For anyone looking for a bit of code that solves this, here you go:
while (Model::where('x', '>', 'y')->count() > 0) {
    Model::where('x', '>', 'y')->chunk(10, function ($models) {
        foreach ($models as $model) {
            $model->delete();
        }
    });
}
The problem is in the deletion / removal of models while chunking through the total. Wrapping the chunk call in a while loop makes sure you get them all! This example works when deleting models; change the while condition to suit your needs.
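As a hedged variation on the same idea, the pattern for an update that removes rows from the matching set might look like this (the processed field here is made up for illustration):
while (Model::where('processed', false)->count() > 0) {
    Model::where('processed', false)->chunk(100, function ($models) {
        foreach ($models as $model) {
            // the update removes the row from the matching set, so the outer
            // while loop keeps going until nothing matches any more
            $model->update(['processed' => true]);
        }
    });
}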
When you fetch data using chunk, the same SQL query is executed each time; only the offset changes, increasing by the chunk size given to the method. For example:
SELECT * FROM users WHERE status = 0;
Let's say there are 200 matching records (imagine that is a lot, so we want to retrieve the data in chunks of 50). The chunked queries look like:
SELECT * FROM users WHERE status = 0 LIMIT 50 OFFSET 0;
(the offset is dynamic: 50 on the next run, then 100, and so on).
The problem with using Laravel's chunk while updating is that only the offset moves forward, while the number of matching rows shrinks with every chunk we process. The first time, 200 records match the where condition. But if we set the status of that first chunk to 1 (status = 1), the next fetch still executes the same query:
SELECT * FROM users WHERE status = 0 LIMIT 50 OFFSET 50;
Only 150 records still match, because we updated 50 rows to status = 1. The offset on this second run is 50, so we skip 50 of those 150 remaining rows and update the next 50 (rows 50 to 100 of the 150).
The third time we run:
SELECT * FROM users WHERE status = 0 LIMIT 50 OFFSET 100;
By now only 100 users still have status = 0, so skipping 100 rows returns nothing and the chunking stops.
This is not what you would expect at first thought, but this is how it works, and it is why only half of the data gets updated while the other half is skipped.
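For anyone who wants to keep plain batching but avoid the shifting offset, one hedged workaround (a sketch, not from the answers above) is to collect the matching ids first and then update by id, so the status change cannot affect later pages:
// grab the ids of every row that currently matches
$ids = DB::table('users')->where('status', 0)->pluck('id');

// update in batches keyed by id; the "status = 0" filter is no longer part
// of the paging, so nothing gets skipped
foreach ($ids->chunk(50) as $chunk) {
    DB::table('users')->whereIn('id', $chunk->all())->update(['status' => 1]);
}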

cursor performance with mongodb php driver

Are there any performance issues with php mongo query cursor handling?
My code:
$cursor = $collection->find($searchCriteria)->limit($limit_rows);
// Sort ascending based on S_DTTM
$cursor->sort(array('S_DTTM' => 1, 'SYMBOL' => 1));
// How many results found?
$num_docs = $cursor->count();
if ($num_docs > 0) {
    // loop over the results
    foreach ($cursor as $ticks) {
        // ... process $ticks ...
    }
}
I have also seen code like this for requesting data:
// request data
$result = $cursor->getNext();
My issue is that after the first query returns (with the full limit of 100 rows), the next query just keeps looping. There are millions of rows to return, so I wanted to cap the result with "limit".
I did re-index just in case; still no difference.
What am I doing wrong? Does getNext() work better?
Using mongod ver 2.5.4 and the latest php mongo driver, downloaded a week ago.
Collection size is 100Gb, including 2 additional indexes.
The mongo log shows all the queries executing in less than 200ms.
This turned out to be a query issue and not a php mongo driver issue.
Use of count() and sort() may decrease performance.
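For illustration, a hedged sketch of the same loop with the separate count() round trip removed and the sort kept on indexed fields (field names are the ones from the question):
$cursor = $collection
    ->find($searchCriteria)
    ->sort(array('S_DTTM' => 1, 'SYMBOL' => 1)) // keep the sort covered by an index
    ->limit($limit_rows);

foreach ($cursor as $ticks) {
    // ... process $ticks; the foreach simply ends when no documents remain ...
}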

mysqli_fetch_assoc($result) moves the pointer to the next record. Is there any way to reset the pointer to the start of the query result?

Have a look at this code
Suppose you are looping through a set of mysql query results in php
while ($temp = mysqli_fetch_assoc($result))
{
    echo $temp['id']; // ID Column
}
When you call $temp = mysqli_fetch_assoc($result), the pointer moves to the next record. Is there any way to reset the pointer to the start of the result set? After the end of this loop, mysqli_fetch_assoc($result) only returns null, making the result unusable again. So what's the possible solution?
So I was stuck with this problem at work today, and the only solutions I initially found were to re-query or to keep a temporary copy of the mysql result in a variable. Neither of which was appealing.
There is a much simpler solution to this: mysqli_data_seek.
The basic syntax is mysqli_data_seek($result, $row_number).
So in this case you just do:
mysqli_data_seek($result, 0);
$row = mysqli_fetch_assoc($result); // will now return the first row again
In the same way you could loop through the whole result set again. The older mysql extension has the equivalent mysql_data_seek(). Hope it was helpful.
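A short sketch of the two-pass pattern (assuming $link is an open mysqli connection and the default buffered result from mysqli_query, which is what mysqli_data_seek needs):
$result = mysqli_query($link, "SELECT id FROM some_table");

while ($row = mysqli_fetch_assoc($result)) {
    echo $row['id']; // first pass
}

mysqli_data_seek($result, 0); // rewind the internal pointer to the first row

while ($row = mysqli_fetch_assoc($result)) {
    echo $row['id']; // second pass works again
}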
Quit printing data directly from inside the database loop. Learn to separate your business logic from presentation.
Get yourself a database wrapper to get rid of all those ugly, repetitive mysqli_fetch_assoc calls in your code.
Store query result in array.
Use this array as many times as you wish.
Like this
$data = $db->getAll("SELECT * FROM table");
foreach ($data as $row) // 1st iteration
foreach ($data as $row) // 2nd iteration
foreach ($data as $row) // and so on
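If you don't have a wrapper at hand, plain mysqli can produce the same array in one call (a sketch, assuming the mysqlnd driver, which provides mysqli_fetch_all):
$data = mysqli_fetch_all($result, MYSQLI_ASSOC); // whole result set as an array of assoc rows

foreach ($data as $row) { /* 1st iteration */ }
foreach ($data as $row) { /* 2nd iteration */ }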

MongoDB PHP findAndModify Multiple Performance

I have documents in a collection called Reports that need to be processed. I run a query like
$collectionReports->find(array('processed' => 0))
(anywhere between 50 and 2000 items). I process them as needed and insert the results into another collection, but I need to update the original Report to set processed to the current system time. Right now it looks something like:
$reports = $collectionReports->find(array('processed' => 0));
$toUpdate = array();
foreach ($reports as $report) {
    // Perform the operations on them now
    $toUpdate[] = $report['_id'];
}
foreach ($toUpdate as $reportID) {
    $criteria = array('_id' => new MongoId($reportID));
    $data = array('$set' => array('processed' => round(microtime(true) * 1000)));
    $collectionReports->findAndModify($criteria, $data);
}
My problem with this is that it is horribly inefficient. Processing the reports and inserting them into the collection takes maybe 700ms for 2000 reports, but just updating the processed times takes at least 1500ms for those same 2000 reports. Any tips to speed this up? Thanks in advance.
EDIT: The processed time doesn't have to be exact; it can just be the time the script is run (+/- 10 seconds or so). If it were possible to take the object ($report) and update the time on it directly, that would be better than searching again after the first foreach.
Thanks Sammaye, changing from findAndModify() to update() seems to work much better and faster.
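For reference, a hedged sketch of what that change can look like: one multi-document update() over the ids collected in the first loop, instead of a findAndModify() per report (assuming $toUpdate holds the MongoId values gathered above):
$collectionReports->update(
    array('_id' => array('$in' => $toUpdate)),                            // all collected report ids
    array('$set' => array('processed' => round(microtime(true) * 1000))),
    array('multiple' => true)                                             // apply to every matching document
);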
