Eloquent chunk() missing half the results - php

I have a problem with Laravel's Eloquent ORM chunk() method: it misses some results.
Here is a test query:
$destinataires = Destinataire::where('statut', '<', 3)
    ->where('tokenized_at', '<', $date_active)
    ->chunk($this->chunk, function ($destinataires) {
        foreach ($destinataires as $destinataire) {
            $this->i++;
        }
    });

echo $this->i;
It gives 124838 results.
But:
$num_dest = Destinataire::where('statut', '<', 3)
    ->where('tokenized_at', '<', $date_active)
    ->count();

echo $num_dest;
gives 249676, exactly TWICE the first result.
My script is supposed to edit all matching records in the database. If I run it multiple times, it processes only half of the remaining records each time.
I tried with DB::table() instead of the Model.
I tried adding ->take(20000), but it doesn't seem to be taken into account.
I echoed the query with ->toSql() and everything seems to be fine (the LIMIT clause is added when I add the ->take() parameter).
Any suggestions?

Imagine you are using the chunk method to delete all of the records. The table has 2,000,000 records and you are going to delete all of them in chunks of 1,000.
$query->orderBy('id')->chunk(1000, function ($items) {
    foreach ($items as $item) {
        $item->delete();
    }
});
It will delete the first 1,000 records by fetching them with a query like this:
SELECT * FROM table ORDER BY id LIMIT 0,1000
Then the next query issued by the chunk method is:
SELECT * FROM table ORDER BY id LIMIT 1000,1000
Here is the problem: we delete the first 1,000 records, and then ask for rows 1,000 through 2,000. But the remaining rows have shifted down to fill the gap, so this second query skips the first 1,000 rows of what is left, and we never process them. The same thing happens on every subsequent step: each chunk skips another batch of records, which is why we do not get the expected result in these situations.
I used a deletion example because it makes the exact behavior of the chunk method easy to see.
UPDATE:
You can use chunkById() to delete safely; a short sketch follows the links below.
Read more here:
http://laravel.at.jeffsbox.eu/laravel-5-eloquent-builder-chunk-chunkbyid
https://laravel.com/api/5.4/Illuminate/Database/Eloquent/Builder.html#method_chunkById
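For reference, here is a minimal sketch of the same deletion rewritten with chunkById(); because each page is constrained on the primary key (WHERE id > last seen id) rather than an OFFSET, rows deleted in earlier pages cannot shift the later ones. The $query builder is the one from the example above:

// Sketch only: same deletion as above, paginated on the primary key.
$query->chunkById(1000, function ($items) {
    foreach ($items as $item) {
        $item->delete();
    }
});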

Quick answer: Use chunkById() instead of chunk().
When updating or deleting records while iterating over them, changes to the columns used in the query's WHERE clause can change which rows fall into each chunk, so some records may never appear in the results.
The explanation can be found in the Laravel documentation:
If you are updating database records while chunking results, your chunk results could change in unexpected ways. If you plan to update the retrieved records while chunking, it is always best to use the chunkById method instead. This method will automatically paginate the results based on the record's primary key.
Example usage of chunkById():
DB::table('users')->where('active', false)
    ->chunkById(100, function ($users) {
        foreach ($users as $user) {
            DB::table('users')
                ->where('id', $user->id)
                ->update(['active' => true]);
        }
    });
(end of the update)
Below is the original answer which used the cursor() method instead of the chunk() method to solve the problem:
I had the same problem - only half of the total results were passed to the callback function of the chunk() method.
Here is the code which had the same problem - half of the transactions were not processed:
Transaction::whereNull('processed')->chunk(100, function ($transactions) {
    $transactions->each(function ($transaction) {
        $transaction->process();
    });
});
I was using Laravel 5.4 and solved the problem by replacing the chunk() method with the cursor() method and adjusting the code accordingly:
foreach (Transaction::whereNull('processed')->cursor() as $transaction) {
    $transaction->process();
}
Even though this answer doesn't explain the underlying problem, it provides a valuable workaround.

For anyone looking for a bit of code that solves this, here you go:
while (Model::where('x', '>', 'y')->count() > 0) {
    Model::where('x', '>', 'y')->chunk(10, function ($models) {
        foreach ($models as $model) {
            $model->delete();
        }
    });
}
The problem is the deletion/removal of models while chunking through the total. Wrapping the chunking in a while loop makes sure you get them all. This example works when deleting models; change the while condition to suit your needs.

When you fetch data using chunk, the same SQL query is executed each time; only the offset changes, increasing by the size passed to the chunk method. For example:
SELECT * FROM users WHERE status = 0;
Let's say there are 200 records (suppose that is a lot, so we want to retrieve the data in chunks). The first chunk query then looks like:
SELECT * FROM users WHERE status = 0 LIMIT 50 OFFSET 0
(the offset is dynamic: it becomes 50 on the next run, then 100, and so on).
The problem with using Laravel's chunk while updating is that only the offset changes, while the number of matching rows shrinks each time we retrieve a chunk. The first time, 200 records match the WHERE condition. But if we update their status to 1 (status = 1), then the next time we fetch data we still execute the same query:
SELECT * FROM users WHERE status = 0 LIMIT 50 OFFSET 50
Only 150 records still match this query, since we already set status = 1 for 50 rows. With the offset at 50, we skip the first 50 of those 150 rows and update the next 50. So rows 50 through 100 of the remaining 150 get their status set to 1.
The third time we run this query:
SELECT * FROM users WHERE status = 0 LIMIT 50 OFFSET 100
Only 100 users still have status = 0, so with an offset of 100 the query returns nothing and the chunking stops.
This is not what you would expect at first thought, but this is how chunk works, and it is why only half of the data gets updated while the rest is skipped.
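To tie this back to the fix: a minimal sketch of the same status update done with chunkById(), which pages on the primary key (WHERE id > last seen id) instead of an OFFSET, so rows that no longer match the WHERE clause cannot shift the following pages. The table and column names are the ones from the example above:

// Sketch only: keyed pagination is unaffected by the shrinking result set.
DB::table('users')
    ->where('status', 0)
    ->chunkById(50, function ($users) {
        foreach ($users as $user) {
            DB::table('users')
                ->where('id', $user->id)
                ->update(['status' => 1]);
        }
    });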

Related

SQL Laravel 5.7 unique() on collection really slow

I'm trying to count the number of unique records based on a mobile column using Laravel's collect() and unique() methods. The table has 200,000 rows and a column called optout_csv_schedule_id that is indexed, as is mobile. Right now the code has been running for over 15 minutes. How can I improve its performance, given that I need to count the unique mobile numbers out of the 200,000 rows? My current code is:
/**
 * Get valid lead count
 */
protected function getValidLeadCount($schedule_id)
{
    $optoutConnectionLogs = OptoutConnectionLog::where('optout_csv_schedule_id', $schedule_id)
        ->get();

    // no leads
    if (!$optoutConnectionLogs) {
        return 0;
    }

    // count total unique leads
    $uniqueLeads = collect($optoutConnectionLogs)->unique('mobile')->count();

    return $uniqueLeads;
}
It seems to be difficult to count the unique numbers out of the 200,000 rows in Laravel.
Try changing it as follows:
protected function getValidLeadCount($schedule_id)
{
    $uniqueLeads = OptoutConnectionLog::where('optout_csv_schedule_id', $schedule_id)
        ->distinct('mobile')
        ->count('mobile');

    return $uniqueLeads;
}
You are not using the database to do the deduplication: you already fetched the records with ->get() and are doing the unique() in PHP/Laravel, which is much slower than letting the database do it.
Use distinct() to get unique records, e.g.:
$optoutConnectionLogs = OptoutConnectionLog::where('optout_csv_schedule_id', $schedule_id)
    ->select('mobile')
    ->distinct()
    ->get();
You read all the data into memory, convert it into PHP objects, and then iterate over them to count; the database index you created is not used at all.
Your requirement can be reduced to the following code:
return OptoutConnectionLog::where('optout_csv_schedule_id', $schedule_id)
    ->distinct('mobile')
    ->count();

Using count_all_results or get_compiled_select and $this->db->get('table') lists table twice in query?

How do I use get_compiled_select or count_all_results before running the query without getting the table name added twice? When I use $this->db->get('tblName') after either of those, I get the error:
Not unique table/alias: 'tblProgram'
SELECT * FROM (`tblProgram`, `tblProgram`) JOIN `tblPlots` ON `tblPlots`.`programID`=`tblProgram`.`pkProgramID` JOIN `tblTrees` ON `tblTrees`.`treePlotID`=`tblPlots`.`id` ORDER BY `tblTrees`.`id` ASC LIMIT 2000
If I don't use a table name in count_all_results or $this->db->get(), then I get an error that no table is used. How can I get it to set the table name just once?
public function get_download_tree_data($options = array(), $rand = "")
{
    // join tables and order by tree id
    $this->db->reset_query();
    $this->db->join('tblPlots', 'tblPlots.programID=tblProgram.pkProgramID');
    $this->db->join('tblTrees', 'tblTrees.treePlotID=tblPlots.id');
    $this->db->order_by('tblTrees.id', 'ASC');

    // get number of results to return
    $allResults = $this->db->count_all_results('tblProgram', false);

    // chunk data and write to CSV to avoid reaching memory limit
    $offset = 0;
    $chunk = 2000;
    $treePath = $this->config->item('temp_path') . "$rand/trees.csv";
    $tree_handle = fopen($treePath, 'a');

    while ($offset < $allResults) {
        $this->db->limit($chunk, $offset);
        $result = $this->db->get('tblProgram')->result_array();
        foreach ($result as $row) {
            fputcsv($tree_handle, $row);
        }
        $offset = $offset + $chunk;
    }

    fclose($tree_handle);

    return array('resultCount' => $allResults);
}
To count how many rows would be returned by a query, essentially all the work must be performed. That is, it is impractical to get the count, then perform the query; you may as well just do the query.
If your goal is to "paginate" by getting some of the rows, plus the total count, that is essentially two separate actions (that may be combined to look like one.)
If the goal is to estimate the number of rows, then SHOW TABLE STATUS or SELECT Rows FROM information_schema.TABLES WHERE ... gives you an estimate.
If you want to see if there are, say "at least 100 rows", then this may be practical:
SELECT 1 FROM ... WHERE ... ORDER BY ... LIMIT 99,1
and see if you get a row back. However, this may or may not be efficient, depending on the indexes and the WHERE and the ORDER BY. (Show us the query and I can elaborate.)
Using OFFSET for chunking is grossly inefficient. If there is no usable index, essentially the entire query is re-executed for each chunk; even with a usable index, each successive chunk gets slower and slower. Here is a discussion of why OFFSET is not good for "pagination", plus an efficient workaround: Pagination. It describes how to "remember where you left off" as an efficient technique for chunking. Fetch between 100 and 1,000 rows per chunk.
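A minimal sketch of the "remember where you left off" idea in CodeIgniter's query builder, assuming an auto-increment id on tblTrees, reusing $tree_handle from the question, and omitting the joins for brevity:

$lastId = 0;
$chunk = 1000;

do {
    // Each pass filters on the last id seen instead of using OFFSET,
    // so the database can seek straight to the next chunk via the index.
    $rows = $this->db
        ->where('tblTrees.id >', $lastId)
        ->order_by('tblTrees.id', 'ASC')
        ->limit($chunk)
        ->get('tblTrees')
        ->result_array();

    foreach ($rows as $row) {
        fputcsv($tree_handle, $row);
        $lastId = $row['id'];
    }
} while (count($rows) === $chunk);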
The flaw in your code is that it tries to select a subset of the records and their total count in the same query. MySQL cannot do this in a single statement, so you cannot generate such a query, and hence you get the error mentioned. The problem is that if you do a
select ... from t where ... limit 0, 2000
then you get at most 2,000 records, so if the total number of records matching the criteria is greater than the limit you cannot derive the count from that result alone; in that case you need a
select count(1) from t where ...
This means you need to build your actual query (the code below your count_all_results call) and check whether the number of results reaches the limit. If it does not, you do not need a separate query to get the count, because you can compute it as the current offset plus the number of records in the last chunk. If, however, you get as many records as the limit allows, you will need to build another query, without the order_by call (the count is independent of your sort), and get the count from that.
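To illustrate, here is a minimal sketch of that check with CodeIgniter's query builder, reusing $chunk, $offset and the table name from the question; it assumes the joins and filters have already been applied to the builder, and only hints at where they would need to be rebuilt (get() resets the builder):

// Fetch one page of the actual query (joins/filters assumed to be set already).
$this->db->limit($chunk, $offset);
$result = $this->db->get('tblProgram')->result_array();

if (count($result) < $chunk) {
    // Partial page: the total can be derived without a separate COUNT query.
    $allResults = $offset + count($result);
} else {
    // The limit was reached, so a separate COUNT query is required.
    // Rebuild the joins/filters here (get() cleared them); order_by() can be omitted.
    $allResults = $this->db->count_all_results('tblProgram');
}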
$this->db->count_all_results()
Counting the number of returned results with count_all_results()
It's useful to count the number of results returned—often bugs can arise if a section of code which expects to have at least one row is passed zero rows. Without handling the eventuality of a zero result, an application may become unpredictably unstable and may give away hints to a malicious user about the architecture of the app. Ensuring correct handling of zero results is what we're going to focus on here.
Permits you to determine the number of rows in a particular Active Record query. Queries will accept Query Builder restrictors such as where(), or_where(), like(), or_like(), etc. Example:
echo $this->db->count_all_results('my_table'); // Produces an integer, like 25
$this->db->like('title', 'match');
$this->db->from('my_table');
echo $this->db->count_all_results(); // Produces an integer, like 17
However, this method also resets any field values that you may have passed to select(). If you need to keep them, you can pass FALSE as the second parameter:
echo $this->db->count_all_results('my_table', FALSE);
get_compiled_select()
The method $this->db->get_compiled_select() was introduced in CodeIgniter v3.0 and compiles an Active Record query without actually executing it. It is not a completely new method: in older versions of CI it existed as $this->db->_compile_select(), but that method was made protected in later versions, making it impossible to call from outside.
// Note that the second parameter of the get_compiled_select method is FALSE
$sql = $this->db->select(array('field1', 'field2'))
    ->where('field3', 5)
    ->get_compiled_select('mytable', FALSE);

// ...
// Do something crazy with the SQL code... like add it to a cron script for
// later execution or something...
// ...

$data = $this->db->get()->result_array();
// Would execute and return an array of results of the following query:
// SELECT field1, field2 FROM mytable WHERE field3 = 5;
NOTE: Calling get_compiled_select() twice while you're using the Query Builder Caching functionality and NOT resetting your queries will result in the cache being merged twice. That in turn means, for example, that if you're caching a select(), the same field will be selected twice.
Rick James got me on the right track. I ended up having to chunk the results using pagination AND a nested query; using LIMIT on even one chunk of 2,000 records was timing out. This is the code I ended up with, which uses get_compiled_select('tblProgram') and then get('tblTrees O1'). Since I didn't pass FALSE as the second argument to get_compiled_select, the query was cleared before the get() was run.
// grab the data in chunks, write it to CSV chunk by chunk
$offset = 0;
$chunk = 2000;
$i = 10; // counter for the progress bar

$this->db->limit($chunk);
$this->db->select('tblTrees.id');
// nesting the limited query and then joining the other fields later improved performance significantly
$query1 = ' (' . $this->db->get_compiled_select('tblProgram') . ') AS O2';
$this->db->join($query1, 'O1.id=O2.id');
$result = $this->db->get('tblTrees O1')->result_array();
$allResults = count($result);
$putHeaders = 0;

$treePath = $this->config->item('temp_path') . "$rand/trees.csv";
$tree_handle = fopen($treePath, 'a');

// while the limited select returns a full chunk, keep going
while (count($result) === $chunk) {
    $highestID = max(array_column($result, 'id'));

    // update progress bar with estimate
    if ($i < 90) {
        $this->set_runStatus($qcRunId, $status = "processing", $progress = $i);
        $i = $i + 1;
    }

    // only output the field names (headers) the first time
    foreach ($result as $row) {
        if ($offset === 0 && $putHeaders === 0) {
            fputcsv($tree_handle, array_keys($row));
            $putHeaders = 1;
        }
        fputcsv($tree_handle, $row);
    }

    // get the next chunk
    $offset = $offset + $chunk;
    $this->db->reset_query();
    $this->make_query($options);
    $this->db->order_by('tblTrees.id', 'ASC');
    $this->db->where('tblTrees.id >', $highestID);
    $this->db->limit($chunk);
    $this->db->select('tblTrees.id');
    $query1 = ' (' . $this->db->get_compiled_select('tblProgram') . ') AS O2';
    $this->db->join($query1, 'O1.id=O2.id');
    $result = $this->db->get('tblTrees O1')->result_array();
    $allResults = $allResults + count($result);
}

// write out the last (partial) chunk
foreach ($result as $row) {
    fputcsv($tree_handle, $row);
}

fclose($tree_handle);

return array('resultCount' => $allResults);

Doctrine MIN() low performance

I have a table (~150 columns, ~150k records) and a project on Symfony 3 with Doctrine. The project has a classic filter to show results.
When the form is submitted, I collect the data in an object $selectedInputOptions and build a query that looks like this:
$query = $repository
    ->createQueryBuilder('t')
    ->select('t.idkatcountry', 't.idkatlocality', 't, MIN(t.price) AS priceFrom' ......);

if (count($selectedInputOptions->getCountry()) > 0) {
    $query->andWhere('t.idkatcountry IN (:idkatcountry)')
        ->setParameter('idkatcountry', $selectedInputOptions->getCountry());
}

if (count($selectedInputOptions->getLocality()) > 0) {
    $query->andWhere('t.idkatlocality IN (:idkatlocality)')
        ->setParameter('idkatlocality', $selectedInputOptions->getLocality());
}
The price column has the decimal(15,2) datatype.
Before, I had $repository->select('t.price') and everything was OK, but after changing this to 't, MIN(t.price) AS priceFrom' the query execution time increased by about 40%, and in a few cases (when every form input is blank, so all records are checked) by about 900%.
So my questions:
How can I cut the execution time? (Is there an index for this? Would changing the datatype range, say to decimal(6,2), help?)
And a bonus question: the table has ~150 columns, but the filtering query uses only ~10-15 of them; can I set some type of index for quicker selects?
EDIT:
Changed the price column to integer: did not help.
Added an index on the price column: did not help.
SOLUTION!
It was a small mistake in the select parameter using MIN().
Instead of:
't, MIN(t.price) AS priceFrom'
I used:
'MIN(t.price) AS priceFrom'
The t alias pulls in ALL columns (~150 in my case) and I hadn't noticed this. Now everything is OK and the execution time is back to normal.
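For clarity, a minimal sketch of the corrected builder call; only the select() argument changes, and the elided parts of the original query stay as they were:

$query = $repository
    ->createQueryBuilder('t')
    // Select only the aggregate and the columns actually needed, instead of
    // hydrating all ~150 columns through the 't' alias.
    ->select('t.idkatcountry', 't.idkatlocality', 'MIN(t.price) AS priceFrom');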
One other thing you can do is stop loading unwanted data into the entity, for example by using unset() in the jsonSerialize() method.

Laravel Query Builder returning empty rows from 100k+ rows

My users table has over 100K rows. I need to fetch them all for export purposes. But when I use the following code, it doesn't return anything. If I apply a limit it works up to a point: it returns 10K rows, but when I set the limit to 20K it doesn't return anything. If I use mysqli_query instead, it returns everything fine on the same server and database.
$myRows = DB::table('users')->get();       // returns empty
$myRows = DB::table('users')->take(10000); // returns 10,000 rows
$myRows = DB::table('users')->take(20000); // returns empty
I am new to Laravel.
Thanks in advance.
I think chunking (see the "Chunking Results From A Table" section of the documentation) would be a good option here.
DB::table('users')->chunk(10000, function ($users) {
    // some connect code
});
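For an export, a minimal sketch along those lines, assuming the goal is to stream all users into a CSV file; the file path and column list here are illustrative only:

$handle = fopen(storage_path('app/users_export.csv'), 'w');

DB::table('users')->orderBy('id')->chunk(10000, function ($users) use ($handle) {
    foreach ($users as $user) {
        // Writing row by row keeps memory usage flat regardless of table size.
        fputcsv($handle, [$user->id, $user->name, $user->email]);
    }
});

fclose($handle);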

Propel is not returning all the rows

I'm trying to execute a query using Propel that should return 50 rows of data, but instead it returns some other, seemingly random number of rows. I printed the query, copied it into phpPgAdmin, and there it returns the 50 rows, so I'm not sure what is happening. Also, if I call count() before find() it returns 50, but after find() it returns the random number of rows.
$limit = 50;
$offset = 0;
..... // filters

$companies = $companies->orderById()->limit($limit)->offset($offset);
var_dump($companies->count()); // this returns 50

$companies = $companies->find();
var_dump($companies->count()); // this returns 13
The foreach afterwards also iterates only 13 times.
And this is the generated query:
SELECT "company"."id", "company"."otherfields",
"lists"."id", "lists"."otherfields", "place"."id",
"place"."otherfields", "contact"."id", "contact"."otherfields",
"entry"."id","entry"."otherfields"
FROM "company" LEFT JOIN "lists" ON
("company"."sector_id"="lists"."id") LEFT JOIN "place" ON
("company"."country_id"="place"."id") LEFT JOIN "contact" ON
("company"."id"="contact"."company_id" AND "contact"."active" = true)
LEFT JOIN "entry" ON ("company"."id"="entry"."company_id")
WHERE "company"."active"=true ORDER BY "company"."id" ASC LIMIT 50 OFFSET 0
The Propel version is 2.0-dev
I have always had a problem with this query, so I think the issue may come from the fact that I removed one throw in the library. I'm using with() and limit(). The generated query is OK, but the result is not.
I changed Propel/Runtime/Formatter/ObjectFormatter.php and commented this out:
/*if ($this->hasLimit) {
    throw new LogicException('Cannot use limit() in conjunction with with() on a one-to-many relationship. Please remove the with() call, or the limit() call.');
}*/
I imagine that the error might come from here.
My join query looks like this:
$companies->leftJoinWithLists()
    ->leftJoinWithPlace()
    ->leftJoinWithContact()
    ->addJoinCondition('Contact', 'Contact.active = ?', true)
    ->leftJoinWithEntry();
And then I add the limit. Is there any way to avoid this or do it another way?
You need to do it a different way. See my comment on the issue you created at https://github.com/propelorm/Propel2/issues/1231
Basically, the LIMIT clause in SQL limits the number of rows in the result set, but since you are using LEFT JOINs, one company entity may be represented by multiple rows of the result set (see the raw result in phpPgAdmin).
The solution: Use multiple queries. I suggest querying first for all the companies you want, then using the ->populateRelation() method on the resulting ObjectCollection.
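A minimal sketch of that two-query approach, assuming Propel-generated query classes and relation names matching the question (CompanyQuery, Contact and Entry are illustrative, and the extra Contact.active condition is omitted for brevity):

// First query: the LIMIT now applies to companies only, with no one-to-many joins.
$companies = CompanyQuery::create()
    ->filterByActive(true)
    ->orderById()
    ->limit(50)
    ->offset(0)
    ->find();

// Second pass: hydrate the one-to-many relations onto the collection in bulk.
// Many-to-one relations (Lists, Place) are unaffected by the limit and could
// still be joined in the first query as before.
$companies->populateRelation('Contact');
$companies->populateRelation('Entry');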
