Propel is not returning all the rows - php

I'm trying to execute a query using Propel that should return 50 rows of data, but instead it returns some other, seemingly random number of rows. If I print the generated query and run it in phpPgAdmin, it returns the 50 rows, so I'm not sure what is happening. Also, if I call count() before find() it returns 50, but after find() it returns the smaller number of rows.
$limit = 50;
$offset = 0;
..... //filters
$companies = $companies->orderById()->limit($limit)->offset($offset);
var_dump($companies->count()); // this returns 50
$companies = $companies->find();
var_dump($companies->count()); // this returns 13
Also, the foreach afterwards iterates only 13 times.
And this is the generated query:
SELECT "company"."id", "company"."otherfields",
"lists"."id", "lists"."otherfields", "place"."id",
"place"."otherfields", "contact"."id", "contact"."otherfields",
"entry"."id","entry"."otherfields"
FROM "company" LEFT JOIN "lists" ON
("company"."sector_id"="lists"."id") LEFT JOIN "place" ON
("company"."country_id"="place"."id") LEFT JOIN "contact" ON
("company"."id"="contact"."company_id" AND "contact"."active" = true)
LEFT JOIN "entry" ON ("company"."id"="entry"."company_id")
WHERE "company"."active"=true ORDER BY "company"."id" ASC LIMIT 50 OFFSET 0
The Propel version is 2.0-dev
I have always had a problem with this query, and I think it may come from the fact that I removed a throw from the library. I'm using with() together with limit(). The generated query is fine, but the result is not.
In Propel/Runtime/Formatter/ObjectFormatter.php I've commented this out:
/*if ($this->hasLimit) {
throw new LogicException('Cannot use limit() in conjunction with with() on a one-to-many relationship. Please remove the with() call, or the limit() call.');
}*/
I imagine that the error might come from here.
My join query looks like this:
$companies->leftJoinWithLists()
    ->leftJoinWithPlace()
    ->leftJoinWithContact()
    ->addJoinCondition('Contact', 'Contact.active = ?', true)
    ->leftJoinWithEntry();
And then I add the limit. Any way to avoid this or do it another way?

Any way to avoid this or do it another way?
You need to do it a different way. See my comment on the issue you created at https://github.com/propelorm/Propel2/issues/1231
Basically, the LIMIT clause in SQL limits the number of rows in the result set, but since you are using a LEFT JOIN, one company entity may be represented in multiple rows of the result set (see the raw result in phpPgAdmin).
The solution: Use multiple queries. I suggest querying first for all the companies you want, then using the ->populateRelation() method on the resulting ObjectCollection.
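For this question's schema, the two-query approach could look roughly like the sketch below (untested; it assumes the usual generated query classes such as CompanyQuery and ContactQuery, and that the relation names match the with() calls above):
// Untested sketch: page the companies first, then hydrate the one-to-many
// relations with a second query per relation via populateRelation().
$companies = CompanyQuery::create()
    ->filterByActive(true)
    ->orderById()
    ->limit(50)
    ->offset(0)
    ->find(); // exactly 50 Company rows, no JOIN fan-out

// Scoped to the 50 companies above; the Criteria filters active contacts.
$companies->populateRelation('Contact', ContactQuery::create()->filterByActive(true));
$companies->populateRelation('Entry');

foreach ($companies as $company) {
    $contacts = $company->getContacts(); // already hydrated, no extra query
}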

Related

Using count_all_results or get_compiled_select and $this->db->get('table') lists table twice in query?

How do I use get_compiled_select or count_all_results before running the query without getting the table name added twice? When I use $this->db->get('tblName') after either of those, I get the error:
Not unique table/alias: 'tblProgram'
SELECT * FROM (`tblProgram`, `tblProgram`) JOIN `tblPlots` ON `tblPlots`.`programID`=`tblProgram`.`pkProgramID` JOIN `tblTrees` ON `tblTrees`.`treePlotID`=`tblPlots`.`id` ORDER BY `tblTrees`.`id` ASC LIMIT 2000
If I don't use a table name in count_all_results or $this->db->get(), then I get an error that no table is used. How can I get it to set the table name just once?
public function get_download_tree_data($options = array(), $rand = "") {
    //join tables and order by tree id
    $this->db->reset_query();
    $this->db->join('tblPlots', 'tblPlots.programID=tblProgram.pkProgramID');
    $this->db->join('tblTrees', 'tblTrees.treePlotID=tblPlots.id');
    $this->db->order_by('tblTrees.id', 'ASC');
    //get number of results to return
    $allResults = $this->db->count_all_results('tblProgram', false);
    //chunk data and write to CSV to avoid reaching memory limit
    $offset = 0;
    $chunk = 2000;
    $treePath = $this->config->item('temp_path') . "$rand/trees.csv";
    $tree_handle = fopen($treePath, 'a');
    while ($offset < $allResults) {
        $this->db->limit($chunk, $offset);
        $result = $this->db->get('tblProgram')->result_array();
        foreach ($result as $row) {
            fputcsv($tree_handle, $row);
        }
        $offset = $offset + $chunk;
    }
    fclose($tree_handle);
    return array('resultCount' => $allResults);
}
To count how many rows would be returned by a query, essentially all the work must be performed. That is, it is impractical to get the count, then perform the query; you may as well just do the query.
If your goal is to "paginate" by getting some of the rows, plus the total count, that is essentially two separate actions (that may be combined to look like one.)
If the goal is to estimate the number of rows, then SHOW TABLE STATUS or SELECT Rows FROM information_schema.TABLES WHERE ... gives you an estimate.
If you want to see if there are, say "at least 100 rows", then this may be practical:
SELECT 1 FROM ... WHERE ... ORDER BY ... LIMIT 99,1
and see if you get a row back. However, this may or may not be efficient, depending on the indexes and the WHERE and the ORDER BY. (Show us the query and I can elaborate.)
Using OFFSET for chunking is grossly inefficient. If there is not a usable index, then it is performing essentially the entire query for each chunk. If there is a usable index, the chunks get slower and slower. Here is a discussion of why OFFSET is not good for "pagination", plus an efficient workaround: Pagination. It talks about how to "remember where you left off" as an efficient technique for chunking. Fetch between 100 and 1000 rows per chunk.
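As a rough sketch of that "remember where you left off" technique in this question's CodeIgniter context (untested; assumes tblTrees.id is an indexed auto-increment key and $tree_handle is the open CSV handle):
// Keyset chunking: filter on the last seen id instead of using OFFSET,
// so each chunk is a cheap index range scan instead of a rescan.
$lastId = 0;
$chunkSize = 1000;
do {
    $rows = $this->db->query(
        'SELECT * FROM tblTrees WHERE id > ? ORDER BY id ASC LIMIT ' . $chunkSize,
        array($lastId)
    )->result_array();

    foreach ($rows as $row) {
        fputcsv($tree_handle, $row);
        $lastId = $row['id']; // remember where we left off
    }
} while (count($rows) === $chunkSize);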
The flaw in your code is that it aims to select a subset of the records and their total count in the same query. This is not possible in MySQL, so you cannot generate such a query; hence the error mentioned above. The problem is that if you do a
select ... from t where ... limit 0, 2000
then you get at most 2000 records, so if the number of records matching the criteria is greater than the limit, the count you get from the query above will not be accurate; in that case you need a
select count(1) from t where ...
This means that you need to build your actual query (the code below your count_all_results call) and check whether the number of results reaches the limit. If it does not, then you do not need a separate query to get the count, because you can compute it as $offset + $recordCount. However, if you get as many records as the limit allows, then you will need to build another query, without the order_by call (the count is independent of your sort), to get the count.
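In code, that idea might look something like the following (untested sketch; make_query() stands in for whatever builds your joins and filters):
// Fetch one chunk; only run a separate COUNT(*) query when the chunk came
// back full, i.e. when the total cannot be inferred from what we fetched.
$this->db->reset_query();
$this->make_query($options);
$this->db->order_by('tblTrees.id', 'ASC');
$this->db->limit($chunk, $offset);
$result = $this->db->get('tblProgram')->result_array();

if (count($result) < $chunk) {
    $allResults = $offset + count($result); // last page: total is now known
} else {
    $this->db->reset_query();
    $this->make_query($options); // no order_by: the count is sort-independent
    $allResults = $this->db->count_all_results('tblProgram');
}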
$this->db->count_all_results()
Counting the number of returned results with count_all_results()
It's useful to count the number of results returned: bugs often arise when a section of code that expects at least one row receives zero rows. Without handling the possibility of a zero result, an application can become unpredictably unstable and may give a malicious user hints about the architecture of the app. Ensuring correct handling of zero results is what we're going to focus on here.
Permits you to determine the number of rows in a particular Active Record query. Queries will accept Query Builder restrictors such as where(), or_where(), like(), or_like(), etc. Example:
echo $this->db->count_all_results('my_table'); // Produces an integer, like 25
$this->db->like('title', 'match');
$this->db->from('my_table');
echo $this->db->count_all_results(); // Produces an integer, like 17
However, this method also resets any field values that you may have passed to select(). If you need to keep them, you can pass FALSE as the second parameter:
echo $this->db->count_all_results('my_table', FALSE);
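Applied to the original question, keeping the query alive should let get() run without naming the table a second time (untested sketch against CodeIgniter 3):
$this->db->join('tblPlots', 'tblPlots.programID=tblProgram.pkProgramID');
$this->db->join('tblTrees', 'tblTrees.treePlotID=tblPlots.id');
// FALSE keeps the built query (including FROM tblProgram) for reuse...
$allResults = $this->db->count_all_results('tblProgram', FALSE);
// ...so get() needs no table name, avoiding "Not unique table/alias".
$result = $this->db->limit(2000, 0)->get()->result_array();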
get_compiled_select()
The method $this->db->get_compiled_select() was introduced in CodeIgniter v3.0 and compiles the Active Record query without actually executing it. But this is not a completely new method: in older versions of CI it existed as $this->db->_compile_select(), but that method was made protected in later versions, making it impossible to call directly.
// Note that the second parameter of the get_compiled_select method is FALSE
$sql = $this->db->select(array('field1','field2'))
->where('field3',5)
->get_compiled_select('mytable', FALSE);
// ...
// Do something crazy with the SQL code... like add it to a cron script for
// later execution or something...
// ...
$data = $this->db->get()->result_array();
// Would execute and return an array of results of the following query:
// SELECT field1, field2 FROM mytable WHERE field3 = 5;
NOTE: Calling get_compiled_select() twice while using the Query Builder Caching functionality and NOT resetting your queries will result in the cache being merged twice. That in turn means that, e.g. if you're caching a select(), the same field will be selected twice.
Rick James got me on the right track. I ended up having to chunk the results using pagination AND a nested query. Using LIMIT on even one chunk of 2000 records was timing out. This is the code I ended up with, which uses get_compiled_select('tblProgram') and then get('tblTrees O1'). Since I didn't pass FALSE as the second argument to get_compiled_select, the query was cleared before the get() was run.
//grab the data in chunks, write it to CSV chunk by chunk
$offset = 0;
$chunk = 2000;
$i = 10; //counter for the progress bar
$this->db->limit($chunk);
$this->db->select('tblTrees.id');
//nesting the limited query and then joining the other fields later improved performance significantly
$query1 = ' (' . $this->db->get_compiled_select('tblProgram') . ') AS O2';
$this->db->join($query1, 'O1.id=O2.id');
$result = $this->db->get('tblTrees O1')->result_array();
$allResults = count($result);
$putHeaders = 0;
$treePath = $this->config->item('temp_path') . "$rand/trees.csv";
$tree_handle = fopen($treePath, 'a');
//while the limited select returns a full chunk, there may be more rows
while (count($result) === $chunk) {
    $highestID = max(array_column($result, 'id'));
    //update progress bar with estimate
    if ($i < 90) {
        $this->set_runStatus($qcRunId, $status = "processing", $progress = $i);
        $i = $i + 1;
    }
    //only write the headers the first time
    foreach ($result as $row) {
        if ($offset === 0 && $putHeaders === 0) {
            fputcsv($tree_handle, array_keys($row));
            $putHeaders = 1;
        }
        fputcsv($tree_handle, $row);
    }
    //get the next chunk
    $offset = $offset + $chunk;
    $this->db->reset_query();
    $this->make_query($options);
    $this->db->order_by('tblTrees.id', 'ASC');
    $this->db->where('tblTrees.id >', $highestID);
    $this->db->limit($chunk);
    $this->db->select('tblTrees.id');
    $query1 = ' (' . $this->db->get_compiled_select('tblProgram') . ') AS O2';
    $this->db->join($query1, 'O1.id=O2.id');
    $result = $this->db->get('tblTrees O1')->result_array();
    $allResults = $allResults + count($result);
}
//write out the last (partial) chunk
foreach ($result as $row) {
    fputcsv($tree_handle, $row);
}
fclose($tree_handle);
return array('resultCount' => $allResults);

Allowed memory size exhausted with Propel2 custom query

(updates at bottom)
I'm trying to get the latest entry in my table called "VersionHistory". Since the ID is set to auto increment, I was trying to get the max ID. I want to stay away from sorting the whole table in descending order and taking the top row, to minimize the computation required for this query as the table grows, and this table will probably get pretty big fast.
class VersionHistoryQuery extends BaseVersionHistoryQuery {
    public function getLatestVersion() {
        return $this
            ->withColumn('MAX(version_history.id)')
            ->limit(1);
    }
}
I'm calling the function in my VersionHistory constructor as below:
class VersionHistory extends BaseVersionHistory {
    public function __construct($backupPath = "") {
        $lastVersion = VersionHistoryQuery::create()
            ->getLatestVersion()
            ->find();
        $lastVersion->setBackupPath("backup/" . $backupPath);
        $lastVersion->save();
        parent::setImportedAt(date("Y-m-d H:i:s"));
    }
}
This outputs an "Allowed memory size exhausted" error in PHP. Any idea why? Commenting out the query in the VersionHistory constructor fixes the error, so it's somewhere in the query. I tried setting up a custom query following the instructions here: http://propelorm.org/documentation/03-basic-crud.html#using-custom-sql. But I couldn't get that to work. Running:
SELECT * FROM version_history WHERE id = (SELECT MAX(id) FROM version_history)
from MySQL Workbench works fine and quickly.
Any ideas of what I'm doing wrong?
What I tried
Updated the code to:
public function getLatestVersion() {
    return $this
        ->orderById('desc')
        ->findOne();
}
Still get the same memory allocation error.
Updated the code to:
$lastVersion = VersionHistoryQuery::create()
    ->orderById('desc')
    ->findOne();
I removed the custom function and turned on Propel debug mode; it logs that this query is run:
[2015-10-11 17:26:54] shop_reporting_db.INFO: SELECT `version_history`.`id`, `version_history`.`imported_at`, `version_history`.`backup_path` FROM `version_history` ORDER BY `version_history`.`id` DESC LIMIT 1 [] []
Still runs into a memory overflow.
That's all:
SELECT * FROM version_history ORDER BY id DESC LIMIT 1;
From the documentation, withColumn does the following:
Propel adds the 'with' column to the SELECT clause of the query, and
uses the second argument of the withColumn() call as a column alias.
So it looks like you are actually fetching every row in the table, with each row also carrying the max ID as an extra column.
I don't know anything about propel (except what I just googled), but it looks like you need a different way to specify your where condition.
Your raw SQL and your Propel query are different / not equivalent.
The Propel query merely added a column, whereas your raw SQL actually has two queries, one being a sub-query of the other.
So you need to do the equivalent in Propel:
$lastVersionID = VersionHistoryQuery::create()
    ->withColumn('MAX(id)', 'LastVersionID')
    ->select('LastVersionID')
    ->findOne();

$lastVersion = VersionHistoryQuery::create()
    ->filterById($lastVersionID)
    ->findOne();
Note the ->select('LastVersionID'), since you only need a scalar value and not an entire object, as well as the virtual column (an SQL alias) created using withColumn().

Eloquent chunk() missing half the results

I have a problem with Laravel's ORM Eloquent chunk() method.
It misses some results.
Here is a test query:
$destinataires = Destinataire::where('statut', '<', 3)
    ->where('tokenized_at', '<', $date_active)
    ->chunk($this->chunk, function ($destinataires) {
        foreach ($destinataires as $destinataire) {
            $this->i++;
        }
    });
echo $this->i;
It gives 124838 results.
But:
$num_dest = Destinataire::where('statut', '<', 3)
    ->where('tokenized_at', '<', $date_active)
    ->count();
echo $num_dest;
gives 249676, exactly TWICE the first code example.
My script is supposed to edit all matching records in the database. If I launch it multiple times, it processes only half the remaining records each time.
I tried with DB::table() instead of the Model.
I tried to add a ->take(20000) but it doesn't seem to be taken into account.
I echoed the query with ->toSql() and everything seems to be fine (the LIMIT clause is added when I add the ->take() parameter).
Any suggestions?
Imagine you are using the chunk method to delete all of the records. The table has 2,000,000 records and you are going to delete all of them in chunks of 1000.
$query->orderBy('id')->chunk(1000, function ($items) {
    foreach ($items as $item) {
        $item->delete();
    }
});
It will delete the first 1000 records by getting the first 1000 records in a query like this:
SELECT * FROM table ORDER BY id LIMIT 0,1000
The next query from the chunk method is then:
SELECT * FROM table ORDER BY id LIMIT 1000,1000
Our problem is here: we delete 1000 records and then fetch results from offset 1000. But the deleted rows are gone, so the 1000 records that now sit at the start of the result set get skipped. This means we miss 1000 records after the first step of chunk, and the same happens in every following step, which is why we do not get the expected result in these situations.
I used deletion as the example because it shows the exact behavior of the chunk method clearly.
UPDATE:
You can use chunkById() for deleting safely.
Read more here:
http://laravel.at.jeffsbox.eu/laravel-5-eloquent-builder-chunk-chunkbyid
https://laravel.com/api/5.4/Illuminate/Database/Eloquent/Builder.html#method_chunkById
Quick answer: Use chunkById() instead of chunk().
When updating or deleting records while iterating over them, any changes to the primary key or foreign keys could affect the chunk query. This could potentially result in records not being included in the results.
The explanation can be found in the Laravel documentation:
If you are updating database records while chunking results, your chunk results could change in unexpected ways. If you plan to update the retrieved records while chunking, it is always best to use the chunkById method instead. This method will automatically paginate the results based on the record's primary key.
Example usage of chunkById():
DB::table('users')->where('active', false)
    ->chunkById(100, function ($users) {
        foreach ($users as $user) {
            DB::table('users')
                ->where('id', $user->id)
                ->update(['active' => true]);
        }
    });
(end of the update)
Below is the original answer which used the cursor() method instead of the chunk() method to solve the problem:
I had the same problem - only half of the total results were passed to the callback function of the chunk() method.
Here is the code which had the same problem - half of the transactions were not processed:
Transaction::whereNull('processed')->chunk(100, function ($transactions) {
    $transactions->each(function ($transaction) {
        $transaction->process();
    });
});
I used Laravel 5.4 and managed to solve the problem replacing the chunk() method with cursor() method and changing the code accordingly:
foreach (Transaction::whereNull('processed')->cursor() as $transaction) {
    $transaction->process();
}
Even though the answer doesn't address the problem itself, it provides a valuable solution.
For anyone looking for a bit of code that solves this, here you go:
while (Model::where('x', '>', 'y')->count() > 0)
{
Model::where('x', '>', 'y')->chunk(10, function ($models)
{
foreach ($models as $model)
{
$model->delete();
}
});
}
The problem is in the deletion / removal of the model while chunking away at the total. Including it in a while loop makes sure you get them all! This example works when deleting Models, change the while condition to suit your needs!
When you fetch data using chunk, the same SQL query is executed each time and only the offset changes, growing by the chunk size given in the method's parameter. For example:
SELECT * FROM users WHERE status = 0;
Let's say there are 200 records (suppose that is a lot, so we want to retrieve the data in chunks of 50). The first chunk looks like:
SELECT * FROM users WHERE status = 0 LIMIT 50 OFFSET 0;
(the offset is dynamic: it becomes 50 on the next pass, then 100, and so on).
The problem when using Laravel's chunk while updating is that only the offset changes, while the set of matching rows shrinks each time we update a chunk. The first time, 200 records match the where condition. But if we update the status of those 50 rows to 1 (status = 1), the next time we still execute the same query:
SELECT * FROM users WHERE status = 0 LIMIT 50 OFFSET 50;
Only 150 records still match, since we updated status = 1 for 50 rows, and the offset of 50 makes us skip the first 50 of those 150 rows. We apply the same update to the rows we do get, i.e. rows 50 to 100 of the 150 remaining rows.
The third time we run this query:
SELECT * FROM users WHERE status = 0 LIMIT 50 OFFSET 100;
only 100 users in total still have status = 0, so the offset of 100 skips all of them and no rows are returned. Chunking stops there.
This is not what you would expect at first thought, but this is how it works, and it is why only half of the data gets updated while the other half is skipped.
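This is exactly the drift that chunkById() avoids: it pages by primary key instead of by OFFSET, so rows updated along the way cannot shift the window. A sketch for the example above (the commented SQL is approximate):
// Roughly runs: SELECT * FROM users WHERE status = 0 AND id > {last seen id}
//               ORDER BY id ASC LIMIT 50  -- repeated until no rows remain
DB::table('users')->where('status', 0)
    ->chunkById(50, function ($users) {
        foreach ($users as $user) {
            DB::table('users')->where('id', $user->id)->update(['status' => 1]);
        }
    });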

Codeigniter, join of two tables with a WHERE clause

I have this code:
public function getAllAccess() {
    $this->db->select('accesscode');
    $this->db->where(array('chain_code' => '123'));
    $this->db->order_by('dateandtime', 'desc');
    $this->db->limit($this->config->item('access_limit'));
    return $this->db->get('accesstable')->result();
}
I need to join it with another table (the codenames table), and I have to tell it this. Not really a literal query, but what I want to achieve:
SELECT * accesscode, dateandtime FROM access table WHERE chain_code = '123' AND codenames.accselect_lista != 0
So basically accesstable has a column code which is a number, let's say 33; this number is also present in the codenames table, and in that table there is a field accselect_lista.
So I have to select only the rows where accselect_lista != 0, and from there get the corresponding accesstable rows whose codes are the ones selected in codenames.
Looking for this?
SELECT *
FROM access_table a INNER JOIN codenames c ON
a.chain_code = c.chain_code
WHERE a.chain_code = '123' AND
c.accselect_lista != 0
It will bring up all columns from both tables for the specified criteria. The table and column names need to be exact, obviously.
Good start! But I think you might be getting a few techniques mixed up here.
Firstly, there are two main ways to run multiple where queries. You can use an associative array (like you've started to do there).
$this->db->where(array('accesstable.chain_code' => '123', 'codenames.accselect_lista !=' => 0));
Note that I've appended the table name to each column. Also notice that you can add alternative operators if you include them in the same block as the column name.
Alternatively you can give each its own line. I prefer this method because I think it's a bit easier to read. Both will accomplish the same thing.
$this->db->where('accesstable.chain_code', '123');
$this->db->where('codenames.accselect_lista !=', 0);
Active record will format the query with 'and' etc on its own.
The easiest way to add the join is to use from with join.
$this->db->from('accesstable');
$this->db->join('codenames', 'codenames.accselect_lista = accesstable.code');
When using from, you don't need to include the table name in get, so to run the query you can now just use something like:
$query = $this->db->get();
return $query->result();
Check out Codeigniter's Active Record documentation if you haven't already, it goes into a lot more detail with lots of examples.
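Putting those pieces together, the original method might end up looking like this (untested; the join condition is the one inferred above and may need adjusting to the real schema):
public function getAllAccess() {
    $this->db->select('accesstable.accesscode, accesstable.dateandtime');
    $this->db->from('accesstable');
    $this->db->join('codenames', 'codenames.accselect_lista = accesstable.code');
    $this->db->where('accesstable.chain_code', '123');
    $this->db->where('codenames.accselect_lista !=', 0);
    $this->db->order_by('accesstable.dateandtime', 'desc');
    $this->db->limit($this->config->item('access_limit'));
    return $this->db->get()->result();
}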

MySQL: Get only count of result set

I am using MVC with PHP/MySQL.
Suppose I am using 10 functions with different queries for fetching details from the database.
But at other times I may want to get only the count of the result that will be returned by the query.
What is the standard way to handle such a situation?
Should I write 10 more functions which duplicate almost the whole query and return only the count?
Or
Should I always also return the count along with the result set?
Or
I can pass a flag to indicate that the function should return count only, and then based on the flag I will dynamically generate the (select part of) query.
Or
Is there a better way?
Now that MySQL supports sub-queries, you can get the count for any query using:
$count_query = "SELECT COUNT(*) FROM ($query) AS t"; // MySQL requires an alias for the derived table
How hard was that?
However this approach always means that you are running two queries instead of just the one (I'm not sure if MySQL would necessarily be able to use a cached result set for the count - try it out and see).
If you've already fetched the entire result set it'll probably be faster counting the rows in PHP than issuing another query.
There are two features in MySQL which together return the number of matched rows prior to the application of a LIMIT clause:
the SQL_CALC_FOUND_ROWS query modifier and the FOUND_ROWS() function
see
http://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_found-rows
If you want only the number of rows matching certain criteria, you shouldn't count the result set; run another query that selects only count(*) instead.
If you need both the data and its count, why don't you just use count() on the resulting array?
Another way is to use a class that can return both the data and its count: not a different class for each of the 10 queries, but one single database access class.
I'd go with the flag idea.
Writing 10 more functions and copy/pasting code does not help readability at all. If you always return the count as well, then whenever you are only interested in the count, the database still has to generate and transmit the full result set, which might be grossly inefficient.
With the flag, you'd have something like
function getData($countOnly = false) {
    // ...generate FROM and WHERE clause
    if ($countOnly) {
        $query = 'SELECT COUNT(*) ' . $query;
    } else {
        $query = 'SELECT field1, field2, ...' . $query . ' ORDER BY ...';
    }
    // ...
}
I would generally try to have as much code as possible shared between methods. A possibility would be to:
have one select() and one count() function
each one building the specific part of the query
and one buildFromAndWhere() function to build the parts of the query that are common
and have select() and count() use that one
Written in pseudo-code, it could look a bit like this :
function select($params) {
    return "select * "
        . from()
        . where($params)
        . "limit 0, 10";
}

function count($params) {
    return "select count(*) as nbr "
        . from()
        . where($params);
}

function from() {
    return "from table1 inner join table2 on ... ";
}

function where($params) {
    // Use $params to build the where clause
    return "where X=Y and Z=blah";
}
This way, you have as much common code as possible in the from() and where() functions; considering the hard part of the queries is often there, it's for the best.
I prefer having two separate functions to select and count; I think it makes the code easier to read and understand.
I don't like the idea of one method returning two distinct pieces of data (the list of results and the total count), and I don't really like the idea of passing a flag either: looking at the function call, you'll never know what that parameter means.
