Allowed memory size exhausted with Propel2 custom query - php

(updates at bottom)
I'm trying to get the latest entry in my table called "VersionHistory", and since the ID is set to auto increment, I was trying to get the max id. I'm trying to stay away from sorting the whole table in descending order and taking the top row, as I want to minimize the computation required for this query as the table grows; this table will probably get pretty huge fast.
class VersionHistoryQuery extends BaseVersionHistoryQuery {
    public function getLatestVersion() {
        return $this
            ->withColumn('MAX(version_history.id)')
            ->limit(1);
    }
}
I'm calling the function in my VersionHistory constructor as below:
class VersionHistory extends BaseVersionHistory {
    public function __construct($backupPath = "") {
        $lastVersion = VersionHistoryQuery::create()
            ->getLatestVersion()
            ->find();
        $lastVersion->setBackupPath("backup/" . $backupPath);
        $lastVersion->save();
        parent::setImportedAt(date("Y-m-d H:i:s"));
    }
}
This outputs a "Allowed memory size exhausted" error in php. Any idea why? Commenting out the query in the VersionHistory constructor fixes the error, so it's somewhere in the query. I tried setting up a custom query following the instructions here: http://propelorm.org/documentation/03-basic-crud.html#using-custom-sql. But I couldn't get that to work. Running:
SELECT * FROM version_history WHERE id = (SELECT MAX(id) FROM version_history)
from MySQL Workbench works fine and quickly.
Any ideas of what I'm doing wrong?
What I tried
Updated the code to:
public function getLatestVersion() {
    return $this
        ->orderById('desc')
        ->findOne();
}
Still get the same memory allocation error.
Updated the code to:
$lastVersion = VersionHistoryQuery::create()
    ->orderById('desc')
    ->findOne();
Removed the custom function and turned on Propel debug mode; it shows that this query is run:
[2015-10-11 17:26:54] shop_reporting_db.INFO: SELECT `version_history`.`id`, `version_history`.`imported_at`, `version_history`.`backup_path` FROM `version_history` ORDER BY `version_history`.`id` DESC LIMIT 1 [] []
Still runs into a memory overflow.

That's all you need:
SELECT * FROM version_history ORDER BY id DESC LIMIT 1;

From the documentation, withColumn does the following:
Propel adds the 'with' column to the SELECT clause of the query, and
uses the second argument of the withColumn() call as a column alias.
So it looks like you are actually selecting every row in the table, with the max ID computed as an extra column on every one of those rows.
I don't know anything about Propel (except what I just googled), but it looks like you need a different way to specify your WHERE condition.
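One hedged possibility (a sketch only, not tested against Propel2; the column notation may need adjusting to Propel's phpName style) is to push the sub-query into a raw WHERE clause instead of adding a column:
// Sketch: assumes Propel's where() accepts a raw condition containing a sub-select
$latest = VersionHistoryQuery::create()
    ->where('VersionHistory.Id = (SELECT MAX(id) FROM version_history)')
    ->findOne();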

Your raw SQL and your Propel Query are different / not equivalent.
In the Propel query you merely added a column, whereas in your raw SQL you actually have two queries, one being a sub-query of the other.
So you need to do the equivalent in Propel:
$lastVersionID = VersionHistoryQuery::create()
    ->withColumn('MAX(id)', 'LastVersionID')
    ->select('LastVersionID')
    ->findOne();

$lastVersion = VersionHistoryQuery::create()
    ->filterById($lastVersionID)
    ->find();
Note the ->select('LastVersionID'), since you only need a scalar value and not an entire object, as well as the virtual column (an alias in SQL) created with withColumn().

Related

Using count_all_results or get_compiled_select and $this->db->get('table') lists table twice in query?

How do I use get_compiled_select or count_all_results before running the query without getting the table name added twice? When I use $this->db->get('tblName') after either of those, I get the error:
Not unique table/alias: 'tblProgram'
SELECT * FROM (`tblProgram`, `tblProgram`) JOIN `tblPlots` ON `tblPlots`.`programID`=`tblProgram`.`pkProgramID` JOIN `tblTrees` ON `tblTrees`.`treePlotID`=`tblPlots`.`id` ORDER BY `tblTrees`.`id` ASC LIMIT 2000
If I don't use a table name in count_all_results or $this->db->get(), then I get an error that no table is used. How can I get it to set the table name just once?
public function get_download_tree_data($options=array(), $rand=""){
    //join tables and order by tree id
    $this->db->reset_query();
    $this->db->join('tblPlots','tblPlots.programID=tblProgram.pkProgramID');
    $this->db->join('tblTrees','tblTrees.treePlotID=tblPlots.id');
    $this->db->order_by('tblTrees.id', 'ASC');
    //get number of results to return
    $allResults=$this->db->count_all_results('tblProgram', false);
    //chunk data and write to CSV to avoid reaching memory limit
    $offset=0;
    $chunk=2000;
    $treePath=$this->config->item('temp_path')."$rand/trees.csv";
    $tree_handle=fopen($treePath,'a');
    while (($offset<$allResults)) {
        $this->db->limit($chunk, $offset);
        $result=$this->db->get('tblProgram')->result_array();
        foreach ($result as $row) {
            fputcsv($tree_handle, $row);
        }
        $offset=$offset+$chunk;
    }
    fclose($tree_handle);
    return array('resultCount'=>$allResults);
}
To count how many rows would be returned by a query, essentially all the work must be performed. That is, it is impractical to get the count, then perform the query; you may as well just do the query.
If your goal is to "paginate" by getting some of the rows, plus the total count, that is essentially two separate actions (that may be combined to look like one.)
If the goal is to estimate the number of rows, then SHOW TABLE STATUS or SELECT Rows FROM information_schema.TABLES WHERE ... gives you an estimate.
If you want to see if there are, say "at least 100 rows", then this may be practical:
SELECT 1 FROM ... WHERE ... ORDER BY ... LIMIT 99,1
and see if you get a row back. However, this may or may not be efficient, depending on the indexes and the WHERE and the ORDER BY. (Show us the query and I can elaborate.)
Using OFFSET for chunking is grossly inefficient. If there is not a usable index, then it performs essentially the entire query for each chunk. If there is a usable index, the chunks still get slower and slower as the offset grows. Here is a discussion of why OFFSET is not good for "pagination", plus an efficient workaround: Pagination. It talks about how to "remember where you left off" as an efficient technique for chunking. Fetch between 100 and 1000 rows per chunk.
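To make the "remember where you left off" idea concrete, here is a rough CodeIgniter-style sketch (table and column names taken from the question; treat it as an illustration, not a drop-in replacement):
$lastId = 0;
$chunkSize = 1000;
do {
    $this->db->reset_query();
    $this->db->join('tblPlots', 'tblPlots.programID=tblProgram.pkProgramID');
    $this->db->join('tblTrees', 'tblTrees.treePlotID=tblPlots.id');
    $this->db->where('tblTrees.id >', $lastId);   // seek past the last row already written
    $this->db->order_by('tblTrees.id', 'ASC');
    $this->db->limit($chunkSize);                 // no OFFSET needed
    $rows = $this->db->get('tblProgram')->result_array();
    foreach ($rows as $row) {
        // write $row to the CSV here
    }
    if (!empty($rows)) {
        // note: make sure the id used for seeking is unambiguous in the SELECT list
        $lastId = max(array_column($rows, 'id'));
    }
} while (count($rows) === $chunkSize);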
The flaw in your code is that it aims to select a subset of the records and their total count in the same query. This is impossible in MySQL, so you cannot generate such a query; hence you get the error mentioned. The problem is that if you do a
select ... from t where ... limit 0, 2000
then you get at most 2000 records, so if the total number of records matching the criteria is greater than the limit, you will not get an accurate count from that query; in that case you need a
select count(1) from t where ...
This means that you need to build your actual query (the code below your count_all_results call) and see whether the number of results reaches the limit. If it does not reach the limit, then you do not need a separate query to get the count, because you can compute it from the offset and the number of records returned. However, if you get as many records as the limit allows, then you will need to build another query, without the order_by call (since the count is independent of your sort order), and fetch the count from that.
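A minimal sketch of that decision in CodeIgniter style (buildQuery() is a hypothetical helper that applies your joins and filters; it is not part of CI):
$this->buildQuery($options);              // hypothetical: applies joins/filters
$this->db->limit($chunk, $offset);
$result = $this->db->get('tblProgram')->result_array();

if (count($result) < $chunk) {
    // the limit was not reached, so the total can be derived directly
    $total = $offset + count($result);
} else {
    // limit reached: run a separate COUNT query (no ORDER BY needed)
    $this->db->reset_query();
    $this->buildQuery($options);          // hypothetical helper again
    $total = $this->db->count_all_results('tblProgram');
}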
$this->db->count_all_results()
Counting the number of returned results with count_all_results()
It's useful to count the number of results returned—often bugs can arise if a section of code which expects to have at least one row is passed zero rows. Without handling the eventuality of a zero result, an application may become unpredictably unstable and may give away hints to a malicious user about the architecture of the app. Ensuring correct handling of zero results is what we're going to focus on here.
Permits you to determine the number of rows in a particular Active Record query. Queries will accept Query Builder restrictors such as where(), or_where(), like(), or_like(), etc. Example:
echo $this->db->count_all_results('my_table'); // Produces an integer, like 25
$this->db->like('title', 'match');
$this->db->from('my_table');
echo $this->db->count_all_results(); // Produces an integer, like 17
However, this method also resets any field values that you may have passed to select(). If you need to keep them, you can pass FALSE as the second parameter:
echo $this->db->count_all_results('my_table', FALSE);
get_compiled_select()
The method $this->db->get_compiled_select() was introduced in CodeIgniter v3.0 and compiles an Active Record query without actually executing it. This is not a completely new method, though: in older versions of CI it existed as $this->db->_compile_select(), but that method was made protected in later versions, making it impossible to call directly.
// Note that the second parameter of the get_compiled_select method is FALSE
$sql = $this->db->select(array('field1','field2'))
->where('field3',5)
->get_compiled_select('mytable', FALSE);
// ...
// Do something crazy with the SQL code... like add it to a cron script for
// later execution or something...
// ...
$data = $this->db->get()->result_array();
// Would execute and return an array of results of the following query:
// SELECT field1, field2 from mytable where field3 = 5;
NOTE: Double calls to get_compiled_select() while you're using the Query Builder Caching functionality and NOT resetting your queries will result in the cache being merged twice. That in turn means that, for example, if you're caching a select(), the same field will be selected twice.
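In other words, if you need to compile more than one query on the same builder, either pass FALSE as the second argument or reset the builder in between; a small sketch:
// Option 1: compile without resetting, so the builder can still be executed later
$this->db->select('id')->from('tblProgram');
$sql = $this->db->get_compiled_select('', FALSE);
$rows = $this->db->get()->result_array();   // runs the same query

// Option 2: reset explicitly before building the next query
$this->db->reset_query();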
Rick James got me on the right track. I ended up having to chunk the results using pagination AND a nested query. Using LIMIT on even 1 chunk of 2000 records was timing out. This is the code I ended up with, which uses get_compiled_select('tblProgram') and then get('tblTrees O1'). Since I didn't use FALSE as the second argument to get_compiled_select, the query was cleared before the get() was run.
//grab the data in chunks, write it to CSV chunk by chunk
$offset=0;
$chunk=2000;
$i=10; //counter for the progress bar
$this->db->limit($chunk);
$this->db->select('tblTrees.id');
//nesting the limited query and then joining the other fields later improved performance significantly
$query1=' ('.$this->db->get_compiled_select('tblProgram').') AS O2';
$this->db->join($query1, 'O1.id=O2.id');
$result=$this->db->get('tblTrees O1')->result_array();
$allResults=count($result);
$putHeaders=0;
$treePath=$this->config->item('temp_path')."$rand/trees.csv";
$tree_handle=fopen($treePath,'a');
//while select limit returns the limit
while (count($result)===$chunk) {
    $highestID=max(array_column($result, 'id'));
    //update progress bar with estimate
    if ($i<90) {
        $this->set_runStatus($qcRunId, $status = "processing", $progress = $i);
        $i=$i+1;
    }
    //only write the field headers the first time
    foreach ($result as $row) {
        if ($offset===0 && $putHeaders===0){
            fputcsv($tree_handle, array_keys($row));
            $putHeaders=1;
        }
        fputcsv($tree_handle, $row);
    }
    //get the next chunk
    $offset=$offset+$chunk;
    $this->db->reset_query();
    $this->make_query($options);
    $this->db->order_by('tblTrees.id', 'ASC');
    $this->db->where('tblTrees.id >', $highestID);
    $this->db->limit($chunk);
    $this->db->select('tblTrees.id');
    $query1=' ('.$this->db->get_compiled_select('tblProgram').') AS O2';
    $this->db->join($query1, 'O1.id=O2.id');
    $result=$this->db->get('tblTrees O1')->result_array();
    $allResults=$allResults+count($result);
}
//write out last chunk
foreach ($result as $row) {
    fputcsv($tree_handle, $row);
}
fclose($tree_handle);
return array('resultCount'=>$allResults);

Function render makes website 500% slow! Can anyone fix that please?

Function render makes website 500% slow! Can anyone fix that please?
Someone told me:
because it sends a database request on each iteration of the loop (it's not the only problem with this chunk of code but it's the most taxing one)
Yes I understand what that means. His way is:
you need to get all of the data before you start building the menu,
then you just insert the data instead of requesting more data on each
iteration
But I don't know how I should do it!
<?php
$menu_html='';
function render_menu($parent_id,$actmenuid)
{
    $obj = new Database();
    $con = $obj->dbconnectt();
    global $menu_html;
    $result=mysqli_query($con, "select * from tbl_menu where parent_id='$parent_id'");
    if(mysqli_num_rows($result)==0) return;
    if($parent_id==0){
        $menu_html.='<ul class="topnav">';
    }else{
        $menu_html.='<ul>';
    }
    while($row=mysqli_fetch_array($result)) {
        $childnum = $obj->recordcount("SELECT * FROM tbl_menu WHERE parent_id='".$row['id']."'");
        if($childnum == 0){
            $linkvalue='/category/'.$row['id'].'.html';
        } else{
            $linkvalue='#';
        }
        if($row['id']==$actmenuid && $actmenuid !=NULL){
            $actv='class="active"';
        }else{
            $actv='';
        }
        $menu_html.='<li '.$actv.'><a href="'.$linkvalue.'">'.$row['title'].'</a>';
        render_menu($row['id'],$actmenuid);
        $menu_html.='</li>';
    }
    $menu_html.='</ul>';
    return $menu_html;
}
if($isDsh==false){
    echo render_menu(0,$actmenuid);
}
?>
Depending on how many records you have, try removing this query from inside the loop since it's running for every record on the first query.
$childnum = $obj->recordcount("SELECT * FROM tbl_menu WHERE parent_id='".$row['id']."'");
Change it to a single query like this, which returns a count for each parent id, and place it outside of the loop:
$parentcount = mysqli_query($con, "SELECT parent_id, COUNT(*) AS cnt FROM tbl_menu GROUP BY parent_id");
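Roughly, that grouped query would be run once before the loop and turned into a lookup array, so the per-row count query disappears (a sketch, using the cnt alias from the query above):
$counts = array();
$res = mysqli_query($con, "SELECT parent_id, COUNT(*) AS cnt FROM tbl_menu GROUP BY parent_id");
while ($r = mysqli_fetch_assoc($res)) {
    $counts[$r['parent_id']] = (int) $r['cnt'];   // number of children per parent
}

// inside the existing while loop, instead of recordcount():
$childnum = isset($counts[$row['id']]) ? $counts[$row['id']] : 0;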
There may be other issues, so please post the database structure and number of records that you're working with too.
Don't make recursive queries.
Having "more than 1000" rows is not too big. You can simply call everything from the table into php, then perform the recursive html build in php this will have a memory overhead, but far less processing overhead because you only ever make one trip to the db.
Alternatively (when your db table is prohibitively large), you should avoid gathering rows unnecessarily by adding a new column. The new column will store all "descendants" for the respective row when the row is INSERTed or update it when it is UPDATEd. Then you only need to reference this column when needing to call specific rows. In other words, do the recursive processing only once (when writing to the db) AND not when needing to display the data. This will, again, produce a finite result set in one query which can then be recursively traversed to build the desired output.
Basically you need to do what #spudly has suggested.
But there is a small catch in his solution: depending on the number of rows in your tbl_menu table, you may use a big chunk of memory to fetch all the records.
You can optimise it further by using his solution but changing the query to:
select
parent_tbl_menu.id,
count(child_tbl_menu.id) as cnt
from
tbl_menu as parent_tbl_menu
left join
tbl_menu as child_tbl_menu
on parent_tbl_menu.id = child_tbl_menu.parent_id
where
parent_tbl_menu.parent_id = ?
group by
parent_tbl_menu.id
This way you will only fetch the child records of a specific parent.
And please consider using prepared statements, as your code has an SQL injection vulnerability.
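For the prepared-statement point, a minimal mysqli sketch of the query above (mysqli_stmt_get_result() requires the mysqlnd driver):
$stmt = mysqli_prepare($con,
    "SELECT parent_tbl_menu.id, COUNT(child_tbl_menu.id) AS cnt
       FROM tbl_menu AS parent_tbl_menu
       LEFT JOIN tbl_menu AS child_tbl_menu
         ON parent_tbl_menu.id = child_tbl_menu.parent_id
      WHERE parent_tbl_menu.parent_id = ?
      GROUP BY parent_tbl_menu.id");
mysqli_stmt_bind_param($stmt, 'i', $parent_id);   // bound parameter instead of string concatenation
mysqli_stmt_execute($stmt);
$res = mysqli_stmt_get_result($stmt);
while ($row = mysqli_fetch_assoc($res)) {
    // $row['id'], $row['cnt']
}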
Connect (from PHP to MySQL) only once for the entire web page.
Don't put a SELECT inside a loop if you can do all the work in a single SELECT, such as with a JOIN. (Exception: A "hierarchical" table needs the nested SELECT. Exception to the exception: MySQL 8.0 and MariaDB 10.2 can do it with a "recursive CTE"; see the sketch after this list.)
Don't fetch all the columns (SELECT *) when all you want is a record count. Instead, SELECT COUNT(*) ... and use the number returned.
1000 of anything is probably excessive for a web page. Re-think the UI.
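For the recursive CTE mentioned above, a sketch of fetching the whole menu tree in one statement (MySQL 8.0+ / MariaDB 10.2+; table and column names taken from the question):
$sql = "
WITH RECURSIVE menu_tree AS (
    SELECT id, parent_id, title, 0 AS depth
    FROM tbl_menu
    WHERE parent_id = 0
    UNION ALL
    SELECT m.id, m.parent_id, m.title, t.depth + 1
    FROM tbl_menu AS m
    JOIN menu_tree AS t ON m.parent_id = t.id
)
SELECT * FROM menu_tree ORDER BY depth, id";
$rows = mysqli_query($con, $sql); // then build the HTML from the rows in PHP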

How to get database resource instead of array from Laravel Query Builder?

When I execute a PDO statement, internally a result set is stored, and I can use ->fetch() to get a row from the result.
If I wanted to convert the entire result to an array, I could do ->fetchAll().
With Laravel, in the Query Builder docs, I only see a way to get an array result from executing the query.
// example query, ~30,000 rows
$scores = DB::table("highscores")
->select("player_id", "score")
->orderBy("score", "desc")
->get();
var_dump($scores);
// array of 30,000 items...
// unbelievable ...
Is there any way to get a result set from Query Builder like PDO would return? Or am I forced to wait for Query Builder to build an entire array before it returns a value?
Perhaps some sort of ->lazyGet(), or ->getCursor() ?
If this is the case, I can't help but see Query Builder is an extremely short-sighted tool. Imagine a query that selects 30,000 rows. With PDO I can step through row by row, one ->fetch() at a time, and handle the data with very little additional memory consumption.
Laravel Query Builder on the other hand? "Memory management, huh? It's fine, just load 30,000 rows into one big array!"
PS yes, I know I can use ->skip() and ->take() to offset and limit the result set. In most cases, this would work fine, as presenting a user with 30,000 rows is not even usable. If I want to generate large reports, I can see PHP running out of memory easily.
After #deczo pointed out an undocumented function ->chunk(), I dug around in the source code a bit. What I found is that ->chunk() is a convenience wrapper that splits my query into several queries, automatically populating the ->skip($m)->take($n) parameters. If I wanted to build my own iterator using ->chunk() with my data set, I'd end up with a large number of queries against my DB instead of 1.
This doesn't really help, too, because ->chunk() takes a callback which forces me to couple my looping logic at the time I'm building the query. Even if the function was defined somewhere else, the query is going to happen in the controller, which should have little interest in the intricacies of my View or Presenter.
Digging a little further, I found that all Query Builder queries inevitably pass through \Illuminate\Database\Connection#run.
// https://github.com/laravel/framework/blob/3d1b38557afe0d09326d0b5a9ff6b5705bc67d29/src/Illuminate/Database/Connection.php#L262-L284
/**
 * Run a select statement against the database.
 *
 * @param  string  $query
 * @param  array   $bindings
 * @return array
 */
public function select($query, $bindings = array())
{
    return $this->run($query, $bindings, function($me, $query, $bindings)
    {
        if ($me->pretending()) return array();

        // For select statements, we'll simply execute the query and return an array
        // of the database result set. Each element in the array will be a single
        // row from the database table, and will either be an array or objects.
        $statement = $me->getReadPdo()->prepare($query);

        $statement->execute($me->prepareBindings($bindings));

        return $statement->fetchAll($me->getFetchMode());
    });
}
See that nasty $statement->fetchAll near the bottom?
That means arrays for everyone, always and forever; your wishes and dreams abstracted away behind an unusable tool, Laravel Query Builder.
I can't express the valley of my depression right now.
One thing I will say though is that the Laravel source code was at least organized and formatted nicely. Now let's get some good code in there!
Use chunk:
DB::table('highscores')
    ->select(...)
    ->orderBy(...)
    ->chunk($rowsNumber, function ($portion) {
        foreach ($portion as $row) {
            // do whatever you like
        }
    });
Obviously the returned result will be just the same as calling get, so:
$portion; // array of stdClass objects

// and for Eloquent models:
Model::chunk(100, function ($portion) {
    $portion; // Collection of Models
});
Here is a way to use the Laravel query builder to build the query, but then use the underlying PDO fetch to loop over the record set, which I believe will solve your problem: you run one query and loop over the record set, so you don't run out of memory on 30k records.
This approach uses all the config you set up in Laravel, so you don't have to configure PDO separately.
You could also abstract this out into a method that takes in the query builder object and returns the record set (an executed PDO statement), which you would then loop over with a while, as below.
$qb = DB::table("highscores")
    ->select("player_id", "score")
    ->orderBy("score", "desc");

$connection = $qb->getConnection();
$pdo = $connection->getPdo();

$query = $qb->toSql();
$bindings = $qb->getBindings();

$statement = $pdo->prepare($query);
$statement->execute($bindings);

while ($row = $statement->fetch($connection->getFetchMode())) {
    // do stuff with $row
}
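Abstracting that out, as suggested above, might look something like this (a sketch; cursorFor() is a made-up name, not a Laravel API):
/**
 * Sketch: turn any Query Builder instance into an executed PDO statement
 * that can be iterated row by row with ->fetch().
 */
function cursorFor($qb)
{
    $pdo = $qb->getConnection()->getPdo();
    $statement = $pdo->prepare($qb->toSql());
    $statement->execute($qb->getBindings());
    return $statement;
}

$statement = cursorFor(
    DB::table('highscores')->select('player_id', 'score')->orderBy('score', 'desc')
);
while ($row = $statement->fetch(PDO::FETCH_OBJ)) {
    // handle one row at a time
}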

How to check if table is empty , Execute and FetchRow methods in php and mysql

$RSGetID = $this->MyDBObject->Prepare("SELECT FinalID FROM clothes
WHERE ClothID=:|1 AND PriceID = :|2 LIMIT 1");
$RSGetID->Execute(2, 199);
$ClothIDRow = $RSGetID->FetchRow();
return $ClothIDRow->FinalID;
This last line gives an error, because there are no rows in the table, so it says:
"the query did not return any records"
How do I add a condition so that if the table is empty it returns 0, and otherwise returns the fetched FinalID from the database table?
You're using some custom DB layer (MyDBObject?) rather than straight-up PDO - it's impossible for us to know how this behaves. There's probably a method along the lines of ->RowCount() or ->NumRows() you can call to see if you got anything back after the ->Execute() - but this is just guessing, since I can't see the DB object you're using.
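Alternatively, if FetchRow() happens to return false or null when nothing matched (again, just a guess about this custom layer's behaviour), a guard like this would do it:
$RSGetID->Execute(2, 199);
$ClothIDRow = $RSGetID->FetchRow();
// assumption: FetchRow() yields a falsy value when the result set is empty
return ($ClothIDRow && isset($ClothIDRow->FinalID)) ? $ClothIDRow->FinalID : 0;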

MySQL: Get only count of result set

I am using MVC with PHP/MySQL.
Suppose I am using 10 functions with different queries for fetching details from the database.
But at other times I may want to get only the count of the result that will be returned by the query.
What is the standard way to handle such a situation?
Should I write 10 more functions that duplicate almost the whole query and return only the count?
Or
Should I always return the count also with the result set
Or
I can pass a flag to indicate that the function should return count only, and then based on the flag I will dynamically generate the (select part of) query.
Or
Is there a better way?
Now that mysql supports sub-queries, you can get counts for any query using:
$count_query="SELECT COUNT(*) FROM ($query)";
How hard was that?
However this approach always means that you are running two queries instead of just the one (I'm not sure if MySQL would necessarily be able to use a cached result set for the count - try it out and see).
If you've already fetched the entire result set it'll probably be faster counting the rows in PHP than issuing another query.
There are two features in MySQL which will return the number of matched rows prior to the application of a LIMIT clause:
the SQL_CALC_FOUND_ROWS query modifier and the FOUND_ROWS() function.
see
http://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_found-rows
C.
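For reference, a small PDO sketch of the SQL_CALC_FOUND_ROWS / FOUND_ROWS() approach above (the table and column names are made up; note that both features are deprecated as of MySQL 8.0.17):
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$rows = $pdo->query(
    "SELECT SQL_CALC_FOUND_ROWS id, name FROM items WHERE active = 1 ORDER BY id LIMIT 10"
)->fetchAll(PDO::FETCH_ASSOC);
$total = (int) $pdo->query("SELECT FOUND_ROWS()")->fetchColumn();
// $rows holds at most 10 rows; $total is the count without the LIMIT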
If you want only the number of rows matching certain criteria, you shouldn't count the fetched result; run another query that selects only COUNT(*) instead.
If you need both the data and its count, why don't you just use count() on the resulting array?
Another way is to use some class that can return both the data and its count, but not a different class for each of the 10 queries; one single database access class.
I'd go with the flag idea.
Writing 10 more functions and copy/pasting code does not help readability at all. If you always also return the count, that means that whenever you're only interested in the count, the database still has to generate and transmit the full result set which might be grossly inefficient.
With the flag, you'd have something like
function getData($countOnly=false) {
    // ...generate FROM and WHERE clause
    if ($countOnly) {
        $query = 'SELECT COUNT(*) '.$query;
    } else {
        $query = 'SELECT field1, field2, ...'.$query.' ORDER BY ...';
    }
    ...
}
I would generally try to have as much code as possible shared between methods. A possibility would be to:
have one select() and one count() functions
each one building the specific part of the query
and one buildFromAndWhere() function to build the parts of the query that are common.
and have select() and count() use that one
Written in pseudo-code, it could look a bit like this:
function select($params) {
    return "select * "
        . from()
        . where($params)
        . "limit 0, 10";
}

function count($params) {
    return "select count(*) as nbr "
        . from()
        . where($params);
}

function from() {
    return "from table1 inner join table1 on ... ";
}

function where($params) {
    // Use $params to build the where clause
    return "where X=Y and Z=blah";
}
This way, you have as much common code as possible in the from() and where() functions -- considering the hard part of the queries is often there, it's for the best.
I prefer having two separate functions to select and count ; I think it make code easier to read and understand.
I don't like the idea of one method returning two distinct pieces of data (the list of results and the total count); and I don't really like the idea of passing a flag either: looking at the function call, you'll never know what that parameter means.
