Iterate mysqli unbuffered query result more than once

Iterate mysqli unbuffered query result more than once - php

Problem:
I have a query that returns a large result set. It is too large to bring into PHP. I get a fatal memory max error and cannot increase memory limit. Unbuffered Queries
I need to iterate over the array multiple times but mysqli_data_seek doesn't work on unbuffered queries. mysqli_result::data_seek
//I have a buffered result set
$bresult = $mysql->query("SELECT * FROM Small_Table");
//And a very large unbuffered result set
$uresult = $mysqli->query("SELECT * FROM Big_Table", MYSQLI_USE_RESULT);
//The join to combine them takes too long and is too large
//The result set returned by the unbuffered query is too large itself to store in PHP
//There are too many rows in $bresult to re-execute the query or even a subset of it for each one
foreach($bresult as &$row) {
//My solution was to search $uresult foreach row in $bresult to get the values I need
$row['X'] = searchResult($uresult, $row['Key']);
//PROBLEM: After the first search, $uresult is at its and and cannot be reset with mysqli_result::data_seek
}
function searchResult($uresult, $val)
while($row = $uresult->fetch_assoc()){
if($row['X'] == $val) {
return $row['X'];
}
}
}
If you have another solution that meets these requirements I will accept it:
- Does not try to join the result in a single query (takes too long)
- Does not run any query for each result in another query (too many queries, takes too long, slows down system)
Please leave a comment if you need more info.
Thank you.

If you're trying to process a big data set have you considered using an intermediary like Hadoop? you can set up a small hadoop cluster, do your processing, then have your php code make a request for the processed data to the hadoop output.

Related

Using count_all_results or get_compiled_select and $this->db->get('table') lists table twice in query?

How do I use get_compiled_select or count_all_results before running the query without getting the table name added twice? When I use $this->db->get('tblName') after either of those, I get the error:
Not unique table/alias: 'tblProgram'
SELECT * FROM (`tblProgram`, `tblProgram`) JOIN `tblPlots` ON `tblPlots`.`programID`=`tblProgram`.`pkProgramID` JOIN `tblTrees` ON `tblTrees`.`treePlotID`=`tblPlots`.`id` ORDER BY `tblTrees`.`id` ASC LIMIT 2000
If I don't use a table name in count_all_results or $this->db->get(), then I get an error that no table is used. How can I get it to set the table name just once?
public function get_download_tree_data($options=array(), $rand=""){
//join tables and order by tree id
$this->db->reset_query();
$this->db->join('tblPlots','tblPlots.programID=tblProgram.pkProgramID');
$this->db->join('tblTrees','tblTrees.treePlotID=tblPlots.id');
$this->db->order_by('tblTrees.id', 'ASC');
//get number of results to return
$allResults=$this->db->count_all_results('tblProgram', false);
//chunk data and write to CSV to avoid reaching memory limit
$offset=0;
$chunk=2000;
$treePath=$this->config->item('temp_path')."$rand/trees.csv";
$tree_handle=fopen($treePath,'a');
while (($offset<$allResults)) {
$this->db->limit($chunk, $offset);
$result=$this->db->get('tblProgram')->result_array();
foreach ($result as $row) {
fputcsv($tree_handle, $row);
}
$offset=$offset+$chunk;
}
fclose($tree_handle);
return array('resultCount'=>$allResults);
}

To count how many rows would be returned by a query, essentially all the work must be performed. That is, it is impractical to get the count, then perform the query; you may as well just do the query.
If your goal is to "paginate" by getting some of the rows, plus the total count, that is essentially two separate actions (that may be combined to look like one.)
If the goal is to estimate the number of rows, then SHOW TABLE STATUS or SELECT Rows FROM information_schema.TABLES WHERE ... gives you an estimate.
If you want to see if there are, say "at least 100 rows", then this may be practical:
SELECT 1 FROM ... WHERE ... ORDER BY ... LIMIT 99,1
and see if you get a row back. However, this may or may not be efficient, depending on the indexes and the WHERE and the ORDER BY. (Show us the query and I can elaborate.)
Using OFFSET for chunking is grossly inefficient. If there is not a usable index, then it is performing essentially the entire query for each chunk. If there is a usable index, the chunks are slower and slower. Here is a discussion of why OFFSET is not good for "pagination", plus an efficient workaround: Pagination . It talks about how to "remember where you left off " as an efficient technique for chunking. Fetch between 100 and 1000 rows per chunk.

The flaw in your code is that it aims to select a subset of some records and their total count in the same query. This is impossible in MySQL, so you cannot generate such a query, hence, you get the error as mentioned. The problem is that if you do a
select ... from t where ... limit 0, 2000
then you get maximum 2000 records, so, if the total records matching the criteria have a count that is greater than the limit, then you will not get accurately the count from above, so, in that case you need a
select count(1) from t where ...
This means that you need to build your actual query (the code below your count_all_results call), see whether the number of results reaches the limit. If the number of results does not reach the limit, then you do not need to perform a separate query in order to get the count, because you can compute $offset * $chunk + $recordCount. However, if you get as many records as they can be, then you will need to build another query, without the order_by call, since the count is independent of your sort and get the counts.

$this->db->count_all_results()
Counting the number of returned results with count_all_results()
It's useful to count the number of results returned—often bugs can arise if a section of code which expects to have at least one row is passed zero rows. Without handling the eventuality of a zero result, an application may become unpredictably unstable and may give away hints to a malicious user about the architecture of the app. Ensuring correct handling of zero results is what we're going to focus on here.
Permits you to determine the number of rows in a particular Active Record query. Queries will accept Query Builder restrictors such as where(), or_where(), like(), or_like(), etc. Example:
echo $this->db->count_all_results('my_table'); // Produces an integer, like 25
$this->db->like('title', 'match');
$this->db->from('my_table');
echo $this->db->count_all_results(); // Produces an integer, like 17
However, this method also resets any field values that you may have passed to select(). If you need to keep them, you can pass FALSE as the second parameter:
echo $this->db->count_all_results('my_table', FALSE);
get_compiled_select()
The method $this->db->get_compiled_select(); is introduced in codeigniter v3.0 and compiles active records query without actually executing it. But this is not a completely new method. In older versions of CI it is like $this->db->_compile_select(); but the method has been made protected in later versions making it impossible to call back.
// Note that the second parameter of the get_compiled_select method is FALSE
$sql = $this->db->select(array('field1','field2'))
->where('field3',5)
->get_compiled_select('mytable', FALSE);
// ...
// Do something crazy with the SQL code... like add it to a cron script for
// later execution or something...
// ...
$data = $this->db->get()->result_array();
// Would execute and return an array of results of the following query:
// SELECT field1, field1 from mytable where field3 = 5;
NOTE:- Double calls to get_compiled_select() while you’re using the Query Builder Caching functionality and NOT resetting your queries will results in the cache being merged twice. That in turn will i.e. if you’re caching a select() - select the same field twice.

Rick James got me on the right track. I ended up having to chunk the results using pagination AND a nested query. Using LIMIT on even 1 chunk of 2000 records was timing out. This is the code I ended up with, which uses get_compiled_select('tblProgram') and then get('tblTrees O1'). Since I didn't use FALSE as the second argument to get_compiled_select, the query was cleared before the get() was run.
//grab the data in chunks, write it to CSV chunk by chunk
$offset=0;
$chunk=2000;
$i=10; //counter for the progress bar
$this->db->limit($chunk);
$this->db->select('tblTrees.id');
//nesting the limited query and then joining the other field later improved performance significantly
$query1=' ('.$this->db->get_compiled_select('tblProgram').') AS O2';
$this->db->join($query1, 'O1.id=O2.id');
$result=$this->db->get('tblTrees O1')->result_array();
$allResults=count($result);
$putHeaders=0;
$treePath=$this->config->item('temp_path')."$rand/trees.csv";
$tree_handle=fopen($treePath,'a');
//while select limit returns the limit
while (count($result)===$chunk) {
$highestID=max(array_column($result, 'id'));
//update progres bar with estimate
if ($i<90) {
$this->set_runStatus($qcRunId, $status = "processing", $progress = $i);
$i=$i+1;
}
//only get the fields the first time
foreach ($result as $row) {
if ($offset===0 && $putHeaders===0){
fputcsv($tree_handle, array_keys($row));
$putHeaders=1;
}
fputcsv($tree_handle, $row);
}
//get the next chunk
$offset=$offset+$chunk;
$this->db->reset_query();
$this->make_query($options);
$this->db->order_by('tblTrees.id', 'ASC');
$this->db->where('tblTrees.id >', $highestID);
$this->db->limit($chunk);
$this->db->select('tblTrees.id');
$query1=' ('.$this->db->get_compiled_select('tblProgram').') AS O2';
$this->db->join($query1, 'O1.id=O2.id');
$result=$this->db->get('tblTrees O1')->result_array();
$allResults=$allResults+count($result);
}
//write out last chunk
foreach ($result as $row) {
fputcsv($tree_handle, $row);
}
fclose($tree_handle);
return array('resultCount'=>$allResults);

function render makes website 500% slow! can anyone fix that please?

Function render makes website 500% slow! Can anyone fix that please ?
Someone told me :
because it sends a database request on each iteration of the loop (it's not the only problem with this chunk of code but it's the most taxing one)
Yes I understand what that means. His way is:
you need to get all of the data before you start building the menu,
then you just insert the data instead of requesting more data on each
iteration
But i don't know how i must do it!
<?php
$menu_html='';
function render_menu($parent_id,$actmenuid)
{
$obj = new Database();
$con = $obj->dbconnectt();
global $menu_html;
$result=mysqli_query($con, "select * from tbl_menu where parent_id='$parent_id'");
if(mysqli_num_rows($result)==0) return;
if($parent_id==0){
$menu_html.='<ul class="topnav">';
}else{
$menu_html.='<ul>';
}
while($row=mysqli_fetch_array($result)) {
$childnum = $obj->recordcount("SELECT * FROM tbl_menu WHERE parent_id='".$row['id']."'");
if($childnum == 0){
$linkvalue='/category/'.$row['id'].'.html';
} else{
$linkvalue='#';
}
if($row['id']==$actmenuid && $actmenuid !=NULL){
$actv='class="active"';
}else{
$actv='';
}
$menu_html.='<li '.$actv.'>'.$row['title'].'';
render_menu($row['id'],$actmenuid);
$menu_html.='</li>';
}
$menu_html.='</ul>';return $menu_html;
}
if($isDsh==false){
echo render_menu(0,$actmenuid);
}
?>

Depending on how many records you have, try removing this query from inside the loop since it's running for every record on the first query.
$childnum = $obj->recordcount("SELECT * FROM tbl_menu WHERE parent_id='".$row['id']."'");
Change it a single query like this where it returns counts for each parent idea, and place it outside of the loop:
$parentcount = mysqli_query($con, ("SELECT parent_id, count(*) FROM tbl_menu GROUP BY parent_id");
There may be other issues, so please post the database structure and number of records that you're working with too.

Don't make recursive queries.
Having "more than 1000" rows is not too big. You can simply call everything from the table into php, then perform the recursive html build in php this will have a memory overhead, but far less processing overhead because you only ever make one trip to the db.
Alternatively (when your db table is prohibitively large), you should avoid gathering rows unnecessarily by adding a new column. The new column will store all "descendants" for the respective row when the row is INSERTed or update it when it is UPDATEd. Then you only need to reference this column when needing to call specific rows. In other words, do the recursive processing only once (when writing to the db) AND not when needing to display the data. This will, again, produce a finite result set in one query which can then be recursively traversed to build the desired output.

basically you need to do what #spudly has suggested.
But there is a small catch in his solution which depending on the number of the rows in yous tbl_menu table you may use a big chunk of memory to fetch all the records.
you can optimise it more with using his solution but changing the query to:
select
parent_tbl_menu.id,
count(child_tbl_menu.id) as cnt
from
tbl_menu as parent_tbl_menu
left join
tbl_menu as child_tbl_menu
on parent_tbl_menu.id = child_tbl_menu.parent_id
where
parent_tbl_menu.parent_id = ?
group by
parent_tbl_menu.id
This way you will only fetch the child records of a specific parent.
And please consider using prepared statements as your code has sql injection vulnerability.

Connect (from PHP to MySQL) only once for the entire web page.
Don't put a SELECT inside a loop if you can do all the work in a single SELECT, such as with a JOIN. (Exception: A "hierarchical" table needs the nested SELECT. Exception to the exception: MySQL 8.0 and MariaDB 10.2 can do it with a "recursive CTE".)
Don't fetch all the columns (SELECT *) when all you want it is a recordcount. Instead, SELECT COUNT(*) ... and use the number returned.
1000 of anything is probably excessive for a web page. Re-think the UI.

Laravel dabatabse facade memory usage

I've found great example written in php pdo, which helps to iterate huge amount of data without actually allocating memory for whole set of results:
$sql = 'SELECT * from playlists limit 50000';
$statement = $pdo->prepare($sql);
$statement->execute();
while (($result = $statement->fetch(PDO::FETCH_ASSOC)) !== false) {
//do something
}
I've done an investigation and this approach uses 18mb of memory.
If I fetch all results like this $results = $statement->fetchAll(PDO::FETCH_ASSOC); memory usage upraises to 35mb.
Using laravel's illuminate/database component and very similar approach DB::table('playlists')->limit(50000)->get(); also uses 35mb of memory.
How can I achieve first approach using Laravel's eloquent or DB facade?
Could you suggest some articles how this difference in memory usage develops?
Thanks

When you execute an SQL query with php (either mysql functions or PDO) all data returned from query loads in to memory as a "result set".
In order to use data in "result set" you have to fetch them in regular php arrays/objects.
PDOStatement::fetch - fetches one row from the result set in to memory.
PDOStatement::fetchAll - fetches all rows from result set to memory thus doubling the memory usage.
Eloquent has ability to chunk result sets. This is equivalent to performing "X times fetch" in PDO.
However, if you are working with very large result sets consider using SQL limits.

The Laravel approach to processing large data sets like this is to use chunking.
DB::table('playlists')->chunk(1000, function($playlists) use($count) {
foreach($playlists as $playlist) {
// do something with this playlist
}
});
This ensures that no more than the chunk size (in my example, 1000 rows) is loaded into RAM at once. 1k is arbitrary; you could chunk 1, 100, 253, etc.

SQL select by using limit or using program to fetch special records

I am using php to get special records from Database.
which one is better?
1.
Select * From [table] Limit 50000, 10;
while($row = $stmt->fetch()){
//save in array, total 10 times
}
or
2.
Select * From [table];
$start = 50000;
$length = 10;
while($row = $stmt->fetch()){
if($i < $start+$length && $j >=$start){
//save in array, total 50010 times
}
}
In this case, which one should I use?
Which one using DB with less resources?

which one is better?
Too vague: what is "better"?
Which one using DB with less resources?
You're much better off with the first approach. It's efficient to select as little data as you need and no more. Selecting the whole table will force your script to use a lot more memory because all that data needs to be kept live
The best answer you'll get is: test! You can run your queries multiple times in multiple ways and see for yourself. Just use SELECT SQL_NO_CACHE... instead of the generic SELECT... to force the DB to restart the work from scratch. Measure how long it takes to run the query and process results
function wayOne(){
// execute your 1st query and loop through results
}
function wayTwo(){
// execute 2nd query and loop through results
}
//Measures # of milliseconds it takes to execute another function
function timeThis(callable $callback){
$start_time = microtime();
call_user_func($callback);
$microsecs = microtime()-$start_time; //duration in microseconds
return round($microsecs*1000);//duration in milliseconds
}
$wayOneTime = timeThis('wayOne');
$wayTwoTime = timeThis('wayTwo');
You can then compare the two times. Generally (not always) a process that takes significantly less time uses fewer resources

PHP: how are query results stored in mysqli_result

When I made a query to the database and retrieve the results in mysqli_result, the memory usage is extremely small. However, when I fetch all the rows in the query results in to an associative array, the memory usage becomes extremely high.
<?php
require_once("../config.php"); //db connection config
$db = new mysqli(DB_HOST,DB_USER,DB_PASSWORD,DB_DBASE);
$query ="select * from table_name";
if($r = $db->query($query)){
echo "MEMORY USAGE before : ". memory_get_usage()."<br><br>";
$rows = array();
while($row = $r->fetch_assoc()){
$rows[]= $row;
}
echo "MEMORY USAGE after : ". memory_get_usage()."<br><br>";
//before: 660880
//after: 114655768
// # of records: around 30 thousands
?>
It makes sense to me that storing this many results is very memory consuming, but I'm just wondering how come mysqli_result is so small. It can't be that the results are queried to the dbase every time fetch_assoc is called. So then where are the results stored in the memory.

There is a HUGE difference between fetching results and storing a pointer to a resource.
If you echo $r; before your first call to memory_get_usage();, you will realize it is just a pointer. This is the pointer to your result set. Until you fetch your results, the result set will not actually be stored into memory.
I would suggest that you run fetchAll() for what you are trying to do. This will then result in 1 method accessing all your results with better performance since it's pawned off on the mysqli extension (C Library) rather than a loop in PHP.
You can also use the free results function to clear your results from memory when you are done with them. This is like closing a cursor in Java if you are familiar.

I think you should to this instead:
while($row = $r->fetch_assoc()){
//Do whatever you need with the record, then:
unset($row);
}
The way you posted is gathering a huge array in $rows, and memory usage reflects that.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Iterate mysqli unbuffered query result more than once - php

If you're trying to process a big data set have you considered using an intermediary like Hadoop? you can set up a small hadoop cluster, do your processing, then have your php code make a request for the processed data to the hadoop output.

Related

Using count_all_results or get_compiled_select and $this->db->get('table') lists table twice in query?

function render makes website 500% slow! can anyone fix that please?

Laravel dabatabse facade memory usage

SQL select by using limit or using program to fetch special records

PHP: how are query results stored in mysqli_result

Categories

Resources