PHP: How to Clean Up after a Generator Function

After some research, I have been sold on the idea of generators (more generally, iterators) for lots of tasks that would normally buffer results into an array, since the memory usage is O(1) instead of O(n).
So I plan to use generators to handle database results queried via mysqli. I have 2 questions regarding this approach that I have not been able to find answers on, and I am hoping the community can give me some creative solutions:
Is there a way to release resources opened by a generator if the consuming code chooses not to fully iterate the results? Using an Iterator class, one might do this in the __destruct method. But, from my tests, a generator will simply not execute code following the iteration sequence if it does not conclude naturally. I am looking for workarounds that avoid having to create an Iterator subclass. See code below.
Does using a generator or iterator even provide any benefit for database results? Some of my snooping seemed to indicate that mysqli might be loading the resultset into memory anyway (MYSQLI_STORE_RESULT), defeating the purpose of an iterator. If the results are not buffered, I am curious whether multiple queries can be executed while their resultsets are being iterated (fetched) at the same time (think nested loops where you are iterating over a set of items and then query for child items for each parent). This seems like the database cursor might lock during the entire iteration.
Example
The below is a simplification of what I mean by cleanup. From my tests, the result only gets freed if the entire result is iterated. If there is an exception or break in the consuming loop, the results never get freed. Perhaps I am overthinking this and the garbage collector is good enough?
function query($mysqli, $sql) {
    $result = $mysqli->query($sql);
    foreach ($result as $row) {
        yield $row;
    }
    $result->free(); // Never reached if break, exception, take first n rows, etc.
}
tl;dr: I am just curious how to free resources used by a generator, and whether generators for database access really save anything, or if the results are buffered anyway.
UPDATE
It looks here (http://www.php.net/manual/en/mysqlinfo.concepts.buffering.php) like PHP buffers query results by default, possibly defeating the purpose of generators. Although the argument could be made that buffering only one result set is still better than copying the buffered set into an array and holding two copies.
I am looking for anyone with experience in the matter to weigh in. Your thoughts are appreciated!
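For what it's worth, mysqli does let you opt out of buffering with the MYSQLI_USE_RESULT mode, which pairs naturally with a generator. The sketch below assumes a live mysqli connection (the function name and credentials are illustrative); note that with an unbuffered result you generally cannot issue another query on the same connection until the result is fully consumed or freed, which matters for the nested-loop scenario above.

```php
<?php
// Sketch: stream rows without buffering them client-side. With
// MYSQLI_USE_RESULT the rows stay on the server until fetched, so
// memory stays O(1) -- but the connection is busy until free() runs.
function streamRows(mysqli $mysqli, string $sql): Generator
{
    $result = $mysqli->query($sql, MYSQLI_USE_RESULT);
    try {
        while ($row = $result->fetch_assoc()) {
            yield $row;
        }
    } finally {
        $result->free(); // runs even if the consumer breaks out early
    }
}

// Usage (hypothetical credentials):
// $mysqli = new mysqli('localhost', 'user', 'pass', 'db');
// foreach (streamRows($mysqli, 'SELECT * FROM products') as $row) { /* ... */ }
```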

I may be a little late to the party, but if you are using generators and need to clean up when finished (say you break your parent loop before you have finished iterating through everything), you can just use a try/catch/finally with the cleanup in the finally block:
function query($mysqli, $sql) {
    $result = $mysqli->query($sql);
    try {
        if ($result) {
            foreach ($result as $row) {
                yield $row;
            }
        }
    } catch (Exception $e) {
        throw $e; // send this up the stack (or you could handle here)
    } finally {
        $result->free(); // clean up when the loop is finished.
    }
}

Here's how to detect loop breaks, and how to handle or cleanup after an interruption.
function generator()
{
    $complete = false;
    try {
        while (($result = some_function())) {
            yield $result;
        }
        $complete = true;
    } finally {
        if (!$complete) {
            // cleanup when loop breaks
        } else {
            // cleanup when loop completes
        }
    }
    // Do something only after loop completes
}
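The pattern above can be demonstrated without a database: when the consumer abandons the generator, PHP runs the finally block as soon as the generator object is destroyed. A minimal sketch, iterating a plain array instead of a result set:

```php
<?php
// Demonstrates that a generator's finally block runs even when the
// consuming loop breaks early and the generator is discarded.
function numbers(array &$log): Generator
{
    $complete = false;
    try {
        foreach ([1, 2, 3, 4, 5] as $n) {
            yield $n;
        }
        $complete = true;
    } finally {
        $log[] = $complete ? 'completed' : 'interrupted';
    }
}

$log = [];
foreach (numbers($log) as $n) {
    if ($n === 3) {
        break; // abandon the generator mid-iteration
    }
}
// $log now records that the generator was interrupted, not completed.
```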

function query($mysqli, $sql) {
    $result = $mysqli->query($sql);
    foreach ($result as $i => $row) {
        if ($i + 1 === $result->num_rows) {
            $result->free();
        }
        yield $row;
    }
}

Related

Most efficient way of calling mysqli_fetch_assoc

Recently, I have been moving all of my 'mysql specific' functions into a class, just so if in the future I decided to switch DB languages, I wouldn't have to correct all of my scripts, but just one class.
My question is about efficiency and speed on large data.
For example, now in my database.class.php I have:
public function fetchAssociative($data) {
    $assocarr = array();
    while ($row = mysqli_fetch_assoc($data)) {
        array_push($assocarr, $row);
    }
    return $assocarr;
}
which then gets called in my script.
In compare to when I just use
while ($row = mysqli_fetch_assoc($result)) {
//operate on returned data ONE ROW AT A TIME
}
right in my script.
Now as I mentioned before, it is convenient to have all mysql-specific functions in one class, but what about speed and efficiency? In this example my script has to wait for all the data to be returned before doing any operations, whereas before (using while ($row = mysqli_fetch_assoc($result))) operations were done one at a time on each row returned.
TLDR: How are the efficiency and speed affected if I push mysql specific functions back into a class?
My question is about efficiency and speed on large data.
Then you should worry about SQL queries and indexing.
Not PHP classes or PHP functions. 98% of performance issues with MySQL and big data (read: large datasets) are caused by inefficient SQL queries: subqueries, IN/EXISTS statements, missing indexes, wrong indexes, joins without indexes, and not understanding the MySQL engine.
You should have two functions, not one.
public function fetch($result, $method = MYSQLI_ASSOC) {
    return mysqli_fetch_array($result, $method);
}

public function fetchAll($result, $method = MYSQLI_ASSOC) {
    $ret = array();
    while ($row = mysqli_fetch_array($result, $method)) {
        $ret[] = $row;
    }
    return $ret;
}
in case you need all your data at once - call fetchAll()
in case you need only one row, or there are too many rows to hold at once - use fetch() once or in a while loop, respectively.

When are do loops useful?

As you all probably know, do loops execute at least once even if the condition is false, whereas a while loop never executes at all if the condition is false from the start.
When are do loops useful? Could someone give me a real life example?
They're basically useful when you want something to happen at least once, and maybe more.
The first example that comes to mind is generating a unique ID (non sequentially) in a database. The approach I sometimes take is:
lock table
do {
id = generate random id
} while(id exists)
insert into db with the generated id
unlock table
Basically it will keep generating ids until one doesn't exist (note: potentially an infinite loop, which I might guard against depending on the situation).
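A runnable sketch of that pattern, with an in-memory array standing in for the locked table (the 8-character hex id format is an arbitrary choice for illustration):

```php
<?php
// do/while guarantees the body runs at least once, so $id is
// always assigned before the existence check is evaluated.
function generateUniqueId(array $existing): string
{
    do {
        $id = bin2hex(random_bytes(4)); // random 8-character hex id
    } while (in_array($id, $existing, true)); // retry on collision
    return $id;
}

$existing = ['deadbeef', 'cafebabe'];
$id = generateUniqueId($existing);
```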
The do loop is very powerful if you have to check multiple files, etc. Due to the guaranteed first iteration it will work all the way through.
do {
    if ($integer > 0) { $nameoffile[0]++; }
    else { $nameoffile[0] = $nameoffile[0].$integer; }
    $integer++;
} while (file_exists("directory/".$nameoffile[0].".".$nameoffile[1]));
Next to what has already been answered, you can do crude stuff like this with a do:
do
{
    if ($cond1) break;
    if ($cond2) continue;
    do_something();
} while (true/false);
Which is a variation on a switch, with the addition that continue is allowed. You can use it to simulate goto-like control flow where goto is not available, and similar tricks.
It may not make your code more readable, so it's often not advisable to do that. But it technically works.
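For illustration, here is the do { ... } while (false) trick as a single-pass block that break can leave early, which is the structured stand-in for a forward goto (the function and labels are made up for this sketch):

```php
<?php
// A do { ... } while (false) body runs exactly once, and break
// acts like a forward jump past the remaining steps.
function classify(int $n): string
{
    $label = 'unknown';
    do {
        if ($n < 0) { $label = 'negative'; break; }
        if ($n === 0) { $label = 'zero'; break; }
        $label = 'positive';
    } while (false); // never loops
    return $label;
}
```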

Which php loop is more effective in my case, when returning query results?

Which one of these two is better in my case?
While loop:
function search_hotel($searchterm)
{
    $query = $this->db->order_by("id", "desc")->like('name', $searchterm)->get('hotel_submits');
    $data = array();
    while ($row = mysql_fetch_array($query))
    {
        $data[] = $row->name;
    }
    return $data;
}
Foreach loop:
function search_hotel($searchterm)
{
    $query = $this->db->order_by("id", "desc")->like('name', $searchterm)->get('hotel_submits');
    $data = array();
    foreach ($query->result() as $row)
    {
        $data[] = $row->name;
    }
    return $data;
    //return mysql_query("select * from hotel_submits where name LIKE '".$searchterm."'");
}
while is technically more efficient than foreach, but it's not worth comparing: they're both pretty much identical in this case.
In your case, result returned by query is array. Which means you can use foreach statement or while statement.
Just note that foreach statement is optimized for working with arrays (and as of PHP5, objects as well) and is faster than while statement. While can be used to achieve the same effect but it is not as efficient if you want to go through all elements of the array.
When using a framework and its custom DB adapter class, it seems pointless to switch back to PHP's built-in functions in the middle of a script. Even if CI's adapter and PHP's mysql_* functions might be using the same DBMS connection library (mysql).
I strongly recommend to stick with Code Igniter's version (foreach ($query->result() as $row)). From a performance point of view, there shouldn't be any noticeable differences. Regarding the application architecture, it certainly is much cleaner not to mix the access interfaces. Although it might work out, it might also cause problems.
It would appear you're mixing up mysqli and mysql syntax. The two libraries are NOT compatible internally. You cannot use a handle/statement in one and consume it in another. Both libraries maintain completely independent connections to the database.
That'd mean the first one will be faster, since mysql_fetch_array() will fail and the inner loop will never run. But faster doesn't mean "right".

is it a good practice to use mysql_free_result($result)?

I am aware that all associated result memory is automatically freed at the end of the script's execution. But would you recommend using it if I am running quite a lot of somewhat similar actions, as below?
$sql = "select * from products";
$result = mysql_query($sql);
if ($result && mysql_num_rows($result) > 0) {
    while ($data = mysql_fetch_assoc($result)) {
        $sql2 = "insert into another_table set product_id = '".$data['product_id']."'
            , product_name = '".$data['product_name']."'
        ";
        $result2 = mysql_query($sql2);
        mysql_free_result($result2);
    }
}
Thanks.
Quoting the documentation of mysql_free_result:
mysql_free_result() only needs to be called if you are concerned about how much memory is being used for queries that return large result sets. All associated result memory is automatically freed at the end of the script's execution.
So, if the documentation says it's generally not necessary to call that function, I would say it's not really necessary, nor good practice, to call it ;-)
And, just to say : I almost never call that function myself ; memory is freed at the end of the script, and each script should not eat too much memory.
An exception could be long-running batches that have to deal with large amounts of data, though...
Yes, it is good practice to use mysql_free_result($result). The quoted documentation in the accepted answer is inaccurate. That is what the documentation says, but that doesn't make any sense. Here is what it says:
mysql_free_result() only needs to be called if you are concerned about how much memory is being used for queries that return large result sets. All associated result memory is automatically freed at the end of the script's execution.
The first part of the first sentence is correct. It is true that you don't need to use it for reasons other than memory concerns. Memory concerns are the only reason to use it. However, the second part of the first sentence doesn't make any sense. The claim is that you would only be concerned about memory for queries that return large result sets. This is very misleading, as there are other common scenarios where memory is a concern and calling mysql_free_result() is very good practice.
Any time queries may be run an unknown number of times, more and more memory will be used up if you don't call mysql_free_result(). So if you run your query in a loop, or from a function or method, it is usually a good idea to call mysql_free_result(). You just have to be careful not to free the result until after it will not be used anymore.
You can shield yourself from having to think about when and how to use it by making your own select() and ex() functions so you are not working directly with result sets. (None of the code here is exactly the way I would actually write it; it is more illustrative. You may want to put these in a class or special namespace, throw a different Exception type, or take additional parameters like $class_name, etc.)
// call this for select queries that do not modify anything
function select($sql) {
    $array = array();
    $rs = query($sql);
    while ($o = mysql_fetch_object($rs))
        $array[] = $o;
    mysql_free_result($rs);
    return $array;
}

// call this for queries that modify data
function ex($sql) {
    query($sql);
    return mysql_affected_rows();
}

function query($sql) {
    $rs = mysql_query($sql);
    if ($rs === false) {
        throw new Exception("MySQL query error - SQL: \"$sql\" - Error Number: "
            .mysql_errno()." - Error Message: ".mysql_error());
    }
    return $rs;
}
Now if you only call select() and ex(), you are just dealing with normal object variables and only normal memory concerns instead of manual memory management. You still have to think about normal memory concerns, like how much memory is in use by the array of objects. After the variable goes out of scope, or you manually set it to null, it becomes available for garbage collection, so PHP takes care of that for you. You may still want to set it to null before it goes out of scope if your code does not use it anymore and there are operations following it that use up an unknown amount of memory, such as loops and other function calls.
I don't know how result sets and the functions operating on them are implemented under the hood (and even if I did, this could change with different/future versions of PHP and MySQL), so there is the possibility that the select() function approximately doubles the amount of memory used just before mysql_free_result($rs) is called. However, using select() still eliminates what is usually the primary concern: more and more memory being used during loops and various function calls. If you are concerned about this potential for double memory usage, and you are only working with one row at a time over a single iteration, you can make an each() function that will not double your memory usage, and will still shield you from thinking about mysql_free_result():
// note: the name 'each' collides with the built-in each() on PHP < 8.0,
// where you would need to pick another name
function each($sql, $fun) {
    $rs = query($sql);
    while ($o = mysql_fetch_object($rs))
        $fun($o);
    mysql_free_result($rs);
}
You can use it like this:
each("SELECT * FROM users", function($user) {
    echo $user->username."<BR>";
});
Another advantage of using each() is that it does not return anything, so you don't have to think about whether or not to set the return value to null later.
The answer is of course YES in mysqli.
Take a look at PHP mysqli_free_result documentation:
You should always free your result with mysqli_free_result(), when your result object is not needed anymore.
I used to test it with memory_get_usage function:
echo '<br>before mysqli free result: '.memory_get_usage();
mysqli_free_result($query[1]);
echo '<br>after mysqli free result'.memory_get_usage();
And it is the result:
before mysqli free result:2110088
after mysqli free result:1958744
And here, we are talking about 151,344 bytes of memory in only 1000 rows of mysql table.
How about a million rows and how is it to think about large projects?
mysqli_free_result() is not only for large amount of data, it is also a good practice for small projects.
It depends on how large your queries are or how many queries you run.
PHP frees the memory at the end of the script automatically, but not during the run. So if you have a large amount of data coming from a query, better free the result manually.
I would say: YES, it is good practice because you care about memory during the development or your scripts and that is what makes a good developer :-)

php mysql 2 different ways

I recently ran across some code which the person did the first one. Would like some thoughts on if the top one is better or why a person would write it that way? Any positive reasons over the bottom way.
$result = mysql_query($query) or die("Obtaining location data failed!");
for ($i = mysql_num_rows($result) - 1; $i >= 0; $i--)
{
    if (!mysql_data_seek($result, $i))
    {
        echo "Cannot seek to row $i\n";
        continue;
    }
    if (!($row = mysql_fetch_object($result)))
        continue;
    echo $row->locationname;
}
mysql_free_result($result);
vs
$result = mysql_query($query) or die("Obtaining location data failed!");
while ($row = mysql_fetch_object($result)) {
    echo $row->locationname;
    unset($row);
}
mysql_free_result($result);
It looks like the top code is iterating through the mysql result backwards, where the bottom one is going through it forwards.
The second code example looks cleaner, and there is probably a way to adjust the query to get the results in reverse order in the first place, instead of the somewhat convoluted way the top loop was performed.
Those two are not equivalent since only the first processes the result set in reverse order.
I'd do that with an ORDER BY x DESC clause if only to keep the code simple. When using mysql_query() the complete result set is transferred from the MySQL server to the php process before the function returns and mysql_data_seek() is only moving some pointer within the process' memory, so performace-wise it shouldn't matter much. But if you at some point decide to use an unbuffered query instead it might very well affect the performance.
Definitely the second one :
less code = less code to maintain =~ maybe fewer bugs !!
The top one has definite advantages when it comes to job security and 'lines of code' performance metrics. Apart from that there is no good reason to do what they did.
