When I make a query to the database and keep the results in a mysqli_result object, the memory usage is extremely small. However, when I fetch all the rows from the query result into an associative array, the memory usage becomes extremely high.
<?php
require_once("../config.php"); // db connection config
$db = new mysqli(DB_HOST, DB_USER, DB_PASSWORD, DB_DBASE);
$query = "SELECT * FROM table_name";
if ($r = $db->query($query)) {
    echo "MEMORY USAGE before: " . memory_get_usage() . "<br><br>";
    $rows = array();
    while ($row = $r->fetch_assoc()) {
        $rows[] = $row;
    }
    echo "MEMORY USAGE after: " . memory_get_usage() . "<br><br>";
    // before: 660880
    // after: 114655768
    // # of records: around 30 thousand
}
?>
It makes sense to me that storing this many results is very memory-consuming, but I'm just wondering how mysqli_result can be so small. It can't be that the database is queried again every time fetch_assoc is called, so where are the results stored in memory?
There is a HUGE difference between fetching results and holding a pointer to a resource.
If you echo $r; before your first call to memory_get_usage(), you will see it is just a pointer to your result set. Until you fetch the rows, they are held in the client library's buffer rather than in PHP variables, so memory_get_usage() barely changes.
I would suggest using fetch_all() for what you are trying to do. That retrieves all rows in a single call with better performance, since the work is handed off to the mysqli extension (a C library) rather than a loop in PHP.
You can also call mysqli_result::free() to release the result set from memory when you are done with it. This is like closing a cursor in Java, if you are familiar with that.
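To see that it is the fetched PHP array, not the result handle, that accounts for the memory, here is a standalone sketch that needs no database; the row shape and count are made up to roughly mimic the question's 30,000-row fetch:

```php
<?php
// Build ~30,000 associative arrays shaped like fetch_assoc() rows
// and watch PHP-tracked memory grow, then shrink again after unset().
$before = memory_get_usage();

$rows = [];
for ($i = 0; $i < 30000; $i++) {
    $rows[] = ['id' => $i, 'name' => "name_$i", 'description' => str_repeat('x', 50)];
}
$withRows = memory_get_usage();

unset($rows); // analogous to freeing the fetched rows
$after = memory_get_usage();

printf("before: %d, with rows: %d, after unset: %d\n", $before, $withRows, $after);
```

On a typical build the middle figure is many megabytes higher than the other two, mirroring the before/after numbers in the question.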
I think you should do this instead:
while($row = $r->fetch_assoc()){
//Do whatever you need with the record, then:
unset($row);
}
The code you posted gathers a huge array in $rows, and the memory usage reflects that.
Related
I've found a great example written with PHP PDO that helps to iterate over a huge amount of data without actually allocating memory for the whole result set:
$sql = 'SELECT * from playlists limit 50000';
$statement = $pdo->prepare($sql);
$statement->execute();
while (($result = $statement->fetch(PDO::FETCH_ASSOC)) !== false) {
//do something
}
I've done an investigation, and this approach uses 18 MB of memory.
If I fetch all results at once, like $results = $statement->fetchAll(PDO::FETCH_ASSOC);, memory usage rises to 35 MB.
Using Laravel's illuminate/database component and the very similar approach DB::table('playlists')->limit(50000)->get(); also uses 35 MB of memory.
How can I achieve first approach using Laravel's eloquent or DB facade?
Could you suggest some articles how this difference in memory usage develops?
Thanks
When you execute a SQL query with PHP (either the mysql functions or PDO), all the data returned by the query is loaded into memory as a "result set".
In order to use the data in the result set, you have to fetch it into regular PHP arrays/objects.
PDOStatement::fetch fetches one row from the result set into memory.
PDOStatement::fetchAll fetches all rows from the result set into memory at once, roughly doubling the memory usage.
Eloquent has the ability to chunk result sets. This is the equivalent of performing "fetch X times" in PDO.
However, if you are working with very large result sets, also consider using SQL LIMIT clauses.
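Here is a runnable illustration of the fetch()-one-row-at-a-time pattern versus fetchAll(), using an in-memory SQLite table as a stand-in (this assumes the pdo_sqlite driver is available; the table name and row count are made up):

```php
<?php
// Set up a throwaway in-memory table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE playlists (id INTEGER PRIMARY KEY, name TEXT)');
$insert = $pdo->prepare('INSERT INTO playlists (name) VALUES (?)');
for ($i = 0; $i < 1000; $i++) {
    $insert->execute(["playlist_$i"]);
}

// fetch(): only one row lives in a PHP variable at a time.
$stmt = $pdo->query('SELECT * FROM playlists');
$count = 0;
while (($row = $stmt->fetch(PDO::FETCH_ASSOC)) !== false) {
    $count++; // process the row, then let it be overwritten
}

// fetchAll(): every row is materialised into one PHP array at once.
$all = $pdo->query('SELECT * FROM playlists')->fetchAll(PDO::FETCH_ASSOC);

echo $count, ' / ', count($all), "\n"; // both see the same 1000 rows
```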
The Laravel approach to processing large data sets like this is chunking:
DB::table('playlists')->chunk(1000, function ($playlists) {
    foreach ($playlists as $playlist) {
        // do something with this playlist
    }
});
This ensures that no more than the chunk size (in my example, 1,000 rows) is loaded into RAM at once. 1k is arbitrary; you could chunk by 1, 100, 253, etc.
Problem:
I have a query that returns a large result set, too large to bring into PHP all at once: I get a fatal max-memory error and cannot increase the memory limit, so I am using unbuffered queries.
I also need to iterate over the result multiple times, but mysqli_result::data_seek doesn't work on unbuffered queries.
// I have a buffered result set
$bresult = $mysqli->query("SELECT * FROM Small_Table");
// And a very large unbuffered result set
$uresult = $mysqli->query("SELECT * FROM Big_Table", MYSQLI_USE_RESULT);
// The join to combine them takes too long and is too large.
// The result set returned by the unbuffered query is too large itself to store in PHP.
// There are too many rows in $bresult to re-execute the query, or even a subset of it, for each one.
foreach ($bresult as &$row) {
    // My solution was to search $uresult for each row in $bresult to get the values I need
    $row['X'] = searchResult($uresult, $row['Key']);
    // PROBLEM: after the first search, $uresult is at its end and cannot be reset with mysqli_result::data_seek
}

function searchResult($uresult, $val)
{
    while ($row = $uresult->fetch_assoc()) {
        if ($row['Key'] == $val) {
            return $row['X'];
        }
    }
}
If you have another solution that meets these requirements I will accept it:
- Does not try to join the result in a single query (takes too long)
- Does not run any query for each result in another query (too many queries, takes too long, slows down system)
Please leave a comment if you need more info.
Thank you.
If you're trying to process a big data set, have you considered using an intermediary like Hadoop? You can set up a small Hadoop cluster, do your processing there, then have your PHP code request the processed data from the Hadoop output.
I have built a system using PHP and MySQL. This system is under very heavy load, with thousands of SELECTs, UPDATEs, INSERTs and DELETEs every minute.
I would like to optimize this system, to make it faster, and reduce load on the servers.
I have already introduced memcache, but mysql data is still needed.
So my question is, which method would be the best in this case.
Currently my queries would look like this:
$q = mysql_query($sql);
while ($row = mysql_fetch_array($q)) {
    // ...
}
I have read that there is a little speed to be gained by using mysql_fetch_assoc (?)
But perhaps there is an entirely different approach I should take when optimizing this system?
Thank you all. (Apologies for my limited English skills.)
mysql_fetch_assoc, versus mysql_fetch_array, duplicates less data and thus uses less memory: fetch_array returns each column twice, once by numeric index and once by name. On its own that is a tiny optimization, but it helps if your data set is big.
Try to use the table's natural order (i.e. avoid ORDER BY in query statements) and LIMIT your result set if you can.
Batch your queries: instead of running 100 single-row INSERTs against the same table, combine them into a few multi-row ones.
Cache, cache, cache if you can, using Redis or Memcached.
If you generate pages that can be treated as static, use HTTP caching headers to stop browsers from requesting your site all the time.
Etc., etc.
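The batching point above can be sketched as one multi-row INSERT instead of many single-row ones; this example uses an in-memory SQLite table via PDO so it can run standalone (an assumption for illustration; the multi-row VALUES syntax is the same in MySQL):

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE t (a TEXT, b INTEGER)');

$rows = [['x', 1], ['y', 2], ['z', 3]];

// One placeholder group per row: a single statement and round trip
// instead of one INSERT per row.
$groups = implode(', ', array_fill(0, count($rows), '(?, ?)'));
$stmt = $pdo->prepare("INSERT INTO t (a, b) VALUES $groups");
$stmt->execute(array_merge(...$rows)); // flatten rows into one bind list

echo $pdo->query('SELECT COUNT(*) FROM t')->fetchColumn(), "\n"; // 3
```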
I would recommend using the MySQL keyword LIMIT to restrict the result set.
Adding pagination to the returned result set will make your application lighter: the UI will load faster because there are fewer rows to fetch, and the MySQL server only receives the SELECT queries when they are needed.
Basically, this is the syntax of LIMIT:
SELECT * FROM Person LIMIT X, Y
where X is the offset and Y is the number of rows to retrieve. Example:
SELECT * FROM Person LIMIT 0, 10
This query returns the first ten rows of the table Person, and:
SELECT * FROM Person LIMIT 10, 10
returns the next 10.
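As a sketch, here is a small (hypothetical) helper that turns a 1-based page number into the corresponding LIMIT clause:

```php
<?php
// Hypothetical helper: page 1 maps to offset 0, page 2 to offset
// $perPage, and so on (MySQL syntax: LIMIT offset, row_count).
function limitClause(int $page, int $perPage): string
{
    $offset = ($page - 1) * $perPage;
    return sprintf('LIMIT %d, %d', $offset, $perPage);
}

echo limitClause(1, 10), "\n"; // LIMIT 0, 10  -> first page
echo limitClause(2, 10), "\n"; // LIMIT 10, 10 -> second page
```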
I've been doing some timing tests on various methods of getting information out of MySQL in PHP. The goal was to find the fastest way of transferring a column of data into a simple array. I've tested it against the enSEMBL database, which is usefully huge.
The following code was common for methods 1 to 8 (9 used GROUP_CONCAT & 10 used PDO):
$query = "SELECT DISTINCT `name` FROM species LIMIT 5000";
$result = $mysqli->query($query);
*Method code*
print_r(array_filter($species));
Method 1: Textbook method
while ($row = $result->fetch_row()) {
$species[] = $row[0];
}
Method 2: while and reset (NB: some IDEs flag an error here, since reset() expects a variable, not a function's return value)
while ($species[] = reset($result->fetch_row())) ;
Method 3: foreach and reset
foreach ($result->fetch_all() as $value) $species[] = reset($value);
Method 4: while, foreach and reset
while ($species[] = $result->fetch_row()) ;
foreach ($species as $key => $value) $species[$key] = reset($value);
Method 5: while and index
while ($row = $result->fetch_row()) $species[] = $row[0];
Method 6: foreach and index
foreach ($result->fetch_all() as $value) $species[] = $value[0];
Method 7: recurse the array
$species = call_user_func_array('array_merge', $result->fetch_all());
Method 8: array_column
$species = array_column($result->fetch_all(), 0);
Method 9: Using GROUP_CONCAT in query.
$species = explode(',', $result->fetch_row()[0]);
Method 10: PDO
$species = $sth->fetchAll(PDO::FETCH_COLUMN, 0);
Surprisingly, Method 1 (textbook) was consistently about 4 times slower than the practically identical Method 5, but took about the same time as Method 10 (PDO).
Method 2 was consistently the slowest, at about 50x longer, presumably because the system is writing warnings somewhere.
Method 4 (two loops) was the second slowest, taking 10x longer.
As stated, Methods 1 (textbook) and 10 (PDO) were third.
Method 9 was fourth slowest (2x longer, and with the disadvantage of hitting the GROUP_CONCAT length limit without any warning).
The fastest method, however, wasn't consistent. Take your pick from 3, 5, 6, 7 & 8.
Method 8 (array_column) was often the fastest way to do this, but not always. However, I think it's the most elegant method, and it provides slightly more flexibility, since it can return an associative array keyed by any two columns selected by your query (but don't mess with the column order in the query!)
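To illustrate that two-column flexibility on already-fetched rows, here is a standalone sketch; the literal array stands in for $result->fetch_all(MYSQLI_ASSOC), and the species names are made up for the example:

```php
<?php
$rows = [
    ['id' => 3, 'name' => 'Homo sapiens'],
    ['id' => 7, 'name' => 'Mus musculus'],
];

// Plain column extraction, as in Method 8:
$names = array_column($rows, 'name'); // ['Homo sapiens', 'Mus musculus']

// Third argument: key the result by another column from the same rows.
$byId = array_column($rows, 'name', 'id'); // [3 => 'Homo sapiens', 7 => 'Mus musculus']
```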
I'm having a strange time dealing with selecting from a table with about 30,000 rows.
It seems my script is using an outrageous amount of memory for what is a simple, forward-only walk over a query result.
Please note that this example is somewhat contrived: it is an absolute bare-minimum example that bears very little resemblance to the real code, and it cannot be replaced with a simple database aggregation. It is intended to illustrate the point that each row does not need to be retained from one iteration to the next.
<?php
$pdo = new PDO('mysql:host=127.0.0.1', 'foo', 'bar', array(
PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION,
));
$stmt = $pdo->prepare('SELECT * FROM round');
$stmt->execute();
function do_stuff($row) {}
$c = 0;
while ($row = $stmt->fetch()) {
// do something with the object that doesn't involve keeping
// it around and can't be done in SQL
do_stuff($row);
$row = null;
++$c;
}
var_dump($c);
var_dump(memory_get_usage());
var_dump(memory_get_peak_usage());
This outputs:
int(39508)
int(43005064)
int(43018120)
I don't understand why 40 MB of memory is used when hardly any data needs to be held at any one time. I have already worked out that I can reduce the memory by a factor of about 6 by replacing "SELECT *" with "SELECT home, away"; however, I consider even that usage insanely high, and the table is only going to get bigger.
Is there a setting I'm missing, or is there some limitation in PDO that I should be aware of? I'm happy to get rid of PDO in favour of mysqli if it can not support this, so if that's my only option, how would I perform this using mysqli instead?
After creating the connection, you need to set PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false:
<?php
$pdo = new PDO('mysql:host=127.0.0.1', 'foo', 'bar', array(
PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION,
));
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
// snip
var_dump(memory_get_usage());
var_dump(memory_get_peak_usage());
This outputs:
int(39508)
int(653920)
int(668136)
Regardless of the result size, the memory usage remains pretty much static.
Another option would be to do something like this, paging through the result set with LIMIT/OFFSET:
$c = 0;
$i = 0;
$query = 'SELECT home, away FROM round LIMIT 2048 OFFSET %u;';
while (count($rows = codeThatFetches(sprintf($query, $i++ * 2048))) > 0)
{
    $c += count($rows);
    foreach ($rows as $row)
    {
        do_stuff($row);
    }
}
The whole result set (all 30,000 rows) is buffered into memory before you can start looking at it.
You should be letting the database do the aggregation and only asking it for the two numbers you need.
SELECT SUM(home) AS home, SUM(away) AS away, COUNT(*) AS c FROM round
The reality of the situation is that if you fetch all rows and expect to be able to iterate over all of them in PHP at once, they will all exist in memory.
If you really don't think SQL-powered expressions and aggregation are the solution, you could consider limiting/chunking your data processing. Instead of fetching all rows at once, do something like this:
1) Fetch 5,000 rows
2) Aggregate/Calculate intermediary results
3) unset variables to free memory
4) Back to step 1 (fetch next set of rows)
Just an idea...
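The four steps above can be sketched as a LIMIT/OFFSET loop. This version uses an in-memory SQLite table as a stand-in for the real MySQL data (the table shape, row count, and chunk size are assumptions for the example):

```php
<?php
// Throwaway data: 12,000 rows shaped like the question's `round` table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE round (home INTEGER, away INTEGER)');
$ins = $pdo->prepare('INSERT INTO round VALUES (?, ?)');
for ($i = 0; $i < 12000; $i++) {
    $ins->execute([$i % 3, $i % 5]);
}

$chunk = 5000;
$offset = 0;
$homeTotal = 0; // intermediary result kept between chunks
do {
    // 1) fetch one chunk of rows
    $stmt = $pdo->prepare('SELECT home, away FROM round LIMIT ? OFFSET ?');
    $stmt->bindValue(1, $chunk, PDO::PARAM_INT);
    $stmt->bindValue(2, $offset, PDO::PARAM_INT);
    $stmt->execute();
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    $fetched = count($rows);

    // 2) aggregate/calculate intermediary results
    foreach ($rows as $row) {
        $homeTotal += $row['home'];
    }

    // 3) unset variables to free memory
    unset($rows);

    // 4) back to step 1 for the next set of rows
    $offset += $chunk;
} while ($fetched === $chunk);

echo $homeTotal, "\n"; // 12000: the home column cycles 0,1,2 over 12,000 rows
```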
I haven't done this before in PHP, but you may consider fetching the rows using a scrollable cursor - see the fetch documentation for an example.
Instead of returning all the results of your query at once back to your PHP script, it holds the results on the server side and you use a cursor to iterate through them getting one at a time.
Whilst I have not tested this, it is bound to have other drawbacks such as utilising more server resources and most likely reduced performance due to additional communication with the server.
Altering the fetch style may also have an impact: the documentation indicates that, by default, each row is stored as both an associative array and a numerically indexed array, which is bound to increase memory usage.
As others have suggested, reducing the number of results in the first place is most likely a better option if possible.
Database structure:

id | galleryId                | type  | file_name   | description
---+--------------------------+-------+-------------+--------------------
1  | artists_2010-01-15_7c1ec | image | band602.jpg | Red Umbrella Promo
2  | artists_2010-01-15_7c1ec | image | nov7.jpg    | CD Release Party
3  | artists_2010-01-15_7c1ec | video | band.flv    | Presskit
I'm going to pull images out for one section of an application, videos on another, etc. Is it better to make multiple mysql queries for each section like so:
$query = mysql_query("SELECT * FROM galleries WHERE galleryId = '$galleryId' AND type = 'image'");
...Or should I be building an associative array and just looping through the array over and over whenever I need to use the result set?
Thanks for the thoughts.
It depends what's more important: readability or performance. I'd expect a single query and prefilling PHP arrays would be faster to execute, since database connections are expensive, but then a simple query for each section is much more readable.
Unless you know (and not just hope) you're going to get a huge amount of traffic I'd go for separate queries and then worry about optimising if it looks like it'll be a problem. At that point there'll be other things you'll want to do anyway, such as building a data access layer and adding some caching.
If by "sections" you mean separate single pages (separate HTTP requests) that users can view, I would suggest query-per-type as needed. If on a page where there are only image data sets, you really don't need to fetch the video data set for example. You won't be really saving much time fetching everything, since you will be connecting to the database for every page hit anyway (I assume.)
If by "sections" you mean different parts of one page, then fetch everything at once. This will save you time on querying (only one query.)
But depending on the size of your data set, you could run into trouble with PHP's memory limit querying for everything, though. You could then try raising the memory limit, but if that fails you'll probably have to fall back to query-per-type.
Using the query-per-type approach moves some of the computing load to the database server, as you will only be requesting and fetching what you really need. And you don't have to write code to filter and sort your results. Filtering and sorting is something the database is generally better at than PHP code. If at all possible, enable MySQL's query cache, that will speed up these queries much more than anything you could write in PHP.
If your data is all coming from one table, I would only do one query.
I presume you are building a single page with a section for pictures, a section for video, a section for music, etc. Write your query return results sorted by media type - iterate through all the pictures, then all the video, then all the music.
Better to have multiple queries. Every time you run a query, all the matching data is pulled out and loaded into memory. If you have 5 different types, a page that needs only one of them would load 5 times as much data as it needs.
Even with just one at a time, you are probably going to want to start paginating with LIMIT/OFFSET queries fairly quickly if you have more than 100 or however many you can reasonably display on one page at a time.
It really depends; compare these two approaches.
IN operator
ini_set('memory_limit', '-1');
$startMemory = memory_get_usage();
$start_time = microtime(true);
$conn = mysqli_connect("localhost", "", "", "");
$ar = array();
$sql = "SELECT * FROM table WHERE e IN (.....)";
$result = mysqli_query($conn, $sql);
while ($row = mysqli_fetch_assoc($result)) {
    $ar[$row['c']] = $row;
}
echo (memory_get_usage() - $startMemory) / 1024 / 1024, ' MB'; // 1409.7124481201 MB
$end_time = microtime(true);
echo ($end_time - $start_time) . ' Seconds'; // 5.2406549453735 Seconds
Foreach
ini_set('memory_limit', '-1');
$startMemory = memory_get_usage();
$start_time = microtime(true);
$conn = mysqli_connect("localhost", "", "", "");
$ar = array();
$array_loop = array(....);
foreach ($array_loop as $key => $value) {
    $sql = "SELECT * FROM table WHERE e = '$value'";
    $result = mysqli_query($conn, $sql);
    while ($row = mysqli_fetch_assoc($result)) {
        $ar[$row['c']] = $row;
    }
}
echo (memory_get_usage() - $startMemory) / 1024 / 1024, ' MB'; // 42.773330688477 MB
$end_time = microtime(true);
echo ($end_time - $start_time) . ' Seconds'; // 12.469061136246 Seconds
I noticed that the foreach version costs time but not memory, while the IN operator costs memory but not time. All tests were done on roughly 1 million rows of test data generated by a SQL procedure.
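One caveat with the IN variant: when the value list comes from PHP, it is safer to build one placeholder per value than to interpolate the values into the SQL string. A minimal sketch (the table and column names mirror the snippets above):

```php
<?php
$values = [3, 7, 42]; // e.g. the keys you would otherwise loop over

// One '?' per value, joined into the IN (...) clause.
$placeholders = implode(', ', array_fill(0, count($values), '?'));
$sql = "SELECT * FROM `table` WHERE e IN ($placeholders)";

echo $sql, "\n"; // SELECT * FROM `table` WHERE e IN (?, ?, ?)
// $stmt = $mysqli->prepare($sql); // then bind $values and execute
```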