PDO/MySQL memory consumption with large result set - php

I'm having a strange time dealing with selecting from a table with about 30,000 rows.
It seems my script is using an outrageous amount of memory for what is a simple, forward-only walk over a query result.
Please note that this is a somewhat contrived, absolute bare-minimum example which bears very little resemblance to the real code, and it cannot be replaced with a simple database aggregation. It is intended to illustrate the point that each row does not need to be retained from one iteration to the next.
<?php
$pdo = new PDO('mysql:host=127.0.0.1', 'foo', 'bar', array(
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
));
$stmt = $pdo->prepare('SELECT * FROM round');
$stmt->execute();
function do_stuff($row) {}
$c = 0;
while ($row = $stmt->fetch()) {
    // do something with the object that doesn't involve keeping
    // it around and can't be done in SQL
    do_stuff($row);
    $row = null;
    ++$c;
}
var_dump($c);
var_dump(memory_get_usage());
var_dump(memory_get_peak_usage());
This outputs:
int(39508)
int(43005064)
int(43018120)
I don't understand why 40 meg of memory is used when hardly any data needs to be held at any one time. I have already worked out I can reduce the memory by a factor of about 6 by replacing "SELECT *" with "SELECT home, away", however I consider even this usage to be insanely high and the table is only going to get bigger.
Is there a setting I'm missing, or is there some limitation in PDO that I should be aware of? I'm happy to get rid of PDO in favour of mysqli if it cannot support this, so if that's my only option, how would I perform this using mysqli instead?

After creating the connection, you need to set PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false:
<?php
$pdo = new PDO('mysql:host=127.0.0.1', 'foo', 'bar', array(
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
));
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
// snip
var_dump(memory_get_usage());
var_dump(memory_get_peak_usage());
This outputs:
int(39508)
int(653920)
int(668136)
Regardless of the result size, the memory usage remains pretty much static.
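To answer the mysqli part of the question: mysqli has the same distinction between buffered and unbuffered results. A minimal sketch using MYSQLI_USE_RESULT ('mydb' is a placeholder database name; do_stuff() is the function from the question):
<?php
$mysqli = new mysqli('127.0.0.1', 'foo', 'bar', 'mydb');

// MYSQLI_USE_RESULT streams rows from the server instead of buffering
// the whole result set on the client.
$result = $mysqli->query('SELECT * FROM round', MYSQLI_USE_RESULT);

$c = 0;
while ($row = $result->fetch_assoc()) {
    do_stuff($row);
    ++$c;
}
$result->free();
var_dump($c);
Note that with an unbuffered result (in either API) you must fetch or free all rows before the same connection can issue another query.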

Another option would be to do something like:
$i = $c = 0;
$query = 'SELECT home, away FROM round LIMIT 2048 OFFSET %u;';
// codeThatFetches() stands in for your own helper that runs the query
// and returns the rows as an array.
while (count($rows = codeThatFetches(sprintf($query, $i++ * 2048))) > 0)
{
    $c += count($rows);
    foreach ($rows as $row)
    {
        do_stuff($row);
    }
}

The whole result set (all 30,000 rows) is buffered into memory before you can start looking at it.
You should be letting the database do the aggregation and only asking it for the two numbers you need.
SELECT SUM(home) AS home, SUM(away) AS away, COUNT(*) AS c FROM round
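If you go that route, only a single row ever reaches PHP. A minimal sketch, assuming the same $pdo connection as in the question:
$totals = $pdo->query('SELECT SUM(home) AS home, SUM(away) AS away, COUNT(*) AS c FROM round')
              ->fetch(PDO::FETCH_ASSOC);
// $totals['home'], $totals['away'] and $totals['c'] are the only values held in PHP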

The reality of the situation is that if you fetch all rows and expect to be able to iterate over all of them in PHP at once, they will exist in memory.
If you really don't think using SQL-powered expressions and aggregation is the solution, you could consider limiting/chunking your data processing. Instead of fetching all rows at once, do something like:
1) Fetch 5,000 rows
2) Aggregate/Calculate intermediary results
3) unset variables to free memory
4) Back to step 1 (fetch next set of rows)
Just an idea...
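A rough sketch of those steps, assuming the same $pdo connection and that the goal is to sum the home/away columns from the question:
$offset    = 0;
$batchSize = 5000;
$totals    = array('home' => 0, 'away' => 0);

do {
    // 1) fetch the next batch of rows
    $sql  = sprintf('SELECT home, away FROM round LIMIT %d OFFSET %d', $batchSize, $offset);
    $rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);

    // 2) aggregate/calculate intermediary results
    foreach ($rows as $row) {
        $totals['home'] += $row['home'];
        $totals['away'] += $row['away'];
    }

    $fetched = count($rows);

    // 3) unset variables to free memory
    unset($rows);

    // 4) back to step 1 until a short batch signals the end
    $offset += $batchSize;
} while ($fetched === $batchSize);
Be aware that large OFFSET values get progressively slower in MySQL, so for very big tables paginating on an indexed column (WHERE id > :lastId) is usually the better variant of the same idea.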

I haven't done this before in PHP, but you may consider fetching the rows using a scrollable cursor - see the fetch documentation for an example.
Instead of returning all the results of your query at once back to your PHP script, it holds the results on the server side and you use a cursor to iterate through them getting one at a time.
Whilst I have not tested this, it is bound to have other drawbacks such as utilising more server resources and most likely reduced performance due to additional communication with the server.
Altering the fetch style may also have an impact: by default, the documentation indicates that each row is returned as both an associative array and a numerically indexed array, which is bound to increase memory usage.
As others have suggested, reducing the number of results in the first place is most likely a better option if possible.
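For reference, a minimal sketch of the scrollable-cursor approach described above, based on the PDOStatement::fetch documentation (untested; some drivers, pdo_mysql included, may silently fall back to a forward-only cursor):
// Ask for a scrollable cursor when preparing the statement
$stmt = $pdo->prepare('SELECT home, away FROM round',
    array(PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL));
$stmt->execute();

// Walk the cursor one row at a time; FETCH_NUM avoids the duplicated
// associative + numeric keys of the default FETCH_BOTH style.
while ($row = $stmt->fetch(PDO::FETCH_NUM, PDO::FETCH_ORI_NEXT)) {
    do_stuff($row);
}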

Related

Laravel database facade memory usage

I've found a great example written in PHP PDO which helps to iterate over a huge amount of data without actually allocating memory for the whole result set:
$sql = 'SELECT * from playlists limit 50000';
$statement = $pdo->prepare($sql);
$statement->execute();
while (($result = $statement->fetch(PDO::FETCH_ASSOC)) !== false) {
    //do something
}
I've done some investigation and this approach uses 18 MB of memory.
If I fetch all the results at once with $results = $statement->fetchAll(PDO::FETCH_ASSOC); the memory usage rises to 35 MB.
Using Laravel's illuminate/database component with a very similar approach, DB::table('playlists')->limit(50000)->get(); also uses 35 MB of memory.
How can I achieve the first approach using Laravel's Eloquent or the DB facade?
Could you suggest some articles explaining where this difference in memory usage comes from?
Thanks
When you execute an SQL query with PHP (either the mysql functions or PDO), all the data returned by the query is loaded into memory as a "result set".
In order to use the data in the result set, you have to fetch it into regular PHP arrays/objects.
PDOStatement::fetch - fetches one row from the result set into memory.
PDOStatement::fetchAll - fetches all rows from the result set into memory, thus doubling the memory usage.
Eloquent has the ability to chunk result sets. This is the equivalent of performing multiple fetches in PDO.
However, if you are working with very large result sets, also consider using SQL LIMIT clauses.
The Laravel approach to processing large data sets like this is to use chunking.
DB::table('playlists')->chunk(1000, function($playlists) {
    foreach($playlists as $playlist) {
        // do something with this playlist
    }
});
This ensures that no more than the chunk size (in my example, 1000 rows) is loaded into RAM at once. 1k is arbitrary; you could chunk 1, 100, 253, etc.

JSON from SQL Table get double result

First of all, I have to tell you that these are my first steps with PHP and JSON.
I decided to use JSON to get values from a customer SQL table.
I get my results using this script:
mysql_connect($config['mysql_host'],$config['mysql_user'],$config['mysql_pass']);
//select database
#mysql_select_db($config['db_name']) or die( "Unable to select database");
mysql_query('SET CHARACTER SET utf8');
$fet=mysql_query('select * from vehicule');
$json = array();
while ($r = mysql_fetch_array($fet)) {
    $json[] = $r;
}
header('Content-Type: application/json');
echo $json_data=json_encode($json);
Everything is OK, except that my JSON results look like:
0 = 462;
1 = "Hyundai ix20 crdi 115 panoramic sunsation";
10 = 1346450400;
11 = "462-Hyundai-ix20-crdi-115-panoramic-sunsation";
12 = 462;
...
id = 462;
kilometrage = 14400;
marque = 4;
modele = 137;
motorisation = 2;
ordre = 462;
prix = 17500;
puissance = 6;
titre = "Hyundai ix20 crdi 115 panoramic sunsation";
url = "462-Hyundai-ix20-crdi-115-panoramic-sunsation";
...
I get the result of the table in two versions: one with 0: value, 1: value, 2: ... and the other one using the column names as keys. How can I print only the second one?
By the way, can someone give me a link so I know what I should replace mysql with, since I think it's out of date? (I'm a beginner, only a few hours into PHP.)
Thank you very much!
You have two different issues happening here. One is outright causing the issue you are seeing, and the other is a bad practice mistake that will leave you wide open for trouble in the long run.
The first issue is the one you're asking about. The mysql_fetch_array function (see the Docs here) expects a minimum of one input (the result input) that you are providing. It also has a second, optional input. That optional input defaults to MYSQL_BOTH, which returns an associative array with the results available both through keys (column names) and their indexes. Which is to say that if you select the column 'id', you get its value in both $array[0] and $array['id']. It's duplicated, and thus the JSON process carries over the duplication. You need to provide a second value to the function, either MYSQL_ASSOC to get $array['id'] or MYSQL_NUM to get $array[0].
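Applied to the loop from the question, the fix is just that second argument (a sketch):
while ($r = mysql_fetch_array($fet, MYSQL_ASSOC)) {
    // only column-name keys are kept, so json_encode() emits one entry per column
    $json[] = $r;
}
// mysql_fetch_assoc($fet) is an equivalent shorthand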
Your second issue is the choice of functions. You're using the 'raw' mysql functions. These have been deprecated, which is a technical term that means 'these functions are no longer supported, but we've left them in to give you time to fix legacy code'. For legacy, read 'old'. Those functions will be going away soon, and you need to upgrade to a better option: either the mysqli functions or the PDO class. I strongly recommend the PDO class; it's easy to learn and has the advantage of being more portable.
Whichever one you go with, you need to learn to use prepared statements, for both performance and security reasons. Right now you're working with 'raw' statements, which have a history of being very easy to interfere with via what's called an 'injection attack'. You can see a fictionalized example of such an attack here, and there are plenty of articles online about it. These attacks can be incredibly complex and difficult to fight, so using prepared statements (which handle it for you) is strongly recommended. In the specific example you're using here you don't need to worry about it, because you aren't including any user inputs, but it's an important habit to get into.
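For illustration only, here is roughly what the script could look like using PDO with a prepared statement. The table and column names are taken from your output; the DSN, credentials and the price filter are made up purely to show a bound parameter:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=customer_db;charset=utf8', 'user', 'pass', array(
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
));

// The bound value travels separately from the SQL text, so it cannot
// change the meaning of the query (this is what defeats injection).
$stmt = $pdo->prepare('SELECT * FROM vehicule WHERE prix <= :prix');
$stmt->execute(array(':prix' => 20000));

// FETCH_ASSOC keeps only the column-name keys, which also solves the
// duplicated-values problem from the first part of this answer.
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

header('Content-Type: application/json');
echo json_encode($rows);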

correct way to retrieve mysql data on heavy load

I have built a system using PHP and MySQL. This system is subject to very heavy load, with thousands of selects, updates, inserts and deletes every minute.
I would like to optimize this system to make it faster and reduce the load on the servers.
I have already introduced memcache, but the MySQL data is still needed.
So my question is: which method would be best in this case?
Currently my queries would look like this:
$q = mysql_query($sql);
while(mysql_fetch_array($q)) {...
I have read that there is a little speed to gain by using mysql_fetch_assoc (?)
But perhaps there is an entirely different approach I should take when I start optimizing this system?
Thank you all. (Apologies for my limited English skills.)
mysql_fetch_assoc vs mysql_fetch_array: the former duplicates less data and thus uses less memory, because mysql_fetch_array returns the data both associatively and by numeric index in the same array. It's a tiny optimisation, but it helps if your dataset is big.
Try to rely on the natural order (i.e. avoid sorting in your queries) and LIMIT your result set if you can.
Batch queries: instead of running 100 single-row inserts against the same table, combine them into a few multi-row inserts (see the sketch after this list).
Cache, cache, cache if you can, using Redis or memcached.
If you generate pages that can be treated as static, use HTTP caching headers so browsers don't re-request your site all the time.
etc. etc.
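A minimal sketch of the batching point, using the same mysql_* functions as the question and a hypothetical log table:
// One multi-row INSERT instead of 100 single-row INSERTs saves round trips.
$rows = array(
    array('user_id' => 1, 'action' => 'login'),
    array('user_id' => 2, 'action' => 'logout'),
    array('user_id' => 3, 'action' => 'login'),
);

$values = array();
foreach ($rows as $row) {
    $values[] = sprintf("(%d, '%s')", $row['user_id'], mysql_real_escape_string($row['action']));
}

mysql_query('INSERT INTO log (user_id, action) VALUES ' . implode(', ', $values));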
I would recommend you use the MySQL keyword LIMIT to limit the result set.
Adding pagination to the returned result set will make your application lighter: the UI loads faster because there are fewer rows to fetch, and the MySQL server only receives the SELECT queries when they are needed.
Basically, this is the syntax of LIMIT:
SELECT * FROM Person LIMIT X, Y
where X is the offset and Y the number of rows to retrieve.
Example:
SELECT * FROM Person LIMIT 0, 10
This query will return the first ten rows of the table Person, and:
SELECT * FROM Person LIMIT 10, 10
will display the next 10.
I've been doing some timing tests on various methods of getting information out of MySQL in PHP. The goal was to find the fastest way of transferring a column of data into a simple array. I've tested it against the enSEMBL database, which is usefully huge.
The following code was common for methods 1 to 8 (9 used GROUP_CONCAT & 10 used PDO):
$query = "SELECT DISTINCT `name` FROM species LIMIT 5000";
$result = $mysqli->query($query);
*Method code*
print_r(array_filter($species));
Method 1: Textbook method
while ($row = $result->fetch_row()) {
    $species[] = $row[0];
}
Method 2: while and reset (NB some IDEs detect an error here)
while ($species[] = reset($result->fetch_row())) ;
Method 3: foreach and reset
foreach ($result->fetch_all() as $value) $species[] = reset($value);
Method 4: while, foreach and reset
while ($species[] = $result->fetch_row()) ;
foreach ($species as $key => $value) $species[$key] = reset($value);
Method 5: while and index
while ($row = $result->fetch_row()) $species[] = $row[0];
Method 6: foreach and index
foreach ($result->fetch_all() as $value) $species[] = $value[0];
Method 7: recurse the array
$species = call_user_func_array('array_merge', $result->fetch_all());
Method 8: array_column
$species = array_column($result->fetch_all(), 0);
Method 9: Using GROUP_CONCAT in query.
$species = explode(',', $result->fetch_row()[0]);
Method 10: PDO
$species = $sth->fetchAll(PDO::FETCH_COLUMN, 0);
Surprisingly Method 1 (Textbook) was consistently about 4 times longer than the practically identical Method 5, but took about the same time as Method 10 (PDO).
Method 2 was consistently the slowest method at 50x longer, presumably because the system is writing warnings somewhere.
Method 4 (two loops) was the second slowest, taking 10x longer.
As stated, Methods 1 (textbook) & 10 (PDO) were third slowest.
Method 9 was fourth slowest (2x longer, and had the disadvantage of hitting the GROUP_CONCAT limit without any warning).
The fastest method, however, wasn't consistent. Take your pick from 3, 5, 6, 7 & 8.
Method 8 (array_column) was often the fastest way to do this, but not always. However I think it's the most elegant method and provides slightly more flexibility as it can return an associative array using any two columns selected by your query (but don't mess with the order in the query!)
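For example, a sketch of that two-column use of array_column, assuming the species table also has an id column:
$result  = $mysqli->query('SELECT DISTINCT `id`, `name` FROM species LIMIT 5000');
// fetch_all(MYSQLI_ASSOC) returns rows keyed by column name, so array_column()
// can build an id => name map in one call.
$species = array_column($result->fetch_all(MYSQLI_ASSOC), 'name', 'id');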

PHP: how are query results stored in mysqli_result

When I make a query to the database and keep the results in a mysqli_result, the memory usage is extremely small. However, when I fetch all the rows of the query result into an associative array, the memory usage becomes extremely high.
<?php
require_once("../config.php"); //db connection config
$db = new mysqli(DB_HOST, DB_USER, DB_PASSWORD, DB_DBASE);
$query = "select * from table_name";
if ($r = $db->query($query)) {
    echo "MEMORY USAGE before : " . memory_get_usage() . "<br><br>";
    $rows = array();
    while ($row = $r->fetch_assoc()) {
        $rows[] = $row;
    }
    echo "MEMORY USAGE after : " . memory_get_usage() . "<br><br>";
    //before: 660880
    //after: 114655768
    // # of records: around 30 thousand
}
?>
It makes sense to me that storing this many results is very memory-consuming, but I'm just wondering how come the mysqli_result is so small. It can't be that the database is queried every time fetch_assoc is called, so where are the results actually stored in memory?
There is a HUGE difference between fetching results into PHP arrays and holding a handle to a result set.
The $r your query returns is just a handle to the result set. Until you fetch the rows into PHP variables, the buffered data sits in the MySQL client library's result buffer rather than in PHP arrays, which is why memory_get_usage() barely registers it.
I would suggest that you use fetch_all() for what you are trying to do. That gives you all your results in a single call with better performance, since the work is handed off to the mysqli extension (a C library) rather than a loop in PHP.
You can also use free_result() to clear a result set from memory when you are done with it. This is like closing a cursor in Java, if you are familiar with that.
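A sketch of that suggestion applied to the code above (fetch_all() requires the mysqlnd driver):
if ($r = $db->query($query)) {
    // one call into the C extension instead of a PHP-level fetch loop
    $rows = $r->fetch_all(MYSQLI_ASSOC);

    // ... work with $rows ...

    // release the result buffer as soon as you are done with it
    $r->free();
    unset($rows);
}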
I think you should do this instead:
while ($row = $r->fetch_assoc()) {
    // Do whatever you need with the record, then:
    unset($row);
}
The code you posted gathers a huge array in $rows, and the memory usage reflects that.

Multiple MYSQL queries vs. Multiple php foreach loops

Database structure:
id | galleryId                | type  | file_name   | description
1  | artists_2010-01-15_7c1ec | image | band602.jpg | Red Umbrella Promo
2  | artists_2010-01-15_7c1ec | image | nov7.jpg    | CD Release Party
3  | artists_2010-01-15_7c1ec | video | band.flv    | Presskit
I'm going to pull images out for one section of an application, videos on another, etc. Is it better to make multiple mysql queries for each section like so:
$query = mysql_query("SELECT * FROM galleries WHERE galleryId='$galleryId' && type='image'");
...Or should I be building an associative array and just looping through the array over and over whenever I need to use the result set?
Thanks for the thoughts.
It depends what's more important: readability or performance. I'd expect a single query and prefilling PHP arrays would be faster to execute, since database connections are expensive, but then a simple query for each section is much more readable.
Unless you know (and not just hope) you're going to get a huge amount of traffic I'd go for separate queries and then worry about optimising if it looks like it'll be a problem. At that point there'll be other things you'll want to do anyway, such as building a data access layer and adding some caching.
If by "sections" you mean separate single pages (separate HTTP requests) that users can view, I would suggest query-per-type as needed. If on a page where there are only image data sets, you really don't need to fetch the video data set for example. You won't be really saving much time fetching everything, since you will be connecting to the database for every page hit anyway (I assume.)
If by "sections" you mean different parts of one page, then fetch everything at once. This will save you time on querying (only one query.)
But depending on the size of your data set, you could run into trouble with PHP's memory limit querying for everything, though. You could then try raising the memory limit, but if that fails you'll probably have to fall back to query-per-type.
Using the query-per-type approach moves some of the computing load to the database server, as you will only be requesting and fetching what you really need. And you don't have to write code to filter and sort your results. Filtering and sorting is something the database is generally better at than PHP code. If at all possible, enable MySQL's query cache, that will speed up these queries much more than anything you could write in PHP.
If your data is all coming from one table, I would only do one query.
I presume you are building a single page with a section for pictures, a section for video, a section for music, etc. Write your query to return results sorted by media type, then iterate through all the pictures, then all the video, then all the music.
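A sketch of that single-query approach, grouping the sorted rows by type in PHP (column names taken from the table above):
$query  = mysql_query("SELECT * FROM galleries WHERE galleryId='$galleryId' ORDER BY type");
$groups = array();

while ($row = mysql_fetch_assoc($query)) {
    $groups[$row['type']][] = $row;
}

// Render each section from its own slice of the single result set
foreach ($groups['image'] as $image) { /* pictures section */ }
foreach ($groups['video'] as $video) { /* videos section */ }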
Better to have multiple queries. Every time you run a query, all the data is pulled out and loaded into memory. If you have 5 different types, it means each page of that type is loading 5 times as much data as it needs.
Even with just one at a time, you are probably going to want to start paginating with LIMIT/OFFSET queries fairly quickly if you have more than 100 or however many you can reasonably display on one page at a time.
It really depends,
IN operator
ini_set('memory_limit', '-1');
$startMemory = memory_get_usage();
$start_time = microtime(true);
$conn = mysqli_connect("localhost", "", "", "");
$ar = array();
$sql = "SELECT * FROM table WHERE e IN (.....)";
$result = mysqli_query($conn, $sql);
while ($row = mysqli_fetch_assoc($result)) {
    $ar[$row['c']] = $row;
}
echo (memory_get_usage() - $startMemory) / 1024 / 1024, ' MB'; //1409.7124481201 MB
$end_time = microtime(true);
echo ($end_time - $start_time) . ' Seconds'; //5.2406549453735 Seconds
Foreach
ini_set('memory_limit', '-1');
$startMemory = memory_get_usage();
$start_time = microtime(true);
$conn = mysqli_connect("localhost", "", "", "");
$ar = array();
$array_loop = array(....);
foreach ($array_loop as $key => $value) {
    $sql = "SELECT * FROM table WHERE e = '$value'";
    $result = mysqli_query($conn, $sql);
    while ($row = mysqli_fetch_assoc($result)) {
        $ar[$row['c']] = $row;
    }
}
echo (memory_get_usage() - $startMemory) / 1024 / 1024, ' MB'; //42.773330688477 MB
$end_time = microtime(true);
echo ($end_time - $start_time) . ' Seconds'; //12.469061136246 Seconds
I noticed that the foreach approach costs time but not memory, while the IN operator costs memory but not time. All tests were done on test data generated by an SQL procedure, about 1 million rows.
