Correct way to retrieve MySQL data under heavy load - PHP

I have built a system using PHP and MySQL. This system is subject to a very heavy load, with thousands of selects, updates, inserts, and deletes every minute.
I would like to optimize this system, to make it faster and reduce the load on the servers.
I have already introduced memcache, but the MySQL data is still needed.
So my question is: which method would be the best in this case?
Currently my queries would look like this:
$q = mysql_query($sql);
while(mysql_fetch_array($q)) {...
I have read that there is a little speed to be gained by using mysql_fetch_assoc (?)
But perhaps there is an entirely different approach I should take when I start optimizing this system?
Thank you all. (Apologies for my limited English skills.)

mysql_fetch_assoc vs. mysql_fetch_array: mysql_fetch_array returns every row indexed both numerically and associatively, so mysql_fetch_assoc duplicates less data and therefore uses less memory. It is a tiny optimization on its own, but it helps if your result set is big.
Prefer the table's natural order (i.e. avoid ORDER BY in your queries when you can) and LIMIT your result set if you can.
Batch queries: instead of running 100 single-row INSERTs against the same table, combine them into a few multi-row INSERTs (see the sketch after this list).
Cache, cache, cache if you can, using Redis or memcached.
If you generate pages that can be treated as static, use HTTP caching headers so browsers do not have to request your site all the time.
etc. etc.
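To illustrate the batching tip above, here is a minimal sketch assuming an existing PDO connection in $pdo and a hypothetical log(user_id, action) table; the point is that one multi-row INSERT replaces many single-row ones:
$rows = [
    [1, 'login'],
    [2, 'logout'],
    [3, 'login'],
];
// Build "(?, ?), (?, ?), (?, ?)" to match the number of rows being inserted.
$placeholders = implode(', ', array_fill(0, count($rows), '(?, ?)'));
$stmt = $pdo->prepare("INSERT INTO log (user_id, action) VALUES $placeholders");
// Flatten the rows into one flat parameter list: user_id, action, user_id, action, ...
$stmt->execute(array_merge(...$rows));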

I would recommend you use the MySQL keyword LIMIT to limit the result set.
Adding pagination to the MySQL result set will make your application lighter: the UI loads faster because there are fewer rows to fetch, and the MySQL server only receives the SELECT queries when they are actually needed.
Basically, this is the syntax of LIMIT:
SELECT * FROM Person LIMIT X, Y
where X is the offset and Y is the number of rows to retrieve.
Example:
SELECT * FROM Person LIMIT 0, 10
This query will return the first ten rows of the table Person, and:
SELECT * FROM Person LIMIT 10, 10
will return the next 10.
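As a hedged sketch, the same idea in PHP with PDO, reusing the Person table from the example above ($pdo is assumed to be an existing connection; the page parameter and page size are illustrative):
$perPage = 10;
$page    = max(1, (int)($_GET['page'] ?? 1));
$offset  = ($page - 1) * $perPage;
// The LIMIT arguments are integers we computed ourselves, so inlining them is safe here.
$stmt = $pdo->query(sprintf('SELECT * FROM Person LIMIT %d, %d', $offset, $perPage));
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);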

I've been doing some timing tests on various methods of getting information out of MySQL in PHP. The goal was to find the fastest way of transferring a column of data into a simple array. I've tested against the Ensembl database, which is usefully huge.
The following code was common for methods 1 to 8 (9 used GROUP_CONCAT & 10 used PDO):
$query = "SELECT DISTINCT `name` FROM species LIMIT 5000";
$result = $mysqli->query($query);
*Method code*
print_r(array_filter($species));
Method 1: Textbook method
while ($row = $result->fetch_row()) {
$species[] = $row[0];
}
Method 2: while and reset (NB some IDEs detect an error here)
while ($species[] = reset($result->fetch_row())) ;
Method 3: foreach and reset
foreach ($result->fetch_all() as $value) $species[] = reset($value);
Method 4: while, foreach and reset
while ($species[] = $result->fetch_row()) ;
foreach ($species as $key => $value) $species[$key] = reset($value);
Method 5: while and index
while ($row = $result->fetch_row()) $species[] = $row[0];
Method 6: foreach and index
foreach ($result->fetch_all() as $value) $species[] = $value[0];
Method 7: recurse the array
$species = call_user_func_array('array_merge', $result->fetch_all());
Method 8: array_column
$species = array_column($result->fetch_all(), 0);
Method 9: Using GROUP_CONCAT in query.
$species = explode(',', $result->fetch_row()[0]);
Method 10: PDO
$species = $sth->fetchAll(PDO::FETCH_COLUMN, 0);
Surprisingly, Method 1 (textbook) was consistently about 4 times longer than the practically identical Method 5, but took about the same time as Method 10 (PDO).
Method 2 was consistently the slowest method, at around 50x longer, presumably because the system is writing warnings somewhere.
Method 4 (two loops) was the second slowest, taking 10x longer.
As stated, Methods 1 (textbook) and 10 (PDO) were third slowest.
Method 9 was fourth slowest (2x longer, with the added disadvantage of silently hitting the GROUP_CONCAT length limit).
The fastest method, however, wasn't consistent. Take your pick from 3, 5, 6, 7 & 8.
Method 8 (array_column) was often the fastest way to do this, but not always. However I think it's the most elegant method and provides slightly more flexibility as it can return an associative array using any two columns selected by your query (but don't mess with the order in the query!)
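For example, this variant of Method 8 (a sketch, assuming the species table also has an id column) returns an associative array keyed by one column with values taken from another:
$query  = "SELECT `id`, `name` FROM species LIMIT 5000";
$result = $mysqli->query($query);
// fetch_all(MYSQLI_ASSOC) returns rows keyed by column name, so array_column
// can pick 'name' as the value and 'id' as the array key.
$speciesById = array_column($result->fetch_all(MYSQLI_ASSOC), 'name', 'id');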

Using count_all_results or get_compiled_select and $this->db->get('table') lists table twice in query?

How do I use get_compiled_select or count_all_results before running the query without getting the table name added twice? When I use $this->db->get('tblName') after either of those, I get the error:
Not unique table/alias: 'tblProgram'
SELECT * FROM (`tblProgram`, `tblProgram`) JOIN `tblPlots` ON `tblPlots`.`programID`=`tblProgram`.`pkProgramID` JOIN `tblTrees` ON `tblTrees`.`treePlotID`=`tblPlots`.`id` ORDER BY `tblTrees`.`id` ASC LIMIT 2000
If I don't use a table name in count_all_results or $this->db->get(), then I get an error that no table is used. How can I get it to set the table name just once?
public function get_download_tree_data($options=array(), $rand=""){
//join tables and order by tree id
$this->db->reset_query();
$this->db->join('tblPlots','tblPlots.programID=tblProgram.pkProgramID');
$this->db->join('tblTrees','tblTrees.treePlotID=tblPlots.id');
$this->db->order_by('tblTrees.id', 'ASC');
//get number of results to return
$allResults=$this->db->count_all_results('tblProgram', false);
//chunk data and write to CSV to avoid reaching memory limit
$offset=0;
$chunk=2000;
$treePath=$this->config->item('temp_path')."$rand/trees.csv";
$tree_handle=fopen($treePath,'a');
while (($offset<$allResults)) {
$this->db->limit($chunk, $offset);
$result=$this->db->get('tblProgram')->result_array();
foreach ($result as $row) {
fputcsv($tree_handle, $row);
}
$offset=$offset+$chunk;
}
fclose($tree_handle);
return array('resultCount'=>$allResults);
}
To count how many rows would be returned by a query, essentially all the work must be performed. That is, it is impractical to get the count, then perform the query; you may as well just do the query.
If your goal is to "paginate" by getting some of the rows, plus the total count, that is essentially two separate actions (that may be combined to look like one.)
If the goal is to estimate the number of rows, then SHOW TABLE STATUS or SELECT Rows FROM information_schema.TABLES WHERE ... gives you an estimate.
If you want to see if there are, say "at least 100 rows", then this may be practical:
SELECT 1 FROM ... WHERE ... ORDER BY ... LIMIT 99,1
and see if you get a row back. However, this may or may not be efficient, depending on the indexes and the WHERE and the ORDER BY. (Show us the query and I can elaborate.)
Using OFFSET for chunking is grossly inefficient. If there is no usable index, essentially the entire query is executed for each chunk. Even with a usable index, each successive chunk gets slower, because all the rows before the offset still have to be stepped over. Here is a discussion of why OFFSET is not good for "pagination", plus an efficient workaround: Pagination. It describes "remember where you left off" as an efficient technique for chunking (see the sketch below). Fetch between 100 and 1000 rows per chunk.
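A hedged sketch of the "remember where you left off" technique in PDO; the items table, its payload column, and the indexed integer primary key id are made up for illustration:
$lastId = 0;
$chunk  = 500;
do {
    // Seek past the last id we processed instead of using OFFSET.
    $stmt = $pdo->prepare(
        sprintf('SELECT id, payload FROM items WHERE id > ? ORDER BY id LIMIT %d', $chunk)
    );
    $stmt->execute([$lastId]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    foreach ($rows as $row) {
        // ... process $row ...
        $lastId = $row['id'];   // remember where we left off
    }
} while (count($rows) === $chunk);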
The flaw in your code is that it tries to select a subset of the records and their total count in the same query. MySQL cannot do that in a single statement, so the query builder cannot generate such a query, hence the error you see. The problem is that if you do a
select ... from t where ... limit 0, 2000
then you get at most 2000 records, so if the number of records matching the criteria is greater than the limit, the count above is not accurate; in that case you need a
select count(1) from t where ...
This means that you should build your actual query (the code below your count_all_results call) and check whether the number of results reaches the limit. If it does not, you do not need a separate query for the count, because the total is simply the current offset plus the number of records returned. If, however, you get exactly as many records as the limit allows, you need to build a second query without the order_by call (the count is independent of the sort order) and get the count from that, as sketched below.
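A hedged sketch of that logic in plain PDO (rather than CodeIgniter's query builder), reusing the joins from the question:
$chunk  = 2000;
$offset = 0;
$joins = 'FROM tblProgram
          JOIN tblPlots ON tblPlots.programID = tblProgram.pkProgramID
          JOIN tblTrees ON tblTrees.treePlotID = tblPlots.id';
$rows = $pdo->query(sprintf(
    "SELECT tblTrees.* $joins ORDER BY tblTrees.id ASC LIMIT %d, %d",
    $offset, $chunk
))->fetchAll(PDO::FETCH_ASSOC);
if (count($rows) < $chunk) {
    // The limit was not reached, so the total is just the offset plus what came back.
    $total = $offset + count($rows);
} else {
    // Otherwise issue a separate COUNT query; the ORDER BY is not needed for it.
    $total = (int)$pdo->query("SELECT COUNT(*) $joins")->fetchColumn();
}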
$this->db->count_all_results()
Counting the number of returned results with count_all_results()
It's useful to count the number of results returned: bugs often arise when a section of code that expects at least one row is handed zero rows. Without handling the possibility of an empty result, an application may become unpredictably unstable and may give a malicious user hints about the architecture of the app. Ensuring correct handling of zero results is what we're going to focus on here.
Permits you to determine the number of rows in a particular Active Record query. Queries will accept Query Builder restrictors such as where(), or_where(), like(), or_like(), etc. Example:
echo $this->db->count_all_results('my_table'); // Produces an integer, like 25
$this->db->like('title', 'match');
$this->db->from('my_table');
echo $this->db->count_all_results(); // Produces an integer, like 17
However, this method also resets any field values that you may have passed to select(). If you need to keep them, you can pass FALSE as the second parameter:
echo $this->db->count_all_results('my_table', FALSE);
get_compiled_select()
The method $this->db->get_compiled_select() was introduced in CodeIgniter 3.0 and compiles an Active Record query without actually executing it. It is not a completely new method, though: in older versions of CI it existed as $this->db->_compile_select(), but that method was later made protected, so it can no longer be called directly.
// Note that the second parameter of the get_compiled_select method is FALSE
$sql = $this->db->select(array('field1','field2'))
->where('field3',5)
->get_compiled_select('mytable', FALSE);
// ...
// Do something crazy with the SQL code... like add it to a cron script for
// later execution or something...
// ...
$data = $this->db->get()->result_array();
// Would execute and return an array of results of the following query:
// SELECT field1, field2 FROM mytable WHERE field3 = 5;
NOTE: calling get_compiled_select() twice while you're using the Query Builder Caching functionality and NOT resetting your queries will result in the cache being merged twice. That in turn means that if you're caching a select(), the same field will be selected twice.
Rick James got me on the right track. I ended up having to chunk the results using pagination AND a nested query. Using LIMIT on even 1 chunk of 2000 records was timing out. This is the code I ended up with, which uses get_compiled_select('tblProgram') and then get('tblTrees O1'). Since I didn't use FALSE as the second argument to get_compiled_select, the query was cleared before the get() was run.
//grab the data in chunks, write it to CSV chunk by chunk
$offset=0;
$chunk=2000;
$i=10; //counter for the progress bar
$this->db->limit($chunk);
$this->db->select('tblTrees.id');
//nesting the limited query and then joining the other field later improved performance significantly
$query1=' ('.$this->db->get_compiled_select('tblProgram').') AS O2';
$this->db->join($query1, 'O1.id=O2.id');
$result=$this->db->get('tblTrees O1')->result_array();
$allResults=count($result);
$putHeaders=0;
$treePath=$this->config->item('temp_path')."$rand/trees.csv";
$tree_handle=fopen($treePath,'a');
//loop while the limited select returns a full chunk
while (count($result)===$chunk) {
$highestID=max(array_column($result, 'id'));
//update progres bar with estimate
if ($i<90) {
$this->set_runStatus($qcRunId, $status = "processing", $progress = $i);
$i=$i+1;
}
//only get the fields the first time
foreach ($result as $row) {
if ($offset===0 && $putHeaders===0){
fputcsv($tree_handle, array_keys($row));
$putHeaders=1;
}
fputcsv($tree_handle, $row);
}
//get the next chunk
$offset=$offset+$chunk;
$this->db->reset_query();
$this->make_query($options);
$this->db->order_by('tblTrees.id', 'ASC');
$this->db->where('tblTrees.id >', $highestID);
$this->db->limit($chunk);
$this->db->select('tblTrees.id');
$query1=' ('.$this->db->get_compiled_select('tblProgram').') AS O2';
$this->db->join($query1, 'O1.id=O2.id');
$result=$this->db->get('tblTrees O1')->result_array();
$allResults=$allResults+count($result);
}
//write out last chunk
foreach ($result as $row) {
fputcsv($tree_handle, $row);
}
fclose($tree_handle);
return array('resultCount'=>$allResults);

Laravel database facade memory usage

I've found a great example written in PHP PDO which helps to iterate over a huge amount of data without allocating memory for the whole result set:
$sql = 'SELECT * from playlists limit 50000';
$statement = $pdo->prepare($sql);
$statement->execute();
while (($result = $statement->fetch(PDO::FETCH_ASSOC)) !== false) {
//do something
}
I've done some investigation, and this approach uses 18 MB of memory.
If I fetch all results like this: $results = $statement->fetchAll(PDO::FETCH_ASSOC); memory usage rises to 35 MB.
Using Laravel's illuminate/database component and the very similar approach DB::table('playlists')->limit(50000)->get(); also uses 35 MB of memory.
How can I achieve the first approach using Laravel's Eloquent or the DB facade?
Could you suggest some articles on how this difference in memory usage comes about?
Thanks
When you execute an SQL query with PHP (either the mysql functions or PDO), all data returned by the query is loaded into memory as a "result set".
In order to use the data in the result set, you have to fetch it into regular PHP arrays/objects.
PDOStatement::fetch fetches one row from the result set into memory.
PDOStatement::fetchAll fetches all rows from the result set into memory, thus roughly doubling the memory usage.
Eloquent has the ability to chunk result sets. This is the equivalent of performing the fetch X times in PDO.
However, if you are working with very large result sets, also consider using SQL LIMITs.
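As a sketch of how to get the row-at-a-time behaviour of the PDO loop through the DB facade (assuming a Laravel version where the query builder exposes cursor(), i.e. 5.2 and later):
// cursor() runs a single query but yields rows one by one from a generator
// instead of hydrating the whole result set into a collection at once.
foreach (DB::table('playlists')->limit(50000)->cursor() as $playlist) {
    // do something with $playlist
}
Note that with the default buffered MySQL driver the full result set may still be buffered on the client side, so the savings come mainly from not building 50,000 PHP row objects at once.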
The Laravel approach to processing large data sets like this is to use chunking.
DB::table('playlists')->chunk(1000, function($playlists) {
foreach($playlists as $playlist) {
// do something with this playlist
}
});
This ensures that no more than the chunk size (in my example, 1000 rows) is loaded into RAM at once. 1k is arbitrary; you could chunk 1, 100, 253, etc.

SQL select by using limit or using program to fetch special records

I am using PHP to get specific records from the database.
Which one is better?
1.
Select * From [table] Limit 50000, 10;
while($row = $stmt->fetch()){
//save in array, total 10 times
}
or
2.
Select * From [table];
$start = 50000;
$length = 10;
$i = 0;
while($row = $stmt->fetch()){
if($i >= $start && $i < $start+$length){
//save in array; the loop itself runs 50010 times
}
$i++;
}
In this case, which one should I use?
Which one uses fewer DB resources?
which one is better?
Too vague: what is "better"?
Which one uses fewer DB resources?
You're much better off with the first approach. It's efficient to select only as much data as you need and no more. Selecting the whole table forces your script to use a lot more memory, because all of that data has to be kept live.
The best answer you'll get is: test! You can run your queries multiple times in multiple ways and see for yourself. Just use SELECT SQL_NO_CACHE... instead of the generic SELECT... to force the DB to redo the work from scratch. Measure how long it takes to run the query and process the results.
function wayOne(){
// execute your 1st query and loop through results
}
function wayTwo(){
// execute 2nd query and loop through results
}
//Measures # of milliseconds it takes to execute another function
function timeThis(callable $callback){
$start_time = microtime(true);
call_user_func($callback);
$seconds = microtime(true)-$start_time; //duration in seconds, as a float
return round($seconds*1000);//duration in milliseconds
}
$wayOneTime = timeThis('wayOne');
$wayTwoTime = timeThis('wayTwo');
You can then compare the two times. Generally (though not always) a process that takes significantly less time uses fewer resources.

PDO/MySQL memory consumption with large result set

I'm having a strange time dealing with selecting from a table with about 30,000 rows.
It seems my script is using an outrageous amount of memory for what is a simple, forward only walk over a query result.
Please note that this example is a somewhat contrived, absolute bare minimum example which bears very little resemblance to the real code and it cannot be replaced with a simple database aggregation. It is intended to illustrate the point that each row does not need to be retained on each iteration.
<?php
$pdo = new PDO('mysql:host=127.0.0.1', 'foo', 'bar', array(
PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION,
));
$stmt = $pdo->prepare('SELECT * FROM round');
$stmt->execute();
function do_stuff($row) {}
$c = 0;
while ($row = $stmt->fetch()) {
// do something with the object that doesn't involve keeping
// it around and can't be done in SQL
do_stuff($row);
$row = null;
++$c;
}
var_dump($c);
var_dump(memory_get_usage());
var_dump(memory_get_peak_usage());
This outputs:
int(39508)
int(43005064)
int(43018120)
I don't understand why 40 MB of memory is used when hardly any data needs to be held at any one time. I have already worked out that I can reduce the memory by a factor of about 6 by replacing "SELECT *" with "SELECT home, away"; however, I consider even this usage to be insanely high, and the table is only going to get bigger.
Is there a setting I'm missing, or is there some limitation in PDO that I should be aware of? I'm happy to get rid of PDO in favour of mysqli if it can not support this, so if that's my only option, how would I perform this using mysqli instead?
After creating the connection, you need to set PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false:
<?php
$pdo = new PDO('mysql:host=127.0.0.1', 'foo', 'bar', array(
PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION,
));
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
// snip
var_dump(memory_get_usage());
var_dump(memory_get_peak_usage());
This outputs:
int(39508)
int(653920)
int(668136)
Regardless of the result size, the memory usage remains pretty much static.
Another option would be to do something like:
$i = $c = 0;
$query = 'SELECT home, away FROM round LIMIT 2048 OFFSET %u;';
// codeThatFetches() stands in for your own helper that runs the query and returns the rows
while (count($rows = codeThatFetches(sprintf($query, $i++ * 2048))) > 0)
{
$c += count($rows);
foreach ($rows as $row)
{
do_stuff($row);
}
}
The whole result set (all 30,000 rows) is buffered into memory before you can start looking at it.
You should be letting the database do the aggregation and only asking it for the two numbers you need.
SELECT SUM(home) AS home, SUM(away) AS away, COUNT(*) AS c FROM round
The reality of the situation is that if you fetch all rows and expect to be able to iterate over all of them in PHP, at once, they will exist in memory.
If you really don't think using SQL powered expressions and aggregation is the solution you could consider limiting/chunking your data processing. Instead of fetching all rows at once do something like:
1) Fetch 5,000 rows
2) Aggregate/Calculate intermediary results
3) unset variables to free memory
4) Back to step 1 (fetch next set of rows)
Just an idea...
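A rough sketch of that loop against the round table from the question, assuming it has an id column to order on and that the "intermediary results" are running sums:
$chunk  = 5000;
$offset = 0;
$homeTotal = $awayTotal = 0;
do {
    // 1) fetch the next chunk of rows
    $rows = $pdo->query(sprintf(
        'SELECT home, away FROM round ORDER BY id LIMIT %d OFFSET %d',
        $chunk, $offset
    ))->fetchAll(PDO::FETCH_ASSOC);
    // 2) aggregate intermediary results
    foreach ($rows as $row) {
        $homeTotal += $row['home'];
        $awayTotal += $row['away'];
    }
    // 3) free the chunk before fetching the next one
    $count = count($rows);
    unset($rows);
    // 4) back to step 1
    $offset += $chunk;
} while ($count === $chunk);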
I haven't done this before in PHP, but you may consider fetching the rows using a scrollable cursor - see the fetch documentation for an example.
Instead of returning all the results of your query at once back to your PHP script, it holds the results on the server side and you use a cursor to iterate through them getting one at a time.
Whilst I have not tested this, it is bound to have other drawbacks such as utilising more server resources and most likely reduced performance due to additional communication with the server.
Altering the fetch style may also have an impact, as the documentation indicates that by default each row is stored both as an associative array and as a numerically indexed array, which is bound to increase memory usage.
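For example, selecting a single fetch style up front avoids the duplicated numeric and associative keys of the default PDO::FETCH_BOTH (reusing $stmt and do_stuff() from the question):
// Fetch rows as associative arrays only, instead of the default FETCH_BOTH.
$stmt->setFetchMode(PDO::FETCH_ASSOC);
while ($row = $stmt->fetch()) {
    do_stuff($row);
}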
As others have suggested, reducing the number of results in the first place is most likely a better option if possible.

Multiple MYSQL queries vs. Multiple php foreach loops

Database structure:
id galleryId type file_name description
1 `artists_2010-01-15_7c1ec` `image` `band602.jpg` `Red Umbrella Promo`
2 `artists_2010-01-15_7c1ec` `image` `nov7.jpg` `CD Release Party`
3 `artists_2010-01-15_7c1ec` `video` `band.flv` `Presskit`
I'm going to pull images out for one section of an application, videos on another, etc. Is it better to make multiple mysql queries for each section like so:
$query = mysql_query("SELECT * FROM galleries WHERE galleryId='$galleryId' && type='image'");
...Or should I be building an associative array and just looping through the array over and over whenever I need to use the result set?
Thanks for the thoughts.
It depends what's more important: readability or performance. I'd expect a single query and prefilling PHP arrays would be faster to execute, since database connections are expensive, but then a simple query for each section is much more readable.
Unless you know (and not just hope) you're going to get a huge amount of traffic I'd go for separate queries and then worry about optimising if it looks like it'll be a problem. At that point there'll be other things you'll want to do anyway, such as building a data access layer and adding some caching.
If by "sections" you mean separate single pages (separate HTTP requests) that users can view, I would suggest query-per-type as needed. If on a page where there are only image data sets, you really don't need to fetch the video data set for example. You won't be really saving much time fetching everything, since you will be connecting to the database for every page hit anyway (I assume.)
If by "sections" you mean different parts of one page, then fetch everything at once. This will save you time on querying (only one query.)
But depending on the size of your data set, you could run into trouble with PHP's memory limit querying for everything, though. You could then try raising the memory limit, but if that fails you'll probably have to fall back to query-per-type.
Using the query-per-type approach moves some of the computing load to the database server, as you will only be requesting and fetching what you really need. And you don't have to write code to filter and sort your results. Filtering and sorting is something the database is generally better at than PHP code. If at all possible, enable MySQL's query cache, that will speed up these queries much more than anything you could write in PHP.
If your data is all coming from one table, I would only do one query.
I presume you are building a single page with a section for pictures, a section for video, a section for music, etc. Write your query to return results sorted by media type, then iterate through all the pictures, then all the video, then all the music (see the sketch below).
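A sketch of that single-query approach, using mysqli (with mysqlnd, for get_result()) instead of the old mysql_* functions; the bucket structure is illustrative:
$stmt = $mysqli->prepare(
    'SELECT type, file_name, description FROM galleries WHERE galleryId = ? ORDER BY type'
);
$stmt->bind_param('s', $galleryId);
$stmt->execute();
// Bucket the rows by media type so each page section can read its own list.
$sections = [];
foreach ($stmt->get_result()->fetch_all(MYSQLI_ASSOC) as $row) {
    $sections[$row['type']][] = $row;   // e.g. $sections['image'], $sections['video']
}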
Better to have multiple queries. Every time you run a query, all the data is pulled out and loaded into memory. If you have 5 different types, each page for one type would load 5 times as much data as it needs.
Even with just one type at a time, you are probably going to want to start paginating with LIMIT/OFFSET queries fairly quickly once you have more than 100 rows, or however many you can reasonably display on one page.
It really depends,
IN operator
ini_set('memory_limit', '-1');
$start_time = microtime(true);
$startMemory = memory_get_usage();
$conn = mysqli_connect("localhost", "", "", "");
$ar = array();
$sql = "SELECT * FROM table WHERE e IN (.....)";
$result = mysqli_query($conn, $sql);
while ($row = mysqli_fetch_assoc($result)) {
$ar[$row['c']] = $row;
}
echo (memory_get_usage() - $startMemory) / 1024 / 1024, ' MB'; //1409.7124481201 MB
$end_time = microtime(true);
echo ($end_time - $start_time) . ' Seconds'; //5.2406549453735 Seconds
Foreach
ini_set('memory_limit', '-1');
$start_time = microtime(true);
$startMemory = memory_get_usage();
$conn = mysqli_connect("localhost", "", "", "");
$ar = array();
$array_loop = array(....); //list of values to query one by one
foreach($array_loop as $key => $value){
$sql = "SELECT * FROM table WHERE e = '$value'";
$result = mysqli_query($conn, $sql);
while ($row = mysqli_fetch_assoc($result)) {
$ar[$row['c']] = $row;
}
}
echo (memory_get_usage() - $startMemory) / 1024 / 1024, ' MB'; //42.773330688477 MB
$end_time = microtime(true);
echo ($end_time - $start_time) . ' Seconds'; //12.469061136246 Seconds
I noticed that the foreach approach consumes time but not memory, while the IN operator consumes memory but not time. All of the tests were run against about 1 million rows of test data generated by an SQL procedure.
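A possible middle ground (a sketch, keeping the placeholder table and columns from the snippets above, and assuming mysqlnd for get_result()) is to batch the IN list with array_chunk, so you issue a handful of queries instead of one per value, without building one enormous result set:
$batchSize = 1000;
foreach (array_chunk($array_loop, $batchSize) as $batch) {
    // One placeholder per value in this batch.
    $placeholders = implode(',', array_fill(0, count($batch), '?'));
    $stmt = $conn->prepare("SELECT * FROM table WHERE e IN ($placeholders)");
    $stmt->bind_param(str_repeat('s', count($batch)), ...$batch);
    $stmt->execute();
    foreach ($stmt->get_result()->fetch_all(MYSQLI_ASSOC) as $row) {
        $ar[$row['c']] = $row;
    }
}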
