Fastest query for processing a large amount of data in MySQL - PHP

I have a problem with a MySQL query. The query runs, but sometimes it makes PHP exceed its memory limit. I have around 80,000 to 90,000 rows of coordinates, together with engine speeds and other data, and I have to create KML files to display each route individually. Where the engine speed is non-zero, the car is moving; where it is zero, the car is stopped. About half of the engine speed values in the table are 0. While iterating through the records I also delete them, after I have built the routes array, but it runs very slowly and sometimes it runs out of memory. Is this because of the sheer amount of data in the database, or is there a logical error in my code? Here is the code:
public function getPositions($device_id) {
    $db = connect_database(DB_HOST, DB_USER, DB_PASSWORD, DB_NAME, DB_PORT);
    $sql = "SELECT * FROM coordinates_log WHERE imei=:imei ORDER BY device_time ASC";
    $statement = $db->prepare($sql);
    $statement->execute(array(':imei' => $device_id));
    $positions = array();
    $delete_sql = "DELETE FROM coordinates_log WHERE id=:id";
    $delete_statement = $db->prepare($delete_sql);
    $counter = 0;
    $flag = 0;
    while ($row = $statement->fetch(PDO::FETCH_ASSOC)) {
        // here I flag the last started route
        if ($row['vehicle_engine_speed'] <= 0) {
            $flag = $counter;
        }
        $positions[] = $row;
        $counter++;
    }
    if (!empty($positions)) {
        $last_key = count($positions) - 1;
        // here I check if the route is completed yet, or he is on his way
        if ($positions[$last_key]['vehicle_engine_speed'] != 0) {
            for ($i = $flag; $i <= $last_key; $i++) {
                unset($positions[$i]);
            }
        }
        foreach ($positions as $position) {
            $delete_statement->execute(array(':id' => $position['id']));
        }
        return $positions;
    } else {
        return FALSE;
    }
}

The PDO subsystem in PHP offers two kinds of queries: buffered and unbuffered. Buffered queries are what you get if you don't specifically request unbuffered queries. Buffered queries consume more RAM in your PHP process because PDO fetches the entire result set into RAM, then hands it back to your program one row at a time when you call $statement->fetch().
So, if your result sets are quite large and you can process them a row at a time, you will use less RAM with unbuffered mode. You process each row, then fetch the next one, without trying to hold them all in RAM at once.
Here's a writeup on unbuffered mode.
http://php.net/manual/en/mysqlinfo.concepts.buffering.php
Buffered mode is generally easier to use for programmers, because PDO reads the entire resultset from each query and implicitly closes the statement object. That leaves your connection available for the next sql statement, even if you have not yet processed all the information in your resultset. With unbuffered mode, if you want to run other mysql statements while you're processing your result set, you need another database connection to do that.
You should try unbuffered mode for your SELECT * FROM coordinates... result set.
Pro tip: If you avoid SELECT * and instead use SELECT col, col, col you probably can reduce the overhead of your queries, especially if you don't actually need all the columns.
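For illustration, here is a minimal sketch of what the unbuffered variant of the question's query could look like with the MySQL PDO driver (the latitude/longitude column names are assumptions, since only id, device_time and vehicle_engine_speed appear in the question):
// Sketch only: switch this connection to unbuffered mode.
$db->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

// Select only the columns the KML generation actually needs.
$statement = $db->prepare(
    "SELECT id, device_time, vehicle_engine_speed, latitude, longitude
     FROM coordinates_log WHERE imei = :imei ORDER BY device_time ASC"
);
$statement->execute(array(':imei' => $device_id));

while ($row = $statement->fetch(PDO::FETCH_ASSOC)) {
    // Build the route/KML data for this row here instead of collecting
    // every full row into $positions first.
}
// Note: while this unbuffered result set is open, the same connection
// cannot run other statements, so the DELETEs need a second connection.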

Questions of the "Look at my code and tell me what's wrong with it" kind are off topic here. It is not only because code is meant to be run by computers, not read by humans, but because the code itself is seldom the real source of the problem.
Before asking a question like this, you have to profile your code, determining the slowest parts as well as the memory consumption (a rough sketch of how to measure both follows at the end of this answer).
I could make some guesses, though I hate doing that:
it could be the query itself, if it is not optimized
it could be a buffering issue, with the whole result set weighing on the script's memory
it could be a SELECT * issue, bloating your resulting array with lots of junk data
it could be slow writes due to InnoDB settings
But guesswork doesn't make a good answer. You have to work out your question first.
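As a rough illustration of the kind of profiling meant above (a sketch only; $tracker->getPositions() is a hypothetical stand-in for whatever piece of code you are measuring):
$t0 = microtime(true);
$m0 = memory_get_usage();

$positions = $tracker->getPositions($imei);   // hypothetical call under test

printf("time: %.3f s, memory used: %.2f MB, peak: %.2f MB\n",
    microtime(true) - $t0,
    (memory_get_usage() - $m0) / 1048576,
    memory_get_peak_usage() / 1048576
);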

Related

Looping through mysql result set takes up lots of memory, how can I improve this

echo number_format(memory_get_usage()/1048576, 2).'<br/>';
$sql = "query";
$result = mysql_query($sql);
while ($property = mysql_fetch_object($result)) {
    continue;
}
echo number_format(memory_get_usage()/1048576, 2);
Just doing the above uses 15 MB of memory (it goes from 7 to 22 MB between the two echoes). It's a big data set.
I have tried:
$property = null;
before the continue. And I have also tried
mysql_free_result ($result);
Without success.
This probably has nothing to do with PHP. Rather, your MySQL DB is caching the results to save itself potential future reads, as it's set up to do: http://dev.mysql.com/doc/refman/5.1/en/query-cache.html
At that link there are a couple of options listed for disabling query cache.
The answer is to use the mysql_unbuffered_query function, which doesn't hold the result set in memory, unlike mysql_query.
Credit to airthomas for suggesting I look for an equivalent to mysql_use_result.
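For reference, the change amounts to a one-line swap in the legacy mysql extension (removed in PHP 7, so this is only a sketch for old code like the snippet above):
$result = mysql_unbuffered_query($sql);   // rows are streamed, not buffered in PHP
while ($property = mysql_fetch_object($result)) {
    // Process $property here; avoid collecting all rows into one big array.
}
// All rows must be fetched (or the result freed) before another query
// can be sent on this connection.
mysql_free_result($result);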

Is it safe to run my queries from an array?

I'm running a loop to collect data from an XML file; after the 2000+ lines are done, some validation rules run. If it is safe, I'd like to store the MySQL data: in the loops mentioned above I want to store all the MySQL queries in an array (almost 2000 queries) and run them in a loop after the validation rules have finished.
Is this a bad idea? If so, what would be the best approach to storing the queries and running them later?
To give you an idea of what I'm talking about, if you didn't understand:
foreach ($xml->object as $control) {
    // Stores relative xml data here
}
if (1 == 1) { // Validation rules run
    for (...) { // For loop for my queries (2000-odd iterations)
        mysql_query(...);
    }
}
To insert a lot of data into a MySQL database I would start a transaction (http://php.net/manual/en/mysqli.begin-transaction.php), then create a prepared statement, loop through all the items, validate each one straight away and execute the prepared statement. Afterwards just mark the transaction as successful and commit it. This is the fastest approach.
The use of prepared statements also prevents SQL Injection.
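A minimal sketch of that approach with mysqli (the items table, its f1/f2/f3 columns and the connection details are made up for illustration):
$mysqli = new mysqli('localhost', 'user', 'pass', 'dbname');
$mysqli->begin_transaction();

$stmt = $mysqli->prepare("INSERT INTO items (f1, f2, f3) VALUES (?, ?, ?)");
$stmt->bind_param('sss', $f1, $f2, $f3);   // bound by reference

foreach ($xml->object as $control) {
    // Validate $control here; skip the row or abort on failure.
    $f1 = (string) $control->f1;
    $f2 = (string) $control->f2;
    $f3 = (string) $control->f3;
    $stmt->execute();
}

$mysqli->commit();   // or $mysqli->rollback() if validation failed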
What you could do is use a prepared statement for a library that supports it (PDO or mysqli). In PDO:
$stmt = $pdo->prepare("INSERT INTO t1 VALUES (:f1, :f2, :f3)");
$stmt->bindParam(":f1", $f1);
$stmt->bindParam(":f2", $f2);
$stmt->bindParam(":f3", $f3);

foreach ($xml->obj as $control) {
    // create $array
}

if ($valid) {
    foreach ($array as $control) {
        $f1 = $control['f1'];
        $f2 = $control['f2'];
        $f3 = $control['f3'];
        $stmt->execute();
    }
}
There's nothing wrong with this approach, except for the fact that it uses more RAM than would be used by progressively parsing and posting the XML data records to the database. If the server running the php belongs to you that's no problem. If you're on one of those cheap USD5 per month shared hosting plans, it's possible you'll run into memory or execution time limits.
You mentioned "safety." Are you concerned that you will have to back out all the database updates if any one of them fails? If that's the case, do two things:
Don't use the MyISAM access method for your table.
Use a transaction, and when you've posted all your data base changes, commit it.
This transactional approach is good, because if your posting process fails for any reason, you can roll back the transaction and get back to your starting point.
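As a sketch of that pattern in PDO (assuming PDO::ERRMODE_EXCEPTION is set and $stmt is the prepared INSERT from the snippet above):
$pdo->beginTransaction();
try {
    foreach ($array as $control) {
        $f1 = $control['f1'];
        $f2 = $control['f2'];
        $f3 = $control['f3'];
        $stmt->execute();
    }
    $pdo->commit();   // all rows posted together
} catch (PDOException $e) {
    $pdo->rollBack();   // nothing is posted if any single insert fails
    throw $e;
}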

PDO pgsql fetch with absolute position cursor fails

I am trying to implement a paging feature using PDO's ability to create cursors.
Currently, my code looks a bit like this (very complicated, I know that):
$pdo = new PDO();
$pdo->setAttribute(PDO::ATTR_CURSOR, PDO::CURSOR_SCROLL);
// prepared select-query omitted
$pdoStatement = $pdo->execute();

$start_index = MAX_THINGS_PER_PAGE * $current_page - MAX_THINGS_PER_PAGE;
$stop_index  = MAX_THINGS_PER_PAGE * $current_page;
$row_count   = $this->statement->rowCount(); // works for the PgSQL driver

$index = $start_index;
while (($row_count > 0) && ($index < $stop_index)) {
    // try-catch block omitted
    $values[] = $this->statement->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_ABS, $index);
    --$row_count;
    ++$index;
}
However, seemingly, no matter what $start_index is, the query only fetches the first 10 (which is the value of MAX_THINGS_PER_PAGE) rows of the resultset. Always.
Probably I am doing something wrong, but the art of using cursors for pagination seems to be somewhat arcane and underdocumented...
I just ran into this problem today. PDOStatement::rowCount() does not work for SELECT on some databases. From PDOStatement::rowCount:
If the last SQL statement executed by the associated PDOStatement was a SELECT statement, some databases may return the number of rows returned by that statement. However, this behaviour is not guaranteed for all databases and should not be relied on for portable applications.
It took me quite a bit to realize this, as I thought that it was a problem with using cursors.
This is the approach I took: Replace the use of PDOStatement::rowCount() with the following:
<?php
$row_count = count($stmt->fetchAll());
?>
It is not efficient memory-wise, but this is what many databases do to calculate the total number of rows anyway.

Are Prepared Statements a waste for normal queries? (PHP)

Nowadays, "Prepared statements" seem to be the only way anyone recommends sending queries to a database. I even see recommendations to use prepared statements for stored procs. However, do to the extra query prepared statements require - and the short time they last - I'm persuaded that they are only useful for a line of INSERT/UPDATE queries.
I'm hoping someone can correct me on this, but it just seems like a repeat of the whole "Tables are evil" CSS thing. Tables are only evil if used for layouts - not tabular data. Using DIV's for tabular data is a style violation of WC3.
Like wise, plain SQL (or that generated from AR's) seems to be much more useful for 80% of the queries used, which on most sites are a single SELECT not to be repeated again that page load (I'm speaking about scripting languages like PHP here). Why would I make my over-taxed DB prepare a statement that it is only to run once before being removed?
MySQL:
A prepared statement is specific to the session in which it was created. If you terminate a session without deallocating a previously prepared statement, the server deallocates it automatically.
So at the end of your script PHP will auto-close the connection and you will lose the prepared statement, only to have your script re-create it on the next load.
Am I missing something or is this just a way to decrease performance?
:UPDATE:
It dawned on me that I am assuming new connections for each script. I would assume that if a persistent connection is used then these problems would disappear. Is this correct?
:UPDATE2:
It seems that even if persistent connections are the solution - they are not a very good option for most of the web - especially if you use transactions. So I'm back to square one having nothing more than the benchmarks below to go on...
:UPDATE3:
Most people simply repeat the phrase "prepared statements protect against SQL injection", which doesn't fully explain the problem. The provided "escape" method for each DB library also protects against SQL injection. But it is more than that:
When sending a query the normal way, the client (script) converts the data into strings that are then passed to the DB server. The DB server then uses CPU power to convert them back into the proper binary datatype. The database engine then parses the statement and looks for syntax errors.
When using prepared statements... the data are sent in a native binary form, which saves the conversion-CPU-usage, and makes the data transfer more efficient. Obviously, this will also reduce bandwidth usage if the client is not co-located with the DB server.
...The variable types are predefined, and hence MySQL takes into account these characters, and they do not need to be escaped.
http://www.webdesignforums.net/showthread.php?t=18762
Thanks to OIS for finally setting me straight on this issue.
Unlike the CSS tables debate, there are clear security implications with prepared statements.
If you use prepared statements as the ONLY way to put user-supplied data into a query, then they are absolutely bullet-proof when it comes to SQL injection.
When you execute a sql statement on the database, the sql parser needs to analyse it beforehand, which is the exact same process as the preparation.
So, comparing executing sql statements directly to preparing and executing has no disadvantages, but some advantages:
First of all, as longneck already stated, passing user input into a prepared statement escapes the input automatically. It is as if the database has prepared filters for the values and lets in only those values that fit.
Secondly, if you use prepared statements thoroughly and you come into the situation where you need to execute one multiple times, you don't need to rewrite the code to prepare and execute again; you just execute it.
Thirdly: The code becomes more readable, if done properly:
$sql = 'SELECT u.id, u.user, u.email, sum(r.points)
        FROM users u
        LEFT JOIN reputation r on (u.id=r.user_id)
        LEFT JOIN badge b on (u.id=b.user_id and badge=:badge)
        WHERE group=:group';
$params = array(
    ':group' => $group,
    ':badge' => $_GET['badge']
);
$stmt = $pdo->prepare($sql);
$result = $stmt->execute($params);
Instead of
$sql = 'SELECT u.id, u.user, u.email, sum(r.points)
        FROM users u
        LEFT JOIN reputation r on (u.id=r.user_id)
        LEFT JOIN badge b on (u.id=b.user_id and badge="'.mysql_real_escape_string($_GET['badge']).'")
        WHERE group="'.mysql_real_escape_string($group).'"';
$result = mysql_query($sql);
Imagine you had to change the sql statement, which code would be your favourite? ;-)
Prepared Statements come in handy in several situations:
Great separation of query data from untrusted user data.
Performance increase when the same query is executed multiple times
Performance increase when binary data is being transmitted as the prepared statement can use the binary protocol, whereas a traditional query will end up doing encoding and such.
There is a performance hit under normal circumstances (not repeated, no binary data) as you now have to do two back and forths. The first to "prepare" the query, and the second to transmit the token along with the data to be inserted. Most people are willing to make this sacrifice for the security benefit.
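The repeated-execution case is where the extra round trip pays for itself; a rough sketch (PDO against a hypothetical users table, assuming emulated prepares are disabled so the PREPARE really happens server-side):
$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);

// One PREPARE round trip...
$stmt = $pdo->prepare("SELECT id, email FROM users WHERE id = :id");

// ...then only the parameter values travel on each execute.
$rows = array();
foreach ($user_ids as $id) {
    $stmt->execute(array(':id' => $id));
    $rows[] = $stmt->fetch(PDO::FETCH_ASSOC);
}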
With regards to persistent connections:
MySQL has one of the fastest connection build up times on the market. It's essentially free for most set ups, so you're not going to see too much of a change using persistent connections or not.
The answer has to do with security and abstraction. Everyone else has already mentioned security, but the real upside is that your input is completely abstracted from the query itself. This allows for true database agnosticism when using an abstraction layer, whereas inlining the input is usually a database-dependent process. If you care at all about portability, prepared statements are the way to go.
In the real world, I rarely ever write DML queries. All of my INSERTS / UPDATES are automatically built by the abstraction layer and are executed by simply passing an input array. For all intents and purposes, there really is no "performance hit" for preparing queries and then executing them (save for connection latency in the initial PREPARE). But when using a UDS (Unix Domain Socket) connection, you're not going to notice (or even be able to benchmark) a difference. It's usually on the order of a few microseconds.
Given the security and abstraction upsides, I'd hardly call it wasteful.
The performance benefit doesn't come from less parsing - it comes from only having to calculate access paths once rather than repeatedly. This helps a lot when you're issuing thousands of queries.
Given mysql's very simple optimizer/planner this may be less of an issue than with a more mature database with much more sophisticated optimizers.
However, this performance benefit can actually turn into a detriment if you've got a sophisticated optimizer that is aware of data skews. In that case you can often be better off with getting a different access path for the same query using different literal values rather than reusing a preexisting path.
When sending SQL queries like SELECT x,y,z FROM foo WHERE c='mary had a little lamb' the normal way, the server has to parse the SQL statement including the data, and you have to sanitize the "mary had..." part (a call to mysql_real_escape_string() or similar for each parameter).
Using prepared statements the server has to parse the statement too, but without the data, and it sends back only an identifier for the statement (a tiny data packet). Then you send the actual data without first sanitizing it. I don't see the overhead here, though I freely admit I've never tested it. Have you? ;-)
edit: And using prepared statements can eliminate the need to convert each and every parameter (in/out) to strings. Probably even more so if your version of PHP uses mysqlnd (instead of the "old" libmysql client library). I haven't tested the performance aspect of that either.
I don't seem to be finding any good benefits to using persistent connections - or prepared statements, for that matter. Look at these numbers - for 6,000 select statements (which will never happen in a single page request!) you can barely tell the difference. Most of my pages use fewer than 10 queries.
UPDATED: I just revised my test to include 4k SELECT and 4k INSERT statements! Run it yourself and let me know if there are any design errors.
Perhaps the difference would be greater if my MySQL server wasn't running on the same machine as Apache.
Persistent: TRUE
Prepare: TRUE
2.3399310112 seconds
Persistent: FALSE
Prepare: TRUE
2.3265211582184 seconds
Persistent: TRUE
Prepare: FALSE
2.3666892051697 seconds
Persistent: FALSE
Prepare: FALSE
2.3496441841125 seconds
Here is my test code:
$hostname = 'localhost';
$username = 'root';
$password = '';
$dbname = 'db_name';

$persistent = FALSE;
$prepare = FALSE;

try {
    // Force PDO to use exceptions for all errors
    $attrs = array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION);
    if ($persistent) {
        // Make the connection persistent
        $attrs[PDO::ATTR_PERSISTENT] = TRUE;
    }
    $db = new PDO("mysql:host=$hostname;dbname=$dbname", $username, $password, $attrs);

    // What type of connection?
    print 'Persistent: '.($db->getAttribute(PDO::ATTR_PERSISTENT) ? 'TRUE' : 'FALSE').'<br />';
    print 'Prepare: '.($prepare ? 'TRUE' : 'FALSE').'<br />';

    // Clean table from last run
    $db->exec('TRUNCATE TABLE `pdo_insert`');
} catch (PDOException $e) {
    echo $e->getMessage();
}

$start = microtime(TRUE);

$name = 'Jack';
$body = 'This is the text "body"';

if ($prepare) {
    // Select
    $select = $db->prepare('SELECT * FROM pdo_insert WHERE id = :id');
    $select->bindParam(':id', $x);

    // Insert
    $insert = $db->prepare('INSERT INTO pdo_insert (`name`, `body`, `author_id`)
        VALUES (:name, :body, :author_id)');
    $insert->bindParam(':name', $name);
    $insert->bindParam(':body', $body);
    $insert->bindParam(':author_id', $x);

    $run = 0;
    for ($x = 0; $x < 4000; ++$x) {
        if ($insert->execute() && $select->execute()) {
            $run++;
        }
    }
} else {
    $run = 0;
    for ($x = 0; $x < 4000; ++$x) {
        // Insert
        if ($db->query('INSERT INTO pdo_insert (`name`, `body`, `author_id`)
                VALUES ('.$db->quote($name).', '.$db->quote($body).', '.$db->quote($x).')')
            AND
            // Select
            $db->query('SELECT * FROM pdo_insert WHERE id = '.$db->quote($x))) {
            $run++;
        }
    }
}

print (microtime(TRUE) - $start).' seconds and '.($run * 2).' queries';
Cassy is right. If you don't prepare/compile the statement, the DBMS has to do it anyway before it can run it.
Also, an advantage is that you can check the result of the prepare, and if the prepare fails your code can branch off to handle the failure without wasting DB resources on running a query that was always going to fail.

Why does this PHP code hang on calls to mysql_query()?

I'm having trouble with this PHP script where I get the error
Fatal error: Maximum execution time of 30 seconds exceeded in /var/www/vhosts/richmondcondo411.com/httpdocs/places.php on line 77
The code hangs here:
function getLocationsFromTable($table) {
    $query = "SELECT * FROM `$table`";
    if ( ! $queryResult = mysql_query($query)) return null;
    return mysql_fetch_array($queryResult, MYSQL_ASSOC);
}
and here (so far):
function hasCoordinates($houseNumber, $streetName) {
    $query = "SELECT lat,lng FROM geocache WHERE House = '$houseNumber' AND StreetName = '$streetName'";
    $row = mysql_fetch_array(mysql_query($query), MYSQL_ASSOC);
    return ($row) ? true : false;
}
both on the line with the mysql_query() call.
I know I use different styles for each code snippet; that's because I've been playing with the first one, trying to isolate the issue.
The $table in the first example is 'school' which is a table which definitely exists.
I just don't know why it sits there and waits to time out instead of throwing an error back at me.
The mysql queries from the rest of the site are working properly. I tried making sure I was connected like this
//connection was opened like this:
//$GLOBALS['dbh']=mysql_connect ($db_host, $db_user, $db_pswd) or die ('I cannot connect to the database because: ' . mysql_error());
if( ! $GLOBALS['dbh']) return null;
and it made it past that fine. Any ideas?
Update
It's not the size of the tables. I tried getting only five records and it still timed out. Also, with this line:
$query = "SELECT lat,lng FROM geocache WHERE House = '$houseNumber' AND StreetName = '$streetName'";
it is only looking for one specific record and this is where it's hanging now.
It sounds like MySQL is busy transmitting valid data back to PHP, but there's so much of it that there isn't time to finish the process before Apache shuts down the PHP process for exceeding its maximum execution time.
Is it really necessary to select everything from that table? How much data is it? Are there BLOB or TEXT columns that would account for particular lag?
Analyzing what's being selected and what you really need would be a good place to start.
Time spent waiting for mysql queries to return data does not count towards the execution time. See here.
The problem is most likely somewhere else in the code - the functions that you are blaming are possibly called in an infinite loop. Try commenting out the mysql code to see if I'm right.
Does your code timeout trying to connect or does it connect and hang on the query?
If your code actually gets past the mysql_query call (even if it has to wait a long time to timeout) then you can use the mysql_error function to determine what happened:
mysql_query("SELECT * FROM table");
echo mysql_errno($GLOBALS['dbh']) . ": " . mysql_error($GLOBALS['dbh']) . "\n";
Then, you can use the error number to determine the detailed reason for the error: MySQL error codes
If your code is hanging on the query, you might try describing and running the query in a mysql command line client to see if it's a data size issue. You can also increase the maximum execution time to allow the query to complete and see what's happening:
ini_set('max_execution_time', 300); // Allow 5 minutes for execution
I don't know the size of your table, but try using LIMIT 10 and see if it still hangs.
It might be that your table is just too big to fetch in one query.
Unless the parameters $houseNumber and $streetName for hasCoordinates() are already sanitized for the MySQL query (very unlikely), you need to treat them with mysql_real_escape_string() to prevent (intentional or unintentional) SQL injection. For mysql_real_escape_string() to work properly (e.g. if you have changed the charset via mysql_set_charset) you should also pass the MySQL connection resource to the function.
Is error reporting set to E_ALL, and do you look at the webserver's error log (or have display_errors=On set)?
Try this
function hasCoordinates($houseNumber, $streetName) {
    $houseNumber = mysql_real_escape_string($houseNumber);
    $streetName = mysql_real_escape_string($streetName);
    $query = "
        EXPLAIN SELECT
            lat, lng
        FROM
            geocache
        WHERE
            House='$houseNumber'
            AND StreetName='$streetName'
    ";
    $result = mysql_query($query) or die(mysql_error());
    while ( false !== ($row = mysql_fetch_array($result, MYSQL_ASSOC)) ) {
        echo htmlspecialchars(join(' | ', $row)), "<br />\n";
    }
    die;
}
and refer to http://dev.mysql.com/doc/refman/5.0/en/using-explain.html to interpret the output.
-If you upped the execution time to 300 and it still ran through those 300 seconds, I think that by definition you've got something like an infinite loop going.
-My first suspect would be your PHP code, since MySQL is used to dealing with large sets of data, so definitely make sure that you're actually reaching the MySQL query in question (die right before it with an error message or something).
-If that works, then try actually running that query with known data on your database via some database GUI, or via command-line access to the database if you have that, or by replacing the variables in the code with known good values if you don't.
-If the query works on its own, then I would check for accidental SQL injection coming from the $houseNumber or $streetName variables, as VolkerK mentioned.
