PDO DELETE unexpectedly slow when working with millions of rows - php

I'm working with a MyISAM table that has about 12 million rows. A method is used to delete all records older than a specified date. The table is indexed on the date field. When run in code, the log shows that this takes about 13 seconds when there are no records to delete and about 25 seconds when there is one day's worth of records. When the same query is run in the mysql client (taking the query from SHOW PROCESSLIST while the code is running), it takes no time at all for no records, and about 16 seconds for a day's records.
The real-life problem is that this is taking a long time when there are records to delete when run once a day, so running it more often seems logical. But I'd like it to exit as quickly as possible when there is nothing to do.
Method extract:
try {
    $smt = DB::getInstance()->getDbh()->prepare("DELETE FROM " . static::$table . " WHERE dateSent < :date");
    $smt->execute(array(':date' => $date));
    return true;
} catch (\PDOException $e) {
    // Some logging here removed to ensure a clean test
}
Log results when 0 rows for deletion:
[debug] ScriptController::actionDeleteHistory() success in 12.82 seconds
mysql client when 0 rows for deletion:
mysql> DELETE FROM user_history WHERE dateSent < '2013-05-03 13:41:55';
Query OK, 0 rows affected (0.00 sec)
Log results when one day's records are up for deletion:
[debug] ScriptController::actionDeleteHistory() success in 25.48 seconds
mysql client when one day's records are up for deletion:
mysql> DELETE FROM user_history WHERE dateSent < '2013-05-05 13:41:55';
Query OK, 672260 rows affected (15.70 sec)
Is there a reason why PDO is slower?
Cheers.
Responses to comments:
It's the same query on both, so the index is either being picked up or it's not. And it is.
EXPLAIN SELECT * FROM user_history WHERE dateSent < '2013-05-05 13:41:55';
+----+-------------+--------------+-------+---------------+-----------+---------+------+------+-------------+
| id | select_type | table        | type  | possible_keys | key       | key_len | ref  | rows | Extra       |
+----+-------------+--------------+-------+---------------+-----------+---------+------+------+-------------+
|  1 | SIMPLE      | user_history | range | date_sent     | date_sent | 4       | NULL |    4 | Using where |
+----+-------------+--------------+-------+---------------+-----------+---------+------+------+-------------+
MySQL and Apache are running on the same server for the purposes of this test. If you're getting at an issue of load, then mysql does hit 100% for the 13 seconds on the in-code query. On the mysql client query, it doesn't get a chance to register on top before the query is complete. I can't see how this is not something that PHP/PDO is adding to the equation, but I'm open to all ideas.
:date is a PDO placeholder, and the fieldname is dateSent so there is no conflict with mysql keywords. Still, using :dateSent instead still causes the delay.
Also already tried without using placeholders, but I neglected to mention this, so good call, thanks! Along the lines of this. Still the same delay with PHP/PDO.
DB::getInstance()->getDbh()->query("DELETE FROM user_history WHERE dateSent < '2013-05-03 13:41:55'")
And using placeholders in mysql client still shows no delay:
PREPARE test FROM 'DELETE FROM user_history WHERE dateSent < ?';
SET @datesent = '2013-05-05 13:41:55';
EXECUTE test USING @datesent;
Query OK, 0 rows affected (0.00 sec)
It's a MYISAM table so no transactions involved on this one.
The value of $date differs depending on whether I'm testing for no deletions or one day's deletions, as shown in the queries run in the mysql client, which were taken from SHOW PROCESSLIST while the code was running. In this case it is not passed to the method and is derived from:
if (!isset($date)) {
    $date = date("Y-m-d H:i:s", strtotime(sprintf("-%d days", self::DELETE_BEFORE)));
}
And at this point the table schema may get called into question, so:
CREATE TABLE IF NOT EXISTS `user_history` (
  `userId` int(11) NOT NULL,
  `asin` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
  `dateSent` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`userId`,`asin`),
  KEY `date_sent` (`dateSent`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
It's a decent sized website with lots of DB calls throughout. I see no evidence in the way the site performs in any other respect that suggests it is down to dodgy routing. Especially as I see this query on SHOW PROCESSLIST slowly creeping its way up to 13 seconds when run in PHP/PDO, but it takes no time at all when run in mysql (particularly referring to where no records are to be deleted which takes 13 seconds in PHP/PDO only).
Currently it is only this particular DELETE query that is in question. But I don't have another bulk DELETE statement like this anywhere else in this project, or any other project of mine that I can think of. So the question is particular to PDO DELETE queries on big-ish tables.
"Isn't that your answer then?" - No. The question is why does this take significantly longer in PHP/PDO compared to mysql client. The SHOW PROCESSLIST only shows this query taking time in PHP/PDO (for no records to be deleted). It takes no time at all in mysql client. That's the point.
Tried the PDO query without the try-catch block, and there is still a delay.
And trying with mysql_* functions shows the same timings as with using the mysql client directly. So the finger is pointing quite strongly at PDO right now. It could be my code that interfaces with PDO, but as no other queries have an unexpected delay, this seems less likely:
Method:
$conn = mysql_connect(****);
mysql_select_db(****);
$query = "DELETE FROM " . static::$table . " WHERE dateSent < '$date'";
$result = mysql_query($query);
Logs for no records to be deleted:
Fri May 17 15:12:54 [verbose] UserHistory::deleteBefore() query: DELETE FROM user_history WHERE dateSent < '2013-05-03 15:12:54'
Fri May 17 15:12:54 [verbose] UserHistory::deleteBefore() result: 1
Fri May 17 15:12:54 [verbose] ScriptController::actionDeleteHistory() success in 0.01 seconds
Logs for one day's records to be deleted:
Fri May 17 15:14:24 [verbose] UserHistory::deleteBefore() query: DELETE FROM user_history WHERE dateSent < '2013-05-07 15:14:08'
Fri May 17 15:14:24 [verbose] UserHistory::deleteBefore() result: 1
Fri May 17 15:14:24 [debug] ScriptController::apiReturn(): {"message":true}
Fri May 17 15:14:24 [verbose] ScriptController::actionDeleteHistory() success in 15.55 seconds
And tried again avoiding calls to the DB singleton by creating a PDO connection in the method and using that, and this has the delay once again. There are no other delays with other queries, which all use the same DB singleton, so it was worth a try, but I didn't really expect any difference:
$connectString = sprintf('mysql:host=%s;dbname=%s', '****', '****');
$dbh = new \PDO($connectString, '****', '****');
$dbh->exec("SET CHARACTER SET utf8");
$dbh->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
$smt = $dbh->prepare("DELETE FROM " . static::$table . " WHERE dateSent < :date");
$smt->execute(array(':date' => $date));
Calling method with time logger:
$startTimer = microtime(true);
$deleted = $this->apiReturn(array('message' => UserHistory::deleteBefore()));
$timeEnd = microtime(true) - $startTimer;
Logger::write(LOG_VERBOSE, "ScriptController::actionDeleteHistory() success in " . number_format($timeEnd, 2) . " seconds");
Added PDO::ATTR_EMULATE_PREPARES to DB::connect(). Still has the delay when deleting no records at all. I've not used this before, but it looks like the right format:
$this->dbh->setAttribute(\PDO::ATTR_EMULATE_PREPARES, false);
Current DB::connect(), though if there were general issues with this, surely it would affect all queries?
public function connect($host, $user, $pass, $name)
{
    $connectString = sprintf('mysql:host=%s;dbname=%s', $host, $name);
    $this->dbh = new \PDO($connectString, $user, $pass);
    $this->dbh->exec("SET CHARACTER SET utf8");
    $this->dbh->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
}
The indexes are shown above in the schema. If it was directly related to rebuilding the indexes after the deletion of the record, then mysql would take the same time as PHP/PDO. It doesn't. This is the issue. It's not that this query is slow - it's expected to take some time. It's that PHP/PDO is noticeably slower than queries executed in the mysql client or queries that use the mysql lib in PHP.
PDO::MYSQL_ATTR_USE_BUFFERED_QUERY tried, but there is still a delay.
DB is a standard singleton pattern. DB::getInstance()->getDbh() returns the PDO connection object created in the DB::connect() method shown above, eg: DB::dbh. I believe I've proved that the DB singleton is not an issue as there is still a delay when creating the PDO connection in the same method as the query is executed (6 edits above).
I've found what is causing it, but right this minute I don't know why it's happening.
I've created a test SQL that creates a table with 10 million random rows in the right format, and a PHP script that runs the offending query. And it takes no time at all in PHP/PDO or mysql client. Then I change the DB collation from the default latin1_swedish_ci to utf8_unicode_ci and it takes 10 seconds in PHP/PDO and no time at all in mysql client. Then I change it back to latin1_swedish_ci and it takes no time at all in PHP/PDO again.
Tada!
Now if I remove this from the DB connection, it works fine in either collation. So there is some sort of problem here:
$dbh->exec("SET CHARACTER SET utf8");
I shall research more, then follow up later.

So...
This post explains where the flaw was.
Is "SET CHARACTER SET utf8" necessary?
Essentially, it was the use of:
$this->dbh->exec("SET CHARACTER SET utf8");
which should have been this in DB::connect()
$this->dbh->exec("SET NAMES utf8");
My fault entirely.
It seems to have had such dire effects because it forced the MySQL server to convert the query to match the collation of the DB. The above post gives much better details than I can.
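For what it's worth, since PHP 5.3.6 the charset can also be declared directly in the PDO DSN, which removes the need for any SET NAMES / SET CHARACTER SET statement at all. A minimal sketch, with placeholder host, database name, and credentials:

```php
<?php
// Declaring the charset in the DSN (PHP >= 5.3.6) makes the driver
// negotiate utf8 during the handshake, so the connection charset matches
// the table collation without any extra statement after connecting.
// 'localhost', 'mydb', 'user' and 'pass' below are placeholders.
$dsn = sprintf('mysql:host=%s;dbname=%s;charset=utf8', 'localhost', 'mydb');

// Then connect as usual (commented out here since it needs a live server):
// $dbh = new \PDO($dsn, 'user', 'pass');
// $dbh->setAttribute(\PDO::ATTR_ERRMODE, \PDO::ERRMODE_EXCEPTION);
```

On PHP versions older than 5.3.6 the charset DSN parameter is silently ignored, in which case running SET NAMES utf8 after connecting is still the way to go.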
If anyone has the need to confirm my findings, this series of SQL queries will set up a test DB and allow you to check for yourself. Just make sure that the indexes are correctly enabled after the test data has been entered, because I had to drop and re-add them for some reason. It creates 10 million rows; maybe fewer will be enough to prove the point.
DROP DATABASE IF EXISTS pdo_test;
CREATE DATABASE IF NOT EXISTS pdo_test;
USE pdo_test;
CREATE TABLE IF NOT EXISTS test (
  `userId` int(11) NOT NULL,
  `asin` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
  `dateSent` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`userId`,`asin`),
  KEY `date_sent` (`dateSent`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
drop procedure if exists load_test_data;
delimiter #
create procedure load_test_data()
begin
    declare v_max int unsigned default 10000000;
    declare v_counter int unsigned default 0;
    while v_counter < v_max do
        INSERT INTO test (userId, asin, dateSent)
        VALUES (FLOOR(1 + RAND()*10000000), SUBSTRING(MD5(RAND()) FROM 1 FOR 10), NOW());
        set v_counter = v_counter + 1;
    end while;
end #
delimiter ;
ALTER TABLE test DISABLE KEYS;
call load_test_data();
ALTER TABLE test ENABLE KEYS;
# Tests - reconnect to mysql client after each one to reset previous CHARACTER SET
# Right collation, wrong charset - slow
SET CHARACTER SET utf8;
ALTER DATABASE pdo_test COLLATE='utf8_unicode_ci';
DELETE FROM test WHERE dateSent < '2013-01-01 00:00:00';
# Wrong collation, no charset - fast
ALTER DATABASE pdo_test COLLATE='latin1_swedish_ci';
DELETE FROM test WHERE dateSent < '2013-01-01 00:00:00';
# Right collation, right charset - fast
SET NAMES utf8;
ALTER DATABASE pdo_test COLLATE='utf8_unicode_ci';
DELETE FROM test WHERE dateSent < '2013-01-01 00:00:00';

Try to ANALYZE and OPTIMIZE the tables:
http://dev.mysql.com/doc/refman/5.5/en/optimize-table.html
http://dev.mysql.com/doc/refman/5.5/en/analyze-table.html

Related

How to alter an EVENT in mysql and change what is run

My DBA set up many events that run SQL code thousands of lines long. Now that he is gone, how do I change the EVENT's DO statement when the code that is run is thousands of lines long?
I have tried doing it in Workbench, but it just runs the thousands of lines of code.
I have tried PHP PDO to run an ALTER EVENT 'name'
DO :placeholder
I have tried mysqli,
all with standard bind errors. I have been able to insert the code into a text field in another table, so I do know the PHP PDO works. Is there a SQL command to copy this field into the event?
You can put a lot between the backticks, but it is not recommended:
$sql = "ALTER EVENT myschema.`my | event`
ON SCHEDULE AT CURRENT_TIMESTAMP + INTERVAL 1 HOUR
DO
UPDATE myschema.mytable SET mycol = mycol + 1;";
ALTER EVENT isn't supported by prepared statements with parameters; you can only use string replacement. But then you must use backticks for identifiers and apostrophes for strings:
$sql = "ALTER EVENT myschema.`$myevent`
ON SCHEDULE AT CURRENT_TIMESTAMP + INTERVAL $mytime HOUR
DO
UPDATE myschema.`$mytable` SET `$mycol` = `$mycol` + $myaddnumber, `$mycol2` = '$mystring';";
You can use this with PDO or mysqli, and PHP will replace the variables.
The solution I found that works is:
DROP EVENT IF EXISTS databasename.processes_name;
DELIMITER $$
CREATE EVENT IF NOT EXISTS databasename.processes_name
ON SCHEDULE EVERY 1 DAY STARTS '2018-02-05 08:00:00'
ON COMPLETION PRESERVE ENABLE
DO BEGIN
*thousands of lines of SQL code *
END$$
DELIMITER ;
That is it. I was missing the BEGIN and END part.

PHP 5.5's (MS) SQL server functions not working as expected

My sys admin is upgrading my PHP server from 5.2 to 5.5. As a result, the mssql family of functions is gone and I have to update my code (to either the odbc functions or the sqlsrv functions). Unfortunately, neither seems to work correctly for anything beyond simple queries.
I've reduced one of the problematic queries down to the following two variants (middle line added is the only change):
IF OBJECT_ID('tempdb..#i') IS NOT NULL BEGIN DROP TABLE #i END
SELECT 'value' as test
IF OBJECT_ID('tempdb..#i') IS NOT NULL BEGIN DROP TABLE #i END
CREATE TABLE #i (id INT primary key) INSERT INTO #i SELECT 405782
SELECT 'value' as test
When I try them in SQL Server Management Studio, both work fine and return one row. When I try the first one from PHP, it works fine and returns one row. However, when I try to execute the second query from PHP, I get an unexpected result:
$SQL_query = '********'; // Second query
$serverName = '**********';
$connectionInfo = array("Database"=>"*****", "UID"=>"********", "PWD"=>"*********");
$conn = sqlsrv_connect($serverName, $connectionInfo);
$msg1 = sqlsrv_errors(SQLSRV_ERR_ALL); // ""
if ($conn) { // truthy
    $result = sqlsrv_query($conn, $SQL_query);
    if (sqlsrv_has_rows($result)) { $rows = 'true'; } else { $rows = 'false'; } // false
    $msg2 = sqlsrv_errors(SQLSRV_ERR_ALL); // ""
    $row = sqlsrv_fetch_array($result, SQLSRV_FETCH_NUMERIC); // false
}
(The odbc functions were even worse, choking if the query contained a SET @var statement...)
So the result of the query is incorrect, but no errors are reported.
Can anyone explain this? You'd think if the range of queries that could be handled by these functions was somehow limited that it would be at least mentioned in passing in the PHP documentation for these functions.
For reference: Microsoft SQL Server Standard Edition 9.00.1406.00, PHP 5.5.19 x86 thread safety disabled, running on Windows.
Edit: Per Rob Farley's suggestion, I've confirmed that the @@OPTIONS are either identical or immaterial to reproducing the problem.
Depending on what driver you're using (FreeTDS?), you could be finding that ANSI_NULLS (and other ANSI settings, such as QUOTED_IDENTIFIER) are set differently from what is expected, and this can affect things such as the creation of tables. Try passing in a query that tests those values, and you will quite probably find the problem.
The problem is that the client framework is confused by the row counts that are returned by default for every statement. When the first INSERT happens, it returns a row count of 1, making it appear as if this is a statement returning results, which it's not -- the row count may even get confused for the result itself. PHP isn't the only technology troubled by this; jTDS is another driver that has trouble (in the case of jTDS, it requires the row count). SET NOCOUNT ON suppresses this extra information:
SET NOCOUNT ON
IF OBJECT_ID('tempdb..#i') IS NOT NULL BEGIN DROP TABLE #i END
CREATE TABLE #i (id INT primary key)
INSERT INTO #i SELECT 405782
SELECT 'value' as test

SQL query takes less time in MYSQL, take more in PHP application

I have a MyISAM table and I wrote a full-text SQL query.
When I execute this query directly, it takes < 0.53 seconds.
But when I execute it from the PHP + MySQL application, it takes more than 1-2 minutes.
Query:
select concat(first_name, ' ', last_name) as `cand_full_name`, email
from resumes WHERE MATCH (thesis_text) AGAINST ('-java -j2ee -oracle -mysql -software' IN BOOLEAN MODE)
Update 1:
I am using mysqli_query($this->dbconnection, $this->query) in query
Update 2:
PHP Code http://codepad.org/qufZBC16
Note :
I am not running any other sql query on this page.
Any idea why it's taking more time?
It's very likely to be a charset issue. Set the charset after connecting to speed up the query:
http://php.net/manual/en/mysqli.set-charset.php#refsect1-mysqli.set-charset-description
I'm not sure what charset your data is stored in.
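A minimal sketch of what that looks like with the mysqli functions the question is using; the connection values are placeholders, and $sql is assumed to be the MATCH ... AGAINST query from the question:

```php
<?php
// Set the connection charset through mysqli_set_charset() right after
// connecting, rather than with a "SET CHARACTER SET ..." query, so the
// client charset matches the charset the data is stored in. Connection
// values here are placeholders -- this needs a live server to run.
// $link = mysqli_connect('localhost', 'user', 'pass', 'mydb');
// if (!$link) {
//     die('Connect failed: ' . mysqli_connect_error());
// }
// if (!mysqli_set_charset($link, 'utf8')) {
//     die('Unable to set charset: ' . mysqli_error($link));
// }
// $result = mysqli_query($link, $sql); // the MATCH ... AGAINST query
```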

lots of queries - slow?

To begin with, I apologize if this has been asked already, I could not find anything at least.
Anyway, I'm going to run a cron task every 5 minutes. The script loads 79 external pages, where each page contains ~200 values I need to check in the database (in total, say 15,000 values). 100% of the values will be checked for existence in the database, and if they exist (say 10% do) I will use an UPDATE query.
Both queries are very basic, no joins etc. It's the first time I've used cron, and I'm already assuming I will get the response "don't use cron for that", but my host doesn't allow daemons.
The query goes as:
SELECT `id`, `date` FROM `users` WHERE `name` = xxx
And if there was a match, it will use an UPDATE query (sometimes with additional values).
The question is, will this overload my mysql server? If yes, what are the suggested methods? I'm using PHP if that matters.
If you are just checking the same query over and over, there are a few options. Off the top of my head, you can use WHERE name IN ('xxx','yyy','zzz','aaa','bbb'...etc). Other than that, you could possibly do a file import into another table and probably run one query to do an insert/update.
Update:
//This is what I'm assuming your data looks like after loading/parsing all the pages.
//if not, it should be similar.
$data = array(
'server 1'=>array('aaa','bbb','ccc'),
'server 2'=>array('xxx','yyy','zzz'),
'server 3'=>array('111','222', '333'));
//where the key is the name of the server and the value is an array of names.
//I suggest using a transaction for this.
mysql_query("SET AUTOCOMMIT=0");
mysql_query("START TRANSACTION");
//update online to 0 for all. This is why you need transactions. You will set online=1
//for all online below.
mysql_query("UPDATE `table` SET `online`=0");
foreach ($data as $serverName => $names) {
    $sql = "UPDATE `table` SET `online`=1,`server`='{$serverName}' WHERE `name` IN ('" . implode("','", $names) . "')";
    $result = mysql_query($sql);
    //if the query failed, rollback all changes
    if (!$result) {
        mysql_query("ROLLBACK");
        die("Mysql error with query: $sql");
    }
}
mysql_query("COMMIT");
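If the script is ever moved off the old mysql_* functions, the same batched UPDATE can be done with PDO and bound placeholders. A sketch, assuming $pdo is an already-connected PDO instance and the table/column names are the ones from the answer above:

```php
<?php
// Build one "?" placeholder per name so the whole batch can be bound
// safely instead of interpolating strings into the SQL.
// e.g. buildInClause(array('aaa', 'bbb', 'ccc')) returns "(?,?,?)"
function buildInClause(array $names) {
    return '(' . implode(',', array_fill(0, count($names), '?')) . ')';
}

// Usage inside the per-server loop (assumes $pdo, $serverName, $names):
// $sql = "UPDATE `table` SET `online`=1, `server`=? WHERE `name` IN "
//      . buildInClause($names);
// $stmt = $pdo->prepare($sql);
// $stmt->execute(array_merge(array($serverName), $names));
```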
About MySQL and lots of queries:
If you have enough rights on this server - you may try to increase Query Cache.
You can do it in SQL or in mysql config file.
http://dev.mysql.com/doc/refman/5.1/en/query-cache-configuration.html
mysql> SET GLOBAL query_cache_size = 1000000;
Query OK, 0 rows affected (0.04 sec)
mysql> SHOW VARIABLES LIKE 'query_cache_size';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| query_cache_size | 999424 |
+------------------+--------+
1 row in set (0.00 sec)
Task scheduler in MySQL:
If your updates can work solely on data stored in the database (no PHP variables involved), consider using an EVENT in MySQL instead of running SQL scripts from PHP.

SQL Server Query Slow from PHP, but FAST from SQL Mgt Studio - WHY?

I have a fast-running query (sub 1 sec) when I execute it in SQL Server Mgt Studio, but when I run the exact same query from PHP (on the same db instance) using FreeTDS v8 and mssql_query(), it takes much longer (70+ seconds).
The tables I'm hitting have an index on a date field that I'm using in the Where clause.
Could it be that PHP's mssql functions aren't utilizing the index?
I have also tried putting the query inside a stored procedure, then executing the SP from PHP - the same results in time difference occurs.
I have also tried adding a WITH ( INDEX( .. ) ) clause on the table where that has the date index, but no luck either.
Here's the query:
SELECT
1 History,
h.CUSTNMBR CustNmbr,
CONVERT(VARCHAR(10), h.ORDRDATE, 120 ) OrdDate,
h.SOPNUMBE OrdNmbr,
h.SUBTOTAL OrdTotal,
h.CSTPONBR PONmbr,
h.SHIPMTHD Shipper,
h.VOIDSTTS VoidStatus,
h.BACHNUMB BatchNmbr,
h.MODIFDT ModifDt
FROM SOP30200 h
WITH (INDEX (AK2SOP30200))
WHERE
h.SOPTYPE = 2 AND
h.DOCDATE >= DATEADD(dd, -61, GETDATE()) AND
h.VOIDSTTS = 0 AND
h.MODIFDT = CONVERT(VARCHAR(10), DATEADD(dd, -1*@daysAgo, GETDATE()) , 120 )
;
What settings are on? Usually ARITHABORT is the culprit: it is ON in SSMS, but you might be connecting with it OFF.
Run this in SSMS while you are running your query and see what the first column is for the session that is connected from PHP:
select arithabort,* from sys.dm_exec_sessions
where session_id > 50
Run the SQL Profiler, and set up a trace and see if there are any differences between the two runs.
Using the LOGIN EVENT (and EXISTING CONNECTION) in SQL Profiler with the Text column will show the connection settings of a lot of important SET commands--Arithabort, Isolation Level, Quoted Identifier, and others. Compare and contrast these between the fast and slow connections to see if anything stands out.
SET ARITHABORT ON; in your session, might improve query performance.
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-arithabort-transact-sql?view=sql-server-ver16
Always set ARITHABORT to ON in your logon sessions. Setting ARITHABORT to OFF can negatively impact query optimization, leading to performance issues.
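If the PHP connection does turn out to have it OFF, one option is to set it per session before the slow query runs. A sketch using the mssql_* functions the question uses; $conn and $slowQuery are assumed to already exist, and this needs a live SQL Server to run:

```php
<?php
// Force ARITHABORT ON for this session before running the slow query,
// so the connection compiles the query with the same SET options SSMS
// uses. $conn is assumed to come from mssql_connect(), and $slowQuery
// is the query from the question.
// mssql_query("SET ARITHABORT ON", $conn);
// $result = mssql_query($slowQuery, $conn);
```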
