How to update thousands of rows in mysql database

I'm trying to update 100,000 rows in my database. The following code should do that, but I always get an error:
Error: Commands out of sync; you can't run this command now
Because it is an update I don't need the result and just want to get rid of them. The $count variable is used so that my database gets chunks of updates instead of one big update. (One big update is not working because of some limitations of the database).
I tried a lot of different things like mysqli_free_result and so on... nothing worked.
global $mysqliObject;
$count = 0;
$statement = "";
foreach ($songsArray as $song) {
$id = $song->getId();
$treepath = $song->getTreepath();
$statement = $statement."UPDATE songs SET treepath='".$treepath."' WHERE id=".$id."; ";
$count++;
if ($count > 10000){
$result = mysqli_multi_query($mysqliObject, $statement);
if(!$result) {
die('<br/><br/>Error1: ' . mysqli_error($mysqliObject));
}
$count = 0;
$statement = "";
}
}

Using a prepared query will reduce the CPU load in the mysqld process, as DaveRandom and StevenVI suggest. However, in this case I doubt that using prepared queries will materially affect your runtime. The challenge is that you are attempting to update 100K rows in the songs table, and this is going to involve a lot of physical I/O on your disk subsystem. It is these physical delays (say ~10 ms per PIO) that will dominate the runtime. Factors such as what is contained in each row and how many indexes you have on the table (especially those that involve treepath) will all blend into this mix.
The actual CPU costs of preparing a simple statement like
UPDATE songs SET treepath="some treepath" WHERE id=12345;
will be lost in this overall physical I/O delay, and the relative size of this will materially depend on the nature of the physical subsystem where you are storing your data: a single SATA disk; SSD; some NAS with large caches and SSD support ...
You need to rethink your overall strategy here, especially if you are also using the songs table at the same time as a resource for interactive requests through a web front-end. Updating 100K rows is going to take some time -- less if you are updating 100K out of 100K in storage order, since this will be more aligned with the MYD organisation and the write-through caching will be better; more if you are updating 100K rows in random order out of 1M rows, where the number of PIOs will be a lot higher.
When you are doing this, the overall performance of your D/B is going to degrade badly.
Do you want to minimise impact on parallel use of your DB or are you just trying to do this as dedicated batch operation with other services offline?
Is your goal to minimise the total elapsed time, to keep it reasonably short subject to some overall impact constraint, or even just to complete without dying?
I suggest that you've got two sensible approaches: (i) do this as a proper batch activity with the D/B offline to other services. In this case you probably want to take out a lock on the table, and bracket the updates with ALTER TABLE ... DISABLE/ENABLE KEYS. (ii) do this as a trickle update with far smaller update sets and a delay between each set to allow the D/B to flush to disk.
Whatever you choose, I'd drop the batch size. The multi_query essentially optimises the RPC overheads involved in calling the out-of-process mysqld. A batch of, say, 10 cuts this by 90%. You've got diminishing returns after this -- especially since the updates will be physical-I/O intensive.
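For what it's worth, the "Commands out of sync" error in the question is what mysqli reports when a new command is sent while results from an earlier mysqli_multi_query call are still pending; every statement in the batch produces a result that has to be consumed first. A minimal sketch of a smaller-batch version, assuming PHP 7+ with the mysqli extension. chunkStatements and runBatch are hypothetical helper names, not part of mysqli:

```php
<?php
// Hypothetical helper: group single statements into multi-statement strings.
function chunkStatements(array $statements, int $size): array
{
    $batches = [];
    foreach (array_chunk($statements, $size) as $chunk) {
        $batches[] = implode('; ', $chunk);
    }
    return $batches;
}

// Hypothetical helper: run one batch and consume every pending result,
// so the connection is ready for the next command.
function runBatch(mysqli $db, string $sql): void
{
    if (!$db->multi_query($sql)) {
        throw new RuntimeException('Batch failed: ' . $db->error);
    }
    do {
        // UPDATE statements return no result set, but a SELECT would.
        if ($result = $db->store_result()) {
            $result->free();
        }
    } while ($db->more_results() && $db->next_result());
    // A statement failing in the middle of the batch surfaces here.
    if ($db->errno) {
        throw new RuntimeException('Statement failed: ' . $db->error);
    }
}
```

With the loop from the question, the UPDATE strings would be collected into an array, split with chunkStatements($updates, 10), and each batch handed to runBatch. Statements built by string concatenation still need their values escaped.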

Try this code using prepared statements:
// Create a prepared statement
$query = "
    UPDATE `songs`
    SET `treepath` = ?
    WHERE `id` = ?
";
$stmt = $GLOBALS['mysqliObject']->prepare($query); // Global variables = bad
// Loop over the array
foreach ($songsArray as $key => $song) {
    // Get data about this song
    $id = $song->getId();
    $treepath = $song->getTreepath();
    // Bind data to the statement
    $stmt->bind_param('si', $treepath, $id);
    // Execute the statement
    $stmt->execute();
    // Check for errors
    if ($stmt->errno) {
        echo '<br/><br/>Error: Key ' . $key . ': ' . $stmt->error;
        break;
    } else if ($stmt->affected_rows < 1) {
        echo '<br/><br/>Warning: No rows affected by object at key ' . $key;
    }
    // Reset the statement
    $stmt->reset();
}
// We're done, close the statement
$stmt->close();

I'd do something like this:
$link = mysqli_connect('host');
if ( $stmt = mysqli_prepare($link, "UPDATE songs SET treepath=? WHERE id=?") ) {
    foreach ($songsArray as $song) {
        $id = $song->getId();
        $treepath = $song->getTreepath();
        // Bind both parameters in a single call; the type string must cover every placeholder
        mysqli_stmt_bind_param($stmt, 'si', $treepath, $id); // Assuming treepath is a string...
        mysqli_stmt_execute($stmt);
    }
    mysqli_stmt_close($stmt);
}
mysqli_close($link);
Or, of course, use your normal mysql_query calls, but enclosed in a transaction.

I found another way...
Since this is not a production server - the fastest way to update 100k rows is by deleting all of them and inserting 100k from scratch with the new calculated values. It seems a little bit odd to delete everything and insert everything instead of updating but it is WAYYY faster.
Before: hours Now: seconds!

I would suggest locking the table and disabling the keys before executing multiple updates.
This keeps the database engine from stalling (at least in my case, a 300,000-row update).
LOCK TABLES `TBL_RAW_DATA` WRITE;
/*!40000 ALTER TABLE `TBL_RAW_DATA` DISABLE KEYS */;
UPDATE TBL_RAW_DATA SET CREATION_DATE = ADDTIME(CREATION_DATE,'01:00:00') WHERE ID_DATA >= 1359711;
/*!40000 ALTER TABLE `TBL_RAW_DATA` ENABLE KEYS */;
UNLOCK TABLES;

Related

optimizing insertion of data into mysql

function generateRandomData(){
# $db = new mysqli('localhost','XXX','XXX','scores');
if(mysqli_connect_errno()) {
echo 'Failed to connect to database. Please try again later.';
exit;
}
$query = "insert into scoretable values(?,?,?)";
for($a = 0; $a < 1000000; $a++)
{
$stmt = $db->prepare($query);
$id = rand(1,75000);
$score = rand(1,100000);
$time = rand(1367038800 ,1369630800);
$stmt->bind_param("iii",$id,$score,$time);
$stmt->execute();
}
}
I am trying to populate a data table in mysql with a million rows of data. However, this process is extremely slow. Is there anything obvious I'm doing wrong that I could fix in order to make it run faster?

As hinted in the comments, you need to reduce the number of queries by concatenating as many inserts as possible together. In PHP, that is easy to achieve:
$query = "insert into scoretable values";
for($a = 0; $a < 1000000; $a++) {
$id = rand(1,75000);
$score = rand(1,100000);
$time = rand(1367038800 ,1369630800);
$query .= "($id, $score, $time),";
}
$query[strlen($query)-1]= ' ';
There is a limit on the maximum size of the queries you can execute, which is directly related to the max_allowed_packet server setting (this page of the MySQL documentation describes how to tune that setting to your advantage).
Therefore, you will have to reduce the loop count above to reach an appropriate query size, and repeat the process until you reach the total number you want to insert, by wrapping that code in another loop.
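The outer loop described above could be sketched as follows. This is only an illustration: buildInsertBatches is a hypothetical helper, the chunk size of 5000 is an arbitrary guess that would need tuning against max_allowed_packet, and the values are cast to int on the assumption that scoretable holds three integer columns.

```php
<?php
// Hypothetical helper: turn rows of [id, score, time] into multi-row
// INSERT statements with at most $rowsPerQuery rows each.
function buildInsertBatches(array $rows, int $rowsPerQuery): array
{
    $queries = [];
    foreach (array_chunk($rows, $rowsPerQuery) as $chunk) {
        $tuples = [];
        foreach ($chunk as $row) {
            // Cast to int so nothing but numbers reaches the SQL text.
            $tuples[] = '(' . implode(', ', array_map('intval', $row)) . ')';
        }
        $queries[] = 'insert into scoretable values ' . implode(', ', $tuples);
    }
    return $queries;
}

// Generate random rows as in the question (smaller count for the demo).
$rows = [];
for ($a = 0; $a < 12000; $a++) {
    $rows[] = [rand(1, 75000), rand(1, 100000), rand(1367038800, 1369630800)];
}
$queries = buildInsertBatches($rows, 5000);
echo count($queries), " queries\n";
```

Each returned string would then be sent on its own, for example with $db->query($query).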
Another practice is to disable index updates and foreign key checks on the table you wish to bulk insert into:
ALTER TABLE yourtablename DISABLE KEYS;
SET FOREIGN_KEY_CHECKS=0;
-- bulk insert comes here
SET FOREIGN_KEY_CHECKS=1;
ALTER TABLE yourtablename ENABLE KEYS;
This practice, however, must be used carefully, especially in your case, since you generate the values randomly. If you have any unique key on the columns you generate, you cannot use that technique with your query as it is, because it may generate a duplicate-key insert. You probably want to add an IGNORE clause to it:
$query = "insert IGNORE into scoretable values";
This will cause the server to silently ignore duplicate entries on unique keys. To reach the total number of required inserts, just loop as many times as needed to fill in the remaining missing rows.
I suppose that the only place where you could have a unique key constraint is on the id column. In that case, you will never be able to reach the number of lines you wish to have, since it is way above the range of random values you generate for that field. Consider raising that limit, or better yet, generate your ids differently (perhaps simply by using a counter, which will make sure every record is using a different key).

You are doing several things wrong. The first thing you have to take into account is which MySQL engine you're using.
The default one is InnoDB; previously (before MySQL 5.5), the default engine was MyISAM.
I'll write this answer under the assumption that you're using InnoDB, which you should be using for a plethora of reasons.
InnoDB operates in something called autocommit mode. That means that every query you make is wrapped in a transaction.
To translate that to a language that us mere mortals can understand - every query you run outside an explicit BEGIN WORK; block is its own transaction - ergo, MySQL will wait until the hard drive confirms the data has been written.
Knowing that hard drives are slow (mechanical ones are still the ones most widely used), that means your inserts will be only as fast as the hard drive is. Usually, mechanical hard drives can perform about 300 input/output operations per second, ergo, assuming you can do 300 inserts a second - yes, you'll wait quite a bit to insert 1 million records.
So, knowing how things work - you can use them to your advantage.
The amount of data that the HDD will write per transaction will be generally very small (4KB or even less), and knowing today's HDDs can write over 100MB/sec - that indicates that we should wrap several queries into a single transaction.
That way MySQL will send quite a bit of data and wait for the HDD to confirm it wrote everything and that the whole world is fine and dandy.
So, assuming you have 1M rows you want to populate - you'll execute 1M queries. If your transactions commit 1000 queries at a time, you should perform only about 1000 write operations.
That way, your code becomes something like this:
(I am not familiar with mysqli interface so function names might be wrong, and seeing I'm typing without actually running the code - the example might not work so use it at your own risk)
function generateRandomData()
{
    $db = new mysqli('localhost','XXX','XXX','scores');
    if(mysqli_connect_errno()) {
        echo 'Failed to connect to database. Please try again later.';
        exit;
    }
    $query = "insert into scoretable values(?,?,?)";
    // We prepare ONCE, that's the point of prepared statements
    $stmt = $db->prepare($query);
    $start = 0;
    $top = 1000000;
    // Start the first transaction before the loop
    $db->begin_transaction();
    for($a = $start; $a < $top; $a++)
    {
        $id = rand(1,75000);
        $score = rand(1,100000);
        $time = rand(1367038800 ,1369630800);
        $stmt->bind_param("iii",$id,$score,$time);
        $stmt->execute();
        // Commit after every thousandth query, then open the next transaction
        if( (($a + 1) % 1000) == 0 && $a != ($top - 1) )
        {
            $db->commit();
            $db->begin_transaction();
        }
    }
    // Commit whatever is left in the final transaction
    $db->commit();
}
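As a rough sanity check of the reasoning above, the arithmetic can be written out. The figures are the answer's own assumptions (about 300 I/O operations per second, one flush per commit); real drives and the actual data volume will shift the numbers, so the batched figure is best read as a lower bound.

```php
<?php
$rows = 1000000;       // rows to insert
$iops = 300;           // assumed flushes per second on a mechanical drive
$batchSize = 1000;     // inserts per transaction

// Autocommit: one flush per insert.
$autocommitSeconds = $rows / $iops;

// Batched: one flush per commit.
$commits = intdiv($rows, $batchSize);
$batchedSeconds = $commits / $iops;

printf("autocommit: about %d minutes\n", round($autocommitSeconds / 60));
printf("batched: %d commits, about %.1f seconds\n", $commits, $batchedSeconds);
```

Under these assumptions the waiting time for flushes drops from the better part of an hour to a few seconds, which is why the batch size matters far more than the cost of preparing the statement.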

DB querying involves many interrelated tasks, so it is an 'expensive' process. It is even more 'expensive' when it comes to insertion/update.
Running the query once is the best way to enhance performance.
You can build the statement in the loop and run it once.
e.g.
$query = "insert into scoretable values ";
for($a = 0; $a < 1000000; $a++)
{
$values = " ('".$?."','".$?."','".$?."'), ";
$query.=$values;
...
}
...
//remove the last comma
...
$stmt = $db->prepare($query);
...
$stmt->execute();

Have a look at this gist I've created. It takes about 5 minutes to insert a million rows on my laptop.

How do I make sure the rapidly changing data from my MySQL DB is accurately represented in php scripts?

I have a database with lots of game objects,
which is being queried by the following 3 PHP scripts.
List objects: gets a JSON object with all the items I need
Add object: adds an object to the database
Reset: wipes all objects from the table
All three of them work, somewhat, but there is a timing mismatch. When the game calls the reset function, it restarts. When the game restarts, it automatically loads all the objects. Unfortunately, and here's the problem, if the game has just been reset, objects will still be pulled by script 1.
I know of transactions, but I have never used them and I have no idea how I would implement those here, since my transaction is involving things from different scripts that are run at different times.
For bonus credit: will this setup (AS3 > php > MySQL) get me in trouble with a heavy load? The game might get picked up by 10, 100, 1000 people, is there anything I can read about that subject?
Edit: new idea/question
Currently, the wiping works as such: The objects table has a field 'deleted' which is set to '1' when the reset method is called. It might be smarter to copy the existing data into an archive table and then truncate the live table...
Edit: Here's the (relevant) PHP code I'm using
Add Object:
if ($db_found) {
$x = $_GET['x'];
$y = $_GET['y'];
$type = $_GET['type'];
$name = $_GET['name'];
$text = $_GET['text'];
$SQL = "INSERT INTO bodies (x,y,type,name,text)
VALUES ('".$x."','".$y."','".$type."','".$name."','".$text."' )";
if (!mysql_query($SQL))
{
die('Error: ' . mysql_error());
}
};
mysql_close($db_handle);
List/Get Objects:
if ($db_found) {
$SQL = "SELECT * FROM bodies WHERE deleted = 0";
$res = mysql_query($SQL);
$rows = array();
while($r = mysql_fetch_assoc($res)) {
print $r['x'] . ','
. $r['y']
. ','
. $r['type']
. ','
. $r['name']
. ','
. $r['text']
. ';';
}
};
mysql_close($db_handle);
Reset: (EDIT 2)
mysql_query("LOCK TABLES bodies WRITE;");
$SQL = " DELETE FROM bodies";
if (!mysql_query($SQL))
{
die('Error: ' . mysql_error());
}
};
mysql_query("UNLOCK TABLES;");

How to do Transactions in MySQL.
In your case you might be interested in the atomicity and isolation of transactions, meaning that when restarting a game, you want to ensure that until the reset has fully finished, nobody can fetch any of your intermediate data. Doing the reset inside a transaction will ensure this property*. (* for TRUNCATE see below)
You will need InnoDB as your engine for all tables that are involved in your transactions. MyISAM does not support transactions.
Changing large amounts of data inside a transaction can potentially cause high query delays, as transactions use special undo/redo logs to be able to undo everything you did in the transaction if you decide to ROLLBACK.
I wouldn't wipe the tables when starting a new game. Instead, give your data a game_id and use a new game_id when starting a new game. Space shouldn't really be an issue nowadays. This has the advantage that you will need little to no table locking when resetting the game.
If you must wipe them, be sure to use TRUNCATE when clearing out the tables. As far as I know, TRUNCATE in MySQL cannot be rolled back, so doing it inside a transaction won't do anything useful.
I think PHP/MySQL will perform fine if used correctly, even for larger visitor counts. You can use profiling tools like xdebug or the MySQL slow query log to trace and remove performance bottlenecks.

Downloading Large Data Sets -> Text to MySQL or just to MySQL?

I'm downloading large sets of data via an XML Query through PHP with the following scenario:
- Query for records 1-1000, download all parts (1000 parts is roughly 4.5 MB of text), then store those in memory while I query the next 1001-2000 and store them in memory too (up to potentially 400k)
I'm wondering whether it would be better to write these entries to a text field rather than storing them in memory and, once the complete download is done, trying to insert them all into the DB, or whether to write them to the DB as they come in.
Any suggestions would be greatly appreciated.
Cheers

You can run a query like this:
INSERT INTO table (id, text)
VALUES (null, 'foo'), (null, 'bar'), ..., (null, 'value no 1000');
Doing this, you'll do the whole thing in one shot, and the parser will be called only once. The best you can do is run something like this with MySQL's BENCHMARK function, comparing 1000 runs of a query that inserts 1000 records against 1,000,000 inserts of one record each.
(Sorry about the prev. answer, I've misunderstood the question).

I think you should write them to the database as soon as you receive them. This will save memory, and you don't have to execute a 400-times-slower query at the end. You will need a mechanism to deal with any problems that may occur during this process, such as a disconnection after 399K results.

In my experience it would be better to download everything in a temporary area and then, when you are sure that everything went well, to move the data (or the files) in place.
As you are using a database you may want to dump everything into a table, something like this code:
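$error = false;
while ( !$error && ($row = getNextRow($db)) ) {
    // Escape and quote the values; `key` is a reserved word in MySQL, hence the backticks
    $key = mysql_real_escape_string($row[0]);
    $value = mysql_real_escape_string($row[1]);
    $sql = "insert into temptable(`key`, `value`) values ('$key', '$value')";
    if (mysql_query($sql)) {
        echo '#';
    } else {
        $error = true;
    }
}
if (!$error) {
    $sql = "insert into myTable (select * from temptable)";
    if (mysql_query($sql)) {
        echo 'Finished';
    } else {
        echo 'Error';
    }
}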
Alternatively, if you know the table well, you can add a "new" flag field for newly inserted lines and update everything when you are finished.

Multiple MYSQL queries vs. Multiple php foreach loops

Database structure:
id galleryId type file_name description
1 `artists_2010-01-15_7c1ec` `image` `band602.jpg` `Red Umbrella Promo`
2 `artists_2010-01-15_7c1ec` `image` `nov7.jpg` `CD Release Party`
3 `artists_2010-01-15_7c1ec` `video` `band.flv` `Presskit`
I'm going to pull images out for one section of an application, videos on another, etc. Is it better to make multiple mysql queries for each section like so:
$query = mysql_query("SELECT * FROM galleries WHERE galleryId='$galleryId' && type='image'");
...Or should I be building an associative array and just looping through the array over and over whenever I need to use the result set?
Thanks for the thoughts.

It depends what's more important: readability or performance. I'd expect a single query and prefilling PHP arrays would be faster to execute, since database connections are expensive, but then a simple query for each section is much more readable.
Unless you know (and not just hope) you're going to get a huge amount of traffic I'd go for separate queries and then worry about optimising if it looks like it'll be a problem. At that point there'll be other things you'll want to do anyway, such as building a data access layer and adding some caching.

If by "sections" you mean separate single pages (separate HTTP requests) that users can view, I would suggest query-per-type as needed. If on a page where there are only image data sets, you really don't need to fetch the video data set for example. You won't be really saving much time fetching everything, since you will be connecting to the database for every page hit anyway (I assume.)
If by "sections" you mean different parts of one page, then fetch everything at once. This will save you time on querying (only one query.)
But depending on the size of your data set, you could run into trouble with PHP's memory limit querying for everything, though. You could then try raising the memory limit, but if that fails you'll probably have to fall back to query-per-type.
Using the query-per-type approach moves some of the computing load to the database server, as you will only be requesting and fetching what you really need. And you don't have to write code to filter and sort your results. Filtering and sorting is something the database is generally better at than PHP code. If at all possible, enable MySQL's query cache, that will speed up these queries much more than anything you could write in PHP.

If your data is all coming from one table, I would only do one query.
I presume you are building a single page with a section for pictures, a section for video, a section for music, etc. Write your query to return results sorted by media type, then iterate through all the pictures, then all the video, then all the music.
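A minimal sketch of that single-query idea, assuming the rows have already been fetched as associative arrays from one query ordered by type. groupByType is a hypothetical helper, and the sample rows are copied from the table in the question:

```php
<?php
// Hypothetical helper: bucket fetched rows by their 'type' column so each
// page section can loop over its own list without another query.
function groupByType(array $rows): array
{
    $grouped = [];
    foreach ($rows as $row) {
        $grouped[$row['type']][] = $row;
    }
    return $grouped;
}

$rows = [
    ['id' => 1, 'type' => 'image', 'file_name' => 'band602.jpg'],
    ['id' => 2, 'type' => 'image', 'file_name' => 'nov7.jpg'],
    ['id' => 3, 'type' => 'video', 'file_name' => 'band.flv'],
];
$sections = groupByType($rows);
foreach ($sections['image'] as $image) {
    echo $image['file_name'], "\n";
}
```

Grouping once keeps the template code for each section as simple as it would be with a query per type, while the database is only hit once.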

Better to have multiple queries. Every time you run a query all the data is getting pulled out and loaded into memory. If you have 5 different types, it means each page of that type is loading 5 times as much data as it needs to do.
Even with just one at a time, you are probably going to want to start paginating with LIMIT/OFFSET queries fairly quickly if you have more than 100 or however many you can reasonably display on one page at a time.

It really depends,
IN operator
ini_set('memory_limit', '-1');
$start_time = microtime(true); // start the timer
$startMemory = memory_get_usage();
$conn = mysqli_connect("localhost", "", "", "");
$ar = array();
$sql = "SELECT * FROM table WHERE e IN (.....)";
$result = mysqli_query($conn, $sql);
while ($row = mysqli_fetch_assoc($result)) {
$ar[$row['c']] = $row;
}
echo (memory_get_usage() - $startMemory) / 1024 / 1024, ' MB'; //1409.7124481201
$end_time = microtime(true);
echo ($end_time - $start_time) . ' Seconds'; //5.2406549453735 Seconds
Foreach
ini_set('memory_limit', '-1');
$start_time = microtime(true); // start the timer
$startMemory = memory_get_usage();
$conn = mysqli_connect("localhost", "", "", "");
$ar = array();
$array_loop = array(....);
foreach($array_loop as $key => $value){
$sql = "SELECT * FROM table WHERE e = '$value'";
$result = mysqli_query($conn, $sql);
while ($row = mysqli_fetch_assoc($result)) {
$ar[$row['c']] = $row;
}
}
echo (memory_get_usage() - $startMemory) / 1024 / 1024, ' MB'; //42.773330688477 MB
$end_time = microtime(true);
echo ($end_time - $start_time) . ' Seconds'; //12.469061136246 Seconds
I noticed that the foreach approach costs time but not memory, while the IN operator costs memory but not time. All tests were run on about 1 million rows of test data generated by an SQL procedure.

Batch insertion of data to MySQL database using php

I have thousands of records parsed from a huge XML file to be inserted into a database table using PHP and MySQL. My problem is that it takes too long to insert all the data into the table. Is there a way to split my data into smaller groups so that the insertion is done group by group? How can I set up a script that will process the data 100 at a time, for example? Here's my code:
foreach($itemList as $key => $item){
$download_records = new DownloadRecords();
//check first if the content exists
if(!$download_records->selectRecordsFromCondition("WHERE Guid=".$guid."")){
/* do an insert here */
} else {
/*do an update */
}
}
*note: $itemList is around 62,000 and still growing.

Using a for loop?
But the quickest option to load data into MySQL is to use the LOAD DATA INFILE command, you can create the file to load via PHP and then feed it to MySQL via a different process (or as a final step in the original process).
If you cannot use a file, use the following syntax:
insert into table(col1, col2) VALUES (val1,val2), (val3,val4), (val5, val6)
so you reduce the total number of statements to run.
EDIT: Given your snippet, it seems you can benefit from the INSERT ... ON DUPLICATE KEY UPDATE syntax of MySQL, letting the database do the work and reducing the amount of queries. This assumes your table has a primary key or unique index.
To hit the DB every 100 rows you can do something like (PLEASE REVIEW IT AND FIX IT TO YOUR ENVIRONMENT)
$insertOrUpdateStatement1 = "INSERT INTO table (col1, col2) VALUES ";
$insertOrUpdateStatement2 = " ON DUPLICATE KEY UPDATE ";
$counter = 0;
$queries = array();
foreach($itemList as $key => $item){
    $val1 = escape($item->col1); //escape is a function that will make
                                 //the input safe from SQL injection.
                                 //Depends on how are you accessing the DB
    $val2 = escape($item->col2);
    $queries[] = $insertOrUpdateStatement1.
        "('$val1','$val2')".$insertOrUpdateStatement2.
        "col1 = '$val1', col2 = '$val2'";
    $counter++;
    if ($counter % 100 == 0) {
        executeQueries($queries);
        $queries = array();
        $counter = 0;
    }
}
// Send whatever is left over when the item count is not a multiple of 100
if ($queries) {
    executeQueries($queries);
}
And executeQueries would grab the array and send a single multiple query:
function executeQueries($queries) {
$data = "";
foreach ($queries as $query) {
$data.=$query.";\n";
}
executeQuery($data);
}
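An alternative to sending 100 separate statements is one multi-row INSERT ... ON DUPLICATE KEY UPDATE per batch. The sketch below only builds the SQL text and the flat parameter list for a prepared statement. buildUpsert is a hypothetical helper, the table name items is made up for the demo, and the table and column identifiers are assumed to be trusted because they are not escaped. VALUES(col) in the UPDATE clause is deprecated as of MySQL 8.0.20 in favour of row aliases, though it is still accepted.

```php
<?php
// Hypothetical helper: build one multi-row upsert with ? placeholders.
// Returns [sql, params]; params is the rows flattened in order.
function buildUpsert(string $table, array $columns, array $rows): array
{
    $tuple = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $tuples = implode(', ', array_fill(0, count($rows), $tuple));
    $updates = [];
    foreach ($columns as $col) {
        // On a key collision, overwrite the column with the incoming value.
        $updates[] = "$col = VALUES($col)";
    }
    $sql = "INSERT INTO $table (" . implode(', ', $columns) . ") VALUES $tuples"
         . ' ON DUPLICATE KEY UPDATE ' . implode(', ', $updates);
    $params = [];
    foreach ($rows as $row) {
        foreach ($row as $value) {
            $params[] = $value;
        }
    }
    return [$sql, $params];
}

[$sql, $params] = buildUpsert('items', ['col1', 'col2'], [['a', 'b'], ['c', 'd']]);
echo $sql, "\n";
```

With mysqli, the result could be executed by preparing $sql and binding the parameters in one call, for example $stmt->bind_param(str_repeat('s', count($params)), ...$params), which avoids the hand-written escape function entirely.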

Yes, just do what you'd expect to do.
You should not try to do bulk insertion from a web application if you think you might hit a timeout etc. Instead drop the file somewhere and have a daemon or cron etc, pick it up and run a batch job (If running from cron, be sure that only one instance runs at once).

As said before, you should put the file in a temp directory and have a cron job process it, in order to avoid timeouts (or the user losing their network connection).
Use only the web for uploads.
If you really want to import to the DB on a web request, you can either do a bulk insert or at least use a transaction, which should be faster.
Then limit the inserts to batches of 100 (committing your transaction whenever count % 100 == 0) and repeat until all your rows have been inserted.
