My problem is explained below.
This is my PHP code running on my server right now:
$limit = 10000;
$annee = '2017';
//Counting the lines I need to delete
$sql = " SELECT COUNT(*) FROM historisation.cdr_".$annee." a
INNER JOIN transatel.cdr_transatel_v2 b ON a.id_cdr = b.id_cdr ";
$t = $db_transatel->selectAll($sql);
//The number of lines I have to delete
$i = $t[0][0];
do {
if ($i < $limit) {
$limit = $i;
}
//The problem is coming from that delete
$selectFromHistoryAndDelete = " DELETE FROM transatel.cdr_transatel_v2
WHERE id_cdr IN (
SELECT a.id_cdr FROM historisation.cdr_".$annee." a
INNER JOIN (SELECT id_cdr FROM historisation.cdr_transatel_v2) b ON a.id_cdr = b.id_cdr
)
LIMIT " . $limit;
$delete = $db_transatel->exec($selectFromHistoryAndDelete, $params);
$i = $i - $limit;
} while ($i > 0);
The execution of the query.
As you can see in the picture, for the first 195 loops the execution time was between 13 and 17 seconds.
It increased to 73 seconds on the 195th loop and to 1305 seconds on the 196th loop.
Now the query has been running for 2000 seconds.
The query is deleting rows in a test table that no one is using right now.
I'm deleting rows 10,000 at a time so that the query is quick and does not overload the server.
I am wondering why the execution time is increasing like that. I thought it would be quicker at the end, because I thought the inner join would be much quicker as there are fewer rows in the table.
Does anyone have an idea?
Edit: The tables' engine is MyISAM.
Based on your latest comment, the inner join is redundant, since you're deleting from the table that contains the values you're joining on. In essence, you're having to process b.id_cdr = a.id_cdr twice, since the number of values compared on cdr_2017 is not changed by the inner join, only the number of values queried to be deleted.
As for the cause of the incremental slowness, it is because you are manually performing the same function as SELECT id_cdr FROM cdr_2017 LIMIT 10000 OFFSET x.
That is to say, your query has to perform a full-table scan on cdr_2017 to determine the id values to delete. As you delete the values, the SQL optimizer has to move further through the cdr_2017 table to retrieve the values.
Resulting in
DELETE FROM IN(1,2,3,...10000)
DELETE FROM IN(1,2,3,...20000)
...
DELETE FROM IN(1,2,3,...1000000)
Assuming id_cdr is the incremental primary key, to resolve the issue you could use the last id retrieved from cdr_2017 to filter the selected values.
This will be much faster, as a full-table scan is no longer required to validate the joined records, since you're now utilizing an indexed value on both sides of the query.
$sql = " SELECT COUNT(a.id_cdr) FROM historisation.cdr_".$annee." a
INNER JOIN transatel.cdr_transatel_v2 b ON a.id_cdr = b.id_cdr ";
$t = $db_transatel->selectAll($sql);
//The number of lines I have to delete
$i = $t[0][0];
//set starting index
$previous = 0;
do {
if ($i < $limit) {
$limit = $i;
}
$selectFromHistoryAndDelete = 'DELETE d
FROM transatel.cdr_transatel_v2 AS d
JOIN (
SELECT @previous := id_cdr AS id_cdr
FROM historisation.cdr_2017
WHERE id_cdr > ' . (int) $previous . '
ORDER BY id_cdr
LIMIT 10000
) AS a
ON a.id_cdr = d.id_cdr';
$db_transatel->exec($selectFromHistoryAndDelete, $params);
//retrieve last id selected in cdr_2017 to use in next iteration
$v = $db_transatel->selectAll('SELECT @previous'); //prefer fetchColumn
$previous = $v[0][0];
$i = $i - $limit;
} while ($i > 0);
//optionally reclaim table-space
$db_transatel->exec('OPTIMIZE TABLE transatel.cdr_transatel_v2', $params);
You could also refactor to use id_cdr > $previous AND id_cdr < $last to remove the ORDER BY and LIMIT clauses, which should also improve performance.
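The range-based variant can be sketched as a small generator that walks the id space in fixed-width windows. This is a minimal sketch under assumptions, not the answer's code: it assumes id_cdr is an integer key whose minimum and maximum you have already fetched (for example with SELECT MIN(id_cdr), MAX(id_cdr)), and the names idWindows() and DELETE_WINDOW_SQL are hypothetical.

```php
<?php
// Hypothetical helper: split the id range [$minId, $maxId] into windows
// ($lo, $hi] of at most $width ids, so each DELETE can filter with
// "id_cdr > ? AND id_cdr <= ?" instead of ORDER BY ... LIMIT.
function idWindows(int $minId, int $maxId, int $width): Generator
{
    // Start one below the smallest id so the first window includes it.
    for ($lo = $minId - 1; $lo < $maxId; $lo += $width) {
        yield [$lo, min($lo + $width, $maxId)];
    }
}

// Hypothetical MySQL multi-table DELETE; table and column names follow the question.
const DELETE_WINDOW_SQL = 'DELETE d
    FROM transatel.cdr_transatel_v2 AS d
    JOIN historisation.cdr_2017 AS a ON a.id_cdr = d.id_cdr
    WHERE a.id_cdr > ? AND a.id_cdr <= ?';

foreach (idWindows(1, 25000, 10000) as [$lo, $hi]) {
    echo "$lo < id_cdr <= $hi\n";
}
```

With PDO, each window would be run as `$stmt = $pdo->prepare(DELETE_WINDOW_SQL); $stmt->execute([$lo, $hi]);`. If the ids are sparse, some windows delete fewer than `$width` rows, but each window is an index range lookup, so its cost does not grow as earlier rows disappear.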
I would like to note, though, that the MyISAM engine places a table lock on cdr_transatel_v2 during this operation. Due to the way MySQL handles concurrent sessions and queries, there is not much gain from a batch delete in this manner; it is really only applicable to InnoDB and transactions. This is especially true when using PHP with FastCGI, as opposed to Apache mod_php, since queries that do not touch cdr_transatel_v2 will still be executed, and write operations on cdr_transatel_v2 will still be queued. If using mod_php, I would reduce the limit to 1,000 records to reduce queue times.
For more information see https://dev.mysql.com/doc/refman/5.7/en/internal-locking.html#internal-table-level-locking
Alternative approach.
Considering the large number of records that need to be deleted, when the records deleted exceed those that are kept, it would be more beneficial to invert the operation by using INSERT instead of DELETE.
#ensure the storage table doesn't exist already
DROP TABLE IF EXISTS transatel.cdr_transatel_temp;
#duplicate the structure of the original table
CREATE TABLE transatel.cdr_transatel_temp
LIKE transatel.cdr_transatel_v2;
#copy the records that are not to be deleted from the original table
INSERT transatel.cdr_transatel_temp
SELECT d.*
FROM transatel.cdr_transatel_v2 AS d
LEFT JOIN historisation.cdr_2017 AS b
ON b.id_cdr = d.id_cdr
WHERE b.id_cdr IS NULL;
#replace the original table with the storage table
RENAME TABLE transatel.cdr_transatel_v2 TO transatel.backup,
transatel.cdr_transatel_temp TO transatel.cdr_transatel_v2;
#remove the original table
DROP TABLE transatel.backup;
Related
I need to generate close to a million (100 batches of 10,000 numbers) unique and random 12-digit codes for a scratch card application. This process will be repeated and will need an equal number of codes to be generated every time.
Also, the generated codes need to be entered in a DB so that they can be verified later when a consumer enters one on my website. I am using PHP and MySQL to do this. These are the steps I am following:
a) Get admin input on the number of batches and the codes per batch.
b) Using a for loop, generate the code using mt_rand(100000000000,999999999999).
c) Check every time a number is generated to see if a duplicate exists in the db; if not, add it to the results variable, else regenerate.
d) Save the generated number in the db if unique.
e) Repeat b, c, and d over the required number of codes.
f) Output the codes to the admin in a CSV.
Code used (I removed most of the comments to make it less verbose, and because I have already explained the steps above):
$totalLabels = $numBatch*$numLabelsPerBatch;
// file name for download
$fileName = $customerName."_scratchcodes_" . date('Ymdhs') . ".csv";
$flag = false;
$generatedCodeInfo = array();
// headers for download
header("Content-Disposition: attachment; filename=\"$fileName\"");
header("Content-Type: application/vnd.ms-excel");
$codeObject = new Codes();
//get new batch number
$batchNumber = $codeObject->getLastBatchNumber() + 1;
$random = array();
for ($i = 0; $i < $totalLabels; $i++) {
do{
$random[$i] = mt_rand(100000000000,999999999999); //need to optimize this to reduce collisions as the database grows
}while(isCodeNotUnique($random[$i],$db));
$codeObject = new Codes();
$codeObject->UID = $random[$i];
$codeObject->customerName = $customerName;
$codeObject->batchNumber = $batchNumber;
$generatedCodeInfo[$i] = $codeObject->addCode();
//change batch number for next batch
if((($i + 1) % $numLabelsPerBatch) == 0){$batchNumber++;}
//$generatedCodeInfo[i] = array("UID" => 10001,"OID"=>$random[$i]);
if(!$flag) {
// display column names as first row
echo implode("\t", array_keys($generatedCodeInfo[$i])) . "\n";
$flag = true;
}
// filter data
array_walk($generatedCodeInfo[$i], 'filterData');
echo implode("\t", array_values($generatedCodeInfo[$i])) . "\n";
}
function filterData(&$str)
{
$str = preg_replace("/\t/", "\\t", $str);
$str = preg_replace("/\r?\n/", "\\n", $str);
if(strstr($str, '"')) $str = '"' . str_replace('"', '""', $str) . '"';
}
function isCodeNotUnique($random){
$codeObject = new Codes();
$codeObject->UID = $random;
if(!empty($codeObject->getCodeByUID())){
return true;
}
return false;
}
Now this is taking really long to execute, and I believe it is not optimal.
How can I optimize it so that the unique random numbers are generated quickly?
Would it be faster if the numbers were instead generated in MySQL (or some other way) rather than in PHP, and if so, how do I do that?
When the db starts growing the duplicate check in step b will be really time consuming so how do I avoid that?
Is there a limit on the number of rows in mysql?
Note: The numbers need to be unique across all batches across lifetime of the application.
1) Divide your range of numbers into smaller ranges based on the number of batches. E.g. if your range is 0 - 999 and you have 10 batches, then have a batch from 0 - 99, the next 100 - 199, etc. When you generate the numbers for a batch, only generate the random number from the batch range. This way you know that you can only have duplicate numbers within a batch.
Do not insert each number into the database individually; store them in an array instead. When you generate a new random number, check it against the array, not the database, using the in_array() function. When the batch is complete, use a single insert statement to insert the contents of the batch:
insert into yourtable (bignumber) values (1), (2), ..., (n)
Check MySQL's max_allowed_packet setting to see if it is able to receive the complete sql statement in one go.
Implement a fallback plan, just in case a duplicate value is still found during the insert (error handling and number regeneration).
2) MySQL is not that great on procedural stuff, so I would stick with an external language, such as php.
3) Add a unique index on the field containing the random numbers. If you try to insert a duplicate record, MySQL will prevent it and throw an error. It is really quick.
4) Depending on the actual table engine used (InnoDB, MyISAM, etc.), its configuration, and the OS, certain limits may apply to the size of the table. See the "Maximum number of records in a MySQL database table" question here on SO for a more detailed answer (check the most upvoted answer, not the accepted one).
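Points 1) and 3) can be combined in PHP roughly as follows. This is a sketch under assumptions: the table scratch_codes with a unique code column and all helper names are hypothetical, and it swaps in_array() for array keys (a key lookup does not scan the whole array) and mt_rand() for random_int() (PHP 7+).

```php
<?php
// Hypothetical helpers; the table scratch_codes(code UNIQUE) is an assumption.

// Point 1): give batch $batch (0-based) its own slice of [$min, $max].
function batchRange(int $batch, int $numBatches, int $min, int $max): array
{
    $width = intdiv($max - $min + 1, $numBatches);
    $lo = $min + $batch * $width;
    // The last batch also takes whatever the integer division left over.
    $hi = ($batch === $numBatches - 1) ? $max : $lo + $width - 1;
    return [$lo, $hi];
}

// Draw $count distinct values from [$lo, $hi], checking duplicates in memory.
function generateBatch(int $count, int $lo, int $hi): array
{
    if ($count > $hi - $lo + 1) {
        throw new InvalidArgumentException('Range too small for the requested count');
    }
    $codes = [];
    while (count($codes) < $count) {
        // Using the value as an array key makes a repeat simply overwrite it.
        $codes[random_int($lo, $hi)] = true;
    }
    return array_keys($codes);
}

// One multi-row INSERT per batch, with one placeholder per code.
function buildBatchInsert(array $codes): array
{
    $placeholders = implode(', ', array_fill(0, count($codes), '(?)'));
    return ["INSERT INTO scratch_codes (code) VALUES $placeholders", $codes];
}

[$lo, $hi] = batchRange(0, 100, 100000000000, 999999999999);
$codes = generateBatch(10000, $lo, $hi);
[$sql, $params] = buildBatchInsert($codes);
```

`$pdo->prepare($sql)->execute($params)` would then insert the batch. The per-batch slices only rule out collisions between batches of the same run; uniqueness across runs still relies on the unique index from point 3), so a duplicate-key error (SQLSTATE 23000) should trigger regenerating that batch, and the statement size is still bounded by max_allowed_packet.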
You can do the following:
$random = getExistingCodes(); // Get what you already have (from the DB).
$random = array_flip($random); //Make them into keys
$existingCount = count($random); //The codes you already have
do {
$random[mt_rand(100000000000,999999999999)] = 1;
} while ((count($random)-$existingCount) < $totalLabels);
$random = array_keys($random);
When you generate a duplicate number it will just overwrite that key and not increase the count.
To insert, you can start a transaction, do as many inserts as needed, and commit once at the end. With InnoDB this avoids paying for a separate commit (and its log flush) on every single insert.
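A transaction-wrapped insert might look like this with PDO. It is a sketch under assumptions: scratch_codes(code) is a hypothetical table, the connection uses PDO::ERRMODE_EXCEPTION (the default since PHP 8), and rows are sent in multi-row chunks of 100.

```php
<?php
// Hypothetical helper: insert $codes into scratch_codes(code) in one transaction,
// sending $chunkSize rows per multi-row INSERT. Expects PDO::ERRMODE_EXCEPTION.
function insertCodesInTransaction(PDO $pdo, array $codes, int $chunkSize = 100): int
{
    $inserted = 0;
    $pdo->beginTransaction();
    try {
        foreach (array_chunk($codes, $chunkSize) as $chunk) {
            $placeholders = implode(', ', array_fill(0, count($chunk), '(?)'));
            $stmt = $pdo->prepare("INSERT INTO scratch_codes (code) VALUES $placeholders");
            $stmt->execute($chunk);
            $inserted += $stmt->rowCount();
        }
        $pdo->commit();        // a single commit for all chunks
    } catch (Throwable $e) {
        $pdo->rollBack();      // e.g. a duplicate key: keep none of the rows
        throw $e;
    }
    return $inserted;
}
```

If any chunk fails, for example on a duplicate key, the whole set is rolled back, so the caller can regenerate the codes and retry.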
Here is a query that generates 1 million pseudo-random numbers without repetitions:
select cast( (@n := (13*@n + 97) % 899999999981)+1e11 as char(12)) as num
from (select @n := floor(rand() * 9e11) ) init,
(select 1 union select 2) m01,
(select 1 union select 2) m02,
(select 1 union select 2) m03,
(select 1 union select 2) m04,
(select 1 union select 2) m05,
(select 1 union select 2) m06,
(select 1 union select 2) m07,
(select 1 union select 2) m08,
(select 1 union select 2) m09,
(select 1 union select 2) m10,
(select 1 union select 2) m11,
(select 1 union select 2) m12,
(select 1 union select 2) m13,
(select 1 union select 2) m14,
(select 1 union select 2) m15,
(select 1 union select 2) m16,
(select 1 union select 2) m17,
(select 1 union select 2) m18,
(select 1 union select 2) m19,
(select 1 union select 2) m20
limit 1000000;
How it works
It starts by generating a random integer value n with 0 <= n < 900000000000. This number acts as the seed for the generated sequence:
@n := floor(rand() * 9e11)
Through multiple (20) joins with inline pairs of records, this single record is multiplied to 2^20 = 1,048,576 copies, which is just a bit over 1 million.
Then the selection starts, and as record after record is fetched, the value of the @n variable is modified according to this incremental formula:
@n := (13*@n + 97) % 899999999981
This formula is a linear congruential generator. The three constants need to obey some rules to maximise the period (of non-repetition); here the modulus 899999999981 is prime, the largest prime below 900000000000. With a prime modulus and a multiplier other than 1, the period is not the full 899999999981, though: one state is a fixed point that maps to itself, and every other seed cycles with a period equal to the multiplicative order of 13 modulo 899999999981, which divides 899999999980. As long as that order exceeds one million (we need much less than the modulus), the first million generated numbers will be unique.
As a final step, 100000000000 is added to the number to ensure the number always has 12 digits, so excluding numbers that are smaller than 100000000000. Because of the choice of 899999999981, the 19 numbers between 999999999981 and 999999999999 inclusive can never be generated.
As this generates 2^20 records, the LIMIT clause makes sure this is cut down to exactly one million records.
The cast to char(12) is optional, but may be necessary to visualise the 12-digit numbers without them being rendered on the screen in scientific notation. If you will use this to insert records, and the target data type is numeric, then you would leave out this conversion of course.
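To inspect the sequence outside MySQL, the same recurrence can be run in PHP. A sketch assuming 64-bit PHP integers (13 × 899999999999 still fits comfortably); lcgCodes() is a hypothetical name. Keep in mind that the sequence is fully predictable once one value is known, so it guarantees uniqueness, not secrecy.

```php
<?php
// Same step as the SQL: @n := (13*@n + 97) % 899999999981, then + 1e11.
const LCG_MOD = 899999999981;

function lcgCodes(int $seed, int $count): array
{
    $n = $seed;
    $codes = [];
    for ($i = 0; $i < $count; $i++) {
        $n = (13 * $n + 97) % LCG_MOD;   // next state, always in [0, LCG_MOD - 1]
        $codes[] = $n + 100000000000;    // shift into the 12-digit range
    }
    return $codes;
}

// Seed drawn like the SQL's floor(rand() * 9e11), i.e. 0 <= seed < 9e11.
$codes = lcgCodes(random_int(0, 899999999999), 1000);
// Empirical uniqueness check for this run:
$allUnique = count(array_unique($codes)) === count($codes);
```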
CREATE TABLE x (v BIGINT(12) ZEROFILL NOT NULL PRIMARY KEY);
INSERT IGNORE INTO x (v) VALUES
(FLOOR(1e12*RAND()), (FLOOR(1e12*RAND()), (FLOOR(1e12*RAND()),
(FLOOR(1e12*RAND()), (FLOOR(1e12*RAND()), (FLOOR(1e12*RAND()),
(FLOOR(1e12*RAND()), (FLOOR(1e12*RAND()), (FLOOR(1e12*RAND()),
(FLOOR(1e12*RAND()), (FLOOR(1e12*RAND()), (FLOOR(1e12*RAND()),
(FLOOR(1e12*RAND()), (FLOOR(1e12*RAND()), (FLOOR(1e12*RAND());
Do that INSERT 1e6/15 times.
Check COUNT(*) to see if you have a million. Do this until the table has a million rows:
INSERT IGNORE INTO x (v) VALUES
(FLOOR(1e12*RAND()));
Notes:
ZEROFILL is assuming that you want the display to have leading zeros.
IGNORE is because there will be some number of duplicates. This avoids the costly check after each insert.
"Batch insert" is faster than one row at a time. (Doing 100 at a time is about optimal, but I am lazy.)
Potential problem: While I think the pattern of values for RAND() does not repeat at, say 2^16 or 2^32 values, I do not know for a fact. If you can't get to a million, then the random number generator is bad; you should switch to PHP's rand, or something else.
Beware of linear congruential random number generators. They are probably easily hacked. (I assume there is some "money" behind the scratch cards.)
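The "repeat until the table has a million rows" loop can be driven from PHP. A sketch assuming a PDO MySQL connection in exception mode and the table x from this answer; insertIgnoreSql() and fillTable() are hypothetical names. It adds up the affected-row count of each INSERT IGNORE (which does not include ignored duplicates) instead of re-running COUNT(*), and shrinks the last statements so it stops exactly at the target.

```php
<?php
// Hypothetical helpers for the INSERT IGNORE approach; table x(v) as above.

// One "(FLOOR(1e12*RAND()))" tuple per row, so RAND() is called once per row.
function insertIgnoreSql(int $rows): string
{
    $tuples = array_fill(0, $rows, '(FLOOR(1e12*RAND()))');
    return 'INSERT IGNORE INTO x (v) VALUES ' . implode(', ', $tuples);
}

// Keep inserting until the table holds $target rows (PDO in exception mode).
function fillTable(PDO $pdo, int $target, int $rowsPerInsert = 100): int
{
    $have = (int) $pdo->query('SELECT COUNT(*) FROM x')->fetchColumn();
    while ($have < $target) {
        // Never request more rows than are still missing.
        $rows = min($rowsPerInsert, $target - $have);
        // exec() returns the affected-row count; rows skipped by IGNORE
        // as duplicates are not included in it.
        $have += $pdo->exec(insertIgnoreSql($rows));
    }
    return $have;
}
```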
Do not plan on mt_rand() being unique for small ranges
<?php
// Does mt_rand() repeat?
TryMT(100);
TryMT(100);
TryMT(1000);
TryMT(10000);
TryMT(1e6);
TryMT(1e8);
TryMT(1e10);
TryMT(1e12);
TryMT(1e14);
function TryMT($max) {
$h = [];
for ($j = 0; $j<$max; $j++) {
$v = mt_rand(1, $max);
if (isset($h[$v])) {
echo "Dup after $j iterations (limit=$max)<br>\n";
return;
}
$h[$v] = 1;
}
}
Sample output:
Dup after 7 iterations (limit=100)<br>
Dup after 13 iterations (limit=100)<br>
Dup after 29 iterations (limit=1000)<br>
Dup after 253 iterations (limit=10000)<br>
Dup after 245 iterations (limit=1000000)<br>
Dup after 3407 iterations (limit=100000000)<br>
Dup after 29667 iterations (limit=10000000000)<br>
Dup after 82046 iterations (limit=1000000000000)<br>
Dup after 42603 iterations (limit=1.0E+14)<br>
mt_rand() is a "good" random number generator because it does have dups.
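The effect in the sample output is the birthday problem: among k draws from N equally likely values, a repeat becomes likely once k approaches the square root of N, far below N itself. A small PHP sketch of the standard approximation (function names are hypothetical), applied to the classic 23-people case and to drawing one million codes from the 900,000,000,000 twelve-digit values:

```php
<?php
// Birthday-problem approximation: probability that $k uniform draws from
// $n possible values contain at least one duplicate.
function collisionProbability(float $k, float $n): float
{
    return 1.0 - exp(-$k * ($k - 1.0) / (2.0 * $n));
}

// Approximate number of draws until the first duplicate appears: sqrt(pi*N/2).
function expectedDrawsToFirstDup(float $n): float
{
    return sqrt(M_PI * $n / 2.0);
}

printf("%.3f\n", collisionProbability(23, 365));     // prints 0.500
printf("%.3f\n", collisionProbability(1e6, 9e11));   // prints 0.426
printf("%.0f\n", expectedDrawsToFirstDup(9e11));     // roughly 1.19 million draws
```

So even with a perfect generator, one million codes drawn from the 12-digit range have roughly a 43% chance of containing at least one duplicate, which is why a uniqueness check is still needed.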
I have a project in PHP + MySQL (over 2,000,000 rows). Please look at this PHP code:
<?php
for($i=0;$i<20;$i++)
{
$start = rand(1,19980);
$select_images_url_q = "SELECT * FROM photo_gen WHERE folder='$folder' LIMIT $start,2 ";
$result_select = (mysql_query($select_images_url_q));
while($row = mysql_fetch_array($result_select))
{
echo '<li class="col-lg-2 col-md-3 col-sm-3 col-xs-4" style="height:150px">
<img class="img-responsive" src="http://static.gif.plus/'.$folder.'/'.$row['code'].'_s.gif">
</li>';
}
}
?>
This code works very slowly at the $start = rand(1,19980); position. How can I make the select request with a MySQL random function instead? Thank you.
Depending on what your code is doing with $folder, you may be vulnerable to SQL injection.
For better security, consider moving to PDO or MySQLi and using prepared statements. I wrote a library called EasyDB to make it easier for developers to adopt better security practices.
The fast, sane, and efficient way to select N distinct random elements from a database is as follows:
Get the number of rows that match your condition (i.e. WHERE folder = ?).
Generate a random number between 0 and this number.
Select a row with a given offset like you did.
Store the ID of the previously generated row in an ever-growing list to exclude from the results, and decrement the number of rows.
An example that uses EasyDB is as follows:
// Connect to the database here:
$db = \ParagonIE\EasyDB\Factory::create(
'mysql:host=localhost;dbname=something',
'username',
'putastrongpasswordhere'
);
// Maintain an array of previous record IDs in $exclude
$exclude = array();
$count = $db->single('SELECT count(id) FROM photo_gen WHERE folder = ?', $folder);
// Select _up to_ 40 values. If we have less than 40 in the folder, stop
// when we've run out of photos to load:
$max = $count < 40 ? $count : 40;
// The loop:
for ($i = 0; $i < $max; ++$i) {
// The maximum value will decrease each iteration, which makes
// sense given that we are excluding one more result each time
$r = mt_rand(0, ($count - $i - 1));
// Dynamic query
$qs = "SELECT * FROM photo_gen WHERE folder = ?";
// We add AND id NOT IN (2,6,7,19, ...) to prevent duplicates:
if ($i > 0) {
$qs .= " AND id NOT IN (" . implode(', ', $exclude) . ")";
}
$qs .= " ORDER BY id ASC LIMIT ".$r.", 1";
$row = $db->row($qs, $folder);
/**
* Now you can operate on $row here. Feel free to copy the
* contents of your while($row=...) loop in place of this comment.
*/
// Prevent duplicates
$exclude []= (int) $row['id'];
}
Gordon's answer suggests using ORDER BY RAND(), which in general is a bad idea and can make your queries very slow. Furthermore, although he says that you shouldn't need to worry about there being less than 40 rows (presumably, because of the probability involved), this will fail in edge cases.
A quick note about mt_rand(): It's a biased and predictable random number generator with only 4 billion possible seeds. If you want better results, look into random_int() (PHP 7 only, but I'm working on a compatibility layer for PHP 5 projects. See the linked answer for more information.)
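Following the random_int() suggestion, the offsets themselves can also be drawn as a distinct set in PHP, which avoids the growing NOT IN list. This is a variation on the approach above, not its code: randomOffsets() is a hypothetical helper, and each offset would then be used in a query such as SELECT * FROM photo_gen WHERE folder = ? ORDER BY id LIMIT <offset>, 1.

```php
<?php
// Pick up to $k distinct offsets in [0, $count - 1] with random_int() (PHP 7+).
function randomOffsets(int $count, int $k): array
{
    $k = min($k, $count);          // cannot pick more rows than exist
    $picked = [];
    while (count($picked) < $k) {
        $picked[random_int(0, $count - 1)] = true;   // keys de-duplicate
    }
    $offsets = array_keys($picked);
    sort($offsets);                // sorted only for predictable output
    return $offsets;
}

$offsets = randomOffsets(19980, 40);
```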
Actually, even though the table has 2+ million rows, I'm guessing that a given folder has many fewer. Hence, this should be reasonable with an index on photo_gen(folder):
SELECT *
FROM photo_gen
WHERE folder = '$folder'
ORDER BY rand()
LIMIT 40;
If a folder can still have tens or hundreds of thousands of examples, I would suggest a slight variation:
SELECT pg.*
FROM photo_gen pg cross join
(select count(*) cnt from photo_gen where folder = '$folder') as cnt
WHERE folder = '$folder' and
rand() < 500 / cnt
ORDER BY rand()
LIMIT 40;
The WHERE expression should get about 500 rows (subject to the vagaries of sample variation). There is a really high confidence that there will be at least 40 (you don't need to worry about it). The final sort should be fast.
There are definitely other methods, but they are complicated by the where clause. The index is probably the key thing you need for improved performance.
It's better to compose your SQL query (as a string in PHP) first and then execute it just once.
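One way to compose the question's 20 small queries into a single statement is a UNION ALL of parenthesized SELECTs, each with its own LIMIT. A sketch assuming the photo_gen table from the question, an id column to order by (as used in the EasyDB answer), and a PDO-style placeholder for the folder; buildSliceQuery() is hypothetical. It saves 19 round trips, but each LIMIT offset, n part still has to skip offset rows, so an index on folder remains the main fix.

```php
<?php
// Build one UNION ALL query out of several "LIMIT offset, n" slices.
function buildSliceQuery(string $folder, array $offsets, int $rowsPerSlice = 2): array
{
    $parts = [];
    $params = [];
    foreach ($offsets as $offset) {
        // Offsets are cast to int because LIMIT values are inlined, not bound.
        $parts[] = sprintf(
            '(SELECT * FROM photo_gen WHERE folder = ? ORDER BY id LIMIT %d, %d)',
            (int) $offset,
            $rowsPerSlice
        );
        $params[] = $folder;       // one bound folder value per slice
    }
    return [implode(' UNION ALL ', $parts), $params];
}

$offsets = [];
for ($i = 0; $i < 20; $i++) {
    $offsets[] = random_int(1, 19980);   // same range as the question's rand(1,19980)
}
[$sql, $params] = buildSliceQuery('some-folder', $offsets);
```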
Or you could use this way to select values if it fits your case: Select n random rows from SQL Server table
I need to synchronize specific information between two databases (one MySQL, the other a remotely hosted SQL Server database) for thousands of rows. When I execute this PHP file, it gets stuck or times out after several minutes, so I wonder how I can fix this issue and maybe also optimize the way of "synchronizing" it.
What the code needs to do:
Basically, for every row (= one account) in my database that gets updated, I want to get two specific pieces of information (= 2 SELECT queries) from the other, SQL Server database. Therefore I use a foreach loop that creates 2 SQL queries for each row, and afterwards I update that information into 2 columns of this row. We are talking about ~10k rows that need to run through this foreach loop.
My idea which may help?
I have heard about things like PDO transactions, which should collect all those queries and send them afterwards as a package of all SELECT queries, but I have no idea whether I am using them correctly or whether they even help in such cases.
This is my current code, which is timing out after few minutes:
// DBH => MSSQL DB | DB => MySQL DB
$dbh->beginTransaction();
// Get all referral IDs which needs to be updated:
$listAccounts = "SELECT * FROM Gifting WHERE refsCompleted <= 100 ORDER BY idGifting ASC";
$ps_listAccounts = $db->prepare($listAccounts);
$ps_listAccounts->execute();
foreach($ps_listAccounts as $row) {
$refid=$row['refId'];
// Refsinserted
$refsInserted = "SELECT count(username) as done FROM accounts WHERE referral='$refid'";
$ps_refsInserted = $dbh->prepare($refsInserted);
$ps_refsInserted->execute();
$row = $ps_refsInserted->fetch();
$refsInserted = $row['done'];
// Refscompleted
$refsCompleted = "SELECT count(username) as done FROM accounts WHERE referral='$refid' AND finished=1";
$ps_refsCompleted = $dbh->prepare($refsCompleted);
$ps_refsCompleted->execute();
$row2 = $ps_refsCompleted->fetch();
$refsCompleted = $row2['done'];
// Update fields for local order db
$updateGifting = "UPDATE Gifting SET refsInserted = :refsInserted, refsCompleted = :refsCompleted WHERE refId = :refId";
$ps_updateGifting = $db->prepare($updateGifting);
$ps_updateGifting->bindParam(':refsInserted', $refsInserted);
$ps_updateGifting->bindParam(':refsCompleted', $refsCompleted);
$ps_updateGifting->bindParam(':refId', $refid);
$ps_updateGifting->execute();
echo "$refid: $refsInserted Refs inserted / $refsCompleted Refs completed<br>";
}
$dbh->commit();
You can do all of that in one query with a correlated sub-query:
UPDATE Gifting
SET
refsInserted=(SELECT COUNT(USERNAME)
FROM accounts
WHERE referral=Gifting.refId),
refsCompleted=(SELECT COUNT(USERNAME)
FROM accounts
WHERE referral=Gifting.refId
AND finished=1)
A correlated sub-query is essentially a sub-query (query within a query) that references the parent query. Notice that in each of the sub-queries I am referencing the Gifting.refId column in the WHERE clause. While this isn't the best for performance, because each of those sub-queries still has to run independently of the other queries, it would perform much better (and likely as well as you are going to get) than what you have there.
Edit:
Just for reference: I don't know if a transaction will help here at all. Typically they are used when you have several queries that depend on each other, to give you a way to roll back if one fails. For example, banking transactions: you don't want the balance to deduct some amount until a purchase has been inserted, and if the purchase fails to insert for some reason, you want to roll back the change to the balance. So when inserting a purchase, you start a transaction, run the update balance query and the insert purchase query, and only if both go in correctly and have been validated do you commit to save.
Edit2:
If I were doing this without an export/import, this is what I would do. This makes a few assumptions, though. The first is that you are using MSSQL 2008 or newer, and the second is that the referral id is always a number. I'm also using a temp table that I insert numbers into, because you can insert multiple rows easily with a single query and then run a single update query to update the Gifting table. This temp table follows the structure CREATE TABLE tempTable (refId int, done int, total int).
//get list of referral accounts
//if you are using one column, only query for one column
$listAccounts = "SELECT DISTINCT refId FROM Gifting WHERE refsCompleted <= 100";
$ps_listAccounts = $db->prepare($listAccounts);
$ps_listAccounts->execute();
//loop over and get list of refIds from above.
$refIds = array();
foreach($ps_listAccounts as $row){
$refIds[] = $row['refId'];
}
if(count($refIds) > 0){
//implode into string for use in query below
$refIds = implode(',',$refIds);
//select out total count
$totalCount = "SELECT referral, COUNT(username) AS cnt FROM accounts WHERE referral IN ($refIds) GROUP BY referral";
$ps_totalCounts = $dbh->prepare($totalCount);
$ps_totalCounts->execute();
//add to array of counts
$counts = array();
//loop over total counts
foreach($ps_totalCounts as $row){
//if referral id not found, add it
if(!isset($counts[$row['referral']])){
$counts[$row['referral']] = array('total'=>0,'done'=>0);
}
//add to count
$counts[$row['referral']]['total'] += $row['cnt'];
}
$doneCount = "SELECT referral, COUNT(username) AS cnt FROM accounts WHERE finished=1 AND referral IN ($refIds) GROUP BY referral";
$ps_doneCounts = $dbh->prepare($doneCount);
$ps_doneCounts->execute();
//loop over done counts
foreach($ps_doneCounts as $row){
//if referral id not found, add it
if(!isset($counts[$row['referral']])){
$counts[$row['referral']] = array('total'=>0,'done'=>0);
}
//add to count
$counts[$row['referral']]['done'] += $row['cnt'];
}
//now loop over counts and generate insert queries to a temp table.
//I suggest using a temp table because you can insert multiple rows
//in one query and then the update is one query.
$sqlInsertList = array();
foreach($counts as $refId=>$c){
$sqlInsertList[] = "({$refId}, {$c['done']}, {$c['total']})";
}
//clear out the temp table first so we are only inserting new rows
$truncSql = "TRUNCATE TABLE tempTable";
$ps_trunc = $db->prepare($truncSql);
$ps_trunc->execute();
//make insert sql with multiple insert rows
$insertSql = "INSERT INTO tempTable (refId, done, total) VALUES ".implode(',',$sqlInsertList);
//prepare sql for insert into mssql
$ps_insert = $db->prepare($insertSql);
$ps_insert->execute();
//sql to update existing rows
$updateSql = "UPDATE Gifting
SET refsInserted=(SELECT total FROM tempTable WHERE refId=Gifting.refId),
refsCompleted=(SELECT done FROM tempTable WHERE refId=Gifting.refId)
WHERE refId IN (SELECT refId FROM tempTable)
AND refsCompleted <= 100";
$ps_update = $db->prepare($updateSql);
$ps_update->execute();
} else {
echo "There were no reference ids found from \$db";
}
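The two COUNT queries in Edit2 could also be collapsed into one round trip using conditional aggregation. COUNT(CASE WHEN finished = 1 THEN username END) counts the same rows as the second query, because COUNT skips the NULLs the CASE produces when the condition is false, and the syntax works on both SQL Server and MySQL. A sketch keeping the answer's assumption of numeric referral ids; the function names are hypothetical, and the SQL should only be built for a non-empty list, as the count($refIds) > 0 check already ensures.

```php
<?php
// Hypothetical helper: fetch both counts per referral in one query.
function buildReferralCountsSql(array $refIds): string
{
    // Force integers, as the answer assumes numeric referral ids.
    $ids = implode(',', array_map('intval', $refIds));
    return "SELECT referral,
                   COUNT(username) AS total,
                   COUNT(CASE WHEN finished = 1 THEN username END) AS done
            FROM accounts
            WHERE referral IN ($ids)
            GROUP BY referral";
}

// Turn result rows into [refId => ['total' => int, 'done' => int]].
function foldCounts(iterable $rows): array
{
    $counts = [];
    foreach ($rows as $row) {
        $counts[(int) $row['referral']] = [
            'total' => (int) $row['total'],
            'done'  => (int) $row['done'],
        ];
    }
    return $counts;
}
```

Since a PDOStatement is iterable, foldCounts($dbh->query(buildReferralCountsSql($idList))) could replace both counting loops, where $idList is the array of ids before it is imploded.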
I wrote a product price/stock update script for Magento. I load the CSV into an array and then iterate through it. The current code takes around 10 minutes to complete for 5,000 products; is there a faster way to do this? I've already bypassed Magento's API, as that was extremely slow, and switched to updating the database directly, since it's not many tables and it's faster. Using timers to record the time, it takes about 10 minutes for the foreach loop and two minutes for the reindexAll.
$con = mysql_connect("localhost","root","");
$selected = mysql_select_db("magento",$con);
$processes = Mage::getSingleton('index/indexer')->getProcessesCollection();
$processes->walk('setMode', array(Mage_Index_Model_Process::MODE_MANUAL));
$processes->walk('save');
foreach($all_rows as $final)
{
$sql = mysql_query("SELECT entity_id from catalog_product_entity where sku = '".$final['ITEM']."'");
if ($row = mysql_fetch_array($sql)) {
//update price
$pricenew = $final['PRICE'] + ($final['PRICE']*.30);
mysql_query("UPDATE catalog_product_entity_decimal SET value = '$pricenew' where attribute_id = 75 AND entity_id = '".$row['entity_id']."' ");
//update retail price
$retailprice = $final['RETAIL'];
mysql_query("UPDATE catalog_product_entity_decimal SET value = '$retailprice' where attribute_id = 120 AND entity_id = '".$row['entity_id']."' ");
//update stock quantity and is in stock
$stockquantity = $final['QTY'];
$stockquantity = number_format($stockquantity, 4, '.', '');
mysql_query("UPDATE cataloginventory_stock_item SET qty = '$stockquantity', is_in_stock = 1 where product_id = '".$row['entity_id']."' ");
}
}
$processes->walk('reindexAll');
$processes->walk('setMode', array(Mage_Index_Model_Process::MODE_REAL_TIME));
$processes->walk('save');
mysql_close($con);
If your table catalog_product_entity_decimal has an index that covers the id (it obviously does), then there is no other way to speed it up, since the slowest part here is physically changing the value.
You could probably add a WHERE condition to avoid updating the price to the same value.
Other thoughts:
While most people look at performance optimizations for SELECT statements, UPDATE and DELETE statements are often overlooked. These can benefit from the principles of analyzing the Query Execution Plan (QEP). Before MySQL 5.6.3 you could only run EXPLAIN on a SELECT statement (later versions also accept EXPLAIN UPDATE and EXPLAIN DELETE), but it's possible to rewrite an UPDATE or DELETE statement as a SELECT statement to inspect its plan.
To optimize an UPDATE, look at the WHERE clause. If you are using the PRIMARY KEY, no further analysis is necessary. If you are not, it is of benefit to rewrite your UPDATE statement as a SELECT statement and obtain a QEP as previously detailed to ensure optimal indexes are used. For example:
UPDATE t
SET c1 = 'x', c2 = 'y', c3 = 100
WHERE c1 = 'x'
AND d = CURDATE()
You can rewrite this UPDATE statement as a SELECT statement for using EXPLAIN:
EXPLAIN SELECT c1, c2, c3 FROM t WHERE c1 = 'x' AND d = CURDATE()
You should now apply the same principles as you would when optimizing SELECT statements.
I have a table with roughly 1 million rows. I'm writing a simple program that prints out one field from each row. However, since I started using mysql_pconnect and mysql_query, the query takes a long time; I am assuming the query needs to finish before I can print out even the first row. Is there a way to process the data a bit at a time?
--Edited--
I am not looking to retrieve a small set of the data, I'm looking for a way to process the data a chunk at a time (say fetch 10 rows, print 10 rows, fetch 10 rows, print 10 rows etc etc) rather than wait for the query to retrieve 1 million rows (who knows how long) and then start the printing.
Printing one million fields will take some time. Retrieving one million records will take some time. Time adds up.
Have you profiled your code? I'm not sure using limit would make such a drastic difference in this case.
Doing something like this
while ($row = mysql_fetch_object($res)) {
echo $row->field."\n";
}
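outputs one record at a time. Note, however, that mysql_query() buffers the whole result set on the client before returning, so the first row is only printed after all rows have been transferred; mysql_unbuffered_query() avoids that.
If you are dealing with a browser, you will need something more,
such as this: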
ob_start();
$i = 0;
while ($row = mysql_fetch_object($res)) {
echo $row->field."\n";
if (($i++ % 1000) == 0) {
ob_flush();
flush();
}
}
ob_end_flush();
Do you really want to print one million fields?
The customary solution is to use some kind of output pagination in your web application, showing only part of the result. On SELECT queries you can use the LIMIT keyword to return only part of the data. This is basic SQL stuff, really. Example:
SELECT * FROM table WHERE (some conditions) LIMIT 40,20
shows 20 entries, starting from the 41st (the offset is zero-based, so the first 40 rows are skipped).
It may be necessary to use ORDER BY along with LIMIT to prevent the ordering from randomly changing under your feet between requests.
This is commonly needed for pagination. You can use the LIMIT keyword in your SELECT query. Search for LIMIT in the MySQL SELECT documentation:
The LIMIT clause can be used to constrain the number of rows returned by the SELECT statement. LIMIT takes one or two numeric arguments, which must both be nonnegative integer constants (except when using prepared statements).
With two arguments, the first argument specifies the offset of the first row to return, and the second specifies the maximum number of rows to return. The offset of the initial row is 0 (not 1):
SELECT * FROM tbl LIMIT 5,10; # Retrieve rows 6-15
To retrieve all rows from a certain offset up to the end of the result set, you can use some large number for the second parameter. This statement retrieves all rows from the 96th row to the last:
SELECT * FROM tbl LIMIT 95,18446744073709551615;
With one argument, the value specifies the number of rows to return from the beginning of the result set:
SELECT * FROM tbl LIMIT 5; # Retrieve first 5 rows
In other words, LIMIT row_count is equivalent to LIMIT 0, row_count.
You might be able to use mysqli::use_result combined with flush() to output the data set to the browser. I know flush() can be used to send data to the browser incrementally, as I have used it before to do just that; however, I am not sure if mysqli::use_result is the correct function to retrieve incomplete result sets.
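mysqli::use_result() (or passing MYSQLI_USE_RESULT to mysqli::query()) does give unbuffered access: rows are read from the server as you fetch them. A sketch of the printing side, where echoColumn() is a hypothetical helper and the table and column names in the usage comment are placeholders from the question. It accepts any Traversable, such as the mysqli_result of an unbuffered query, or a PDOStatement with PDO::MYSQL_ATTR_USE_BUFFERED_QUERY set to false.

```php
<?php
// Print one column per row and push output to the client every $flushEvery rows.
function echoColumn(iterable $rows, string $column, int $flushEvery = 1000): int
{
    $n = 0;
    foreach ($rows as $row) {
        echo $row[$column], "\n";
        if (++$n % $flushEvery === 0) {
            if (ob_get_level() > 0) {
                ob_flush();     // empty PHP's own output buffer first
            }
            flush();            // then ask the web server to send the data
        }
    }
    return $n;
}

// Usage with mysqli (connection details are placeholders):
// $db = new mysqli('localhost', 'user', 'password', 'dbname');
// $result = $db->query('SELECT `field` FROM `table`', MYSQLI_USE_RESULT);
// echoColumn($result, 'field');
// $result->free();   // an unbuffered result must be freed before the next query
```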
This is how I do something like that in Oracle. I'm not sure how it would cross over:
declare
my_counter integer := 0;
begin
for cur in (
select id from table
) loop
begin
-- do whatever you're trying to do
update table set name = 'steve' where id = cur.id;
my_counter := my_counter + 1;
if my_counter > 500 then
my_counter := 0;
commit;
end if;
end;
end loop;
commit;
end;
An example using the basic mysql driver.
define( 'CHUNK_SIZE', 500 );
$result = mysql_query( 'select count(*) as num from `table`' );
$row = mysql_fetch_assoc( $result );
$totalRecords = (int)$row['num'];
$offsets = ceil( $totalRecords / CHUNK_SIZE );
for ( $i = 0; $i < $offsets; $i++ )
{
$result = mysql_query( "select * from `table` limit " . CHUNK_SIZE . " offset " . ( $i * CHUNK_SIZE ) );
while ( $row = mysql_fetch_assoc( $result ) )
{
// your per-row operations here
}
unset( $result, $row );
}
This will iterate over your entire row volume, but do so only 500 rows at a time to keep memory usage down.
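If the table has an indexed, increasing id column (an assumption; the example above does not say), the OFFSET can be replaced by remembering the last id seen, so later chunks do not have to skip over all earlier rows. A sketch with a hypothetical $fetchAfter callback that would run SELECT * FROM `table` WHERE id > ? ORDER BY id LIMIT ?; an in-memory array stands in for the database so the loop can be tried out.

```php
<?php
// Yield every row, one chunk at a time, using the last seen id as the cursor.
// $fetchAfter(int $lastId, int $limit): array must return rows with id > $lastId,
// ordered by id, at most $limit of them.
function chunkedRows(callable $fetchAfter, int $chunkSize = 500): Generator
{
    $lastId = 0;    // assumes ids are positive
    do {
        $rows = $fetchAfter($lastId, $chunkSize);
        foreach ($rows as $row) {
            yield $row;
            $lastId = (int) $row['id'];
        }
    } while (count($rows) === $chunkSize);   // a short chunk means we reached the end
}

// In-memory stand-in for the database table.
$table = [];
for ($id = 1; $id <= 1203; $id++) {
    $table[] = ['id' => $id, 'field' => "row $id"];
}
$fake = function (int $lastId, int $limit) use ($table): array {
    $matching = array_filter($table, fn ($r) => $r['id'] > $lastId);
    return array_slice(array_values($matching), 0, $limit);
};
$ids = [];
foreach (chunkedRows($fake, 500) as $row) {
    $ids[] = $row['id'];
}
```

With PDO, $fetchAfter could wrap a prepared statement executed with the last id and the limit; with emulated prepares the LIMIT value has to be bound with PDO::PARAM_INT, otherwise it is quoted as a string.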
It sounds like you're hitting the limits of various buffer sizes within the MySQL server. Some things you could do are to select only the field you want in the SQL statement, to reduce the buffer size, or to play around with the various admin settings.
OR, you can use a pagination like method but have it output all on one page...
(pseudocode)
function q($part) {
$off = $part*SIZE_OF_PARTITIONS;
$size = SIZE_OF_PARTITIONS;
return( execute_and_return_sql("SELECT `field` FROM `table` LIMIT $off, $size"));
}
$ii = 0;
while ($elements = q($ii)) {
print_fields($elements);
$ii++;
}
Use mysql_unbuffered_query() or if using PDO make sure PDO::MYSQL_ATTR_USE_BUFFERED_QUERY is false.
Also see this similar question.
Edit: and as others have said, you may wish to combine this with flushing your output buffer after each batch of processing, depending on your circumstances.