I am trying to update many rows (100 000+) in my database, but it's taking a long time (over 10 minutes and still not finished). I'm wondering whether this is expected behavior or whether something is wrong in my code. I've been told to update one row at a time so the database doesn't hang during the update, but I'm not sure this is how it should be implemented.
I am setting the image column in my song table to NULL for every song that was played in my playlist table.
private function updateBlogSongs ($blog_id) {
$db = Yii::app()->db;
$affectedRows = 0;
$sql = "SELECT *
FROM `firstdatabase`.song s
INNER JOIN `seconddatabase`.playlist p ON s.name LIKE p.song_name";
$dataReader = $db->createCommand($sql)->query(); // Rows from the song table that were played in the given blog
$row = $dataReader->read();
while ($row != false) {
$sql = "UPDATE `firstdatabase`.song s
SET s.image = NULL
WHERE s.song_id = " . $row['song_id'];
$affectedRows += $db->createCommand($sql)->execute();
$row = $dataReader->read();
}
return $affectedRows;
}
Edit: after reading The Dog's comment I made some changes:
With 500 000 rows in the song table, it takes about 10 minutes if I increase batchSize to 10000 (the code above was taking 8 hours). With a batch size of 250 it takes about 50 minutes. I chose 250 because each query then takes about 1 second to run, whereas at a batch size of 10000 each query takes 10+ seconds (my constraint is 1 second per query). I would like to make it faster, but I'm not sure what else to change.
$batchSize = 250;
$rowIndex = 0;
$affectedRows = 0;
$sql = "SELECT MAX(song_id) AS max_id FROM `firstdatabase`.song";
$lastSongID = intval($db->createCommand($sql)->query()->read()['max_id']);
echo('Highest song_id: ' . $lastSongID . PHP_EOL);
echo('Updating songs...' . PHP_EOL);
while($rowIndex <= $lastSongID) {
$startTime = microtime(true);
// ORDER BY before LIMIT makes each batch deterministic
$sql = "UPDATE `firstdatabase`.song
SET image = NULL
WHERE song_id in (
SELECT song_id
FROM (
SELECT song_id, name
FROM `firstdatabase`.song
WHERE song_id > " . $rowIndex . "
ORDER BY song_id ASC
LIMIT " . $batchSize . "
) s
INNER JOIN (
SELECT DISTINCT song_name
FROM `seconddatabase`.playlist
) p ON s.name LIKE p.song_name
)";
$affectedRows += $db->createCommand($sql)->execute();
$rowIndex += $batchSize;
$endTime = microtime(true);
$elapsedTime = round($endTime - $startTime, 2);
echo('Batch done in ' . $elapsedTime . ' s' . PHP_EOL);
}
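The batching idea above can be sketched outside MySQL as follows. This is a minimal, self-contained Python sketch that uses SQLite in place of MySQL so it runs without a server. The tiny `song`/`playlist` schema and the `batched_null_images` helper are hypothetical. The sketch walks fixed `song_id` ranges and commits after each batch, so each statement touches only a bounded slice of the table. Unlike `LIMIT`-based windows combined with `$rowIndex += $batchSize`, a range predicate examines each id exactly once, even when the ids have gaps.

```python
import sqlite3

def batched_null_images(conn, batch_size):
    """Clear song.image for songs matching a playlist entry, one
    song_id range (low, low + batch_size] at a time (hypothetical schema)."""
    cur = conn.cursor()
    max_id = cur.execute("SELECT COALESCE(MAX(song_id), 0) FROM song").fetchone()[0]
    affected = 0
    low = 0
    while low < max_id:
        high = low + batch_size
        cur.execute(
            """
            UPDATE song
               SET image = NULL
             WHERE song_id > ? AND song_id <= ?
               AND image IS NOT NULL  -- skip rows that are already NULL
               AND EXISTS (SELECT 1 FROM playlist p WHERE song.name LIKE p.song_name)
            """,
            (low, high),
        )
        affected += cur.rowcount
        conn.commit()  # commit per batch so locks are held only briefly
        low = high
    return affected

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE song (song_id INTEGER PRIMARY KEY, name TEXT, image TEXT);
CREATE TABLE playlist (song_name TEXT);
""")
conn.executemany("INSERT INTO song VALUES (?, ?, ?)",
                 [(1, "Alpha", "a.png"), (2, "Beta", "b.png"),
                  (5, "Gamma", "g.png"), (9, "Delta", "d.png")])
conn.executemany("INSERT INTO playlist VALUES (?)",
                 [("Alpha",), ("Gamma",), ("Delta",)])
updated = batched_null_images(conn, batch_size=3)
print(updated)  # 3
```

The batch size trades statement duration against the number of round trips. Measuring a few sizes, as the question does, is the reliable way to choose one.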
This is really more a question for the SQL world than the PHP world, but here are my recommendations:
Don't do this one row at a time in a while loop. Write a more complex UPDATE statement that does it all in one database hit. Database calls are the slowest part of your PHP code, so you want to limit the number of calls you make to the database.
Whether or not you manage to do the operation in one SQL command, consider moving the code into a stored procedure in the database. Keeping complex SQL queries in stored procedures can help a lot with maintaining your code.
Make sure you have indexes on your tables, and make sure your queries actually use those indexes, for best performance.
Here's an option for the single query. It is written as a multi-table UPDATE because MySQL does not allow a subquery in an UPDATE's WHERE clause to select from the table being updated (error 1093):
update `firstdatabase`.song s
INNER JOIN `seconddatabase`.playlist p
ON s.name LIKE p.song_name
set s.image = null;
Obviously we don't have access to your database so you'll need to make changes where necessary but hopefully it can get you on the right track.
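To make the one-statement idea concrete, here is a sketch using Python's built-in sqlite3 module. SQLite has no MySQL-style multi-table `UPDATE ... JOIN`, so the same match is expressed with a correlated `EXISTS`. The schema and sample rows are invented for illustration. The point is that one statement replaces the whole PHP loop, and duplicate playlist entries do not cause a row to be updated twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE song (song_id INTEGER PRIMARY KEY, name TEXT, image TEXT);
CREATE TABLE playlist (song_name TEXT);
""")
conn.executemany("INSERT INTO song VALUES (?, ?, ?)",
                 [(1, "Alpha", "a.png"), (2, "Beta", "b.png"), (3, "Gamma", "g.png")])
# "Gamma" appears twice on purpose: EXISTS still matches the song only once
conn.executemany("INSERT INTO playlist VALUES (?)",
                 [("Alpha",), ("Gamma",), ("Gamma",)])

# One statement, one round trip: no per-row loop in the application
cur = conn.execute("""
UPDATE song
   SET image = NULL
 WHERE EXISTS (SELECT 1 FROM playlist p WHERE song.name LIKE p.song_name)
""")
conn.commit()
print(cur.rowcount)  # 2
```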
EDIT:
Try replacing your second code set with the following:
$lastSongID = 0;
$rowIndex = 0;
$affectedRows = 0;
$sql = "SELECT max(song_id) FROM `firstdatabase`.song";
$lastSongID = intval($db->createCommand($sql)->query()->read()['max(song_id)']);
echo($lastSongID . ' songs in table.' . PHP_EOL);
echo('Updating songs...' . PHP_EOL);
$startTime = microtime(true);
$sql = "
UPDATE `firstdatabase`.song s
INNER JOIN `seconddatabase`.playlist p
ON s.name LIKE p.song_name
SET s.image = NULL";
$affectedRows += $db->createCommand($sql)->execute();
$endTime = microtime(true);
$elapsedTime = round($endTime - $startTime, 2);
echo($affectedRows . ' rows updated in ' . $elapsedTime . ' s' . PHP_EOL);
If it works, let me know how long it takes to run. If it doesn't, let me know whether the problem is with the SQL (again, I can't see your tables, so I'm guessing).
Related
I have one table that determines how I update 6 rows in another table for matching IDs. There are over 1000 records in total, so most of the time the current script fails with a timeout error.
Here is how I do it now: I select the range of IDs between two dates from the first table, store them in an array, and then run a foreach loop that updates the second table where the IDs match. So basically I run one query for every single ID.
Is there any way I could speed up the process?
The only approach I found generates a CASE expression inside the foreach loop:
UPDATE product SET price = CASE
WHEN ID = $ID1 THEN $price1
WHEN ID = $ID2 THEN $price2
END
WHERE ID IN ($ID1, $ID2)
But I don't know how I could modify this to update multiple rows at the same time, not just one.
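One way to extend that CASE pattern to any number of rows is to generate the WHEN branches in code and bind the values as parameters. The statement should also be restricted with `WHERE id IN (...)` so rows not in the list keep their price: a CASE with no matching branch and no ELSE yields NULL. The following is a hedged Python/SQLite sketch. The `product` table and the `build_case_update` helper are hypothetical.

```python
import sqlite3

def build_case_update(new_prices):
    """Return (sql, params) for one UPDATE that sets product.price for
    every id in new_prices (a dict of id -> price). Hypothetical helper."""
    if not new_prices:
        raise ValueError("nothing to update")
    whens = " ".join("WHEN id = ? THEN ?" for _ in new_prices)
    placeholders = ", ".join("?" for _ in new_prices)
    sql = (f"UPDATE product SET price = CASE {whens} END "
           f"WHERE id IN ({placeholders})")
    # Parameters: id1, price1, id2, price2, ... followed by the ids for IN (...)
    params = [x for pair in new_prices.items() for x in pair] + list(new_prices)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(1, 10.0), (2, 20.0), (3, 30.0)])
sql, params = build_case_update({1: 11.0, 3: 33.0})
conn.execute(sql, params)
conn.commit()
prices = [p for (p,) in conn.execute("SELECT price FROM product ORDER BY id")]
print(prices)  # [11.0, 20.0, 33.0]
```

To update several columns at once, repeat one CASE expression per column in the same SET clause.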
My script looks like this:
$sql = "SELECT * FROM `games` where (ev_tstamp >= '".$timestamp1."' and ev_tstamp <= '".$timestamp2."')";
$sql1 = mysqli_query($conn, $sql);
$today_games = [];
while($row = mysqli_fetch_array($sql1)){
$one_of =[
"fix_id" =>$row['fix_id'],
"ev_tstamp" =>$row['ev_tstamp'],
"t1_res" =>$row['t1_res'],
"t2_res" =>$row['t2_res'],
"ht_res_t1" =>$row['ht_res_t1'],
"ht_res_t2" =>$row['ht_res_t2'],
"y_card_t1" =>$row['y_card_t1'],
"y_card_t2" =>$row['y_card_t2'],
"t1_corners" =>$row['t1_corners'],
"t2_corners" =>$row['t2_corners'],
"red_card_t1" =>$row['red_card_t1'],
"red_card_t2" =>$row['red_card_t2']
];
array_push($today_games,$one_of);
}
foreach($today_games as $key=>$val){
$cards_t1=$val['red_card_t1']+$val['y_card_t1'];
$cards_t2=$val['red_card_t2']+$val['y_card_t2'];
// one UPDATE round trip per game: this is what makes the script slow
$sql = "Update sights SET t1_res='".$val['t1_res']."',
t2_res='".$val['t2_res']."', ev_tstamp='".$val['ev_tstamp']."',
ht_res_t1='".$val['ht_res_t1']."', ht_res_t2='".$val['ht_res_t2']."',
t1_corners='".$val['t1_corners']."',t2_corners='".$val['t2_corners']."',
t1_cards='".$cards_t1."',t2_cards='".$cards_t2."'
where fix_id='".$val['fix_id']."' ";
mysqli_query($conn, $sql);
}
Consider an UPDATE...JOIN query using fix_id as the join column. The code below runs it as a mysqli parameterized query with the two timestamps bound as parameters. No loop is needed.
$sql = "UPDATE sights s
INNER JOIN `games` g
ON s.fix_id = g.fix_id
AND g.ev_tstamp >= ? and g.ev_tstamp <= ?
SET s.t1_res = g.t1_res,
s.t2_res = g.t2_res,
s.ev_tstamp = g.ev_tstamp,
s.ht_res_t1 = g.ht_res_t1,
s.ht_res_t2 = g.ht_res_t2,
s.t1_corners = g.t1_corners,
s.t2_corners = g.t2_corners,
s.t1_cards = (g.red_card_t1 + g.y_card_t1),
s.t2_cards = (g.red_card_t2 + g.y_card_t2)";
$stmt = mysqli_prepare($conn, $sql);
mysqli_stmt_bind_param($stmt, 'ss', $timestamp1, $timestamp2);
mysqli_stmt_execute($stmt);
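To try the set-based idea without MySQL, here is a rough SQLite equivalent in Python. It assumes one `games` row per `fix_id` and uses only a subset of the columns. Older SQLite versions lack `UPDATE ... FROM`, so each column is filled from a correlated subquery instead of a join. As in the mysqli version, the timestamp range is passed as bound parameters. The sample data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games (fix_id INTEGER PRIMARY KEY, ev_tstamp INTEGER,
                    t1_res INTEGER, t2_res INTEGER,
                    red_card_t1 INTEGER, y_card_t1 INTEGER);
CREATE TABLE sights (fix_id INTEGER PRIMARY KEY, t1_res INTEGER,
                     t2_res INTEGER, t1_cards INTEGER);
""")
conn.executemany("INSERT INTO games VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 100, 2, 1, 1, 3), (2, 500, 0, 0, 0, 2)])
conn.executemany("INSERT INTO sights (fix_id) VALUES (?)", [(1,), (2,)])

# One statement updates every matching sights row in the timestamp range
SQL = """
UPDATE sights
   SET t1_res   = (SELECT g.t1_res FROM games g WHERE g.fix_id = sights.fix_id),
       t2_res   = (SELECT g.t2_res FROM games g WHERE g.fix_id = sights.fix_id),
       t1_cards = (SELECT g.red_card_t1 + g.y_card_t1
                     FROM games g WHERE g.fix_id = sights.fix_id)
 WHERE fix_id IN (SELECT fix_id FROM games
                   WHERE ev_tstamp >= ? AND ev_tstamp <= ?)
"""
cur = conn.execute(SQL, (0, 200))
conn.commit()
print(cur.rowcount)  # 1
print(conn.execute("SELECT * FROM sights ORDER BY fix_id").fetchall())
# [(1, 2, 1, 4), (2, None, None, None)]
```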
This is 4 queries put into one. This is really old code; once I can make it work, we can update it to PDO later for security. What I am trying to do is count the rows from
select count(*) from dialogue_employees d_e,
dialogue_leaders d_l where
d_l.leader_group_id = d_e.leader_group_id
and use the count in a formula that also needs the number of rows with dialogue.status = 1.
The formula at the bottom turns the results into a percentage. This is PHP and MySQL, and I'm not sure of the best way to count the rows and store the counts in PHP variables for use in that formula.
function calculate_site_score($start_date, $end_date, $status){
while($rows=mysql_fetch_array($sqls)){
$query = "
SELECT
dialogue.cycle_id,
$completecount = sum(dialogue.status) AS calculation,
$total_employees = count(dialogue_employees AND dialogue_leaders), dialogue_list.*,
FROM dialogue,
(SELECT * FROM dialogue_list WHERE status =1) AS status,
dialogue_employees d_e,
u.fname, u.lname, d_e.*
user u,
dialogue_list,
dialogue_leaders d_l
LEFT JOIN dialogue_list d_list
ON d_e.employee_id = d_list.employee_id,
WHERE
d_l.leader_group_id = d_e.leader_group_id
AND d_l.cycle_id = dialogue.cycle_id
AND u.userID = d_e.employee_id
AND dialogue_list.employee_id
AND site_id='$_SESSION[siteID]'
AND start_date >= '$start_date'
AND start_date <= '$end_date'";
$sqls=mysql_query($query) or die(mysql_error());
}
$sitescore=($completecount/$total_employees)*100;
return round($sitescore,2);
}
If you separate out your queries, you will gain more control over your data. You have to be careful about what you're counting; that function is pretty crowded.
If you just want to clean up your function, you can stack your queries like this so they make more sense:
function calculate_site_score($start_date, $end_date, $status){
global $mysqli; // assumes an open mysqli connection named $mysqli
$query = "select * from dialogue";
if ($result = $mysqli->query($query)) {
//iterate your result
while ($row = $result->fetch_assoc()) {
$neededElem = $row['elem'];
$query = "select * from dialogue_list where status = 1 and otherElem = '" . $mysqli->real_escape_string($neededElem) . "'";
//give it a name other than $sqls, something that makes sense.
$list = $mysqli->query($query);
//iterate list, and parse results for what you need
foreach ($list as $k => $v) {
//go a level deeper, or calculate, rinse and repeat
}
}
}
}
Then do your counts separately.
In short, it helps to run each query on its own.
Here is a count example: "How do I count columns of a table".
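Following the advice to keep the counts separate, a sketch of the percentage calculation might look like this. It uses Python with an in-memory SQLite database. The three tables are cut down to hypothetical minimal columns, and the site and date filters from the question are omitted. The total comes from the employees/leaders join shown in the question, and the completed count comes from `status = 1`. The division is guarded against a zero total.

```python
import sqlite3

def site_score(conn):
    # Query 1: the total from the employees/leaders join in the question
    total = conn.execute("""
        SELECT COUNT(*)
          FROM dialogue_employees d_e, dialogue_leaders d_l
         WHERE d_l.leader_group_id = d_e.leader_group_id
    """).fetchone()[0]
    # Query 2: completed dialogues (status = 1)
    complete = conn.execute(
        "SELECT COUNT(*) FROM dialogue_list WHERE status = 1"
    ).fetchone()[0]
    if total == 0:
        return 0.0  # avoid dividing by zero when nothing matches
    return round(complete / total * 100, 2)

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dialogue_leaders (leader_group_id INTEGER);
CREATE TABLE dialogue_employees (employee_id INTEGER, leader_group_id INTEGER);
CREATE TABLE dialogue_list (employee_id INTEGER, status INTEGER);
INSERT INTO dialogue_leaders VALUES (1);
INSERT INTO dialogue_employees VALUES (1, 1), (2, 1), (3, 1), (4, 2);
INSERT INTO dialogue_list VALUES (1, 1), (2, 1), (3, 0);
""")
print(site_score(conn))  # 66.67
```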
I have a SELECT statement that pulls a limited number of items based on the value of one of the fields (i.e. ORDER BY rate LIMIT 15).
However, I need to do some comparisons that change the value of rate, which could in turn change which results I want.
I could pull everything (without the LIMIT), alter the rate, re-sort, and then process only the number of rows I need. However, I don't know whether it's possible to alter values in a PHP result array. I'm using:
$query_raw = "SELECT dl.dragon_list_id, dl.dragon_id, dl.dragon_name, dl.dragon_level, d.type, d.opposite, d.image, dr.dragon_earn_rate
FROM dragon_list dl
LEFT JOIN dragons d ON d.dragon_id = dl.dragon_id
LEFT JOIN dragon_rates dr ON dr.dragon_id = dl.dragon_id
AND dr.dragon_level = dl.dragon_level
WHERE dl.dragon_id IN (
SELECT dragon_id
FROM dragon_elements
WHERE element_id = 3
)
AND dl.dragon_list_id NOT IN (
SELECT dh.dragon_list_id
FROM dragon_to_habitat dh, dragon_list dl
WHERE dl.user_id = 1
AND dh.dragon_list_id = dl.dragon_list_id
AND dl.is_deleted = 0
)
AND dl.user_id = " . $userid . "
AND dl.is_deleted = 0
ORDER BY dr.dragon_earn_rate DESC, dl.dragon_name
LIMIT 15;";
$d_query = mysqli_query($link, $query_raw);
if (!$d_query) {
echo "DB Error, could not query the database\n";
echo 'MySQL Error: ' . mysqli_error($link);
exit;
}
$d = mysqli_fetch_array($d_query);
Well, after a lot of research and some trial and error, I found my answers.
Yes, I CAN alter the values in a fetched row using something like:
$result['field'] = $newvalue;
I also learned I could reset the pointer by using:
mysqli_data_seek($d_query,0);
However, when I reset the pointer, I lost the changes I made. So ultimately I'm still a little stuck, even though I found the answer to each part individually.
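The changes are lost because `mysqli_data_seek()` only moves the pointer in the buffered result set. The next fetch builds a fresh array from that buffer, not the array you edited. One way around this, sketched here under assumptions, is the approach described in the question: fetch all rows once (no LIMIT), keep them in your own array, adjust the rate there, re-sort, and take the top N. The example uses Python/SQLite so it runs on its own. The `dragon_rows` table and the fire-doubling rule are hypothetical.

```python
import sqlite3

def top_n_after_adjust(conn, adjust, n=15):
    """Fetch every row (no LIMIT), adjust the rate in our own copies,
    re-sort, and keep the first n. Table name is hypothetical."""
    cur = conn.execute(
        "SELECT dragon_name, type, dragon_earn_rate FROM dragon_rows")
    cols = [d[0] for d in cur.description]
    rows = [dict(zip(cols, values)) for values in cur.fetchall()]
    for row in rows:
        row["dragon_earn_rate"] = adjust(row)  # edits persist: they live in our list
    # Same ordering as the SQL: rate descending, then name ascending
    rows.sort(key=lambda r: (-r["dragon_earn_rate"], r["dragon_name"]))
    return rows[:n]

def double_fire(row):
    # Hypothetical comparison rule: fire dragons earn twice as much
    rate = row["dragon_earn_rate"]
    return rate * 2 if row["type"] == "fire" else rate

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dragon_rows (dragon_name TEXT, type TEXT, dragon_earn_rate INTEGER)")
conn.executemany("INSERT INTO dragon_rows VALUES (?, ?, ?)",
                 [("Ash", "fire", 10), ("Brook", "water", 18),
                  ("Cinder", "fire", 9), ("Dew", "water", 18)])
best = top_n_after_adjust(conn, double_fire, n=3)
print([r["dragon_name"] for r in best])  # ['Ash', 'Brook', 'Cinder']
```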
I have 3 tables: table_a (4000 rows), table_b (35000 rows), and table_c to store the result.
It takes 670 seconds to complete. Is there another way to do this? (I also tried a LEFT JOIN, but the right table returns more than one matching row, so the left rows get duplicated, and it still takes about 300 seconds to complete.)
mysqli_autocommit($con_a, false);
$c_mgp = "select * from table_a where .......";
$c_mgp_r = mysqli_query($con_a,$c_mgp) or die (mysqli_error($con_a));
$multi_sq = '';
$r = 0;
while($c_mgp_f = mysqli_fetch_array($c_mgp_r)) {
$r++;
$mgpstat = trim($c_mgp_f['STATUS']);
$mgpval= trim($c_mgp_f['VAL']);
$sand = trim($c_mgp_f['SAND']);
$multi_sq .= "insert into table_c (NAME,VAL,VAL_RES) values('$mgpstat','$mgpval',
(select SUM(VAL_RES) from table_b where DATE = '$date_a' and GRUP = '$grup' and ACNO= '$sand'));"; // this is the most important part: $sand is different on every iteration (and always matches more than one row in table_b)
if($r == 500){
mysqli_multi_query...........;
$r=0;
$multi_sq='';
}
}
mysqli_commit($con_a);
Many thanks for the help.
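A common alternative to sending one INSERT with a correlated subquery per row is a single `INSERT ... SELECT` that aggregates `table_b` first and then joins. Grouping by `ACNO` before the join yields one row per account, which also avoids the duplicated rows seen with a plain LEFT JOIN. Rows with no match still get NULL, as with the original subquery. This is a hedged Python/SQLite sketch with invented sample data. The column names come from the question, and the elided WHERE clause on `table_a` is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (STATUS TEXT, VAL TEXT, SAND TEXT);
CREATE TABLE table_b ("DATE" TEXT, GRUP TEXT, ACNO TEXT, VAL_RES INTEGER);
CREATE TABLE table_c (NAME TEXT, VAL TEXT, VAL_RES INTEGER);
""")
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?)",
                 [(" ok ", "1", "A1"), ("bad", "2", "Z9")])
conn.executemany("INSERT INTO table_b VALUES (?, ?, ?, ?)", [
    ("2024-01-01", "G", "A1", 5), ("2024-01-01", "G", "A1", 7),
    ("2024-01-02", "G", "A1", 100), ("2024-01-01", "G", "B2", 3)])

# Aggregate table_b once per ACNO, then join: one row per table_a row
INSERT_SQL = """
INSERT INTO table_c (NAME, VAL, VAL_RES)
SELECT TRIM(a.STATUS), TRIM(a.VAL), b.total
  FROM table_a a
  LEFT JOIN (SELECT ACNO, SUM(VAL_RES) AS total
               FROM table_b
              WHERE "DATE" = ? AND GRUP = ?
              GROUP BY ACNO) b
    ON b.ACNO = TRIM(a.SAND)
"""
conn.execute(INSERT_SQL, ("2024-01-01", "G"))
conn.commit()
print(conn.execute("SELECT * FROM table_c ORDER BY NAME").fetchall())
# [('bad', '2', None), ('ok', '1', 12)]
```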
I need to count the number of rows in different(!) tables and save the results for some kind of statistics. The script is quite simple and works as expected, but I'm wondering which is better: a single query with (in this case) 8 subqueries, 8 separate queries, or an even better, faster, and more advanced solution.
I'm using MySQLi with prepared statements, so the single query could look like this:
$sql = 'SELECT
(SELECT COUNT(cat1_id) FROM `cat1`),
(SELECT COUNT(cat2_id) FROM `cat2`),
(SELECT COUNT(cat2_id) FROM `cat2` WHERE `date` >= DATE(NOW())),
(SELECT COUNT(cat3_id) FROM `cat3`),
(SELECT COUNT(cat4_id) FROM `cat4`),
(SELECT COUNT(cat5_id) FROM `cat5`),
(SELECT COUNT(cat6_id) FROM `cat6`),
(SELECT COUNT(cat7_id) FROM `cat7`)';
$stmt = $db->prepare($sql);
$stmt->execute();
$stmt->bind_result($var1, $var2, $var3, $var4, $var5, $var6, $var7, $var8);
$stmt->fetch();
$stmt->free_result();
$stmt->close();
while the separate queries would look like this (x 8):
$sql = 'SELECT
COUNT(cat1_id)
FROM
`cat1`';
$stmt = $db->prepare($sql);
$stmt->execute();
$stmt->bind_result($var1);
$stmt->fetch();
$stmt->free_result();
$stmt->close();
So which would be faster, or "better style", for this kind of query (e.g. statistics, counters)?
My inclination is to put queries into the FROM rather than the SELECT, where possible. In this example, it requires a cross join between the tables:
select c1.val, c2.val . . .
from (select count(cat1_id) as val from cat1) c1 cross join
(select count(cat2_id) as val from cat2) c2 cross join
. . .
The performance should be the same. However, the advantage appears with your cat2 table:
select c1.val, c2.val, c2.valnow, . . .
from (select count(cat1_id) as val from cat1) c1 cross join
(select count(cat2_id) as val,
count(case when date >= date(now()) then cat2_id end) as valnow
from cat2
) c2 cross join
. . .
You get a real savings here by not having to scan the table twice to get two values. This also helps when you realize that you might want to modify queries to return more than one value.
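The single-scan trick can be checked with a small, self-contained example. In this Python/SQLite sketch, SQLite's `date('now')` stands in for MySQL's `DATE(NOW())`, and the tables and rows are made up. The query returns both counts for `cat2` from one pass over that table, cross-joined with the `cat1` count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cat1 (cat1_id INTEGER PRIMARY KEY);
CREATE TABLE cat2 (cat2_id INTEGER PRIMARY KEY, "date" TEXT);
""")
conn.executemany("INSERT INTO cat1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO cat2 VALUES (?, ?)",
                 [(1, "2000-01-01"), (2, "2001-06-15"), (3, "9999-12-31")])

# COUNT ignores NULLs, so the CASE counts only rows dated today or later
row = conn.execute("""
SELECT c1.val, c2.val, c2.valnow
  FROM (SELECT COUNT(cat1_id) AS val FROM cat1) c1
 CROSS JOIN (SELECT COUNT(cat2_id) AS val,
                    COUNT(CASE WHEN "date" >= date('now') THEN cat2_id END) AS valnow
               FROM cat2) c2
""").fetchone()
print(row)  # (3, 3, 1)
```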
I believe the cross join and select-within-select would have the same performance characteristics. The only way to really be sure is to test different versions.
The better way is to use just one query, because that means only one round trip to the database. If you use many queries, there are many round trips to the database, and that is slower.
To follow up on your comment, here is an example using one of my DBs. Using a prepared statement here buys you nothing. This multi-query in fact executes only one RPC to the DB engine; all of the other calls are local to the PHP runtime.
$db = new mysqli('localhost', 'user', 'password', 'blog');
$table = explode( ' ', 'articles banned comments config language members messages photo_albums photos');
$sql = array();
$result = array();
foreach( $table as $t ) {
$sql[] = "select count(*) as count from blog_$t";
}
if ($db->multi_query( implode(';',$sql) )) {
foreach( $table as $t ) {
if ( ($rs = $db->store_result() ) &&
($row = $rs->fetch_row() ) ) {
$result[$t] = $row[0];
$rs->free();
$db->next_result(); // you must execute one per result set
}
}
}
$db->close();
var_dump( $result );
Just out of interest, I ran strace on this, and the relevant four lines are:
16:54:09.894296 write(4, "\211\1\0\0\3select count(*) as count fr"..., 397) = 397
16:54:09.895264 read(4, "\1\0\0\1\1\33\0\0\2\3def\0\0\0\5count\0\f?\0\25\0\0\0\10\201"..., 16384) = 544
16:54:09.896090 write(4, "\1\0\0\0\1", 5) = 5
16:54:09.896192 shutdown(4, 2 /* send and receive */) = 0
There was ~1 ms between the query and the response to and from the mysqld process (because this was on localhost and the results were in its query cache, BTW), and 0.8 ms later the DB close was executed. And that's on my 4-year-old laptop.
Following TerryE's example and the advice to use multi_query(), I checked the manual and adapted the script to my needs. Finally, I got a solution that looks like this:
$sql = 'SELECT COUNT(cat1_id) as `cat1` FROM `cat1`;';
$sql .= 'SELECT COUNT(cat2_id) as `cat2` FROM `cat2`;';
$sql .= 'SELECT COUNT(cat2_id) as `cat2_b` FROM `cat2` WHERE `date` >= DATE(NOW());';
$sql .= 'SELECT COUNT(cat3_id) as `cat3` FROM `cat3`;';
$sql .= 'SELECT COUNT(cat4_id) as `cat4` FROM `cat4`;';
$sql .= 'SELECT COUNT(cat5_id) as `cat5` FROM `cat5`;';
$sql .= 'SELECT COUNT(cat6_id) as `cat6` FROM `cat6`;';
$sql .= 'SELECT COUNT(cat7_id) as `cat7` FROM `cat7`;';
if ($db->multi_query($sql))
{
do
{
if ($stmt = $db->store_result())
{
while ($row = $stmt->fetch_assoc())
{
foreach ($row as $key => $value)
{
$count[$key] = $value;
}
}
$stmt->free_result();
}
} while ($db->more_results() && $db->next_result());
}
There are some differences from TerryE's example, but the result is the same. I'm aware that 7 lines at the beginning are almost identical. However, as soon as I need a WHERE clause or something else, I prefer this solution to a foreach loop, where I'd need to add queries manually or handle special cases with if { ... } ...
As far as I can see, there should be no problem with my solution, or did I miss something?