I have approximately 230K records in a MySQL user table, and I was updating information for 14K of those users via PHP, looping over a variable with a simple foreach and running an UPDATE query for each user's column. It took approximately 1 hour 9 minutes to run. It first took 13 minutes to update roughly 1100 records, and then another 56 minutes to update the remaining ones.
Is this usual, or do I need to upgrade the system I am running this on? I have 8GB RAM and a 2nd-gen i5 processor, and I am running Ubuntu 14.04.
If this is usual, what is the best way big companies handle this many (or more) user records in MySQL as fast as possible?
Just in case someone needs to know, here is the exact code that ran:
foreach ($_SESSION['tupple'] as $key => $value) {
    $commitUpdate = $mysqli->query("UPDATE user_table SET col_name='" . $mysqli->real_escape_string($value['col_name']) . "' WHERE id='" . $mysqli->real_escape_string($value['id']) . "'");
}
You can speed things up by using a prepared statement, so MySQL only has to parse the query once.
$stmt = $mysqli->prepare("UPDATE user_table SET col_name = ? WHERE id = ?");
$stmt->bind_param("si", $col_name, $id);
foreach ($_SESSION['tupple'] as $value) {
    // bind_param() binds by reference, so reassigning these variables re-binds the new values
    $col_name = $value['col_name'];
    $id = $value['id'];
    $stmt->execute();
}
Also, if you're using InnoDB, start a transaction before the loop and commit it at the end; otherwise each UPDATE is committed (and flushed to disk) individually.
$mysqli->begin_transaction(MYSQLI_TRANS_START_READ_WRITE);
// above loop
$mysqli->commit();
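Put together, a minimal sketch of both suggestions (assuming the same $_SESSION['tupple'] structure as in the question):

$mysqli->begin_transaction(MYSQLI_TRANS_START_READ_WRITE);

$stmt = $mysqli->prepare("UPDATE user_table SET col_name = ? WHERE id = ?");
$stmt->bind_param("si", $col_name, $id);

foreach ($_SESSION['tupple'] as $value) {
    $col_name = $value['col_name'];
    $id = $value['id'];
    $stmt->execute();
}

$stmt->close();
$mysqli->commit(); // one commit for the whole batch instead of one per row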
It's also possible to update multiple rows in a single query:
UPDATE user_table
SET col_name = CASE id
WHEN $id1 THEN '$name1'
WHEN $id2 THEN '$name2'
WHEN $id3 THEN '$name3'
...
END
WHERE id IN ($id1, $id2, $id3, ...)
You can use this type of syntax to group multiple entries in $_SESSION['tupple'] into batches.
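For instance, a minimal sketch of that batching (the chunk size of 1000 is an arbitrary assumption, and the ids are assumed to be integers):

foreach (array_chunk($_SESSION['tupple'], 1000) as $batch) {
    $cases = '';
    $ids = array();
    foreach ($batch as $value) {
        $id = (int) $value['id']; // assumed integer ids
        $name = $mysqli->real_escape_string($value['col_name']);
        $cases .= " WHEN $id THEN '$name'";
        $ids[] = $id;
    }
    $mysqli->query("UPDATE user_table SET col_name = CASE id" . $cases . " END WHERE id IN (" . implode(',', $ids) . ")");
}

Each iteration updates up to 1000 rows with a single statement, which cuts the per-query round-trip and parse overhead dramatically.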
I am using the following code to check if a row exists in my database:
$sql = "SELECT COUNT(1) FROM myTable WHERE user_id = :id_var";
$stmt = $conn->prepare($sql);
$stmt->bindParam(':id_var', $id_var);
$stmt->execute();
if ($stmt->fetch()[0] > 0)
{
    //... many lines of code
}
All of the code works; my doubts concern whether the previous code is clean and efficient, or whether there is room for improvement.
Two questions are currently bugging me about this code:
Should I have a LIMIT 1 at the end of my SQL statement? Does COUNT(1) already limit the number of rows found to 1, or does the server keep searching for more records even after finding the first one?
The if ($stmt->fetch()[0]>0) line: would this be the cleanest way to fetch the information from the SQL query and run the if conditional?
Of course, if anyone spots anything else that could improve my code, I would love your feedback.
Q: Should I have a LIMIT 1 at the end of my SQL statement? Does COUNT(1) already limit the amount of rows found by 1 or does the server keep searching for more records even after finding the first one?
Your SELECT COUNT() query will return exactly one row (assuming successful execution), because there is no GROUP BY clause. There's no need to add a LIMIT 1 clause; it wouldn't have any effect.
The database will search for all rows that satisfy the conditions in the WHERE clause. If the user_id column is UNIQUE and there is an index with it as the leading column, or if that column is the PRIMARY KEY of the table, then the search for all matching rows will be efficient, using the index. If there isn't an index, MySQL will need to scan all the rows in the table.
It's the index that buys you good performance. You could write the query differently to get a usable result, but what you have is fine.
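For example, one possible rewrite (a sketch; EXISTS lets the server stop at the first matching row, which only matters when user_id is neither unique nor indexed):

SELECT EXISTS(SELECT 1 FROM myTable WHERE user_id = :id_var) AS row_exists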
Q: Is this the cleanest...
if ($stmt->fetch()[0]>0)
My personal preference would be to avoid that construct and break it up into two or more statements; the normal pattern is a separate statement to fetch the row, then a test.
Personally, I would tend to avoid the COUNT() and just get a row, then test whether there was a row to fetch:
$sql = "SELECT 1 AS `row_exists` FROM myTable WHERE user_id = :id_var";
$stmt = $conn->prepare($sql);
$stmt->bindParam(':id_var', $id_var);
$stmt->execute();
if($stmt->fetch()) {
// row found
} else {
// row not found
}
$stmt->closeCursor();
I have a script that needs to update a value overnight, each night.
My MySQL table has 119k rows, which group into 35k rows.
For each of these groups I need to calculate the highest and the lowest value, then update the row with the new percentage difference between them.
Right now I can't even execute the update with a limit of more than 50.
My code:
$query_updates = mysqli_query($con, "SELECT partner FROM trolls GROUP BY partner LIMIT 0, 50")
    or die(mysqli_error($con));
while ($item = mysqli_fetch_assoc($query_updates)) {
    $query_updates_prices = mysqli_query($con, "SELECT
        MIN(partner1) AS p1,
        MAX(partner2) AS p2,
        COUNT(partner3) AS p3
        FROM trolls WHERE partner='" . $item["partner"] . "'")
        or die(mysqli_error($con));
    $partner = mysqli_fetch_assoc($query_updates_prices);
    $partner1 = $partner["p1"];
    $partner2 = $partner["p2"];
    $difference = $partner1 - $partner2;
    $savings = round($difference / $partner1 * 100);
    $partner3 = $partner["p3"]; // was $prices["p3"], an undefined variable
    $update_tyre = mysqli_query($con, "UPDATE trolls SET
        partner1='" . $partner1 . "',
        partner2='" . $partner2 . "',
        partner3='" . $partner3 . "',
        partner4='" . $savings . "'
        WHERE partner='" . $item["partner"] . "'")
        or die(mysqli_error($con));
    echo '<strong>Updated: ' . $item["partner"] . '</strong><br>';
}
How can I make this simpler / able to execute?
+1 for the cron; running it from the command line would also help you out, as it won't time out. However, you might have problems with the GROUP BY locking tables.
To be honest (you won't like this), if you are doing a GROUP BY on a field across that many rows, I would say that you have done something wrong.
So I would look at redoing the tables: having a table for 'partner' and then referencing it from trolls would help.
But to give you a solution that speeds this up a touch, moves you towards a better database/table setup, and removes the locking problem, I would do this:
Step 1.
Create a table called
Partners
Field1: partner_id
Field2: partner
Field3: p1
Field4: p2
Field5: p3
Step 2:
Run the query
SELECT partner FROM trolls (this could be changed in the future to SELECT * FROM partners)
Step 3:
Check whether they are in Partners; if not, insert them.
Step 4:
Run your
SELECT
MIN(partner1) AS p1,
MAX(partner2) AS p2,
COUNT(partner3) AS p3
FROM trolls WHERE partner='". $item["partner"] ."'
Step 5:
Update the values from this into the Partners table, and (for the time being) the trolls table as well.
Done.
Oh, and in case it's not already there, add an index on the partner field.
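Putting steps 1 and 5 together, a minimal sketch of the new table (the field names come from the steps above; the column types are assumptions):

CREATE TABLE Partners (
    partner_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    partner VARCHAR(100) NOT NULL,   -- assumed type, should match trolls.partner
    p1 DECIMAL(10,2) NULL,           -- MIN(partner1)
    p2 DECIMAL(10,2) NULL,           -- MAX(partner2)
    p3 INT UNSIGNED NULL,            -- COUNT(partner3)
    UNIQUE KEY ux_partners_partner (partner)
);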
You can do those 2 SELECTs in one:
SELECT partner, MIN(partner1) AS p1, MAX(partner2) AS p2, COUNT(partner3) AS p3
FROM trolls GROUP BY partner LIMIT 0, 50
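In fact, the whole read-then-write loop can collapse into a single statement (a sketch that mirrors the question's arithmetic, including its $partner1 - $partner2 order; MySQL materializes the derived table, so selecting from trolls while updating it is allowed here):

UPDATE trolls t
JOIN (
    SELECT partner,
        MIN(partner1) AS p1,
        MAX(partner2) AS p2,
        COUNT(partner3) AS p3
    FROM trolls
    GROUP BY partner
) agg ON agg.partner = t.partner
SET t.partner1 = agg.p1,
    t.partner2 = agg.p2,
    t.partner3 = agg.p3,
    t.partner4 = ROUND((agg.p1 - agg.p2) / agg.p1 * 100);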
Create a BTREE index on trolls (partner), so both the GROUP BY and the per-partner lookups can use it:
CREATE INDEX IX_TROLLS_PARTNER ON trolls (partner);
If you still choose to do those 2 SELECTs separately, use PDO->prepare instead of PDO->query. From the PDO::prepare doc on php.net:
Calling PDO::prepare() and PDOStatement::execute() for statements that will be issued multiple times with different parameter values optimizes the performance of your application by allowing the driver to negotiate client and/or server side caching of the query plan and meta information
Maybe change php.ini max_execution_time to a higher value if it's too low (I keep it at 300, i.e. 5 minutes, but each case is different).
$insert_sql = "INSERT INTO exported_leads (lead_id, partner_id) VALUES ('$id','$partner_id')
ON DUPLICATE KEY UPDATE
lead_id = VALUES(lead_id), partner_id = VALUES(partner_id), export_date = CURRENT_TIMESTAMP";
$count = $pdo->exec($insert_sql);
$count_total = $count_total + $count;
I have this code. The problem is that whenever there's an existing match in the db, it overwrites the partner_id with a new value. I need to keep track of which partner exported the row, and when. I was thinking about adding columns to cover those, but the problem with that is I don't know how many exporters there can be: sometimes 5, sometimes over 20.
I currently use this query to find what I should export:
$sql = $pdo->query('SELECT p.* FROM prospects p
LEFT JOIN exported_leads e
on p.id = e.lead_id WHERE p.partner_id != '.$partner_id.' AND (e.lead_id IS NULL OR datediff(now(), e.export_date) > 90)
LIMIT '.$monthly_uttag.'');
This also shows why I can't just insert new rows into the already-exported table: the join keeps finding the first record, not the newest entry first, which screws it up.
Anyhow, how would you do this and still keep a date for when every partner_id exported that row?
Say, for example, row id=1: partner1 exported that row on 2014-01-01 and partner3 exported it on 2014-05-15. I need to keep both those dates somehow.
How would you do this? I'm looking to support an indefinitely large number of partners.
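A minimal sketch of one way to read this requirement (the composite key is an assumption, not something from the code above): make (lead_id, partner_id) the unique key of exported_leads, so each partner gets its own row and its own export_date:

CREATE TABLE exported_leads (
    lead_id INT NOT NULL,
    partner_id INT NOT NULL,
    export_date TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (lead_id, partner_id)  -- one row per partner per lead
);

With that key, the same INSERT ... ON DUPLICATE KEY UPDATE only bumps export_date for that particular partner's row, and every partner's date is preserved.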
I have a script that selects a row from a MySQL database, then updates this row, like this:
$statement = $db->prepare("SELECT id, link from persons WHERE processing = 0");
$statement->execute();
$row = $statement->fetch();
$statement = $db->prepare("UPDATE persons SET processing = 1 WHERE id = :id");
$success = $statement->execute(array(':id' => $row['id']));
The script calls this PHP code multiple times simultaneously, and sometimes it SELECTs the row even though it should already be processing = 1, because another call hit it at the exact same time.
How can I avoid this?
What you need to do is add some kind of lock here to prevent race conditions like the one you've created:
UPDATE persons SET processing=1 WHERE id=:id AND processing=0
That will avoid double-claiming it: after the UPDATE, check the affected row count, and if it is 0, another process got the row first.
To improve this even more, create a lock column you can use for claiming:
UPDATE persons
SET processing=:processing_uuid
WHERE processing IS NULL
LIMIT 1
This requires processing to be an indexed VARCHAR column, used for claiming, with a default of NULL. If the UPDATE reports a modified row, you've claimed a record and can go and work with it by using:
SELECT * FROM persons WHERE processing=:processing_uuid
Each time you try and claim, generate a new claim UUID key.
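Putting it together, a minimal PDO sketch of that claim pattern (generating the token with random_bytes() is an assumption; any sufficiently unique value works):

// generate a fresh claim token for this attempt
$uuid = bin2hex(random_bytes(16));

$stmt = $db->prepare("UPDATE persons SET processing = :uuid WHERE processing IS NULL LIMIT 1");
$stmt->execute(array(':uuid' => $uuid));

if ($stmt->rowCount() === 1) {
    // we won the claim; fetch the row we just marked
    $stmt = $db->prepare("SELECT * FROM persons WHERE processing = :uuid");
    $stmt->execute(array(':uuid' => $uuid));
    $row = $stmt->fetch();
    // ... process $row ...
} else {
    // nothing left to claim, or another worker got there first
}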
Try using transactions for your queries. Read about them at the MySQL dev site.
You can wrap your code with:
$dbh->beginTransaction();
// ... your transactions here
$dbh->commit();
You'll find the documentation here.
Use SELECT FOR UPDATE
http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html
e.g.
SELECT counter_field FROM child_codes FOR UPDATE;
UPDATE child_codes SET counter_field = counter_field + 1;
Wrap this in a transaction and the locks will be released when the transaction ends.
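A sketch of that in PDO (assuming the same $db connection as the question; the table and column come from the MySQL manual's example):

$db->beginTransaction();

// lock the selected row(s) until commit; concurrent FOR UPDATE readers will block
$stmt = $db->query("SELECT counter_field FROM child_codes FOR UPDATE");
$counter = $stmt->fetchColumn();

$db->exec("UPDATE child_codes SET counter_field = counter_field + 1");

$db->commit(); // releases the lock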
I have a lot of entries in a table that are fetched for performing jobs. This is scaled across several servers.
When a server fetches a bunch of rows to add to its own job queue, they should be "locked" so that no other server fetches them.
When the update is performed, a timestamp is updated and they are "unlocked".
I currently do this by updating a field called "jobserver" in the table, which defaults to NULL, with the id of the job server.
A job server only selects rows where the field is NULL.
When all rows are processed, their timestamp is updated and finally the jobserver field is set to NULL again.
So I need to synchronize this:
$jobs = mysql_query("
    SELECT itemId
    FROM items
    WHERE
        jobserver IS NULL
        AND
        DATE_ADD(updated_at, INTERVAL 1 DAY) < NOW()
    LIMIT 100
");
// collect the ids first; mysql_fetch_assoc() returns only one row per call
$ids = array();
while ($row = mysql_fetch_assoc($jobs)) {
    $ids[] = $row['itemId'];
}
mysql_query("UPDATE items SET jobserver = 'current_job_server' WHERE itemId IN (" . join(',', $ids) . ")");
// do the update process in a foreach loop
// update updated_at for each item and set jobserver to NULL
Every server executes the above in an infinite loop. If no rows are returned, everything is up to date (no last update is longer than 24 hours ago) and the server sleeps for 10 minutes.
I currently have MyISAM and I would like to stay with it, because it had far better performance than InnoDB in my case, but I have heard that InnoDB has ACID transactions.
So I could execute the SELECT and UPDATE as one, but how would that look and work?
The problem is that I cannot afford to lock the table or anything like that, because other processes need to read/write and cannot be blocked.
I am also open to a higher-level solution like a shared semaphore, etc. The problem is that the synchronization needs to work across several servers.
Is the approach generally sane? Would you do it differently?
How can I synchronize the job selection to ensure that two servers don't update the same rows?
You can run the UPDATE first, but with the WHERE and LIMIT that you had on the SELECT. You then SELECT the rows that have the jobserver field set to your server.
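A sketch of that order of operations, built from the question's own queries (the claim is safe because it happens in a single UPDATE statement):

-- claim up to 100 unclaimed, stale rows in one atomic statement
UPDATE items
SET jobserver = 'current_job_server'
WHERE jobserver IS NULL
AND DATE_ADD(updated_at, INTERVAL 1 DAY) < NOW()
LIMIT 100;

-- then fetch exactly the rows this server claimed
SELECT itemId FROM items WHERE jobserver = 'current_job_server';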
If you can't afford to lock the tables, then I would make the update conditional on the row not being modified. Something like:
$result = mysql_query("SELECT DATE_SUB(NOW(), INTERVAL 1 DAY)");
$timestamp = mysql_result($result, 0); // mysql_query() returns a resource, not the value itself
$jobs = mysql_query("
    SELECT itemId
    FROM items
    WHERE
        jobserver IS NULL
        AND
        updated_at < '" . $timestamp . "'
    LIMIT 100
");
// collect the ids first; mysql_fetch_assoc() returns only one row per call
$ids = array();
while ($row = mysql_fetch_assoc($jobs)) {
    $ids[] = $row['itemId'];
}
// Update only those which haven't been updated in the meantime
mysql_query("UPDATE items SET jobserver = 'current_job_server' WHERE itemId IN (" . join(',', $ids) . ") AND updated_at < '" . $timestamp . "'");
// Now get a list of jobs which were updated
$actual_jobs_to_do = mysql_query("
    SELECT itemId
    FROM items
    WHERE jobserver = 'current_job_server'
");
// Continue processing, with the actual list of jobs
You could even combine the select and update queries, like this:
mysql_query("
UPDATE items
SET jobserver = 'current_job_server'
WHERE jobserver IS NULL
AND updated_at < ".$timestamp."
LIMIT 100
");