MySQL trigger abort - php

MyISAM DB
To be specific, I usually insert 5-10 rows at a time into one table. After the inserts are done, I want to get the COUNT(DISTINCT(column)) and SUM(some_other_column) of the newly inserted rows and insert those values into another table.
The way I figured this might be done (but I don't know if it can be) is to create an AFTER INSERT trigger, run one SELECT on the table where I inserted the rows, insert the result into the other table, and then break out of the trigger's per-row loop.
Suggestions, please. I feel bad about this approach somehow.
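For reference, the aggregate step described above can also be run as one plain statement from the application right after the batch insert, instead of from inside a trigger. A minimal sketch, with made-up table and column names (detail_rows holds the freshly inserted rows, batch_summary is the second table, 42 stands for whatever identifies the batch):
INSERT INTO batch_summary (batch_id, distinct_items, total_amount)
SELECT batch_id, COUNT(DISTINCT item_id), SUM(amount)
FROM detail_rows
WHERE batch_id = 42
GROUP BY batch_id;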

Related

Splitting up data in MySQL to make it faster and more accessible

I have a MySQL database that is becoming really large. I can feel the site becoming slower because of this.
Now, on a lot of pages I only need a certain part of the data. For example, I store information about users every 5 minutes for history purposes. But on one page I only need the information that is the newest (not the whole history of data). I achieve this by a simple MAX(date) in my query.
Now I'm wondering if it wouldn't be better to make a separate table that just stores the latest data so that the query doesn't have to search for the latest data from a specific user between millions of rows but instead just has a table with only the latest data from every user.
The con here would be that I have to run 2 queries to insert the latest history in my database every 5 minutes, i.e. insert the new data in the history table and update the data in the latest history table.
The pro would be that MySQL has a lot less data to go through.
What are common ways to handle this kind of issue?
There are a number of ways to handle slow queries in large tables. The three most basic ways are:
1: Use indexes, and use them correctly. It is important to avoid table scans on large tables; this is almost always your most significant performance hit with single queries.
For example, if you're querying something like: select max(active_date) from activity where user_id=?, then create an index on the activity table for the user_id column. You can have multiple columns in an index, and multiple indexes on a table.
CREATE INDEX idx_user ON activity (user_id)
2: Use summary/"cache" tables. This is what you have suggested. In your case, you could apply an insert trigger to your activity table, which will update your summary table whenever a new row gets inserted. This means your code won't need to execute two queries. For example:
CREATE TRIGGER update_summary
AFTER INSERT ON activity
FOR EACH ROW
UPDATE activity_summary SET last_active_date=new.active_date WHERE user_id=new.user_id;
You can change that to check whether a row already exists for the user and do an insert if it is their first activity. Or you can insert a row into the summary table when a user registers... or whatever.
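As a hedged sketch of that first variant, assuming activity_summary has a unique or primary key on user_id (the trigger is renamed so it doesn't clash with the one above):
CREATE TRIGGER update_summary_upsert
AFTER INSERT ON activity
FOR EACH ROW
INSERT INTO activity_summary (user_id, last_active_date)
VALUES (NEW.user_id, NEW.active_date)
ON DUPLICATE KEY UPDATE last_active_date = NEW.active_date;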
3: Review the query! Use MySQL's EXPLAIN command to grab a query plan and see what the optimizer does with your query. Use it to ensure that the optimizer is avoiding table scans on large tables (and either create or force an index if necessary).
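For example, checking the query from point 1 (the exact output columns vary by MySQL version):
EXPLAIN SELECT MAX(active_date) FROM activity WHERE user_id = 42;
-- In the output, key = idx_user (rather than NULL) and an access type such as
-- ref indicate an index lookup instead of a full table scan.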

Check whether records exist to decide between insert or update in MySQL

Every week I need to load 50K~200K rows of records from a raw CSV file to my system.
Currently my solution is to load the CSV into a temp table (emptied after the process), then run my stored procedure to distribute the data to the relevant tables in my system. If a record already exists it runs an update query (80% of the records in the CSV are already in my system tables); if not, it inserts the record.
The problem I am facing now is that the tables have grown to a few million records, approx. 5~6 million each.
A "SELECT EXISTS" check seems very slow too; after that I changed to LEFT JOINing the tables in batches, which is also slow.
Even when I load just 5K records, it can take a few hours for the stored procedure to finish.
Are there any good, faster solutions for comparing tables to decide whether to insert or update records, now that the tables are this large?
Thanks!!
Jack
The following process will reduce your time:
First try to update the record and check the number of rows affected; if the number of rows affected is 0, then insert the record.
But make sure you touch a modified_Date column on every update (add one if the table doesn't have it), because if all the data in the new record is identical to the old record, the UPDATE makes no change and reports 0 rows affected, which would wrongly lead you to insert a duplicate.
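A rough sketch of that flow inside a stored procedure, using ROW_COUNT() to read the number of affected rows (customer, customer_code, and the v_* variables are placeholders, not names from the question):
UPDATE customer
SET name = v_name, modified_Date = NOW()
WHERE customer_code = v_code;

IF ROW_COUNT() = 0 THEN
  -- Setting modified_Date keeps ROW_COUNT() above 0 for existing rows,
  -- so 0 here really means the row is missing.
  INSERT INTO customer (customer_code, name, modified_Date)
  VALUES (v_code, v_name, NOW());
END IF;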
Slow responses from MySQL are almost always a problem of missing indexes or incorrect use of them.
If you use keys and/or indexes correctly, an INSERT ... ON DUPLICATE KEY UPDATE ... should work.
Try to work only against an existing index/key. Check your statements with EXPLAIN SELECT.
IMHO your tmp-table based preprocessing is ok.
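As a sketch of how the upsert could look against the existing temp-table load, assuming the target table has a unique key that the CSV rows can be matched on (all names here are made up):
INSERT INTO customer (customer_code, name, modified_Date)
SELECT customer_code, name, NOW()
FROM temp_csv
ON DUPLICATE KEY UPDATE name = VALUES(name), modified_Date = NOW();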

Optimized ways to update every record in a table after running some calculations on each row

There is a large table that holds millions of records. phpMyAdmin reports 1.2G size for the table.
There is a calculation that needs to be done for every row. The calculation is not simple (it cannot be put into a SET col = calc form); it uses a stored function to get the values, so currently we run a single UPDATE per row.
This is extremely slow and we want to optimize it.
Stored function:
https://gist.github.com/a9c2f9275644409dd19d
And this is called by this method for every row:
https://gist.github.com/82adfd97b9e5797feea6
This is performed on an off-live server, and it usually runs once per week.
What options do we have here?
Why not set up a separate table to hold the computed values and take the load off your current table? It can have two columns: the primary key of each row in your main table and a column for the computed value.
Then your process can be:
a) Truncate computedValues table - This is faster than trying to identify new rows
b) Compute the values and insert into the computed values table
c) Whenever you need your computed values, you join to the computedValues table using a primary-key join, which is fast, and in case you need more computations you just add new columns.
d) You can also update the main table using the computed values if you have to (see the sketch below).
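A hedged sketch of that layout, assuming usage_bill is the main table with primary key id, gpcd is the computed column, and calc_gpcd() stands in for the stored function in the gist:
CREATE TABLE computedValues (
  usage_bill_id INT UNSIGNED NOT NULL PRIMARY KEY,
  gpcd DECIMAL(12,4) NULL
);

-- Weekly refresh: a) empty the table, b) recompute every value
TRUNCATE TABLE computedValues;
INSERT INTO computedValues (usage_bill_id, gpcd)
SELECT id, calc_gpcd(id) FROM usage_bill;

-- c) read side: a fast primary-key join
SELECT ub.*, cv.gpcd
FROM usage_bill ub
JOIN computedValues cv ON cv.usage_bill_id = ub.id;

-- d) optionally push the values back into the main table
UPDATE usage_bill ub
JOIN computedValues cv ON cv.usage_bill_id = ub.id
SET ub.gpcd = cv.gpcd;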
Well, the problem doesn't seem to be the UPDATE query, because no calculations are performed in the query itself. As it seems, the calculations are performed first and then the UPDATE query is run, so the UPDATE itself should be quick enough.
When you say "this is extremely slow", I assume you are not referring to the UPDATE query but the complete process. Here are some quick thoughts:
As you said, there are millions of records, and updating that many entries is always time-consuming. And if there are many columns and indexes defined on the table, they will add to the overhead.
I see that there are many REPLACE INTO queries in the function getNumberOfPeople(). These might well be a reason for the slow process. Have you checked how efficient these REPLACE INTO queries are? Can you try removing them and see whether that has any impact on the UPDATE process?
There are a couple of SELECT queries in getNumberOfPeople() too. Check whether they might be impacting the process and, if so, try optimizing them.
In procedure updateGPCD(), you may try replacing SELECT COUNT(*) INTO _has_breakdown with SELECT COUNT(1) INTO _has_breakdown. In the same query, the WHERE condition is reading _ACCOUNT but this will fail when _ACCOUNT = 0, no?
On another note, if it is the UPDATE that you think is slow because of reason 1, it might make sense to move the column being updated, gpcd, out of usage_bill and into another table. The only other column in that table should be the unique ID from usage_bill.
Hope the above make sense.

Proper way of 'updating' rows in MySQL

This is my db structure:
ID  NAME   SOMEVAL  API_ID
1   TEST   123456   A123
2   TEST2  223232   A123
3   TEST3  918922   A999
4   TEST4  118922   A999
I'm filling it using a function that calls an API and gets some data from an external service.
On the first run, I want to insert all the data I get back from the API. After that, each time I run the function, I just want to update the current rows and add any rows that came back from the API call but are not yet in the db.
So my initial thought regarding the update process is to go through each row I get from the API and run a SELECT to see if it already exists.
I'm just wondering if this is the most efficient way to do it, or whether it's better to DELETE the relevant rows from the db and just re-insert them all.
NOTE: each batch of rows I get from the API has an API_ID, so when I say delete the rows, I mean something like DELETE FROM table WHERE API_ID = 'A999', for example.
If you are retrieving all the rows from the service, I recommend you drop all indexes, truncate the table, then insert all the data and recreate the indexes.
If you are retrieving only some of the data from the service, I would drop all indexes, remove the relevant rows, insert the new rows, and then recreate all the indexes.
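A rough outline of the full-reload variant (the table and index names are placeholders):
ALTER TABLE target DROP INDEX idx_api_id;
TRUNCATE TABLE target;
-- ... bulk INSERT statements go here ...
ALTER TABLE target ADD INDEX idx_api_id (API_ID);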
In such scenarios I'm usually going with:
start transaction
get row from external source
select local store to check if it's there
if it's there: update its values, remember local row id in list
if it's not there: insert it, remember local row id in list
at the end, delete all rows that are not in the remembered list of local row ids (a NOT IN clause if the count of ids allows for it, or other ways if it's possible that there will be many deleted rows)
commit transaction
Why? Because usually I have local rows referenced by other tables, and deleting them all would break the references (not to mention DELETE CASCADE).
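In SQL terms, one iteration plus the final cleanup might look roughly like this (the table name items and the literal values are placeholders based on the question's columns):
START TRANSACTION;

-- for each external row: check whether it is already stored locally
SELECT ID FROM items WHERE API_ID = 'A999' AND NAME = 'TEST3';

-- if it's there: update its values and remember the ID
UPDATE items SET SOMEVAL = 918922 WHERE ID = 3;

-- if it's not there: insert it and remember LAST_INSERT_ID()
INSERT INTO items (NAME, SOMEVAL, API_ID) VALUES ('TEST3', 918922, 'A999');

-- after the loop: delete local rows the source no longer returned
DELETE FROM items WHERE API_ID = 'A999' AND ID NOT IN (3, 4);

COMMIT;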
I don't see any problem in performing SELECT, then deciding between an INSERT or UPDATE. However, MySQL has the ability to perform so-called "upserts", where it will insert a row if it does not exist, or update an existing row otherwise.
This SO answer shows how to do that.
I would recommend using INSERT...ON DUPLICATE KEY UPDATE.
If you use INSERT IGNORE, then the row won't actually be inserted if it results in a duplicate key on API_ID.
Add unique key index on API_ID column.
If you have all of the data returned from the API that you need to completely reconstruct the rows after you delete them, then go ahead and delete them, and insert afterwards.
Be sure, though, that you do this in a transaction, and that you are using an engine that supports transactions properly, such as InnoDB, so that other clients of the database don't see rows missing from the table just because they are going to be updated.
For efficiency, you should insert as many rows as you can in a single query. Much faster that way.
BEGIN;
DELETE FROM table WHERE API_ID = 'A987';
INSERT INTO table (NAME, SOMEVAL, API_ID) VALUES
('TEST5', 12345, 'A987'),
('TEST6', 23456, 'A987'),
('TEST7', 34567, 'A987'),
...
('TEST123', 123321, 'A987');
COMMIT;

In MySQL, is it faster to delete and then insert or is it faster to update existing rows?

First of all, let me just say that I'm using the PHP framework Yii, so I'd like to stay within its defined set of SQL statements if possible. I know I could probably create one huge, long SQL statement that would do everything, but I'd rather not go there.
OK, imagine I have a table Users and a table FavColors. Then I have a form where users can select their color preferences by checking one or more checkboxes from a large list of possible colors.
Those results are stored as multiple rows in the FavColors table like this (id, user_id, color_id).
Now imagine the user goes in and changes their color preference. In this scenario, what would be the most efficient way to get the new color preferences into the database?
Option 1:
Do a mass delete of all rows where user_id matches
Then do a mass insert of all new rows
Option 2:
Go through each current row to see what's changed, and update accordingly
If more rows need to be inserted, do that.
If rows need to be deleted, do that.
I like option one because it only requires two statements, but something just feels wrong about deleting a row only to potentially put back almost exactly the same data. There's also the issue of making the ids auto-increment to higher values more quickly, and I don't know if that should be avoided whenever possible.
Option 2 will require a lot more programming work, but would prevent situations where I'd delete a row just to create it again. However, adding more load in PHP may not be worth the decrease in load for MySQL.
Any thoughts? What would you all do?
UPDATE is by far faster. When you UPDATE, the table records are just rewritten with new data; a DELETE followed by an INSERT has to do all of that work over again.
When you DELETE, the indexes have to be updated (remember, you delete the whole row, not only the columns you need to modify) and data blocks may be moved (if you hit the PCTFREE limit). Also, deleting and re-adding rows changes their auto_increment IDs, so if those records have relationships, they would be broken or would need updating too. I'd go for UPDATE.
That's why you should prefer INSERT ... ON DUPLICATE KEY UPDATE instead of REPLACE.
The former is an UPDATE operation in case of a key violation, while the latter is a DELETE / INSERT.
UPDATE: Here's an example:
INSERT INTO table (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
For more details, read the UPDATE documentation.
Philip,
Have you tried using prepared statements? With prepared statements you can batch one query with different parameters and call it multiple times. At the end of your loop, you can execute all of them with a minimal amount of network latency. I have used prepared statements with PHP and they work great; they're a little more confusing than Java prepared statements, though.
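A server-side illustration of the same idea, reusing the FavColors columns from the question (in PHP this would typically go through mysqli or PDO prepared statements instead):
PREPARE ins_color FROM
  'INSERT INTO FavColors (user_id, color_id) VALUES (?, ?)';

SET @uid = 1, @cid = 7;
EXECUTE ins_color USING @uid, @cid;

SET @uid = 1, @cid = 12;
EXECUTE ins_color USING @uid, @cid;

DEALLOCATE PREPARE ins_color;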
