I have a MySQL database that is becoming really large. I can feel the site becoming slower because of this.
Now, on a lot of pages I only need a certain part of the data. For example, I store information about users every 5 minutes for history purposes, but on one page I only need the newest information, not the whole history. I achieve this with a simple MAX(date) in my query.
Now I'm wondering if it wouldn't be better to make a separate table that stores just the latest data, so that the query doesn't have to search for a specific user's latest data among millions of rows, but instead hits a table with only one row of latest data per user.
The con here would be that I have to run 2 queries to record the latest history in my database every 5 minutes: insert the new data into the history table, and update the row in the latest-data table.
The pro would be that MySQL has a lot less data to go through.
What are common ways to handle this kind of issue?
There are a number of ways to handle slow queries in large tables. The three most basic ways are:
1: Use indexes, and use them correctly. It is important to avoid table scans on large tables; this is almost always your most significant performance hit with single queries.
For example, if you're querying something like SELECT MAX(active_date) FROM activity WHERE user_id = ?, then create an index on the activity table for the user_id column. You can have multiple columns in an index, and multiple indexes on a table.
CREATE INDEX idx_user ON activity (user_id);
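Better still for this particular query, a composite index covering both columns lets MySQL answer the MAX() directly from the index without touching the table rows (the index name is just illustrative):

CREATE INDEX idx_user_date ON activity (user_id, active_date);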
2: Use summary/"cache" tables. This is what you have suggested. In your case, you could apply an insert trigger to your activity table, which will update your summary table whenever a new row gets inserted. This means you won't need your code to execute two queries. For example:
CREATE TRIGGER update_summary
AFTER INSERT ON activity
FOR EACH ROW
UPDATE activity_summary SET last_active_date = NEW.active_date WHERE user_id = NEW.user_id;
You can change that to check whether a row already exists for the user and do an insert if it is their first activity, as sketched below. Or you can insert a row into the summary table when a user registers... or whatever fits your application.
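For instance, a minimal sketch of that upsert variant, assuming activity_summary has a primary or unique key on user_id (the trigger name is illustrative):

CREATE TRIGGER upsert_summary
AFTER INSERT ON activity
FOR EACH ROW
  INSERT INTO activity_summary (user_id, last_active_date)
  VALUES (NEW.user_id, NEW.active_date)
  ON DUPLICATE KEY UPDATE last_active_date = NEW.active_date;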
3: Review the query! Use MySQL's EXPLAIN command to grab a query plan and see what the optimizer does with your query. Use it to ensure that the optimizer is avoiding table scans on large tables (and either create or force an index if necessary).
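For instance (the exact output varies by MySQL version):

EXPLAIN SELECT MAX(active_date) FROM activity WHERE user_id = 42;
-- In the plan, check the "type" and "key" columns: "ref" or "range" with your
-- index listed under "key" is good; type "ALL" means a full table scan.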
I am uploading data into a temp table. After uploading, I need to perform joins, select a few columns, and push that data to another table in PHP. Is there any way of doing this without using triggers?
If you don't want to use a trigger to react to an insertion in a database, you pretty much need to poll that table at a given time interval to see if there are new insertions. You will need a flag in the polled table to know whether a row has already been processed in a previous run, to prevent reacting multiple times to a single insert.
One way of doing this could be a cron job, run every minute, that executes a PHP script which simply selects all rows of your table where the isProcessed flag is false. Of course, the default value for the isProcessed column must be false.
Then, for each row obtained with the previous DB query, you do what you want to do, i.e. "perform joins, select a few columns, and push that data to another table", to quote yourself.
Then update the same rows in your original temp table so that their isProcessed flag is now true.
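A minimal sketch of that cycle in SQL (all table and column names here are assumptions based on your description):

-- 1. push the joined/selected data for unprocessed rows to the destination table
INSERT INTO destination_table (col_a, col_b, other_col)
SELECT t.col_a, t.col_b, o.other_col
FROM temp_table t
JOIN other_table o ON o.id = t.other_id
WHERE t.isProcessed = FALSE;

-- 2. mark those rows as done (run both statements in one transaction
--    so no new rows slip in between the two steps)
UPDATE temp_table
SET isProcessed = TRUE
WHERE isProcessed = FALSE;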
I guess you have a good reason not to use database triggers, because triggers exist to do exactly what you want in the simplest way it can be done.
I have a table called users and a table called pages. Users of the system can subscribe to a page and receive updates about it. My problem is that users and pages will be updated dynamically (i.e. no manual intervention to the tables), and I don't want to keep adding another column every time someone subscribes to a page.
How can I achieve updating both the users table and the pages table dynamically to reflect that they have subscribed to that page?
My idea would be to add a comma-separated list of usernames to the pages table and update it as users subscribe/unsubscribe.
Just making it an official answer:
While the initial hunch may be to use comma-separated values to represent the link between those 2 tables (or any other way of saving the data in one column, like saving a JSON string), it is actually bad practice because it does not conform to First Normal Form (and definitely not 2nd or 3rd).
First Normal Form - Wikipedia
First Normal Form says you should never store more than 1 value in 1 table cell.
The problem, in short, starts when you need to use that data, which takes at least two steps: first reading it from the database, then parsing it in your scripting language. Imagine what happens when you then need to use that data to read some other data from the database: you are making more SQL queries than necessary and taking at least twice the time (and resources). It becomes even more complicated when you need JOIN queries or have other one-to-many data relationships.
The solution then is simple - you need to create a 3rd table that serves as an intermediate table.
You can call it users_pages or user2pages; it represents the many-to-many relationship between users and pages (one user subscribes to many pages, and one page is subscribed to by many users).
The structure of the table is as simple as:
users_pages
-----------
-- id // a unique id for the relationship, can be auto generated
-- user_id // the user id
-- page_id // the page id
-----------
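As SQL, a minimal sketch of that table (the types and the extra unique constraint are my assumptions):

CREATE TABLE users_pages (
    id      INT AUTO_INCREMENT PRIMARY KEY,  -- unique id for the relationship
    user_id INT NOT NULL,                    -- the user id
    page_id INT NOT NULL,                    -- the page id
    UNIQUE KEY uniq_user_page (user_id, page_id)  -- prevents duplicate subscriptions
);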
This allows you to build a more robust application, as well as run advanced queries and calculations without needing to parse the data in your script (e.g. count the number of pages each user is subscribed to, or the number of users subscribed to one page).
Unsubscribing can be also much easier this way since you don't need to read the users or pages table at all. You simply delete the relation from the users_pages table.
Without it, you will need to (a) first read the users table (b) get the pages data comma separated (c) parse the data and remove the specific page from it (d) save the new data again to the database. That's 4 actions and 2 SQL queries...
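For illustration, each of those operations becomes a single short query:

-- how many pages is user 1 subscribed to?
SELECT COUNT(*) FROM users_pages WHERE user_id = 1;

-- unsubscribe user 1 from page 2
DELETE FROM users_pages WHERE user_id = 1 AND page_id = 2;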
I hope this helps!
Hello, I have a MySQL database, and all I want is basically to get a value from a second table based on a query against a first table.
I have figured out something like this, but it is not working:
SELECT src, dst_number, state, duration
FROM cdrs, area_code_infos
WHERE SUBSTRING(cdrs.src, 2, 3) = area_code_infos.npa;
Please help me figure this out. I have tried running multiple queries one after the other in PHP, but after waiting 45 minutes for the page to load, I gave up.
Thanks,
I assume the tables are fairly big, and you are also doing an unindexed query: basically, SUBSTRING has to be calculated for every row.
Whenever you do a join, you want to make sure both of the joined fields are indexed.
One option would be to create another column containing the substring calculation and then create an index on it.
However, a better option would be to add an areaCodeInfosID column to cdrs and make it a foreign key to the area_code_infos table.
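For the computed-column route, here is a sketch assuming MySQL 5.7+ generated columns (the column and index names are illustrative):

ALTER TABLE cdrs
    ADD COLUMN src_npa CHAR(3) AS (SUBSTRING(src, 2, 3)) STORED;

CREATE INDEX idx_src_npa ON cdrs (src_npa);

-- the join can then use the indexed column directly
SELECT src, dst_number, state, duration
FROM cdrs
JOIN area_code_infos ON area_code_infos.npa = cdrs.src_npa;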
This is my db structure:
ID  NAME   SOMEVAL  API_ID
1   TEST   123456   A123
2   TEST2  223232   A123
3   TEST3  918922   A999
4   TEST4  118922   A999
I'm filling it using a function that calls an API and gets some data from an external service.
The first run, I want to insert all the data I get back from the API. After that, each time I run the function, I just want to update the existing rows and add any rows that came back from the API call but are not yet in the db.
So my initial thought regarding the update process is to go through each row I get from the API and SELECT to see if it already exists.
I'm just wondering if this is the most efficient way to do it, or maybe it's better to DELETE the relevant rows from the db and just re-insert them all.
NOTE: each batch of rows I get from the API has an API_ID, so when I say delete the rows, I mean something like DELETE FROM table WHERE API_ID = 'A999', for example.
If you are retrieving all the rows from the service, I recommend you drop all indexes, truncate the table, insert all the data, and then recreate the indexes.
If you are retrieving only some of the data from the service, I would drop all indexes, remove all relevant rows, insert the new rows, and then recreate all indexes.
In such scenarios I'm usually going with:
start transaction
get row from external source
query the local store to check if it's there
if it's there: update its values, and remember the local row id in a list
if it's not there: insert it, and remember the local row id in the list
at the end, delete all rows that are not in the remembered list of local row ids (a NOT IN clause if the count of ids allows for it, or other ways if it's possible that there will be many deleted rows)
commit transaction
Why? Because usually I have local rows referenced by other tables, and deleting them all would break the references (not to mention delete cascades).
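A sketch of that flow using the question's columns (mytable is a stand-in name; the ids in the NOT IN clause stand for the list remembered in the application loop):

START TRANSACTION;

-- for each row from the external source:
SELECT ID FROM mytable WHERE API_ID = 'A999' AND NAME = 'TEST3';
-- row found:  UPDATE mytable SET SOMEVAL = 918922 WHERE ID = 3;
-- not found:  INSERT INTO mytable (NAME, SOMEVAL, API_ID) VALUES ('TEST3', 918922, 'A999');

-- at the end, delete local rows for this batch that were not seen
DELETE FROM mytable WHERE API_ID = 'A999' AND ID NOT IN (3, 4);

COMMIT;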
I don't see any problem in performing a SELECT and then deciding between an INSERT or an UPDATE. However, MySQL has the ability to perform so-called "upserts", where it will insert a row if it does not exist, or update an existing row otherwise.
This SO answer shows how to do that.
I would recommend using INSERT...ON DUPLICATE KEY UPDATE.
If you use INSERT IGNORE, then the row won't actually be inserted if it results in a duplicate key on API_ID.
Add a unique key index on the API_ID column.
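A minimal sketch of the upsert, again with mytable as a stand-in name. Note that in the sample data API_ID repeats across rows, so the unique key here spans (API_ID, NAME); adjust it to whatever really identifies one row:

ALTER TABLE mytable ADD UNIQUE KEY uniq_api_name (API_ID, NAME);

INSERT INTO mytable (NAME, SOMEVAL, API_ID)
VALUES ('TEST4', 118922, 'A999')
ON DUPLICATE KEY UPDATE SOMEVAL = VALUES(SOMEVAL);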
If you have all of the data returned from the API that you need to completely reconstruct the rows after you delete them, then go ahead and delete them, and insert afterwards.
Be sure, though, that you do this in a transaction, and that you are using an engine that supports transactions properly, such as InnoDB, so that other clients of the database don't see rows missing from the table just because they are going to be updated.
For efficiency, you should insert as many rows as you can in a single query. Much faster that way.
BEGIN;
DELETE FROM table WHERE API_ID = 'A987';
INSERT INTO table (NAME, SOMEVAL, API_ID) VALUES
('TEST5', 12345, 'A987'),
('TEST6', 23456, 'A987'),
('TEST7', 34567, 'A987'),
...
('TEST123', 123321, 'A987');
COMMIT;