Best way to update over a million records in MySQL - PHP

I have a MySQL table with over a million rows of user information. I want to add a 'password' column to that table, randomly generate passwords, and update all the records (using PHP).
What is the fastest way to perform this task?
Thanks in advance.

A possible scenario:
Generate all passwords into a file with your PHP script
Create a temp table and use LOAD DATA INFILE (which happens to be the fastest way to import data from a file) to load the data from the password file
Alter your table and add the password column
Use UPDATE with a join to populate the password column in your original table from the temp table
Drop the temp table
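A rough SQL sketch of that scenario, assuming the user table is called users with an integer primary key id and that the generated file contains user_id,password pairs (all names, the file path, and the field format are illustrative):
CREATE TEMPORARY TABLE temp_passwords (
    user_id INT NOT NULL PRIMARY KEY,
    password VARCHAR(255) NOT NULL
);

-- Load the file produced by the PHP script
LOAD DATA INFILE '/tmp/passwords.csv'
INTO TABLE temp_passwords
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(user_id, password);

-- Add the new column, then fill it from the temp table in one statement
ALTER TABLE users ADD COLUMN password VARCHAR(255);

UPDATE users u
JOIN temp_passwords t ON t.user_id = u.id
SET u.password = t.password;

DROP TEMPORARY TABLE temp_passwords;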

Keep in mind that altering the table might take a very long time; I would recommend separating these two steps.
1) Alter the table. Here I would recommend the following steps (pseudocode):
create new_table like old_table
alter new_table add column password
insert into new_table (columns) select * from old_table
rename old_table to old_table_bck, new_table to old_table
drop old_table_bck
At this point you have your original table with the new column.
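In concrete MySQL syntax those steps might look like the following; the column list and types are assumptions, so adjust them to your real schema:
CREATE TABLE new_users LIKE users;
ALTER TABLE new_users ADD COLUMN password VARCHAR(255);
-- copy the existing columns across (list your real columns here)
INSERT INTO new_users (id, username, email)
    SELECT id, username, email FROM users;
-- atomically swap the tables, then drop the backup
RENAME TABLE users TO users_bck, new_users TO users;
DROP TABLE users_bck;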
2) Now, after you have changed your structure, you can populate the new 'password' column with your PHP. If you are using InnoDB as the storage engine, the time this takes matters less, since your updates do not lock the whole table. If you do this within transactions, I would suggest breaking the update process down into smaller transactions instead of one large bulk update.
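For example, the population step could be run as a series of small transactions, each covering a range of primary keys; the batch size and the temp_passwords table (from the answer above) are illustrative:
START TRANSACTION;
UPDATE users u
JOIN temp_passwords t ON t.user_id = u.id
SET u.password = t.password
WHERE u.id BETWEEN 1 AND 10000;
COMMIT;
-- repeat with the next id range (10001-20000, and so on) until every row is covered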
If you can't take (minimal) downtime, I would suggest that you look at pt-online-schema-change; we use this tool on major databases where we can't afford any downtime in order to make schema changes while the database is running. It roughly performs the steps in (1) above and additionally, with the help of triggers, ensures that data inserted into the original table while the alter is in progress is also written to the altered table.

Related

Splitting up data in MySQL to make it faster and more accessible

I have a MySQL database that is becoming really large. I can feel the site becoming slower because of this.
Now, on a lot of pages I only need a certain part of the data. For example, I store information about users every 5 minutes for history purposes. But on one page I only need the information that is the newest (not the whole history of data). I achieve this by a simple MAX(date) in my query.
Now I'm wondering whether it would be better to make a separate table that stores only the latest data, so that the query doesn't have to search through millions of rows for a specific user's latest data but can instead read from a table that holds only the latest data for every user.
The con here would be that I have to run 2 queries to insert the latest history in my database every 5 minutes, i.e. insert the new data in the history table and update the data in the latest history table.
The pro would be that MySQL has a lot less data to go through.
What are common ways to handle this kind of issue?
There are a number of ways to handle slow queries in large tables. The three most basic ways are:
1: Use indexes, and use them correctly. It is important to avoid table scans on large tables; this is almost always your most significant performance hit with single queries.
For example, if you're querying something like: select max(active_date) from activity where user_id=?, then create an index on the activity table for the user_id column. You can have multiple columns in an index, and multiple indexes on a table.
CREATE INDEX idx_user ON activity (user_id)
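Since the example query also takes MAX(active_date), a composite index covering both columns lets MySQL answer it from the index alone; the index name and column names here just follow the example above:
CREATE INDEX idx_user_date ON activity (user_id, active_date)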
2: Use summary/"cache" tables. This is what you have suggested. In your case, you could apply an insert trigger to your activity table, which will update your summary table whenever a new row gets inserted. This means that your code won't need to execute two queries. For example:
CREATE TRIGGER update_summary
AFTER INSERT ON activity
FOR EACH ROW
UPDATE activity_summary SET last_active_date=new.active_date WHERE user_id=new.user_id
You can change that to check if a row exists for the user already and do an insert if it is their first activity. Or you can insert a row into the summary table when a user registers...Or whatever.
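One way to cover that first-activity case is to have the trigger do an upsert instead of a plain UPDATE, assuming user_id is the primary (or a unique) key of activity_summary:
CREATE TRIGGER update_summary
AFTER INSERT ON activity
FOR EACH ROW
INSERT INTO activity_summary (user_id, last_active_date)
VALUES (new.user_id, new.active_date)
ON DUPLICATE KEY UPDATE last_active_date = new.active_date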
3: Review the query! Use MySQL's EXPLAIN command to grab a query plan and see what the optimizer does with your query. Use it to ensure that the optimizer is avoiding table scans on large tables (and either create or force an index if necessary).
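For instance, checking the plan for the example query is a single statement; a type of ALL in the output means a full table scan, while the key column shows which index was chosen (the literal 42 is just a placeholder user id):
EXPLAIN SELECT MAX(active_date) FROM activity WHERE user_id = 42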

Issue in copying rows from one table to another

I am implementing a request mechanism where the user has to approve a request. For that I have implemented a temporary table and a main table. Initially, when the request is added, the data is inserted into the temporary table; on approval it is copied to the main table.
The issue is that there will be more than 5k rows to be moved to the main table after approval, plus another 3-5 rows for each of those in the detail table (which stores the details).
My current implementation is like this
//Get the rows from temporary table (batch_temp)
//Loop through the data
//Insert the data to the main table (batch_main) and return the id
//Get the details row from the temporary detail table (batch_temp_detail) using detail_tempid
//Loop through the data
//Insert the details to the detail table (batch_main_detail) with the main table id amount_id
//End Loop
//End Loop
But this implementation would take at least 20k queries. Is there a better way to implement this?
I tried to create an sqlfiddle but was unable to, so I have pasted the query at pgsql.privatepaste.com
I'm sorry that I'm not familiar with PostgreSQL. My solution is in MySQL; I hope it can still help, since MySQL and PostgreSQL are similar.
First, we should add one more field to your batch_main table to track the originating batch_temp record for each batch_main record.
ALTER TABLE `batch_main`
ADD COLUMN tempid bigint;
Then, on approval, we insert the 5k rows with one query:
INSERT INTO batch_main
(batchid, userid, amount, tempid)
SELECT batchid, userid, amount, amount_id FROM batch_temp;
So, with each new batch_main record we have the id of its originating batch_temp record. Then insert the detail records:
INSERT INTO `batch_main_detail`
(detail_amount, detail_mainid)
SELECT
btd.detail_amount, bm.amount_id
FROM
batch_temp_detail `btd`
INNER JOIN batch_main `bm` ON btd.detail_tempid = bm.tempid
Done!
P.S.:
I'm a bit confused by the way you name your fields, and since I do not know PostgreSQL and am only looking at your syntax: can you use the same sequence for the primary keys of both batch_temp and batch_main? If you can, there is no need to add the extra field.
Hope this helps.
You simply need to update your schema. Instead of having two tables, one main and one temporary, you should keep all the data in the main table but have a flag that indicates whether a given record is approved or not. Initially it is set to false, and once approved it is simply set to true; the data can then be displayed on your website, etc. That way you will not need to write the data twice, or move it from one table to another.
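A minimal sketch of that schema change; the column name approved, its type, and the example batchid value are illustrative:
ALTER TABLE batch_main ADD COLUMN approved TINYINT(1) NOT NULL DEFAULT 0;

-- on approval, flip the flag instead of copying rows
UPDATE batch_main SET approved = 1 WHERE batchid = 123;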
You haven't specified the RDBMS you are using, but good old INSERT with a SELECT in it should do the trick in one command:
INSERT INTO main (field1, ..., fieldN) SELECT field1, ..., fieldN FROM temporary

Big Data: Handling SQL Insert/Update or Merge - best line by line or by CSV?

So basically I have a bunch of 1 GB data files (compressed); they are just text files containing JSON data with timestamps and other stuff.
I will be using PHP code to insert this data into a MySQL database.
I will not be able to store these text files in memory, so I have to process each data file line by line. To do this I am using stream_get_line().
Some of the data contained will be updates, some will be inserts.
Question
Would it be faster to use Insert / Select / Update statements, or create a CSV file and import it that way?
Or create a file containing bulk operations and then execute it from SQL?
I basically need to insert data with a primary key that doesn't exist yet, and update fields on the data if the primary key does exist. But I will be doing this in LARGE quantities.
Performance is always an issue.
Update
The table has 22,000 columns, and only around 10-20 of them contain anything other than 0.
I would load all of the data to a temporary table and let mysql do the heavy lifting.
create the temporary table by doing create table temp_table as select * from live_table where 1=0;
Read the file and create a data product that is compatible for loading with load data infile.
Load the data into the temporary table and add an index for your primary key
Next, isolate your updates by doing an inner join between the live table and the temporary table, then walk through and apply your updates.
Remove all of your updates from the temporary table (again using an inner join between it and the live table).
Process all of the inserts with a simple insert into live_table select * from temp_table.
Drop the temporary table, go home, and have a frosty beverage.
This may be oversimplified for your use case, but with a little tweaking it should work a treat.
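Put into rough MySQL, the whole flow might look like this; the table names, the id primary key, the file path, and the col1/col2 columns are all placeholders:
-- empty copy of the live table
CREATE TABLE temp_table AS SELECT * FROM live_table WHERE 1=0;

-- bulk-load the prepared file
LOAD DATA INFILE '/tmp/batch.csv'
INTO TABLE temp_table
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';

ALTER TABLE temp_table ADD PRIMARY KEY (id);

-- updates: rows whose key already exists in the live table
UPDATE live_table l
JOIN temp_table t ON t.id = l.id
SET l.col1 = t.col1, l.col2 = t.col2;

-- remove the processed updates from the temp table
DELETE t FROM temp_table t
JOIN live_table l ON l.id = t.id;

-- everything left in the temp table is a new row
INSERT INTO live_table SELECT * FROM temp_table;

DROP TABLE temp_table;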

Keep updated mysql data between multiple mysql tables

I have two tables in MySQL. When I insert/delete values in the first table, I want the values to be duplicated in table 2 to keep them "aligned".
table1:
id - username
1 - test_user
table2:
Same id as table1 and username as table1 (on insert/delete)
I want to keep the data between the tables aligned without doing multiple queries. I've read about triggers, but I'm not sure if that's the correct road; I am a beginner.
I said two tables, but I will need to do this across multiple tables.
You can use MySQL triggers. This way you can automatically insert/update/delete the data in the second table.
MySQL Using Triggers
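For the example tables above, a minimal pair of triggers might look like this (assuming both tables share the id and username columns, and covering only insert and delete as asked):
CREATE TRIGGER table1_after_insert
AFTER INSERT ON table1
FOR EACH ROW
INSERT INTO table2 (id, username) VALUES (new.id, new.username);

CREATE TRIGGER table1_after_delete
AFTER DELETE ON table1
FOR EACH ROW
DELETE FROM table2 WHERE id = old.id;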
When you INSERT new records, given that you don't want to do two inserts for some reason, using a trigger to insert into the second table will work. For UPDATE and DELETE you might want to look at the CASCADE option with foreign keys. If all you are doing is keeping the data consistent between tables, that's exactly what cascade is for.
When you create table2 you just add a foreign key like this:
FOREIGN KEY (id, username)
REFERENCES table1(id, username) ON UPDATE CASCADE ON DELETE CASCADE
Then whenever you alter table1 the changes will automatically get pushed through to table2.
Couple prerequisites for this to work:
You have to use a storage engine that supports foreign keys, something like InnoDB and not MyISAM
You need to have an index on (id, username) in table1; the foreign key needs to match a key in the parent table
You should read the doc page for foreign keys. There are a couple other ways you can tweak them, and you should figure out what works best for your purposes.
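Putting those prerequisites together, the two tables could be sketched like this; the column types are assumptions:
CREATE TABLE table1 (
    id INT NOT NULL,
    username VARCHAR(64) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY idx_id_username (id, username)  -- the key the foreign key references
) ENGINE=InnoDB;

CREATE TABLE table2 (
    id INT NOT NULL,
    username VARCHAR(64) NOT NULL,
    FOREIGN KEY (id, username)
        REFERENCES table1 (id, username)
        ON UPDATE CASCADE ON DELETE CASCADE
) ENGINE=InnoDB;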
You can certainly put triggers on your table1 to make parallel changes to your other tables as your application changes table1.
See here for the documentation: http://dev.mysql.com/doc/refman/5.0/en/trigger-syntax.html
But, you should think over your design. It will take multiple queries to do your inserts and updates; they'll just be done "behind your back" on the server. They'll still take time. Triggers can really slow things down.
Also, triggers are a little bit fragile. If you add a column to a table, you'll have to rework your triggers. Triggers are generally a pain in the neck to keep in a source-control system and a huge pain in the neck to test, so using them will make your application more troublesome to maintain.
Could you think of another approach to handling this need for duplication? Could you, for example, use a view or a join to present the data you need to your application program without actually duplicating tables and the rows in them? If you figure out how to do that you'll be much happier in the long run.
CREATE VIEW table2 AS
SELECT *
FROM table1;
will produce a "fake" table2 with the contents of table1.
Or if you're hoping to view only the test users in a second table, a view can do that for you too, for example:
CREATE VIEW table3 AS
SELECT *
FROM table1
WHERE usertype = 'test_user' ;
If you're using duplicate tables for "backup," that's a bad way to make sure your information is safe. Instead, you need to back up your MySQL server instance.
Formal relational database design principles teach us to avoid duplicating data, and instead to use views and joins to structure the data the way applications need to see it.

Reset the database after 3 hrs & make it behave as a new database through php script

How can I reset the database after 3 hours and make it behave as a new database, through a PHP script?
Possibly the easiest way would be to have a cron job that executes every three hours and calls mysql with a "clean" database setup. The crontab entry would be something along the lines of:
0 */3 * * * mysql -u XXX -pXXX < clean_database.sql
However, the "clean_database.sql" file would need to use "DROP TABLE IF EXISTS ..." for each of the tables you want to reset. That said, you can simply use mysqldump with a "known good" version of the database to create this file. (You'll need to add a "use <database name>;" statement at the top.)
The easiest way is to drop the database and recreate it using your create scripts. If you don't have create scripts you can get them by making a dump of your database.
To delete the data in each table without dropping the tables you can use the TRUNCATE TABLE tablename command on each table.
If you don't have permission to use TRUNCATE you can use DELETE FROM tablename without a WHERE clause.
Note that if you have foreign key constraints you may have to run the statements in a specific order to avoid violating these constraints.
To get a list of all tables you can use SHOW TABLES.
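If the foreign keys make the ordering awkward, you can also switch the checks off for the duration of the cleanup; the table names below are placeholders:
SET FOREIGN_KEY_CHECKS = 0;
TRUNCATE TABLE child_table;
TRUNCATE TABLE parent_table;
SET FOREIGN_KEY_CHECKS = 1;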
steps to do:
connect to the database server
select the database
run mysql_query("SHOW TABLES");
read the result into an array of table names
foreach ($tables as $tableName) { mysql_query("TRUNCATE TABLE `$tableName`"); }
I hope the principle is clear to you ;-)
mysql_query('DROP DATABASE yourdatabase');
mysql_query('CREATE DATABASE yourdatabase');
mysql_query('CREATE TABLE yourdatabase.sometable ...'); // etc.
This will drop the database and create it anew. You can then use the CREATE TABLE syntax to recreate the tables. Note that since this script has significant powers, you should consider creating a special MySQL user for it, one that's not used during normal operations.
