How do I queue inserts to create a bulk insert - php

I am going to change my application so that it does bulk inserts instead of individual ones to ease the load on my server. I am not sure the best way to go about this. Thoughts so far are:
Use a text file and write all the insert/update statements to this file and process it every 5 minutes - I am not sure of the best way to handle this one. Would reading from one process (to create the bulk insert) cause issues while the main process is still trying to add more statements to it? Would I need to create a new file every 5 minutes and delete it when it's processed?
Store the inserts in a session and then just process them. Would this cause any problems with memory etc.?
I am using PHP and MySQL with MyISAM tables. I am open to all ideas on the best way to handle this, I just know I need to stop doing single inserts / updates.
Thanks.

The fastest way to get data into the database is to use LOAD DATA INFILE on a text file.
See: http://dev.mysql.com/doc/refman/5.1/en/load-data.html
You can also use bulk inserts, of course. If you want them to queue behind selects, use syntax like:
INSERT LOW_PRIORITY INTO table1 (field1, field2) VALUES (1,1),(2,2),(3,3),...
or
INSERT DELAYED INTO .....
Note that DELAYED does not work with InnoDB.
Also note that LOW_PRIORITY is not recommended when using MyISAM.
See:
http://dev.mysql.com/doc/refman/5.5/en/insert.html
http://dev.mysql.com/doc/refman/5.5/en/insert-delayed.html
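To make the LOAD DATA INFILE route concrete, here is a minimal PHP sketch; the `events` table, its columns, the file path, and the connection details are all hypothetical, and it assumes the MySQL server is allowed to read files from /tmp:

<?php
// Sketch only: hypothetical `events` table (field1, field2) and file path.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// Rows your application has accumulated, however it collects them.
$rows = [[1, 'a'], [2, 'b'], [3, 'c']];

$file = '/tmp/bulk_queue.csv';
$fh = fopen($file, 'w');
foreach ($rows as $row) {
    fputcsv($fh, $row);           // one CSV line per row
}
fclose($fh);

// One statement loads the whole file. If the server forbids reading
// server-side files, LOAD DATA LOCAL INFILE may be available instead.
$pdo->exec("LOAD DATA INFILE '$file' INTO TABLE events
            FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
            LINES TERMINATED BY '\\n'");
unlink($file);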

I think you should create a new file every 5 minutes, for inserts and updates separately, and remove each file once it has been processed.
For bulk inserts:
You can use LOAD DATA INFILE with keys disabled on the table.
If you use InnoDB, run all the inserts inside a transaction to prevent the indexes from being flushed on each query, and use the multi-row form with VALUES (),(),() (a minimal sketch follows at the end of this answer).
If you use MyISAM, insert with the DELAYED option. Also, if you don't remove rows from the table, concurrent reads and writes are possible.
For bulk updates, use a transaction as well; you will get the same benefit.
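A minimal sketch of that multi-row, single-transaction insert with PDO; the `items` table and the connection details are hypothetical:

<?php
// Sketch only: hypothetical `items` table with columns (field1, field2).
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$rows = [[1, 1], [2, 2], [3, 3]];

// Build one INSERT with a (?,?) placeholder group per row.
$placeholders = implode(',', array_fill(0, count($rows), '(?,?)'));
$params = [];
foreach ($rows as $row) {
    $params[] = $row[0];
    $params[] = $row[1];
}

$pdo->beginTransaction();   // InnoDB: indexes flushed once at COMMIT, not per row
$stmt = $pdo->prepare("INSERT INTO items (field1, field2) VALUES $placeholders");
$stmt->execute($params);
$pdo->commit();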

Related

Very Slow Eloquent Insert/Update Queries in Laravel

I have a Laravel application which must insert/update thousands of records per second in a for loop. My problem is that my database insert/update rate is 100-150 writes per second. I have increased the amount of RAM dedicated to my database, but with no luck.
Is there any way to increase the write rate for MySQL to thousands of records per second?
Please suggest optimal configurations for performance tuning.
And please do not downvote the question; my code is correct. It's not a code problem, because I have no problem with MongoDB, but I have to use MySQL.
My storage engine is InnoDB.
Inserting rows one at a time, and autocommitting each statement, has two overheads.
Each transaction has overhead, probably more than one insert. So inserting multiple rows in one transaction is the trick. This requires a code change, not a configuration change.
Each INSERT statement has overhead. One insert is about 90% overhead and 10% actual insert.
The optimal is 100-1000 rows being inserted per transaction.
For rapid inserts:
Best is LOAD DATA -- if you are starting with a .csv file. If you must build the .csv file first, then it is debatable whether that overhead makes this approach lose.
Second best is multi-row INSERT statements: INSERT INTO t (a,b) VALUES (1,2), (2,3), (44,55), .... I recommend 1000 per statement, and COMMIT each statement. This is likely to get you past 1000 rows per second being inserted (see the sketch after this answer).
Another problem... Since each index is updated as the row is inserted, you may run into trouble with thrashing I/O to achieve this task. InnoDB automatically "delays" updates to non-unique secondary indexes (no need for INSERT DELAYED), but the work is eventually done. (So RAM size and innodb_buffer_pool_size come into play.)
If the "thousands" of rows/second is a one time task, then you can stop reading here. If you expect to do this continually 'forever', there are other issues to contend with. See High speed ingestion .
For insert, you might want to look into the INSERT DELAYED syntax. That will increase insert performance, but it won't help with updates, and the syntax will eventually be deprecated. This post offers an alternative for updates, but it involves custom replication.
One way my company has succeeded in speeding up inserts is by writing the SQL to a file and then using a MySQL LOAD DATA INFILE command, but I believe we found that this required the mysql client application to be installed on the server.
I've also found that inserting and updating in a batch is often faster. So if you're calling INSERT 2k times, you might be better off running 10 inserts of 200 rows each. This would decrease the lock requirements and decrease information/number of calls sent over the wire.

MySQL INSERT vs PHP file_put_contents

I have a rapidly growing, write-heavy PHP/MySQL application that inserts new rows at a rate of a dozen or so per second into an InnoDB table of several million rows.
I started out using realtime INSERT statements and then moved to PHP's file_put_contents to write entries to a file and LOAD DATA INFILE to get the data into the database. Which is the better approach?
Are there any alternatives I should consider? How can I expect the two methods to handle collisions and increased load in the future?
Thanks!
Think of LOAD DATA INFILE as a batch method of inserting data. It eliminates the overhead of firing up an insert query for every row and is therefore much faster. However, you lose some control when handling errors: it's much easier to handle an error on a single insert query than on one row in the middle of a file.
If you can afford to have the data inserted by PHP not be instantly available in the table, INSERT DELAYED might be an option.
MySQL will accept the data to be inserted and will deal with the insertion later on, putting it into a queue. So this won't block your PHP application, and MySQL ensures the data is inserted later on.
As it says in the manual:
Another major benefit of using INSERT DELAYED is that inserts from many clients are bundled together and written in one block. This is much faster than performing many separate inserts.
I have used this for logging data where data loss is not fatal. But if you want to be protected from server crashes when data from INSERT DELAYED hasn't been inserted yet, you could look into replicating the changes to a dedicated slave machine.
The way we deal with our inserts is to have them sent to a message queue system like ActiveMQ. From there we have a separate application that loads the inserts using LOAD DATA INFILE in batches of about 5000. Error handling can still take place with the infile; however, it processes the inserts much faster. If setting up a message queue is outside the scope of your application, there is no reason that file_put_contents would not be an acceptable option, especially if it's already implemented and working fine.
Additionally you may want to test disabling indexes during writes to see if that improves performance.
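If you do stay with the file_put_contents approach, the writer/consumer hand-off the original question worried about can be handled with an exclusive lock plus an atomic rename. A sketch, with hypothetical paths and table name:

<?php
// Writer side (sketch): append one CSV line under an exclusive lock.
file_put_contents('/tmp/insert_queue.csv', "123,456\n", FILE_APPEND | LOCK_EX);

// Consumer side (e.g. a cron job): rotate the file atomically first, so new
// writes go to a fresh queue file while the old one is loaded and removed.
$batch = '/tmp/insert_queue.' . time() . '.csv';
if (@rename('/tmp/insert_queue.csv', $batch)) {
    $pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $pdo->exec("LOAD DATA INFILE '$batch' INTO TABLE events
                FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'");
    unlink($batch);
}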
It doesn't sound like you should be using InnoDB. Regardless, a dozen inserts per second should not be problematic even for crappy hardware - unless, possibly, your data model is very complex, but for that, LOAD DATA INFILE is very good because, among other things, it rebuilds the indexes only once, as opposed to on every insert. So using files is a decent approach, but do make sure you open them in append-only mode.
In the long run (1k+ writes/s), look at other databases - particularly Cassandra for write-heavy applications.
If you do go the SQL insert route, wrap the PDO execute statements in a transaction. Doing so will greatly speed up the process.
LOAD DATA is disabled on some servers for security reasons:
http://dev.mysql.com/doc/mysql-security-excerpt/5.0/en/load-data-local.html
Also I don't enjoy writing my applications upside down to maintain database integrity.

What would be the best approach to adding 100,000+ records to a MySQL database via PHP?

I'm developing an application and have over 100,000 records, that have been read in from a file, which need to be inserted into a MySQL database.
What would be the best approach of inserting these to the database?
The approach I'm currently working from is generating a SQL query to insert all the records in one call to the database.
INSERT INTO table (field1, field2)
VALUES
(123, 456),
(125, 984),
...
Is there a limit to the number of records that can be inserted at once? And in terms of performance is this the best method?
I have considered the alternative of splitting the records into a number of queries, but I'm unsure whether this would have any benefit.
Any advice would be greatly appreciated, thanks!
The way you're doing it is indeed very efficient. The maximum query length is determined by the max_allowed_packet setting so you may need to split the query into several queries, see http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_max_allowed_packet
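One way to do that split, sketched with PDO: read max_allowed_packet from the server and flush the statement whenever the next row would push it over. The table, columns, and $records array are hypothetical.

<?php
// Sketch only: keep each multi-row INSERT under max_allowed_packet.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

$max   = (int)$pdo->query('SELECT @@max_allowed_packet')->fetchColumn();
$limit = (int)($max * 0.9);   // headroom for the statement's fixed part

$values = '';
foreach ($records as [$f1, $f2]) {          // $records is hypothetical
    $row = '(' . (int)$f1 . ',' . (int)$f2 . ')';
    if ($values !== '' && strlen($values) + strlen($row) + 1 > $limit) {
        $pdo->exec("INSERT INTO table1 (field1, field2) VALUES $values");
        $values = '';
    }
    $values .= ($values === '' ? '' : ',') . $row;
}
if ($values !== '') {
    $pdo->exec("INSERT INTO table1 (field1, field2) VALUES $values");
}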
I would recommend adding one row at a time with LOW_PRIORITY or DELAYED depending on your engine. The reason for this is that other users of your table will still have access to read and write once in a while instead of having the entire table locked for an extended period of time.
Permissions and/or format notwithstanding, use LOAD DATA INFILE to import the file directly.
LOAD DATA INFILE is probably the best way to go. Especially if the data file you're reading from is already in a format that can be used by it.
You might also consider MySQL's LOAD DATA INFILE command to insert data from a CSV file.

Optimize massive MySQL INSERTs

I've got an application which needs to run a daily script; the daily script consists of downloading a CSV file with 1,000,000 rows and inserting those rows into a table.
I host my application on Dreamhost. I created a while loop that goes through all the CSV's rows and performs an INSERT query for each one. The thing is that I get a "500 Internal Server Error". Even if I chop it up into 1000 files with 1000 rows each, I can't insert more than 40 or 50 thousand rows in the same loop.
Is there any way that I could optimize the input? I'm also considering going with a dedicated server; what do you think?
Thanks!
Pedro
Most databases have an optimized bulk insertion process - MySQL's is the LOAD DATA INFILE syntax.
To load a CSV file, use:
LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
Insert multiple values, instead of doing
insert into table values(1,2);
do
insert into table values (1,2),(2,3),(4,5);
Up to an appropriate number of rows at a time.
Or do bulk import, which is the most efficient way of loading data, see
http://dev.mysql.com/doc/refman/5.0/en/load-data.html
Normally I would say just use LOAD DATA INFILE, but it seems you can't with your shared hosting environment.
I haven't used MySQL in a few years, but they have a very good document which describes how to speed up insertions for bulk insertions:
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
A few ideas that can be gleaned from this:
Disable/enable keys around the insertions:
ALTER TABLE tbl_name DISABLE KEYS;
ALTER TABLE tbl_name ENABLE KEYS;
Use many values in your insert statements.
I.e.: INSERT INTO table (col1, col2) VALUES (val1, val2),(.., ..), ...
If I recall correctly, you can have up to 4096 values per insertion statement.
Run a FLUSH TABLES command before you even start, to ensure that there are no pending disk writes that may hurt your insertion performance.
I think this will make things fast. I would suggest using LOCK TABLES, but I think disabling the keys makes that moot. (A sketch combining these steps follows at the end of this answer.)
UPDATE
I realized after reading this that by disabling your keys you may remove consistency checks that are important for your file loading. You can fix this by:
Ensuring that your table has no data that "collides" with the new data being loaded (if you're starting from scratch, a TRUNCATE statement will be useful here).
Writing a script to clean your input data to ensure no duplicates locally. Checking for duplicates is probably costing you a lot of database time anyway.
If you do this, ENABLE KEYS should not fail.
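Putting this answer's steps together, here is a minimal sketch; the table name, CSV path, and connection details are hypothetical, and on MyISAM, DISABLE KEYS affects only non-unique indexes:

<?php
// Sketch only: FLUSH, disable keys, bulk-load, rebuild indexes in one pass.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

$pdo->exec('FLUSH TABLES');
$pdo->exec('ALTER TABLE tbl_name DISABLE KEYS');
try {
    $pdo->exec("LOAD DATA INFILE '/tmp/data.csv' INTO TABLE tbl_name
                FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
                LINES TERMINATED BY '\\n'
                IGNORE 1 LINES");
} finally {
    $pdo->exec('ALTER TABLE tbl_name ENABLE KEYS');   // single index rebuild
}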
You can create a cron job script which adds x records to the database on each run.
The script checks whether the last import added all the needed rows; if not, it takes another x rows.
That way you can add as many rows as you need.
If you have a dedicated server, it's even easier: you just run a loop with all the insert queries.
Of course you can try to set time_limit to 0 (if that works on Dreamhost) or increase it.
Your PHP script is most likely being terminated because it exceeded the script time limit. Since you're on a shared host, you're pretty much out of luck.
If you do switch to a dedicated server and if you get shell access, the best way would be to use the mysql command-line tool to insert the data.
OMG Ponies' suggestion is great, but I've also 'manually' formatted data into the same format that mysqldump uses, then loaded it that way. Very fast.
Have you tried doing transactions? Just send the command BEGIN to MySQL, do all your inserts, then do COMMIT. This would speed it up significantly, but as casablanca said, your script is probably timing out as well.
I've run into this problem myself before, and nos pretty much got it right on the head, but you'll need to do a bit more to get the best performance.
I found that in my situation I couldn't get MySQL to accept one large INSERT statement, but if I split it up into groups of about 10k INSERTs at a time, as nos suggested, it does its job pretty quickly. One thing to note is that when doing multiple INSERTs like this you will most likely hit PHP's timeout limit, but this can be avoided by resetting the timeout with set_time_limit($seconds); I found that doing this after each successful INSERT worked really well.
You have to be careful about doing this, because you could accidentally find yourself in a loop with an unlimited timeout. For that reason I would suggest testing that each INSERT was successful, by checking for errors reported by MySQL with mysql_errno() or mysql_error(). You could also catch errors by checking the number of rows affected by the INSERT with mysql_affected_rows(). You could then stop after the first error happens.
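A sketch of that pattern with mysqli (the old mysql_* functions are removed from modern PHP); the table `t`, its columns, and $allRows are hypothetical:

<?php
// Sketch only: chunked INSERTs, reset the clock after each success,
// stop at the first failure instead of looping forever.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

foreach (array_chunk($allRows, 10000) as $chunk) {
    $values = [];
    foreach ($chunk as [$a, $b]) {
        $values[] = '(' . (int)$a . ',' . (int)$b . ')';
    }
    if (!$db->query('INSERT INTO t (a, b) VALUES ' . implode(',', $values))) {
        die('Insert failed: ' . $db->error);
    }
    set_time_limit(30);   // restart PHP's execution timer after each chunk
}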
It would be better if you used SQL*Loader (Oracle's bulk loader).
You would need two things: first, a control file that specifies the actions SQL*Loader should perform, and second, the CSV file that you want loaded.
The link below should help you out:
http://www.oracle-dba-online.com/sql_loader.htm
Go to phpMyAdmin and select the table you would like to insert into.
Under the "Operations" tab, in the "Table options" section, change the storage engine from InnoDB to MyISAM.
I once had a similar challenge.
Have a good time.

How to do long-running batch processes in PHP?

I have a batch process where I need to update my DB table, around 100,000-500,000 rows, from an uploaded CSV file. Normally it takes 20-30 minutes, sometimes longer.
What is the best way to do this? Any good practices for it? Any suggestions would be appreciated.
Thanks.
It takes 30 minutes to import 500,000 rows from a CSV?
Have you considered letting MySQL do the hard work? There is LOAD DATA INFILE, which supports dealing with CSV files:
LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';
If the file is not quite in the right shape to be imported right to the target table, you can either use PHP to transform it beforehand, or LOAD it into a "staging" table and let MySQL handle the necessary transformation — whichever is faster and more convenient.
As an additional option, there seems to be a possibility to run MySQL queries asynchronously through the MySQL Native Driver for PHP (MYSQLND). Maybe you can explore that option as well. It would enable you to retain snappy UI performance.
If you're doing a lot of inserts, are you doing bulk inserts? i.e. like this:
INSERT INTO table (col1, col2) VALUES (val1a, val2a), (val1b, val2b), (....
That will dramatically speed up inserts.
Another thing you can do is disable indexing while you make the changes, then let it rebuild the indexes in one go when you're finished.
A bit more detail about what you're doing, and you might get more ideas.
PEAR has a package called Benchmark with a Benchmark_Profiler class that can help you find the slowest sections of your code so you can optimize them.
We had a feature like that in a big application. We had the issue of inserting millions of rows from a CSV into a table with 9 indexes. After lots of refactoring, we found the ideal way to insert the data was to load it into a [temporary] table with the MySQL LOAD DATA INFILE command, do the transformations there, and copy the result with multiple insert queries into the actual table (INSERT INTO ... SELECT FROM), processing only 50k lines or so with each query (which performed better than issuing a single insert, but YMMV).
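A sketch of that staging-table flow; all names, the transformation, and the 50k slice size are illustrative:

<?php
// Sketch only: load raw CSV into a staging table, transform, copy in slices.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

$pdo->exec('CREATE TABLE staging (id INT AUTO_INCREMENT PRIMARY KEY,
                                  a INT, b VARCHAR(100))');
$pdo->exec("LOAD DATA INFILE '/tmp/import.csv' INTO TABLE staging
            FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n' (a, b)");

// Copy into the real table ~50k rows per query, transforming as we go.
$maxId = (int)$pdo->query('SELECT MAX(id) FROM staging')->fetchColumn();
for ($from = 1; $from <= $maxId; $from += 50000) {
    $to = $from + 49999;
    $pdo->exec("INSERT INTO target (a, b)
                SELECT a, UPPER(b) FROM staging
                WHERE id BETWEEN $from AND $to");
}
$pdo->exec('DROP TABLE staging');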
I can't do it with cron, because this is under user control. A user clicks the process button and can later check the logs to see the process status.
When the user presses said button, set a flag in a table in the database. Then have your cron job check for this flag. If it's there, start processing; otherwise don't. If applicable, you could use the same table to post some kind of status update (e.g. xx% done), so the user has some feedback about the progress.
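A sketch of that flag table; the schema and names are hypothetical. The web request only queues the job; the cron worker picks it up and posts progress:

<?php
// Sketch only: jobs(id, status, progress) is a hypothetical table.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// Web side: the button press just records a pending job.
$pdo->exec("INSERT INTO jobs (status, progress) VALUES ('pending', 0)");

// Cron side: claim a pending job, process it, report progress as we go.
$id = $pdo->query("SELECT id FROM jobs WHERE status = 'pending' LIMIT 1")
          ->fetchColumn();
if ($id !== false) {
    $pdo->exec("UPDATE jobs SET status = 'running' WHERE id = $id");
    foreach ([25, 50, 75, 100] as $pct) {
        // ... process the next slice of the CSV here ...
        $pdo->exec("UPDATE jobs SET progress = $pct WHERE id = $id");
    }
    $pdo->exec("UPDATE jobs SET status = 'done' WHERE id = $id");
}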
