How to increase MySQL INSERT performance? - php

I have a program that collects information from many merchants.
For each request from a merchant, my program does an INSERT query:
INSERT INTO `good` (id,code,merchant,netcost,ip) values('','GC8958','merchantname','581000','192.168.34.30');
There are many requests from merchants at a time (500+ requests), so MySQL runs 500+ INSERT queries.
Is this a problem, and how can I solve it with MySQL?

It should not be a problem unless you're strapped for hardware (in which case the answer is "faster disk, more RAM, faster CPU" once you verify which of the three is the bottleneck on average). You can "paper over" peaks using the INSERT DELAYED syntax if you use MyISAM tables (it's likely not worth it; the syntax has been deprecated).
If you're doing this in batches (i.e. not different clients each inserting one row), then multi-row INSERTs or even LOAD DATA INFILE will be a huge help. In a pinch, you can save the rows unindexed on disk, or in the session (which amounts to the same thing)... (or maybe in a small MEMORY table, but I'd run some tests before resorting to that) and run the real INSERT at leisure.
I'd leave further optimizations for later; "premature optimization is the root of all evil". Anyhow, you may be interested in searching for MySQL insert-performance write-ups; some of them deal with esoterics such as whether it is better to have the InnoDB doublewrite buffer enabled or to rely on the ext4 transaction log.
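For illustration, here is a minimal sketch of the multi-row INSERT idea in PHP/PDO. The $pdo connection, the flush_rows() helper, and the exact column list are assumptions made for this example, not part of the original answer.

// Buffer rows in PHP and flush them as one multi-row INSERT.
function flush_rows(PDO $pdo, array $rows)
{
    if (empty($rows)) {
        return;
    }
    // one (?,?,?,?) group per buffered row
    $placeholders = implode(',', array_fill(0, count($rows), '(?,?,?,?)'));
    $sql = "INSERT INTO `good` (code, merchant, netcost, ip) VALUES $placeholders";
    $params = [];
    foreach ($rows as $row) {
        array_push($params, $row['code'], $row['merchant'], $row['netcost'], $row['ip']);
    }
    $pdo->prepare($sql)->execute($params);   // one round trip instead of count($rows) round trips
}

Each call sends a single statement to the server, which is where most of the per-insert overhead goes.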

Related

Very Slow Eloquent Insert/Update Queries in Laravel

I have a Laravel application which must insert/update thousands of records per second in a for loop. My problem is that my database insert/update rate is 100-150 writes per second. I have increased the amount of RAM dedicated to my database, but with no luck.
Is there any way to increase the write rate for MySQL to thousands of records per second?
Please provide me with an optimal configuration for performance tuning,
and please do not downvote the question. My code is correct; it's not a code problem, because I have no problem with MongoDB, but I have to use MySQL.
My storage engine is InnoDB.
Inserting rows one at a time, and autocommitting each statement, has two overheads.
Each transaction has overhead, probably more than the insert itself. So inserting multiple rows in one transaction is the trick. This requires a code change, not a configuration change.
Each INSERT statement has overhead; a single-row insert is roughly 90% overhead and 10% actual insert.
The optimum is 100-1000 rows inserted per transaction.
For rapid inserts:
Best is LOAD DATA -- if you are starting with a .csv file. If you must build the .csv file first, then it is debatable whether that overhead makes this approach lose.
Second best is multi-row INSERT statements: INSERT INTO t (a,b) VALUES (1,2), (2,3), (44,55), .... I recommend 1000 per statement, and COMMIT each statement. This is likely to get you past 1000 rows per second being inserted.
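Since the question is about Laravel, a rough sketch of that advice using the query builder might look like the following; $allRows, the records table, its columns, and the chunk size are placeholders for this example. DB::table()->insert() with an array of rows produces one multi-row INSERT statement per call.

use Illuminate\Support\Facades\DB;

// One multi-row INSERT per chunk of 1000 rows, each chunk committed on its own.
foreach (array_chunk($allRows, 1000) as $chunk) {
    DB::transaction(function () use ($chunk) {
        DB::table('records')->insert($chunk);   // single statement for the whole chunk
    });
}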
Another problem... Since each index is updated as the row is inserted, you may run into trouble with thrashing I/O to achieve this task. InnoDB automatically "delays" updates to non-unique secondary indexes (no need for INSERT DELAYED), but the work is eventually done. (So RAM size and innodb_buffer_pool_size come into play.)
If the "thousands" of rows/second is a one time task, then you can stop reading here. If you expect to do this continually 'forever', there are other issues to contend with. See High speed ingestion .
For inserts, you might want to look into the INSERT DELAYED syntax. That will increase insert performance, but it won't help with updates, and the syntax has since been deprecated. This post offers an alternative for updates, but it involves custom replication.
One way my company has succeeded in speeding up inserts is by writing the SQL to a file and then loading it with a MySQL LOAD DATA INFILE command, though I believe we found that this required the mysql client to be installed on the server.
I've also found that inserting and updating in a batch is often faster. So if you're calling INSERT 2k times, you might be better off running 10 inserts of 200 rows each. This decreases lock contention and reduces the amount of information and the number of calls sent over the wire.
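As a hedged sketch of the write-a-file-then-load approach described above: the table name, columns, $rows array, and connection details are placeholders, and LOCAL INFILE has to be allowed on both the client and the server for this to work.

// Write the rows to a temporary CSV file, then load it in one statement.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass',
               [PDO::MYSQL_ATTR_LOCAL_INFILE => true]);

$path = tempnam(sys_get_temp_dir(), 'bulk_');
$fh = fopen($path, 'w');
foreach ($rows as $row) {
    fputcsv($fh, [$row['code'], $row['amount']]);
}
fclose($fh);

$pdo->exec("LOAD DATA LOCAL INFILE " . $pdo->quote($path) . "
            INTO TABLE items
            FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
            (code, amount)");
unlink($path);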

Large number of inserts per second causing massive CPU load

I have a PHP script that, on every run, inserts a new row into a MySQL DB (with a relatively small amount of data).
I have more than 20 requests per second, and this is causing my CPU to scream for help.
I'm using the SQL INSERT DELAYED method with a MyISAM engine (although I just noticed that INSERT DELAYED is not working with MyISAM).
My main concern is my CPU load, and I started to look for ways to store this data with more CPU-friendly solutions.
My first idea was to write this data to hourly log files and, once an hour, retrieve the data from the logs and insert it into the DB at once.
Maybe a better idea is to use a NoSQL DB instead of log files, and then once an hour insert the data from the NoSQL store into MySQL.
I haven't tested any of these ideas yet, so I don't really know whether they will manage to decrease my CPU load. I wanted to ask if someone can help me find the right solution, the one that will have the least effect on my CPU.
I recently had a very similar problem, and my solution was to simply batch the requests. This sped things up about 50 times, because of the reduced overhead of MySQL connections and the greatly decreased amount of reindexing. Storing them to a file and then doing one larger statement (100-300 individual inserts) at once is probably a good idea. To speed things up even more, turn off indexing for the duration of the insert with
ALTER TABLE tablename DISABLE KEYS;
-- batched INSERT statement(s) here
ALTER TABLE tablename ENABLE KEYS;
Doing the batch insert will reduce the number of instances of the PHP script running, reduce the number of currently open MySQL handles (a large improvement), and decrease the amount of indexing.
OK guys, I managed to lower the CPU load dramatically with APC cache.
I'm doing it like so:
Storing the data in memory with APC cache, with a TTL of 70 seconds:
apc_store('prfx_SOME_UNIQUE_STRING', $data, 70);
Once a minute, I loop over all the records in the cache:
$apc_list = apc_cache_info('user');
foreach ($apc_list['cache_list'] as $apc) {
    if ((substr($apc['info'], 0, 5) == 'prfx_') && ($val = apc_fetch($apc['info']))) {
        $values[] = $val;
        apc_delete($apc['info']);
    }
}
Inserting the $values into the DB:
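For completeness, a sketch of what this last step might look like, assuming a PDO connection and a stats table with a title column (both placeholders, since the original post does not show its DB layer); the point is that a whole minute's worth of rows goes in under one commit:

if (!empty($values)) {
    $pdo->beginTransaction();
    $stmt = $pdo->prepare('INSERT INTO stats (title) VALUES (?)');
    foreach ($values as $val) {
        $stmt->execute([$val['title']]);   // re-uses the prepared statement
    }
    $pdo->commit();                        // one commit for the whole batch
}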
and the CPU continues to smile..
enjoy
I would insert a sleep(1); call at the top of your loop, before every insert, where 1 = 1 second. This only allows the loop to cycle once per second.
This way it regulates a bit how much load the CPU is getting; this would be ideal, assuming you're only writing a small number of records in each run.
You can read more about the sleep function here : http://php.net/manual/en/function.sleep.php
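A rough sketch of where the call would sit, assuming a PDO connection, a log_table, and a $records array (all placeholders); the loop then issues at most about one INSERT per second:

$stmt = $pdo->prepare('INSERT INTO log_table (payload) VALUES (?)');
foreach ($records as $record) {
    sleep(1);                  // 1 = one second, so the loop cycles once per second
    $stmt->execute([$record]);
}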
It's hard to tell without profiling both methods. If you write to a log file first, you could end up just making it worse, as you're turning your operation count from N into N*2. You gain a slight edge by writing it all to a file and doing a batch insert, but bear in mind that as the log file fills up, its load/write time increases.
To reduce database load, look at using memcache for database reads if you're not already.
All in all though, you're probably best off just trying both and seeing which is faster.
Since you are trying INSERT DELAYED, I assume you don't need up-to-the-second data. If you want to stick with MySQL, you can try using replication and the BLACKHOLE table type. By declaring a table as type BLACKHOLE on one server, then replicating it to a MyISAM or other table type on another server, you can smooth out CPU and I/O spikes. BLACKHOLE is really just a replication log file, so "inserts" into it are very fast and light on the system.
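A minimal sketch of the front-end side of that setup; the table definition is a placeholder, and the replication wiring to the server that actually stores the rows is not shown here.

// On the front-end server: writes to a BLACKHOLE table only hit the binary log,
// so they are very cheap. A replica re-creates the same table with MyISAM or
// InnoDB and materialises the rows when it applies the log.
$pdo->exec("CREATE TABLE request_log (
                created_at DATETIME NOT NULL,
                payload    VARCHAR(255)
            ) ENGINE=BLACKHOLE");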
I do not know your table size or your server's capabilities, but I guess you need to make a lot of inserts into a single table. In such a situation I would recommend looking at constructing vertical partitions, which will reduce the physical size of each partition and can significantly reduce the insertion time for the table.

Cassandra is much slower than MySQL for simple operations?

I see a lot of statements like: "Cassandra is very fast on writes", "Cassandra's reads are really slower than its writes, but much faster than MySQL's".
On my Windows 7 system:
I installed MySQL with the default configuration.
I installed PHP5 with the default configuration.
I installed Cassandra with the default configuration.
Making a simple write test on MySQL: "INSERT INTO wp_test (id,title) VALUES ('id01','test')" gives me a result of: 0.0002(s)
For 1000 inserts: 0.1106(s)
Making the same simple write test on Cassandra: $column_family->insert('id01',array('title'=>'test')) gives me a result of: 0.005(s)
For 1000 inserts: 1.047(s)
For read tests I also found that Cassandra is much slower than MySQL.
So the question: does it sound correct that I get 5ms for one write operation on Cassandra? Or is something wrong, and it should be more like 0.5ms?
When people say "Cassandra is faster than MySQL", they mean when you are dealing with terabytes of data and many simultaneous users. Cassandra (and many distributed NoSQL databases) is optimized for hundreds of simultaneous readers and writers on many nodes, as opposed to MySQL (and other relational DBs), which are optimized to be really fast on a single node but tend to fall to pieces when you try to scale them across multiple nodes. There is a generalization of this trade-off, by the way: the absolute fastest disk I/O is plain old UNIX flat files, and many latency-sensitive financial applications use them for that reason.
If you are building the next Facebook, you want something like Cassandra because a single MySQL box is never going to stand up to the punishment of thousands of simultaneous reads and writes, whereas with Cassandra you can scale out to hundreds of data nodes and handle that load easily. See scaling up vs. scaling out.
Another use case is when you need to apply a lot of batch processing power to terabytes or petabytes of data. Cassandra or HBase are great because they are integrated with MapReduce, allowing you to run your processing on the data nodes. With MySQL, you'd need to extract the data and spray it out across a grid of processing nodes, which would consume a lot of network bandwidth and entail a lot of unneeded complication.
Cassandra benefits greatly from parallelisation and batching. Try doing 1 million inserts on each of 100 threads (each with its own connection and in batches of 100) and see which one is faster.
Finally, Cassandra insert performance should be relatively stable (maintaining high throughput for a very long time). With MySQL, you will find that it tails off rather dramatically once the B-trees used for the indexes grow too large for memory.
It's likely that the maturity of the MySQL drivers, especially the improved MySQL drivers in PHP 5.3, is having some impact on the tests. It's also entirely possible that the simplicity of the data in your query is impacting the results - maybe on 100 value inserts, Cassandra becomes faster.
Try the same test from the command line and see what the timestamps are, then try with varying numbers of values. You can't do a single test and base your decision on that.
Many user space factors can impact write performance. Such as:
Dozens of settings in each of the database server's configuration.
The table structure and settings.
The connection settings.
The query settings.
Are you swallowing warnings or exceptions? The MySQL sample would on face value be expected to produce a duplicate key error. It could be failing while doing nothing at all. What Cassandra might do in the same case isn't something I'm familiar with.
My limited experience of Cassandra tells me one thing about inserts: while the performance of everything else degrades as data grows, inserts appear to maintain the same speed. How fast it is compared to MySQL, however, isn't something I've tested.
It might not be so much that inserts are fast, but rather that Cassandra tries never to be slow. If you want a more meaningful test, you need to incorporate concurrency and more variations on the scenario, such as large data sets, various batch sizes, etc. More complex tests might measure latency for availability of data post insert and read speed over time.
It would not surprise me if Cassandra's first port of call for inserting data is to put it on a queue or to simply append. This is configurable if you look at the consistency level. MySQL similarly allows you to balance performance and reliability/availability, though each will have variations on what they allow and don't allow.
Outside of that unless you get into the internals it may be hard to tell why one performs better than the other.
I did some benchmarks of a use case I had for Cassandra a while ago. For the benchmark, it would insert tens of thousands of rows first. I had to make the script sleep for a few seconds, because otherwise queries run after the fact would not see the data, and the results would be inconsistent between the implementations I was testing.
If you really want fast inserts, append to a file on ramdisk.

SQL Insert at 15 minute intervals, big MySQL table

I'm fairly familiar with most aspects of web development and I consider myself a junior level programmer. I'm always anxious when I think about application scaling and would like to learn a little more about it. Let's have a hypothetical situation.
I'm working on a web application that polls a device and fetches about 2 kB of XML data at 15-minute intervals. This data must be stored for A Very Long Time (at least a couple of years?). Now imagine that this web application has 100 users who each have this device.
After 10 years we're talking tens of millions of table rows. With 100 users, we have a cron job that queries each user's device, gets 2 kB of XML, and inserts it into the SQL database every 15 minutes.
Assuming my queries are relatively simple, only collecting the columns necessary, using joins, and avoiding subqueries, is there any reason this should not scale?
Inserting doesn't generally get slower as a table gets larger, but index updates may take longer. At some point you may want to split the table into two parts. One for archival storage, optimized for data retrieval (basically index the heck out of it), and a second table to handle the newer data, optimized more for insertion (fewer indexes).
But as always, the only way to tell for sure is to benchmark things. Set up some cloned tables with a few thousand rows, and some with multi-millions of rows, and see what happens.
You could always consider using partitioning to automagically split your data files by date, and age older records off to a slower, high-capacity disk array while keeping the newer records (and the INSERTs) on a high-speed array. Then, your index builds will only have to work on a subset of the data rather than the whole deal, and should go quickly (disk I/O is typically the slowest part of a database system).
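As a hedged illustration of that date-based partitioning idea (the table and column names are invented for the example), older partitions can later be moved to slower storage or dropped without touching the partition that receives new INSERTs:

// Range-partition by year; the primary key includes the partitioning column,
// as MySQL requires for partitioned tables.
$pdo->exec("
    CREATE TABLE readings (
        device_id INT NOT NULL,
        taken_at  DATETIME NOT NULL,
        payload   TEXT,
        PRIMARY KEY (device_id, taken_at)
    )
    PARTITION BY RANGE (YEAR(taken_at)) (
        PARTITION p2023 VALUES LESS THAN (2024),
        PARTITION p2024 VALUES LESS THAN (2025),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    )
");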
Assuming my queries are relatively simple, only collecting the columns necessary, using joins, and avoiding subqueries, is there any reason this should not scale?
When you get large, you should put your active dataset in an in-memory database (faster than disk), just like Facebook, Twitter, etc. do. Twitter became very slow when they did not keep the active dataset in memory / scale up; a lot of people saw the fail whale because of this. Both use memcached for this, but you could also use Redis (I like this) or APC if you are on just a single box (a read-through cache sketch follows the quote below). You should always install APC if you want performance, because APC caches the compiled bytecode:
Most PHP accelerators work by caching the compiled bytecode of PHP scripts to avoid the overhead of parsing and compiling source code on each request (some or all of which may never even be executed). To further improve performance, the cached code is stored in shared memory and directly executed from there, minimizing the amount of slow disk reads and memory copying at runtime.
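To make the "active dataset in memory" idea concrete, here is a hedged read-through cache sketch using the Memcached extension; the key name, TTL, query, $pdo connection, and $deviceId are all illustrative assumptions.

$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$key = 'latest_reading_' . $deviceId;
$row = $cache->get($key);
if ($row === false) {                       // cache miss: fall back to MySQL
    $stmt = $pdo->prepare('SELECT * FROM readings WHERE device_id = ? ORDER BY taken_at DESC LIMIT 1');
    $stmt->execute([$deviceId]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    $cache->set($key, $row, 60);            // keep it in memory for 60 seconds
}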

MySQL vs Web Server for processing data

I was wondering whether it's faster to process data in MySQL or in a server language like PHP or Python. I'm sure native operations like ORDER BY will be faster in MySQL due to indexing, caching, etc., but what about actually calculating the rank (including ties returning multiple entries as having the same rank):
Sample SQL
SELECT TORCH_ID,
distance AS thisscore,
(SELECT COUNT(distinct(distance))+1 FROM torch_info WHERE distance > thisscore) AS rank
FROM torch_info ORDER BY rank
Server
...as opposed to just doing a SELECT TORCH_ID FROM torch_info ORDER BY score DESC and then figuring out the rank in PHP on the web server.
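For comparison, here is a sketch of the web-server side of that trade-off, assuming a PDO connection and the torch_info columns from the question; ties share a rank, mirroring the COUNT(DISTINCT ...) + 1 logic above.

// Let MySQL do the sorting, then assign dense ranks in PHP.
$rows = $pdo->query('SELECT TORCH_ID, distance FROM torch_info ORDER BY distance DESC')
            ->fetchAll(PDO::FETCH_ASSOC);

$rank = 0;
$previous = null;
foreach ($rows as &$row) {
    if ($row['distance'] !== $previous) {   // only advance the rank when the score changes
        $rank++;
        $previous = $row['distance'];
    }
    $row['rank'] = $rank;
}
unset($row);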
Edit: Since posting this, my answer has changed completely, partly due to the experience I've gained since then and partly because relational database systems have gotten significantly better since 2009. Today, 9 times out of 10, I would recommend doing as much of your data crunching in-database as possible. There are three reasons for this:
Databases are highly optimized for crunching data—that's their entire job! With few exceptions, replicating what the database is doing at the application level is going to be slower unless you invest a lot of engineering effort into implementing the same optimizations that the DB provides to you for free—especially with a relatively slow language like PHP, Python, or Ruby.
As the size of your table grows, pulling it into the application layer and operating on it there becomes prohibitively expensive simply due to the sheer amount of data transferred. Many applications will never reach this scale, but if you do, it's best to reduce the transfer overhead and keep the data operations as close to the DB as possible.
In my experience, you're far more likely to introduce consistency bugs in your application than in your RDBMS, since the DB can enforce consistency on your data at a low level but the application cannot. Without that safety net built in, you have to be more careful not to make mistakes.
Original answer: MySQL will probably be faster with most non-complex calculations. However, 90% of the time database server is the bottleneck, so do you really want to add to that by bogging down your database with these calculations? I myself would rather put them on the web/application server to even out the load, but that's your decision.
In general, the answer to the "Should I process data in the database, or on the web server question" is, "It depends".
It's easy to add another web server. It's harder to add another database server. If you can take load off the database, that can be good.
If the output of your data processing is much smaller than the required input, you may be able to avoid a lot of data transfer overhead by doing the processing in the database. As a simple example, it'd be foolish to SELECT *, retrieve every row in the table, and iterate through them on the web server to pick the one where x = 3, when you can just SELECT * WHERE x = 3
As you pointed out, the database is optimized for operation on its data, using indexes, etc.
The speed of the count is going to depend on which DB storage engine you are using and the size of the table. Though I suspect that nearly every count and rank done in MySQL would be faster than pulling the same data into PHP memory and doing the same operation there.
Ranking is based on counting and ordering, so if you can do those operations faster, then the rank will obviously be computed faster.
A large part of your question is dependent on the primary keys and indexes you have set up.
Assuming that torchID is indexed properly...
You will find that mySQL is faster than server side code.
Another consideration you might want to make is how often this SQL will be called. You may find it easier to create a rank column and update it as each track record comes in. This will result in a lot of minor hits to your database, versus a number of "heavier" hits to your database.
So let's say you have 10,000 records, 1,000 users who hit this query once a day, and 100 users who put in a new track record each day. I'd rather have the DB doing 100 updates, in which 10% of them hit every record (9,999), than have the ranking query hit 1,000 times a day.
My two cents.
If your test is running individual queries instead of posting transactions, then I would recommend using a JDBC driver over the ODBC DSN, because you'll get 2-3 times faster performance. (I'm assuming you're using an ODBC DSN here in your tests.)
