How to implement background/asynchronous write-behind caching in PHP? - php

I have a particular PHP page that, for various reasons, needs to save ~200 fields to a database. These are 200 separate insert and/or update statements. Now the obvious thing to do is reduce this number but, like I said, for reasons I won't bother going into I can't do this.
I wasn't expecting this problem. Selects seem reasonably performant in MySQL but inserts/updates aren't (it takes about 15-20 seconds to do this update, which is naturally unacceptable). I've written Java/Oracle systems that can happily do thousands of inserts/updates in the same time (in both cases running local databases; MySQL 5 vs OracleXE).
Now in something like Java or .Net I could quite easily do one of the following:
Write the data to an in-memory write-behind cache (ie it would know how to persist to the database and could do so asynchronously);
Write the data to an in-memory cache and use the PaaS (Persistence as a Service) model ie a listener to the cache would persist the fields; or
Simply start a background process that could persist the data.
The minimal solution is to have a cache that I can simply update, which will separately go and update the database in its own time (ie it'll return immediately after updating the in-memory cache). This can either be a global cache or a session cache (although a global shared cache does appeal in other ways).
Any other solutions to this kind of problem?

mysql_query('INSERT INTO tableName VALUES(...),(...),(...),(...)')
The multi-row query above is the better approach. There is also another way to improve insert performance.
Follow these steps:
1. Create a CSV (comma-separated) or plain text file and write all the data that you want to insert into it using a file-writing mechanism (like the FileOutputStream class in Java).
2. Use this command:
LOAD DATA INFILE 'data.txt' INTO TABLE table2
FIELDS TERMINATED BY '\t';
3. If you are not clear about this command then follow the link
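Since the question is about PHP rather than Java, here is a minimal sketch of step 1 in PHP. It assumes MySQL's default LOAD DATA text format (tab-separated fields, backslash escapes, \N for NULL); the function names and the file path are made up for illustration.

```php
<?php
// Escape one value for MySQL's default LOAD DATA text format
// (FIELDS TERMINATED BY '\t' ESCAPED BY '\\', LINES TERMINATED BY '\n').
function escapeLoadDataField($value): string
{
    if ($value === null) {
        return '\N'; // LOAD DATA reads \N as NULL
    }
    // strtr with an array replaces each character once, without rescanning.
    return strtr((string) $value, [
        "\\" => "\\\\",
        "\t" => "\\t",
        "\n" => "\\n",
        "\r" => "\\r",
    ]);
}

// Write rows (arrays of scalar values) to $path, one line per row.
// Returns the number of rows written.
function writeLoadDataFile(string $path, array $rows): int
{
    $fh = fopen($path, 'wb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    $written = 0;
    foreach ($rows as $row) {
        $line = implode("\t", array_map('escapeLoadDataField', $row));
        fwrite($fh, $line . "\n");
        $written++;
    }
    fclose($fh);
    return $written;
}
```

The resulting file can then be passed to the LOAD DATA INFILE command from step 2. Escaping tabs and newlines inside values matters, otherwise a single stray tab shifts every following column.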

You should be able to do 200 inserts relatively quickly, but it will depend on lots of factors. If you are using a transactional engine and doing each one in its own transaction, don't - that creates way too much I/O.
If you are using a non-transactional engine, it's a bit trickier. Using a single multi-row insert is likely to be better as the flushing policy of MySQL means that it won't need to flush its changes after each row.
You really want to be able to reproduce this on your production-spec development box and analyse exactly why it's happening. It should not be difficult to stop.
Of course, another possibility is that your inserts are slow because of extremely large tables or large numbers of indexes - in which case you should scale your database server appropriately. Inserting lots of rows into a table whose indexes don't fit into RAM (or where RAM isn't correctly configured to be used for caching those indexes) generally gets pretty smelly.
BUT don't try to look for a way of complicating your application when there is a way of easily tuning it instead, keeping the current algorithm.

One more solution that you could use (instead of tuning mysql :) ) is to use a JMS server and a STOMP connection driver for PHP to write data to the database server in an asynchronous manner. ActiveMQ has built-in support for the STOMP protocol. And there is the StompConnect project, which is a STOMP proxy for any JMS-compliant server (OpenMQ, JBossMQ etc).

You can update your local cache (hopefully memcached) and then push the write requests through beanstalkd.

I would suspect a problem with your SQL inserts - it really shouldn't take that long. Would prepared queries help? Does your mysql server need some more memory dedicated to the keyspace? I think some more questions need to be asked.

How are you doing the inserts? Are you doing one insert per record
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
or are you using a single query
mysql_query('INSERT INTO tableName VALUES(...),(...),(...),(...)');
The latter of the two options is substantially faster, and from experience the first option will cause it to take much longer as PHP must wait for the first query to finish before moving to the second and so on.
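For what it's worth, the single-query form can be built with placeholders instead of hand-assembled value strings. This is only a sketch: it uses PDO against an in-memory SQLite database as a stand-in for MySQL so that it runs anywhere, and insertRows and the fields table are hypothetical names.

```php
<?php
// Build one multi-row INSERT with positional placeholders and execute it.
// $table and $columns must come from trusted code, never from user input.
function insertRows(PDO $pdo, string $table, array $columns, array $rows): int
{
    if ($rows === []) {
        return 0;
    }
    // One "(?,?,...)" group per row, e.g. "(?,?),(?,?),(?,?)".
    $group = '(' . implode(',', array_fill(0, count($columns), '?')) . ')';
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(',', $columns),
        implode(',', array_fill(0, count($rows), $group))
    );
    // Flatten the rows into one parameter list in the same order.
    $params = [];
    foreach ($rows as $row) {
        foreach ($row as $value) {
            $params[] = $value;
        }
    }
    $stmt = $pdo->prepare($sql);
    $stmt->execute($params);
    return $stmt->rowCount();
}

// Demo: SQLite in memory is an assumption made to keep the example runnable.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE fields (name TEXT, value TEXT)');
$inserted = insertRows($pdo, 'fields', ['name', 'value'], [
    ['a', '1'],
    ['b', '2'],
    ['c', '3'],
]);
```

For the real thing you would swap the DSN for a mysql: one. With ~200 fields this is a single round trip instead of 200.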

Look at the statistics for your database while you do the inserts. I'm guessing that one of your updates locks the table and therefore all your statements are queued up and you experience this delay. Another thing to look into is your index creation/updating because the more indices you have on a table, the slower all UPDATE and INSERT statements get.
Another thing is that I think you use MyISAM (table engine), which locks the entire table on UPDATE. I suggest you use InnoDB instead. InnoDB can be slower on SELECT queries, but is faster on concurrent INSERT and UPDATE because it only locks the row it's working on and not the entire table.

consider this:
mysql_query('start transaction');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('commit');

Note that if your table is INSERT-ONLY (no deletes, and no updates on variable-length columns), then inserts will not lock or block reads when using MyISAM.
This may or may not improve insert performance, but it could help if you are having concurrent insert/read issues.
I'm using this, and only purging old records daily, followed by 'optimize table'.

You can use cURL with PHP to do asynchronous database manipulations.
One possible solution is to fork each query into a separate thread, but PHP does not support threads. We can use the PCNTL functions, but they are a bit tricky for me to use. I prefer another solution to fork and perform asynchronous operations.
Refer to this:
http://gonzalo123.wordpress.com/2010/10/11/speed-up-php-scripts-with-asynchronous-database-queries/

Related

MySql INSERT vs PHP file_put_contents

I have a rapidly growing, write-heavy PHP/MySql application that inserts new rows at a rate of a dozen or so per second into an INNODB table of several million rows.
I started out using realtime INSERT statements and then moved to PHP's file_put_contents to write entries to a file and LOAD DATA INFILE to get the data into the database. Which is the better approach?
Are there any alternatives I should consider? How can I expect the two methods to handle collisions and increased load in the future?
Thanks!
Think of LOAD DATA INFILE as a batch-method of inserting data. It eliminates the overhead of firing up an insert query for every statement therefore is much faster. However, you lose some of the control when handling errors. It's much easier to handle an error on a single insert query vs one row in the middle of a file.
If you can afford for the data inserted by PHP not to be instantly available in the table, then INSERT DELAYED might be an option (note that it only works with engines such as MyISAM, not InnoDB, and was deprecated in MySQL 5.6).
MySQL will accept the data to be inserted and put it into a queue, dealing with the actual insertion later on. So this won't block your PHP application while MySQL takes care of inserting the data.
As it says in the manual:
Another major benefit of using INSERT DELAYED is that inserts from many clients are bundled together and written in one block. This is much faster than performing many separate inserts.
I have used this for logging data where a data loss is not fatal but if you want to be protected from server crashes when data from INSERT DELAYED hadn't been inserted yet, you could look into replicating the changes away to a dedicated slave machine.
The way we deal with our inserts is to have them sent to a message queue system like ActiveMQ. From there we have a separate application that loads the inserts using LOAD DATA INFILE in batches of about 5000. Error handling can still take place with the infile however it processes the inserts much faster. If setting up a message queue is outside of the scope of your application there is no reason that file_put_contents would not be an acceptable option -- Especially if it's already implemented and is working fine.
Additionally you may want to test disabling indexes during writes to see if that improves performance.
It doesn't sound like you should be using innoDB. Regardless, a dozen inserts per second should not be problematic even for crappy hardware - unless, possibly, your data model is very complex, but for that, LOAD DATA INFILE is very good because, among other things, it rebuilds the indexes only once, as opposed to on every insert. So using files is a decent approach, but do make sure you open them in append only mode.
In the long run (1k+ writes/s), look at other databases, particularly Cassandra for write-heavy applications.
If you do go the SQL insert route, wrap the PDO execute statements in a transaction. Doing so will greatly speed up the process.
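A rough sketch of what wrapping the PDO execute calls in a transaction looks like. The in-memory SQLite DSN and the log_entries table are assumptions so the snippet runs on its own; with MySQL the table would need a transactional engine such as InnoDB.

```php
<?php
// Stand-in database; replace the DSN with your mysql: connection string.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE log_entries (id INTEGER, message TEXT)');

$rows = [[1, 'first'], [2, 'second'], [3, 'third']];

// Prepare once, execute many times inside a single transaction,
// so the engine flushes to disk once at COMMIT instead of once per row.
$stmt = $pdo->prepare('INSERT INTO log_entries (id, message) VALUES (?, ?)');
$pdo->beginTransaction();
try {
    foreach ($rows as $row) {
        $stmt->execute($row);
    }
    $pdo->commit();
} catch (Exception $e) {
    // Undo the partial batch so the table is never left half-written.
    $pdo->rollBack();
    throw $e;
}

$count = (int) $pdo->query('SELECT COUNT(*) FROM log_entries')->fetchColumn();
```

The try/catch is the part people tend to leave out: without the rollback, a failure in the middle leaves the connection inside an open transaction.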
LOAD DATA is disabled on some servers for security reasons:
http://dev.mysql.com/doc/mysql-security-excerpt/5.0/en/load-data-local.html
Also I don't enjoy writing my applications upside down to maintain database integrity.

Optimize massive MySQL INSERTs

I've got an application which needs to run a daily script; the daily script consists in downloading a CSV file with 1,000,000 rows, and inserting those rows into a table.
I host my application in Dreamhost. I created a while loop that goes through all the CSV's rows and performs an INSERT query for each one. The thing is that I get a "500 Internal Server Error". Even if I chop it out in 1000 files with 1000 rows each, I can't insert more than 40 or 50 thousand rows in the same loop.
Is there any way that I could optimize the input? I'm also considering going with a dedicated server; what do you think?
Thanks!
Pedro
Most databases have an optimized bulk insertion process - MySQL's is the LOAD DATA INFILE syntax.
To load a CSV file, use:
LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
IGNORE 1 LINES;
Insert multiple values, instead of doing
insert into table values(1,2);
do
insert into table values (1,2),(2,3),(4,5);
Up to an appropriate number of rows at a time.
Or do bulk import, which is the most efficient way of loading data, see
http://dev.mysql.com/doc/refman/5.0/en/load-data.html
Normally I would say just use LOAD DATA INFILE, but it seems you can't with your shared hosting environment.
I haven't used MySQL in a few years, but they have a very good document which describes how to speed up insertions for bulk insertions:
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
A few ideas that can be gleaned from this:
Disable/enable keys around the insertions:
ALTER TABLE tbl_name DISABLE KEYS;
ALTER TABLE tbl_name ENABLE KEYS;
Use many values in your insert statements.
I.e.: INSERT INTO table (col1, col2) VALUES (val1, val2),(.., ..), ...
As far as I know there is no fixed cap on the number of rows per insertion statement; the practical limit is the statement size allowed by max_allowed_packet.
Run a FLUSH TABLES command before you even start, to ensure that there are no pending disk writes that may hurt your insertion performance.
I think this will make things fast. I would suggest using LOCK TABLES, but I think disabling the keys makes that moot.
UPDATE
I realized after reading this that by disabling your keys you may remove consistency checks that are important for your file loading. You can fix this by:
Ensuring that your table has no data that "collides" with the new data being loaded (if you're starting from scratch, a TRUNCATE statement will be useful here).
Writing a script to clean your input data to ensure no duplicates locally. Checking for duplicates is probably costing you a lot of database time anyway.
If you do this, ENABLE KEYS should not fail.
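A minimal sketch of the local duplicate check, assuming each CSV row has already been parsed into an array and that one column holds the unique key. dedupeByKey is a hypothetical helper, and keeping the first occurrence is an arbitrary choice.

```php
<?php
// Keep the first row seen for each key value; drop later duplicates.
// $keyIndex is the position of the unique-key column within each row.
function dedupeByKey(array $rows, int $keyIndex): array
{
    $seen = [];
    $unique = [];
    foreach ($rows as $row) {
        $key = (string) $row[$keyIndex];
        if (isset($seen[$key])) {
            continue; // duplicate key, skip this row
        }
        $seen[$key] = true;
        $unique[] = $row;
    }
    return $unique;
}

$clean = dedupeByKey([[1, 'a'], [2, 'b'], [1, 'c']], 0);
```

For a million rows the $seen array holds one entry per distinct key, which is usually fine memory-wise but worth keeping in mind on shared hosting.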
You can create a cronjob script which adds x records to the database per request.
The cronjob script checks whether the last import added all the needed rows; if not, it takes another x rows.
That way you can add as many rows as you need.
If you have your own dedicated server it's easier. You just run a loop with all the insert queries.
Of course you can try to set the time limit to 0 with set_time_limit(0) (if that works on Dreamhost) or make it bigger.
Your PHP script is most likely being terminated because it exceeded the script time limit. Since you're on a shared host, you're pretty much out of luck.
If you do switch to a dedicated server and if you get shell access, the best way would be to use the mysql command-line tool to insert the data.
OMG Ponies suggestion is great, but I've also 'manually' formatted data into the same format that mysqldump uses, then loaded it that way. Very fast.
Have you tried doing transactions? Just send the command BEGIN to MySQL, do all your inserts then do COMMIT. This would speed it up significantly, but like casablanca said, your script is probably timing out as well.
I've run into this problem myself before and nos pretty much got it right, but you'll need to do a bit more to get the best performance.
I found that in my situation I couldn't get MySQL to accept one large INSERT statement, but if I split it up into groups of about 10k INSERTs at a time, as nos suggested, it does its job pretty quickly. One thing to note is that when doing multiple INSERTs like this you will most likely hit PHP's timeout limit, but this can be avoided by resetting the timeout with set_time_limit($seconds). I found that doing this after each successful INSERT worked really well.
You have to be careful about doing this, because you could find yourself in a loop by accident with an unlimited timeout. For that reason I would suggest testing to make sure that each INSERT was successful by checking for errors reported by MySQL with mysql_errno() or mysql_error(). You could also catch errors by checking the number of rows affected by the INSERT with mysql_affected_rows(). You could then stop after the first error happens.
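Roughly, the batching described above might look like this. It is a sketch rather than the original poster's code: it uses PDO exceptions instead of mysql_errno(), an in-memory SQLite database as a stand-in for MySQL, and a hypothetical two-column items table.

```php
<?php
// Insert $rows in batches of $batchSize and stop at the first failing batch.
// Returns the number of rows inserted before stopping.
function insertInBatches(PDO $pdo, array $rows, int $batchSize): int
{
    $inserted = 0;
    foreach (array_chunk($rows, $batchSize) as $batch) {
        $placeholders = implode(',', array_fill(0, count($batch), '(?,?)'));
        $params = [];
        foreach ($batch as $row) {
            $params[] = $row[0];
            $params[] = $row[1];
        }
        try {
            $stmt = $pdo->prepare("INSERT INTO items (id, name) VALUES $placeholders");
            $stmt->execute($params);
        } catch (PDOException $e) {
            // Stop on the first error rather than looping on with no timeout.
            error_log('Batch failed: ' . $e->getMessage());
            break;
        }
        $inserted += count($batch);
        set_time_limit(30); // restart the timeout only after a successful batch
    }
    return $inserted;
}

// Demo setup (assumed schema, SQLite in memory).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)');
$rows = [];
for ($i = 1; $i <= 25; $i++) {
    $rows[] = [$i, "item $i"];
}
$inserted = insertInBatches($pdo, $rows, 10);
```

Resetting the timer only after a batch succeeds means a stuck or failing import still gets killed by the normal limit. Whether set_time_limit is honoured at all depends on the host's configuration.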
It would be better to use SQL*Loader.
You need two things: first, a control file that specifies the actions SQL*Loader should perform, and second, the CSV file that you want to load.
The link below should help you out.
http://www.oracle-dba-online.com/sql_loader.htm
Go to phpMyAdmin and select the table you would like to insert into.
Under the "Operations" tab, in the "Table options" section, change the storage engine from InnoDB to MyISAM.
I once had a similar challenge.
Have a good time.

What should I do to make mysql 100% optimal?

Recently I've been doing quite a big project with php + mysql. And now I'm concerned about my mysql. What should I do to make my mysql as optimal as possible? Tell everything you know, I'll be really very grateful.
Second question, I use one mysql query per page load which takes information from mysql. It's quite a big query, because I take information from a few tables with a join. Maybe I should do something else?
Thank you.
Some top tips from MySQL Performance tips forge
Specific Query Performance:
Use EXPLAIN to profile the query execution plan
Use Slow Query Log (always have it on!)
Don't use DISTINCT when you have or could use GROUP BY
Insert performance
1. Batch INSERT and REPLACE
2. Use LOAD DATA instead of INSERT
LIMIT m,n may not be as fast as it sounds
Don't use ORDER BY RAND() if you have > ~2K records
Use SQL_NO_CACHE when you are SELECTing frequently updated data or large sets of data
Avoid wildcards at the start of LIKE queries
Avoid correlated subqueries in SELECT and WHERE clauses (try to avoid IN)
Scaling Performance Tips:
Use benchmarking
Isolate workloads: don't let administrative work (i.e. backups) interfere with customer performance.
Debugging sucks, testing rocks!
As your data grows, indexing may change (cardinality and selectivity change). Structuring may want to change. Make your schema as modular as your code. Make your code able to scale. Plan and embrace change, and get developers to do the same.
Network Performance Tips:
Minimize traffic by fetching only what you need.
1. Paging/chunked data retrieval to limit
2. Don't use SELECT *
3. Be wary of lots of small quick queries if a longer query can be more efficient
Use multi_query if appropriate to reduce round-trips
Use stored procedures to avoid bandwidth wastage
OS Performance Tips:
Use proper data partitions
1. For Cluster. Start thinking about Cluster before you need them
Keep the database host as clean as possible. Do you really need a windowing system on that server?
Utilize the strengths of the OS
Pare down cron scripts
Create a test environment
Learn to use the explain tool.
Three things:
Joins are not necessarily suboptimal. Oftentimes schemata that use joins will be faster than those that achieve the same but avoid table joins. The important thing is to know that your joins are optimal. EXPLAIN is very helpful but you also need to know how indexes work.
If you're grabbing data from the DB on every page hit, consider if a caching system would work for you. If so, check out PHP memcache and memcached. It's easy to use in PHP and very fast. It's popular for a reason.
Back to mysql: make sure your key buffer is sized correctly. You can also think about using dedicated key buffers for critical indices that should remain in cache. Read about CACHE INDEX and LOAD INDEX INTO CACHE. See also here.
"...because I take information from a few tables with a join"
Joins, even "big" joins aren't bad. Just be sure that you have good indexes.
Also note that performance with a couple of records is a lot different than performance with hundreds of thousands of records, so test accordingly.
For performance, this book is good: High Performance MySQL. The associated blog is good too.
my 2cents: turn on the slow query log (log_slow_queries) with long_query_time set below 2 seconds and use mysqlsla (get it from hackmysql.com) to analyse the 'slow' queries... This way you can just drill down into the slower queries as they come along...
(mysqlsla can also benefit from the log-queries-not-using-indexes option)
on hackmysql.com there's a script called 'mysqlreport' that gives estimates on how your installation is running... (once it's been running a while) and also gives pointers as to where to tune your setup more precisely...
Being perfect is a bit of a challenge and not the first target to set yourself.
Enable mysql logging of all queries, and write some code which parses the log files and removes any literal values from the SQL statements.
e.g. changes
SELECT * FROM atable WHERE something=5 AND other='splodgy';
and
SELECT * FROM atable WHERE something=1 AND other='zippy';
to something like:
SELECT * FROM atable WHERE something=:1 AND other=:2;
(Sorry, I've not got my code which does this to hand - but it's not rocket science)
Then shove the re-written log into a table so you can prioritize your performance fixes based on length and frequency of execution.
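Since the original code isn't available, here is one possible take on the literal-stripping step in PHP. The regex is a simplification (single-quoted strings and plain numbers only) and normalizeQuery is a made-up name; a real query log will contain cases it doesn't cover.

```php
<?php
// Replace single-quoted strings and numeric literals with :1, :2, ...
// so that queries differing only in their values collapse to one shape.
function normalizeQuery(string $sql): string
{
    $n = 0;
    $out = preg_replace_callback(
        // Quoted string (with \' and '' escapes) OR a standalone number.
        "/'(?:[^'\\\\]|\\\\.|'')*'|\\b\\d+(?:\\.\\d+)?\\b/",
        function (array $m) use (&$n): string {
            $n++;
            return ':' . $n;
        },
        $sql
    );
    return $out ?? $sql; // fall back to the input if the regex engine fails
}

// Count how often each normalized shape occurs in a (tiny) sample log.
$log = [
    "SELECT * FROM atable WHERE something=5 AND other='splodgy';",
    "SELECT * FROM atable WHERE something=1 AND other='zippy';",
];
$counts = array_count_values(array_map('normalizeQuery', $log));
```

Both sample queries collapse to the same shape, so $counts ends up with a single key and a count of 2. Sorting by count (and by execution time, if the log has it) gives the priority list.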

INSERT 6000 Rows - best practice

I have a PHP script that calls an API method that can easily return 6k+ results.
I use PEAR DB_DataObject to write each row in a foreach loop to the DB.
The above script is batch processing 20 users at a time - and although some will only have a few results from the API others will have more. Worst case is that all have 1000's of results.
The loop to call the API seems to be ok, batches of 20 every 5 minutes works fine. My only concern is 1000's of mysql INSERTs for each user (with a long pause between each user for fresh API calls)
Is there a good way to do this? Or am I doing it a good way?!
Well, the fastest way to do it would be to do one insert statement with lots of values, like this:
INSERT INTO mytable (col1, col2) VALUES (?,?), (?,?), (?,?), ...
But that would probably require ditching the DB_DataObject method you are using now. You'll just have to weigh the performance benefits of doing it that way vs. the "ease of use" benefits of using DB_DataObject.
Like Kalium said, check where the bottleneck is.
If it is really the database, you could try the bulk import feature some DBMS offer.
In DB2, for example, it is called LOAD.
It works without SQL, but reads directly from a named pipe.
It is especially designed to be fast when you need to bring a large number of new rows into the database.
It can be configured to skip checks and index building, making it even faster.
Well, is your method producing more load than you can handle? If it's working, then I don't see any reason to change it offhand.
Database abstraction layers usually add a pretty decent amount of overhead. I've found that, in PHP at least, it's much easier to use a plain mysql_query for the sake of speed than it is to optimize your library of choice.
Like Eric P and weinzierl.name have said, using a multi-row insert or LOAD will give you the best direct performance.
I have a few ideas, but you will have to verify them with testing.
If the table you are inserting to has indexes, try to make sure they are optimized for inserts.
Check out optimization options here:
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
Consider mysqli directly, or Pear::MDB2 or PDO. I understand that Pear::DB is fairly slow, though I don't use PEAR myself, so can't verify.
MySQL LOAD DATA INFILE feature is probably the fastest way to do what you want.
You can take a look at the chapter Speed of INSERT statements on MySQL Documentation.
It talks about a lot of ways to improve INSERT speed in MySQL.
I don't think a few thousand records should put any strain on your database; even my laptop should handle it nicely. Your biggest concern might be(come) gigantic tables if you don't do any cleanup or partitioning. Avoid premature optimization on that part.
As for your method, make sure you do each user (or batch) in a separate transaction. If mysql, make sure you're using innodb to avoid unnecessary locking. If you're already using innodb/postgres/other database that supports transactions you might see a significant performance increase.
Consider using COPY (at least on postgres - unsure about mysql).
Make sure your table is properly indexed (including removing unused ones). Indexes hurt insert speed.
Remember to optimize/vacuum regularly.

Is it possible to do count(*) while doing insert...select... query in mysql/php?

Is it possible to do a simple count(*) query in a PHP script while another PHP script is doing insert...select... query?
The situation is that I need to create a table with ~1M or more rows from another table, and while inserting, I do not want the user to feel the page is freezing, so I am trying to keep updating the count. But when I run select count(*) from table while the insert is running in the background, I get only 0 until the insert is completed.
So is there any way to ask MySQL to return a partial result first? Or is there a fast way to do a series of inserts with data fetched from a previous select query while having about the same performance as the insert...select... query?
The environment is php4.3 and MySQL4.1.
Without reducing performance? Not likely. With a little performance loss, maybe...
But why are you regularly creating tables and inserting millions of rows? If you do this only very seldom, can't you just warn the admin (presumably the only one allowed to do such a thing) that this takes a long time? If you're doing this all the time, are you really sure you're not doing it wrong?
I agree with Stein's comment that this is a red flag if you're copying 1 million rows at a time during a PHP request.
I believe that in a majority of cases where people are trying to micro-optimize SQL, they could get much greater performance and throughput by approaching the problem in a different way. SQL shouldn't be your bottleneck.
If you're doing a single INSERT...SELECT, then no, you won't be able to get intermediate results. In fact this would be a Bad Thing, as users should never see a database in an intermediate state showing only a partial result of a statement or transaction. For more information, read up on ACID compliance.
That said, the MyISAM engine may play fast and loose with this. I'm pretty sure I've seen MyISAM commit some but not all of the rows from an INSERT...SELECT when I've aborted it part of the way through. You haven't said which engine your table is using, though.
The other users can't see the insertion until it's committed. That's normally a good thing, since it makes sure they can't see half-done data. However, if you want them to see intermediate data, you could throw in an occasional call to "commit" while you're inserting.
By the way - don't let anybody tell you to turn autocommit on. That's a HUGE time waster. I have a "delete and re-insert" job on my database that takes 1/3rd as long when I turn off autocommit.
Just to be clear, MySQL 4 isn't configured by default to use transactions. It uses the MyISAM table type which locks the entire table for each insert, if I remember correctly.
Your best bet would be to use one of the MySQL bulk insertion functions, such as LOAD DATA INFILE, as these are dramatically faster at inserting large amounts of data. As for the counting, well, you could break the inserts into N groups of 1000 (or Y) then divide your progress meter into N sections and just update it on each group's request.
Edit: Another thing to consider is, if this is static data for a template, then you could use a "select into" to create a new table with the same data. Not sure what your application is, or the intended functionality, but that could work as well.
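To illustrate the "N groups" idea, here is a sketch that splits one big INSERT...SELECT into primary-key ranges and records a percentage after each range. It assumes the source table has a positive integer id column; the table names are invented and SQLite in memory stands in for MySQL.

```php
<?php
// Demo setup: a source table with 2500 rows and an empty target table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE source (id INTEGER PRIMARY KEY, val TEXT)');
$pdo->exec('CREATE TABLE target (id INTEGER PRIMARY KEY, val TEXT)');
$ins = $pdo->prepare('INSERT INTO source (id, val) VALUES (?, ?)');
$pdo->beginTransaction();
for ($i = 1; $i <= 2500; $i++) {
    $ins->execute([$i, "v$i"]);
}
$pdo->commit();

// Copy in id ranges of $chunk rows. Each statement commits on its own,
// so a second connection running COUNT(*) sees the rows copied so far.
$chunk = 1000;
$maxId = (int) $pdo->query('SELECT MAX(id) FROM source')->fetchColumn();
$copy = $pdo->prepare(
    'INSERT INTO target (id, val) SELECT id, val FROM source WHERE id > ? AND id <= ?'
);
$progress = [];
for ($from = 0; $from < $maxId; $from += $chunk) {
    $copy->execute([$from, $from + $chunk]);
    $done = min($from + $chunk, $maxId);
    // In a real page this value would be stored somewhere the UI can poll.
    $progress[] = (int) floor(100 * $done / $maxId);
}
```

The percentage here is based on id ranges, not on rows actually copied, so it will be uneven if the ids have large gaps.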
If you can get to the console, you can ask various status questions that will give you the information you are looking for. There's a command that goes something like "SHOW processlist".
