I have a PHP script that calls an API method that can easily return 6k+ results.
I use PEAR DB_DataObject to write each row to the DB in a foreach loop.
The script batch-processes 20 users at a time, and although some will only have a few results from the API, others will have more. Worst case, all of them have thousands of results.
The loop to call the API seems to be OK; batches of 20 every 5 minutes work fine. My only concern is the thousands of MySQL INSERTs for each user (with a long pause between users for fresh API calls).
Is there a good way to do this? Or am I doing it a good way?!
Well, the fastest way to do it would be to do one insert statement with lots of values, like this:
INSERT INTO mytable (col1, col2) VALUES (?,?), (?,?), (?,?), ...
But that would probably require ditching the DB_DataObject method you are using now. You'll just have to weigh the performance benefits of doing it that way vs. the "ease of use" benefits of using DB_DataObject.
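For illustration, a minimal sketch of building such a statement with plain mysql_query, assuming an open mysql connection; the table, columns and data are made up:
// Build one multi-row INSERT instead of thousands of single-row statements.
$rows = array(
    array('alice', 42),
    array('bob', 17),
);
$values = array();
foreach ($rows as $row) {
    $values[] = sprintf(
        "('%s', %d)",
        mysql_real_escape_string($row[0]),
        (int) $row[1]
    );
}
mysql_query('INSERT INTO mytable (col1, col2) VALUES ' . implode(', ', $values));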
Like Kalium said, check where the bottleneck is.
If it is really the database, you could try the bulk import feature some DBMS offer.
In DB2, for example, it is called LOAD.
It works without SQL, but reads directly from a named pipe.
It is especially designed to be fast when you need to bring a large number of new rows into the database.
It can be configured to skip checks and index building, making it even faster.
Well, is your method producing more load than you can handle? If it's working, then I don't see any reason to change it offhand.
Database abstraction layers usually add a pretty decent amount of overhead. I've found that, in PHP at least, it's much easier to use a plain mysql_query for the sake of speed than it is to optimize your library of choice.
Like Eric P and weinzierl.name have said, using a multi-row insert or LOAD will give you the best direct performance.
I have a few ideas, but you will have to verify them with testing.
If the table you are inserting to has indexes, try to make sure they are optimized for inserts.
Check out optimization options here:
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
Consider mysqli directly, or Pear::MDB2 or PDO. I understand that Pear::DB is fairly slow, though I don't use PEAR myself, so can't verify.
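If you do go the PDO route, here is a rough sketch of reusing one prepared statement inside a transaction; the DSN, credentials, table and the $apiResults variable are hypothetical:
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'secret');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$stmt = $pdo->prepare('INSERT INTO results (user_id, value) VALUES (?, ?)');
$pdo->beginTransaction();
foreach ($apiResults as $row) {        // $apiResults: rows returned by the API
    $stmt->execute(array($row['user_id'], $row['value']));
}
$pdo->commit();                        // one commit for the whole batch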
MySQL LOAD DATA INFILE feature is probably the fastest way to do what you want.
You can take a look at the chapter Speed of INSERT Statements in the MySQL documentation.
It covers a lot of ways to improve inserting in MySQL.
I don't think a few thousand records should put any strain on your database; even my laptop should handle it nicely. Your biggest concern might be(come) gigantic tables if you don't do any cleanup or partitioning. Avoid premature optimization on that part.
As for your method, make sure you do each user (or batch) in a separate transaction. If you're on MySQL, make sure you're using InnoDB to avoid unnecessary locking. If you're already using InnoDB/Postgres/another database that supports transactions, you might see a significant performance increase.
Consider using COPY (at least on postgres - unsure about mysql).
Make sure your table is properly indexed (including removing unused ones). Indexes hurt insert speed.
Remember to optimize/vacuum regularly.
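As a rough sketch of the per-batch transaction idea with mysqli (connection details, table and columns are made up; mysqli::begin_transaction needs a reasonably recent PHP, otherwise use autocommit(false)):
$db = new mysqli('localhost', 'user', 'secret', 'mydb');
$stmt = $db->prepare('INSERT INTO results (user_id, value) VALUES (?, ?)');
foreach ($userBatches as $userId => $rows) {   // one batch of rows per user
    $db->begin_transaction();
    foreach ($rows as $value) {
        $stmt->bind_param('is', $userId, $value);
        $stmt->execute();
    }
    $db->commit();                             // one short transaction per user
}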
Related
I am writing a fairly simple webapp that pulls data from 3 tables in a mysql database. Because I don't need a ton of advanced filtering it seems theoretically faster to construct and then work within large multi-dimensional arrays instead of doing a mysql query whenever possible.
In theory I could just have one query from each table and build large arrays with the results, essentially never needing to query that table again. Is this a good practice, or is it better to just query for the data when it's needed? Or is there some kind of balance, and if so, what is it?
PHP arrays can be very fast, but it depends on how big those tables are; when the numbers get huge, MySQL is going to be faster because, with the right indexes, it won't have to scan all the data, just pick out the rows you need.
I don't recommend trying what you're suggesting: MySQL has a query cache, so repeated queries won't even hit the disk. In a way, the optimization you're thinking about is already done.
Finally, as Chris said, never think about optimizations when they are not needed.
About good practices, a good practice is writing the simplest (and easy to read) code that does the job.
If in the end you decide to apply an optimization, profile the performance; you might be surprised by unexpected results.
it depends ...
Try each solution with the microtime function and you'll see the results.
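For example, a minimal timing sketch (the two approaches are placeholders for your own code):
$start = microtime(true);
// ... approach A: query MySQL every time ...
$timeA = microtime(true) - $start;

$start = microtime(true);
// ... approach B: work on a pre-built PHP array ...
$timeB = microtime(true) - $start;

printf("MySQL: %.4fs, PHP array: %.4fs\n", $timeA, $timeB);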
I think the MySQL query cache can be a good solution, and if you're filtering, you can create a view.
If you can pull it off with a single query - go for it! In your case, I'd say that is a good practice. You might also consider having your data in a CSV or similar file, which would give you even better performance.
I absolutely concur with chris on optimizations: the LAMP stack is a good solution for 99% of web apps, without any need for optimization. ONLY optimize if you really run into a performance problem.
One more thought for your mental model of php + databases: you did not take into account that reading a lot of data from the database into php also takes time.
I have a simple importer, it goes through each line of a rather big csv and imports it to the database.
My question is: Should I call another method to insert each object (generating a DO and telling its mapper to insert), or should I hardcode the insert process in the import method, duplicating the code?
I know the elegant thing to do is to call the second method, but I keep hearing in my head that function calls are expensive.
What do you think?
Many RDBMS brands support a special command to do bulk imports. For example:
MySQL: LOAD DATA INFILE
PostgreSQL: COPY
Microsoft SQL Server: BULK INSERT
Oracle: SQL*Loader
Using these commands is preferred over inserting one row at a time from a CSV data source because the bulk-loading command usually runs at least an order of magnitude faster.
I don't think this matters too much. Consider a bulk insert. At least make sure you're using a transaction, and consider disabling indexes before inserting.
It shouldn't matter, as the insertion will take probably orders of magnitude longer than the php code.
As others have stated, bulk insert will give you much more benefit.
Those line-level optimizations will only make you blind to the good higher-level optimizations.
If you are unsure, do a simple timing with both ways, it shouldn't take longer than a couple of minutes to find out.
Consider combining both approaches to make batch inserts, if doing it all at once hits some memory/time/... limits.
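A minimal sketch of that combined approach, flushing a multi-row INSERT every few hundred CSV lines; the file name, table, columns and batch size are made up, and it assumes an open mysql connection:
$batchSize = 500;
$buffer = array();
$fh = fopen('import.csv', 'r');
while (($line = fgetcsv($fh)) !== false) {
    $buffer[] = sprintf("('%s', '%s')",
        mysql_real_escape_string($line[0]),
        mysql_real_escape_string($line[1]));
    if (count($buffer) >= $batchSize) {
        mysql_query('INSERT INTO items (name, value) VALUES ' . implode(',', $buffer));
        $buffer = array();   // start the next batch
    }
}
if ($buffer) {               // flush whatever is left over
    mysql_query('INSERT INTO items (name, value) VALUES ' . implode(',', $buffer));
}
fclose($fh);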
Recently I've been doing quite a big project with PHP + MySQL, and now I'm concerned about my MySQL. What should I do to make my MySQL as optimal as possible? Tell me everything you know; I'll be really grateful.
Second question: I use one MySQL query per page load, which takes information from MySQL. It's quite a big query, because I take information from a few tables with a join. Maybe I should do something else?
Thank you.
Some top tips from MySQL Performance tips forge
Specific Query Performance:
- Use EXPLAIN to profile the query execution plan
- Use the Slow Query Log (always have it on!)
- Don't use DISTINCT when you have or could use GROUP BY
Insert performance:
- Batch INSERT and REPLACE
- Use LOAD DATA instead of INSERT
- LIMIT m,n may not be as fast as it sounds
- Don't use ORDER BY RAND() if you have > ~2K records
- Use SQL_NO_CACHE when you are SELECTing frequently updated data or large sets of data
- Avoid wildcards at the start of LIKE queries
- Avoid correlated subqueries in the SELECT and WHERE clauses (try to avoid IN)
Scaling Performance Tips:
- Use benchmarking
- Isolate workloads: don't let administrative work (e.g. backups) interfere with customer performance
- Debugging sucks, testing rocks!
- As your data grows, indexing may change (cardinality and selectivity change), and your structure may need to change too. Make your schema as modular as your code. Make your code able to scale. Plan for and embrace change, and get developers to do the same.
Network Performance Tips:
- Minimize traffic by fetching only what you need:
  1. Use paging/chunked data retrieval to limit how much you pull
  2. Don't use SELECT *
  3. Be wary of lots of small quick queries if a longer query can be more efficient
- Use multi_query if appropriate to reduce round-trips
- Use stored procedures to avoid bandwidth wastage
OS Performance Tips:
- Use proper data partitions
  1. For Cluster: start thinking about Cluster before you need it
- Keep the database host as clean as possible. Do you really need a windowing system on that server?
- Utilize the strengths of the OS
- Pare down cron scripts
- Create a test environment
Learn to use the explain tool.
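For example (the table, columns and index here are made up), prefix a suspicious query with EXPLAIN and read the type, key and rows columns; a type of ALL means a full table scan:
EXPLAIN SELECT u.name, COUNT(o.id)
FROM users u
JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2008-01-01'
GROUP BY u.id, u.name;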
Three things:
Joins are not necessarily suboptimal. Oftentimes schemata that use joins will be faster than those that achieve the same but avoid table joins. The important thing is to know that your joins are optimal. EXPLAIN is very helpful but you also need to know how indexes work.
If you're grabbing data from the DB on every page hit, consider whether a caching system would work for you. If so, check out PHP memcache and memcached. It's easy to use in PHP and very fast, and it's popular for a reason (a small sketch follows below).
Back to MySQL: make sure your key buffer is sized correctly. You can also think about using dedicated key buffers for critical indexes that should remain in cache. Read about CACHE INDEX and LOAD INDEX INTO CACHE. See also here.
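As a rough sketch of the caching idea from point 2 with the PHP Memcache extension; the key name, TTL and the query are placeholders:
$memcache = new Memcache();
$memcache->connect('localhost', 11211);
$key = 'frontpage_data';
$data = $memcache->get($key);
if ($data === false) {                              // cache miss: hit MySQL once
    $result = mysql_query('SELECT ... FROM ...');   // the expensive per-page query
    $data = array();
    while ($row = mysql_fetch_assoc($result)) {
        $data[] = $row;
    }
    $memcache->set($key, $data, 0, 60);             // keep it for 60 seconds
}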
"...because I take information from a few tables with a join"
Joins, even "big" joins aren't bad. Just be sure that you have good indexes.
Also note that performance with a couple of records is a lot different than performance with hundreds of thousands of records, so test accordingly.
For performance, this book is good: High Performance MySQL. The associated blog is good too.
My 2 cents: set your slow-query threshold (long_query_time) to < 2 sec and use mysqlsla (get it from hackmysql.com) to analyse the 'slow' queries... This way you can just drill down into the slower queries as they come along...
(mysqlsla can also benefit from the log-queries-not-using-indexes option)
On hackmysql.com there's also a script called 'mysqlreport' that gives estimates on how your installation is running (once it's been running a while) and gives pointers as to where to tune your setup more precisely...
Being perfect is a bit of a challenge and not the first target to set yourself.
Enable mysql logging of all queries, and write some code which parses the log files and removes any literal values from the SQL statements.
e.g. changes
SELECT * FROM atable WHERE something=5 AND other='splodgy';
and
SELECT * FROM atable WHERE something=1 AND other='zippy';
to something like:
SELECT * FROM atable WHERE something=:1 AND other=:2;
(Sorry, I've not got my code which does this to hand - but it's not rocket science)
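A minimal sketch of the idea with preg_replace; it normalizes to ? rather than numbered placeholders, and the patterns won't catch every literal form:
$sql = "SELECT * FROM atable WHERE something=5 AND other='splodgy'";
$sql = preg_replace("/'[^']*'/", '?', $sql);   // quoted strings -> ?
$sql = preg_replace('/\b\d+\b/', '?', $sql);   // bare numbers   -> ?
// $sql is now: SELECT * FROM atable WHERE something=? AND other=?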
Then shove the re-written log into a table so you can prioritize your performance fixes based on length and frequency of execution.
Is it possible to do a simple count(*) query in a PHP script while another PHP script is doing insert...select... query?
The situation is that I need to create a table with ~1M or more rows from another table, and while inserting I do not want the user to feel the page is freezing, so I am trying to keep the count updating; but by running SELECT COUNT(*) FROM table while the insert is running in the background, I get only 0 until the insert is completed.
So is there any way to ask MySQL to return a partial result first? Or is there a fast way to do a series of inserts with data fetched from a previous SELECT query while keeping about the same performance as INSERT...SELECT?
The environment is PHP 4.3 and MySQL 4.1.
Without reducing performance? Not likely. With a little performance loss, maybe...
But why are you regularly creating tables and inserting millions of rows? If you do this only rarely, can't you just warn the admin (presumably the only one allowed to do such a thing) that it takes a long time? If you're doing this all the time, are you really sure you're not doing it wrong?
I agree with Stein's comment that this is a red flag if you're copying 1 million rows at a time during a PHP request.
I believe that in a majority of cases where people are trying to micro-optimize SQL, they could get much greater performance and throughput by approaching the problem in a different way. SQL shouldn't be your bottleneck.
If you're doing a single INSERT...SELECT, then no, you won't be able to get intermediate results. In fact this would be a Bad Thing, as users should never see a database in an intermediate state showing only a partial result of a statement or transaction. For more information, read up on ACID compliance.
That said, the MyISAM engine may play fast and loose with this. I'm pretty sure I've seen MyISAM commit some but not all of the rows from an INSERT...SELECT when I've aborted it part of the way through. You haven't said which engine your table is using, though.
The other users can't see the insertion until it's committed. That's normally a good thing, since it makes sure they can't see half-done data. However, if you want them to see intermediate data, you could throw in an occasional call to "commit" while you're inserting.
By the way - don't let anybody tell you to turn autocommit on. That's a HUGE time waster. I have a "delete and re-insert" job on my database that takes 1/3rd as long when I turn off autocommit.
Just to be clear, MySQL 4 isn't configured by default to use transactions. It uses the MyISAM table type which locks the entire table for each insert, if I remember correctly.
Your best bet would be to use one of the MySQL bulk insertion functions, such as LOAD DATA INFILE, as these are dramatically faster at inserting large amounts of data. As for the counting, you could break the inserts into N groups of 1000 (or Y), then divide your progress meter into N sections and update it as each group completes.
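A rough sketch of that chunked approach with the old mysql_* functions; the table and column names are made up. Because each chunk finishes as its own statement, a SELECT COUNT(*) from another script will see the count grow:
$chunk = 1000;
$offset = 0;
do {
    mysql_query(sprintf(
        'INSERT INTO new_table (id, data)
         SELECT id, data FROM old_table ORDER BY id LIMIT %d, %d',
        $offset, $chunk));
    $copied = mysql_affected_rows();   // rows inserted by this chunk
    $offset += $chunk;
} while ($copied == $chunk);           // stop when a chunk comes up short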
Edit: Another thing to consider is, if this is static data for a template, then you could use a "select into" to create a new table with the same data. Not sure what your application is, or the intended functionality, but that could work as well.
If you can get to the console, you can ask various status questions that will give you the information you are looking for. There's a command that goes something like "SHOW processlist".
I have a particular PHP page that, for various reasons, needs to save ~200 fields to a database. These are 200 separate insert and/or update statements. Now the obvious thing to do is reduce this number but, like I said, for reasons I won't bother going into I can't do this.
I wasn't expecting this problem. Selects seem reasonably performant in MySQL but inserts/updates aren't (it takes about 15-20 seconds to do this update, which is naturally unacceptable). I've written Java/Oracle systems that can happily do thousands of inserts/updates in the same time (in both cases running local databases; MySQL 5 vs OracleXE).
Now in something like Java or .Net I could quite easily do one of the following:
1. Write the data to an in-memory write-behind cache (i.e. it would know how to persist to the database and could do so asynchronously);
2. Write the data to an in-memory cache and use the PaaS (Persistence as a Service) model, i.e. a listener to the cache would persist the fields; or
3. Simply start a background process that could persist the data.
The minimal solution is to have a cache that I can simply update, which will separately go and update the database in its own time (i.e. it will return immediately after updating the in-memory cache). This can either be a global cache or a session cache (although a global shared cache does appeal in other ways).
Any other solutions to this kind of problem?
mysql_query('INSERT INTO tableName VALUES(...),(...),(...),(...)')
The query statement given above is better, but there is another way to improve the performance of the insert.
Follow these steps:
1. Create a CSV (comma-separated delimited file) or simple txt file and write all the data that you want to insert using a file-writing mechanism (like the FileOutputStream class in Java).
2. Use this command:
LOAD DATA INFILE 'data.txt' INTO TABLE table2
FIELDS TERMINATED BY '\t';
3. If you are not clear about this command, follow the link.
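A rough sketch of steps 1 and 2 from PHP; the file path and $rows variable are made up, and LOCAL requires the local_infile option to be enabled:
$fh = fopen('/tmp/data.txt', 'w');
foreach ($rows as $row) {                     // $rows: the data you want to insert
    fwrite($fh, implode("\t", $row) . "\n");  // one tab-delimited line per row
}
fclose($fh);

mysql_query("LOAD DATA LOCAL INFILE '/tmp/data.txt'
             INTO TABLE table2
             FIELDS TERMINATED BY '\\t'");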
You should be able to do 200 inserts relatively quickly, but it will depend on lots of factors. If you are using a transactional engine and doing each one in its own transaction, don't - that creates way too much I/O.
If you are using a non-transactional engine, it's a bit trickier. Using a single multi-row insert is likely to be better as the flushing policy of MySQL means that it won't need to flush its changes after each row.
You really want to be able to reproduce this on your production-spec development box and analyse exactly why it's happening. It should not be difficult to stop.
Of course, another possibility is that your inserts are slow because of extreme sized tables or large numbers of indexes - in which case you should scale your database server appropriately. Inserting lots of rows into a table whose indexes don't fit into RAM (or doesn't have RAM correctly configured to be used for caching those indexes) generally gets pretty smelly.
BUT don't go looking for a way of complicating your application when there is a way of easily tuning it instead, keeping the current algorithm.
One more solution you could use (instead of tuning MySQL :) ) is a JMS server and a STOMP connection driver for PHP to write data to the database server in an asynchronous manner. ActiveMQ has built-in support for the STOMP protocol, and there is the StompConnect project, which is a STOMP proxy for any JMS-compliant server (OpenMQ, JBossMQ, etc.).
You can update your local cache (hopefully memcached) and then push the write requests through beanstalkd.
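A rough sketch of that combination, using the Memcached extension and the Pheanstalk client for beanstalkd (class and constructor details vary by library version; the tube name and payload shape are made up). A separate worker process reserves jobs from the tube and performs the actual INSERT/UPDATE statements, so the web request returns as soon as the cache and queue writes are done:
$cache = new Memcached();
$cache->addServer('localhost', 11211);
$cache->set('user_42_fields', $fields);            // page reads now hit the cache

$queue = new Pheanstalk\Pheanstalk('127.0.0.1');   // constructor varies by Pheanstalk version
$queue->useTube('db-writes');                      // a worker drains this tube
$queue->put(json_encode(array(
    'user_id' => 42,
    'fields'  => $fields,
)));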
I would suspect a problem with your SQL inserts - it really shouldn't take that long. Would prepared queries help? Does your MySQL server need some more memory dedicated to the keyspace? I think some more questions need to be asked.
How are you doing the inserts? Are you doing one insert per record:
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
or are you using a single query
mysql_query('INSERT INTO tableName VALUES(...),(...),(...),(...)');
The latter of the two options is substantially faster, and from experience the first option will take much longer, as PHP must wait for each query to finish before moving on to the next.
Look at the statistics for your database while you do the inserts. I'm guessing that one of your updates locks the table, therefore all your statements are queued up and you experience this delay. Another thing to look into is your index creation/updating, because the more indexes you have on a table, the slower all UPDATE and INSERT statements get.
Another thing is that I think you're using MyISAM (table engine), which locks the entire table on UPDATE. I suggest you use InnoDB instead. InnoDB is slower on SELECT queries, but faster on INSERT and UPDATE because it only locks the row it's working on and not the entire table.
consider this:
mysql_query('start transaction');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('INSERT INTO tableName VALUES(...)');
mysql_query('commit');
Note that if your table is INSERT-ONLY (no deletes, and no updates on variable-length columns), then inserts will not lock or block reads when using MyISAM.
This may or may not improve insert performance, but it could help if you are having concurrent insert/read issues.
I'm using this, and only purging old records daily, followed by 'optimize table'.
You can use cURL with PHP to do asynchronous database manipulations.
One possible solution is to fork each query into a separate thread, but PHP does not support threads. We can use the PCNTL functions, but they're a bit tricky to use. I prefer this other solution for creating a fork and performing asynchronous operations.
Refer to this:
http://gonzalo123.wordpress.com/2010/10/11/speed-up-php-scripts-with-asynchronous-database-queries/
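One rough way to do the cURL trick (not necessarily what the linked article does) is a fire-and-forget request to a separate worker script that performs the slow inserts; the URL and job id are made up, and the worker should call ignore_user_abort(true) so it keeps running after the caller disconnects:
$ch = curl_init('http://localhost/worker/do_inserts.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array('job' => 123)));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT_MS, 100);   // stop waiting almost immediately
curl_exec($ch);                              // the worker carries on server-side
curl_close($ch);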