I am doing a booking site using PHP and MySql where i will get lots of data for insertion for a single insertion. Means if i get 1000 booking at a time i will be very slow. So what i am thinking to dump those data in MongoDb and run task to save in MySql. Also i am thing to use Redis for caching most viewed data.
Right now i am directly inserting in db.
Please suggest any one has any idea/suggestion about it.
In pure insert terms, it's REALLY hard to outrun MySQL... It's one of the fastest pure-append engines out there (that flushes consistently to disk).
1000 rows is nothing in MySQL insert performance. If you are falling at all behind, reduce the number of secondary indexes.
Here's a pretty useful benchmark: https://www.percona.com/blog/2012/05/16/benchmarking-single-row-insert-performance-on-amazon-ec2/, showing 10,000-25,000 inserts individual inserts per second.
Here is another comparing MySQL and MongoDB: DB with best inserts/sec performance?
Related
My question involves what the most efficient way to store an entire JSON document in a database table and refresh it periodically is.
Essentially, I'm calling the Google Analytics API once every 15 minutes via a cron job to pull out data about my site. I'm dumping this information into a sql table so that my front end application can search, sort and consume it. This JSON is paginated, such that only 5,000 rows come through at a time. I'll be storing as many as 100,000.
What I'm trying to do is optimize the way I rebuild the table. The most naive approach would be to truncate the table and insert every row from the JSON fresh. I have the feeling this is a bad approach, but maybe I'm underestimating sql.
I could also update each existing row and add new rows as necessary. However, I'm struggling with how I should delete old rows that might not be in the freshest JSON object.
Or perhaps I'm missing a more obvious solution.
The real answer to this question is it depends on what works the best. As I am not familiar with the data I cant give you a straight forward answer but some guidelines.
Firstly 100 000 rows is nothing for a SQL server to handle. So truncating the table and inserting the values fresh might actually be workable, however if this data was to grow substantially this might not be a solution that scales well. The main disadvantage to this approach is that for a period of time the table will be empty and this might be a problem for some users.
Summary of this approach:
Easy and quick to code and maintain.
Truncate will always be fast but insert will slow down as volumes increases.
Data will be offline during truncate and insert cycle.
Inserting and updating as we go along is known as Upserts/Merges. This approach will involve more work but the data will always be online. One of the difficulties you face is working with the JSON data and the SQL data(finding differences in the native JSON dataset compared to SQL table), this is going to be ineffective and cumbersome.
So I would create a staging table for the JSON. This table will be a exact copy of the final production table. I would then use LEFT and RIGHT JOINS to insert the new data and remove the deleted data. You could also create a hash for each row and compare these hashes to identify the rows that have changes and then update only were necessary. All these transformations can be handles in a simple SQL script. Yes you are underestimating SQL a bit...
Summary of this approach:
More complicated to code but not difficult to code simple joins and hash comparisons will do the trick.
Only insert new value, update changed values and delete old values. When scaling this solution it will eventually outperform the truncate, insert cycle.
Data remains online all the time.
If you need clarification around this please ask away.
I have a problem with a project I am currently working on, built in PHP & MySQL. The project itself is similar to an online bidding system. Users bid on a project, and they get a chance to win if they follow their bid by clicking and cliking again.
The problem is this: if 5 users for example, enter the game at the same time, I get a 8-10 seconds delay in the database - I update the database using the UNIX_TIMESTAMP(CURRENT_TIMESTAMP), which makes the whole system of the bids useless.
I want to mention too that the project is very database intensive (around 30-40 queries per page) and I was thinking maybe the queries get delayed, but I'm not sure if that's happening. If that's the case though, any suggestions how to avoid this type of problem?
Hope I've been at least clear with this issue. It's the first time it happened to me and I would appreciate your help!
You can decide on
Optimizing or minimizing required queries.
You can cache queries do not need to update on each visit.
You can use Summery tables
Update the queries only on changes.
You have to do this cleverly. You can follow this MySQLPerformanceBlog
I'm not clearly on what you're doing, but let me elaborate on what you said. If you're using UNIX_TIMESTAMP(CURRENT_TIMESTAMP()) in your MySQL query you have a serious problem.
The problem with your approach is that you are using MySQL functions to supply the timestamp record that will be stored in the database. This is an issue, because then you have to wait on MySQL to parse and execute your query before that timestamp is ever generated (and some MySQL engines like MyISAM use table-level locking). Other engines (like InnoDB) have slower writes due to row-level locking granularity. This means the time stored in the row will not necessarily reflect the time the request was generated to insert said row. Additionally, it can also mean that the time you're reading from the database is not necessarily the most current record (assuming you are updating records after they were inserted into the table).
What you need is for the PHP request that generates the SQL query to provide the TIMESTAMP directly in the SQL query. This means the timestamp reflects the time the request is received by PHP and not necessarily the time that the row is inserted/updated into the database.
You also have to be clear about which MySQL engine you're table is using. For example, engines like InnoDB use MVCC (Multi-Version Concurrency Control). This means while a row is being read it can be written to at the same time. If this happens the database engine uses something called a page table to store the existing value that will be read by the client while the new value is being updated. That way you have guaranteed row-level locking with faster and more stable reads, but potentially slower writes.
So I'm trying to import some sales data into my MySQL database. The data is originally in the form of a raw CSV file, which my PHP application needs to first process, then save the processed sales data to the database.
Initially I was doing individual INSERT queries, which I realized was incredibly inefficient (~6000 queries taking almost 2 minutes). I then generated a single large query and INSERTed the data all at once. That gave us a 3400% increase in efficiency, and reduced the query time to just over 3 seconds.
But as I understand it, LOAD DATA INFILE is supposed to be even quicker than any sort of INSERT query. So now I'm thinking about writing the processed data to a text file and using LOAD DATA INFILE to import it into the database. Is this the optimal way to insert large amounts of data to a database? Or am I going about this entirely the wrong way?
I know a few thousand rows of mostly numeric data isn't a lot in the grand scheme of things, but I'm trying to make this intranet application as quick/responsive as possible. And I also want to make sure that this process scales up in case we decide to license the program to other companies.
UPDATE:
So I did go ahead and test LOAD DATA INFILE out as suggested, thinking it might give me only marginal speed increases (since I was now writing the same data to disk twice), but I was surprised when it cut the query time from over 3300ms down to ~240ms. The page still takes about ~1500ms to execute total, but it's still noticeably better than before.
From here I guess I'll check to see if I have any superfluous indexes in the database, and, since all but two of my tables are InnoDB, I will look into optimizing the InnoDB buffer pool to optimize the overall performance.
LOAD DATA INFILE is very fast, and is the right way to import text files into MySQL. It is one of the recommended methods for speeding up the insertion of data -up to 20 times faster, according to this:
https://dev.mysql.com/doc/refman/8.0/en/insert-optimization.html
Assuming that writing the processed data back to a text file is faster than inserting it into the database, then this is a good way to go.
LOAD DATA or multiple inserts are going to be much better than single inserts; LOAD DATA saves you a tiny little bit you probably don't care about that much.
In any case, do quite a lot but not too much in one transaction - 10,000 rows per transaction generally feels about right (NB: this is not relevant to non-transactional engines). If your transactions are too small then it will spend all its time syncing the log to disc.
Most of the time doing a big insert is going to come from building indexes, which is an expensive and memory-intensive operation.
If you need performance,
Have as few indexes as possible
Make sure the table and all its indexes fit in your innodb buffer pool (Assuming innodb here)
Just add more ram until your table fits in memory, unless that becomes prohibitively expensive (64G is not too expensive nowadays)
If you must use MyISAM, there are a few dirty tricks there to make it better which I won't discuss further.
Guys, i had the same question, my needs might have been a little more specific than general, but i have written a post about my findings here.
http://www.mediabandit.co.uk/blog/215_mysql-bulk-insert-vs-load-data
For my needs load data was fast, but the need to save to a flat file on the fly meant the average load times took longer than a bulk insert. Moreover i wasnt required to do more than say 200 queries, where before i was doing this one at a time, i'm now bulking them up, the time savings are in the region of seconds.
Anyway, hopefully this will help you?
You should be fine with your approach. I'm not sure how much faster LOAD DATA INFILE is compared to bulk INSERT, but I've heard the same thing, that it's supposed to be faster.
Of course, you'll want to do some benchmarks to be sure, but I'd say it's worth writing some test code.
Using PHP (1900 secs time limit and more than 1GB memory limit) and MySQL (using PEAR::MDB2) on this one...
I am trying to create a search engine that will load data from site feeds in a mysql database. Some sites have rather big feeds with lots of data in them (for example more than 80.000 records in just one file). Some data checking for each of the records is done prior to inserting the record in the database (data checking that might also insert or update a mysql table).
My problem is as many of you might have already understood...time! For each record in the feed there are more than 20 checks and for a feed with eg: 10.000 records there might be >50.000 inserts to the database.
I tried to do this with 2 ways:
Read the feed and store the data in an array and then loop through the array and do the data checking and inserts. (This proves to be the fastest of all)
Read the feed and do the data checking line by line and insert.
The database uses indexes on each field that is constantly queried. The PHP code is tweaked with no extra variables and the SQL queries are simple select, update and insert statements.
Setting time limits higher and memory is nor a problem. The problem is that I want this operation to be faster.
So my question is:
How can i make the process of importing the feed's data faster? Are there any other tips that I might not be aware of?
Using LOAD DATA INFILE is often many times faster than using INSERT to do a bulk load.
Even if you have to do your checks in PHP code, dump it to a CSV file and then use LOAD DATA INFILE, this can be a big win.
If your import is a one time thing, and you use a fulltext index, a simple tweak to speed the import up is to remove the index, import all your data and add the fulltext index once the import is done. This is much faster, according to the docs :
For large data sets, it is much faster
to load your data into a table that
has no FULLTEXT index and then create
the index after that, than to load
data into a table that has an existing
FULLTEXT index.
You might take a look at php's PDO extension, and it's support for preapeared statements. You could also consider to use stored procedures in mysql.
2) You could take a look at other database systems, as CouchDB and others, and sacrifice consistency for performance.
I managed to double the inserted data with INSERT DELAYED command in 1800 sec. The 'LOAD DATA INFILE' suggestion was not the case since the data should be strongly validated and it would messup my code.
Thanks for all your answers and suggestions :)
Is it possible to do a simple count(*) query in a PHP script while another PHP script is doing insert...select... query?
The situation is that I need to create a table with ~1M or more rows from another table, and while inserting, I do not want the user feel the page is freezing, so I am trying to keep update the counting, but by using a select count(\*) from table when background in inserting, I got only 0 until the insert is completed.
So is there any way to ask MySQL returns partial result first? Or is there a fast way to do a series of insert with data fetched from a previous select query while having about the same performance as insert...select... query?
The environment is php4.3 and MySQL4.1.
Without reducing performance? Not likely. With a little performance loss, maybe...
But why are you regularily creating tables and inserting millions of row? If you do this only very seldom, can't you just warn the admin (presumably the only one allowed to do such a thing) that this takes a long time. If you're doing this all the time, are you really sure you're not doing it wrong?
I agree with Stein's comment that this is a red flag if you're copying 1 million rows at a time during a PHP request.
I believe that in a majority of cases where people are trying to micro-optimize SQL, they could get much greater performance and throughput by approaching the problem in a different way. SQL shouldn't be your bottleneck.
If you're doing a single INSERT...SELECT, then no, you won't be able to get intermediate results. In fact this would be a Bad Thing, as users should never see a database in an intermediate state showing only a partial result of a statement or transaction. For more information, read up on ACID compliance.
That said, the MyISAM engine may play fast and loose with this. I'm pretty sure I've seen MyISAM commit some but not all of the rows from an INSERT...SELECT when I've aborted it part of the way through. You haven't said which engine your table is using, though.
The other users can't see the insertion until it's committed. That's normally a good thing, since it makes sure they can't see half-done data. However, if you want them to see intermediate data, you could throw in an occassional call to "commit" while you're inserting.
By the way - don't let anybody tell you to turn autocommit on. That a HUGE time waster. I have a "delete and re-insert" job on my database that takes 1/3rd as long when I turn off autocommit.
Just to be clear, MySQL 4 isn't configured by default to use transactions. It uses the MyISAM table type which locks the entire table for each insert, if I remember correctly.
Your best bet would be to use one of the MySQL bulk insertion functions, such as LOAD DATA INFILE, as these are dramatically faster at inserting large amounts of data. As for the counting, well, you could break the inserts into N groups of 1000 (or Y) then divide your progress meter into N sections and just update it on each group's request.
Edit: Another thing to consider is, if this is static data for a template, then you could use a "select into" to create a new table with the same data. Not sure what your application is, or the intended functionality, but that could work as well.
If you can get to the console, you can ask various status questions that will give you the information you are looking for. There's a command that goes something like "SHOW processlist".