Was curious to know which is faster: if I have an array of 25,000 key-value pairs and a MySQL database of identical information, which would be faster to search through?
Thanks a lot, everyone!
The best way to answer this question is to perform a benchmark.
Although you should just try it out yourself, I'm going to assume that there's a proper index and conclude that the DB can do it faster than PHP, since searching is exactly what databases are built for.
However, it might come down to network latency, the relative cost of parsing SQL vs PHP, or DB load and memory usage.
My first thought would be that searching arrays is faster, but on the other hand it really depends on several factors:
How your database table is designed (does it use indexes properly, etc.)
How your query is built
Databases are generally pretty optimized for such searches.
What type of search are you doing on the array? There are several types of search you could do. The slowest is a straight linear search, where you go through each row and check for a value; a faster approach is a binary search (sketched below).
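To make the difference concrete, here's a minimal sketch (the data is made up; a 25k-element array stands in for yours):

    <?php
    // Straight (linear) search: checks elements one by one, O(n).
    function linearSearch(array $sorted, $needle) {
        foreach ($sorted as $value) {
            if ($value === $needle) {
                return true;
            }
        }
        return false;
    }

    // Binary search: O(log n), but the array must be sorted first.
    function binarySearch(array $sorted, $needle) {
        $low = 0;
        $high = count($sorted) - 1;
        while ($low <= $high) {
            $mid = (int) (($low + $high) / 2);
            if ($sorted[$mid] === $needle) {
                return true;
            } elseif ($sorted[$mid] < $needle) {
                $low = $mid + 1;
            } else {
                $high = $mid - 1;
            }
        }
        return false;
    }

    $data = range(1, 25000); // stand-in for your 25k values
    var_dump(binarySearch($data, 17042)); // ~15 comparisons instead of up to 25,000

And if the array is keyed by the thing you search on, isset($array[$key]) is a hash lookup and beats both.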
I presumed that you are comparing a SELECT statement executed directly on a database against an array search in PHP.
One thing to keep in mind: If your search is CPU intensive on the database it might be worth doing it in PHP even if it's not as fast. It's usually easier to add web servers than database servers when scaling.
Test it - profiling something as simple as this should be trivial.
Also, remember that databases are designed to handle exactly this sort of task, so they'll naturally be good at it. Even a naive binary search needs only about 15 compares for this, so 25k elements isn't a lot. The real problem is sorting, but that has been studied to death over the past 60+ years.
It depends. In MySQL you can use indexing, which will increase speed, but with PHP you don't need to send information over the network (if the MySQL database is on another server).
MySQL is built to efficiently sort and search through large amounts of data, such as this. This is especially true when you are searching for a key, since MySQL indexes the primary keys in a table.
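For instance (table and column names invented for illustration), a primary-key lookup is an index seek rather than a scan of all 25,000 rows:

    <?php
    // Assumed schema: CREATE TABLE lookup (k VARCHAR(64) PRIMARY KEY, v TEXT);
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

    // MySQL walks the primary-key index (a B-tree) straight to the row.
    $stmt = $pdo->prepare('SELECT v FROM lookup WHERE k = ?');
    $stmt->execute(array('some-key'));
    $value = $stmt->fetchColumn(); // false if the key doesn't exist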
So I've been developing an application that needs to compare a lot of data. The max (I'm assuming) at once will be about 28800 rows that need to be compared against another table separately.
So what I'm doing right now is making 28800 ajax calls to a database and it works, but it's obviously not the fastest procedure in the world.
Would it be faster to iterate over an array of about say, 80,000 elements 28800 times and just making 2 db calls?
I'm assuming it is not, but just figured I'd ask to be sure.
For a little background knowledge, check out an earlier question I posted:
How to avoid out of memory error in a browser due to too many ajax calls
Thank you!
Unless there is some very specific reason you are making that many calls, it sounds like there must be an easier way to retrieve all this data, have the database do some of the work, or rethink what is happening.
Can you write a database script that is passed some params and does the comparison?
Chances are the database can work faster than looping over all that data 28,800 times, as the database uses better methods for comparing and retrieving data. You are, in a way, comparing brute-force looping to a database engine.
Can you write a query that does the comparison?
It sounds like you are pulling the data and then doing more work on it; can you pull the data you need with a single query?
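As a rough sketch of what that could look like (the schema is invented; adjust to your tables), one JOIN replaces all 28,800 round trips:

    <?php
    // Hypothetical: table_a holds your 28,800 rows, table_b the ~80,000 to compare against.
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

    // The database does the comparison in one pass; PHP gets one result set.
    $sql = 'SELECT a.id, a.match_key, b.value
            FROM table_a AS a
            JOIN table_b AS b ON b.match_key = a.match_key';
    foreach ($pdo->query($sql) as $row) {
        // one round trip instead of 28,800 ajax calls
    }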
The general rule that I have found is that MySQL is massively fast at simple queries, especially when indexed, and abominably slow at complex ones involving multiple joins or nested subqueries.
This doesn't actually answer your question though, because it's too vague. MySQL retrieval speed depends critically on what indexes are applied, and on the complexity of the query.
So there are no general rules. You must try both and see which is better.
Likewise, JavaScript performance varies wildly between browsers.
A final comment: some things that are simple to express in procedural code, like checking the result of one query to decide the shape of the next, are massively easier to write than the optimal SQL needed to combine them into a single query. And even then, you may find that the MySQL server is building massive temporary tables to execute it.
My question really revolves around the repetitive use of a large amount of data.
I have about 50 MB of data that I need to cross-reference repeatedly during a single PHP page execution. This task is most easily solved by using SQL queries with table joins. The problem is the sheer volume of data I need to process in a very short amount of time, and the number of queries required to do it.
What I am currently doing is dumping the relevant part of each table (usually in excess of 30% or 10k rows) into an array and looping. The table joins are always on a single field, so I built a really basic 'index' of sorts to identify which rows are relevant.
The system works. It's been in my production environment for over a year, but now I'm trying to squeeze even more performance out of it. On one particular page I'm profiling, the second-highest total time is attributed to the increment line that loops through these arrays. Its hit count is 1.3 million, for a total execution time of 30 seconds. This represents the work that would have been performed by about 8,200 SQL queries to achieve the same result.
What I'm looking for is anyone else who has run into a situation like this. I really can't believe I'm anywhere near the first person to have large amounts of data that need to be processed in PHP.
Thanks!
Thank you very much to everyone who offered advice here. It looks like there isn't really a silver bullet like I was hoping. I think what I'm going to end up doing is using a mix of MySQL memory tables and some version of a paged memcache.
This solution depends closely on what you are doing with the data, but I found that using unique-value columns as array keys speeds things up a lot when you are trying to look up a row given a certain value in a column.
This is because PHP uses a hash table to store array keys for fast lookups. It's hundreds of times faster than iterating over the array or using array_search.
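For example (column names made up), re-key the rows once by the column you join on, and every lookup becomes a hash-table hit:

    <?php
    // $rows is what you'd pull from the database: a list of associative arrays.
    $rows = array(
        array('user_id' => 7,  'name' => 'alice'),
        array('user_id' => 42, 'name' => 'bob'),
    );

    // Build the "index" once: O(n).
    $byUserId = array();
    foreach ($rows as $row) {
        $byUserId[$row['user_id']] = $row;
    }

    // Each lookup is now a hash-table access, not a scan of 10k rows.
    if (isset($byUserId[42])) {
        $match = $byUserId[42];
    }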
But without seeing your code it's hard to say more.
Added from comment:
The next step is to use some in-memory database. You can use MEMORY tables in MySQL, or SQLite. It also depends on how much of your runtime environment you control, because those methods would need more memory than a shared hosting provider would usually allow. It would probably also simplify your code, thanks to grouping, sorting, aggregate functions, etc.
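A minimal sketch of the MEMORY-table variant (table and column names are assumptions):

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

    // Held entirely in RAM; TEMPORARY means it vanishes with the connection.
    $pdo->exec('CREATE TEMPORARY TABLE scratch (
                    k INT NOT NULL PRIMARY KEY,
                    v VARCHAR(255)
                ) ENGINE=MEMORY');

    // Load just the relevant slice, then join/sort/aggregate against it freely.
    $pdo->exec('INSERT INTO scratch SELECT id, name FROM big_table WHERE relevant = 1');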
Well, I'm looking at a similar situation in which I have a large amount of data to process, and a choice between doing as much as possible via MySQL queries or off-loading it to PHP.
So far, my experience has been this:
PHP is a lot slower than using MySQL queries.
MySQL query speed is only acceptable if I cram the logic into a single call, as the latency between calls is severe.
I'm particularly shocked by how slow PHP is for looping over an even modest amount of data. I keep thinking/hoping I'm doing something wrong...
I am writing a fairly simple webapp that pulls data from 3 tables in a MySQL database. Because I don't need a ton of advanced filtering, it seems theoretically faster to construct and then work within large multi-dimensional arrays instead of doing a MySQL query whenever possible.
In theory I could just have one query from each table and build large arrays with the results, essentially never needing to query that table again. Is this a good practice, or is it better to just query for the data when it's needed? Or is there some kind of balance, and if so, what is it?
PHP arrays can be very fast, but it depends on how big those tables are. When the numbers get huge, MySQL is going to be faster because, with the right indexes, it won't have to scan all the data, just pick the rows you need.
I don't recommend trying what you're suggesting. MySQL has a query cache, so repeated queries won't even hit the disk; in a way, the optimization you're thinking about is already done.
Finally, as Chris said, never think about optimizations when they are not needed.
About good practices, a good practice is writing the simplest (and easy to read) code that does the job.
If in the end you decide to apply an optimization, profile the performance; you might be surprised by the results.
it depends ...
Try each solution with the microtime() function and you'll see the results.
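Something like this, a minimal harness with a toy stand-in for the array version; time your MySQL version the same way:

    <?php
    // Toy candidate to time; swap in your real implementation.
    function searchWithArray(array $haystack, $needle) {
        return in_array($needle, $haystack);
    }

    $haystack = range(1, 25000);
    $needle = 17042;

    $start = microtime(true);
    for ($i = 0; $i < 1000; $i++) {
        searchWithArray($haystack, $needle);
    }
    printf("array: %.4f s\n", microtime(true) - $start);

    // Then wrap your MySQL query in the same kind of loop and compare.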
I think the MySQL query cache can be a good solution, and if you're filtering, you can create a view.
If you can pull it off with a single query - go for it! In your case, I'd say that is a good practice. You might also consider having your data in a CSV or similar file, which would give you even better performance.
I absolutely concur with chris on optimizations: the LAMP stack is a good solution for 99% of web apps, without any need for optimization. ONLY optimize if you really run into a performance problem.
One more thought for your mental model of PHP + databases: you did not take into account that reading a lot of data from the database into PHP also takes time.
How would you temporarily store several thousand key => value or key => array pairs within a single process? Lookups on key will be done continuously within the process, and the data is discarded when the process ends.
Should I use arrays? Temporary MySQL tables? Or something in between?
It depends on what 'several thousand' means and how big the array gets in memory. If you can handle it in PHP, you should do it, because using MySQL creates a little overhead here.
But if you are on a shared host, or have a limited memory_limit in php.ini and can't increase it, you can use a temporary table in MySQL.
You can also use a simple and fast key-value store like Memcached or Redis; they can work in memory only and have really fast key lookups (Redis promises O(1) time complexity).
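A minimal sketch with the pecl memcached extension (server address and key names are assumptions):

    <?php
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    // Store the pairs once; 3600 = keep them for an hour.
    $mc->set('user:42', array('name' => 'bob'), 3600);

    // Each lookup is a single in-memory hash hit on the cache server.
    $value = $mc->get('user:42');
    if ($value === false && $mc->getResultCode() === Memcached::RES_NOTFOUND) {
        // key not present
    }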
Several thousand?! You mean it could take up several KILObytes?!
Are you sure this is going to be an issue? Before optimizing, write the code in the simplest, most straightforward way, and check later what really needs optimization. Also, only with a benchmark and the full code will you be able to decide on the proper way of caching. Everything else is a waste of time and the root of all evil...
Memcached is a popular way of caching data.
If you're only running that one process and don't need to worry about concurrent access, I would do it inside php. If you have multiple processes I would use some established solution so you don't have to worry about the details.
It all depends on your application and your hardware. My bet is to let databases (especially MySQL) do just database work, that is, not much more than storing and retrieving data. Other DBMSs may be really efficient (Informix, for example) but sadly, MySQL is not.
Temporary tables may be more efficient than PHP arrays, but you increase the number of connections to the DB.
Scalability is an issue too. Doing it in PHP is better in that way.
It is kind of difficult to give a straight answer if we don't get the complete picture.
It depends on where your source data is.
If your data is in the database, you'd better keep it there, manipulate it there, and just fetch the items you need. Use temp tables if necessary.
If your data is already in PHP, you're probably better off keeping it there, although handling data in PHP is quite resource-intensive.
If the data lookup will be done with only a few queries, do it with a MySQL temporary table.
If there will be many data lookups, it's almost always best to store the data on the PHP side (connection overhead).
I've been doing a lot of calculation-heavy work lately. Usually I prefer to do these calculations in PHP rather than MySQL, though I know PHP is not good at this; I thought MySQL might be worse. But I've found a performance problem: some pages load so slowly that the 30-second time limit is not enough for them! So I wonder which is the better place to do the calculations, and whether there are any principles for deciding. Suggestions would be appreciated.
Anything that can be done in the RDBMS (GROUP BY, SUM, AVG), where the data can be filtered on the server side, should be done in the RDBMS.
If the calculation is better suited to PHP then fine, go with that, but otherwise don't try to do in PHP what an RDBMS was made for. YOU WILL LOSE.
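To illustrate (the orders table is invented), compare shipping every row to PHP against letting MySQL aggregate:

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

    // Slow: pull every row over the wire, then sum in PHP.
    $total = 0;
    foreach ($pdo->query('SELECT amount FROM orders') as $row) {
        $total += $row['amount'];
    }

    // Fast: group, sum, and average on the server; transfer a handful of rows.
    $stmt = $pdo->query('SELECT customer_id, SUM(amount) AS total, AVG(amount) AS average
                         FROM orders GROUP BY customer_id');
    $summary = $stmt->fetchAll(PDO::FETCH_ASSOC);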
I would recommend doing any row level calculations using the RDBMS.
Not only are you going to benefit from better performance, but it also makes your application more portable if you need to switch to another scripting language, say from PHP to Python, because you've already sorted, filtered, and processed the data using your RDBMS.
It also helps separate your application logic; it has helped me keep my controllers cleaner and neater when working in an MVC environment.
I would say do calculations in languages that were created for that, like C++. But if you're choosing between MySQL and PHP, PHP is better.
Just keep track of where your bottlenecks are. If your table gets locked up because you're trying to run some calculations, everyone else is queued up waiting to read/write the data in those tables, and the queue will continue to grow.
MySQL is typically faster at processing your commands, but PHP should be able to handle simple problems without too much of a fuss. Of course, that does not mean you should be pinging your database multiple times for the same calculation over and over.
You might be better off caching your results if you can, and having a cron job update them once a day/hour (please don't do it every minute, your hosting provider will probably hate you).
Do as much filtering and merging as possible to bring the minimum amount of data into PHP. Once you have that minimum data set, it depends on what you are doing, server load, and perhaps other factors.
If you can do something equally well in either, and the SQL is not overly complex to write (and maintain), then do it in SQL. For simple math, SQL is usually a good bet. For string manipulations where the strings will end up about the same length or grow, PHP is probably a good bet.
The most important thing is to request as little data as possible; manipulating the data, at least with what SQL can do, is secondary to retrieving and transferring it.
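A small illustration of that split (schema invented): simple math and filtering go in the query, string formatting stays in PHP:

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

    // Simple math and filtering in SQL: only the rows you need come back.
    $stmt = $pdo->query('SELECT name, price * quantity AS line_total
                         FROM order_items WHERE price * quantity > 100');

    // String manipulation in PHP, once the data set is already small.
    foreach ($stmt as $row) {
        printf("%-20s %8.2f\n", strtoupper($row['name']), $row['line_total']);
    }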
Native MySQL functions are very quick. So do what makes sense in your queries.
If you have multiple servers (i.e., a web server and a DB server), note that DB servers are much more expensive than web servers. So if you have a lot of traffic or a very busy DB server, don't do the 'extras' that can be handled just as easily/efficiently on a web server machine; this helps prevent slowdowns.
cmptrgeekken is right, we would need some more information. BUT if you need to do calculations that pertain to database queries, or operations such as comparing certain fields from the database, make the database do it. Doing special queries in SQL is cheaper (as far as time is concerned, and it is optimized for that). Since PHP and MySQL are both server-side, it won't really matter otherwise where you do the calculations. But like I said, if they are operations on database information, write a more complicated SQL query and use that.
Use PHP; don't lag up your MySQL server with endless calculations. If you're talking about things like sorting, it's OK to use MySQL for that, and for SUM and AVG, but don't overdo it.