I have to search through a huge amount of data from the database in PHP code. I don't want to make many database hits, so I selected all the data to be searched and tried to store it in an array to do further searching on the array instead of the database, but the problem is that the data exceeds the limit of what the array can hold.
What to do?
Don't do that.
Databases are designed specifically to handle large amounts of data. Arrays are not.
Your best bet would be to properly index your db, and then write your optimized query that will get the data you need from the database. You can use PHP to construct the query. You can get almost anything from a db through a good query, no need for PHP array processing.
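For illustration only, a minimal sketch using PDO with a prepared statement (the connection details, table and column names are placeholders, not your actual schema):

$searchTerm = 'foo'; // whatever the user searched for (placeholder)

// hypothetical connection, purely for illustration
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// let the database do the filtering instead of pulling everything into PHP
$stmt = $pdo->prepare('SELECT id, name FROM products WHERE name LIKE :term LIMIT 100');
$stmt->execute(array(':term' => $searchTerm . '%')); // prefix match, so an index on name can be used
$rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

With an index the WHERE clause can use, the database hands back only the rows you care about.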
If you gave a specific example, we could help you construct that SQL query.
Databases are there to filter the data for you. Use the most accurate query you can, and only filter in code if it's too hard (or impossible) to do in SQL.
A full table selection can be much more expensive (especially for I/O on the db server, and it can have dire effects on the server's cache) than a correctly indexed select with the appropriate where clause(s).
There is communication overhead involved when obtaining records from a database to PHP, so not only is it a good idea to reduce the number of calls from PHP to the database, but it is also ideal to minimize the number of elements returned by the database and processed in your PHP code. You should structure your query (depending on the type of database) to return just the entries you need or as few entries as possible for whatever you need to do. There are a lot of databases that support fairly complex operations directly within the database query, and typically the database will do it way faster than PHP.
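As a rough example of letting the database do the work (the connection, table and column names below are assumptions), a single aggregate query replaces fetching every row and counting in PHP:

// hypothetical connection and an orders(customer_id, ...) table
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// one GROUP BY query instead of pulling every order row into a PHP loop
$stmt = $pdo->query('SELECT customer_id, COUNT(*) AS order_count FROM orders GROUP BY customer_id');
$counts = $stmt->fetchAll(PDO::FETCH_KEY_PAIR); // customer_id => order_count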
Two simple steps:
Increase the amount of memory php can use via the memory_limit setting
Install more RAM
Seriously, you'll be better off optimizing your database in a way that you can quickly pull the data you need to work on.
If you are actually running into problems, then run a query analyzer to see which queries are taking too much time. Fix them. Repeat the process.
You do not need to store your data in an array; it makes no sense. Structure your query according to your purpose and then fetch the data with PHP.
In case you do need to increase your memory limit, you can change memory_limit in php.ini (or update .htaccess with the desired memory limit: php_value memory_limit 1024M).
Last but not least - use pagination rather than loading all the data at once.
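A rough sketch of that pagination idea (placeholder connection and table name):

// placeholder connection; the page number comes from the request
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$perPage = 50;
$page    = isset($_GET['page']) ? max(1, (int) $_GET['page']) : 1;
$offset  = ($page - 1) * $perPage;

// fetch only one page of rows instead of the whole table
$sql  = sprintf('SELECT id, title FROM articles ORDER BY id LIMIT %d OFFSET %d', $perPage, $offset);
$rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);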
Related
Simplified scenario:
I have a table with about 100,000 rows.
I will need to pick about 300-400 rows, based on certain criteria, to display them on a web page.
Considering the above scenario, which one of the below approaches will you recommend?
Approach 1: Use just one database query to select the entire table into one big array of 100,000 rows. Using loops, pick the required 300-400 rows from the array and pass them on to the front-end. Minimum load on the database server, as it's just one query. Puts more load on PHP, as it has to store and search through an array of 100,000 rows.
Approach 2: Using a loop, PHP will generate a new query for each row of required data. Collecting all the data will require 300-400 independent queries. More load on the database server. Compared to approach 1, less load on PHP.
Opinions / thoughts will be appreciated!
100,000 rows is a small amount for a MySQL RDBMS.
You would do better to fine-tune the db server.
So I recommend neither 1 nor 2.
Just:
SELECT * FROM `your_table` WHERE `any_field` = 'YOUR CRITERIA' LIMIT 300;
When your data grows beyond 1,000,000 rows you should think about serious index optimization, and maybe you'll have to create a stored procedure for a complicated select. I assure you it's not PHP's job in any case.
As your question is asked from a performance perspective: both of your approaches would consume some resources. I would still go for approach 1 in this case, as it doesn't query the database again and again, whereas generating a query for each row means 300-400 queries. When it comes to designing a large project, the database always ends up being the bottleneck.
To be honest, neither approach is good. It's good practice to have a good database design and query selection. What you are trying to achieve could be done with a suitable query.
Using PHP to loop through the data is really a bad idea; after all, a database is designed to perform queries. PHP would need to loop through all the records and doesn't use an index to speed things up; this is roughly equivalent to a 'table scan' in the database.
In order to get the most performance out of your database, it's important to have a good design and (for example) create indexes on the right columns.
Also, if you haven't decided yet what RDBMS you're going to use, depending on your usage, some databases have more advanced options that can assist in better performance (e.g. PostgreSQL has support for geographical information)
Please provide some actual data (what kind of data will be stored, what kind of fields) and samples of the kind of queries / filters that will need to be performed, so that people will be able to give you an actual answer rather than a hypothetical one.
I have a MySQL table with about 9.5K rows, these won't change much but I may slowly add to them.
I have a process where if someone scans a barcode I have to check if that barcode matches a value in this table. What would be the fastest way to accomplish this? I must mention there is no pattern to these values
Here Are Some Thoughts
Ajax call to a PHP file that queries the MySQL table (my thought is this would be the slowest)
Load this MySQL table into an array on log in. Then when scanning Ajax call to PHP file to check the array
Load this table into an array on log in. When viewing the scanning page somehow load that array into a JavaScript array and check with JavaScript. (this seems to me to be the fastest because it eliminates Ajax call and MySQL Query. Would it be efficient to split into smaller arrays so I don't lag the server & browser?)
Honestly, I'd never load the entire table for anything. All I'd do is make an AJAX request back to a PHP gateway that then queries the database, and returns the result (or nothing). It can be very fast (as it only depends on the latency) and you can cache that result heavily (via memcached, or something like it).
There's really no reason to ever load the entire array for "validation"...
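A minimal sketch of that kind of gateway (the connection details, table and column names are assumptions, not your actual schema):

// barcode_check.php - called via AJAX as barcode_check.php?code=...
$pdo  = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
$code = isset($_GET['code']) ? $_GET['code'] : '';

// indexed single-row lookup; nowhere near loading the whole table
$stmt = $pdo->prepare('SELECT 1 FROM barcodes WHERE barcode = ? LIMIT 1');
$stmt->execute(array($code));

header('Content-Type: application/json');
echo json_encode(array('found' => (bool) $stmt->fetchColumn()));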
Much faster to use a well-indexed MySQL table than to look through an array for something.
But in the end it all depends on what you really want to do with the data.
As you mention, your table contains around 9.5K rows. There is no point in loading the data on login or on the scanning page.
Better to index your table and make an AJAX call whenever required.
Best of Luck!!
While 9.5 K rows are not that much, the related amount of data would need some time to transfer.
Therefore - and in general - I'd propose to run validation of values on the server side. AJAX is the right technology to do this quite easily.
Loading all 9.5 K rows only to find one specific row, is definitely a waste of resources. Run a SELECT-query for the single value.
Exposing PHP-functionality at the client-side / AJAX
Have a look at the xajax project, which allows to expose whole PHP classes or single methods as AJAX method at the client side. Moreover, xajax helps during the exchange of parameters between client and server.
Indexing to be searched attributes
Please ensure, that the column, which holds the barcode value, is indexed. In case the verification process tends to be slow, look out for MySQL table scans.
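If it isn't indexed yet, something along these lines would do it (sketch only; the connection, table and column names are made up):

// placeholder connection
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// one-off: index the barcode column so lookups don't scan the table
$pdo->exec('ALTER TABLE barcodes ADD INDEX idx_barcode (barcode)');

// EXPLAIN shows whether the lookup actually uses the index or falls back to a table scan
$plan = $pdo->query("EXPLAIN SELECT 1 FROM barcodes WHERE barcode = '12345'")->fetch(PDO::FETCH_ASSOC);
print_r($plan);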
Avoiding table scans
To avoid table scans and keep your queries running fast, use fixed-size fields. E.g. VARCHAR (among other types) makes queries slower, since rows no longer have a fixed size. Tables without fixed-size rows effectively prevent the database from easily predicting the location of the next row of the result set. Therefore, use e.g. CHAR(20) instead of VARCHAR.
Finally: Security!
Don't forget that any data transferred to the client side may expose sensitive data. While your 9.5 K rows may not get rendered by the client's browser, the rows do exist in the generated HTML page. Using 'view source', any user would be able to figure out all the valid numbers.
Exposing valid barcode values may or may not be a security problem in your project context.
PS: While not related to your question, I'd propose using PHPExcel for reading or writing spreadsheet data. Unlike other solutions, e.g. a PEAR-based framework, PHPExcel has no external dependencies.
My question really revolves around the repetitive use of a large amount of data.
I have about 50mb of data that I need to cross-reference repetitively during a single PHP page execution. This task is most easily solved by using SQL queries with table joins. The problem is the sheer volume of data that I need to process in a very short amount of time and the number of queries required to do it.
What I am currently doing is dumping the relevant part of each table (usually in excess of 30% or 10k rows) into an array and looping. The table joins are always on a single field, so I built a really basic 'index' of sorts to identify which rows are relevant.
The system works. It's been in my production environment for over a year, but now I'm trying to squeeze even more performance out of it. On one particular page I'm profiling, the second highest total time is attributed to the increment line that loops through these arrays. Its hit count is 1.3 million, for a total execution time of 30 seconds. This represents the work that would have been performed by about 8,200 SQL queries to achieve the same result.
What I'm looking for is anyone else that has run into a situation like this. I really can't believe that I'm anywhere near the first person to have large amounts of data that need to be processed in PHP.
Thanks!
Thank you very much to everyone that offered some advice here. It looks like there isn't really a silver bullet here like I was hoping. I think what I'm going to end up doing is using a mix of MySQL memory tables and some version of a paged memcache.
This solution depends closely on what you are doing with the data, but I found that using unique-value columns as array keys accelerates things a lot when you are trying to look up a row given a certain value in a column.
This is because php uses a hash table to store the keys for fast lookups. It's hundreds of times faster than iterating over the array, or using array_search.
But without seeing a code example it's hard to say.
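Just to illustrate the idea with made-up names: key the array by the join column once, and every lookup after that is a hash probe instead of a scan:

// $rows: the rows you already dumped from the table (hypothetical column name)
$byId = array();
foreach ($rows as $row) {
    $byId[$row['customer_id']] = $row; // customer_id is assumed unique per row
}

// inside the big loop: a hash lookup instead of iterating or array_search()
if (isset($byId[$customerId])) {
    $match = $byId[$customerId];
}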
Added from comment:
The next step is to use an in-memory database. You can use MEMORY tables in MySQL, or SQLite. It also depends on how much of your running environment you control, because those methods need more memory than a shared hosting provider would usually allow. It would probably also simplify your code, thanks to grouping, sorting, aggregate functions, etc.
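For example, a throwaway in-memory table could look roughly like this (sketch only; SQLite via PDO here, a MySQL MEMORY table works along the same lines):

// in-memory SQLite database; it disappears when the script ends
$mem = new PDO('sqlite::memory:');
$mem->exec('CREATE TABLE lookup (id INTEGER PRIMARY KEY, value TEXT)');

$ins = $mem->prepare('INSERT INTO lookup (id, value) VALUES (?, ?)');
foreach ($data as $id => $value) { // $data: your hypothetical source rows
    $ins->execute(array($id, $value));
}

// indexed lookups, grouping, sorting etc. now happen inside SQLite
$stmt = $mem->prepare('SELECT value FROM lookup WHERE id = ?');
$stmt->execute(array(42));
$value = $stmt->fetchColumn();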
Well, I'm looking at a similar situation in which I have a large amount of data to process, and a choice to try to do as much via MySQL queries, or off-loading it to PHP.
So far, my experience has been this:
PHP is a lot slower than using MySQL queries.
MySQL query speed is only acceptable if I cram the logic into a single call, as the latency between calls is severe.
I'm particularly shocked by how slow PHP is for looping over an even modest amount of data. I keep thinking/hoping I'm doing something wrong...
I have a PHP/MySQL based web application that has internationalization support by way of a MySQL table called language_strings with the string_id, lang_id and lang_text fields.
I call the following function when I need to display a string in the selected language:
public function get_lang_string($string_id, $lang_id)
{
    $db = new Database();
    // IN (1, $lang_id) plus ORDER BY lang_id DESC LIMIT 1 returns the requested
    // language if a translation exists, otherwise the default language (lang_id 1)
    $sql = sprintf('SELECT lang_string FROM language_strings WHERE lang_id IN (1, %s) AND string_id=%s ORDER BY lang_id DESC LIMIT 1', $db->escape($lang_id, 'int'), $db->escape($string_id, 'int'));
    $row = $db->query_first($sql);
    return $row['lang_string'];
}
This works perfectly but I am concerned that there could be a lot of database queries going on. e.g. the main menu has 5 link texts, all of which call this function.
Would it be faster to load the entire language_strings table results for the selected lang_id into a PHP array and then call that from the function? Potentially that would be a huge array with much of it redundant but clearly it would be one database query per page load instead of lots.
Can anyone suggest another more efficient way of doing this?
There isn't a one-size-fits-all answer; you really have to look at it on a case-by-case basis. Having said that, the majority of the time it will be quicker to get all the data in one query, pop it into an array or object and refer to it from there.
The caveat is whether you can pull all your data that you need in one query as quickly as running the five individual ones. That is where the performance of the query itself comes into play.
Sometimes a query that contains a subquery or two will actually be less time efficient than running a few queries individually.
My suggestion is to test it out. Get a query together that gets all the data you need, see how long it takes to execute. Time each of the other five queries and see how long they take combined. If it is almost identical, stick the output into an array and that will be more efficient due to not having to make frequent connections to the database itself.
If however, your combined query takes longer to return data (it might cause a full table scan instead of using indexes for example) then stick to individual ones.
Lastly, if you are going to use the same data over and over - an array or object will win hands down every single time as accessing it will be much faster than getting it from a database.
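A crude way to time the two approaches (just a sketch; $db is your existing Database object, and $bigQuery / $fiveQueries stand in for whatever queries you actually have):

// rough timing: one combined query vs. the five separate ones
$start = microtime(true);
$db->query_first($bigQuery);
$combinedTime = microtime(true) - $start;

$start = microtime(true);
foreach ($fiveQueries as $sql) {
    $db->query_first($sql);
}
$separateTime = microtime(true) - $start;

printf("combined: %.4fs, five separate: %.4fs\n", $combinedTime, $separateTime);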
OK - I did some benchmarking and was surprised to find that putting things into an array rather than using individual queries was, on average, 10-15% SLOWER.
I think the reason for this was because, even if I filtered out the "uncommon" elements, inevitably there was always going to be unused elements as a matter of course.
With the individual queries I am only ever getting out what I need and as the queries are so simple I think I am best sticking with that method.
This works for me, of course in other situations where the individual queries are more complex, I think the method of storing common data in an array would turn out to be more efficient.
Agree with what everybody says here.. it's all about the numbers.
Some additional tips:
Try to create a single memory array which holds the minimum you require. This means removing most of the obvious redundancies.
There are standard approaches for these issues in performance critical environments, like using memcached with mysql. It's a bit overkill, but this basically lets you allocate some external memory and cache your queries there. Since you choose how much memory you want to allocate, you can plan it according to how much memory your system has.
Just play with the numbers. Try using separate queries (which is the simplest approach) and stress your PHP script (like calling it hundreds of times from the command-line). Measure how much time this takes and see how big the performance loss actually is.. Speaking from my personal experience, I usually cache everything in memory and then one day when the data gets too big, I run out of memory. Then I split everything to separate queries to save memory, and see that the performance impact wasn't that bad in the first place :)
I'm with Fluffeh on this: look into the other options at your disposal (joins, subqueries, making sure your indexes reflect the relativity of the data - but don't over-index, and test). Most likely you'll end up with an array at some point, so here's a little performance tip: contrary to what you might expect, stuff like
$all = $stmt->fetchAll(PDO::FETCH_ASSOC);
is less memory efficient compared to:
$all = array();//or $all = []; in php 5.4
while ($row = $stmt->fetch(PDO::FETCH_ASSOC))
{
    $all[] = $row['lang_string'];
}
What's more: you can check for redundant data while fetching the data.
My answer is to do something in between. Retrieve all strings for a lang_id that are shorter than a certain length (say, 100 characters). Shorter text strings are more likely to be used in multiple places than longer ones. Cache the entries in a static associative array in get_lang_string(). If an item isn't found, then retrieve it through a query.
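A rough sketch of that static cache, adapted from your function (the 100-character cut-off and the Database::query() method used for preloading are assumptions on my part):

public function get_lang_string($string_id, $lang_id)
{
    static $cache = array();

    // preload the short, frequently reused strings once per request
    if (!isset($cache[$lang_id])) {
        $db  = new Database();
        $sql = sprintf('SELECT string_id, lang_string FROM language_strings WHERE lang_id IN (1, %s) AND CHAR_LENGTH(lang_string) < 100 ORDER BY lang_id', $db->escape($lang_id, 'int'));
        $cache[$lang_id] = array();
        foreach ($db->query($sql) as $row) { // assumes query() returns all matching rows
            $cache[$lang_id][$row['string_id']] = $row['lang_string']; // requested language overwrites the default (lang_id 1)
        }
    }

    if (isset($cache[$lang_id][$string_id])) {
        return $cache[$lang_id][$string_id];
    }

    // anything longer or rarer falls back to the original single-row query
    $db  = new Database();
    $sql = sprintf('SELECT lang_string FROM language_strings WHERE lang_id IN (1, %s) AND string_id=%s ORDER BY lang_id DESC LIMIT 1', $db->escape($lang_id, 'int'), $db->escape($string_id, 'int'));
    $row = $db->query_first($sql);
    return $row['lang_string'];
}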
I am currently at the point in my site/application where I have had to put the brakes on and think very carefully about speed. I think these speed tests mentioned should consider the volume of traffic on your server as an important variable that will affect the results. If you are putting data into JavaScript data structures and processing it on the client machine, the processing time should be more consistent. If you are requesting lots of data through MySQL via PHP (for example), this puts demand on one machine/server rather than spreading it. As your traffic grows you have to share server resources with many users, and I am thinking that this is where getting JavaScript to do more is going to lighten the load on the server. You can also store data on the local machine via localStorage.setItem() / localStorage.getItem() (most browsers have about 5mb of space per domain). If you have data in the database that does not change that often, then you can store it on the client and just check at 'start-up' whether it's still in date/valid.
This is my first comment posted after having and using the account for a year, so I might need to fine-tune my rambling - just voicing what I'm thinking through at present.
How would you temporarily store several thousand key => value or key => array pairs within a single process? Lookups by key will be done continuously within the process, and the data is discarded when the process ends.
Should I use arrays? Temporary MySQL tables? Or something in between?
It depends on what "several thousand" means and how big the array gets in memory. If you can handle it in PHP, you should do it, because using MySQL creates a little overhead here.
But if you are on a shared host, or you have a limited memory_limit in php.ini and can't increase it, you can use a temporary table in MySQL.
You can also use a simple and fast key-value store like Memcached or Redis; they can work in memory only and have really fast key lookups (Redis promises O(1) time complexity).
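A bare-bones Memcached example of that kind of key lookup (sketch; the server details and key prefix are placeholders, and it assumes the pecl memcached extension):

// store the pairs once; they live outside the PHP process
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);
foreach ($pairs as $key => $value) { // $pairs: your hypothetical key => value data
    $mc->set('mylookup_' . $key, $value, 3600); // 1 hour TTL
}

// later: constant-time lookup by key
$value = $mc->get('mylookup_' . $someKey);
if ($value === false && $mc->getResultCode() === Memcached::RES_NOTFOUND) {
    // key not present
}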
Several thousand?! You mean it could take up several KILObytes?!
Are you sure this is going to be an issue? Before optimizing, write the code in the simplest, most straightforward way, and check later what really needs optimization. Also, only with a benchmark and the full code will you be able to decide on the proper way of caching. Everything else is a waste of time and the root of all evil...
Memcached is a popular way of caching data.
If you're only running that one process and don't need to worry about concurrent access, I would do it inside php. If you have multiple processes I would use some established solution so you don't have to worry about the details.
It all depends on your application and your hardware. My bet is to let databases (especially MySQL) do just database work, I mean, not much more than storing and retrieving data. Other DBMSs may be really efficient (Informix, for example), but sadly, MySQL is not.
Temporary tables may be more efficient than PHP arrays, but you increase the number of connections to the DB.
Scalability is an issue too. Doing it in PHP is better in that respect.
It is kind of difficult to give a straight answer if we don't get the complete picture.
It depends on where your source data is.
If your data is in the database, you'd better keep it there, manipulate it there and just fetch the items you need. Use temp tables if necessary.
If your data is already in PHP, you're probably better off keeping it there, although handling data in PHP is quite intensive.
If the data lookup will be done with only a few queries, do it with a MySQL temporary table.
If there will be many data lookups, it's almost always best to store it on the PHP side (connection overhead).