I've been looking at some database (MySQL) wrapper classes, and a lot of them work like this:
1) Run the SQL query
2) While fetching the associative MySQL array, they cycle through the results and add them to their own array
3) Then you would run the class like below and cycle through its array
<?php
$database = new Database();
$result = $database->query('SELECT * FROM user;');
foreach ($result as $user) {
    echo $user->username;
}
?>
So my question here is: is this not good on a high-traffic type of site? I ask because, as far as I can tell, MySQL is returning an array which eats memory, then you are building a new array from that array and then cycling through the new array. Is this not good, or pretty much normal?
The short answer is: it's bad, very bad.
The trouble is you pay a nasty performance hit (cycles and memory!) by iterating over the results twice. (what if you have 1000 rows returned? you'd get all of the data, loop 1000 times, keep it all in memory, and then loop over it again).
If you refactor your class a bit, you can still wrap the query and the fetch, but you'll want to do the fetch_array outside the query. In this case you can discard each row from memory as soon as you're done with it, so you don't need to store the entire result set, and you loop just one time.
IIRC, I had thought PHP wouldn't load the entire MySQL result set into memory: when you call mysql_fetch_array you're asking for the next row in the set, which is loaded only upon asking for it, so you wouldn't pay the memory hit for the full set (on the PHP side) just by running the original query. However, the whole result does get loaded into memory when you use mysql_query (thanks VolkerK), and either way you're still paying that CPU cost twice, which can be a substantial penalty.
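As a rough illustration of that refactor, here is a minimal sketch assuming mysqli and a PHP 5.5+ generator (the connection details are placeholders, not anything from the original post); the wrapper hands back one row per loop pass instead of building a second array:
<?php
class Database
{
    private $mysqli;

    public function __construct()
    {
        // Placeholder credentials, for the sketch only.
        $this->mysqli = new mysqli('localhost', 'user', 'password', 'mydb');
    }

    public function query($sql)
    {
        $result = $this->mysqli->query($sql);
        // Yield one row at a time; nothing is accumulated on the PHP side.
        while ($row = $result->fetch_object()) {
            yield $row;
        }
        $result->free();
    }
}

$database = new Database();
foreach ($database->query('SELECT * FROM user;') as $user) {
    echo $user->username;
}
?>
The trade-off is that a generator can only be traversed once, which is usually all you need for rendering a page.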
The code is fine.
foreach() just moves the array pointer on each pass.
You can read all about it here:
http://php.net/foreach
For a deeper understanding, look at how pointers work in C:
http://home.netcom.com/~tjensen/ptr/ch2x.htm
Nothing is copied; iteration is almost always performed by incrementing pointers.
This kind of query is pretty much normal. It's better to fetch only a row at a time if you can, but for normal small datasets of the kind you'd get for small and paged queries, the extra memory utilisation isn't going to matter a jot.
SELECT * FROM user, however, could certainly generate an unhelpfully large dataset, if you have a lot of users and a lot of information in each user row. Try to keep the columns and number of rows selected down to the minimum, the information you're actually going to put on the page.
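For example, using the wrapper from the question (the column names and page size here are made up), something along these lines keeps the result set small:
$result = $database->query('SELECT id, username FROM user ORDER BY username LIMIT 50;');
foreach ($result as $user) {
    echo $user->username;
}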
I read everywhere that using PDOStatement::fetch() would mean you will not run out of memory, no matter how large the result set is. So that begs the question: where are the rows stored?
PHP has to store the results somewhere when it gets them from the database. Where are these results stored? Or, are they stored in the database, and PHP has to query the database for the next row every time?
It's similar to reading a file: you open a stream and read data piece by piece, except that database internals are way more complicated.
For example, look at the description of the $driver_options parameter of PDO::prepare(); you can even set a scrollable cursor in order to control the direction of reading.
PDOStatement::fetch() will get you the next row from your query. But I don't think it guarantees you won't run out of memory (for example, if one row contains lots of data), because your data may still be held in memory on the client side (read about buffered and unbuffered queries).
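As a rough sketch of the unbuffered route (assuming the pdo_mysql driver; the DSN, credentials and query are placeholders), you can ask the driver not to buffer the whole result set on the PHP side:
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'password');

// Unbuffered mode: MySQL holds the result set, PHP pulls one row per fetch().
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$stmt = $pdo->query('SELECT id, payload FROM big_table');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // Process $row, then let it go out of scope so memory use stays flat.
}
The trade-off is that you cannot run another query on the same connection until the statement has been fully fetched or released with closeCursor().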
I have a heavy script that we run a lot. Below is the algorithm it uses:
Load 4,500 rows from the database and store them as an array. (A)
Load 600,000 rows from the database and store them as an array. (B)
For each element in (A), look for a match in (B).
Go to the next element in (A).
So the maximum number of iterations of this script is 4,500 * 600,000, which is 2,700,000,000, so you understand that this can be a bit sweaty for PHP.
Can I make this process more efficient somehow?
Reading the rows from the database is not really an issue; it is the array iterations that bring the heavy costs.
It does work pretty fast, but one factor (600,000) will increase greatly in the years to come.
So any ideas?
Here are a few different answers. My guess is that the first one is the right one, the easy one, and sufficient, but it's very hard to be sure.
Possible Answer 1: Use SQL
As the comments indicate, this sounds an awful lot like a join. In addition, your post seems to indicate that you only take an action when a match is found, and that not every element in A has a match. This means your SQL statement should only return the matching rows, not all of them. It doesn't matter that you can't do everything in SQL, if you can at least let it organise your data for you.
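As a hedged sketch of what that could look like (the table and column names are invented, and it assumes a mysqli connection in $mysqli), the point is that MySQL hands you only the matching pairs:
$sql = 'SELECT a.id AS a_id, b.id AS b_id, b.payload
        FROM table_a AS a
        INNER JOIN table_b AS b ON b.match_column = a.match_column';

$resultSet = $mysqli->query($sql);
while ($row = $resultSet->fetch_assoc()) {
    // Only the elements of A that actually have a match in B arrive here.
}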
Possible Answer 2: Sort the arrays
Maybe you can sort the arrays (again, preferably let your database do this). Possibly you can sort B so that searching for a match is quicker. Or put the search value in the key of the array so that lookups are very quick. Or, if you are lucky, you might be able to sort both arrays in a way that puts the A's and B's in the same order, i.e. for any A you pick, you know that the right B either does not exist or exists later in the B array.
Possible Answer 3: Explain more about the problem
You have only given us your current algorithm, not what you are actually trying to do. Most likely iterating over everything is not the best idea, but no one can say unless they know more about your data and what you want to do in the end.
It depends on your data, of course...
Some general aspects:
This really sounds like a use case for database queries, not the PHP script. Looking for matches in datasets is what databases are good at; no tricks will make PHP scripts play in the same league.
If you really have to do the matching in PHP, try not to hit your allowed memory limit: your PHP script will just exit with an error. But if your SQL-side result set becomes too big, your SQL server may begin to write temporary data to disk, which will slow down the whole execution; if possible, fetch and process the data in chunks (OFFSET, LIMIT).
If you match whole words, build your matching array in such a way that the search criterion is a key, not a value, so that you can use isset($potentialMatches[$searchTerm]), which is way faster than in_array($searchTerm, $potentialMatches) for larger arrays. Mockup:
while ($row = $resultSet->fetch_assoc()) {
    $potentialMatches[$row['search_column']] = $row;
}
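The lookup pass over the smaller set then becomes a constant-time isset() check per element (the array and column names here are only illustrative):
foreach ($rowsFromA as $rowA) {                // the smaller result set
    $term = $rowA['search_column'];
    if (isset($potentialMatches[$term])) {     // hash lookup, no in_array() scan
        $rowB = $potentialMatches[$term];
        // handle the A/B match here
    }
}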
But it can't be stressed enough: the usual course of action here would be:
1. do the matching DB-side
2. process the matches in your script
3. if necessary, do a new query for the non-matches
4. if you did 3., process those results in your script
Our app currently works like this:
class myClass{
    private $names = array();

    function getNames($ids = array()){
        $lookup = array();
        foreach($ids as $id)
            if (!isset($this->names[$id]))
                $lookup[] = $id;

        if(!empty($lookup)){
            $result; //query database for names where id in $lookup
                     // now contains associative array of id => name pairs
            $this->names = array_merge($this->names, $result);
        }

        $result = array();
        foreach($ids as $id)
            $result[$id] = $this->names[$id];
        return $result;
    }
}
Which works fine, except it can still (and often does) result in several queries (400+ in this instance).
So, I am thinking of simply querying the database and populating the $this->names array with every name from the database.
But I am concerned: at how many entries in the database should I start worrying about memory when doing this? (The database column is varchar(100).)
How much memory do you have? And how many concurrent users does your service generally support during peak access times? These are pertinent pieces of information. Without them any answer is useless. Generally, this is a question easily solved by load testing. Then, find the bottlenecks and optimize. Until then, just make it work (within reason).
But ...
If you really want an idea of what you're looking at ...
If we assume you aren't storing multibyte characters, you have 400 names * 100 chars (assume every name maxes out your char limit) ... you're looking at ~40 KB of memory. Seems way too insignificant to worry about, doesn't it?
Obviously you'll get other overhead from PHP to hold the data structure itself. Could you store things more efficiently using a data structure like SplFixedArray instead of a plain array? Probably -- but then you're losing the highly optimized array_* functions that you'd otherwise have available to manipulate the list.
Will the user be using every one of the entries you're planning to buffer in memory? If you have to have them for your application it doesn't really matter how big they are, does it? It's not a good idea to keep lots of information you don't need in memory "just because." One thing you definitely don't want to do is query the database for 4000 records on every page load. At the very least you'd need to put those types of transactions into a memory store like memcached or use APC.
This question -- like most questions in computer science -- is simply a constrained maximization problem. It can't be solved correctly unless you know the variables at your disposal.
Once you get over a thousand items or so, keyed look-ups start to get really slow (there is a delay when you access a specific key). You can fix that with ksort(). (I saw a script go from a 15-minute run time down to under 2 minutes just by adding a ksort().)
Other than that, you are really only limited by memory.
A better way would be to build an array of missing data in your script and then fetch them all in one query using an IN list.
You really shouldn't waste memory storing data the user will never see if you can help it.
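A minimal sketch of that IN-list fetch, assuming a PDO connection is available as $pdo (the table and column names are guesses based on the question):
if (!empty($lookup)) {
    $placeholders = implode(',', array_fill(0, count($lookup), '?'));
    $stmt = $pdo->prepare("SELECT id, name FROM users WHERE id IN ($placeholders)");
    $stmt->execute(array_values($lookup));

    // FETCH_KEY_PAIR turns the two selected columns into an id => name map.
    $this->names += $stmt->fetchAll(PDO::FETCH_KEY_PAIR);
}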
I have a PHP/MySQL based web application that has internationalization support by way of a MySQL table called language_strings with the string_id, lang_id and lang_text fields.
I call the following function when I need to display a string in the selected language:
public function get_lang_string($string_id, $lang_id)
{
    $db = new Database();
    $sql = sprintf('SELECT lang_string FROM language_strings WHERE lang_id IN (1, %s) AND string_id=%s ORDER BY lang_id DESC LIMIT 1', $db->escape($lang_id, 'int'), $db->escape($string_id, 'int'));
    $row = $db->query_first($sql);
    return $row['lang_string'];
}
This works perfectly, but I am concerned that there could be a lot of database queries going on; e.g. the main menu has 5 link texts, all of which call this function.
Would it be faster to load the entire language_strings table results for the selected lang_id into a PHP array and then call that from the function? Potentially that would be a huge array with much of it redundant but clearly it would be one database query per page load instead of lots.
Can anyone suggest another more efficient way of doing this?
There isn't a one-size-fits-all answer; you really have to look at it on a case-by-case basis. Having said that, the majority of the time it will be quicker to get all the data in one query, pop it into an array or object and refer to it from there.
The caveat is whether you can pull all the data you need in one query as quickly as running the five individual ones. That is where the performance of the query itself comes into play.
Sometimes a query that contains a subquery or two will actually be less time efficient than running a few queries individually.
My suggestion is to test it out. Get a query together that gets all the data you need, see how long it takes to execute. Time each of the other five queries and see how long they take combined. If it is almost identical, stick the output into an array and that will be more efficient due to not having to make frequent connections to the database itself.
If however, your combined query takes longer to return data (it might cause a full table scan instead of using indexes for example) then stick to individual ones.
Lastly, if you are going to use the same data over and over - an array or object will win hands down every single time as accessing it will be much faster than getting it from a database.
OK - I did some benchmarking and was surprised to find that putting things into an array rather than using individual queries was, on average, 10-15% SLOWER.
I think the reason for this was that, even if I filtered out the "uncommon" elements, inevitably there were always going to be unused elements as a matter of course.
With the individual queries I am only ever getting out what I need, and as the queries are so simple I think I am best sticking with that method.
This works for me; of course, in other situations where the individual queries are more complex, I think the method of storing common data in an array would turn out to be more efficient.
I agree with what everybody says here: it's all about the numbers.
Some additional tips:
Try to create a single memory array which holds the minimum you require. This means removing most of the obvious redundancies.
There are standard approaches for these issues in performance-critical environments, like using memcached with MySQL (a sketch follows after these tips). It's a bit overkill, but it basically lets you allocate some external memory and cache your queries there. Since you choose how much memory you want to allocate, you can plan it according to how much memory your system has.
Just play with the numbers. Try using separate queries (which is the simplest approach) and stress your PHP script (like calling it hundreds of times from the command line). Measure how much time this takes and see how big the performance loss actually is. Speaking from personal experience, I usually cache everything in memory and then one day, when the data gets too big, I run out of memory. Then I split everything into separate queries to save memory, and see that the performance impact wasn't that bad in the first place :)
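To make the memcached tip concrete, here is a bare-bones sketch using PHP's Memcached extension; the key, the TTL and the helper that actually queries MySQL are all made up for illustration:
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$key = 'lang_strings_' . $lang_id;
$strings = $cache->get($key);
if ($strings === false) {
    // Cache miss: hit MySQL once, then keep the result for five minutes.
    $strings = load_lang_strings_from_db($lang_id); // hypothetical helper
    $cache->set($key, $strings, 300);
}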
I'm with Fluffeh on this: look into the other options at your disposal (joins, subqueries, making sure your indexes reflect how your data relates - but don't over-index, and test). Most likely you'll end up with an array at some point, so here's a little performance tip: contrary to what you might expect, stuff like
$all = $stmt->fetchAll(PDO::FETCH_ASSOC);
is less memory efficient compared to:
$all = array(); // or $all = []; in PHP 5.4
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $all[] = $row['lang_string'];
}
What's more: you can check for redundant data while fetching the data.
My answer is to do something in between. Retrieve all strings for a lang_id that are shorter than a certain length (say, 100 characters). Shorter text strings are more likely to be used in multiple places than longer ones. Cache the entries in a static associative array in get_lang_string(). If an item isn't found, then retrieve it through a query.
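A sketch of that hybrid, reusing the Database wrapper from the question (the 100-character cutoff is arbitrary, and it assumes the wrapper's query() method returns rows you can foreach over):
public function get_lang_string($string_id, $lang_id)
{
    static $cache = array();

    if (empty($cache)) {
        // Preload the short strings once per request; rows for the selected
        // language come last and overwrite the lang_id 1 defaults.
        $db = new Database();
        $sql = sprintf('SELECT string_id, lang_string FROM language_strings WHERE lang_id IN (1, %s) AND CHAR_LENGTH(lang_string) <= 100 ORDER BY lang_id ASC', $db->escape($lang_id, 'int'));
        foreach ($db->query($sql) as $row) {
            $cache[$row['string_id']] = $row['lang_string'];
        }
    }

    if (isset($cache[$string_id])) {
        return $cache[$string_id];
    }

    // Fall back to the original per-string query for anything long or missing.
    $db = new Database();
    $sql = sprintf('SELECT lang_string FROM language_strings WHERE lang_id IN (1, %s) AND string_id=%s ORDER BY lang_id DESC LIMIT 1', $db->escape($lang_id, 'int'), $db->escape($string_id, 'int'));
    $row = $db->query_first($sql);
    return $row['lang_string'];
}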
I am currently at the point in my site/application where I have had to put the brakes on and think very carefully about speed. I think the speed tests mentioned here should consider the volume of traffic on your server as an important variable that will affect the results. If you are putting data into JavaScript data structures and processing it on the client machine, the processing time should be more consistent. If you are requesting lots of data through MySQL via PHP (for example), that puts the demand on one machine/server rather than spreading it. As your traffic grows, you have to share server resources with many users, and I am thinking that this is where getting JavaScript to do more will lighten the load on the server. You can also store data on the client machine via localStorage.setItem() / localStorage.getItem() (most browsers have about 5 MB of space per domain). If you have data in the database that does not change very often, you can store it on the client and then just check at 'start-up' whether it is still valid.
This is my first post after having and using the account for a year, so I might need to fine-tune my rambling; I'm just voicing what I'm thinking through at present.
This question may seem too basic to some, but please bear with me; it's been a while since I dealt with decent database programming.
I have an algorithm that I need to program in PHP/MySQL to work on a website. It performs some computations iteratively on an array of objects (it ranks the objects based on their properties). In each iteration the algorithm runs through the whole collection a couple of times, accessing various data from different places across the whole collection. The algorithm needs several hundred iterations to complete. The array comes from a database.
The straightforward solution that I see is to take the results of a database query, create an object for each row of the query, put the objects into an array and pass the array to my algorithm.
However, I'm concerned about the efficiency of such a solution when I have to work with an array of several thousand items, because what I do is essentially mirror the results of a query into memory.
On the other hand, making a database query a couple of times on each iteration of the algorithm also seems wrong.
So, my question is - what is the correct architectural solution for a problem like this? Is it OK to mirror the query results to memory? If not, which is the best way to work with query results in such an algorithm?
Thanks!
UPDATE: The closest problem that I can think of is ranking of search results by a search engine - I need to do something similar to that. Each result is represented as a row of a database and all results of the set are regarded when the rank is computed.
Don't forget, premature optimization is the root of all evil. Give it a shot copying everything to memory. If that uses too much mem, then optimize for memory.
Memory seems like the best way to go - iff you can scale up to meet it. Otherwise you'll have to revise your algorithm to maybe use a divide and conquer type of approach - do something like a merge sort.
It really depends on the situation at hand. It's probably rarely required to do such a thing, but it's very difficult to tell based off of the information you've given.
Try to isolate the data as much as possible. For instance, if you need to perform some independent action on the data that doesn't have data dependencies amongst iterations of the loop, you can write a query to update the affected rows rather than loading them all into memory, only to write them back.
In short, it is probably avoidable but it's hard to tell until you give us more information :)
If you are doing a query to the database, when the results come back, they are already "mirrored to memory". When you get your results using mysql_fetch_assoc (or equiv) you have your copy. Just use that as the cache.
Is the computation of one object dependent on another, or are they all independent? If they are independent, you could load just a small number of rows from the database, converting them to objects as you describe. Then run your hundreds of iterations on these, and then output the result for that block. You then proceed to the next block of items.
This keeps memory usage down, since you are only dealing with a small number of items rather than the whole data set, and avoids running multiple queries on the database.
The SQL keywords LIMIT and OFFSET can help you step through the data block by block.
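A rough sketch of that block-by-block loop, assuming a PDO connection in $pdo (the query and chunk size are placeholders):
$chunkSize = 1000;
$offset = 0;
do {
    $sql = sprintf('SELECT id, score FROM items ORDER BY id LIMIT %d OFFSET %d', $chunkSize, $offset);
    $rows = $pdo->query($sql)->fetchAll(PDO::FETCH_ASSOC);

    foreach ($rows as $row) {
        // Run the per-object computation on this block, then discard it.
    }

    $offset += $chunkSize;
} while (count($rows) === $chunkSize);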
Writing ranking queries with MySQL is possible as well; you just need to play with user-defined variables a bit. If you provide some sample input data and the result you are trying to achieve, the replies can be more detailed.
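For what it's worth, the user-defined-variable idiom looks roughly like this (the table and column names are invented, and it assumes a PDO connection in $pdo; on MySQL 8+ a window function such as RANK() is the cleaner choice):
$sql = 'SELECT t.id, t.score, @rank := @rank + 1 AS ranking
        FROM items AS t
        CROSS JOIN (SELECT @rank := 0) AS init
        ORDER BY t.score DESC';

foreach ($pdo->query($sql) as $row) {
    echo $row['id'], ' is ranked ', $row['ranking'], "\n";
}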
Can you use a cron job to do your ranking, say once per day, per hour, or whatever you need, and then save each item's ranking to a field in its row?
That way, when you call your rows up, you could just order them by the ranking field.