What are some simple optimization tips? - php

I mean, things like doing:
$length = count($someArray);
for ($i = 0; $i < $length; $i++) {
    // Stuff
}
rather than:
for ($i = 0; $i < count($someArray); $i++) {
    // Stuff
}
so that it doesn't have to calculate the length of the array every time it loops.
Does anyone else have any tips like these that are pretty simple concepts but improve performance?

Consider having a look here:
http://tipsandtricks.runicsoft.com/General/Performance.html
I think that covers almost all of the generic improvement tips.
After that, pay special attention to point 5: Know your language as many optimizations can be achieved in a language-dependent manner.

The best tip I can think of:
Don't
At least not until you understand the actual performance characteristics of your program!
The thing is, 99% of compilers are going to make that optimization for you anyway. And even if they didn't, the performance gain from that example alone is going to be completely unnoticeable on most platforms in most situations.
Write code that makes the most sense, expresses what is going on, and uses good algorithms first. If and only if you have performance issues should you go back and investigate why.
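If you do suspect a hot spot, the cheapest check is a quick timing harness before changing anything. A minimal sketch (the function names and data sizes here are made up for illustration; this is not a substitute for a real profiler):

```php
<?php
// Two versions of the same loop from the question. Both compute the
// same sum; timing them tells you whether hoisting count() actually
// matters for your data sizes before you rewrite anything.
function sum_hoisted(array $a): int {
    $total = 0;
    $n = count($a);                        // length computed once
    for ($i = 0; $i < $n; $i++) {
        $total += $a[$i];
    }
    return $total;
}

function sum_inline(array $a): int {
    $total = 0;
    for ($i = 0; $i < count($a); $i++) {   // length computed every iteration
        $total += $a[$i];
    }
    return $total;
}

$data = range(1, 100000);

$t0 = microtime(true);
$a = sum_hoisted($data);
$t1 = microtime(true);
$b = sum_inline($data);
$t2 = microtime(true);

printf("hoisted: %.4fs  inline: %.4fs\n", $t1 - $t0, $t2 - $t1);
```

On most modern PHP versions the difference is tiny, which is exactly the point: measure first, then decide.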

Three simple things:
1.) Do less, less often
2.) Do allocate less memory, less often
3.) Think hard about your algorithms and database requests and layout
to 1.) Just like what you did here with calling count() less often, this can be done on a large scale too. For example, why request more than 20 rows from the database when you can only show 15 at a time if the user does not scroll?
to 2.)
// Allocate here what you will need, eventually reusing or reinitializing it
$length = count($someArray);
for ($i = 0; $i < $length; $i++) {
    // Only allocate here what cannot be avoided by reusing
}
to 3.)
There are slow and fast algorithms; if you handle something in a loop, make sure you understand the implications. For a database, make sure you have the right structure for your specific needs (e.g. relational).
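Point 1 applied to the database could be sketched like this. The pager() helper and the posts table are illustrative, not from the original post; the idea is just to ask the database for one screenful instead of everything:

```php
<?php
// Fetch only what you can show: translate a page number into
// LIMIT/OFFSET instead of pulling the whole table and slicing in PHP.
// The page size of 15 matches the answer's example.
function pager(int $page, int $perPage = 15): array {
    $page = max(1, $page);   // guard against page 0 or negatives
    return ['limit' => $perPage, 'offset' => ($page - 1) * $perPage];
}

// Instead of SELECT * FROM posts, ask only for one screenful:
$p = pager(3);
$sql = sprintf('SELECT * FROM posts ORDER BY id LIMIT %d OFFSET %d',
               $p['limit'], $p['offset']);
echo $sql, "\n"; // ... LIMIT 15 OFFSET 30
```

With a real driver you would bind the values through a prepared statement rather than sprintf, but the "do less" principle is the same.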
But there is one major rule in professional development:
Premature optimization is the root of all evil -- Donald Knuth
Yes, programming cleanly and avoiding the worst performance issues upfront is good. But understandability, readability and maintainability always come first, along with the "you ain't gonna need it" principle. If your code is readable and well structured, you have no lag, no complaints from users or server admins, and no profiler tool showing that weird things are going on, do not "optimize". Without profiling and a known cause you never know whether you are really fixing the bottleneck or wasting time and money in the wrong place; in the worst case you end up messing up perfectly understandable code for nothing.
Here is a nice discussion on that topic: http://c2.com/cgi/wiki?PrematureOptimization

Related

Using C for calculation in PHP application: is it worth it?

I have a PHP application where sometimes heavy calculations are needed (I search for operations recorded by the users and make lots of economics analysis in long periods of time).
I'd like to improve the speed of these calculations, is it worth it to rewrite these calculations parts in C? (Among the faster languages here, C is the one I know the most).
I had already decided doing this, but when I was looking for "how to do it" I found this Stack Overflow question. There someone commented "Why not just write the whole site/page using either PHP or C?" and I know I need extra info.
If you are really worried about performance, first measure whether a PHP (or other) implementation is fast enough. Possibly you will find out that there is no need to worry. If it really is heavy calculation (and there is a chance it will grow in complexity as your application evolves), it could be worth running the calculations asynchronously in a separate backend service. For example, your PHP frontend could dispatch to a C/C++ service, which eventually places results in a database. This requires a lot of extra logic (somebody, i.e. the client, will have to poll regularly), but it scales nicely.
There are other points to consider besides performance: if your math is complex and keeps growing, PHP may not be a good environment in which to formulate it. Then again, maybe a Java-based stack with a clear separation of frontend and business logic would be better just from a maintenance point of view.

PHP Optimization - Reducing memory usage

I'm running Eclipse on Linux and I was told I could use Xdebug to optimize my program. I use a combination algorithm in my script that takes too long to run.
I am just asking for a starting point to debug this. I know how to do the basics...break points, conditional break points, start, stop, step over, etc... but I want to learn more advanced techniques so I can write better, optimized code.
The first step is to know how to calculate the asymptotic memory usage, which means how much the memory grows as the problem gets bigger. This is done by saying that one recursion takes up X bytes (X being a constant; the easiest is to set it to 1). Then you write down the recurrence, i.e., in what manner the function calls itself or loops, and try to conclude how much the memory grows (is it quadratic in the problem size, linear, or maybe less?).
This is taught in elementary computer science classes at the universities since it's really useful when concluding how effective an algorithm is. The exact method is hard to describe in a simple forum post, so I recommend you to pick up a book on algorithms (I recommend "Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein - MIT Press).
But if you don't have a clue about this type of work, start by using memory_get_usage() and echoing how much memory you're using in your loop/recursion. This can give you a hint about where the problem is. Try to reduce the amount of things you keep in memory. Throw away everything you don't need (for example, don't build up a giant array of all the data if you can boil it down to intermediate values earlier).
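The echo-inside-the-loop idea could look like this. The build() and reduce() functions are made-up examples contrasting "keep everything" with "boil it down early":

```php
<?php
// Sketch: print memory usage inside a loop to spot growth.
// memory_get_usage() returns the bytes currently allocated to the script.
function build(int $n): array {
    $keep = [];
    for ($i = 0; $i < $n; $i++) {
        $keep[] = str_repeat('x', 100);    // everything is retained
        if ($i % 1000 === 0) {
            echo $i, ': ', memory_get_usage(), " bytes\n";
        }
    }
    return $keep;
}

// Same work, but boiled down to an intermediate value as the answer
// suggests: memory stays flat no matter how large $n gets.
function reduce(int $n): int {
    $total = 0;
    for ($i = 0; $i < $n; $i++) {
        $total += strlen(str_repeat('x', 100));  // nothing retained
    }
    return $total;
}
```

Watching the printed numbers grow linearly (or worse) in build() is exactly the hint the answer describes.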

functions vs repeated code

I am writing some PHP code to create PDFs using the FPDF library, and I basically use the same 4 lines of code to print every line of the document. I was wondering which is more efficient: repeating these 4 lines over and over, or making them into a function? I'm curious because it feels like a function would have a larger overhead because the function would only be 4 lines long.
The code I am questioning looks like this:
$pdf->checkIfPageBreakNeeded($lineheight * 2, true);
$text = ' label';
$pdf->MultiCell(0, $lineheight, $text, 1, 'L', 1);
$text = $valueFromForm;
$pdf->MultiCell(0, $lineheight, $text, 1, 'L');
$pdf->Ln();
This should answer it:
http://en.wikipedia.org/wiki/Don%27t_repeat_yourself
and
http://www.codinghorror.com/blog/2007/03/curlys-law-do-one-thing.html
Curly's Law, Do One Thing, is reflected in several core principles of modern software development:

Don't Repeat Yourself

If you have more than one way to express the same thing, at some point the two or three different representations will most likely fall out of step with each other. Even if they don't, you're guaranteeing yourself the headache of maintaining them in parallel whenever a change occurs. And change will occur. Don't Repeat Yourself is important if you want flexible and maintainable software.

Once and Only Once

Each and every declaration of behavior should occur once, and only once. This is one of the main goals, if not the main goal, when refactoring code. The design goal is to eliminate duplicated declarations of behavior, typically by merging them or replacing multiple similar implementations with a unifying abstraction.

Single Point of Truth

Repetition leads to inconsistency and code that is subtly broken, because you changed only some repetitions when you needed to change all of them. Often, it also means that you haven't properly thought through the organization of your code. Any time you see duplicate code, that's a danger sign. Complexity is a cost; don't pay it twice.
Rather than asking yourself which is more efficient you should instead ask yourself which is more maintainable.
Writing a function is far more maintainable.
I'm curious because it feels like a function would have a larger overhead because the function would only be 4 lines long.
This is where spaghetti comes from.
Definitely encapsulate it into a function and call it. The overhead that you fear is the worst kind of premature optimization.
DRY - Don't Repeat Yourself.
Make it a function. Function call overhead is pretty small these days. In general you'll be able to save far more time by finding better high-level algorithms than fiddling with such low-level details. And making and keeping it correct is far easier with such a function. For what shall it profit a man, if he shall gain a little speed, and lose his program's correctness?
A function is certainly preferable, especially if you have to go back later to make a change.
Don't worry about overhead; worry about yourself, a year in the future, trying to debug this.
In the light of the above, Don't Repeat Yourself and make a tiny function.
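Folding the question's four repeated lines into a helper could look like this. The method names come from the poster's code ($pdf is their FPDF object, and checkIfPageBreakNeeded is their own method, not stock FPDF); the function name and signature are my invention:

```php
<?php
// Hypothetical helper wrapping the question's four repeated lines.
// One call per document line replaces four copy-pasted statements,
// and any future change (border style, alignment) happens in one place.
function printLabelledLine($pdf, float $lineheight, string $label, string $value): void {
    $pdf->checkIfPageBreakNeeded($lineheight * 2, true);
    $pdf->MultiCell(0, $lineheight, $label, 1, 'L', 1);
    $pdf->MultiCell(0, $lineheight, $value, 1, 'L');
    $pdf->Ln();
}
```

Usage would then shrink to something like `printLabelledLine($pdf, $lineheight, ' label', $valueFromForm);` for each line of the document.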
In addition to all the valuable answers about the far more important topic of maintainability; I'd like to add a little something on the question of overhead.
I don't understand why you fear that a four line function would have a greater overhead.
In a compiled language, a good compiler would probably be able to inline it anyway, if appropriate.
In an interpreted language (such as PHP) the interpreter has to parse all of this repeated code each time it is encountered, at runtime. To me, that suggests that repetition might carry an even greater overhead than a function call.
Worrying about function call overhead here is ghastly premature optimisation. In matters like this, the only way to really know which is faster, is to profile it.
Make it work, make it right, make it fast. In that order.
The overhead is actually very small and won't make a noticeable difference in your application.
Would you rather accept this small overhead and have an easier program to maintain, or save a mere millisecond but take hours to correct small changes that are repeated everywhere?
If you ask me or any other developer out there, we definitely want the first option.
So go on with the function. You may not be maintaining the code today, but when you do, you will hate yourself for trying to save those mere milliseconds.

Reusing MySQL results

I'm having somewhat theoretical question: I'm designing my own CMS/app-framework (as many PHP programmers on various levels did before... and always will) to either make production-ready solution or develop various modules/plugins that I'll use later.
Anyway, I'm thinking on gathering SQL connections from whole app and then run them on one place:
index.php:
<?php
include ('latestposts.php');
include ('sidebar.php');
?>
latestposts.php:
<?php
function gather_data ($arg){ $sql= ""; }
function draw ($data) {...}
?>
sidebar.php:
<?php
function gather_data ($arg){ $sql= ""; }
function draw ($data) {...}
?>
Now, while the whole module system of the application is yet to be figured out, its idea is already floating somewhere in my brain. However, I'm wondering whether I could first load all the gather_data functions, then run the SQL, and then run the draw functions - and whether I could reuse results!
If, in example, $sql is SELECT * FROM POSTS LIMIT 10 and $sql2 is SELECT * FROM POSTS LIMIT 5, is it possible to program PHP to see: "ah, it's the same SQL, I'll call it just once and reuse the first 5 rows"?
Or is it possible to add this behavior to some DRM?
However, as tags say, this is still just an idea in progress. If it proves to be easy to accomplish, then I will post more question how :)
So, basically: Is it possible, does it make sense? If both are yes, then... any ideas how?
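A naive sketch of what the question describes: if a LIMIT 10 result is already on hand, serve a LIMIT 5 request by slicing it instead of re-querying. All names here are made up, and a real implementation would also have to match the rest of the query (table, WHERE, ORDER BY), not just the limit:

```php
<?php
// In-memory reuse of a superset result, keyed by LIMIT.
function fetch_posts(int $limit, callable $runQuery, array &$cache): array {
    foreach ($cache as $cachedLimit => $rows) {
        if ($cachedLimit >= $limit) {
            return array_slice($rows, 0, $limit);  // reuse the superset
        }
    }
    $rows = $runQuery($limit);                     // miss: hit the database
    $cache[$limit] = $rows;
    return $rows;
}

$resultCache = [];
$queries = 0;
// Stand-in for the real query; counts how often the "database" is hit.
$db = function (int $n) use (&$queries) { $queries++; return range(1, $n); };

fetch_posts(10, $db, $resultCache);        // runs the query once
$five = fetch_posts(5, $db, $resultCache); // sliced from the cached 10
echo $queries, ' ', count($five), "\n";    // 1 5
```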
Don't get me wrong, that sounds like a plausible idea and you can probably get it running. But I wonder if it is really going to be beneficial. Will it cause a system to be faster? Give you more control? Make development easier?
I would just look into using (or building) a system using well practiced MVC style coding standards, build a good DB structure, and tweak the heck out of Apache (or use something like Lighttpd). You will have a lot more widespread acceptance of your code if you ever decide to make it open source, and if you ever need a hand with it another developer could step right in and pick up the keyboard.
Also, check out query caching in MySQL: you will see a similar (though not one-to-one) benefit from caching your query results server-side with regard to your query example. Even better, it is stored in server memory, so PHP/MySQL overhead is dropped AND you don't have to code it.
All of that aside, I do think it is possible. =)
Generally speaking, such a cache system can generate significant time savings, but at the cost of memory and complexity. The more results you want to keep, the more memory it will take; and there's no guarantee that your results will ever be used again, particularly the larger result sets.
Second, there are certain queries that should never be cached, or that should be run again even if they're in the cache. For the most part, only SELECT and SHOW queries can be cached effectively, but you need to worry about invalidating them when you modify the underlying data. Even in the same pageview, you might find yourself working around your own cache system on occasion.
Third, this kind of problem has already been solved several times. First, consider turning on the MySQL query cache. Most of the time, it will speed things up a bit without requiring any code changes on your end. However, it's a bit aggressive about invalidating entries, so you could gain some performance at a higher level.
If you need another level, consider memcached. You'll have to store and invalidate entries manually, but it can store results across page views (where you'll really find the performance benefit), and will let unused entries expire before running out of memory.
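The cache-aside pattern behind both suggestions looks like this. Here $store is a plain array standing in for memcached so the sketch is self-contained; with the real Memcached extension you would call $mc->get($key) / $mc->set($key, $value, $ttl) instead, and invalidate the key whenever the underlying data changes:

```php
<?php
// Cache-aside: look the key up first, compute and store on a miss.
function cached(array &$store, string $key, callable $compute) {
    if (array_key_exists($key, $store)) {
        return $store[$key];          // hit: skip the expensive query
    }
    $store[$key] = $compute();        // miss: run the query once
    return $store[$key];
}

$cache = [];
$calls = 0;
// Stand-in for an expensive SELECT; counts how often it actually runs.
$query = function () use (&$calls) { $calls++; return ['row1', 'row2']; };

cached($cache, 'posts:latest', $query);
cached($cache, 'posts:latest', $query);  // second call served from cache
echo $calls, "\n"; // 1
```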

What are the rules of thumb to follow when building highly efficient PHP/MySQL programs?

A few minutes ago, I asked whether it was better to perform many queries at once at log in and save the data in sessions, or to query as needed. I was surprised by the answer, (to query as needed). Are there other good rules of thumb to follow when building PHP/MySQL multi-user apps that speed up performance?
I'm looking for specific ways to create the most efficient application possible.
hashing
know your hashes (arrays/tables/ordered maps/whatever you call them). a hash lookup is very fast, and sometimes, if you have O(n^2) loops, you may reduce them to O(n) by organizing the data into an array (keyed by primary key) first and then processing it.
an example:
foreach ($results as $result)
    if (in_array($result->id, $other_result_ids))  // $other_result_ids: plain array of ids
        $found++;
is slow - in_array() loops through the whole $other_result_ids array, resulting in O(n^2).
foreach ($other_results as $other_result)
    $hash[$other_result->id] = true;

foreach ($results as $result)
    if (isset($hash[$result->id]))
        $found++;
the second one is a lot faster (depending on the result sets - the bigger, the faster), because isset() is (almost) constant time. actually, this is not a very good example - you could do this even faster using built in php functions, but you get the idea.
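The "built in php functions" the answer alludes to could be array_flip plus array_intersect_key - my guess at what was meant, shown here with plain id arrays:

```php
<?php
// Membership count via built-ins: flip the id list into a hash once,
// then intersect keys. Both steps are linear, so the whole thing is O(n).
$result_ids = [3, 5, 8, 13];
$other_ids  = [1, 2, 3, 5, 21];

$hash  = array_flip($other_ids);   // id => position, one pass
$found = count(array_intersect_key(array_flip($result_ids), $hash));
echo $found, "\n"; // 2  (ids 3 and 5 appear in both lists)
```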
optimizing (My)SQL
mysql.conf: i don't have any idea how much performance you can gain by optimizing your mysql configuration instead of leaving the default, but i've read you can ignore every postgresql benchmark that used the default configuration. afaik configuration matters less with mysql, but why ignore it? rule of thumb: try to fit the whole database into memory :)
explain [query]: an obvious one, a lot of people get wrong. learn about indices. there are rules you can follow, you can benchmark it and you can make a huge difference. if you really want it all, learn about the different types of indices (btrees, hashes, ...) and when to use them.
caching
caching is hard, but if done right it makes the difference (not a difference). in my opinion: if you can live without caching, don't do it. it often adds a lot of complexity and points of failures. google did a bit of proxy caching once (to make the intertubes faster), and some people saw private information of others.
in php, there are 4 different kinds of caching people regularly use:
query caching: almost always translates to memcached (sometimes to APC shared memory). store the result set of a certain query to a fast key/value (=hashing) storage engine. queries (now lookups) become very cheap.
output caching: store your generated html for later use (instead of regenerating it every time). this can result in the biggest speed-ups, but somewhat works against PHP's dynamic nature.
browser caching: what about etags and http responses? if done right you may avoid most of the work right at the beginning! most php programmers ignore this option because they have no idea what HTTP is.
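The ETag idea in plain terms: hash the response body, and if the client sends that hash back, answer 304 with no body at all. This sketch keeps the logic in pure functions so it is testable; in a real script you would read $_SERVER['HTTP_IF_NONE_MATCH'] and emit the result with header():

```php
<?php
// Compute a validator for the response body. md5 is a common choice;
// any stable hash of the content works.
function etag_for(string $body): string {
    return '"' . md5($body) . '"';
}

// Decide between a full 200 response and an empty 304.
function respond(string $body, ?string $ifNoneMatch): array {
    $etag = etag_for($body);
    if ($ifNoneMatch === $etag) {
        return ['status' => 304, 'body' => ''];   // client's cache is fresh
    }
    return ['status' => 200, 'body' => $body, 'etag' => $etag];
}

// First request gets the full page; the second echoes the ETag back.
$first  = respond('<html>...</html>', null);
$second = respond('<html>...</html>', $first['etag']);
echo $second['status'], "\n"; // 304
```

Skipping the body entirely is where the "avoid most of the work right at the beginning" payoff comes from, especially if you can compute the ETag without rendering the page.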
opcode caching: APC, zend optimizer and so on. makes php code load faster. can help with big applications. got nothing to do with (slow) external datasources though, and the potential is somewhat limited.
sometimes it's not possible to live without caches, e.g. when it comes to thumbnails. image resizing is very expensive, but fortunately easy to control (most of the time).
profiler
xdebug shows you the bottlenecks of your application. if your app is too slow, it's helpful to know why.
queries in loops
there are (php-)experts who do not know what a join is (and for every one you educate, two new ones without that knowledge will surface - and they will write frameworks, see schnalle's law). sometimes, those queries-in-loops are not that obvious, e.g. if they come with libraries. count the queries - if they grow with the number of results shown, something is wrong.
inexperienced developers do have a primal, insatiable urge to write frameworks and content management systems
schnalle's law
Optimize your MySQL queries first, then the PHP that handles it, and then lastly cache the results of large queries and searches. MySQL is, by far, the most frequent bottleneck in an application. A poorly designed query can take two to three times longer than a well designed query that only selects needed information.
Therefore, if your queries are optimized before you cache them, you have saved a good deal of processing time.
However, on some shared hosts caching is file-system only thanks to a lack of Memcached. In this instance it may be better to run smaller queries than it is to cache them, as the seek time of the hard drive (and waiting for access due to other sites) can easily take longer than the query when your site is under load.
Cache.
Cache.
Speedy indexed queries.
