inner workings of PHP (really long PHP script)

I have a really long PHP script for just one page, i.e. something like:
mywebsite.com/page.php?id=99999
I have about 10,000-20,000 cases of the id, each with different settings. Will this slow down my website significantly?
In other words, my question is really along the lines of: what happens when PHP is executed? Does the server execute it and send the results, or does the client's computer download it, execute it and display the results?
If it's the latter, does it mean a really slow load time? Each of the 10,000-20,000 cases has about 20-25 lines of code after it.
thanks, xoxo

A PHP file is usually processed (interpreted) on the web server, and only the output is passed to the client.
Whether the website is slow or not depends entirely on what the PHP script actually does. That said, a PHP file with 10,000-20,000 cases sounds really, really bad code-wise. Yet it might still perform well for your case (pardon the pun).
Everything comes down to what code is actually executed: do you just print out different text depending on the given id, or do you run a really expensive operation (e.g. create a zip file, download stuff, compute pi to the last decimal, ...)?
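If the per-id differences really are just different text or settings, one common alternative is to keep them as data and share a single code path. A minimal sketch under that assumption (the settings shown and render_page() are invented for illustration, not the asker's code):

<?php
// Per-id differences stored as data instead of 10,000 switch cases.
$settings = array(
    99998 => array('title' => 'Foo', 'color' => 'blue'),
    99999 => array('title' => 'Bar', 'color' => 'red'),
    // ... one entry per id; could equally be rows in a database table
);

function render_page(array $s) {
    echo '<h1 style="color:' . $s['color'] . '">'
       . htmlspecialchars($s['title']) . '</h1>';
}

$id = isset($_GET['id']) ? (int)$_GET['id'] : 0;
if (isset($settings[$id])) {
    render_page($settings[$id]);   // one shared, parameterized code path
} else {
    echo 'Unknown id';
}
?>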

10,000 to 20,000 distinct cases sounds like a nightmare. Although it's technically possible, I find it hard to believe that your processing needs require that level of granularity.
Is the processing in each of the 10,000 to 20,000 cases really so different that it needs completely separate testing and handling? Aren't there cases similar enough to be handled in a similar way?
For example, if the processing for case $x = 5 is something like:
echo 5;
And the processing for case $x = 10 is something like:
echo 10;
Then these could be grouped into a single test and single handler:
function dumbEcho($x) {
    echo $x;
}
function isDumbEchoAble($x) {
    return in_array($x, array(5, 10));
}
if (isDumbEchoAble($x)) {
    dumbEcho($x);
}
For each structurally similar set of processing, you could create a isXXXAble() function to test and an XXX() function to process. [Of course, this is just a simple example, intended to demonstrate a principle, a concept, not necessarily code that you can copy/paste into your current situation.]
The essence of programming - IMHO - is to find these structural similarities, find a parameterization sufficient to handle the unique cases, and then apply this parameterized processing to those cases.
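To illustrate the idea a bit further, here is a hedged sketch of how the isXXXAble()/XXX() pairs could be walked in one place. It reuses dumbEcho()/isDumbEchoAble() from the snippet above; the second pair is invented purely to stand in for another structurally similar group:

<?php
function squareEcho($x)       { echo $x * $x; }
function isSquareEchoAble($x) { return in_array($x, array(7, 12)); }

// One (test, handler) pair per structurally similar group - not per id.
$handlers = array(
    array('isDumbEchoAble',   'dumbEcho'),
    array('isSquareEchoAble', 'squareEcho'),
);

$x = 10; // example id
foreach ($handlers as $pair) {
    list($test, $run) = $pair;
    if ($test($x)) {   // PHP variable functions: the string names the function
        $run($x);
        break;         // first matching group handles this id
    }
}
?>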

Related

PHP efficiency question

I am working on a website and I am trying to make it as fast as possible - especially the small things that can make my site a little bit quicker.
So, on to my question - I've got a loop that runs 5 times, and each time it echoes something. If I instead make a variable, have the loop append the text I want to echo to that variable, and just echo the variable at the end - will it be faster?
loop 1 (with the echo inside the loop)
for ($i = 0; $i < 5; $i++)
{
    echo "test";
}
loop 2 (with the echo outside, after the loop finishes)
$echostr = "";
for ($i = 0; $i < 5; $i++)
{
    $echostr .= "test";
}
echo $echostr;
I know that loop 2 will increase the file size a bit and therefore the user will have to download more bytes, but if I have a huge loop, will it be better to use the second loop or not?
Thanks.
The difference is negligible. Do whatever is more readable (which in this case is definitely the first approach). The first approach is not a "naive" approach, so there will be no major performance difference (it may actually be faster, I'm not sure). The first approach will also use less memory. Also, in many languages (I'm not sure about PHP), appending to strings is expensive, and therefore so is concatenation (because you have to seek to the end of the string, reallocate memory, etc.).
Moreover, file size does not matter because PHP is entirely server-side -- the user never has to download your script (in fact, it would be scary if they did/could). These types of things may matter in Javascript but not in PHP.
Long story short -- don't write code constantly trying to make micro-optimizations like this. Write the code in the style that is most readable and idiomatic, test to see if performance is good, and if performance is bad then profile and rewrite the sections that perform poorly.
I'll end on a quote:
"premature emphasis on efficiency is a big mistake which may well be the source of most programming complexity and grief."
- Donald Knuth
This is a classic case of premature optimization. The performance difference is negligible.
I'd say that in general you're better off constructing a string and echoing it at the end, but because it leads to cleaner code (side effects are bad, mkay?) not because of performance.
If you optimize like this, from the ground up, you risk obfuscating your code for no perceptible benefit. If you really want your script to be as fast as possible, then profile it to find out where the real bottlenecks are.
Someone else mentioned that using string concatenation instead of an immediate echo will use more memory. This isn't true unless the size of the string exceeds the size of the output buffer. In any case, to actually echo immediately you'd need to call flush() (perhaps preceded by ob_flush()), which adds the overhead of a function call*. The web server may still keep its own buffer, which would thwart this anyway.
If you're spending a lot of time on each iteration of the loop then it may make sense to echo and flush early so the user isn't kept waiting for the next iteration, but that would be an entirely different question.
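For completeness, a minimal sketch of that "echo and flush early" pattern (whether the client actually sees the chunks early still depends on the web server's own buffering, as noted above):

<?php
for ($i = 0; $i < 5; $i++) {
    echo "chunk $i\n";
    if (ob_get_level() > 0) {
        ob_flush();        // flush PHP's output buffer, if one is active
    }
    flush();               // then push the write buffer to the client
    sleep(1);              // stand-in for slow per-iteration work
}
?>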
Also, the size of the PHP file has no effect on the user - it may take marginally longer to parse but that would be negated by using an opcode cache like APC anyway.
To sum up, while it may be marginally faster to echo each iteration, depending on circumstance, it makes the code harder to maintain (think Wordpress) and it's most likely that your time for optimization would be better spent elsewhere.
* If you're genuinely worried about this level of optimization then a function call isn't to be sniffed at. Flushing in pieces also implies extra protocol overhead.
The size of your PHP file does not increase the size of the download by the user. The output of the PHP file is all that matters to the user.
Generally, you want to do the first option: echo as soon as you have the data. Assuming you are not using output buffering, this means the data can stream to the user while your PHP script is still executing.
The user does not download the PHP file, but only its output, so the second loop has no effect on the user's download size.
It's best not to worry about small optimizations, but instead focus on quickly delivering working software. However, if you want to improve the performance of your site, Yahoo! has done some excellent research: developer.yahoo.com/performance/rules.html
The code you identify as "loop 2" wouldn't result in any larger file size for users to download. String concatenation is faster than calling a function like echo, so I'd go with loop 2. For only 5 iterations of a loop, though, I don't think it really matters all that much.
Overall, there are a lot of other areas to focus on such as compiling PHP instead of running it as a scripted language.
http://phplens.com/lens/php-book/optimizing-debugging-php.php
Your first example would, in theory, be fastest. Because your provided code is so extremely simplistic, I doubt any performance increase over your second example would be noticed or even useful.
In your first example the only variable PHP needs to initialize and utilize is $i.
In your second example PHP must first create an empty string variable. Then create the loop and its variable, $i. Then append the text to $echostr and then finally echo $echostr.

php speed with if statements

So I'm working on a project written in old-style (no OOP) PHP with no full rewrite in the near future. One of the problems with it currently is that it's slow - much of the time is spent requiring over 100 files based on where it is in the boot process.
I was wondering if we could condense this (on deployment, not development, of course) into a single file or two with all the require'd text just built in. However, since there are so many lines of code that aren't used on each page, I'm wondering if doing this would backfire.
At its core, I think, it's a question of whether:
<?php
echo 'hello world!';
?>
is any faster than
<?php
if (FALSE) {
    // thousands of lines of code here
}
echo 'hello world!';
?>
And if so, how much slower?
(Also, if what I've outlined above is a bad idea for some other reasons, please let me know.)
The difference between the two will be negligible. If most of the execution time is currently spent requiring files, you're likely to see a significant boost by using an opcode cache like APC, if you are not doing so already.
Other than that - benchmark, find out exactly where the bottlenecks are. In my experience requires are often the slowest part of an old-style procedural PHP app, but even with many included files I'd be surprised if these all added up to a 'slow' app.
Edit: OK, a quick 'n' dirty benchmark. I created three 'hello world' PHP scripts like the example. The first (basic.php) just echoed the string. The second (complex.php) included an if(FALSE) statement containing ~5000 lines of PHP code pasted in from another app. The third (require.php) included the same if statement but require'd the ~5000 lines of code from another file.
The difference in page generation time (as measured by microtime()) between basic.php and complex.php was around ~0.000004 seconds, so really not significant. Some more comprehensive results from Apache Bench:
                 without APC            with APC
                 req/sec   avg (ms)     req/sec   avg (ms)
basic.php:       7819.87   1.277        6960.49   1.437
complex.php:      346.82   2.883         352.12   2.840
require.php:     6819.24   1.446        5995.49   1.668
APC isn't doing a lot here except using up memory, but it's likely to be a different picture in a real-world app.
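For reference, the microtime()-based timing mentioned above looks roughly like this (a minimal sketch, not the exact benchmark script used):

<?php
$start = microtime(true);
echo 'hello world!';       // the code under test
$elapsed = microtime(true) - $start;
error_log(sprintf('page generated in %.6f s', $elapsed));
?>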
require does have some overhead. 100 requires is probably a lot. Parsing an entire file that contains the 100 includes is probably slow too. The overhead from require might cost you more, but it is hard to say - it might not cost you enough to matter.
All benchmarks are evil, but here is what I did:
ran a single include of a file that was about 8000 lines (each line just declared a variable, nothing useful), and compared it to the time it takes to include an 80-line file (same declarations) 100 times. The results were inconclusive.
Is the including of the files really causing the problem? Is there not something in the script execution that can be optimized? Caching may be an option.
Keep in mind that PHP will parse all the code it sees, even if it's not run.
It will still take relatively long to process a large file, and from experience, lots of code will eat up a considerable amount of memory even though it's never executed.
Opcode caching, as suggested by @Tim, should be your first port of call.
If that is out of the question (e.g. due to server limitations): If the functions are somehow separable into categories, one possibility to make things a bit faster and lighter could be (ab)using PHP's Autoloading by putting the functions into separate files as methods of static classes.
function xyz() { ... }
would become
class generic_tools
{
    public static function xyz() { ... }
}
and any call to xyz() is replaced by generic_tools::xyz();
The call would then trigger the inclusion of (e.g.) generic_tools.class.php on demand, instead of including everything at once.
This would require rewriting the function calls to static method calls, which may be dead easy or a bit more difficult (if function calls are cooked up dynamically or something). But beyond that, no refactoring would be needed, because you're not really using any OOP mechanisms.
How much this will actually help strongly depends on the app's architecture and how intertwined the functions are with each other.
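To make that concrete, here is a minimal autoloader sketch under the assumptions above: one static class per file, files named <classname>.class.php in the script's directory, and PHP 5.1.2+ for spl_autoload_register(). The function name my_autoloader is just a placeholder:

<?php
function my_autoloader($class) {
    $file = dirname(__FILE__) . '/' . $class . '.class.php';
    if (is_file($file)) {
        require $file;     // loaded only when the class is first used
    }
}
spl_autoload_register('my_autoloader');

generic_tools::xyz();      // triggers inclusion of generic_tools.class.php
?>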

Best approach for running an "endless" process monitoring MySQL?

I have a process that has to be run against certain things, and it isn't suitable to be run at the user's end (15+ seconds to process), so I considered using a cron job - but again, this is unsuitable because it will create a backlog. I have narrowed my options down to either running an endless process that monitors for MySQL changes, or configuring MySQL to trigger the script when it detects a change. The latter is not something I want to get into unless it's my only option, which leaves me with the "endless" monitoring option.
The sort of thing I'm considering with PHP is:
while (true) {
    $db->query('SELECT * FROM database');
    while ($row = $db->fetch_assoc()) {
        // do the stuff here
    }
    sleep(5);
}
and then running it via the command line. Now, this is theoretically sound, but in practice it isn't doing as well as I hoped: it uses more resources than I expected (not out of my range, just not what I'm aiming for optimally). So my questions are as follows:
Is PHP the wrong language to do this in? PHP is what I work with, but I understand that there are times when it's the wrong choice and I think maybe this is. If it is, what language should I use?
Is there a better approach that I haven't considered and that isn't any of the ideas I have listed?
If PHP is the correct option, how can I optimise the code I posted, is there a method better than sleeping for 5 seconds after each completed operation?
Thanks in advance! I'm open to any ideas as long as they're not too far out there; I'm running my own server with free rein, so there's practically no limit on what I can do.
I recommend moving the loop out into a shell script and then executing a new PHP process for every iteration. This way PHP will never use unbounded resources (even if there is a memory/connection leak somewhere) since the process is terminated on each iteration. Something like the following should be fine (Bash):
while true; do
    php /path/to/your/script.php 2>&1 | logger ...(logger options)
    sleep 5
done
I've found this approach to be far more robust for long-running scripts in PHP, probably because this is very like the way PHP operates when run as a CGI script.
You should always work with the language you're most familiar with. If this is PHP, then it's not a wrong choice.
Disconnect from the database before sleeping. This way your script won't keep a connection reserved, and it will work fine even after a database restart.
Free the MySQL result after using it. Always check for error conditions in daemonized processes, and deal with them appropriately.
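A sketch of the loop with those suggestions applied. Since the asker's $db wrapper is unknown, this assumes plain mysqli, and the credentials and table name are placeholders:

<?php
while (true) {
    $db = new mysqli('localhost', 'user', 'pass', 'mydb');
    if ($db->connect_error) {                     // check error conditions
        error_log('connect failed: ' . $db->connect_error);
        sleep(5);
        continue;
    }
    $result = $db->query('SELECT * FROM jobs');
    if ($result === false) {
        error_log('query failed: ' . $db->error);
    } else {
        while ($row = $result->fetch_assoc()) {
            // do the stuff here
        }
        $result->free();                          // free the result set
    }
    $db->close();       // don't hold a connection open while sleeping
    sleep(5);
}
?>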
PHP might be the wrong language as it's really designed for serving requests on an ad-hoc basis, rather than creating long-running daemons. (It was originally created as a preprocessor language, then later on came into general use as a web application language.)
Something like Python might work better for your needs; it's a little more naturally designed for "daemon-like" programs.
That said, it is possible to do what you want in PHP.
What kind of problems are you experiencing?
I don't know about the database class you have there in $db, but it could generate a memory leak.
Furthermore, I would suggest closing all your connections and unsetting all your variables (if necessary) at the end of the loop, and re-opening them at the beginning.
Since it's only a 5 second sleep, maybe do this only on every 10th iteration or so - you can keep a counter for that, as sketched below.
These points considered, there's nothing wrong with this approach.
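The counter idea in sketch form; connect() and disconnect() here stand in for whatever the $db wrapper actually provides:

<?php
$i = 0;
while (true) {
    // ... query and process rows ...
    if (++$i % 10 === 0) {   // only every 10th iteration
        $db->disconnect();   // hypothetical wrapper methods
        $db->connect();
    }
    sleep(5);
}
?>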

Reusing MySQL results

I have a somewhat theoretical question: I'm designing my own CMS/app framework (as many PHP programmers on various levels did before... and always will), either to make a production-ready solution or to develop various modules/plugins that I'll use later.
Anyway, I'm thinking of gathering the SQL queries from the whole app and then running them in one place:
index.php:
<?php
include ('latestposts.php');
include ('sidebar.php');
?>
latestposts.php:
<?php
function gather_data ($arg){ $sql= ""; }
function draw ($data) {...}
?>
sidebar.php:
<?php
function gather_data ($arg){ $sql= ""; }
function draw ($data) {...}
?>
Now, while the whole module system is yet to be figured out, its idea is already floating somewhere in my brain. However, I'm wondering: if I'm able to first run all the gather_data functions, then run the SQL, and then run the draw functions - am I able to reuse results?
If, for example, $sql is SELECT * FROM POSTS LIMIT 10 and $sql2 is SELECT * FROM POSTS LIMIT 5, is it possible to program PHP to see: "ah, it's the same SQL, I'll call it just once and reuse the first 5 rows"?
Or is it possible to add this behavior to some DRM?
However, as the tags say, this is still just an idea in progress. If it proves to be easy to accomplish, I will post more questions about how :)
So, basically: is it possible, and does it make sense? If both are yes, then... any ideas how?
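As a toy sketch of the idea (class and method names invented; $db assumed to be a mysqli connection), a per-request cache could answer the LIMIT 5 query from an already-fetched LIMIT 10 result like this:

<?php
class ResultCache
{
    private $cache = array();    // base SQL => fetched rows

    public function fetch(mysqli $db, $baseSql, $limit)
    {
        $have = isset($this->cache[$baseSql])
              ? count($this->cache[$baseSql]) : 0;
        if ($have < $limit) {    // not enough rows cached: (re)query
            $rows = array();
            $result = $db->query($baseSql . ' LIMIT ' . (int)$limit);
            while ($row = $result->fetch_assoc()) {
                $rows[] = $row;
            }
            $this->cache[$baseSql] = $rows;
        }
        return array_slice($this->cache[$baseSql], 0, $limit);
    }
}

// $cache->fetch($db, 'SELECT * FROM POSTS', 10);  // hits MySQL
// $cache->fetch($db, 'SELECT * FROM POSTS', 5);   // reuses first 5 rows
?>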
Don't get me wrong, that sounds like a plausible idea and you can probably get it running. But I wonder if it is really going to be beneficial. Will it cause a system to be faster? Give you more control? Make development easier?
I would just look into using (or building) a system using well practiced MVC style coding standards, build a good DB structure, and tweak the heck out of Apache (or use something like Lighttpd). You will have a lot more widespread acceptance of your code if you ever decide to make it open source, and if you ever need a hand with it another developer could step right in and pick up the keyboard.
Also, check out query caching in MySQL - you will see a similar (though not one-to-one) benefit from caching your query results server-side with regard to your query example. Even better, it is stored in server memory, so PHP/MySQL overhead is reduced AND you don't have to code it.
All of that aside, I do think it is possible. =)
Generally speaking, such a cache system can generate significant time savings, but at the cost of memory and complexity. The more results you want to keep, the more memory it will take; and there's no guarantee that your results will ever be used again, particularly the larger result sets.
Second, there are certain queries that should never be cached, or that should be run again even if they're in the cache. For the most part, only SELECT and SHOW queries can be cached effectively, but you need to worry about invalidating them when you modify the underlying data. Even in the same pageview, you might find yourself working around your own cache system on occasion.
Third, this kind of problem has already been solved several times. First, consider turning on the MySQL query cache. Most of the time, it will speed things up a bit without requiring any code changes on your end. However, it's a bit aggressive about invalidating entries, so you could gain some performance at a higher level.
If you need another level, consider memcached. You'll have to store and invalidate entries manually, but it can store results across page views (where you'll really find the performance benefit), and will let unused entries expire before running out of memory.
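A hedged sketch of the memcached suggestion, assuming the pecl/memcached extension and a memcached server on localhost (key name, credentials and query are placeholders):

<?php
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$key  = 'posts:latest:10';
$rows = $mc->get($key);

if ($rows === false) {                    // cache miss: hit MySQL
    $db = new mysqli('localhost', 'user', 'pass', 'mydb');
    $result = $db->query('SELECT * FROM posts ORDER BY id DESC LIMIT 10');
    $rows = array();
    while ($row = $result->fetch_assoc()) {
        $rows[] = $row;
    }
    $mc->set($key, $rows, 60);            // let it expire after 60 seconds
}

// when a post is added or edited, invalidate by hand:
// $mc->delete('posts:latest:10');
?>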

Singular Value Decomposition (SVD) in PHP

I would like to implement Singular Value Decomposition (SVD) in PHP. I know that there are several external libraries which could do this for me. But I have two questions concerning PHP, though:
1) Do you think it's possible and/or reasonable to code the SVD in PHP?
2) If (1) is yes: Can you help me to code it in PHP?
I've already coded some parts of the SVD myself. Here's the code, in which I've added comments describing my course of action. Some parts of this code aren't completely correct.
It would be great if you could help me. Thank you very much in advance!
SVD-python is a very clear, parsimonious implementation of the SVD. It's practically pseudocode and should be fairly easy to understand and compare/draw on for your PHP implementation, even if you don't know much Python.
That said, as others have mentioned, I wouldn't expect to be able to do very heavy-duty LSA with a PHP implementation on what sounds like a pretty limited web host.
Cheers
Edit:
The module above doesn't do anything all by itself, but there is an example included in the opening comments. Assuming you downloaded the Python module and it is accessible (e.g. in the same folder), you could implement a trivial example as follows:
#!/usr/bin/python
import svd

a = [[22., 10.,  2.,   3.,  7.],
     [14.,  7., 10.,   0.,  8.],
     [-1., 13., -1., -11.,  3.],
     [-3., -2., 13.,  -2.,  4.],
     [ 9.,  8.,  1.,  -2.,  4.],
     [ 9.,  1., -7.,   5., -1.],
     [ 2., -6.,  6.,   5.,  1.],
     [ 4.,  5.,  0.,  -2.,  2.]]

u, w, vt = svd.svd(a)
print w
Here 'w' contains your list of singular values.
Of course this only gets you part of the way to latent semantic analysis and its relatives. You usually want to reduce the number of singular values, then employ some appropriate distance metric to measure the similarity between your documents, or words, or documents and words, etc. The cosine of the angle between your resultant vectors is pretty popular.
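Since the question is about PHP, here is a minimal sketch of that cosine measure (plain arrays standing in for your reduced vectors):

<?php
function cosine_similarity(array $a, array $b) {
    $dot = 0.0; $na = 0.0; $nb = 0.0;
    foreach ($a as $i => $v) {
        $dot += $v * $b[$i];
        $na  += $v * $v;
        $nb  += $b[$i] * $b[$i];
    }
    return $dot / (sqrt($na) * sqrt($nb));
}

echo cosine_similarity(array(1.0, 2.0, 3.0), array(2.0, 4.0, 6.0)); // 1
?>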
Latent Semantic Mapping (pdf) is by far the clearest, most concise and informative paper I've read on the remaining steps you need to work out following the SVD.
Edit 2: Also note that if you're working with very large term-document matrices (I'm assuming this is what you are doing), it is almost certainly going to be far more efficient to perform the decomposition offline and then perform only the comparisons live, in response to requests. While svd-python is great for learning, SVDLIBC is more what you would want for such heavy computation.
Finally, as mentioned in the Bellegarda paper above, remember that you don't have to recompute the SVD every single time you get a new document or request. Depending on what you are trying to do, you could probably get away with performing the SVD once a week or so, offline on a local machine, and then uploading the results (size/bandwidth concerns notwithstanding).
Anyway, good luck!
Be careful when you say "I don't care what the time limits are". SVD is an O(N^3) operation (or O(MN^2) for a rectangular m*n matrix), which means you can very easily end up in a situation where your problem takes a very long time. If the 100*100 case takes one minute, the 1000*1000 case would take 10^3 minutes - nearly 17 hours - and probably worse, realistically, as you're likely to fall out of cache. With something like PHP, the prefactor - the constant multiplying the N^3 when calculating the required FLOP count - can be very, very large.
Having said that, of course it's possible to code it in PHP -- the language has the required data structures and operations.
I know this is an old question, but here's my 2 bits:
1) A true SVD is much slower than the calculus-inspired approximations used, e.g., in the Netflix Prize. See: http://www.sifter.org/~simon/journal/20061211.html
There's an implementation (in C) here:
http://www.timelydevelopment.com/demos/NetflixPrize.aspx
2) C would be faster but PHP can certainly do it.
PHP Architect author Cal Evans: "PHP is a web scripting language... [but] I’ve used PHP as a scripting language for writing the DOS equivalent of BATCH files or the Linux equivalent of shell scripts. I’ve found that most of what I need to do can be accomplished from within PHP. There is even a project to allow you to build desktop applications via PHP, the PHP-GTK project."
Regarding question 1: It definitely is possible. Whether it's reasonable depends on your scenario: How big are your matrices? How often do you intend to run the code? Is it run in a web site or from the command line?
If you do care about speed, I would suggest writing a simple extension that wraps calls to the GNU Scientific Library.
Yes, it's possible, but implementing SVD in PHP isn't the optimal approach. As you can see here, PHP is slower than C and also slower than C++, so it may be better to implement it in one of those languages and call it from PHP as an external program to get your results. You can find an implementation of the algorithm here, so you can guide yourself through it.
For calling the external program you can use:
The exec() Function
The system function is quite useful and powerful, but one of the biggest problems with it is that all resulting text from the program goes directly to the output stream. There will be situations where you might like to format the resulting text and display it in some different way, or not display it at all. For that, exec() is the better fit: it hands the program's output back to you, one line per element, in an array passed by reference.
The system() Function
The system function in PHP takes a string argument with the command to execute as well as any arguments you wish passed to that command. This function executes the specified command, and dumps any resulting text to the output stream (either the HTTP output in a web server situation, or the console if you are running PHP as a command line tool). The return of this function is the last line of output from the program, if it emits text output.
The passthru() Function
One fascinating function that PHP provides similar to those we have seen so far is the passthru function. This function, like the others, executes the program you tell it to. However, it then proceeds to immediately send the raw output from this program to the output stream with which PHP is currently working (i.e. either HTTP in a web server scenario, or the shell in a command line version of PHP).
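As a sketch of tying this together (svd_tool and its one-value-per-line output are pure assumptions for illustration):

<?php
$cmd = escapeshellcmd('./svd_tool') . ' '
     . escapeshellarg('/tmp/matrix.txt');

$output = array();
$status = 0;
exec($cmd, $output, $status);   // $output collects each line of stdout

if ($status === 0) {
    $singularValues = array_map('floatval', $output);
    print_r($singularValues);
} else {
    echo "svd_tool failed with exit code $status\n";
}
?>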
Yes, this is perfectly possible to implement in PHP.
I don't know what a reasonable execution time frame would be, or how large a matrix it could handle; I would probably have to implement the algorithm to get a rough idea.
Yes, I can help you code it. But why do you need help? Doesn't the code you wrote work?
Just as an aside question: what version of PHP do you use?
