What's the difference between handling gzip with PHP and Apache? - php

How can we handle compression in both? Whose responsibility is it? Which one is better for compressing files, Apache or PHP?
PHP compression code:
ob_start("ob_gzhandler");
Or the apache one:
AddOutputFilterByType DEFLATE text/html text/plain text/xml
Is it right that requests first reach Apache and then PHP? If so, can we infer that we should use the Apache approach?

Well here is what I know, presented in a pros and cons way.
Apache:
The .htaccess code will always be executed faster, because servers cache .htaccess files by default.
With .htaccess, you can define custom rules for individual folders and the server will automatically pick them up.
With PHP, you cannot write everything in one place. There are many other things your .htaccess should have besides compression: a charset, expiry/cache control, most likely a few URL rewrite rules, permissions, robot-specific (Googlebot etc.) stuff.
As far as I know, you cannot do all of this solely with PHP, and since you may need to get all of this done anyway, I don't see why you should combine the two. A sketch of such a consolidated .htaccess file is shown at the end of this section.
I have always relied on .htaccess or server-level configuration to control the aspects enumerated above, and have rarely ever had a problem.
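For illustration, here is a minimal .htaccess sketch of such a consolidated file. Which parts actually take effect depends on the modules (mod_deflate, mod_expires, mod_rewrite) enabled on your host, and the rewrite rule is only a hypothetical example:
# Compression (mod_deflate)
<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/javascript
</IfModule>
# Charset
AddDefaultCharset UTF-8
# Expiry / cache control (mod_expires)
<IfModule mod_expires.c>
    ExpiresActive On
    ExpiresByType text/css "access plus 1 month"
    ExpiresByType application/javascript "access plus 1 month"
</IfModule>
# URL rewriting (mod_rewrite) - hypothetical example rule
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteRule ^article/([0-9]+)$ index.php?id=$1 [L,QSA]
</IfModule>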
PHP:
Perhaps a bit more hassle-free. With .htaccess files on shared hosting plans you are rather limited and you might run into tedious problems.
Some servers won't pick up certain commands, and some (like 1and1) have a default configuration that messes with your settings (and nerves).
Probably easier to use for someone who is less of a tech person.
Overall, Apache is the winner. That's what I would go with all the time!

I don't see why either of the two should be faster, but do keep in mind that Apache can also compress CSS and JS files. You don't want to parse these files with PHP just to compress them before you deliver them to the browser.
So I would suggest using the Apache method.

In my company we usually use gzip compression on static resources. Apache asks PHP to process those resources (if necessary), then it compresses the output. I would say that it is faster in theory (C and C++ are faster than PHP) and 'safer' to use Apache compression.
NB: Safer here means that the whole page is going to be compressed, whereas with the ob_start function you can forget to compress part of your web page.

You would have to run your own tests to see which is faster, but I don't believe there should be any difference in how the content is served. Using PHP, you have to handle the output buffering on your own, which may be more difficult. It's more transparent with the Apache method.
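As a minimal sketch of what handling it yourself in PHP involves, the ob_start() call has to run before any output is sent, on every entry script; a shared bootstrap include (the file name here is hypothetical) is one way to avoid forgetting a page:
<?php
// bootstrap.php - hypothetically included at the top of every entry script,
// before any output. ob_gzhandler inspects the client's Accept-Encoding header
// itself and sends uncompressed output when the browser does not accept gzip.
if (extension_loaded('zlib') && !ini_get('zlib.output_compression')) {
    ob_start('ob_gzhandler');
}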

Apache is better since it avoids PHP memory-limit errors and is faster because it runs compiled code rather than PHP's interpreted code. It is also more meaningful to do compression in a different layer than PHP.

Related

PHP Include for static remote HTML files on a CDN

I have an app that creates static HTML files. The files are intended to be hosted on a remote CDN; they'd be standard .html files.
I am wondering two things:
Is it possible to do a PHP include on these files?
Can you possibly have good performance doing it this way?
Can it be done?
To answer the question directly, yes, you technically can include a remote file using the PHP include function. In order to do this, you simply need to set the allow_url_include directive to On in your php.ini.
Depending on exactly what you intend to use this for, I would also encourage you to look at file_get_contents.
To enable remote files for file_get_contents, you will need to set allow_url_fopen to On.
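As a rough sketch of that route (the CDN URL and local cache path below are hypothetical), fetching the remote fragment with file_get_contents and keeping a local fallback might look like this, assuming allow_url_fopen is On:
<?php
// Pull the static fragment from the CDN; fall back to a local copy on failure.
$html = @file_get_contents('https://cdn.example.com/fragments/header.html');
if ($html === false) {
    $html = file_get_contents(__DIR__ . '/cache/header.html');
}
echo $html;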
Should it be done?
To answer your second question directly, there are many factors that will determine whether you will get good performance, but all in all, it is unlikely to make a dramatic difference to performance.
However, there are other considerations:
From a security perspective, it is ill-advised to enable either of these directives
By delivering the file from your server instead of the CDN you will be negating all of the benefits of the CDN (see below)
Is it really necessary?
CDNs
A frequent misunderstanding when it comes to CDNs is that all they do is serve your data from a closer location, thus making the request slightly faster... This is wrong!
There are endless benefits to CDNs, but I have listed a few below (it obviously depends on the configuration and provider):
They strip out unnecessary headers
No cookies are sent as the CDN tends to be on a different, thus cookie-free, domain
They handle compression
They deliver your content from the nearest location
They handle caching
... and a lot more
By serving the file from your server, you will lose all of the above benefits, unless, of course, you set the server up to handle requests in the same way (this can take time).
To conclude: personally, I would avoid including your .html files into PHP remotely and just serve them directly to the client from the CDN.
To see how you can further optimise your site, and to see the many benefits that most CDNs offer, take a look at GTMetrix.

Mod_gzip and PHP output buffering redundant?

I'm just learning ways to compress my website and I came upon two methods. One of them is Apache's mod_gzip module and the other is PHP's output buffering. Since they both compress files, would using both methods be redundant and unnecessary? Or is there a distinction between the two methods I should know about?
Yes, the two methods are redundant to one another. Between the two, mod_gzip is generally easier to set up and more comprehensive -- as it's part of Apache, mod_gzip can compress content (like static HTML, Javascript, and CSS) which isn't served through PHP.
Note: PHP's output buffering doesn't necessarily compress output. But yes, if you have it configured to do so, then it and mod_gzip/mod_deflate are redundant. In most cases, I prefer to leave the compression to Apache, since it will also compress things that aren't run through PHP, like JS, CSS, XML, text, etc.

Need some thoughts and advice if I need to do anything more to improve performance of my webapp

I'm working on a webapp that uses a lot of ajax to display data, and I'm wondering if I could get any advice on what else I could do to speed up the app, reduce bandwidth, etc.
I'm using PHP, MySQL, FreeBSD, Apache, and Tomcat for my environment. I own the server and have full access to all config files, etc.
I have gzip/deflate compression turned on in the Apache httpd.conf file. I have obfuscated and minified all the .js and .css files.
My webapp works in this general manner: after login the user lands on the index.php page. All links on the index page are ajax calls to a .php class function that retrieves the HTML as a string and displays it inside a div somewhere on the main index.php page.
Most of the functions returning the html are returning strings like:
<table>
<tr>
<td>Data here</td>
</tr>
</table>
I don't return the full "<html><head>" stuff, because it already exists in the main index.php page.
However, the HTML strings returned are formatted with tabs, spaces, comments, etc. for easy reading of the code. Should I take the time to minify these pages and remove the tabs, comments, and spaces? Or is minifying the .php pages negligible because they're on the server?
I guess I'm trying to figure out if the way I've structured the webapp is going to cause bandwidth issues, and whether reducing the .php class file sizes would improve performance. Most of the .php classes are 40-50 KB, with the largest being 99 KB.
For speed, I have thought about using memcache, but I don't really know if adding it after the fact is worth it, and I don't quite know how to implement it. I don't know if there is any caching turned on on the server... I guess I have left that up to the browser... I'm not very well versed in the caching arena.
Right now the site doesn't appear slow, but I'm the only user... I'm just wondering if it's worth the extra effort.
Any advice, or articles would be appreciated.
Thanks in advance.
My recommendation would be NOT to send the HTML over the AJAX calls. Instead, send just the underlying data (the "Data here" part) as JSON, then process that data with a client-side function that decorates it with the right HTML and injects it into the DOM. This will drastically speed up the Ajax calls.
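As a rough sketch of the server side of this idea (the endpoint name and field names are hypothetical), the .php class method would emit JSON instead of a formatted table fragment:
<?php
// get_rows.php - hypothetical AJAX endpoint returning only the raw data
header('Content-Type: application/json');
$rows = array(
    array('id' => 1, 'name' => 'Data here'),
    array('id' => 2, 'name' => 'More data'),
);
echo json_encode($rows);
The client-side JavaScript then loops over the array and builds the table markup itself, so the markup ships once in your .js file instead of in every response.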
Memcache provides an API that allows you to cache data. What you additionally need (and what is, in my opinion, more important) is a strategy for what to cache and when to invalidate the cache. This cannot be determined by looking at the source code; it comes from how your site is used.
However, an opcode cache (e.g. APC) could be used right away.
A code beautifier is for humans, not for machines.
As part of the optimization you should take the formatting out.
Or simply add a flag check in your application: when a certain condition is met (like debug mode), return nicely formatted output; otherwise strip it, since whitespace means nothing to the machine.
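A minimal sketch of that flag idea (DEBUG_MODE is a hypothetical constant defined by your application) is to strip the whitespace with an output-buffer callback only outside of debug mode:
<?php
if (!defined('DEBUG_MODE') || !DEBUG_MODE) {
    // Collapse whitespace between tags before the fragment is sent.
    // Crude on purpose: it would also touch content inside <pre> blocks.
    ob_start(function ($html) {
        return preg_replace('/>\s+</', '><', $html);
    });
}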
APC
You should always use APC to compile and cache PHP scripts into opcode.
Why?
Changes are rarely made after deployment.
If every script is already opcode, your server does not need to compile plain-text scripts into opcode on the fly.
Compile once, use many times.
What are the benefits?
Fewer execution cycles spent compiling plain-text scripts.
Less memory consumed (the two are related).
Simple math: if a request is served in 2 seconds in your current environment and with APC it is served in 0.5 seconds, you gain 4 times the performance; in the 2 seconds it used to take, you can now serve 4 requests. That means where you could previously fit 50 concurrent users, you can now allow 200 concurrent users.
Memcache - NO GO?
It depends. In a single-host environment you probably won't gain much. The biggest advantage of memcache is information sharing and distribution (which means a multiple-server environment: cache once, use many times).
etc?
Serve static files with an expiration header (prime-cache concept: no request is the fastest request, and it saves bandwidth).
Cache your expensive requests in memcache/disk cache or even the database (expensive requests such as report/statistics generation); see the sketch after this list.
Always review your code for optimization opportunities (but do not overdo it).
Always benchmark and compare the results (before and after).
Fine-tune your Apache/Tomcat configuration.
Consider recompiling PHP with the minimum libraries/extensions and loading the necessary libraries at run time only (for example, if the application uses mysqli and not PDO, there is no reason to keep PDO).
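For the "cache your expensive request" point above, a minimal sketch using the Memcached extension (the cache key and the build_monthly_report() function are hypothetical, and a memcached server is assumed on localhost) could look like this:
<?php
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$key    = 'report:monthly';
$report = $cache->get($key);
if ($report === false) {
    // Cache miss: run the expensive query/aggregation once...
    $report = build_monthly_report();
    // ...and keep the result for 10 minutes so later requests skip the work.
    $cache->set($key, $report, 600);
}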

SSI or PHP Include()?

Basically I am launching a site soon and I predict a lot of traffic. For scenario's sake, let's say I will have 1m uniques a day. The data will be static but I need to have includes as well.
I will only include an HTML page inside another HTML page, nothing dynamic (I have my reasons, which I won't disclose to keep this simple).
My question is: performance-wise, which is faster?
<!--#include virtual="page.htm" -->
or
<?php include 'page.htm'; ?>
Performance-wise, the fastest option is storing the templates elsewhere, generating the full HTML up front, and regenerating it when your templates change.
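A minimal sketch of that pre-generation approach (the template and output paths are hypothetical) is to stitch the pieces together once, from a deploy or cron script, and serve the resulting flat .html file:
<?php
// build_pages.php - run whenever a template changes, not on every request
$header = file_get_contents(__DIR__ . '/templates/header.htm');
$body   = file_get_contents(__DIR__ . '/templates/page.htm');
$footer = file_get_contents(__DIR__ . '/templates/footer.htm');
file_put_contents(__DIR__ . '/public/page.html', $header . $body . $footer);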
If you really want a comparison between PHP and SSI, I guess SSI is probably faster, and more importantly, not loading PHP is a lot lighter on the RAM needed by the web server's processes/threads, thereby enabling you to have more Apache threads/processes to serve requests.
SSI is built into Apache, while Apache has to spawn a PHP process to handle .php files, so I would expect SSI to be somewhat faster and lighter.
I'll agree with the previous answer, though, that going the PHP route will give you more flexibility to change in the future.
Really, any speed difference that exists is likely to be insignificant in the big picture.
Perhaps you should look into HipHop for PHP, which compiles PHP into C++. Since C++ is compiled, it's way faster. Facebook uses it to reduce the load on their servers.
https://github.com/facebook/hiphop-php/wiki/
I don't think anyone can answer this definitively for you. It depends on your web server configuration, operating system and filesystem choices, complexity of your SSI usage, other competing processes on your server, etc.
You should put together some sample files and run tests on the server you intend to deploy on. Use some http testing tools such as ab or siege or httperf or jmeter to generate some load and compare the two approaches. That's the best way to get an answer that's correct for your environment.
Using PHP with mod_php and an opcode cache like APC might be very quick because it would cache high-demand files automatically. If you turn off apc.stat it won't have to hit the disk at all to serve the PHP script (with the caveat that this makes it harder to update the PHP script on a running system).
You should also make sure you follow other high-scalability best practices. Use a CDN for static resources, optimize your scripts and stylesheets, etc. Get books by Steve Souders and Theo & George Schlossnagle and read them cover to cover.
I suggest you use a web cache like Squid or, for something more sophisticated, Oracle Web Cache.

How reliable are URIs like /index.php/seo_path

I noticed, that sometimes (especially where mod_rewrite is not available) this path scheme is used:
http://host/path/index.php/clean_url_here
--------------------------^
This seems to work, at least in Apache, where index.php is called, and one can query the /clean_url_here part via $_SERVER['PATH_INFO']. PHP even kind of advertises this feature. Also, e.g., the CodeIgniter framework uses this technique as default for their URLs.
The question: How reliable is the technique? Are there situations, where Apache doesn't call index.php but tries to resolve the path? What about lighttpd, nginx, IIS, AOLServer?
A ServerFault question? I think it's got more to do with using this feature inside PHP code. Therefore I ask here.
Addendum: As suggested by VolkerK, a reasonable extension to this question is: How can a programmer influence the existence of $_SERVER['PATH_INFO'] on various server types?
I think this is a question that is equally suited for Stack Overflow and Server Fault. E.g. I as a developer can only tell you that PATH_INFO is as trustworthy as any other user input (which means it can contain virtually anything; see the validation sketch at the end of this answer), and your script may or may not receive it depending on the web server version and configuration:
Apache: AcceptPathInfo
IIS: e.g. AllowPathInfoForScriptMappings and others
and so on and on...
But server admins can possibly tell you which settings you can expect "in the real world" and why those settings are preferred.
So the question becomes: how much influence do you (or the expected user base) have on the server configuration?
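As a minimal sketch of treating PATH_INFO like any other user input (the allowed-character pattern below is hypothetical and should match whatever your clean URLs actually look like), you might check for it and validate it before routing on it:
<?php
// PATH_INFO may be absent entirely, and when present it is client-controlled,
// so whitelist the characters you expect before using it.
$path = isset($_SERVER['PATH_INFO']) ? $_SERVER['PATH_INFO'] : '';
if (!preg_match('#^(/[a-z0-9_-]+)*/?$#i', $path)) {
    header('HTTP/1.1 404 Not Found');
    exit;
}
// safe to route on $path from here on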
AcceptPathInfo needs to be enabled in order to have this working.
From my experience I'd say PATH_INFO is usually available in normal web hosting environments and server setups - even on IIS - but on rare occasions, it is not. When building an application that is supposed to be deployable on as many platforms as possible, I would not trust path_info on a hard-coded level.
Whenever I can, I try to build a wrapper function build_url() that, depending on a configuration setting, uses one of the following (sketched below):
the raw URL www.example.com/index.php?clean_url=clean_url_here
the path_info mechanism www.example.com/index.php/clean_url
mod_rewrite www.example.com/clean_url
and use that in all URLs the application emits.
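A minimal sketch of such a wrapper (the URL_MODE constant, the mode names, and the base URL are hypothetical; a real application would read its own configuration):
<?php
function build_url($clean_url) {
    $base = 'http://www.example.com';
    switch (URL_MODE) {            // hypothetical constant set from your app configuration
        case 'rewrite':            // mod_rewrite available
            return $base . '/' . $clean_url;
        case 'pathinfo':           // PATH_INFO available
            return $base . '/index.php/' . $clean_url;
        default:                   // plain query string, works everywhere
            return $base . '/index.php?clean_url=' . urlencode($clean_url);
    }
}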
There might be naive scripts (auto-linking, for example) that do not recognize this URL format, thereby decreasing the chance that links to your content will be created.
Since home-grown regular expression patterns are common for these tasks, the chance of failure is quite real.
Technically, those URLs are fine; SEO-wise, they are 'less perfect'.
