PHP 5.5 FastCGI Caching

I've implemented FastCGI caching on our site and have seen great speed improvements. However, the FastCGI cache key does not seem to be unique enough. If I log in, my name appears in the header. However, the next person to log in still sees my name in the header, assuming the cache is still valid.
Is there a way to make the cache key unique on a per-user basis, ideally using a unique identifier from the user's cookies or a PHP session? I tried implementing the answer below, but Nginx failed to restart.
Log in value from Set-Cookie header in nginx
Note my cache key looks like this:
fastcgi_cache_key "$scheme$request_method$host$request_uri";
Update:
My thought is that if I can parse the HTTP headers sent to Nginx, then I can grab the PHP session ID and use that. However, I cannot find an example of how to do this anywhere. Right now I have something like this, which doesn't work.
http_cookie ~* PHPSESSID=([0-9a-z]+) {
    set $ses_id $1;
}

I was able to solve the above problem using the Nginx ngx_http_userid_module. The hardest part was actually finding the module; implementing the solution was quite trivial.
I used their example configuration:
userid on;
userid_name uid;
userid_domain example.com;
userid_path /;
userid_expires 365d;
userid_p3p 'policyref="/w3c/p3p.xml", CP="CUR ADM OUR NOR STA NID"';
And then added the userid to my FastCGI cache key:
fastcgi_cache_key "$scheme$request_method$host$request_uri$uid_got";
Hopefully this answer helps someone discover this useful module quicker than I did.
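For anyone who would rather key the cache on the PHP session cookie directly (as the update above attempted), nginx's map directive can do the extraction; a minimal, untested sketch, with $php_session_id chosen purely for illustration:
# Extract the PHPSESSID cookie value into $php_session_id (empty if absent).
map $http_cookie $php_session_id {
    default "";
    "~*PHPSESSID=(?<sid>[0-9a-z]+)" $sid;
}

fastcgi_cache_key "$scheme$request_method$host$request_uri$php_session_id";
Note that keying on the raw session ID means every visitor with a session gets their own cache entry, so the hit rate drops accordingly.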

Related

Nginx to cache only for bots

I have a decent website (nginx -> apache -> mod_php/mysql) that I want to tune a bit, and I find the biggest problem is that search bots tend to overload it by sending many requests at once.
There is a cache in the site's core (that is, in PHP), so the site's author reported there should be no problem, but in fact the bottleneck is that Apache's response time becomes too long when there are too many requests for a page.
What I can imagine is having some nginx-based cache that caches pages only for bots. The TTL can be fairly high (there is nothing so dynamic on the page that it can't wait another 5-10 minutes to be refreshed). Let's define a 'bot' as any client that has 'Bot' in its UA string ('BingBot' as an example).
So I tried to do something like this:
map $http_user_agent $isCache {
    default 0;
    ~*(google|bing|msnbot) 1;
}

proxy_cache_path /path/to/cache levels=1:2 keys_zone=my_cache:10m max_size=10g inactive=60m use_temp_path=off;

server {
    ...

    location / {
        proxy_cache my_cache;
        proxy_cache_bypass $isCache;
        proxy_cache_min_uses 3;
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
        proxy_cache_lock on;
        proxy_pass http://my_upstream;
    }

    # location for images goes here
}
Am I right with my approach? It looks like it won't work.
Are there any other approaches to limit the load from bots? Preferably without sending 5xx codes to them (as search engines can lower rankings for sites that return too many 5xx responses).
Thank you!
If your content pages can differ per user (say a user is logged in and the page contains "Welcome John Doe"), then that version of the page may end up cached, because each request updates the cached copy (i.e. a logged-in person will update the cached version, potentially including their session cookies, which is bad).
It is best to do something similar to the following:
map $http_user_agent $isNotBot {
    ~*bot "";
    default "IAmNotARobot";
}

server {
    ...

    location / {
        ...
        # Bypass the cache for humans
        proxy_cache_bypass $isNotBot;
        # Don't cache copies of requests from humans
        proxy_no_cache $isNotBot;
        ...
    }
    ...
}
This way, only requests from bots are cached for future bot requests, and only bots are served cached pages.
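Putting the two pieces together, here is a rough sketch of how the inverted map might combine with the cache directives from the question (the zone name, cache path, and upstream name are carried over from the question and are only illustrative):
map $http_user_agent $isNotBot {
    ~*bot "";
    default "IAmNotARobot";
}

proxy_cache_path /path/to/cache levels=1:2 keys_zone=my_cache:10m max_size=10g inactive=60m use_temp_path=off;

server {
    location / {
        proxy_cache my_cache;
        # Humans bypass the cache and their responses are never stored
        proxy_cache_bypass $isNotBot;
        proxy_no_cache $isNotBot;
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
        proxy_cache_lock on;
        proxy_pass http://my_upstream;
    }
}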

Limit Page to One Simultaneous Request

Using PHP or Apache (I'd also consider moving to Nginx if necessary), how can I make a page run only once at a time? If three requests come in at the same time, one should complete entirely, then the next, and then the next. At no time should the page be accessed by more than one request at a time.
Think of it like transactions! How can I achieve this? It should be one lock per page per server, not per user or IP.
What do you want to do with the "busy" state on the server: return an error right away, or keep requests waiting until the previous one finishes?
If you just want the server to refuse content to the client, you can do it on both nginx and apache:
limit_req module for nginx
using mod_security on apache
The "tricky" part in your request is not to limit by an IP as people usually want, but globally per URI. I know it should be possible with mod_security bud I didn't do that myself but I have this configuration working for nginx:
http {
    # create a zone keyed by $packageid (my internal variable, similar to a vhost)
    limit_req_zone $packageid zone=heavy-stuff:10m rate=1r/s;
}
then later:
server {
    set $packageid some-id;

    location = /some-heavy-stuff {
        limit_req zone=heavy-stuff burst=1 nodelay;
    }
}
What it does for me is create a separate rate-limit counter for each of my servers (keyed by $packageid). The zone is then used to count requests and allow only one per second.
Hope it helps
If the same user sends the requests from the same browser session, use session_start(); PHP's session locking will block the other requests until the first request finishes.
Example:
http://codingexplained.com/coding/php/solving-concurrent-request-blocking-in-php
If you want to block requests coming from different browsers/clients, keep the entries in a database and process them one by one.
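A rough sketch of the cross-client case on a single server: an exclusive file lock acts as a global mutex, so concurrent requests queue up behind each other. The lock file path is arbitrary, and the finally block assumes PHP 5.5+:
<?php
// Serialize execution of this page across all requests on one server
// by taking an exclusive lock on a shared lock file.
$lock = fopen('/tmp/page.lock', 'c'); // arbitrary lock file path
if ($lock === false) {
    http_response_code(500);
    exit('Could not open lock file');
}

// flock() blocks here until the previous request releases the lock.
if (!flock($lock, LOCK_EX)) {
    http_response_code(503);
    exit('Could not acquire lock');
}

try {
    // ... the work that must not run concurrently ...
} finally {
    flock($lock, LOCK_UN);
    fclose($lock);
}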

Varnish doesn't cache without expire header

On my server I have Varnish (caching) running on port 80 with Apache on 8080.
Varnish caches very well when I set the headers like below:
$this->getResponse()->setHeader('Expires', '', true);
$this->getResponse()->setHeader('Cache-Control', 'public', true);
$this->getResponse()->setHeader('Cache-Control', 'max-age=2592000');
$this->getResponse()->setHeader('Pragma', '', true);
But this means people cache my website on the client side without ever retrieving a new version when one is available.
When I remove the headers, people retrieve a new version on every page reload (so Varnish never caches).
I can not figure out what goes wrong here.
My ideal situation is people don't cache the html on the client side but leave that up to Varnish.
My ideal situation is people don't cache the html on the client side but leave that up to Varnish.
What you want is for Varnish to cache the resource and serve it to clients, and only generate a new version if something has changed. The easiest way to do this is to have Varnish cache it for a long time, and invalidate the entry in Varnish (with a PURGE request) when that something changes.
By default, Varnish bases its cache rules on the headers the back-end supplies. So, if your PHP code generates the headers you described, the default Varnish VCL will adjust its caching strategy accordingly. However, it can only do this in a generalized, safe way (e.g. if you use a cookie, it will never cache). You know how your back-end works, so you should change the caching behavior of Varnish not by sending different headers from the back-end, but by writing a Varnish .vcl file. Tell Varnish to cache the resource for a long time even though the Cache-Control max-age or Expires headers are missing (set the time to live, beresp.ttl, in your .vcl file). Varnish will then serve the cached entry until the TTL has passed or you purge the entry.
If you've got this working, there's a more advanced option: cache the resource on the client but have the client revalidate it every time it wants to use it. A browser does this with a conditional GET carrying an If-Modified-Since header (your response should include a Last-Modified header to provoke this behavior) or an If-None-Match header (your response should include an ETag header to provoke this behavior). This saves bandwidth because Varnish can respond with a 304 Not Modified response, without sending the whole resource again.
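To make that concrete, and assuming Varnish 3.x VCL syntax, forcing a long TTL for HTML and accepting PURGE requests from localhost might look roughly like this (the ACL, the 24h TTL, and the Content-Type check are only illustrative):
acl purgers {
    "127.0.0.1";
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purgers) {
            error 405 "Not allowed";
        }
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged";
    }
}

sub vcl_fetch {
    # Cache HTML in Varnish for a day even if the backend sends no
    # Expires/Cache-Control headers; purge explicitly when content changes.
    if (beresp.http.Content-Type ~ "text/html") {
        set beresp.ttl = 24h;
    }
}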
Simplest approach is just to turn down the max-age to something more reasonable. Currently, you have it set to 30 days. Try setting it to 15 minutes:
$this->getResponse()->setHeader('Cache-Control', 'max-age=900');
Web caching is a somewhat complicated topic, exacerbated by some very different client interpretations. But in general this will lighten the load on your web server while ensuring that new content is available in a reasonable timeframe.
Set your standard HTTP headers for the client cache to whatever you want. Set a custom header that only Varnish will see, such as X-Varnish-TTL. Then in your VCL, incorporate the following code in your vcl_fetch sub:
if (beresp.http.X-Varnish-TTL) {
    C{
        char *ttl;
        /* first char in third param is length of header plus colon in octal */
        ttl = VRT_GetHdr(sp, HDR_BERESP, "\016X-Varnish-TTL:");
        VRT_l_beresp_ttl(sp, atoi(ttl));
    }C
    unset beresp.http.X-Varnish-TTL; // Remove so client never sees this
}

How to properly set up Varnish for Symfony2 sites?

I have a website (with ESI) that uses the Symfony2 reverse proxy for caching. The average response is around 100 ms. I tried installing Varnish on the server to try it out. I followed the guide from the Symfony cookbook step by step and deleted everything in the cache folder, but the http_cache folder was still created when I tried it out. So I figured I could try commenting out $kernel = new AppCache($kernel); in app.php. That worked pretty well: http_cache wasn't created anymore, and according to varnishstat, Varnish seemed to be working:
12951 0.00 0.08 cache_hitpass - Cache hits for pass
1153 0.00 0.01 cache_miss - Cache misses
That was out of around 14,000 requests, so I thought everything would be alright. But after echoping I found responses had risen to ~2 seconds.
Apache runs on port 9000 and Varnish on 8080, so I test with echoping using echoping -n 10 -h http://servername/ X.X.X.X:8080.
I have no idea what could be wrong. Are there any additional settings needed to use Varnish with Symfony2? Or am I simply doing something wrong?
As requested, here's my default.vcl with the modifications I've made so far.
I found 2 issues with Varnish's default config:
it doesn't cache requests with cookies (and everyone in my app has session assigned)
it ignores Cache-Control: no-cache header
So I added conditions for these cases to my config, and it performs fairly well now (~175 req/s, up from ~160 with the S2 reverse proxy - but honestly, I expected a bit more). I just have no idea how to check whether it's all set up correctly, so any input is welcome.
Most pages have their cache varied by cookie, with s-maxage 1200. Common ESI includes aren't varied by cookie and have quite a low s-maxage (articles, article lists). User profile pages aren't cached at all (no-cache), and I'm not really sure whether the ESI includes on those are even being cached by Varnish. The only ESI that's varied by cookie is the header with user-specific information (that's on 100% of pages).
Everything in this post is Varnish 3.X specific (I'm personally using 3.0.2).
Also, after a few weeks of digging into this, I really have no idea what I'm doing anymore, so if you find something odd in the configs, just let me know.
I'm surprised this hasn't had a really full answer in 10 months. This could be a really useful page.
You pointed out yourself that:
Varnish doesn't cache requests with cookies
Varnish ignores Cache-Control: no-cache header
The first thing is, does everyone in your app need a session? If not, don't start the session, or at least delay starting it until it's really necessary (i.e. they log in or whatever).
If you can still cache pages when users are logged in, you need to be really careful you don't serve a user a page which was meant for someone else. But if you're going to do it, edit vcl_recv() to strip the session cookie for the pages that you want to cache.
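A rough sketch of what that stripping might look like in vcl_recv, assuming Varnish 3.x syntax (the URL patterns are purely illustrative and must match whatever is genuinely safe to share on your site):
sub vcl_recv {
    # Drop the session cookie on paths that serve identical content to everyone,
    # so Varnish can cache them; leave the cookie intact everywhere else.
    if (req.url ~ "^/(css|js|images)/" || req.url ~ "^/articles") {
        unset req.http.Cookie;
    }
}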
You can easily get Varnish to process the no-cache directive in vcl_fetch() and in fact you've already done that.
Another problem I found is that Symfony by default sets max-age to 0, which means pages won't ever get cached by the default logic in vcl_fetch.
I also noticed that you had the port set in Varnish to:
backend default {
    .host = "127.0.0.1";
    .port = "80";
}
You yourself said that Apache is running on port 9000, so this doesn't seem to match. You would normally set Varnish to listen on the default port (80) and set Varnish to look up the backend on port 9000 or whatever.
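For example, the backend definition would then look something like this (assuming Apache really is listening on port 9000 on the same machine):
backend default {
    .host = "127.0.0.1";
    .port = "9000"; # the port Apache actually listens on
}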
If that's your entire configuration, the vcl_recv is configured twice.
For the pages you want to cache, can you send the caching headers? That would make the most sense, since images probably already get your Apache caching headers and the app logic decides which pages can actually be cached, but you can also force this in Varnish.
You could use a vcl_fetch like this:
# Called after a document has been successfully retrieved from the backend.
sub vcl_fetch {
    # set minimum timeouts to auto-discard stored objects
    # set beresp.prefetch = -30s;
    set beresp.grace = 120s;

    if (beresp.ttl < 48h) {
        set beresp.ttl = 48h;
    }
    if (!beresp.cacheable) {
        pass;
    }
    if (beresp.http.Set-Cookie) {
        pass;
    }
    # if (beresp.http.Cache-Control ~ "(private|no-cache|no-store)") {
    #     pass;
    # }
    if (req.http.Authorization && !beresp.http.Cache-Control ~ "public") {
        pass;
    }
}
This one caches, in Varnish, only the requests that are marked cacheable. Also, be aware that this configuration doesn't cache requests with cookies.

Is it possible that REMOTE_ADDR could be blank?

As far as I'm aware, the web server (Apache/Nginx) provides $_SERVER['REMOTE_ADDR'] based on the claimed address of the requesting user agent. So I understand it can be spoofed, but is it possible that this value could be blank? Would the network interface or web server even accept a request without a correctly formed IP?
http://php.net/manual/en/reserved.variables.server.php
It is theoretically possible, as the matter is up to the HTTP server, or at least the corresponding PHP SAPI.
In practice, I haven't encountered such a situation, except with the CLI SAPI.
EDIT: For Apache, it would seem this is always set, as ap_add_common_vars always adds it to the table that ends up being read by the Apache module PHP SAPI (disclaimer: I have very limited knowledge of Apache internals).
If using PHP in a CGI environment, the specification in RFC 3875 seems to guarantee the existence of this variable:
4.1.8. REMOTE_ADDR
The REMOTE_ADDR variable MUST be set to the network address of the
client sending the request to the server.
Yes. I currently see values of "unknown" in my logs of Apache-behind-Nginx, for what looks like a normal request/response sequence in the logs. I believe this is possible because mod_extract_forwarded is modifying the request to reset REMOTE_ADDR based on data in the X-Forwarded-For header. So, the original REMOTE_ADDR value was likely valid, but as part of passing through our reverse proxy and Apache, REMOTE_ADDR appears invalid by the time it arrives at the application.
If you have installed Perl's libwww-perl, you can test this situation like this (changing example.com to be your own domain or application):
HEAD -H 'X-Forwarded-For: ' -sSe http://www.example.com/
HEAD -H 'X-Forwarded-For: HIMOM' -sSe http://www.example.com/
HEAD -H 'X-Forwarded-For: <iframe src=http://example.com>' -sSe http://www.example.com/
( You can also use any other tool that allows you to handcraft HTTP requests with custom request headers. )
Now, go check your access logs to see what values they logged, and check your applications to see how they handled the bad input.
Well, it's reserved but writable. I've seen badly written apps that were scribbling all over the superglobals - could the script be overwriting it, e.g. with $_SERVER['REMOTE_ADDR'] = '';?
Other than that, even if the request were proxied, there should be the address of the proxy - could it be some sort of internal-rewrite module messing with it (mod_rewrite allows internal redirects, not sure if it affects this)?
It shouldn't be blank; nothing can connect to your web service without an address. Whatever's connecting must have an IP address to send and receive data. Whether that IP address can be trusted is a different matter.
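Either way, it is cheap to treat the value as untrusted input on the PHP side; a small defensive sketch (the 0.0.0.0 fallback is an arbitrary sentinel):
<?php
// Read REMOTE_ADDR defensively: fall back to a sentinel if it is
// missing, overwritten, or not a well-formed IP address.
$remote = isset($_SERVER['REMOTE_ADDR']) ? $_SERVER['REMOTE_ADDR'] : '';
if (filter_var($remote, FILTER_VALIDATE_IP) === false) {
    $remote = '0.0.0.0'; // arbitrary "unknown" marker; worth logging
}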
