Nginx to cache only for bots - php

I have a fairly busy website (nginx -> apache -> mod_php/mysql) that I want to tune a bit, and the biggest problem I found is that search bots tend to overload it by sending many requests at once.
There is a cache in the site's core (that is, in PHP), so the site's author says there should be no problem, but in fact the bottleneck is that Apache's response time becomes too long when there are too many requests for a page.
What I imagine is some nginx-based cache that caches pages only for bots. The TTL can be fairly high (nothing on the page is so dynamic that it can't wait another 5-10 minutes to be refreshed). Let's define a 'bot' as any client that has 'Bot' in its UA string ('BingBot' as an example).
So I tried to do something like this:
map $http_user_agent $isCache {
    default                0;
    ~*(google|bing|msnbot) 1;
}

proxy_cache_path /path/to/cache levels=1:2 keys_zone=my_cache:10m max_size=10g inactive=60m use_temp_path=off;

server {
    ...

    location / {
        proxy_cache my_cache;
        proxy_cache_bypass $isCache;
        proxy_cache_min_uses 3;
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
        proxy_cache_lock on;
        proxy_pass http://my_upstream;
    }

    # location for images goes here
}
Am I right with this approach? It looks like it won't work.
Are there any other approaches to limit the load from bots? Preferably without sending them 5xx codes (as search engines can lower the ranking of sites that return too many 5xx responses).
Thank you!

If your content pages can differ per visitor (say a user is logged in and the page contains "welcome John Doe"), then that personalized version of the page may end up in the cache, because each request updates the cached copy (i.e. a logged-in person will overwrite the cached version, including their session cookies, which is bad).
It is best to do something similar to the following:
map $http_user_agent $isNotBot {
    ~*bot   "";
    default "IAmNotARobot";
}

server {
    ...

    location / {
        ...
        # Bypass the cache for humans
        proxy_cache_bypass $isNotBot;
        # Don't cache copies of requests from humans
        proxy_no_cache $isNotBot;
        ...
    }
    ...
}
This way, only requests from bots are cached for future bot requests, and only bots are served cached pages.
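Putting the pieces together, a minimal sketch of the complete setup (the cache path, zone name and the 10-minute TTL are assumptions for illustration, not values confirmed by either post):

map $http_user_agent $isNotBot {
    ~*bot   "";
    default "IAmNotARobot";
}

proxy_cache_path /var/cache/nginx/bots levels=1:2 keys_zone=bot_cache:10m max_size=1g inactive=60m use_temp_path=off;

server {
    ...

    location / {
        proxy_cache bot_cache;
        # bots may see a copy up to 10 minutes old
        proxy_cache_valid 200 301 302 10m;
        # humans always hit the backend and never populate the cache
        proxy_cache_bypass $isNotBot;
        proxy_no_cache $isNotBot;
        # keep serving a stale copy to bots while the backend struggles
        proxy_cache_use_stale error timeout updating http_500 http_502 http_503 http_504;
        proxy_cache_lock on;
        proxy_pass http://my_upstream;
    }
}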

Related

PHP 5.5 FastCGI Caching

I've implemented FastCGI caching on our site and have seen great speed improvements. However, the FastCGI cache key does not seem to be unique enough. If I log in, my name appears in the header; the next person to log in still sees my name in the header, assuming the cache is still valid.
Is there a way to make the cache key unique on a per-user basis, ideally using a unique identifier from the user's cookies or PHP session? I tried implementing the answer below, but Nginx failed to restart.
Log in value from Set-Cookie header in nginx
Note my cache key looks like this:
fastcgi_cache_key "$scheme$request_method$host$request_uri";
Update:
My thought is that if I can parse the HTTP headers sent to Nginx, then I can grab the PHP session ID and use that. However, I cannot find an example of how to do this anywhere. Right now I have something like this, which doesn't work:
http_cookie ~* PHPSESSID=([0-9a-z]+) {
    set $ses_id $1;
}
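For reference, nginx exposes request cookies directly as variables named $cookie_NAME, so a minimal sketch of what this update is trying to achieve (assuming the session cookie is called PHPSESSID) would be:

fastcgi_cache_key "$scheme$request_method$host$request_uri$cookie_PHPSESSID";

For visitors without a session cookie the variable is simply empty, so they all share the anonymous copy.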
I was able to solve the above problem using the Nginx ngx_http_userid_module. The hardest part was actually finding the module; implementing the solution was quite trivial.
I used their example configuration:
userid on;
userid_name uid;
userid_domain example.com;
userid_path /;
userid_expires 365d;
userid_p3p 'policyref="/w3c/p3p.xml", CP="CUR ADM OUR NOR STA NID"';
And then added the userid to my FastCGI cache key:
fastcgi_cache_key "$scheme$request_method$host$request_uri$uid_got";
Hopefully this answer helps someone discover this useful module quicker than I did.

Limit Page to One Simultaneous Request

Using PHP or Apache (I'd also consider moving to Nginx if necessary), how can I make a page run only once at a time? If three requests come in at the same time, one should complete entirely, then the next, and then the next. At no time should the page be accessed by more than one request at a time.
Think of it like transactions! How can I achieve this? It should be one request at a time per page per server, not per user or per IP.
What do you want to do with the "busy" state on the server? Return an error right away, or keep requests waiting until the previous one finishes?
If you just want the server to refuse the request, you can do it in both nginx and apache:
the limit_req module for nginx
using mod_security on apache
The "tricky" part of your requirement is that you don't want to limit by IP as people usually do, but globally per URI. I know it should be possible with mod_security, but I haven't done that myself; I do have this configuration working for nginx:
http {
    # create a zone keyed by $packageid (my internal variable, similar to a vhost)
    limit_req_zone $packageid zone=heavy-stuff:10m rate=1r/s;
}

then later:

server {
    set $packageid some-id;

    location = /some-heavy-stuff {
        limit_req zone=heavy-stuff burst=1 nodelay;
    }
}
What it does for me is create N limit zones, one for each of my server blocks. The zone is then used to count requests and allow only 1 per second.
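To make the "one zone per server" idea concrete, a hypothetical sketch with two server blocks (the server names and ids are made up), each setting its own $packageid so each gets its own counter in the shared zone:

server {
    server_name shop.example.com;
    set $packageid shop;

    location = /some-heavy-stuff {
        limit_req zone=heavy-stuff burst=1 nodelay;
    }
}

server {
    server_name blog.example.com;
    set $packageid blog;

    location = /some-heavy-stuff {
        limit_req zone=heavy-stuff burst=1 nodelay;
    }
}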
Hope it helps
If the same user sends the requests from the same page, use session_start(): PHP's default session handler locks the session file, which blocks the other requests until the first request finishes.
Example:
http://codingexplained.com/coding/php/solving-concurrent-request-blocking-in-php
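A minimal sketch of that blocking behaviour (PHP's default file-based session handler holds an exclusive lock on the session file while the script runs):

<?php
// The first request from a browser holds the session lock while it works;
// a second request from the same browser blocks inside session_start()
// until the first one finishes or calls session_write_close().
session_start();
sleep(10);                // simulate slow work while holding the lock
session_write_close();    // release the lock as early as possible
echo "done";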
If you want to block requests coming from different browsers/clients, keep the entries in a database and process them one by one.

How to enable GeoIP on Magento with Varnish page cache

I currently have 3 stores online with 3 different domains, running Magento with Apache and Varnish (using the Phoenix page cache extension) on CentOS.
One store is for the UK, another for Ireland and another for the USA.
The trouble is, for example, if a US user hits the UK store, I would like the user to be notified on the page to go to the correct store (I do not want them automatically redirected).
I was able to use php-pecl-geoip with the MaxMind database to get this to work, but as the number of users on my website increased I had to begin using Varnish.
How could I implement this functionality with Varnish, so that I know which country the user is from and can display a message pointing them to their relevant website?
Gunah, I think you missed the point here.
When you put Varnish in front of Apache, the client IP that PHP sees will always be the IP of Varnish (127.0.0.1 if it runs on the same server).
molleman, in this case you need to look at the X-Forwarded-For header set by Varnish to get the real client IP. You can see how Varnish sets it in the default.vcl:
if (req.http.x-forwarded-for) {
    set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
} else {
    set req.http.X-Forwarded-For = client.ip;
}
If your web server is behind a load balancer, then more work is needed. Please refer here for a solution: Varnish removes Public IP from X-Forwarded-for
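On the PHP side, a minimal sketch of combining that header with php-pecl-geoip (the parsing and the fallback to REMOTE_ADDR are assumptions; with several proxies the first entry is the original client):

<?php
// Behind Varnish, REMOTE_ADDR is the proxy address; take the first entry
// of X-Forwarded-For instead and look up its country with php-pecl-geoip.
$xff = isset($_SERVER['HTTP_X_FORWARDED_FOR']) ? $_SERVER['HTTP_X_FORWARDED_FOR'] : '';
$parts = array_map('trim', explode(',', $xff));
$clientIp = ($parts[0] !== '') ? $parts[0] : $_SERVER['REMOTE_ADDR'];
$country = geoip_country_code_by_name($clientIp);   // e.g. "US", "GB", "IE"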
You can create a controller with a JSON action result in Magento,
then check it with JavaScript and output the result.
Do not forget to add your controller to the whitelist in Varnish.
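A hypothetical VCL sketch of that whitelist (the controller URL is made up), so that the JSON response is always fetched fresh from the backend instead of the page cache:

sub vcl_recv {
    # always pass requests for the GeoIP controller (assumed path)
    if (req.url ~ "^/geoip/") {
        return (pass);
    }
}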

How can I limit connections to my web application per minute?

I have tried to use nginx (http://nginx.org/) to limit the number of requests per minute. For example, my settings have been:
server {
    limit_req_zone $binary_remote_addr zone=pw:5m rate=20r/m;
}

location {
    limit_req zone=pw nodelay;
}
What I have found with Nginx is that even if I try 1 request per minute, I am allowed back in many times within that minute. Fast refreshing of a page does give me the limit page, which is a "503 Service Temporarily Unavailable" return code.
I want to know what settings can be applied to limit requests to exactly 20 per minute. I am not looking only for flood protection, which Nginx already provides: if a page is constantly refreshed, for example, it limits the user and lets them back in after some time with some delay (unless you apply the nodelay setting).
Is there an alternative to Nginx other than HAProxy (because it's quite slow)? Also, the Nginx setup I have is acting as a reverse proxy to the real site.
Right, there are 2 things:
the limit_conn directive in combination with a limit_conn_zone lets you limit the number of (simultaneous) connections from an IP (see http://nginx.org/en/docs/http/ngx_http_limit_conn_module.html#limit_conn)
the limit_req directive in combination with a limit_req_zone lets you limit the number of requests from a given IP per time unit (see http://nginx.org/en/docs/http/ngx_http_limit_req_module.html#limit_req)
Note:
you need to put the limit_conn_zone/limit_req_zone directive in the http block, not the server block
you then refer to the zone name you set up in the http block from within the server/location block with the limit_conn/limit_req settings (as appropriate)
Since you stated you're looking to limit requests, you need the limit_req directives. Specifically, to get a maximum of 5 requests per minute, try adding the following:
http {
    limit_req_zone $binary_remote_addr zone=example:10m rate=5r/m;
}

server {
    limit_req zone=example burst=0 nodelay;
}
Note: obviously, add those to your existing http/server blocks.
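For the 20-requests-per-minute target from the question, the same pattern can be applied to the original config; a sketch that only moves the zone into the http block and keeps the question's zone name and rate:

http {
    limit_req_zone $binary_remote_addr zone=pw:5m rate=20r/m;

    server {
        location / {
            limit_req zone=pw nodelay;
        }
    }
}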

How to properly set up Varnish for Symfony2 sites?

I have a website (with ESI) that uses the Symfony2 reverse proxy for caching. Average response time is around 100ms. I tried installing Varnish on the server to try it out. I followed the guide from the Symfony cookbook step by step and deleted everything in the cache folder, but the http_cache folder was still created when I tried it out. So I figured I could try commenting out $kernel = new AppCache($kernel); in app.php. That worked pretty well: http_cache wasn't created anymore and, according to varnishstat, Varnish seemed to be working:
12951 0.00 0.08 cache_hitpass - Cache hits for pass
 1153 0.00 0.01 cache_miss - Cache misses
That was out of around 14000 requests, so I thought everything would be alright. But after running echoping I found that responses had risen to ~2 seconds.
Apache runs on port 9000 and Varnish on 8080, so I ran echoping using echoping -n 10 -h http://servername/ X.X.X.X:8080.
I have no idea what could be wrong. Are there any additional settings needed to use Varnish with Symfony2? Or am I simply doing something wrong?
As requested, here's my default.vcl with the modifications I've made so far.
I found 2 issues with Varnish's default config:
it doesn't cache requests with cookies (and everyone in my app has a session assigned)
it ignores the Cache-Control: no-cache header
So I added conditions for these cases to my config, and it performs fairly well now (~175 req/s, up from ~160 with the S2 reverse proxy; honestly, I expected a bit more). I just have no idea how to check whether it's all set up correctly, so any input is welcome.
Most pages have their cache varied by cookie, with s-maxage 1200. Common ESI includes aren't varied by cookie and have quite a low s-maxage (articles, article lists). User profile pages aren't cached at all (no-cache) and I'm not really sure whether ESI includes on those pages are even being cached by Varnish. The only ESI include that's varied by cookie is the header with user-specific information (which is on 100% of pages).
Everything in this post is Varnish 3.X specific (I'm personally using 3.0.2).
Also, after a few weeks of digging into this, I really have no idea what I'm doing anymore, so if you find something odd in the configs, just let me know.
I'm surprised this hasn't had a really full answer in 10 months. This could be a really useful page.
You pointed out yourself that:
Varnish doesn't cache requests with cookies
Varnish ignores the Cache-Control: no-cache header
The first thing is: does everyone in your app need a session? If not, don't start the session, or at least delay starting it until it's really necessary (i.e. when they log in or whatever).
If you can still cache pages when users are logged in, you need to be really careful that you don't serve one user a page that was meant for someone else. But if you're going to do it, edit vcl_recv() to strip the session cookie for the pages that you want to cache.
You can easily get Varnish to process the no-cache directive in vcl_fetch(), and in fact you've already done that.
Another problem I found is that Symfony by default sets max-age to 0, which means pages won't ever get cached by the default logic in vcl_fetch.
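A minimal sketch of sending those caching headers from a Symfony2 controller (the template name is made up; the 1200s s-maxage matches the value mentioned in the question):

<?php
// In a controller action: mark the response as publicly cacheable so
// Varnish (or the Symfony reverse proxy) may store and share it.
$response = $this->render('AcmeDemoBundle:Article:show.html.twig', array('article' => $article));
$response->setPublic();
$response->setSharedMaxAge(1200);   // sends Cache-Control: public, s-maxage=1200
return $response;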
I also noticed that you had the port set in Varnish to:
backend default {
    .host = "127.0.0.1";
    .port = "80";
}
You yourself said that Apache is running on port 9000, so this doesn't seem to match. You would normally set Varnish to listen on the default port (80) and point the Varnish backend at port 9000 or whatever your backend uses.
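A sketch of that arrangement using the ports from the question (Varnish itself would listen on port 80 via its -a startup option):

backend default {
    .host = "127.0.0.1";
    .port = "9000";
}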
If that's your entire configuration, the vcl_recv is configured twice.
For the pages you want to cache, can you send the caching headers? This would make the most sense, since images probably already get caching headers from Apache, and the app logic should decide which pages can actually be cached, but you can force this in Varnish as well.
You could use a vcl_fetch like this:
# Called after a document has been successfully retrieved from the backend.
sub vcl_fetch {
    # set minimum timeouts to auto-discard stored objects
    # set beresp.prefetch = -30s;
    set beresp.grace = 120s;

    # force a minimum TTL of 48h (Symfony sends max-age=0 by default)
    if (beresp.ttl < 48h) {
        set beresp.ttl = 48h;
    }

    # (Varnish 2's "if (!beresp.cacheable) { pass; }" has no direct
    # equivalent in 3.x, where beresp.cacheable was removed)

    # never cache responses that set cookies
    if (beresp.http.Set-Cookie) {
        return (hit_for_pass);
    }

    # if (beresp.http.Cache-Control ~ "(private|no-cache|no-store)") {
    #     return (hit_for_pass);
    # }

    # don't cache authenticated content unless it is explicitly public
    if (req.http.Authorization && beresp.http.Cache-Control !~ "public") {
        return (hit_for_pass);
    }
}
This caches in Varnish only the responses that are marked cacheable. Also, be aware that your configuration doesn't cache requests with cookies.
