I am confused about cache headers.
I am trying to make the browser cache my PHP/HTML pages because they are mostly static and will not change for a month or so.
The URLs look like example.com/article-url, and in the background that is a PHP page like /article_url.php.
I tried this in PHP:
$cache_seconds = 60*60*24*30;
header("Expires: ".gmdate('D, d M Y H:i:s \G\M\T', time()+$cache_seconds));
header("Cache-Control:public, max-age=".$cache_seconds);
And in the browser's debug window I can see that the page would indeed expire next month:
Request URL: https://www.example.com/article-url
Request Method: GET
Status Code: 200
Referrer Policy: no-referrer-when-downgrade
cache-control: public, max-age=2592000
content-encoding: gzip
content-length: 2352
content-type: text/html
date: Mon, 25 Nov 2019 14:23:40 GMT
expires: Wed, 25 Dec 2019 14:23:40 GMT
server: nginx
status: 200
vary: Accept-Encoding
But if I access that page again, I see it was generated again. I made it print the request timestamp in the footer, and I can see the page is regenerated on each page load.
I was expecting the browser to show the exact same page from cache.
What am I doing wrong?
I have an API and I've been trying to add cache control headers to it.
The API already makes use of PhpFastCache for server-side caching, but I wanted to add an additional layer of browser-side caching. I came across this intelligent PHP cache control page and modified it slightly.
Using PhpFastCache, I check whether the server-side cache exists. If it doesn't, I query the DB and output normally with a 200 response code. If the cache does exist, I do the following:
//get the last-modified-date of this very file
$lastModified=filemtime(__FILE__);
//get a unique hash of this file (etag)
$etagFile = md5( $CachedString->get() );
//get the HTTP_IF_MODIFIED_SINCE header if set
$ifModifiedSince=(isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) ? $_SERVER['HTTP_IF_MODIFIED_SINCE'] : false);
//get the HTTP_IF_NONE_MATCH header if set (etag: unique file hash)
$etagHeader=(isset($_SERVER['HTTP_IF_NONE_MATCH']) ? trim($_SERVER['HTTP_IF_NONE_MATCH']) : false);
//set last-modified header
header("Last-Modified: ".gmdate("D, d M Y H:i:s", $lastModified)." GMT");
//set etag-header
header("Etag: $etagFile");
//make sure caching is turned on
header('Cache-Control: public');
//check if page has changed. If not, send 304 and exit
if (@strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE'])==$lastModified || $etagHeader == $etagFile)
{
header("HTTP/1.1 304 Not Modified");
exit;
}else{
//Cache Match - Output Cache Result
header('Content-Type: application/json');
echo $CachedString->get();
}
I'm using this line to get the cached response as md5:
$etagFile = md5( $CachedString->get() );
Then doing a check to see if this md5 content has changed:
if (@strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE'])==$lastModified || $etagHeader == $etagFile)
{
header("HTTP/1.1 304 Not Modified");
exit;
}else{
//Cache Match - Output Cache Result
header('Content-Type: application/json');
echo $CachedString->get();
}
However I can never seem to get the 304 response header. It is ALWAYS a 200 code response header.
curl -I -L https://db.ygoprodeck.com/api/v7/cardinfo.php?name=Tornado%20Dragon
With the response always being:
HTTP/1.1 200 OK
Date: Tue, 17 Mar 2020 13:37:31 GMT
Content-Type: application/json
Connection: keep-alive
Set-Cookie: __cfduid=daaab295934a2a8ef966c2c70fe0955b91584452250; expires=Thu, 16-Apr-20 13:37:30 GMT; path=/; domain=.ygoprodeck.com; HttpOnly; SameSite=Lax
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET
Access-Control-Allow-Headers: Content-Type, Authorization, X-Requested-With
Cache-Control: public
Last-Modified: Tue, 17 Mar 2020 13:15:53 GMT
Etag: 399b9ba2d69ab115f46faa44be04d0ca
Vary: User-Agent
CF-Cache-Status: DYNAMIC
Expect-CT: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
Server: cloudflare
CF-RAY: 57571be8a986a72f-DUB
Your request is being proxied through Cloudflare, which has its own caching layer. If you test this directly against the origin (or with a grey-clouded record), do you get a 304?
You said you were working on browser caching. The browser is going to cache based on the max-age setting you send, but I don't see one being set in the response.
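A minimal sketch of the conditional check, in case it helps with testing. Note that a plain `curl -I` request like the one in the question carries neither If-None-Match nor If-Modified-Since, so the script has nothing to match against and answers 200. The function names (`normalizeEtag`, `isNotModified`) and the max-age value in the commented wiring are my own placeholders, not part of PhpFastCache or the original script. The sketch assumes the ETag is sent in quoted form and that an intermediary may turn it into a weak tag (`W/"..."`).

```php
<?php
/**
 * Strip the optional weak prefix (W/) and the surrounding quotes from an entity tag.
 */
function normalizeEtag(string $tag): string
{
    $tag = trim($tag);
    if (strncmp($tag, 'W/', 2) === 0) {
        $tag = substr($tag, 2);
    }
    return trim($tag, '"');
}

/**
 * Decide whether a conditional GET may be answered with 304 Not Modified.
 * If-None-Match takes precedence over If-Modified-Since when both are present.
 */
function isNotModified(?string $ifNoneMatch, ?string $ifModifiedSince, string $etag, int $lastModified): bool
{
    if ($ifNoneMatch !== null && $ifNoneMatch !== '') {
        foreach (explode(',', $ifNoneMatch) as $candidate) {
            $candidate = trim($candidate);
            if ($candidate === '*' || normalizeEtag($candidate) === normalizeEtag($etag)) {
                return true;
            }
        }
        return false;
    }
    if ($ifModifiedSince !== null && $ifModifiedSince !== '') {
        $since = strtotime($ifModifiedSince);
        return $since !== false && $lastModified <= $since;
    }
    return false; // unconditional request: send the full response
}

// Hypothetical wiring inside the API script ($body and $lastModified are placeholders):
// $etag = '"' . md5($body) . '"';
// header('ETag: ' . $etag);
// header('Cache-Control: public, max-age=3600');
// if (isNotModified($_SERVER['HTTP_IF_NONE_MATCH'] ?? null,
//                   $_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null, $etag, $lastModified)) {
//     http_response_code(304);
//     exit;
// }
```

To exercise the 304 path from the command line, the request has to repeat the tag it received, for example by adding an `If-None-Match` request header with curl's `-H` option.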
I'm writing a web service that generates image thumbnails with Phalcon.
I'm trying to make the responses HTTP-cacheable.
This is my code :
$seconds = 43200;
$expireDate = new DateTime();
$expireDate->modify("+ $seconds seconds");
$finfo = new finfo(FILEINFO_MIME_TYPE);
$app->response->setHeader('Content-Type', 'Content-type: ' . $finfo->buffer($data));
$app->response->setExpires($expireDate);
$app->response->setHeader('Pragma', 'cache');
$app->response->setHeader('Cache-Control', "private, max-age=$seconds");
$app->response->setHeader('E-Tag', md5(filemtime($path)));
$app->response->setHeader('Last-Modified', gmdate('D, d M Y H:i:s', filemtime($path)).' GMT');
$app->response->sendHeaders();
echo $data;
The image is displayed correctly, but when I refresh it the HTTP status code is always 200. When I try the same with an image from another website, I get 200, 304, 304, 304...
This is my raw response header :
HTTP/1.1 200 OK
Date: Thu, 27 Aug 2015 18:38:41 GMT
Server: Apache/2.4.10 (Debian)
Expires: Fri, 28 Aug 2015 06:38:41 GMT
Pragma: cache
Cache-Control: private, max-age=43200
E-Tag: 501a8d62f276eb5b165b8a709bf4e5b4
Last-Modified: Sun, 05 Jul 2015 20:34:14 GMT
Keep-Alive: timeout=5, max=90
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: image/jpeg
Does anyone see what I'm doing wrong?
Thanks in advance.
Your PHP code needs to return a 304 Not Modified response when the browser asks whether the cached image is still valid. Put an if statement at the top of your script to handle that request before sending the image again.
You are always sending the image; that's why the browser is showing a 200 response.
If you add the max-age to the last-modified date you get an expiry time in the past.
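The arithmetic in that comment can be checked with the values from the response headers in the question. This only reproduces the comment's calculation; as far as I know, a cache measures max-age from the time the response was generated (the Date header), not from Last-Modified, so treat it as an illustration rather than a diagnosis.

```php
<?php
// Values copied from the response headers in the question.
$lastModified = strtotime('Sun, 05 Jul 2015 20:34:14 GMT');
$responseDate = strtotime('Thu, 27 Aug 2015 18:38:41 GMT');
$maxAge       = 43200; // 12 hours, as in Cache-Control: private, max-age=43200

$expiry = $lastModified + $maxAge;

echo gmdate('D, d M Y H:i:s', $expiry) . " GMT\n"; // Mon, 06 Jul 2015 08:34:14 GMT
var_dump($expiry < $responseDate);                  // bool(true): earlier than the response date
```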
Your code is a mess of every possible way to influence caching (and, by the way, HTTP does not define a "Pragma: cache" header). As to what you should be doing, that depends on what you are trying to achieve: just getting load off the server, faster page rendering, caching up to a pre-planned replacement, or something else. And you haven't told us which.
Thanks PaulS !
$filemtimeOk = isset($_SERVER['HTTP_IF_MODIFIED_SINCE']) && $filemtime <= strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']);
$etagOk = isset($_SERVER['HTTP_IF_NONE_MATCH']) && $_SERVER['HTTP_IF_NONE_MATCH'] == $etag;
if ($filemtimeOk && $etagOk) {
$app->response->setStatusCode(304, "Not Modified");
$app->response->sendHeaders();
} else {
// Normal case... (send data and headers)
}
...and whether it was cached 30 days ago
I am using this code:
$page=get_headers('http://www.w3schools.com/php/func_date_strtotime.asp');
The output is this:
0=>HTTP/1.1 200 OK
1=>Connection: close
2=>Date: Thu, 03 May 2012 10:51:00 GMT
3=>Server: Microsoft-IIS/6.0
4=>MicrosoftOfficeWebServer: 5.0_Pub
5=>X-Powered-By: ASP.NET
6=>Pragma: no-cache
7=>Content-Length: 23643
8=>Content-Type: text/html
9=>Expires: Thu, 03 May 2012 10:50:00 GMT
10=>Set-Cookie: ASPSESSIONIDSAARQQST=AAMAAHBBBHBELMHDCHNNLMFP; path=/
11=>Cache-control: no-cache
I read that the Pragma header doesn't necessarily mean that the page is uncacheable.
I want to know two things:
1) whether the page is cached
2) whether it was cached 30 days ago.
Can I do this?
$date1=gmdate("D, d M Y H:i:s", strtotime("30 days ago")) . " GMT";
$date2=$page['Expires'];
if($date1>$date2)
{
echo 'The page was cached for longer than 30 days';
}
Since PHP is a server-side language, you cannot check the browser cache (which is client-side) using PHP. You need client-side scripting such as JavaScript rather than server-side programming like PHP.
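What PHP can do is inspect the caching headers the remote server sends. Two things in the snippet from the question would get in the way: without its second argument, get_headers() returns a numerically indexed list, so `$page['Expires']` is not set, and comparing two formatted date strings with `>` compares text, not time. Below is a rough sketch that works on an array shaped like the result of `get_headers($url, true)`. The helper names are made up for this example, and Expires only says when the response goes stale, not when (or whether) any cache stored it.

```php
<?php
/**
 * Find a header value case-insensitively in an array shaped like the result
 * of get_headers($url, true), where header names are the array keys.
 */
function headerValue(array $headers, string $name): ?string
{
    foreach ($headers as $key => $value) {
        if (is_string($key) && strcasecmp($key, $name) === 0) {
            // Repeated headers arrive as arrays; take the last value.
            return is_array($value) ? (string) end($value) : (string) $value;
        }
    }
    return null;
}

/**
 * True when the Expires header lies more than $days days before $now.
 * Returns null when the header is missing or cannot be parsed.
 */
function expiredMoreThanDaysAgo(array $headers, int $days, int $now): ?bool
{
    $expires   = headerValue($headers, 'Expires');
    $timestamp = $expires === null ? false : strtotime($expires);
    if ($timestamp === false) {
        return null;
    }
    return $timestamp < $now - $days * 86400;
}

// Sample data taken from the output in the question.
$page = [
    0               => 'HTTP/1.1 200 OK',
    'Date'          => 'Thu, 03 May 2012 10:51:00 GMT',
    'Pragma'        => 'no-cache',
    'Expires'       => 'Thu, 03 May 2012 10:50:00 GMT',
    'Cache-control' => 'no-cache',
];
$now = strtotime($page['Date']);
var_dump(expiredMoreThanDaysAgo($page, 30, $now)); // bool(false): expired one minute ago, not 30 days ago
```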
I'm trying to perform a log-in on pinterest.com with curl. I got the following request-response flow:
GET-Request the login form and scrape hidden fields (csrftoken)
POST-Request login credentials (mail and pw) and scraped csrftoken
Receive Session Cookie for login
Using Curl, I can see the following Headers being sent and received:
GET /login/?next=%2F HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2
Host: pinterest.com
Referer:
Accept: text/html,application/xhtml+xml,application/xml,*/*
Accept-Language: de-de,en-us
Connection: keep-alive
HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Date: Tue, 10 Apr 2012 15:03:24 GMT
ETag: "45d6a85f0ede46f13f4fc751842ce5b7"
Server: nginx/0.8.54
Set-Cookie: csrftoken=dec6cb66064f318790c6d51e3f3a9612; Max-Age=31449600; Path=/
Set-Cookie: _pinterest_sess="eJyryMwNcTXOdtI3zXcKNq0qznIxyXVxK/KqSsy3tY8vycxNtfUN8a3yc3E09nXxLPdztLVVK04tLs5MsfXNAopVpVf6VnlW+Qba2gIAuqgZIg=="; Domain=pinterest.com; HttpOnly; expires=Tue, 17-Apr-2012 15:03:24 GMT; Max-Age=1334675004; Path=/
Vary: Cookie, Accept-Encoding
Content-Length: 4496
Connection: keep-alive
So after step 1, the two cookies csrftoken and _pinterest_sess are set. But a look in the cookiejar file (I use CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR to let curl handle the cookie processing) shows the following:
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.
pinterest.com FALSE / FALSE 1365519805 csrftoken dec6cb66064f318790c6d51e3f3a9612
#HttpOnly_.pinterest.com TRUE / FALSE -1626222087 _pinterest_sess "eJyryMwNcTXOdtI3zXcKNq0qznIxyXVxK/KqSsy3tY8vycxNtfUN8a3yc3E09nXxLPdztLVVK04tLs5MsfXNAopVpVf6VnlW+Qba2gIAuqgZIg=="
The first thing to note is the #HttpOnly_ prefix preceding the _pinterest_sess cookie line. I just assume that curl handles that fine. But looking further, one can see that a negative value is set as the expiration date: -1626222087
I don't know where that's coming from, because the cookie is set with "expires=Tue, 17-Apr-2012 15:03:24 GMT" (which is about 7 days in the future, counting from today).
On the next request, the _pinterest_sess cookie won't be set by curl:
POST /login/?next=%2F HTTP/1.1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2
Host: pinterest.com
Referer: https://pinterest.com/login/?next=%2F
Cookie: csrftoken=dec6cb66064f318790c6d51e3f3a9612
Accept: text/html,application/xhtml+xml,application/xml,*/*
Accept-Language: de-de,en-us
Connection: keep-alive
Content-Length: 123
Content-Type: application/x-www-form-urlencoded
HTTP/1.1 302 FOUND
Content-Type: text/html; charset=utf-8
Date: Tue, 10 Apr 2012 15:05:26 GMT
ETag: "d41d8cd98f00b204e9800998ecf8427e"
Location: http://pinterest.com/
Server: nginx/0.8.54
Set-Cookie: _pinterest_sess="eJzLcssPCy4NTclIjvAOrjQzyywoCChISgvLDi+2tY9PrSjILEottvUN8a3yc4k09gtxrfRLt7VVK04tLs5MAYonV/qGeFb4ZkWW+4LES4tTi+KBEv4u6UZ+WYEmvlm+QOxZ6R/iWOEbEmgLAKNfJps="; Domain=pinterest.com; HttpOnly; expires=Tue, 17-Apr-2012 15:05:26 GMT; Max-Age=1334675126; Path=/
Vary: Cookie
Content-Length: 0
Connection: keep-alive
In the response, another _pinterest_sess cookie is set since curl didn't send the last one.
Currently, I don't know if I'm doing something wrong or if curl just isn't able to parse the expires value in the cookie correctly.
Any help would be greatly appreciated :)
// edit
One more thing:
According to http://opensource.apple.com/source/curl/curl-57/curl/lib/cookie.c the function curl_getdate() is used to extract the date. The documentation on that function lists some examples (http://curl.haxx.se/libcurl/c/curl_getdate.html):
Sun, 06 Nov 1994 08:49:37 GMT
Sunday, 06-Nov-94 08:49:37 GMT
Sun Nov 6 08:49:37 1994
06 Nov 1994 08:49:37 GMT
06-Nov-94 08:49:37 GMT
Nov 6 08:49:37 1994
06 Nov 1994 08:49:37
06-Nov-94 08:49:37
1994 Nov 6 08:49:37 GMT
08:49:37 06-Nov-94
Sunday 94 6 Nov 08:49:37
1994 Nov 6
06-Nov-94
Sun Nov 6 94
1994.Nov.6
Sun/Nov/6/94/GMT
Sun, 06 Nov 1994 08:49:37 CET
06 Nov 1994 08:49:37 EST
Sun, 12 Sep 2004 15:05:58 -0700
Sat, 11 Sep 2004 21:32:11 +0200
20040912 15:05:58 -0700
20040911 +0200
None of them matches the above-mentioned expires date "Tue, 17-Apr-2012 15:03:24 GMT", because all examples with hyphens use only 2-digit years.
You are experiencing an issue on your computer because of the limits of 32 bit signed integer values.
The server sets a cookie with the Max-Age of 1334675004 seconds in the future.
Max-Age=1334675004
You posted your question here at 2012-04-10 15:13:24Z. That is a UNIX timestamp of 1334070804. If you add 1334675004 to it and take the 32-bit integer limit of 2147483647 into account, so that the sum wraps around, you'll get -1626221488:
1334070804
+ 1334675004
------------
-1626221488
As the numbers show, it looks like the server misunderstood the Max-Age attribute: if you subtract the two values from each other, the difference is roughly 7 days in seconds (604200 = ~6.99 days; it is not exactly 7 days because the cookie was set earlier than you posted your question here). However, Max-Age is a delta in seconds, not an absolute UNIX timestamp.
PHP_INT_MAX is fixed by the build, so try a version compiled for 64 bit; this should prevent negative numbers. However, the Max-Age calculation on the server is still broken. You might want to contact pinterest.com and report the problem.
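The wraparound can be reproduced with a few lines of PHP. This is only a sketch: `wrapToInt32` is a helper written for this post, not a curl or PHP function, and it assumes a 64-bit PHP build so that the sum itself does not overflow. It uses the Date header of the first response (15:03:24 GMT) as the starting time rather than the time the question was posted.

```php
<?php
/**
 * Reinterpret an integer as a signed 32-bit value, the way a 32-bit
 * time value would store it. Assumes a 64-bit PHP build.
 */
function wrapToInt32(int $value): int
{
    $value &= 0xFFFFFFFF;           // keep only the low 32 bits
    return $value >= 0x80000000     // sign bit set?
        ? $value - 0x100000000
        : $value;
}

$responseTime = gmmktime(15, 3, 24, 4, 10, 2012); // Date: Tue, 10 Apr 2012 15:03:24 GMT
$maxAge       = 1334675004;                        // Max-Age sent by the server

var_dump($responseTime);                        // int(1334070204)
var_dump(wrapToInt32($responseTime + $maxAge)); // int(-1626222088)
```

The result is one second away from the -1626222087 stored in the cookie jar, which would fit curl reading its clock a second after the server generated the response.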
It looks like pinterest.com is using Max-Age incorrectly, and that's why curl is dropping this cookie.
In your example, Max-Age contains the timestamp for Tue, 17-Apr-2012 15:03:24 GMT, while it should contain the number of seconds from the request time to this date: 604800 (judging from the request time in the Date header).
What curl does is add the Max-Age value to the current timestamp and save the result as a signed 32-bit integer, hence -1626222087.
As for a solution, you can try contacting Pinterest and reporting a bug.
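Both numbers can be checked quickly. The date fields below are copied by hand from the Date header and the cookie's expires attribute in the question.

```php
<?php
// Timestamps built from the headers quoted in the question.
$requestTime = gmmktime(15, 3, 24, 4, 10, 2012); // Date: Tue, 10 Apr 2012 15:03:24 GMT
$expiresAt   = gmmktime(15, 3, 24, 4, 17, 2012); // expires=Tue, 17-Apr-2012 15:03:24 GMT

var_dump($expiresAt);                // int(1334675004), exactly the value sent as Max-Age
var_dump($expiresAt - $requestTime); // int(604800), the delta Max-Age should have carried
```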
Actually, you do not need to contact Pinterest, since the cookie's max age does not have to be sent back to the server (if you only use the cookie for a short period of time, or you can calculate the correct max age yourself). Just flip the minus sign and it will work, meaning the cookie will be sent back to the server.
That is not all you have to do, though. Depending on the login page presented, you sometimes have to parse hidden fields as well (that is where the CSRF tokens reside, and they have to match the token value in the cookie). Sometimes you also need to change (reset) cookie values.
Pinterest is making it harder and harder to log in with automated tools and to do screen scraping. They have recently changed how their site functions, so the points above no longer work. You do not really know when they will change how login works; you have to try and "guess" when a change happens. That attitude should really be directed not at developers but at those who are threats to the security of the system (intruders). You also have to think about the legality of the points above.
Pinterest has an API (although it is down right now), so the best and most correct way is to use that API (see https://github.com/kellan/pinterest.api.php). There you exchange messages in JSON format. The last option is m.pinterest.com, which is meant for mobile devices and is straightforward to use: parse the login HTML for hidden input fields and resubmit the form with the correct values (again, you face legality issues). Please consult Pinterest before using curl-like tools, or wait until the Pinterest API is up.
Yes, there are some improvements in the system, such as JSON responses, which put an end to screen scraping, but that does not mean a completely new API. They have also (seemingly) implemented RESTful web services and accept AJAX requests, which are again steps in a positive direction.
There are many discussions going on around the net on this matter, so please refer to them for detailed info.
When I send a 304 response, how will the browser interpret the other headers I send together with it?
E.g.
header("HTTP/1.1 304 Not Modified");
header("Expires: " . gmdate("D, d M Y H:i:s", time() + $offset) . " GMT");
Will this make sure the browser will not send another conditional GET request (nor any request) until $offset time has "run out"?
Also, what about other headers?
Should I send headers like this together with the 304:
header('Content-Type: text/html');
Do I have to send:
header("Last-Modified:" . $modified);
header('Etag: ' . $etag);
To make sure the browser sends a conditional GET request the next time the $offset has "run out" or does it simply save the old Last Modified and Etag values?
Are there other things I should be aware about when sending a 304 response header?
This blog post helped me a lot in order to tame the "conditional get" beast.
An interesting excerpt (which partially contradicts Ben's answer) states that:
If a normal response would have included an ETag header, that header must also be included in the 304 response.
Cache headers (Expires, Cache-Control, and/or Vary), if their values might differ from those sent in a previous response.
This is in complete accordance with the RFC 2616 sec 10.3.5.
Below is a 200 response...
HTTP/1.1 200 OK
Server: nginx/0.8.52
Date: Thu, 18 Nov 2010 16:04:38 GMT
Content-Type: image/png
Last-Modified: Thu, 15 Oct 2009 02:04:11 GMT
Expires: Thu, 31 Dec 2010 02:04:11 GMT
Cache-Control: max-age=315360000
Accept-Ranges: bytes
Content-Length: 6394
Via: 1.1 proxyIR.my.corporate.proxy.name:8080 (IronPort-WSA/6.3.3-015)
Connection: keep-alive
Proxy-Connection: keep-alive
X-Junk: xxxxxxxxxxxxxxxx
...And its optimal valid 304 counterpart.
HTTP/1.1 304 Not Modified
Server: nginx/0.8.52
Date: Thu, 18 Nov 2010 16:10:35 GMT
Expires: Thu, 31 Dec 2011 16:10:35 GMT
Cache-Control: max-age=315360000
Via: 1.1 proxyIR.my.corporate.proxy.name:8080 (IronPort-WSA/6.3.3-015)
Connection: keep-alive
Proxy-Connection: keep-alive
X-Junk: xxxxxxxxxxx
Notice that the Expires header is at most Current Date + One Year as per RFC-2616 14.21.
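One way to get from the full response to its 304 counterpart is to filter the headers through an allowlist. The sketch below is only an illustration: `headersFor304` is a name invented for this post, and the allowlist reflects my reading of RFC 7232 section 4.1 (which superseded the RFC 2616 section cited above), so check it against the spec before relying on it.

```php
<?php
/**
 * Reduce the headers of a full 200 response to the ones worth repeating
 * on a 304. Anything outside the allowlist (Content-Type, Content-Length,
 * Last-Modified, ...) is dropped.
 */
function headersFor304(array $headers): array
{
    $keep = ['cache-control', 'content-location', 'date', 'etag', 'expires', 'vary'];
    $result = [];
    foreach ($headers as $name => $value) {
        if (in_array(strtolower((string) $name), $keep, true)) {
            $result[$name] = $value;
        }
    }
    return $result;
}

// Header values taken from the 200 example above.
$full = [
    'Date'           => 'Thu, 18 Nov 2010 16:04:38 GMT',
    'Content-Type'   => 'image/png',
    'Last-Modified'  => 'Thu, 15 Oct 2009 02:04:11 GMT',
    'Expires'        => 'Thu, 31 Dec 2010 02:04:11 GMT',
    'Cache-Control'  => 'max-age=315360000',
    'Content-Length' => '6394',
];

print_r(array_keys(headersFor304($full))); // keeps Date, Expires, Cache-Control
```

In a real script the Date and Expires values would be regenerated for the 304 rather than copied, as in the example pair above.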
The Content-Type header only applies to responses which contain a body. A 304 response does not contain a body, so that header does not apply. Similarly, you don't want to send Last-Modified or ETag because a 304 response means that the document hasn't changed (and so neither have the values of those two headers).
For an example, see this blog post by Anne van Kesteren examining WordPress' http_modified function. Note that it returns either Last-Modified and ETag or a 304 response.