How best to cache entire HTML output on the server - php

I have a fairly basic website, written in pure PHP with no framework, running in a standard LAMP environment.
The site dynamically generates markup based on the HTTP User Agent header, and some query string parameters. For example "itemdetail.php" would inspect the querystring param "itemid" and the User Agent header and produce some markup.
I want to cache this markup, so that the next time a device with the same User Agent and itemid in the query string tries to request the page, it simply dumps out whatever markup is in its cache.
I realise I could do this manually in PHP using memcache: write some code at the top of the page to inspect the relevant params, then either serve from memcached or render the page and store the markup in memcached. But I was thinking it might be possible to avoid the PHP layer altogether, using something like what is described here: http://httpd.apache.org/docs/2.2/caching.html
So, my question, which I realise might be vague and might get this post killed, is:
What is the recommended caching implementation here? Is it indeed to use memcache at the PHP level, or are the Apache modules sufficient to meet my needs?

Generating different pages depending on User Agents is just bad practice. You shouldn't do that.
If you want to cache entire pages because your website is slow, the problem most likely lies in your code.
On-topic: write a simple function that hashes the URI being served with a small-footprint hash function (md5, sha1, ...),
e.g.
<?php
$hash = md5('itemdetail.php-' . $itemid);
if (file_exists('cache/' . $hash . '.html')) {
    echo file_get_contents('cache/' . $hash . '.html');
    die();
}
and then at the end of your script save the result to 'cache/'.$hash.'.html';
You can of course use a different extension or folder layout, or whatever suits you.
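To tie it together, here is a minimal end-to-end sketch using output buffering; the cache directory and the exact key ingredients (item id plus User Agent, per the question) are assumptions:
<?php
// Hypothetical cache key combining the page, the item id and the User Agent.
$hash = md5('itemdetail.php-' . $_GET['itemid'] . '-' . $_SERVER['HTTP_USER_AGENT']);
$cacheFile = 'cache/' . $hash . '.html';

if (file_exists($cacheFile)) {
    echo file_get_contents($cacheFile); // serve straight from the cache
    exit;
}

ob_start();                  // start capturing everything the page renders
// ... render the page as usual ...
file_put_contents($cacheFile, ob_get_contents()); // save for the next request
ob_end_flush();              // send the captured output to the client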
If you want to cache without using PHP, take a look at Varnish. Or the other example posted here.

If you are familiar with OpenCart at all, here is something I wrote to do just this. Hopefully you will get the idea despite the possibly unfamiliar context.
ob_start();
$enableCaching = false; // boolean flag to toggle caching
$cachePages = array();  // routes that are allowed to be cached
$route = !isset($_GET['route']) ? 'home' : str_replace('/', '-', $_GET['route']);
$cacheFile = DIR_CACHE . $route . '.' . md5($_SERVER['QUERY_STRING']) . '.cache.tpl';

// Cacheable when caching is on and the route (or the default home page) is whitelisted.
$cacheable = $enableCaching !== false &&
    (!isset($_GET['route']) || in_array($_GET['route'], $cachePages));

if ($cacheable && file_exists($cacheFile)) {
    /**
     * This block outputs the contents of the cache file.
     */
    require $cacheFile;
}
else {
    /**
     * Cache file doesn't exist, process the request.
     */
    $response->output();
    if ($cacheable) {
        // Strip newlines and tabs to shrink the cached markup a little.
        file_put_contents($cacheFile, str_replace(array("\n", "\r", "\t"), '', ob_get_contents()));
    }
}
Basically, it builds a unique cache file name from the route and the query string.
All HTML output is written to that file.
Then, when it comes to processing a request, you can check whether that unique cache file exists and simply send it instead of processing the request.

Use the memcached library.
You'll have to install it first; once installed, memcached provides an in-memory caching system for PHP.
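A minimal sketch with the Memcached extension; the server address, key ingredients and TTL are all assumptions:
<?php
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211); // assumed local memcached instance

$key  = md5($_SERVER['REQUEST_URI'] . '-' . $_SERVER['HTTP_USER_AGENT']);
$html = $mc->get($key);
if ($html !== false) {
    echo $html; // cache hit: dump the stored markup and stop
    exit;
}

ob_start();
// ... render the page as usual ...
$html = ob_get_clean();
$mc->set($key, $html, 300); // cache miss: store the markup for 5 minutes (arbitrary TTL)
echo $html;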


Sharing access restrictions between php and javascript

The actual questions
How to "map" access restrictions so it can be used from php and javasript?
What kind of method should I use to share access restrictions / rules between php and javascript?
Explanation
I have created a RESTful backend in PHP which will use context-aware access control to limit data access and modification. For example, a person can modify his own address information and can view (but not modify) the address information of all other persons in the same groups. And of course, a group admin can modify the address details of everyone in that group.
Now, the PHP side is quite "simple", as it is all just a bunch of checks. The JavaScript side is likewise just a bunch of checks. The real issue is how to make those checks come from the same place.
Javascript uses checks to show/hide edit/save buttons.
PHP uses checks to make the actual changes.
and yes,
I know this would be a much simpler situation if I ran JavaScript (NodeJS or the like) on the server, but the backend has already been built and changing course at this point would cause major setbacks.
Maybe someone has already devised a method to model access checks in a "passive" way, and then just uses some sort of "compiler" to run the actual checks?
Edit:
In case it helps to mention, the front-end (JS) part is built with AngularJS.
Edit2
This is some pseudo-code to clarify what I think I am searching for, though I am not at all certain it is workable at scale. On the plus side, all access restrictions would live in a single place and would be easy to amend if needed. On the downside, I would have to write the AccessCheck and canAct functions in both languages, or come up with a way to JIT-compile some pseudo-code to both JavaScript and PHP :)
AccessRestrictions = {
  Address: {
    View: [OWNER, MEMBER_OF_OWNER_PRIMARY_GROUP],
    Edit: [OWNER, ADMIN_OF_OWNER_PRIMARY_GROUP]
  }
}

AccessCheck = {
  OWNER: function(Owner) {
    return Session.Person.Id == Owner.Id;
  },
  MEMBER_OF_OWNER_PRIMARY_GROUP: function(Owner) {
    return Session.Person.inGroup(Owner.PrimaryGroup);
  }
}

function canAct(Owner, resource, action) {
  // allow the action if any role check for this resource/action passes
  return AccessRestrictions[resource][action].some(function(role) {
    return AccessCheck[role](Owner);
  });
}
First things first.
You can't "run JavaScript on the server" because Javascript is always run on the client, at the same way PHP is always run on the server and never on the client.
Next, here's my idea.
1. Define a small library of functions you need to perform the checks. This can be as simple as a single function that returns a boolean, or whatever format suits your permissions. Make sure the returned value is meaningful to both PHP and JavaScript (in practice, that means returning JSON strings more often than not).
2. In your main PHP scripts, include the library whenever you need to check permissions, and use the function(s) you defined to determine whether the user is allowed.
3. Your front-end is the one that requires the most updates: when you need to determine the user's permissions, fire an AJAX request to your server (you may need to write a new script, similar to #2, to handle AJAX requests if your current script isn't flexible enough) which simply reuses your permissions library. Since the return values are in a format JavaScript reads easily, you will be able to check what to show to the user when the response arrives, as sketched below.
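One hedged sketch of that shared library plus the thin JSON endpoint; the file names and the check_role()/load_owner() helpers are hypothetical:
<?php
// permissions.php -- single source of truth, included by every PHP script.
$ACCESS_RESTRICTIONS = array(
    'Address' => array(
        'View' => array('OWNER', 'MEMBER_OF_OWNER_PRIMARY_GROUP'),
        'Edit' => array('OWNER', 'ADMIN_OF_OWNER_PRIMARY_GROUP'),
    ),
);

function can_act($owner, $resource, $action) {
    global $ACCESS_RESTRICTIONS;
    foreach ($ACCESS_RESTRICTIONS[$resource][$action] as $role) {
        if (check_role($owner, $role)) { // check_role() would implement each rule in PHP
            return true;
        }
    }
    return false;
}

// ajax_can.php -- thin endpoint so the AngularJS front-end reuses the same rules.
session_start();
header('Content-Type: application/json');
echo json_encode(array(
    'allowed' => can_act(load_owner($_GET['id']), $_GET['resource'], $_GET['action']),
));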
There are some solutions to this problem. I assume you store session variables, like the name of the authorized user, in the PHP session. Let's assume all you need to share is the $authenticated_user variable. I assume it's just a string, but it could also be an array with permissions etc.
If $authenticated_user is known before loading the AngularJS app, you may prepare a small PHP file which mimics a JS file, like this:
config.js.php:
<?php
session_start();
header('Content-Type: application/javascript'); // serve it as a JS file
$authenticated_user = $_SESSION['authenticated_user'];
// json_encode() takes care of quoting and escaping the value safely
echo "var authenticated_user = " . json_encode($authenticated_user) . ";";
?>
If you include it in the header of your application, it will tell you who is logged in on the server side. The client side will just see this JS code:
var authenticated_user = "johndoe";
You may also load this file with ajax, or even better JSONP if you wrap it in a function:
<?php
session_start();
$authenticated_user = $_SESSION['authenticated_user'];
echo <<<EOD
function set_authenticated_user() {
    window.authenticated_user = '$authenticated_user';
}
EOD;
?>

Posting and executing a string as PHP?

I'm building an application that's self-hosted, and I'd like to have some form of licensing to minimize fraudulent downloads/distribution.
Of course, I'm well aware that, being self-hosted, someone could simply rip all licensing features out of the source code, but in my opinion the cons of using an encoder like Zend Guard or ionCube far outweigh the pros; nonetheless I'd like some basic form of license security.
What I originally had in mind was: user logs in with a licence in the app -> app posts the licence to my server -> server sends back a response -> app evaluates the response and, if the licence is valid, sets a value in a session variable; if invalid, returns to the login screen with an error.
The problem with this is that the evaluation of the response and the session setting sit in plain sight in an application file, so if the user knows a little PHP and looks at that source code, they'll realize all they need to do is set a session variable with a particular $_SESSION['_valid_license'] value themselves, and they'll be good to go.
What I was considering doing to make it a little less easy was (if possible) to post PHP back as a response, and then have the application file execute it, for example:
My original code:
$response = $_GET['response'];
if ($response == "fjdkuf9") {
    session_start();
    $_SESSION['_valid_license'] = "YES";
    header("Location: " . $rp . "/admin/");
} else {
    header("Location: " . $rp . "/login/?err=1");
}
My new concept:
$response = $_POST['response'];
$response = str_replace("\\", "", $response); // strip the escaping backslashes
With the following being posted as response:
session_start();
\$_SESSION[\'_valid_license\'] = \"YES\";
header(\"Location:\" . \$rp . \"/admin/\");
Would that execute $response as actual PHP code after str_replace()? If possible, this would be great, as it would mean evaluation would be done on my server rather than within the self-hosted app itself.
Your second solution is just as insecure as the first. Here's what I would do:
1. Your application POSTs a serial number or some other identifying information to your server.
2. Your server validates the serial number against the user's account or whatever, and returns a unique response.
3. If that response is successful, you allow the user to continue. Obviously you'd want to implement some sort of caching mechanism here so you're not hitting your server on every page view.
Putting the responsibility for validation on your server instead of in the self-hosted code is much more secure. By the way, you would need to sign or encrypt the data that is sent so that someone couldn't simply emulate the success response, but you get the idea. A sketch follows.
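A hedged sketch of that signed-response idea; the endpoint URL, the response shape and APP_SHARED_SECRET are assumptions, and random_bytes()/hash_equals() require PHP 7 / 5.6+:
<?php
// Ask the licensing server to validate the serial, then verify the
// response signature with a secret baked into the app.
$serial = $_POST['serial'];
$nonce  = bin2hex(random_bytes(16)); // prevents replaying an old "valid" response

$ctx = stream_context_create(array('http' => array(
    'method'  => 'POST',
    'header'  => 'Content-Type: application/x-www-form-urlencoded',
    'content' => http_build_query(array('serial' => $serial, 'nonce' => $nonce)),
)));
$response = json_decode(
    file_get_contents('https://licensing.example.com/check', false, $ctx), true);

// Assumed answer shape: {"valid": bool, "sig": HMAC-SHA256(secret, serial.nonce.valid)}
$expected = hash_hmac('sha256',
    $serial . $nonce . ($response['valid'] ? '1' : '0'), APP_SHARED_SECRET);

if ($response['valid'] && hash_equals($expected, $response['sig'])) {
    session_start();
    $_SESSION['_valid_license'] = 'YES';
}
As with any self-hosted scheme, someone who edits the source can still bypass this; it only raises the bar.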

Loading Javascript through PHP

From a tutorial I read on Sitepoint, I learned that I could load JS files through PHP (it was a comment, anyway). The code for this was in this form:
<script src="js.php?script1=jquery.js&scipt2=main.js" />
The purpose of using PHP was to reduce the number of HTTP requests for JS files. But from the markup above, it seems to me that there are still going to be the same number of requests as if I had written two tags for the JS files (I could be wrong, that's why I'm asking).
The question is how is the PHP code supposed to be written and what is/are the advantage(s) of this approach over the 'normal' method?
The original poster presumably meant that
<script src="js.php?script1=jquery.js&script2=main.js" />
will cause fewer HTTP requests than
<script src="jquery.js" />
<script src="main.js" />
That is because js.php will read all script names from its GET parameters and print them out as a single combined response. This means there's only one round trip to the server to get all scripts.
js.php would probably be implemented like this:
<?php
header('Content-Type: application/javascript');
$script1 = $_GET['script1']; // NB: validate these before use, see the notes further down
$script2 = $_GET['script2'];
echo file_get_contents($script1); // load the content of jquery.js and print it to the browser
echo file_get_contents($script2); // load the content of main.js and print it to the browser
Note that this may not be an optimal solution if only a small number of scripts is required, though the main issue it addresses is real: a web browser will not load an unlimited number of scripts in parallel from the same domain.
You will need to implement caching to avoid loading and concatenating all your scripts on every request; doing that on every request would consume a great deal of CPU.
IMO, the best way to do this is to combine and minify all script files into one big file before deploying your website, and then reference that file. This way the client makes just one round trip to the server, and the server carries no extra load on each request.
Please note that the PHP solution provided above is by no means a good approach; it's just a simple demonstration of the procedure.
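For completeness, the ahead-of-time combining could be as small as this hypothetical build script (file names and paths are assumptions; minification would be a separate step):
<?php
// build.php -- run once before deploying.
$files  = array('jquery.js', 'main.js'); // your real script list goes here
$bundle = '';
foreach ($files as $f) {
    $bundle .= file_get_contents(__DIR__ . '/js/' . $f) . ";\n"; // ';' guards against files missing a trailing semicolon
}
file_put_contents(__DIR__ . '/js/bundle.js', $bundle);
// run bundle.js through a minifier (jsmin, Closure Compiler, ...) as a second step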
The main advantage of this approach is that there is only a single request between the browser and server.
Once the server receives the request, the PHP script combines the javascript files and spits the results out.
Building a PHP script that simply combines JS files is not at all difficult. You simply include the JS files and send the appropriate content-type header.
Where it gets more difficult is when you start worrying about caching.
I recommend you check out minify.
<script src="js.php?script1=jquery.js&scipt2=main.js" />
That's:
invalid (ampersands have to be encoded)
hard to expand (using script[]= would make PHP treat it as an array you can loop over)
not HTML compatible (always use <script></script>, never <script />)
The purpose of using PHP was to reduce the number of HTTP requests for JS files. But from the markup above, it seems to me that there are still going to be the same number of requests as if I had written two tags for the JS files (I could be wrong, that's why I'm asking).
You're wrong. The browser makes a single request. The server makes a single response. It just digs around in multiple files to construct it.
The question is how is the PHP code supposed to be written
The steps are listed in this answer
and what is/are the advantage(s) of this approach over the 'normal' method?
You get a single request and response, so you avoid the overhead of making multiple HTTP requests.
You lose the benefits of the generally sane cache control headers that servers send for static files, so you have to set up suitable headers in your script.
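A hedged sketch of the headers such a script might emit; the parameter name assumes the script[]= convention mentioned above and all values are arbitrary:
<?php
$scripts = $_GET['script'];
$etag    = md5(implode('-', $scripts)); // hashing the file contents would be stricter

header('Content-Type: application/javascript');
header('Cache-Control: public, max-age=86400'); // let browsers keep the bundle for a day
header('ETag: "' . $etag . '"');

// Answer 304 when the browser already holds the current bundle.
if (isset($_SERVER['HTTP_IF_NONE_MATCH']) &&
    trim($_SERVER['HTTP_IF_NONE_MATCH'], '"') === $etag) {
    http_response_code(304);
    exit;
}
// ... otherwise concatenate and output the files as usual ...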
You can do this like this:
The concept is quite simple, though you can make it more advanced:
Step 1: merging the file
<?php
$scripts = $_GET['script'];
$contents = "";
foreach ($scripts as $script)
{
    // validate the $script here to prevent inclusion of arbitrary files
    $contents .= file_get_contents($pathto . "/" . $script); // $pathto = your JS directory
}
// post-processing here
// e.g. jsmin, google closure, etc.
echo $contents;
?>
usage:
<script src="js.php?script[]=jquery.js&script[]=otherfile.js" type="text/javascript"></script>
Step 2: caching
<?php
function cacheScripts($scriptsArray, $outputdir)
{
    $filename = sha1(join("-", $scriptsArray)) . ".js";
    $path = $outputdir . "/" . $filename;
    if (file_exists($path))
    {
        return $filename; // already cached, nothing to do
    }
    $contents = "";
    foreach ($scriptsArray as $script)
    {
        // validate the $script here to prevent inclusion of arbitrary files
        $contents .= file_get_contents($pathto . "/" . $script); // $pathto = your JS directory
    }
    // post-processing here
    // e.g. jsmin, google closure, etc.
    file_put_contents($path, $contents);
    return $filename;
}
?>
<script src="/js/<?php echo cacheScripts(array('jquery.js', 'myscript.js'),"/path/to/js/dir"); ?>" type="text/javascript"></script>
This makes it a bit more advanced. Please note, this is semi-pseudo-code to explain the concepts. In practice you will need to do more error checking and some cache invalidation.
To do this in a more managed and automated way, there's Assetic (if you can use PHP 5.3):
https://github.com/kriswallsmith/assetic
(which more or less does this, but much better)
Assetic
Documentation
https://github.com/kriswallsmith/assetic/blob/master/README.md
The workflow will be something along the lines of this:
use Assetic\Asset\AssetCollection;
use Assetic\Asset\FileAsset;
use Assetic\Asset\GlobAsset;

$js = new AssetCollection(array(
    new GlobAsset('/path/to/js/*'),
    new FileAsset('/path/to/another.js'),
));

// the code is merged when the asset is dumped
echo $js->dump();
There is a lot of support for many formats:
js
css
lots of minifiers and optimizers (CSS, JS, PNG, etc.)
support for Sass, http://sass-lang.com/
Explaining everything is a bit outside the scope of this question. But feel free to open a new question!
PHP will simply concatenate the two script files and send a single script containing the contents of both, so you will only make one request to the server.
Using this method, there will still be the same number of disk I/O requests as if you had not used the PHP method. However, in the case of a web application, disk I/O on the server is rarely the bottleneck; the network is. What this allows you to do is reduce the overhead associated with requesting the files from the server over HTTP: the PHP script outputs the concatenation of all the requested files, so you get all of your scripts in one HTTP request rather than several.
Looking at the parameters being passed to js.php, it can load two JavaScript files (or any number, for that matter) in one request. It just looks at its parameters (script1, script2, scriptN) and loads them all in one go, as opposed to loading them one by one with your normal script tags.
The PHP file could also do other things, like minifying before outputting, although it's probably not a good idea to minify on the fly for every request.
The way the PHP code would be written is: look at the script parameters and load the corresponding files from a given directory. However, it's important to check the file type and/or location before loading; you don't want to give people a backdoor through which they can read every file on your server. See the sketch below.
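One hedged way to do that check is an explicit whitelist plus basename(); the directory name is an assumption:
<?php
$allowed = array('jquery.js', 'main.js'); // the only files we agree to serve
$script  = basename($_GET['script']);     // basename() strips any ../ traversal
if (!in_array($script, $allowed, true)) {
    http_response_code(404);
    exit;
}
header('Content-Type: application/javascript');
readfile(__DIR__ . '/js/' . $script);     // assumed location of your JS files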

Comparing XML documents for changes in PHP

Currently I'm using PHP to load multiple XML files from around the web (non-local) using simplexml_load_file(). This, as you can imagine, is quite a clunky process and is slowing load time significantly (7 seconds to load 7 files), and there could possibly be more files to load. These files don't change often, but changes should be displayed on the page as soon as they are made.
One idea I had was to cache a version of each feed and the html output I generate from that feed in my DB. Then, each time the user loads the page, the feeds would be compared; if they are different I would run my existing code, generate the HTML, output it, and save it to the DB. However, if it is the same, I could simply output the cached HTML.
My two concerns with this are:
Security: If I am storing a copy of an XML file, could this pose a security threat, seeing as I don't control the content of that file?
Speed: The main goal here is to increase the speed of the overall page load. Would the process described above increase the speed, or would it just bog down the server with more to do? Thanks for your help!
How about having a cron job crawl through every external XML source, say, hourly or quarter-hourly and update it if necessary?
It wouldn't be 100% real time, but it would take the load off your web page, which would always use cached files. I don't think there is a reliable way of polling external sources for updates other than actually downloading the file (in theory, it should be possible to rely on the cache headers, but I wouldn't count on them being configured correctly). A sketch of such a refresher is below.
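A minimal sketch of that cron-driven refresher; the feed URLs and cache path are assumptions:
<?php
// refresh_feeds.php -- run from cron, e.g. every 15 minutes:
//   */15 * * * * php /path/to/refresh_feeds.php
$feeds = array('http://example.com/a.xml', 'http://example.com/b.xml');
foreach ($feeds as $url) {
    $xml = @file_get_contents($url);
    if ($xml !== false && @simplexml_load_string($xml) !== false) { // only keep parseable XML
        file_put_contents('/var/cache/feeds/' . md5($url) . '.xml', $xml);
    }
}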
Security: If I am storing a copy of an XML file, could this pose a security threat, seeing as I don't control the content of that file?
Hardly. To be totally sure, store the cached XML files outside the web root. Any threat that remains is the same as if you were passing the stream through live.
One idea I had was to cache a version of each feed and the html output I generate from that feed in my DB. Then, each time the user loads the page, the feeds would be compared; if they are different I would run my existing code, generate the HTML, output it, and save it to the DB. However, if it is the same, I could simply output the cached HTML.
Rather than caching the XML file yourself, you should set the If-None-Match or If-Modified-Since fields in the request header. This way you can check to see if the files have changed without necessarily downloading them.
This can be done by setting a stream context for libxml before running simplexml_load_file(). If the file hasn't changed, you'll get a 304 Not Modified response, and simplexml_load_file will fail.
You could also use stream_context_get_default to set the general stream context, then retrieve the XML file into a string with file_get_contents and pass it to simplexml_load_string().
Here's an example of the first way:
class CachedXml {
    public $element, $url;
    private $mod_date, $etag;

    public function __construct($url){
        $this->url = $url;
        $this->element = NULL;
        $this->mod_date = FALSE;
        $this->etag = FALSE;
    }

    public function updateXml(){
        if($this->mod_date || $this->etag){
            $opts = array(
                'http' => array(
                    'header' => "If-Modified-Since: $this->mod_date\r\n" .
                                "If-None-Match: $this->etag\r\n"
                )
            );
            $context = stream_context_create($opts);
            libxml_set_streams_context($context);
        }
        if($attempt = @simplexml_load_file($this->url)){
            $this->element = $attempt;
            $headers = get_headers($this->url, 1); // note: this issues a second request
            $this->mod_date = $headers['Last-Modified'];
            $this->etag = $headers['ETag'];
            return TRUE;
        }
        return FALSE;
    }
}
$bob = new CachedXml('http://example.com/xml/test.xml');
if($bob->updateXml()){
    echo "Bob was just updated.<br />";
    echo "Bob's name is " . $bob->element->getName() . ".<br />";
}
else{
    echo "Bob was not updated.<br />";
}
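And a short sketch of the second approach mentioned above: set the default stream context, fetch the document into a string, and parse it with simplexml_load_string(). Header values are illustrative:
<?php
stream_context_set_default(array(
    'http' => array(
        'header' => "If-Modified-Since: Sat, 01 Jan 2022 00:00:00 GMT\r\n" .
                    "If-None-Match: \"some-etag\"\r\n",
    ),
));

$raw = @file_get_contents('http://example.com/xml/test.xml');
if ($raw !== false) {
    $xml = simplexml_load_string($raw); // content changed (or first fetch)
} elseif (isset($http_response_header[0]) && strpos($http_response_header[0], '304') !== false) {
    // 304 Not Modified: keep using your cached copy
}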

PHP equivalent of ASP.NET Application/Cache objects

My Google-fu hasn't revealed what I'm looking for, so I'm putting this one out to the crowd.
Coming from an ASP.NET development background, I'm used to having the Application and Cache collections available for me to stash rarely-modified but often-used resources (such as lookup rows from a database or the contents of static XML documents) in the memory of the web server, so I don't have to reload these often-used items during every request.
Does PHP have an equivalent? I've read up briefly on the memcache extension, but this won't work for me (as I don't have control over the server configuration.) I'm tempted to implement something that would allow me to pre-parse or pre-select the resources and generate a sort of PHP cache "file" that would construct the cached object from literals stored in the file, but this seems like a very hacky solution to me.
Is there something in PHP (or, alternatively, a helper library of some sort) that will allow me to accomplish this using best practices?
In short, no, such a thing is not available natively in PHP. To understand why, you have to understand that PHP builds its entire environment for each request and tears it down at the end of the request. PHP does give you $_SESSION to store per-session variables, but after digging into the docs you will see that that variable is also rebuilt on each request. PHP (or mod_php, to be more specific) is fundamentally different from other "application servers": basically, it is not an application server at all, it is a per-request script runner.
Now, don't get me wrong, PHP lets you do an application-level data store, but you will have to go to a database or to disk to get it. Remember this, though: don't worry about optimizing for performance until it is shown that performance is a problem. And I would guess that 99 times out of 100, by the time performance is an issue that isn't due to some poor code you wrote, you will have the resources to build your own pretty little memcached server.
Take a look at Zend_Cache library, for example. It can cache in multiple backends.
This is a bit of a hack, but it works in PHP 7+.
Basically, you cache your data to a temp file and then use include to read the file, which is cached in memory by the PHP engine's in-memory file caching (opcache):
function cache_set($key, $val) {
    $val = var_export($val, true);
    // HHVM fails at __set_state, so just use object cast for now
    $val = str_replace('stdClass::__set_state', '(object)', $val);
    // Write to temp file first to ensure atomicity
    $tmp = "/tmp/$key." . uniqid('', true) . '.tmp';
    file_put_contents($tmp, '<?php $val = ' . $val . ';', LOCK_EX);
    rename($tmp, "/tmp/$key");
}
And here’s how we “get” a value from the cache:
function cache_get($key) {
    @include "/tmp/$key"; // @ suppresses the warning when the key doesn't exist yet
    return isset($val) ? $val : false;
}
from https://medium.com/@dylanwenzlau/500x-faster-caching-than-redis-memcache-apc-in-php-hhvm-dcd26e8447ad
