Is the same file tokenized every time I include it? - php

This question is about the PHP parsing engine.
When I include a file multiple times in a single runtime, does PHP tokenize it every time or does it keep a cache and just run the compiled code on subsequent inclusions?
EDIT: More details: I am not using an external caching mechanism and I am dealing with the same file being included multiple times during the same request.
EDIT 2: The file I'm trying to include contains procedural code. I want it to be executed every time I include() it, I am just curious if PHP internally keeps track of the tokenized version of the file for speed reasons.

You should use a PHP bytecode cache such as APC. That will accomplish what you want: re-using a compiled version of a PHP page on subsequent requests. Otherwise, PHP reads, tokenizes and compiles the file on every request.

By default the file is parsed every time it is (really) included, even within the same PHP instance.
But there are opcode caches such as APC.
<?php
$i = 'include_test.php';
file_put_contents($i, '<?php $x = 1;');
include $i;
echo $x, ' ';
file_put_contents($i, '<?php $x = 2;');
include $i;
echo $x, ' ';
This prints: 1 2
(OK, weak proof. PHP could check whether the file's mtime has changed, and I think that is what APC does. But without a cache, PHP really doesn't.)

Look at include_once(); a plain include() will include and execute the file again every time.
Also, if you are using objects, look at __autoload().
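A minimal sketch of the include vs. include_once difference (the file names are arbitrary):
<?php
// include executes the file on every call:
file_put_contents('inc_test.php', '<?php echo "include ran\n";');
include 'inc_test.php';       // prints "include ran"
include 'inc_test.php';       // prints "include ran" again

// include_once executes it only on its first inclusion:
file_put_contents('once_test.php', '<?php echo "include_once ran\n";');
include_once 'once_test.php'; // prints "include_once ran"
include_once 'once_test.php'; // prints nothing the second time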

I just wrote a basic test, much like VolkerK's. Here's what I tested:
<?php
file_put_contents('include.php', '<?php echo $i . "<br />"; ?>');
for ($i = 0; $i < 10; $i++) {
    include('include.php');
    if ($i == 5) {
        file_put_contents('include.php', '<?php echo $i+$i; echo "<br />"; ?>');
    }
}
?>
This generated the following:
0
1
2
3
4
5
12
14
16
18
So, unless it caches based on mtime of the file, it seems it parses every include. You would likely want to use include_once() instead of standard include(). Hope that helps!

Related

Is The include Command of PHP buffered?

I have some problems in my script with including some files with include.
I can reduce it to a small code example:
<?php
$res = 0;
include "test.txt";
echo "RES => $res".PHP_EOL;
file_put_contents('./test.txt', '<?php $res='.($res+1).';'.PHP_EOL);
include "test.txt";
echo "RES => $res".PHP_EOL;
I'm expecting an output of:
RES => 0
RES => 1
// On the next call I'm expecting ...
RES => 1
RES => 2
But what I'm getting:
RES => 0
RES => 0
Even the next call gives the same result (RES => 0). When I run the script again 1-2 seconds later, I do get an increment of RES.
So my question: is the include statement of PHP buffered? I haven't found anything in the PHP documentation about buffering. What is the problem with my example?
It depends on whether or not you have an OPCode cache installed. If you do, the script is loaded from memory after the first time.
I'm not sure on the behavior without an OPCode cache. PHP may load the file from disk each time you call include. You could find out with strace et al. You'll probably reap the benefits of a filesystem cache even if PHP is going back to disk on subsequent invocations of include.
Generally I would encourage use of an OPCode cache.
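If you are not sure whether an opcode cache is active on a given server, a quick check along these lines can help; this is a sketch assuming PHP 5.5+ with the bundled OPcache extension (APC and others expose different functions):
<?php
// Rough check for the bundled OPcache extension.
if (function_exists('opcache_get_status')) {
    $status = opcache_get_status(false); // false: skip per-script details
    var_dump($status !== false && $status['opcache_enabled']);
} else {
    echo "OPcache extension not loaded", PHP_EOL;
}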
EDIT
I now see you're changing the content of the file before the second include... I've tried your example from the CLI and it's working as you expect. Try it on your server via the CLI. If it works (which it should) then there's a good chance you have an OPCode cache enabled and the particular configuration is preventing the expected behavior.
You should also verify Apache is writing out the updated file as you expect. Maybe when you write to disk with file_put_contents, you also log what each version of the generated file is. Something like this after your existing file_put_contents call:
// For logging
file_put_contents('./test-' . time() . '.txt','<?php $res='.($res+1).';'.PHP_EOL);
When running the example from the command line or with OPcache disabled, the script works fine; disabling OPcache would give correct results.
A workaround with OPcache enabled would be something like this (it avoids caching the file because the code is never run through include()):
// Instead of include "test.txt";
// we include the code manually via eval
$cont = file_get_contents("test.txt");
// Strip off the leading <?php and eval the string
eval(substr($cont, 5));

creating only new files in PHP without cpu intensive code

In my cache system, I want it where if a new page is requested, a check is made to see if a file exists, and if it doesn't then a copy is stored on the server. If it does exist, it must not be overwritten.
The problem I have is that I may be using functions designed to be slow.
This is part of my current implementation to save files:
if (!file_exists($filename)) {
    $h = fopen($filename, "wb");
    if ($h) {
        fwrite($h, $c);
        fclose($h);
    }
}
This is part of my implementation to load files:
if (($m = @filemtime($file)) !== false) {
    if ($m >= filemtime("sitemodification.file")) {
        $outp = file_get_contents($file);
        header("Content-length: ".strlen($outp), true);
        echo $outp;
        flush();
        exit();
    }
}
What I want to do is replace this with a better-performing set of functions that still achieves the same functionality. All cache files, including sitemodification.file, reside on a ramdisk. I added a flush before exit in the hope that content will be output faster.
I can't use direct memory addressing at this time because the file sizes to be stored are all different.
Is there a set of functions I can use to make the code I provided faster by at least a few milliseconds, especially the file-loading code?
I'm trying to keep my time to first byte low.
First, prefer is_file to file_exists and use file_put_contents:
if ( !is_file($filename) ) {
    file_put_contents($filename, $c);
}
Then, use the proper function for this kind of work, readfile:
if ( ($m = @filemtime($file)) !== false && $m >= filemtime('sitemodification.file')) {
    header('Content-length: '.filesize($file));
    readfile($file);
}
You should see a little improvement, but keep in mind that file accesses are slow and you hit the filesystem three times before sending any content.
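As a small, unbenchmarked sketch of trimming that (and note PHP's stat cache may already be doing part of this for you): a single stat() call yields both mtime and size, so the filemtime()/filesize() pair on $file collapses into one lookup.
<?php
// Sketch: fetch mtime and size of $file with a single stat() call.
$s = @stat($file);
if ($s !== false && $s['mtime'] >= filemtime('sitemodification.file')) {
    header('Content-length: '.$s['size']);
    readfile($file);
    exit();
}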

Dynamically changed files in PHP. Changes sometimes are not visible in include(), ftp_put()

I have scripts like these:
file_put_contents("filters.php", '<? $filter_arr = '.var_export($filter_arr, true).'; ?>');
include("filters.php");
or:
$xml = '<?xml version="1.0" encoding="UTF-8"?>'."\n<xml>\n\t<items>\n".$xml_0."\n\t</items>\n</xml>";
file_put_contents($PROJECT_ROOT."/xml/$file_type.xml", $xml);
$upload_result = ftp_put($ftp_stream, $destination_file, $PROJECT_ROOT."/xml/$file_type.xml", FTP_BINARY);
The changes to those files are actually applied physically (written to the files),
but sometimes they are not visible after include(), or not sent by ftp_put() to the remote server.
It seems as if PHP were caching these files.
Adding sleep(1) before include() doesn't help.
I also have a test like this:
for ($i = 1; $i <= 100; $i++) {
    echo "$i)";
    $filter_arr = array($i);
    file_put_contents("test.txt", '<? $filter_arr = '.var_export($filter_arr, true).'; ?>');
    include("test.txt");
    echo $filter_arr[0]."<br>";
}
About 90% of times output is normal:
1) 1
2) 2
...
100) 100
About 10% of times output is wrong:
1) 1
2) 1
...
100) 1
Playing with flock() or clearstatcache() also has no effect.
It doesn't seem to be a filesystem or file-locking problem: in both cases the file is written, but sometimes with incorrect data, as if $i were not ascending, which is weird. The only error I got was that the file was locked for writing when I held down F5, that's it.
Can you be more exact about versions and OS?
I have faced the same problem.
EDIT: the correct answer.
You can use
opcache_invalidate('second.php'); // reset the cached copy of the file
as stated here: PHP include doesn't read changes of source file
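Applied to the filters.php snippet from the question, a sketch would be to invalidate the cached copy right after rewriting the file (guarded in case OPcache is not loaded):
<?php
file_put_contents("filters.php", '<? $filter_arr = '.var_export($filter_arr, true).'; ?>');
if (function_exists('opcache_invalidate')) {
    opcache_invalidate("filters.php", true); // true forces invalidation regardless of timestamps
}
include("filters.php");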

Caching includes in PHP for iterated reuse

Is there a way to cache a PHP include effectively for reuse, without APC, et al?
Simple (albeit stupid) example:
// rand.php
return rand(0, 999);
// index.php
$file = 'rand.php';
while($i++ < 1000){
    echo include($file);
}
Again, while ridiculous, this pair of scripts dumps 1000 random numbers. However, for every iteration, PHP has to hit the filesystem (correct? There is no inherent caching functionality I've missed, is there?)
Basically, how can I prevent the previous scenario from resulting in 1000 hits to the filesystem?
The only consideration I've come to so far is a goofy one, and it may not prove effective at all (haven't tested, wrote it here, error prone, but you get the idea):
// rand.php
return rand(0, 999);
// index.php
$file = 'rand.php';
$cache = array();
while($i++ < 1000){
    if(isset($cache[$file])){
        echo eval('?>' . $cache[$file] . '<?php;');
    } else {
        $cache[$file] = file_get_contents($file);
        echo include($file);
    }
}
A more realistic and less silly example:
When including files for view generation, given a view file is used a number of times in a given request (a widget or something), is there a realistic way to capture and re-evaluate the view script without a filesystem hit?
This would only make sense if the included file were accessed across a network.
There is no inherent caching functionality I've missed, is there?
All operating systems are very highly optimized to reduce the amount of physical I/O and to speed up file operations. On a properly configured system, in most cases the system will rarely go back to disk to fetch PHP code. Sit down with a spreadsheet and think about how long it would take to process PHP code if every file had to be fetched from disk - it would be ridiculous. For example, suppose your script is in /var/www/htdocs/index.php and includes /usr/local/php/resource.inc.php - that's 8 seek operations just to locate the files, at roughly 8ms each, i.e. 64ms just to find the files! Run some timings on your test case - you'll see that it's running much, much faster than that.
As with Sabeen Malik's answer, you could capture the output of each include with output buffering, concatenate them all together, save that to a file, and include that one file each time.
This one collective include could be kept for an hour by checking the file's mtime and rewriting and re-including the includes only once an hour.
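A minimal sketch of that output-buffering idea for the view/widget case (render_cached() and widget.php are illustrative names; it caches the rendered markup, so it only suits fragments that echo their output and do not need to change within the request):
<?php
// Illustrative helper: render a view file once per request and reuse
// the buffered markup on later calls.
function render_cached($view) {
    static $cache = array();
    if (!isset($cache[$view])) {
        ob_start();
        include $view;
        $cache[$view] = ob_get_clean();
    }
    return $cache[$view];
}

$i = 0;
while ($i++ < 1000) {
    echo render_cached('widget.php');
}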
I think a better design would be something like this:
// rand.php
function get_rand() {
    return rand(0, 999);
}

// index.php
$file = 'rand.php';
include($file);
while($i++ < 1000){
    echo get_rand();
}
Another option:
while($i++ < 1000) echo rand(0, 999);

How can I optimize this simple PHP script?

This first script gets called several times for each user via an AJAX request. It calls another script on a different server to get the last line of a text file. It works fine, but I think there is a lot of room for improvement. I am not a very good PHP coder, so I am hoping that with the help of the community I can optimize this for speed and efficiency:
AJAX POST Request made to this script
<?php session_start();
$fileName = $_POST['textFile'];
$result = file_get_contents($_SESSION['serverURL']."fileReader.php?textFile=$fileName");
echo $result;
?>
It makes a GET request to this external script which reads a text file
<?php
$fileName = $_GET['textFile'];
if (file_exists('text/'.$fileName.'.txt')) {
    $lines = file('text/'.$fileName.'.txt');
    echo $lines[sizeof($lines)-1];
} else {
    echo 0;
}
?>
I would appreciate any help. I think there is more room for improvement in the first script. It makes an expensive function call (file_get_contents), well at least I think it's expensive!
This script should limit the locations and file types that it's going to return.
Think of somebody trying this:
http://www.yoursite.com/yourscript.php?textFile=../../../etc/passwd (or something similar)
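A sketch of one way to lock that down (basename() to drop directory components, plus a realpath() containment check against the text/ directory):
<?php
$name = basename($_GET['textFile']);
$path = realpath('text/'.$name.'.txt');
$base = realpath('text');

if ($path !== false && $base !== false && strpos($path, $base.DIRECTORY_SEPARATOR) === 0) {
    $lines = file($path);
    echo $lines[count($lines)-1];
} else {
    echo 0;
}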
Try to find out where the delays occur: does the HTTP request take long, or is the file so large that reading it takes long?
If the request is slow, try caching results locally.
If the file is huge, then you could set up a cron job that extracts the last line of the file at regular intervals (or at every change), and save that to a file that your other script can access directly.
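A sketch of what such a cron job could run (the paths are placeholders; for truly huge files the fseek()-from-the-end approach shown further down would be a better fit than file()):
<?php
// Copy the last line of a large log into a tiny file that the
// web-facing script can read directly.
$source = '/var/data/big-log.txt';
$target = '/var/data/big-log.lastline';

$lines = file($source, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
if ($lines !== false && count($lines) > 0) {
    file_put_contents($target, end($lines), LOCK_EX);
}
Scheduled, for example, once a minute via crontab (path again a placeholder): * * * * * php /path/to/extract-last-line.php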
readfile is your friend here: it reads a file on disk and streams it to the client.
script 1:
<?php
session_start();

// added basic argument filtering
$fileName = preg_replace('/[^A-Za-z0-9_]/', '', $_POST['textFile']);
$fileName = $_SESSION['serverURL'].'text/'.$fileName.'.txt';

if (file_exists($fileName)) {
    // script 2 could be pasted here

    // for the entire file:
    //readfile($fileName);

    // for just the last line:
    $lines = file($fileName);
    echo $lines[count($lines)-1];

    exit(0);
}

echo 0;
?>
This script could further be improved by adding caching to it. But that is more complicated.
Very basic caching could look like this:
script 2:
<?php
$lastModifiedTimeStamp = filemtime($fileName);

if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
    $browserCachedCopyTimestamp = strtotime(preg_replace('/;.*$/', '', $_SERVER['HTTP_IF_MODIFIED_SINCE']));
    if ($browserCachedCopyTimestamp >= $lastModifiedTimeStamp) {
        header("HTTP/1.0 304 Not Modified");
        exit(0);
    }
}

header('Content-Length: '.filesize($fileName));
header('Expires: '.gmdate('D, d M Y H:i:s \G\M\T', time() + 604800)); // (3600 * 24 * 7)
header('Last-Modified: '.date('D, d M Y H:i:s \G\M\T', $lastModifiedTimeStamp));
?>
First things first: Do you really need to optimize that? Is that the slowest part in your use case? Have you used xdebug to verify that? If you've done that, read on:
You cannot really optimize the first script usefully: if you need an HTTP request, you need an HTTP request. Skipping the HTTP request could be a performance gain, though, if it is possible (i.e. if the first script can access the same files the second script would operate on).
As for the second script: reading the whole file into memory does look like some overhead, but that is negligible if the files are small. The code looks very readable; I would leave it as is in that case.
If your files are big, however, you might want to use fopen() and its friends fseek() and fread():
# Do not forget to sanitize the file name here!
# An attacker could demand the last line of your password
# file or similar! ($fileName = '../../passwords.txt')

$filePointer = fopen($fileName, 'r');
$i = 1;
$chunkSize = 200;

# Read 200 byte chunks from the file and check if the chunk
# contains a newline
do {
    fseek($filePointer, -($i * $chunkSize), SEEK_END);
    $line = fread($filePointer, $i++ * $chunkSize);
} while (($pos = strrpos($line, "\n")) === false);

return substr($line, $pos + 1);
If the files are unchanging, you should cache the last line.
If the files are changing and you control the way they are produced, it might or might not be an improvement to reverse the order lines are written, depending on how often a line is read over its lifetime.
Edit:
Your server could figure out what it wants to write to its log, put it in memcache, and then write it to the log. The request for the last line could then be fulfilled from memcache instead of a file read.
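A rough sketch of that idea, assuming the memcached extension is available and using arbitrary key and path names:
<?php
// Writer side: store the last line in memcache at the moment it is
// appended to the log file.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$logLine = date('c')." something happened\n";
file_put_contents('/var/log/app.log', $logLine, FILE_APPEND | LOCK_EX);
$mc->set('app.log:lastline', rtrim($logLine, "\n"));

// Reader side (the AJAX endpoint): try memcache first, fall back to the file.
$last = $mc->get('app.log:lastline');
if ($last === false) {
    $lines = file('/var/log/app.log');
    $last = rtrim(end($lines), "\n");
}
echo $last;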
The most probable source of delay is that cross-server HTTP request. If the files are small, the cost of fopen/fread/fclose is nothing compared to the whole HTTP request.
(Not long ago I used HTTP to retrieve images to dynamically generate image-based menus. Replacing the HTTP request with a local file read reduced the delay from seconds to tenths of a second.)
I assume that the obvious solution of accessing the file server filesystem directly is out of the question. If not, then it's the best and simplest option.
If not, you could use caching. Instead of getting the whole file, you just issue a HEAD request and compare the timestamp to a local copy.
Also, if you are ajax-updating a lot of clients based on the same files, you might consider looking at using comet (meteor, for example). It's used for things like chats, where a single change has to be broadcasted to several clients.
