This first script gets called several times for each user via an AJAX request. It calls another script on a different server to get the last line of a text file. It works fine, but I think there is a lot of room for improvement. I am not a very good PHP coder, so I am hoping that with the help of the community I can optimize this for speed and efficiency:
AJAX POST Request made to this script
<?php session_start();
$fileName = $_POST['textFile'];
$result = file_get_contents($_SESSION['serverURL']."fileReader.php?textFile=$fileName");
echo $result;
?>
It makes a GET request to this external script which reads a text file
<?php
$fileName = $_GET['textFile'];
if (file_exists('text/'.$fileName.'.txt')) {
    $lines = file('text/'.$fileName.'.txt');
    echo $lines[sizeof($lines)-1];
} else {
    echo 0;
}
?>
I would appreciate any help. I think there is more improvement that can be made in the first script. It makes an expensive function call (file_get_contents), or at least I think it's expensive!
This script should limit the locations and file types that it's going to return.
Think of somebody trying this:
http://www.yoursite.com/yourscript.php?textFile=../../../etc/passwd (or something similar)
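One way to do that, as a minimal sketch (the whitelist of allowed names and the text/ directory are assumptions based on the scripts above):

<?php
// Hypothetical hardening of the file reader: strip any path components
// and only accept names that appear in an explicit whitelist.
$allowed  = array('scores', 'status', 'log'); // assumed list of permitted files
$fileName = basename($_GET['textFile']);      // drops "../" and directory parts

if (in_array($fileName, $allowed, true) && file_exists('text/'.$fileName.'.txt')) {
    $lines = file('text/'.$fileName.'.txt');
    echo $lines[count($lines) - 1];
} else {
    echo 0;
}
?>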
Try to find out where delays occur: does the HTTP request take long, or is the file so large that reading it takes long?
If the request is slow, try caching results locally.
If the file is huge, then you could set up a cron job that extracts the last line of the file at regular intervals (or at every change), and save that to a file that your other script can access directly.
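For the caching case, a rough sketch applied to the first script (the cache/ directory and the 60-second lifetime are assumptions; adjust to how often the remote file changes):

<?php
session_start();
$fileName  = preg_replace('/[^A-Za-z0-9_]/', '', $_POST['textFile']);
$cacheFile = 'cache/'.$fileName.'.lastline';

// Reuse a cached copy if it is recent enough, otherwise refetch from the other server.
if (file_exists($cacheFile) && (time() - filemtime($cacheFile)) < 60) {
    echo file_get_contents($cacheFile);
} else {
    $result = file_get_contents($_SESSION['serverURL']."fileReader.php?textFile=$fileName");
    file_put_contents($cacheFile, $result);
    echo $result;
}
?>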
readfile() is your friend here: it reads a file on disk and streams it to the client.
script 1:
<?php
session_start();
// added basic argument filtering
$fileName = preg_replace('/[^A-Za-z0-9_]/', '', $_POST['textFile']);
$fileName = $_SESSION['serverURL'].'text/'.$fileName.'.txt';

if (file_exists($fileName)) {
    // script 2 could be pasted here

    // for the entire file
    // readfile($fileName);

    // for just the last line
    $lines = file($fileName);
    echo $lines[count($lines)-1];
    exit(0);
}
echo 0;
?>
This script could be further improved by adding caching, but that is more complicated. Very basic caching could look like this:
script 2:
<?php
$lastModifiedTimeStamp = filemtime($fileName);

if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])) {
    $browserCachedCopyTimestamp = strtotime(preg_replace('/;.*$/', '', $_SERVER['HTTP_IF_MODIFIED_SINCE']));
    if ($browserCachedCopyTimestamp >= $lastModifiedTimeStamp) {
        header("HTTP/1.0 304 Not Modified");
        exit(0);
    }
}

header('Content-Length: '.filesize($fileName));
header('Expires: '.gmdate('D, d M Y H:i:s \G\M\T', time() + 604800)); // (3600 * 24 * 7)
header('Last-Modified: '.gmdate('D, d M Y H:i:s \G\M\T', $lastModifiedTimeStamp));
?>
First things first: Do you really need to optimize that? Is that the slowest part in your use case? Have you used xdebug to verify that? If you've done that, read on:
You cannot really optimize the first script usefully: if you need an HTTP request, you need an HTTP request. Skipping the HTTP request could be a performance gain, though, if it is possible (i.e. if the first script can access the same files the second script would operate on).
As for the second script: reading the whole file into memory does look like some overhead, but that is negligible if the files are small. The code looks very readable; I would leave it as is in that case.
If your files are big, however, you might want to use fopen() and its friends fseek() and fread():

# Do not forget to sanitize the file name here!
# An attacker could demand the last line of your password
# file or similar! ($fileName = '../../passwords.txt')
$filePointer = fopen($fileName, 'r');
$i = 1;
$chunkSize = 200;

# Read 200 byte chunks from the file and check if the chunk
# contains a newline
do {
    fseek($filePointer, -($i * $chunkSize), SEEK_END);
    $line = fread($filePointer, $i++ * $chunkSize);
} while (($pos = strrpos($line, "\n")) === false);

return substr($line, $pos + 1);
If the files are unchanging, you should cache the last line.
If the files are changing and you control the way they are produced, it might or might not be an improvement to reverse the order lines are written, depending on how often a line is read over its lifetime.
Edit:
Your server could figure out what it wants to write to its log, put it in memcache, and then write it to the log. The request for the last line could be fulfilled from memcache instead of a file read.
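A minimal sketch of that idea using the Memcached extension (the key name, server address and log path are assumptions):

<?php
// Writer side: whenever a new log line is produced, store it in memcache
// in addition to appending it to the log file.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$line = date('c').' something happened';
file_put_contents('text/mylog.txt', $line."\n", FILE_APPEND);
$mc->set('lastline:mylog', $line);

// Reader side: serve the last line from memcache and only fall back
// to reading the file when the cache entry is missing.
$last = $mc->get('lastline:mylog');
if ($last === false) {
    $lines = file('text/mylog.txt');
    $last  = end($lines);
}
echo $last;
?>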
The most probable source of delay is that cross-server HTTP request. If the files are small, the cost of fopen/fread/fclose is nothing compared to the whole HTTP request.
(Not long ago I used HTTP to retrieve images to dynamically generate image-based menus. Replacing the HTTP request with a local file read reduced the delay from seconds to tenths of a second.)
I assume that the obvious solution of accessing the file server filesystem directly is out of the question. If not, then it's the best and simplest option.
If direct access is out of the question, you could use caching. Instead of getting the whole file, you just issue a HEAD request and compare the timestamp to a local copy.
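Something like this sketch with cURL (the URL and local copy path are placeholders): it asks only for the headers and re-downloads when the remote Last-Modified timestamp is newer than the local copy.

<?php
$url   = 'http://example.com/fileReader.php?textFile=scores'; // placeholder
$local = 'cache/scores.txt';

// HEAD-style request: no body, just headers plus the remote file time.
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_FILETIME, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
$remoteTime = curl_getinfo($ch, CURLINFO_FILETIME); // -1 if the server sent no Last-Modified
curl_close($ch);

if (!file_exists($local) || ($remoteTime > 0 && $remoteTime > filemtime($local))) {
    file_put_contents($local, file_get_contents($url)); // fetch the full body only when needed
}
echo file_get_contents($local);
?>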
Also, if you are ajax-updating a lot of clients based on the same files, you might consider looking at using comet (meteor, for example). It's used for things like chats, where a single change has to be broadcasted to several clients.
Related
I have a file on a website. A PHP script modifies it like this:
$contents = file_get_contents("MyFile");
// ** Modify $contents **
// Now rewrite:
$file = fopen("MyFile","w+");
fwrite($file, $contents);
fclose($file);
The modification is pretty simple. It grabs the file's contents and adds a few lines. Then it overwrites the file.
I am aware that PHP has a function for appending contents to a file rather than overwriting it all over again. However, I want to keep using this method since I'll probably change the modification algorithm in the future (so appending may not be enough).
Anyway, I was testing this out, making like 100 requests. Each time I call the script, I add a new line to the file:
First call:
First!
Second call:
First!
Second!
Third call:
First!
Second!
Third!
Pretty cool. But then:
Fourth call:
Fourth!
Fifth call:
Fourth!
Fifth!
As you can see, the first, second and third lines simply disappeared.
I've determined that the problem isn't the contents string modification algorithm (I've tested it separately). Something is messed up either when reading or writing the file.
I think it is very likely that the issue is when the file's contents are read: if $contents, for some odd reason, is empty, then the behavior shown above makes sense.
I'm no expert with PHP, but perhaps the fact that I performed 100 calls almost simultaneously caused this issue. What if there are two processes, and one is writing the file while the other is reading it?
What is the recommended approach for this issue? How should I manage file modifications when several processes could be writing/reading the same file?
What you need to do is use flock() (file lock)
What I think is happening is that your script is grabbing the file while the previous script is still writing to it. Since the file is still being written to, it doesn't exist at the moment when PHP grabs it, so PHP gets an empty string, and once the later process is done it overwrites the previous file.
The solution is to have the script usleep() for a few milliseconds when the file is locked and then try again. Just be sure to put a limit on how many times your script can try.
NOTICE:
If another PHP script or application accesses the file, it may not necessarily use/check for file locks. This is because file locks are often seen as an optional extra, since in most cases they aren't needed.
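A sketch of the retry loop described above, using a non-blocking lock (the 100 ms sleep and the 50-attempt limit are arbitrary choices):

<?php
$fh = fopen('MyFile', 'c+'); // create if missing, don't truncate
$attempts = 0;

// Try to get an exclusive lock without blocking; back off and retry.
while (!flock($fh, LOCK_EX | LOCK_NB)) {
    if (++$attempts > 50) {
        die('Could not lock MyFile');
    }
    usleep(100000); // wait 100 ms before trying again
}

$contents = stream_get_contents($fh);
// ** Modify $contents **
ftruncate($fh, 0);
rewind($fh);
fwrite($fh, $contents);

flock($fh, LOCK_UN);
fclose($fh);
?>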
So the issue is parallel access to the same file: while one process is writing to the file, another instance is reading it before the file has been updated.
PHP luckily has a mechanism for locking the file so no one can read from it until the lock is released and the file has been updated: flock() can be used; see the PHP documentation for details.
You need to create a lock, so that any concurrent requests will have to wait their turn. This can be done using the flock() function. You will have to use fopen(), as opposed to file_get_contents(), but it should not be a problem:
$file = 'file.txt';
$fh = fopen($file, 'r+');
if (flock($fh, LOCK_EX)) {               // Get an exclusive lock
    $data = fread($fh, filesize($file)); // Get the contents of file
    // Do something with data here...
    ftruncate($fh, 0);                   // Empty the file
    rewind($fh);                         // Move the pointer back to the start before writing
    fwrite($fh, $newData);               // Write new data to file
    fclose($fh);                         // Close handle and release lock
} else {
    die('Unable to get a lock on file: '.$file);
}
Some background information
The files I would like to download are kept at the external server for a week, and a new XML file (10-50 MB) is created there every hour with a different name. I would like the large file to be downloaded to my server chunk by chunk in the background each time my website is loaded, perhaps 0.5 MB each time, and then resume the download the next time someone else loads the website. This would require my site to have at least 100 page loads each hour to stay updated, so perhaps a bit more of the file each time if possible. I have researched SimpleXML, XMLReader and SAX parsing, but whatever I do, it seems it takes too long to parse the file directly, therefore I would like a different approach, namely downloading it as described above.
If I download a 30 MB XML file, I can parse it locally with XMLReader in only 3 seconds (250k iterations), but when I try to do the same from the external server, limiting it to 50k iterations, it takes 15 seconds to read that small part, so it seems it would not be possible to parse it directly from that server.
Possible solutions
I think it's best to use cURL. But then again, perhaps fopen(), fsockopen(), copy() or file_get_contents() are the way to go. I'm looking for advice on what functions to use to make this happen, or for different solutions on how I can parse a 50 MB external XML file into a MySQL database.
I suspect a cron job every hour would be the best solution, but I am not sure how well that would be supported by web hosting companies, and I have no clue how to set up something like that. But if that's the best solution, and the majority thinks so, I will have to do my research in that area too.
If a Java applet/JavaScript running in the background would be a better solution, please point me in the right direction when it comes to functions/methods/libraries there as well.
Summary
What's the best solution for downloading parts of a file in the background, and resuming the download each time my website is loaded, until it's completed?
If the above solution would be moronic to even try, what language/software would you use to achieve the same thing (download a large file every hour)?
Thanks in advance for all answers, and sorry for the long story/question.
Edit: I ended up using this solution to get the files, with a cron job scheduling a PHP script. It checks my folder for what files I already have, generates a list of the possible downloads for the last four days, then downloads the next XML file in line.
<?php
$date = new DateTime();
$current_time = $date->getTimestamp();
$four_days_ago = $current_time - 345600;

echo 'Downloading: '."\n";
for ($i = $four_days_ago; $i <= $current_time; ) {
    $date->setTimestamp($i);
    if ($date->format('H') !== '00') {
        $temp_filename = $date->format('Y_m_d_H')."_full.xml";
        if (!glob($temp_filename)) {
            $temp_url = 'http://www.external-site-example.com/'.$date->format('Y/m/d/H').".xml";
            echo $temp_filename.' --- '.$temp_url.'<br>'."\n";
            break; // with a break here, this loop will only return the next file you should download
        }
    }
    $i += 3600;
}

set_time_limit(300);
$Start = getTime();

$objInputStream = fopen($temp_url, "rb");
$objTempStream  = fopen($temp_filename, "w+b");
stream_copy_to_stream($objInputStream, $objTempStream, (1024*200000));

$End = getTime();
echo '<br>It took '.number_format(($End - $Start), 2).' secs to download "'.$temp_filename.'".';

function getTime() {
    $a = explode(' ', microtime());
    return (double) $a[0] + $a[1];
}
?>
Edit 2: I just wanted to inform you that there is a way to do what I asked, only it wouldn't work in my case. With the amount of data I need, the website would have to have 400+ visitors an hour for it to work properly. But with smaller amounts of data there are some options; http://www.google.no/search?q=poormanscron
You need to have a scheduled, offline task (e.g., cronjob). The solution you are pursuing is just plain wrong.
The simplest thing that could possibly work is a php script you run every hour (scheduled via cron, most likely) that downloads the file and processes it.
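In case it helps, the cron entry could look like the comment below (paths are placeholders), and the script it runs can be as simple as a download followed by your XMLReader import:

<?php
// Crontab entry (run "crontab -e" on the server), executed at minute 0 of every hour:
//   0 * * * * /usr/bin/php /path/to/fetch_feed.php
//
// fetch_feed.php - minimal sketch; the URL pattern is borrowed from the edit above.
$url  = 'http://www.external-site-example.com/'.date('Y/m/d/H').'.xml';
$dest = '/path/to/xml/'.date('Y_m_d_H').'_full.xml';

$in  = fopen($url, 'rb');
$out = fopen($dest, 'wb');
stream_copy_to_stream($in, $out);
fclose($in);
fclose($out);

// ...then parse $dest with XMLReader and insert the rows into MySQL here.
?>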
You could try fopen:
<?php
$handle = fopen("http://www.example.com/test.xml", "rb");
$contents = stream_get_contents($handle);
fclose($handle);
?>
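Since the question also mentions cURL, here is a sketch that streams the download straight to disk instead of holding it in memory (useful for 10-50 MB files); the URL and target path are placeholders:

<?php
$url  = 'http://www.example.com/test.xml';
$dest = 'downloads/test.xml';

$fp = fopen($dest, 'wb');
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $fp);            // write the response body directly to $fp
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 600);         // allow a slow transfer
$ok = curl_exec($ch);
curl_close($ch);
fclose($fp);

if (!$ok) {
    echo 'Download failed';
}
?>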
I'm having the following problem with my VPS server.
I have a long-running PHP script that sends big files to the browser. It does something like this:
<?php
header("Content-type: application/octet-stream");
readfile("really-big-file.zip");
exit();
?>
This basically reads the file from the server's file system and sends it to the browser. I can't just use direct links (and let Apache serve the file) because there is business logic in the application that needs to be applied.
The problem is that while such download is running, the site doesn't respond to other requests.
The problem you are experiencing is related to the fact that you are using sessions. When a script has a running session, it locks the session file to prevent concurrent writes which may corrupt the session data. This means that multiple requests from the same client (using the same session ID) will not be executed concurrently; they will be queued and can only execute one at a time.
Multiple users will not experience this issue, as they will use different session IDs. This does not mean that you don't have a problem, because you may conceivably want to access the site whilst a file is downloading, or set multiple files downloading at once.
The solution is actually very simple: call session_write_close() before you start to output the file. This will close the session file, release the lock and allow further concurrent requests to execute.
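Applied to the download script from the question, that looks like this (assuming the business-logic checks happen while the session is still open):

<?php
session_start();
// ...business logic that needs the session (permission checks etc.)...

session_write_close(); // release the session lock before the long transfer

header("Content-type: application/octet-stream");
readfile("really-big-file.zip");
exit();
?>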
Your server setup is probably not the only place you should be checking.
Try doing a request from your browser as usual and then do another from some other client.
Either wget from the same machine or another browser on a different machine.
In what way doesn't the server respond to other requests? Is it "Waiting for example.com..." or does it give an error of any kind?
I do something similar, but I serve the file chunked, which gives the file system a break while the client accepts and downloads a chunk. That is better than offering up the entire thing at once, which is pretty demanding on the file system and the entire server.
EDIT: While not the answer to this question, the asker asked about reading a file in chunks. Here's the function that I use. Supply it the full path to the file.
function readfile_chunked($file_path, $retbytes = true)
{
    $buffer = '';
    $cnt = 0;
    $chunksize = 1 * (1024 * 1024); // 1 = 1MB chunk size
    $handle = fopen($file_path, 'rb');
    if ($handle === false) {
        return false;
    }
    while (!feof($handle)) {
        $buffer = fread($handle, $chunksize);
        echo $buffer;
        ob_flush();
        flush();
        if ($retbytes) {
            $cnt += strlen($buffer);
        }
    }
    $status = fclose($handle);
    if ($retbytes && $status) {
        return $cnt; // return num. bytes delivered like readfile() does.
    }
    return $status;
}
I have tried different approaches (reading and sending the files in small chunks [see the comments on readfile in the PHP docs], using PEAR's HTTP_Download), but I always ran into performance problems when the files got big.
There is an Apache module, mod_xsendfile (X-Sendfile), with which you can do your business logic and then delegate the download to Apache. The download will not be publicly available. I think this is the most elegant solution for the problem.
More Info:
http://tn123.org/mod_xsendfile/
http://www.brighterlamp.com/2010/10/send-files-faster-better-with-php-mod_xsendfile/
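With mod_xsendfile installed and XSendFile enabled in the Apache config, the PHP side reduces to something like this sketch:

<?php
// ...business logic / permission checks here...

header("Content-Type: application/octet-stream");
header('Content-Disposition: attachment; filename="really-big-file.zip"');
// Hand the actual file transfer over to Apache:
header("X-Sendfile: /path/to/really-big-file.zip");
exit();
?>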
The same happens to me, and I'm not using sessions.
session.auto_start is set to 0
My example script only runs "sleep(5)", and adding "session_write_close()" at the beginning doesn't solve the problem.
Check your httpd.conf file. Maybe you have "KeepAlive On" and that is why your second request hangs until the first is completed. In general your PHP script should not make visitors wait for a long time. If you need to download something big, do it in a separate internal request that the user has no direct control over. Until it's done, return some "executing" status to the end user, and when it's done, process the actual results.
Here's my code:
$cachefile = "cache/ttcache.php";
if (file_exists($cachefile) && ((time() - filemtime($cachefile)) < 900))
{
    include($cachefile);
}
else
{
    ob_start();
    /* resource-intensive loop that outputs
       a listing of the top tags used on the website */
    $fp = fopen($cachefile, 'w');
    fwrite($fp, ob_get_contents());
    fflush($fp);
    fclose($fp);
    ob_end_flush();
}
This code seemed like it worked fine at first sight, but I found a bug, and I can't figure out how to solve it. Basically, it seems that after I leave the page alone for a period of time, the cache file empties (either that, or when I refresh the page, it clears the cache file, rendering it blank). Then the conditional sees the now-blank cache file, sees its age as less than 900 seconds, and pulls the blank cache file's contents in place of re-running the loop and refilling the cache.
I catted the cache file in the command line and saw that it is indeed blank when this problem exists.
I tried setting it to 60 seconds to replicate this problem more often and hopefully get to the bottom of it, but it doesn't seem to replicate if I am looking for it, only when I leave the page and come back after a while.
Any help?
In the caching routines that I write, I almost always check the filesize, as I want to make sure I'm not spewing blank data, because I rely on a bash script to clear out the cache.
if(file_exists($cachefile) && (filesize($cachefile) > 1024) && ((time() - filemtime($cachefile)) < 900))
This assumes that your outputted cache file is > 1024 bytes, which it usually will be if it's anything relatively large. Adding a lock file would be useful as well, as noted in the comments above, to avoid multiple processes trying to write to the same cache file.
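One common way to avoid a half-written (blank) cache file in the first place is to write to a temporary file and rename it into place, since rename() on the same filesystem is atomic. A sketch along those lines:

<?php
$cachefile = "cache/ttcache.php";

ob_start();
/* resource-intensive loop that outputs the listing */
$html = ob_get_contents();
ob_end_flush();

// Write to a temp file first, then swap it in atomically so readers
// never see a partially written cache file.
$tmp = $cachefile.'.'.getmypid().'.tmp';
file_put_contents($tmp, $html);
rename($tmp, $cachefile);
?>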
You can double-check the file size with the filesize() function; if it's too small, act as if the cache was stale.
If there's no PHP in the file, you may want to use readfile() for performance reasons to just spit the file back out to the end user.
I've recently started getting into the area of optimizing performance and load times client side, compressing css/js, gzipping, paying attention to YSlow, etc.
I'm wondering, while trying to achieve all these micro-optimizations, what are the pros and cons of serving php files as css or javascript?
I'm not entirely sure where the bottleneck is, if there is one. I would assume that between an identical css and php file, the "pure" css file would be slightly faster simply because it doesn't need to parse php code. However, in a php file you can have more control over headers which may be more important(?).
Currently I'm doing a filemtime() check on a "trigger" file, and with some php voodoo writing a single compressed css file from it, combined with several other files in a defined group. This creates a file like css/groupname/301469778.css, which the php template catches and updates the html tags with the new file name. It seemed like the safest method, but I don't really like the server cache getting filled up with junk css files after several edits. I also don't bother doing this for small "helper" css files that are only loaded for certain pages.
If 99% of my output is generated by php anyways, what's the harm (if any) in using php to directly output css/js content? (assuming there are no php errors)
If using php, is it a good idea to mod_rewrite the files to use the css/js extension for any edge cases of browser misinterpretation? Can't hurt? Not needed?
Are there any separate guidelines/methods for css and javascript? I would assume that they would be equal.
Which is faster: A single css file with several #imports, or a php file with several readfile() calls?
What other ways does using php affect speed?
Once the file is cached in the browser, does it make a difference anymore?
I would prefer to use php with .htaccess because it is much simpler, but in the end I will use whatever method is best.
ok, so here are your direct answers:
no harm at all as long as your code is fine. The browser won't notice any difference.
no need for mod_rewrite. the browsers usually don't care about the URL (and often not even about the MIME type).
CSS files are usually smaller and often one file is enough, so no need to combine. Be aware that combining files from different directories affects images referenced in the CSS, as they remain relative to the CSS URL
definitely readfile() will be faster, as #import requires multiple HTTP requests and you want to reduce those as much as possible
when comparing a single HTTP request, PHP may be slightly slower. But you lose the possibility to combine files unless you do that offline.
no, but browser caches are unreliable and improper web server config may cause the browser to unnecessarily re-fetch the URL.
It's impossible to give you a much more concrete answer because it depends a lot on your project details.
We are developing a really large DHTML/AJAX web application with about 2+ MB of JavaScript code, and it still loads quickly with some optimizations:
try to reduce the number of script URLs included. We use a simple PHP script that loads a bunch of .js files and sends them in one go to the browser (all concatenated). This will load your page a lot faster when you have a lot of .js files as we do, since the overhead of setting up an HTTP connection is usually much higher than actually transferring the content itself. Note that the browser needs to download JS files synchronously.
be cache friendly. Our HTML page is also generated via PHP and the URL to the scripts contains a hash that's dependent on the file modification times. The PHP script above that combines the .js files then checks the HTTP cache headers and sets a long expiration time so that the browser does not even have to load any external scripts the second time the user visits the page.
GZIP compress the scripts. This will reduce your code by about 90%. We don't even have to minify the code (which makes debugging easier).
So, yes, using PHP to send the CSS/JS files can improve the loading time of your page a lot - especially for large pages.
EDIT: You may use this code to combine your files:
function combine_files($list, $mime) {
    if (!is_array($list))
        throw new Exception("Invalid list parameter");

    ob_start();
    $lastmod = filemtime(__FILE__);
    foreach ($list as $fname) {
        $fm = @filemtime($fname);
        if ($fm === false) {
            $msg = $_SERVER["SCRIPT_NAME"].": Failed to load file '$fname'";
            if ($mime == "application/x-javascript") {
                echo 'alert("'.addcslashes($msg, "\0..\37\"\\").'");';
                exit(1);
            } else {
                die("*** ERROR: $msg");
            }
        }
        if ($fm > $lastmod)
            $lastmod = $fm;
    }

    //--
    $if_modified_since = preg_replace('/;.*$/', '',
        $_SERVER["HTTP_IF_MODIFIED_SINCE"]);
    $gmdate_mod = gmdate('D, d M Y H:i:s', $lastmod) . ' GMT';
    $etag = '"'.md5($gmdate_mod).'"';

    if (headers_sent())
        die("ABORTING - headers already sent");

    if (($if_modified_since == $gmdate_mod) or
        ($etag == $_SERVER["HTTP_IF_NONE_MATCH"])) {
        if (php_sapi_name() == 'CGI') {
            Header("Status: 304 Not Modified");
        } else {
            Header("HTTP/1.0 304 Not Modified");
        }
        exit();
    }

    header("Last-Modified: $gmdate_mod");
    header("ETag: $etag");
    fc_enable_gzip();

    // Cache-Control
    $maxage = 30*24*60*60; // 30 days (versioning is handled via the HTML code!)
    $expire = gmdate('D, d M Y H:i:s', time() + $maxage) . ' GMT';
    header("Expires: $expire");
    header("Cache-Control: max-age=$maxage, must-revalidate");
    header("Content-Type: $mime");

    echo "/* ".date("r")." */\n";
    foreach ($list as $fname) {
        echo "\n\n/***** $fname *****/\n\n";
        readfile($fname);
    }
}
function files_hash($list, $basedir = "") {
    $temp = array();
    $incomplete = false;
    if (!is_array($list))
        $list = array($list);
    if ($basedir != "")
        $basedir = "$basedir/";
    foreach ($list as $fname) {
        $t = @filemtime($basedir.$fname);
        if ($t === false)
            $incomplete = true;
        else
            $temp[] = $t;
    }
    if (!count($temp))
        return "ERROR";
    return md5(implode(",", $temp)) . ($incomplete ? "-INCOMPLETE" : "");
}
function fc_compress_output_gzip($output) {
    $compressed = gzencode($output);
    $olen = strlen($output);
    $clen = strlen($compressed);
    if ($olen)
        header("X-Compression-Info: original $olen bytes, gzipped $clen bytes ".
               '('.round(100/$olen*$clen).'%)');
    return $compressed;
}

function fc_compress_output_deflate($output) {
    $compressed = gzdeflate($output, 9);
    $olen = strlen($output);
    $clen = strlen($compressed);
    if ($olen)
        header("X-Compression-Info: original $olen bytes, deflated $clen bytes ".
               '('.round(100/$olen*$clen).'%)');
    return $compressed;
}
function fc_enable_gzip() {
    if (isset($_SERVER['HTTP_ACCEPT_ENCODING']))
        $AE = $_SERVER['HTTP_ACCEPT_ENCODING'];
    else
        $AE = isset($_SERVER['HTTP_TE']) ? $_SERVER['HTTP_TE'] : '';

    $support_gzip = !(strpos($AE, 'gzip') === FALSE);
    $support_deflate = !(strpos($AE, 'deflate') === FALSE);

    // When the client supports both, pick one explicitly (gzip here)
    $PREFER_DEFLATE = false;
    if ($support_gzip && $support_deflate) {
        $support_deflate = $PREFER_DEFLATE;
    }

    if ($support_deflate) {
        header("Content-Encoding: deflate");
        ob_start("fc_compress_output_deflate");
    } else {
        if ($support_gzip) {
            header("Content-Encoding: gzip");
            ob_start("fc_compress_output_gzip");
        } else {
            ob_start();
        }
    }
}
Use files_hash() to generate a unique hash string that changes whenever your source files change, and combine_files() to send the combined files to the browser. So, use files_hash() when generating the HTML code for the script tag, and combine_files() in the PHP script that is loaded via that tag. Just place the hash in the query string of the URL.
<script language="JavaScript" src="get_the_code.php?hash=<?=files_hash($list_of_js_files)?>"></script>
Make sure you specify the same $list in both cases.
You're talking about serving static files via PHP; there's really little point doing that, since it's always going to be slower than Apache serving a normal file. A CSS #import will be quicker than PHP's readfile(), but the best performance will be gained by serving one minified CSS file that combines all the CSS you need to use.
It sounds like you're on the right track though. I'd advise pre-processing your CSS and saving it to disk. If you need to set special headers for things like caching, just do this in your VirtualHost directive or .htaccess file.
To avoid lots of cached files you could use a simple file-naming convention for your minified CSS. For example, if your main CSS file is called main.css and it references reset.css and forms.css via #imports, the minified version could be called main.min.css
When this file is regenerated it simply replaces the old one. If you include a reference to that file in your HTML, you could send the request to PHP if the file doesn't exist, combine and minify the file (via something like YUI Compressor), save it to disk, and have it served via normal HTTP for all future requests.
When you update your CSS just delete the main.min.css version and it will automatically regenerate.
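A sketch of that regenerate-on-demand flow (the minify() call is a placeholder for whatever minifier you use, e.g. YUI Compressor, and the file names follow the example above):

<?php
$minified = 'css/main.min.css';

if (!file_exists($minified)) {
    // Combine the source files and minify them; minify() is a placeholder
    // for YUI Compressor or a similar tool.
    $css = file_get_contents('css/reset.css')
         . file_get_contents('css/forms.css')
         . file_get_contents('css/main.css');
    file_put_contents($minified, minify($css));
}

header('Content-Type: text/css');
readfile($minified);
?>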
You can do the preprocessing with an Ant build. Sorry, the post is in German, but I've tried translate.google.com and it worked fine :-) So you can use the post as a tutorial to achieve better performance...
I would preprocess the files and save them to disk, just like simonrjones said. Caching and the like should be done by the dedicated elements, such as the Apache web server, HTTP headers and the browser.
While slower, one advantage / reason you might have to do this is to put dynamic content into the files on the server, but still have them appear to be js or css from the client perspective.
Like this, for example, passing the environment from PHP to JavaScript:
var environment = "<?= getenv('APPLICATION_ENV'); ?>"; // quoted so the output is a valid JS string
// More JS code here ...