In my cache system, I want it so that when a new page is requested, a check is made to see whether a cache file exists; if it doesn't, a copy is stored on the server. If it does exist, it must not be overwritten.
The problem I have is that I may be using functions designed to be slow.
This is part of my current implementation to save files:
if (!file_exists($filename)) {
    $h = fopen($filename, "wb");
    if ($h) {
        fwrite($h, $c);
        fclose($h);
    }
}
This is part of my implementation to load files:
if (($m = @filemtime($file)) !== false) {
    if ($m >= filemtime("sitemodification.file")) {
        $outp = file_get_contents($file);
        header("Content-Length: " . strlen($outp), true);
        echo $outp;
        flush();
        exit();
    }
}
What I want to do is replace this with a better-performing set of functions while still achieving the same functionality. All caching files, including sitemodification.file, reside on a ramdisk. I added a flush before the exit in hopes that content will be output faster.
I can't use direct memory addressing at this time because the file sizes to be stored are all different.
Is there a set of functions I can use that executes the code I provided faster by at least a few milliseconds, especially the file-loading code?
I'm trying to keep my time to first byte low.
First, prefer is_file() to file_exists(), and use file_put_contents():
if (!is_file($filename)) {
    file_put_contents($filename, $c);
}
Then, use the proper function for this kind of work, readfile:
if (($m = @filemtime($file)) !== false && $m >= filemtime('sitemodification.file')) {
    header('Content-Length: ' . filesize($file));
    readfile($file);
}
You should see a small improvement, but keep in mind that file accesses are slow and you perform three file-system checks before sending any content.
I have a small web-page that delivers different content to a user based on a %3 (modulo 3) of a counter. The counter is read in from a file with php, at which point the counter is incremented and written back into the file over the old value.
I am trying to get an even distribution of the content in question which is why I have this in place.
I am worried that if two users access the page at a similar time then they could either both be served the same data or that one might fail to increment the counter since the other is currently writing to the file.
I am fairly new to web dev, so I am not sure how to approach this without mutexes. I considered doing all of the operations inside a single open/close, but I am trying to minimize the window in which a user could fail to access the file (hence why the read and write are in separate opens).
What would be the best way to implement a sort of mutual exclusion so that only one person will access the file at a time and create a queue for access if multiple overlapping requests for the file come in? The primary goal is to preserve the ratio of the content that is being shown to users which involves keeping the counter consistent.
The code is as follows :
<?php
session_start();
$counterName = "<path/to/file>";
$file = fopen($counterName, "r");
$currCount = fread($file, filesize($counterName));
fclose($file);
$newCount = $currCount + 1;
$file = fopen($counterName,'w');
if (fwrite($file, $newCount) === FALSE) {
    echo "Could not write to the file";
}
fclose($file);
?>
Just in case anyone finds themselves with the same issue: I was able to fix the problem by adding flock($fp, LOCK_EX | LOCK_NB) before writing to the file, as per the documentation for PHP's flock() function. I have read that it is not the most reliable, but for what I am doing it was enough.
Documentation here for convenience.
https://www.php.net/manual/en/function.flock.php
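For reference, a minimal sketch of what that can look like when the whole read-increment-write cycle is held under one exclusive lock (a blocking LOCK_EX here, so concurrent requests queue rather than fail; the path is illustrative and the counter file is assumed to already exist and contain an integer):

```php
<?php
$counterName = "<path/to/file>";

$file = fopen($counterName, "r+");        // read/write, no truncation
if ($file && flock($file, LOCK_EX)) {     // blocks until the lock is free
    $currCount = (int) stream_get_contents($file);
    ftruncate($file, 0);                  // clear the old value
    rewind($file);
    fwrite($file, (string) ($currCount + 1));
    fflush($file);                        // push the write out before unlocking
    flock($file, LOCK_UN);
    fclose($file);
}
```

Because the lock covers both the read and the write, two overlapping requests can no longer read the same value and both write back the same increment.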
Consider this code:
public static function removeDir($src)
{
    if (is_dir($src)) {
        $dir = @opendir($src);
        if ($dir === false) {
            return;
        }
        while (($file = readdir($dir)) !== false) {
            if ($file != '.' && $file != '..') {
                $path = $src . DIRECTORY_SEPARATOR . $file;
                if (is_dir($path)) {
                    self::removeDir($path);
                } else {
                    @unlink($path);
                }
            }
        }
        closedir($dir);
        @rmdir($src);
    }
}
This will remove a directory. But if unlink fails or opendir fails on any subdirectory, the directory will be left with some content.
I want either everything deleted, or nothing deleted. I'm thinking of copying the directory before removal and if anything fails, restoring the copy. But maybe there's a better way - like locking the files or something similar?
In general, I would second the comment:
"Copy it, delete it, copy back if deleted else throw deleting message fail..." – We0
However let's take some side considerations:
Trying to implement transaction-safe file deletion indicates that you want to allow competing file locks on the same set of files. Transaction handling is usually the most 'expensive' way to ensure consistency. This holds true even if PHP had some kind of test-delete available, because you would need to test-delete everything in a first pass and then do a second loop, which costs time (and leaves you in danger that something changed on your file system in the meanwhile). There are other options:
Try to isolate what really needs to be transaction-safe and handle those data accesses in a database. E.g. MySQL/InnoDB supports all the nitty-gritty details of transaction handling.
Define and implement dedicated 'write/lock ownership'. So you have folders A and B with sub-items; your PHP is allowed to lock files in A, and some other process is allowed to lock files in B. Both your PHP and the other process are allowed to read A and B. This gets tricky with files, because a file read causes a lock as well, and that lock lasts longer the bigger the file is. So on a file basis you probably need to enrich this with file-size limits, tolerance periods and so on.
Define and implement dedicated access time frames. E.g. all files can be used during the week, but you have a maintenance window on Sunday night which can also run deletions and therefore requires a lock-free environment.
Right, let's say my reasoning was not frightening enough :) and you implement transaction-safe file deletion anyway. Your routine can be implemented this way:
1. Back up all files.
2. If the backup fails, you could try a second, third, or fourth time (this is an implementation decision).
3. If there is no successful backup, full stop.
4. Run your deletion process. Two implementation options (either way, you need to log the files you deleted successfully):
   - always run through fully and document all errors (these can be returned to the user later as a homework task list; however, this potentially runs long)
   - run through and stop at the first error
5. If the deletion was successful, all fine/full stop; if not, proceed with rolling back.
6. Copy back only the previously successfully deleted files from the archive (ONLY THOSE!).
7. Wipe out your backup.
This is then only transaction-safe at the file level. It does NOT handle the case where somebody changes permissions on folders between steps 5 and 6.
Or you could simply try to rename/move the directory to somewhere like /tmp/; it either succeeds or it doesn't, but the files are not gone. Even if another process has an open handle, the move should be OK. The files will be gone some time later when the tmp folder is emptied.
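A rough sketch of that rename-first idea, assuming source and target live on the same filesystem (where rename() either fully succeeds or leaves the source untouched); the recursive delete helper is hypothetical:

```php
<?php
// Sketch: move the directory aside in one cheap operation,
// then delete at leisure. Paths are illustrative.
function removeDirSafely($src)
{
    $trash = sys_get_temp_dir() . '/trash_' . uniqid('', true);
    if (!rename($src, $trash)) {
        return false;          // nothing was deleted, $src is intact
    }
    // From here on, failures only leave debris under the temp dir,
    // which can be cleaned up later.
    deleteTree($trash);        // hypothetical recursive delete helper
    return true;
}
```

The key property is that the user-visible directory disappears atomically; partial deletion can only ever happen to the already-moved copy.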
I've written a PHP script that iterates through a given folder, extracts all the images from there, and displays them on an HTML page (as <img> tags). The size of the page is about 14 KB, but it takes almost 15 seconds to load.
Here's the code:
function displayGallery( $gallery, $rel, $first_image ) {
$prefix = "http://www.example.com/";
$path_to_gallery = "gallery_albums/" . $gallery . "/";
$handler = opendir( $path_to_gallery ); //opens directory
while ( ( $file = readdir( $handler ) ) !== false ) {
if ( strcmp( $file, "." ) != 0 && strcmp( $file, ".." ) !=0 ) {
//check for "." and ".." files
if ( isImage( $prefix . $path_to_gallery . $file ) ) {
echo '<img src="' . $prefix . $path_to_gallery . $file . '" />';
}
}
}
closedir( $handler ); //closes directory
}
function isImage($image_file) {
if (getimagesize($image_file)!==false) {
return true;
} else {
return false;
}
}
I looked at other posts, but most of them deal with SQL queries, and that's not my case.
Any suggestions how to optimize this?
You can use a PHP profiler like http://xdebug.org/docs/profiler to find what part of the script is taking forever to run. It might be overkill for this issue, but long-term you may be glad you took the time now to set it up.
I suppose that's because you've added $prefix in the isImage() invocation. That way the function actually downloads all your images from your web server instead of looking them up locally.
You may still use getimagesize(); it issues an E_NOTICE and returns FALSE when the file is not a known image type.
An out-of-left-field suggestion here. You don't say how you are clocking the execution time. If you are timing it in the browser, as in the page taking 15 seconds to load from a link, the problem could have nothing at all to do with your script. I have seen people create similar pages in the past that take forever to load because even though each image is displayed at thumbnail size or smaller, the image itself is still 800 x 600 or larger. I know it sounds daft, but make sure you are not just displaying large images at a small size. It would be perfectly reasonable for a page to need 15 seconds to load and display 76 jpegs at 800 x 600.
My assumption is that isImage is the problem. I've never seen it before. Why not just check for particular file extensions? That's pretty quick.
Update: You might also try switching to exif_imagetype(), which is likely faster than getimagesize(). Putting that check into the top function is also going to be faster. Neither of those functions was meant to be used over a web connection; avoid that altogether. Best to stick with the file extension.
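A minimal sketch of the exif_imagetype() check on a local path (variable names follow the question; the function only reads the file's header bytes rather than parsing the whole image):

```php
<?php
// Cheap content check on a local file, not a URL.
$type = exif_imagetype($path_to_gallery . $file);
if ($type !== false) {
    // $type is one of the IMAGETYPE_* constants (IMAGETYPE_JPEG, ...)
    // and can be mapped to a MIME type if needed:
    $mime = image_type_to_mime_type($type);
}
```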
Do you not already have access to the files directly? Every time you look something up over the web, it's going to take a while - you need to wait for the entire file to download. Look up the files directly on your system.
Use scandir to get all the filenames at once into an array and walk through them. That will likely speed things up as I assume there won't be a back and forth to get things individually.
Instead of doing strcmp for . and .. just do $file != '.' && $file != '..'
Also, the speed is going to depend on the number of files being returned, if there are a lot it's going to be slow. The OS can slow down with too many files in a directory as well. You're looping over all files and directories, not just images so that's the number that counts, not just the images.
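Putting those suggestions together, a sketch of the gallery loop using scandir() and an extension whitelist instead of per-file HTTP probes (the path convention and the allowed-extension list are assumptions):

```php
<?php
// One directory read via scandir(), then cheap string checks per entry.
function displayGallery($gallery)
{
    $path = "gallery_albums/" . $gallery . "/";
    $allowed = array('jpg', 'jpeg', 'png', 'gif');

    foreach (scandir($path) as $file) {
        // '.' and '..' fall out naturally: they have no allowed extension.
        $ext = strtolower(pathinfo($file, PATHINFO_EXTENSION));
        if (in_array($ext, $allowed, true)) {
            echo '<img src="/' . $path . rawurlencode($file) . '" alt="">';
        }
    }
}
```

This avoids both the per-file network round trip and the per-file image parsing; the only I/O is the single directory listing.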
getimagesize() is the problem; it took 99.1% of the script time.
Version #1 - original case.
Version #2 - if you really need to use getimagesize() with a URL (http://), there is a faster alternative at http://www.php.net/manual/en/function.getimagesize.php#88793 which reads only the first X bytes of the image. XHProf shows it is about 10x faster. Another idea could be using curl multi for parallel downloads: https://stackoverflow.com/search?q=getimagesize+alternative
Version #3 - I think the best fit for your case is to open the files as normal filesystem paths, without http://. This is about 100x faster per XHProf.
What's the cleanest way in php to open a file, read the contents, and subsequently overwrite the file's contents with some output based on the original contents? Specifically, I'm trying to open a file populated with a list of items (separated by newlines), process/add items to the list, remove the oldest N entries from the list, and finally write the list back into the file.
fopen(<path>, 'a+')
flock(<handle>, LOCK_EX)
fread(<handle>, filesize(<path>))
// process contents and remove old entries
fwrite(<handle>, <contents>)
flock(<handle>, LOCK_UN)
fclose(<handle>)
Note that I need to lock the file with flock() in order to protect it across multiple page requests. Will the 'w+' flag when fopen()ing do the trick? The php manual states that it will truncate the file to zero length, so it seems that may prevent me from reading the file's current contents.
If the file isn't overly large (that is, you can be confident loading it won't blow PHP's memory limit), then the easiest way to go is to just read the entire file into a string (file_get_contents()), process the string, and write the result back to the file (file_put_contents()). This approach has two problems:
If the file is too large (say, tens or hundreds of megabytes), or the processing is memory-hungry, you're going to run out of memory (even more so when you have multiple instances of the thing running).
The operation is destructive; when the saving fails halfway through, you lose all your original data.
If either of these is a concern, plan B is to process the file while writing to a temporary file at the same time; after successful completion, close both files, rename (or delete) the original file, and then rename the temporary file to the original filename.
Read
$data = file_get_contents($filename);
Write
file_put_contents($filename, $data);
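For larger files, a sketch of the plan-B variant: process line by line into a temporary file and swap it in afterwards (filenames are illustrative; rename() within one filesystem replaces the target atomically, so readers see either the old file or the new one, never a half-write):

```php
<?php
$path = '/path/to/list.txt';
$tmp  = $path . '.tmp.' . getmypid();

$in  = fopen($path, 'r');
$out = fopen($tmp, 'w');

while (($line = fgets($in)) !== false) {
    $line = rtrim($line, "\n");
    // ... process/filter/skip $line here ...
    fwrite($out, $line . "\n");
}

fclose($in);
fclose($out);

// Atomic swap: the original is never left half-written.
rename($tmp, $path);
```

If anything fails before the rename(), the original file is still intact and only the temporary file needs cleaning up.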
One solution is to use a separate lock file to control access.
This solution assumes that only your script, or scripts you have access to, will want to write to the file. This is because the scripts will need to know to check a separate file for access.
$file_lock = obtain_file_lock();
if ($file_lock) {
$old_information = file_get_contents('/path/to/main/file');
$new_information = update_information_somehow($old_information);
file_put_contents('/path/to/main/file', $new_information);
release_file_lock($file_lock);
}
function obtain_file_lock() {
$attempts = 10;
// There are probably better ways of dealing with waiting for a file
// lock but this shows the principle of dealing with the original
// question.
for ($ii = 0; $ii < $attempts; $ii++) {
$lock_file = fopen('/path/to/lock/file', 'r'); //only need read access
if (flock($lock_file, LOCK_EX)) {
return $lock_file;
} else {
//give time for other process to release lock
usleep(100000); //0.1 seconds
}
}
    //This is only reached if all attempts fail.
    //Error code here for dealing with that eventuality.
    return false;
}
function release_file_lock($lock_file) {
flock($lock_file, LOCK_UN);
fclose($lock_file);
}
This should prevent a concurrently-running script reading old information and updating that, causing you to lose information that another script has updated after you read the file. It will allow only one instance of the script to read the file and then overwrite it with updated information.
While this hopefully answers the original question, it doesn't give a good solution to making sure all concurrent scripts have the ability to record their information eventually.
Is there a way to cache a PHP include effectively for reuse, without APC, et al?
Simple (albeit stupid) example:
// rand.php
return rand(0, 999);
// index.php
$file = 'rand.php';
while($i++ < 1000){
echo include($file);
}
Again, while ridiculous, this pair of scripts dumps 1000 random numbers. However, for every iteration PHP has to hit the filesystem (correct? There is no inherent caching functionality I've missed, is there?)
Basically, how can I prevent the previous scenario from resulting in 1000 hits to the filesystem?
The only consideration I've come to so far is a goofy one, and it may not prove effective at all (haven't tested, wrote it here, error prone, but you get the idea):
// rand.php
return rand(0, 999);
// index.php
$file = 'rand.php';
$cache = array();
while($i++ < 1000){
if(isset($cache[$file])){
echo eval('?>' . $cache[$file] . '<?php;');
}else{
$cache[$file] = file_get_contents($file);
echo include($file);
}
}
A more realistic and less silly example:
When including files for view generation, given a view file is used a number of times in a given request (a widget or something) is there a realistic way to capture and re-evaluate the view script without a filesystem hit?
This would only make any sense if the include file was accessed across a network.
There is no inherent caching functionality I've missed, is there?
All operating systems are highly optimized to reduce the amount of physical I/O and to speed up file operations. On a properly configured system, in most cases, the system will rarely go to disk to fetch PHP code. Sit down with a spreadsheet and think about how long it would take to process PHP code if every file had to be fetched from disk; it would be ridiculous. E.g. suppose your script is /var/www/htdocs/index.php and includes /usr/local/php/resource.inc.php: that's 8 seek operations just to locate the files; at ~8 ms each, that's 64 ms just to find them! Run some timings on your test case; you'll see that it's running much, much faster than that.
As with Sabeen Malik's answer you could capture the output of the include with output buffering, then concat all of them together, then save that to a file and include the one file each time.
This one collective include could be kept for an hour by checking the file's mod time, and then rewriting and re-including the includes only once an hour.
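A rough sketch of the output-buffering idea in its simplest per-request form, assuming the included file echoes its content (like a view script) rather than returning a value; cached_include() is a hypothetical helper:

```php
<?php
$file = 'view_widget.php';   // assumed view script that echoes its output
$cache = array();

function cached_include($file, array &$cache)
{
    if (!isset($cache[$file])) {
        ob_start();
        include $file;                   // the only filesystem hit for this file
        $cache[$file] = ob_get_clean();  // capture what it printed
    }
    return $cache[$file];
}

$i = 0;
while ($i++ < 1000) {
    echo cached_include($file, $cache);
}
```

This only makes sense when the view's output is identical across uses within one request; anything parameterized would need the parameters folded into the cache key.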
I think better design would be something like this:
// rand.php
function get_rand() {
return rand(0, 999);
}
// index.php
$file = 'rand.php';
include($file);
while($i++ < 1000){
echo get_rand();
}
Another option:
while($i++ < 1000) echo rand(0, 999);