I'm a beginner at PHP, and I'm still trying to work out proper file handling techniques. I'm usually alright with trial and error, but when it comes to deleting and modifying data, I always like to be on the safe side.
I wrote the code below to delete a certain section of a file, but I'm not sure if it will work with larger files or under unforeseen conditions which require experience to code for.
I tested this just now and it did work, but I would like to run it by the more experienced programmers first:
function deletesection($start,$len){
    $pos=0;
    $tmpname=$this->name."tmp.tmp";
    $tmpf=fopen($tmpname,"wb+");
    rewind($tmpf);
    $h=fopen($this->name,"rb");
    rewind($h);
    while(!feof($h)){
        $this->xseek($h,$pos);
        $endpos = $pos+1000;
        if($endpos>$start && $pos<$start+$len){
            $readlen=$start-$pos;
            $nextpos=$start+$len;
        }
        else{
            $readlen=1000;
            $nextpos=$pos+1000;
        }
        fwrite($tmpf,fread($h,$readlen));
        $pos=$nextpos;
    }
    fclose($h);
    unlink($this->name);
    rename($tmpname,$this->name);
}
This is inside a class where the property "name" is the file path.
I'm writing the file 1000 bytes at a time because I was getting errors about the maximum amount of memory being exceeded when testing with files over 30 MB.
I had a quick look at your code - seems a bit complicated, also copying the entire file will be less efficient if the section to delete is small in relation to the total filesize...
function deletesection($filename, $start, $len)
{
    $chunk = 49128;
    if (!is_readable($filename) || !is_writable($filename) || !is_file($filename)) {
        return false;
    }
    $tfile = tempnam(sys_get_temp_dir(), 'del'); // used to hold stuff after the section to delete
    $oh = fopen($tfile, 'wb');
    $ih = fopen($filename, 'r+b');               // r+b so the tail can be written back in place
    if (fseek($ih, $start + $len) === 0) {       // fseek returns 0 on success
        while (!feof($ih) && ($data = fgets($ih, $chunk)) !== false) {
            fputs($oh, $data);
        }
        fclose($oh);
        $oh = fopen($tfile, 'rb');
        // or could just have opened it w+b to begin with and rewound it here
        fseek($ih, $start, SEEK_SET);
        while (!feof($oh) && ($data = fgets($oh, $chunk)) !== false) {
            fputs($ih, $data);
        }
        ftruncate($ih, ftell($ih));              // drop the leftover tail
    }
    fclose($oh);
    fclose($ih);
    unlink($tfile);
    return true;
}
I believe it would also be possible to do this modifying the file in place (i.e. not using a second file) using a single file handle - but the code would get a bit messy and would require lots of seeks (then an ftruncate).
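For what it's worth, a minimal sketch of that in-place variant might look like this (untested; it assumes the section lies entirely within the file and leaves out error handling; the function name and chunk size are my own):

function deletesection_inplace($filename, $start, $len, $chunk = 8192)
{
    $size = filesize($filename);
    $h = fopen($filename, 'r+b');        // open for reading and writing, no truncation
    if ($h === false) {
        return false;
    }
    $read  = $start + $len;              // first byte to keep after the deleted section
    $write = $start;                     // where that byte has to end up
    while ($read < $size) {
        fseek($h, $read, SEEK_SET);
        $data = fread($h, min($chunk, $size - $read));
        fseek($h, $write, SEEK_SET);
        fwrite($h, $data);
        $read  += strlen($data);
        $write += strlen($data);
    }
    ftruncate($h, $size - $len);         // drop the now-duplicated tail
    fclose($h);
    return true;
}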
NB using files for managing data with PHP (and most other languages in a multi-user context) is not a good idea.
I created a script for use with my website that is supposed to erase the oldest entry in cache when a new item needs to be cached. My website is very large with 500,000 photos on it and the cache space is set to 2 GB.
These functions are what cause the trouble:
function cache_tofile($fullf, $c)
{
    error_reporting(0);
    if(strpos($fullf, "/") === FALSE)
    {
        $fullf = "./".$fullf;
    }
    $lp = strrpos($fullf, "/");
    $fp = substr($fullf, $lp + 1);
    $dp = substr($fullf, 0, $lp);
    $sz = strlen($c);
    cache_space_make($sz);
    mkdir($dp, 0755, true);
    cache_space_make($sz);
    if(!file_exists($fullf))
    {
        $h = @fopen($fullf, "w");
        if(flock($h, LOCK_EX))
        {
            ftruncate($h, 0);
            rewind($h);
            $tmo = 1000;
            $cc = 1;
            $i = fputs($h, $c);
            while($i < strlen($c) || $tmo-- > 1)
            {
                $c = substr($c, $i);
                $i = fwrite($h, $c);
            }
            flock($h, LOCK_UN);
            fclose($h);
        }
    }
    error_reporting(7);
}
function cache_space_make($sz)
{
    $ct = 0;
    $cf = cachefolder();
    clearstatcache();
    $fi = shell_exec("df -i ".$cf." | tail -1 | awk -F\" \" '{print \$4}'");
    if($fi < 1)
    {
        return;
    }
    if(($old = disk_free_space($cf)) === false)
    {
        return;
    }
    while($old < $sz)
    {
        $ct++;
        if($ct > 10000)
        {
            error_log("Deleted over 10,000 files. Is disk screwed up?");
            break;
        }
        $fi = shell_exec("rm \$(find ".$cf."cache -type f -printf '%T+ %p\n' | sort | head -1 | awk -F\" \" '{print \$2}');");
        clearstatcache();
        $old = disk_free_space($cf);
    }
}
cachefolder() is a function that returns the correct folder name with a / appended to it.
When the functions are executed, the CPU usage for Apache is between 95% and 100%, and other services on the server are extremely slow to access during that time. I also noticed in WHM that cache disk usage is at 100% and refuses to drop until I clear the cache. I was expecting more like maybe 90-ish%.
What I am trying to do with the cache_tofile function is free disk space in order to create a folder, then free disk space again to make room for the cache file. The cache_space_make function takes one parameter representing the amount of disk space to free up.
In that function I use system calls to find the oldest file in the directory tree of the entire cache, because I was unable to find native PHP functions to do so.
The cache file format is as follows:
/cacherootfolder/requestedurl
For example, if one requests http://www.example.com/abc/def then from both functions, the folder that is supposed to be created is abc and the file is then def so the entire file in the system will be:
/cacherootfolder/abc/def
If one requests http://www.example.com/111/222 then the folder 111 is created and the file 222 will be created
/cacherootfolder/111/222
Each file in both cases contains the same content as what the user requests based on the URL. (example: /cacherootfolder/111/222 contains the same content as what one would see when viewing source from http://www.example.com/111/222)
The intent of the caching system is to deliver all web pages at optimal speed.
My question then is: how do I prevent the system from locking up when the cache is full? Is there better code I can use than what I provided?
I would start by replacing the || in your code by &&, which was most likely the intention.
Currently, the loop will always run at least 1000 times - I very much hope the intention was to stop trying after 1000 times.
Also, drop the ftruncate and rewind.
From the PHP Manual on fopen (emphasis mine):
'w' Open for writing only; place the file pointer at the beginning of the file and truncate the
file to zero length. If the file does not exist, attempt to create it.
So your truncate is redundant, as is your rewind.
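Put together, the write block might end up looking roughly like this (just a sketch, keeping your variable names where possible; $total and $written are new helpers):

$h = @fopen($fullf, "w");            // "w" already truncates and rewinds for you
if ($h && flock($h, LOCK_EX)) {
    $total   = strlen($c);
    $written = 0;
    $tmo     = 1000;                 // give up after 1000 short writes
    while ($written < $total && $tmo-- > 0) {
        $i = fwrite($h, substr($c, $written));
        if ($i === false) {
            break;
        }
        $written += $i;
    }
    flock($h, LOCK_UN);
    fclose($h);
}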
Next, review your shell_exec's.
The one outside the loop doesn't seem too much of a bottleneck to me, but the one inside the loop...
Let's say you have 1'000'000 files in that cache folder.
find will happily list all of them for you, no matter how long it takes.
Then you sort that list.
And then you flush 999'999 entries of that list down the toilet, and only keep the first one.
Then you do some stuff with awk that I don't really care about, and then you delete the file.
On the next iteration, you'll only have to go through 999'999 files, of which you discard only 999'998.
See where I'm going?
I consider calling shell scripts out of pure convenience bad practice anyway, but if you do it, do it as efficiently as possible, at least!
Do one shell_exec without head -1, store the resulting list in a variable, and iterate over it.
Although it might be better to abandon shell_exec altogether and instead program the corresponding routines in PHP (one could argue that find and rm are machine code, and therefore faster than code written in PHP to do the same task, but there sure is a lot of overhead for all that IO redirection).
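If you do go the pure-PHP route, a rewritten cache_space_make might look roughly like this (a sketch only; it assumes the cache tree lives under cachefolder()."cache", as in your shell command, and uses mtime where find used %T):

// Sketch: collect the files and their mtimes once, sort once, then delete
// oldest-first until enough space is free.
function cache_space_make($sz)
{
    $cf = cachefolder();
    $files = array();
    $it = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($cf . "cache", FilesystemIterator::SKIP_DOTS)
    );
    foreach ($it as $f) {
        if ($f->isFile()) {
            $files[$f->getPathname()] = $f->getMTime();
        }
    }
    asort($files);                       // oldest first
    foreach ($files as $path => $mtime) {
        clearstatcache();
        if (disk_free_space($cf) >= $sz) {
            break;
        }
        @unlink($path);
    }
}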
Please do all that, and then see how bad it still performs.
If the results are still unacceptable, I suggest you put in some code to measure the time certain parts of those functions require (tip: microtime(true)) or use a profiler, like XDebug, to see where exactly most of your time is spent.
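For example, just to get a rough number:

$t0 = microtime(true);
cache_space_make($sz);
error_log(sprintf("cache_space_make took %.3f s", microtime(true) - $t0));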
Also, why did you turn off error reporting for that block? Looks more than suspicious to me.
And as a little bonus, you can get rid of $cc since you're not using it anywhere.
In my cache system, I want it so that when a new page is requested, a check is made to see whether a file already exists; if it doesn't, a copy is stored on the server. If it does exist, it must not be overwritten.
The problem I have is that I may be using functions designed to be slow.
This is part of my current implementation to save files:
if (!file_exists($filename)) {
    $h = fopen($filename, "wb");
    if ($h) {
        fwrite($h, $c);
        fclose($h);
    }
}
This is part of my implementation to load files:
if (($m = @filemtime($file)) !== false) {
    if ($m >= filemtime("sitemodification.file")) {
        $outp = file_get_contents($file);
        header("Content-length: ".strlen($outp), true);
        echo $outp;
        flush();
        exit();
    }
}
What I want to do is replace this with a better set of functions meant for performance while still achieving the same functionality. All caching files, including sitemodification.file, reside on a ramdisk. I added a flush before exit in the hope that content will be output faster.
I can't use direct memory addressing at this time because the file sizes to be stored are all different.
Is there a set of functions I can use that can execute the code I provided faster by at least a few milliseconds, especially the loading files code?
I'm trying to keep my time to first byte low.
First, prefer is_file to file_exists and use file_put_contents:
if ( !is_file($filename) ) {
    file_put_contents($filename, $c);
}
Then, use the proper function for this kind of work, readfile:
if ( ($m = @filemtime($file)) !== false && $m >= filemtime('sitemodification.file')) {
    header('Content-length: '.filesize($file));
    readfile($file);
}
You should see a little improvement, but keep in mind that file accesses are slow and you check the filesystem three times before sending any content.
I have a script that re-writes a file every few hours. This file is inserted into end users' HTML via a PHP include.
How can I check whether my script is, at this exact moment, working on (e.g. re-writing) the file while it is being requested for display? Is it even an issue? In other words, what happens if a user accesses the file at the same time, what are the odds of that, and will the user just have to wait until the script has finished its work?
Thanks in advance!
More on the subject...
Is this a way forward using file_put_contents and LOCK_EX?
When the script saves its data every now and then:
file_put_contents("text", $content, LOCK_EX);
and when the user opens the page:
if (file_exists("text")) {
    function include_file() {
        $file = fopen("text", "r");
        if (flock($file, LOCK_EX)) {
            include_file();
        }
        else {
            echo file_get_contents("text");
        }
    }
} else {
    echo 'no such file';
}
Could anyone advise me on the syntax? Is this a proper way to call include_file() after the condition, and how can I limit the number of such calls?
I guess this solution is also an option, except for the same call to include_file(); would it even work?
function include_file() {
    $time = time();
    $file = filectime("text");
    if ($file + 1 < $time) {
        echo "good to read";
    } else {
        echo "have to wait";
        include_file();
    }
}
To check whether the file is currently being written, you can use the filectime() function to get the time the file was last changed.
You can store the current timestamp in a variable at the top of your script, and whenever you need to access the file, compare that timestamp with the filectime() of the file. If the file's change time is newer, you have run into the scenario where you have to wait for the file to finish being written, and you can log that to a database or another file.
To prevent this scenario from happening, you can change the script which writes the file so that it first creates a temporary file and, once it's done, replaces (moves or renames) the temporary file over the original. This action takes far less time than writing the whole file, which makes the scenario a very rare occurrence.
Even if a read and a replace do occur simultaneously, the time the reading script has to wait will be very short.
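A rough sketch of that write-then-rename idea (the temporary file is created in the target's own directory so the rename stays on the same filesystem; the function name is illustrative):

function write_then_rename($target, $content)
{
    // write the new content to a temp file next to the target, then swap it in
    $tmp = tempnam(dirname($target), 'tmp_');
    if ($tmp === false || file_put_contents($tmp, $content) === false) {
        return false;
    }
    return rename($tmp, $target);   // readers see either the old or the new file
}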
Depending on the size of the file, this might be an issue of concurrency. But you can solve that quite easily: before starting to write the file, create a kind of "lock file", i.e. if your file is named "incfile.php", create an "incfile.php.lock". Once you're done with writing, remove this file.
On the include side, check for the existence of "incfile.php.lock" and wait until it has disappeared; this needs some looping and sleeping in the unlikely case of concurrent access.
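Roughly, the two sides might look like this (a sketch; the file names follow the example above, the timeout is arbitrary, and $newContent stands in for whatever the writer produces):

// Writer side: signal that incfile.php is being rewritten.
touch('incfile.php.lock');
file_put_contents('incfile.php', $newContent);   // $newContent is illustrative
unlink('incfile.php.lock');

// Include side: wait briefly for a concurrent writer to finish.
$tries = 50;                                     // give up after ~5 seconds
while (file_exists('incfile.php.lock') && $tries-- > 0) {
    usleep(100000);                              // sleep 0.1 s between checks
    clearstatcache(true, 'incfile.php.lock');
}
include 'incfile.php';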
Basically, you should consider another solution: write the data that currently gets rendered into that file to a database instead (locks etc. are available) and render it in a module which then gets included in your page. Solutions like yours are hard to maintain in the long run...
This question is old, but I add this answer because the other answers have no code.
function write_to_file(string $fp, string $string) : bool {
    $timestamp_before_fwrite = date("U");
    $stream = fopen($fp, "w");
    fwrite($stream, $string);
    while (is_resource($stream)) {
        fclose($stream);
    }
    $file_last_changed = filemtime($fp);
    if ($file_last_changed < $timestamp_before_fwrite) {
        // File not changed code
        return false;
    }
    return true;
}
This is the function I use to write to a file. It first gets the current timestamp before making changes to the file, and then compares that timestamp to the last time the file was changed.
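A caller might then use it like this (the path is illustrative):

if (!write_to_file('/tmp/example.txt', $content)) {   // path is illustrative
    error_log('cache file was not updated');
}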
I have a simple caching system like this:
if (file_exists($cache)) {
    echo file_get_contents($cache);
    // if we get here while $cache is being deleted, there is nothing to display
}
else {
    // PHP process
}
We regularly delete outdated cache files, e.g. deleting all caches after 1 hour. Although this process is very fast, I am thinking that a cache file could be deleted right between the if statement and the file_get_contents call.
I mean that when the if statement checks the existence of the cache file, it exists; but when file_get_contents tries to read it, it is no longer there (deleted by the simultaneous cache-deleting process).
file_get_contents locks the file to keep the delete process away during the read, but the file can still be deleted after the if statement has sent the script into the first branch (before file_get_contents has started).
Is there any approach to avoid this? Or should the cache-deleting system work differently?
NOTE: I have not faced any practical problem, as it is not very likely to hit this event, but logically it is possible, and it could happen under heavy load.
Luckily file_get_contents returns FALSE on error, so you could quick-bake it like:
if (FALSE !== ($buffer = file_get_contents($cache))) {
    echo $buffer;
    return;
}
// PHP process
or similar. It's a bit of a quick-and-dirty way, considering you would want to add the @ operator to hide any warnings about non-existent files:
if (FALSE !== ($buffer = @file_get_contents($cache))) {
The other alternative would be to lock the file; however, that might prevent your cache-deletion process from deleting the file while you have it locked.
What is left then is to handle the staleness of the cache yourself. That means reading the file's creation time in PHP and checking it against the file-deletion policy, e.g. that it is less than 5 minutes old (5 minutes is just an example); then you know whether the file is already stale and due to be replaced with fresh content, and you re-create it. Otherwise read the file in, which is probably better done with readfile instead of file_get_contents and echo.
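A sketch of that stale check, reusing the $cache variable from the question; regenerate_cache() is just a placeholder for whatever produces the page:

$maxAge = 300;                                   // 5 minutes, as in the example above
if (!is_file($cache) || time() - filemtime($cache) > $maxAge) {
    // Stale or missing: rebuild it. regenerate_cache() is a placeholder
    // for whatever produces the page content.
    $output = regenerate_cache();
    file_put_contents($cache, $output, LOCK_EX);
    echo $output;
} else {
    readfile($cache);                            // still fresh, just stream it out
}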
On failure, file_get_contents returns false, so what about this:
if (($output = file_get_contents($filename)) === false) {
    // Do the processing.
    $output = 'Generated content';
    // Save cache file
    file_put_contents($filename, $output);
}
echo $output;
By the way, you may want to consider using fpassthru, which is more memory-efficient, especially for larger files. Using file_get_contents on large files (> 100 MB) will probably cause problems (depending on your configuration).
<?php
$fp = @fopen($filename, 'rb');
if ($fp === false) {
    // Generate output
} else {
    fpassthru($fp);
}
Is there a way to cache a PHP include effectively for reuse, without APC, et al?
Simple (albeit stupid) example:
// rand.php
return rand(0, 999);
// index.php
$file = 'rand.php';
while($i++ < 1000){
echo include($file);
}
Again, while ridiculous, this pair of scripts dumps 1000 random numbers. However, for every iteration, PHP has to hit the filesystem (Correct? There is no inherent caching functionality I've missed, is there?)
Basically, how can I prevent the previous scenario from resulting in 1000 hits to the filesystem?
The only consideration I've come to so far is a goofy one, and it may not prove effective at all (haven't tested, wrote it here, error prone, but you get the idea):
// rand.php
return rand(0, 999);

// index.php
$file = 'rand.php';
$cache = array();
while ($i++ < 1000) {
    if (isset($cache[$file])) {
        echo eval('?>' . $cache[$file] . '<?php;');
    } else {
        $cache[$file] = file_get_contents($file);
        echo include($file);
    }
}
A more realistic and less silly example:
When including files for view generation, given that a view file is used a number of times in a given request (a widget or something), is there a realistic way to capture and re-evaluate the view script without a filesystem hit?
This would only make any sense if the include file was accessed across a network.
There is no inherent caching functionality I've missed, is there?
All operating systems are very highly optimized to reduce the amount of physical I/O and to speed up file operations. On a properly configured system, in most cases, the system will rarely go back to disk to fetch PHP code. Sit down with a spreadsheet and have a think about how long it would take to process PHP code if every file had to be fetched from disk - it'd be ridiculous. E.g. suppose your script is in /var/www/htdocs/index.php and includes /usr/local/php/resource.inc.php - that's 8 seek operations just to locate the files - at roughly 8 ms each, that's 64 ms to find the files! Run some timings on your test case - you'll see that it's running much, much faster than that.
As with Sabeen Malik's answer, you could capture the output of the include with output buffering, then concatenate it all together, then save that to a file and include the one file each time.
This one collective include could be kept for an hour by checking the file's mod time and then rewriting and re-including the includes only once an hour.
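A rough sketch of that hourly rebuild (the file names here are purely illustrative):

$combined = 'views_combined.php';                                // illustrative name
if (!is_file($combined) || time() - filemtime($combined) > 3600) {
    ob_start();
    foreach (array('widget_a.php', 'widget_b.php') as $view) {   // illustrative views
        include $view;
    }
    file_put_contents($combined, ob_get_clean());                // save the rendered output
}
include $combined;                                               // one include per request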
I think a better design would be something like this:
// rand.php
function get_rand() {
    return rand(0, 999);
}

// index.php
$file = 'rand.php';
include($file);
while ($i++ < 1000) {
    echo get_rand();
}
Another option:
while($i++ < 1000) echo rand(0, 999);