I am building a system that creates files ranging from a few KB to, say, around 50 MB, and this question is more out of curiosity than anything else. I couldn't find any answers online.
If I use
$handle=fopen($file,'w');
where is the $handle stored before I call
fclose($handle);
? Is it stored in the system's memory, or in a temp file somewhere?
Secondly, I am building the file using a loop that takes 1024 bytes of data at a time, and each time writes data as:
fwrite($handle, $content);
It then calls
fclose($handle);
when the loop is complete and all the data has been written. However, would it be more efficient or memory-friendly to use a loop that did
$handle = fopen($file, 'a');
fwrite($handle, $content);
fclose($handle);
?
In PHP terminology, fopen() (like many other functions) returns a resource. So $handle is a resource that references the file handle associated with your $file.
Resources are in-memory objects; they are not persisted to the file system by PHP.
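For illustration (the path here is made up, and the exact resource number will differ on your system), you can inspect the handle to confirm it is just an in-memory stream resource:
$handle = fopen('/tmp/example.dat', 'w'); // hypothetical path, for illustration only
var_dump($handle);                        // e.g. resource(5) of type (stream)
echo get_resource_type($handle);          // "stream"
fclose($handle);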
Your current methodology is the more efficient of the two options. Opening, writing to, and then closing the same file over and over again is less efficient than just opening it once, writing to it many times, and then closing it. Opening and closing the file requires setting up input and output buffers and allocating other internal resources, which are comparatively expensive operations.
Your file handle is just another in-memory reference and is stored in memory like other program variables and resources. Also, in terms of file I/O: open and close once, and write as many times as you need - that is the most efficient way.
$handle = fopen($file, 'a'); // open once
while ($condition) {         // loop while there is more data to write
    fwrite($handle, $content); // write many
}
fclose($handle); // close once
According to the PHP docs, fopen() creates a stream, which is a file handle associated with a file in the filesystem.
Creating a new file handle every time you need to write another 1024 bytes would be terribly slow.
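As a rough sketch of how you could verify that yourself (the paths, chunk size and chunk count below are made up for the comparison, not taken from the question):
$chunk  = str_repeat('x', 1024);   // 1 KB of dummy data
$chunks = 51200;                   // ~50 MB total

// Approach 1: open once, write many, close once
$start = microtime(true);
$handle = fopen('/tmp/once.dat', 'w');
for ($i = 0; $i < $chunks; $i++) {
    fwrite($handle, $chunk);
}
fclose($handle);
echo 'open once: ' . (microtime(true) - $start) . "s\n";

// Approach 2: reopen in append mode for every chunk
$start = microtime(true);
for ($i = 0; $i < $chunks; $i++) {
    $handle = fopen('/tmp/append.dat', 'a');
    fwrite($handle, $chunk);
    fclose($handle);
}
echo 'reopen each time: ' . (microtime(true) - $start) . "s\n";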
I was curious and ran a test. My question is whether it is possible to open a file for both reading and writing, so that when I have several read and write operations to perform on one file I don't need to close it after reading, reopen it for writing, write, and so on in a loop.
$filename = "test.txt";
$handle = fopen($filename, "rwb");
fseek( $handle , 15360 );
$contents = fread($handle, 51200);
$start = microtime (true);
fseek( $handle , 1 );
fwrite ( $handle , $contents );
fclose($handle);
This test does not work. I expected that I would read the data, move the fseek pointer to the beginning of the file (position 1 or 0), and then write the data. But the write failed for some reason, with a result of 0 (int) bytes written. Hence my question: is it possible to do this, or do I need to close the file for reading first?
As a related sub-question: is it possible for several users to read from or write to the same file simultaneously at different positions? This should simulate database read/write operations. You know how MySQL works: multiple users can write to the same table (i.e. the same file) at any time. I know this is not a problem in C/C++, but is it possible in PHP?
You can create multiple file handles on the same file. Just fopen() it twice, once read-only and once for read/write. Although I'm not sure why you'd want to do so unless you're reading and writing at two different points in the file.
$filename = "test.txt";
$rw_handle = fopen($filename, "c+"); //open for read/write, allow fseek
$r_handle = fopen($filename, "r");
If you want to have multiple processes reading and writing a file from different locations, you'll want to lock the file with flock().
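Here is a minimal sketch of that idea, assuming an arbitrary file name and arbitrary offsets: the writer takes an exclusive lock and the reader a shared lock, so concurrent processes working at different positions don't step on each other.
$filename = "test.txt";

// Writer: exclusive lock, write at one position
$rw_handle = fopen($filename, "c+");
if (flock($rw_handle, LOCK_EX)) {
    fseek($rw_handle, 0);
    fwrite($rw_handle, "written by process A\n");
    flock($rw_handle, LOCK_UN);
}
fclose($rw_handle);

// Reader: shared lock, read from another position
$r_handle = fopen($filename, "r");
if (flock($r_handle, LOCK_SH)) {
    fseek($r_handle, 1024);
    $data = fread($r_handle, 4096);
    flock($r_handle, LOCK_UN);
}
fclose($r_handle);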
I have a file on a website. A PHP script modifies it like this:
$contents = file_get_contents("MyFile");
// ** Modify $contents **
// Now rewrite:
$file = fopen("MyFile","w+");
fwrite($file, $contents);
fclose($file);
The modification is pretty simple. It grabs the file's contents and adds a few lines. Then it overwrites the file.
I am aware that PHP has a function for appending contents to a file rather than overwriting it all over again. However, I want to keep using this method since I'll probably change the modification algorithm in the future (so appending may not be enough).
Anyway, I was testing this out, making like 100 requests. Each time I call the script, I add a new line to the file:
First call:
First!
Second call:
First!
Second!
Third call:
First!
Second!
Third!
Pretty cool. But then:
Fourth call:
Fourth!
Fifth call:
Fourth!
Fifth!
As you can see, the first, second and third lines simply disappeared.
I've determined that the problem isn't the contents string modification algorithm (I've tested it separately). Something is messed up either when reading or writing the file.
I think it is very likely that the issue is when the file's contents are read: if $contents, for some odd reason, is empty, then the behavior shown above makes sense.
I'm no expert with PHP, but perhaps the fact that I performed 100 calls almost simultaneously caused this issue. What if there are two processes, and one is writing the file while the other is reading it?
What is the recommended approach for this issue? How should I manage file modifications when several processes could be writing/reading the same file?
What you need to do is use flock() (file lock).
What I think is happening is that your script is grabbing the file while the previous script is still writing to it. Because that script opens the file with 'w+', the file is truncated to empty at that moment, so PHP reads an empty string, and once the later process is done it overwrites what the earlier ones wrote.
The solution is to have the script usleep() for a few milliseconds when the file is locked and then try again; just be sure to put a limit on how many times your script can try (see the sketch after the notice below).
NOTICE:
If another PHP script or application accesses the file, it won't necessarily use or check for file locks. On most systems flock() is advisory, so it only protects you against other code that also asks for the lock; in most simple cases that is all you need.
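A minimal sketch of that retry loop, reusing the "MyFile" name from the question; the retry count and sleep time are arbitrary. The lock is requested in non-blocking mode, and the script sleeps briefly and tries again, giving up after a fixed number of attempts.
$fh = fopen('MyFile', 'c+'); // open for read/write without truncating
$attempts = 0;
while (!flock($fh, LOCK_EX | LOCK_NB)) {
    if (++$attempts > 50) {
        die('Could not lock MyFile');
    }
    usleep(10000); // wait 10 ms before trying again
}
$contents = stream_get_contents($fh);
// ** Modify $contents **
rewind($fh);
ftruncate($fh, 0);
fwrite($fh, $contents);
flock($fh, LOCK_UN);
fclose($fh);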
So the issue is parallel access to the same file: while one instance is writing to the file, another is reading it before it has been updated.
Luckily, PHP has a mechanism for locking the file so that no one can read from it until the lock is released and the file has been updated.
flock() can be used for this; see the PHP documentation for flock().
You need to create a lock, so that any concurrent requests will have to wait their turn. This can be done using the flock() function. You will have to use fopen(), as opposed to file_get_contents(), but it should not be a problem:
$file = 'file.txt';
$fh = fopen($file, 'r+');
if (flock($fh, LOCK_EX)) {               // Get an exclusive lock
    $data = fread($fh, filesize($file)); // Get the contents of file
    // Do something with data here...
    ftruncate($fh, 0);                   // Empty the file
    rewind($fh);                         // Move the pointer back to the start before writing
    fwrite($fh, $newData);               // Write new data to file
    fclose($fh);                         // Close handle and release lock
} else {
    die('Unable to get a lock on file: '.$file);
}
My app reads a large file (5 MB - 10 MB) that contains JSON entries, one per line.
Each line is fed to multiple parsers and treated separately. Once the file has been read, it is moved. The program is continuously fed with files to be processed.
The program currently works with @file_get_contents($filename), and its structure works as is.
The problem is that file_get_contents loads the entire file into memory and the whole run takes about a minute. I suspect I can gain speed if I read the file line by line rather than wait for it to load into memory (I might be wrong and am open to suggestions).
There are many file-handling functions that can do this. What is the most effective way to achieve what I need, and which file-reading method is best for this?
I have fopen(), fread(), readfile(), file(), and fscanf() to contend with off the top of my head. However, when I read the manual pages for them, it's all code for reading generic files, with no clear indication of what is best for larger files.
$file = fopen("file.json", "r");
if ($file)
{
while (($line = fgets($file)) !== false)
{
echo $line;
}
}
else
{
echo "Unable to open the file";
}
fgets() reads until it reaches EOL or EOF. If you want, you can limit how much it reads using the second argument.
For more info about fgets: http://us3.php.net/fgets
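Since each line in the file is a JSON entry, here is a hedged sketch of how the fgets() loop above could feed your parsers one decoded line at a time; the readJsonLines() generator and the "file.json" name are illustrations, not part of the original program:
// Reads one JSON line at a time instead of loading the whole file.
function readJsonLines($filename)
{
    $file = fopen($filename, "r");
    if (!$file)
    {
        return; // empty generator if the file cannot be opened
    }
    while (($line = fgets($file)) !== false)
    {
        $entry = json_decode($line, true);
        if ($entry !== null)
        {
            yield $entry; // hand one decoded entry to the caller at a time
        }
    }
    fclose($file);
}

foreach (readJsonLines("file.json") as $entry)
{
    // feed $entry to the parsers here
}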
I would like to cut off the beginning of a large file in PHP. Use of file_get_contents() is not possible due to memory restrictions.
What is the best way to delete the first $n characters from a file?
If it is possible to do it without creating a second file, I would prefer that solution.
Update: After the file has been modified, it will be used by other scripts.
If you don't have enough memory to buffer the entire file, you'll need to create two files (at least temporarily) regardless of your solution.
Look into fseek(), which allows you to go to a particular byte position within a file.
// Open the file
$filename = 'somefile.dat';
$file = fopen($filename, 'r');
// Skip the first 1 KB
fseek($file, 1024);
// Your processing goes here...
// Close the file
fclose($file);
In your case, you could open the original file for reading and a temp file for writing concurrently: seek past the first $n bytes of the original, then loop over it, reading a small chunk at a time and writing it to the temp file. Finally, rename the temp file to the original file's name.
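A minimal sketch of that approach; $n, the chunk size and the temp file name are assumptions for illustration:
$filename = 'somefile.dat';
$tempname = 'somefile.tmp';
$n = 1024; // number of bytes to cut off the front

$in  = fopen($filename, 'r');
$out = fopen($tempname, 'w');

fseek($in, $n);                 // skip the first $n bytes
while (!feof($in)) {
    $chunk = fread($in, 8192);  // copy in small chunks to keep memory use low
    fwrite($out, $chunk);
}

fclose($in);
fclose($out);
rename($tempname, $filename);   // replace the original file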
I had a newcomer (the next-door teenager) write some PHP code to track some usage on my web site. I'm not familiar with PHP, so I'm asking a bit about concurrent file access.
My native app (on Windows) occasionally logs some data to my site by hitting a URL that runs my PHP script. The native app does not examine the returned data.
$fh = fopen($updateFile, 'a') or die("can't open file");
fwrite($fh, $ip);
fwrite($fh, ', ');
fwrite($fh, $date);
fwrite($fh, ', ');
fwrite($fh, implode(', ', $_GET));
fwrite($fh, "\r\n");
fclose($fh);
This is a low traffic site, and the data is not critical. But what happens if two users collide and two instances of the script each try to add a line to the file? Is there any implicit file locking in php?
Is the code above at least safe from locking up and never returning control to my user? Can the file get corrupted? If I have the script above delete the file every month, what happens if another instance of the script is in the middle of writing to the file?
You should put a lock on the file:
$fp = fopen($updateFile, 'a'); // append mode keeps the existing log lines
if (flock($fp, LOCK_EX)) {
    fwrite($fp, 'a');          // write your line(s) here
    flock($fp, LOCK_UN);
} else {
    echo 'can\'t lock';
}
fclose($fp);
For the record, I worked on a library that does this:
https://github.com/EFTEC/DocumentStoreOne
It allows you to CRUD documents by locking the file. I tried 100 concurrent users (100 calls to the PHP script at the same time) and it works.
However, it doesn't use flock() but mkdir():
while (!@mkdir("file.lock")) {
    usleep(1000); // another process holds the lock; wait briefly and try again
}
// use the file
fopen("file"...)
@rmdir("file.lock") // release the lock
Why?
mkdir() is atomic, so the lock is atomic: in a single step, you either acquire the lock or you don't.
It's faster than flock(). Apparently flock() requires several calls to the file system.
flock()'s behaviour depends on the system.
I did a stress test and it worked.
Since this is an append to the file, the best approach would be to aggregate the data and write it to the file in a single fwrite(), provided the data to be written is not bigger than the file buffer. Of course, you don't always know the size of the buffer, so flock() is always a good option.
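A hedged sketch of that aggregation, reusing the variables from the logging snippet above: the whole line is built in memory first and then appended with one fwrite() under a lock.
$line = $ip . ', ' . $date . ', ' . implode(', ', $_GET) . "\r\n";

$fh = fopen($updateFile, 'a') or die("can't open file");
if (flock($fh, LOCK_EX)) {
    fwrite($fh, $line); // one write per request
    flock($fh, LOCK_UN);
}
fclose($fh);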