I'd like to read a large file line by line, perform string replacement and save the changes into the file, in other words rewriting one line at a time. Is there any simple solution in PHP/Unix?
The easiest way that comes to mind would be to write the lines into a new file and then replace the old one, but that doesn't feel elegant.
I think there are only two options:
Use memory
Read, replace, then store the replaced string in memory; once done, overwrite the source file.
Use a tmp file
Read and replace each line, writing it immediately to a tmp file; once all done, replace the original file with the tmp file.
#1 will be faster because I/O is expensive; use it if you have plenty of memory or the file being processed is not too big.
#2 will be a bit slower but stays stable even on large files.
Of course, you may combine both approaches by writing the replaced strings to the file in chunks of lines (instead of line by line).
These are the simplest, most elegant ways I can think of.
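A minimal sketch of option #1, assuming the whole result fits in memory and using str_replace() as a stand-in for whatever replacement you actually need:
$src = 'data.txt';                          // hypothetical file name
$fh  = fopen($src, 'r');
$out = '';
while (($line = fgets($fh)) !== false) {
    $out .= str_replace('old', 'new', $line);   // your replacement goes here
}
fclose($fh);
file_put_contents($src, $out);              // overwrite the source file in one go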
It seems it's not such a bad solution to use the temporary file in most cases.
$f = 'data.txt';
$fh = fopen($f, 'r');                    // reading only, so 'r' is enough
while (($l = fgets($fh)) !== false)
    file_put_contents('tmp', clean($l), FILE_APPEND);   // clean() is your replacement function
fclose($fh);                             // close the handle, not the file name
unlink($f);
rename('tmp', $f);
Related
What is the best way to overwrite a specific line in a file? I basically want to search a file for the string '#parsethis' and overwrite the rest of that line with something else.
If the file is really big (log files or something like that) and you are willing to trade speed for lower memory consumption, you could open two files and essentially do the trick Jeremy Ruten proposed, using files instead of system memory.
$source = 'in.txt';
$target = 'out.txt';
// copy operation
$sh = fopen($source, 'r');
$th = fopen($target, 'w');
while (($line = fgets($sh)) !== false) {   // fgets() returns false at EOF
    if (strpos($line, '#parsethis') !== false) {
        $line = 'new line to be inserted' . PHP_EOL;
    }
    fwrite($th, $line);
}
fclose($sh);
fclose($th);
// delete old source file
unlink($source);
// rename target file to source file
rename($target, $source);
If the file isn't too big, the best way would probably be to read the file into an array of lines with file(), search through the array of lines for your string and edit that line, then implode() the array back together and fwrite() it back to the file.
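A minimal sketch of that file()/implode() approach, assuming the file fits comfortably in memory (file_put_contents() is used instead of an explicit fwrite() for brevity):
$lines = file('data.txt');                            // hypothetical file name; file() keeps line endings
foreach ($lines as $i => $line) {
    if (strpos($line, '#parsethis') !== false) {
        $lines[$i] = 'replacement for that line' . PHP_EOL;
    }
}
file_put_contents('data.txt', implode('', $lines));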
Your main problem is the fact that the new line may not be the same length as the old line. If you need to change the length of the line, there is no way out of rewriting at least all of the file after the changed line. The easiest way is to create a new, modified file and then move it over the original. This way there is a complete file available at all times for readers. Use locking to make sure that only one script is modifying the file at once, and since you are going to replace the file, do the locking on a different file. Check out flock().
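A rough sketch of that locking pattern, assuming a separate hypothetical data.txt.lock file and a hypothetical copyWithReplacements() helper that builds the modified copy:
$lock = fopen('data.txt.lock', 'c');        // lock a separate file, not the one being replaced
if (flock($lock, LOCK_EX)) {
    copyWithReplacements('data.txt', 'data.txt.tmp');   // hypothetical helper that writes the new version
    rename('data.txt.tmp', 'data.txt');                 // readers always see a complete file
    flock($lock, LOCK_UN);
}
fclose($lock);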
If you are certain that the new line will be the same length as the old line, you can open the file in read/write mode (use r+ as the second argument to fopen()) and call ftell() to save the position the line starts at each time before you call fgets() to read a line. Once you find the line that you want to overwrite, you can use fseek() to go back to the beginning of the line and fwrite() the new data. One way to force the line to always be the same length is to space pad it out to the maximum possible length.
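A sketch of that same-length, in-place overwrite with ftell()/fseek(), assuming the replacement is never longer than the original line:
$fh = fopen('data.txt', 'r+');
while (true) {
    $pos  = ftell($fh);                          // remember where this line starts
    $line = fgets($fh);
    if ($line === false) break;
    if (strpos($line, '#parsethis') !== false) {
        $new = str_pad('replacement', strlen(rtrim($line, "\r\n")));  // pad to the old length
        fseek($fh, $pos);                        // jump back to the beginning of the line
        fwrite($fh, $new);                       // the original newline stays untouched
        break;
    }
}
fclose($fh);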
This is a solution that works for rewriting only one line of a file in place with sed from PHP. My file contains only style vars and is formatted:
$styleVarName: styleVarProperty;\n
For this I first add the ":" to the end of myStyleVarName, and sed replaces the rest of that line with the new property and adds a semicolon.
Make sure characters are properly escaped in myStyleVarProp.
$command = "pathToShellScript folder1Name folder2Name myStyleVarName myStyleVarProp";
shell_exec($command);
/* shellScript */
#!/bin/bash
file=/var/www/vhosts/mydomain.com/$1/$2/scss/_variables.scss
str=$3"$4"
sed -i "s/^$3.*/$str;/" "$file"   # quote $file in case the path ever contains spaces
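On the PHP side, escapeshellarg() is one way to handle the escaping mentioned above; a sketch with placeholder variable names:
$command = escapeshellcmd('/path/to/shellScript')
         . ' ' . escapeshellarg($folder1Name)
         . ' ' . escapeshellarg($folder2Name)
         . ' ' . escapeshellarg($styleVarName . ':')    // the trailing ":" mentioned above
         . ' ' . escapeshellarg($styleVarProp);
shell_exec($command);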
or if your file isn't too big:
$sample = file_get_contents('sample');
$parsed = preg_replace('/#parsethis.*/', 'REPLACE TO END OF LINE', $sample);
You'll have to choose a delimiter that doesn't appear in the pattern itself, though ('/' here, since the pattern already contains '#').
If you want to completely replace the contents of one file with the contents of another file you can use this:
rename("./some_path/data.txt", "./some_path/data_backup.txt");
rename("./some_path/new_data.txt", "./some_path/data.txt");
So the first line backs up the file and the second line replaces the file with the contents of the new file.
As far as I can tell, rename() returns a boolean: true if the rename is successful and false if it fails. One could, therefore, only run the second step if the first step is successful, to prevent overwriting the file unless a backup has been made successfully. Check out:
https://www.php.net/manual/en/function.rename.php
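A sketch of that conditional version, using the same paths as the example above:
if (rename("./some_path/data.txt", "./some_path/data_backup.txt")) {
    // only replace the live file once the backup has succeeded
    rename("./some_path/new_data.txt", "./some_path/data.txt");
}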
Hope that is useful to someone.
Cheers
Adrian
I'd most likely do what Jeremy suggested, but just as an alternate way to do it, here is another solution. This has not been tested or used and is for *nix systems.
$cmd = "grep '#parsethis' " . escapeshellarg($filename);
exec($cmd, $lines);   // exec() fills $lines with the matching lines of output
// Read the entire file as a string
// Do a str_replace for each item in $lines with ""
Suppose I have a 100GB txt file containing millions of lines of text. How could I read this text file in blocks of lines using PHP?
I can't use file_get_contents() because the file is too large. fgets() also reads the text line by line, which will likely take a long time to finish reading the whole file.
If I use fread($fp, 5030), where '5030' is the length it has to read, would there be a case where it doesn't read a whole line (such as stopping in the middle of a line) because it has reached the max length?
I can't use file_get_contents() because the file is too large. fgets() also reads the text line by line, which will likely take a long time to finish reading the whole file.
I don't see why you shouldn't be able to use fgets():
$fh = fopen('hugefile.txt', 'r');   // assuming this is the big file
$blocksize = 50; // in "number of lines"
while (!feof($fh)) {
    $lines = array();
    $count = 0;
    while (!feof($fh) && (++$count <= $blocksize)) {
        $lines[] = fgets($fh);
    }
    doSomethingWithLines($lines);
}
fclose($fh);
Reading 100GB will take time anyway.
The fread approach sounds like a reasonable solution. You can detect whether you've reached the end of a line by checking whether the final character in the string is a newline character ('\n'). If it isn't, then you can either read some more characters and append them to your existing string, or you can trim characters from your string back to the last newline, and then use fseek to adjust your position in the file.
Side point: Are you aware that reading a 100GB file will take a very long time?
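A sketch of the trim-back variant described above; the 8192-byte block size, the file name and the processLines() callback are just assumptions:
$fp = fopen('hugefile.txt', 'r');
while (($chunk = fread($fp, 8192)) !== false && $chunk !== '') {
    $lastNl = strrpos($chunk, "\n");
    if ($lastNl === false) {
        // no newline in this chunk (a line longer than the block size);
        // a real implementation would keep reading until it finds one
        $block = $chunk;
    } else {
        // keep only the complete lines and seek back to just after the last newline
        $block = substr($chunk, 0, $lastNl + 1);
        fseek($fp, -(strlen($chunk) - $lastNl - 1), SEEK_CUR);
    }
    processLines(explode("\n", rtrim($block, "\n")));   // hypothetical per-block callback
}
fclose($fp);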
I think you have to use fread($fp, somesize) and check manually whether you have found the end of the line; otherwise, read another chunk.
Hope this helps.
I would recommend implementing the reading of a single line within a function, hiding the implementation details of that specific step from the rest of your code - the processing function must not care how the line was retrieved. You can then implement your first version using fgets() and then try other methods if you notice that it is too slow. It could very well be that the initial implementation is too slow, but the point is: you won't know until you've benchmarked.
I know this is an old question, but I think there is value for a new answer for anyone that finds this question eventually.
I agree that reading 100GB takes time, which is exactly why we should find the most effective way to read it so the time spent is as small as possible, instead of just thinking "who cares how much it is, it's already a lot". So, let's find our lowest time possible.
Another solution (a sketch follows below):
Cache a chunk of raw data
Use fread to read a cache of that data
Read line by line
Read line by line from the cache until the end of the cache or the end of the data is found
Read the next chunk and repeat
Grab the unprocessed tail of the chunk (the part in which you were still looking for the line delimiter) and move it to the front, then read a chunk of the size you defined minus the size of the unprocessed data and put it right after that unprocessed part; there you go, you have a new complete chunk.
Repeat the line-by-line reading and this process until the file is read completely.
You should use a cache chunk bigger than any expected line length.
The bigger the cache size, the faster you read, but the more memory you use.
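A minimal sketch of that chunk-cache strategy, assuming an 8 MB cache and a hypothetical handleLine() callback for the per-line work:
$file = fopen('hugefile.txt', 'r');
$cacheSize = 8 * 1024 * 1024;           // bigger than any expected line length
$carry = '';                            // unprocessed tail from the previous chunk
while (!feof($file)) {
    $chunk = $carry . fread($file, $cacheSize - strlen($carry));
    $lines = explode("\n", $chunk);
    $carry = array_pop($lines);         // the last piece may be an incomplete line
    foreach ($lines as $line) {
        handleLine($line);              // hypothetical per-line processing
    }
}
if ($carry !== '') {
    handleLine($carry);                 // final line if the file has no trailing newline
}
fclose($file);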
I have a 10MB text file.
The length of the lines may vary.
Which is the most efficient way (fast and memory friendly) to read just one specific line from this file? e.g. get_me_the_line($nr, $file_resource)
I don't know of a way to just jump to the line if the lines are of varying length. However, you can iterate through lines pretty quickly when not using them for anything, and return the one of interest.
function ReadLineNumber($file, $number)
{
    $handle = fopen($file, "r");
    $i = 0;
    while (fgets($handle) !== false && $i < $number - 1)
        $i++;
    $line = fgets($handle);
    fclose($handle);   // don't leak the file handle
    return $line;
}
Edit
I added - 1 to the loop because this reads a line ahead. The $number is therefore a zero-index line reference. Change it to - 2 if you would prefer line 1 to mean the first line in the file.
As the lines are of varying length, you have to look at each character, as it might denote the end of a line. The quickest approach would be loading the file in chunks sized like the block size of the filesystem and counting the line breaks until you reach the desired line.
A better way would be to keep an index file that stores information about the data file, for example the byte offset at which each line starts. Using a database could also be a better idea.
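A sketch of the index-file idea: build the list of byte offsets once (for example whenever the data file changes), then any line can be fetched with a single fseek(). File and function names here are just placeholders:
// Build an index of byte offsets, one entry per line.
function buildLineIndex($dataFile) {
    $offsets = array();
    $fh = fopen($dataFile, 'r');
    do {
        $offsets[] = ftell($fh);
    } while (fgets($fh) !== false);
    fclose($fh);
    array_pop($offsets);                // the last ftell() is EOF, not a line start
    return $offsets;
}
// Jump straight to line $nr (zero-based) using the index.
function get_me_the_line($nr, $dataFile, $offsets) {
    $fh = fopen($dataFile, 'r');
    fseek($fh, $offsets[$nr]);
    $line = fgets($fh);
    fclose($fh);
    return $line;
}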
If the file is REALLY large (several GB or more) and your application is running on *nix, you may not want to have PHP process the file and instead use some existing Unix tools optimized for this kind of line processing. One such tool is sed, and an example of printing a specific line from a huge file can be found here.
It should be trivial to wrap this in a shell_exec() call, or similar, to write the function you are looking for.
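A sketch of such a wrapper; it assumes sed is on the PATH and uses the p;q trick so sed stops as soon as the requested line has been printed instead of scanning the rest of the huge file:
function get_me_the_line($nr, $file) {
    $n = (int) $nr;                     // one-based, matching sed's numbering; also keeps the command safe
    $out = shell_exec(sprintf("sed -n '%dp;%dq' %s", $n, $n, escapeshellarg($file)));
    return $out === null ? false : rtrim($out, "\n");
}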