I want to merge two large CSV files with PHP. This files are too big to even put into memory all at once. In pseudocode, I can think of something like this:
for i in file1
file3.write(file1.line(i) + ',' + file2.line(i))
end
But when I'm looping through a file using fgetcsv, it's not really clear how I would grab line n from a certain file without loading the whole thing into memory first.
Any ideas?
Edit: I forgot to mention that each of the two files has the same number of lines and they have a one-to-one relationship. That is, line 62,324 in file1 goes with line 62,324 in file2.
Not sure what operating system you're on, but if you're using Linux, using the paste command is probably a lot easier than trying to do this in PHP.
If this is a viable solution and you don't absolutely need to do it in PHP, you could try the following:
paste -d ',' file1 file2 > combined_file
Take a look at the fgets function. You could read a single line of each file, process them, and write them to your new file, then move on to the next line until you've reached the end of your file.
PHP: fgets
Specifically look at the example titled Example #1 Reading a file line by line in the PHP manual. It's also important to note the return value of the the fgets functions.
Returns a string of up to length - 1
bytes read from the file pointed to by
handle. If there is no more data to
read in the file pointer, then FALSE
is returned.
So, if it doesn't return FALSE you know you still have more lines to process.
You can use fgets().
$file1 = fopen('file1.txt', 'r');
$file2 = fopen('file2.txt', 'r');
$merged = fopen('merged.txt', 'w');
while (
($line1 = fgets($file1)) !== false
&& ($line2 = fgets($file2)) !== false) {
fwrite($merged, $line1 . ',' . $line2);
}
fgets() reads one line from a file. As you can see, this code uses it on both files at the same time, writing the merged lines to a third file. The manual here:
http://php.net/fgets
http://php.net/fopen
http://php.net/fwrite
Try using fgets() to read one line from each file at a time.
I think the solution for this is to map first line begins for each line ( and some kind of key if you need ) and then make a new csv using fread and fwrite ( we know beginning and ending of each line now , so we need just seek and read )
Another way is to put it into MySQL ( if it is possible ) and then back to new CSV
Related
What is the best way to overwrite a specific line in a file? I basically want to search a file for the string '#parsethis' and overwrite the rest of that line with something else.
If the file is really big (log files or something like this) and you are willing to sacrifice speed for memory consumption you could open two files and essentially do the trick Jeremy Ruten proposed by using files instead of system memory.
$source='in.txt';
$target='out.txt';
// copy operation
$sh=fopen($source, 'r');
$th=fopen($target, 'w');
while (!feof($sh)) {
$line=fgets($sh);
if (strpos($line, '#parsethis')!==false) {
$line='new line to be inserted' . PHP_EOL;
}
fwrite($th, $line);
}
fclose($sh);
fclose($th);
// delete old source file
unlink($source);
// rename target file to source file
rename($target, $source);
If the file isn't too big, the best way would probably be to read the file into an array of lines with file(), search through the array of lines for your string and edit that line, then implode() the array back together and fwrite() it back to the file.
Your main problem is the fact that the new line may not be the same length as the old line. If you need to change the length of the line, there is no way out of rewriting at least all of the file after the changed line. The easiest way is to create a new, modified file and then move it over the original. This way there is a complete file available at all times for readers. Use locking to make sure that only one script is modifying the file at once, and since you are going to replace the file, do the locking on a different file. Check out flock().
If you are certain that the new line will be the same length as the old line, you can open the file in read/write mode (use r+ as the second argument to fopen()) and call ftell() to save the position the line starts at each time before you call fgets() to read a line. Once you find the line that you want to overwrite, you can use fseek() to go back to the beginning of the line and fwrite() the new data. One way to force the line to always be the same length is to space pad it out to the maximum possible length.
This is a solution that works for rewriting only one line of a file in place with sed from PHP. My file contains only style vars and is formatted:
$styleVarName: styleVarProperty;\n
For this I first add the ":" to the ends of myStyleVarName, and sed replaces the rest of that line with the new property and adds a semicolon.
Make sure characters are properly escaped in myStyleVarProp.
$command = "pathToShellScript folder1Name folder2Name myStyleVarName myStyleVarProp";
shell_exec($command);
/* shellScript */
#!/bin/bash
file=/var/www/vhosts/mydomain.com/$1/$2/scss/_variables.scss
str=$3"$4"
sed -i "s/^$3.*/$str;/" $file
or if your file isn't too big:
$sample = file_get_contents('sample');
$parsed =preg_replace('##parsethis.*#', 'REPLACE TO END OF LINE', $sample);
You'll have to choose delimiters '#' that aren't present in the file though.
If you want to completely replace the contents of one file with the contents of another file you can use this:
rename("./some_path/data.txt", "./some_path/data_backup.txt");
rename("./some_path/new_data.txt", "./some_path/data.txt");
So in the first line you backup the file and in the second line you replace the file with the contents of a new file.
As far as I can tell the rename returns a boolean. True if the rename is successful and false if it fails. One could, therefore, only run the second step if the first step is successful to prevent overwriting the file unless a backup has been made successfully. Check out:
https://www.php.net/manual/en/function.rename.php
Hope that is useful to someone.
Cheers
Adrian
I'd most likely do what Jeremy suggested, but just for an alternate way to do it here is another solution. This has not been tested or used and is for *nix systems.
$cmd = "grep '#parsethis' " . $filename;
$output = system($cmd, $result);
$lines = explode("\n", $result);
// Read the entire file as a string
// Do a str_repalce for each item in $lines with ""
Let's consider we got following format text file with a size around 1Gb:
...
li1
li2
li3
...
My task is to update line li2 to line2.
Following will not work:
$fd = fopen("file", 'c+');
// ..
// code that loops till we reach li2 line..
// ..
$offset = ftell($fd);
// ..
fseek($fd, $offset );
fwrite($fd, "line2" . PHP_EOL);
Since it produces:
...
li1
line2
3
...
I'm expecting to have as result:
...
li1
line2
li3
...
Thanks
In my opinion we need a temporary file here where we copy the non-changed result, add the change and continue with non-changed result. At the end the original file may be renamed and the temp file to take its name.
As I am familiar with file systems you cannot just delete/insert part of the file. Or it is just too complicated and it is not worthy.
Second - you can copy the file in the memory and make your modifications there but here you are resource dependent and your code may not pass on some servers (e.g. dedicated hosting).
Good luck!
If you know what you must change in text just use this:
$file_data=file_get_contents(some_name.txt);
$text = str_replace("line2", "li2", $file_data);
file_put_contents('some_name_changed.txt', $text);
Or read line by line and load in $text then replace what you want.
I'm trying to open a file and determine if it is valid. It's valid if the first line is START and the last line is END.
I've seen different ways of getting the last line of a file, but it does not pay particular attention to the first line either.
How should I go about this? I was thinking of loading the file contents in an array and checking $array[0] and $array[x] for START and END. But this seems to be a waste for all the junk that could possibly be in the middle.
If its a valid file, I will be reading/processing the contents of the file between START and END.
Don't read entire file into an array if it is not needed. If file can be big you can do it that way:
$h = fopen('text.txt', 'r');
$firstLine = fgets($h);
fseek($h, -3, SEEK_END);
$lastThreeChars = fgets($h);
Memory footprint is much lower
That's from me:
$lines = file($pathToFile);
if ($lines[0] == 'START' && end($lines) == 'END') {
// do stuff
}
Reading whole file with fgets will be efficient for small siles. iF ur file is big then:
open It and read first line
use tail (i didn't check it but it looks OK) function I found in php.net in fseek documentation
How can i get a particular line in a 3 gig text file. The lines are delimited by \n. And i need to be able to get any line on demand.
How can this be done? Only one line need be returned. And i would not like to use any system calls.
Note: There is the same question elsewhere regarding how to do this in bash. I would like to compare it with the PHP equiv.
Update: Each line is the same length the whole way thru.
Without keeping some sort of index to the file, you would need to read all of it until you've encountered x number of \n characters. I see that nickf has just posted some way of doing that, so I won't repeat it.
To do this repeatedly in an efficient manner, you will need to build an index. Store some known file positions for certain (or all) line numbers once, which you can then use to seek to the right location using fseek.
Edit: if each line is the same length, you do not need the index.
$myfile = fopen($fileName, "r");
fseek($myfile, $lineLength * $lineNumber);
$line = fgets($myfile);
fclose($myfile);
Line number is 0 based in this example, so you may need to subtract one first. The line length includes the \n character.
There is little discussion of the problem and no mention is made of how the 'one line' should be referenced (by number, some value within it, etc.) so below is just a guess as to what you're wanting.
If you're not averse to using an object (it might be 'too high level', perhaps) and wish to reference the line by offset, then SplFileObject (available as of PHP 5.1.0) could be used. See the following basic example:
$file = new SplFileObject('myreallyhugefile.dat');
$file->seek(12345689); // seek to line 123456790
echo $file->current(); // or simply, echo $file
That particular method (seek) requires scanning through the file line-by-line. However, if as you say all the lines are the same length then you can instead use fseek to get where you want to go much, much faster.
$line_length = 1024; // each line is 1 KB line
$file->fseek($line_length * 1234567); // seek lots of bytes
echo $file->current(); // echo line 1234568
You said each line has the same length, so you can use fopen() in combination with fseek() to get a line quickly.
http://ch2.php.net/manual/en/function.fseek.php
The only way I can think to do it would be like this:
function getLine($fileName, $num) {
$fh = fopen($fileName, 'r');
for ($i = 0; $i < $num && ($line = fgets($fh)); ++$i);
return $line;
}
While this is not a solution exactly, how come you are needing to pull out one line from a 3 gig text file? is perfomance an issue or can this run a leisurely pace?
If you need pull lots of lines out of this file at different points in time, i would definately suggest putting this data into a DB of some kind. SQLite maybe your friend here as its very simple but not great with lots of scripts/people accessing it at one time.
What is the best way to overwrite a specific line in a file? I basically want to search a file for the string '#parsethis' and overwrite the rest of that line with something else.
If the file is really big (log files or something like this) and you are willing to sacrifice speed for memory consumption you could open two files and essentially do the trick Jeremy Ruten proposed by using files instead of system memory.
$source='in.txt';
$target='out.txt';
// copy operation
$sh=fopen($source, 'r');
$th=fopen($target, 'w');
while (!feof($sh)) {
$line=fgets($sh);
if (strpos($line, '#parsethis')!==false) {
$line='new line to be inserted' . PHP_EOL;
}
fwrite($th, $line);
}
fclose($sh);
fclose($th);
// delete old source file
unlink($source);
// rename target file to source file
rename($target, $source);
If the file isn't too big, the best way would probably be to read the file into an array of lines with file(), search through the array of lines for your string and edit that line, then implode() the array back together and fwrite() it back to the file.
Your main problem is the fact that the new line may not be the same length as the old line. If you need to change the length of the line, there is no way out of rewriting at least all of the file after the changed line. The easiest way is to create a new, modified file and then move it over the original. This way there is a complete file available at all times for readers. Use locking to make sure that only one script is modifying the file at once, and since you are going to replace the file, do the locking on a different file. Check out flock().
If you are certain that the new line will be the same length as the old line, you can open the file in read/write mode (use r+ as the second argument to fopen()) and call ftell() to save the position the line starts at each time before you call fgets() to read a line. Once you find the line that you want to overwrite, you can use fseek() to go back to the beginning of the line and fwrite() the new data. One way to force the line to always be the same length is to space pad it out to the maximum possible length.
This is a solution that works for rewriting only one line of a file in place with sed from PHP. My file contains only style vars and is formatted:
$styleVarName: styleVarProperty;\n
For this I first add the ":" to the ends of myStyleVarName, and sed replaces the rest of that line with the new property and adds a semicolon.
Make sure characters are properly escaped in myStyleVarProp.
$command = "pathToShellScript folder1Name folder2Name myStyleVarName myStyleVarProp";
shell_exec($command);
/* shellScript */
#!/bin/bash
file=/var/www/vhosts/mydomain.com/$1/$2/scss/_variables.scss
str=$3"$4"
sed -i "s/^$3.*/$str;/" $file
or if your file isn't too big:
$sample = file_get_contents('sample');
$parsed =preg_replace('##parsethis.*#', 'REPLACE TO END OF LINE', $sample);
You'll have to choose delimiters '#' that aren't present in the file though.
If you want to completely replace the contents of one file with the contents of another file you can use this:
rename("./some_path/data.txt", "./some_path/data_backup.txt");
rename("./some_path/new_data.txt", "./some_path/data.txt");
So in the first line you backup the file and in the second line you replace the file with the contents of a new file.
As far as I can tell the rename returns a boolean. True if the rename is successful and false if it fails. One could, therefore, only run the second step if the first step is successful to prevent overwriting the file unless a backup has been made successfully. Check out:
https://www.php.net/manual/en/function.rename.php
Hope that is useful to someone.
Cheers
Adrian
I'd most likely do what Jeremy suggested, but just for an alternate way to do it here is another solution. This has not been tested or used and is for *nix systems.
$cmd = "grep '#parsethis' " . $filename;
$output = system($cmd, $result);
$lines = explode("\n", $result);
// Read the entire file as a string
// Do a str_repalce for each item in $lines with ""