I am using PHP to import data from a CSV file using fgetcsv(), which yields an array for each row. Initially, I had the character limit set at 1024, like so:
while ($data = fgetcsv($fp, 1024)) {
// do stuff with the row
}
However, a CSV with 200+ columns surpassed the 1024 limit on many rows. This caused the line read to stop in the middle of a row, and then the next call to fgetcsv() would start where the previous one left off and so on until an EOL was reached.
I have since upped this limit to 4096, which should take care of the majority of cases, but I would like to put a check in to be sure that the entire line was read after each line is fetched. How do I go about this?
I was thinking to check the end of the last element of the array for end of line characters (\n, \r, \r\n), but wouldn't these be parsed out by the fgetcsv() call?
Just omit the length parameter. It's optional in PHP5.
while ($data = fgetcsv($fp)) {
// do stuff with the row
}
Just don't specify a limit, and fgetcsv() will slurp in as much as is necessary to capture a full line. If you do specify a limit, then it's entirely up to YOU to scan the file stream and ensure you're not slicing something down the middle.
However, note that not specifying a limit can be risky if you don't have control over generation of this .csv in the first place. It'd be easy to swamp your server with a malicious CSV that has many terabytes of data on a single line.
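One way to keep a limit and still detect a truncated read is to fetch the raw line yourself and only then parse it. The following is a minimal sketch, not a drop-in replacement for fgetcsv(): the helper name readCsvRowCapped() and its $maxBytes parameter are made up for illustration, and it assumes no field contains an embedded line break (fgetcsv() handles those, this sketch does not).

```php
<?php
// Hypothetical helper: returns the parsed row, null at end of file,
// or throws if a physical line is longer than $maxBytes.
function readCsvRowCapped($fp, int $maxBytes): ?array
{
    // fgets() returns at most $length - 1 bytes, so leave room for "\r\n".
    $line = fgets($fp, $maxBytes + 3);
    if ($line === false) {
        return null; // end of file (or read error)
    }
    // The read is complete if it ended on a newline or hit the end of the file.
    $complete = substr($line, -1) === "\n" || feof($fp);
    $body = rtrim($line, "\r\n");
    if (!$complete || strlen($body) > $maxBytes) {
        throw new LengthException("CSV line longer than $maxBytes bytes");
    }
    // Delimiter, enclosure and escape are passed explicitly (the usual defaults).
    return str_getcsv($body, ",", "\"", "\\");
}
```

Call it in a loop such as while (($data = readCsvRowCapped($fp, 4096)) !== null) and catch the exception where you want to report the oversized line.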
Thank you for the suggestions, but these solutions really didn't solve the issue of knowing that we account for the longest line while still providing a limit. I was able to accomplish this by using the wc -L UNIX command via shell_exec() to determine the longest line in the file prior to beginning the line fetching. The code is below:
// open the CSV file to read lines
$fp = fopen($sListFullPath, 'r');
// use wc to figure out the longest line in the file
$longestArray = explode(" ", shell_exec('wc -L ' . $sListFullPath));
$longest_line = (int)$longestArray[0] + 4; // add a little padding for EOL chars
// check against a user-defined maximum length
if ($longest_line > $line_length_max) {
// alert user that the length of at least one line in the CSV is too long
}
// read in the data
while ($data = fgetcsv($fp, $longest_line)) {
// do stuff with the row
}
This approach ensures that every line is read in its entirety and still provides a safety net for really long lines without stepping through the entire file with PHP line by line.
I would be careful with your final solution. I was able to upload a file named /.;ls -a;.csv to perform command injection. Make sure you validate the file path if you use this approach. Also, it might be a good idea to provide a default_length in the case your wc fails for any reason.
// use wc to find max line length
// uses a hardcoded default if wc fails
// this is relatively safe from command
// injection since the file path is a tmp file
$wc = explode(" ", (string) shell_exec('wc -L ' . escapeshellarg($validated_file_path)));
$longest_line = (int)$wc[0];
$length = ($longest_line) ? $longest_line + 4 : $default_length;
fgetcsv() reads a CSV file line by line by default. If it is not behaving that way, the file's line endings may not match what PHP expects on your OS (compare PHP_EOL).
Open:
C:\xampp\php\php.ini
and search for:
;auto_detect_line_endings = Off
Uncomment it and switch it on:
auto_detect_line_endings = On
Then restart Apache and check; it should work. Note that this setting is deprecated as of PHP 8.1.
What is the best way to overwrite a specific line in a file? I basically want to search a file for the string '#parsethis' and overwrite the rest of that line with something else.
If the file is really big (log files or something like this) and you are willing to trade speed for lower memory consumption, you could open two files and essentially do the trick Jeremy Ruten proposed by using files instead of system memory.
$source='in.txt';
$target='out.txt';
// copy operation
$sh=fopen($source, 'r');
$th=fopen($target, 'w');
while (($line = fgets($sh)) !== false) {
if (strpos($line, '#parsethis')!==false) {
$line='new line to be inserted' . PHP_EOL;
}
fwrite($th, $line);
}
fclose($sh);
fclose($th);
// delete old source file
unlink($source);
// rename target file to source file
rename($target, $source);
If the file isn't too big, the best way would probably be to read the file into an array of lines with file(), search through the array of lines for your string and edit that line, then implode() the array back together and fwrite() it back to the file.
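A minimal sketch of that approach, assuming the file fits comfortably in memory. The function name replaceLineInSmallFile() and its parameters are made up for illustration, and it uses file_put_contents() in place of an explicit fopen()/fwrite() pair.

```php
<?php
// Hypothetical helper: replaces every line containing $marker with $newLine
// and returns the number of lines changed.
function replaceLineInSmallFile(string $path, string $marker, string $newLine): int
{
    $lines = file($path); // each element keeps its original line ending
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $changed = 0;
    foreach ($lines as $i => $line) {
        if (strpos($line, $marker) !== false) {
            $lines[$i] = $newLine . PHP_EOL;
            $changed++;
        }
    }
    // The elements already end in newlines, so join them with an empty string.
    file_put_contents($path, implode('', $lines));
    return $changed;
}
```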
Your main problem is the fact that the new line may not be the same length as the old line. If you need to change the length of the line, there is no way out of rewriting at least all of the file after the changed line. The easiest way is to create a new, modified file and then move it over the original. This way there is a complete file available at all times for readers. Use locking to make sure that only one script is modifying the file at once, and since you are going to replace the file, do the locking on a different file. Check out flock().
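The steps described above could look roughly like this. It is a sketch under assumptions: the '.lock' and '.tmp' suffixes and the function name rewriteFileLocked() are invented here, and the temporary file is assumed to live on the same filesystem as the original so that rename() can replace it in one step.

```php
<?php
// Hypothetical helper: rewrites $path line by line through $transform while
// holding an exclusive lock on a separate lock file.
function rewriteFileLocked(string $path, callable $transform): void
{
    $lock = fopen($path . '.lock', 'c'); // created if missing, never truncated
    if ($lock === false || !flock($lock, LOCK_EX)) {
        throw new RuntimeException("Cannot lock $path");
    }
    try {
        $tmp = $path . '.tmp';
        $in  = fopen($path, 'r');
        $out = fopen($tmp, 'w');
        while (($line = fgets($in)) !== false) {
            fwrite($out, $transform($line));
        }
        fclose($in);
        fclose($out);
        // Readers see either the old file or the complete new one.
        rename($tmp, $path);
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}
```

The callback receives each line including its line ending and returns the text to write in its place.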
If you are certain that the new line will be the same length as the old line, you can open the file in read/write mode (use r+ as the second argument to fopen()) and call ftell() to save the position the line starts at each time before you call fgets() to read a line. Once you find the line that you want to overwrite, you can use fseek() to go back to the beginning of the line and fwrite() the new data. One way to force the line to always be the same length is to space pad it out to the maximum possible length.
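Here is a rough sketch of the in-place variant. The function name overwriteLineInPlace() is made up, and the sketch assumes the replacement is no longer than the existing line; shorter text is space padded to the old length so the file size never changes.

```php
<?php
// Hypothetical helper: overwrites the first line containing $marker in place.
// Returns false if no line matched or the new text would not fit.
function overwriteLineInPlace(string $path, string $marker, string $newText): bool
{
    $fh = fopen($path, 'r+');
    if ($fh === false) {
        return false;
    }
    $done = false;
    while (true) {
        $start = ftell($fh); // remember where this line begins
        $line = fgets($fh);
        if ($line === false) {
            break; // end of file, nothing matched
        }
        if (strpos($line, $marker) === false) {
            continue;
        }
        $body = rtrim($line, "\r\n");
        $eol  = substr($line, strlen($body)); // keep the original line ending
        if (strlen($newText) <= strlen($body)) {
            fseek($fh, $start);
            fwrite($fh, str_pad($newText, strlen($body)) . $eol);
            $done = true;
        }
        break;
    }
    fclose($fh);
    return $done;
}
```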
This is a solution that works for rewriting only one line of a file in place with sed from PHP. My file contains only style vars and is formatted:
$styleVarName: styleVarProperty;\n
For this I first add the ":" to the ends of myStyleVarName, and sed replaces the rest of that line with the new property and adds a semicolon.
Make sure characters are properly escaped in myStyleVarProp.
$command = "pathToShellScript folder1Name folder2Name myStyleVarName myStyleVarProp";
shell_exec($command);
/* shellScript */
#!/bin/bash
file=/var/www/vhosts/mydomain.com/$1/$2/scss/_variables.scss
str=$3"$4"
sed -i "s/^$3.*/$str;/" $file
or if your file isn't too big:
$sample = file_get_contents('sample');
$parsed = preg_replace('/#parsethis.*/', 'REPLACE TO END OF LINE', $sample);
The pattern delimiter must not appear unescaped inside the pattern. Since the search string itself contains '#', use a different delimiter such as '/'.
If you want to completely replace the contents of one file with the contents of another file you can use this:
rename("./some_path/data.txt", "./some_path/data_backup.txt");
rename("./some_path/new_data.txt", "./some_path/data.txt");
So in the first line you backup the file and in the second line you replace the file with the contents of a new file.
As far as I can tell the rename returns a boolean. True if the rename is successful and false if it fails. One could, therefore, only run the second step if the first step is successful to prevent overwriting the file unless a backup has been made successfully. Check out:
https://www.php.net/manual/en/function.rename.php
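A small sketch of that check. The function name swapInNewFile() and the attempt to restore the backup on failure are additions for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: backs up $current, then moves $new into its place.
// The second rename only runs if the backup succeeded.
function swapInNewFile(string $current, string $backup, string $new): bool
{
    if (!rename($current, $backup)) {
        return false; // no backup made, so leave everything untouched
    }
    if (!rename($new, $current)) {
        rename($backup, $current); // try to put the original back
        return false;
    }
    return true;
}
```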
Hope that is useful to someone.
Cheers
Adrian
I'd most likely do what Jeremy suggested, but just for an alternate way to do it here is another solution. This has not been tested or used and is for *nix systems.
$cmd = "grep '#parsethis' " . escapeshellarg($filename);
exec($cmd, $lines, $status); // $lines receives one array element per matching line
// Read the entire file as a string
// Do a str_replace() for each item in $lines with ""
I have a 10MB text file.
The length of the lines may vary.
Which is the most efficient way (fast and memory friendly) to read just one specific line from this file? e.g. get_me_the_line($nr, $file_resource)
I don't know of a way to just jump to the line, if the lines are of varying length. However you can iterate through lines pretty quickly when not using them for anything, and return the one of interest.
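function ReadLineNumber($file, $number)
{
    $handle = fopen($file, "r");
    $i = 0;
    // skip the first $number lines; the counter is tested first so no extra line is consumed
    while ($i < $number && fgets($handle) !== false)
        $i++;
    $line = fgets($handle);
    fclose($handle);
    return $line;
}
Edit
The loop tests the counter before calling fgets(), so it does not read a line ahead. $number is therefore a zero-indexed line reference. Change the condition to $i < $number - 1 if you would prefer 1 to mean the first line in the file.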
As the lines are of varying length you have to look at each character as it might denote the end of the line. Quickest would be loading the file in chunks that are sized like the blocksize of the filesystem and counting the linebreaks until you are on the desired line.
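A sketch of the chunked approach, assuming "\n" line endings. The function name readLineByChunks() is made up, and the 8192-byte default chunk size is only a guess at a typical block size. Line numbers are zero-based.

```php
<?php
// Hypothetical helper: returns line $lineNo (zero-based, without its "\n"),
// or null if the file has fewer lines.
function readLineByChunks(string $path, int $lineNo, int $chunkSize = 8192): ?string
{
    $fh = fopen($path, 'rb');
    $current = 0; // index of the line the read position is currently inside
    $line = '';
    while (($chunk = fread($fh, $chunkSize)) !== false && $chunk !== '') {
        $offset = 0;
        // Count newlines until the start of the wanted line is reached.
        while ($current < $lineNo) {
            $pos = strpos($chunk, "\n", $offset);
            if ($pos === false) {
                continue 2; // no more newlines here, fetch the next chunk
            }
            $offset = $pos + 1;
            $current++;
        }
        // Inside the wanted line: collect bytes up to the next newline.
        $pos = strpos($chunk, "\n", $offset);
        if ($pos !== false) {
            $line .= substr($chunk, $offset, $pos - $offset);
            fclose($fh);
            return $line;
        }
        $line .= substr($chunk, $offset);
    }
    fclose($fh);
    // A last line without a trailing newline ends up here.
    return ($current === $lineNo && $line !== '') ? $line : null;
}
```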
Better way would be to have an index file that stores information about the file containing the lines. Using a database could also be a better idea.
If the file is REALLY large (several GB or more) and your application is running on *nix you may not want to try having PHP process the file and instead use some existing unix tools optimized for this kind of line processing. One such tool is sed and an example of printing a specific line from a huge file can be found here.
Should be trivial to wrap this in a shell_exec() call, or similar, to write the function you are looking for.
How can I get a particular line in a 3 GB text file? The lines are delimited by \n, and I need to be able to get any line on demand.
How can this be done? Only one line needs to be returned, and I would prefer not to use any system calls.
Note: There is the same question elsewhere regarding how to do this in bash. I would like to compare it with the PHP equivalent.
Update: Each line is the same length the whole way through.
Without keeping some sort of index to the file, you would need to read all of it until you've encountered x number of \n characters. I see that nickf has just posted some way of doing that, so I won't repeat it.
To do this repeatedly in an efficient manner, you will need to build an index. Store some known file positions for certain (or all) line numbers once, which you can then use to seek to the right location using fseek.
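A minimal sketch of such an index, kept in a PHP array. Both function names are made up for illustration. For a multi-gigabyte file you would more likely store only every Nth offset, or write the offsets to a file, since one array entry per line costs a lot of memory.

```php
<?php
// Hypothetical helper: returns the byte offset at which each line starts.
function buildLineIndex(string $path): array
{
    $fh = fopen($path, 'rb');
    $offsets = [];
    while (true) {
        $pos = ftell($fh); // position before reading = start of this line
        if (fgets($fh) === false) {
            break;
        }
        $offsets[] = $pos;
    }
    fclose($fh);
    return $offsets;
}

// Hypothetical helper: jumps straight to a zero-based line using the index.
function readIndexedLine(string $path, array $offsets, int $lineNo): ?string
{
    if (!isset($offsets[$lineNo])) {
        return null;
    }
    $fh = fopen($path, 'rb');
    fseek($fh, $offsets[$lineNo]);
    $line = fgets($fh);
    fclose($fh);
    return $line === false ? null : rtrim($line, "\r\n");
}
```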
Edit: if each line is the same length, you do not need the index.
$myfile = fopen($fileName, "r");
fseek($myfile, $lineLength * $lineNumber);
$line = fgets($myfile);
fclose($myfile);
Line number is 0 based in this example, so you may need to subtract one first. The line length includes the \n character.
There is little discussion of the problem and no mention is made of how the 'one line' should be referenced (by number, some value within it, etc.) so below is just a guess as to what you're wanting.
If you're not averse to using an object (it might be 'too high level', perhaps) and wish to reference the line by offset, then SplFileObject (available as of PHP 5.1.0) could be used. See the following basic example:
$file = new SplFileObject('myreallyhugefile.dat');
$file->seek(12345689); // seek to line 12345690 (seek() is zero-based)
echo $file->current(); // or simply, echo $file
That particular method (seek) requires scanning through the file line-by-line. However, if as you say all the lines are the same length then you can instead use fseek to get where you want to go much, much faster.
$line_length = 1024; // each line is 1 KB long
$file->fseek($line_length * 1234567); // seek lots of bytes
echo $file->current(); // echo line 1234568
You said each line has the same length, so you can use fopen() in combination with fseek() to get a line quickly.
http://ch2.php.net/manual/en/function.fseek.php
The only way I can think to do it would be like this:
function getLine($fileName, $num) {
    $fh = fopen($fileName, 'r');
    $line = false; // returned if $num < 1 or the file has fewer lines
    for ($i = 0; $i < $num && ($line = fgets($fh)) !== false; ++$i);
    fclose($fh);
    return $line;
}
While this is not exactly a solution, why do you need to pull one line out of a 3 GB text file? Is performance an issue, or can this run at a leisurely pace?
If you need to pull lots of lines out of this file at different points in time, I would definitely suggest putting this data into a DB of some kind. SQLite may be your friend here, as it's very simple, though not great with lots of scripts/people accessing it at one time.
Using PHP, it's possible to read off the contents of a file using fopen and fgets. Each time fgets is called, it returns the next line in the file.
How does fgets know what line to read? In other words, how does it know that it last read line 5, so it should return the contents of line 6 this time? Is there a way for me to access that line-number data?
(I know it's possible to do something similar by reading the entire contents of the file into an array with file, but I'd like to accomplish this with fopen.)
There is a "position" kept in memory for each file that is opened; it is automatically updated each time you read a line/character/whatever from the file.
You can get this position with ftell, and modify it with fseek:
ftell: Returns the current position of the file read/write pointer
fseek: Seeks on a file pointer
You can also use rewind to... rewind... the position of that pointer.
This does not give you a position as a line number, but something closer to a position as a character number (actually, you get the position as a number of bytes from the beginning of the file); once you have that, reading a line is just a matter of reading characters until you hit an end-of-line character.
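A tiny sketch that makes the pointer visible. The function name tracePointer() is made up; it simply records what ftell() reports after opening the file, after each of two fgets() calls, and after rewind().

```php
<?php
// Hypothetical helper: records the byte position of the file pointer
// at four moments while reading a file.
function tracePointer(string $path): array
{
    $fh = fopen($path, 'r');
    $trace = [ftell($fh)]; // right after opening
    fgets($fh);
    $trace[] = ftell($fh); // after the first line
    fgets($fh);
    $trace[] = ftell($fh); // after the second line
    rewind($fh);
    $trace[] = ftell($fh); // back at the beginning
    fclose($fh);
    return $trace;
}
```

For a file containing "ab\ncde\n" the positions are 0, 3, 7 and 0: byte counts, not line numbers.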
BTW : as far as I remember, these functions are coming from the C language -- PHP itself being written in C ;-)
Files are just a stream of data, read from the beginning to the end. The OS will remember the position you've read so far in that file. If needed, doing so in the application as well is fairly simple. The OS only cares about byte positions though, not lines.
Just imagine dealing out a deck of 52 cards sequentially. You hand off the first card. Next time, the second card. When you want to give out the third card, you don't need to start counting from the start again, or even remember where you were; you just hand out the next available card, and that will be the third.
It might take a bit more work to read lines, since you'd want to buffer data read from the actual file for performance's sake, but there is not much more to it than recording the offset of the last piece of data you handed out, finding the next newline character, and handing off all the data between those two points.
Neither PHP nor the OS has any real need to keep the line number around, since all the system cares about is the "next line". If you want to know the line number, you keep a counter and increment it every time your app reads a line.
$lineno = 0;
while (($buffer = fgets($handle, 4096)) !== false) {
    $lineno++; // keep track of the line number
    ...
}
I have this old sample; I hope it can help you :)
$File = file('path');
$array = array();
$linenr = 5;
foreach( $File AS $line_num => $line )
{
$array[] = $line; // array_push() returns the new element count, so don't assign its result back
}
echo $array[($linenr-1)];
You could just call fgets and increment a var $line_number each time you call it. That would tell you the line it is on.