I have a script that parses large files line by line. When it encounters an error that it can't handle, it stops, notifying us of the last line parsed.
Is this really the best / only way to seek to a specific line in a file? (fseek() is not usable in my case.)
<?php
for ($i = 0; $i < 100000; $i++)
fgets($fp); // just discard this
I don't have a problem using this, it is fast enough - it just feels a bit dirty. From what I know about the underlying code, I don't imagine there is a better way to do this.
An easy way to seek to a specific line in a file is to use the SplFileObject class, which supports seeking to a line number (seek()) or byte offset (fseek()).
$file = new SplFileObject('myfile.txt');
$file->seek(9999); // Seek to line no. 10,000
echo $file->current(); // Print contents of that line
In the background, seek() just does what your PHP code did (except, in C code).
If you only have the line number to go on, there is no other method of finding the line. Files are not line based (or even character based), so there is no way to simply jump to a specific line in a file.
There might be other ways of reading the lines in the file that are slightly faster, like reading larger chunks of the file into a buffer and reading lines from that, but you could only hope for it to be a few percent faster. Any method to find a specific line in a file still has to read all data up to that line.
I know it is late to post this, but it may help some people.
I once wrote a function that works like an fseekbyline():
function GoToLine($handle,$line)
{
fseek($handle,0); // seek to 0
$i = 0;
$bufcarac = 0;
for($i = 1;$i<$line;$i++)
{
$ligne = fgets($handle);
$bufcarac += strlen($ligne); // in the end $bufcarac holds the byte count of everything before the target line
}
fseek($handle,$bufcarac);
}
There is no error handling: if you ask for a line below 1, or for line 203 of an empty file, you will get nothing useful.
The same applies if you try to go past the end of the file.
rewind($handle);
for ($i=0; $i < $desired_line; $i++) {
fgetcsv($handle, 1000, ",");
}
This works for me when I need to rewind to a specific line multiple times in my script.
I am not sure if this eats up memory or speed, but it does the trick.
If I understand correctly, you want to seek to the specific line at some point after you have found an error. If that is the case, you probably store or print the line-number of the bad line somewhere, depending on what you mean by "notify".
Unless you really mean that you cannot use fseek()*, what you can do is to also store/print the position in the file where the bad line starts. Then you can fseek().
* How, in that case, would fseekbyline() be usable if it existed?
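A minimal sketch of the "store the position, then fseek()" idea, assuming the parser reads with fgets(). The function name parseFrom() and the callback convention (return false for a line that cannot be handled) are made up for illustration:

```php
<?php
// Hypothetical helper: parses $path line by line, starting at byte $offset.
// $handleLine is a callback that returns false for a line it cannot handle.
// Returns the byte offset of the first bad line, or null if every line parsed.
function parseFrom(string $path, int $offset, callable $handleLine): ?int
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    fseek($fp, $offset);
    $badOffset = null;
    while (true) {
        $lineStart = ftell($fp);      // position before the line is read
        $line = fgets($fp);
        if ($line === false) {
            break;                    // end of file
        }
        if ($handleLine(rtrim($line, "\r\n")) === false) {
            $badOffset = $lineStart;  // remember where the bad line begins
            break;
        }
    }
    fclose($fp);
    return $badOffset;
}
```

If the first run reports an offset, a later run can pass that same offset back in and continue from the bad line without re-reading anything before it.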
Related
I know there might be many questions related to this one but I read most of them and still got no idea on how to do this.
Suppose I have a certain .txt file with each line being part of a chat log.
What I'd like to do is retrieve lines of the file, from line x onwards to the end. So if I have x = 450 I'd like to obtain the lines 450->eof(). I've tried this with stream_get_line and other functions but I always get stuck at "number of bytes to read" from file. How am I even supposed to know the number of bytes a line occupies, when they are all different from each other?
The lines are separated by \n and the files might have more than 500,000 lines.
Then I'd need to process each extracted line and I take it from here...
Here is a schema
if(file_exists($file)){
$x = 500; //hard-coded
$file = fopen($file);
$single_line_or_could_be_all_at_once(exploded) = stream_get_line($file,$x,"\n"); //nope.
//process lines
}
Since each line is of variable width, the only way to do this is to read from the beginning of the file, skipping over the first N lines that you don't need.
My PHP is very rusty, but it would go something like this (and feel free to edit the code to correct any syntax errors)
$lineNr = 1;
$in = fopen('file.txt', 'rb');
while (($line = fgets($in)) !== false) { // fgets() reads one line; fread() needs a byte count
if ($lineNr >= 450) {
// Use the line
}
$lineNr++;
}
If you want, you can modify the while to end when either the end of the file is reached, or a maximum line number desired is reached, like:
while (($line = fgets($in)) !== false && $lineNr < 550) {
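The same skip-then-read idea can be wrapped in a generator, so the calling code only ever sees the lines from x to the end of the file. This is a sketch only; the name linesFrom() and the 1-based line numbering are my assumptions:

```php
<?php
// Yields every line from $startLine (1-based) to the end of the file,
// keyed by line number. Earlier lines are read and discarded.
function linesFrom(string $path, int $startLine): Generator
{
    $in = fopen($path, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $lineNr = 1;
        while (($line = fgets($in)) !== false) {
            if ($lineNr >= $startLine) {
                yield $lineNr => rtrim($line, "\r\n");
            }
            $lineNr++;
        }
    } finally {
        fclose($in); // runs even if the caller stops iterating early
    }
}
```

With a chat log in a file such as chat.txt (hypothetical name), foreach (linesFrom('chat.txt', 450) as $nr => $line) would hand over lines 450 onwards one at a time, so only one line is held in memory.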
I'm ok with PHP but probably not half as good as some of you guys on here.
I am basically trying to find a way to grab a line from a huge, and I mean huge, text file. It's basically a list of keywords that I want to call by line number, preferably without going through them all before I get to that line, otherwise it could obviously crash my server.
At the moment I'm using this
$lines = file('http://www.mysite.com/keywords.txt');
foreach ($lines as $line_num => $line) {
echo "$line_num";
}
This works, but I'm sure there's got to be a better way of doing it to save on memory usage, because this puts the whole file into memory. If I could simply tell PHP to give me line number 97, that would RULE.
Hope you guys can come up with a solution, as you're much smarter than me :P ty
Use SplFileObject
$file = "test.txt";
$line_number = 1000;
$file_obj = new SplFileObject( $file );
/*** seek to the line number (zero-based, so 1000 is the 1001st line) ***/
$file_obj->seek( $line_number );
/*** return the current line ***/
echo $file_obj->current();
If the lines are just text and variable in length, you can't know which line is #97; the only thing that makes it 97th is that there are 96 lines before.
So you need to read the whole file up to that point (this is what SplFileObject does):
$line = 97; // number of the line wanted (1-based)
$fp = fopen("keywords.txt", "r");
$text = false;
while ($line-- > 0)
{
    $text = fgets($fp, 1024); // 1024 = max length of one line
    if ($text === false)
    {
        // ERROR: line does not exist
        break;
    }
}
fclose($fp);
But if you can store a line number before each line, i.e. the file is
...
95 abbagnale
96 abbatangelo
97 abbatantuono
98 ...
then you can implement a sort of binary search:
- start with s1 = 0 and s2 = file length
- read a keyword and line number at seek position s3 = (s1+s2)/2 (*)
- if line number is less than desired, s1 = s3; else s2 = s3; and repeat previous step.
- if line number is the one desired, strip the number from the text and you get the keyword.
(*) since the line most likely will not start exactly at s#, you need two fgets: one to get rid of the spurious half keyword, the second to read the line number. When you get "close", it will be faster to read a bigger chunk and split it into lines. For example, you seek line 170135 and read in line 170180: what you'd better do is rewind the seek position by one kilobyte, read in a kilobyte of data, and seek 170135 in there.
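The steps above can be sketched roughly as follows. This is my own reading of them, not a tested production routine: it assumes every line looks like "97 abbatantuono" (number, one space, keyword), that the numbers are ascending, and the function name is made up. It leaves out the "read a bigger chunk when close" optimisation.

```php
<?php
// Binary search by byte offset in a file of "<number> <keyword>\n" lines.
// Returns the keyword for line number $wanted, or null if it is absent.
function findNumberedLine(string $path, int $wanted): ?string
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $lo = 0;                    // always the start of a line
    $hi = fstat($fp)['size'];   // the wanted line cannot start past here

    while ($lo < $hi) {
        $mid = intdiv($lo + $hi, 2);
        if ($mid > 0) {
            fseek($fp, $mid - 1);
            fgets($fp);         // first fgets: drop the partial line
        } else {
            fseek($fp, 0);
        }
        $line = fgets($fp);     // second fgets: first full line at or after $mid
        if ($line === false || (int) $line >= $wanted) {
            $hi = $mid;         // the wanted line is not after this one
        } else {
            $lo = ftell($fp);   // the wanted line starts after this one
        }
    }

    fseek($fp, $lo);
    $line = fgets($fp);
    fclose($fp);
    if ($line === false || (int) $line !== $wanted) {
        return null;
    }
    // Strip the leading number and keep only the keyword.
    return explode(' ', rtrim($line, "\r\n"), 2)[1] ?? '';
}
```

Seeking to $mid - 1 before the throwaway fgets() means that a probe landing exactly on a line start does not skip that line.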
Or, if the lengths of the various lines are not too different, it could be worthwhile to store a fixed size line (here the "#" should actually be spaces, and in the line length you need to count the line terminator, \n or \r\n):
abbagnale#########
abbatangelo#######
abbatantuono######
and then, say that each keyword is 32 bytes,
$fp = fopen("keywords.txt", "r");
fseek($fp, 97 * 32, SEEK_SET);
$text = trim(fgets($fp, 32));
fclose($fp);
would be more or less instantaneous.
If the file is on a remote server, though, you still need to download the whole file (up to the desired line), and you'd be better served by placing a "scanner" script on the remote server that could run the search. Then you could run
$text = file_get_contents("http://www.mysite.com/keywords.php?line=97");
and get your line in milliseconds.
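A sketch of what such a "scanner" script could look like on the remote side. The file name keywords.php comes from the URL above; everything else (the function name fetchLine(), 1-based numbering, a keywords.txt in the same directory, the 404 for a missing line) is an assumption:

```php
<?php
// keywords.php: returns a single line of keywords.txt as plain text.

// Returns line $n (1-based) of $path, or null if the file is shorter.
function fetchLine(string $path, int $n): ?string
{
    if ($n < 1 || ($fp = fopen($path, 'rb')) === false) {
        return null;
    }
    $line = false;
    for ($i = 0; $i < $n; $i++) {
        $line = fgets($fp);
        if ($line === false) {
            break;              // ran out of lines before reaching $n
        }
    }
    fclose($fp);
    return $line === false ? null : rtrim($line, "\r\n");
}

// Only runs when the script is requested with ?line=N.
if (isset($_GET['line'])) {
    header('Content-Type: text/plain');
    $text = fetchLine(__DIR__ . '/keywords.txt', (int) $_GET['line']);
    if ($text === null) {
        http_response_code(404); // no such line
    } else {
        echo $text;
    }
}
```

The scan still happens, but on the machine that holds the file, so only one line travels over the network.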
There isn't any way to get 'line number x' from a file in pretty much any language without having to read it first some way or the other. A line, after all, is just the stuff between two end-of-line characters. Whereas picking up 'character number x' from a file can be done without loading the whole file (with some difficulty), picking up 'line number x' can't be done without loading all lines till x (and in most methods, you need to load all lines)
A method in which you load all the lines till line x is the following (using fgets):
$f = fopen('http://www.mysite.com/keywords.txt', 'r');
$i = 97;
$text = false;
while ($i > 0 && ($text = fgets($f, 2048)) !== false) {
    $i--;
}
echo $text; // line 97, or nothing if the file is shorter
I have a 10MB text file.
The length of the lines may vary.
Which is the most efficient way (fast and memory friendly) to read just one specific line from this file? e.g. get_me_the_line($nr, $file_resource)
I don't know of a way to just jump to the line, if the lines are of varying length. However you can iterate through lines pretty quickly when not using them for anything, and return the one of interest.
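function ReadLineNumber($file, $number)
{
    $handle = fopen($file, "r");
    // Discard the $number lines that come before the requested one.
    for ($i = 0; $i < $number; $i++)
        if (fgets($handle) === false)
            break;
    $line = fgets($handle); // false if the file is too short
    fclose($handle);
    return $line;
}
Edit
The $number is a zero-indexed line reference: the loop discards exactly $number lines, so the fgets() after it returns the line at that index. Pass $number - 1 if you would prefer line 1 to mean the first line in the file.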
As the lines are of varying length you have to look at each character as it might denote the end of the line. Quickest would be loading the file in chunks that are sized like the blocksize of the filesystem and counting the linebreaks until you are on the desired line.
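A rough sketch of the chunk-and-count approach. The function name and the 8192-byte default are placeholders, not the real block size of any particular filesystem, and only "\n" line endings are counted:

```php
<?php
// Returns the byte offset at which line $lineNumber (0-based) starts,
// or null if the file has fewer lines. Counts "\n" in fixed-size chunks.
function offsetOfLine(string $path, int $lineNumber, int $chunkSize = 8192): ?int
{
    if ($lineNumber < 0) {
        throw new InvalidArgumentException('Line number must be >= 0');
    }
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $size = fstat($fp)['size'];
    $seen = 0;   // newlines counted in earlier chunks
    $base = 0;   // file offset of the current chunk
    $result = $lineNumber === 0 ? 0 : null;
    while ($result === null
        && ($chunk = fread($fp, $chunkSize)) !== false && $chunk !== '') {
        $count = substr_count($chunk, "\n");
        if ($seen + $count >= $lineNumber) {
            // The newline that ends the previous line is in this chunk.
            $pos = -1;
            for ($k = $seen; $k < $lineNumber; $k++) {
                $pos = strpos($chunk, "\n", $pos + 1);
            }
            $result = $base + $pos + 1; // line starts right after it
        }
        $seen += $count;
        $base += strlen($chunk);
    }
    fclose($fp);
    // An offset at or past the end of the file means the line does not exist.
    return ($result !== null && $result < $size) ? $result : null;
}
```

The returned offset can be passed to fseek(), after which a single fgets() reads the wanted line.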
A better way would be to have an index file that stores information about the file containing the lines. Using a database could also be a better idea.
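One possible shape for such an index file, as a sketch: one fixed-size binary entry per line holding that line's byte offset, so looking up line n is a seek in the index plus a seek in the data file. The function names and the 8-byte big-endian format (pack code 'J', which assumes a 64-bit PHP build) are my choices:

```php
<?php
// Writes one 8-byte offset per line of $dataPath into $indexPath.
// Returns the number of lines indexed.
function buildLineIndex(string $dataPath, string $indexPath): int
{
    $data = fopen($dataPath, 'rb');
    $index = fopen($indexPath, 'wb');
    if ($data === false || $index === false) {
        throw new RuntimeException('Cannot open data or index file');
    }
    $lines = 0;
    $offset = 0;
    while (($line = fgets($data)) !== false) {
        fwrite($index, pack('J', $offset)); // where line $lines starts
        $offset += strlen($line);
        $lines++;
    }
    fclose($data);
    fclose($index);
    return $lines;
}

// Reads line $n (0-based) through the index, without scanning the data file.
function readIndexedLine(string $dataPath, string $indexPath, int $n): ?string
{
    $index = fopen($indexPath, 'rb');
    $data = fopen($dataPath, 'rb');
    if ($data === false || $index === false) {
        throw new RuntimeException('Cannot open data or index file');
    }
    $line = null;
    if ($n >= 0 && fseek($index, $n * 8) === 0) {
        $entry = fread($index, 8);
        if ($entry !== false && strlen($entry) === 8) {
            fseek($data, unpack('J', $entry)[1]);
            $read = fgets($data);
            $line = $read === false ? null : rtrim($read, "\r\n");
        }
    }
    fclose($index);
    fclose($data);
    return $line;
}
```

The index has to be rebuilt whenever the data file changes, so this pays off mainly for files that are read far more often than they are written.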
If the file is REALLY large (several GB or more) and your application is running on *nix, you may not want PHP to process the file at all and instead use existing Unix tools optimized for this kind of line processing. One such tool is sed, which can print a specific line from a huge file.
It should be trivial to wrap this in an exec() or shell_exec() call, or similar, to write the function you are looking for.
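A sketch of such a wrapper, assuming a Unix-like system with sed on the PATH and exec() not disabled. The function name sedLine() is made up, and the sed script is my own choice of one-liner rather than the example the answer links to:

```php
<?php
// Returns line $n (1-based) of $path by shelling out to sed,
// or null if the line does not exist or the command fails.
function sedLine(string $path, int $n): ?string
{
    if ($n < 1) {
        return null;
    }
    // "p" prints line $n, "q" quits at once so the rest is never read.
    $script = $n . '{p;q;}';
    $cmd = 'sed -n ' . escapeshellarg($script) . ' ' . escapeshellarg($path);
    $output = [];
    exec($cmd, $output, $status);
    return ($status === 0 && isset($output[0])) ? $output[0] : null;
}
```

Both arguments go through escapeshellarg(), so a file name containing spaces or quotes cannot break out of the command.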
I have a 200kb file which I use on multiple pages, but on each page I need only 1-2 lines of it. How can I read only the lines I need if I know the line number?
For example, if I need only the 10th line, I don't want to load all the lines into memory, just the 10th line.
Sorry for my bad English!
Try SplFileObject
echo memory_get_usage(), PHP_EOL; // 333200
$file = new SplFileObject('bible.txt'); // 996kb
$file->seek(5000); // jump to line 5000 (zero-based)
echo $file->current(), PHP_EOL; // output current line
echo memory_get_usage(), PHP_EOL; // 342984 vs 3319864 when using file()
For outputting the current line, you can either use current() or just echo $file. I find it clearer to use the method though. You can also use fgets(), but that would get the next line.
Of course, you only need the middle three lines. I've added the memory_get_usage calls just to prove this approach does eat almost no memory.
Unless you know the offset of the line, you will need to read every line up to that point. You can just throw away the old lines (that you don't want) by looping through the file with something like fgets(). (EDIT: Rather than fgets(), I would suggest @Gordon's solution)
Possibly a better solution would be to use a database, as the database engine will do the grunt work of storing the strings and allow you to (very efficiently) get a certain "line" (It wouldn't be a line but a record with an numeric ID, however it amounts to the same thing) without having to read the records before it.
Do the contents of the file change? If it's static, or relatively static, you can build a list of offsets where you want to read your data. For instance, if the file changes once a year, but you read it hundreds of times a day, then you can pre-compute the offsets of the lines you want and jump to them directly like this:
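$offsets = array();
$filehandle = fopen('file.txt', 'rb');
$pos = 0; // byte offset where the next line starts
for ($lineNr = 1; fgets($filehandle) !== false; $lineNr++) {
    if ($lineNr == 10 || $lineNr == 20) {
        $offsets[$lineNr] = $pos; // store this line's location
    }
    $pos = ftell($filehandle);
}
fclose($filehandle);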
and so on. Afterwards, you can trivially jump to that line's location like this:
$fh = fopen('file.txt', 'rb');
fseek($fh, $offsets[20]); // jump to line 20
But this could entirely be overkill. Try benchmarking the operations - compare how long it takes to do an oldfashioned "read 20 lines" versus precompute/jump.
<?php
$lines = array(1, 2, 10);
$handle = @fopen("/tmp/inputfile.txt", "r");
if ($handle) {
$i = 0;
while (!feof($handle)) {
$line = stream_get_line($handle, 1000000, "\n");
if (in_array($i, $lines)) {
echo $line;
$line = ''; // Don't forget to clean the buffer!
}
if ($i > end($lines)) {
break;
}
$i++;
}
fclose($handle);
}
?>
Just loop through them without storing, e.g.
$i = 1;
$file = fopen('file.txt', 'r');
while (!feof($file)) {
$line = fgets($file); // this gets whole line from the file;
if ($i == 10) {
break; // break on tenth line
}
$i ++;
}
The above example would keep memory for only the last line it got from the file, so this is the most memory efficient way to do it.
Use fgets() 10 times :-) That way you will not keep all 10 lines in memory.
Why are you only trying to load the first ten lines? Do you know that loading all those lines is in fact a problem?
If you haven't measured, then you don't know that it's a problem. Don't waste your time optimizing for non-problems. Chances are that any performance change you'll have in not loading the entire 200K file will be imperceptible, unless you know for a fact that loading that file is indeed a bottleneck.
How can I get a particular line in a 3 gig text file? The lines are delimited by \n, and I need to be able to get any line on demand.
How can this be done? Only one line needs to be returned, and I would not like to use any system calls.
Note: There is the same question elsewhere regarding how to do this in bash. I would like to compare it with the PHP equivalent.
Update: Each line is the same length the whole way through.
Without keeping some sort of index to the file, you would need to read all of it until you've encountered x number of \n characters. I see that nickf has just posted some way of doing that, so I won't repeat it.
To do this repeatedly in an efficient manner, you will need to build an index. Store some known file positions for certain (or all) line numbers once, which you can then use to seek to the right location using fseek.
Edit: if each line is the same length, you do not need the index.
$myfile = fopen($fileName, "r");
fseek($myfile, $lineLength * $lineNumber);
$line = fgets($myfile);
fclose($myfile);
Line number is 0 based in this example, so you may need to subtract one first. The line length includes the \n character.
There is little discussion of the problem and no mention is made of how the 'one line' should be referenced (by number, some value within it, etc.) so below is just a guess as to what you're wanting.
If you're not averse to using an object (it might be 'too high level', perhaps) and wish to reference the line by offset, then SplFileObject (available as of PHP 5.1.0) could be used. See the following basic example:
$file = new SplFileObject('myreallyhugefile.dat');
$file->seek(12345689); // seek to line 12345690 (seek() is zero-based)
echo $file->current(); // or simply, echo $file
That particular method (seek) requires scanning through the file line-by-line. However, if as you say all the lines are the same length then you can instead use fseek to get where you want to go much, much faster.
$line_length = 1024; // each line is 1 KB, including the line terminator
$file->fseek($line_length * 1234567); // seek lots of bytes
echo $file->current(); // echo line 1234568
You said each line has the same length, so you can use fopen() in combination with fseek() to get a line quickly.
http://ch2.php.net/manual/en/function.fseek.php
The only way I can think to do it would be like this:
function getLine($fileName, $num) {
    $fh = fopen($fileName, 'r');
    $line = false; // returned as-is if the file has fewer than $num lines
    for ($i = 0; $i < $num && ($line = fgets($fh)) !== false; ++$i);
    fclose($fh);
    return $line;
}
While this is not exactly a solution, how come you need to pull one line out of a 3 gig text file? Is performance an issue, or can this run at a leisurely pace?
If you need to pull lots of lines out of this file at different points in time, I would definitely suggest putting this data into a DB of some kind. SQLite may be your friend here, as it's very simple, though not great with lots of scripts/people accessing it at one time.
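A minimal sketch of the SQLite route through PDO, assuming the pdo_sqlite extension is available. The table name, column names and function names are invented for the example:

```php
<?php
// Loads every line of $textPath into a SQLite table keyed by line number.
function importLines(PDO $db, string $textPath): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS lines (nr INTEGER PRIMARY KEY, content TEXT)');
    $insert = $db->prepare('INSERT INTO lines (nr, content) VALUES (?, ?)');
    $fp = fopen($textPath, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $textPath");
    }
    $db->beginTransaction();   // one transaction keeps the import fast
    for ($nr = 1; ($line = fgets($fp)) !== false; $nr++) {
        $insert->execute([$nr, rtrim($line, "\r\n")]);
    }
    $db->commit();
    fclose($fp);
}

// Fetches one line by its 1-based number via the primary key.
function lineFromDb(PDO $db, int $nr): ?string
{
    $stmt = $db->prepare('SELECT content FROM lines WHERE nr = ?');
    $stmt->execute([$nr]);
    $content = $stmt->fetchColumn();
    return $content === false ? null : $content;
}
```

The import reads the whole file once. After that each lookup goes through the primary key instead of scanning the text, for example with $db = new PDO('sqlite:lines.db') where lines.db is a hypothetical database file.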