I'm ok with PHP but probably not half as good as some of you guys on here.
I'm basically trying to find a way to grab a line from a huge (and I mean huge) text file. It's a list of keywords that I want to call by line number, preferably without going through them all before I get to that line, otherwise it could obviously crash my server.
At the moment I'm using this:
$lines = file('http://www.mysite.com/keywords.txt');
foreach ($lines as $line_num => $line) {
echo "$line_num";
}
This works, but I'm sure there's got to be a better way of doing it to save on memory usage, because this puts the whole file into memory. If I could simply tell PHP to give me line number 97, that would rule.
Hope you guys can come up with a solution, as you're much smarter than me :P ty
Use SplFileObject
$file = "test.txt";
$line_number = 1000;
$file_obj = new SplFileObject( $file );
/*** seek to the line number (seek() is zero-based, so 1000 is the 1001st line) ***/
$file_obj->seek( $line_number );
/*** return the current line ***/
echo $file_obj->current();
If the lines are just text and variable in length, you can't know which line is #97; the only thing that makes it 97th is that there are 96 lines before.
So you need to read the whole file up to that point (this is what SplFileObject does):
$line = 97; // desired line number (1-based)
$fp = fopen("keywords.txt", "r");
$text = false;
while ($line--)
{
    $text = fgets($fp, 1024); // 1024 = max length of one line
    if ($text === false)
    {
        // ERROR: line does not exist
        break;
    }
}
fclose($fp);
But if you can store a line number before each line, i.e. the file is
...
95 abbagnale
96 abbatangelo
97 abbatantuono
98 ...
then you can implement a sort of binary search:
- start with s1 = 0 and s2 = file length
- read a keyword and line number at seek position s3 = (s1+s2)/2 (*)
- if line number is less than desired, s1 = s3; else s2 = s3; and repeat previous step.
- if line number is the one desired, strip the number from the text and you get the keyword.
(*) since the line most likely will not start exactly at s#, you need two fgets: one to get rid of the spurious half keyword, the second to read the line number. When you get "close", it will be faster to read a bigger chunk and split it into lines. For example, you seek line 170135 and read in line 170180: what you'd better do is rewind the seek position by one kilobyte, read in a kilobyte of data, and seek 170135 in there.
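The binary search outlined above could be sketched roughly like this. It is only an illustration, assuming every line has the form "number, space, keyword" and that the numbers increase through the file. The function name findNumberedLine and the 1024-byte cutoff for the final linear scan are made up for this example, and a line whose number equals the target is treated like "less than desired" so the final scan can pick it up.

```php
<?php
// Hypothetical helper: look up a keyword by its stored line number.
// Assumes each line looks like "97 abbatantuono" and the numbers increase.
function findNumberedLine(string $path, int $wanted): ?string
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return null;
    }
    $lo = 0;                    // the wanted line starts after $lo (or at 0)
    $hi = fstat($fp)['size'];   // the wanted line starts at or before $hi

    // Halve the byte range until it is small enough to scan directly.
    while ($hi - $lo > 1024) {
        $mid = intdiv($lo + $hi, 2);
        fseek($fp, $mid);
        fgets($fp);             // first fgets: throw away the partial line
        $line = fgets($fp);     // second fgets: a complete numbered line
        if ($line === false || (int) $line > $wanted) {
            $hi = $mid;
        } else {
            $lo = $mid;
        }
    }

    // Final step: read the remaining small range line by line.
    fseek($fp, $lo);
    if ($lo > 0) {
        fgets($fp);             // skip the partial line at $lo
    }
    $result = null;
    while (($line = fgets($fp)) !== false) {
        $parts = explode(' ', rtrim($line, "\r\n"), 2);
        if ((int) $parts[0] === $wanted) {
            $result = $parts[1] ?? '';
            break;
        }
        if ((int) $parts[0] > $wanted) {
            break;              // passed it: the line does not exist
        }
    }
    fclose($fp);
    return $result;
}
```

Each probe costs one seek and two short reads, so even a file with millions of lines needs only a couple of dozen probes before the final scan.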
Or, if the lengths of the various lines are not too different, it could be worthwhile to store a fixed size line (here the "#" should actually be spaces, and in the line length you need to count the line terminator, \n or \r\n):
abbagnale#########
abbatangelo#######
abbatantuono######
and then, say that each keyword is 32 bytes,
$fp = fopen("keywords.txt", "r");
fseek($fp, 97 * 32, SEEK_SET);
$text = trim(fgets($fp, 32));
fclose($fp);
would be more or less instantaneous.
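As a rough sketch of how such a fixed-width file might be produced and read, assuming the 32-byte record from the example above (31 bytes of space-padded text plus "\n"). The helper names padKeywords and readFixedLine are invented for illustration, and the index is zero-based like the fseek() call above.

```php
<?php
// Hypothetical helper: rewrite a variable-length keyword list as
// fixed-width records of $width bytes (text padded with spaces + "\n").
function padKeywords(string $src, string $dst, int $width = 32): void
{
    $in = fopen($src, 'rb');
    $out = fopen($dst, 'wb');
    while (($line = fgets($in)) !== false) {
        $word = rtrim($line, "\r\n");
        if (strlen($word) > $width - 1) {
            throw new LengthException("Keyword too long: $word");
        }
        fwrite($out, str_pad($word, $width - 1) . "\n");
    }
    fclose($in);
    fclose($out);
}

// Hypothetical helper: read record $index (zero-based) with a single seek.
function readFixedLine(string $path, int $index, int $width = 32): ?string
{
    if ($index < 0) {
        return null;
    }
    $fp = fopen($path, 'rb');
    fseek($fp, $index * $width, SEEK_SET);
    $record = fread($fp, $width);
    fclose($fp);
    if ($record === false || strlen($record) < $width) {
        return null; // past the end of the file
    }
    return rtrim($record); // drop the padding and the line terminator
}
```

The conversion only has to run when the keyword list changes; every lookup afterwards is one seek and one 32-byte read.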
If the file is on a remote server, though, you still need to download the whole file (up to the desired line), and you'd be better served by placing a "scanner" script on the remote server that could run the search. Then you could run
$text = file_get_contents("http://www.mysite.com/keywords.php?line=97");
and get your line in milliseconds.
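The answer does not show the scanner script itself, so the following is only a guess at what a minimal keywords.php could look like. It assumes keywords.txt sits next to the script and that ?line= is a 1-based line number; the function name scanForLine is made up.

```php
<?php
// Hypothetical keywords.php, placed next to keywords.txt on the remote server.
// Returns the requested 1-based line, or null when the input is invalid
// or the file has fewer lines than requested.
function scanForLine(string $path, $rawLine): ?string
{
    $wanted = filter_var(
        $rawLine,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1]]
    );
    if ($wanted === false) {
        return null;
    }
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return null;
    }
    $line = false;
    for ($i = 0; $i < $wanted; $i++) {
        $line = fgets($fp);
        if ($line === false) {
            break; // file ended before the requested line
        }
    }
    fclose($fp);
    return $line === false ? null : rtrim($line, "\r\n");
}

// Only runs when the script is requested with ?line=...
if (isset($_GET['line'])) {
    $text = scanForLine(__DIR__ . '/keywords.txt', $_GET['line']);
    if ($text === null) {
        http_response_code(404);
    } else {
        header('Content-Type: text/plain');
        echo $text;
    }
}
```

The file is still read sequentially, but that happens on the machine that holds it, so only the one requested line crosses the network.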
There isn't any way to get 'line number x' from a file in pretty much any language without having to read it first in some way or other. A line, after all, is just the stuff between two end-of-line characters. Whereas picking up 'character number x' from a file can be done without loading the whole file (with some difficulty), picking up 'line number x' can't be done without loading all lines up to x (and in most methods, you need to load all lines).
A method in which you load all the lines till line x is the following (using fgets):
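$f = fopen('http://www.mysite.com/keywords.txt', 'r');
$i = 97; // desired line number (1-based)
$text = false;
// check the counter first so we stop right after reading line 97
while ($i > 0 && ($text = fgets($f, 2048)) !== false) {
    $i--;
}
fclose($f);
echo $text;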
I want to seek to line 10 from the end of the file and write some data.
I tried $file->fseek(-10, SEEK_END); but it seeks by bytes, not by lines.
$file = new SplFileObject('file.txt');
$file->seek(99);//this seeks to line 99 but I was wondering if there is a way to make it seek from end
$file->fwrite('hi there');
Can someone help me out?
$file = new SplFileObject('file.txt', 'r+'); // the default mode 'r' is read-only, so fwrite() would fail
$numLines = count(file('file.txt'));
$file->seek($numLines - 10);
$file->fwrite('hi there');
This code should get you close to what you are after. The second line counts the number of lines in the file, then this is used to find the correct value to plug into seek().
I would suggest adding checks that you don't end up passing a negative value to seek().
Ref file() manual.
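Note that count(file(...)) loads the whole file into an array just to count it. If that is a concern, the count could be done by streaming instead. The sketch below is one way to do that, and it also clamps the result so seek() never receives a negative value; the helper names countLines and lineIndexFromEnd are made up for this example.

```php
<?php
// Hypothetical helper: count lines by streaming, one line in memory at a time.
function countLines(string $path): int
{
    $fp = fopen($path, 'rb');
    $count = 0;
    while (fgets($fp) !== false) {
        $count++;
    }
    fclose($fp);
    return $count;
}

// Hypothetical helper: zero-based index of the line that sits $fromEnd
// lines before the end of the file, clamped at 0 for short files.
function lineIndexFromEnd(string $path, int $fromEnd): int
{
    return max(0, countLines($path) - $fromEnd);
}
```

The result can then be passed straight to seek(), for example $file->seek(lineIndexFromEnd('file.txt', 10));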
I know there might be many questions related to this one but I read most of them and still got no idea on how to do this.
Suppose I have a certain .txt file with each line being part of a chat log.
What I'd like to do is retrieve lines of the file, from line x onwards to the end. So if I have x = 450, I'd like to obtain lines 450 through to the end of the file. I've tried this with stream_get_line and other functions, but I always get stuck at the "number of bytes to read" from the file. How am I supposed to know the number of bytes a line occupies when every line has a different length?
Lines are separated by \n and the files might have 500,000+ lines.
Then I'd need to process each extracted line and I take it from here...
Here is a schema
if(file_exists($file)){
$x = 500; //hard-coded
$file = fopen($file);
$single_line_or_could_be_all_at_once(exploded) = stream_get_line($file,$x,"\n"); //nope.
//process lines
}
Since each line is of variable width, the only way to do this is to read from the beginning of the file, skipping over the first N lines that you don't need.
My PHP is very rusty, but it would go something like this (and feel free to edit the code to correct any syntax errors)
$lineNr = 1;
$in = fopen('file.txt', 'rb');
// fgets() reads one line at a time; fread() would need a byte count
while (($line = fgets($in)) !== false) {
    if ($lineNr >= 450) {
        // Use the line
    }
    $lineNr++;
}
fclose($in);
If you want, you can modify the while to end when either the end of the file is reached, or a maximum line number desired is reached, like:
while ($lineNr < 550 && ($line = fgets($in)) !== false) {
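The same skip-then-process loop can be wrapped in a generator so the caller only ever sees the lines it asked for. This is a sketch rather than the answerer's code: the name linesFrom, the 1-based numbering, and the optional upper bound are all assumptions.

```php
<?php
// Hypothetical generator: yields lineNumber => text for lines $from..$to
// (1-based, inclusive). With $to = null it runs to the end of the file.
function linesFrom(string $path, int $from, ?int $to = null): Generator
{
    $in = fopen($path, 'rb');
    if ($in === false) {
        return;
    }
    try {
        $lineNr = 0;
        while (($line = fgets($in)) !== false) {
            $lineNr++;
            if ($to !== null && $lineNr > $to) {
                break; // past the requested range, stop reading
            }
            if ($lineNr >= $from) {
                yield $lineNr => rtrim($line, "\n");
            }
        }
    } finally {
        fclose($in); // also runs if the caller stops iterating early
    }
}
```

Usage would look like foreach (linesFrom('file.txt', 450) as $nr => $line) { ... }. Only one line is held in memory at a time, which matters for a log with 500,000+ lines.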
I am reading from log files which can be anything from a small file up to 8-10 MB of logs. The typical size would probably be 1 MB. The key thing is that the keyword I'm looking for is normally near the end of the document, in probably 95% of cases. I then extract 1,000 characters after the keyword.
If I use this approach:
$lines = explode("\n",$body);
$reversed = array_reverse($lines);
foreach($reversed AS $line) {
// Search for my keyword
}
Would it be more efficient than using:
$pos = stripos($body,$keyword);
$snippet_pre = substr($body, $pos, 1000);
What I'm not sure about is whether stripos just searches through the document one character at a time from the start. In theory, if there are 10,000 characters after the keyword, I wouldn't have to read those into memory, whereas the first option has to read everything into memory even though it probably only needs the last 100 lines. Could I alter it to read 100 lines into memory, then search lines 101-200 if the first 100 were not successful? Or is the operation so light that it doesn't really matter?
I have a second question, which assumes the array_reverse approach is the best one: how would I extract the next 1,000 characters after I have found the keyword? Here is my woeful attempt:
$body = $this_is_the_log_content;
$lines = explode("\n",$body);
$reversed = array_reverse($lines);
foreach($reversed AS $line) {
$pos = stripos($line,$keyword);
$snippet_pre = substr($line, $pos, 1000);
}
The reason I don't think that will work is that each $line might only be a few hundred characters. Would the better solution be to split the body every, say, 2,000 characters and also keep the previous $line as a backup variable? Something like this:
$body = $this_is_the_log_content;
$lines = str_split($body, 2000);
$reversed = array_reverse($lines);
$previous_line = $line;
foreach($reversed AS $line) {
$pos = stripos($line,$keyword);
if ($pos) {
$line = $previous_line . ' ' . $line;
$pos1 = stripos($line,$keyword);
$snippet_pre = substr($line, $pos, 1000);
}
}
I'm probably massively over-complicating this?
I would strongly consider using a tool like grep for this. You can call this command line tool from PHP and use it to search the file for the word you are looking for and do things like give you the byte offset of the matching line, give you a matching line plus trailing context lines, etc.
Here is a link to grep manual. http://unixhelp.ed.ac.uk/CGI/man-cgi?grep
Play with the command a bit on the command line to get it the way you want it, then call it from PHP using exec(), passthru(), or similar depending on how you need to capture/display the content.
Alternatively, you can simply fopen() the file, put the pointer at the end with fseek(), and move the pointer backward through the file, searching for the string as you go. Once you find your needle, you can then read the file from that offset until you reach the end of the file or the number of log entries you want.
Either of these might be preferable to reading the entire log file into memory and then trying to work with it.
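The fseek() approach might look something like the sketch below, which walks backward through the file in chunks and stops at the first chunk that contains the keyword. The function names, the 8192-byte chunk size, and the choice to return the last occurrence are assumptions made for illustration. Each read overlaps the following chunk by strlen($needle) - 1 bytes so that a match straddling a chunk boundary is not missed.

```php
<?php
// Hypothetical helper: byte offset of the LAST case-insensitive
// occurrence of $needle in the file, or null if it never appears.
function findLastOffset(string $path, string $needle, int $chunkSize = 8192): ?int
{
    $len = strlen($needle);
    $fp = fopen($path, 'rb');
    if ($len === 0 || $fp === false) {
        return null;
    }
    $end = fstat($fp)['size']; // end (exclusive) of the part not yet searched
    $found = null;
    while ($end > 0 && $found === null) {
        $start = max(0, $end - $chunkSize);
        fseek($fp, $start);
        // Read slightly past $end to catch matches that cross the boundary.
        $data = fread($fp, $end - $start + $len - 1);
        $pos = strripos($data, $needle);
        if ($pos !== false) {
            $found = $start + $pos;
        }
        $end = $start;
    }
    fclose($fp);
    return $found;
}

// Hypothetical helper: up to $length bytes starting at the last match.
function snippetFromLast(string $path, string $needle, int $length = 1000): ?string
{
    $offset = findLastOffset($path, $needle);
    if ($offset === null) {
        return null;
    }
    $fp = fopen($path, 'rb');
    fseek($fp, $offset);
    $snippet = fread($fp, $length);
    fclose($fp);
    return $snippet;
}
```

When the keyword sits near the end of the log, as in the 95% case described in the question, only the last chunk or two are ever read.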
The other thing to consider is whether 1000 characters is meaningful. Typically log files would have lines that vary in length. To me it would seem that you should be more concerned about getting the next X lines from the log file, not the next Y characters. What if a line has 2000 characters, are you saying you only want to get half of it? That may not be meaningful at all.
How can I get a particular line in a 3 GB text file? The lines are delimited by \n, and I need to be able to get any line on demand.
How can this be done? Only one line needs to be returned, and I would rather not use any system calls.
Note: There is the same question elsewhere regarding how to do this in bash. I would like to compare it with the PHP equiv.
Update: Each line is the same length the whole way through.
Without keeping some sort of index to the file, you would need to read all of it until you've encountered x number of \n characters. I see that nickf has just posted some way of doing that, so I won't repeat it.
To do this repeatedly in an efficient manner, you will need to build an index. Store some known file positions for certain (or all) line numbers once, which you can then use to seek to the right location using fseek.
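One possible shape for such an index, sketched under a few assumptions: the offset of every line is stored as an 8-byte big-endian integer in a separate index file, line numbers are zero-based, and PHP is a 64-bit build (the 'J' pack format needs one). The function names and the idea of a side file are illustrative, not something the answer prescribes.

```php
<?php
// Hypothetical helper: write the starting byte offset of every line of
// $dataPath into $indexPath, 8 bytes per line. Returns the line count.
function buildLineIndex(string $dataPath, string $indexPath): int
{
    $in = fopen($dataPath, 'rb');
    $out = fopen($indexPath, 'wb');
    $count = 0;
    $offset = 0;
    while (($line = fgets($in)) !== false) {
        fwrite($out, pack('J', $offset)); // 'J' = unsigned 64-bit, big endian
        $offset += strlen($line);
        $count++;
    }
    fclose($in);
    fclose($out);
    return $count;
}

// Hypothetical helper: fetch line $lineNumber (zero-based) using two seeks.
function readIndexedLine(string $dataPath, string $indexPath, int $lineNumber): ?string
{
    if ($lineNumber < 0) {
        return null;
    }
    $idx = fopen($indexPath, 'rb');
    fseek($idx, $lineNumber * 8);
    $raw = fread($idx, 8);
    fclose($idx);
    if ($raw === false || strlen($raw) !== 8) {
        return null; // no index entry: the line does not exist
    }
    $offset = unpack('J', $raw)[1];

    $in = fopen($dataPath, 'rb');
    fseek($in, $offset);
    $line = fgets($in);
    fclose($in);
    return $line === false ? null : rtrim($line, "\r\n");
}
```

Building the index still reads the whole file once, but afterwards each lookup costs two seeks regardless of where the line sits. Storing only every Nth offset would shrink the index at the price of a short scan after the seek.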
Edit: if each line is the same length, you do not need the index.
$myfile = fopen($fileName, "r");
fseek($myfile, $lineLength * $lineNumber);
$line = fgets($myfile);
fclose($myfile);
Line number is 0 based in this example, so you may need to subtract one first. The line length includes the \n character.
There is little discussion of the problem and no mention is made of how the 'one line' should be referenced (by number, some value within it, etc.) so below is just a guess as to what you're wanting.
If you're not averse to using an object (it might be 'too high level', perhaps) and wish to reference the line by offset, then SplFileObject (available as of PHP 5.1.0) could be used. See the following basic example:
$file = new SplFileObject('myreallyhugefile.dat');
$file->seek(12345689); // seek() is zero-based, so this is line 12345690
echo $file->current(); // or simply, echo $file
That particular method (seek) requires scanning through the file line-by-line. However, if as you say all the lines are the same length then you can instead use fseek to get where you want to go much, much faster.
$line_length = 1024; // each line is 1 KB line
$file->fseek($line_length * 1234567); // seek lots of bytes
echo $file->current(); // echo line 1234568
You said each line has the same length, so you can use fopen() in combination with fseek() to get a line quickly.
http://ch2.php.net/manual/en/function.fseek.php
The only way I can think to do it would be like this:
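function getLine($fileName, $num) {
    $fh = fopen($fileName, 'r');
    $line = false;
    // read and discard lines until line $num has been read
    for ($i = 0; $i < $num && ($line = fgets($fh)) !== false; ++$i);
    fclose($fh);
    return $line; // line $num (1-based), or false if the file is shorter
}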
While this is not exactly a solution, how come you need to pull one line out of a 3 GB text file? Is performance an issue, or can this run at a leisurely pace?
If you need to pull lots of lines out of this file at different points in time, I would definitely suggest putting this data into a DB of some kind. SQLite may be your friend here, as it's very simple, though it's not great with lots of scripts or people accessing it at one time.
Using PHP, it's possible to read off the contents of a file using fopen and fgets. Each time fgets is called, it returns the next line in the file.
How does fgets know what line to read? In other words, how does it know that it last read line 5, so it should return the contents of line 6 this time? Is there a way for me to access that line-number data?
(I know it's possible to do something similar by reading the entire contents of the file into an array with file, but I'd like to accomplish this with fopen.)
There is a "position" kept in memory for each file that is opened; it is automatically updated each time you read a line, a character, or anything else from the file.
You can get this position with ftell, and modify it with fseek:
ftell: Returns the current position of the file read/write pointer
fseek: Seeks on a file pointer
You can also use rewind to... rewind... the position of that pointer.
This does not give you a position as a line number, but something closer to a character number (actually, you get the position as a number of bytes from the beginning of the file). Once you have that, reading a line is just a matter of reading characters until you hit an end-of-line character.
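A small sketch of how the position moves may help here. The function name positionDemo and the keys of the returned array are made up; the calls themselves (ftell, fseek, rewind, fgets) are the ones discussed above.

```php
<?php
// Hypothetical demo: watch the file position change as lines are read.
function positionDemo(string $path): array
{
    $fp = fopen($path, 'rb');
    $start = ftell($fp);          // position right after opening
    fgets($fp);                   // read line 1
    $afterFirst = ftell($fp);     // now at the first byte of line 2
    $second = fgets($fp);         // read line 2
    fseek($fp, $afterFirst);      // jump back to where line 2 starts
    $secondAgain = fgets($fp);    // the same line is read again
    rewind($fp);                  // back to the beginning of the file
    $afterRewind = ftell($fp);
    fclose($fp);
    return [
        'start'       => $start,
        'afterFirst'  => $afterFirst,
        'second'      => $second,
        'secondAgain' => $secondAgain,
        'afterRewind' => $afterRewind,
    ];
}
```

For a file containing "abc\nde\n", afterFirst is 4: three characters plus the newline. The position counts bytes, so fgets never needs to know which line number it is on.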
BTW : as far as I remember, these functions are coming from the C language -- PHP itself being written in C ;-)
Files are just a stream of data, read from the beginning to the end. The OS will remember the position you've read so far in that file. If needed, doing so in the application as well is fairly simple. The OS only cares about byte positions though, not lines.
Just imagine dealing out a deck of 52 cards sequentially. You hand out the first card, and the next time the second card. When you want to give out the third card, you don't need to start counting from the start again, or even remember where you were: you just hand out the next available card, and that will be the third.
Reading lines might take a bit more work, since you'd want to buffer data read from the actual file for performance's sake, but there's not much more to it than recording the offset of the last piece of data you handed out, finding the next newline character, and handing off all the data between those two points.
Neither PHP nor the OS has any real need to keep the line number around, since all the system cares about is the "next line". If you want to know the line number, you keep a counter and increment it every time your app reads a line.
$lineno = 0;
// looping on fgets() itself avoids counting an extra line at end of file
while (($buffer = fgets($handle, 4096)) !== false) {
    $lineno++; // keep track of the line number
    ...
}
I have this old sample; I hope it can help you :)
$File = file('path');
$array = array();
$linenr = 5;
foreach( $File AS $line_num => $line )
{
    $array[] = $line; // array_push() returns the new element count, so its result must not be assigned back to $array
}
echo $array[($linenr-1)];
You could just call fgets and increment a var $line_number each time you call it. That would tell you the line it is on.