How to save memory when reading a file in PHP? - php

I have a 200 KB file that I use on multiple pages, but on each page I need only 1-2 lines of it. How can I read just the lines I need, if I know the line number?
For example, if I need only the 10th line, I don't want to load all the lines into memory, just the 10th line.
Sorry for my bad English!

Try SplFileObject
echo memory_get_usage(), PHP_EOL; // 333200
$file = new SplFileObject('bible.txt'); // 996kb
$file->seek(5000); // jump to line 5000 (zero-based)
echo $file->current(), PHP_EOL; // output current line
echo memory_get_usage(), PHP_EOL; // 342984 vs 3319864 when using file()
For outputting the current line, you can either use current() or just echo $file; I find it clearer to use the method, though. You could also use fgets(), but that would read the next line.
Of course, you only need the middle three lines. I've added the memory_get_usage() calls just to show that this approach uses almost no memory.

Unless you know the offset of the line, you will need to read every line up to that point. You can just throw away the lines you don't want by looping through the file with something like fgets(). (EDIT: rather than fgets(), I would suggest @Gordon's solution.)
Possibly a better solution would be to use a database, as the database engine will do the grunt work of storing the strings and allow you to (very efficiently) fetch a certain "line" (it wouldn't be a line but a record with a numeric ID; however, it amounts to the same thing) without having to read the records before it.
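For illustration, here is a minimal sketch of that idea using SQLite through PDO; the file, database, and table names are all made up:
<?php
// One-time import: store each line as a numbered record.
$db = new PDO('sqlite:lines.db');
$db->exec('CREATE TABLE IF NOT EXISTS lines (id INTEGER PRIMARY KEY, content TEXT)');
$insert = $db->prepare('INSERT INTO lines (id, content) VALUES (?, ?)');
$i = 1;
foreach (new SplFileObject('file.txt') as $line) {
    $insert->execute(array($i++, rtrim($line, "\r\n")));
}

// On each page: fetch just the line you need, by number.
$select = $db->prepare('SELECT content FROM lines WHERE id = ?');
$select->execute(array(10));
echo $select->fetchColumn(); // the 10th line
?>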

Do the contents of the file change? If it's static, or relatively static, you can build a list of offsets where you want to read your data. For instance, if the file changes once a year, but you read it hundreds of times a day, then you can pre-compute the offsets of the lines you want and jump to them directly like this:
$offsets = array();
$fh = fopen('file.txt', 'rb');
for ($lineNo = 1; ($pos = ftell($fh)) !== false && fgets($fh) !== false; $lineNo++) {
    if ($lineNo == 10 || $lineNo == 20) {
        $offsets[$lineNo] = $pos; // store the byte offset where this line starts
    }
}
fclose($fh);
and so on. Afterwards, you can trivially jump to that line's location like this:
$fh = fopen('file.txt', 'rb');
fseek($fh, $offsets[20]); // jump straight to line 20
echo fgets($fh); // read it
But this could be entirely overkill. Try benchmarking the operations: compare how long an old-fashioned "read 20 lines" takes versus the precompute/jump approach.
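For example, a rough timing harness with microtime(); it assumes the $offsets array built above and a placeholder file name:
<?php
// Time the plain "read 20 lines" approach.
$start = microtime(true);
$fh = fopen('file.txt', 'rb');
for ($i = 0; $i < 20; $i++) {
    $line = fgets($fh);
}
fclose($fh);
printf("sequential: %.6fs\n", microtime(true) - $start);

// Time the precomputed-offset approach.
$start = microtime(true);
$fh = fopen('file.txt', 'rb');
fseek($fh, $offsets[20]);
$line = fgets($fh);
fclose($fh);
printf("fseek: %.6fs\n", microtime(true) - $start);
?>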

<?php
$lines = array(1, 2, 10);
$handle = @fopen("/tmp/inputfile.txt", "r");
if ($handle) {
    $i = 0;
    while (!feof($handle)) {
        $line = stream_get_line($handle, 1000000, "\n");
        if (in_array($i, $lines)) {
            echo $line;
            $line = ''; // don't forget to clean the buffer!
        }
        if ($i > end($lines)) {
            break;
        }
        $i++;
    }
    fclose($handle);
}
?>

Just loop through the lines without storing them, e.g.
$i = 1;
$file = fopen('file.txt', 'r');
while (!feof($file)) {
    $line = fgets($file); // read one whole line from the file
    if ($i == 10) {
        break; // stop at the tenth line
    }
    $i++;
}
fclose($file);
The above example keeps only the most recently read line in memory, so it is a very memory-efficient way to do it.

Use fgets() 10 times :-) That way you will not keep all 10 lines in memory.

Why are you trying to load only the first ten lines? Do you know that loading all of the lines is in fact a problem?
If you haven't measured, then you don't know that it's a problem. Don't waste your time optimizing non-problems. Chances are that any performance change from not loading the entire 200 KB file will be imperceptible, unless you know for a fact that loading that file is indeed a bottleneck.

Related

Reading a log file into a reversed array: is it the best method when looking for a keyword near the bottom?

I am reading from log files which can be anything from a small file up to 8-10 MB of logs; the typical size is probably 1 MB. The key thing is that the keyword I'm looking for is normally near the end of the document, in probably 95% of cases. I then extract 1000 characters after the keyword.
If I use this approach:
$lines = explode("\n", $body);
$reversed = array_reverse($lines);
foreach ($reversed as $line) {
    // search for my keyword
}
Would it be more efficient than using:
$pos = stripos($body, $keyword);
$snippet_pre = substr($body, $pos, 1000);
What I am not sure about is whether stripos just searches through the document one character at a time. In theory, if there are 10,000 characters after the keyword, I won't have to read those into memory, whereas the first option has to read everything into memory even though it probably only needs the last 100 lines. Could I alter it to read 100 lines into memory, then search lines 101-200 if the first 100 were not successful, or is the operation so light that it doesn't really matter?
I have a second question, and this assumes the array_reverse approach is the best one: how would I extract the next 1000 characters after I have found the keyword? Here is my woeful attempt:
$body = $this_is_the_log_content;
$lines = explode("\n", $body);
$reversed = array_reverse($lines);
foreach ($reversed as $line) {
    $pos = stripos($line, $keyword);
    $snippet_pre = substr($line, $pos, 1000);
}
The reason I don't think that will work is that each $line might only be a few hundred characters long, so would the better solution be to split the body into chunks of, say, 2,000 characters and also keep the previous chunk as a backup variable? Something like this:
$body = $this_is_the_log_content;
$lines = str_split($body, 2000);
$reversed = array_reverse($lines);
$previous_line = '';
foreach ($reversed as $line) {
    $pos = stripos($line, $keyword);
    if ($pos !== false) {
        $line = $previous_line . ' ' . $line;
        $pos1 = stripos($line, $keyword);
        $snippet_pre = substr($line, $pos1, 1000);
    }
    $previous_line = $line;
}
I'm probably massively over-complicating this?
I would strongly consider using a tool like grep for this. You can call this command-line tool from PHP and use it to search the file for the word you are looking for, and have it do things like report the byte offset of the matching line, print the matching line plus trailing context lines, and so on.
Here is a link to the grep manual: http://unixhelp.ed.ac.uk/CGI/man-cgi?grep
Play with the command a bit on the command line to get it the way you want it, then call it from PHP using exec(), passthru(), or similar, depending on how you need to capture or display the output.
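As an illustrative sketch (the keyword and path are placeholders, and the flags are just the features mentioned above):
<?php
$keyword = 'ERROR';            // placeholder keyword
$logfile = '/var/log/app.log'; // placeholder path
// -i: ignore case, -b: print the byte offset, -A 5: include 5 lines of trailing context
$cmd = sprintf('grep -i -b -A 5 %s %s', escapeshellarg($keyword), escapeshellarg($logfile));
exec($cmd, $output, $status);
if ($status === 0) {
    echo implode("\n", $output); // matching line(s) plus context
}
?>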
Alternatively, you can simply fopen() the file, place the pointer at the end, and move it backward through the file using fseek(), searching for the string as you go. Once you find your needle, you can read the file from that offset until you reach the end of the file or the number of log entries you need.
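A minimal sketch of that second approach, reading backward from the end in chunks until the keyword appears (keyword and path are placeholders; this finds the earliest match within the tail scanned so far):
<?php
$keyword = 'ERROR';           // placeholder keyword
$fh = fopen('app.log', 'rb'); // placeholder path
fseek($fh, 0, SEEK_END);
$pos = ftell($fh);
$buf = '';
$found = false;
while ($pos > 0 && $found === false) {
    $read = min(8192, $pos); // step backward one chunk at a time
    $pos -= $read;
    fseek($fh, $pos);
    $buf = fread($fh, $read) . $buf; // prepend, so text after the keyword is already in $buf
    $found = stripos($buf, $keyword);
}
fclose($fh);
if ($found !== false) {
    echo substr($buf, $found, 1000); // the keyword and up to 1000 following characters
}
?>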
Either of these might be preferable to reading the entire log file into memory and then trying to work with it.
The other thing to consider is whether 1000 characters is meaningful. Log files typically have lines that vary in length, so it seems you should be more concerned with getting the next X lines from the log file, not the next Y characters. What if a line has 2000 characters: are you saying you only want half of it? That may not be meaningful at all.

Which method is better? Hashing each line in a file with PHP

This question was asked on a message board, and I want to get a definitive answer and intelligent debate about which method is more semantically correct and less resource-intensive.
Say I have a file where each line contains a string. I want to generate an MD5 hash for each line and write it back to the same file, overwriting the previous data. My first thought was to do this:
$file = 'strings.txt';
$lines = file($file);
$handle = fopen($file, 'w+');
foreach ($lines as $line) {
    fwrite($handle, md5(trim($line)) . "\n");
}
fclose($handle);
Another user pointed out that file_get_contents() and file_put_contents() were better than using fwrite() in a loop. Their solution:
$thefile = 'strings.txt';
$newfile = 'newstrings.txt';
$current = file_get_contents($thefile);
$explodedcurrent = explode("\n", $current);
$temp = '';
foreach ($explodedcurrent as $string) {
    $temp .= md5(trim($string)) . "\n";
}
file_put_contents($newfile, $temp);
My argument is that since the main goal here is to get the file into an array, and file_get_contents() is only the preferred way to read the contents of a file into a string, file() is more appropriate and allows us to cut out another unnecessary function, explode().
Furthermore, by directly manipulating the file using fopen(), fwrite(), and fclose() (which amounts to the same thing as one call to file_put_contents()), there is no need for extraneous variables in which to store the converted strings; you write them directly to the file.
My method does the exact same work as the alternative, with the same number of opens and closes on the file, except mine is shorter and more semantically correct.
What do you have to say, and which one would you choose?
This should be more efficient and less resource-intensive than the previous two methods:
$file = 'passwords.txt';
$passwords = file($file);
$converted = fopen($file, 'w+');
$i = 0;
while (count($passwords) > 0) {
    fwrite($converted, md5(trim($passwords[$i])) . "\n");
    unset($passwords[$i]);
    $i++;
}
fclose($converted);
echo 'Done.';
As one of the comments suggests, do what makes more sense to you, since you might come back to this code in a few months and will want to spend the least amount of time understanding it.
However, if speed is your concern, I would create two test cases (you pretty much already have them) and time them: store a timestamp at the beginning of the script, then subtract it from a timestamp taken at the end to work out how long the script took to run. Prepare a few files, say three: two extremes and one normal file, to see which version runs faster.
http://php.net/manual/en/function.time.php
I would expect the differences to be marginal, but it also depends on your file sizes.
I'd propose writing to a new temporary file while you process the input one. Once done, overwrite the input file with the temporary one.
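A sketch of what that could look like; it streams line by line, so the whole file never needs to sit in memory (file names are placeholders):
<?php
$in = fopen('strings.txt', 'r');
$out = fopen('strings.txt.tmp', 'w');
while (($line = fgets($in)) !== false) {
    fwrite($out, md5(trim($line)) . "\n"); // hash each line as it streams through
}
fclose($in);
fclose($out);
rename('strings.txt.tmp', 'strings.txt'); // replace the input with the finished output
?>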

fseek() by line, not bytes?

I have a script that parses large files line by line. When it encounters an error that it can't handle, it stops, notifying us of the last line parsed.
Is this really the best / only way to seek to a specific line in a file? (fseek() is not usable in my case.)
<?php
for ($i = 0; $i < 100000; $i++) {
    fgets($fp); // just discard this line
}
I don't have a problem using this, and it is fast enough; it just feels a bit dirty. From what I know about the underlying code, I don't imagine there is a better way to do this.
An easy way to seek to a specific line in a file is to use the SplFileObject class, which supports seeking to a line number (seek()) or byte offset (fseek()).
$file = new SplFileObject('myfile.txt');
$file->seek(9999); // Seek to line no. 10,000
echo $file->current(); // Print contents of that line
In the background, seek() just does what your PHP code did (only in C code).
If you only have the line number to go on, there is no other method of finding the line. Files are not line-based (or even character-based), so there is no way to simply jump to a specific line in a file.
There might be ways of reading the lines that are slightly faster, such as reading larger chunks of the file into a buffer and reading lines from that, but you could only hope for it to be a few percent faster. Any method of finding a specific line in a file still has to read all the data up to that line.
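As a sketch of that buffered idea: read the file in large chunks, split out complete lines, and carry any partial line over to the next chunk (the function name is made up, and lines are zero-indexed here):
<?php
function lineByChunks($path, $target)
{
    $fh = fopen($path, 'rb');
    $carry = ''; // partial line carried over from the previous chunk
    $seen = 0;   // number of complete lines consumed so far
    while (($chunk = fread($fh, 1048576)) !== false && $chunk !== '') {
        $parts = explode("\n", $carry . $chunk);
        $carry = array_pop($parts); // the last piece may be an incomplete line
        if ($seen + count($parts) > $target) {
            fclose($fh);
            return $parts[$target - $seen];
        }
        $seen += count($parts);
    }
    fclose($fh);
    return $target === $seen ? $carry : false; // the final line may lack a trailing \n
}
?>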
I know it is late for posting, but it may help some people.
I once wrote a function like fseekbyline():
function GoToLine($handle, $line)
{
    fseek($handle, 0); // rewind to the start
    $bufcarac = 0;
    for ($i = 1; $i < $line; $i++) {
        $ligne = fgets($handle);
        $bufcarac += strlen($ligne); // in the end, $bufcarac holds the length of everything before the target line
    }
    fseek($handle, $bufcarac);
}
There is no error handling: if you ask for a line < 1, or for line 203 when the file is empty, you will get nothing useful. The same applies if you seek past the end of the file.
rewind($handle);
for ($i = 0; $i < $desired_line; $i++) {
    fgetcsv($handle, 1000, ",");
}
This works for me when I need to rewind to a specific line multiple times in my script.
I am not sure whether it costs much memory or speed, but it does the trick.
If I understand correctly, you want to seek to the specific line at some point after you have found an error. If that is the case, you probably store or print the line number of the bad line somewhere, depending on what you mean by "notify".
Unless you really mean that you cannot use fseek()*, what you can do is also store/print the position in the file where the bad line starts. Then you can fseek() straight to it.
* How, in that case, would fseekbyline() be usable if it existed?
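A sketch of that suggestion follows; parseLine() is a hypothetical stand-in for whatever the parser does per line:
<?php
$fh = fopen('data.txt', 'rb'); // placeholder path
$lineNo = 0;
while (($offset = ftell($fh)) !== false && ($line = fgets($fh)) !== false) {
    $lineNo++;
    if (!parseLine($line)) { // parseLine() is a placeholder for your real parser
        // report both: the line number for humans, the byte offset for fseek()
        printf("parse error on line %d (byte offset %d)\n", $lineNo, $offset);
        break;
    }
}
fclose($fh);
// a later run can then fseek($fh, $offset) instead of re-reading every line
?>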

Getting one line in a huge file with PHP

How can I get a particular line from a 3 GB text file? The lines are delimited by \n, and I need to be able to get any line on demand.
How can this be done? Only one line needs to be returned, and I would not like to use any system calls.
Note: there is the same question elsewhere regarding how to do this in bash; I would like to compare it with the PHP equivalent.
Update: each line is the same length all the way through.
Without keeping some sort of index into the file, you would need to read all of it until you've encountered x \n characters. I see that nickf has just posted a way of doing that, so I won't repeat it.
To do this repeatedly in an efficient manner, you will need to build an index: store some known file positions for certain (or all) line numbers once, which you can then use to seek to the right location with fseek().
Edit: if each line is the same length, you do not need the index:
$myfile = fopen($fileName, "r");
fseek($myfile, $lineLength * $lineNumber);
$line = fgets($myfile);
fclose($myfile);
The line number is 0-based in this example, so you may need to subtract one first. The line length includes the \n character.
There is little discussion of the problem, and no mention is made of how the 'one line' should be referenced (by number, by some value within it, etc.), so below is just a guess as to what you're wanting.
If you're not averse to using an object (it might be 'too high level', perhaps) and wish to reference the line by offset, then SplFileObject (available as of PHP 5.1.0) could be used. See the following basic example:
$file = new SplFileObject('myreallyhugefile.dat');
$file->seek(123456789); // seek to line 123,456,790 (seek() is zero-based)
echo $file->current(); // or simply, echo $file
That particular method (seek) requires scanning through the file line-by-line. However, if as you say all the lines are the same length then you can instead use fseek to get where you want to go much, much faster.
$line_length = 1024; // each line is 1 KB
$file->fseek($line_length * 1234567); // seek lots of bytes
echo $file->current(); // echo line 1234568
You said each line has the same length, so you can use fopen() in combination with fseek() to get a line quickly.
http://ch2.php.net/manual/en/function.fseek.php
The only way I can think to do it would be like this:
function getLine($fileName, $num) {
    $fh = fopen($fileName, 'r');
    $line = false;
    for ($i = 0; $i < $num && ($line = fgets($fh)); ++$i);
    fclose($fh);
    return $line;
}
While this is not exactly a solution, why do you need to pull a single line out of a 3 GB text file? Is performance an issue, or can this run at a leisurely pace?
If you need to pull lots of lines out of this file at different points in time, I would definitely suggest putting this data into a database of some kind. SQLite may be your friend here, as it is very simple, though not great with lots of scripts/people accessing it at the same time.

Efficient flat file searching in PHP

I'd like to store 0 to ~5000 IP addresses in a plain text file, with an unrelated header at the top. Something like this:
Unrelated data
Unrelated data
----SEPARATOR----
1.2.3.4
5.6.7.8
9.1.2.3
Now I'd like to find out whether '5.6.7.8' is in that text file using PHP. I've only ever loaded an entire file and processed it in memory, but I wondered if there is a more efficient way of searching a text file in PHP. I only need a true/false answer as to whether it's there.
Could anyone shed any light? Or would I be stuck with loading in the whole file first?
Thanks in advance!
5000 isn't a lot of records. You could easily do this:
$addresses = explode("\n", file_get_contents('filename.txt'));
and search it manually and it'll be quick.
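For instance, the manual search is then a one-liner; trim() guards against stray \r from Windows line endings, and the header lines simply never match an IP:
$addresses = array_map('trim', explode("\n", file_get_contents('filename.txt')));
$found = in_array('5.6.7.8', $addresses); // true/false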
If you were storing a lot more I would suggest storing them in a database, which is designed for that kind of thing. But for 5000 I think the full load plus brute force search is fine.
Don't optimize a problem until you have a problem. There's no point needlessly overcomplicating your solution.
I'm not sure whether Perl's command-line interpreter needs to load the whole file to handle it, but you could do something similar to this:
<?php
// ...
// -n loops over the input lines and prints each line matching the IP
$result = system("perl -ne 'print if /^5\\.6\\.7\\.8\$/' yourfile.txt");
if ($result) {
    // ...
} else {
    // ...
}
?>
Another option would be to store the IPs in separate files based on the first or second group:
# 1.2.txt
1.2.3.4
1.2.3.5
1.2.3.6
...
# 5.6.txt
5.6.7.8
5.6.7.9
5.6.7.10
...
... etc.
That way you wouldn't necessarily have to worry about the files becoming so large that you incur a performance penalty by loading a whole file into memory.
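A sketch of the lookup for that layout, deriving the shard file name from the first two octets (file names are hypothetical):
<?php
function ipInList($ip)
{
    list($a, $b) = explode('.', $ip);
    $shard = "$a.$b.txt"; // e.g. "5.6.txt"
    if (!is_readable($shard)) {
        return false; // no shard file means no match
    }
    return in_array($ip, array_map('trim', file($shard)));
}

var_dump(ipInList('5.6.7.8')); // bool(true) if listed
?>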
You could shell out and grep for it.
You might try fgets().
It reads a file line by line. I'm not sure how much more efficient this is, though. I'd guess that if the IP is towards the top of the file it would be more efficient, and if the IP is towards the bottom it would be less efficient than just reading in the whole file.
You could use the grep command with backticks in your PHP script on a Linux server. Something like:
$searchFor = '5.6.7.8';
$file = '/path/to/file.txt';
$grepCmd = `grep $searchFor $file`;
echo $grepCmd;
I haven't tested this personally, but there is a snippet of code in the PHP manual that is written for large file parsing:
http://www.php.net/manual/en/function.fgets.php#59393
// File to be opened
$file = "huge.file";
// Open file (DON'T USE a+, or the pointer will be wrong!)
$fp = fopen($file, 'r');
// Read 16 MB chunks
$read = 16777216;
// \n marker
$part = 0;
while (!feof($fp)) {
    $rbuf = fread($fp, $read);
    // scan backwards through the buffer for the last \n
    for ($i = $read - 1; $i > 0; $i--) {
        $n = substr($rbuf, $i, 1);
        if ($n == chr(10)) {
            break;
        }
        // If we are at the end of the file, just grab the rest and stop the loop
        if (feof($fp)) {
            $i = $read;
            break;
        }
    }
    // This is the buffer we want to do stuff with; maybe throw it to a function?
    $buf = substr($rbuf, 0, $i + 1);
    // Point the marker back to the last \n and continue from there
    $part = ftell($fp) - ($read - ($i + 1));
    fseek($fp, $part);
}
fclose($fp);
The snippet was written by the original author: hackajar yahoo com
Are you trying to compare the current visitor's IP with the IPs listed in the text file? The unrelated data wouldn't match anyway,
so just use strpos() on the full file contents (from file_get_contents()):
<?php
$file = file_get_contents('data.txt');
$pos = strpos($file, $_SERVER['REMOTE_ADDR']);
if ($pos === false) {
    echo "no match for $_SERVER[REMOTE_ADDR]";
} else {
    echo "match for $_SERVER[REMOTE_ADDR]!";
}
?>
