I want to read everything from a textfile and echo it. But there might be more lines written to the text-file while I'm reading so I don't want the script to exit when it has reached the end of the file, instead I wan't it to wait forever for more lines. Is this possible in php?
this is just a guess, but try to pass through (passthru) a "tail -f" output.
but you will need to find a way to flush() your buffer.
IMHO a much nicer solution would be to build a ajax site.
read the contents of the file in to an array. store the number of lines in the session. print the content of the file.
start an ajax request every x seconds to a script which checks the file, if the line count is greater then the session count append the result to the page.
you could use popen() inststed:
$f = popen("tail -f /where/ever/your/file/is 2>&1", 'r');
while(!feof($f)) {
$buffer = fgets($f);
echo "$buffer\n";
flush();
sleep(1);
}
pclose($f)
the sleep is important, without it you will have 100% CPU time.
In fact, when you "echo" it, it goes to the buffer. So what you want is "appending" the new content if it's added while the browser is still receiving output. And this is not possible (but there are some approaches to this).
I solved it.
The trick was to use fopen and when eof is reached move the cursor to the previous position and continue reading from there.
<?php
$handle = fopen('text.txt', 'r');
$lastpos = 0;
while(true){
if (!feof($handle)){
echo fread($handle,8192);
flush();
$lastpos = ftell($handle);
}else{
fseek($handle,$lastpos);
}
}
?>
Still consumes pretty much cpu though, don't know how to solve that.
You may also use filemtime: you get latest modification timestamp, send the output and at the end compare again the stored filemtime with the current one.
Anyway, if you want the script go at the same time that the browser (or client), you should send the output using chunks (fread, flush), then check any changes at the end. If there are any changes, re-open the file and read from the latest position (you can get the position outside of the loop of while(!feof())).
Related
I have a file that I'm opening for writing, like this:
if (!$idHandle = fopen($fileName, 'w')) {
echo "Cannot open file ($fileName)";
die();
}
I then enter a loop where I'm incrementing a variable, $beerId, during each iteration. I want to save that variable to the file I have open, so I use the following code:
if (fwrite($idHandle, $beerId) === FALSE) {
echo "Cannot write to file ($fileName)";
die();
}
$beerId++;
However, this ends up creating a massive string of every beerId I encounter. What I want is to have the file populated ONLY with the last id I left off on.
I realize I could put the write outside of the loop, but the script is volatile and likely to terminate prematurely with an error, so I want to have a reference to the last $beerId variable even in the event of an error that terminates the script before the loop is properly terminated.
You must go back to the beginning of the file because fwrite keeps track of where it is in the file. Use fseek. Opening and closing the file several times in a loop is expensive and I don't see a reason to do that in this case. You should of course close the file when you're done with it.
You should add this just before you write to the file:
fseek($idHandle, 0);
That will move you to the beginning of the file, since your incrementing values you won't have to worry about removing the previous value.
EDIT
In my answer above i assume that the id's encountered are incremented values, but you don't say that so, if for example you encounter id=10, and then encounter id=1
the above would still result in 10 in the file, to handle that just add some padding to the string that you're writing using str_pad:
str_pad($value_to_write, 10); //or whatever value is reasonable.
If you can, try memcache->increment().
http://php.net/manual/en/memcache.increment.php
Use $memcache->add('beer_id', 0); to initialize it to zero. Then fetch $beer_id like $memcache->get('beer_id') for an initial sanity check, and then $memcache->increment('beer_id'); for the next $beer_id.
Else, stick to file_get_contents() and file_put_contents():
http://php.net/manual/en/function.file-get-contents.php
http://php.net/manual/en/function.file-put-contents.php
I have a 1.3GB text file that I need to extract some information from in PHP. I have researched it and have come up with a few various ways to do what I need to do, but as always am after a little clarification on which method would be best or if another better exists that I do not know about?
The information I need in the text file is only the first 40 characters of each line, and there are around 17million lines in the file. The 40 characters from each line will be inserted into a database.
The methods I have are below;
// REMOVE TIME LIMIT
set_time_limit(0);
// REMOVE MEMORY LIMIT
ini_set('memory_limit', '-1');
// OPEN FILE
$handle = #fopen('C:\Users\Carl\Downloads\test.txt', 'r');
if($handle) {
while(($buffer = fgets($handle)) !== false) {
$insert[] = substr($buffer, 0, 40);
}
if(!feof($handle)) {
// END OF FILE
}
fclose($handle);
}
Above is read each line at a time and get the data, I have all the database inserts sorted, doing 50 inserts at a time ten times over in a transaction.
The next method is the same as above really but calling file() to store all the lines in an array before doing a foreach to get the data? I am not sure about this method though as the array would essentially have over 17 million values.
Another method would be to extract only part of the file, rewrite the file with the unused data, and after that part has been executed recall the script using a header call?
What would be the best way in terms of getting this done in the most quick and efficient manner? Or is there a better way to approach this that I have thought of?
Also I plan to use this script with wamp, but running it in a browser while testing has caused problems with timeout even with setting script time out to 0. Is there a way I can execute the script to run without accessing the page through a browser?
You have it good so far, don't use "file()" function as it would most probably hit RAM usage limit and terminate your script.
I wouldn't even accumulate stuff into "insert[]" array, as that would waste RAM as well. If you can, insert into the database right away.
BTW, there is a nice tool called "cut" that you could use to process the file.
cut -c1-40 file.txt
You could even redirect cut's stdout to some PHP script that inserts into database.
cut -c1-40 file.txt | php -f inserter.php
inserter.php could then read lines from php://stdin and insert into DB.
"cut" is a standard tool available on all Linuxes, if you use Windows you can get it with MinGW shell, or as part of msystools (if you use git) or install native win32 app using gnuWin32.
Why are you doing this in PHP when your RDBMS almost certainly has bulk import functionality built in? MySQL, for example, has LOAD DATA INFILE:
LOAD DATA INFILE 'data.txt'
INTO TABLE `some_table`
FIELDS TERMINATED BY ''
LINES TERMINATED BY '\n';
( #line )
SET `some_column` = LEFT( #line, 40 );
One query.
MySQL also has the mysqlimport utility that wraps this functionality from the command line.
None of the above. The problem with the using fgets() is it does not work as you expect. When the maximum characters is reached, then the next call to fgets() will continue on the same line. You have correctly identified the problem with using file(). The third method is an interesting idea, and you could pull it off with other solutions as well.
That said, your first idea of using fgets() is pretty close, however we need to slightly modify its behaviour. Here's a customized version that will work as you'd expect:
function fgetl($fp, $len) {
$l = 0;
$buffer = '';
while (false !== ($c = fgetc($fp)) && PHP_EOL !== $c) {
if ($l < $len)
$buffer .= $c;
++$l;
}
if (0 === $l && false === $c) {
return false;
}
return $buffer;
}
Execute the insert operation immediately or you will waste memory. Make sure you are using prepared statements to insert this many rows; this will drastically reduce execution time. You don't want to submit the full query on each insert when you can only submit the data.
I have a DB of sensor data that is being collected every second. The client would like to be able to download 12hour chunks in CSV format - This is all done.
The output is sadly not straight data and needs to be processed before the CSV can be created (parts are stored as JSON in the DB) - so I cant just dump the table.
So, to reduce load, I figured that the first time the file is downloaded, I would cache it to disk, then any more requests just download that file.
If I dont try to write it (using file_put_contents, FILE_APPEND), and just echo every line it is fine, but writing it, even if I give the script 512M it runs out of memory.
so this works
while($stmt->fetch()){
//processing code
$content = //CSV formatting
echo $content;
}
This does not
while($stmt->fetch()){
//processing code
$content = //CSV formatting
file_put_contents($pathToFile, $content, FILE_APPEND);
}
It seems like even thought I am calling file_put_contents at every line, it is storing it all to memory.
Any suggestions?
The problem is that file_put_contents is trying to dump the entire thing at once. Instead you should loop through in your formatting and use fopen, fwrite, fclose.
while($stmt->fetch()){
//processing code
$content[] = //CSV formatting
$file = fopen($pathToFile, a);
foreach($content as $line)
{
fwrite($file, $line);
}
fclose($file);
}
This will limit the amount of data trying to be tossed around in data at any given time.
I agree completely with writing one line at a time, you will never have memory issues this way since there is never more than 1 line loaded in to memory at a time. I have an application that does the same. A problem I have found with this method however, is that the file takes forever to finish writing. So this post is to back up what has already been said, but also to ask all of you for an opinion on how to speed this up? For example, my system cleans a data file against a suppression file, so I read in one line at a time and look for a match in the suppression file, then if no match is found, I write the line in to the new cleaned file. A 50k line file is taking about 4 hours to finish however, so I am hoping to find a better way. I have tried this several ways, and at this point I load the entire suppression file in to memory now to avoid my main reading loop to have to run another loop through each line in the suppression file, but even that is still taking hours.
So, line by line is by far the best way to manage your system's memory, but I'd like to get the processing time for a 50k line file (lines are email addresses and first and last names) to finishing running in less than 30 minutes if possible.
fyi: the suppression file is 16,000 kb in size and total memory used by the script as told by memory_get_usage() is about 35 megs.
Thanks!
i would like to understand how to use the buffer of a read file.
Assuming we have a big file with a list of emails line by line ( delimiter is a classic \n )
now, we want compare each line with each record of a table in our database in a kind of check like line_of_file == table_row.
this is a simple task if you have a normal file, otherwise, if you have a huge file the server usually stop the operation after few minute.
so what's the best way of doing this kind of stuff with the file buffer?
what i have so far is something like this:
$buffer = file_get_contents('file.txt');
while($row = mysql_fetch_array($result)) {
if ( preg_match('/'.$email.'/im',$buffer)) {
echo $row_val;
}
}
$buffer = file_get_contents('file.txt');
$lines = preg_split('/\n/',$buffer);
//or $lines = explode('\n',$buffer);
while($row = mysql_fetch_array($result)) {
if ( in_array($email,$lines)) {
echo $row_val;
}
}
Like already suggested in my closevotes to your question (hence CW):
You can use SplFileObject which implements Iterator to iterate over a file line by line to save memory. See my answers to
Least memory intensive way to read a file in PHP and
How to save memory when reading a file in Php?
for examples.
Don't use file_get_contents for large files. This pulls the entire file into memory all at once. You have to read it in pieces
$fp = fopen('file.txt', 'r');
while(!feof($fp)){
//get onle line
$buffer = fgets($fp);
//do your stuff
}
fclose($fp);
Open the file with fopen() and read it incrementally. Probably one line at a time with fgets().
file_get_contents reads the whole file into memory, which is undesirable if the file is larger than a few megabytes
Depending on how long this takes, you may need to worry about the PHP execution time limit, or the browser timing out if it doesn't receive any output for 2 minutes.
Things you might try:
set_time_limit(0) to avoid running up against the PHP time limit
Make sure to output some data every 30 seconds or so so the browser doesn't time out; make sure to flush(); and possibly ob_flush(); so your output is actually sent over the network (this is a kludge)
start a separate process (e.g. via exec()) to run this in the background. Honestly, anything that takes more than a second or two is best run in the background
What is the best way to write to files in a large php application. Lets say there are lots of writes needed per second. How is the best way to go about this.
Could I just open the file and append the data. Or should i open, lock, write and unlock.
What will happen of the file is worked on and other data needs to be written. Will this activity be lost, or will this be saved. and if this will be saved will is halt the application.
If you have been, thank you for reading!
Here's a simple example that highlights the danger of simultaneous wites:
<?php
for($i = 0; $i < 100; $i++) {
$pid = pcntl_fork();
//only spawn more children if we're not a child ourselves
if(!$pid)
break;
}
$fh = fopen('test.txt', 'a');
//The following is a simple attempt to get multiple threads to start at the same time.
$until = round(ceil(time() / 10.0) * 10);
echo "Sleeping until $until\n";
time_sleep_until($until);
$myPid = posix_getpid();
//create a line starting with pid, followed by 10,000 copies of
//a "random" char based on pid.
$line = $myPid . str_repeat(chr(ord('A')+$myPid%25), 10000) . "\n";
for($i = 0; $i < 1; $i++) {
fwrite($fh, $line);
}
fclose($fh);
echo "done\n";
If appends were safe, you should get a file with 100 lines, all of which roughly 10,000 chars long, and beginning with an integer. And sometimes, when you run this script, that's exactly what you'll get. Sometimes, a few appends will conflict, and it'll get mangled, however.
You can find corrupted lines with grep '^[^0-9]' test.txt
This is because file append is only atomic if:
You make a single fwrite() call
and that fwrite() is smaller than PIPE_BUF (somewhere around 1-4k)
and you write to a fully POSIX-compliant filesystem
If you make more than a single call to fwrite during your log append, or you write more than about 4k, all bets are off.
Now, as to whether or not this matters: are you okay with having a few corrupt lines in your log under heavy load? Honestly, most of the time this is perfectly acceptable, and you can avoid the overhead of file locking.
I do have high-performance, multi-threaded application, where all threads are writing (appending) to single log file. So-far did not notice any problems with that, each thread writes multiple times per second and nothing gets lost. I think just appending to huge file should be no issue. But if you want to modify already existing content, especially with concurrency - I would go with locking, otherwise big mess can happen...
If concurrency is an issue, you should really be using databases.
If you're just writing logs, maybe you have to take a look in syslog function, since syslog provides an api.
You should also delegate writes to a dedicated backend and do the job in an asynchroneous maneer ?
These are my 2p.
Unless a unique file is needed for a specific reason, I would avoid appending everything to a huge file. Instead, I would wrap the file by time and dimension. A couple of configuration parameters (wrap_time and wrap_size) could be defined for this.
Also, I would probably introduce some buffering to avoid waiting the write operation to be completed.
Probably PHP is not the most adapted language for this kind of operations, but it could still be possible.
Use flock()
See this question
If you just need to append data, PHP should be fine with that as filesystem should take care of simultaneous appends.