I have a PHP script that takes a user-supplied string, SSHs out to a remote server, reads a log file into an array, and then parses out the request/response blocks containing that string to return to the user.
This implementation does not work with large log files, because PHP runs out of memory trying to store the whole file in an array.
Example data:
*** REQUEST
request line 1
request line 2
request line 3
[...]
*** RESPONSE
response line 1
response line 2
response line 3
[...]
[blank line]
The lengths of the requests and responses vary, so I can never be sure how many lines there will be.
How can I read a file in chunks without storing the whole file in memory, while still ensuring I'll always be able to process a full request/response block of data from the log without truncating it?
I feel like I'm just being exceptionally dense about this, since my experience is usually working with whole files or arrays.
Here's my current code (with $search representing the user-supplied string we're looking for in the log), which is putting the whole file into an array first:
$stream = ssh2_exec($ssh, $command);
stream_set_blocking($stream, true);

// Read the entire remote output into memory (this is what blows up on large logs)
$data = '';
while ($buffer = fread($stream, 4096)) {
    $data .= $buffer;
}
fclose($stream);

$rawlog  = $data;
$logline = explode("\n", $rawlog);

$results = array();
$block   = '';
foreach ($logline as $k => $v) {
    if (preg_match("/\*\*\* REQUEST/", $v) && $block != '') {
        // A new block starts, so check the block we just finished
        if (preg_match("/$search/i", $block)) {
            $results[] = $block;
        }
        $block = $v . "\n";
    } else {
        $block .= $v . "\n";
    }
}
// Check the final block as well
if (preg_match("/$search/i", $block)) {
    $results[] = $block;
}
Any suggestions?
Hard to say if this would work for you, but if the logs are in files you could use phpseclib's SFTP implementation (latest Git version).
e.g.
If you do $sftp->get('filename.ext', false, 0, 1000) it'll download bytes 0-1000 from filename.ext and return a string with those bytes. If you do $sftp->get('filename.ext', false, 1000, 1000) it'll download bytes 1000-2000.
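Building on that, a rough sketch of scanning the log in fixed-size chunks while keeping whole request/response blocks together (host, login, file name and chunk size are placeholders, and it assumes phpseclib 1.x's Net_SFTP class):

<?php
// Sketch only: read the remote log in 1MB chunks and check each completed
// "*** REQUEST" block for $search. Host, credentials and file name are made up.
include('Net/SFTP.php');

$sftp = new Net_SFTP('remote.example.com');
if (!$sftp->login('username', 'password')) {
    exit('Login failed');
}

$chunkSize = 1048576;   // 1MB per round trip
$offset    = 0;
$carry     = '';        // holds the (possibly incomplete) last block of the previous chunk
$results   = array();

while (($chunk = $sftp->get('logfile.log', false, $offset, $chunkSize)) !== false
        && strlen($chunk) > 0) {
    $offset += strlen($chunk);
    $carry  .= $chunk;

    // Every element except the last is a complete block; keep the last one for later.
    $blocks = explode('*** REQUEST', $carry);
    $carry  = '*** REQUEST' . array_pop($blocks);
    foreach ($blocks as $block) {
        if ($block !== '' && preg_match('/' . preg_quote($search, '/') . '/i', $block)) {
            $results[] = '*** REQUEST' . $block;
        }
    }
}

// The last block is never followed by another "*** REQUEST", so check it here.
if (preg_match('/' . preg_quote($search, '/') . '/i', $carry)) {
    $results[] = $carry;
}

Only one chunk plus one partial block is ever held in memory at a time.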
You can use a command like tail to fetch the log in chunks of lines (lines 1 to 100, then 101 to 200, and so on).
This will require more SSH commands, but it will not require you to store the whole result in memory.
Or, you can first store all the output into a local file and parse it after that.
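If you go that local-file route, here is a minimal sketch of parsing the copy line by line so only one request/response block is held in memory at a time (the local path is a placeholder; $search as in your code):

$fh = fopen('/tmp/logfile.local', 'r');   // local copy fetched via scp/sftp beforehand
if ($fh === false) {
    die('Unable to open the local log copy');
}

$results = array();
$block   = '';
while (($line = fgets($fh)) !== false) {
    if (strpos($line, '*** REQUEST') === 0 && $block !== '') {
        // A new block starts, so the previous one is complete: test it, then drop it.
        if (preg_match('/' . preg_quote($search, '/') . '/i', $block)) {
            $results[] = $block;
        }
        $block = '';
    }
    $block .= $line;
}
// The last block in the file still needs to be checked.
if ($block !== '' && preg_match('/' . preg_quote($search, '/') . '/i', $block)) {
    $results[] = $block;
}
fclose($fh);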
Related
I have a piece of code, and the issue is that the file "data" is over 8GB, which makes this very memory intensive. I want to reduce the RAM usage; I saw that f_load would be ideal, but how could I explode this data?
This is my current code:
$data = file_get_contents("data");
$data = explode("|", $data);
foreach ($data as $d) { // rest of code
Theoretically, I need to open a pipe, stream the data, and close the pipe. How would I go about this?
I've tried using fopen() rather than file_get_contents(), but errors started popping up, so I'm doing something wrong and would really like to learn.
You can use stream_get_line() to read your data block by block (with | as the delimiter character): see PHP stream_get_line().
$fh = fopen('data', 'r'); // open the file in read-only mode
while (($d = stream_get_line($fh, 1000, '|')) !== false) // read up to 1000 bytes or until the next |
{
    echo $d . PHP_EOL; // display one block of data
}
fclose($fh); // close the file
I need help processing files holding about 46k lines or more than 30MB of data.
My original idea was to open the file and turn each line into an array element. This worked the first time, when the array held about 32k values in total.
The second time the process was repeated, the array only held 1011 elements, and the third time it could only hold 100.
I'm confused and don't know much about the backend array processes. Can someone explain what is happening and fix the code?
function file_to_array($cvsFile){
    $handle = fopen($cvsFile, "r");
    $path = fread($handle, filesize($cvsFile));
    fclose($handle);

    //Turn the file into an array and separate lines to elements
    $csv = explode(",", $path);

    //Remove common double spaces
    foreach ($csv as $key => $line){
        $csv[$key] = str_replace(' ', '', str_getcsv($line));
    }
    array_filter($csv);

    //get the row count for the file and array
    $rows = count($csv);
    $filerows = count(file($cvsFile)); //this no longer works

    echo "File has $filerows and array has $rows";
    return $csv;
}
The approach here can be split into two parts:
Optimized file reading and processing
Proper storage solution
Optimized file processing can be done like so:
$handle = fopen($cvsFile, "r");
$csv = array();
$rowsSucceed = 0;
$rowsFailed = 0;
if ($handle) {
    while (($line = fgets($handle)) !== false) { // Reading the file line by line
        // Parse the CSV line only if it actually contains data,
        // and count as you go
        if (trim($line) !== '') {
            $csv[] = str_getcsv($line);
            $rowsSucceed++;
        } else {
            $rowsFailed++;
        }
    }
    fclose($handle);
} else {
    // Error handling (e.g. log or die("Unable to open $cvsFile"))
}
$totalLines = $rowsSucceed + $rowsFailed;
Also, you can avoid array_filter() simply by not adding a processed line if it is empty.
This reduces memory usage during script execution.
Proper storage
Proper storage is needed for performing operations on this amount of data. Repeated file reads are inefficient and expensive. Using a simple file-based database like SQLite can help you a lot and increase the overall performance of your script.
For this purpose you should probably load your CSV directly into the database and then run count queries on the parsed data, avoiding repeated line counts of the file.
It also gives you the further advantage of working with the data without keeping it all in memory.
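A rough sketch of that idea with PDO and SQLite (the database file, table and column names are made up for illustration):

$pdo = new PDO('sqlite:' . __DIR__ . '/import.sqlite');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE IF NOT EXISTS rows (col1 TEXT, col2 TEXT, col3 TEXT)');
$insert = $pdo->prepare('INSERT INTO rows (col1, col2, col3) VALUES (?, ?, ?)');

$handle = fopen($cvsFile, 'r');
$pdo->beginTransaction();                 // one transaction makes the inserts far faster
while (($line = fgetcsv($handle)) !== false) {
    if ($line[0] === null) {              // skip blank lines instead of filtering later
        continue;
    }
    $insert->execute(array_pad(array_slice($line, 0, 3), 3, null));
}
$pdo->commit();
fclose($handle);

// Counting now happens in the database instead of re-reading the file.
$rows = $pdo->query('SELECT COUNT(*) FROM rows')->fetchColumn();
echo "File imported with $rows usable rows";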
Your question says you want to "turn each line into an array element" but that is definitely not what you are doing. The code is quite clear; it reads the entire file into $path and then uses explode() to make one massive flat array of every element on every line. Then later you're trying to run str_getcsv() on each item, which of course isn't going to work; you've already exploded all the commas away.
Looping over the file using fgetcsv() makes more sense:
function file_to_array($cvsFile) {
    $filerows = 0;
    $csv = array();
    $handle = fopen($cvsFile, "r");
    while (($line = fgetcsv($handle)) !== false) {
        $filerows++;
        // skip empty lines
        if ($line[0] === null) {
            continue;
        }
        //Remove common double spaces
        $csv[] = str_replace(' ', '', $line);
    }
    //get the row count for the file and array
    $rows = count($csv);
    echo "File has $filerows and array has $rows";
    fclose($handle);
    return $csv;
}
I have a file with a size of around 10 GB or more. The file contains only numbers from 1 to 10, one per line, and nothing else. The task is to read the numbers from the file, sort them in ascending or descending order, and create a new file with the sorted numbers.
Can anyone please help me with this?
I'm assuming this is some kind of homework and the goal is to sort more data than you can hold in RAM?
Since you only have the numbers 1-10, this is not that complicated a task. Just open your input file and count how many occurrences of each specific number you have. After that you can write a simple loop that writes the values into another file. The following example is fairly self-explanatory.
$inFile  = '/path/to/input/file';
$outFile = '/path/to/output/file';

$input = fopen($inFile, 'r');
if ($input === false) {
    throw new Exception('Unable to open: ' . $inFile);
}

// $map will be an array of size 10, filled with 0s (keys 1 through 10)
$map = array_fill(1, 10, 0);

// Read the file line by line and count how many of each specific number you have
while (($line = fgets($input)) !== false) {
    $int = (int) $line;
    if ($int >= 1 && $int <= 10) { // ignore blank or malformed lines
        $map[$int]++;
    }
}
fclose($input);

$output = fopen($outFile, 'w');
if ($output === false) {
    throw new Exception('Unable to open: ' . $outFile);
}

/*
 * Reverse the array if you need to change direction between
 * ascending and descending order (true preserves the number => count keys)
 */
//$map = array_reverse($map, true);

// Write the values into your output file
foreach ($map as $number => $count) {
    $string = ((string) $number) . PHP_EOL;
    for ($i = 0; $i < $count; $i++) {
        fwrite($output, $string);
    }
}
fclose($output);
Taking into account that you are dealing with huge files, you should also check the script execution time limit for your PHP environment; the example above will take a VERY long time for 10GB+ files, but since I didn't see any limitations concerning execution time and performance in your question, I'm assuming that is OK.
I had a similar issue before. Trying to manipulate such a large file ended up being a huge drain on resources and the script couldn't cope. The easiest solution I ended up with was to import it into a MySQL database using the fast bulk-load statement LOAD DATA INFILE:
http://dev.mysql.com/doc/refman/5.1/en/load-data.html
Once it's in you should be able to manipulate the data.
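For reference, a rough sketch of what that could look like from PHP (connection details, table and file names are placeholders; LOAD DATA INFILE needs the FILE privilege on the server, or the LOCAL variant enabled):

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false); // don't buffer the huge result client-side
$pdo->exec('CREATE TABLE IF NOT EXISTS numbers (n INT)');
$pdo->exec("LOAD DATA INFILE '/path/to/hugefile.txt' INTO TABLE numbers (n)");

// Stream the sorted numbers straight back out to a file, row by row.
$out  = fopen('/path/to/file_sorted-asc', 'w');
$stmt = $pdo->query('SELECT n FROM numbers ORDER BY n ASC');
foreach ($stmt as $row) {
    fwrite($out, $row['n'] . PHP_EOL);
}
fclose($out);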
Alternatively, you could just read the file line by line while outputting the result into another file line by line with the sorted numbers. Not too sure how well this would work though.
Have you had any previous attempts at it or are you just after a possible method of doing it?
If that's all, you don't need PHP (if you have a Linux machine at hand):
sort -n file > file_sorted-asc
sort -nr file > file_sorted-desc
Edit: OK, here's your solution in PHP (if you have a Linux machine at hand):
<?php
// Sort ascending
`sort -n file > file_sorted-asc`;
// Sort descending
`sort -nr file > file_sorted-desc`;
?>
:)
I have a backgrounded ffmpeg process which is outputting an audio file. I want to push this file to the user's web browser while ffmpeg continues to write to the file. I tried the code below, but it sends 0-byte files.
// open the file in a binary mode
$fp = fopen($fname, 'rb');
// send the right headers
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename='.basename($fname));
ob_end_clean();
fpassthru($fp);
exit;
ffmpeg cannot be launched from PHP/Python with its output captured here.
The fpassthru function won't do the job here. In addition, there is going to be an issue of knowing when the file is complete.
File reading functions stop when the end of the file is reached. If there is a concurrent writer increasing the length of the file, then it's indeterminate how far along a reader will get before seeing EOF. In addition, there's no clear way - through the file operations - to know whether the file writer is done.
It may be feasible to attempt to read with timeouts using a loop like this (pseudo-code):
LOOP
    READ bytes
    IF count read == 0
    THEN
        SLEEP briefly
        INCREMENT idle_count
    ELSE
        SET idle_count = 0
        WRITE
    END IF
UNTIL ( idle_count == 10 )
I can put that into PHP code if it helps.
Here is PHP code that does this with two files, in.dat and out.dat:
<?php
// Copy in.dat to out.dat, tolerating a writer that is still appending to in.dat.
$in_fp  = fopen("./in.dat", "r");
$out_fp = fopen("./out.dat", "w");

$idle_count = 0;
while ( $idle_count < 10 )
{
    // Sleep only when the previous read returned nothing
    if ( $idle_count > 0 )
        sleep(1);

    $val = fread($in_fp, 4192);
    if ( ! $val )
    {
        // Nothing new yet; count consecutive idle passes
        $idle_count++;
    }
    else
    {
        $idle_count = 0;
        $rc = fwrite($out_fp, $val);
        if ( $rc != strlen($val) )
            die("error on writing of the output file\n");
    }
}
fclose($in_fp);
fclose($out_fp);
Note that the odd location of the sleep is to prevent sleeping for one second after the last attempt to read.
I recommend setting the idle timeout limit higher than 10 for this purpose.
For one of my projects I need to import a very large text file (~950MB). I'm using Symfony2 & Doctrine 2 for the project.
My problem is that I get errors like:
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 24 bytes)
The error even occurs if I increase the memory limit to 1GB.
I tried to analyze the problem by using XDebug and KCacheGrind (as part of PHPEdit), but I don't really understand the values :(
I'm looking for a tool or a method (quick & simple, since I don't have much time) to find out why memory is allocated and not freed again.
Edit
To clear some things up, here is my code:
$handle = fopen($geonameBasePath . 'allCountries.txt', 'r');

$i = 0;
$batchSize = 100;

if ($handle) {
    while (($buffer = fgets($handle, 16384)) !== false) {

        if ($buffer[0] == '#') //skip comments
            continue;

        //split parts
        $parts = explode("\t", $buffer);

        if ($parts[6] != 'P')
            continue;

        if ($i % $batchSize == 0) {
            echo 'Flush & Clear' . PHP_EOL;
            $em->flush();
            $em->clear();
        }

        $entity = $em->getRepository('MyApplicationBundle:City')->findOneByGeonameId($parts[0]);
        if ($entity !== null) {
            $i++;
            continue;
        }

        //create city object
        $city = new City();
        $city->setGeonameId($parts[0]);
        $city->setName($parts[1]);
        $city->setInternationalName($parts[2]);
        $city->setLatitude($parts[4]);
        $city->setLongitude($parts[5]);
        $city->setCountry($em->getRepository('MyApplicationBundle:Country')->findOneByIsoCode($parts[8]));

        $em->persist($city);

        unset($city);
        unset($entity);
        unset($parts);
        unset($buffer);

        echo $i . PHP_EOL;
        $i++;
    }
}
fclose($handle);
Things I have tried, but nothing helped:
Adding second parameter to fgets
Increasing memory_limit
Unsetting vars
Increasing the memory limit is not going to be enough. When importing files like that, you should buffer the reading.
$f = fopen('yourfile', 'r');
while (($data = fread($f, 4096)) !== false && $data !== '') {
    // Do your stuff using the read $data
}
fclose($f);
Update:
When working with an ORM, you have to understand that nothing is actually inserted into the database until the flush call. That means all those objects are stored by the ORM, tagged as "to be inserted". Only when flush is called will the ORM check the collection and start inserting.
Solution 1: Flush often. And clear. (See the sketch below.)
Solution 2: Don't use the ORM. Go for plain SQL commands. They will take up far less memory than the object + ORM solution.
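A minimal sketch of Solution 1, reusing $em and City from the question (most setters omitted; the batch size is arbitrary):

$batchSize = 100;
$i = 0;
$handle = fopen($geonameBasePath . 'allCountries.txt', 'r');
while (($buffer = fgets($handle, 16384)) !== false) {
    $parts = explode("\t", $buffer);
    $city = new City();
    $city->setGeonameId($parts[0]);
    // ... other setters as in the question ...
    $em->persist($city);

    if ((++$i % $batchSize) === 0) {
        $em->flush();   // push the pending inserts to the database
        $em->clear();   // detach managed entities so PHP can garbage-collect them
    }
}
$em->flush();           // don't forget the last partial batch
$em->clear();
fclose($handle);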
33554432 bytes is 32MB.
Change the memory limit in php.ini, for example to 75MB:
memory_limit = 75M
and restart the server.
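If you cannot edit php.ini, the limit can usually also be raised for a single script at runtime (the value here is just an example):

ini_set('memory_limit', '75M'); // only works if the host does not disable per-script overrides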
Instead of reading the whole file at once, you should read it line by line. Every time you read a line, process its data. Do NOT try to fit EVERYTHING in memory; you will fail. The reason is that while you can put the TEXT file in RAM, you will not be able to also have the data as PHP objects/variables/what have you at the same time, since PHP itself needs a much larger amount of memory for each of them.
What I suggest instead is:
a) read a new line,
b) parse the data in the line,
c) create the new object to store in the database,
d) go to step a, unset()ing the old object first or reusing its memory.