PHP: decode a base64-encoded very big file, not a string

I have a system that saves files to the server hard drive, base64-encoded, after stripping them from email files.
I would like to convert the files back to their original format again; how can I do that in PHP?
This is how I tried to save the files, but it does not seem to create valid files:
$start = $part['starting-pos-body'];
$end = $part['ending-pos-body'];
$len = $end-$start;
$written = 0;
$write = 2028;
$body = '';
while ($start <= $end) {
fseek($this->stream, $start, SEEK_SET);
$my = fread($this->stream, $write);
fwrite($temp_fp, base64_decode($my));
$start += $write;
}
fclose($temp_fp);

@traylz makes the point clear as to why it may fail when it shouldn't. However, base64_decode() may still fail even for large images. I have worked with 6 to 7 MB files fine and haven't gone over that size, so for me it is as simple as:
$dir = dirname(__FILE__);
// get the base64 encoded file and decode it
$o_en = file_get_contents($dir . '/base64.txt');
$d = base64_decode($o_en);
// put decoded string into tmp file
file_put_contents($dir . '/base64_d', $d);
// get mime type (note: mime_content_type() is deprecated in favour
// for fileinfo functions)
$mime = str_replace('image/', '.', mime_content_type($dir . '/base64_d'));
rename($dir . '/base64_d', $dir . '/base64_d' . $mime);
If the above fails, try adding a chunk_split() call to the decode operation:
$d = base64_decode(chunk_split($o_en));
So what am I saying... forget the loop unless there is a need for it; keep the original file extension if you don't trust PHP's mime detection; use chunk_split() on the base64_decode() operation if working on large files.
NOTE: this is all theory, so untested.
EDIT: for large files that will most likely freeze file_get_contents(), read in only what you need and output it to a file so that little RAM is used:
$chunkSize = 1024;
$src = fopen('base64.txt', 'rb');
$dst = fopen('binary.mime', 'wb');
while (!feof($src)) {
fwrite($dst, base64_decode(fread($src, $chunkSize)));
}
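One caveat: this only decodes correctly if every chunk lines up on a 4-character base64 boundary, which breaks as soon as the encoded file contains line breaks (most encoders wrap at 76 characters). A hedged, untested sketch that tolerates wrapped input, reusing the same base64.txt/binary.mime names:
$src = fopen('base64.txt', 'rb');
$dst = fopen('binary.mime', 'wb');
$carry = '';
while (!feof($src)) {
    // drop whitespace/newlines so only base64 characters remain
    $buf = $carry . preg_replace('/\s+/', '', fread($src, 8192));
    // decode only complete 4-character groups, keep the rest for the next pass
    $usable = strlen($buf) - (strlen($buf) % 4);
    fwrite($dst, base64_decode(substr($buf, 0, $usable)));
    $carry = substr($buf, $usable);
}
fwrite($dst, base64_decode($carry)); // decode whatever is left
fclose($src);
fclose($dst);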

Your problem is that you always read a full 2028-byte chunk without checking whether that runs past the end position, so the last read goes beyond the end pointer; you should also check < instead of <= (to avoid reading 0 bytes).
Also, you don't need to fseek() on every iteration, because fread() reads from the current position. You can take the fseek() out of the loop (before the while). Why 2028, by the way?
Try this out:
fseek($this->stream, $start, SEEK_SET);
while ($start < $end) {
$write = min($end-$start,2048);
$my = fread($this->stream, $write);
fwrite($temp_fp, base64_decode($my));
$start += $write;
}
fclose($temp_fp);

Thank you all for the help.
Finally I ended up with:
shell_exec('/usr/bin/base64 -d '.$temp_file.' > '.$temp_file.'_s');
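If the temp file path is not fully under your control, escaping the shell arguments is safer; a small, untested variant of the same command:
// hypothetical variant of the command above with shell-escaped arguments
shell_exec('/usr/bin/base64 -d ' . escapeshellarg($temp_file)
    . ' > ' . escapeshellarg($temp_file . '_s'));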

Related

Multibyte pointer when reading part of a file in PHP

I use PHP.
The function below loads part of a big multibyte, newline-separated CSV file and returns a pointer (the end position) and the content in an array. With the pointer I can later do another run. It works:
function part($path, $offset, $rows) {
$buffer = array();
$buffer['content'] = '';
$buffer['pointer'] = array();
$handle = fopen($path, "r");
fseek($handle, $offset);
if( $handle ) {
for( $i = 0; $i < $rows; $i++ ) {
$buffer['content'] .= fgets($handle);
$buffer['pointer'] = mb_strlen($buffer['content']);
}
}
fclose($handle);
return($buffer);
}
// Buffer first part
$buffer = part($path_to_file, 0, 100);
// Buffer second part
$buffer = part($path_to_file, $buffer['pointer'], 100);
print_r($buffer);
If I change the $buffer['pointer'] line to:
$buffer['pointer'] = mb_strlen($buffer['content'], "UTF-8");
...it does not work anymore. I understand that it uses a different encoding when I use UTF-8 instead of the default, but why doesn't it work with UTF-8?
Shouldn't UTF-8 be compatible with foreign characters?
Because the function above works when I use it without "UTF-8", I guess I could just keep using it that way.
But I'm still worried that in some cases it could give the wrong pointer.
Is there a safer way to get the correct pointer?
Encoding test
When I do this I get UTF-8:
echo mb_detect_encoding($buffer['content']);
This has little to do with UTF-8. Filesystem functions (like fseek(), fread(), etc.) operate on individual bytes. They don't care about the encoding at all. (You could be writing / reading binary data).
If you want to store a pointer to fseek() to at a later time, use ftell() to find out the current position:
$buffer['pointer'] = ftell($handle);
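Applied to the function from the question, a minimal sketch using ftell() for the pointer (a byte offset, so it works regardless of encoding):
function part($path, $offset, $rows) {
    $buffer = array('content' => '', 'pointer' => $offset);
    $handle = fopen($path, "r");
    if ($handle) {
        fseek($handle, $offset);
        for ($i = 0; $i < $rows; $i++) {
            $buffer['content'] .= fgets($handle);
        }
        $buffer['pointer'] = ftell($handle); // byte position to pass into the next call
        fclose($handle);
    }
    return $buffer;
}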

Change pointer for file_put_contents()

$iplog = "$time EST - $userip - $location - $currentpage\n";
file_put_contents("iplog.txt", $iplog, FILE_APPEND);
I am trying to write this to the text file, but it puts it at the bottom and I would prefer if the new entries were at the top. How would I change the pointer for where it puts the text?
Prepending at the beginning of a file is very uncommon, as it requires copying all of the file's data. If the file is large, this can become unacceptable performance-wise (especially for a log file, which is frequently written to). I would re-think whether you really want that.
The simplest way to do this with PHP is something like this:
$iplog = "$time EST - $userip - $location - $currentpage\n";
file_put_contents("iplog.txt", $iplog . file_get_contents('iplog.txt'));
The file_get_contents solution doesn't have a flag for prepending content to a file and is not very efficient for big files, which log files usually are. The solution is to use fopen and fclose with a temporary buffer. You can still have issues if different visitors update your log file at the same time, but that's another topic (you then need a locking mechanism or similar).
<?php
function prepend($file, $data, $buffsize = 4096)
{
    $handle = fopen($file, 'r+');
    $buffsize = max($buffsize, strlen($data));
    // save the first chunk that the new data will overwrite
    $save = fread($handle, $buffsize);
    // write the new data at the start of the file
    rewind($handle);
    fwrite($handle, $data);
    // leapfrog through the rest of the file: always read the next chunk
    // before overwriting it with the previously saved one
    $read_pos = $buffsize;          // everything before this offset is already in $save
    $write_pos = ftell($handle);    // next position to overwrite
    while ($save !== '' && $save !== false) {
        fseek($handle, $read_pos);
        $chunk = fread($handle, $buffsize);
        $read_pos += strlen($chunk);
        fseek($handle, $write_pos);
        fwrite($handle, $save);
        $write_pos += strlen($save);
        $save = $chunk;
    }
    fclose($handle);
}
prepend("iplog.txt", "$time EST - $userip - $location - $currentpage\n")
?>
That should do it. It requires an existing iplog.txt file though (or the fopen() will fail).
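As for the concurrency caveat above, here is a minimal, untested sketch of advisory locking around the call, assuming every writer goes through the same code path and a separate lock file is acceptable:
$lock = fopen("iplog.txt.lock", 'c'); // hypothetical separate lock file
if (flock($lock, LOCK_EX)) {
    prepend("iplog.txt", "$time EST - $userip - $location - $currentpage\n");
    flock($lock, LOCK_UN);
}
fclose($lock);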

Reading large files from end

Can I read a file in PHP from the end, for example if I only want to read the last 10-20 lines?
Also, when the size of the file is more than 10 MB I start getting errors.
How can I prevent this error?
For reading a normal file, we use the code :
$handle = fopen($file, "r"); // $file is the path of the file to read
$i1 = 0;
$content = array();
if ($handle) {
while (($buffer = fgets($handle, 4096)) !== false) {
$i1++;
$content[$i1]=$buffer;
}
if (!feof($handle)) {
echo "Error: unexpected fgets() fail\n";
}
fclose($handle);
}
My file might go over 10 MB, but I just need to read the last few lines. How do I do it?
Thanks
You can use fopen and fseek to navigate the file backwards from the end. For example:
$fp = fopen($file, "r");
$pos = -2;
// walk backwards from the end until a newline is found (or seeking fails at the start of the file)
while (fseek($fp, $pos, SEEK_END) !== -1 && fgetc($fp) != "\n") {
    $pos--;
}
$lastline = fgets($fp);
It's not pure PHP, but a common solution is to use the tac command, which is the reverse of cat and loads the file in reverse. Use exec() or passthru() to run it on the server and then read the results. Example usage:
<?php
$myfile = 'myfile.txt';
$command = "tac $myfile > /tmp/myfilereversed.txt";
exec($command);
$currentRow = 0;
$numRows = 20; // stops after this number of rows
$handle = fopen("/tmp/myfilereversed.txt", "r");
while (!feof($handle) && $currentRow <= $numRows) {
$currentRow++;
$buffer = fgets($handle, 4096);
echo $buffer."<br>";
}
fclose($handle);
?>
It depends how you interpret "can".
If you are wondering whether you can do this directly (with a PHP function) without reading all the preceding lines, then the answer is: no, you cannot.
A line ending is an interpretation of the data, and you can only know where the line endings are if you actually read the data.
If it is a really big file, I'd not do that, though.
It would be better to scan the file starting from the end, gradually reading blocks backwards from the end of the file.
Update
Here's a PHP-only way to read the last n lines of a file without reading through all of it:
function last_lines($path, $line_count, $block_size = 512)
{
    $lines = array();
    // we will always have a fragment of a non-complete line,
    // keep this in here till we have our next entire line.
    $leftover = "";
    $fh = fopen($path, 'r');
    // go to the end of the file
    fseek($fh, 0, SEEK_END);
    do {
        // need to know whether we can actually go back
        // $block_size bytes
        $can_read = $block_size;
        if (ftell($fh) < $block_size) {
            $can_read = ftell($fh);
        }
        // go back as many bytes as we can,
        // read them to $data and then move the file pointer
        // back to where we were.
        fseek($fh, -$can_read, SEEK_CUR);
        $data = fread($fh, $can_read);
        $data .= $leftover;
        fseek($fh, -$can_read, SEEK_CUR);
        // split lines by \n. Then reverse them,
        // now the last line is most likely not a complete
        // line which is why we do not directly add it, but
        // append it to the data read the next time.
        $split_data = array_reverse(explode("\n", $data));
        $new_lines = array_slice($split_data, 0, -1);
        $lines = array_merge($lines, $new_lines);
        $leftover = $split_data[count($split_data) - 1];
    } while (count($lines) < $line_count && ftell($fh) != 0);
    if (ftell($fh) == 0) {
        $lines[] = $leftover;
    }
    fclose($fh);
    // Usually, we will read too many lines, correct that here.
    return array_slice($lines, 0, $line_count);
}
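Hypothetical usage of the function above (the log path is illustrative):
// e.g. fetch the last 20 lines of a log file
$lines = last_lines('/var/log/app.log', 20);
echo implode("\n", $lines);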
The following snippet worked for me.
$file = popen("tac $filename",'r');
while ($line = fgets($file)) {
echo $line;
}
Reference: http://laughingmeme.org/2008/02/28/reading-a-file-backwards-in-php/
If your code is not working and reporting an error, you should include the error in your posts!
The reason you are getting an error is that you are trying to store the entire contents of the file in PHP's memory space.
The most efficient way to solve the problem would be, as Greenisha suggests, to seek to the end of the file and then go back a bit. But Greenisha's mechanism for going back a bit is not very efficient.
Consider instead the method for getting the last few lines from a stream (i.e. where you can't seek):
while (($buffer = fgets($handle, 4096)) !== false) {
$i1++;
$content[$i1]=$buffer;
unset($content[$i1-$lines_to_keep]);
}
So if you know that your max line length is 4096, then you would:
if (4096 * $lines_to_keep < filesize($input_file)) {
    fseek($fp, -4096 * $lines_to_keep, SEEK_END);
}
Then apply the loop I described previously.
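Putting the two pieces together, a hedged sketch of that approach ($input_file and the counts are illustrative):
$input_file = 'big.log';
$lines_to_keep = 20;
$max_line_len = 4096;

$handle = fopen($input_file, 'r');
// skip straight to the last region of the file when it is large enough
if ($max_line_len * $lines_to_keep < filesize($input_file)) {
    fseek($handle, -$max_line_len * $lines_to_keep, SEEK_END);
}
$i1 = 0;
$content = array();
while (($buffer = fgets($handle, $max_line_len)) !== false) {
    $i1++;
    $content[$i1] = $buffer;
    unset($content[$i1 - $lines_to_keep]); // keep only the last N lines in memory
}
fclose($handle);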
Since C has some more efficient methods for dealing with byte streams, the fastest solution (on a POSIX/Unix/Linux/BSD system) would simply be:
$last_lines = system("tail -" . $lines_to_keep . " filename");
For Linux you can do
$linesToRead = 10;
exec("tail -n{$linesToRead} {$myFileName}" , $content);
You will get an array of lines in $content variable
Pure PHP solution
$f = fopen($myFileName, 'r');
$maxLineLength = 1000; // Real maximum length of your records
$linesToRead = 10;
fseek($f, -$maxLineLength*$linesToRead, SEEK_END); // Moves cursor back from the end of file
$res = array();
while (($buffer = fgets($f, $maxLineLength)) !== false) {
$res[] = $buffer;
}
$content = array_slice($res, -$linesToRead);
If you know about how long the lines are, you can avoid a lot of the black magic and just grab a chunk of the end of the file.
I needed the last 15 lines from a very large log file, and altogether they were about 3000 characters. So I just grab the last 8000 bytes to be safe, then read the file as normal and take what I need from the end.
$fh = fopen($file, "r");
fseek($fh, -8192, SEEK_END);
$lines = array();
while($lines[] = fgets($fh)) {}
This is possibly even more efficient than the highest rated answer, which reads the file character by character, compares each character, and splits based on newline characters.
Here is another solution. It doesn't limit the line length in fgets(); you can add that if needed.
/* Read file from end line by line */
$fp = fopen(dirname(__FILE__) . '\\some_file.txt', 'r');
$lines_read = 0;
$lines_to_read = 1000;
fseek($fp, 0, SEEK_END);    // go to EOF
$eol_size = 2;              // 2 for Windows, 1 for the rest
$eol_char = "\r\n";         // mac = \r, unix = \n
while ($lines_read < $lines_to_read) {
    if (ftell($fp) == 0) break;             // break on BOF (beginning of file)
    do {
        fseek($fp, -1, SEEK_CUR);           // seek char by char back from EOF
        $eol = fgetc($fp) . fgetc($fp);     // search for EOL (remove 1 fgetc if needed)
        fseek($fp, -$eol_size, SEEK_CUR);   // go back over the EOL
    } while ($eol != $eol_char && ftell($fp) > 0);  // check EOL and BOF
    $position = ftell($fp);                 // save current position
    if ($position != 0) fseek($fp, $eol_size, SEEK_CUR);   // skip over the EOL
    echo fgets($fp);                        // read LINE or do whatever is needed
    fseek($fp, $position, SEEK_SET);        // restore position
    $lines_read++;
}
fclose($fp);
Well, while searching for the same thing, I came across the following and thought it might be useful to others as well, so I'm sharing it here:
/* Read file from end line by line */
function tail_custom($filepath, $lines = 1, $adaptive = true)
{
    // Open file
    $f = @fopen($filepath, "rb");
    if ($f === false) return false;
    // Sets buffer size, according to the number of lines to retrieve.
    // This gives a performance boost when reading a few lines from the file.
    if (!$adaptive) $buffer = 4096;
    else $buffer = ($lines < 2 ? 64 : ($lines < 10 ? 512 : 4096));
    // Jump to last character
    fseek($f, -1, SEEK_END);
    // Read it and adjust line number if necessary
    // (Otherwise the result would be wrong if file doesn't end with a blank line)
    if (fread($f, 1) != "\n") $lines -= 1;
    // Start reading
    $output = '';
    $chunk = '';
    // While we would like more
    while (ftell($f) > 0 && $lines >= 0) {
        // Figure out how far back we should jump
        $seek = min(ftell($f), $buffer);
        // Do the jump (backwards, relative to where we are)
        fseek($f, -$seek, SEEK_CUR);
        // Read a chunk and prepend it to our output
        $output = ($chunk = fread($f, $seek)) . $output;
        // Jump back to where we started reading
        fseek($f, -mb_strlen($chunk, '8bit'), SEEK_CUR);
        // Decrease our line counter
        $lines -= substr_count($chunk, "\n");
    }
    // While we have too many lines
    // (Because of buffer size we might have read too many)
    while ($lines++ < 0) {
        // Find first newline and remove all text before that
        $output = substr($output, strpos($output, "\n") + 1);
    }
    // Close file and return
    fclose($f);
    return trim($output);
}
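Hypothetical usage (the path is illustrative):
// last 20 lines of an arbitrary log file
echo tail_custom('/var/log/syslog', 20);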
As Einstein said, everything should be made as simple as possible, but no simpler. At this point you are in need of a data structure, a LIFO data structure, or simply put, a stack.
A more complete example of the "tail" suggestion above is provided here. This seems to be a simple and efficient method -- thank you. Very large files should not be an issue, and a temporary file is not required.
$buf = array();
$ret = null;
// capture the last 30 lines of the log file into a buffer
exec('tail -30 ' . $weatherLog, $buf, $ret);
if ($ret == 0) {
    // process the captured lines one at a time
    foreach ($buf as $line) {
        $n = sscanf($line, "%s temperature %f", $dt, $t);
        if ($n > 0) $temperature = $t;
        $n = sscanf($line, "%s humidity %f", $dt, $h);
        if ($n > 0) $humidity = $h;
    }
    printf("<tr><th>Temperature</th><td>%0.1f</td></tr>\n", $temperature);
    printf("<tr><th>Humidity</th><td>%0.1f</td></tr>\n", $humidity);
} else {
    # something bad happened
}
In the above example, the code reads 30 lines of text output and displays the last temperature and humidity readings in the file (that's why the printf's are outside of the loop, in case you were wondering). The file is filled by an ESP32, which adds to the file every few minutes even when the sensor reports only nan, so thirty lines gives plenty of readings and it should never fail. Each reading includes the date and time, so in the final version the output will include the time the reading was taken.

file_get_contents => PHP Fatal error: Allowed memory exhausted

I have no experience with dealing with large files, so I am not sure what to do about this. I have attempted to read several large files using file_get_contents; the task is to clean and munge them using preg_replace().
My code runs fine on small files; however, the large files (40 MB) trigger a memory exhausted error:
PHP Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate 41390283 bytes)
I was thinking of using fread() instead but I am not sure that'll work either. Is there a workaround for this problem?
Thanks for your input.
This is my code:
<?php
error_reporting(E_ALL);
##get find() results and remove DOS carriage returns.
##The error is thrown on the next line for large files!
$myData = file_get_contents("tmp11");
$newData = str_replace("^M", "", $myData);
##cleanup Model-Manufacturer field.
$pattern = '/(Model-Manufacturer:)(\n)(\w+)/i';
$replacement = '$1$3';
$newData = preg_replace($pattern, $replacement, $newData);
##cleanup Test_Version field and create comma delimited layout.
$pattern = '/(Test_Version=)(\d).(\d).(\d)(\n+)/';
$replacement = '$1$2.$3.$4 ';
$newData = preg_replace($pattern, $replacement, $newData);
##cleanup occasional empty Model-Manufacturer field.
$pattern = '/(Test_Version=)(\d).(\d).(\d) (Test_Version=)/';
$replacement = '$1$2.$3.$4 Model-Manufacturer:N/A--$5';
$newData = preg_replace($pattern, $replacement, $newData);
##fix occasional Model-Manufacturer being incorrectly wrapped.
$newData = str_replace("--","\n",$newData);
##fix 'Binary file' message when find() utility cannot id file.
$pattern = '/(Binary file).*/';
$replacement = '';
$newData = preg_replace($pattern, $replacement, $newData);
$newData = removeEmptyLines($newData);
##replace colon with equal sign
$newData = str_replace("Model-Manufacturer:","Model-Manufacturer=",$newData);
##file stuff
$fh2 = fopen("tmp2","w");
fwrite($fh2, $newData);
fclose($fh2);
### Functions.
##Data cleanup
function removeEmptyLines($string)
{
return preg_replace("/(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+/", "\n", $string);
}
?>
Firstly, you should understand that when using file_get_contents you're fetching the entire string of data into a variable, and that variable is stored in the host's memory.
If that string is greater than the size dedicated to the PHP process, then PHP will halt and display the error message above.
The way around this is to open the file as a pointer and then take a chunk at a time. This way, if you had a 500 MB file you could read the first 1 MB of data, do what you will with it, delete that 1 MB from memory and replace it with the next MB. This allows you to manage how much data you're putting in memory.
An example of this can be seen below; I will create a function that works a bit like node.js streams, invoking a callback for each chunk:
function file_get_contents_chunked($file, $chunk_size, $callback)
{
    try {
        $handle = fopen($file, "r");
        $i = 0;
        while (!feof($handle)) {
            call_user_func_array($callback, array(fread($handle, $chunk_size), &$handle, $i));
            $i++;
        }
        fclose($handle);
    } catch (Exception $e) {
        trigger_error("file_get_contents_chunked::" . $e->getMessage(), E_USER_NOTICE);
        return false;
    }
    return true;
}
and then use like so:
$success = file_get_contents_chunked("my/large/file", 4096, function($chunk, &$handle, $iteration) {
    /*
     * Do what you will with the {$chunk} here
     * {$handle} is passed in case you want to seek
     * to different parts of the file
     * {$iteration} is the section of the file that has been read so
     * ($iteration * 4096) is your current offset within the file.
     */
});
if(!$success)
{
//It Failed
}
One of the problems you will find is that you're trying to run several regexes over an extremely large chunk of data. Not only that, but your regex is built for matching the entire file.
With the above method your regex could become useless, as you may only be matching half a data set. What you should do is revert to the native string functions such as
strpos
substr
trim
explode
for matching the strings. I have added support in the callback so that the handle and current iteration are passed in. This allows you to work with the file directly within your callback, using functions like fseek, ftruncate and fwrite, for instance.
The way you're building your string manipulation is not efficient at all, and the proposed method above is by far a better way.
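For example, here is an untested sketch of a callback that keeps a small carry between chunks so a match can never be cut in half at a chunk boundary (the tmp11/tmp2 names come from the question; the 100-byte overlap is an arbitrary choice):
$carry = '';
$out = fopen('tmp2', 'w');

$success = file_get_contents_chunked('tmp11', 4096, function ($chunk) use (&$carry, $out) {
    $buffer = $carry . $chunk;
    // hold back the last 100 bytes; they may contain the start of a split match
    $carry = (string) substr($buffer, -100);
    $ready = (string) substr($buffer, 0, -100);
    // per-chunk cleanup with plain string functions, e.g. stripping carriage returns
    fwrite($out, str_replace("\r", '', $ready));
});

// flush the held-back tail once the whole file has been read
fwrite($out, str_replace("\r", '', $carry));
fclose($out);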
A pretty ugly solution to adjust your memory limit depending on file size:
$filename = "yourfile.txt";
ini_set ('memory_limit', filesize ($filename) + 4000000);
$contents = file_get_contents ($filename);
The right solution would be to think about whether you can process the file in smaller chunks, or use command line tools from PHP.
If your file is line-based you can also use fgets to process it line-by-line.
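A minimal line-by-line sketch, reusing the tmp11/tmp2 names from the question:
$in = fopen('tmp11', 'r');
$out = fopen('tmp2', 'w');
while (($line = fgets($in)) !== false) {
    // apply the per-line cleanup here, e.g. dropping DOS carriage returns
    fwrite($out, str_replace("\r", '', $line));
}
fclose($in);
fclose($out);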
For processing just n rows at a time, we can use generators in PHP (here n = 1000).
This is how it works: read n lines, process them, yield them back, then read the next n lines, process them, and so on.
Here's the code for doing so.
<?php
class readLargeCSV
{
    private $file;
    private $delimiter;
    private $iterator;
    private $header;

    public function __construct($filename, $delimiter = "\t")
    {
        $this->file = fopen($filename, 'r');
        $this->delimiter = $delimiter;
        $this->iterator = 0;
        $this->header = null;
    }

    public function csvToArray()
    {
        $data = array();
        $is_mul_1000 = false;
        while (($row = fgetcsv($this->file, 1000, $this->delimiter)) !== false) {
            $is_mul_1000 = false;
            if (!$this->header) {
                $this->header = $row;
            } else {
                $this->iterator++;
                $data[] = array_combine($this->header, $row);
                if ($this->iterator != 0 && $this->iterator % 1000 == 0) {
                    $is_mul_1000 = true;
                    $chunk = $data;
                    $data = array();
                    yield $chunk;
                }
            }
        }
        fclose($this->file);
        if (!$is_mul_1000) {
            yield $data;
        }
        return;
    }
}
And for reading it, you can use this.
$file = database_path('path/to/csvfile/XYZ.csv');
$csv_reader = new readLargeCSV($file, ",");
foreach($csv_reader->csvToArray() as $data){
// you can do whatever you want with the $data.
}
Here $data contains 1000 entries from the CSV, or the remaining (count % 1000) entries for the last batch.
A detailed explanation for this can be found here https://medium.com/#aashish.gaba097/database-seeding-with-large-files-in-laravel-be5b2aceaa0b
My advice would be to use fread. It may be a little slower, but you won't have to use all your memory...
For instance :
// This uses filesize($oldFile) bytes of memory
file_put_contents($newFile, file_get_contents($oldFile));
// And this only 8192 bytes
$pNew = fopen($newFile, 'w');
$pOld = fopen($oldFile, 'r');
while (!feof($pOld)) {
    fwrite($pNew, fread($pOld, 8192));
}
fclose($pOld);
fclose($pNew);
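The same copy can also be done with PHP's built-in stream copy, which likewise avoids loading the whole file into memory:
$pNew = fopen($newFile, 'w');
$pOld = fopen($oldFile, 'r');
stream_copy_to_stream($pOld, $pNew); // internal buffered copy, constant memory
fclose($pOld);
fclose($pNew);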

Read a file from line X to line Y? [duplicate]

The closest I've seen in the PHP docs is to fread() a given length, but that doesn't specify which line to start from. Any other suggestions?
Yes, you can do that easily with SplFileObject::seek
$file = new SplFileObject('filename.txt');
$file->seek(1000);
for($i = 0; !$file->eof() && $i < 1000; $i++) {
echo $file->current();
$file->next();
}
This is a method from the SeekableIterator interface and not to be confused with fseek.
And because SplFileObject is iterable you can do it even easier with a LimitIterator:
$file = new SplFileObject('longFile.txt');
$fileIterator = new LimitIterator($file, 1000, 2000);
foreach($fileIterator as $line) {
echo $line, PHP_EOL;
}
Again, this is zero-based, so it's line 1001 to 2001.
You are not going to be able to read starting from line X, because lines can be of arbitrary length. So you will have to read from the start, counting the number of lines read, to get to line X. For example:
<?php
$f = fopen('sample.txt', 'r');
$lineNo = 0;
$startLine = 3;
$endLine = 6;
while ($line = fgets($f)) {
$lineNo++;
if ($lineNo >= $startLine) {
echo $line;
}
if ($lineNo == $endLine) {
break;
}
}
fclose($f);
Unfortunately, in order to be able to read from line x to line y, you'd need to be able to detect line breaks... and you'd have to scan through the whole file. However, assuming you're not asking about this for performance reasons, you can get lines x to y with the following:
$x = 10; // inclusive start line (zero-based offset)
$y = 20; // inclusive end line
$lines = file('myfile.txt');
$my_important_lines = array_slice($lines, $x, $y - $x + 1); // third argument is a length, not an end index
See: array_slice
Well, you can't use the fseek function to seek to the appropriate position, because it works with a given number of bytes, not lines.
I think it's not possible without some sort of cache, or without going through the lines one after the other.
Here is the possible solution :)
<?php
$f = fopen('sample.txt', 'r');
$lineNo = 0;
$startLine = 3;
$endLine = 6;
while ($line = fgets($f)) {
$lineNo++;
if ($lineNo >= $startLine) {
echo $line;
}
if ($lineNo == $endLine) {
break;
}
}
fclose($f);
?>
If you're looking for lines then you can't use fread because that relies on a byte offset, not the number of line breaks. You actually have to read the file to find the line breaks, so a different function is more appropriate. fgets will read the file line-by-line. Throw that in a loop and capture only the lines you want.
I was afraid of that... I guess it's plan B then :S
For each AJAX request I'm going to:
Read into a string the number of lines I'm going to return to the client.
Copy the rest of the file into a temp file.
Return string to the client.
It's lame, and it will probably be pretty slow with 10,000+ line files, but I guess it's better than reading the same data over and over again; at least the temp file gets shorter with every request... No?
