byte position: file_get_contents vs fopen - php

I need some data from a specific byte range in a binary file.
(concatenated jpegs, don't ask...)
So I have offset and length data from an external API.
(I would guess that those are byte positions)
What works is the following:
$fileData = file_get_contents($binaryFile);
$imageData = substr($fileData, $offset, $length);
But I would rather not load the full file into memory and therefore tried fopen:
$handle = fopen($binaryFile, 'rb');
fseek($handle, $offset);
$imageData = fgets($handle, $length);
But that doesn't work. The data chunk is no valid image data.
So I assume I got the position wrong with fopen.
Any ideas on how the positions differ in substr vs fopen?

You wrote
The data chunk is no valid image data
"image data" - but in your code you call fgets() to read that data. That's wrong: an image is binary data, not a text file, so you don't want to read it line by line (docs):
fgets — Gets line from file pointer
This means fgets() stops reading from the file once it finds what it considers a line-end marker (or once it has read $length - 1 bytes). With binary data that usually means stopping early and returning less than $length, because there is only a small chance that such a byte does not occur somewhere in the sequence.
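A small sketch of that difference, using a made-up 10-byte sample (not real image data) that happens to contain a line feed byte. The sample bytes and the temp file are assumptions for illustration only:

```php
<?php
// Hypothetical sample: 10 bytes of "binary" data with a line feed (0x0A) at index 3.
$bytes = "\xFF\xD8\xFF\x0A\x00\x10JFIF";
$path = tempnam(sys_get_temp_dir(), 'bin');
file_put_contents($path, $bytes);

$handle = fopen($path, 'rb');
$viaFgets = fgets($handle, 10);   // stops right after the 0x0A byte
rewind($handle);
$viaFread = fread($handle, 10);   // returns all 10 requested bytes
fclose($handle);
unlink($path);

echo strlen($viaFgets), ' ', strlen($viaFread), PHP_EOL; // prints "4 10"
```

Same offset, same requested length, but fgets() hands back a truncated chunk.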
So fgets() is the wrong function to use, and this is the main issue. Instead, pick the less smart fread(), which knows nothing about lines and just reads what you tell it to. Finally, you should fclose() the handle when you are done. And naturally you should always check for errors, starting with fopen():
if ($handle = fopen($binaryFile, 'rb')) {
    if (fseek($handle, $offset) === 0) {
        $imageData = fread($handle, $length);
        if ($imageData === false) {
            // error handling - failed to read the data
        }
    } else {
        // error handling - seek failed
    }
    fclose($handle);
} else {
    // error handling - can't open file
}
So always use the right tool for the task, and if you are unsure what a given function does, there's always the not-that-bad documentation to peek at.
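If you need this in more than one place, the same steps can be wrapped in a function. The following is only a sketch: readByteRange is a hypothetical helper name (not a built-in), and it throws exceptions where the snippet above leaves error-handling comments. It also loops, since fread() may return fewer bytes than requested.

```php
<?php
/**
 * Read $length bytes starting at byte $offset, without loading the whole file.
 * readByteRange is a hypothetical helper name, not a built-in function.
 */
function readByteRange(string $path, int $offset, int $length): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        if (fseek($handle, $offset) !== 0) {
            throw new RuntimeException("Cannot seek to offset $offset");
        }
        $data = '';
        // fread() may return fewer bytes than requested, so loop until done.
        while (strlen($data) < $length && !feof($handle)) {
            $chunk = fread($handle, $length - strlen($data));
            if ($chunk === false) {
                throw new RuntimeException('Read failed');
            }
            $data .= $chunk;
        }
        return $data;
    } finally {
        fclose($handle);
    }
}
```

For a valid range this should give the same bytes as substr(file_get_contents($path), $offset, $length), just without holding the whole file in memory.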

You can use file_get_contents, too. See this simple line:
$imageData = file_get_contents($binaryFile, false, null, $offset, $length);
See also the documentation of file_get_contents.

Related

What is the correct way to get the size of binary data in PHP?

I've read part of a file and now want to make sure the part is the right size. How can I do it in PHP?
$part = fread($file, 1024);
return some_function($part) == 1024;
I've read the examples, but I hesitate to use strlen because of null-terminated strings: there might be a null byte inside the binary data in $part. My concern is that strlen would then return only the length from the start of $part up to the first null byte.
As stated in the PHP manual, strlen returns the number of bytes in the string, not the character length.
In PHP, a null byte in a string does NOT count as the end of the string, and any null bytes are included in the length of the string.
So strlen can be used for binary data, no matter if the data is from a file or some other source.
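A quick check of this, using a made-up 5-byte string that contains two null bytes (the sample bytes are an assumption for illustration):

```php
<?php
// Hypothetical 5-byte binary string containing two null bytes.
$part = "\x00\x01\x00\xFF\x7F";

// strlen() counts every byte, including the null bytes.
echo strlen($part), PHP_EOL; // prints "5"
```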
According to PHP documentation of fread() function you may use the construction using filesize() as shown in the first example:
<?php
// get contents of a file into a string
$filename = "/usr/local/something.txt";
$handle = fopen($filename, "r");
$contents = fread($handle, filesize($filename));
fclose($handle);
?>
Update: to find the size of a file, you can use stat() without opening it, or fstat() on an already opened file.
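A minimal sketch of the fstat() variant together with the strlen() check from the question. The temp file and its 2048-byte size are assumptions standing in for the real file:

```php
<?php
// Hypothetical temp file standing in for the real binary file.
$path = tempnam(sys_get_temp_dir(), 'sz');
file_put_contents($path, str_repeat("\x00", 2048));

$handle = fopen($path, 'rb');
$size = fstat($handle)['size'];        // size in bytes of the open file
$part = fread($handle, 1024);
$isFullChunk = strlen($part) === 1024; // true when a full chunk was read
fclose($handle);
unlink($path);
```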

Concatenate files in PHP

I'd like to know if there is a faster way of concatenating 2 text files in PHP, than the usual way of opening txt1 in a+, reading txt2 line by line and copying each line to txt1.
If you want to use a pure-PHP solution, you could use file_get_contents to read the whole file in a string and then write that out (no error checking, just to show how you could do it):
$fp1 = fopen("txt1", 'a+');
$file2 = file_get_contents("txt2");
fwrite($fp1, $file2);
It's probably much faster to use the cat program in linux if you have command line permissions for PHP
system('cat txt1 txt2 > txt3');
$content = file_get_contents("file1");
file_put_contents("file2", $content, FILE_APPEND);
I have found using *nix cat to be the most effective here, but if for whatever reason you don't have access to it, and you are concatenating large files, then you can use this line by line function. (Error handling stripped for simplicity).
function catFiles($arrayOfFiles, $outputPath) {
    $dest = fopen($outputPath, "a");
    foreach ($arrayOfFiles as $f) {
        $FH = fopen($f, "r");
        $line = fgets($FH);
        while ($line !== false) {
            fputs($dest, $line);
            $line = fgets($FH);
        }
        fclose($FH);
    }
    fclose($dest);
}
While the fastest way is undoubtedly to use OS commands like cp or cat, this is hardly advisable for compatibility.
The fastest "PHP only" way is using file_get_contents, which reads the whole source file in one shot, but it also has drawbacks: it requires a lot of memory for large files and may therefore fail, depending on the memory limit assigned to PHP.
A universal clean and fast solution is to use fread and fwrite with a large buffer.
If the file is smaller than the buffer, all reading will happen in one burst, so speed is optimal, otherwise reading happens at big chunks (the size of the buffer) so the overhead is minimal and speed is quite good.
Reading line by line with fgets, by contrast, has to test every character, one by one, to see whether it is a newline.
Also, reading a file with many short lines line by line with fgets will be slower, as you read many little pieces of different sizes, depending on where the newlines are positioned.
fread is faster as it only checks for EOF (which is easy) and reads files using a fixed size chunk you decide, so it can be made optimal for your OS or disk or kind of files (say you have many files <12k you can set the buffer size to 16k so they are all read in one shot).
// Code is untested, written on a mobile phone inside Stack Overflow; it comes from various examples online that you can also check.
<?php
$BUFFER_SIZE = 1 * 1024 * 1024; // 1 MB; bigger is faster, depending on file sizes and count
$fileToAppendTo = "dest.txt"; // placeholder path, replace with your destination file
$dest = fopen($fileToAppendTo, "a+");
if (FALSE === $dest) die("Failed to open destination");
$handle = fopen("source.txt", "rb");
if (FALSE === $handle) {
    fclose($dest);
    die("Failed to open source");
}
// Copy the source in buffer-sized chunks until the end of the file.
while (!feof($handle)) {
    fwrite($dest, fread($handle, $BUFFER_SIZE));
}
fclose($handle);
fclose($dest);
?>
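As an alternative to the manual loop, PHP's built-in stream_copy_to_stream() does the chunked copying internally. This is a sketch and a different technique from the answer above; appendFile is a hypothetical helper name, and the file paths are whatever you pass in:

```php
<?php
/**
 * Append the contents of $sourcePath to $destPath and return the bytes copied.
 * appendFile is a hypothetical helper name; stream_copy_to_stream() is built in.
 */
function appendFile(string $sourcePath, string $destPath): int
{
    $source = fopen($sourcePath, 'rb');
    if ($source === false) {
        throw new RuntimeException("Failed to open source $sourcePath");
    }
    $dest = fopen($destPath, 'ab');
    if ($dest === false) {
        fclose($source);
        throw new RuntimeException("Failed to open destination $destPath");
    }
    // Copies everything remaining in $source to $dest in internal chunks.
    $copied = stream_copy_to_stream($source, $dest);
    fclose($source);
    fclose($dest);
    if ($copied === false) {
        throw new RuntimeException('Copy failed');
    }
    return $copied;
}
```

Because the copy runs chunk by chunk, memory use should stay small even for large files.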

PHP fread file pointer position

I wanted to know how the fread function moves the file pointer inside the file.
Let's consider the following scenario:
<?php
$file = fopen('binary.txt', 'rb');
fread($file, 0x594);
some_function(fread($file, 0x1a8), ....); // some function with the fread output as its first argument
?>
Brief overview of the code:
It opens a binary file in read-only mode. I wanted to know whether my understanding is correct:
The first invocation of the fread function will move the file pointer to position 0x594.
Since the position of the first byte in the binary file is considered 0, and fread function is reading 0x594 bytes, so what will be the new position of file pointer?
0x593 or 0x594?
The second fread call will start reading from the previous file pointer position. So every time fread is invoked, the position of the file pointer is preserved?
Which means that, in a sequence of fread calls, each one starts reading bytes from the position of the file pointer set by the previous one?
In this case, it will read bytes from position 0x594 up to (0x594+0x1a8), or 0x73c?
Thanks.
You can investigate this yourself, using ftell(). The current position of the file pointer is an inherent part of the file infrastructure, and PHP is simply riding on top of libc/glibc's implementation of fopen/fread/etc...
However, consider this:
$fh = fopen('somefile.txt', 'r');
The file pointer will now be at position zero, because no data has been read.
$data = fread($fh, 500);
The file pointer will now be at position 500, because it has read positions 0->499 (500 bytes) as part of the previous fread call.
$data = fread($fh, 0); // makes no sense to read 0 bytes, but hey...
Still at position 500. (Note that PHP 8 rejects a length of 0 with a ValueError; older versions emit a warning.)
$data = fread($fh, 1); // now at 501
$data = fread($fh, 2); // now at 503
etc...
Basically, use ftell() to check for yourself. ftell() retrieves the CURRENT LOCATION of the file pointer, so you can remember where you are. You can then use rewind(), fseek(), etc. to move the pointer all over, then jump right back to where you were without losing your place:
$old_loc = ftell($fh); // 503
fseek($fh, 9999);
fseek($fh, 20000); // jump around a bit
fseek($fh, $old_loc); // back to 503, ready to resume reading where we left off.
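Applied to the numbers from the question, the check could look like this. It is a sketch on a throwaway temp file of 0x800 zero bytes (an assumption, chosen so that both reads are fully satisfied), not on the asker's actual file:

```php
<?php
// Hypothetical 0x800-byte file so both reads are fully satisfied.
$path = tempnam(sys_get_temp_dir(), 'pos');
file_put_contents($path, str_repeat("\x00", 0x800));

$file = fopen($path, 'rb');
$start = ftell($file);        // position before any read
fread($file, 0x594);          // reads bytes 0x000..0x593
$afterFirst = ftell($file);   // next byte to be read
fread($file, 0x1a8);          // reads bytes 0x594..0x73b
$afterSecond = ftell($file);
fclose($file);
unlink($path);

printf("0x%x 0x%x 0x%x\n", $start, $afterFirst, $afterSecond); // prints "0x0 0x594 0x73c"
```

So after the first call the pointer sits at 0x594 (not 0x593), and the second call covers 0x594 through 0x73b, leaving the pointer at 0x73c.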

Reading part of a remote page in PHP

I have a page at URL http://site.com/params. I want to read only the first n characters from this remote page. Functions like readfile(), file_get_contents() and curl seem to download whole of the pages. Can't figure out how to do this in PHP.
Please help...!
file_get_contents() may be what you're looking for if the maxlen parameter (named length since PHP 8) is used. By default, this function:
//Reads entire file into a string
$wholePage= file_get_contents('http://www.example.com/');
However, the maxlen parameter sets the maximum length of data read.
// Read at most 14 bytes starting from the beginning
$section = file_get_contents('http://www.example.com/', false, NULL, 0, 14);
This means that the entire file is not read; at most maxlen bytes are read if maxlen is defined.
You can also try doing it through sockets:
$handle = fopen("http://domain.com/params", "r");
$buffer = fgets($handle, 100);
echo $buffer;
fclose($handle);
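Note that fgets() stops at the first newline, so the snippet above may return fewer than the requested characters. A sketch that reads up to n bytes regardless of line breaks, using stream_get_contents() with a length limit. readFirstBytes is a hypothetical helper name, and opening a remote URL this way assumes allow_url_fopen is enabled:

```php
<?php
/**
 * Return at most $n bytes from the start of $url (a URL or a local path).
 * readFirstBytes is a hypothetical helper name, not a built-in function.
 */
function readFirstBytes(string $url, int $n): string
{
    $handle = fopen($url, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $url");
    }
    // Reads until $n bytes have arrived or the stream ends, ignoring newlines.
    $data = stream_get_contents($handle, $n);
    fclose($handle);
    if ($data === false) {
        throw new RuntimeException("Cannot read from $url");
    }
    return $data;
}
```

Usage would be something like readFirstBytes('http://site.com/params', 100).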

Only partial content from a remote file is being read (PHP)

I need some help with this code. I am pretty sure the code is correct but I could be wrong. The problem is that the getSourceCode() isn't pulling the entire contents of the URL. It only returns a third of the data, for example: the $size variable would return 26301 and the returned data size would only be 8900. I have changed php.ini to have max file size of 100M so I don't think that is problem.
private function getSourceCode($url){
    $fp = fopen($url, "r");
    $size = strlen(file_get_contents($url));
    $data = fread($fp, $size);
    fclose($fp);
    return $data;
}
Ok, well, if you're using file_get_contents you shouldn't be using fread too.
private function getSourceCode($url){
    return file_get_contents($url);
}
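Short answer: you can use $data = file_get_contents($url) to get the entire file as a string; there is no need to combine it with fread.
Long answer: strlen is not the problem here. In PHP it returns the number of bytes, not characters, so $size is correct. The likely cause is fread itself: on anything that is not a regular local file, such as a remote URL, a single fread call returns as soon as a packet is available, so it can hand back fewer bytes than requested. To read a remote stream with fread you have to call it in a loop until feof() reports the end. Note also that filesize() is not an alternative for http:// URLs, because the HTTP wrapper does not support stat().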
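If you do want to stay with fread on a remote stream, a partial read like the one in the question can be avoided by reading in a loop until the end of the stream. A minimal sketch; readAll and its 8192-byte default chunk size are assumptions for illustration:

```php
<?php
/**
 * Read everything that remains on an open stream.
 * readAll is a hypothetical helper name, not a built-in function.
 */
function readAll($handle, int $chunkSize = 8192): string
{
    $data = '';
    // A single fread() may return less than requested, so keep going until EOF.
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            throw new RuntimeException('Read failed');
        }
        $data .= $chunk;
    }
    return $data;
}
```

In getSourceCode() this would replace the single fread($fp, $size) call with readAll($fp), and the extra file_get_contents() download is no longer needed.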
