PHP: Amount of bytes fread - php

Say I read a number of bytes like this:
$data = fread($fp, 4096);
Since fread will stop reading if it reaches the end of the file, how can I know exactly how much was read? Would strlen($data) work? Or could that be potentially wrong?
What I'm trying to accomplish is to read a number of bytes and then go back to where I was before the read. I'm trying to avoid arithmetic on absolute positions (ftell, fread, ftell, subtract, fseek), since a file could be larger than PHP_INT_MAX and that could break the calculation. What I would like is to just do fseek($fp, -$bytes_read, SEEK_CUR), but for that I need to know how many bytes I just read...

After fread use ftell($fp) to get the current file position.

Check this (untested):
mb_strlen($data, '8bit')
The second argument '8bit' forces the function to return the number of bytes.
Found in comments at php manual for mb_strlen.
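A minimal sketch of the read-then-rewind idea from the question, assuming a seekable plain file. The helper name peek_bytes is made up for illustration. strlen() counts bytes in PHP strings; the '8bit' form of mb_strlen() matters mainly on older setups where mbstring.func_overload could change what strlen() returns (that setting was removed in PHP 8.0).

```php
<?php
// Hypothetical helper: read up to $length bytes, then restore the file position.
function peek_bytes($fp, int $length): string
{
    $data = fread($fp, $length);
    if ($data === false || $data === '') {
        return '';
    }
    // strlen() returns the byte count, which is how far the pointer moved.
    fseek($fp, -strlen($data), SEEK_CUR);
    return $data;
}

// Demo on a temporary file that contains a null byte.
$fp = tmpfile();
fwrite($fp, "abc\0def");
rewind($fp);

$peeked = peek_bytes($fp, 4096); // fewer than 4096 bytes are available
$again  = fread($fp, 4096);      // same bytes again, because the position was restored
fclose($fp);
```

The relative seek only uses the size of the chunk just read, so it never needs the absolute offset that ftell() would return.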

Related

What is the correct way to get the size of binary data in PHP?

I've read a part of file and now want to make sure the part is the right size. How can I do it in php?
$part = fread($file, 1024);
return some_function($part) == 1024;
I've read the examples, but I hesitate to use strlen because of null-terminated strings: the binary data in $part might contain null bytes. In that case, would strlen return only the size from the start of the part up to the first null byte?
As stated in the PHP manual, strlen returns the number of bytes in the string, not the character length.
In PHP, a null byte in a string does NOT count as the end of the string, and any null bytes are included in the length of the string.
So strlen can be used for binary data, no matter if the data is from a file or some other source.
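A short sketch of the check the question asks for. It uses an in-memory stream (php://memory) in place of a real file, and the test data is made up: 1024 bytes, half of which are null bytes.

```php
<?php
// 512 repetitions of a null byte followed by 0x01 give 1024 bytes in total.
$payload = str_repeat("\0\x01", 512);

$file = fopen('php://memory', 'w+b');
fwrite($file, $payload);
rewind($file);

$part = fread($file, 1024);
fclose($file);

// strlen() counts every byte, including the null bytes.
$isFullPart = ($part !== false && strlen($part) === 1024);
```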
According to PHP documentation of fread() function you may use the construction using filesize() as shown in the first example:
<?php
// get contents of a file into a string
$filename = "/usr/local/something.txt";
$handle = fopen($filename, "r");
$contents = fread($handle, filesize($filename));
fclose($handle);
?>
Update: to find the size of the file, you can use stat() without opening it, or fstat() on an already opened file.

Downloading a large file in PHP, max 8192 bytes?

I'm using the following code to download a large file (>100mb). The code is executed in a shell.
$fileHandle = fopen($url, 'rb');
$bytes = 100000;
while ($read = @fread($fileHandle, $bytes)) {
    debug(strlen($read));
    if (!file_put_contents($filePath, $read, FILE_APPEND)) {
        return false;
    }
}
Where I would expect that debug(strlen($read)) would output 100000, this is the actual output:
10627
8192
8192
8192
...
Why doesn't fread read more than 8192 bytes after the first time, and why does it read 10627 bytes on the first iteration?
This makes downloading the file very slow, is there a better way to do this?
The answer to your question is (quoting from the PHP docs for fread()):
if the stream is read buffered and it does not represent a plain file, at most one read of up to a number of bytes equal to the chunk size (usually 8192) is made; depending on the previously buffered data, the size of the returned data may be larger than the chunk size
The solution to your performance problem is to use stream_copy_to_stream(), which should be faster than block reading with fread(), and more memory efficient as well.
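A minimal sketch of that suggestion. The helper name copy_stream_to_file is made up, and the demo copies a local temporary file so it runs without network access; for the original case, $source would be the URL (which needs allow_url_fopen to be enabled).

```php
<?php
// Hypothetical helper: copy everything from $source into $destPath.
// Returns the number of bytes copied, or false on failure.
function copy_stream_to_file(string $source, string $destPath)
{
    $in = fopen($source, 'rb');
    if ($in === false) {
        return false;
    }
    $out = fopen($destPath, 'wb');
    if ($out === false) {
        fclose($in);
        return false;
    }
    // PHP moves the data internally; no userland read loop is needed.
    $copied = stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);
    return $copied;
}

// Demo with made-up data: 100000 bytes in a temporary source file.
$src = tempnam(sys_get_temp_dir(), 'src');
$dst = tempnam(sys_get_temp_dir(), 'dst');
file_put_contents($src, str_repeat('x', 100000));

$copied = copy_stream_to_file($src, $dst);
$copiedData = file_get_contents($dst);

unlink($src);
unlink($dst);
```

Whether this is actually faster for a given download depends on the network and the disk, so it is worth measuring.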
I checked the manual, and found this: http://php.net/manual/en/function.fread.php
"If the stream is read buffered and it does not represent a plain file, at most one read of up to a number of bytes equal to the chunk size (usually 8192) is made;"
Since you're opening a URL this is probably the case.
It doesn't explain the 10627 though...
Besides that, why do you expect 100000 byte reads to be faster than 8192?
I doubt that's your bottleneck. My guess is that either the download speed from the URL or the write speed of the hard disk is the problem.

fgets() and fread() - What is the difference?

I understand the differences between fgets() and fgetss() but I don't get the difference between fgets() and fread(), can someone please clarify this subject? Which one is faster? Thanks!
fgets reads a line -- i.e. it will stop at a newline.
fread reads raw data -- it will stop after a specified (or default) number of bytes, independently of any newline that might or might not be present.
Speed is not a reason to use one over the other, as those two functions just don't do the same thing:
If you want to read a line from a text file, then use fgets.
If you want to read some data (not necessarily a line) from a file, then use fread.
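The difference can be seen on a small in-memory stream; this is only an illustration, and the two-line sample text is made up.

```php
<?php
$fp = fopen('php://memory', 'w+b');
fwrite($fp, "first line\nsecond line\n");

rewind($fp);
$line = fgets($fp);      // stops after the first newline (newline included)

rewind($fp);
$chunk = fread($fp, 14); // takes 14 bytes and does not care about the newline

fclose($fp);
```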
fread() is meant for binary data, and its length argument caps how many bytes a single call can return:
$source_file = fopen( $filename, "r" ) or die("Couldn't open $filename");
while (!feof($source_file)) {
$buffer = fread($source_file, 5);
var_dump($buffer); // string of up to 5 bytes (shorter at the end of the file)
}
The number 5 is the maximum number of bytes to read per call.
The function fgets reads a single line from a text file. It keeps reading until the end of the current line (or the end of the file) is reached. Therefore, if you would like to read one line from a text file, you should use fgets.
The function fread ignores line endings: it reads as many bytes as specified by its length parameter [e.g. fread($handle, 1024)], or up to the end of the file if fewer remain. The length argument is required, so to read a complete file you pass its size [e.g. fread($handle, filesize($filename))]. So, if you would like to read a fixed-size block or a complete file, whether it is text or arbitrary raw data, you should use fread.
Both functions are used to read data from files.
fgets($handle, $bytes)
fgets reads at most $bytes-1 bytes and stops at a newline or EOF (end-of-file), whichever comes first. If the length is not specified, it keeps reading until the end of the line (very old PHP versions assumed 1024 bytes).
fread($handle, $bytes)
fread reads up to $bytes bytes and does not stop at newlines; it returns fewer bytes only at EOF or, for streams that are not plain files, when a single chunk has been read.
The accepted answer is correct, but there is one more case in which fread stops reading. For buffered streams that are not plain files, fread reads at most one chunk (usually 8192 bytes) per call. I discovered this when I was getting different results from fread($stream, 8300) and fgets($stream, 8300).
From fread docs:
if the stream is read buffered and it does not represent a plain file, at most one read of up to a number of bytes equal to the chunk size (usually 8192) is made; depending on the previously buffered data, the size of the returned data may be larger than the chunk size.
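If a fixed number of bytes is needed from such a stream, one common workaround is to call fread() in a loop. The sketch below assumes a blocking stream; the helper name read_exact is made up, and the demo uses php://memory, which will not show the short reads itself but exercises the loop and the EOF case.

```php
<?php
// Hypothetical helper: keep reading until $length bytes are collected or EOF is hit.
function read_exact($stream, int $length): string
{
    $data = '';
    while (strlen($data) < $length && !feof($stream)) {
        $chunk = fread($stream, $length - strlen($data));
        if ($chunk === false || $chunk === '') {
            break; // error or nothing available; stop instead of spinning
        }
        $data .= $chunk;
    }
    return $data;
}

// Demo with made-up data: 10000 bytes, read in two requests of 8300.
$stream = fopen('php://memory', 'w+b');
fwrite($stream, str_repeat('a', 10000));
rewind($stream);

$first  = read_exact($stream, 8300); // full 8300 bytes
$second = read_exact($stream, 8300); // only the remaining 1700 bytes
fclose($stream);
```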

Is it better to use fseek() fread() on individual lines, or fread() the entire file and substr to parse?

To make this more clear, I'm going to put code samples:
$file = fopen('filename.ext', 'rb');
// Assume $pos has been declared
// method 1
fseek($file, $pos);
$parsed = fread($file, 2);
// method 2
while (!feof($file)) {
$data = fread($file, 1000000);
}
$data = bin2hex($data);
$parsed = substr($data, $pos, 2);
fclose($file);
There are about 40 fread() in method 1 (with maybe 15 fseek()) vs 1 fread() in method 2. The only thing I am wondering is if loading in 1000000 bytes is overkill when you're really only extracting maybe 100 total bytes (all relatively close together in the middle of the file).
So which code is going to perform better? Which code makes more sense to use? A quick explanation would be greatly appreciated.
If you already know the offset you are looking for, fseek is the best method here, as there is no reason to load the whole file into memory if you only need a few bytes of it. The first method is better because you skip right to what you want in the file stream and read out a small portion. The second method requires you to read the entire file into memory, then seek through that while you could have just read it straight from the file. Hope this answers your question
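A minimal sketch of the seek-then-read approach, with a made-up helper name (read_at) and made-up sample bytes. Note that it converts to hex only after extracting the bytes: bin2hex() produces two characters per byte, so applying substr() to the hex string with a byte offset, as in method 2 of the question, would point at a different position.

```php
<?php
// Hypothetical helper: return up to $length bytes starting at byte offset $pos.
function read_at($file, int $pos, int $length): string
{
    if (fseek($file, $pos) !== 0) {
        return ''; // seek failed
    }
    $bytes = fread($file, $length);
    return $bytes === false ? '' : $bytes;
}

$file = tmpfile();
fwrite($file, "\x00\x11\x22\x33\x44\x55");

// Read two bytes at offset 2, then convert just those bytes to hex.
$parsed = bin2hex(read_at($file, 2, 2));
fclose($file);
```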
Files are read in units of clusters, and a cluster is usually something like 8 kb. Usually a few clusters are read ahead.
So, if the file is only a few kb there is very little to gain by using fseek compared to reading the entire file. The file system will read the entire file anyway.
If the file is considerably larger, as in your case, only a few of the clusters have to be read, so the first method should perform better. At worst all the data will still be read from the disk, but your application will still use less memory.
It seems that seeking to the position you want and then reading only the bytes you need is the best approach.
But the correct answer is (as always) to test it for real instead of guessing. Run your two examples in your server environment and make some time measurements. Also check memory usage. Then make your optimization once you have some hard data to back it up.
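One rough way to take such measurements is sketched below. The 1 MB test file and the marker bytes are made up, and a single run like this is only indicative: file system caching and PHP's own startup memory will affect the numbers.

```php
<?php
// Build a temporary 1 MB file with two marker bytes in the middle.
$path = tempnam(sys_get_temp_dir(), 'bench');
$pos = 500000;
file_put_contents($path, str_repeat('.', $pos) . 'HI' . str_repeat('.', 499998));

// Method 1: seek to the offset and read two bytes.
$start = microtime(true);
$file = fopen($path, 'rb');
fseek($file, $pos);
$viaSeek = fread($file, 2);
fclose($file);
$seekSeconds = microtime(true) - $start;

// Method 2: load the whole file into memory, then slice it.
$start = microtime(true);
$viaSubstr = substr(file_get_contents($path), $pos, 2);
$loadSeconds = microtime(true) - $start;

unlink($path);

printf(
    "seek: %.6fs, full load: %.6fs, peak memory: %d bytes\n",
    $seekSeconds,
    $loadSeconds,
    memory_get_peak_usage()
);
```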

PHP fseek() equivalent for variables?

What I need is an equivalent for PHP's fseek() function. The function works on files, but I have a variable that contains binary data and I want to work on it. I know I could use substr(), but that would be lame - it's used for strings, not for binary data. Also, creating a file and then using fseek() is not what I am looking for either.
Maybe something constructed with streams?
EDIT: Okay, I'm almost there:
$data = fopen('data://application/binary;binary,'.$bin,'rb');
Warning: failed to open stream: rfc2397: illegal parameter
Kai:
You have almost answered yourself here. Streams are the answer. The following manual entry will be enlightening: http://us.php.net/manual/en/wrappers.data.php
It essentially allows you to pass arbitrary data to PHP's file handling functions such as fopen (and thus fseek).
Then you could do something like:
<?php
$data = fopen('data://mime/type;encoding,' . $binaryData, 'rb');
fseek($data, 128);
?>
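As an alternative to the data:// wrapper (a swapped-in technique, not what the answer above uses), the php://memory stream can hold the contents of a variable and supports fseek(). A minimal sketch with made-up sample bytes:

```php
<?php
$binaryData = "\x00\x01\x02\x03\x04\x05\x06\x07";

// Copy the variable into an in-memory stream, opened for reading and writing.
$stream = fopen('php://memory', 'w+b');
fwrite($stream, $binaryData);

// From here on the usual file functions work on it.
fseek($stream, 4);
$bytes = fread($stream, 2);
fclose($stream);
```

This avoids having to encode the data into a URI, at the cost of holding a second copy of it in memory.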
fseek on data in a variable doesn't make sense. fseek just positions the file handle at the specified offset, so that the next fread call starts reading from that offset. There is no fread equivalent for strings, and none is needed.
What's wrong with substr()?
With a file you would do:
$f = fopen(...);
fseek($f, $offset);
$x = fread($f, $len);
with substr:
$x = substr($var, $offset, $len);
I'm guessing, but maybe what is being asked for is a way to access bytes in a variable through a pointer (treating it like an array of bytes, as you could in C, without the memory overhead of putting the data in PHP arrays) and to edit them in place without the overhead of copying the data.
Not being able to do this is a BIG problem, but if the operating system caches disk data well, using fseek on a temporary file could be a workaround.
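For what it is worth, PHP strings can be indexed by byte offset, for reading and for overwriting single bytes. The sketch below uses made-up sample bytes; whether PHP copies the string internally on write depends on its copy-on-write handling, so the "no copy" part should be treated as an assumption rather than a guarantee.

```php
<?php
$data = "\x10\x20\x30\x40";

$third = ord($data[2]); // read the byte at offset 2
$data[2] = "\xFF";      // overwrite that byte in the same variable

$length = strlen($data); // the length is unchanged
```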
