Only partial content from a remote file is being read (PHP)

I need some help with this code. I am pretty sure the code is correct, but I could be wrong. The problem is that getSourceCode() isn't pulling the entire contents of the URL. It only returns about a third of the data: for example, $size would be 26301 while the returned data would only be 8900 bytes. I have changed php.ini to allow a max file size of 100M, so I don't think that is the problem.
private function getSourceCode($url) {
    $fp = fopen($url, "r");
    $size = strlen(file_get_contents($url));
    $data = fread($fp, $size);
    fclose($fp);
    return $data;
}

Ok, well, if you're using file_get_contents you shouldn't be using fread too.
private function getSourceCode($url) {
    return file_get_contents($url);
}

Short answer: one byte != one character in a string. You can use $data = file_get_contents($url) to get the entire file as a string.
Long answer: fread is asked for a number of bytes, but strlen here returns the number of characters, and a character can be larger than one byte, so you can end up not getting the entire file. Alternatively, you could use filesize() to get the file length in bytes instead of strlen().
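If you want to stick with fopen()/fread(), one common pattern (not from the answer above, just a sketch) is to read in a loop until feof(), since a single fread() on a network stream can return fewer bytes than requested:
// Sketch: read the whole remote resource with fread() in a loop.
// A single fread() on a network stream may return fewer bytes than
// requested, so keep appending until EOF.
private function getSourceCode($url) {
    $fp = fopen($url, "r");
    if ($fp === false) {
        return false; // could not open the URL
    }
    $data = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        if ($chunk === false) {
            break; // read error
        }
        $data .= $chunk;
    }
    fclose($fp);
    return $data;
}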

Related

What's the correct way to get the size of binary data in PHP?

I've read a part of a file and now want to make sure the part is the right size. How can I do it in PHP?
$part = fread($file, 1024);
return some_function($part) == 1024;
I've read the examples, but I'm hesitant to use strlen because of null bytes that might be inside the binary data in $part. My worry is that strlen would then only return the size from the start of the part up to the first null byte in the data.
As stated in the PHP manual, strlen returns the number of bytes in the string, not the character length.
In PHP, a null byte in a string does NOT count as the end of the string, and any null bytes are included in the length of the string.
So strlen can be used for binary data, no matter if the data is from a file or some other source.
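A quick illustration of that (just a sketch):
$part = "ab\0cd";         // binary string containing a null byte
var_dump(strlen($part));  // int(5) - the null byte is counted, nothing is cut off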
According to the PHP documentation for fread(), you can use filesize(), as shown in the manual's first example:
<?php
// get contents of a file into a string
$filename = "/usr/local/something.txt";
$handle = fopen($filename, "r");
$contents = fread($handle, filesize($filename));
fclose($handle);
?>
Update: to find the size of a file, you can use stat() without opening it, or fstat() on an already opened handle.
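For example (a minimal sketch):
// Size without opening the file:
$info = stat("/usr/local/something.txt");
echo $info['size'];

// Size of an already opened handle:
$handle = fopen("/usr/local/something.txt", "r");
$stat = fstat($handle);
echo $stat['size'];
fclose($handle);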

Split a raw binary file without converting it to an array (PHP)

I am using the following code to convert a binary file into an array.
$handle = fopen($file, "r");
$contents = fread($handle,filesize($file));
$array = unpack("s*", $contents);
I want to be able to read it in chunks and send multiple separate requests to process it in parallel.
For example, I want to grab first 16000 bytes, then next 16000 etc.
So I would end up with multiple sets of data to process in parallel
$content1 = first 16000 bytes
$content2 = bytes from 16000 to 32000
$content3 = bytes from 32000 to 48000
I think this is pretty simple; I am just not sure how it can be done.
A simple way would be to use substr() to split out chunks until it runs out of something to process...
$start = 0;
$size = 16000;
$contents = file_get_contents($file);
while ($chunk = substr($contents, $start, $size)) {
    // Process
    echo ">" . $chunk . "<" . PHP_EOL;
    $start += $size;
}
Another way would be to convert it to an array; to split the string into chunks you can use str_split():
$contents = file_get_contents($file);
$chunks = str_split($contents, 16000);
file_get_contents() does the open/read/close in one go, and str_split() then splits the result into an array of chunks of the size you want (16000 in this case).
Not sure how much performance gain you will get by this, but that is something you will have to test for yourself.
(Also check the notes on the manual page in case you are using multi-byte encoded files).
You could use multi-threading in PHP; see
http://php.net/manual/en/intro.pthreads.php
and
Does PHP have threading?
Given that the OP has accepted Nigel's answer, the question was actually how to read arbitrary chunks from a file. That can be done with a slight variation of the original code. Instead of reading the complete file contents:
fread($handle, filesize($file));
^^^^^^^^^^^^^^^
… you pass your chunk size as second argument:
$contents = fread($handle, 16000);
Prior to that, you move to the desired location:
// E.g. Read 4th chunk:
fseek($handle, 3 * 16000);
Full stuff:
$handle = fopen($file, "r");
fseek($handle, 3 * 16000);
$contents = fread($handle, 16000);
Add some error checking and you're done. These are really old functions very close to the C implementation so they should be pretty fast and require very little memory.
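Wrapped into a small helper (the function name readChunk is just a hypothetical example, not from the answer):
// Hypothetical helper: read the $index-th chunk of $chunkSize bytes from $file.
function readChunk($file, $index, $chunkSize = 16000) {
    $handle = fopen($file, "rb");
    if ($handle === false) {
        return false;
    }
    fseek($handle, $index * $chunkSize);
    $contents = fread($handle, $chunkSize);
    fclose($handle);
    return $contents;
}

// E.g. the 4th chunk (index 3):
$chunk = readChunk($file, 3);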

byte position: file_get_contents vs fopen

I need some data from a specific byte range in a binary file.
(concatenated jpegs, don't ask...)
So I have an offset and a length from an external API.
(I would guess that those are byte positions.)
What works is the following:
$fileData = file_get_contents($binaryFile);
$imageData = substr($fileData, $offset, $length);
But I would rather not load the full file into memory and therefore tried fopen:
$handle = fopen($binaryFile, 'rb');
fseek($handle, $offset);
$imageData = fgets($handle, $length);
But that doesn't work; the data chunk is not valid image data.
So I assume I got the position wrong with fopen.
Any ideas on how the positions differ between substr and fopen?
You wrote:
The data chunk is not valid image data
"image data" - but in your code you call fgets() to read that data. That's wrong, as an image is binary data, not a text file, so you don't want to read it by lines (docs):
fgets — Gets line from file pointer
This means fgets() stops reading from the file once it finds what it considers a line-end marker, which usually means stopping early and reading less than $length, since there's a pretty good chance such a byte appears somewhere in a binary sequence.
So fgets() is the wrong method to use, and this is the main issue. Instead you should pick the less smart fread(), which does not know about lines and just reads what you tell it to. Finally, you should fclose() the handle when you are done. And naturally you should always check for errors, starting with fopen():
if ($handle = fopen($binaryFile, 'rb')) {
    if (fseek($handle, $offset) === 0) {
        $imageData = fread($handle, $length);
        if ($imageData === false) {
            // error handling - failed to read the data
        }
    } else {
        // error handling - seek failed
    }
    fclose($handle);
} else {
    // error handling - can't open file
}
So always use the right tool for the task, and if you are unsure what a given method/function does, the documentation is always worth a look.
You can use file_get_contents, too. See this simple line:
$imageData = file_get_contents($binaryFile, false, null, $offset, $length);
And here is the documentation of file_get_contents().
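If you want a quick sanity check on the extracted bytes (a sketch; getimagesizefromstring() is only used here to verify that the result looks like an image):
if ($imageData === false || getimagesizefromstring($imageData) === false) {
    // error handling - wrong offset/length, or the file could not be read
}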

Reading part of a remote page in PHP

I have a page at URL http://site.com/params. I want to read only the first n characters from this remote page. Functions like readfile(), file_get_contents() and curl seem to download the whole page. I can't figure out how to do this in PHP.
Please help...!
file_get_contents() may be what you're looking for if the maxlen parameter is utilized. By default, this function:
//Reads entire file into a string
$wholePage= file_get_contents('http://www.example.com/');
However, the maxlen parameter is the maximum length of data read.
// Read 14 characters starting from the 1st character
$section = file_get_contents('http://www.example.com/', NULL, NULL, 0, 14);
This implies that the entire file is not read, and only maxlen characters are read, if maxlen is defined.
You can try to do it through sockets, like this:
$handle = fopen("http://domain.com/params", "r");
$buffer = fgets($handle, 100);
echo $buffer;
fclose($handle);
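Note that fgets() stops at the first newline it encounters, so on an HTML page it may return far fewer than 99 bytes. If you want the first n bytes regardless of newlines, fread() in a loop is a closer fit (a sketch, not part of the original answer):
$handle = fopen("http://domain.com/params", "r");
$buffer = '';
// A single fread() on a network stream may return fewer bytes than asked for,
// so keep reading until we have 100 bytes or hit EOF.
while (strlen($buffer) < 100 && !feof($handle)) {
    $buffer .= fread($handle, 100 - strlen($buffer));
}
fclose($handle);
echo $buffer;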

fgets() and fread() - What is the difference?

I understand the differences between fgets() and fgetss(), but I don't get the difference between fgets() and fread(). Can someone please clarify this? Which one is faster? Thanks!
fgets reads a line -- i.e. it will stop at a newline.
fread reads raw data -- it will stop after a specified (or default) number of bytes, independently of any newline that might or might not be present.
Speed is not a reason to use one over the other, as those two functions just don't do the same thing:
If you want to read a line from a text file, then use fgets.
If you want to read some data (not necessarily a line) from a file, then use fread.
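A small side-by-side sketch (assuming a file lines.txt whose first line is "first line\n"):
$handle = fopen("lines.txt", "r");
$line = fgets($handle);      // "first line\n" - stops at the newline
rewind($handle);
$bytes = fread($handle, 5);  // "first" - exactly 5 bytes, newlines are irrelevant
fclose($handle);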
fread() is for binary data, and fread lets you specify how many bytes to read:
$source_file = fopen($filename, "r") or die("Couldn't open $filename");
while (!feof($source_file)) {
    $buffer = fread($source_file, 5);
    var_dump($buffer); // returns a string of up to 5 bytes
}
The number 5 is the number of bytes to read.
The function fgets reads a single line from a text file. It keeps reading until the end of the current line (or the end of the file) is reached. Therefore, if you would like to read one line from a text file, you should use fgets.
The function fread reads not only until the end of the line but up to the end of the file [e.g. fread($handle, filesize($filename))] or as many bytes as specified by the parameter [e.g. fread($handle, 1024)]. So, if you would like to read a complete file, no matter whether it is a text file with all its lines or arbitrary raw data, you should use fread.
Both functions are used to read data from files.
fgets($handle, $bytes)
fgets reads at most $bytes - 1 bytes of data and stops at a newline or at EOF (end of file), whichever comes first. If no length is specified, it keeps reading until the end of the line.
fread($handle, $bytes)
fread reads up to $bytes bytes of data and stops only at EOF.
The accepted answer is correct, but there is one more case in which fread stops reading. fread has a chunk limit of 8192 bytes. I discovered this when I was getting different results from fread($stream, 8300) and fgets($stream, 8300).
From the fread docs:
if the stream is read buffered and it does not represent a plain file, at most one read of up to a number of bytes equal to the chunk size (usually 8192) is made; depending on the previously buffered data, the size of the returned data may be larger than the chunk size.
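In practice, to reliably get N bytes from such a stream you either loop over fread() or use stream_get_contents(), which loops internally (a sketch, assuming $stream is a read-buffered network stream):
$wanted = 8300;
$data = '';
// fread() may return after a single ~8192-byte chunk on a network stream,
// so accumulate until the requested length (or EOF) is reached.
while (strlen($data) < $wanted && !feof($stream)) {
    $data .= fread($stream, $wanted - strlen($data));
}

// Alternative: stream_get_contents() does the looping for you.
// $data = stream_get_contents($stream, $wanted);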
