I am writing an application that can stream videos. It requires the filesize of the video, so I use this code:
$filesize = sprintf("%u", filesize($file));
However, when streaming a six gig movie, it fails.
Is it possible to get a bigger integer value in PHP? I don't care if I have to use third-party libraries or if it is slow; all I care about is that it gets the file size properly.
FYI, $filesize is currently 3017575487, which is very far from 6000000000, the roughly correct value.
I am running PHP on a 64 bit operating system.
Thanks for any suggestions!
The issue here is two-fold.
Problem 1
The filesize function returns a signed integer, with a maximum value of PHP_INT_MAX. On 32-bit PHP, this value is 2147483647, or about 2GB. On 64-bit PHP you can go higher, up to 9223372036854775807. Based on the comments from the PHP filesize page, I created a function that uses an fseek loop to find the size of the file and returns it as a float, which can count higher than a 32-bit unsigned integer.
function filesize_float($filename)
{
    $f = fopen($filename, 'r');
    $p = 0;
    $b = 1073741824;
    fseek($f, 0, SEEK_SET);
    while($b > 1)
    {
        fseek($f, $b, SEEK_CUR);
        if(fgetc($f) === false)
        {
            fseek($f, -$b, SEEK_CUR);
            $b = (int)($b / 2);
        }
        else
        {
            fseek($f, -1, SEEK_CUR);
            $p += $b;
        }
    }
    while(fgetc($f) !== false)
    {
        ++$p;
    }
    fclose($f);
    return $p;
}
To get the file size of the file as a float using the above function, you would call it like this.
$filesize = filesize_float($file);
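Whether the limit described in Problem 1 applies depends on the PHP build rather than on the operating system alone. As far as I know, 64-bit Windows builds before PHP 7 still used 32-bit integers. A minimal sketch for checking this, where the helper name php_build_is_64bit is only illustrative:

```php
<?php
// PHP_INT_SIZE is the width of PHP's integer type in bytes:
// 4 on builds with 32-bit integers, 8 on builds with 64-bit integers.
function php_build_is_64bit(): bool
{
    return PHP_INT_SIZE === 8;
}

printf("PHP_INT_SIZE = %d bytes, PHP_INT_MAX = %d\n", PHP_INT_SIZE, PHP_INT_MAX);
echo php_build_is_64bit()
    ? "64-bit integers: filesize() can represent sizes above 2 GB\n"
    : "32-bit integers: filesize() may overflow above 2 GB\n";
```

If this reports 32-bit integers on a 64-bit OS, a workaround such as filesize_float() is still needed.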
Problem 2
Using %u in the sprintf function will cause it to interpret the argument as an unsigned integer, thus limiting the maximum possible value to 4294967295 on 32-bit PHP, before overflowing. Therefore, if we were to do the following, it would return the wrong number.
sprintf("%u", filesize_float($file));
You could interpret the value as a float using %F, as follows, but the result will have trailing decimals.
sprintf("%F", filesize_float($file));
For example, the above will return something like 6442450944.000000, rather than 6442450944.
A workaround would be to have sprintf interpret the float as a string, and let PHP cast the float to a string.
$filesize = sprintf("%s", filesize_float($file));
This will set $filesize to the value of something like 6442450944, without trailing decimals.
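The formats discussed above can be compared side by side. This is only a sketch: the 6 GiB value is a stand-in for what filesize_float() would return, and format_size_float is a hypothetical helper. It uses "%.0F", which is not mentioned above; it prints the float with zero decimals and does not depend on PHP's float-to-string cast, which switches to exponent notation for very large values.

```php
<?php
// Hypothetical size of a 6 GiB file, held as a float.
$size = 6442450944.0;

echo sprintf("%F", $size), "\n";   // fixed-point with six decimals
echo sprintf("%s", $size), "\n";   // relies on PHP's float-to-string cast
echo sprintf("%.0F", $size), "\n"; // fixed-point with zero decimals

// Illustrative helper, not part of the original answer.
function format_size_float(float $size): string
{
    return sprintf("%.0F", $size);
}
```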
The Final Solution
If you add the filesize_float function above to your code, you can simply use the following line of code to read the actual file size into the sprintf statement.
$filesize = sprintf("%s", filesize_float($file));
As per the PHP documentation, on 64-bit platforms this seems quite reliable for getting the size of files > 4GB:
<?php
$a = fopen($filename, 'r');
fseek($a, 0, SEEK_END);
$filesize = ftell($a);
fclose($a);
?>
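The snippet above does not check for failures. A slightly more defensive sketch of the same seek-and-tell idea might look like this; the function name size_by_seek and the choice to return null on failure are assumptions, not part of the original answer:

```php
<?php
// Returns the size in bytes, or null if the file cannot be opened or seeked.
function size_by_seek(string $path): ?int
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return null;
    }
    // fseek() returns 0 on success and -1 on failure.
    $ok = fseek($handle, 0, SEEK_END) === 0;
    $pos = $ok ? ftell($handle) : false;
    fclose($handle);
    return $pos === false ? null : $pos;
}
```

On a build with 32-bit integers the returned position can still overflow for large files, so this only helps where PHP's integers are 64 bits wide.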
Related
I am trying to get the file size of files > 2 GB, but PHP seems to have problems regardless of whether it is the 64-bit or the 32-bit version. With 64-bit PHP running on a 64-bit processor and a 64-bit OS, the filesize() function still returns the wrong size. Previously I have seen it return a negative number, or nothing at all. The number changes when the file size changes, but it is not accurate for files > 2 GB. I could understand a number less than the actual size, zero, or even a negative number if PHP were using 32-bit integers, but from what I have read, 64-bit PHP is supposed to support file sizes > 2 GB.
I have also tried using fseek to end and ftell like:
$a = fopen("c:\big.txt", 'r');
fseek($a,0,SEEK_END);
$fs = ftell($a);
fclose($a);
echo $fs;
But that just gives 0...
Why not just use the filesize function?
echo filesize('c:\big.txt');
This function may not work properly on 32-bit systems, but should give you the correct size on a 64-bit system:
Note: Because PHP's integer type is signed and many platforms use 32bit integers, some filesystem functions may return unexpected results for files which are larger than 2GB.
The manual actually says "This may be broken on your platform. Oh, too bad."
This comment may help: http://us3.php.net/manual/en/function.filesize.php#113457 . It is something similar to what you tried, but it seems to assume SEEK_END doesn't work either.
function RealFileSize($fp)
{
$pos = 0;
$size = 1073741824;
fseek($fp, 0, SEEK_SET);
while ($size > 1)
{
fseek($fp, $size, SEEK_CUR);
if (fgetc($fp) === false)
{
fseek($fp, -$size, SEEK_CUR);
$size = (int)($size / 2);
}
else
{
fseek($fp, -1, SEEK_CUR);
$pos += $size;
}
}
while (fgetc($fp) !== false) $pos++;
return $pos;
}
Disclaimer: I didn't try this.
I have some text files that are very large - 100MB each that contain a single-line string (just 1 line). I want to extract the last xx bytes / characters from each of them. I know how to do this by reading them in a string and then searching by strpos() or substr() but that would require a large chunk of the RAM which isn't desirable for such a small action.
Is there any other way I can just extract, say, the last 50 bytes / characters of the text file in PHP before executing the search?
Thank you!
You can use fseek:
$fp = fopen('somefile.txt', 'r');
fseek($fp, -50, SEEK_END); // It needs to be negative
$data = fread($fp, 50); // fgets($fp, 50) would return at most 49 bytes
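A negative seek from SEEK_END fails if the file is shorter than the requested number of bytes. The sketch below clamps the offset instead; the function name read_tail is hypothetical:

```php
<?php
// Returns up to the last $count bytes of a file without loading the rest.
function read_tail(string $path, int $count): string
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $size = fstat($fp)['size'];
    // Seeking before the start of the file fails, so clamp the offset at 0.
    $offset = max(0, $size - $count);
    fseek($fp, $offset, SEEK_SET);
    $data = stream_get_contents($fp);
    fclose($fp);
    return $data === false ? '' : $data;
}
```

For example, read_tail('somefile.txt', 50) would return the last 50 bytes, or the whole file if it is shorter than that.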
You can do this with file_get_contents by playing with the fourth parameter offset.
PHP 7.1.0 onward:
In PHP 7.1.0 the fourth parameter offset can be negative.
// only negative seek if it "lands" inside the file or false will be returned
if (filesize($filename) > 50) {
$data = file_get_contents($filename, false, null, -50);
}
else {
$data = file_get_contents($filename);
}
Pre PHP 7.1.0:
$fsz = filesize($filename);
// only negative seek if it "lands" inside the file or false will be returned
if ($fsz > 50) {
$data = file_get_contents($filename, false, null, $fsz - 50);
}
else {
$data = file_get_contents($filename);
}
I have a binary file that is all 8-bit integers. I have tried to use PHP's unpack() function, but I can't get any of the format codes to work for 1-byte integers. I have tried to combine the data with a dummy byte so that I can use the 'n'/'v' codes. I am working on a Windows machine. Ultimately I would like a function that returns an array of integers from a string of 8-bit binary integers. The code I have tried is below:
$dat_handle = "intergers.dat";
$dat_file = fopen($dat_handle, "rb");
$dat_data = fread($dat_file, 1);
$dummy = decbin(0);
$combined = $dummy.$dat_data;
$result = unpack("n", $combined);
What you're looking for is the char datatype. There are two versions of it: signed (lowercase c) and unsigned (uppercase C). Just use the one that's correct for your data.
<?php
$byte = unpack('c', $byte);
?>
Also, if the data file is just a bunch of bytes and nothing else, and you know its length, you can do this (here the length is 16 signed chars in a row).
<?php
$bytes = unpack('c16', $byte);
?>
If you don't know how many bytes will be in the file, but you know it contains only bytes, you can use the asterisk repeater to read until the end of the data.
<?php
$bytes = unpack('c*', $byte);
?>
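To make the difference between the two codes concrete, here is a small sketch using two sample bytes. The helper bytes_to_ints is hypothetical and assumes the file fits in memory:

```php
<?php
// Two sample bytes: 0x01 and 0xFF.
$raw = "\x01\xFF";

$signed   = unpack('c*', $raw); // signed chars
$unsigned = unpack('C*', $raw); // unsigned chars

// unpack() returns an array indexed from 1; array_values() re-indexes from 0.
$signedList   = array_values($signed);   // [1, -1]
$unsignedList = array_values($unsigned); // [1, 255]

// Illustrative helper: read a whole file into a list of unsigned 8-bit integers.
function bytes_to_ints(string $path): array
{
    $data = file_get_contents($path);
    if ($data === false || $data === '') {
        return [];
    }
    return array_values(unpack('C*', $data));
}
```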
The following should do what you want (ord):
$dat_handle = "intergers.dat";
$dat_file = fopen($dat_handle, "rb");
$dat_data = ord(fread($dat_file, 1));
What you are trying to do is retrieve the integer value of a single byte. Because you are reading one byte at a time, you will always have exactly one character, and ord returns that byte's value as an integer from 0 to 255.
I want to check the size of files on local drives on Windows. However, the native PHP function filesize() only works when the file is smaller than 2GB; for files larger than 2GB it returns the wrong number. Is there another way to get the size of a file larger than 2GB?
Thank you very much!!
You can always use the system's file size method.
For Windows:
Windows command for file size only?
@echo off
echo %~z1
For Linux
stat -c %s filename
You would run these through PHP's exec() function.
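As a rough sketch of the Linux variant, assuming GNU coreutils is installed (the -c option is not available in the BSD/macOS stat) and that exec() is enabled; the function name size_via_stat is made up for this example:

```php
<?php
// Returns the size as a numeric string, or null if the command fails.
// Assumes GNU stat; the size stays a string so it cannot overflow a 32-bit int.
function size_via_stat(string $path): ?string
{
    $output = [];
    $status = 0;
    exec('stat -c %s ' . escapeshellarg($path) . ' 2>/dev/null', $output, $status);
    if ($status !== 0 || !isset($output[0]) || !ctype_digit($output[0])) {
        return null;
    }
    return $output[0];
}
```

escapeshellarg() is used because the path ends up on a shell command line.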
PHP function to get the file size of a local file with insignificant memory usage:
function get_file_size ($file) {
    $fp = @fopen($file, "r");
    @fseek($fp, 0, SEEK_END);
    $filesize = @ftell($fp);
    fclose($fp);
    return $filesize;
}
In the first line of code, $file is opened in read-only mode and attached to the $fp handle.
In the second line, the pointer is moved with fseek() to the end of $file.
Lastly, ftell() returns the byte position of the pointer in $file, which is now the end of it.
The fopen() function is binary-safe and is appropriate for use even with very large files.
The above code is also very fast.
This function works for any size:
function fsize($file) {
// filesize will only return the lower 32 bits of
// the file's size! Make it unsigned.
$fmod = filesize($file);
if ($fmod < 0) $fmod += 2.0 * (PHP_INT_MAX + 1);
// find the upper 32 bits
$i = 0;
$myfile = fopen($file, "r");
// feof has undefined behaviour for big files.
// after we hit the eof with fseek,
// fread may not be able to detect the eof,
// but it also can't read bytes, so use it as an
// indicator.
while (strlen(fread($myfile, 1)) === 1) {
fseek($myfile, PHP_INT_MAX, SEEK_CUR);
$i++;
}
fclose($myfile);
// $i is a multiplier for PHP_INT_MAX byte blocks.
// return to the last multiple of 4, as filesize has modulo of 4 GB (lower 32 bits)
if ($i % 2 == 1) $i--;
// add the lower 32 bit to our PHP_INT_MAX multiplier
return ((float)($i) * (PHP_INT_MAX + 1)) + $fmod;
}
Note: this function may be a little slow for files > 2GB.
(Taken from the PHP manual comments.)
If you're running a Linux server, use the system command.
$last_line = system('ls');
Is an example of how it is used. If you replace 'ls' with:
du <filename>
then the last line of the command's output, which starts with the file's size, will be stored as a string in the variable $last_line. For example:
472 myProgram.exe
means it's 472 KB. You can use regular expressions to obtain just the number. I haven't used the du command that much, so you'd want to play around with it and have a look at what the output is for files > 2gb.
http://php.net/manual/en/function.system.php
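Extracting the number with a regular expression could look roughly like this. The function name parse_du_line is hypothetical, and the sketch assumes the default du output of a block count, whitespace, then the file name. Note that du reports disk usage; GNU du also has a -b option that prints the apparent size in bytes.

```php
<?php
// Parses one line of `du` output such as "472\tmyProgram.exe".
// Returns the leading number as an int, or null if the line does not match.
function parse_du_line(string $line): ?int
{
    if (preg_match('/^(\d+)\s+/', $line, $m) !== 1) {
        return null;
    }
    return (int) $m[1];
}

$kilobytes = parse_du_line("472\tmyProgram.exe"); // 472
```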
<?php
$files = `find / -type f -size +2097152`;
?>
This function returns the size for files > 2GB and is quite fast.
function file_get_size($file) {
//open file
$fh = fopen($file, "r");
//declare some variables
$size = "0";
$char = "";
//set file pointer to 0; I'm a little bit paranoid, you can remove this
fseek($fh, 0, SEEK_SET);
//set multiplicator to zero
$count = 0;
while (true) {
//jump 1 MB forward in file
fseek($fh, 1048576, SEEK_CUR);
//check if we actually left the file
if (($char = fgetc($fh)) !== false) {
//if not, go on
$count ++;
} else {
//else jump back where we were before leaving and exit loop
fseek($fh, -1048576, SEEK_CUR);
break;
}
}
//we could make $count jumps, so the file is at least $count * 1.000001 MB large
//1048577 because we jump 1 MB and fgetc goes 1 B forward too
$size = bcmul("1048577", $count);
//now count the last few bytes; they're always less than 1048576 so it's quite fast
$fine = 0;
while(false !== ($char = fgetc($fh))) {
$fine ++;
}
//and add them
$size = bcadd($size, $fine);
fclose($fh);
return $size;
}
To riff on joshhendo's answer, if you're on a Unix-like OS (Linux, OSX, macOS, etc) you can cheat a little using ls:
$fileSize = trim(shell_exec("ls -nl " . escapeshellarg($fullPathToFile) . " | awk '{print $5}'"));
trim() is there to remove the trailing newline. What's left is a string containing the full size of the file on disk, regardless of size or stat cache status, with no human formatting such as commas.
Just be careful where the data in $fullPathToFile comes from...when making system calls you don't want to trust user-supplied data. The escapeshellarg will probably protect you, but better safe than sorry.
Does anyone know of one? I need to test some upload/download scripts and need some really large files generated. I was going to integrate the test utility with my debug script.
To start you could try something like this:
function generate_file($file_name, $size_in_bytes)
{
$data = str_repeat(rand(0,9), $size_in_bytes);
file_put_contents($file_name, $data); //writes $data in a file
}
This creates a file filled with a single random digit (0-9), repeated.
generate_file() from "Marco Demaio" is not memory friendly so I created file_rand().
function file_rand($filename, $filesize) {
    if ($h = fopen($filename, 'w')) {
        if ($filesize > 1024) {
            for ($i = 0; $i < floor($filesize / 1024); $i++) {
                // 1023 hex characters plus "\n" make exactly 1024 bytes per line
                // (511 random bytes plus PHP_EOL give only 1023 bytes where PHP_EOL is "\n")
                fwrite($h, substr(bin2hex(openssl_random_pseudo_bytes(512)), 0, 1023) . "\n");
            }
            $filesize = $filesize - (1024 * $i);
        }
        $mod = $filesize % 2;
        fwrite($h, bin2hex(openssl_random_pseudo_bytes(($filesize - $mod) / 2)));
        if ($mod) {
            fwrite($h, substr(uniqid(), 0, 1));
        }
        fclose($h);
        umask(0000);
        chmod($filename, 0644);
    }
}
As you can see, line breaks are added every 1024 bytes to avoid problems with functions that are limited to 1024-9999 bytes, e.g. fgets() in PHP <= 4.3. It also makes it easier to open the file in a text editor that has the same issue with very long lines.
Do you really need so much variation in filesize that you need a PHP script? I'd just create test files of varying sizes via the command line and use them in my unit tests. Unless the filesize itself is likely to cause a bug, it would seem you're over-engineering here...
To create a file in Windows:
fsutil file createnew d:\filepath\filename.txt 1048576
In Linux:
dd if=/dev/zero of=filepath/filename.txt bs=10000000 count=1
if is the input file (here /dev/zero, an endless stream of null bytes), of is the output file, bs is the block size in bytes, and count is the number of blocks to copy, so the final file size is bs * count (here 10000000 bytes).
generate_file() from @Marco Demaio produced the warning below when generating a 4GB file.
Warning: str_repeat(): Result is too big, maximum 2147483647 allowed
in /home/xxx/test_suite/handler.php on line 38
I found the function below on php.net and it works like a charm. I have tested it up to 17.6 TB (see update below) in less than 3 seconds.
function CreatFileDummy($file_name,$size = 90294967296 ) {
// 32bits 4 294 967 296 bytes MAX Size
$f = fopen('dummy/'.$file_name, 'wb');
if($size >= 1000000000) {
$z = ($size / 1000000000);
if (is_float($z)) {
$z = round($z,0);
fseek($f, ( $size - ($z * 1000000000) -1 ), SEEK_END);
fwrite($f, "\0");
}
while(--$z > -1) {
fseek($f, 999999999, SEEK_END);
fwrite($f, "\0");
}
}
else {
fseek($f, $size - 1, SEEK_END);
fwrite($f, "\0");
}
fclose($f);
return true;
}
Update:
I was trying to hit 120 TB, 1200 TB and more, but the file size was limited to 17.6 TB. After some googling I found that this is the max_volume_size of the ReiserFS file system, which was on my server.
Maybe PHP can handle 1200 TB too, in just a few seconds. :)
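A shorter way to get a file of a given size without writing its contents is ftruncate(), which extends a file with null bytes. This is a swapped-in technique rather than the seek-and-write approach above, and the function name create_file_of_size is hypothetical. Whether the result is stored as a sparse file depends on the filesystem.

```php
<?php
// Creates (or overwrites) a file and sets its length to exactly $size bytes.
// The contents read back as null bytes; no data is written explicitly.
function create_file_of_size(string $path, int $size): bool
{
    $f = fopen($path, 'wb');
    if ($f === false) {
        return false;
    }
    $ok = ftruncate($f, $size);
    fclose($f);
    return $ok;
}
```

On a build with 32-bit integers the int parameter still limits the size that can be requested.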
Why not have a script that streams out random data? The script can take parameters for file size, type etc.
This way you can simulate many scenarios, for example bandwidth throttling, premature file end etc. etc.
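A streaming script along those lines might be sketched as follows. It assumes PHP >= 7 for random_bytes(); the function name stream_random and the 8192-byte chunk size are illustrative choices:

```php
<?php
// Writes $bytes random bytes to the stream $out in chunks,
// so memory use stays bounded regardless of the total size.
// Returns the number of bytes actually written.
function stream_random($out, int $bytes, int $chunk = 8192): int
{
    $written = 0;
    while ($written < $bytes) {
        $n = min($chunk, $bytes - $written);
        $result = fwrite($out, random_bytes($n));
        if ($result === false || $result === 0) {
            break; // stop on write failure
        }
        $written += $result;
    }
    return $written;
}

// Example: stream 1 MiB to the response body.
// stream_random(fopen('php://output', 'wb'), 1048576);
```

Stopping the loop early, or sleeping between chunks, would be one way to simulate a premature file end or bandwidth throttling.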
Does the file really need to be random? If so, just read from /dev/urandom on a Linux system:
dd if=/dev/urandom of=yourfile bs=4096 count=1024 # for a 4MB file.
If it doesn't really need to be random, just find some files you have lying around that are the appropriate size, or (alternatively) use tar and make some tarballs of various sizes.
There's no reason this needs to be done in a PHP script: ordinary shell tools are perfectly sufficient to generate the files you need.
If you want really random data you might want to try this:
$data = '';
while ($byteSize-- > 0) {
    $data .= chr(rand(0,255));
}
Might take a while, though, if you want large file sizes (as with any random data).
I would suggest using a library like Faker to generate test data.
I took the answer of mgutt and shortened it a bit. Also, his answer has a little bug which I wanted to avoid.
function createRandomFile(string $filename, int $filesize): void
{
    $h = fopen($filename, 'w');
    if (!$h) return;
    for ($i = 0; $i < intdiv($filesize, 1024); $i++) {
        // 1023 hex characters plus "\n" make exactly 1024 bytes per line
        // (511 random bytes plus PHP_EOL give only 1023 bytes where PHP_EOL is "\n")
        fwrite($h, substr(bin2hex(random_bytes(512)), 0, 1023)."\n");
    }
    fwrite($h, substr(bin2hex(random_bytes(512)), 0, $filesize % 1024));
    fclose($h);
    chmod($filename, 0644);
}
Note: This works only with PHP >= 7. If you really want to run it on lower versions, use openssl_random_pseudo_bytes instead of random_bytes and floor($filesize / 1024) instead of intdiv($filesize, 1024).