I'm checking file size using a script that reports the content length in BYTES, and that byte count matches exactly what I see on my Mac. BUT if I convert bytes to KB:
function formatBytes($bytes, $precision = 2) {
    $units = array('B', 'KB', 'MB', 'GB', 'TB');
    $bytes = max($bytes, 0);
    $pow = floor(($bytes ? log($bytes) : 0) / log(1024));
    $pow = min($pow, count($units) - 1);
    $bytes /= (1 << (10 * $pow));
    return round($bytes, $precision) . ' ' . $units[$pow];
}
... the size in KB is always different from what I see on my Mac.
For example:
Windows 8 TV Ad Tune.m4r
Bytes (Mac): 427,840 bytes
KB (Mac): 428 KB
Bytes (Script): 427,840
KB (Script): 417.81 KB
I wonder if it's the script or something else causing this difference?
Thanks!
It looks like the Mac is using the decimal convention, i.e. 1 KB = 1000 bytes, while your conversion uses 1 KB = 1024 bytes. Both conventions are technically valid, but most programmers use 1 KB = 1024 bytes; the Mac reports sizes with 1 KB = 1000 bytes, and Windows uses 1 KB = 1024 bytes.
Hard drive manufacturers use the 1000 convention so they can advertise bigger numbers, which is why the 1 terabyte hard drive installed in my machine is listed as having only 931 gigabytes in Windows.
My recommendation when checking file sizes in code is to always use bytes, as this will avoid this discrepancy and also be more portable.
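If you do want the script's output to match what the Mac shows, one option is to make the divisor configurable. This is only a sketch of the idea; the function name and the $base parameter are my additions, not part of the original code:

<?php
// Sketch: same logic as formatBytes(), but with a configurable base.
// $base = 1024 reproduces the script's output; $base = 1000 matches the
// decimal convention the Mac uses.
function formatBytesBase($bytes, $precision = 2, $base = 1000) {
    $units = array('B', 'KB', 'MB', 'GB', 'TB');
    $bytes = max($bytes, 0);
    $pow = floor(($bytes ? log($bytes) : 0) / log($base));
    $pow = min($pow, count($units) - 1);
    $bytes /= pow($base, $pow);
    return round($bytes, $precision) . ' ' . $units[$pow];
}

echo formatBytesBase(427840, 2, 1024) . "\n"; // 417.81 KB -- the script's value
echo formatBytesBase(427840, 2, 1000) . "\n"; // 427.84 KB -- rounds to the 428 KB the Mac shows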
Maybe you are comparing the size on disk (on the Mac) with your script's converted size. The size on disk depends on your hard drive's partition block size.
If the true size is 417.81 KB and your block size is 200 KB (not a realistic example), then your size on disk will be 600 KB.
Size on disk is not the real size of your file, but the space your file occupies on the disk.
Hope this helps.
It seems like the difference is caused by the conversion. You use 1 KB = 1024 B, and the Mac seems to use 1 KB = 1000 B.
I have a file with 3,200,000 lines of csv data (with 450 columns). Total file size is 6 GB.
I read the file like this:
$data = file('csv.out');
Without fail, it only reads 897,000 lines. I confirmed this with print_r() and echo sizeof($data). I increased my memory_limit to a ridiculously large value like 80 GB, but it didn't make a difference.
Now, it DID read in my other large file, same number of lines (3,200,000) but only a few columns so total file size 1.1 GB. So it appears to be a total file size issue. FYI, 897,000 lines in the $data array is around 1.68 GB.
Update: I increased the second (longer) file to 2.1 GB (over 5 million lines) and it reads it in fine, yet truncates the other file at 1.68 GB. So does not appear to be a size issue. If I continue to increase the size of the second file to 2.2 GB, instead of truncating it and continuing the program (like it does for the first file), it dies and core dumps.
Update: I verified my system is 64 bit by printing integer and float numbers:
<?php
$large_number = 2147483647;
var_dump($large_number); // int(2147483647)
$large_number = 2147483648;
var_dump($large_number); // float(2147483648)
$million = 1000000;
$large_number = 50000 * $million;
var_dump($large_number); // float(50000000000)
$large_number = 9223372036854775807;
var_dump($large_number); // int(9223372036854775807)
$large_number = 9223372036854775808;
var_dump($large_number); // float(9.2233720368548E+18)
$million = 1000000;
$large_number = 50000000000000 * $million;
var_dump($large_number); // float(5.0E+19)
print "PHP_INT_MAX: " . PHP_INT_MAX . "\n";
print "PHP_INT_SIZE: " . PHP_INT_SIZE . " bytes (" . (PHP_INT_SIZE * 8) . " bits)\n";
?>
The output from this script is:
int(2147483647)
int(2147483648)
int(50000000000)
int(9223372036854775807)
float(9.2233720368548E+18)
float(5.0E+19)
PHP_INT_MAX: 9223372036854775807
PHP_INT_SIZE: 8 bytes (64 bits)
So since it's 64 bit, and memory limit is set really high, why is PHP not reading files > 2.15 GB?
Some things that come to mind:
If you're using a 32-bit PHP build, you cannot read files that are larger than 2 GB.
If reading the file takes too long, there could be time-outs.
If the file is really huge, then reading it all into memory is going to be problematic. It's usually better to read blocks of data and process them, unless you need random access to all parts of the file (see the sketch after this list).
Another approach (I've used it in the past) is to chop the large file into smaller, more manageable ones (this should work if it's a straightforward log file, for example).
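For illustration, here is a minimal sketch of that block-wise approach, assuming each row can be processed on its own and then discarded (the file name comes from the question; the str_getcsv() step is my own placeholder):

<?php
// Sketch: stream the CSV row by row so memory stays flat regardless of file size.
$handle = fopen('csv.out', 'r');
if ($handle === false) {
    die("Cannot open csv.out\n");
}
$lineCount = 0;
while (($line = fgets($handle)) !== false) {
    $fields = str_getcsv($line); // the ~450 columns of this row
    // ... process $fields here instead of keeping every row in an array ...
    $lineCount++;
}
fclose($handle);
echo "Processed $lineCount lines, peak memory: " . memory_get_peak_usage(true) . " bytes\n";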
I fixed it. All I had to do was change the way I read the files. Why...I do not know.
Old code that only reads 2.15 GB out of 6.0 GB:
$data = file('csv.out');
New code that reads the full 6.0 GB:
$data = array();
$i = 1;
$handle = fopen('csv.out', 'r'); // the mode argument is required
if ($handle) {
    while (($data[$i] = fgets($handle)) !== false) {
        // process the line read
        $i++;
    }
    unset($data[$i]); // drop the false value stored on the final, failed read
    fclose($handle);
}
Feel free to shed some light on why. There must be some limitation when using
$var = file();
Interestingly, 2.15 GB is close to the 32-bit limit I read about (2^31 bytes = 2,147,483,648 bytes ≈ 2.15 GB).
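If you want to see the difference for yourself, here is a rough sketch for comparing the two approaches (the file name is the one from the question; run it on a file small enough for file() to succeed):

<?php
// Sketch: run the fgets() loop first, then file(), and compare peak memory.
$count = 0;
$handle = fopen('csv.out', 'r');
while (fgets($handle) !== false) {
    $count++; // lines are read and discarded, not stored
}
fclose($handle);
echo "fgets loop: $count lines, peak " . memory_get_peak_usage(true) . " bytes\n";

$data = file('csv.out'); // slurps the whole file into one array
echo "file(): " . count($data) . " lines, peak " . memory_get_peak_usage(true) . " bytes\n";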
I have a PHP script which reads in a text file and counts all the lines in the file that match a specified regular expression. The script had worked well up until now, when it segfaulted on the fread() of a file over 2 GB.
Actually, before the segfault I initially received this error: PHP Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 2223941409 bytes).
To fix that I added this line to my script: ini_set('memory_limit', '4G');
That fixes the memory size exhausted error but I get the segfault on fread now.
Here's a condensed working version of the script which will exhibit the error:
#!/usr/bin/php
<?php
ini_set('memory_limit', '4G');
$file = $argv[1];
$fh = fopen($file, 'r');
$fsize = filesize($file);
print("SIZE: ".$fsize."\n" );
$myData = fread($fh, $fsize);
print("Got passed fread!\n");
fclose($fh);
preg_match_all( '/Z\t/', $myData, $sArray );
$scount = count($sArray,COUNT_RECURSIVE);
print("COUNT: ".$scount."\n");
?>
Sample output:
$ runtest.php testfile.txt
SIZE: 2223941408
Segmentation fault (core dumped)
Other info:
OS: CentOS release 6.7 (Final) x86_64
PHP 5.3.3 (cli) (built: Jul 9 2015 17:39:00) 64-bit
You're probably using a 32-bit PHP distribution. Under that architecture a PHP process cannot allocate more than 2 GB of RAM. In practice the upper limit is closer to 1 GB than 2 GB: the interpreter crashes well before reaching the 2 GB mark. Additionally, integer variables cannot be greater than PHP_INT_MAX, which in 32-bit builds is as small as 2,147,483,647 (2^31 - 1).
This highlights two problems in your code:
$fsize = filesize($file);
... will not work if the file size is greater that PHP_INT_MAX.
Because PHP's integer type is signed and many platforms use 32bit integers, some filesystem functions may return unexpected results for files which are larger than 2GB.
$myData = fread($fh, $fsize);
... will crash for large files because you're loading the complete file contents in memory and then doing additional processing that will probably eat even more memory.
You'd better redesign your algorithm and read the file in small chunks (a task at which fread() excels). Counting the occurrences of a two-character substring should only need a few KB of RAM.
Here's a possible approach that assumes single-byte encoding (as your code does):
// Ridiculously small value for illustration purposes, set to something bigger for better performance
define('CHUNK_SIZE', 4);
$fsize = $scount = 0;
$fh = fopen($file, 'r');
$possible_pending_match = false;
while (!feof($fh)) {
    $chunk = fread($fh, CHUNK_SIZE);
    $fsize += strlen($chunk);
    $scount += substr_count($chunk, "Z\t");
    if ($possible_pending_match && $chunk[0] === "\t") {
        $scount++;
    }
    $possible_pending_match = substr($chunk, -1) === 'Z';
}
print("SIZE: ".$fsize."\n" );
print("COUNT: ".$scount."\n");
print("MEMORY: ".memory_get_peak_usage(true)." bytes\n");
You'd need to add 1 to $scount to get the same result as your code, which counts one extra item: count($sArray, COUNT_RECURSIVE) also counts the nested matches array that preg_match_all() creates, not just the matches themselves.
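For reference, a tiny sketch of where that extra item comes from (with no flags, preg_match_all() fills $sArray with one nested array of matches, and COUNT_RECURSIVE counts that nested array itself as an element):

<?php
$sArray = array();
preg_match_all('/Z\t/', "aZ\tbZ\tc", $sArray);
var_dump(count($sArray[0]));               // int(2) -- the actual matches
var_dump(count($sArray, COUNT_RECURSIVE)); // int(3) -- the matches plus the nested array itself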
The 2 GB boundary suggests there is some internal 32-bit limitation in PHP. Are you running 32-bit PHP?
There is an alternative solution. You can do it with a very small memory overhead using a shell command called by PHP. The memory used would be no more than a couple of MB as grep and wc only load portions of the file into memory.
$lines = shell_exec("grep 'Z\t' $file | wc --lines");
grep: command to search files using regular expression
wc: command that returns the number of words/lines/chars
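If $file (the path from the question's script) can contain spaces or shell metacharacters, it is safer to quote it before handing it to the shell. A small sketch of the same idea; escapeshellarg() and the cast to int are my additions, and PHP expands \t to a real tab inside the double-quoted format string, which is what makes the grep pattern work:

<?php
// Sketch: count matching lines outside PHP, with the file name safely quoted.
$cmd = sprintf("grep 'Z\t' %s | wc --lines", escapeshellarg($file));
$lines = (int) trim(shell_exec($cmd));
print("COUNT: " . $lines . "\n");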
I have this code for checking the maximum image size allowed; the one below is for 4 MB:
elseif (round($_FILES['image_upload_file']["size"] / 1024) > 4096) {
    $output['error'] = "You can upload file size up to 4 MB";
}
I don't understand this calculation, and the approaches I found on the internet are making it more confusing.
I want the same check for 8 MB.
The PHP $_FILES["image_upload_file"]["size"] variable returns the file size in BYTES. So, to check the file size you have two options:
Convert your limit into bytes and compare it with the $_FILES["image_upload_file"]["size"] value. For example, 5 MB ≈ 5,000,000 bytes, 6 MB ≈ 6,000,000 bytes, 8 MB ≈ 8,000,000 bytes, and so on (these values use the simplified 1 MB = 1,000,000 bytes convention).
Convert the $_FILES["image_upload_file"]["size"] value into MB and compare that.
I prefer checking the value in bytes. It is easier and you don't need to convert anything.
In your example, the value is converted to KB and then checked: $_FILES['image_upload_file']["size"] / 1024 returns the size in KB, and 4 MB = 4096 KB. So the code you found on the internet is also correct.
If you want to use that code for 8 MB, change 4096 to 8192. It will work the same way.
Hope you now understand the code.
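For completeness, a minimal sketch of the bytes-based check for an 8 MB limit, using the binary convention (8 MB = 8 * 1024 * 1024 bytes); the error message is just an example:

<?php
$maxBytes = 8 * 1024 * 1024; // 8 MB = 8,388,608 bytes
if ($_FILES['image_upload_file']['size'] > $maxBytes) {
    $output['error'] = "You can upload file size up to 8 MB";
}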
The image is less than 1 MB on disk, but its dimensions are roughly 5500x3600. I am trying to resize the image down to something less than 500x500. My code is pretty simple.
$image = imagecreatefromjpeg("upload/1.jpg");
$width = imagesx($image);
$height = imagesy($image);
$pWidth = 500;
$pHeight = 334;
$image_p = imagecreatetruecolor($pWidth, $pHeight);
setTransparency($image,$image_p,$ext);
imagecopyresampled($image_p, $image, 0, 0, 0, 0, $pWidth, $pHeight, $width, $height);
I found out that, to process this image, imagecreatefromjpeg() uses 100 MB according to memory_get_usage().
Is there a better way to do imagecreatefromjpeg()? Is there a workaround or a different function that uses less memory?
I did ask my server admins to increase the memory limit, but I doubt they will raise it to 100 MB or more. I am considering limiting the dimensions of the images users can upload, but not before exhausting all my options, as users will most likely upload photos they took themselves.
By the way, the following is the image I used, which takes 100 MB of memory.
What @dev-null-dweller says is the correct answer:
With 5500x3600 you will need at least 5500*3600*4 bytes in memory, roughly 80 MB. For large pictures the Imagick extension might have better performance than GD.
There is no way to "improve" on that because that's the amount of memory needed to process the image. JPEG is a compressed format so its file size is irrelevant, it's the actual dimensions that count. The only way to deal with such an image inside GD is increasing the memory limit.
You may be able to do better using a library/command line client like ImageMagick if you have access to that - when you run ImageMagick from the command line, its memory usage won't count towards the memory limit. Whether you can do this, you'd need to find out from your web host or server admin.
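As an illustration of that idea, here is a rough sketch of shelling out to ImageMagick's convert from PHP (the output path is a placeholder and error handling is kept minimal):

<?php
// Sketch: resize with the ImageMagick CLI so the work happens outside PHP's memory limit.
$src = escapeshellarg('upload/1.jpg');       // source path from the question
$dst = escapeshellarg('upload/1_small.jpg'); // hypothetical output path
exec("convert $src -resize 500x334 $dst", $output, $status);
if ($status !== 0) {
    // handle the error, e.g. fall back to GD or report the failure
}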
Another idea that comes to mind is using an image resizing API that you send the image to. That would take the load completely off your server. This question has some pointers: https://stackoverflow.com/questions/5277571/is-there-a-cdn-which-provides-on-demand-image-resizing-cropping-sharpening-et
ImageMagick, on the command-line at least, is able to use a feature of libjpeg called "Shrink on Load" to avoid unnecessarily loading the whole image if you are only planning on scaling it down.
If I resize your image above from its current 3412x2275 up to the size you are actually dealing with, 5500x2600, like this:
convert guys.jpg -resize 5500x2600 guys.jpg
I can now do some tests... first, a simple resize
/usr/bin/time -l convert guys.jpg -resize 500x334 small.jpg
0.85 real 0.79 user 0.05 sys
178245632 maximum resident set size <--- 178 MB
0 average shared memory size
0 average unshared data size
0 average unshared stack size
44048 page reclaims
0 page faults
0 swaps
you can see it uses a peak of 178 MB on my Mac.
If I now use the "Shrink on Load" feature I mentioned:
/usr/bin/time -l convert -define jpeg:size=500x334 guys.jpg -resize 500x334 small.jpg
0.06 real 0.04 user 0.00 sys
8450048 maximum resident set size <--- Only 8 MB
0 average shared memory size
0 average unshared data size
0 average unshared stack size
2381 page reclaims
33 page faults
0 swaps
you can see it only takes 8MB now. It's faster too!
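If you are using the Imagick PHP extension rather than the command line, the same hint can, as far as I know, be passed via setOption() before reading the file. A sketch of that approach (file names are placeholders, and you should verify the option name against your Imagick/ImageMagick version):

<?php
// Sketch: ask libjpeg to decode at a reduced size before the image is fully loaded.
$img = new Imagick();
$img->setOption('jpeg:size', '500x334'); // the hint must be set before readImage()
$img->readImage('upload/1.jpg');
$img->thumbnailImage(500, 334, true);    // best-fit resize into the 500x334 box
$img->writeImage('upload/1_small.jpg');
$img->destroy();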
I've been using vipsthumbnail for years. I use it to resize 700-megapixel images (an enormous 16,384 x 42,731, about 2.6 GB if loaded into PHP) to a viewable size. It takes about 10 seconds to generate a small preview that's 3000px tall.
https://www.libvips.org/API/current/Using-vipsthumbnail.html
By specifying one dimension, it keeps the original aspect ratio and limits the x or y, which ever is greater
$exec_command = sprintf('vipsthumbnail --size 3000 "%s" -o "%s"', $source, $destination);
exec( $exec_command, $output, $return_var );
if($return_var != 0) {
// Handle Error Here
}
I was still using PHP to resize smaller images, but my system recently generated a secondary image that was 15,011 x 15,011 (1.075 GB uncompressed). My PHP settings allow for 1 GB of RAM, and it was crashing! I had increased PHP's memory limit over time to deal with these images. I finally converted this function to also use vipsthumbnail. These smaller images only take about 100 ms each to generate. I should have done this a long time ago.
$exec_command = sprintf('vipsthumbnail --size 150 "%s" -o "%s"', $src, $dst);
exec( $exec_command, $output, $return_var );
When decompressing with gzinflate, I found that, under certain circumstances, the following code results in out-of-memory errors. Tested with PHP 5.3.20 on a 32-bit Linux (Amazon Linux AMI on EC2).
$memoryLimit = Misc::bytesFromShorthand(ini_get('memory_limit')); // 256MB
$memoryUsage = memory_get_usage(); // 2MB in actual test case
$remaining = $memoryLimit - $memoryUsage;
$factor = 0.9;
$maxUncompressedSize = max(1, floor($factor * $remaining) - 1000);
$uncompressedData = gzinflate($compressedData, $maxUncompressedSize);
Although I calculated $maxUncompressedSize conservatively, hoping to give gzinflate sufficient memory, I still get:
Fatal error: Allowed memory size of 268435456 bytes exhausted (tried to allocate 266143484 bytes) in foo.php on line 123
When I change the value of $factor from 0.9 to 0.4, the error goes away in this case. In other cases 0.9 is fine.
I wonder:
Is the reason for the error really that gzinflate needs more than double the space of the uncompressed data? Is there possibly some other reason? Is $remaining really the memory still at the application's disposal?
It is indeed possible. IMHO, the issue lies with memory_get_usage(true).
Using true should give a higher memory usage value, because it takes everything the memory manager has reserved into account.
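In other words, something like the following might give a more realistic $remaining figure. This is only a sketch: Misc::bytesFromShorthand() and $compressedData come from the question's own code, and gzinflate's working overhead can still push you past the estimate, so keep the safety factor generous:

<?php
// Sketch: base the headroom estimate on memory actually reserved from the system.
$memoryLimit = Misc::bytesFromShorthand(ini_get('memory_limit')); // helper from the question
$memoryUsage = memory_get_usage(true); // true = include memory reserved by the manager
$remaining = $memoryLimit - $memoryUsage;
$factor = 0.4; // the more conservative factor that worked in the question
$maxUncompressedSize = max(1, (int) floor($factor * $remaining) - 1000);
$uncompressedData = gzinflate($compressedData, $maxUncompressedSize);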