I have an app that ingests photos from SD cards. After they are copied, the cards will be reformatted, put back in the cameras, and used to store more photos.
Currently, instead of using the PHP copy() function, I am doing the following (roughly):
$card   = '/Volumes/SD_Card/DCIM/EOS/';
$files  = scandir($card);
$target = '/Volumes/HARD_DRIVE/photos/';

foreach ($files as $file) {
    if (strtolower(pathinfo($file, PATHINFO_EXTENSION)) == 'jpg') {
        $img_data = file_get_contents($card . $file);
        $orig_md5 = md5($img_data);
        $success  = file_put_contents($target . $file, $img_data);
        unset($img_data);
        if ($success === false) {
            echo "an error occurred copying $file\n"; exit;
        } elseif ($orig_md5 != md5_file($target . $file)) {
            echo "an error occurred confirming data of $file\n"; exit;
        } else {
            echo "$file copied successfully.\n";
            unlink($card . $file); // remove the source file after a verified copy
        }
    }
}
I am currently doing it this way so I can compare the md5 hashes to make sure the copy is a bit-for-bit match of the original.
My questions are:
1) Would using PHP's copy() be faster? I assume it would, because the target file doesn't have to be read into memory to check the md5 hash.
2) Does copy() do some sort of hash check as part of the function, to ensure the integrity of the copy, before returning TRUE/FALSE?
PHP's copy() would not only be faster, it also copies using buffers, so it avoids reading the whole source file into memory, which is a problem for big files. The return boolean only indicates that the write succeeded; you can rely on that, but if you want to verify the hash, use md5_file() instead of passing the contents to md5(), because md5_file() reads the file in the same memory-efficient, buffered way.
However, if you only need to move the file on the same filesystem, rename() is far better: it is practically instant and reliable.
No, copy() doesn't perform any additional integrity checks; it assumes that the operating system's filesystem API is reliable.
You could use md5_file() on both the source and destination:
if (copy($source, $dest) && md5_file($dest) == md5_file($source)) {
    echo "File copied successfully";
} else {
    echo "Copy failed";
}
Note that your integrity checks do not actually check that the file was written to disk properly. Most operating systems use a unified buffer cache, so when you call md5_file() immediately after writing the file, it will read the file contents from the kernel buffers, not from the disk. In fact, it's possible that the target file hasn't even been written to disk yet; it may still be sitting in kernel buffers waiting to be flushed. PHP doesn't have a function to call sync(2), but even if it did, it would still read from the buffer cache rather than re-reading from disk.
So you're basically at the mercy of the OS and hardware, which you must assume are reliable. Applications that need stronger guarantees must perform direct device I/O rather than going through the filesystem.
I am trying to transfer an entire folder to an FTP server using PHP.
Right now I am using this code:
function ftp_copyAll($conn_id, $src_dir, $dst_dir) {
    if (is_dir($dst_dir)) {
        return "<br> Dir <b> $dst_dir </b> already exists <br>";
    } else {
        $d = dir($src_dir);
        ftp_mkdir($conn_id, $dst_dir);
        echo "created dir <b><u> $dst_dir </u></b><br>";
        while ($file = $d->read()) { // do this for each entry in the directory
            if ($file != "." && $file != "..") { // skip . and .. to prevent an infinite loop
                if (is_dir($src_dir."/".$file)) { // recurse into subdirectories
                    ftp_copyAll($conn_id, $src_dir."/".$file, $dst_dir."/".$file);
                } else {
                    $upload = ftp_put($conn_id, $dst_dir."/".$file, $src_dir."/".$file, FTP_BINARY); // upload the file
                    echo "created file <b><u>".$dst_dir."/".$file." </u></b><br>";
                }
            }
            ob_flush();
            sleep(1);
        }
        $d->close();
    }
    return "<br><br><font size=3><b>All copied OK</b></font>";
}
But is it possible to transfer the entire folder without iterating through the files? I have about 100+ files and PHP is taking a lot of time for the transfer.
Is there any way to increase the speed of transfer?
No, there's no other generic way supported by a common FTP server, except packing the files (zip, gzip, etc.) locally, uploading the archive, and unpacking it remotely.
But if you have FTP access only, you have no way to unpack the archive remotely anyway, unless the FTP server explicitly allows it: either by letting you execute an arbitrary remote shell command (typically not allowed) or via a proprietary "unpack" extension (very few servers support that).
The FTP protocol is generally very inefficient for transferring a large number of small files, because each file transfer carries quite an overhead for opening a separate data-transfer connection.
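If the server does happen to support remote unpacking, a rough sketch of the pack-and-upload approach could look like the following; $conn_id is assumed to be an already-opened FTP connection, and the paths are purely illustrative:

<?php
// Pack the local directory into a single archive, then upload that one file,
// so only one data connection is opened instead of one per file.
$src_dir = '/path/to/local/folder';                      // illustrative local folder
$archive = sys_get_temp_dir() . '/upload_' . uniqid() . '.zip';

$zip = new ZipArchive();
if ($zip->open($archive, ZipArchive::CREATE) === true) {
    $it = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($src_dir));
    foreach ($it as $item) {
        if ($item->isFile()) {
            // Store each file under its path relative to $src_dir
            $zip->addFile($item->getPathname(), substr($item->getPathname(), strlen($src_dir) + 1));
        }
    }
    $zip->close();
}

ftp_put($conn_id, 'folder_archive.zip', $archive, FTP_BINARY); // one transfer instead of 100+
unlink($archive);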
In my cache system, I want it so that when a new page is requested, a check is made to see if a file exists; if it doesn't, a copy is stored on the server. If it does exist, it must not be overwritten.
The problem I have is that I may be using functions designed to be slow.
This is part of my current implementation to save files:
if (!file_exists($filename)) {
    $h = fopen($filename, "wb");
    if ($h) {
        fwrite($h, $c);
        fclose($h);
    }
}
This is part of my implementation to load files:
if (($m = @filemtime($file)) !== false) {
    if ($m >= filemtime("sitemodification.file")) {
        $outp = file_get_contents($file);
        header("Content-length: " . strlen($outp), true);
        echo $outp;
        flush();
        exit();
    }
}
What I want to do is replace this with a better set of functions meant for performance and yet still achieve the same functionality. All caching files including sitemodification.file reside on a ramdisk. I added a flush before exit in hopes that content will be outputted faster.
I can't use direct memory addressing at this time because the file sizes to be stored are all different.
Is there a set of functions I can use that can execute the code I provided faster by at least a few milliseconds, especially the loading files code?
I'm trying to keep my time to first byte low.
First, prefer is_file() to file_exists() and use file_put_contents():
if (!is_file($filename)) {
    file_put_contents($filename, $c);
}
Then, use the proper function for this kind of work, readfile:
if (($m = @filemtime($file)) !== false && $m >= filemtime('sitemodification.file')) {
    header('Content-length: ' . filesize($file));
    readfile($file);
}
You should see a small improvement, but keep in mind that file accesses are slow and you hit the filesystem three times before sending any content.
I have a simple caching system like this:
if (file_exists($cache)) {
    echo file_get_contents($cache);
    // if $cache gets deleted at this point, there is nothing to display
} else {
    // PHP process
}
We regularly delete outdated cache files, e.g. deleting all caches older than one hour. Although this process is very fast, I am wondering whether a cache file can be deleted right between the if statement and the file_get_contents() call.
I mean, when the if statement checks the existence of the cache file, it exists; but when file_get_contents() tries to read it, it is no longer there (deleted by the simultaneous cache-deleting process).
file_get_contents() locks the file, which should protect the read against the ongoing delete, but the file can still be deleted after the if statement passes and before file_get_contents() starts.
Is there any approach to avoid this? Is the cache deleting system different?
NOTE: I have not faced any practical problem, as it is not very likely to hit this window, but logically it is possible and could happen under heavy load.
Luckily, file_get_contents() returns FALSE on error, so you could handle it like this:
if (FALSE !== ($buffer = file_get_contents($cache))) {
    echo $buffer;
    return;
}
// PHP process
or similar. It's a bit quick and dirty, and you will want to add the @ operator to hide the warning about a non-existent file:
if (FALSE !== ($buffer = @file_get_contents($cache))) {
The other alternative would be to lock the file, but locking might prevent your cache-deletion process from deleting the file while you hold the lock.
What's left then is to handle cache staleness yourself. That means reading the file's creation time in PHP and checking that it is less than 5 minutes old (5 minutes being the example threshold the deletion process uses); if it is older, you know the file is already stale and due to be replaced with fresh content, so re-create it. Otherwise read the file in, which is probably better done with readfile() than with file_get_contents() and echo.
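A rough sketch of that staleness check, assuming the $cache path from the question and an illustrative 5-minute threshold (filemtime() is used here, since PHP has no portable way to read a file's creation time):

$maxAge = 300; // 5 minutes, illustrative threshold matching the example above

$mtime = @filemtime($cache);
if ($mtime !== false && (time() - $mtime) < $maxAge) {
    // Cache file exists and is still fresh: stream it out.
    header('Content-Length: ' . filesize($cache));
    readfile($cache);
    exit;
}

// Cache is missing or stale: regenerate and rewrite it.
$output = 'Generated content'; // placeholder for the real PHP process
file_put_contents($cache, $output);
echo $output;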
On failure, file_get_contents returns false, so what about this:
if (($output = file_get_contents($filename)) === false) {
    // Do the processing.
    $output = 'Generated content';
    // Save cache file
    file_put_contents($filename, $output);
}
echo $output;
By the way, you may want to consider using fpassthru(), which is more memory-efficient, especially for larger files. Using file_get_contents() on large files (> 100 MB) will probably cause problems (depending on your configuration).
<?php
$fp = @fopen($filename, 'rb');
if ($fp === false) {
    // Generate output
} else {
    fpassthru($fp);
}
I have the following PHP code
// Check if the upload is set
if (
    isset($_FILES['file']['name']) && !empty($_FILES['file']['name']) &&
    isset($_FILES['file']['type']) && !empty($_FILES['file']['type']) &&
    isset($_FILES['file']['size']) && !empty($_FILES['file']['size'])
) {
    $UploadIsSetted = true;
    $UploadIsBad = false;
    $UploadExtension = pathinfo($_FILES['file']['name'], PATHINFO_EXTENSION);

    // Check if the upload is good
    require "../xdata/php/website_config/website.php";
    $RandomFoo = rand(1000999999, 9999999999);

    if ($_FILES["file"]["size"] < ($MaxAvatarPictureSize * 1000000)) {
        if ($_FILES["file"]["error"] > 0) {
            $UploadIsBad = true;
            $hrefs->item(0)->setAttribute("Error", "true");
            $hrefs->item(0)->setAttribute("SomethingWrong", "true");
        } else {
            move_uploaded_file($_FILES["file"]["tmp_name"], "../upload/tmp/" . $RandomFoo . ".file");
        }
    } else {
        // The file is too big
        $UploadIsBad = true;
        $hrefs->item(0)->setAttribute("Error", "true");
        $hrefs->item(0)->setAttribute("UploadTooBig", "true");
    }
} else {
    $UploadIsSetted = false;
}

$ZipFile = new ZipArchive;
$ZipFile->open('../upload/tmp/' . $LastFilename . '.zip', ZIPARCHIVE::CREATE);
$ZipFile->addFile('../upload/tmp/' . $RandomFoo . '.file', $RandomFoo . "." . $UploadExtension);
$ZipFile->close();
Now my big concern is that a user can upload anything, so how can I prevent:
uploading 2 GB / 3 GB files
flooding
uploading some kind of twisted exploit that would eventually compromise my server's security
buffer overflows
filenames that carry arbitrary code injections
I mean, how secure is this script?
I'm running Windows for now; I will switch to Linux.
For your other questions:
flooding
That's the complex part. Let me google you some ideas:
Prevent PHP script from being flooded
Quick and easy flood protection? - use a nonce plus a timestamp tied to the session (see the sketch after this list)
Use a captcha, if it doesn't impair usability too much.
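A rough sketch of such a session-based throttle; the 30-second window and the session key name are just illustrative assumptions:

// Very simple per-session throttle: allow at most one upload every 30 seconds.
session_start();

$now = time();
if (isset($_SESSION['last_upload']) && ($now - $_SESSION['last_upload']) < 30) {
    http_response_code(429);
    die('Please wait before uploading again.');
}
$_SESSION['last_upload'] = $now;

// ... proceed with the upload handling ...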
uploading some kind of twisted exploit that would eventually alter my server security
Use a command-line virus scanner (f-prot or clamav) to scan uploaded files. You might use a naive regex scanner in PHP itself (probe for HTML-ish content in image files, e.g.), but that's not an actual security feature; don't reinvent the wheel.
buffer overflow
PHP in general is not susceptible to buffer overflows.
Okay, joking aside: you can't do anything about it in userland anyway. Pushing strings around isn't much of a problem; it's pretty reliable and unexploitable in scripting languages, as long as you know how to escape what in which context.
filenames that have arbitrary code injections
At the very least you should almost always use basename() to avoid path-traversal exploits. If you want to keep user-specified filenames, a regex whitelist is in order; preg_replace('/[^\w\s.]/', '', $fn) is a crude example.
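A rough sketch of that sanitization, applied to the uploaded name from the question; the exact whitelist pattern and the fallback name are only examples:

// Strip any directory components, then whitelist the remaining characters.
$original = $_FILES['file']['name'];
$safeName = basename($original);                          // drop path components such as ../../
$safeName = preg_replace('/[^\w\s.-]/', '', $safeName);   // crude whitelist: word chars, spaces, dots, dashes

// Fall back to a generated name if nothing usable is left.
if ($safeName === null || $safeName === '') {
    $safeName = uniqid('upload_', true);
}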
Your line if (($_FILES["file"]["size"] < ($MaxAvatarPictureSize*1000000))) already limits the size of an acceptable file to $MaxAvatarPictureSize megabytes, though $MaxAvatarPictureSize doesn't appear to be set in the code you provided. My guess is that it should be 1 or 2 at most.
Also not set is $LastFilename, and probably some others too.
Also, place an if ($UploadIsBad === false) { /* do zipping */ } guard around the zipping part, as sketched below, to avoid zipping up files that are too large or otherwise invalid.
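A rough sketch of that guard, reusing the variables from the question's code as-is:

// Only build the archive when an upload was present and passed the checks above.
if ($UploadIsSetted && $UploadIsBad === false) {
    $ZipFile = new ZipArchive();
    if ($ZipFile->open('../upload/tmp/' . $LastFilename . '.zip', ZipArchive::CREATE) === true) {
        $ZipFile->addFile('../upload/tmp/' . $RandomFoo . '.file', $RandomFoo . '.' . $UploadExtension);
        $ZipFile->close();
    }
}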
I need to convert some files to PDF and then attach them to an email. I'm using Pear Mail for the email side of it and that's fine (mostly--still working out some issues) but as part of this I need to create temporary files. Now I could use the tempnam() function but it sounds like it creates a file on the filesystem, which isn't what I want.
I just want a name in the temporary file system (using sys_get_temp_dir()) that won't clash with someone else running the same script or the same user invoking the script more than once.
Suggestions?
I've used uniqid() in the past to generate a unique filename, but not actually create the file.
$filename = uniqid(rand(), true) . '.pdf';
The first parameter can be anything you want, but I used rand() here to make it even a bit more random. Using a set prefix, you could further avoid collisions with other temp files in the system.
$filename = uniqid('MyApp', true) . '.pdf';
From there, you just create the file. If all else fails, put it in a while loop and keep generating it until you get one that works.
while (true) {
    $filename = uniqid('MyApp', true) . '.pdf';
    // sys_get_temp_dir() usually has no trailing slash, so add a separator
    if (!file_exists(sys_get_temp_dir() . DIRECTORY_SEPARATOR . $filename)) break;
}
Seriously, use tempnam(). Yes, this creates the file, but this is a very intentional security measure designed to prevent another process on your system from "stealing" your filename and causing your process to overwrite files you don't want.
I.e., consider this sequence:
You generate a random name.
You check the file system to make sure it doesn't exist. If it does, repeat the previous step.
Another, evil, process creates a file with the same name as a hard link to a file Mr Evil wants you to accidentally overwrite.
You open the file, thinking you're creating it, when in fact you're opening an existing file in write mode, and you start writing to it.
You just overwrote something important.
PHP's tempnam() actually calls the system's mkstemp under the hood (that's for Linux... substitute the "best practice" function for other OSs), which goes through a process like this:
Pick a filename.
Create the file with restrictive permissions, inside a directory that prevents others from removing files they don't own (that's what the sticky bit does on /var/tmp and /tmp).
Confirm that the file created still has the restrictive permissions.
If any of the above fails, try again with a different name.
Return the filename created.
Now, you can do all of those things yourself, but why bother, when "the proper function" does everything required to create secure temporary files, and that almost always involves creating an empty file for you?
Exceptions:
You're creating a temporary file in a directory that only your process can create/delete files in.
Create a randomly generated temporary directory, which only your process can create/delete files in.
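A rough sketch of the tempnam() route for the PDF case in the question; the 'pdf_' prefix and the $pdfData variable are illustrative:

// tempnam() atomically creates an empty file with a unique name and returns its path.
$tmpPath = tempnam(sys_get_temp_dir(), 'pdf_');
if ($tmpPath === false) {
    die('Could not create a temporary file');
}

// Write the generated PDF data into the reserved file, then attach it to the email.
file_put_contents($tmpPath, $pdfData); // $pdfData: the PDF contents you generated

// ... attach $tmpPath with Pear Mail here ...

unlink($tmpPath); // clean up when done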
Another alternative, based on @Lusid's answer, with a fallback maximum execution time:
// Maximum execution time of 10 seconds.
$maxExecTime = time() + 10;
$isUnique = false;

while (time() < $maxExecTime) {
    // Candidate unique file name
    $uniqueFileName = uniqid(mt_rand(), true) . '.pdf';
    if (!file_exists(sys_get_temp_dir() . DIRECTORY_SEPARATOR . $uniqueFileName)) {
        $isUnique = true;
        break;
    }
}

if ($isUnique) {
    // Save your file with your unique name
} else {
    // Time limit was exceeded without finding a unique name
}
Note:
I prefer to use mt_rand() instead of rand() because the former uses the Mersenne Twister algorithm and is faster than the latter (an LCG).
More info:
http://php.net/manual/en/function.uniqid.php
http://php.net/manual/en/function.mt-rand.php
http://php.net/manual/en/function.time.php
Consider using a UUID for the filename; the uniqid() function can help here.
http://php.net/uniqid
You could use part of the date and time to create a unique file name; that way it isn't duplicated when the script is invoked more than once.
I recommend using the PHP function tempnam():
http://www.php.net/tempnam
$file = tempnam('tmpdownload', 'Ergebnis_' . date('Ymd') . '_') . '.pdf';
echo $file;
/var/www/html/tmpdownload/Ergebnis_20071004_Xbn6PY.pdf
Or
http://www.php.net/tmpfile
<?php
$temp = tmpfile();
fwrite($temp, "writing to tempfile");
fseek($temp, 0);
echo fread($temp, 1024);
fclose($temp); // this removes the file
?>
Better to use a Unix timestamp with the user id:
$filename = 'file_' . time() . '_' . $id . '.jpeg';
My idea is to use a recursive function to see if the filename exists, and if it does, iterate to the next integer:
function check_revision($filename, $rev) {
    $new_filename = $filename . "-" . $rev . ".csv";
    if (file_exists($new_filename)) {
        // name taken: recurse with the next revision number
        $new_filename = check_revision($filename, $rev + 1);
    }
    return $new_filename;
}

$revision = 1;
$filename = "whatever";
$filename = check_revision($filename, $revision);
function gen_filename($dir) {
    if (!@is_dir($dir)) {
        @mkdir($dir, 0777, true);
    }
    $filename = uniqid('MyApp.', true) . ".pdf";
    if (@is_file($dir . "/" . $filename)) {
        return gen_filename($dir); // name taken, try again
    }
    return $filename;
}
Update 2020
hash_file('md5', $file_pathname)
Naming the file after its content hash this way also prevents duplicates.
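A rough sketch of naming a stored file after its content hash; the paths here are illustrative:

// Name the stored copy after the hash of its contents, so identical
// files map to the same name and are stored only once.
$source = '/path/to/incoming.pdf';                       // illustrative source path
$hash   = hash_file('md5', $source);
$target = sys_get_temp_dir() . DIRECTORY_SEPARATOR . $hash . '.pdf';

if (!is_file($target)) {
    copy($source, $target); // first time this content is seen
}
// $target now points at the deduplicated copy.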