I have a problem creating a new directory with PHP when uploading a file.
The directory is created, but if another instance of the same script runs at the same time, the directory-exists check doesn't work correctly (PHP emits a warning).
Someone told me it's a race condition, but I still have the issue after adding some random sleep time.
usleep(mt_rand(1, 50));
if(!is_dir($dir)){
mkdir($dir);
}
usleep(mt_rand(1, 50));
Can anyone help?
Does anybody know a safe way to upload a file in multiple parts, with 3-4 parts being uploaded at the same time? Currently I move the uploaded parts into a temporary directory (is_dir fails on the temporary dir if several parts arrive at the same time). When the number of files in that dir equals the number of parts, the parts get combined. But it fails often: sometimes is_dir gives a warning, sometimes the parts get combined twice...
mkdir() returns true on success and false on failure (for example, when the directory already exists). So you can use it as part of your conditional check.
if (!is_dir($dir) && mkdir($dir)) {
// good to go
}
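Two requests can still both pass the is_dir() check before either one calls mkdir(), so the loser of that race gets the warning anyway. A minimal sketch of a check that tolerates a concurrent creator follows; ensure_dir is a hypothetical helper name, not a built-in, and the default mode is only an assumption.

```php
<?php
// Hypothetical helper: create $dir if needed, tolerating a concurrent creator.
function ensure_dir($dir, $mode = 0775)
{
    if (is_dir($dir)) {
        return true;
    }
    // @ suppresses the "File exists" warning if another request wins the race.
    if (@mkdir($dir, $mode, true)) {
        return true;
    }
    // mkdir() failed: either another request just created the directory,
    // or there is a real error (permissions, missing parent, ...).
    clearstatcache(true, $dir);
    return is_dir($dir);
}
```

The final is_dir() call is what separates "someone else created it first" from a genuine failure.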
You want to call clearstatcache() at the top of the page that tests for the directory, so that PHP's cached file status information is fresh.
The upload JavaScript splits the file into multiple parts and sends 4 Ajax requests to PHP, so PHP receives 4 parts simultaneously. When all parts have been received, PHP should combine them to reproduce the file on the server.
So why split them up? Send the file as a single Ajax request and let PHP handle each blob part within that one script. This sidesteps the problem. PHP, when set up correctly, can comfortably handle large files in chunks.
I have 1000+ txt files whose file names are usernames. I'm currently reading them in a loop. Here is my code:
for($i=0; $i<1240; $i++){
$node=$users_array[$i];
$read_file="Uploads/".$node."/".$node.".txt";
if (file_exists($read_file)) {
if(filesize($read_file) > 0){
$myfile = fopen($read_file, "r");
$file_str =fread($myfile,filesize($read_file));
fclose($myfile);
}
}
}
When the loop runs, it takes too much time and the server times out.
I don't know why it takes that long, because the files don't have much data in them. Reading all the text from a txt file should be fast, am I right?
Well, you are doing read operations on an HDD/SSD, which are not as fast as memory, so you should expect the running time to grow with the size of the text files. You can try the following:
If you are running the script from a browser, run it from the command line instead. That way you will not hit a web server timeout, and the script can finish as long as PHP has no execution time limit set (if it does, you may need to increase it).
In your script above you can store the result of "filesize($read_file)" in a variable so that you do not call it twice; this might speed up the script slightly.
If you still can't finish the job, consider running it in batches of 100 or 500.
Keep an eye on memory usage; maybe that is why the script dies.
If you need the content of the file as a string, you can try "file_get_contents" and maybe skip the "filesize" check altogether.
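The last few suggestions could be combined roughly like this. It is only a sketch: read_user_file and read_in_batches are hypothetical helper names, and the Uploads/<user>/<user>.txt layout is taken from the question.

```php
<?php
// Hypothetical helper: return one user's file contents, or null if the
// file is missing, unreadable, or empty.
function read_user_file($baseDir, $user)
{
    $path = $baseDir . '/' . $user . '/' . $user . '.txt';
    if (!is_file($path)) {
        return null;
    }
    $contents = file_get_contents($path);
    return ($contents === false || $contents === '') ? null : $contents;
}

// Hypothetical helper: walk the user list in batches so the work can be
// split across several runs if a single run is too slow.
function read_in_batches($baseDir, array $users, $batchSize = 100)
{
    $result = array();
    foreach (array_chunk($users, $batchSize) as $batch) {
        foreach ($batch as $user) {
            $contents = read_user_file($baseDir, $user);
            if ($contents !== null) {
                $result[$user] = $contents;
            }
        }
    }
    return $result;
}
```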
It sounds like your problem is having 1000+ files in a single directory. On a traditional Unix file system, finding a single file by name requires scanning through the directory entries one by one. If you have a list of files and try to read all of them, it'll require traversing about 500000 directory entries, and it will be slow. It's an O(n^2) algorithm and it'll only get worse as you add files.
Newer file systems have options to enable more efficient directory access (for example https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Hash_Tree_Directories) but if you can't/don't want to change file system options you'll have to split your files into directories.
For example, you could take the first two letters of the user name and use that as the directory. That's not great because you'll get an uneven distribution. It would be better to use a hash, but then it will be harder to find entries by hand.
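A rough sketch of the hash variant, assuming md5 as the hash and a two-character prefix as the shard directory (both choices are mine, and sharded_path is a hypothetical name):

```php
<?php
// Hypothetical helper: place each user's file under a shard directory
// derived from a hash of the user name, e.g. Uploads/<xx>/<user>/<user>.txt.
function sharded_path($baseDir, $user)
{
    // Two hex characters give 256 shards with a fairly even distribution.
    $shard = substr(md5($user), 0, 2);
    return $baseDir . '/' . $shard . '/' . $user . '/' . $user . '.txt';
}
```

Because the shard is computed from the name, the same function works for both writing and reading, so no lookup table is needed.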
Alternatively you could iterate the directory entries (with opendir and readdir) and check if the file names match your users, and leave dealing with the problems the huge directory creates for later.
Alternatively, look into using a database for your storage layer.
My PHP script needs to write a text file (actually, it writes PHP code) so that it can include it on the next run. That way I have all my data (arrays) from the previous run without using a database.
The files are saved on a network drive (SAN).
The script runs 3-4 times a second (approx. every 300 milliseconds) and the file size is approx. 1.2 MB.
Sometimes the files are corrupted, with a missing END OF FILE (EOF).
Sometimes I get chunks of repeated text (a repeated part of the string),
e.g.
$write_string = '$arrMK[1][1]=1.80;$arrMK[1][2]=1.82;$arrMK[1][3]=2.14;$arrMK[1][4]=1.80;$arrMK[2][1]=2.43;$arrMK[2][2]=1.13;$arrMK[2][3]=1.33;$arrMK[2][4]=4.11;... and so on...';
and the content of the file looks like this (the repeated part is marked with <strong> tags):
$arrMK[1][<strong>1]=1.80;$arrMK[1][2]</strong>=1.82;$arrMK[1][3]=2.14;$arrMK[1][4]=1.80;$arrMK[2][1]=2.43;$arrMK[2][2]=1.13;$ar<strong>1]=1.80;$arrMK[1][2]</strong>rMK[2][3]=1.33;$arrMK[2][4]=4.11;
If the file size is less than 1 MB, everything is OK.
I have tried using fopen/fwrite and file_put_contents, with the same results.
Any suggestions?
Elias
Maybe the PHP script cannot complete its write operation before the next run needs the file.
Try calling fflush() and fclose() in your script to be sure that your file is closed correctly.
Remember to pay attention to PHP timeout/tuning issues that can prevent PHP from saving the file correctly. And remember to remove old files too, or your sysadmin will kill you sooner or later...
BTW, IMHO it's a very ugly way of doing this.
You can use sessions to save user state and environments (look at session_start).
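One way to make sure a reader never includes a half-written file is to write to a temporary name, flush and close it, and only then rename it over the target. This is only a sketch: write_atomically is a hypothetical helper, and it assumes rename() within one directory is atomic on your storage, which may not hold for every network drive.

```php
<?php
// Hypothetical helper: write $data to a temporary file next to $target,
// then rename it into place so readers see either the old or the new file.
function write_atomically($target, $data)
{
    // The pid plus uniqid() keeps concurrent runs from sharing a temp name.
    $tmp = $target . '.' . getmypid() . '.' . uniqid('', true) . '.tmp';
    $fh = fopen($tmp, 'wb');
    if ($fh === false) {
        return false;
    }
    $ok = fwrite($fh, $data) === strlen($data);
    $ok = fflush($fh) && $ok;
    $ok = fclose($fh) && $ok;
    if (!$ok || !rename($tmp, $target)) {
        @unlink($tmp); // do not leave partial files behind
        return false;
    }
    return true;
}
```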
I have a folder that contains 1000+ subfolders. In each subfolder there are 1-800 images, plus a thumbs folder with the same number of smaller versions. In total there are around 18000 photos and 18000 thumbs.
I have created a script to run occasionally to check that everything is present (all paths, folders and photo names are stored in a database).
Using either file_exists() or is_file(), plus clearstatcache(), I loop over my db records to check that everything is OK.
To check if the script actually works, I have included a check field in my table structure:
photo_present, SET(Y,N)
Each time I run the script each photo validated will have the photo_present flag set to Y.
After only a few records, 300-800, the script gives an internal server error 500.
I checked my table and I know that the script has run for a while since I see the photo_present field is set to Y.
My question is: how can I optimize it so that either file_exists() or is_file() keeps working until all the files have been checked?
How are you running the script? If it's by hitting the web page, it could be that your server caps the runtime of a script at 30 seconds or so and is killing PHP after that time, resulting in an Internal Server Error. The same could be true if it's being run via cron, but that's less likely.
Are you sure this isn't a script timeout? Check http://php.net/manual/en/function.set-time-limit.php
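If it is a timeout, lifting the limit at the top of the script and keeping the per-file work small might be enough. A rough sketch follows; find_missing is a hypothetical helper and the database update is left out. Some hosts disable set_time_limit(), in which case the limit has to be changed in the server configuration.

```php
<?php
// Remove the execution time cap for this run.
set_time_limit(0);

// Hypothetical helper: return the subset of $paths that do not exist
// as regular files on disk.
function find_missing(array $paths)
{
    $missing = array();
    foreach ($paths as $path) {
        if (!is_file($path)) {
            $missing[] = $path;
        }
    }
    return $missing;
}
```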
I consider glob a good alternative for browsing folders and files; you might be able to build something on it.
<?php
//get all image files
$files = glob("{*.jpg,*.JPG,*.gif,*.GIF,*.png,*.PNG,*.cr2,*.CR2,*.DNG,*.dng}", GLOB_BRACE);
//print each file name
foreach($files as $file)
{
//Do your db validation here.
//echo "$file";
}
?>
I'm making a really simple "backend" (PHP5) for two Flash/AIR applications. One of them will upload a photo, the backend will save it to a folder, and the second app will poll the backend for new photos and show them.
I don't have access to a database, so the backend has to be pure PHP5 and nothing more. That's why I chose to save the images to a folder (with a timestamp in their names) and use readdir() to get them back.
This all works like a charm. Nevertheless, I would really like to make sure the backend only returns photos that are completely uploaded, to prevent the second app from trying to load an unfinished image. Are there any methods/tricks that I can use to validate a file?
You could check the filesize a couple hundred milliseconds apart and see if it changes:
$first = filesize($file);
// wait 100ms
usleep(100000);
// filesize() results are cached, so clear the stat cache before re-checking
clearstatcache();
$second = filesize($file);
if($first == $second) {
// file is no longer being actively uploaded
}
The usual trick for atomic filesystem operations is to write into a temporary file that is not matched by the reader (e.g. XXX.jpg.tmp) and, once it's completely uploaded, rename it to its target name. Renames on the same volume are atomic, so there is no point at which the file is either incomplete or unavailable.
A really easy and common way to do so would be to create a trigger file based on the file's name, so that you get something like
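In PHP that could look roughly like this. Both function names are hypothetical, and the sketch assumes the reader only ever matches *.jpg, so a file named XXX.jpg.tmp stays invisible until it is renamed.

```php
<?php
// Hypothetical reader: list only finished images. Files still being
// written end in ".tmp" and therefore do not match "*.jpg".
function list_finished_images($dir)
{
    $files = glob($dir . '/*.jpg');
    return $files === false ? array() : $files;
}

// Hypothetical writer step: expose a fully written temporary file under
// its final name. Both paths must be on the same volume.
function publish_image($tmpPath, $finalPath)
{
    return rename($tmpPath, $finalPath);
}
```

In an upload handler, the temporary file would typically be the result of move_uploaded_file() into the target folder under the .tmp name.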
123.jpg
123.rdy
or
123.jpg
123.jpg.rdy
You create that file (just an empty stub) as soon as the upload is complete. The application that grabs files to load only cares about files with a trigger file and then processes those. Alternatively, you could save the uploaded file as e.g. 123.bsy or 123.jpg.bsy while it is still being uploaded and then rename it to the final name 123.jpg after the upload is done. Since renames within the same directory are usually very cheap operations in terms of processing time, the chance of running into a race condition should be pretty low. (This might or might not depend on the OS used, though ...)
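A sketch of the trigger-file variant, using the 123.jpg / 123.jpg.rdy naming from above (the function names are made up for illustration):

```php
<?php
// Hypothetical writer step: drop an empty marker next to a finished upload.
function mark_ready($imagePath)
{
    return touch($imagePath . '.rdy');
}

// Hypothetical reader: return only images that have a ".rdy" marker.
function list_ready_images($dir)
{
    $markers = glob($dir . '/*.jpg.rdy');
    if ($markers === false) {
        return array();
    }
    $ready = array();
    foreach ($markers as $marker) {
        $image = substr($marker, 0, -4); // strip the ".rdy" suffix
        if (is_file($image)) {
            $ready[] = $image;
        }
    }
    return $ready;
}
```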
If you need to keep the files in that place, you could, of course, use a database where you add a record for each file, as the upload is complete. The other app could then just provide files with a matching database record.
After writing all this down I figured it out myself. What I did was add the exact number of bytes to the filename as well and validate that while outputting the list of images. The .tmp/.bsy solution is nice too, but I read it a bit too late :)
The upside of my solution is that no renaming is required after the upload is done. Thanks everybody for your fast answers!
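For anyone wanting to do the same, the check might look roughly like this. The <timestamp>_<bytes>.<ext> naming is only an assumed format, since the exact scheme isn't given above, and is_complete_upload is a hypothetical name.

```php
<?php
// Hypothetical check: the expected size is encoded in the file name as
// "<timestamp>_<bytes>.<ext>"; the file counts as complete once its
// size on disk matches that number.
function is_complete_upload($path)
{
    if (!preg_match('/_(\d+)\.[A-Za-z0-9]+$/', basename($path), $m)) {
        return false;
    }
    // filesize() is cached per path, so clear the cache before checking.
    clearstatcache(true, $path);
    return is_file($path) && filesize($path) === (int) $m[1];
}
```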
When a user uploads a file, it randomly gets replaced by another user's upload. I've finally tracked the issue down to PHP and the tmp file name being reused. Is there a way to fix this? Is there a way to make better random names? It seems to degrade over time, as if the random file name seed gets weaker. This is on PHP 5.2.8 and FreeBSD 7.0.
Here is a log showing how the same tmp file name gets used and is overwritten by another upload: http://pastebin.com/m65790440
Any help is GREATLY appreciated. I've been trying to fix this for over 4 months and it has gotten worse over time. Thank you.
EDIT: Keep in mind that this is not a PHP code issue. It happens before any PHP code is reached: the file received via $_FILES['name']['tmp_name'] is already incorrect when it arrives, and it has been traced back to being overwritten by someone else's upload before it reaches the upload processing script.
After chasing the relevant code down to _gettemp in FreeBSD 7's libc implementation, I'm unclear about how the contents of the file at tmp_name could be invalid. (To trace it, you might download a copy of PHP 5.2.8 and read main/rfc1867.c: line 1018 calls into main/php_open_temporary_file.c, the function starting on line 227, which does its main work in the function starting on line 97. That function, however, is essentially just a wrapper for mkstemp on your system, which is found in the FreeBSD libc implementation on line 66 and which uses _gettemp, as above, to actually generate the random filename.) However, the man page for mkstemp mentions in its BUGS section that the arc4random() function is not reentrant. It is possible that two simultaneous requests are entering the critical code section and returning the same tmp_name. I know too little about how Apache works with either mod_php or php-cgi to comment there (though using FastCGI/php-cgi might work; I can't say for certain at this time).
However, aiming for the simplest solution: if you are not actually experiencing the file at tmp_name itself being invalid, but are instead colliding with other uploaded files (for example, if you use the filename portion of tmp_name as your only source of uniqueness in the stored filename), you could be facing collisions due to the birthday paradox. In another question you mention having some 5,000,000 files to move, and in still another question you mention receiving 30-40k uploads a day. This strikes me as a prime situation for a birthday paradox collision. The mktemp man page mentions that (if using six 'X's, as PHP does) there are 56,800,235,584 possible filenames (62 ** 6, or 62 ** n where n = number of 'X's). However, given that you have more than some 5 million files, the probability of a collision is approximately 100% (another heuristic suggests you'll already have experienced on the order of 220 collisions, if ((files*(files-1))/2)/(62**6) means anything, where files = 5,000,000). If this is the problem you are facing (probable, if you are not adding further entropy to the generated uploaded filename), you might try something like move_uploaded_file($file['tmp_name'], UPLOADS.sha1(mt_rand().$file['tmp_name']).strrchr($file['name'], '.')), the idea being to add more randomness to the random filename, preventing collisions. An alternative could be to add two more 'X's to line 134 of main/php_open_temporary_file.c and recompile.
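The arithmetic behind those two figures can be checked with a few lines. This is just the heuristic from above written out, with the standard approximation 1 - exp(-pairs/names) for the chance of at least one collision; it assumes all names are drawn uniformly and kept around at once.

```php
<?php
$files = 5000000;            // stored files, from the other question
$names = pow(62, 6);         // six 'X's, 62 possible characters each

// Number of distinct pairs of files that could share a name.
$pairs = $files * ($files - 1) / 2;

// Expected number of colliding pairs, and the approximate probability
// of at least one collision.
$expectedCollisions = $pairs / $names;
$probability = 1 - exp(-$expectedCollisions);

printf("possible names: %.0f\n", $names);
printf("expected collisions: %.1f\n", $expectedCollisions);
printf("P(at least one collision): %.6f\n", $probability);
```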
It sounds like something is seriously wrong with either your PHP installation or whichever system call PHP is internally using to generate the random file names (most likely tempnam).
For everyone else: PHP handles uploaded files internally before the user code is ever processed. These names are stored in $_FILES['file']['tmp_name'] (where 'file' is the (quoted) name of the file input element on the form).
Is PHP running under apache, as mod_php?
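You may try to create a per-process temporary upload directory whose name contains your PHP getmypid(), then point your PHP process's upload_tmp_dir at that directory. Be aware that upload_tmp_dir is a PHP_INI_SYSTEM setting, so ini_set() cannot change it at runtime; it would have to come from php.ini or the server configuration. This will also not work if a new PHP process is spawned for every request.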
Move your files to a user dir after they have been uploaded. Those temp files should be removed.
I would recommend using a GUID generator for the filename seeing that you are getting so many.
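PHP has no built-in GUID function, so here is a sketch of something equivalent for the stored filename. It assumes PHP 7 or later for random_bytes(), which is not available on the 5.2.8 install from the question, and unique_upload_name is a hypothetical helper.

```php
<?php
// Hypothetical helper: build a stored file name from 128 random bits,
// keeping only the (lower-cased) extension of the original name.
function unique_upload_name($originalName)
{
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    $id = bin2hex(random_bytes(16)); // 32 hex characters
    return $ext === '' ? $id : $id . '.' . $ext;
}
```

It would be used as the destination name in move_uploaded_file(), so the stored name no longer depends on tmp_name at all.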