I am using symlinks generated in PHP. They are generated when someone requests a download, and I want them to expire at the end of each day.
The problem is, what if someone starts downloading a symlink 1 minute before the end of the day and then I delete the symlink while they are downloading it...
My question is, to your knowledge will that individual downloading the symlink, right before I delete it, still be able to "download" the file? I am not worried about "resumable download" capability.. but will it make their download stop or break in some way?
Yes you can do this.
On UNIX-like systems (including Linux), you don't delete files. You delete filenames. If you delete a file that someone else currently has open, the filename will be gone but the data will remain on disk until the file is closed.
Even more so with symlinks: if you delete a symlink the file data is still there, and any process with the file open refers to it by a file handle, not by filename.
So as long as you delete the symlink after your script opens the file, the download will complete without any trouble.
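To see this behaviour for yourself, here is a minimal sketch in Python (not PHP), assuming a Linux or other UNIX-like system. The function name and file names are made up for illustration. It opens a file through a symlink, deletes the symlink while the handle is still open, and then reads the data anyway.

```python
import os
import tempfile


def read_after_unlink(data: bytes) -> bytes:
    """Open a file through a symlink, delete the symlink, then keep reading."""
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "target.bin")    # the real file
        link = os.path.join(workdir, "download-link")   # the expiring symlink
        with open(target, "wb") as f:
            f.write(data)
        os.symlink(target, link)

        with open(link, "rb") as handle:      # the "download" starts here
            os.unlink(link)                   # the symlink expires mid-download
            assert not os.path.lexists(link)  # the name is gone...
            return handle.read()              # ...but the open handle still works
```

The read succeeds because the open handle refers to the file itself, not to the name that was used to find it.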
As long as the webserver keeps the file open until the download is complete, which I would expect it to, this will work fine. On Linux you could even remove the hard link to a file and the webserver would still be able to read from it as long as it keeps the file open.
My experience says otherwise.
If there is an interruption (in the connection), or if the user chooses to pause the download, the browser will not always be able to resume (because the link is gone).
I've tried IE 9.x, Chrome 23.x and Firefox 17.x (on an Apache shared server).
What I did was put the symlinks in unique folders and delete the folders based on time. In my case, I kill the folder (and its included symlink) after 30 minutes (a reasonable time for my download files).
(I can't take credit for this idea---saw it somewhere else)
I will say that very short interruptions or pauses can sometimes be recovered.
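A rough sketch of that time-based cleanup, written in Python rather than PHP. The function name, the layout (one sub-folder per download under a common root) and the use of the folder's modification time as its age are all assumptions, not details from my actual setup.

```python
import os
import shutil
import time


def purge_expired_dirs(root, max_age_seconds=30 * 60, now=None):
    """Remove sub-folders of root whose mtime is older than max_age_seconds.

    Returns the sorted names of the folders that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in list(os.scandir(root)):
        if not entry.is_dir(follow_symlinks=False):
            continue  # only per-download folders are expected here
        age = now - entry.stat(follow_symlinks=False).st_mtime
        if age > max_age_seconds:
            # rmtree unlinks the symlink inside; the link's target is untouched
            shutil.rmtree(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Run from cron every few minutes, something like this removes only the folders and the symlinks inside them; the real download files stay where they are.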
Related
I have a PHP app, which is working fine for me, both on test system and a production system.
But another user of my app wrote me, that it creates a lot of files .nfs00000* on his system and it slows down loading of the page.
My app does not create any files on the filesystem; all data is stored in MySQL. So I was really surprised by this. But that user removed my PHP app from his website and the problem disappeared.
I will be honest -- I know nothing about .nfs00000* files and I was not able to find anything useful about them on Google. Can someone please explain what they are, why they are created, and whether I can do anything to avoid their creation?
Thanx, Honza
Maybe this can help:
Under Linux/Unix, if you remove a file that a currently running process still has open, the file isn't really removed. Once the process closes the file, the OS then removes the file handle and frees up the disk blocks. This process is complicated slightly when the file that is open and removed is on an NFS mounted filesystem. Since the process that has the file open is running on one machine (such as a workstation in your office or lab) and the files are on the file server, there has to be some way for the two machines to communicate information about this file. The way NFS does this is with the .nfsNNNN files. If you try to remove one of these files, and the file is still open, it will just reappear with a different number. So, in order to remove the file completely you must kill the process that has it open.
If you want to know what process has this file open, you can use 'lsof .nfs1234'. Note, however, this will only work on the machine where the process that has the file open is running. So, if your process is running on one machine (eg. bobac) and you run the lsof on some other burrow machine (eg. silo or prairiedog), you won't see anything.
(Source)
If your app is deleting or modifying some files it could be the cause of the problem.
So I've done a fair bit of Googling and I haven't found the answer to this yet. I have one server that needs to be live and respond quickly - I don't want my visitors/customers waiting. However, I've had enough server issues to know that I need a good backup and not the whenever-I-feel-like-it backups I've got now.
When I've tried mysqldump and tar on this server it works, but it makes the server quite slow for hours. It can also lead to filling up the hard drive (if I forget to remove old backups for too long), which takes down MySQL and by extension my sites. Slowing down the server is unacceptable, so my backups to date only happen about once a week on the most common slow day/time. However, I'm not fond of this solution.
So I now have another server that I have set up as my primary server's near real-time backup. I am setting up MySQL replication and a daily rsync. Then a daily mysqldump and tar can be run from this backup server without affecting response times on the primary. Great, except I'd like to take it a step further.
I have a number of data files that get captured throughout the day - like log files and basic visitor tracking files (IP, referrer, user agent, nothing evil) - that don't need to be processed in real time, and I've decided that it is best to have this backup server process these files rather than tie up my primary server's resources. This creates an issue for me though. I want to get these files off the primary server and onto my backup server for processing. The files are scattered across a lot of directories and that list of directories will only grow over time. So I want to avoid having to run mv on several directories now and having to maintain that list going forward - some new directory will get forgotten eventually.
So I've mounted the necessary directory on the primary server under the /media folder on the backup server. If I just wanted to copy the files over I'd use scp, but I want to move them. So I'd like a command that does something like:
mv /media/primary/*.log /backup/.
where the mv command recursively looks into the /media/primary folder and goes down through the directory to each subfolder looking for any file with the .log extension. Then moves that file from /media/primary/ to an equivalent folder on /backup. So, for example, if I had log files in the directories /media/primary/tool1/logs and /media/primary/tool2/logs I'd like them moved to /backup/tool1/logs and /backup/tool2/logs respectively. I'd like (but don't require) that the command create the folder if it is missing. Once properly moved the files can then be processed and renamed/deleted from the local hard drive on the backup server.
I'm on Ubuntu 12.04 server for both machines. I work primarily in PHP so I've been creating a PHP script for the backup server to be executed daily. The script can execute system commands via SSH (phpseclib).
A single command for the CLI would be great but I understand that may not be possible.
Thanks
How about something like
rsync -azrR --include='*.log' -f 'hide,! */' /media/primary /backup
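If you run man rsync and type 2343g, it should take you to the line where this filter is explained (the exact line number may differ between rsync versions). Basically it will hide everything that's not a directory (that is, every file) from the pattern matching; however, because the --include='*.log' flag comes first, it takes precedence and only .log files will be matched. You can also add the -nv flags to do a dry run and see what would happen. Note that this copies the files rather than moving them; adding --remove-source-files makes rsync delete each source file once it has been transferred.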
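If you would rather do the move from a script than with rsync, here is a minimal sketch in Python (not the PHP the question mentions). The function name is made up. It walks the source tree, recreates each sub-folder under the destination as needed, and moves only files with the given suffix.

```python
import os
import shutil


def move_matching(src_root, dst_root, suffix=".log"):
    """Move every file ending in suffix from src_root to the same
    relative location under dst_root, creating folders when missing.

    Returns the sorted relative paths of the files that were moved.
    """
    moved = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in filenames:
            if not name.endswith(suffix):
                continue
            dest_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
            os.makedirs(dest_dir, exist_ok=True)
            shutil.move(os.path.join(dirpath, name), os.path.join(dest_dir, name))
            moved.append(os.path.normpath(os.path.join(rel_dir, name)))
    return sorted(moved)
```

With the paths from the question, move_matching("/media/primary", "/backup") would move /media/primary/tool1/logs/x.log to /backup/tool1/logs/x.log. A log file that is still being written to on the primary at that moment would need separate handling.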
I've just published a client website. Its primary purpose is distributing content from other sources, so it's regularly pulling in text, videos, images and audio from external feeds.
It also has an option for client to manually add content to be distributed.
Using PHP all this makes a fair bit of use of copy() to copy files from another server, move_uploaded_file() to copy manually uploaded files, and it also uses SimpleImage image manipulation class to make multiple copies, and crop etc..
Now to the problem: in amongst all of this, some temp files are not being deleted. This locks up the server pretty quickly, because when tmp is full it causes things like MySQL errors and stops pages loading.
I've spent a lot of time googling, which leads me to one thing: "temp files are deleted when the script is finished executing". This is clearly not the case here.
Is there anything I can do to make sure any temporary files created by the scripts are deleted?
I've spoken to my server guy, who suggested running a cron job that will delete from it every 24 hours. I don't know whether this is a good solution, but it's certainly not THE solution, as I believe the files should be getting deleted. What could be stopping the files from being deleted?
Regardless of anything else you come up with, the cron idea is still a good one, as you want to make sure that /tmp is getting cleaned up. You can have the cron job delete anything older than 24 hours, not delete everything every 24 hours, assuming this leaves enough space.
As for temp files being deleted when the script is done: as far as I know, this only happens when tmpfile() is used to create the temp file in the first place. Other files created in /tmp by other means (and there are many other means) will not just go away because the script is done.
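The same distinction can be shown in Python, used here only as an illustration and not as a description of PHP internals. tempfile.NamedTemporaryFile is a rough analogue of tmpfile(): the file is removed as soon as the handle is closed. A file created with tempfile.mkstemp stays behind until the code deletes it explicitly. The function name is made up.

```python
import os
import tempfile


def temp_file_lifetimes():
    """Return (auto_removed, manual_left_behind, manual_removed_after_unlink)."""
    # Handle-bound temp file: deleted automatically when the handle closes.
    with tempfile.NamedTemporaryFile() as auto:
        auto_path = auto.name
        auto.write(b"scratch data")
    auto_removed = not os.path.exists(auto_path)

    # Temp file created "by other means": it outlives the handle.
    fd, manual_path = tempfile.mkstemp()
    try:
        os.write(fd, b"scratch data")
    finally:
        os.close(fd)
    manual_left_behind = os.path.exists(manual_path)

    # Explicit cleanup is the script's own responsibility.
    os.unlink(manual_path)
    return auto_removed, manual_left_behind, not os.path.exists(manual_path)
```

In a real script, the explicit delete would normally sit in a finally block so that it still runs when processing fails part-way through.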
Files are being pushed to my server via FTP. I process them with PHP code in a Drupal module. O/S is Ubuntu and the FTP server is vsftpd.
At regular intervals I will check for new files, process them with SimpleXML and move them to a "Done" folder. How do I avoid processing a partially uploaded file?
vsftpd has lock_upload_files set to yes by default. I thought of attempting to move the files first, expecting the move to fail on a currently uploading file. That doesn't seem to happen, at least on the command line. If I start uploading a large file and move it, it just keeps growing in the new location. I guess the directory entry is not locked.
Should I try fopen with mode 'a' or 'r+' just to see if it succeeds before attempting to load into SimpleXML or is there a better way to do this? I guess I could just detect SimpleXML load failing but... that seems messy.
I don't have control of the sender. They won't do an upload and rename.
Thanks
Using the lock_upload_files configuration option of vsftpd leads to locking files with the fcntl() function. This places advisory lock(s) on uploaded file(s) which are in progress. Other programs don't need to consider advisory locks, and mv for example does not. Advisory locks are in general just advice for programs that care about such locks.
You need another command line tool like lockrun which respects advisory locks.
Note: lockrun must be compiled with the WAIT_AND_LOCK(fd) macro set to use the lockf() and not the flock() function in order to work with locks that are set by fcntl() under Linux. When lockrun is compiled to use lockf(), it will cooperate with the locks set by vsftpd.
With such features (lockrun, mv, lock_upload_files) you can build a shell script or similar that moves files one by one, checking if the file is locked beforehand and holding an advisory lock on it as long as the file is moved. If the file is locked by vsftpd then lockrun can skip the call to mv so that running uploads are skipped.
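The same check-then-move idea can be sketched without lockrun. This is a minimal sketch in Python, assuming Linux, write permission on the uploaded file, and source and destination on the same filesystem. The function name is made up. fcntl.lockf() in Python wraps the fcntl() locking calls, which is the kind of lock described above for vsftpd.

```python
import fcntl
import os


def move_if_unlocked(src, dst):
    """Move src to dst only if no other process holds an fcntl() lock on it.

    Returns True when the file was moved, False when it was skipped.
    """
    with open(src, "r+b") as handle:
        try:
            # Non-blocking: fails at once if another process holds a lock.
            fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False  # still being uploaded, try again on the next run
        # We hold the lock while renaming, so no uploader owns the file now.
        os.rename(src, dst)
        return True
    # Leaving the with-block closes the handle, which releases our lock.
```

A periodic job could call this for every file in the upload folder and simply leave the files for which it returns False until the next run.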
If locking doesn't work, I don't know of a solution as clean/simple as you'd like. You could make an educated guess by not processing files whose last modified time (which you can get with filemtime()) is within the past x minutes.
If you want a higher degree of confidence than that, you could check and store each file's size (using filesize()) in a simple database, and every x minutes check new size against its old size. If the size hasn't changed in x minutes, you can assume nothing more is being sent.
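Both heuristics can be combined in a few lines. This is a sketch in Python rather than PHP; the function name is made up, and a plain dictionary stands in for the "simple database" of previously seen sizes. The Python equivalents of filemtime() and filesize() are st_mtime and st_size.

```python
import os
import time


def settled_files(directory, seen_sizes, quiet_seconds=300, now=None):
    """Return names of files that look fully uploaded.

    A file qualifies when its size equals the size recorded on the previous
    call and it has not been modified for at least quiet_seconds.
    seen_sizes is updated in place with the sizes observed on this call.
    """
    now = time.time() if now is None else now
    ready = []
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue
        info = entry.stat()
        unchanged = seen_sizes.get(entry.name) == info.st_size
        quiet = now - info.st_mtime >= quiet_seconds
        seen_sizes[entry.name] = info.st_size  # remember for the next check
        if unchanged and quiet:
            ready.append(entry.name)
    return sorted(ready)
```

This is still a guess: a sender that stalls for longer than quiet_seconds in the middle of a transfer would be treated as finished.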
The lsof Linux command lists open files on your system. I suggest executing it with shell_exec() from PHP and parsing the output to see which files are still being used by your FTP server.
Picking up on the previous answer, you could copy the file and then compare the sizes of the copy and the original at a fixed interval:
If the sizes match, the upload is done: delete the copy and work with the file.
If the sizes do not match, copy the file again and repeat.
Here's another idea: create a super (but hopefully not root) FTP user that can access some or all of the upload directories. Instead of your PHP code reading uploaded files right off the disk, make it connect to the local FTP server and download files. This way vsftpd handles the locking for you (assuming you leave lock_upload_files enabled). You'll only be able to download a file once vsftpd releases the exclusive/write lock (once writing is complete).
You mentioned trying flock in your comment (and how it fails). It does indeed seem painful to try to match whatever locking vsftpd is doing, but dio_fcntl might be worth a shot.
I guess you've solved your problem years ago but still.
If you use some pattern to find the files you need you can ask the party uploading the file to use different name and rename the file once the upload has completed.
You should check the HiddenStores directive in ProFTPD, more info here:
http://www.proftpd.org/docs/directives/linked/config_ref_HiddenStores.html
Is it possible that the creation of a rather large (20MB) .csv download creates a memory leak if the user stops the download/export before the file has been saved on their machine?
If yes, how would you catch and counter this problem?
It's possible but I would imagine it would get cleared up eventually. Either way, HTTPds are generally a lot more efficient at serving files than a server side language.
If you're worried, save the file (I assume we're talking about a dynamically generated file) to the filesystem (somewhere where the server can see it) and redirect the user to that URL.
For security (albeit through obscurity), make the filename something hideous (eg a hash of their username and a description of the file) and make sure people can't get a directory listing of the dir it lives in. Might make sense to date-tag the file (eg: filename-year-month-day.ext) so you can run something automatic to clean up the files after 24 hours.
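A sketch of that naming scheme in Python (not PHP). The function names, the choice of SHA-256 and the exact name layout are assumptions. One function builds a hashed, date-tagged filename, and the other tells a cleanup job whether a file is old enough to delete.

```python
import hashlib
from datetime import date


def export_filename(username, description, day, extension="csv"):
    """Build '<sha256 hex>-YYYY-MM-DD.<extension>' for a generated export."""
    digest = hashlib.sha256(f"{username}:{description}".encode("utf-8")).hexdigest()
    return f"{digest}-{day:%Y-%m-%d}.{extension}"


def is_expired(filename, today, keep_days=1):
    """True when the date tag in filename is at least keep_days before today."""
    stem = filename.rsplit(".", 1)[0]         # drop the extension
    tagged = date.fromisoformat(stem[-10:])   # trailing YYYY-MM-DD tag
    return (today - tagged).days >= keep_days
```

Anyone who knows the username and the description can recompute a hash like this, so it only makes the name hard to stumble upon, in line with the "through obscurity" caveat.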
If you are generating the file on the fly and streaming it to the user you may want to look at
http://php.net/manual/en/features.connection-handling.php and perform some cleanup if the connection gets aborted or times out.