Single download file upload mechanism - php

Sorry for the relatively vague title, I couldn't think of anything else.
In short: what would be an optimal (least resource-intensive) way of creating a simple file upload service that deletes the file after the first download? It can be PHP or anything else (as long as it's relatively easy to implement). It's basically for streaming screenshots to a single user.
The first thing that comes to mind is simply doing a regular upload and then doing a readfile() followed by an unlink(). sendfile is obviously out of the question, since then I don't have a way of executing code after the file has been transferred. But readfile() doesn't seem like such a good idea either.
I wouldn't mind installing a separate daemon or something along those lines.

Pseudo-code:
1. Get the temporary path to the file from $_FILES['tmp_name'].
2. Move it to a non-guessable server location (e.g. uploads/file{random_numbers}.extension).
3. Store the information in a DB.
4. Upon visiting yoursite.tld/view.php?id={unique id, different from file{random_numbers}}:
   SELECT path FROM table WHERE token = 'UNIQUE ID ABOVE' AND downloaded = 0
   4.1 IF there is a row in the DB, get the path and then set downloaded = 1 in the DB.
   4.2 ELSE do nothing further.
5. Output the file with a Content-Disposition: attachment header (e.g. via readfile()) so that it gets downloaded rather than displayed.
6. Run a cron job every x minutes to clear out files that aren't needed anymore - cron won't be able to delete a file that's currently being transmitted to the user (as far as I know, as it would still be "in use").
Hopefully you'll be able to follow my logic and implement it as planned.
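A minimal sketch of view.php covering steps 4 and 5 above; the files table, its column names and the DB credentials are assumptions chosen to illustrate the pseudo-code, not tested production code:
<?php // view.php - sketch of steps 4 and 5 above
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare('SELECT id, path FROM files WHERE token = ? AND downloaded = 0');
$stmt->execute(array(isset($_GET['id']) ? $_GET['id'] : ''));
$row = $stmt->fetch();
if (!$row) {
    header('HTTP/1.0 404 Not Found');        // unknown token or already downloaded
    exit;
}
// Mark the file as used before sending it, so a second request can't fetch it again.
$pdo->prepare('UPDATE files SET downloaded = 1 WHERE id = ?')->execute(array($row['id']));
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . basename($row['path']) . '"');
header('Content-Length: ' . filesize($row['path']));
readfile($row['path']);                      // stream the whole file to the client
unlink($row['path']);                        // safe to remove once readfile() has returned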

If you don't mind installing a separate daemon, cron will do the job. You can set it up to remove outdated files every n minutes.

A little php script (logical help needed)

I am a .NET developer and am developing an application for a company. For that I need to write a little PHP script to meet my needs.
My app needs to check some information from the internet which changes randomly, almost every second. I am thinking of making a PHP script so that I can give the app the needed information. My idea is to use a simple text file instead of a MySQL database (I am free to use a MySQL db too), and then make two PHP pages, for example writer.php and reader.php.
The job of writer.php is very simple: it saves the submitted data to the text file I want to use as the db.
reader.php will read the text file, output it as plain text, and on every read it will also empty the text file. This file will be read by my app.
Work done.
Now the logical questions:
1. reader.php will be read by 40 clients at the same time. Will there be any conflicts?
2. Will this method be faster than a MySQL db?
3. Is this method more resource-consuming than a MySQL db?
You will have to lock the file for I/O for the duration of the write (PHP's flock() function). This may slow things down a bit when there are more clients at the same time: while the file is locked by one user, everyone else has to wait. The other problem that may appear when writing a lot of data is that the write queue may grow without bound when there are many write requests.
MySQL seems to be the better idea, as it caches both write and read requests, and it is implemented to avoid simultaneous-access conflicts.
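That said, if you stick with the text file, a minimal flock() sketch might look like this (the data.txt file name is an assumption):
<?php // writer.php - save the submitted data under an exclusive lock
$fp = fopen(__DIR__ . '/data.txt', 'c');     // create if missing, don't truncate yet
if (flock($fp, LOCK_EX)) {                   // writers wait for each other here
    ftruncate($fp, 0);
    fwrite($fp, isset($_POST['data']) ? $_POST['data'] : '');
    fflush($fp);
    flock($fp, LOCK_UN);
}
fclose($fp);

<?php // reader.php - output the current contents, then empty the file under the same lock
$fp = fopen(__DIR__ . '/data.txt', 'c+');
if (flock($fp, LOCK_EX)) {                   // exclusive here too, because we also truncate
    echo stream_get_contents($fp);
    ftruncate($fp, 0);                       // "on every read it will also empty the text file"
    flock($fp, LOCK_UN);
}
fclose($fp);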

max_execution_time Alternative

So here's the lowdown:
The client I'm developing for is on HostGator, which has limited their max_execution_time to 30 seconds, and it cannot be overridden (I've tried, and confirmed via their support and wiki that it can't be).
What I have the code doing is take an uploaded file and...
loop through the XML
get all feed download links within the file
download each XML file
individually loop through each XML array of each file and insert the information of each item into the database based on where they came from (i.e. the filename)
Now is there any way I can queue this somehow or split the workload into multiple files possibly? I know the code works flawlessly and checks to see if each item exists before inserting it but I'm stuck getting around the execution_limit.
Any suggestions are appreciated, let me know if you have any questions!
The time limit is only in effect when executing PHP scripts through a web server; if you execute the script from the CLI or as a background process, it should work fine.
Note that executing an external script is somewhat dangerous if you are not careful enough, but it's a valid option.
Check the following resources:
Process Control Extensions
And specifically:
pcntl-exec
pcntl-fork
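As a concrete illustration of the "run it outside the webserver" idea, the upload handler could hand the work off to a background CLI process, roughly as below. Note this uses plain exec() rather than the pcntl_* functions linked above; import.php, the form field name and the queue directory are all assumptions, and exec() may well be disabled on shared hosts such as HostGator.
<?php // upload.php - hand the heavy work to a CLI script running in the background
$stored = __DIR__ . '/queue/' . uniqid('feed_', true) . '.xml';
move_uploaded_file($_FILES['feed']['tmp_name'], $stored);   // 'feed' is an assumed field name
// The CLI process is not bound by the webserver's 30-second max_execution_time.
exec('php ' . escapeshellarg(__DIR__ . '/import.php') . ' '
   . escapeshellarg($stored) . ' > /dev/null 2>&1 &');
echo 'Import started; results will appear as the background job inserts them.';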
Did you know you can trick the max_execution_time by registering a shutdown handler? Within that code you can run for another 30 seconds ;-)
Okay, now for something more useful.
You can add a small queue table in your database to keep track of where you are in case the script dies mid-way.
After getting all the download links, you add those to the table
Then you download one file and process it; when you're done, you check it off (delete it from) the queue
Upon each run you check if there's still work left in the queue
For this to work you need to request that URL a few times; perhaps use JavaScript to keep reloading until the work is done?
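A rough sketch of that queue idea, assuming a hypothetical feed_queue table (id, url, done) and placeholder DB credentials; importFeed() stands in for the existing download-and-insert code:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
// After parsing the uploaded file, add every feed link to the queue once.
function enqueueLinks(PDO $pdo, array $urls) {
    $stmt = $pdo->prepare('INSERT INTO feed_queue (url, done) VALUES (?, 0)');
    foreach ($urls as $url) {
        $stmt->execute(array($url));
    }
}
// On each request (or each JavaScript-triggered reload), process one pending
// feed well inside the 30-second limit and then check it off the queue.
function processNext(PDO $pdo) {
    $row = $pdo->query('SELECT id, url FROM feed_queue WHERE done = 0 LIMIT 1')->fetch();
    if (!$row) {
        return false;                        // queue is empty: all work is done
    }
    importFeed($row['url']);                 // your existing download-and-insert logic
    $pdo->prepare('DELETE FROM feed_queue WHERE id = ?')->execute(array($row['id']));
    return true;                             // tell the caller to request the URL again
}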
I am in such a situation. My approach is similar to Jack's:
accept that the execution time limit will simply be there
design the application to cope with a sudden exit (look into register_shutdown_function)
identify all time-demanding parts of the process
continuously save the progress of the process
modify your components so that they are able to start from an arbitrary point, e.g. a position in an XML file, or continue downloading your to-be-fetched list of XML links
For the task I made two modules: Import for the actual processing and TaskManagement for dealing with these tasks.
For invoking the TaskManager I use cron; whether that's enough depends on what your webhosting offers. There's also WebCron.
The advantage of Jack's JavaScript method is that it only adds requests when needed. If there are no tasks to be executed, the script's runtime will be very short, so the extra requests are arguably negligible*, but still. The downsides are that it requires the user to wait the whole time, keep the tab/browser open, have JS support, etc.
*) Likely much less demanding than one click of one user at such a moment.
Then of course look into performance improvements: caching, skipping what isn't needed / hasn't changed, etc.
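A minimal sketch of the second and fourth points (coping with a sudden exit and continuously saving progress). The checkpoint file, links.json and fetchAndStore() are placeholder names for illustration; a real implementation would rather persist progress in the database:
<?php
$checkpoint = __DIR__ . '/import.progress';
$offset = is_file($checkpoint) ? (int) file_get_contents($checkpoint) : 0;
$done = false;
register_shutdown_function(function () use (&$offset, &$done, $checkpoint) {
    if (!$done) {
        // The 30-second limit killed us mid-way: remember where to resume next run.
        file_put_contents($checkpoint, $offset);
    }
});
$links = json_decode(file_get_contents(__DIR__ . '/links.json'), true) ?: array();
for ($i = $offset; $i < count($links); $i++) {
    fetchAndStore($links[$i]);               // the time-demanding part of the process
    $offset = $i + 1;                        // checkpoint only after the item completed
}
$done = true;
if (is_file($checkpoint)) {
    unlink($checkpoint);                     // everything processed: clear the checkpoint
}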

How to save and maintain local copy of javascript file using PHP (with parallelism in mind)?

What I'm after is a PHP script which will provide a local copy of a JavaScript file stored on a different domain, e.g. JS files used for tracking purposes, such as Google Analytics or, in my case, WordPress Stats.
The intention is to include it as follows in my own HTML:
mysite.com/somepath/wpstats.js.php?w=201145
(where "w=year-week_number" is used in the same way as by wordpress.com. Their cache expiry for there original stats JS-file is one year - hence the JS-file will never be changed more often than once a week)
In my wpstats.js.php I want to download the correct file from wordpress.com when w=xx changes, otherwise I want to return it from a locally kept copy.
I can implement it with one internal variable called "w_current", so that when a new visitor enters the site the second week w != w_current and it will trigger a new fetch. The problem is: how do I prevent race-conditions/parallelism problems that might occur when two website visitors simultaneously loads the website the first time the second week [1]?
[1] both apache-processes evaluate the w != w_current and starts two downloads of the (potentially) new version of the wpstats.js-file and both try to write it to the local copy (e.g. own_wpstats.js) which wpstats.js.php (in the normal case where a local copy already exists) will include:
if ($_GET["w"] == w_current) require("own_wpstats.js");
The main reason for doing this (performance possibly being another) is due to the fact that Wordpress.com Stats injects (via DOM) an additional JS-tracking file from a third party, quantserve.com. In my locally kept "WP JS file" I will make sure this quantserve.com thing is excluded, but this is a separate programming challenge than the question I try to ask here. A search for "quantserve.com tracking cookie" gives many discomfortable results.
I wouldn't worry TOO much about such a race condition. "Worst case" is you download the file twice.
But IF you are going to be doing ANY tracking of these files in a DB, then preventing the double download would be simple. Immediately before you start the download, check DB if file needs a refresh, log the refresh to the DB, and THEN do the download. Give the file a unique primary-key (site_url + file_name) and the second check before update will fail.
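One way to make that check-then-log step atomic is a single conditional UPDATE. The js_cache table (with a pre-existing row per tracked file), its column names and the remote URL below are assumptions for illustration, not something from the answer:
<?php // wpstats.js.php - sketch only
$week  = preg_replace('/\D/', '', isset($_GET['w']) ? $_GET['w'] : '');
$local = __DIR__ . '/own_wpstats.js';
$pdo   = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
// Only one of two simultaneous requests can flip the stored week value,
// so only that request performs the download; the other serves the old copy.
$stmt = $pdo->prepare('UPDATE js_cache SET week = ? WHERE file_name = ? AND week <> ?');
$stmt->execute(array($week, 'wpstats', $week));
if ($stmt->rowCount() === 1) {
    $fresh = file_get_contents('http://stats.wordpress.com/wpstats.js');   // hypothetical source URL
    if ($fresh !== false) {
        file_put_contents($local . '.tmp', $fresh);
        rename($local . '.tmp', $local);     // atomic swap: readers never see a partial file
    }
}
header('Content-Type: application/javascript');
readfile($local);
(A production version would also handle a failed download, e.g. by resetting the stored week so the next request retries.)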

php script that deletes itself after completion

There's a problem that I'm currently investigating: after a coworker left, one night some files that he created, basically all his work on a completed project that the boss hasn't paid him for, got deleted. From what I know, all access credentials have been changed.
Is it possible to do this by setting up a file to do the deletion task and then delete the file in question? Or something similar that would change the code after the task has been done? Is this untraceable? (I'm thinking he could have cleverly disguised the request as a normal request, and I have skimmed through the code base and through the raw access logs and found nothing.)
It's impossible to tell whether this is what actually happened or not, but setting up a mechanism that deletes files is trivial.
This works for me:
<?php // index.php
unlink("index.php"); // the script deletes the very file it lives in
It would be a piece of cake to set up a script that, if given a certain GET variable for example, would delete itself and a number of other files.
Except for the server access logs, I'm not aware of a way to trace this - however, depending on your OS and file system, an undelete utility may be able to recover the files.
It has already been said in the comments how to prevent this - using centralized source control, and backups. (And of course paying your developers - although this kind of stuff can happen to anyone.)
Is it possible to do this by setting up a file to do the deletion task and then delete the file in question?
Yes it is. He could have left an innocuous-looking PHP file on the server which, when accessed over the web later, would give him shell access. Getting this file to self-delete when he is done is possible.
Create a PHP file with the following in it:
<?php
if (isset($_GET['vanish']) && $_GET['vanish'] == 'y') {
    echo "You wouldn't find me the next time you look!";
    // The actual deletion is left commented out so the demo stays harmless;
    // remove the leading '#' to let the script really delete itself.
    #unlink(preg_replace('!\(\d+\)\s.*!', '', __FILE__));
} else {
    echo "I can self destruct ... generally";
}
?>
Put it on your server and navigate to it. Then navigate to it again with a "vanish=y" argument and see what happens.

How to associate files to records that have not been saved yet

I am working on a web application that allows you to write notes. I want to add some functionality to allow a user to add attachments to notes, but I have a little trouble figuring out the logic behind it.
I want it to work a bit like webmail or phpBB forum posts. You start a new message. There's a file upload input element there with an "add" button next to it. When you add a file, it is uploaded and you can continue to write your message. When you finally click "submit" it creates the note and associates the uploaded files with it. Here's some ASCII art:
Subject: ______________
Message: ______________
______________
______________
Attachments: some_file.txt
resume.odt
______________ [Browse][Add]
[Save]
But how can I associate the uploads with the note while it is still being written? It hasn't been saved yet; it has no ID. Normally I would add a database table that associates uploaded files with note IDs, but that doesn't work for a note that hasn't been saved yet. What I worry about is a user starting to write a new note, adding a file to it and then changing his mind, never saving the note (e.g. closing the browser). I don't want those uploaded files lingering around.
Thanks in advance!
Alright, so he opens that URL where he sees the layout you managed to draw (kudos for the ASCII art). Before rendering the page, you have already created an empty note record, which provides you with an ID. He uploads files (they go somewhere on the hard drive and you save just the paths in another database table; remember the file-note relationship is many-to-one), and voila, you already have an ID to associate them with!
As for orphan files from abandoned notes: well, there are a few options to handle that, but all of them end up adding another chapter to your web app, called 'maintenance'. I would keep the last note ID in the user profile and wouldn't let him create a new note without saving or discarding the previous one.
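A minimal sketch of that "create the record up front" idea; the notes/attachments tables, the is_draft flag and the session key are assumptions, not something from the answer:
<?php // compose.php - sketch only
session_start();
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
// When the compose page is first opened, create an empty draft note and keep its ID.
if (empty($_SESSION['draft_note_id'])) {
    $pdo->exec('INSERT INTO notes (is_draft) VALUES (1)');
    $_SESSION['draft_note_id'] = (int) $pdo->lastInsertId();
}
$noteId = $_SESSION['draft_note_id'];
// Every "Add" upload is associated with that ID straight away (many files per note).
if (!empty($_FILES['attachment']['tmp_name'])) {
    $dest = __DIR__ . '/uploads/' . uniqid('att_', true);
    move_uploaded_file($_FILES['attachment']['tmp_name'], $dest);
    $pdo->prepare('INSERT INTO attachments (note_id, path) VALUES (?, ?)')
        ->execute(array($noteId, $dest));
}
// When "Save" is finally pressed, flip the draft flag and forget the session key;
// notes that stay in draft state too long can be purged later by a maintenance job.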
Way back in the days before AJAX, if you wanted to attach a file to a form you would simply upload the file at time of form submission. Now we have all sorts of clever ways to transfer files, or at least two or three popular ways, some using Flash or jQuery. But you have spotted the major maintenance problem with these techniques: whenever a user uploads an image in parallel to filling out a form, there is no guarantee that the form will ever be submitted. You have to come up with some kind of timeout mechanism for those uploaded files, and that generally means starting up some kind of housekeeping process. You get to add another moving part to your architecture, and you'll need to monitor it periodically to make sure things don't get out of control.
Depending on the expected frequency and size of traffic, you need to define some parameters for when an orphaned file can be deleted. If you expect less than 100 uploads a day, then daily purges should be fine. In each purge, you want to delete any file that has been orphaned for a certain amount of time. To do that you'll need a mechanism for identifying the old files, and while you could compare the files to your table records this requires extra database resources that might impact the speed of your site. That is a hint that you should be moving those files somewhere else once they have been processed, making it easier to identify the potential orphans.
Now as you monitor this process you can decide if it needs to run more or less often. If you made a poor design decision it will be more painful to do that housekeeping, especially if it runs hourly or more often.
Good luck!
There are many ways of doing it. Here's the one I prefer at first glance:
Generate the "mail ID" when the user clicks on "new mail", then you already have it when he wants to upload something.
A way more experimental idea:
Generate a hash of the file (e.g. MD5 with a timestamp or something similar) when he uploads it. When he finally submits the mail, add the hash to the mail and use it as a key to the previous upload.
A little (!) late to the conversation, but another option might be to upload to a tmp dir; then, when the message/post/form is submitted and saved, you could move the relevant files out of the tmp dir into a permanent dir.
Then have a cron job running daily (e.g.) that deletes files from the tmp folder older than a day.
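A minimal sketch of that clean-up, assuming the temporary uploads live in uploads/tmp and anything untouched for more than a day counts as an orphan:
<?php // purge_tmp.php - run from cron, e.g.: 0 3 * * * php /path/to/purge_tmp.php
$tmpDir = __DIR__ . '/uploads/tmp';
$cutoff = time() - 86400;                    // untouched for a day = abandoned upload
foreach (glob($tmpDir . '/*') as $file) {
    if (is_file($file) && filemtime($file) < $cutoff) {
        unlink($file);                       // the note it belonged to was never saved
    }
}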
