Limiting Parallel/Simultaneous Downloads - How to know if download was cancelled? - php

I have a simple file upload service, written out in PHP, which also includes a script that controls download speeds by sending limited-sized packets when a user requests a download from this site.
I want to implement a system to limit parallel/simultaneous downloads to 1 per user if they are not premium members. In the download script above, I can use a MySQL database to store a record that has: (1) the user ID; (2) the file ID; (3) when the download was initiated; and (4) when the last packet was sent, which is updated each time this is done (if DL speed is limited to 150 kB/sec, then after every 150 kB, this record is updated, etc.).
However, thus far, the database record will only be deleted once the download has successfully completed — at the end of the script, after the download has been fully served, the record is deleted from the table:
insert DB record;
while (download is being served) {
serve packet of data;
update DB record with current date/time;
}
// Download is now complete
delete DB record;
How will I be able to detect when a download has been cancelled? Would I just have to have a Cron job (or something similar) detect if an existing download record is more than X minutes/hours old? Or is there something else I can do that I'm missing?
I hope I've explained this well enough. I don't think posting specific code is required; I'm more interested in the logistics of how/whether this can be done. If specific code is needed, I will gladly provide it.
NOTE: I know how to detect if a file was successfully downloaded; I need to know how to detect if it was cancelled, aborted, or otherwise stopped (and not just paused). This will be useful in stopping parallel downloads, as well as preventing a situation where the user cancels Download #1 and tries to initiate Download #2, only to find that the site claims he is still downloading file #1.
EDIT: You can find my download script here: http://codetidy.com/1319/ — it already supports multi-part downloads and download resuming.

<?php
class DownloadObserver
{
protected $file;
public function __construct($file) {
$this->file = $file;
}
public function send() {
// -> note in DB you've started
readfile($this->file);
}
public function __destruct() {
// download is done, either completed or aborted
$aborted = connection_aborted();
// -> note in DB
}
}
$dl = new DownloadObserver("/tmp/whatever");
$dl->send();
This should work just fine: the destructor runs whether the download completed or was aborted, so there is no need for a shutdown function or any funky self-built connection observation.

You will want to check out the following functions: connection_status(), connection_aborted() and ignore_user_abort() (see the connection handling section of the PHP manual for more info).
Although I can't guarantee the reliability (it's been a while since I've played around with it), with the right combination you should be able to accomplish what you want. There are a few caveats when working with these though, the big one being that if something goes wrong you could end up with stranded PHP scripts running on the server requiring you to kill Apache to stop them.
The following should give you a good idea of how to do it (adapted from the PHP code examples and a couple of the comments):
<?php
//Set PHP not to cancel execution if the connection is aborted
//and drop the time limit to allow for big file downloads
ignore_user_abort(true);
set_time_limit(0);
while(true){
//See the ignore_user_abort() docs re having to send data
echo chr(0);
//Make sure the data gets flushed properly or the connection check won't work:
//flush PHP's output buffer first (if one is active), then the system buffer
if(ob_get_level() > 0) ob_flush();
flush();
//Check the connection status and exit loop if aborted
if(connection_status() != CONNECTION_NORMAL || connection_aborted()) break;
//Just to provide some spacing in this example
sleep(1);
}
file_put_contents("abort.txt", "aborted\n", FILE_APPEND);
//Never hurts to ensure that the script halts execution
die();
Obviously for how you would be using it the data being sent would simply be the download data chunk (just make sure you flush the buffer properly to ensure the data is actually sent). As far as I'm aware, there is no way of making a distinction between pausing and aborting/stopping. Pause/resume functionality (and multi-part downloading - i.e. how download managers accelerate downloads) relies on the "Range" header, basically requesting byte x to byte y of the file. So if you want to allow resumable downloads you'll have to deal with that too.
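To make the "byte x to byte y" idea concrete, here is a minimal sketch of parsing a single-range Range header. The function name parseRangeHeader() is hypothetical (it is not part of PHP or of the download script in the question), and multi-range requests are deliberately treated as unsupported.

```php
<?php
/**
 * Parse a single-range "Range" header value such as "bytes=0-499".
 * Returns [start, end] (inclusive byte offsets), or null when the header is
 * missing, malformed, multi-range, or cannot be satisfied for this file size.
 * Hypothetical helper for illustration only.
 */
function parseRangeHeader(?string $header, int $fileSize): ?array
{
    if ($header === null || $fileSize <= 0) {
        return null;
    }
    if (!preg_match('/^bytes=(\d*)-(\d*)$/', trim($header), $m)) {
        return null; // malformed, or a multi-range request
    }
    $first = $m[1] ?? '';
    $last  = $m[2] ?? '';

    if ($first === '' && $last === '') {
        return null; // "bytes=-" carries no information
    }

    if ($first === '') {
        // Suffix form "bytes=-N": the last N bytes of the file
        $length = (int) $last;
        if ($length === 0) {
            return null;
        }
        $start = max(0, $fileSize - $length);
        $end   = $fileSize - 1;
    } else {
        $start = (int) $first;
        // Open-ended form "bytes=N-" runs to the end of the file
        $end = ($last === '') ? $fileSize - 1 : min((int) $last, $fileSize - 1);
    }

    return ($start > $end) ? null : [$start, $end];
}
```

In a real download script the header would come from $_SERVER['HTTP_RANGE'], and a non-null result would normally be answered with a 206 status and a matching Content-Range header before the bytes from start to end are sent.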

There is no HTTP "cancel" signal that is sent by default. So, it looks like you will need to decide on a timeout, the length of time a connection can sit without sending/receiving another packet. If you are sending rather small packets (as I presume you are) keep the timeout short for best effect.
In your while condition you will need to check the age of the last timestamp update; if it's too old, stop sending the file.
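The same timestamp check can also be used when a new download is requested, so that records left behind by cancelled downloads stop counting against the user. The following is only a sketch under assumptions: the 30-second timeout, the function names, and the array keys (user_id, last_packet_at) are all hypothetical stand-ins for whatever the real table uses.

```php
<?php
// Assumed timeout: a record not updated for this long is treated as dead.
const STALE_AFTER_SECONDS = 30;

/** True when the last packet was sent more than $timeout seconds ago. */
function isDownloadStale(int $lastPacketAt, int $now, int $timeout = STALE_AFTER_SECONDS): bool
{
    return ($now - $lastPacketAt) > $timeout;
}

/**
 * Count the downloads of one user that still look alive.
 * $records stands in for rows read from the downloads table; each row is
 * assumed to look like ['user_id' => int, 'last_packet_at' => unix timestamp].
 */
function countActiveDownloads(array $records, int $userId, int $now, int $timeout = STALE_AFTER_SECONDS): int
{
    $active = 0;
    foreach ($records as $record) {
        if ($record['user_id'] === $userId
            && !isDownloadStale($record['last_packet_at'], $now, $timeout)) {
            $active++;
        }
    }
    return $active;
}
```

A non-premium user would then be refused only when countActiveDownloads() returns 1 or more, and stale rows can be deleted at the same moment instead of waiting for a cron job.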

Related

PHP connection_aborted() and/or register_shutdown_function work intermittently

I've a PHP script that outputs a file to the user (as a download) which is also used to record what the user is downloading.
Basic structure is this:
set_time_limit(0);
ignore_user_abort(true);
register_shutdown_function('shutdown_fn'); //as a fail safe (i think)
//some other code here
//do some mysql queries
while(!feof($fh) && !connection_aborted()) {
echo fread(....);
ob_flush();
flush();
sleep(1);
}
fclose($fh);
//do some more mysql queries here and set a boolean to track if it was done successfully
function shutdown_fn () {
//check boolean to see if queries failed, if so, do them here
}
The above code seems to work just fine 99% of the time. However, there are some instances (the other 1%) when the second set of queries doesn't execute at all. I have no idea why. The files being sent to the user range from very small to very large, and both cases normally work, so I can't see how the file size would be breaking the code.
Any thoughts? I hope I have explained myself well enough.
I would need to see some more code, like the opening/reading of the file, to help you further. But if you really want to be sure and not depend on shutdown_fn() alone, why not also call it yourself at the end of the script? Reset the boolean in shutdown_fn() so that when the actual shutdown is triggered, your SQL queries are not run twice.
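One way to sketch the "call it yourself, but never run twice" idea is a small guard object. The class name FinalQueries and the counter used in place of real queries are hypothetical; the only PHP API relied on is register_shutdown_function().

```php
<?php
/** Wraps a callback so that it runs at most once, however often run() is called. */
final class FinalQueries
{
    private $done = false;
    private $callback;

    public function __construct(callable $callback)
    {
        $this->callback = $callback;
    }

    /** Returns true if the callback ran on this call, false if it had already run. */
    public function run(): bool
    {
        if ($this->done) {
            return false;
        }
        // Set the flag first so a shutdown during the callback cannot re-enter it
        $this->done = true;
        ($this->callback)();
        return true;
    }
}

$calls = 0; // stands in for the real MySQL queries
$final = new FinalQueries(function () use (&$calls) {
    $calls++;
});

// Fail-safe: runs at shutdown, but does nothing if run() was already called
register_shutdown_function([$final, 'run']);

// ... serve the file here ...

$final->run(); // normal path at the end of the script
```

Note that the flag is set before the callback runs, so a callback that fails halfway is not retried at shutdown; whether that is the right trade-off depends on the queries involved.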

Tracking changes in text file with PHP

I have a PHP script that has to reload a page on the client (server push) when something specific happens on the server. So I have to listen for changes. My idea is to have a text file that contains the number of page loads for the current page. So I would like to monitor the file and as soon as it is modified, to use server push in order to update the content on the client. The question is how to track the file for changes in PHP?
You could do something like:
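<?php
$lastMtime = null;
while(true){
    // stat() results are cached within a request, so clear the cache before each check
    clearstatcache();
    $file = stat('/file');
    // Compare against the last seen mtime rather than time(), which is easy to miss
    if($lastMtime !== null && $file['mtime'] != $lastMtime){
        //... Do Something Here ..//
    }
    $lastMtime = $file['mtime'];
    sleep(1);
}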
This will continuously look for a change in the modified time of a file every second. If you don't constrain it you could kill your disk IO and may need to adjust your ulimit.
This will check your file for a change:
<?php
$current_contents = "";
function checkForChange($filepath) {
global $current_contents;
$new_contents = file_get_contents($filepath);
if (strcmp($new_contents, $current_contents) !== 0) {
$current_contents = $new_contents;
return true;
}
return false;
}
But that will not solve your problem. The php file that serves the client finishes executing before the rendered html is sent to the client. That client will need to call back to some php file to check for a change... and since that is also a http request, the file will finish executing and forget anything in memory.
In order to properly solve this, you'll probably have to back off the idea of checking a file. Either the server needs to know when and how to contact currently connected clients, or those clients need to poll a lightweight service at a regular interval.
This is sort of hacky but what about creating a cron job that sucks in the page, stores it in a scope or table, and then simply compares it every 30 seconds?

Background process for importing data in PHP

Details
When a user first logs into my app I need to import all of their store's products from an API. This can be anywhere from 10 products to 11,000, so I'm thinking I need to inform the user that we'll import their products and email them when we're finished.
Questions
What would be the best way to go about importing this data without requiring the user to stay on the page?
Should I go down the pcntl_fork route?
Would system style background tasks be better?
AFAIK there is no way to pcntl_fork() from a web server process, you can only do it from the command line. You can, however, start a child process using exec() (or similar) that will continue to run after you have terminated.
I don't know how "correct" this is, but I would do something like this:
upload.php - Get the user to upload their products list in whatever format you want. I shall assume you know how to do this and won't include any code - if you want an example let me know.
store.php - the upload form submits to this file:
// Make sure a file was uploaded and you have the user's ID
if (!isset($_FILES['file'],$_POST['userId']))
exit('No file uploaded or bad user ID');
// Make sure the upload was successful
if ($_FILES['file']['error'])
exit('File uploaded with error code '.$_FILES['file']['error']);
// Generate a temp name and store the file for processing
$tmpname = microtime(TRUE).'.tmp';
$tmppath = '/tmp/'; // ...or wherever you want to temporarily store the file
if (!move_uploaded_file($_FILES['file']['tmp_name'],$tmppath.$tmpname))
exit('Could not store file for processing');
// Start an import process, then display a message to the user
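// The ' > /dev/null &' is required here - it lets you start the process asynchronously
// escapeshellarg() keeps user-supplied values from being interpreted by the shell
exec('php import.php '.escapeshellarg($_POST['userId']).' '.escapeshellarg($tmppath.$tmpname).' > /dev/null &');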
// On Windows you can do this to start an asynchronous process instead:
//$WshShell = new COM("WScript.Shell");
//$oExec = $WshShell->Run("php import.php \"{$_POST['userId']}\" \"$tmppath$tmpname\"", 0, false);
exit("I'm importing your data - I'll email you when I've done it");
import.php - handles the import and sends an email
// Make sure the required command line arguments were passed and make sense
if (!isset($argv[1],$argv[2]) || !file_exists($argv[2])) {
// handle improper calls here
}
// Connect to DB here and get user details based on the user ID (passed in $argv[1])
// Do the import (pseudocode-ish)
$wasSuccessful = parse_import_data($argv[2]);
if ($wasSuccessful) {
// send the user an email
} else {
// handle import errors here
}
// Delete the file
unlink($argv[2]);
The main issue with this approach is that if lots of people upload lists to be imported at the same time, you would risk stressing your system resources with multiple simultaneous versions of import.php running.
For this reason, it is possibly better to schedule a cron job to import the lists one at a time as suggested by Aaron Bruce - but which approach is best for you will depend on your precise requirements.
I think the "standard" way to do this in PHP would be to run a cron every five minutes or so that checks a queue of pending imports.
So your user logs in, part of the log in process is to add them to your "pending_import" table (or however you choose to store the import queue). Then the next time the cron fires it will take care of the current contents of your queue.

PHP Singleton class for all requests

I have a simple problem. I use PHP on the server side and produce HTML output. My site shows the status of another server. So the flow is:
Browser user goes on www.example.com/status
Browser contacts www.example.com/status
PHP server receives the request and asks for the status on www.statusserver.com/status
PHP Receives the data, transforms it in readable HTML output and send it back to the client
Browser user can see the status.
Now, I've created a singleton class in PHP which accesses the status server only once every 8 seconds, so the status is updated every 8 seconds. If a user requests an update in between, the server returns the status stored locally (on www.example.com).
That's nice, isn't it? But then I did a simple test and opened 5 browser windows to see if it works. Here is the catch: the PHP server created a separate singleton instance for each request. So now 5 clients each request the status from the status server every 8 seconds, which means I have 5 calls to the status server every 8 seconds instead of one!
Isn't there a way to provide only one instance to all users within an Apache server? That would solve the problem in case 1000 users are connecting to www.example.com/status....
thx for any hints
=============================
EDIT:
I already use a caching on harddrive:
public function getFile($filename)
{
$diff = (time()-filemtime($filename));
//echo "diff:$diff<br/>";
if($diff>8){
//echo 'greater than 8<br/>';
self::updateFile($filename);
}
if (is_readable($filename)) {
try {
$returnValue = @ImageCreateFromPNG($filename);
if($returnValue == ''){
sleep(1);
return self::getFile($filename);
}else{
return $returnValue;
}
} catch (Exception $e){
sleep(1);
return self::getFile($filename);
}
} else {
sleep(1);
return self::getFile($filename);
}
}
This is the call in the singleton. I request a file and save it on the hard drive, but all the requests call it at the same time and start requesting the status server.
I think the only solution would be a standalone application which updates the file every 8 seconds... All requests should just read the file and no longer be able to update it.
This standalone could be a perl script or something similar...
PHP requests are handled by different processes, and each of them has its own state; there isn't any resident process like in other web development frameworks. You should handle that behavior directly in your class, for instance by using some caching.
The method which queries the server status should have this logic:
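public function getStatus() {
    // $this->cache is assumed to be a cache object exposing load() and save()
    if (!$status = $this->cache->load()) {
        // cache miss
        $status = $this->queryStatusServer(); // placeholder name: do your query here
        $this->cache->save($status); // store the result in cache
    }
    return $status;
}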
In this way only one request of X will fetch the real status. The X value depends on your cache configuration.
Some cache libraries you can use:
APC
Memcached
Zend_Cache which is just a wrapper for actual caching engines
Or you can store the result in a plain text file and, on every request, check the mtime of the file itself and rewrite it if more than xx seconds have passed.
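The plain-file variant could look roughly like the sketch below. The function name cachedStatus() and its parameters are hypothetical, and the $fetch callback stands in for the real request to the status server; it is assumed to return a string.

```php
<?php
/**
 * Return the cached status if the cache file is younger than $maxAge seconds,
 * otherwise call $fetch, store its result, and return it.
 * Hypothetical helper; $fetch is assumed to return a string.
 */
function cachedStatus(string $cacheFile, int $maxAge, callable $fetch): string
{
    // filemtime() results are cached per request, so clear them first
    clearstatcache(true, $cacheFile);

    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) <= $maxAge) {
        $cached = file_get_contents($cacheFile);
        if ($cached !== false) {
            return $cached;
        }
    }

    $fresh = $fetch();

    // Write to a temporary file and rename it into place, so that concurrent
    // readers never see a half-written cache file
    $tmp = $cacheFile . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $fresh);
    rename($tmp, $cacheFile);

    return $fresh;
}
```

This does not stop several requests from refreshing at the same instant when the file has just expired; it only makes sure each of them reads or writes a complete file.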
Update
Your code is pretty strange. Why all those sleep calls? Why a try/catch block when ImageCreateFromPNG does not throw?
You're asking a different question. Since PHP is not an application server and cannot keep state across processes, your approach is correct. I suggest using APC (it uses shared memory, so it would be at least 10x faster than reading a file) to share the status across different processes. With this approach your code could become:
public function getFile($filename)
{
$latest_update = apc_fetch('latest_update');
if (false == $latest_update) {
// cache expired or first request
apc_store('latest_update', time(), 8); // 8 is the ttl in seconds
// fetch file here and save on local storage
self::updateFile($filename);
}
// here you can process the file
return $your_processed_file;
}
With this approach the code in the if block will be executed by two different processes only if one process is interrupted just after the if line, which should not normally happen because the check and the store are almost an atomic operation.
If you want a hard guarantee, you would need something like semaphores, but that would be an oversized solution for this kind of requirement.
Finally, IMHO 8 seconds is a small interval. I'd use something bigger, at least 30 seconds, but this depends on your requirements.
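For readers who do want that guarantee, a simpler stand-in for semaphores is an exclusive file lock taken with flock(). The sketch below swaps in that technique; the function name refreshOnce() and the two callbacks are hypothetical, and the lock only coordinates processes on the same machine.

```php
<?php
/**
 * Run $refresh only if this process wins a non-blocking exclusive lock and
 * $isStale() still reports true once the lock is held.
 * Returns true when the refresh ran in this process, false otherwise.
 */
function refreshOnce(string $lockFile, callable $isStale, callable $refresh): bool
{
    // Mode 'c' creates the lock file if needed without truncating it
    $handle = fopen($lockFile, 'c');
    if ($handle === false) {
        return false;
    }

    try {
        // LOCK_NB: do not wait. If another process holds the lock it is
        // already refreshing, so this request can serve the old data.
        if (!flock($handle, LOCK_EX | LOCK_NB)) {
            return false;
        }

        // Re-check under the lock: another process may have refreshed
        // between our first staleness check and acquiring the lock
        $ran = false;
        if ($isStale()) {
            $refresh();
            $ran = true;
        }

        flock($handle, LOCK_UN);
        return $ran;
    } finally {
        // Closing the handle also releases the lock if $refresh threw
        fclose($handle);
    }
}
```

Here $isStale would typically compare the cache file's mtime (or the APC entry) against the 8-second limit, and $refresh would wrap the call to updateFile().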
As far as I know it is not possible in PHP. However, you surely can serialize and cache the object instance.
Check out http://php.net/manual/en/language.oop5.serialization.php

PHP Async Execution

Scenario is as follows:
Call to a specified URL including the Id of a known SearchDefinition should create a new Search record in a db and return the new Search.Id.
Before returning the Id, I need to spawn a new process / start async execution of a PHP file which takes in the new Search.Id and does the searching.
The UI then polls a 3rd PHP script to get status of the search (2nd script keeps updating search record in the Db).
This gives me a problem around spawning the 2nd PHP script in an async manner.
I'm going to be running this on a 3rd party server so have little control over permissions. As such, I'd prefer to avoid a cron job/similar polling for new Search records (and I don't really like polling if I can avoid it). I'm not a great fan of having to use a web server for work which is not web-related but to avoid permissions issues it may be required.
This seems to leave me 2 options:
Calling the 1st script returns the Id and closes the connection but continues executing and actually does the search (ie stick script 2 at the end of script 1 but close response at the append point)
Launch a second PHP script in an asynchronous manner.
I'm not sure how either of the above could be accomplished. The first still feels nasty.
If it's necessary to use CURL or similar to fake a web call, I'll do it but I was hoping for some kind of convenient multi-threading approach where I simply spawn a new thread and point it at the appropriate function and permissions would be inherited from the caller (ie web server user).
I'd rather use option 1. This would also keep related functionality closer to each other.
Here is a hint how to send something to user and then close the connection and continue executing:
(by tom ********* at gmail dot com, source: http://www.php.net/manual/en/features.connection-handling.php#93441)
<?php
ob_end_clean();
header("Connection: close"); // no trailing "\r\n": header() rejects values containing newlines
header("Content-Encoding: none");
ignore_user_abort(true); // optional
ob_start();
echo ('Text user will see');
$size = ob_get_length();
header("Content-Length: $size");
ob_end_flush(); // Strange behaviour, will not work
flush(); // Unless both are called !
ob_end_clean();
//do processing here
sleep(5);
echo('Text user will never see');
//do some processing
?>
swoole: asynchronous & concurrent extension.
https://github.com/matyhtf/swoole
event-driven
full asynchronous non-blocking
multi-thread reactor
multi-process worker
millisecond timer
async MySQL
async task
async read/write file system
async dns lookup
