Download counter function inaccurate - php

We're using a normal PHP download script (with headers etc) to serve files to users.
The issue, however, is that with some browsers and large downloads the download script is requested multiple times. The NGINX logs show these requests with a 206 status code (partial content, which suggests range requests?), which is strange because we don't serve any streamable content.
Regardless, this means the download script is requested multiple times, and thus the MySQL function that +1's the download counter for the file runs multiple times per download.
We tried using sessions, but since the download is served from an external server + domain, we have no way to clear those sessions after they're set.
We're using Laravel with NGINX + MySQL; any help would be appreciated. Thanks!

Looking at the spec and at the headers involved in a request that ultimately results in a 206 response, one header stood out as exactly what's needed.
The header in question is the Range request header (which the server answers with a Content-Range response header), which could look like the following:
Content-Range: bytes 21010-47021/47022
This says the client wants bytes 21010-47021 out of a 47022-byte file. All you should need to worry about is the first number and whether it is 0 or not. If a Range header was sent and its first byte offset is 0 (or no Range header was sent at all), you can assume the download is just beginning and you should increment the counter; a non-zero offset is a continuation of a download you have already counted.
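As a rough illustration (not from the original answer), the download script could check that header before touching the counter. The helper and variable names here are hypothetical placeholders for your own Laravel/MySQL logic:
<?php
// Hypothetical sketch: count a download only when it starts at byte 0.
// $fileId and incrementDownloadCounter() are placeholders for your own code,
// e.g. DB::table('files')->where('id', $fileId)->increment('downloads');
$range = isset($_SERVER['HTTP_RANGE']) ? $_SERVER['HTTP_RANGE'] : null;

$isFirstChunk = true;
if ($range !== null && preg_match('/bytes=(\d+)-/', $range, $m)) {
    // A non-zero start offset means the browser is resuming/continuing,
    // not starting a new download.
    $isFirstChunk = ((int) $m[1] === 0);
}

if ($isFirstChunk) {
    incrementDownloadCounter($fileId);
}

// ... then send the usual download headers and stream the requested bytes ...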

Related

Apache and Content-Range

I am trying to implement support for Content-Range in PHP-generated files. When a browser sends a Range request, my script serves the correct bytes and it works well.
But while testing how Content-Range looks when downloading a PDF from an Apache server, I realized that the first request from the web browser to my server does not contain a Range header, but somehow the server still doesn't return the full file, only 32 kB.
On this screenshot you can see that Firefox sends 5 requests to Apache for my_pdf.pdf and Apache each time responds with 32-192 kB. The whole PDF is 28 MB. Requests 2-5 do contain a Range header, but the first (highlighted) request does not. You can see on the right that Content-Length is 28 MB but that Apache returned only 32 kB.
So my question is- how did Apache know to return only 32 kB and not the whole 28 MB PDF file?
It didn't. If you look at the Content-Length header in the response, it shows the full file size of 29.3 million bytes.
The client probably closed the connection without reading the entire response.
The answer posted by @duskwuff is correct: Firefox terminates the transfer of the first request once it has received enough to process the PDF.
Below are just a few details I discovered.
Firefox will terminate the transfer if your script returns these headers:
Accept-Ranges: bytes
Content-Length: 29293315
You can (but don't have to) also return this header:
header("Content-Range: bytes 0-29293314/29293315");
However, by default Apache tries to compress whatever PHP returns and then adds this header:
Transfer-Encoding: chunked
And when Firefox (and Chrome) see this, they won't close the connection. So I just disabled Apache compression and everything works. Now Firefox just makes a few requests, gets bits of the PDF instead of the whole file, and renders the first page just fine (because it didn't need the whole PDF to render just the first page).
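For reference, a minimal sketch of a PHP script returning those headers while keeping compression out of the way (the file path is a placeholder, and this sketch deliberately does not implement Range handling itself):
<?php
// Sketch: serve a PDF so the browser can close the connection early
// and issue Range requests for the rest. $file is a placeholder path.
$file = '/path/to/my_pdf.pdf';

// Keep compression from replacing Content-Length with Transfer-Encoding: chunked.
if (function_exists('apache_setenv')) {
    @apache_setenv('no-gzip', '1');
}
@ini_set('zlib.output_compression', 'Off');

header('Content-Type: application/pdf');
header('Accept-Ranges: bytes');
header('Content-Length: ' . filesize($file));

readfile($file);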

Get around execution time limit in PHP

I've got a function that takes a very long time to execute (downloading some external images in my case) and I want to avoid the execution-time-exceeded error.
Is there any way to avoid this (for example by splitting the download of the individual images into separate PHP 'threads' or something like that)?
I cannot change execution time limit or any of ini settings.
I'm not able to use cron jobs, as this will be used in a WordPress theme and I can't control the end user's platform.
One possibility is to make a PHP script that downloads one external image, and have that script called via Ajax. Then you can build a user interface with JavaScript which calls this PHP script for each image, one by one. It could show a progress bar based on how many images have been downloaded already (see the sketch below).
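A rough sketch of what that per-image endpoint might look like (the image list, the downloads/ directory, and the parameter name are assumptions for illustration):
<?php
// download_one.php - hypothetical endpoint called once per image via Ajax.
$images = array(
    'http://example.com/photos/1.jpg',
    'http://example.com/photos/2.jpg',
    // ...
);

$index = isset($_GET['index']) ? (int) $_GET['index'] : 0;
if (!isset($images[$index])) {
    header('HTTP/1.1 404 Not Found');
    exit;
}

// Each request downloads a single image, so it stays well under the time limit.
// Assumes a writable downloads/ directory next to this script.
$data = file_get_contents($images[$index]);
file_put_contents(__DIR__ . '/downloads/' . basename($images[$index]), $data);

// Tell the JavaScript caller where it is, so it can update a progress bar.
header('Content-Type: application/json');
echo json_encode(array('done' => $index + 1, 'total' => count($images)));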
Yes, you can. But you will have to host or proxy the images you want to download in chunks if the remote server does not understand partial (range) downloads.
Then you will have to make your PHP script request the image from the server chunk by chunk:
Request
GET /proxy/?url=http://example2.com/myimage.jpg HTTP/1.1
Host: www.example.com
Range: bytes=200-1000
Response
HTTP/1.1 206 Partial Content
Date: Tue, 17 Feb 2015 10:50:59 GMT
Accept-Ranges: bytes
Content-Range: bytes 200-1000/6401
Content-Type: image/jpeg
Content-Length: 800
You then have many choices for calling your PHP script enough times to get all the chunks: automatically refreshing the page, Ajax requests, ...
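As an illustration (assuming the PHP cURL extension is available; the URL and byte offsets are placeholders), one chunk could be fetched like this:
<?php
// Fetch one chunk of the image through the proxy shown above.
$url   = 'http://www.example.com/proxy/?url=http://example2.com/myimage.jpg';
$start = 200;
$end   = 1000;

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_RANGE, $start . '-' . $end); // sends "Range: bytes=200-1000"
$chunk  = curl_exec($ch);
$status = curl_getinfo($ch, CURLINFO_HTTP_CODE);      // expect 206 Partial Content
curl_close($ch);

// Append the chunk and remember $end + 1 (e.g. in the session or database)
// as the next start offset, until the total size from Content-Range is reached.
file_put_contents('myimage.jpg', $chunk, FILE_APPEND);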

I get many 206 partial content

I have a client application that sends data to a PHP file (hosted on Apache). Usually this works without any problem. On one client site I get a 206 Partial Content response every time the client app sends data.
The data size is 10-30 kB, so it is not huge.
If you have any suggestions - like changing Apache settings or something similar - I would appreciate it.
Thanks.
It's not an issue. Any 2xx code means "success". You can view details at: Why does Firebug show a "206 Partial Content" response on a video loading request?

Browser shows time out while Server process is still running

I am having the following problem:
I am running a BIG memory process, but I have divided the memory load into smaller chunks, so there is no CPU time-out issue.
On the server I am creating .xml files of around 100 kB each, and around 100+ of them will be created.
Now the main problem is that the browser shows a response timeout, and IE (just above the status bar at the bottom) shows a message offering to download the .php file.
During this, the backend (server-side) process is still running and continuously creating .xml files in incremental order, so there is no issue with that.
I have following php.ini configuration.
max_execution_time = 10000 ; Maximum execution time of each script, in seconds
max_input_time = 10000 ; Maximum amount of time each script may spend parsing request data
memory_limit = 2000M ; Maximum amount of memory a script may consume
; Maximum allowed size for uploaded files.
upload_max_filesize = 2000M
I am viewing my site in IE, and I am using Zend Server CE (ZSCE) with PHP 5.3.
Can anybody point me in the right direction on this issue?
Edit:
Attaching an image of the timeout and of the resulting prompt to download the .php file.
Edit 2:
Let me briefly explain my execution flow:
I have one PHP file with objects of class hierarchies, which starts by executing Function1() from each class hierarchy.
I have a class file.
First, let's say, Function1() is executed, which contains the logic for creating the XML files in chunks.
Second, let's say, Function2() is executed, which displays the output generated by Function1().
All of this is done in a class-hierarchy manner, so I can't terminate the execution of Function1() partway through until it has finished, and only after that is Function2() called.
Edit 3:
This is especially for @hakre.
You asked some cross-questions, and I agree with some of your points, but let me describe the issue in more detail.
At first I was loading 100+ MB of XML files at a time, which is why the memory in my local setup was filling up, everything on the machine was stalling, and the CPU was using most of its resources.
I then split these big XML files into smaller ones (meaning I now load a single XML file at a time and unload it after use). This saved me from the memory overload and CPU issue on the local setup.
Now my backend process runs with no CPU or memory issue, but the problem is the browser timeout. I even tried cURL, but with my current structure it doesn't seem to fit because of my class hierarchy: I have a set of classes in a hierarchy, and they all execute their Process functions first and then their Output functions. So until the Process functions have finished, the Output functions don't come into the picture, and that's why the browser shows a timeout.
I also followed the instructions suggested by @vortex and had a little success, but not what I am looking for. The reason I could not implement cURL is that my Process function creates all the required XML files in one go, so it takes too much time before anything is output to the browser. Since the Process function takes that long, no output can be sent to the client until it has completed.
cURL Output:
URL....: myurl
Code...: 200 (0 redirect(s) in 0 secs)
Content: text/html Size: -1 (Own: 433) Filetime: -1
Time...: 60.437 Start # 60.437 (DNS: 0 Connect: 0.016 Request: 0.016)
Speed..: Down: 7 (avg.) Up: 0 (avg.)
Curl...: v7.20.0
Contents of test.txt file
* About to connect() to mylocalhost port 80 (#0)
* Trying 127.0.0.1... * connected
* Connected to mylocalhost (127.0.0.1) port 80 (#0)
> GET myurl HTTP/1.1
Host: mylocalhost
Accept: */*
< HTTP/1.1 200 OK
< Date: Tue, 06 Aug 2013 10:01:36 GMT
< Server: Apache/2.2.21 (Win32) mod_ssl/2.2.21 OpenSSL/0.9.8o
< X-Powered-By: PHP/5.3.9-ZS5.6.0 ZendServer
< Set-Cookie: ZDEDebuggerPresent=php,phtml,php3; path=/
< Cache-Control: private
< Transfer-Encoding: chunked
< Content-Type: text/html
<
* Connection #0 to host mylocalhost left intact
* Closing connection #0
Disclaimer: an answer to this question was chosen based on the first small success obtained from the selected answer. The solution from @Hakre is also feasible when this type of question occurs. Right now no answer has fixed my problem completely, only partially. Hakre's answer also goes into more detail, for anyone looking for more background on this type of issue.
Assuming you have made all the server-side modifications to dodge a server timeout [pretty much everything explained above], in order to dodge the browser timeout it is crucial that you do something like this:
<?php
set_time_limit(0);        // no PHP execution time limit
error_reporting(E_ALL);
ob_implicit_flush(TRUE);  // flush automatically on every output call
ob_end_flush();           // flush and disable the default output buffer
I can tell you from experience that Internet Explorer doesn't have any issues as long as you output some content to it every now and then. I run a 30 GB database update every day [that takes around 2-4 hours] and Opera seems to be the only browser that ignores the content output.
If you don't set ob_implicit_flush, you need to do an ob_flush() after every piece of content.
References
ob_implicit_flush
ob_flush
If you don't use ob_implicit_flush at the top of your script as I wrote earlier, you need to do something like:
<?php
echo 'dummy text or execution stats';
ob_flush();
within your execution loop
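Putting the two pieces together, a loop along these lines (the job list and createXmlFile() are placeholders, not the asker's real code) keeps the browser fed while the XML files are generated:
<?php
set_time_limit(0);
ob_implicit_flush(true);
ob_end_flush();

foreach ($xmlJobs as $i => $job) {   // $xmlJobs is a placeholder work queue
    createXmlFile($job);             // placeholder for the real Function1() work
    echo 'Created file ' . ($i + 1) . "<br>\n";
    flush();                         // push the progress line to the browser right away
}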
1. I am running BIG memory process but have divided memory load into smaller chunks so no CPU time out issue.
Now that's a wild guess. How did you find out it was a CPU time-out issue in the first place? Did you even? If yes, what does your test give now? If not, how do you test now that this is not a time-out issue?
Although you state there won't be a certain issue, you don't prove it, and many questions are still open. That invites guessing, which is counter-productive for troubleshooting (which is what you are doing here).
What you write here just means that you wrote code to chunk the memory use; however, this is not a test for CPU time-out issues. Writing the code is one part, testing is the other. Don't mix the two, and don't draw wild conclusions. An issue only exists once a test shows it; otherwise it didn't happen.
So much for your first point, just to show you that when troubleshooting you should look for facts (monitor, test, profile, step-debug), not run on assumptions. This is crucial, otherwise you look in the wrong places and ask the wrong questions.
From what you describe of how the client (browser) behaves, this is not a time-out issue per se. The problem you've got is that the gap between the header response and the body response is taking too long for your browser's taste. One browser assumes a time-out (a boundary value has been triggered, which looks more correct to me) and the other browser assumes something is still coming, so why not wait for it.
So you merely have a processing-time issue here. Please consult the manual of your internet browser (HTTP client) to see which configuration values you can change to alter this behavior. E.g. monitor with a curl request on the command line how long the request actually takes, then configure your browser not to time out when connecting to that server within the amount of time you just measured. For example, if you're using Internet Explorer: http://www.ehow.com/how_6186601_change-internet-timeout-options.html or if you're using Mozilla Firefox: http://forums.mozillazine.org/viewtopic.php?f=7&t=102322&start=0
As you didn't show any code on the server side, I assume you want to solve this problem with client settings. Curl will help you measure the number of seconds such a request takes. Use the -v (verbose) switch to obtain detailed information about the request.
In case you don't want to solve this on the client, curl will still help you measure important data and easily reproduce any underlying server-related timing issue. So you should go for curl on the command line in any case, especially as looking into the response headers might reveal what triggers the (again) esoteric Internet Explorer behavior. Again, the -v switch reveals the request and response headers.
If you'd like to automate such tests with a PHP script, that's also possible with the PHP cURL extension. This has been outlined in:
Php - Debugging Curl
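As a small illustration of that (the URL is a placeholder), the PHP cURL extension can report the total request time:
<?php
// Measure how long the heavy page takes to respond.
$ch = curl_init('http://mylocalhost/myurl');      // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);           // include response headers in the result
curl_exec($ch);
$seconds = curl_getinfo($ch, CURLINFO_TOTAL_TIME);
curl_close($ch);

echo 'Request took ' . $seconds . " seconds\n";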
The problem is with your web-server, not the browser.
If you're using Apache, you need to adjust the Timeout value in httpd.conf or in your virtual hosts config.
You have 3 pages (a sketch of page 2 follows below):
Process - creates the XML files and then updates a database value saying that the process is done
A PHP page that returns {true} or {false} based on the status of the process-completion database value
An Ajax front end, polling page 2 every few seconds to check whether the process is done or not
Long Polling
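A sketch of page 2, the status endpoint the Ajax front end polls (the table, column, and credentials are assumptions):
<?php
// status.php - hypothetical page 2: reports whether the XML generation is done.
$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->query('SELECT is_done FROM process_status WHERE id = 1');
$done = (bool) $stmt->fetchColumn();

header('Content-Type: application/json');
echo $done ? 'true' : 'false';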
I have had this issue several times, while reading a large CSV file and putting it into a database. I solved it by dividing the reading and inserting into smaller parts: I created a table to log how much data had been read and inserted, and the page then reloads itself and continues from that position. You can do the same by creating one XML file per attempt, then reloading the page and starting on the next one. This way the memory used by the browser is refreshed.
Hope it will help.
Is it possible to send some output to the browser from the script while it's still processing, even white space? If so, do it; it should reset the timeout counter.
If that's not possible, you have to increase IE's timeout in the registry:
HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings
You need ReceiveTimeout; if it's not there, create it as a DWORD and set the value in milliseconds.
What is a "CPU time out issue"?
The right way to solve the problem is to run the heavy stuff asynchronously, in a separate session group (not in the webserver process tree).
Try to include set_time_limit(0); in your PHP script page.
The following links might help you.
http://php.net/manual/en/function.set-time-limit.php
http://php.net/manual/en/function.ignore-user-abort.php
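A minimal sketch combining the two functions from those links (not taken from the answer, just for illustration):
<?php
ignore_user_abort(true);  // keep running even if the browser gives up on the connection
set_time_limit(0);        // remove the PHP execution time limit for this request

// ... long-running XML generation goes here ...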

Caching HTTP responses when they are dynamically created by PHP

I think my question seems pretty casual, but bear with me as it gets interesting (at least for me :)).
Consider a PHP page whose purpose is to read a requested file from the filesystem and echo it as the response. Now the question is how to enable caching for this page. The thing to point out is that the files can be pretty huge, and enabling the cache is meant to save the client from downloading the same content again and again.
The ideal strategy would be to use the "If-None-Match" request header and the "ETag" response header in order to implement a reverse-proxy cache system. Even though I know this much, I'm not sure whether this is possible or what I should return as the response in order to implement this technique.
Serving huge or many auxiliary files with PHP is not exactly what it's made for.
Instead, look at X-accel for nginx, X-Sendfile for Lighttpd or mod_xsendfile for Apache.
The initial request gets handled by PHP, but once the download file has been determined it sets a few headers to indicate that the server should handle the file sending, after which the PHP process is freed up to serve something else.
You can then use the web server to configure the caching for you.
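A rough sketch of what that hand-off looks like in PHP (the path, filename, and which header applies depend on your server setup; these are assumptions):
<?php
// PHP authorizes the download, the web server streams the file.
$file = '/var/files/archive.zip';   // placeholder path

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="archive.zip"');

// Apache with mod_xsendfile:
header('X-Sendfile: ' . $file);

// Nginx alternative (the /protected/ location must be marked "internal" in nginx.conf):
// header('X-Accel-Redirect: /protected/archive.zip');

// No readfile() call: the server sends the body, and the PHP process is freed up.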
Statically generated content
If your content is generated from PHP and particularly expensive to create, you could write the output to a local file and apply the above method again.
If you can't write to a local file or don't want to, you can use HTTP response headers to control caching:
Expires: <absolute date in the future>
Cache-Control: public, max-age=<relative time in seconds since request>
This will cause clients to cache the page contents until they expire or until the user forces a page reload (e.g. by pressing F5).
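In PHP that could look like the following sketch (the one-day lifetime is just an example value, and $pageContents is a placeholder):
<?php
$maxAge = 86400; // cache lifetime in seconds (one day)

header('Expires: ' . gmdate('D, d M Y H:i:s', time() + $maxAge) . ' GMT');
header('Cache-Control: public, max-age=' . $maxAge);

echo $pageContents; // placeholder for the generated output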
Dynamically generated content
For dynamic content you want the browser to ping you every time, but only send the page contents if there's something new. You can accomplish this by setting a few other response headers:
ETag: <hash of the contents>
Last-Modified: <absolute date of last contents change>
When the browser pings your script again, it will add the following request headers respectively:
If-None-Match: <hash of the contents that you sent last time>
If-Modified-Since: <absolute date of last contents change>
The ETag is mostly used to reduce network traffic, since in some cases you first have to generate the contents in order to know their hash.
Last-Modified is the easiest to apply if you have local file caches (files have a modification date). A simple condition makes it work:
$since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    ? strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    : 0; // no conditional header: always send the full response

if (!file_exists('cache.txt') || filemtime('cache.txt') > $since) {
    // update cache file and send back contents as usual (+ cache headers)
} else {
    header('HTTP/1.0 304 Not Modified');
    exit;
}
If you can't do file caches, you can still use ETag to determine whether the contents have changed in the meantime.
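A sketch of that ETag check (here $contents stands in for whatever the script generates; note the contents still have to be computed in order to hash them):
<?php
$etag = '"' . md5($contents) . '"';

header('ETag: ' . $etag);

if (isset($_SERVER['HTTP_IF_NONE_MATCH']) &&
    trim($_SERVER['HTTP_IF_NONE_MATCH']) === $etag) {
    header('HTTP/1.0 304 Not Modified');
    exit;
}

echo $contents;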
