Apache2 with PDF and PHP - "This file does not start with "%PDF-" - php

I have been trying to find the reason for this error for weeks now - and I have come up a blank. The system uses PHP to generate dynamic .pdf files.
I have three servers: Dev (Win7 with Apache2), Test (Ubuntu 10.4 with nginx), and Live (Ubuntu 10.10 with nginx). All are running PHP 5 and the system I have developed: the same code and an equivalent configuration.
I have tested with many browsers: DevIE (Win7, IE8), DevFF (Win7, Firefox 3.5), DevSaf (Win7, Safari), LaptopFF (WinXP, Firefox 3.5), LaptopIE (WinXP, IE8), Test (Ubuntu, FF3.5), and users (mostly IE8 on Win7 and WinXP).
When I generate a PDF from Test it works correctly in all browsers (except Users which I can't test).
When I generate a PDF from Dev it fails from DevIE, DevFF and DevSaf, but calling for it from Test works.
Apache2 always fails from the same machine.
From the laptop, using FF succeeds, and using IE8 fails (see below).
The users are reporting intermittent problems: a request fails, they repeat it, and it succeeds.
When it fails....
The log shows the generated PDF being sent, with the right sort of size (500 KB to 1.8 MB) and a 200 OK result. This is sometimes followed about 10 seconds later by a repeat of the same URL, but this one generates the log-on screen (again a 200 OK reply), only 2 KB in size. The implication is that it was requested without the cookie.
Adobe Reader tries to display the log-on page, with the inevitable "This file does not start with "%PDF-" error message.
Except when I try with the laptop and IE8: then it fails, and View Source shows a 4-line HTML file with an empty body!
The system has been working for over a year - and only started failing with a change of production server about 2 months ago. The test version was not changed at this time, but started to fail also.
I have tried all sorts of headers, but nothing I have tried makes any difference. The current set of headers is:
header('Content-Disposition: inline; filename="'.$this->pdfFilename().'"');
header('Content-type: application/pdf');
header("Pragma: public");
$when = date('r',time()+20); // expire in 20 seconds
header("Expires: $when");
I've tried replacing inline with attachment. Adding and removing all sorts of no-cache headers. All to no avail.
The PDF is requested in a new window, by JavaScript - and is followed 8 seconds later by a refresh. I have tested without the new window, and without the refresh - no change.
I have had a few (small) PDFs served successfully by the Dev server, so I have raised every limit I can think of. Now it always fails.
So I have a Windows Apache2.2 server that fails when browsed from the same machine and succeeds when browsed from other machines in Firefox.
There is no proxy or cache mechanism involved other than that in the browsers.
Has anyone any ideas about what might be going wrong? As I said, I have been testing and eliminating things for nearly 4 weeks now, on and off, and I have not yet even identified the failing component.

This is really tough to troubleshoot. For starters (please excuse my bluntness), this is a prime example of what a pipeline should not look like:
Three different operating systems.
Probably at least two different versions of PHP.
Two different webservers.
But anyway, a few general hints on debugging PHP:
make sure to enable error_log and log_errors in php.ini (set display_errors = Off)
use the most verbose error_reporting
set access_log and error_log in nginx.
crank up the log level in nginx (I'm guessing you use php-cgi or php-fpm, so you should be able to see what status the backend emits when the download attempt fails).
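For the php.ini side of those suggestions, the relevant settings could look like this (the log path is an example, not from the original post):

```ini
; php.ini -- log everything, show nothing to visitors
display_errors  = Off
log_errors      = On
error_log       = /var/log/php_errors.log
error_reporting = E_ALL
```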
Furthermore:
You haven't shared how the PDF is generated - are you sure all libraries used here are the same or at least somewhat the same across all systems?
In any case, just to be sure I would save the PDF on the server before it is offered to download. This allows you to troubleshoot the actual file — to see if the PDF generation actually worked.
Since you're saving the PDF, I'd see about putting it in a public folder, so you can see if you can just redirect to it after it's generated. And only if this works, then I'd work on a force-download kind of thing.
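The save-then-redirect idea above could be sketched like this; the directory layout, URL prefix, and function name are placeholders, not from the original post:

```php
<?php
// Sketch: write the generated PDF to disk first, then redirect the
// browser to it as a plain static file. If the download then works,
// the problem is in the force-download code, not the PDF generation.
function publishPdf(string $pdfData, string $dir, string $urlPrefix): string
{
    // Unique name so intermediate caches never serve a stale copy.
    $filename = 'report_' . time() . '.pdf';
    $diskPath = rtrim($dir, '/') . '/' . $filename;

    if (file_put_contents($diskPath, $pdfData) === false) {
        throw new RuntimeException("Could not write $diskPath");
    }

    // The file now sits on disk for inspection; hand the browser a
    // static URL instead of streaming the bytes through PHP.
    header('Location: ' . $urlPrefix . '/' . $filename);
    return $diskPath;
}
```

The returned path lets you open the saved file in a PDF reader and confirm it really starts with "%PDF-".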
I would replicate the production environment in all stages. ;-) You need your dev server to be exactly like the production environment. For your own workstation, I'd recommend a VM (e.g. through Virtualbox with Ubuntu 10.10).
Let me know if this gets you somewhere and reply with updates. :-)
Update:
I'd investigate these two headers:
header("Cache-Control: no-cache, must-revalidate"); // HTTP/1.1
header("Expires: Sat, 26 Jul 1997 05:00:00 GMT"); // Date in the past
Definitely helps with cache busting.

These are the headers, which finally worked in a similar situation in one of my apps:
header("Pragma: public");
header("Expires: 0");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Cache-Control: private",false);
header( "Content-Type: application/pdf" );
header("Content-Disposition: inline; filename=\"YourPDF_" . time() . ".pdf\";");
header("Content-Transfer-Encoding: binary");
header("Content-Length: ". strlen( $pdfData ) );
I added the time() code to make the filename change each time, so that it likely passes all proxies.
From time to time, though seldom, the problem reappears. Then we ask our clients to download the file using the browser context menu.
PS: The app uses ezPDF found here: http://www.ros.co.nz/pdf/

Do I need to define a mime type in this header?

I am working with scanning images.
Today I made a step forward, but I think I may still need something more in the header.
Currently a scan request is POSTed to an app running on an Apache server with PHP 7.0. I receive that POST, and today I figured out I needed to reply with a custom 201 header.
I have the following custom headers set in PHP
header('HTTP/1.1 201 Created');
header('Location: /eSCL/Scans');
header('Cache-Control: no-cache, no-store, must-revalidate');
After doing this, I was able to see that a request is now made for an image, from the app to the Apache server, for /eSCL/Scans/NextDocument. "NextDocument" is actually a softlink to a .jpg (but may later also be a PDF). The app GETs this file, and I can see that the server replies with the JPG file. That image should display in the scan app that made the request, but it never does.
I think, however, that I need something in the header to tell it that it is a JPG, especially given that it loads from a softlink with no extension. I tried the following, with the same result:
header('HTTP/1.1 201 Created');
header('Content-type:image/jpeg');
header('Location: /eSCL/Scans');
header('Cache-Control: no-cache, no-store, must-revalidate');
There must be some magic soup that causes the image to display in the app. I have tried both VueScan and the Mopria Android app, with the same results.
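Since the softlink has no extension, the Content-Type could be detected from the file's bytes rather than its name; a minimal sketch using PHP's fileinfo extension (the path in the usage comment is a placeholder):

```php
<?php
// Detect the real MIME type of an extensionless file (such as the
// /eSCL/Scans/NextDocument softlink) from its magic bytes.
function detectMime(string $path): string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    $mime  = $finfo->file($path);
    // Fall back to a safe default if detection fails.
    return $mime !== false ? $mime : 'application/octet-stream';
}

// Usage sketch: send the detected type before the image bytes.
// header('Content-Type: ' . detectMime('/var/scans/NextDocument'));
// readfile('/var/scans/NextDocument');
```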
Thanks for any ideas in advance

PHP: filesize() stat failed with link

I'm randomly getting download errors from a link on a page. I also simplified the link to a directory for easy usage in emails for users.
On the main page the link looks like this:
<a href="http://myPage.com/Mac" target="_blank" id="macDownloadButton" class="downloadbutton w-button">Download Mac version</a>
On my server, that's a directory with an index.php in it which looks like this:
<?php
// mac version
$file="http://www.myPage.com/downloads/myApp_Mac.zip";
$filename="myApp_Mac.zip";
header('Content-Transfer-Encoding: binary');
header('Accept-Ranges: bytes');
header('Content-Length: ' . filesize($file));
header('Content-Encoding: none');
header('Content-Type: application/zip');
header('Content-Disposition: attachment; filename=' . $filename);
readfile($file);
exit;
?>
Again, the reason I do this is so it's a simple link to send to users in email like, "http://myPage.com/Mac" and "http://myPage.com/Windows".
The weird thing is that it mostly works...but sometimes it doesn't.
What am I doing wrong?
It's hard to know precisely what's wrong unless you check for errors on your readfile() call.
But you're invoking your web server from your web server here when you specify a filename starting with http. You're doing
readfile('http://www.myPage.com/downloads/myApp_Mac.zip');
where you could just as easily do
readfile('../downloads/myApp_Mac.zip');
and read the zip file from the local file system to send to your user.
What's more, filesize('../downloads/myApp_Mac.zip'); will yield a numerical value quickly and send it in the Content-Length header. That will allow the browser, by knowing the total size of the file you're sending, to display a meaningful progress bar.
You should remove the Accept-Ranges header; the php program you showed us doesn't honor range requests. If you lie to the browser by telling it you do honor those requests, the browser may get confused and corrupt the downloaded copy of your file. That will baffle your user.
Your Content-Disposition header is perfect. It defines the filename to be used on your user's machine in the downloads folder.
Simple operations are more reliable, and this may help you.
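Putting those suggestions together, the corrected index.php could be sketched like this (wrapped in a function here for clarity; the relative path assumes the layout from the question):

```php
<?php
// Sketch: serve the zip from the local file system instead of
// fetching it from our own web server over HTTP.
function sendZip(string $file, string $filename): bool
{
    if (!is_file($file)) {
        http_response_code(404);
        return false;
    }
    header('Content-Type: application/zip');
    // A local filesize() is fast and gives the browser the total
    // size, so it can show a meaningful progress bar.
    header('Content-Length: ' . filesize($file));
    header('Content-Disposition: attachment; filename=' . $filename);
    // Note: no Accept-Ranges header, since this script does not
    // honor range requests.
    readfile($file);
    return true;
}

// mac version -- the relative path assumes downloads/ is a sibling
// of the Mac/ directory, as in the question
// sendZip(__DIR__ . '/../downloads/myApp_Mac.zip', 'myApp_Mac.zip');
```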
The reason you got stat failed with link as an error message is this: stat(2) is an operating-system call that operates on files in local and mounted file systems.
As previously mentioned by O. Jones you should definitely always use your local file path.
Most of my previous issues have been browser related, where I needed to tweak or add an HTTP header; in one case I needed to send all the HTTP headers in lowercase, but I haven't had an issue like that in years. My personal recommendation would be to use a solid download library/function: it will make a noticeable difference to your productivity, as well as rule out most browser-related issues you may come across.
I have used the CodeIgniter download helper for the last 3 years and recommend it for 99% of use cases. At the very least I would recommend you read through its code; you will probably find a few cases you have never even considered, such as clearing the output buffer, MIME detection, and even a special case for Android 2.1, as well as a few headers you may or may not need.
If all else fails, I have no idea what server you're running this on, but if you continue to have issues I would recommend monitoring which processes your machine is running, paying close attention to RAM and IO usage. I have encountered bad or misbehaving services that run periodically, using 99% of my IO or RAM for short intervals at a time, which caused a few really odd and unexpected errors.

Apache file caching after editing by ftp

I am experiencing some strange behaviour. On shared hosting, I am connected by FTP, and when I edit and save a file it takes at least a few minutes for the change to take effect. For example, I put the line echo "test";die; in my index.php file and save it: the program (I am using FileZilla) shows that the file has been uploaded to the server. Just to be sure, I do cat index.php (I'm connected by PuTTY) and I can see that the change has in fact been made. But, guess what: when I open the page in the browser it works just as before (without showing my "test"). If I wait a few minutes and refresh the page, it shows the "test". I have deleted the browser cache (though I don't think it matters in this case; I also tried refreshing the page with CTRL+F5), but still the changes only take effect after a few minutes. The same thing happens when I delete that line: even after double-checking that the file is saved, for a few minutes I still see that echo, when there is nothing in the file any more.
So, is there such a thing as an Apache cache, so that even if I change the files on the physical drive it still uses the file from the cache, and only updates the cache after a few minutes?
Thanks
I believe that if Varnish is set up correctly you can turn it off via PHP like so.
header('Pragma: no-cache');
header('Cache-Control: private, no-cache, no-store, max-age=0, must-revalidate, proxy-revalidate');
header('Expires: Tue, 04 Sep 2012 05:32:29 GMT');
Files are being saved, but the PHP script did not change?
Try this in .htaccess; maybe with newer PHP versions the opcache is turned on by default:
php_flag opcache.enable Off
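If the opcache is the cause, an alternative to disabling it outright is to make it re-check file timestamps on every request; a php.ini sketch (values are illustrative, not from the original answer):

```ini
; keep the opcache, but notice edited files immediately
opcache.enable              = 1
opcache.validate_timestamps = 1
opcache.revalidate_freq     = 0
```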

Slower downloads through Apache than with PHP readfile

I've set up a download script with PHP on my server, which checks some details before letting the user download files via Apache (X-Sendfile). The files are outside the document root.
The code for downloading with Apache and the Module X-Sendfile is:
header("X-Sendfile: $fullpath");
header("Content-Type: application/octet-stream");
header("Content-Disposition: attachment; filename=\"$link_file\"");
When using Apache with X-Sendfile, I get a download speed of 500 kB/s with my client. I also tested it with Apache but without X-Sendfile, using the same file within the document root: same result!
So I tested downloading the same file, with the same client, the same infrastructure on both sides and the same internet-connection a few seconds later via PHP with readfile:
header("Pragma: no-cache");
header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
header("Content-Type: application/octet-stream");
header("Content-Length: ".(string)(filesize($fullpath)));
header("Content-Disposition: attachment; filename=\"$link_file\"");
readfile($fullpath);
This time the download speed was 9,500 kB/s!
I repeated this test with both options many times, and the result was the same on every try. The only difference besides the download speed was a waiting time of a few seconds (depending on the size of the file being downloaded) when using the PHP readfile method. When repeating the PHP readfile method immediately afterwards, the waiting time didn't appear again, most likely because the file was cached in memory after the first time.
I'm using a professional HP RAID system on the server, which has an average local speed of 800 MB/s, so the reason for this can't be the disk speed. I also didn't find any compression or bandwidth settings in Apache's httpd.conf.
Can anyone of you explain why there is such a great difference of the download-speed and how this can be changed?
Thank you in advance.
Server: Windows Server 2008 R2 Apache/2.2.21 (Win32) PHP/5.4.20
Client: Windows 7 Ultimate x64 Google Chrome 30.0.1599.101 LAN >100 Mbit/s
SOLUTION:
In httpd.conf, enable the line "EnableSendfile off".
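For reference, that fix is a one-line change in Apache's main configuration; a sketch (the EnableMMAP line is a related directive from the Apache documentation, not part of the original solution):

```apacheconf
# httpd.conf
# Stop Apache from using the kernel sendfile() path, which can
# perform badly on some platforms, including Windows builds.
EnableSendfile off

# A related directive that can cause similar issues on network
# filesystems; try it only if problems persist.
# EnableMMAP off
```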

Caching a GET request for audio with PHP

I have a PHP script that responds to a GET request for audio resources. An HTML5 Audio tag requests an audio file such as:
<audio src="get_audio.php?f=fun" preload></audio>
There is no need for the user to download that same audio file every time so I would like to cache it. In my PHP file I have:
header("Cache-Control: max-age=2419200");
header("Content-Type: audio/mpeg");
...
echo file_get_contents($path);
but when I view the Network tab of the Chrome developer tools I see that it re-downloads the audio clip every time, rather than saying "from cache", and if I look at the response headers I do see the Cache-Control header that I set. Why would it ignore this? Am I doing it right?
It's been a while since I did this in PHP, but try adding:
header("Pragma: public");
above the cache-control header.
I also think you need the expires header:
header('Expires: ' . gmdate('D, d M Y H:i:s', time()+2419200) . ' GMT');
Other than that you could start using the get_headers() function in PHP to debug where it's going wrong.
Do you see a header in the second request called If-Modified-Since:?
This is what you need to catch, parse and respond to - if you don't want to send the file again, you respond with HTTP/1.1 304 Not Modified. If you are using Apache, you can check for the header in the result of apache_request_headers(). If you are not using Apache, you may find it hard to handle this - you will probably have to find a way for the web server to set the headers as environment variables, so they are available in $_ENV or $_SERVER. There is a way to do this in Apache using mod_rewrite (see latest comment on page linked above), so there is probably a way to do it in other server environments as well.
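The If-Modified-Since handling described above could be sketched like this (assuming the Apache SAPI, where apache_request_headers() is available; the function name and audio path are placeholders):

```php
<?php
// Respond 304 Not Modified when the client's cached copy is still
// current; otherwise send the audio file with caching headers.
function serveCached(string $path): void
{
    $mtime   = filemtime($path);
    $headers = function_exists('apache_request_headers')
        ? apache_request_headers() : [];

    if (isset($headers['If-Modified-Since'])
        && strtotime($headers['If-Modified-Since']) >= $mtime) {
        // The client's copy is up to date: no body needed.
        header('HTTP/1.1 304 Not Modified');
        return;
    }

    header('Cache-Control: max-age=2419200');
    // Last-Modified gives the browser a date to echo back in its
    // next If-Modified-Since request.
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');
    header('Content-Type: audio/mpeg');
    readfile($path);
}

// Usage sketch:
// serveCached('/var/audio/fun.mp3');
```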
Caching in HTTP 1.1 permits (and indeed, encourages) this behaviour - it means that the cached copy will always be up-to-date.
