Solved
I actually found out what is going on here.
Turns out it was sending the whole file, but Excel (which I was using to open the result file for testing) will only display 65536 rows. If there are more than that, it alerts something to the effect of "the file is incompletely displayed" and then cuts it off after that many records.
(Note to Joel Spolsky - please call your friends from the original Excel development team and yell at them for me =o)
Thanks!
I have a very simple script that pulls some data from a database, and sends it to the visitor as a .csv file.
I have the memory and execution time set to acceptable levels, but for a few large reports, I notice that the download cuts off after about 10 seconds.
This ONLY happens if I set it as a download in the headers. If I comment out the content-type, content-disposition, etc., and just write the data to the browser, then the entire file will download and display in the browser.
Code is as follows:
// Code removed.
Anyone have any ideas? Could this be a browser issue with file download?
Thanks!
I don't know what's causing your problem, but here is something that you can try:
Write the data to a file and then send it to the user using the X-Sendfile header (see this). Alternatively, you can redirect to the file.
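A rough sketch of that approach, assuming Apache with mod_xsendfile enabled (nginx uses X-Accel-Redirect instead); $csv_data is a stand-in for the report your script already builds:

<?php
// Write the report to a temporary file and let the web server stream it.
$tmp = tempnam(sys_get_temp_dir(), 'report_');
file_put_contents($tmp, $csv_data);   // $csv_data: assumed, built earlier by the script

header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="report.csv"');
header('X-Sendfile: ' . $tmp);        // the server must be allowed to send from this directory
exit;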
I had a slightly similar issue, but with very slow downloads. The culprit was an overly aggressive antivirus package affecting only IE downloads. Check for Mcshield.exe.
I use PHP. I'm trying to download an image from a URL; my code works for some URLs but not for others. I want my code to work for all URLs, or to know what causes this.
Here is my script:
$imgUrl = 'http://www.inc.com/uploaded_files/image/i-love-me_49961.jpg';
$imageData = file_put_contents('test/xxx.jpg', file_get_contents($imgUrl));
Right now, I can get this image file (xxx.jpg), but when I open the saved file in ACDSee, I get nothing.
However, if I use "http://www.wired.com/wp-content/uploads/2014/11/faa-drones-ft-660x330.jpg", my script works.
Please help me.
Interesting. This is a case of file_get_contents failing to get the correct image, so to speak, but the best-matched SO questions I found would not help you, because they are about different things.
I shall answer this by laying out how you should solve this type of problem.
Problem solving is the simple art of breaking the problem down and checking the smaller pieces one by one until the cause is pinpointed.
First, did you get anything saved?
If yes, that means you did get something, and we can exclude all data read/write problems, including file permissions, network problems, access denials, or a missing curl extension.
If you didn't get the saved file, these issues have to be checked one by one.
In your case, I trust that you did get the file.
So the problem is now with the actual data.
Usually, we first verify that the source is ok.
Open it in a browser. Save it. Open the saved file in ACDSee.
It works! This is how we confirm the source is working, and ACDSee is working.
(And that the OS/browser/network is working, actually.)
Which leaves us with the saved data.
No programs can open it as jpeg, so we can be pretty certain the saved file is not a jpeg.
What is it, then?
If you use a hex editor (e.g. HxD) to open the PHP saved file (not a jpeg) and the manually saved file (confirmed jpeg), you will see that they are simply totally different.
manually saved image: FF D8 FF E1 ...
PHP saved image: 1F 8B 08 ...
If you look up these first few bytes, known as file signatures (magic numbers), you will see that the PHP-saved file is a gzip file.
To confirm this, you can rename the file's extension to .gz. Unzip it, and voilà, there is the image!
Note: hex comparison is pretty useful for sorting out the occasional weird problem, such as unwanted BOM markers, line-break conversion on binary files, or messy server filters.
So hex editors are indispensable for a good programmer, even a web programmer.
At this stage, the question becomes, why did I get a gzipped file?
A web programmer should know what is wrong by now, but let's pretend we do not.
There is not much problem space left.
It is either file_put_contents or file_get_contents.
If you do a little PHP coding to get in between them, you will see that it is file_get_contents that is returning the gzipped data.
But where did file_get_contents get its data?
From the network, of course!
Now, let me introduce you to a program called Wireshark.
Programs like this are called packet sniffers, and they can show you the raw data going through your network cable or wifi.
Note: Packet sniffers are not easy. You need to know network protocols really well to make sense of anything.
They belong to a class of low-level debuggers called system monitors, and are often the last resort.
But being able to play this final hand is one of the distinctions between an average programmer and an expert.
Indeed, with a packet sniffer, we can confirm that the server is responding with gzip-encoded content, using Content-Encoding: gzip.
And so, we now know that the real cause is that file_get_contents does not automatically decompress gzip content.
With the correct question, Stack Overflow already has answers.
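For completeness, here is a minimal sketch of one such fix, assuming PHP 5.4+ for gzdecode (alternatively you could request Accept-Encoding: identity via a stream context, or use curl):

$imgUrl = 'http://www.inc.com/uploaded_files/image/i-love-me_49961.jpg';
$raw = file_get_contents($imgUrl);

// 1F 8B are the gzip magic bytes we saw in the hex editor; if present,
// decompress the response before saving it.
if (substr($raw, 0, 2) === "\x1f\x8b") {
    $raw = gzdecode($raw);
}

file_put_contents('test/xxx.jpg', $raw);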
This is how we approach pretty much every programming problem, and why we answer more than we ask.
Hope you enjoyed the journey, and hope you will become the tour guide one day.
I am creating a server-side API for my app. One of the steps requires the app to download a very large zip file from my server, but it can't be done by just downloading http://mydomain.com/file.zip. My app passes some authentication headers and other safety stuff which must be processed by a PHP script (e.g. http://mydomain.com/download.php?auth=foo&key=value&etc)
I have done this kind of thing earlier using images:
<?php
$can_download = /* some complicated auth stuff */
if($can_download) {
header('Content-Type: image/jpeg');
echo file_get_contents('image.jpg');
}
?>
My question is: can it be done with a 200+ MB zip file? I know I have to modify the headers somehow and probably use some more advanced PHP functions, but I couldn't find a tutorial either here on Stack Overflow or anywhere else.
Edit: I also have to be able to resume downloads, because it's very likely that the user will quit the app, and it should be able to resume the download later (e.g. from 47%). Can this be done?
For those looking for the answer: the correct function is readfile, and the 'resuming' problem can be solved using this custom function.
The correct function to use in this instance is readfile.
The given code would encounter problems because it unnecessarily loads the file contents into memory before forwarding them to the browser. This does not help performance, and can easily result in running out of memory.
Yes, however you want to use something like readfile instead. The reason is that file_get_contents reads the entire file into memory before passing it to the client.
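A minimal sketch of the readfile approach (this is not the linked custom resume function; the auth check and filename are placeholders reused from the question):

<?php
$can_download = true;   // stand-in for the "complicated auth stuff" from the question
$path = 'file.zip';     // assumed local path to the archive

if ($can_download && is_file($path)) {
    header('Content-Type: application/zip');
    header('Content-Disposition: attachment; filename="file.zip"');
    header('Content-Length: ' . filesize($path));

    // Drop any output buffers so 200+ MB is never held in PHP's memory.
    while (ob_get_level()) {
        ob_end_clean();
    }
    readfile($path);
}
exit;
?>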
I have decrypted my audio file; I now want to play it and then unlink it.
What I currently have is:
<?php
$destination = "/tmp_upload_dir_copy/test.mp3";
header('Content-Type: audio/mpeg');
readfile($destination);
unlink($destination);
?>
Does anyone have any ideas about what I'm doing wrong, or what else I need?
Maybe I need to use fpassthru()?
PHP works only server side. You can only guarantee the file is SENT to the client; there is no way to directly make sure it has been PLAYED by the client.
Setting the header will only influence how the browser treats the data (in this case the browser is informed the data is audio). Chrome, for example, plays audio files, but some browsers may give users a download prompt instead.
You'll need client-side software, like an audio-playing component (search "Flash MP3 player"), to embed in a page to play the audio file.
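If Flash is not an option, a minimal sketch of the same idea with an HTML5 audio tag; "stream.php" is a hypothetical name for the script that sends the decrypted mp3:

<?php
// Embed a player that points at the PHP script streaming the decrypted file.
// HTML5 <audio> is used here instead of the Flash component mentioned above.
echo '<audio controls preload="none">'
   . '<source src="stream.php?file=test.mp3" type="audio/mpeg">'
   . 'Your browser does not support the audio element.'
   . '</audio>';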
Sounds like you don't really want to delete the file immediately, as any number of things could go wrong on the user's end between downloading the mp3 and actually playing it. The user might need to initiate the file transfer again in many cases. Instead you might want to set up a cron job that runs every night and deletes the mp3 files that are more than one day old. (Also, I'm not sure what you meant by "decrypt" the audio file.)
I'm sure this has been asked before, but as I can't seem to find a good answer, here I am, asking... again. :)
Is there any way, using only a mixture of HTML, JavaScript/AJAX, and PHP, to report the actual progress of a file upload?
In reply to anyone suggesting SWFUpload or similar:
I know all about it. Been down that road. I'm looking for a 100% pure solution (and yes, I know I probably won't get it).
Monitoring your file uploads with PHP/Javascript requires the PECL extension:
uploadprogress
A good example of the code needed to display the progress to your users is:
Uber Uploader
If I'm not mistaken, it uses jQuery to communicate with PHP.
You could also write it yourself; it's not that complex.
Add a hidden element as the first element of upload form, named UPLOAD_IDENTIFIER.
Poll a PHP script that calls uploadprogress_get_info( UPLOAD_IDENTIFIER )
It returns an array containing the following:
time_start - The time that the upload began (unix timestamp),
time_last - The time that the progress info was last updated,
speed_average - Average speed in bytes per second,
speed_last - Last measured speed in bytes per second,
bytes_uploaded - Number of bytes uploaded so far,
bytes_total - The value of the Content-Length header sent by the browser,
files_uploaded - Number of files uploaded so far,
est_sec - Estimated number of seconds remaining.
Let PHP return the info to Javascript and you should have plenty of information.
Depending on the audience, you will likely not use all the info available.
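A minimal sketch of the polling endpoint described above, assuming the PECL uploadprogress extension is loaded and the hidden UPLOAD_IDENTIFIER value is passed as ?id=... (the file name progress.php is an assumption):

<?php
// progress.php: polled by JavaScript while the upload is in flight.
header('Content-Type: application/json');

$id   = isset($_GET['id']) ? $_GET['id'] : '';
$info = $id !== '' ? uploadprogress_get_info($id) : null;

if ($info) {
    echo json_encode(array(
        'uploaded' => $info['bytes_uploaded'],
        'total'    => $info['bytes_total'],
        'percent'  => $info['bytes_total'] > 0
            ? round($info['bytes_uploaded'] * 100 / $info['bytes_total'])
            : 0,
    ));
} else {
    echo json_encode(array('error' => 'no upload in progress'));
}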
If you have APC installed (and by this point, you really should; it'll be standard in PHP6), it has an option to enable upload tracking.
There's some documentation, and Rasmus has written a code sample that uses YUI.
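A rough sketch of what the APC-based tracking can look like, assuming apc.rfc1867 is enabled in php.ini and the upload form contains a hidden APC_UPLOAD_PROGRESS field (placed before the file input) whose value is the key polled here:

<?php
// apc_progress.php (hypothetical name): polled from JavaScript during the upload.
header('Content-Type: application/json');

$key  = isset($_GET['key']) ? $_GET['key'] : '';
$info = $key !== '' ? apc_fetch('upload_' . $key) : false;

// While the upload runs, $info holds fields such as 'current' and 'total'.
echo json_encode($info ?: array('error' => 'no progress data'));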
If you're able to add PECL packages into your PHP, there is the uploadprogress package.
The simplest way would be to just use swfupload, though.
Is there any way, using only a mixture of HTML, JavaScript/AJAX, and PHP, to report the actual progress of a file upload?
I don't know of any way to monitor plain HTML (multipart/form-data) file uploads in webserver-loaded PHP.
You need to have access to the progress of the multipart/form-data parser as the data comes in, but this looks impossible because the ways of accessing the HTTP request body from PHP ($HTTP_RAW_POST_DATA and php://input) are documented as being “not available with enctype="multipart/form-data"”.
You could do a script-assisted file upload in Firefox using an upload field's FileList to grab the contents of a file to submit in a segmented or non-multipart way. Still a bunch of work to parse though.
(You could even run a PHP script as a standalone server on another port just for receiving file uploads, using your own HTTP-handling code. But that's a huge amount of work for relatively little gain.)
I'd recommend you give FancyUpload a try; it's a really cool solution for a progress bar and it's not necessarily tied to PHP. Also check out the other tools at digitarald.de.
Cheers
IMHO, this is a problem that web browsers should solve. We have a progress meter for downloads, so why not for uploads as well?
Take a look at this for example:
http://www.fireuploader.com/
To quote some famous words:
“Programmers… often take refuge in an understandable, but disastrous, inclination towards complexity and ingenuity in their work. Forbidden to design anything larger than a program, they respond by making that program intricate enough to challenge their professional skill.”
While solving some mundane problem at work I came up with this idea, which I'm not quite sure how to solve. I know I won't be implementing this, but I'm very curious as to what the best solution is. :)
Suppose you have this big collection of JPG files and a few odd SWF files. By "big" I mean "a couple thousand". Every JPG file is around 200KB, and the SWFs can be up to a few MB in size. Every day there are a few new JPG files. The total size of all the stuff is thus around 1 GB, and is slowly but steadily increasing. Files are VERY rarely changed or deleted.
The users can view each of the files individually on the webpage. However there is also the wish to allow them to download a whole bunch of them at once. The files have some metadata attached to them (date, category, etc.) that the user can filter the collection by.
The ultimate implementation would then be to allow the user to specify some filter criteria and then download the corresponding files as a single ZIP file.
Since there are so many possible criteria combinations, I cannot pre-generate all the possible ZIP files and must do it on the fly. Another problem is that the download can be quite large, and for users with slow connections it's quite likely that it will take an hour or more. Support for "resume" is therefore a must-have.
On the bright side however the ZIP doesn't need to compress anything - the files are mostly JPEGs anyway. Thus the whole process shouldn't be more CPU-intensive than a simple file download.
The problems that I have identified are thus:
PHP has execution timeout for scripts. While it can be changed by the script itself, will there be no problems by removing it completely?
With the resume option, there is the possibility of the filter results changing for different HTTP requests. This might be mitigated by sorting the results chronologically, as the collection is only getting bigger. The request URL would then also include a date when it was originally created and the script would not consider files younger than that. Will this be enough?
Will passing large amounts of file data through PHP not be a performance hit in itself?
How would you implement this? Is PHP up to the task at all?
Added:
By now two people have suggested storing the requested ZIP files in a temporary folder and serving them from there as normal files. While this is indeed an obvious solution, there are several practical considerations which make it infeasible.
The ZIP files will usually be pretty large, ranging from a few tens of megabytes to hundreds of megabytes. It's also completely normal for a user to request "everything", meaning that the ZIP file will be over a gigabyte in size. Also, there are many possible filter combinations, and many of them are likely to be selected by the users.
As a result, the ZIP files will be pretty slow to generate (due to the sheer volume of data and disk speed), and will contain the whole collection many times over. I don't see how this solution would work without some mega-expensive SCSI RAID array.
This may be what you need:
http://pablotron.org/software/zipstream-php/
This lib allows you to build a dynamic streaming zip file without swapping to disk.
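A rough sketch of how such a streaming archive might be built with that library; the method names follow the older pablotron API from memory and may differ in newer versions (which use camelCase such as addFileFromPath), and $filtered_files is an assumed result of the metadata filter:

<?php
require 'zipstream.php';   // assumed path to the library

// Send the archive headers immediately and stream entries one by one,
// so no complete zip ever exists on disk or in memory.
$zip = new ZipStream('photos.zip');
foreach ($filtered_files as $name => $path) {
    $zip->add_file_from_path($name, $path);
}
$zip->finish();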
Use e.g. the PhpConcept Library Zip library.
Resuming must be supported by your web server, except in the case where you don't make the zip files directly accessible. If you have a PHP script as mediator, then pay attention to sending the right headers to support resuming.
The script creating the files shouldn't ever time out; just make sure the users can't select thousands of files at once. And keep something in place to remove old zip files, and watch out that some malicious user doesn't use up your disk space by requesting many different file collections.
You're going to have to store the generated zip file, if you want them to be able to resume downloads.
Basically you generate the zip file and chuck it in a /tmp directory with a repeatable filename (a hash of the search filters, maybe). Then you send the correct headers to the user and echo file_get_contents to the user.
To support resuming you need to check the $_SERVER['HTTP_RANGE'] value; its format is detailed here, and once you've parsed that you'll need to run something like this.
$size = filesize($zip_file);
if (isset($_SERVER['HTTP_RANGE'])) {
    // Parse a header of the form "bytes=start-end" (the end may be empty).
    list(, $seek_range) = explode('=', $_SERVER['HTTP_RANGE'], 2);
    $range = explode('-', $seek_range);
    $start = (int) $range[0];
    $end   = (isset($range[1]) && $range[1] !== '') ? (int) $range[1] : $size - 1;
    $new_length = $end - $start + 1;

    header("HTTP/1.1 206 Partial Content");
    header("Content-Length: $new_length");
    header("Content-Range: bytes $start-$end/$size");
    echo file_get_contents($zip_file, false, null, $start, $new_length);
} else {
    header("Content-Length: $size");
    echo file_get_contents($zip_file);
}
This is very sketchy code; you'll probably need to play around with the headers and the contents of the HTTP_RANGE variable a bit. You can use fopen and fread rather than file_get_contents if you wish, and just fseek to the right place.
Now to your questions
PHP has execution timeout for scripts. While it can be changed by the script itself, will there be no problems by removing it completely?
You can remove it if you want to. However, if something goes pear-shaped and your code gets stuck in an infinite loop, it can lead to interesting problems, should that infinite loop be logging an error somewhere that you don't notice, until a rather grumpy sys-admin wonders why their server ran out of hard disk space ;)
With the resume option, there is the possibility of the filter results changing for different HTTP requests. This might be mitigated by sorting the results chronologically, as the collection is only getting bigger. The request URL would then also include a date when it was originally created and the script would not consider files younger than that. Will this be enough?
Caching the file to the hard disk means you won't have this problem.
Will passing large amounts of file data through PHP not be a performance hit in itself?
Yes, it won't be as fast as a regular download from the webserver, but it shouldn't be too slow.
I have a download page and made a zip class that is very similar to your ideas.
My downloads are very big files that can't be zipped properly with the zip classes out there, and I had similar ideas to yours.
The approach of giving up compression is very good: not only do you need fewer CPU resources, you also save memory because you don't have to touch the input files and can pass them through. You can also calculate everything (like the zip headers and the final file size) very easily, and you can jump to any position and generate from that point to implement resuming.
I go even further: I generate one checksum from all the input files' CRCs and use it as an ETag for the generated file to support caching, and as part of the filename. If you have already downloaded the generated zip file, the browser gets it from the local cache instead of the server.
You can also adjust the download rate (for example 300 KB/s).
You can add zip comments.
You can choose which files get added and which don't (for example Thumbs.db).
But there's one problem that you can't completely overcome with the zip format: the generation of the CRC values.
Even if you use hash_file to overcome the memory problem, or use hash_update to generate the CRC incrementally, it will use too much CPU.
Not much for one person, but it's not recommended for professional use.
I solved this with an extra CRC value table that I generate with a separate script.
I pass these CRC values to the zip class as a parameter.
With this, the class is ultra fast, like a regular download script, as you mentioned.
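A minimal sketch of how such a CRC table entry could be precomputed without loading a whole file into memory (the function name is illustrative, not taken from the class):

function crc32_of_file($path) {
    $ctx = hash_init('crc32b');        // same polynomial the zip format expects
    hash_update_file($ctx, $path);     // streams the file in chunks
    return hexdec(hash_final($ctx));   // CRC as an integer for the zip entry header
}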
My zip class is a work in progress; you can have a look at it here: http://www.ranma.tv/zip-class.txt
I hope it can help someone :)
But I will discontinue this approach; I will rewrite my class as a tar class.
With tar I don't need to generate CRC values from the files; tar only needs some checksums for the headers, and that's all.
And I don't need an extra MySQL table any more.
I think it makes the class easier to use if you don't have to create an extra CRC table for it.
It's not so hard, because tar's file structure is simpler than the zip structure.
PHP has execution timeout for scripts. While it can be changed by the script itself, will there be no problems by removing it completely?
If your script is safe and it closes on user abort, then you can remove it completely.
But it would be safer to just renew the timeout for every file that you pass through :)
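A rough sketch of that per-file renewal; the zip bookkeeping is omitted and $files is an assumed array of paths selected by the filter:

ignore_user_abort(false);       // stop streaming when the client disconnects
foreach ($files as $path) {
    set_time_limit(30);         // reset the execution timer before each file
    readfile($path);            // pass this file's bytes through to the client
    flush();                    // push them out instead of buffering
}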
With the resume option, there is the possibility of the filter results changing for different HTTP requests. This might be mitigated by sorting the results chronologically, as the collection is only getting bigger. The request URL would then also include a date when it was originally created and the script would not consider files younger than that. Will this be enough?
Yes, that would work.
I generated a checksum from the input files' CRCs and used it as an ETag and as part of the zip filename.
If something changes, the user can't resume the generated zip, because the ETag and the filename change together with the content.
Will passing large amounts of file data through PHP not be a performance hit in itself?
No, if you only pass the data through, it will not use much more than a regular download.
Maybe 0.01% more, I don't know; it's not much :)
I assume that's because PHP doesn't do much with the data :)
You can use ZipStream or PHPZip, which will send zipped files on the fly to the browser, divided into chunks, instead of loading the entire content in PHP and then sending the zip file.
Both libraries are nice and useful pieces of code. A few details:
ZipStream "works" only with memory, but cannot be easily ported to PHP 4 if necessary (uses hash_file())
PHPZip writes temporary files on disk (consumes as much disk space as the biggest file to add in the zip), but can be easily adapted for PHP 4 if necessary.