Archives downloaded with cURL not valid - PHP

I have a PHP script that automatically downloads some zip files from certain URLs using cURL functions.
But there's a problem: if a zip archive downloaded with cURL is opened with the native Windows Zip extractor, I get an "invalid archive" error. If I download the same zip file from the URL with my browser, it is fine.
For example, the zip downloaded with cURL is 21.8 KB, while the one downloaded with the browser is 21.4 KB.
Here's my cURL setup:
curl_setopt($this->ch, CURLOPT_URL, $link);
curl_setopt($this->ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($this->ch, CURLOPT_HEADER, TRUE);
$data = curl_exec($this->ch);
Then I save the file ($data) locally on my website like this:
$file = fopen($full_path, "wb"); // binary mode avoids newline translation on Windows
fputs($file, $data);
fclose($file);
With WinRAR both zips open fine, but I need the script to download zip files that are 100% valid.
Can anyone help me with this?

Figured out the solution: CURLOPT_HEADER must be set to FALSE; otherwise cURL prepends the HTTP response headers to the body, and they end up inside the saved zip file.
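A minimal corrected sketch of the download, assuming $link and $full_path stand in for the real values from the class:

```php
<?php
// Hypothetical standalone version of the fixed routine; the URL and
// path below are placeholders, not the asker's real values.
$link      = 'https://example.com/archive.zip';
$full_path = __DIR__ . '/archive.zip';

$ch = curl_init($link);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
curl_setopt($ch, CURLOPT_HEADER, false);        // keep HTTP headers out of the body
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects to the actual file
$data = curl_exec($ch);
curl_close($ch);

if ($data !== false) {
    file_put_contents($full_path, $data); // binary-safe write
}
```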

Related

PHP curl resume download

I'm currently trying to download satellite images from ESA's Copernicus/Sentinel project with cURL. Unfortunately the download keeps stopping at around 90%, and the PHP script returns an Internal Server Error (500).
I would therefore like to resume the download at a specific byte offset. The ESA server seems to ignore the HTTP Range header (CURLOPT_RANGE), and CURLOPT_RESUME_FROM doesn't change anything either.
If I download the file manually with Google Chrome, the download also interrupts, but continues after some time.
So if Google Chrome can resume the download, cURL should be able to do that too. I would appreciate any help on how to do that.
Some details:
The file I'm trying to download is here (420 MB); to access it you need to register at scihub.esa.int/dhus/.
The Content-Type is application/octet-stream.
My code:
$save_file = fopen($save_filepath, "w+");
$open_file = curl_init(str_replace(" ","%20", $url));
curl_setopt($open_file, CURLOPT_USERPWD, $username.":".$password);
curl_setopt($open_file, CURLOPT_TIMEOUT, 300);
curl_setopt($open_file, CURLOPT_FILE, $save_file);
curl_setopt($open_file, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($open_file, CURLOPT_PROGRESSFUNCTION, "trackprogress");
curl_setopt($open_file, CURLOPT_NOPROGRESS, false);
curl_exec($open_file);
curl_close($open_file);
fclose($save_file);
It works perfectly for smaller files (I've tested it with some images and PDF files), and I can also download most of the satellite image (the first 380 MB are downloaded). I tried increasing the timeout value, too, but the script terminates long before the 5 minutes are reached.
I tried curl_setopt($open_file, CURLOPT_RESUME_FROM, 1048576); and curl_setopt($open_file, CURLOPT_RANGE, "1048576-");, but the downloaded file always starts with the same bytes.
EDIT:
I can't answer my own question, but for this specific case I found a workaround. So if anybody reads this and also wants to download these satellite images with cURL, here is what I did:
When downloading not just the image file but the zip file with some additional data, the download still keeps stopping; however, with curl_setopt($open_file, CURLOPT_RESUME_FROM, $bytes_already_loaded); it is possible to skip the bytes that were previously loaded and resume the download (which isn't possible for the image file). So use this link instead of the image file.
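The resume step described above can be sketched as a loop; this assumes the server honors CURLOPT_RESUME_FROM for the URL in question, and the URL and path are placeholders:

```php
<?php
// Hypothetical resume loop: re-open the partial file in append mode and
// ask the server to continue from the bytes we already have on disk.
$url           = 'https://example.com/product.zip'; // placeholder
$save_filepath = __DIR__ . '/product.zip';

$attempts = 0;
do {
    $bytes_already_loaded = file_exists($save_filepath) ? filesize($save_filepath) : 0;
    $save_file = fopen($save_filepath, 'ab'); // append, don't overwrite the partial file

    $open_file = curl_init(str_replace(' ', '%20', $url));
    curl_setopt($open_file, CURLOPT_FILE, $save_file);
    curl_setopt($open_file, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($open_file, CURLOPT_RESUME_FROM, $bytes_already_loaded); // byte offset
    $ok  = curl_exec($open_file);
    $err = curl_errno($open_file);
    curl_close($open_file);
    fclose($save_file);
} while (!$ok && $err && ++$attempts < 10); // retry until the transfer completes
```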

Downloaded GitLab archive repository, using GitLab's API, appears to be corrupted

I'm using cURL to download a repository archive from GitLab using their API. All of this is done in PHP. Code below:
$zipResource = fopen('archive.zip', 'w');
$ch = curl_init("http://example.com/api/v3/projects");
curl_setopt($ch, CURLOPT_HTTPHEADER, array("PRIVATE-TOKEN: private_token_goes_here"));
curl_setopt($ch, CURLOPT_URL, "http://example.com/api/v3/projects/64/repository/archive");
curl_setopt($ch, CURLOPT_FILE, $zipResource);
curl_exec($ch);
curl_close($ch);
fclose($zipResource);
A brief overview: create an empty zip file, connect to the server, fetch the archive, and write it to the zip file.
The archive appears on the server, I can download it, and I can unzip it when I double-click it, all the files are there and everything seems to be in order.
However, when I try to unzip it using a terminal the following error pops up:
Archive: archive.ZIP
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
What I've tried so far was:
Setting the headers to "Content-Type: application/zip", to "Content-Transfer-Encoding: Binary" (with the archive file opened in 'wb' binary mode), to "Content-Type: application/octet-stream", and so on. The end result was always the same: the error above whenever I tried to unzip the archive.
I can only assume that I'm either not using cURL properly, not setting the headers properly, or there's something wrong with their API (highly unlikely).
Any nudging in the right direction is greatly appreciated.
I ran into the same problem with GitLab. Downloading as tar works fine in my case. I also found that the downloaded zip file actually contains all the data, but is corrupted. The data can still be recovered if Java is installed:
jar xvf corrupted-zip-file.zip
It's probably a gzipped tarball you're downloading.
You should append the .zip extension to the end of the URL you're accessing.
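Based on that suggestion, a sketch of the adjusted request; the host, project ID, and token are the question's placeholders, and the exact archive endpoint shape differs between GitLab API versions, so treat the URL as an assumption:

```php
<?php
// Hypothetical: request the archive explicitly as a zip by appending .zip
// to the endpoint, so the server doesn't fall back to a gzipped tarball.
$zipResource = fopen('archive.zip', 'wb');
$ch = curl_init('http://example.com/api/v3/projects/64/repository/archive.zip');
curl_setopt($ch, CURLOPT_HTTPHEADER, ['PRIVATE-TOKEN: private_token_goes_here']);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // the archive endpoint may redirect
curl_setopt($ch, CURLOPT_FILE, $zipResource);
curl_exec($ch);
curl_close($ch);
fclose($zipResource); // flush everything to disk before unzipping
```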

Corrupt image when extracting from zip

I'm trying to download a zip file with cURL from one virtual host to another on the same server. The zip file contains *.php and *.jpg files.
The problem is that sometimes the JPG files come out corrupt, like this:
Here is my code :
$out = fopen(ABSPATH.'/templates/default.zip', 'w+');
$ch = curl_init();
curl_setopt($ch, CURLOPT_FILE, $out);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_URL, 'http://share.example.com/templates/default.zip');
curl_exec($ch);
curl_close($ch);
fclose($out); // flush the download to disk before opening it

$zip = new ZipArchive;
if ($zip->open(ABSPATH.'/templates/default.zip') === TRUE)
{
    if ($zip->extractTo(ABSPATH.'/templates'))
    {
        echo 'OK';
    }
    $zip->close();
}
I don't understand what is happening to my JPG files. I also tried using pclzip.lib.php, but no luck. How can I solve this problem?
Thanks in advance
Have you tried downloading the file via curl and unzipping it normally (i.e. without PHP)? That would tell you whether the download or the unzip causes the problem.
You might also try replacing one of the two parts using shell_exec (wget instead of cURL, unzip instead of ZipArchive). I mean just for debugging, not for production.
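As a debugging sketch only (the URL and paths are placeholders, and shell_exec must be enabled on the host):

```php
<?php
// Hypothetical debugging aid: swap each stage for a shell tool to see
// which one corrupts the file. Not intended for production use.
$url  = 'http://share.example.com/templates/default.zip'; // placeholder
$dest = '/tmp/default.zip';

// Download with wget instead of cURL...
shell_exec('wget -q -O ' . escapeshellarg($dest) . ' ' . escapeshellarg($url));
// ...then extract with unzip instead of ZipArchive.
shell_exec('unzip -o ' . escapeshellarg($dest) . ' -d /tmp/templates');
```

If the JPGs are intact after this, the problem is in the PHP stages rather than the server.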
Finally I found the problem.
I'm using the Nginx web server; when I changed this line in my nginx config:
sendfile on;
to
sendfile off;
my images were no longer corrupt. So it's not a PHP or cURL problem. Interesting article: http://technosophos.com/node/172

Unzip file with PHP?

I have a script that downloads a zip archive, and I need to extract its contents to the directory the zip archive is in. I have tried various things, this being the last:
mkdir("/home/site/public_html/".$db."", 0777);
$url = 'http://wordpress.org/latest.zip';
$path = "/home/site/public_html/".$db."/latest.zip";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);
file_put_contents($path, $data);
$zip = new ZipArchive;
$zip->open("/home/site/public_html/".$db."/latest.zip");
$zip->extractTo("/home/site/public_html/".$db."/");
$zip->close();
The zip file downloads just fine, but it won't extract. Is there another way I can extract the files?
This sounds like a permissions error; it's common (and good security practice) for hosting providers to give the web user that PHP runs as limited permissions, such as no write access within web directories. See if you can get more information on the failure by raising the error_reporting level (http://php.net/manual/en/function.error-reporting.php). If this is the issue, it can be solved with suEXEC (http://www.alain.knaff.lu/howto/PhpSuexec/). Be careful!
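To surface the actual failure, it helps to check the return values, which the original snippet ignores. A diagnostic sketch (the path is a placeholder for the real "/home/site/public_html/".$db." path):

```php
<?php
// Hypothetical diagnostic version: ZipArchive::open() returns TRUE on
// success, or an integer error code (e.g. ZipArchive::ER_NOZIP) on failure.
error_reporting(E_ALL);
$path = '/tmp/latest.zip'; // placeholder for the real download path

$zip = new ZipArchive;
$result = $zip->open($path);
if ($result !== true) {
    die("ZipArchive::open() failed with error code $result");
}
if (!$zip->extractTo(dirname($path))) {
    die('extractTo() failed: check write permissions on ' . dirname($path));
}
$zip->close();
echo 'Extracted OK';
```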

Using TorrentFlux's Download Action Programmatically

Summary: I'm writing a script that automatically downloads a .torrent file from isohunt.com and then downloads that torrent to the DOWNLOADS folder. But I can't download the contents of the torrent.
I have a torrent file (file.torrent). I can use TorrentFlux's web interface to download the torrent, but I want to start the download programmatically.
I found that the TorrentFlux web app uses dispatcher.php like this to start a download:
dispatcher.php?action=start&transfer=_file.torrent
I'm trying to request this file with cURL, but it's not working.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://localhost/torrentflux/html/dispatcher.php?action=start&transfer=file.torrent");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_exec($ch);
curl_close($ch);
Note: I'm asking here because the official TorrentFlux forum has a database error.
