I'm trying to download a csv that's been gz'd directly to a file:
$fp = fopen ($file.'.csv', 'w+');
$ch = curl_init($url.'/'.$file.'.csv.gz');
curl_setopt($ch, CURLOPT_TIMEOUT, 50);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip');
curl_exec($ch);
curl_close($ch);
fclose($fp);
What ends up happening is {$file}.csv is written to disk, however it's still encoded. If I rename the stored file {$file}.csv.gz and gunzip it the data is decoded properly.
I ended up downloading the file and creating another block of code to decode it afterwards.
$buffer_size = 262144;
$in_file_handle = gzopen($symbol.'.csv.gz', 'rb');
$out_file_handle = fopen($symbol.'.csv', 'wb');
// Keep repeating until the end of the input file
while (!gzeof($in_file_handle)) {
    fwrite($out_file_handle, gzread($in_file_handle, $buffer_size));
}
// Files are done, close files
fclose($out_file_handle);
gzclose($in_file_handle);
I would, however, still be very much interested in getting curl to do it all in one step.
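A likely explanation for the behaviour described above: CURLOPT_ENCODING only concerns the HTTP Content-Encoding header, and a server handing out a .csv.gz file usually sends the gzip bytes as the content itself, so curl sees nothing to decode. A minimal sketch of a one-step approach, assuming PHP 7+ with the zlib extension, is to inflate each chunk in a CURLOPT_WRITEFUNCTION callback. The helper name makeGunzipWriter() is hypothetical, not part of curl or PHP.

```php
<?php
// Sketch only: makeGunzipWriter() is a hypothetical helper, not part of curl.
// It returns a callback suitable for CURLOPT_WRITEFUNCTION that inflates
// gzip data chunk by chunk and writes the decoded bytes to $fp.
function makeGunzipWriter($fp): callable
{
    // ZLIB_ENCODING_GZIP tells zlib to expect a gzip header and trailer.
    $inflater = inflate_init(ZLIB_ENCODING_GZIP);

    return function ($ch, string $chunk) use ($fp, $inflater): int {
        $decoded = inflate_add($inflater, $chunk);
        if ($decoded === false || fwrite($fp, $decoded) === false) {
            return -1; // any value other than strlen($chunk) aborts the transfer
        }
        // curl expects the number of *input* bytes that were handled.
        return strlen($chunk);
    };
}

// Assumed usage, with $url and $file as in the question:
// $fp = fopen($file . '.csv', 'wb');
// $ch = curl_init($url . '/' . $file . '.csv.gz');
// curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// curl_setopt($ch, CURLOPT_WRITEFUNCTION, makeGunzipWriter($fp));
// curl_exec($ch);
// curl_close($ch);
// fclose($fp);
```

Because each chunk is decoded as it arrives, the compressed file never has to be stored on disk or held in memory as a whole.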
I would like to copy a PDF from a URL (API) to our server with PHP.
When I call the URL in the browser, the file starts downloading immediately, so this isn't a static PDF URL. I think that's the problem.
I've tried different functions with PHP but with no luck:
file_put_contents, copy, fopen/write.
Could you please advise?
For example, I've tried:
$url_label = "http://example.com/Public/downloadlabel.aspx?username=$username&password=$password&layout=label10x15&zipcode=$zipcode&shipment=$ShipmentID";
file_put_contents("label.pdf", fopen($url_label, 'r'));
The PDF-file is created in my folder, but this file is empty (0 bytes).
With curl I hoped to get around the forced download:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url_label);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
$data = curl_exec($ch);
curl_close($ch);
$destination = dirname(__FILE__) . '/file.pdf';
$file = fopen($destination, "w+");
fputs($file, $data);
fclose($file);
The PDF-file is created with 277 bytes, but is corrupted.
Any ideas?
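One plausible reading of the 277-byte result: with CURLOPT_HEADER set to true the response headers are written in front of the body, and with CURLOPT_FOLLOWLOCATION set to false a short redirect or error response is saved instead of the PDF. The sketch below is a guess at a fix under those assumptions; fetchPdf() and looksLikePdf() are hypothetical helper names.

```php
<?php
// Sketch only: fetchPdf() and looksLikePdf() are hypothetical helper names.

// A PDF file starts with the marker "%PDF-"; saved HTTP headers or an
// HTML error page do not.
function looksLikePdf(string $bytes): bool
{
    return strncmp($bytes, '%PDF-', 5) === 0;
}

function fetchPdf(string $url, string $destination): bool
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, false);        // keep headers out of the body
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow a redirect to the real file
    $data = curl_exec($ch);
    curl_close($ch);

    if ($data === false || !looksLikePdf($data)) {
        return false; // nothing usable came back
    }
    return file_put_contents($destination, $data) !== false;
}

// Assumed usage, with $url_label as in the question:
// fetchPdf($url_label, dirname(__FILE__) . '/file.pdf');
```

Checking the first bytes before saving makes it obvious when the server returned something other than a PDF, for example a login page.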
I am creating an internal tool for my team that will allow trusted team members to save remote files to our server via php and curl. I have the open, write, and close working perfectly, but I would like to add a check to make sure the file is of a certain mime type before it creates and writes the local file.
How could I do this, based on an array of mime types?
$fp = fopen($local_file, 'w+');
$ch = curl_init($remote_file);
curl_setopt($ch, CURLOPT_TIMEOUT, 50);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_ENCODING, "");
curl_exec($ch);
curl_close($ch);
fclose($fp);
I solved this by checking the MIME type via fileinfo after the file has been transferred. If it is not a valid MIME type, I remove the file.
$fp = fopen($local_file, 'w+');
$ch = curl_init($remote_file);
curl_setopt($ch, CURLOPT_TIMEOUT, 50);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_ENCODING, "");
curl_exec($ch);
curl_close($ch);
fclose($fp);
$finfo = new finfo(FILEINFO_MIME);
$mime_type = $finfo->file($local_file);
if (strpos($mime_type, 'application/xml') === false) {
    unlink($local_file);
}
For example:
$ch = curl_init('http://static.php.net/www.php.net/images/php.gif');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
$mime = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
$mime now holds the MIME type of the file.
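To compare that value against an array of permitted types, something like the following sketch could work. isAllowedMime() is a hypothetical helper and the $allowed list is only an example; the Content-Type string may carry parameters such as a charset, so only the part before the first ";" is compared.

```php
<?php
// Sketch only: isAllowedMime() is a hypothetical helper.
// $contentType is the value from curl_getinfo($ch, CURLINFO_CONTENT_TYPE),
// which may carry parameters such as "; charset=UTF-8", or may not be a
// string at all when the server sent no Content-Type header.
function isAllowedMime($contentType, array $allowed): bool
{
    if (!is_string($contentType) || $contentType === '') {
        return false;
    }
    // Keep only the media type in front of any ";" parameter.
    $type = strtolower(trim(explode(';', $contentType)[0]));
    return in_array($type, $allowed, true);
}

$allowed = ['application/xml', 'text/xml'];
var_dump(isAllowedMime('application/xml; charset=UTF-8', $allowed)); // bool(true)
var_dump(isAllowedMime('image/gif', $allowed));                      // bool(false)
```

Note that the header is whatever the remote server claims, so a fileinfo check on the saved file can still be worthwhile afterwards.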
Tricky! You could download the whole file first (into memory, or a temporary folder.) If you want to stream it, you may have to:
set CURLOPT_HEADER to include the HTTP response headers in the data you get back
set CURLOPT_WRITEFUNCTION instead of CURLOPT_FILE, read and parse the HTTP headers to see the MIME type, and then decide whether to create and write the file.
Obviously this is quite a bit of work, as you'll have to do some basic parsing of the HTTP headers, and possibly buffering to get the whole HTTP headers at once.
Hopefully someone will post an easier solution.
pseudocode:
state = headers
buf = ''
fd = null
func writefunc(ch, data)
    if state is headers
        buf .= data
        div = buf.strpos "\r\n\r\n"
        if div !== false
            mime = get_mime buf, div
            if mime_ok mime
                fd = fopen ...
                fd.write buf.substr div+4
                state = saving
            else
                # returns other than data.length() abort connection
                return 0
    else
        fd.write data
    return data.length()
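A PHP variation on the pseudocode above, offered as a sketch rather than a drop-in solution: instead of parsing headers out of the body stream, it lets curl hand over each header line through CURLOPT_HEADERFUNCTION and only opens the file once the body starts and the type is acceptable. makeMimeGuard() is a hypothetical helper name.

```php
<?php
// Sketch only: makeMimeGuard() is a hypothetical helper. It returns three
// callbacks: one for CURLOPT_HEADERFUNCTION that records the Content-Type,
// one for CURLOPT_WRITEFUNCTION that opens the file only for allowed types,
// and one to close the file afterwards.
function makeMimeGuard(string $localFile, array $allowed): array
{
    $mime = null; // media type seen in the response headers, if any
    $fp = null;   // opened lazily, only once the type is accepted

    $onHeader = function ($ch, string $line) use (&$mime): int {
        if (stripos($line, 'Content-Type:') === 0) {
            $value = substr($line, strlen('Content-Type:'));
            $mime = strtolower(trim(explode(';', $value)[0]));
        }
        return strlen($line); // tell curl the header line was consumed
    };

    $onBody = function ($ch, string $data) use (&$mime, &$fp, $localFile, $allowed): int {
        if ($fp === null) {
            if (!in_array($mime, $allowed, true)) {
                return 0; // a short count makes curl abort the transfer
            }
            $opened = fopen($localFile, 'wb');
            if ($opened === false) {
                return -1;
            }
            $fp = $opened;
        }
        $written = fwrite($fp, $data);
        return $written === false ? -1 : $written;
    };

    $close = function () use (&$fp): void {
        if (is_resource($fp)) {
            fclose($fp);
        }
    };

    return [$onHeader, $onBody, $close];
}

// Assumed usage, with $ch and $local_file as in the question:
// [$onHeader, $onBody, $close] = makeMimeGuard($local_file, ['application/xml']);
// curl_setopt($ch, CURLOPT_HEADERFUNCTION, $onHeader);
// curl_setopt($ch, CURLOPT_WRITEFUNCTION, $onBody);
// curl_exec($ch);
// $close();
```

When redirects are followed, header lines from every response pass through the callback, so the Content-Type of the final response is the one in effect when the body arrives.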
I would like to download a file with Curl.
The problem is that the download link is not direct, for example:
http://localhost/download.php?id=13456
When I try to download the file with curl, it downloads download.php itself!
Here is my curl code:
###
function DownloadTorrent($a) {
    $save_to = $this->torrentfolder; // Set torrent folder for download
    $filename = str_replace('.torrent', '.stf', basename($a));
    $fp = fopen ($this->torrentfolder.strtolower($filename), 'w+');//This is the file where we save the information
    $ch = curl_init($a);//Here is the file we are downloading
    curl_setopt($ch, CURLOPT_ENCODING, "gzip"); // Important
    curl_setopt($ch, CURLOPT_TIMEOUT, 50);
    curl_setopt($ch, CURLOPT_URL, $fp);
    curl_setopt($ch, CURLOPT_HEADER,0); // None header
    curl_setopt($ch, CURLOPT_BINARYTRANSFER,1); // Binary transfer 1
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
}
Is there a way to download the file without knowing the path?
You may try CURLOPT_FOLLOWLOCATION
TRUE to follow any "Location: " header that the server sends as part
of the HTTP header (note this is recursive, PHP will follow as many
"Location: " headers that it is sent, unless CURLOPT_MAXREDIRS is
set).
So the function becomes:
function DownloadTorrent($a) {
    $save_to = $this->torrentfolder; // Set torrent folder for download
    $filename = str_replace('.torrent', '.stf', basename($a));
    $fp = fopen ($this->torrentfolder.strtolower($filename), 'w+');//This is the file where we save the information
    $ch = curl_init($a);//Here is the file we are downloading
    curl_setopt($ch, CURLOPT_ENCODING, "gzip"); // Important
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 50);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_HEADER,0); // None header
    curl_setopt($ch, CURLOPT_BINARYTRANSFER,1); // Binary transfer 1
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
}
Set the FOLLOWLOCATION option to true, e.g.:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
Options are documented here: http://www.php.net/manual/en/function.curl-setopt.php
Oooh!
CURLOPT_FOLLOWLOCATION works perfectly...
The problem was that I had passed the fopen() handle to CURLOPT_URL; I simply replaced CURLOPT_URL with CURLOPT_FILE
and it works very well!
Thank you for your help =)
I have to download PDF files related to data from a web source. I know the full path of each file. I've tried curl, but it takes a long time and writes a 0-byte file.
$ch = curl_init ($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER,1);
$rawdata = curl_exec($ch);
curl_close ($ch);
if (file_exists($fullpath)) {
    unlink($fullpath);
}
$fp = fopen($fullpath,'x');
fwrite($fp, $rawdata);
fclose($fp);
$ch = curl_init("http://www.example.com/");
$fp = fopen("example_homepage.txt", "w");
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);
http://www.php.net/manual/en/curl.examples-basic.php
Or with this (if fopen wrappers are set up in your PHP conf):
$file = 'http://somehosted.com/file.pdf'; // URL to the file
$contents = file_get_contents($file); // read the remote file
touch('somelocal.pdf'); // create a local EMPTY copy
file_put_contents('somelocal.pdf', $contents); // put the fetched data into the newly created file
// done :)
And this one might fit you the best: http://www.jonasjohn.de/snippets/php/curl-example.htm
It's hard to say without seeing what your code looks like and where you might be going wrong, but take a look at this and see if there's anything that stands out as something you might have overlooked:
http://davidwalsh.name/download-urls-content-php-curl
I want to connect to a remote file and write its output to a local file. This is my function:
function get_remote_file_to_cache()
{
    $the_site="http://facebook.com";
    $curl = curl_init();
    $fp = fopen("cache/temp_file.txt", "w");
    curl_setopt ($curl, CURLOPT_URL, $the_site);
    curl_setopt($curl, CURLOPT_FILE, $fp);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
    curl_exec ($curl);
    $httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    if($httpCode == 404) {
        touch('cache/404_err.txt');
    }else
    {
        touch('cache/'.rand(0, 99999).'--all_good.txt');
    }
    curl_close ($curl);
}
It creates the two files in the "cache" directory, but it does not write any data into "temp_file.txt". Why is that?
Actually, using fwrite is only partially correct.
In order to avoid memory overflow problems with large files (Exceeded maximum memory limit of PHP), you'll need to setup a callback function to write to the file.
NOTE: I would recommend creating a class specifically to handle file downloads and file handles etc. rather than EVER using a global variable, but for the purposes of this example, the following shows how to get things up and running.
so, do the following:
# setup a global file pointer
$GlobalFileHandle = null;
function saveRemoteFile($url, $filename) {
    global $GlobalFileHandle;
    set_time_limit(0);
    # Open the file for writing...
    $GlobalFileHandle = fopen($filename, 'w+');
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FILE, $GlobalFileHandle);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_USERAGENT, "MY+USER+AGENT"); //Make this valid if possible
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); # optional
    curl_setopt($ch, CURLOPT_TIMEOUT, 0); # optional: 0 = no limit, 3600 = 1 hour
    curl_setopt($ch, CURLOPT_VERBOSE, false); # Set to true to see all the innards
    # Only if you need to bypass SSL certificate validation
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    # Assign a callback function to the CURL Write-Function
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, 'curlWriteFile');
    # Execute the download - note we DO NOT put the result into a variable!
    curl_exec($ch);
    # Close CURL
    curl_close($ch);
    # Close the file pointer
    fclose($GlobalFileHandle);
}
function curlWriteFile($cp, $data) {
    global $GlobalFileHandle;
    $len = fwrite($GlobalFileHandle, $data);
    return $len;
}
You can also create a progress callback to show how much / how fast you're downloading, however that's another example as it can be complicated when outputting to the CLI.
Essentially, this will take each block of data downloaded, and dump it to the file immediately, rather than downloading the ENTIRE file into memory first.
Much safer way of doing it!
Of course, you must make sure the URL is correct (convert spaces to %20 etc.) and that the local file is writeable.
Cheers,
James.
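The note above recommends a class instead of a global variable. A minimal sketch of what that might look like follows; StreamingDownload is a hypothetical class name, and the typed property assumes PHP 7.4 or later.

```php
<?php
// Sketch only: StreamingDownload is a hypothetical class name. It keeps the
// file handle in a property so no global variable is needed.
final class StreamingDownload
{
    /** @var resource */
    private $handle;
    private int $bytes = 0;

    public function __construct(string $filename)
    {
        $handle = fopen($filename, 'wb');
        if ($handle === false) {
            throw new RuntimeException("Cannot open $filename for writing");
        }
        $this->handle = $handle;
    }

    // Matches the CURLOPT_WRITEFUNCTION signature: (handle, chunk) => bytes written.
    public function write($ch, string $data): int
    {
        $len = fwrite($this->handle, $data);
        if ($len === false) {
            return -1; // a mismatching count makes curl abort the transfer
        }
        $this->bytes += $len;
        return $len;
    }

    public function bytesWritten(): int
    {
        return $this->bytes;
    }

    public function close(): void
    {
        fclose($this->handle);
    }
}

// Assumed usage, with $ch, $url and $filename as in the example above:
// $download = new StreamingDownload($filename);
// curl_setopt($ch, CURLOPT_WRITEFUNCTION, [$download, 'write']);
// curl_exec($ch);
// $download->close();
```

The running byte count is also a convenient starting point for the progress reporting mentioned above.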
Let's try sending a GET request to http://facebook.com:
$ curl -v http://facebook.com
* Rebuilt URL to: http://facebook.com/
* Hostname was NOT found in DNS cache
* Trying 69.171.230.5...
* Connected to facebook.com (69.171.230.5) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.35.0
> Host: facebook.com
> Accept: */*
>
< HTTP/1.1 302 Found
< Location: https://facebook.com/
< Vary: Accept-Encoding
< Content-Type: text/html
< Date: Thu, 03 Sep 2015 16:26:34 GMT
< Connection: keep-alive
< Content-Length: 0
<
* Connection #0 to host facebook.com left intact
What happened? Facebook redirected us from http://facebook.com to the secure https://facebook.com/. Note the response body length:
Content-Length: 0
This means that zero bytes are written to temp_file.txt, which is why the file stays empty.
Your solution is absolutely correct:
$fp = fopen('file.txt', 'w');
curl_setopt($handle, CURLOPT_FILE, $fp);
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
All you need to do is change the URL to https://facebook.com/.
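To spot this kind of situation from PHP rather than from the command line, the array returned by curl_getinfo() can be inspected after curl_exec(). The sketch below uses a hypothetical helper, explainTransfer(), and feeds it the values from the curl -v trace above instead of a live request.

```php
<?php
// Sketch only: explainTransfer() is a hypothetical helper. $info is assumed to
// be the array returned by curl_getinfo($curl) after curl_exec().
function explainTransfer(array $info): string
{
    $code = (int) ($info['http_code'] ?? 0);
    $size = (int) ($info['size_download'] ?? 0);

    if ($code >= 300 && $code < 400) {
        $target = $info['redirect_url'] ?? '';
        return "redirect ($code) to $target, $size body bytes";
    }
    if ($code === 404) {
        return 'not found (404)';
    }
    return "status $code, $size body bytes";
}

// Values taken from the curl -v trace above.
echo explainTransfer([
    'http_code' => 302,
    'redirect_url' => 'https://facebook.com/',
    'size_download' => 0,
]), "\n";
// prints: redirect (302) to https://facebook.com/, 0 body bytes
```

Setting CURLOPT_FOLLOWLOCATION to true would be another way to reach the final page without hard-coding the https URL.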
Regarding other answers:
@JonGauthier: No, there is no need to use fwrite() after curl_exec()
@doublehelix: No, you don't need CURLOPT_WRITEFUNCTION for an operation as simple as copying contents to a file.
@ScottSaunders: touch() creates an empty file if it doesn't exist. I think that was the OP's intention.
Seriously, three answers and every single one is invalid?
You need to explicitly write to the file using fwrite, passing it the file handle you created earlier:
if ( $httpCode == 404 ) {
    ...
} else {
    $contents = curl_exec($curl);
    fwrite($fp, $contents);
}
curl_close($curl);
fclose($fp);
In your question you have
curl_setopt($curl, CURLOPT_FILE, $fp);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
but from PHP's curl_setopt documentation notes...
It appears that setting CURLOPT_FILE before setting CURLOPT_RETURNTRANSFER doesn't work, presumably because CURLOPT_FILE depends on CURLOPT_RETURNTRANSFER being set.
So do this:
<?php
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FILE, $fp);
?>
not this:
<?php
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
?>
...stating "CURLOPT_FILE depends on CURLOPT_RETURNTRANSFER being set".
Reference: https://www.php.net/manual/en/function.curl-setopt.php#99082
To avoid memory leak problems:
I was confronted with this problem as well. It's really stupid to say but the solution is to set CURLOPT_RETURNTRANSFER before CURLOPT_FILE!
It seems CURLOPT_FILE depends on CURLOPT_RETURNTRANSFER.
$curl = curl_init();
$fp = fopen("cache/temp_file.txt", "w+");
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($curl, CURLOPT_FILE, $fp);
curl_setopt($curl, CURLOPT_URL, $url);
curl_exec ($curl);
curl_close($curl);
fclose($fp);
The touch() function doesn't do anything to the contents of the file. It just updates the modification time. Look at the file_put_contents() function.
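A small sketch of that difference, using a temporary file whose path is chosen purely for illustration:

```php
<?php
// Sketch using a temporary file; the path is an assumption for illustration.
$path = tempnam(sys_get_temp_dir(), 'cache');

file_put_contents($path, 'payload');   // writes the data
touch($path);                          // only updates the timestamps
$afterTouch = file_get_contents($path);
echo $afterTouch, "\n";                // payload

unlink($path);
touch($path);                          // file is missing, so an empty one is created
clearstatcache();                      // make sure filesize() is not served from cache
$emptySize = filesize($path);
echo $emptySize, "\n";                 // 0
unlink($path);
```

So touch() is fine for creating marker files such as the "all_good" one, but any content has to come from curl writing to the handle or from a call like file_put_contents().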