Download an Excel file with PHP and Curl - php

I have a repetitive task that I do daily. Log in to a web portal, click a link that pops open a new window, and then click a button to download an Excel spreadsheet. It's a time consuming task that I would like to automate.
I've been doing some research with PHP and cUrl, and while it seems like it should be possible, I haven't found any good examples. Has anyone ever done something like this, or do you know of any tools that are better suited for it?

Are you familiar with the basics of HTTP requests? Like, do you know the difference between a POST and a GET request? If what you're doing amounts to nothing more than GET requests, then it's actually super simple and you don't need to use cURL at all. But if "clicking a button" means submitting a POST form, then you will need cURL.
One way to check this is by using a tool such as Live HTTP Headers and watching what requests happen when you click on your links/buttons. It's up to you to figure out which variables need to get passed along with each request and which URLs you need to use.
But assuming that there is at least one POST request, here's a basic script that will post data and get back whatever HTML is returned.
<?php
if ( $ch = curl_init() ) {
$data = 'field1=' . urlencode('somevalue');
$data .= '&field2[]=' . urlencode('someothervalue');
$url = 'http://www.website.com/path/to/post.asp';
$userAgent = 'Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)';
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
$html = curl_exec($ch);
curl_close($ch);
} else {
$html = false;
}
// write code here to look through $html for
// the link to download your excel file
?>

try this >>>
$ch = curl_init();
$csrf_token = $this->getCSRFToken($ch);// this function to get csrf token from website if you need it
$ch = $this->signIn($ch, $csrf_token);//signin function you must do it and return channel
curl_setopt($ch, CURLOPT_HTTPGET, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 300);// if file large
curl_setopt($ch, CURLOPT_URL, "https://your-URL/anything");
$return=curl_exec($ch);
// the important part
$destination ="files.xlsx";
if (file_exists( $destination)) {
unlink( $destination);
}
$file=fopen($destination,"w+");
fputs($file,$return);
if(fclose($file))
{
echo "downloaded";
}
curl_close($ch);

Related

How to POST a file with PHP cURL with progress

I'm currently running a personal file upload site, which roughly does the following;
Select files to upload through the website
PHP uploads and zips the files together
The zipped file gets sent via cURL to a remote storage server
A URL is presented to share the files
I'm having an issue with step 3. I currently have a progress bar while the file uploads to the website and I'd like another percentage to show the cURL transfer to the storage server. I've seen that you that can use CURLOPT_PROGRESSFUNCTION - but I can not get it working, after a lot of searching.
My current code is as below;
// Prepare the POST data
$data = array(
'file' => new CurlFile('/path/to/local.zip', 'application/zip', 'uploaded_archive.zip'),
'upload_folder' => 'folder_name'
);
// Send the POST
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://example.com/upload.php");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0");
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_NOPROGRESS, false);
curl_setopt($ch, CURLOPT_PROGRESSFUNCTION, 'progressCall');
curl_setopt($ch, CURLOPT_BUFFERSIZE, 128);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 160);
curl_setopt($ch, CURLOPT_TIMEOUT, 160);
$res = curl_exec($ch);
curl_close($ch);
This calls "progressCall" with the progress, which looks like the below to test;
// Function to save the output of progress from CURLOPT_PROGRESSFUNCTION
function progressCall($ch, $dltotal, $dlnow, $uptotal, $upnow) {
$progress = "$ch/$dltotal/$dlnow/$uptotal/$upnow";
file_put_contents('tmp/curlupload.txt', 'PERC: ' . $progress . "\n", FILE_APPEND);
}
The problem is, I get the output below;
https://gist.github.com/ialexpw/6160e32495d857cc0d8984373f3808ae
PERC: Resource id #15/0/0/0/65532
PERC: Resource id #15/0/0/0/65532
PERC: Resource id #15/0/0/0/131064
PERC: Resource id #15/0/0/0/131064
PERC: Resource id #15/0/0/0/196596
This seems to show the uploaded amount going up (which is OK), but has 0 for the total upload amount. I've tried a lot of example online, but I can not work out why the total amount to be uploaded is always 0. I tried setting a Content-Length, although it's the same.
Any help would be appreciated if something is wrong with the above.
Thanks!
Hello sorry for the response 2 years after, but just working on it !
Possible for Upload, but as mentionned we don't have in this example the total file size.
The easiest way if possible is to take the file size on the callback
function, with a global for example :
function progressCall($ch, $dltotal, $dlnow, $uptotal, $upnow) {
// You will get this before with | $fsize = filesize($thefile);
global $fsize;
$percent = round($upnow / $fsize * 100);
echo $percent . ' % downloaded';
}

JIRA API attachment names contain the whole paths of the posted files

I have been working with Jira API and have seen inconsistent results for my requests. Sometimes it works and sometimes it doesn't. Last week I was able to post attachments to issues just fine, but now an old problem occurred: the names of the attachments contain the whole path of the posted file, hence the attachments can't be opened. I use json representation to post files:
$array = array("file"=>"#filename");
json_encode($array);
...
This gets the file posted but the problem is when it's posted the file names in JIRA are like this:
/var/www/user/text.text
Needless to say it can't be opened in JIRA. I had this problem before, then it suddenly disappeared, now it occurred again. I don't really get it. By the way I am not using curl for this request even though it might be recommended in the documentation.
I realize this question is somewhat old but I had a similar problem. It seems Jira doesn't necessarily trim the filename as expected. I was able to fix it with the following. If you're using PHP >= 5.5.0:
$url = "http://example.com/jira/rest/api/2/issue/123456/attachments";
$headers = array("X-Atlassian-Token: nocheck");
$attachmentPath = "/full/path/to/file";
$filename = array_pop(explode('/', $attachmentPath));
$cfile = new CURLFile($attachmentPath);
$cfile->setPostFilename($filename);
$data = array('file'=>$cfile);
$ch = curl_init();
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERPWD, "$user:$pass");
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
$result = curl_exec($ch);
$ch_error = curl_error($ch);
if ($ch_error){
echo "cURL Error: $ch_error"; exit();
} else {
print_r($result);
}
For PHP <5.5.0 but > 5.2.10 (see this bug):
$data = array('file'=>"#{$attachmentPath};filename={$filename}");
Yes, I filed an issue on this at https://jira.atlassian.com/browse/JRA-30765
Adding attachments to JIRA by REST is sadly not as useful as it could be.
Interesting that the problem went away - perhaps you were running your script from a different location?

Can't get to work php script using file_get_content or cURL

I have this issue please.
I use this script here to send an automatic SMS:
<?php
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$returned_content = get_data('http://url.goes.here:8080/bulksms/bulksms?username=user&passwd=password&type=0&dlr=1&destination=phone&source=FLM&message=hello');
?>
The issue is, if i submit the URL directly i get something like:
1701|355662080090|a2406d38-1baf-42a2-a1a5-e5798859e400
And a sms arrives to my phone, but if i use the above methods i won't get the sms..
Anyone can suggest me why is so?
Thanks..
If you DO have allow_url_fopen activated in your php.ini, then you should look into changing the user agent for the request (use cURL).
Many SMS gateways use user agent filtering when processing their requests...
Correct me if I am wrong, but I suppose you use RouteSMS, right?

How CURL Login with Captcha and Session

define('COOKIE', './cookie.txt');
define('MYURL', 'https://register.pandi.or.id/main');
function getUrl($url, $method='', $vars='', $open=false) {
$agents = 'Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16';
$header_array = array(
"Via: 1.1 register.pandi.or.id",
"Keep-Alive: timeout=15,max=100",
);
static $cookie = false;
if (!$cookie) {
$cookie = session_name() . '=' . time();
}
$referer = 'https://register.pandi.or.id/main';
$ch = curl_init();
if ($method == 'post') {
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "$vars");
}
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $header_array);
curl_setopt($ch, CURLOPT_USERAGENT, $agents);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 5);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_REFERER, $referer);
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIE);
curl_setopt($ch, CURLOPT_COOKIEFILE, COOKIE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
$buffer = curl_exec($ch);
if (curl_errno($ch)) {
echo "error " . curl_error($ch);
die;
}
curl_close($ch);
return $buffer;
}
function save_captcha($ch) {
$agents = 'Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.204 Safari/534.16';
$url = "https://register.pandi.or.id/jcaptcha";
static $cookie = false;
if (!$cookie) {
$cookie = session_name() . '=' . time();
}
$ch = curl_init(); // Initialize a CURL session.
curl_setopt($ch, CURLOPT_URL, $url); // Pass URL as parameter.
curl_setopt($ch, CURLOPT_USERAGENT, $agents);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIE);
curl_setopt($ch, CURLOPT_COOKIEFILE, COOKIE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return stream contents.
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1); // We'll be returning this
$data = curl_exec($ch); // // Grab the jpg and save the contents in the
curl_close($ch); // close curl resource, and free up system resources.
$captcha_tmpfile = './captcha/captcha-' . rand(1000, 10000) . '.jpg';
$fp = fopen($tmpdir . $captcha_tmpfile, 'w');
fwrite($fp, $data);
fclose($fp);
return $captcha_tmpfile;
}
if (isset($_POST['captcha'])) {
$id = "yudohartono";
$pw = "mypassword";
$postfields = "navigation=authenticate&login-type=registrant&username=" . $id . "&password=" . $pw . "&captcha_response=" . $_POST['captcha'] . "press=login";
$url = "https://register.pandi.or.id/main";
$result = getUrl($url, 'post', $postfields);
echo $result;
} else {
$open = getUrl('https://register.pandi.or.id/main', '', '', true);
$captcha = save_captcha($ch);
$fp = fopen($tmpdir . "/cookie12.txt", 'r');
$a = fread($fp, filesize($tmpdir . "/cookie12.txt"));
fclose($fp);
<form action='' method='POST'>
<img src='<?php echo $captcha ?>' />
<input type='text' name='captcha' value=''>
<input type='submit' value='proses'>
</form>";
if (!is_readable('cookie.txt') && !is_writable('cookie.txt')) {
echo "cookie fail to read";
chmod('../pandi/', '777');
}
}
this cookie.txt
# Netscape HTTP Cookie File
# http://curl.haxx.se/rfc/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.
register.pandi.or.id FALSE / FALSE 0 JSESSIONID 05CA8241C5B76F70F364CA244E4D1DF4
after i submit form just display
HTTP/1.1 200 OK Date: Wed, 27 Apr 2011 07:38:08 GMT Server: Apache-Coyote/1.1 X-Powered-By: Servlet 2.4; Tomcat-5.0.28/JBoss-4.0.0 (build: CVSTag=JBoss_4_0_0 date=200409200418) Content-Length: 0 Via: 1.1 register.pandi.or.id Content-Type: text/plain X-Pad: avoid browser bug
if not error "Captcha invalid"
always failed login to pandi
what wrong in my script?
I'm not want to Break Captcha but i want display captcha and user input captcha from my web page, so user can registrar domain dotID from my web automaticaly
A captcha is intended to differentiate between humans and robots (programs). Seems like you are trying to log in with a program. The captcha seems to do its job :).
I don't see a legal way around.
It happens because,
You took your captcha image from first getURL (ie first curl_exec) and processed the captcha but to submit your captcha you are requested getURL (ie again curl_exec) which means to a new page with a new captcha again.
So you are placing the old captcha and putting it in the new captcha. I'm having the same problem & resolved it.
Captcha is a dynamic image created by the server when you hit the page. It will keep changing, you must extract the captcha from the page and then parse it and then submit your page for a login. Captcha will keep changing as and when the page is triggered to load!
Using a headless browsing solution this is possible. ie: zombie.js coffee.js on Node.. Also it may be possible to extract the "image" from the captcha and, using image recognition, "read" the image and convert it to text, which is then posted with the form.
As of today, the only surefire method to "trick" a captcha is to use headless browsing.
Yes, Andro Selva is right. On the second request it gives new captcha. Once it loads captcha with getUrl function and the second load is from the save_captcha function, so this are 2 different images.
It must do something like this:
Download the captcha image before close the curl and before post and tell the script to wait untill you provide captcha answer - I will use preg_match. It will require some javascript as well.
If the captcha image is generated from javascript, you need to execute this javascript with the same cookie or token. In this situation, the easier solution is to record the headers with e.g. livehttpheaders addon for mozila ffox.
With PHP I do not know how to do it, you have to get the captcha and find a way to solve it. It has a lot of algorithms to do it for you, but if you want to use java, I already hacked the source code from this link to get the code to solve the captcha and it works very well for a lot of captcha systems.
So, you could try to implement your own captcha solver, that will take a lot of time, try to find an existing implementation for PHP, or, IMHO, the best option, to use the JDownloader code base.

Updating Twitter background via API

I'm having a little trouble updating backgrounds via Twitter's API.
$target_url = "http://www.google.com/logos/11th_birthday.gif";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Expect:'));
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL,$target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
$content = $to->OAuthRequest('http://twitter.com/account/update_profile_background_image.xml', array('profile_background_image_url' => $html), 'POST');
When I try to pull the raw data via cURL or file_get_contents, I get this...
Expectation Failed The expectation given in the Expect request-header
field could not be met by this server.
The client sent
Expect: 100-continue but we only allow the 100-continue expectation.
OK, you can't direct Twitter to a URL, it won't accept that. Looking around a bit I've found that the best way is to download the image to the local server and then pass that over to Twitter almost like a form upload.
Try the following code, and let me know what you get.
// The URL from an external (or internal) server we want to grab
$url = 'http://www.google.com/logos/11th_birthday.gif';
// We need to grab the file name of this, unless you want to create your own
$filename = basename($url);
// This is where we'll be saving our new file to. Replace LOCALPATH with the path you would like to save the file to, i.e. www/home/content/my_directory/
$newfilename = 'LOCALPATH' . $filename;
// Copy it over, PHP will handle the overheads.
copy($url, $newfilename);
// Now it's OAuth time... fingers crossed!
$content = $to->OAuthRequest('http://twitter.com/account/update_profile_background_image.xml', array('profile_background_image_url' => $newfilename), 'POST');
// Echo something so you know it went through
print "done";
Well, given the error message, it sounds like you should load the URL's contents yourself, and post the data directly. Have you tried that?

Categories