I'm downloading large files with curl, but I don't think the buffer is being emptied, because RAM usage keeps increasing until it reaches 100%. Here is the code I use.
Would closing and reopening curl help?
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
while($start_range <= $end_range) {
if(($start_range + 999999) > $end_range) $range = $start_range.'-';
else $range = $start_range.'-'.($start_range + 999999);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch,CURLOPT_HTTPHEADER,array("ETag: $rddash"));
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_RANGE,$range);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1');
if ($tmp = curl_exec($ch)) $start_range +=1000000;
echo $tmp;
flush();
}
curl_close($ch);
curl_close() takes a cURL resource as its only parameter, closes the cURL session, and then frees the associated memory.
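Apart from when the handle is closed, memory may also be growing because CURLOPT_RETURNTRANSFER makes curl_exec() return every chunk as a PHP string before it is echoed. A minimal sketch, assuming the goal is only to relay bytes to an output stream: a write callback forwards each piece as it arrives, so the script never holds a whole chunk. The names makeStreamWriter and relayRange are hypothetical and not part of the original code.

```php
<?php
// Hypothetical helper: returns a callback suitable for CURLOPT_WRITEFUNCTION
// that forwards each received piece straight to the stream $out.
function makeStreamWriter($out): callable
{
    return function ($ch, string $data) use ($out): int {
        $written = fwrite($out, $data);
        // cURL aborts the transfer unless the callback returns the number
        // of bytes it was handed.
        return $written === false ? 0 : $written;
    };
}

// Hypothetical wrapper: fetch one byte range of $url and stream it to $out.
function relayRange(string $url, string $range, $out): bool
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RANGE, $range);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, makeStreamWriter($out));
    $ok = curl_exec($ch); // true on success, because nothing is returned as a string
    curl_close($ch);
    return $ok !== false;
}
```

It could be called as relayRange($url, '0-999999', fopen('php://output', 'wb')) inside the existing loop, in place of the curl_exec()/echo pair.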
Related
I am scraping content from a URL that is hosted in the UK using curl. When I view the site in my browser from the US it shows the product pricing in dollars, but when I use curl to retrieve the content it returns prices in Euros. I need it to return US dollars, as if you were viewing it from a browser in the US. Below is the code I am using:
function LoadCURLPage($url, $agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-us; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)",
$cookie = '', $referer = '', $post_fields = '', $return_transfer = 1,
$follow_location = 1, $ssl = '', $curlopt_header = 1)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
if($ssl)
{
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
}
curl_setopt ($ch, CURLOPT_HEADER, $curlopt_header);
curl_setopt ($ch, CURLOPT_HTTPHEADER,array('User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16', 'Accept-language: en-us,en;q=0.7,bn;q=0.3', 'Accept-charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7'));
if($agent)
{
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
}
if($post_fields)
{
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
}
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
if($referer)
{
curl_setopt($ch, CURLOPT_REFERER, $referer);
}
if($cookie)
{
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
}
$result = curl_exec ($ch);
curl_close ($ch);
return $result;
}
// the url
$url = "http://us.asos.com/Adidas-Honey-Silver-Mid-Sneakers/ysrqb/?iid=2212284";
//the function
echo LoadCURLPage($url);
The currency is most likely stored in a cookie, so either visit the page that sets that cookie first, or edit your cookie jar file.
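As a rough illustration of the cookie approach, a cookie can also be sent directly with CURLOPT_COOKIE instead of editing the jar file. This is only a sketch: the helper names are hypothetical, and the actual cookie name and value the site uses for currency are not known here, so they have to be read from the browser's stored cookies.

```php
<?php
// Hypothetical helper: turn name => value pairs into the "a=1; b=2" form
// that CURLOPT_COOKIE expects. Values are sent as given, without encoding.
function buildCookieString(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Hypothetical wrapper: fetch $url while sending the given cookies.
function fetchWithCookies(string $url, array $cookies)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIE, buildCookieString($cookies));
    $body = curl_exec($ch);
    curl_close($ch);
    return $body; // string on success, false on failure
}
```

A call might look like fetchWithCookies($url, ['currency' => 'USD']), where 'currency' is a placeholder name rather than the site's real cookie.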
Does anyone know how to download a file using curl and PHP with support for resuming? A link to a tutorial or a code demonstration would be good. I'm using the code below, but it doesn't support resume, and I don't know how to handle headers. Any help will be appreciated.
$start_range = 0;
$url = 'URL to remote file'; // placeholder: set this to the remote file's URL
$end_range = $filesize;
while($start_range <= $end_range) {
if(($start_range + 9999999) > $end_range) $range = $start_range.'-';
else $range = $start_range.'-'.($start_range + 9999999);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_REFERER, $refurl);
curl_setopt($ch, CURLOPT_RANGE,$range);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1');
curl_exec($ch);
curl_close($ch);
$start_range +=10000000;
ob_flush(); // push PHP's output buffer first
flush();    // then flush the system write buffers
}
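One possible way to get resume behaviour, sketched under the assumption that the server honors Range requests: use the size of the partial file already on disk as the starting offset and let cURL append the rest with CURLOPT_RESUME_FROM. The function names resumeOffset and resumeDownload are hypothetical.

```php
<?php
// Hypothetical helper: byte offset to resume from, i.e. how much of the
// file is already on disk (0 when nothing has been saved yet).
function resumeOffset(string $path): int
{
    clearstatcache(true, $path); // avoid a stale cached file size
    return is_file($path) ? filesize($path) : 0;
}

// Hypothetical wrapper: append the missing tail of $url to $path.
function resumeDownload(string $url, string $path): bool
{
    $offset = resumeOffset($path);
    $fp = fopen($path, 'ab'); // append mode, so existing bytes are kept
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);            // stream the body to disk
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP >= 400 as failure
    curl_setopt($ch, CURLOPT_RESUME_FROM, $offset); // request bytes from $offset onward
    $ok = curl_exec($ch);
    curl_close($ch);
    fclose($fp);
    return $ok !== false;
}
```

Calling resumeDownload($url, 'downloads/file.avi') again after an interrupted run would continue from wherever the previous run stopped, without needing to know the total file size in advance.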
I've been playing with this curl Facebook login script for a while, just trying to get to grips with some of curl's features, but I can't seem to get the cookies to register:
PHP script:
function facebookLogin(){
$login_email = 'email';
$login_pass = 'pass';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.facebook.com/login.php');
curl_setopt($ch, CURLOPT_POSTFIELDS,'email='.urlencode($login_email).'&pass='.urlencode($login_pass).'&login=Login');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt");
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookies.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_REFERER, "http://www.facebook.com");
$page = curl_exec($ch);
echo $page;
}
I have a text file called cookies.txt in the same directory as the script, but after running the script nothing is written to the file, so no cookies are saved. This is a big issue when exploring other pages on the same website, because you have to keep logging in.
Where am I going wrong?
OK, it turns out the cookies are registered even if the cookies.txt file is empty, but you need to make sure you reference this file when you request other parts of the site, e.g.:
function facebookGoToMessages(){
facebookLogin();
$ch = curl_init ("http://www.facebook.com/messages");
curl_setopt ($ch, CURLOPT_COOKIEFILE, "cookies.txt");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_REFERER, "http://www.facebook.com");
$page = curl_exec ($ch);
echo $page;
}
I am writing a script to download files from a password-protected members area. I have it working right now by using one curl call to log in and then download. What I would like instead is to have one script log in and save the cookie, and then have another script use that cookie to download the file. I am not sure whether this is possible.
Here is my working code:
$cookie_file_path = "downloads/cookie.txt";
$fp = fopen($cookie_file_path, "w");
fclose($fp);
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_URL, $loginUrl);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $loginPostInfo);
curl_exec($ch);
// hardcode some known data
$downloadSize = 244626770;
$chuckSize = 1024*2048;
$filePath = "downloads/file.avi";
$file = fopen($filePath, "w");
$downloaded = 0;
$startTime = microtime(true);
while ($downloaded < $downloadSize) {
// DOWNLOAD
curl_setopt($ch, CURLOPT_RANGE, $downloaded."-".($downloaded + $chuckSize - 1));
curl_setopt($ch, CURLOPT_URL, $downloadUrl);
$result = curl_exec($ch);
$nowTime = microtime(true);
fwrite($file, $result);
echo "\n\nprogress: ".$downloaded."/".$downloadSize." - %".(round($downloaded / $downloadSize, 4) * 100);
$downloaded += $chuckSize;
// calculate kbps
$totalTime = $nowTime - $startTime;
$kbps = $downloaded / $totalTime;
echo "\ndownloaded: ".$downloaded." bytes";
echo "\ntime: ".round($totalTime, 2);
echo "\nkbps: ".(round($kbps / 1024, 2));
}
fclose($file);
curl_close($ch);
So is it possible to close the curl handle after the login curl_exec and then open a new curl handle to download the file, using the cookie I saved during the login part?
Yes it's possible.
CURLOPT_COOKIEJAR is the write path for cookies, while CURLOPT_COOKIEFILE is the read path for cookies. If you provide CURLOPT_COOKIEFILE with the same path as you did with CURLOPT_COOKIEJAR, cURL will persist the cookies across requests:
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
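To illustrate how this could be split across two scripts, here is a minimal sketch with hypothetical function names (cookieOptions, login, downloadWithCookies). It assumes both scripts can read and write the same cookie file path.

```php
<?php
// Hypothetical helper: options that make a handle both load cookies from
// and save cookies to the same file.
function cookieOptions(string $cookiePath): array
{
    return [
        CURLOPT_COOKIEFILE => $cookiePath, // read existing cookies
        CURLOPT_COOKIEJAR  => $cookiePath, // write cookies when the handle is released
    ];
}

// Script 1 (hypothetical): log in; the jar is written once $ch is released.
function login(string $loginUrl, string $postFields, string $cookiePath): bool
{
    $ch = curl_init($loginUrl);
    curl_setopt_array($ch, cookieOptions($cookiePath));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $postFields);
    $ok = curl_exec($ch) !== false;
    curl_close($ch);
    return $ok;
}

// Script 2 (hypothetical): a fresh handle that reuses the saved cookies
// and streams the download to $target.
function downloadWithCookies(string $url, string $cookiePath, string $target): bool
{
    $fp = fopen($target, 'wb');
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt_array($ch, cookieOptions($cookiePath));
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    $ok = curl_exec($ch) !== false;
    curl_close($ch);
    fclose($fp);
    return $ok;
}
```

The first script would call login($loginUrl, $loginPostInfo, $cookie_file_path), and the second would later call downloadWithCookies($downloadUrl, $cookie_file_path, $filePath) with the same cookie path.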
On ImpressPages I've done it this way:
//initial request with login data
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/login.php');
curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/32.0.1700.107 Chrome/32.0.1700.107 Safari/537.36');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, "username=XXXXX&password=XXXXX");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie-name'); // could be empty, but that causes problems on some hosts
curl_setopt($ch, CURLOPT_COOKIEFILE, '/var/www/ip4.x/file/tmp'); // could be empty, but that causes problems on some hosts
$answer = curl_exec($ch);
if (curl_error($ch)) {
echo curl_error($ch);
}
//another request preserving the session
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/profile');
curl_setopt($ch, CURLOPT_POST, false);
curl_setopt($ch, CURLOPT_POSTFIELDS, "");
$answer = curl_exec($ch);
if (curl_error($ch)) {
echo curl_error($ch);
}
Yes, please look at CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE. I see you already use CURLOPT_COOKIEJAR, so you probably only need to dive into CURLOPT_COOKIEFILE.
Does anybody know why cURL under PHP 5 could be so slow that it fails even with a 45 s timeout, when downloading a file of a few KB from a very fast server?
The code is here as requested (although I raised the timeouts even more so the script would not fail during execution, and changed the user agent from the initial Chrome string to Mozilla/4.0):
$ch = curl_init('http://www.somesite.com/' . $key);
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.somesite.com/somereferer/');
// curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/530.5 (KHTML, like Gecko) Chrome/2.0.172.39 Safari/530.5');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0');
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
curl_setopt($ch, CURLOPT_DNS_CACHE_TIMEOUT, 600);
Hmm, it could be a few things. Verbose output may reveal an error of some kind:
curl_setopt($ch, CURLINFO_HEADER_OUT, true);
curl_setopt($ch, CURLOPT_VERBOSE, true); // some output will go to stderr / error_log
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
$response = curl_exec($ch);
$errStr = curl_error($ch);
$errNum = curl_errno($ch);
$head = curl_getinfo($ch, CURLINFO_HEADER_OUT);
$ci = curl_getinfo($ch);
print_r(array($head, $errStr, $errNum, $ci));
Sometimes the user agent changes how a site responds, so you may need to do something like:
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.001 (windows; U; NT4.0; en-US; rv:1.0) Gecko/25250101');
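The array returned by curl_getinfo($ch) in the diagnostic snippet also carries timing fields that can hint at where the delay comes from, for example a slow DNS lookup versus a slow server response. The helper below is a hypothetical sketch, not part of the original answer. It assumes the cumulative *_time fields that curl_getinfo() reports and ignores complications such as redirects.

```php
<?php
// Hypothetical helper: given the array returned by curl_getinfo($ch),
// name the phase that took the longest. The *_time fields are cumulative
// seconds since the start of the request, so each phase is a difference.
function slowestPhase(array $info): string
{
    $phases = [
        'dns'      => $info['namelookup_time'],
        'connect'  => $info['connect_time'] - $info['namelookup_time'],
        'tls'      => $info['pretransfer_time'] - $info['connect_time'],
        'wait'     => $info['starttransfer_time'] - $info['pretransfer_time'],
        'transfer' => $info['total_time'] - $info['starttransfer_time'],
    ];
    arsort($phases); // largest duration first, keys preserved
    return array_keys($phases)[0];
}
```

With the snippet above, echo slowestPhase($ci); would print one of dns, connect, tls, wait, or transfer. If the answer is dns or connect, a short CURLOPT_CONNECTTIMEOUT at least makes the failure show up quickly.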
When I set a connect timeout, I get a faster response.
Include this option:
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 1);