Download a file with resume support using cURL - PHP

Does anyone know how to download a file using cURL and PHP in a way that supports resuming? A link to a tutorial or a code demonstration would be good. I'm using the code below, but it doesn't support resume, and I don't know how to handle the headers. Any help will be appreciated.
$start_range = 0;
$url = 'URL to remote file'; // placeholder: put the URL of the remote file here
$end_range = $filesize; // total size of the remote file in bytes
while ($start_range <= $end_range) {
    // Request 10,000,000 bytes per iteration; the last range is left open-ended.
    if (($start_range + 9999999) > $end_range) $range = $start_range.'-';
    else $range = $start_range.'-'.($start_range + 9999999);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
    curl_setopt($ch, CURLOPT_REFERER, $refurl);
    curl_setopt($ch, CURLOPT_RANGE, $range);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1');
    // Without CURLOPT_RETURNTRANSFER, curl_exec() prints the chunk directly.
    curl_exec($ch);
    curl_close($ch);
    $start_range += 10000000;
    // Flush PHP's output buffer first, then the system buffer.
    ob_flush();
    flush();
}
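One common way to get resume behaviour is to check how many bytes are already on disk and ask the server only for the rest. The following is a minimal sketch of that idea, not a definitive implementation. It assumes the server honours Range requests (answers with 206 Partial Content), and the function names resume_offset, range_from and download_with_resume are made up for this illustration.

```php
<?php
// Hypothetical helper: number of bytes of the partial download already on disk.
function resume_offset(string $path): int
{
    clearstatcache(true, $path);
    return is_file($path) ? (int) filesize($path) : 0;
}

// Hypothetical helper: CURLOPT_RANGE value meaning "from $offset to the end".
function range_from(int $offset): string
{
    return $offset . '-';
}

// Hypothetical helper: append the missing part of $url to $path.
function download_with_resume(string $url, string $path): bool
{
    $offset = resume_offset($path);
    $fp = fopen($path, 'ab'); // append mode keeps the bytes we already have
    if ($fp === false) {
        return false;
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);           // write the body straight to the file
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);   // treat HTTP status >= 400 as a failure
    if ($offset > 0) {
        curl_setopt($ch, CURLOPT_RANGE, range_from($offset));
    }

    $ok = curl_exec($ch) === true;
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    fclose($fp);

    // A resumed request must be answered with 206. A plain 200 means the server
    // ignored the Range header and sent the whole file again, so the appended
    // file is no longer valid and the caller should delete it and start over.
    if ($ok && $offset > 0 && $status !== 206) {
        return false;
    }
    return $ok;
}
```

Calling download_with_resume($url, '/path/to/partial.file') again after an interrupted transfer continues where the file on disk ends. If the file is already complete, the server will typically reject the range, and the function returns false.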

Related

Grab image url using php DOM

I have a small problem with PHP DOM.
<?php
$url = "http://www.dogpile.com/search/images?q=" . rawurlencode($_GET['q']);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_MAXREDIRS, 3);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13");
curl_setopt($ch, CURLOPT_HEADER, FALSE);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$output = curl_exec($ch);
curl_close($ch);
$dom = new DOMDocument;
$dom->loadHTML($output);
print_r($dom->getElementById('imageResults'));
?>
I'm trying to get the thumbnail and full image URLs from the results, but I cannot grab the required info...
http://simplehtmldom.sourceforge.net/manual.html
Download the library, read the manual, and use it!
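If you would rather stay with PHP's built-in DOM extension instead of the library linked above, DOMXPath can pull out the links. This is only a sketch: the markup in $sample is invented (each result is assumed to be an <a> pointing at the full image and wrapping an <img> thumbnail), and extract_image_links is a hypothetical name. The real page will need a different query.

```php
<?php
// Hypothetical helper: collect full-image and thumbnail URLs inside a container.
// Assumes $containerId contains no double quotes (it is pasted into the XPath).
function extract_image_links(string $html, string $containerId): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Every <a> with an <img> child, anywhere below the element with that id.
    $query = sprintf('//*[@id="%s"]//a[img]', $containerId);

    $links = [];
    foreach ($xpath->query($query) as $anchor) {
        $img = $anchor->getElementsByTagName('img')->item(0);
        $links[] = [
            'full'      => $anchor->getAttribute('href'),
            'thumbnail' => $img->getAttribute('src'),
        ];
    }
    return $links;
}

// Invented sample markup, only to show the call.
$sample = '<div id="imageResults">'
        . '<a href="http://example.com/full.jpg"><img src="http://example.com/thumb.jpg"></a>'
        . '</div>';
print_r(extract_image_links($sample, 'imageResults'));
```

An XPath query is used here rather than getElementById() because it selects the container and the links inside it in one step.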

Amazon Blocks cURL Request?

I am trying to use PHP cURL to fetch an Amazon web page, but I get
HTTP/1.1 503 Service Temporarily Unavailable instead. Is Amazon blocking cURL?
http://www.amazon.com/gp/offer-listing/B003B7Q5YY/
<?php
function get_html_content($url) {
// fake user agent
$userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch,CURLOPT_COOKIEFILE,'cookies.txt');
curl_setopt($ch,CURLOPT_COOKIEJAR,'cookies.txt');
$string = curl_exec($ch);
curl_close($ch);
return $string;
}
echo get_html_content("http://www.amazon.com/gp/offer-listing/B003B7Q5YY");
?>
I use a simple request:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $offers_page);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5');
$html = curl_exec($ch);
curl_close($ch);
But I have another problem: if you send a lot of queries to Amazon, they start sending a 500 page back to you.

How do I change the header location of a cURL content scraper

I am scraping content from a URL that is hosted in the UK using cURL. When I view the site in my browser from the US, it shows the product pricing in dollars, but when I use cURL to retrieve the content, it comes back in euros. I need it to return US dollars, as if I were viewing it from a browser in the US. Below is the code I am using:
function LoadCURLPage($url, $agent = "Mozilla/5.0 (Windows; U; Windows NT 5.0; en-us; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)",
$cookie = '', $referer = '', $post_fields = '', $return_transfer = 1,
$follow_location = 1, $ssl = '', $curlopt_header = 1)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
if($ssl)
{
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
}
curl_setopt ($ch, CURLOPT_HEADER, $curlopt_header);
curl_setopt ($ch, CURLOPT_HTTPHEADER,array('User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16', 'Accept-language: en-us,en;q=0.7,bn;q=0.3', 'Accept-charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7'));
if($agent)
{
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
}
if($post_fields)
{
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_fields);
}
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
if($referer)
{
curl_setopt($ch, CURLOPT_REFERER, $referer);
}
if($cookie)
{
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
}
$result = curl_exec ($ch);
curl_close ($ch);
return $result;
}
// the url
$url = "http://us.asos.com/Adidas-Honey-Silver-Mid-Sneakers/ysrqb/?iid=2212284";
//the function
echo LoadCURLPage($url);
It's in a cookie. So either visit the page that sets that cookie, or edit your cookiejar file.
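As a rough illustration of the first option, a cookie can also be sent explicitly with CURLOPT_COOKIE instead of going through the cookie jar file. The cookie name and value below (currency=USD) are pure assumptions. The real name has to be read from the cookies your browser holds for the site, and build_cookie_header and fetch_with_cookies are hypothetical helper names.

```php
<?php
// Hypothetical helper: turn ['name' => 'value', ...] into "name=value; name2=value2",
// the format CURLOPT_COOKIE expects. Values are passed through unchanged.
function build_cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

// Hypothetical helper: fetch a page while sending the given cookies.
function fetch_with_cookies(string $url, array $cookies)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIE, build_cookie_header($cookies));
    return curl_exec($ch); // page body as a string, or false on failure
}

// Assumed cookie name and value, for illustration only.
echo build_cookie_header(['currency' => 'USD']), "\n";
```

With the real cookie name in hand, the call would look like fetch_with_cookies($url, ['currency' => 'USD']).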

Embedded player can't connect to the network - cURL

Using cURL, I've made a script that logs in to a page that provides free streaming and then requests the subpage with the chosen stream to watch.
Everything works fine while the script is running on localhost (I'm using XAMPP), but when I put it on my web server it says that it can't connect to the network. The only thing that looks different is the cookie file: on the web server it has no newlines (\n), so everything is on one line.
How do I deal with this? This is the class I use to connect to the page:
class openTV {
public $channel;
function __construct($channel) {
$this -> channel = $channel;
}
function openChannel() {
$login_email = 'mail#gmail.com';
$login_pass = 'pass';
$fp = fopen("cookie.txt", "w");
fclose($fp);
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://strona/user/login');
curl_setopt($ch, CURLOPT_POSTFIELDS,'email='.urlencode($login_email).'&password='.urlencode($login_pass));
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_REFERER, "http://strona/user/login");
$page = curl_exec($ch);
curl_setopt($ch, CURLOPT_URL, $this->channel);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.3) Gecko/20070309 Firefox/2.0.0.3");
curl_setopt($ch, CURLOPT_REFERER, $this->channel);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt');
curl_setopt($ch, CURLOPT_POST, 0);
$info = curl_getinfo ($ch);
$page = curl_exec($ch);
preg_match('/session_token=\[[a-zA-Z0-9]{8}\]/', $page, $matches);
$return['token'] = substr($matches[0], 31, 8);
preg_match('/<object(.*)>[.\s\S]*<\/object>/', $page, $matches);
$return['player'] = $matches[0];
//$return['player'] = $page;
$return['channel'] = $this->channel;
return $return;
}
}
You're using http://strona/ as your host.
Your server probably uses a different configuration that doesn't try to append .example.com when it fails to find the host directly. On Linux, that can be seen in /etc/resolv.conf:
search example.com
nameserver 1.2.3.4
Using the full domain name (http://strona.example.com) or the IP address should fix the problem.
If it does not, check whether you can ping (or otherwise connect to) the target host from the server; it may be a networking issue.
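To check the name resolution from PHP itself on both machines, something like the sketch below could help. It relies on gethostbyname() returning the host name unchanged when the lookup fails. host_resolves is a hypothetical helper, and the optional $resolver argument exists only so that the lookup can be swapped out.

```php
<?php
// Hypothetical helper: true if $host is an IP address or resolves to one.
function host_resolves(string $host, ?callable $resolver = null): bool
{
    // An IP address needs no lookup at all.
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return true;
    }

    $resolver = $resolver ?? 'gethostbyname';

    // gethostbyname() returns the unmodified host name on failure.
    return $resolver($host) !== $host;
}
```

Running var_dump(host_resolves('strona')) on localhost and on the web server shows whether the short name is known in both places.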

Empty the RAM after a cURL call

I'm downloading large files with cURL, but I don't think the buffer is being emptied, because RAM usage keeps increasing until it reaches 100%. Here is the code that I use.
If I close and reopen cURL, will that help?
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
while($start_range <= $end_range) {
if(($start_range + 999999) > $end_range) $range = $start_range.'-';
else $range = $start_range.'-'.($start_range + 999999);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch,CURLOPT_HTTPHEADER,array("ETag: $rddash"));
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_RANGE,$range);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1');
if ($tmp = curl_exec($ch)) $start_range +=1000000;
echo $tmp;
flush();
}
curl_close($ch);
curl_close() takes a cURL resource as its only parameter,
closes the cURL session, and frees the associated memory.
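Separately from closing the handle, a different approach is to avoid holding each chunk in a PHP variable at all. With CURLOPT_WRITEFUNCTION, cURL hands the body to a callback in small pieces as they arrive. The sketch below assumes that this buffering is what drives the memory growth, which the question does not confirm, and stream_url is a hypothetical name.

```php
<?php
// Hypothetical helper: pass the response body to $sink piece by piece
// instead of returning it as one large string.
function stream_url(string $url, callable $sink): bool
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, string $data) use ($sink): int {
        $sink($data);
        // cURL expects the number of bytes handled; any other value aborts the transfer.
        return strlen($data);
    });

    // Nothing is buffered here, so curl_exec() only reports success or failure.
    return curl_exec($ch) === true;
}
```

To send the data on to the client as in the question, the sink could be function ($piece) { echo $piece; flush(); }.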
