I'm trying to get a page content with cURL or file_get_content. On many websites it's working but i'm trying to do that on a friend's server and it's not.
I think there is a protection with header or things like that. I get the following error code : 401 forbidden. If i try to reach the same page with a normal browser it works.
Here is my code for the file_get_contents function :
$homepage = file_get_contents('http://192.168.1.3');
echo $homepage; // just a test to see if the page is loaded, it's not.
if (preg_match("/my regex/", $homepage)) {
// ... some code
}
I also tryed with cURL :
$url = urlencode('http://192.168.1.3');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0');
$result = curl_exec($ch) or die("Not working");
curl_close($ch);
echo $result; // not working ..
Nothing works, maybe i should add more args to curl_setopt ...
Thanks.
PS : If i try with linux (wget) i get an error, but if i try with aria2c it's working.
HTTP Status 401 means that UNAUTHORIZED. You need send the server with username and passwd。
With file_get_contents, you add the second param . That's a context-steam, which you can set header info.
You'd better to use curl for file_get_contents intend to access local file, as it's a block function. Add the option as following, it's a basic authorize.
curl_setopt($ch,CURLOPT_USERPWD,"my_username:my_password");
try this update with useragent
<?php
$curlSession = curl_init();
curl_setopt($curlSession, CURLOPT_URL, 'http://192.168.1.3/');
curl_setopt($curlSession,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($curlSession, CURLOPT_BINARYTRANSFER, true);
curl_setopt($curlSession, CURLOPT_RETURNTRANSFER, true);
$homepage = curl_exec($curlSession);
curl_close($curlSession);
echo $homepage ;
?>
if still getting blank page you have to install this add-on on firefox and see the "request-headers" and "response-headers"
Related
I have an extremely simple script:
<?php
$jsonurl = "http://api.wipmania.com/json";
$json = file_get_contents($jsonurl);
echo $json;
?>
It works for this URL, but when I call it with this URL: https://erikberg.com/nba/standings.json
it is not echoing the data. What is the reason for this? I'm probably missing a concept here. Thanks
The problem for that particular URL is that it's expecting a different User Agent, different to the default that PHP is using with file_get_contents()
Here is a better example using CURL. It's more robust although it takes more lines of code to configure it and make it run:
// create curl resource
$ch = curl_init();
// set the URL
curl_setopt($ch, CURLOPT_URL, 'https://erikberg.com/nba/standings.json');
// Return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// Fake the User Agent for this particular API endpoint
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
// $output contains the output string.
$output = curl_exec($ch);
// close curl resource to free up system resources.
curl_close($ch);
// You have your JSON response here
echo $output;
I've written a small PHP script for grabbing images with curl and saving them locally.
It reads the urls for the images from my db, grabs it and saves the file to a folder.
Tested and works on a couple other websites before, fails with a new one I'm trying it with.
I did some reading around, modified the script a bit but still nothing.
Please suggest what to look out for.
$query_products = "SELECT * from product";
$products = mysql_query($query_products, $connection) or die(mysql_error());
$row_products = mysql_fetch_assoc($products);
$totalRows_products = mysql_num_rows($products);
do {
$ch = curl_init ($row_products['picture']);
$agent= 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)';
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20110319 Firefox/4.0');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
$rawdata = curl_exec ($ch);
$http_status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close ($ch);
if($http_status==200){
$fp = fopen("images/products/".$row_products['productcode'].".jpg", 'w');
fwrite($fp, $rawdata);
fclose($fp);
echo ' -- Downloaded '.$newname.' to local: '.$newname.'';
} else {
echo ' -- Failed to download '.$row_products['picture'].'';
}
usleep(500);
} while ($row_products = mysql_fetch_assoc($products));
Your target website may require/check a combination of things. In order:
Location. Some websites only allow the referer to be a certain value (either their site or no referer, to prevent hotlinking)
Incorrect URL
Cookies. Yes, this can be checked
Authentication of some sort
The only way to do this is to sniff what a normal request looks like and to mimic it. Your MSIE user-agent string looks different from a genuine MSIE UA, however, and I'd consider changing it to an exact copy of a real one if I were you.
Could you get curl to output to a file (using the setopt for output stream) and telling us what error code you are getting, along with the URL of an image? This will help me be more precise.
Also, 0 isn't a success - it's a failure
I am trying to download a source code of web pages using curl php code but its downloading only for few pages for rest pages file is empty.
I googled it but im not getting solution.
My source code is :-
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $strurl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch,CURLOPT_USERAGENT, 'CURL via PHP');
$out = curl_exec($ch);
$fp = fopen('f1.html', 'w');
fwrite($fp, $out);
fclose($fp);
curl_close($ch);
What options to add ? Where i am wrong ?
Pls help.
Try setting a user-agent that suggests you're a browser. Some servers will block curl/wget/etc.
For example: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.152 Safari/537.22
I have a long url nested in a variable: $mp4, and trying to download it with curl but i'm getting malformed error. Please help me if you can, thank you in advance!
The below is what I have in my php script:
exec("curl -o $fnctid.mp4 \"$mp4\"");
Error message:
curl: (3) <url> malformed
Sample url to test download:
http://f26.stream.nixcdn.com/6f4df1d8c248cf149b846c24d32f1c35/514e0209/PreNCT5/22-TaylorSwift-2426783.mp4
The current url is returning 408 - Request Timeout if that is fixed you are you this simple code :
$url = 'http://f26.stream.nixcdn.com/6f4df1d8c248cf149b846c24d32f1c35/514e0209/PreNCT5/22-TaylorSwift-2426783.mp4';
$useragent = 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.2 (KHTML, like Gecko) Chrome/5.0.342.3 Safari/533.2';
$file = __DIR__ . DIRECTORY_SEPARATOR . basename($url);
$fp = fopen($file, 'w+');
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_TIMEOUT, 320);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
echo curl_exec($ch);
var_dump(curl_getinfo($ch)); // return request information
curl_close($ch);
fclose($fp);
This error can be resolved through using urlencode
$url = urlencode ( $url )
This function is convenient when encoding a string to be used in a query part of a URL, as a convenient way to pass variables.
Hope this solve answer
I want to download a page from the web, it's allowed to do when you are using a simple browser like Firefox, but when I use "file_get_contents" the server refuses and replies that it understands the command but don't allow such downloads.
So what to do? I think I saw in some scripts (on Perl) a way to make your script like a real browser by creating a user agent and cookies, which makes the servers think that your script is a real web browser.
Does anyone have an idea about this, how it can be done?
Use CURL.
<?php
// create curl resource
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, "example.com");
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
// set the UA
curl_setopt($ch, CURLOPT_USERAGENT, 'My App (http://www.example.com/)');
// Alternatively, lie, and pretend to be a browser
// curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)');
// $output contains the output string
$output = curl_exec($ch);
// close curl resource to free up system resources
curl_close($ch);
?>
(From http://uk.php.net/manual/en/curl.examples-basic.php)
Yeah, CUrl is pretty good in getting page content. I use it with classes like DOMDocument and DOMXPath to grind the content to a usable form.
function __construct($useragent,$url)
{
$this->useragent='Firefox (WindowsXP) - Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.'.$useragent;
$this->url=$url;
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html= curl_exec($ch);
$dom = new DOMDocument();
#$dom->loadHTML($html);
$this->xpath = new DOMXPath($dom);
}
...
public function displayResults($site)
$data=$this->path[0]->length;
for($i=0;$i<$data;$i++)
{
$delData=$this->path[0]->item($i);
//setting the href and title properties
$urlSite=$delData->getElementsByTagName('a')->item(0)->getAttribute('href');
$titleSite=$delData->getElementsByTagName('a')->item(0)->nodeValue;
//setting the saves and additoinal
$saves=$delData->getElementsByTagName('span')->item(0)->nodeValue;
if ($saves==NULL)
{
$saves=0;
}
//build the array
$this->newSiteBookmark[$i]['source']='delicious.com';
$this->newSiteBookmark[$i]['url']=$urlSite;
$this->newSiteBookmark[$i]['title']=$titleSite;
$this->newSiteBookmark[$i]['saves']=$saves;
}
The latter is a part of a class that scrapes data from delicious.com .Not very legal though.
This answer takes your comment to Rich's answer in mind.
The site is probably checking whether or not you are a real user using the HTTP referer or the User Agent string. try setting these for your curl:
//pretend you came from their site already
curl_setopt($ch, CURLOPT_REFERER, 'http://domainofthesite.com');
//pretend you are firefox 3.06 running on windows Vista
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.0.6) Gecko/2009011913 Firefox/3.0.6');
Another way to do it (though others have pointed out a better way), is to use PHP's fopen() function, like so:
$handle = fopen("http://www.example.com/", "r");//open specified URL for reading
It's especially useful if cURL isn't available.