I want to do 300 redirect by php but first I want script will check whether the site is online or not if online then it will redirect else it will show unable to redirect. Can any one tell me how is possible?
thanks
This should do it (edited for betterness)
$destination = 'http://www.google.com';
$ch = curl_init($destination);
// use request type HEAD because it's faster and lighter
curl_setopt($ch, CURLOPT_NOBODY, true);
// prevent curl from dumping any output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// prevent curl from producing any output
curl_setopt($ch, CURLOPT_HEADER, false);
// run request
curl_exec($ch);
// consider request a success if the HTTP Code is less than 400 (start of errors)
// change this to whatever you expect to get, e.g. "equal to 200 (OK)"
$success = (bool) ((int)curl_getinfo($ch, CURLINFO_HTTP_CODE) < 400);
curl_close($ch);
// redirect or die
if ($success) {
header('Location: ' . $destination, true, 301);
} else {
die('Unable to redirect to '.$destination.'');
}
Related
I'm new with curl and I can't find my answer.
I want to get the http status of a page, so I'm using
$httpCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
But this does not work if there is a redirection (for example a 301 that redirect to another page, it will give me a 200 answer) because curlinfo_http_code gives the Last received HTTP code
Any idea how I can get the first received http code ?
thanks
You need to configure curl with CURLOPT_FOLLOWLOCATION = 0
The way I am accomplishing this is by having PHP consult an INI file with the different return codes and replacing them with something human readable
<?php
$ch = curl_init(); // create cURL handle (ch)
if (!$ch) {
die("Couldn't initialize a cURL handle");
}
// set some cURL options
$ret = curl_setopt($ch, CURLOPT_URL, "http://mail.yahoo.com");
$ret = curl_setopt($ch, CURLOPT_HEADER, 1);
$ret = curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$ret = curl_setopt($ch, CURLOPT_RETURNTRANSFER, 0);
$ret = curl_setopt($ch, CURLOPT_TIMEOUT, 30);
// execute
$ret = curl_exec($ch);
if (empty($ret)) {
// some kind of an error happened
die(curl_error($ch));
curl_close($ch); // close cURL handler
} else {
$info = curl_getinfo($ch);
curl_close($ch); // close cURL handler
if (empty($info['http_code'])) {
die("No HTTP code was returned");
} else {
// load the HTTP codes
$http_codes = parse_ini_file("HTMLCodes.ini");
// echo results
echo "The server responded: <br />";
echo $http_codes[$info['http_code']];
}
}
?>
example of the ini file with return codes
[Informational 1xx]
100="Continue"
101="Switching Protocols"
[Successful 2xx]
200="OK"
201="Created"
202="Accepted"
203="Non-Authoritative Information"
204="No Content"
205="Reset Content"
206="Partial Content"
[Redirection 3xx]
300="Multiple Choices"
301="Moved Permanently"
302="Found"
303="See Other"
304="Not Modified"
305="Use Proxy"
306="(Unused)"
307="Temporary Redirect"
Below code snippet is to get the final URL(which has media/zip/rar file) from redirect URL by using cURL. It gets the final URL, no doubt about it, but what it does is according to the size of file it varies in time to get URL.
Suppose file at final URL is 1MB, it will take around 5sec to retrieve. But if the file is about 35MB, it takes time about 150 sec. I think cURL is downloading result and finally fetching the URL from result.
<?php
echo get_rurl("x_url");//1.2MB -> 5-10sec
//echo get_rurl("y_url");//31.6MB -> 150sec
function get_rurl($url){
// initialize cURL
$curl = curl_init($url);
curl_setopt_array($curl, array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_FOLLOWLOCATION => true,
));
// execute the request
$result = curl_exec($curl);
// fail if the request was not successful
if ($result === false) {
curl_close($curl);
return null;
}
// extract the target url
$redirectUrl = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL);
curl_close($curl);
return $redirectUrl;
}
?>
i cant use file_get_content() because i just want to get the final URL from given redirect URL.
So in short - how to get the final URL from redirect URL without downloading results.
Hope i make it clear. Any help will be appreciated.
This works fine with CURLINFO_EFFECTIVE_URL, but for it the option CURLOPT_FOLLOWLOCATION must set to TRUE. This is on the grounds that CURLINFO_EFFECTIVE_URL returns precisely what it says, the effective url that ends up getting loaded. If the CURLOPT_FOLLOWLOCATION=False then the effective url will be requested url, else it will be final url that is redirected to.
I did this using curl_getinfo. which gives me information regarding the last transfer
<?php
echo get_rurl("xurl");
//echo get_rurl("yurl");
function get_rurl($url){
// initialize cURL
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); //specify your URL
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); //disable follow redirects
$http_data = curl_exec($ch); //hit the $url
$curl_info = curl_getinfo($ch);
return $curl_info['redirect_url'];// extract final url
}
?>
or
Even you can use CURLINFO_REDIRECT_URL or CURLINFO_EFFECTIVE_URL depending upon your use cases. refer here
<?php
echo get_rurl("xurl");
//echo get_rurl("yurl");
function get_rurl($url){
// initialize cURL
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url); //specify your URL
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); //disable follow redirects
$http_data = curl_exec($ch); //hit the $url
return curl_getinfo($ch, CURLINFO_REDIRECT_URL);
}
?>
Hope this helps to others users too.
According to the documentation of libcurl (https://curl.haxx.se/libcurl/c/CURLOPT_FOLLOWLOCATION.html), this is exactly as is expected when using CURLOPT_FOLLOWLOCATION => true,. You probably want to change this to false.
If you just enter the urls into the browser you can see that both work, cdon works even without javascript, have they blocked cURL somehow?
I'm trying to build a scraper to benifit legal movies online which would benifit them a whole lot, seems stupid blocking scrapers in general imho. Although I'm far from sure that's whats going on here! Might be just an error somewhere..
// Works
get_file1('http://sfanytime.com/sv-SE/Sokresultat/?field=all&q=The+Matrix', '/', 'sfanytime.html');
// Saves a blank 0 KB file
get_file1('http://downloads.cdon.com/index.phtml?action=search&search_terms=The+Matrix', '/', 'cdon.html');
function get_file1($file, $local_path, $newfilename) {
$out = fopen($newfilename, 'wb');
if ($out === FALSE) {
return false;
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_FILE, $out);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_URL, $file);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
$error = curl_error($ch);
if (strlen($error) > 0) {
echo "<br>Error is : ". $error;
return false;
}
curl_close($ch);
return true;
}
You should change the line
curl_setopt($ch, CURLOPT_FAILONERROR, true);
...to...
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
CURLOPT_FAILONERROR will cause a "silent fail" - which from what you say, is not what you want. I have replaced this with CURLOPT_FOLLOWLOCATION, because when I visit the second URL, I get redirected to a "choose your country" type page, which will be a response with an empty body - which is why you get an empty file.
There is no problem with your code as such, simply a problem with the way you handle the response from the second URL. You don't see an error because, technically, there wasn't one.
The curl_getinfo function returns a lot of metadata about the result of an HTTP request. However, for some reason it doesn't include the bit of information I want at the moment, which is the target URL if the request returns an HTTP redirection code.
I'm not using CURLOPT_FOLLOWLOCATION because I want to handle specific redirect codes as special cases.
If cURL can follow redirects, why can't it tell me what they redirect to when it isn't following them?
Of course, I could set the CURLOPT_HEADER flag and pick out the Location header. But is there a more efficient way?
This can be done in 4 steps:
Step 1. Initialise curl
curl_init($ch); //initialise the curl handle
//COOKIESESSION is optional, use if you want to keep cookies in memory
curl_setopt($this->ch, CURLOPT_COOKIESESSION, true);
Step 2. Get the headers for $url
curl_setopt($ch, CURLOPT_URL, $url); //specify your URL
curl_setopt($ch, CURLOPT_HEADER, true); //include headers in http data
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); //don't follow redirects
$http_data = curl_exec($ch); //hit the $url
$curl_info = curl_getinfo($ch);
$headers = substr($http_data, 0, $curl_info['header_size']); //split out header
Step 3. Check if you have the correct response code
if (!($curl_info['http_code']>299 && $curl_info['http_code']<309)) {
//return, echo, die, whatever you like
return 'Error - http code'.$curl_info['http_code'].' received.';
}
Step 4. Parse the headers to get the new URL
preg_match("!\r\n(?:Location|URI): *(.*?) *\r\n!", $headers, $matches);
$url = $matches[1];
Once you have the new URL you can then repeat steps 2-4 as often as you like.
You can simply use it: (CURLINFO_REDIRECT_URL)
$info = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
echo $info; // the redirect URL without following it
as you mentioned, disable the CURLOPT_FOLLOWLOCATION option (before executing) and place my code after executing.
CURLINFO_REDIRECT_URL - With the CURLOPT_FOLLOWLOCATION option
disabled: redirect URL found in the last transaction, that should be
requested manually next. With the CURLOPT_FOLLOWLOCATION option
enabled: this is empty. The redirect URL in this case is available in
CURLINFO_EFFECTIVE_URL
Refrence
curl doesn't seem to have a function or option to get the redirect target, it can be extracted using various techniques:
From the response:
Apache can respond with a HTML page in case of a 301 redirect (Doesn't seem to be the case with 302's).
If the response has a format similar to:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved here.</p>
<hr>
<address>Apache/2.2.16 (Debian) Server at www.xxx.yyy Port 80</address>
</body></html>
You can extract the redirect URL using DOMXPath:
$i = 0;
foreach($urls as $url) {
if(substr($url,0,4) == "http") {
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$result = #curl_exec($c);
$status = curl_getinfo($c,CURLINFO_HTTP_CODE);
curl_close($c);
$results[$i]['code'] = $status;
$results[$i]['url'] = $url;
if($status === 301) {
$xml = new DOMDocument();
$xml->loadHTML($result);
$xpath = new DOMXPath($xml);
$href = $xpath->query("//*[#href]")->item(0);
$results[$i]['target'] = $href->attributes->getNamedItem('href')->nodeValue;
}
$i++;
}
}
Using CURLOPT_NOBODY
There is a faster way however, as #gAMBOOKa points out; Using CURLOPT_NOBODY. This approach just sends a HEAD request instead of GET (not downloading the actual content, so it should be faster and more efficient) and stores the response header.
Using a regex the target URL can be extracted from the header:
foreach($urls as $url) {
if(substr($url,0,4) == "http") {
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_NOBODY,true);
curl_setopt($c, CURLOPT_HEADER, true);
$result = #curl_exec($c);
$status = curl_getinfo($c,CURLINFO_HTTP_CODE);
curl_close($c);
$results[$i]['code'] = $status;
$results[$i]['url'] = $url;
if($status === 301 || $status === 302) {
preg_match("#https?://([-\w\.]+)+(:\d+)?(/([\w/_\-\.]*(\?\S+)?)?)?#",$result,$m);
$results[$i]['target'] = $m[0];
}
$i++;
}
}
No there is no more efficient way
Your can use CURLOPT_WRITEHEADER + VariableStream
So.. you could write headers to variable and parse it
I had the same problem and curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); was of any help.
So, I decided not to use CURL but file_get_contents instead:
$data = file_get_contents($url);
$data = str_replace("<meta http-equiv=\"Refresh\" content=\"0;","<meta",$data);
The last line helped me to block the redirection although the product is not a clean html code.
I parsed the data and could retrieve the redirection URL I wanted to get.
Can anyone tell me if its possible to validate a link with php? By validate, I mean check if the link is active and works not just the actual format of the link.
You need to do a HEAD request and check the response. 200 indicates the request succeeded. There are others that can be found here that you may want to treat as valid. (301 and 302 redirects spring to mind)
If you use cURL, you could use something like this
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, TRUE); //Include the headers
curl_setopt($ch, CURLOPT_NOBODY, TRUE); //Make HEAD request
$response = curl_exec($ch);
if ( $response === false ){
//something went wrong, assume not valid
}
//list of status codes you want to treat as valid:
$validStatus = array(200, 301, 302, 303, 307);
if( !in_array(curl_getinfo($ch, CURLINFO_HTTP_CODE), $validStatus) ) {
//the HTTP code is not valid. The url is not valid
}
curl_close($ch);