How can I get the link address after a URL has been redirected?
Take for example this URL: http://www.boligsiden.dk/viderestilling/992cff55882a40f79e64b0a25e847a69
How can I make a PHP script echo the final URL? (http://www.eltoftnielsen.dk/default.aspx?side=sagsvisning&AutoID=125125&DID=140 in this case)
Note: The following solution isn't ideal for high traffic situations.
$url = 'http://www.boligsiden.dk/viderestilling/992cff55882a40f79e64b0a25e847a69';
file_get_contents($url);
preg_match('/(Location:|URI:)(.*?)\n/', implode("\n", $http_response_header), $matches);
if (isset($matches[2]))
{
    echo trim($matches[2]); // the URL only, without the "Location:" prefix
}
Here's what happens: file_get_contents() follows the redirect and downloads the target page, but the headers of the original (redirecting) response are still written to $http_response_header.
The preg_match() call finds the first "Location: x" header and returns it.
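The header-parsing step can also be separated from the download. The sketch below picks the last Location header instead of the first, which may matter when several redirects happen in a row. The function name last_location_header() and the sample headers are made up for illustration, and a Location value can be a relative URL that would still need resolving.

```php
<?php
// Hypothetical helper: returns the value of the last "Location:" header
// in an array shaped like $http_response_header, or null if there is none.
function last_location_header(array $headers): ?string
{
    $location = null;
    foreach ($headers as $line) {
        // Header names are case-insensitive, hence the /i flag.
        if (preg_match('/^Location:\s*(.+)$/i', trim($line), $m)) {
            $location = $m[1];
        }
    }
    return $location;
}

// Sample data standing in for $http_response_header after a redirect chain.
$sample = [
    'HTTP/1.1 302 Found',
    'Location: http://example.com/step-1',
    'HTTP/1.1 301 Moved Permanently',
    'location: http://example.com/final',
    'HTTP/1.1 200 OK',
    'Content-Type: text/html',
];

echo last_location_header($sample), "\n";
```

In real use you would pass $http_response_header right after the file_get_contents() call.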
Use this:
<?php
$name="19875379";
$url = "http://www.ikea.co.il/default.asp?strSearch=".$name;
$ch = curl_init();
$timeout = 0;
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$header = curl_exec($ch);
$redir = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
//print_r($header);
$x = preg_match("/<script>location.href=(.|\n)*?<\/script>/", $header, $matches);
$script = $matches[0];
$redirect = str_replace("<script>location.href='", "", $script);
$redirect = "http://www.ikea.co.il" . str_replace("';</script>", "", $redirect);
echo $redirect;
?>
I am trying to set up a tool to get someone's Twitter ID without using twitter's API.
In my example, my username is quixthe2nd.
So if you enter someone's twitter username in this link:
https://twitter.com/quixthe2nd/profile_image?size=original
It would redirect to:
https://pbs.twimg.com/profile_images/1116692743361687553/0P-dk3sF.jpg
The ID is 1116692743361687553. It is listed after https://pbs.twimg.com/profile_images/.
My code is:
$url = "https://twitter.com/quixthe2nd/profile_image?size=original";
function get_redirect_target($url)
{
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$headers = curl_exec($ch);
curl_close($ch);
// Check if there's a Location: header (redirect)
if (preg_match('/^Location: (.+)$/im', $headers, $matches))
return trim($matches[1]);
// If not, there was no redirect so return the original URL
// (Alternatively change this to return false)
return $url;
}
function get_redirect_final_target($url)
{
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
curl_setopt($ch, CURLOPT_AUTOREFERER, 1); // set referer on redirect
curl_exec($ch);
$target = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
if ($target)
return $target;
return false;
}
$s = $target;
if(preg_match('/profile_images\/\K[^\/]+/',$s,$matches)) {
print($matches[0]);
}
My code doesn't output anything. How would I be able to grab that via PHP?
You can use this regex and grab your value from group1,
profile_images\/([^\/]+)
Regex Demo
Alternatively, you can use \K so your full match is your intended text.
profile_images\/\K[^\/]+
Regex Demo using \K
PHP Code Demo
$s = "https://pbs.twimg.com/profile_images/1116692743361687553/0P-dk3sF.jpg";
if(preg_match('/profile_images\/\K[^\/]+/',$s,$matches)) {
print($matches[0]);
} else {
echo "Didn't match";
}
Prints,
1116692743361687553
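Note that the code in the question defines both functions but never calls them, and $target only exists inside get_redirect_final_target(), so $s is empty when the regex runs. A minimal sketch of the missing wiring, assuming the question's get_redirect_final_target() is in scope; the helper name profile_image_id() is made up here:

```php
<?php
// Hypothetical helper: extracts the numeric ID from a profile image URL.
function profile_image_id(string $imageUrl): ?string
{
    // \K drops "profile_images/" from the match, leaving only the ID.
    if (preg_match('/profile_images\/\K[^\/]+/', $imageUrl, $m)) {
        return $m[0];
    }
    return null;
}

// With the question's function in scope, the missing call would look like:
//   $target = get_redirect_final_target($url);
//   echo profile_image_id($target ?: '');
echo profile_image_id('https://pbs.twimg.com/profile_images/1116692743361687553/0P-dk3sF.jpg'), "\n";
```

Run as written, this prints 1116692743361687553 for the hard-coded sample URL.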
Task: we have an English Wikipedia page and need to retrieve the address of the same page in Japanese.
It was suggested to parse the results of http://en.wikipedia.org/wiki/Mini 4wd?action=raw (the links to other languages are at the bottom), but that is too inefficient. Are there other ways, or is this the only real option?
We found a Wikipedia API that seems fine for a single word, but for two-word titles such as Kamen rider or Mini 4wd it doesn't work.
My code is not working:
$url = 'https://en.wikipedia.org/w/api.php?action=query&prop=langlinks&format=json&lllimit=100&llprop=url&lllang=ja&titles=Kamen rider';
$url = rawurldecode(urlencode($url));
echo $url;
// outputs: https://en.wikipedia.org/w/api.php?action=query&prop=langlinks&format=json&lllimit=100&llprop=url&lllang=ru&titles=Mini+4wd
// and then the rest your logic whatever it is the rest
$header[] = "Accept: application/json";
$header[] = "Accept-Encoding: gzip";
$ch = curl_init();
curl_setopt( $ch, CURLOPT_HTTPHEADER, $header );
curl_setopt($ch,CURLOPT_ENCODING , "gzip");
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, 'GET');
curl_setopt( $ch, CURLOPT_URL, $url );
curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );
$response = json_decode(curl_exec($ch));
/* echo '<pre>';
print_r($response);
echo '</pre>'; */
exit;
Two-word titles don't work because the URL is not properly formatted: Kamen<space>rider and mini<space>4wd contain spaces, which need to be encoded first. Consider this example:
$url = 'https://en.wikipedia.org/w/api.php?action=query&prop=langlinks&format=json&lllimit=100&llprop=url&lllang=ru&titles=Mini 4wd';
$url = rawurldecode(urlencode($url));
echo $url;
// outputs: https://en.wikipedia.org/w/api.php?action=query&prop=langlinks&format=json&lllimit=100&llprop=url&lllang=ru&titles=Mini+4wd
// and then the rest your logic whatever it is the rest
$contents = file_get_contents($url);
$contents = json_decode($contents, true);
// echo '<pre>';
// print_r($contents);
// echo '</pre>';
Sample Fiddle
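As an alternative to encoding and then decoding the whole URL, the query string can be built from an array so that only the parameter values get encoded. This is a sketch rather than the approach used above; the function name langlinks_url() is made up for illustration:

```php
<?php
// Hypothetical helper: builds the langlinks query URL for one page title.
function langlinks_url(string $title, string $lang): string
{
    $params = [
        'action'  => 'query',
        'prop'    => 'langlinks',
        'format'  => 'json',
        'lllimit' => 100,
        'llprop'  => 'url',
        'lllang'  => $lang,
        'titles'  => $title,
    ];
    // PHP_QUERY_RFC3986 encodes spaces as %20 instead of +.
    return 'https://en.wikipedia.org/w/api.php?'
        . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
}

echo langlinks_url('Mini 4wd', 'ru'), "\n";
```

Passing '&' explicitly as the separator avoids depending on the arg_separator.output ini setting.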
Try this code; it works. Set $keywords = 'Mini 4wd';
$url = 'https://en.wikipedia.org/w/api.php?action=query&prop=langlinks&format=json&lllimit=100&llprop=url&lllang=ja&titles='.$keywords.'&redirects=';
$url1 = rawurldecode(urlencode($url));
$ch = curl_init();
// set url
curl_setopt($ch, CURLOPT_URL, $url1);
//return the transfer as a string
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
// $output contains the output string
$output = curl_exec($ch);
$info = curl_getinfo($ch);
$exec = explode('"url":"',$output);
$exe = explode('",',$exec[1]);
$URL = $exe[0];
Output :
<p>Wikipedia Help here as <?php echo $URL;?></p>
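Splitting the raw response on '"url":"' is fragile, for example if the JSON escapes slashes or the field order changes. A sketch of the same extraction with json_decode() follows. The nesting query -> pages -> langlinks is an assumption about the response shape, and both the helper name first_langlink_url() and the sample payload are made up:

```php
<?php
// Hypothetical helper: returns the first langlinks URL found in a JSON
// response, or null. The nesting (query -> pages -> langlinks) is an
// assumption about the response shape.
function first_langlink_url(string $json): ?string
{
    $data = json_decode($json, true);
    $pages = $data['query']['pages'] ?? [];
    if (!is_array($pages)) {
        return null;
    }
    foreach ($pages as $page) {
        if (isset($page['langlinks'][0]['url'])) {
            return $page['langlinks'][0]['url'];
        }
    }
    return null;
}

// Made-up sample payload, not a real API response.
$sample = '{"query":{"pages":{"1":{"title":"Example","langlinks":[{"lang":"ja","url":"https:\/\/ja.wikipedia.org\/wiki\/Example"}]}}}}';
echo first_langlink_url($sample), "\n";
```

In the code above you would call first_langlink_url($output) instead of the two explode() calls.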
I want to get the data from the following page : http://kovv.mavari.be/kalender.aspx when you press on the submit button and the dropdownlists have no selected values. (So the page where you see a big table)
I've tried to follow a tutorial you can find here: http://www.mishainthecloud.com/2009/12/screen-scraping-aspnet-application-in.html.
This is what I have so far:
public function teamsoostVlaanderen()
{
$url = "http://kovv.mavari.be/kalender.aspx";
$regs=array();
$cookies = '../src/VolleyScout/VolleyScoutBundle/Resources/doc/cookie.txt';
// regular expressions to parse out the special ASP.NET
// values for __VIEWSTATE and __EVENTVALIDATION
$regexViewstate = '/__VIEWSTATE\" value=\"(.*)\"/i';
$regexEventVal = '/__EVENTVALIDATION\" value=\"(.*)\"/i';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$data=curl_exec($ch);
$viewstate = $this->regexExtract($data,$regexViewstate,$regs,1);
$eventval = $this->regexExtract($data, $regexEventVal,$regs,1);
$postData = '__VIEWSTATE='.rawurlencode($viewstate)
.'&__EVENTVALIDATION='.rawurlencode($eventval)
.'&ctl00_ContentPlaceHolder1_ddlGeslacht'
.'&ctl00$ContentPlaceHolder1$ddlReeks'
.'&ctl00_ContentPlaceHolder1_ddlDatum'
.'&ctl00$ContentPlaceHolder1$btnZoek:zoek'
;
curl_setOpt($ch, CURLOPT_POST, TRUE);
curl_setopt($ch, CURLOPT_POSTFIELDS, $postData);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookies);
$data = curl_exec($ch);
echo $data;
curl_close($ch);
die();
}
public function regexExtract($text, $regex, $regs, $nthValue)
{
if (preg_match($regex, $text, $regs)) {
$result = $regs[$nthValue];
}
else {
$result = "";
}
return $result;
}
But I still get the page without a post (so not with the table). When I check my cookies.txt file it's empty, maybe there's the problem? Can somebody help me find the problem?
Appropriate regex:
$regexViewstate = '/__VIEWSTATE\" value=\"([^"]*)\"/i';
$regexEventVal = '/__EVENTVALIDATION\" value=\"([^"]*)\"/i';
You are also missing the equals signs in your POST parameters:
$postData = '__VIEWSTATE='.rawurlencode($viewstate)
.'&__EVENTVALIDATION='.rawurlencode($eventval)
.'&ctl00_ContentPlaceHolder1_ddlGeslacht='
.'&ctl00$ContentPlaceHolder1$ddlReeks='
.'&ctl00_ContentPlaceHolder1_ddlDatum='
.'&ctl00$ContentPlaceHolder1$btnZoek=zoek';
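To see why the character class matters: with (.*) the match is greedy and runs to the last quote on the line, while ([^"]*) stops at the closing quote of the value. The sketch below shows this on made-up markup and builds the POST body with http_build_query(), which adds the equals signs and the encoding. The helper name hidden_field() and the sample HTML are illustrative only:

```php
<?php
// Hypothetical helper: pulls the value attribute of a hidden ASP.NET field.
function hidden_field(string $html, string $name): string
{
    // [^"]* stops at the closing quote; a greedy .* would run on to the
    // last quote on the line.
    $regex = '/' . preg_quote($name, '/') . '" value="([^"]*)"/i';
    return preg_match($regex, $html, $m) ? $m[1] : '';
}

// Made-up markup with both fields on a single line.
$html = '<input name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />'
      . '<input name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz+/=" />';

// http_build_query() URL-encodes keys and values and joins them with "=".
$postData = http_build_query([
    '__VIEWSTATE'                       => hidden_field($html, '__VIEWSTATE'),
    '__EVENTVALIDATION'                 => hidden_field($html, '__EVENTVALIDATION'),
    'ctl00$ContentPlaceHolder1$btnZoek' => 'zoek',
], '', '&');

echo $postData, "\n";
```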
Hi, I know this is a very common topic on Stack Overflow.
I have already spent an entire week searching for an answer.
I have a URL: abc.com/default.asp?strSearch=19875379
This redirects to the URL: abc.com/default.asp?catid={170D4F36-39F9-4C48-88EB-CFC8DDF1F531}&details_type=1&itemid={49F6A281-8735-4B74-A170-B6110AF6CC2D}
I have tried to get the final URL in my PHP code using cURL, but can't make it work.
Here is my code:
<?php
$name="19875379";
$url = "http://www.ikea.co.il/default.asp?strSearch=".$name;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$a = curl_exec($ch);
curl_close( $ch );
// the returned headers
$headers = explode("\n",$a);
// if there is no redirection this will be the final url
$redir = $url;
// loop through the headers and check for a Location: str
$j = count($headers);
for($i = 0; $i < $j; $i++){
// if we find the Location header strip it and fill the redir var
//print_r($headers);
if(strpos($headers[$i],"Location:") !== false){
$redir = trim(str_replace("Location:","",$headers[$i]));
break;
}
}
// do whatever you want with the result
echo $redir;
?>
It gives me the URL "abc.com/default.asp?strSearch=19875379" instead of the URL "abc.com/default.asp?catid={170D4F36-39F9-4C48-88EB-CFC8DDF1F531}&details_type=1&itemid={49F6A281-8735-4B74-A170-B6110AF6CC2D}".
Thanks in advance for your kind help :)
Thank you everyone for helping me with this.
I am actually developing a scraper in PHP for the IKEA website used in Israel (in Hebrew).
After spending many hours on it, I realized that the URL I request doesn't do a server-side redirect at all. It seems to be a JavaScript redirect.
I have now implemented the code below, and it works for me.
<?php
$name="19875379";
$url = "http://www.ikea.co.il/default.asp?strSearch=".$name;
$ch = curl_init();
$timeout = 0;
curl_setopt ($ch, CURLOPT_URL, $url);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$header = curl_exec($ch);
$redir = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
//print_r($header);
// The page redirects via JavaScript, so pull the target out of the inline script
if (preg_match("/<script>location.href=(.|\n)*?<\/script>/", $header, $matches)) {
    $script = $matches[0];
    $redirect = str_replace("<script>location.href='", "", $script);
    $redirect = "http://www.ikea.co.il" . str_replace("';</script>", "", $redirect);
    echo $redirect;
}
?>
Thanks again everyone :)
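The same idea can be written with a capture group, so the two str_replace() calls are not needed. This is only a sketch: the helper name js_redirect_target() and the sample HTML are made up, and it assumes the redirect is a plain location.href='...' assignment like the one described above.

```php
<?php
// Hypothetical helper: finds a location.href='...' assignment in the HTML
// and returns the target, prefixed with $base when it starts with "/".
function js_redirect_target(string $html, string $base): ?string
{
    if (!preg_match('/location\.href\s*=\s*[\'"]([^\'"]+)[\'"]/', $html, $m)) {
        return null; // no JavaScript redirect found
    }
    $target = $m[1];
    return $target[0] === '/' ? rtrim($base, '/') . $target : $target;
}

// Made-up response body for illustration.
$html = "<html><script>location.href='/default.asp?catid=1&itemid=2';</script></html>";
echo js_redirect_target($html, 'http://www.ikea.co.il'), "\n";
```

Returning null when nothing matches lets the caller fall back to the URL reported by curl_getinfo($ch, CURLINFO_EFFECTIVE_URL).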
The accepted answer is applicable to a very specific scenario. So, most of us will be better off having a more general answer. Though you can extract the more general answer from within the accepted answer, separately having that part may be more helpful.
So, if you just want to get the last redirected URL, this code will help.
<?php
function redirectedUrl($url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT'] ?? 'Mozilla/5.0'); // send a browser-like user agent to avoid old-browser warnings (generic fallback when run from the CLI)
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // allow url redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // get the return value of curl execution as a string
$html = curl_exec($ch);
// store last redirected url in a variable before closing the curl session
$lastUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
return $lastUrl;
}
First of all, I didn't see any redirection when I ran your code. Anyway, here are a few things you can do (keeping your approach intact):
First, make sure that the headers are included in your cURL output (in this case $a).
curl_setopt($ch, CURLOPT_HEADER, true);
Now separate the header portion from the rest of the HTTP response.
// the headers will be at index 0 and the HTML at index 1
$header = explode("\r\n\r\n", $a, 2);
Then explode the header string into an array of header lines.
$headers = explode("\r\n", $header[0]);
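Put together, the steps above can be sketched as a single function. The name location_from_response() and the sample response are made up for illustration, and the sketch only looks at the first header block, so it assumes CURLOPT_FOLLOWLOCATION is off:

```php
<?php
// Hypothetical helper: returns the Location header from a raw HTTP response
// (headers + body, as returned by cURL with CURLOPT_HEADER), or null.
function location_from_response(string $raw): ?string
{
    // The headers end at the first blank line.
    $parts = explode("\r\n\r\n", $raw, 2);
    foreach (explode("\r\n", $parts[0]) as $line) {
        // Header names are case-insensitive, so compare with stripos().
        if (stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}

// Made-up response for illustration.
$raw = "HTTP/1.1 302 Found\r\nLocation: http://example.com/next\r\nContent-Length: 0\r\n\r\n<html></html>";
echo location_from_response($raw), "\n";
```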
You can use curl_getinfo() ...
http://php.net/manual/en/function.curl-getinfo.php
Example of a site using SSL (HTTPS): https://www.eb2a.com
1 - I tried to get its content using file_get_contents(), but it doesn't work and gives an error.
Example:
<?php
$contents = file_get_contents("https://www.eb2a.com/");
echo $contents;
?>
2 - I tried to use fopen(), but it doesn't work and gives an error.
Example:
<?php
$url = 'https://www.eb2a.com/';
$contents = fopen($url, 'r');
echo "$contents";
?>
3 - I tried to use cURL, but it doesn't work and gives a blank page.
Example:
function cURL($url, $ref, $header, $cookie, $p){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_REFERER, $ref);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
if ($p) {
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $p);
}
$result = curl_exec($ch);
curl_close($ch);
if ($result){
return $result;
}else{
return '';
}
}
$file = cURL('https://www.eb2a.com/','https://www.eb2a.com/',0,0,null);
echo $file
Does anyone have any idea?
To fetch content over the secure HTTPS protocol, you need to have the openssl extension enabled in your php.ini file, plus whatever authentication the site requires.
The website responds with a 301 redirect.
Change the URL from
https://www.eb2a.com/
to
https://www.eb2a.com
If that still doesn't work, or you need to keep the trailing slash, make cURL follow redirects (CURLOPT_FOLLOWLOCATION).