How to resolve URLs to their final destination in PHP

How do I resolve URLs like the one below:
http://www.google.co.in/url?sa=t&source=newssearch&cd=1&ved=0CC4QqQIwAA&url=http%3A%2F%2Fwww.usatoday.com%2Fnews%2Fworld%2Fstory%2F2011-09-18%2Findia-earthquake-fatalities%2F50456078%2F1&ei=JkF2TriYPImGrAeHxdCFDQ&usg=AFQjCNEshh4QAZQlM_tVPoT_l7rJ0ag21Q
to its final URL
http://www.usatoday.com/news/world/story/2011-09-18/india-earthquake-fatalities/50456078/1
I've tried cURL, but it resolves the URL to http://www.google.co.in/http

http://sandbox.phpcode.eu/g/fc7c1/1
$ch = curl_init('http://www.google.co.in/url?sa=t&source=newssearch&cd=1&ved=0CC4QqQIwAA&url=http%3A%2F%2Fwww.usatoday.com%2Fnews%2Fworld%2Fstory%2F2011-09-18%2Findia-earthquake-fatalities%2F50456078%2F1&ei=JkF2TriYPImGrAeHxdCFDQ&usg=AFQjCNEshh4QAZQlM_tVPoT_l7rJ0ag21Q');
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, true);
$response = curl_exec($ch);
$info = curl_getinfo($ch);
echo $info['url'];

All you are after is the value of the url parameter. You can preg_split the initial URL by /[&?]/, take the element starting with url=, split it at the = sign, and use urldecode on the final value.
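The same idea can be sketched with PHP's built-in parse_url() and parse_str() instead of manual splitting. This is a minimal sketch: the function name extractTargetUrl and the default parameter name 'url' are assumptions for illustration, and it only works for redirect links that carry the target in their query string.

```php
<?php
// Hypothetical helper: pull the redirect target out of a query parameter.
function extractTargetUrl(string $redirectUrl, string $param = 'url'): ?string
{
    // Isolate the query string (everything after the "?").
    $query = parse_url($redirectUrl, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }

    // parse_str() splits on "&" and URL-decodes each value.
    parse_str($query, $params);

    return isset($params[$param]) && is_string($params[$param])
        ? $params[$param]
        : null;
}

$link = 'http://www.google.co.in/url?sa=t&source=newssearch&cd=1'
      . '&url=http%3A%2F%2Fwww.usatoday.com%2Fnews%2Fworld%2Fstory%2F2011-09-18'
      . '%2Findia-earthquake-fatalities%2F50456078%2F1&ei=JkF2TriYPImGrAeHxdCFDQ';

echo extractTargetUrl($link), "\n";
```

No HTTP request is made here, so this approach does not depend on how the server answers.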

Related

Change relative URLs to absolute URLs after Curl

I'm trying to find a regular expression that is able to change all URLs of a curl'ed document from relative to absolute.
One way I found is in the post here, but it works only for the first URL, not for all of them.
This is the code I'm using:
$url="http://www.example.com";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_DNS_USE_GLOBAL_CACHE, 0);
curl_setopt($ch, CURLOPT_DNS_CACHE_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$result=curl_exec($ch);
curl_close($ch);
$result = preg_replace('~(href|src)=(["\'])(?!#)(?!http://)([^\2]*)\2~i','$1="http://www.example.com$3"', $result);
echo $result;
What am I doing wrong?
EDIT
Just to explain better: I don't have an array of URLs. I have an entire document fetched with cURL, so I need a preg_replace approach.
I'm not exactly sure why it replaces only once (maybe it has something to do with the backreference), but if you wrap it in a while loop, it should work.
$pattern = '~(href|src)=(["\'])(?!#|//|http)([^\2]*)\2~i';
while (preg_match($pattern, $result)) {
$result = preg_replace($pattern,'$1="http://www.example.com$3"', $result);
}
(I also changed the pattern slightly.)
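The single replacement is probably explained by `[^\2]`: inside a character class PCRE does not treat `\2` as a backreference, so the class matches almost any character (quotes and newlines included) and the greedy `*` runs on to the last quote in the document. A minimal one-pass sketch, assuming a lazy `(.*?)` in its place; the function name absolutizeUrls is hypothetical, and treating every relative path as relative to the site root is a simplifying assumption.

```php
<?php
// Hypothetical helper: prefix relative href/src values with a base URL.
function absolutizeUrls(string $html, string $base): string
{
    // (.*?) stops at the first quote equal to the opening one (\2).
    // Fragments, protocol-relative and already absolute URLs are skipped.
    $pattern = '~\b(href|src)=(["\'])(?!#|//|https?:|mailto:|data:)(.*?)\2~is';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($base): string {
            // Keep the original quote style and join with exactly one slash.
            return $m[1] . '=' . $m[2]
                . rtrim($base, '/') . '/' . ltrim($m[3], '/')
                . $m[2];
        },
        $html
    );
}

$html = '<a href="/a.html">x</a><img src=\'img/b.png\'><a href="http://other.com/">y</a>';
echo absolutizeUrls($html, 'http://www.example.com'), "\n";
```

For anything beyond simple pages, parsing the document with DOMDocument is likely more robust than a regular expression.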

curl json with cookie returns "‹ŠŽÿÿ)»L"

I am trying to get the content of this JSON: http://steamcommunity.com/market/pricehistory/?country=DE&currency=3&appid=730&market_hash_name=Chroma%20Case
This is my code:
$url = "http://steamcommunity.com/market/pricehistory/?country=DE&currency=3&appid=730&market_hash_name=Chroma%20Case";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_COOKIE, 'steamLogin = 76561198075419487%7C%7C3F1A776553C4BE1D0F6DA83059052E79DB7EB3C7');
$output = curl_exec($ch);
$info = curl_getinfo($ch);
curl_close($ch);
$json_string = json_encode($output, JSON_PRETTY_PRINT);
When printing out $json_string it results in nothing, $output results in "‹ŠŽÿÿ)»L". I would like to grab the actual content on the website, the steamLogin-Cookie is needed for that. The cookie that's stored in my browser at the moment is the one I hardcoded in the source.
If you need any more info, feel free to ask.
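Adding curl_setopt($ch, CURLOPT_ENCODING, ""); fixed it :) The garbled bytes were most likely a compressed (gzip) response; an empty string makes cURL send an Accept-Encoding header listing every encoding it supports and decompress the response automatically.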
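To illustrate what those unreadable bytes are, here is a small offline sketch that compresses a JSON string, detects the gzip signature, and decodes it by hand. The helper name decodeJsonBody and the sample payload are assumptions, and the zlib extension is assumed to be available. Note that it parses with json_decode(); json_encode() in the question would encode the string again rather than parse it.

```php
<?php
// Hypothetical helper: inflate a gzip body if needed, then parse it as JSON.
function decodeJsonBody(string $body)
{
    // gzip streams start with the magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $inflated = gzdecode($body);
        if ($inflated !== false) {
            $body = $inflated;
        }
    }

    // Returns an associative array, or null if the body is not valid JSON.
    return json_decode($body, true);
}

// Made-up payload standing in for the real response.
$compressed = gzencode('{"success":true,"prices":[]}');
var_dump(decodeJsonBody($compressed));
```

With CURLOPT_ENCODING set as in the answer, cURL does the decompression itself and only the json_decode() step remains.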

Using curl to bring search results from external site

I have 2 sites, one main, one external. On the main site, I am using Lucene to search through it. The problem is, I am trying to also search through the external site.
The Form action for the external site:
<form action="https://secure.bcchf.ca/SuperheroPages/searchResults.cfm?Event=WOT" method="post" name="search_tribute" >
I've tried to use curl, but it only brings up the search form without actually doing the search (the field is empty as well).
<?php
$ch = curl_init("https://secure.bcchf.ca/SuperheroPages/searchResults.cfm?Event=WOT");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, tname='hello');
$output = curl_exec($ch);
echo $output;
curl_close($ch);
?>
Any tips?
I don't have access to the form action since it's on an external site. All I have is a form that links to it when I submit it.
<?php
$ch = curl_init("https://secure.bcchf.ca/SuperheroPages/searchResults.cfm?Event=WOT");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, array("teamName" => "hello", "searchType" => "team"));
$output = curl_exec($ch);
echo $output;
curl_close($ch);
?>
Can you try this?
I'm pretty sure it's supposed to be teamName instead of tName
Most search engines use GET rather than POST, so you can try:
// Assumption: the search term arrives in $_POST['search']
$_POST['search'] = "hello";
// Return the Google search result
echo curlGoogle($_POST['search']);
function curlGoogle($keyword) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, 'http://www.google.com/search?hl=en&q=' . urlencode($keyword) . '&btnG=Google+Search&meta=');
    curl_setopt($ch, CURLOPT_HEADER, 0);
    // Return the response as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FILETIME, true);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
Or, if you want POST:
curl_setopt($ch, CURLOPT_POSTFIELDS, array("search"=>"hello"));
Your php code is not valid syntax, it does not compile.
So if this is really what you have, your problem is that your file generates a fatal error.
That being said, this question is hard to answer since we don't know the site you want to grab your search results from.
Try modifying your line like this:
curl_setopt($ch, CURLOPT_POSTFIELDS, "search=hello");
or alternatively
curl_setopt($ch, CURLOPT_POSTFIELDS, array("search" => "hello"));
Maybe it will work; however, more POST data may be required, or the element name may not be correct.
You have to look at the form, or try making a request and inspect it with Chrome's developer tools or Firebug.
Also, there are a number of ways for external sites to prevent what you are doing, although everything can be worked around somehow.
Assuming that is not the case, I hope I could help you.
Try just putting it into an array, as that is the variable $_POST checks on the other side.
I just checked your link, and the field name is teamName:
$fields = array("teamName"=>"julia");
Then..
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);
So your complete code is...
<?php
$ch = curl_init("https://secure.bcchf.ca/SuperheroPages/searchResults.cfm?Event=WOT");
$fields = array("teamName"=>"julia");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);
$output = curl_exec($ch);
var_dump($output);
curl_close($ch);
?>
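One detail that may matter with this form: when CURLOPT_POSTFIELDS receives an array, cURL sends the body as multipart/form-data, while a string is sent as application/x-www-form-urlencoded, which is what a plain HTML form submits. A minimal sketch that builds the options with http_build_query(); the helper name buildSearchOptions is an assumption, and the field names are the ones suggested in the answers above.

```php
<?php
// Hypothetical helper: cURL options for a form-style (urlencoded) POST.
function buildSearchOptions(array $fields): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_POST           => true,
        // A string body is sent as application/x-www-form-urlencoded;
        // an array body would be sent as multipart/form-data instead.
        CURLOPT_POSTFIELDS     => http_build_query($fields),
    ];
}

$options = buildSearchOptions(['teamName' => 'julia', 'searchType' => 'team']);
echo $options[CURLOPT_POSTFIELDS], "\n";

// To send the request (needs network access):
// $ch = curl_init('https://secure.bcchf.ca/SuperheroPages/searchResults.cfm?Event=WOT');
// curl_setopt_array($ch, $options);
// $output = curl_exec($ch);
```

Whether the remote site accepts the request still depends on the fields it expects, so comparing against the browser's own request remains the safest check.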

Getting executed URL from CURL

I have an affiliate URL like http://track.abc.com/?affid=1234
Opening this link redirects to http://www.abc.com
I want to request http://track.abc.com/?affid=1234 using cURL.
How can I get http://www.abc.com with cURL?
If you want cURL to follow redirect headers from the responses it receives, you need to set that option with:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
You may also want to limit the number of redirects it follows using:
curl_setopt($ch, CURLOPT_MAXREDIRS, 3);
So you'd use something similar to this:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://track.abc.com/?affid=1234");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_MAXREDIRS, 3);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$data = curl_exec($ch);
Edit: Question wasn't exactly clear but from the comment below, if you want to get the redirect location, you need to get the headers from cURL and parse them for the Location header:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://track.abc.com/?affid=1234");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, true);
$data = curl_exec($ch);
This will give you the headers returned by the server in $data, simply parse through them to get the location header and you'll get your result. This question shows you how to do that.
I wrote a function that will extract any header from a cURL header response.
function getHeader($headerString, $key) {
preg_match('#\s\b' . $key . '\b:\s.*\s#', $headerString, $header);
return substr($header[0], strlen($key) + 3, -2);
}
In this case, you're looking for the value of the header Location. I tested the function by retrieving headers from a TinyURL, that redirects to http://google.se, using cURL.
$url = "http://tinyurl.com/dtrkv";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$data = curl_exec($ch);
curl_close($ch);
$location = getHeader($data, 'Location');
var_dump($location);
Output from the var_dump.
string(16) "http://google.se"
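As a variation on getHeader(), here is a sketch that matches header names case-insensitively and returns every occurrence, which helps when the raw response contains several header blocks from a redirect chain. The function name extractHeaderValues and the sample header text are assumptions for illustration, not output from a real server.

```php
<?php
// Hypothetical helper: collect all values of one header from raw header text.
function extractHeaderValues(string $rawHeaders, string $name): array
{
    // ^...$ with the "m" flag matches line by line; "i" ignores case,
    // and \r? tolerates both CRLF and bare LF line endings.
    $pattern = '/^' . preg_quote($name, '/') . ':[ \t]*(.*?)[ \t]*\r?$/mi';
    preg_match_all($pattern, $rawHeaders, $matches);

    return $matches[1];
}

// Made-up headers for two consecutive redirects.
$raw = "HTTP/1.1 301 Moved Permanently\r\n"
     . "Location: http://example.com/a\r\n"
     . "Content-Length: 0\r\n\r\n"
     . "HTTP/1.1 302 Found\r\n"
     . "location: http://example.com/b\r\n\r\n";

print_r(extractHeaderValues($raw, 'Location'));
```

The last element of the returned array is the most recent redirect target seen in the response.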

How to get the URL of a download link

I am trying to parse a page which contains some links. These links, if followed, will redirect to some files to download.
For example, a Download link which redirects to http://example.com/1.pdf.
I don't want to download the file, I just want to get the file link (in this case http://example.com/1.pdf).
I am trying this:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, FALSE); // Return in string
curl_setopt($ch, CURLOPT_URL, $url);
curl_exec($ch);
var_dump(curl_getinfo($ch));
But, it gives me the file contents.
Does anyone have any idea how to do this?
==EDIT==
Thank you guys. I solved it like this:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLINFO_HEADER_OUT, TRUE);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_NOBODY, TRUE);
curl_exec($ch);
$info = curl_getinfo($ch);
Now, $info contains the header and I can get the link from it.
The reason the output is being sent to the screen is because you're telling cURL to do so. If you want to store the response in a variable the following line:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, FALSE);
should read:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
Then, actually retrieve the returned output from curl_exec like so:
$output = curl_exec($ch);
Once you have the returned HTML content from the remote page in the $output variable, you can use DOMDocument or regex (but preferably DOM) to parse out any information you want.
UPDATE
I can't tell because the question is vaguely worded: is there actually a Location header redirect happening? If so, you'll want to do as @heiko suggests to prevent cURL from following the redirect and retrieve the headers. Then you can easily parse the contents of the Location header:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
curl_setopt($ch, CURLOPT_HEADER, TRUE); // add header output
# make sure not to follow the Location header
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
# add the response headers to the output, so that you can find the Location header in there
curl_setopt($ch, CURLOPT_HEADER, TRUE);
Set CURLOPT_RETURNTRANSFER to 1, and use htmlentities() if you want to display the HTML source on your page; otherwise just echo the variable to render the page itself (in this example, Google).
<?php
$url = "http://www.google.co.in";
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // Return in string
curl_setopt($ch, CURLOPT_URL, $url);
$varx = curl_exec($ch);
echo htmlentities($varx);
?>
With the $varx variable, use regular expressions to match the data you want.
