For a little while now I've been searching for a code to get URL's out of a string using PHP. I'm basically trying to get a Shortened URL out of a message, and then later do a HEAD request to find the actual link.
Anyone have any code that returns URLs from strings?
Thanks in advance.
Edit for Ghost Dog:
Here is a sample of what I am parsing:
$test = "I am testing this application for http://test.com YAY!";
And here is the response I got that solved it:
$regex = '$\b(https?|ftp|file)://[-A-Z0-9+&##/%?=~_|!:,.;]*[-A-Z0-9+&##/%=~_|]$i';
preg_match_all($regex, $string, $result, PREG_PATTERN_ORDER);
$A = $result[0];
foreach($A as $B)
{
$URL = GetRealURL($B);
echo "$URL<BR>";
}
function GetRealURL( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => "",
CURLOPT_USERAGENT => "spider",
CURLOPT_AUTOREFERER => true,
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_MAXREDIRS => 10,
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
return $header['url'];
}
See Answer for the Details.
This code may be helpful (see MadTechie's latest post):
http://www.phpfreaks.com/forums/index.php/topic,245248.msg1146218.html#msg1146218
<?php
$string = "some random text http://tinyurl.com/9uxdwc some http://google.com random text http://tinyurl.com/787988";
$regex = '$\b(https?|ftp|file)://[-A-Z0-9+&##/%?=~_|!:,.;]*[-A-Z0-9+&##/%=~_|]$i';
preg_match_all($regex, $string, $result, PREG_PATTERN_ORDER);
$A = $result[0];
foreach($A as $B)
{
$URL = GetRealURL($B);
echo "$URL<BR>";
}
function GetRealURL( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => "",
CURLOPT_USERAGENT => "spider",
CURLOPT_AUTOREFERER => true,
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_MAXREDIRS => 10,
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
return $header['url'];
}
?>
Something like:
$matches = array();
preg_match_all('/http:\/\/[a-zA-Z0-9.-]+\/[a-zA-Z0-9.-]+/', $text, $matches);
print_r($matches);
You'll need to tune the regexp to get exactly what you want.
To get the URL out, consider something as simple as:
curl -I http://url.com/path | grep Location: | awk '{print $2}'
Related
I'm having problem with cURL, it's returning no result or "Object not found!"
I know the site actually provide their own API, but i need more information to be grabbed.
How to solve this problem ?
here's the code
<?php
$URL = "http://myanimelist.net/manga.php?q=nisekoi";
//Initl curl
function get_web_page( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
echo get_web_page($URL)['content'];
?>
I'm using curreny codes on my site. With the following:
<?php
$contents = file_get_contents('http://www.tcmb.gov.tr/kurlar/today.html');
$contents = iconv("windows-1254" , "utf8" , $contents);
$dollar = preg_match('~ABD DOLARI\s+(.+?)\s+(.+?)\s+~i', $contents, $matches) ? array('buying' => $matches[1], 'selling' => $matches[2]) : '';
$euro = preg_match('~EURO\s+(.+?)\s+(.+?)\s+~i', $contents, $matches) ? array('buying' => $matches[1], 'selling' => $matches[2]) : '';
$gbp = preg_match('~İNGİLİZ STERLİNİ\s+(.+?)\s+(.+?)\s+~i', $contents, $matches) ? array('buying' => $matches[1], 'selling' => $matches[2]) : '';
$chf = preg_match('~İSVİÇRE FRANGI\s+(.+?)\s+(.+?)\s+~i', $contents, $matches) ? array('buying' => $matches[1], 'selling' => $matches[2]) : '';
echo '
<table class="form" style="background:#fff;width:300px;margin-left:14px;">
<tr style="border-bottom:1px solid #e4e4e4;">
..
But today my site is give error:
Warning: eval() (/var/www/vhosts/mysite.com/httpdocs/modules/php/php.module(80) : eval()'d code dosyasının 2 satırı) içinde file_get_contents() [function.file-get-contents]: http:// wrapper is disabled in the server configuration by allow_url_fopen=0.
I did ask my to hosting support about this problem and they are say:
"Don't use fopen option, please use 'fsockopen'" But i don't know how can i do this?
Plese help me. Thanks.
Use curl instead then. A function to replace file_get_contents from a remote server is:
function get_web_page( $url ) {
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
//If you want error information, 'return $header;' instead.
return $content;
}
From there change $contents = file_get_contents('http://www.tcmb.gov.tr/kurlar/today.html'); to $contents = get_web_page('http://www.tcmb.gov.tr/kurlar/today.html');
I'm trying to understand how getting an access_token works with CURL.
$url = "https://api.com/oauth/access_token";
$access_token_parameters = array(
'client_id' => '',
'client_secret' => '',
'grant_type' => '',
'redirect_uri' => '',
'code' => $_GET['code']
);
$curl = curl_init($url);
curl_setopt($curl,CURLOPT_POST,true);
curl_setopt($curl,CURLOPT_POSTFIELDS,$access_token_parameters);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($curl);
curl_close($curl);
I'm supposed to get a JSON string with this right?
I tried a various of things to output the string
$test = json_decode($result);
print_r($test);
$arr = json_encode($result,true);
foreach($arr as $val){
echo $val['access_token'];
}
Am I doing this wrong?
I believe the correct JSON output should be something like this :
{
"access_token": "fb2e77d.47a0479900504cb3ab4a1f626d174d2d",
"user": {
"id": "1574083",
"username": "snoopdogg",
"full_name": "Snoop Dogg",
"profile_picture": "http://distillery.s3.amazonaws.com/profiles/profile_1574083_75sq_1295469061.jpg"
}
}
But this is not working? I'm trying to get the access_token from the server.
Any help would be appreciated! Thank you
Try use below function to get content
$response = get_web_page($url);
$resArr = array();
$resArr = json_decode($response);
//echo"<pre>"; print_r($resArr); echo"</pre>";
function get_web_page($url) {
$options = array (CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle compressed
CURLOPT_USERAGENT => "test", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10 ); // stop after 10 redirects
$ch = curl_init ( $url );
curl_setopt_array ( $ch, $options );
$content = curl_exec ( $ch );
$err = curl_errno ( $ch );
$errmsg = curl_error ( $ch );
$header = curl_getinfo ( $ch );
$httpCode = curl_getinfo ( $ch, CURLINFO_HTTP_CODE );
curl_close ( $ch );
$header ['errno'] = $err;
$header ['errmsg'] = $errmsg;
$header ['content'] = $content;
return $header ['content'];
}
I have the following JSON code:
{"username":"user1","password":"123456"}
That I need to pass to a url, lets say: http://api.mywebsite.com
I'm an extreme php newb, so I've been following a curl tutorial, but here is my current PHP code:
<?php
function get_web_page( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => true, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle compressed
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
?>
You might want to look into the CURLOPT_POSTFIELDS AND CURLOPT_POST options. These allow you to do a POST request and pass the data set into the CURLOPT_POSTFIELDS in the request.
Something in the lines of this:
$body = 'bar=1&foo=2&baz=3';
$c = curl_init ($url);
curl_setopt ($c, CURLOPT_POST, true);
curl_setopt ($c, CURLOPT_POSTFIELDS, $body);
curl_setopt ($c, CURLOPT_RETURNTRANSFER, true);
When you want to use normal GET Params:
$jsonString ='{"username":"user1","password":"123456"}';
$params = json_decode($jsonString);
$getParams = '';
$first = true;
foreach ($params as $key => $param){
if ($first){
$getParams .= '?';
$first = false;
} else{
$getParams .= '&';
}
$getParams .= $key .'=' .$param;
}
echo $getParams;
get_web_page($url . $getParams);
I'm using file_get_contents() to grab content from a site, and amazingly it works even if the URL I pass as argument redirects to another URL.
The problem is I need to know the new URL, is there a way to do that?
If you need to use file_get_contents() instead of curl, don't follow redirects automatically:
$context = stream_context_create(
array(
'http' => array(
'follow_location' => false
)
)
);
$html = file_get_contents('http://www.example.com/', false, $context);
var_dump($http_response_header);
Answer inspired by: How do I ignore a moved-header with file_get_contents in PHP?
You might make a request with cURL instead of file_get_contents().
Something like this should work...
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, FALSE);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$a = curl_exec($ch);
if(preg_match('#Location: (.*)#', $a, $r))
$l = trim($r[1]);
Source
Everything in one function:
function get_web_page( $url ) {
$res = array();
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // do not return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$res['content'] = $content;
$res['url'] = $header['url'];
return $res;
}
print_r(get_web_page("http://www.example.com/redirectfrom"));
A complete solution using the bare file_get_contents (note the in-out $url parameter):
function get_url_contents_and_final_url(&$url)
{
do
{
$context = stream_context_create(
array(
"http" => array(
"follow_location" => false,
),
)
);
$result = file_get_contents($url, false, $context);
$pattern = "/^Location:\s*(.*)$/i";
$location_headers = preg_grep($pattern, $http_response_header);
if (!empty($location_headers) &&
preg_match($pattern, array_values($location_headers)[0], $matches))
{
$url = $matches[1];
$repeat = true;
}
else
{
$repeat = false;
}
}
while ($repeat);
return $result;
}
Note that this works only with an absolute URL in the Location header. If you need to support relative URLs, see
PHP: How to resolve a relative url.
For example, if you use the solution from the answer by #Joyce Babu, replace:
$url = $matches[1];
with:
$url = getAbsoluteURL($matches[1], $url);
I use get_headers($url, 1);
In my case redirect url in get_headers($url, 1)['Location'][1];