How to check if a given value is a valid URL - php

I need a function to check whether a given value is a URL.
I have code:
<?php
$string = get_from_db();
list($name, $url) = explode(": ", $string);
if (is_url($url)) {
$link = array('name' => $name, 'link' => $url);
} else {
$text = $string;
}
// Make some things
?>

If you're running PHP 5 (and you should be!), just use filter_var():
function is_url($url)
{
return filter_var($url, FILTER_VALIDATE_URL) !== false;
}
Addendum: as the PHP manual entry for parse_url() (and @Liutas in his comment) points out:
This function is not meant to validate the given URL, it only breaks it up into the above listed parts. Partial URLs are also accepted, parse_url() tries its best to parse them correctly.
For example, parse_url() considers a query string as part of a URL. However, a query string is not entirely a URL. The following line of code:
var_dump(parse_url('foo=bar&baz=what'));
Outputs this:
array(1) {
["path"]=>
string(16) "foo=bar&baz=what"
}
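To illustrate the difference between the two functions, here is a minimal sketch. The helper name is_http_url() is hypothetical (not from the answer above), and it assumes you only want to accept http and https links. It is needed because FILTER_VALIDATE_URL also accepts other schemes such as ftp://.

```php
<?php
// Hypothetical helper: accept only syntactically valid http(s) URLs.
function is_http_url($url)
{
    // filter_var() returns false when the string is not a valid URL
    if (!is_string($url) || filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // FILTER_VALIDATE_URL allows any scheme, so check the scheme separately
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return in_array($scheme, array('http', 'https'), true);
}
```

With this, a bare query string such as 'foo=bar&baz=what' is rejected, even though parse_url() would happily parse it.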

Use parse_url() and check for false:
<?php
$url = 'http://username:password@hostname/path?arg=value#anchor';
print_r(parse_url($url));
echo parse_url($url, PHP_URL_PATH);
?>
The above example will output:
Array
(
[scheme] => http
[host] => hostname
[user] => username
[pass] => password
[path] => /path
[query] => arg=value
[fragment] => anchor
)
/path

You can check whether parse_url() returns false.
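Keep in mind that parse_url() also accepts partial URLs, so a check for false alone lets strings like a bare query string through. A rough sketch of a stricter check, assuming you require both a scheme and a host (the function name has_scheme_and_host() is made up for this example):

```php
<?php
// Hypothetical helper: require that parse_url() found both a scheme and a host.
function has_scheme_and_host($url)
{
    $parts = parse_url($url);
    // parse_url() returns false only for seriously malformed input,
    // so also check that the components we care about are present
    return is_array($parts) && isset($parts['scheme'], $parts['host']);
}
```

This is still not real validation, only a sanity check on the parsed components.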

Related

Having trouble with preg_match pinterest username url

Please help. The statement I am using to match a Pinterest username URL is
$url = http://pinterest.com/username
preg_match("|^http(s)?://pinterest.com/(.*)?$|i", $url);
but the preg_match result is 0.
You are missing the third parameter of the preg_match function.
$url = "http://pinterest.com/username";
preg_match("|^http(s)?://pinterest.com/(.*)?$|i", $url, $match);
print_r($match);
results in
Array
(
[0] => http://pinterest.com/username
[1] =>
[2] => username
)
Or in an if statement:
$url = "http://pinterest.com/username";
if (preg_match("|^http(s)?://pinterest.com/(.*)?$|i", $url, $match)) {
// true
}
<?php
$url = "http://pinterest.com/username";
if(preg_match("|^http(s)?://pinterest.com/(.*)?$|i", $url)){
echo "true";
}
else{
echo "false";
}
?>
output:
true
What else do you want?
No one mentioned that the dot needs to be escaped.
So more correct code would be something like this:
$url = "https://pinterest.com/username";
preg_match("|(?:https?://)?(?:www\.)?pinterest\.com/([^/]+)/?|i", $url, $match);
It captures the username in $match[1]. I don't know Pinterest's rules for usernames, so I just match everything between the slashes.
It will work with links like:
https://pinterest.com/username/
https://www.pinterest.com/username
pinterest.com/username
and others.
Don't use this regular expression for validation.
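For completeness, here is a sketch that wraps the extraction in a function. The name pinterest_username() is hypothetical, and the pattern assumes a username is simply everything up to the next slash, question mark or hash, which may not match Pinterest's real rules.

```php
<?php
// Hypothetical helper: return the first path segment of a pinterest.com URL, or null.
function pinterest_username($url)
{
    // scheme and "www." are optional; the dot in the host is escaped
    $pattern = '~^(?:https?://)?(?:www\.)?pinterest\.com/([^/?#]+)/?$~i';
    if (preg_match($pattern, $url, $match) === 1) {
        return $match[1];
    }
    return null;
}
```

Using ~ as the delimiter avoids having to escape the slashes in the pattern.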

php regex preg_match on a variable containing a url

I'm trying to run a regex on a URL to extract all the segments after the host. I can't get it working when the host segment is in a variable, and I'm not sure how to fix it.
// this works
if(preg_match("/^http\:\/\/myhost(\/[a-z0-9A-Z-_\/.]*)$/", $url, $matches)) {
return $matches[2];
}
// this doesn't work
$siteUrl = "http://myhost";
if(preg_match("/^$siteUrl(\/[a-z0-9A-Z-_\/.]*)$/", $url, $matches)) {
return $matches[2];
}
// this doesn't work
$siteUrl = preg_quote("http://myhost");
if(preg_match("/^$siteUrl(\/[a-z0-9A-Z-_\/.]*)$/", $url, $matches)) {
return $matches[2];
}
In PHP, there is a function called parse_url. (Something similar to what you are trying to achieve through your code).
<?php
$url = 'http://username:password@hostname/path?arg=value#anchor';
print_r(parse_url($url));
echo parse_url($url, PHP_URL_PATH);
?>
OUTPUT :
Array
(
[scheme] => http
[host] => hostname
[user] => username
[pass] => password
[path] => /path
[query] => arg=value
[fragment] => anchor
)
/path
You forgot to escape the / in your variable declaration. One quick fix is to change your regex delimiter from / to #. Try:
$siteUrl = "http://myhost";
if(preg_match("#^$siteUrl(\/[a-z0-9A-Z-_\/.]*)$#", $url, $matches)) { //note the hashtags!
return $matches[1]; // the pattern has a single capture group, so the path is in $matches[1]
}
Or without changing the regex delimiter:
$siteUrl = "http:\/\/myhost"; //note how we escaped the slashes
if(preg_match("/^$siteUrl(\/[a-z0-9A-Z-_\/.]*)$/", $url, $matches)) {
return $matches[1];
}
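As for the preg_quote() attempt in the question: preg_quote() does not escape the delimiter unless you pass it as the second argument. A minimal sketch (the function name path_after_host() and the slightly tidied character class are my own choices, not from the question):

```php
<?php
// Hypothetical helper: return the path that follows $siteUrl in $url, or null.
function path_after_host($siteUrl, $url)
{
    // The second argument tells preg_quote() to escape the "/" delimiter as well
    $pattern = '/^' . preg_quote($siteUrl, '/') . '(\/[a-zA-Z0-9_\/.-]*)$/';
    if (preg_match($pattern, $url, $matches) === 1) {
        return $matches[1]; // only one capture group in the pattern
    }
    return null;
}
```

This keeps the / delimiter and works for any host string stored in a variable.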

Extract top domain from string php

I need to extract the domain name out of a string which could be anything. Such as:
$sitelink="http://www.somewebsite.com/product/3749875/info/overview.html";
or
$sitelink="http://subdomain.somewebsite.com/blah/blah/whatever.php";
In any case, I'm looking to extract the 'somewebsite.com' portion (which could be anything), and discard the rest.
With parse_url($url)
<?php
$url = 'http://username:password@hostname/path?arg=value#anchor';
print_r(parse_url($url));
?>
The above example will output:
Array
(
[scheme] => http
[host] => hostname
[user] => username
[pass] => password
[path] => /path
[query] => arg=value
[fragment] => anchor
)
Using those values:
echo parse_url($url, PHP_URL_HOST); //hostname
or
$url_info = parse_url($url);
echo $url_info['host'];//hostname
here it is
<?php
$sitelink="http://www.somewebsite.com/product/3749875/info/overview.html";
$domain_pieces = explode(".", parse_url($sitelink, PHP_URL_HOST));
$l = sizeof($domain_pieces);
$secondleveldomain = $domain_pieces[$l-2] . "." . $domain_pieces[$l-1];
echo $secondleveldomain;
Note that this is probably not the behavior you are looking for, because for hosts like
stackoverflow.co.uk
it will echo "co.uk".
See:
http://publicsuffix.org/learn/
http://www.dkim-reputation.org/regdom-libs/
http://www.dkim-reputation.org/regdom-lib-downloads/ <-- downloads here, php included
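To show why a suffix list fixes the co.uk case, here is a toy sketch. The function registrable_domain() and its three-entry suffix array are purely illustrative; a real solution would load the full Public Suffix List instead of this hand-picked stand-in.

```php
<?php
// Toy sketch: $suffixes is a tiny hand-picked stand-in for the Public Suffix List.
function registrable_domain($host, array $suffixes = array('co.uk', 'com', 'uk'))
{
    $host = strtolower($host);
    // Try the longest suffixes first so "co.uk" wins over "uk"
    usort($suffixes, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    foreach ($suffixes as $suffix) {
        $tail = '.' . $suffix;
        if (substr($host, -strlen($tail)) === $tail) {
            // Keep only the label directly in front of the suffix
            $labels = explode('.', substr($host, 0, -strlen($tail)));
            return end($labels) . $tail;
        }
    }
    return null; // no known suffix found
}
```

The point is only that the suffix has to come from a list; counting dots cannot tell "co.uk" apart from "somewebsite.com".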
Two more complex URLs:
$url="https://www.example.co.uk/page/section/younameit";
or
$url="https://example.co.uk/page/section/younameit";
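To get the host ("www.example.co.uk" for the first URL):
$host=parse_url($url, PHP_URL_HOST);
To get "example.co.uk" only:
// Strip a leading "www." if present. Note that explode('www.', $host)[1] fails when there is no "www.",
// and ltrim($host, 'www.') treats its second argument as a set of characters, so it would also eat the "w" of "web.example.com".
$domain = preg_replace('/^www\./', '', $host);
Whether or not your URL includes "www.", you get the same end result, i.e. "example.co.uk".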
Voilà!
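A note on stripping the "www." prefix: ltrim() takes a list of characters, not a prefix, so it can remove more than intended. A small sketch of a prefix-only version (strip_www() is a hypothetical name):

```php
<?php
// Hypothetical helper: remove a leading "www." and nothing else.
function strip_www($host)
{
    // Compare only the first four characters, case-insensitively
    if (strncasecmp($host, 'www.', 4) === 0) {
        return substr($host, 4);
    }
    return $host;
}
```

For comparison, ltrim('web.example.com', 'www.') gives 'eb.example.com', because every leading "w" and "." is trimmed.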
You need a package that uses the Public Suffix List; only this way can you correctly extract domains with second- and third-level suffixes (co.uk, a.bg, b.bg, etc.) and multilevel subdomains. Regex, parse_url() or string functions will never produce a fully correct result.
I recommend TLD Extract. Here is example code:
$extract = new LayerShifter\TLDExtract\Extract();
$result = $extract->parse('http://www.somewebsite.com/product/3749875/info/overview.html');
$result->getSubdomain(); // will return (string) 'www'
$result->getHostname(); // will return (string) 'somewebsite'
$result->getSuffix(); // will return (string) 'com'
$result->getRegistrableDomain(); // will return (string) 'somewebsite.com'
For a string that could be anything, new approach:
function extract_plain_domain($text) {
$text=trim($text,"/");
$text=strtolower($text);
$parts=explode("/",$text);
if (substr_count($parts[0],"http")) {
$parts[0]="";
}
foreach ($parts as $val) { // each() was removed in PHP 8, so iterate with foreach
if (!empty($val)) { $text=$val; break; }
}
$parts=explode(".",$text);
if (empty($parts[2])) {
return $parts[0].".".$parts[1];
} else {
$num_parts=count($parts);
return $parts[$num_parts-2].".".$parts[$num_parts-1];
}
} // end function extract_plain_domain
You can use the Utopia Domains library (https://github.com/utopia-php/domains). It returns the domain TLD and public suffix based on the Mozilla Public Suffix List (https://publicsuffix.org), and can be used as an alternative to the now-archived TLDExtract package.
You can use the parse_url function to get the hostname from your URL and then use the Utopia Domains parser to get the correct TLD and join it with the domain name:
<?php
require_once './vendor/autoload.php';
use Utopia\Domains\Domain;
$url = 'http://demo.example.co.uk/site';
$domain = new Domain(parse_url($url, PHP_URL_HOST)); // demo.example.co.uk
var_dump($domain->get()); // demo.example.co.uk
var_dump($domain->getTLD()); // uk
var_dump($domain->getSuffix()); // co.uk
var_dump($domain->getName()); // example
var_dump($domain->getSub()); // demo
var_dump($domain->isKnown()); // true
var_dump($domain->isICANN()); // true
var_dump($domain->isPrivate()); // false
var_dump($domain->isTest()); // false
var_dump($domain->getName().'.'.$domain->getSuffix()); // example.co.uk

Parsing GET request parameters in a URL that contains another URL

Here is the url:
http://localhost/test.php?id=http://google.com/?var=234&key=234
And I can't get the full value from $_GET['id'] or $_REQUEST['id'].
<?php
print_r($_REQUEST['id']);
// And this is the output: http://google.com/?var=234
// the &key=234 part is not shown
?>
$get_url = "http://google.com/?var=234&key=234";
$my_url = "http://localhost/test.php?id=" . urlencode($get_url);
$my_url outputs:
http://localhost/test.php?id=http%3A%2F%2Fgoogle.com%2F%3Fvar%3D234%26key%3D234
So now you can get this value using $_GET['id'] or $_REQUEST['id']. PHP has already decoded these values, so no urldecode() call is needed.
echo $_GET["id"];
Output
http://google.com/?var=234&key=234
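As an alternative to concatenating the string by hand, http_build_query() encodes the values for you. A short sketch reusing the variable names from the answer above ($params is a new name for this example):

```php
<?php
$get_url = 'http://google.com/?var=234&key=234';

// http_build_query() URL-encodes each value, so the inner "?" and "&" are escaped
$my_url = 'http://localhost/test.php?' . http_build_query(array('id' => $get_url));

// Reading it back: parse_str() decodes the query string into an array
parse_str(parse_url($my_url, PHP_URL_QUERY), $params);
```

After this, $params['id'] holds the original inner URL, including the &key=234 part.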
To get every GET parameter:
foreach ($_GET as $key=>$value) {
echo "$key = $value<br />\n"; // values in $_GET are already URL-decoded
}
$key is the GET key and $value is the GET value for $key.
Alternatively, you can build the array of GET parameters yourself:
$get_parameters = array();
if (isset($_SERVER['QUERY_STRING'])) {
$pairs = explode('&', $_SERVER['QUERY_STRING']);
foreach($pairs as $pair) {
$part = explode('=', $pair);
$get_parameters[$part[0]] = sizeof($part)>1 ? urldecode($part[1]) : "";
}
}
$get_parameters is the same as the URL-decoded $_GET.
When creating the URL, encode the value with urlencode:
$val=urlencode('http://google.com/?var=234&key=234');
Click here
and when fetching, decode it with urldecode.
You may have to use urlencode on the string 'http://google.com/?var=234&key=234'.
I had a similar problem and ended up using parse_url and parse_str. As long as the URL in the parameter is correctly URL-encoded (which it definitely should be), this lets you access both the parameters of the actual URL and the parameters of the encoded URL in the query parameter, like so:
$get_url = "http://google.com/?var=234&key=234";
$my_url = "http://localhost/test.php?id=" . urlencode($get_url);
function so_5645412_url_params($url) {
$url_comps = parse_url($url);
$query = $url_comps['query'];
$args = array();
parse_str($query, $args);
return $args;
}
$my_url_args = so_5645412_url_params($my_url); // Array ( [id] => http://google.com/?var=234&key=234 )
$get_url_args = so_5645412_url_params($my_url_args['id']); // Array ( [var] => 234, [key] => 234 )
You are using reserved characters such as ? and & in the parameter value.
Change the code so that they are encoded.
See these links:
http://antoine.goutenoir.com/blog/2010/10/11/php-slugify-a-string/
http://sourcecookbook.com/en/recipes/8/function-to-slugify-strings-in-php
You can also use urlencode:
$val=urlencode('http://google.com/?var=234&key=234');
The correct php way is to use parse_url()
http://php.net/manual/en/function.parse-url.php
(from php manual)
This function parses a URL and returns an associative array containing any of the various components of the URL that are present.
This function is not meant to validate the given URL, it only breaks it up into the above listed parts. Partial URLs are also accepted, parse_url() tries its best to parse them correctly.
// REQUEST_URI already contains the query string, so QUERY_STRING must not be appended again
if (isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off'){
echo "https://$_SERVER[HTTP_HOST]$_SERVER[REQUEST_URI]";
}else{
echo "http://$_SERVER[HTTP_HOST]$_SERVER[REQUEST_URI]";
}

Get keyword from a (search engine) referrer url using PHP

I am trying to get the search keyword from a referrer url. Currently, I am using the following code for Google urls. But sometimes it is not working...
$query_get = "(q|p)";
$referrer = "http://www.google.com/search?hl=en&q=learn+php+2&client=firefox";
preg_match('/[?&]'.$query_get.'=(.*?)[&]/',$referrer,$search_keyword);
Is there another/clean/working way to do this?
Thank you,
Prasad
If you're using PHP5 take a look at http://php.net/parse_url and http://php.net/parse_str
Example:
// The referrer
$referrer = 'http://www.google.com/search?hl=en&q=learn+php+2&client=firefox';
// Extract the query string from the URL
$parsed = parse_url( $referrer, PHP_URL_QUERY );
// Parse the query string into an array
parse_str( $parsed, $query );
// Output the result
echo $query['q'];
Different search engines use different query string parameters. After trying Wiliam's method, I figured out my own. (Yahoo mostly uses 'p', but sometimes 'q'.)
$referrer = "http://search.yahoo.com/search?p=www.stack+overflow%2Ccom&ei=utf-8&fr=slv8-msgr&xargs=0&pstart=1&b=61&xa=nSFc5KjbV2gQCZejYJqWdQ--,1259335755";
$referrer_query = parse_url($referrer);
$referrer_query = $referrer_query['query'];
$q = "(?:q|p)"; //Yahoo uses both query strings, I am using switch() for each search engine
preg_match('/(?:^|&)'.$q.'=(.*?)(?:&|$)/',$referrer_query,$keyword);
$keyword = urldecode($keyword[1]);
echo $keyword; //Outputs "www.stack overflow,com"
Thank you,
Prasad
To supplement the other answers, note that the query string parameter that contains the search terms varies by search provider. This snippet of PHP shows the correct parameter to use:
$search_engines = array(
'q' => 'alltheweb|aol|ask|ask|bing|google',
'p' => 'yahoo',
'wd' => 'baidu',
'text' => 'yandex'
);
Source: http://betterwp.net/wordpress-tips/get-search-keywords-from-referrer/
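One way the table above could be put to use is sketched below. The function search_keyword() is hypothetical, and matching the engine name as a label of the referrer's host is my own assumption about how to apply the map.

```php
<?php
$search_engines = array(
    'q'    => 'alltheweb|aol|ask|ask|bing|google',
    'p'    => 'yahoo',
    'wd'   => 'baidu',
    'text' => 'yandex'
);

// Hypothetical helper: pick the query parameter that belongs to the referrer's host.
function search_keyword($referrer, array $search_engines)
{
    $host  = parse_url($referrer, PHP_URL_HOST);
    $query = parse_url($referrer, PHP_URL_QUERY);
    if (!is_string($host) || !is_string($query)) {
        return null; // no host or no query string in the referrer
    }
    parse_str($query, $params); // also URL-decodes the values
    foreach ($search_engines as $param => $engines) {
        // Match the engine name as a whole label of the host, e.g. ".google."
        $is_engine = preg_match('/(?:^|\.)(?:' . $engines . ')\./i', $host) === 1;
        if ($is_engine && isset($params[$param]) && is_string($params[$param])) {
            return $params[$param];
        }
    }
    return null;
}
```

A referrer without a query string simply yields null.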
<?php
class GET_HOST_KEYWORD
{
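public function get_host_and_keyword($_url) {
$chunk_url = parse_url($_url);
$_data["host"] = isset($chunk_url['host']) ? $chunk_url['host'] : '';
// parse_str() requires the result array as of PHP 8; the one-argument form was removed
$params = array();
parse_str(isset($chunk_url['query']) ? $chunk_url['query'] : '', $params);
$_data["keyword"] = !empty($params['p']) ? $params['p'] : (!empty($params['q']) ? $params['q'] : '');
return $_data;
}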
}
// Sample Example
$obj = new GET_HOST_KEYWORD();
print_r($obj->get_host_and_keyword('http://www.google.co.in/search?sourceid=chrome&ie=UTF-&q=hire php php programmer'));
// sample output
//Array
//(
// [host] => www.google.co.in
// [keyword] => hire php php programmer
//)
// $search_engines = array(
// 'q' => 'alltheweb|aol|ask|ask|bing|google',
// 'p' => 'yahoo',
// 'wd' => 'baidu',
// 'text' => 'yandex'
//);
?>
$query = parse_url($request, PHP_URL_QUERY);
This one should work for Google, Bing and, sometimes, Yahoo Search:
if( isset($_SERVER['HTTP_REFERER']) && $_SERVER['HTTP_REFERER']) {
$query = getSeQuery($_SERVER['HTTP_REFERER']);
echo $query;
} else {
echo "I think they spelled REFERER wrong? Anyways, your browser says you don't have one.";
}
function getSeQuery($url = false) {
$segments = parse_url($url);
$keywords = null;
if($query = isset($segments['query']) ? $segments['query'] : (isset($segments['fragment']) ? $segments['fragment'] : null)) {
parse_str($query, $segments);
$keywords = isset($segments['q']) ? $segments['q'] : (isset($segments['p']) ? $segments['p'] : null);
}
return $keywords;
}
I believe Google and Yahoo have changed their behavior to exclude search keywords and other parameters from the referrer URL, so they can no longer be retrieved via the HTTP referrer.
Please let me know if the above recommendations still provide the search keywords.
Below is what I now receive as the HTTP referrer on my website.
from google: https://www.google.co.in/
from yahoo: https://in.yahoo.com/
Ref: https://webmasters.googleblog.com/2012/03/upcoming-changes-in-googles-http.html
