PHP check if url is valid - php

What is the best way in PHP to check whether a provided URL is valid? At first I tried:
filter_var($url, FILTER_VALIDATE_URL) === false
But it does not accept www.example.com (without protocol). So I tried with a simple modification:
protected function checkReferrerUrl($url) {
if(strpos($url, '://') === false) {
$url = "http://".$url;
}
if(filter_var($url, FILTER_VALIDATE_URL) === false) {
return false;
}
return true;
}
Now it works fine with www.example.com, but it also accepts a plain foo, because that becomes http://foo. I don't think that is a valid public URL, though. So what would you suggest? Go back to a traditional regex?

I recommend that you do not use filter_var with the URL filter.
It has many side effects.
For example, these are valid URLs according to filter_var:
http://example.com/"><script>alert(document.cookie)</script>
http://example.ee/sdsf"f
Additionally FILTER_VALIDATE_URL does not support internationalized domain names (IDN).
I recommend using a regex combined with some additional checks afterwards (for example, on the domain) for security reasons.
When security is not a concern, I use parse_url to extract the parts. But this function has a similar issue when the scheme (http/https) is missing.
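As a minimal sketch of the "validate, then add your own checks" idea above: the helper name is_safe_http_url() and the exact list of rejected characters are assumptions for illustration, not part of the answer. It runs filter_var first, then restricts the scheme and rejects characters that are dangerous when the URL is echoed into HTML.

```php
<?php
// Hypothetical helper (not from the answer above): layered URL check.
function is_safe_http_url(string $url): bool
{
    // Step 1: basic syntax check with the built-in filter.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    // Step 2: only allow http and https (assumption: other schemes are unwanted).
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!is_string($scheme) || !in_array(strtolower($scheme), ['http', 'https'], true)) {
        return false;
    }

    // Step 3: reject quotes, angle brackets and whitespace, which should be
    // percent-encoded in a well-formed URL anyway.
    return preg_match('/["\'<>\s]/', $url) !== 1;
}
```

Even with a check like this, it is still worth escaping the URL with htmlspecialchars() when printing it into a page.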

Use this
<?php
$url = 'www.example.com';
if(validateURL($url)){
echo "Valid";
}else{
echo "invalid";
}
function validateURL($URL) {
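// The dot before the TLD is escaped, a bare domain such as example.com is allowed,
// and the end anchor sits after the optional port and trailing slash.
$pattern_1 = "/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)*\.(com|org|net|dk|at|us|tv|info|uk|co\.uk|biz|se)(:(\d+))?\/?$/i";
$pattern_2 = "/^(www)(\.[A-Z0-9][A-Z0-9_-]*)+\.(com|org|net|dk|at|us|tv|info|uk|co\.uk|biz|se)(:(\d+))?\/?$/i";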
if(preg_match($pattern_1, $URL) || preg_match($pattern_2, $URL)){
return true;
} else{
return false;
}
}
?>
Try this one too
<?php
// Assign URL to $URL variable
$url = 'http://example.com';
// Check url using preg_match
if (preg_match("/^(https?:\/\/+[\w\-]+\.[\w\-]+)/i",$url)){
echo "Valid";
}else{
echo "invalid";
}
?>

Related

php filter_var and FILTER_FLAG_SCHEME_REQUIRED

How do I tell PHP 7.1.3 that the scheme is NOT required when I use filter_var to validate a URL? For example, www.google.com is a completely valid URL to me, but filter_var returns false for it. I see that I can add flags, but not how to remove them.
<?php
$url = 'www.google.com';
$sanitized = filter_var($url, FILTER_SANITIZE_URL);
$valid = filter_var($sanitized, FILTER_VALIDATE_URL);
if ($valid) {
echo "Yes\n";
} else {
echo "'$sanitized' is not valid\n";
}
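FILTER_VALIDATE_URL always requires a scheme, so one common workaround is to add a default scheme before validating. The following is only a sketch under that assumption; the helper name normalize_url() and the default of http are made up for illustration.

```php
<?php
// Hypothetical helper: prepend a default scheme, then validate.
// Returns the normalized URL, or null when it does not validate.
function normalize_url(string $input, string $defaultScheme = 'http'): ?string
{
    $url = trim($input);

    // Only prepend a scheme when the input does not already start with one.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = $defaultScheme . '://' . $url;
    }

    return filter_var($url, FILTER_VALIDATE_URL) === false ? null : $url;
}

$result = normalize_url('www.google.com');
echo $result === null ? "Not valid\n" : "Yes: $result\n";
```

Returning the normalized string rather than a boolean lets the caller store or redirect to a URL that actually carries a scheme.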

PHP - FILTER_VALIDATE_URL not finding subdomains with underscore

Why does PHP's FILTER_VALIDATE_URL filter consider a URL whose subdomain contains an underscore invalid?
<?php
$url = "http://smiling_politely.blogspot.com";
if (!filter_var($url, FILTER_VALIDATE_URL) === false) {
echo("$url is a valid URL");
} else {
echo("$url is not a valid URL");
}
?>
How can I make sure that the validation accepts such existing URLs (ideally with the fastest possible execution)?
Ok, I came up with this solution, hopefully it's going to work well..
<?php
$url = "http://smiling_politely.blogspot.com";
$check = parse_url($url,PHP_URL_HOST);
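// parse_url() returns null when there is no host and false for a seriously malformed URL.
if(is_string($check) && $check !== '') echo 'Valid'; else echo 'NOT valid.';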
?>
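The parse_url check above only confirms that a host was found. A slightly stricter sketch, assuming you want to accept underscores inside host labels but still require a scheme and a dotted host, could look like this. The function name and the label pattern are assumptions, not a standard.

```php
<?php
// Hypothetical helper: URL check with a lenient host pattern that allows underscores.
function is_url_with_lenient_host(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }

    // Each label starts and ends with a letter or digit and may contain
    // hyphens or underscores in between; at least two labels are required.
    $label = '[a-z0-9](?:[a-z0-9_-]{0,61}[a-z0-9])?';
    return preg_match('/^' . $label . '(?:\.' . $label . ')+$/i', $parts['host']) === 1;
}

$url = "http://smiling_politely.blogspot.com";
echo is_url_with_lenient_host($url) ? 'Valid' : 'NOT valid.';
```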

This URL validation is not working properly

I have some code taken from another Stack Overflow post;
here it is:
function validate_url($url)
{
$pattern = "/^((ht|f)tp(s?)\:\/\/|~/|/)?([w]{2}([\w\-]+\.)+([\w]{2,5}))(:[\d]{1,5})?/";
if (!preg_match($pattern, $url))
{
$this->form_validation->set_message('validate_url', 'The URL you entered is not correctly formatted.');
return false;
}
return true;
}
It is not working properly: it accepts URLs without a top-level domain such as .com or .in (anything after the dot).
It should accept proper URLs such as
http://something.com or
http://www.something.in
but not
http://something (without .in, .com, or any other suffix),
something or
www.something
I don't know much about regular expressions. Please help.
Take a look at this site:
https://mathiasbynens.be/demo/url-regex
It contains a lot of different URL validation regex.
The one from Diego Perini:
_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]+-?)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,})))(?::\d{2,5})?(?:/[^\s]*)?$_iuS
seems so much better than the one used by filter_var.
You can use filter_var('http://example.com', FILTER_VALIDATE_URL); to validate your URLs.
The PHP manual lists all the validation filters you can use.
FILTER_VALIDATE_URL already requires a scheme such as http(s). If you also want to require a path, add the FILTER_FLAG_PATH_REQUIRED flag:
filter_var('http://example.com', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED)
Use this
<?php
$url = "http://something";
if ((!filter_var($url, FILTER_VALIDATE_URL) === false) && @fopen($url,"r")) {
echo("$url is a valid URL");
} else {
echo("$url is not a valid URL");
}
?>
Or use this
<?php
$url = 'http://example';
if(validateURL($url)){
echo "Valid";
}else{
echo "invalid";
}
function validateURL($URL) {
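// The dot before the TLD is escaped, a bare domain such as example.com is allowed,
// and the end anchor sits after the optional port and trailing slash.
$pattern_1 = "/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)*\.(com|org|net|dk|at|us|tv|info|uk|co\.uk|biz|se)(:(\d+))?\/?$/i";
$pattern_2 = "/^(www)(\.[A-Z0-9][A-Z0-9_-]*)+\.(com|org|net|dk|at|us|tv|info|uk|co\.uk|biz|se)(:(\d+))?\/?$/i";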
if(preg_match($pattern_1, $URL) || preg_match($pattern_2, $URL)){
return true;
} else{
return false;
}
}
?>
You can use following regular expression
$url = 'http://something.com';
$regex = "((https?|ftp)\:\/\/)?"; // SCHEME
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass
$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
$regex .= "(\:[0-9]{2,5})?"; // Port
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?"; // GET Query
$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor
if(preg_match("/^$regex$/", $url)) {
echo "Matched";
} else {
echo "Not Matched";
}

Url validation with regex for old php version

Note: I'm using an older PHP version, so FILTER_VALIDATE_URL is not available.
After many searches I still can't find a single answer that covers all possible URL structures, so for now I'm using the following approach:
1) Function to get proper scheme
function convertUrl ($url){
$pattern = '#^http[s]?://#i';
if(preg_match($pattern, $url) == 1) { // this url has proper scheme
return $url;
} else {
return 'http://' . $url;
}
}
2) Conditional to check if it is a URL or not
if (preg_match("/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i", $url)) {
echo "URL is valid";
}else {
echo "URL is invalid<br>";
}
It works perfectly for all of these cases:
$url = "google.com";
$url = "www.google.com";
$url = "http://google.com";
$url = "http://www.google.com";
$url = "https://google.com";
$url = "https://www.codgoogleekarate.com";
$url = "subdomain.google.com";
$url = "https://subdomain.google.com";
But there is still this edge case:
$url = "blahblahblahblah";
The function convertUrl($url) converts this to $url = "http://blahblahblahblah";
and then the regex considers it a valid URL, although it isn't.
How can I change it so that it rejects a URL with the structure http://blahblahblahblah?
If you want to validate internet URLs, add a check to your regex that requires a dot (.) character.
Note: http://blahblahblah is a valid URL, as is http://localhost.
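One way to sketch the "require a dot" suggestion without filter_var (which the question says is unavailable) is to pull out the host with parse_url and test it separately. The function name has_dotted_host() is hypothetical, and the host pattern is a deliberately simple assumption that rejects dotless hosts such as localhost.

```php
<?php
// Hypothetical helper: accept only URLs whose host contains at least one dot.
// Written without type hints or short array syntax to suit older PHP versions.
function has_dotted_host($url)
{
    // Same idea as convertUrl(): add a scheme when it is missing.
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;
    }

    // Older PHP versions may emit a warning on malformed input, hence the @.
    $parts = @parse_url($url);
    if (!is_array($parts) || empty($parts['host'])) {
        return false;
    }

    // Require non-empty labels on both sides of at least one dot, e.g. "google.com".
    return preg_match('/^[a-z0-9-]+(\.[a-z0-9-]+)+$/i', $parts['host']) === 1;
}
```

With this check, google.com and https://subdomain.google.com pass, while blahblahblahblah is rejected because its host has no dot.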
Try this:
if (preg_match("/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?$/", $url)) {
echo "URL is valid";
}else {
echo "URL is invalid<br>";
}

Compare two strings (urls) for same domain

I'm trying to compare two urls using PHP, ensuring that the domain name is the same. It cannot be the sub-domain. It has to literally be the same domain. Example:
http://www.google.co.uk would validate as true compared to http://www.google.co.uk/pages.html.
but
http://www.google.co.uk would validate as false compared to http://www.something.co.uk/pages.html.
Use parse_url(), and compare the "host" index in the array returned from the two calls to parse_url().
Use parse_url()
$url1 = parse_url("http://www.google.co.uk");
$url2 = parse_url("http://www.google.co.uk/pages.html");
if ($url1['host'] == $url2['host']){
//matches
}
simple, use parse_url()
$url1 = parse_url('http://www.google.co.uk');
$url2 = parse_url('http://www.google.co.uk/pages.html');
if($url1['host'] == $url2['host']){
// same domain
}
You could use parse_url for this
$url1 = parse_url('http://www.google.com/page1.html');
$domain1 = $url1['host'];
$url2 = parse_url('http://www.google.com/page2.html');
$domain2 = $url2['host'];
if($domain1 == $domain2){
// something
}
Expanding the answer given by Ariel, the code you could use is similar to the following one:
<?php
compare_host('http://www.google.co.uk', 'http://www.something.co.uk/pages.html');
function compare_host($url1, $url2)
{
// PHP versions prior to 5.3.3 emit a warning if the URL parsing fails, hence the @.
$info = @parse_url($url1);
// Also covers URLs that parse but have no host component.
if (empty($info['host'])) {
return FALSE;
}
$host1 = $info['host'];
$info = @parse_url($url2);
if (empty($info['host'])) {
return FALSE;
}
return (strtolower($host1) === strtolower($info['host']));
}
