I know this question has been asked many times before but none of the options seem to work for me (including the regexes and php's filter_var). I'm looking for a way to make sure a URL is a valid possibility before continuing, below are examples of valid URLs
google.com
www.google.com
http://www.google.com
https://www.google.com
http://google.com
https://google.com
domain-site.com
google.net
google.ca
google.email
IPs (ex. 198.XX.XXX.XXX)
Here are invalid URLs
test
http://test
https://test
Below are links/things I've tried to get the result I want above:
PHP regex for url validation, filter_var is too permisive
filter_var($url_new, FILTER_VALIDATE_URL)
preg_match('#^((https?)://(?:([a-z0-9-.]+:[a-z0-9-.]+)#)?([a-z0-9-.]+)(?::([0-9]+))?)(?:/|$)((?:[^?/]*/)*)([^?]*)(?:\?([^\#]*))?(?:\#.*)?$#i', $toLoad, $tmp)
etc
I know I can cURL to check if a domain exists and this may be the easiest option but its also very slow, I would prefer a regex type solution where I can get a preliminary check of the domain's validity.
Any help is appreciated!
Related
I'm aware of filter_var() and it's FILTER_VALIDATE_URL filter. The point is there are some URLs which are exist but not count as a valid URL and I need to verify them. For example, URLs with spaces.
At the moment I am checking only those protocols that application is interested in (http, https and ftp) using strpos().
But I was wondering if there is a more generic method in PHP that I could employ?
It might help if i explain that i need to differentiate if the target source is a URL or a local path.
Use function parse_url() to split the URL into components then do some basic analysis on the pieces it returns (or just check if the returned value is an array() or FALSE).
As the documentation says:
This function is not meant to validate the given URL, it only breaks it up into the above listed parts. Partial URLs are also accepted, parse_url() tries its best to parse them correctly.
And also:
On seriously malformed URLs, parse_url() may return FALSE.
It looks like it matches your request pretty well.
Hi I would like to validate this following urls, so they all would pass with or without http/www part in them as long as there is TLD present like .com, .net, .org etc..
Valid URLs Should Be:
http://www.domain.com
http://domain.com
https://www.domain.com
https://domain.com
www.domain.com
domain.com
To support long tlds:
http://www.domain.com.uk
http://domain.com.uk
https://www.domain.com.uk
https://domain.com.uk
www.domain.com.uk
domain.com.uk
To support dashes (-):
http://www.domain-here.com
http://domain-here.com
https://www.domain-here.com
https://domain-here.com
www.domain-here.com
domain-here.com
Also to support numbers in domains:
http://www.domain1-test-here.com
http://domain1-test-here.com
https://www.domain1-test-here.com
https://domain1-test-here.com
www.domain1-test-here.com
domain-here.com
Also maybe allow even IPs:
127.127.127.127
(but this is extra!)
Also allow dashes (-), forgot to mantion that =)
I've found many functions that validate one or another but not both at same time.
If any one knows good regex for it, please share. Thank you for your help.
For url validation perfect solution.
Above Answer is right but not work on all domains like .me, .it, .in
so please user below for url match:
$pattern = '/(?:https?:\/\/)?(?:[a-zA-Z0-9.-]+?\.(?:[a-zA-Z])|\d+\.\d+\.\d+\.\d+)/';
if(preg_match($pattern, "http://website.in"))
{
echo "valid";
}else{
echo "invalid";
}
When you ignore the path part and look for the domain part only, a simple rule would be
(?:https?://)?(?:[a-zA-Z0-9.-]+?\.(?:com|net|org|gov|edu|mil)|\d+\.\d+\.\d+\.\d+)
If you want to support country TLDs as well you must either supply a complete (current) list or append |.. to the TLD part.
With preg_match you must wrap it between some delimiters
$pattern = ';(?:https?://)?(?:[a-zA-Z0-9.-]+?\.(?:com|net|org|gov|edu|mil)|\d+\.\d+\.\d+\.\d+);';
$index = preg_match($pattern, $url);
Usually, you use /. But in this case, slashes are part of the pattern, so I have chosen some other delimiter. Otherwise I must escape the slashes with \
$pattern = '/(?:https?:\/\/)?(?:[a-zA-Z0-9.-]+?\.(?:com|net|org|gov|edu|mil)|\d+\.\d+\.\d+\.\d+)/';
Don't use a regular expression. Not every problem that involves strings needs to use regexes.
Don't write your own URL validator. URL validation is a solved problem, and there is existing code that has already been written, debugged and testing. In fact, it comes standard with PHP.
Look at PHP's built-in filtering functionality: http://us2.php.net/manual/en/book.filter.php
I think you can use flags for filter_vars.
For FILTER_VALIDATE_URL there is several flags available:
FILTER_FLAG_SCHEME_REQUIRED Requires the URL to contain a scheme
part.
FILTER_FLAG_HOST_REQUIRED Requires the URL to contain a host
part.
FILTER_FLAG_PATH_REQUIRED Requires the URL to contain a path
part.
FILTER_FLAG_QUERY_REQUIRED Requires the URL to contain a query
string.
FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED used by default.
Lets say you want to check for path part and do not want to check for scheme part, you can do something like this (falg is a bitmask):
filter_var($url, FILTER_VALIDATE_URL, ~FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_PATH_REQUIRED)
I know there have been many questions asking for help converting URLs to clickable links in strings, but I haven't found quite what I'm looking for.
I want to be able to match any of the following examples and turn them into clickable links:
http://www.domain.com
https://www.domain.net
http://subdomain.domain.org
www.domain.com/folder
subdomain.domain.net
subdomain.domain.edu/folder/subfolder
domain.net
domain.com/folder
I do not want to match random.stuff.separated.with.periods.
EDIT: Please keep in mind that these URLs need to be found within larger strings of 'normal' text. For example, I want to match 'domain.net' in "Hello! Come check out domain.net!".
I think this could be accomplished with a regex that can determine whether the matching url contains .com, .net, .org, or .edu followed by either a forward slash or whitespace. Other than a user typo, I can't imagine any other case in which a valid URL would have one of those followed by anything else.
I realize there are many valid domain extensions out there, but I don't need to support them all. I can just choose which to support with something like (com|net|org|edu) in the regex. Unfortunately, I'm not skilled enough with regex yet to know how to properly implement this.
I'm hoping someone can help me find a regular expression (for use with PHP's preg_replace) that can match URLs based on just about any text connected by one or more dots and either ending with one of the specified extensions followed by whitespace OR containing one of the specified extensions followed by a slash and possibly folders.
I did several searches and so far have not found what I'm looking for. If there already exists a SO post that answers this, I apologize.
Thanks in advance.
--- EDIT 3 ---
After days of trial and error and some help from SO, here's what works:
preg_replace_callback('#(\s|^)((https?://)?(\w|-)+(\.(\w+|-)*)+(?<=\.net|org|edu|com|cc|br|jp|dk|gs|de)(\:[0-9]+)?(?:/[^\s]*)?)(?=\s|\b)#is',
create_function('$m', 'if (!preg_match("#^(https?://)#", $m[2]))
return $m[1]."".$m[2].""; else return $m[1]."".$m[2]."";'),
$event_desc);
This is a modified version of anubhava's code below and so far seems to do exactly what I want. Thanks!
You can use this regex:
#(\s|^)((?:https?://)?\w+(?:\.\w+)+(?<=\.(net|org|edu|com))(?:/[^\s]*|))(?=\s|\b)#is
Code:
$arr = array(
'http://www.domain.com/?foo=bar',
'http://www.that"sallfolks.com',
'This is really cool site: https://www.domain.net/ isn\'t it?',
'http://subdomain.domain.org',
'www.domain.com/folder',
'Hello! You can visit vertigofx.com/mysite/rocks for some awesome pictures, or just go to vertigofx.com by itself',
'subdomain.domain.net',
'subdomain.domain.edu/folder/subfolder',
'Hello! Check out my site at domain.net!',
'welcome.to.computers',
'Hello.Come visit oursite.com!',
'foo.bar',
'domain.com/folder',
);
foreach($arr as $url) {
$link = preg_replace_callback('#(\s|^)((?:https?://)?\w+(?:\.\w+)+(?<=\.(net|org|edu|com))(?:/[^\s]*|))(?=\s|\b)#is',
create_function('$m', 'if (!preg_match("#^(https?://)#", $m[2]))
return $m[1]."".$m[2].""; else return $m[1]."".$m[2]."";'),
$url);
echo $link . "\n";
OUTPUT:
http://www.domain.com/?foo=bar
http://www.that"sallfolks.com
This is really cool site: https://www.domain.net/ isn't it?
http://subdomain.domain.org
www.domain.com/folder
Hello! You can visit vertigofx.com/mysite/rocks for some awesome pictures, or just go to vertigofx.com by itself
subdomain.domain.net
subdomain.domain.edu/folder/subfolder
Hello! Check out my site at domain.net!
welcome.to.computers
Hello.Come visit oursite.com!
foo.bar
domain.com/folder
PS: This regex only supports http and https scheme in URL. So eg: if you want to support ftp also then you need to modify the regex a little.
'/(http(s)?:\/\/)?[\w\/\.]+(\.((com)|(edu)|(net)|(org)))[\w\/]*/'
That works for your examples. You might want to add extra characters support for "-", "&", "?", ":", etc in the last bracket.
'/(http(s)?:\/\/)?[\w\/\.]+(\.((com)|(edu)|(net)|(org)))[\w\/\?=&-;]*/'
This will support parameters and port numbers.
eg.: www.foo.ca:8888/test?param1=val1¶m2=val2
Thanks a ton. I modified his final solution to allow all domains (.ca, .co.uk), not just the specified ones.
$html = preg_replace_callback('#(\s|^)((https?://)?(\w|-)+(\.[a-z]{2,3})+(\:[0-9]+)?(?:/[^\s]*)?)(?=\s|\b)#is',
create_function('$m', 'if (!preg_match("#^(https?://)#", $m[2])) return $m[1]."".$m[2].""; else return $m[1]."".$m[2]."";'),
$url);
As far as i know there are many other questions similar to title, but my main reason for asking this question is i want my validation as perfect as i want. Here is my explanation which URL should valid
http:// (if given then match otherwise skip),
domain.com (should match & return validate)
subdomain.domain.com (should match & return validate)
www.com (should return false)
http://www.com (should return false)
I searched a lot about perfect regex pattern according to my need but didn't succeed so thats why i made my self and posting here to want to know that anyother Valid URL would it skip or not except http://localhost.
If yes then please correct me.
Pattern:
((?:http|https|ftp)://)?(?:www.)?((?!www)[A-Z0-9][A-Z0-9_-]*(?:.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?/?
I know this actually doesn't answer your question directly, but REGEXes aside, you can also use filter_var(), with the flag FILTER_VALIDATE_URL, which returns the URL in case of valid url, or FALSE otherwise:
var_dump(filter_var('http://example.com', FILTER_VALIDATE_URL));
// string(18) http://example.com
You can read here the filters used by this function, especially the last row regarding flags used by the VALIDATE_URL filter.
I actually don't know how it's implemented internally, but I suppose it works better than many regexes you can find outside in the wild internet.
So I've got this URL regex:
/(?:((?:[^-/"':!=a-z0-9_#]|^|\:))((https?://)((?:[^\p{P}\p{Lo}\s].-|[^\p{P}\p{Lo}\s])+.[a-z]{2,}(?::[0-9]+)?)(/(?:(?:([a-z0-9!*';:=+\$/%#[]-_,~]+))|#[a-z0-9!*';:=+\$/%#[]-_,~]+/|[.\,]?(?:[a-z0-9!*';:=+\$/%#[]-_~]|,(?!\s)))*[a-z0-9=#/]?)?(\?[a-z0-9!*'();:&=+\$/%#[]-_.,~]*[a-z0-9_&=#/])?))/iux
What it's currently matching:
http://www.google.com
http://google.com
I need it to also match:
www.google.com
google.com
I tried making the protocol part of the regex optional by slapping a ? at the end "(https?:\/\/)?" but that didn't do anything.
Ideas?
I'd look for something in the language that you are using to do this. URLs are tough to match with a regex. If you insist, I changed yours to make the (https?://) optional. I did not check it though.
/(?:((?:[^-/"':!=a-z0-9_#]|^|\:))((https?://)?((?:[^\p{P}\p{Lo}\s].-|[^\p{P}\p{Lo}\s])+.[a-z]{2,}(?::[0-9]+)?)(/(?:(?:([a-z0-9!*';:=+\$/%#[]-_,~]+))|#[a-z0-9!*';:=+\$/%#[]-_,~]+/|[.\,]?(?:[a-z0-9!*';:=+\$/%#[]-_~]|,(?!\s)))*[a-z0-9=#/]?)?(\?[a-z0-9!*'();:&=+\$/%#[]-_.,~]*[a-z0-9_&=#/])?))/iux
I got this example from the RFC 3986 and was directed there by this comment. Although, I'd still recommend using something from whatever language you are using rather than a regex.
^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
Since you are using PHP, did you consider using parse_url? It looks like it will return false on bad urls.