PHP RegEx for "Website Name" - php

Duplicate: PHP validation/regex for URL
My goal is to create a PHP regex for website names. The regex is for a lead-gathering form and should accept any legitimate website name syntax that someone might enter. After an exhaustive search, I'm surprised that I can't find one out there.
Here are the regex matches that I'm looking for:
somewebsite.com
http://somewebsite.com
http://www.somewebsite.com
AND, it should also match:
any of the above with a trailing slash, such as: somewebsite.com/
subdomains

No RegEx necessary.
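$subject = 'example.com';
// Prepend a scheme only when the input does not already start with http:// or https://.
$hasScheme = (stripos($subject, 'http://') === 0 || stripos($subject, 'https://') === 0);
$part = $hasScheme ? '' : 'http://';
var_dump(filter_var($part.$subject, FILTER_VALIDATE_URL));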
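To check this approach against the inputs listed in the question, here is a minimal sketch. The helper name isWebsiteName() and the sample list are my own assumptions for illustration; only filter_var() and FILTER_VALIDATE_URL come from PHP itself. Note that FILTER_VALIDATE_URL checks syntax only, so it does not confirm that the domain exists.

```php
<?php
// Hypothetical helper (not part of the answer above): accepts a bare host
// name or a full http/https URL and reports whether PHP considers it a URL.
function isWebsiteName($input)
{
    $input = trim($input);
    $hasScheme = stripos($input, 'http://') === 0 || stripos($input, 'https://') === 0;
    // FILTER_VALIDATE_URL requires a scheme, so add one when it is missing.
    $candidate = $hasScheme ? $input : 'http://' . $input;
    return filter_var($candidate, FILTER_VALIDATE_URL) !== false;
}

// Sample inputs taken from the question, plus one that should be rejected.
$samples = array(
    'somewebsite.com',
    'http://somewebsite.com',
    'http://www.somewebsite.com',
    'somewebsite.com/',
    'blog.somewebsite.com',
    'not a website',
);
foreach ($samples as $sample) {
    echo $sample, ' => ', var_export(isWebsiteName($sample), true), "\n";
}
```

Everything except the last sample should come back as true.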

You might need to tweak it:
<?php
$pattern = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2})+(:([\d\w]|%[a-fA-F\d]{2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2})*)?$/';
$url1 = "http://www.somewebsite.com";
$url2 = "https://www.somewebsite.com";
$url3 = "https://somewebsite.com";
$url4 = "www.somewebsite.com";
$url5 = "somewebsite.com";
function valURL($pattern, $url) {
    // preg_match() returns 1 on a match, 0 on no match and false on error.
    if (preg_match($pattern, $url) === 1) {
        echo "Match URL: <font color='green'>" . htmlspecialchars($url) . "</font><br /><br />";
    } else {
        echo "Try Again: <font color='red'>URL: " . htmlspecialchars($url) . "</font><br /><br />";
    }
}
valURL($pattern, $url1);
valURL($pattern, $url2);
valURL($pattern, $url3);
valURL($pattern, $url4);
valURL($pattern, $url5);
?>

I decided to benchmark the answers here to prove that regular expressions are not the answer for such simple tasks. Andy Leekman's code is a whole 30% to 60% quicker than the other answers. He did have a bug, but I fixed that with one line of code. You can view my results below.
Here's the code on which the tests ran.
http://pastie.org/476900
Results: http://img254.imageshack.us/img254/7821/capturevzh.png
P.S. If anyone else uses a regular expression to validate a URL I might go mad ;)

/^([a-z0-9]([-a-z0-9]*[a-z0-9])?\\.)+((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(c[acdfghiklmnorsuvxyz]|cat|com|coop)|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw])$/i
http://www.shauninman.com/archive/2006/05/08/validating_domain_names
Courtesy of Google. It is VERY complex though, so someone else might have a simpler one.
EDIT: Try andy's answer first. If you can find an alternative to a regex, 9 times out of 10 the alternative is much better.

^(https?://)?(([0-9a-z_!'().&=$%-]: )?[0-9a-z_!'().&=$%-]@)?(([0-9]{1,3}\.){3}[0-9]{1,3}|([0-9a-z_!'()-]\.)([0-9a-z][0-9a-z-]{0,61})?[0-9a-z]\.[a-z]{2,6})(:[0-9]{1,4})?((/?)|(/[0-9a-z_!*'().;?:@&=$,%#-])/?)$

Related

Make PHPBB 3.0.14 and ABBC3 compatible with PHP 7.3

I'm trying to make ABBC3 work with PHP 7.3 and PHPBB 3.0.14, since I can't move to PHPBB 3.3 due to lots of issues with MODs that were never ported to extensions and with my theme (Absolution).
I have asked for help in the PHPBB forum without luck, because the 3.0.x and 3.1.x versions are not supported anymore.
After dozens of hours trying to understand the bbcode functions, I'm almost there.
My code works when there's a single bbcode in the message, but it doesn't work when there are several bbcodes or when they are mixed with text.
So I would like some help solving this part so that everything works.
In line 98 in includes/bbcode.php this function:
$message = preg_replace($preg['search'], $preg['replace'], $message);
Is returning something like this:
$message = "some text $this->Text_effect_pass('glow', 'red', 'abc') another text. $this->moderator_pass('"fernando"', 'hello!') more text"
For this message:
some text [glow=red]abc[/glow] another text.
[mod="fernando"]hello![/mod] more text
The input for preg_replace above is like this just for context:
"some text [glow=red:mkpanc3g]abc[/glow:mkpanc3g] another text. [mod="fernando":mkpanc3g]hello![/mod:mkpanc3g]"
So basically I have to split this string into valid expressions, apply eval() to each, and then concatenate everything. Like this:
$message = "some text". eval($this->Text_effect_pass('glow', 'red', 'abc');) . "another text " . eval($this->moderator_pass('"fernando"', 'hello!');). "more text"
In this specific case there are also double quotes left over in '"fernando"'.
I know it is not safe to apply eval() to user input, so I would like to use some kind of preg_match and/or preg_split to get the values inside the () and pass them as parameters to my functions.
The functions are basically:
Text_effect_pass()
moderator_pass()
anchor_pass()
simpleTabs_pass()
I'm thinking of something like this (please ignore errors here):
if (preg_match('/\$this->Text_effect_pass/', $message))
{
    // Then split the string, get the values inside of () and remove extra single or double quotes.
    // After that:
    $textEffect = $this->Text_effect_pass($value[0], $value[1], $value[2]);
    // Finally concatenate everything:
    $message = $string[0] . $textEffect . $string[1];
}
if (preg_match('/\$this->moderator_pass/', $message))
{
    // .....
}
P.S.: ABBC3 is not compatible with PHP 7.3 due to its use of the e modifier. I have edited everything to remove the modifier.
Here you can see it working separately:
bbcode 1
bbcode 2
Can someone give me some help please?
After a long time searching for a solution to this problem, I found this site, which helped me build the regex.
I have now solved the problem, and my forum is fully working with PHPBB 3.0.14, PHP 7.3 and ABBC3.
My solution is:
// Start Text_effect_pass
$regex = "/(\\$)(this->Text_effect_pass)(\().*?(\')(,)( )(\').*?(\')(,)( )(\').*?(\'\))/is";
if (preg_match_all($regex, $message, $matches)) {
foreach ($matches[0] as $key => $func) {
$bracket = preg_split("/(\\$)(this->Text_effect_pass)/", $func);
$param = explode("', '", $bracket[1]);
$param[0] = substr($param[0], 2);
$param[2] = substr($param[2], 0, strrpos($param[2], "')"));
$effect = $this->Text_effect_pass($param[0], $param[1], $param[2]);
if ($key == 0) {
$init = $message;
} else {
$init = $mess;
}
$mess = str_replace($matches[0][$key], $effect, $init);
}
$message = $mess;
} // End Text_effect_pass
// Start moderator_pass
$regex = "/(\\$)(this->moderator_pass)(\().*?(\')(,).*?(\').*?(\'\))/is";
if (preg_match_all($regex, $message, $matches)) {
foreach ($matches[0] as $key => $func) {
$bracket = "/(\\$)(this->moderator_pass)/";
$bracket = preg_split($bracket, $func);
$param = explode("', '", $bracket[1]);
$param[0] = substr($param[0], 2);
$param[1] = substr($param[1], 0, strrpos($param[1], "')"));
$effect = $this->moderator_pass($param[0], $param[1]);
if ($key == 0) {
$init = $message;
} else {
$init = $mess;
}
$mess = str_replace($matches[0][$key], $effect, $init);
}
$message = $mess;
} // End moderator_pass
If anyone is interested, the patch files and instructions can be found here.
Best regards.
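For comparison, the same "find the call, extract its arguments, substitute the result" idea can be sketched with preg_replace_callback, which handles several bbcodes in one pass. This is only a sketch under assumptions: the class BbcodeSketch and the bodies of its two *_pass methods are made-up stand-ins (the real ABBC3 methods produce different HTML), and the regex assumes arguments are always separated by "', '" as in the examples above.

```php
<?php
// Hypothetical stand-in for the ABBC3 bbcode class; the method bodies are
// invented for illustration and do not reproduce the real ABBC3 output.
class BbcodeSketch
{
    public function Text_effect_pass($effect, $colour, $text)
    {
        return "<span class=\"$effect\" style=\"color:$colour\">$text</span>";
    }

    public function moderator_pass($user, $text)
    {
        // Strip the leftover double quotes around the user name.
        return '<div class="mod">' . trim($user, '"') . ': ' . $text . '</div>';
    }

    public function render($message)
    {
        // Match "$this->name('arg', 'arg', ...)" for a whitelist of method names.
        $regex = '/\$this->(Text_effect_pass|moderator_pass)\(\'(.*?)\'\)/s';
        return preg_replace_callback($regex, function ($m) {
            // Split the captured argument list on the "', '" separator.
            $args = explode("', '", $m[2]);
            // Only whitelisted method names can reach this call; no eval().
            return call_user_func_array(array($this, $m[1]), $args);
        }, $message);
    }
}

$message = "some text \$this->Text_effect_pass('glow', 'red', 'abc') another text. "
         . "\$this->moderator_pass('\"fernando\"', 'hello!') more text";
$bbcode = new BbcodeSketch();
echo $bbcode->render($message), "\n";
```

Because the method name is captured from a fixed alternation, adding anchor_pass or simpleTabs_pass would only mean extending that alternation.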

file_get_contents(): why does this not work?

<?php
$ip = '95.79.1.36'; //russian ip for test
$str = 'http://ipgeobase.ru:7020/geo?ip='.$ip;
$content = file_get_contents($str);
preg_match_all('#<country>(.*)(</country>)#Usi', $content, $matches);
$country = $matches[0][0];
preg_match_all('#<city>(.*)(</city>)#Usi', $content, $matches);
$city = $matches[0][0];
if($country == 'RU'){
echo 'City: '.$city.'';
}else{
echo 'Country: '.$country.'';
}
?>
The problem is that $country == 'RU' never evaluates to true. My question is: why?
Thanks )))
You probably shouldn't be parsing HTML/XHTML/XML with Regex. See: RegEx match open tags except XHTML self-contained tags
I recommend using PHP's SimpleXML parser. The following worked for me:
<?php
$ip = '95.79.1.36'; //russian ip for test
$str = 'http://ipgeobase.ru:7020/geo?ip='.$ip;
$results = simplexml_load_file($str);
$country = $results->ip->country;
$city = $results->ip->city;
if($country == 'RU'){
echo 'City: '.$city.'';
}else{
echo 'Country: '.$country.'';
}
?>
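As for why the original comparison fails: preg_match_all puts the whole match, tags included, in $matches[0], and the first capture group in $matches[1]. A minimal sketch, assuming a response shaped roughly like the one the SimpleXML code above reads (the sample XML string here is made up, not the service's real output):

```php
<?php
// Hypothetical sample response; the real XML from the service may differ.
$content = '<ip-answer><ip value="95.79.1.36"><country>RU</country><city>Example City</city></ip></ip-answer>';

preg_match_all('#<country>(.*)</country>#Usi', $content, $matches);

$fullMatch = $matches[0][0]; // whole match, tags included
$country   = $matches[1][0]; // first capture group only

var_dump($fullMatch);          // string(21) "<country>RU</country>"
var_dump($fullMatch === 'RU'); // bool(false)
var_dump($country === 'RU');   // bool(true)
```

So reading $matches[1][0] instead of $matches[0][0] would make the original check behave as intended.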
Your server probably has allow_url_fopen (a php.ini directive) disabled. In any case, the technology you are looking for in this particular case is cURL: https://php.net/curl.
I'd be delighted to explain more about your code specifically, and even provide cURL code samples, once you have edited your question properly with more information and the results of your own attempts.

How to filter URLs that contain white space with preg match?

I parse through a text that contains several links. Some of them contain white spaces but have a file ending. My current pattern is:
preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $links, $match);
This works the same way:
preg_match_all('/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/', $links, $match);
I don't know much about the patterns and didn't find a good tutorial that explains the meaning of all possible patterns and shows examples.
How could I filter a URL like this:
http://my-url.com/my doc.doc or even http://my-url.com/my doc with more white spaces.doc
The \s in those preg_match_all calls stands for a whitespace character. But how could I check whether there is a file extension after one or more spaces?
Is it possible?
Why not just use PHP's filter functions?
<?php
$url = "http://my-url.com/my doc.doc";
if(!filter_var($url, FILTER_VALIDATE_URL))
{
echo "URL is not valid";
}
else
{
echo "URL is valid";
}
OUTPUT :
URL is not valid
This might be what you are looking for; it uses urlencode:
$file = "my doc with more white spaces.doc";
echo "http://my-url.com/" . urlencode($file);
which produces:
http://my-url.com/my+doc+with+more+white+spaces.doc
or, with rawurlencode in place of urlencode, it produces:
http://my-url.com/my%20doc%20with%20more%20white%20spaces.doc
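Side by side, the two calls look like this. It is a small sketch using the same file name as above; the variable names $formEncoded and $pathEncoded are mine.

```php
<?php
$file = "my doc with more white spaces.doc";

// urlencode() uses form-style encoding and turns each space into "+".
$formEncoded = "http://my-url.com/" . urlencode($file);

// rawurlencode() follows RFC 3986 and turns each space into "%20".
$pathEncoded = "http://my-url.com/" . rawurlencode($file);

echo $formEncoded, "\n";
echo $pathEncoded, "\n";
```

For a file name inside the path of a URL, the "%20" form from rawurlencode is generally the safer choice, since "+" only means "space" in query strings.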
EDIT: Something like the following might help to parse your urls with parse_url
DEMO
$url = 'http://my-url.com/my doc with more white spaces.doc';
$purl = parse_url($url);
$rurl = "";
if(isset($purl['scheme'])){
$rurl .= $purl['scheme'] . "://";
}
if(isset($purl['host'], $purl['path'])){
$rurl .= $purl['host'] . rawurlencode($purl['path']);
}
if($rurl === ""){
$rurl = $url;#error parsing error/invalid url?
}
for sub directories you can do
$purl['path'] = implode('/', array_map(function($value){return rawurlencode($value);}, explode('/', $purl['path'])));
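Putting the two snippets together might look roughly like the sketch below. The function name encodeUrlPath() is made up, and the sketch assumes the URL has a scheme and a host, ignores port, query string and fragment, and assumes the path is not already percent-encoded (otherwise it would be encoded twice).

```php
<?php
// Hypothetical helper combining parse_url() with per-segment rawurlencode().
function encodeUrlPath($url)
{
    $purl = parse_url($url);
    if ($purl === false || !isset($purl['scheme'], $purl['host'])) {
        return $url; // leave anything parse_url() cannot split untouched
    }
    $path = isset($purl['path']) ? $purl['path'] : '';
    // Encode each segment separately so the "/" separators survive.
    $encodedPath = implode('/', array_map('rawurlencode', explode('/', $path)));
    return $purl['scheme'] . '://' . $purl['host'] . $encodedPath;
}

echo encodeUrlPath('http://my-url.com/my doc with more white spaces.doc'), "\n";
echo encodeUrlPath('http://my-url.com/some dir/my doc.doc'), "\n";
```

Encoding segment by segment matters because calling rawurlencode on the whole path would also turn every "/" into "%2F".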
I don't know much about php but this regex
(http|ftp)(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
will match every url even with spaces
I think this regex will do:
preg_match_all("/^(?si)(?>\s*)(((?>https?:\/\/(?>www\.)?)?(?=[\.-a-z0-9]{2,253}(?>$|\/|\?|\s))[a-z0-9][a-z0-9-]{1,62}(?>\.[a-z0-9][a-z0-9-]{1,62})+)(?>(?>\/|\?).*)?)?(?>\s*)$/", $input_lines, $output_array);
Demo
Alright after doing this really helpful tutorial I finally know how the regex syntax works. After finishing it I experimented a bit on this site
It was pretty easy after figuring out that all hyperlinks in my parsed document were in between quotation marks so I just had to change the regex to:
preg_match_all('#\bhttps?://[^()<>"]+#', $links, $match);
so that after the " it is looking for the next match that begins with http.
But that's not the full solution yet. The user Class was right: without applying rawurlencode to the filenames it won't work.
So the next step was this:
function endsWith($haystack, $needle)
{
return $needle === "" || substr($haystack, -strlen($needle)) === $needle;
}
if(endsWith($textlink, ".doc") || endsWith($textlink, ".docx") || endsWith($textlink, ".pdf") || endsWith($textlink, ".jpg") || endsWith($textlink, ".jpeg") || endsWith($textlink, ".png")){
$file = substr( $textlink, strrpos( $textlink, '/' )+1 );
$rest_url=substr($textlink, 0, strrpos($textlink, '/' )+1 );
$textlink=$rest_url.rawurlencode($file);
}
That extracts the filenames from the URLs and rawurlencodes them so that the output links are correct.
I think this should work:
$url = '...';
$url_new = '';
$array = explode(' ',$url);
foreach($array as $name => $val){
if ($val!=' '){
$url_new = $url_new.$val;
}
}

Error using preg_replace to change a YouTube URL?

I have a sample code:
<?php
$url = 'http://www.youtube.com/watch?v=KTRPVo0d90w';
$pattern = '/http:\/\/www\.youtube\.com\/watch\?(.*?)v=([a-zA-Z0-9_\-]+)(\S*)/i';
$replace = $pattern.'&w=550';
$string = preg_replace($pattern, $replace, $url);
?>
How can I get the result http://www.youtube.com/watch?v=KTRPVo0d90w&w=550?
You can just append using the . operator:
<?php
$url = 'http://www.youtube.com/watch?v=KTRPVo0d90w';
$string = $url.'&w=550';
?>
Use preg_match instead:
<?php
$url = 'http://www.youtube.com/watch?v=KTRPVo0d90w&s=222';
$pattern = '/v=[^&]+/i';
preg_match($pattern, $url, $match);
echo 'http://www.youtube.com/watch?'.$match[0].'&w=550';
?>
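If the URL may already carry other parameters, a regex-free variation is to split the query string, set the parameter and rebuild it. This is a sketch only: the helper name withQueryParam() is made up, and it assumes the URL has a scheme and a host and drops any port or fragment.

```php
<?php
// Hypothetical helper: sets or overrides one query parameter on a URL.
function withQueryParam($url, $name, $value)
{
    $parts = parse_url($url);
    $params = array();
    if (isset($parts['query'])) {
        // parse_str() fills $params with the existing query parameters.
        parse_str($parts['query'], $params);
    }
    $params[$name] = $value;
    $path = isset($parts['path']) ? $parts['path'] : '';
    // Pass '&' explicitly so the separator does not depend on php.ini.
    return $parts['scheme'] . '://' . $parts['host'] . $path . '?' . http_build_query($params, '', '&');
}

echo withQueryParam('http://www.youtube.com/watch?v=KTRPVo0d90w', 'w', 550), "\n";
echo withQueryParam('http://www.youtube.com/watch?v=KTRPVo0d90w&s=222', 'w', 550), "\n";
```

Unlike plain concatenation, this keeps existing parameters and replaces w if it is already present instead of adding it twice.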
Like below?
$url = 'http://www.youtube.com/watch?v=KTRPVo0d90w';
$bit = '&w=550';
echo "{$url}{$bit}";
Don't get me wrong, I'm not looking to gain any points here, but just thought I would add to this question and include a few options. I love toying with ideas like this every once in a while.
Using jh314's idea to concatenate the strings, thought that this could be used for future use, to actually replace a string inside the video's YouTube number, should the occasion ever present itself.
Such as $number for instance.
<?php
$url = 'http://www.youtube.com/watch?v=';
$number = 'KTRPVo0d90w';
$string = $url.$number.'&w=550';
// Output to screen
echo $string;
echo "<br>";
// Link to video
echo '<a href="' . htmlspecialchars($string) . '">Click for the video</a>';
?>
The same could easily be done for the video's width.

PHP normalize remote url's [duplicate]

This question already has an answer here:
How do I apply URL normalization rules in PHP?
(1 answer)
Closed 9 years ago.
Is there any quick function that will convert: HtTp://www.ExAmPle.com/blah to http://www.example.com/blah
Basically I want to lower case the case-insensitive parts of a url.
No, you'll have to write code for it on your own.
But you can use parse_url() to split the URL into its parts.
Since you asked for "quick," here's a one-liner that does the job:
$url = 'HtTp://User:Pass@www.ExAmPle.com:80/Blah';
echo preg_replace_callback(
    '#(^[a-z]+://)(.+@)?([^/]+)(.*)$#i',
    // Lower-case only the scheme ($m[1]) and the host ($m[3]).
    function ($m) {
        return strtolower($m[1]).$m[2].strtolower($m[3]).$m[4];
    },
    $url);
Outputs:
http://User:Pass@www.example.com:80/Blah
EDIT/ADD:
I've tested, and this version is about 55% faster than using preg_replace_callback with an anonymous function (note that the /e modifier it relies on was deprecated in PHP 5.5 and removed in PHP 7.0):
echo preg_replace(
'#(^[a-z]+://)(.+@)?([^/]+)(.*)$#ei',
"strtolower('\\1').'\\2'.strtolower('\\3').'\\4'",
$url);
I believe this class will do what you're looking for http://www.glenscott.co.uk/blog/2011/01/09/normalize-urls-with-php/
Here's a solution, expanding on what @ThiefMaster already mentioned:
DEMO
function urltolower($url){
    if (($_url = parse_url($url)) !== false && isset($_url['scheme'], $_url['host'])){ // valid url
        $newUrl = strtolower($_url['scheme']) . "://";
        if (isset($_url['user'], $_url['pass']))
            $newUrl .= $_url['user'] . ":" . $_url['pass'] . "@";
        $newUrl .= strtolower($_url['host']);
        if (isset($_url['port'])) // keep an explicit port
            $newUrl .= ":" . $_url['port'];
        if (isset($_url['path']))
            $newUrl .= $_url['path'];
        if (isset($_url['query']))
            $newUrl .= "?" . $_url['query'];
        if (isset($_url['fragment']))
            $newUrl .= "#" . $_url['fragment'];
        return $newUrl;
    }
    return $url; // could return false if you'd like
}
Note: Not battle-tested but it should get you going.
