preg_match with 2 rules doesn't work


I want preg_match to accept URLs that start with https:// or http://, but also URLs without a scheme, such as google.de or sh.st.
This if statement works, but it only accepts URLs that start with https:// or http://:
if(!preg_match("/^[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i", $flink)) {
$html = "Error: invalid URL";
}
I tried this, but it doesn't work either:
$bd = "/^[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i";
$dbb = "/^[A-Za-z0-9\-_]+\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i";
if(!preg_match($bd, $flink) || !preg_match($dbb, $flink)) {
$html = "Error: invalid URL";
}
What is wrong? The problem page is https://viid.su

I think you want:
/^[A-Za-z0-9\-_]+\.+[A-Za-z0-9\.\/%&=\?\-_]+$/
^-- removed a \ here.
for the second regexp. Then it will match google.de and sh.st.
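
A further point about the condition in the question: !preg_match($bd, ...) || !preg_match($dbb, ...) reports an error as soon as either pattern fails. An input with a scheme fails the second pattern and an input without one fails the first, so every URL is rejected. A minimal sketch of the intended logic follows, assuming the goal is "valid if at least one pattern matches". The function name isAcceptedUrl is hypothetical, and the two patterns are the question's, written as single-quoted strings.

```php
<?php
// Hypothetical helper; the two patterns are taken from the question.
function isAcceptedUrl(string $flink): bool
{
    // Scheme, then ":" and "/" characters, then host and the rest.
    $withScheme    = '/^[a-zA-Z]+[:\/]+[A-Za-z0-9\-_]+\.+[A-Za-z0-9.\/%&=?\-_]+$/i';
    // No scheme: host and the rest only.
    $withoutScheme = '/^[A-Za-z0-9\-_]+\.+[A-Za-z0-9.\/%&=?\-_]+$/i';

    // Valid if EITHER pattern matches, i.e. invalid only if BOTH fail.
    return preg_match($withScheme, $flink) === 1
        || preg_match($withoutScheme, $flink) === 1;
}

$flink = 'google.de';
$html  = isAcceptedUrl($flink) ? '' : 'Error: invalid URL';
```

Written against the original variables, the equivalent condition would be if (!preg_match($bd, $flink) && !preg_match($dbb, $flink)).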

Related

php preg_match get everything after match in string

I'm looking for a way to get the complete string that follows away?to= in a URI.
My code:
if (isset($_SERVER['REQUEST_URI'])) {
$goto = $_SERVER['REQUEST_URI'];
}
if (preg_match("/to=(.+)/", $goto, $goto_url)) {
$link = "<a href='{$goto_url[1]}' target='_blank'>{$goto_url[1]}</a>";
}
The original link is:
https://domain.com/away?to=http://www.zdf.de/ZDFmediathek#/beitrag/video/2162504/Verschw%C3%B6rung-gegen-die-Freiheit-%281%29
...but my code cuts the string after away?to= down to only
http://www.zdf.de/ZDFmediathek
Is there a fix for this preg_match call that allows every character after away?to=?
UPDATE:
I found out that $_SERVER['REQUEST_URI'] and $_SERVER['QUERY_STRING'] already contain the truncated URL. Do you know why, and how to prevent that?

Try using (.*) to capture everything after to=:
$str = 'away?to=dfkhgkjdshfgkhldsflkgh';
preg_match("/to=(.*)/", $str, $goto_url);
echo $goto_url[1]; //dfkhgkjdshfgkhldsflkgh

Instead of extracting the URL with regex from the request URI you can just get it from the $_GET array:
$link = "<a href='{$_GET['to']}' target='_blank'>{$_GET['to']}</a>";
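
On the truncation mentioned in the UPDATE: browsers do not send the fragment (everything from # onward) to the server, so neither REQUEST_URI nor $_GET can contain it. A sketch of one way around this is to percent-encode the target when the away link is generated. The helper name buildAwayLink is hypothetical, and parse_str() only stands in for the decoding PHP does when it fills $_GET.

```php
<?php
// Hypothetical helper: build the redirect link with the target percent-encoded,
// so that "#", "?" and "&" inside the target survive the trip to the server.
function buildAwayLink(string $target): string
{
    return 'https://domain.com/away?to=' . rawurlencode($target);
}

$target = 'http://www.zdf.de/ZDFmediathek#/beitrag/video/2162504/';
$link   = buildAwayLink($target);

// On the receiving page PHP decodes the query string into $_GET;
// parse_str() is used here only to mimic that step outside a web request.
parse_str((string) parse_url($link, PHP_URL_QUERY), $query);
$recovered = $query['to'];

// Escape before printing into HTML, since the value is user-controlled.
$safe   = htmlspecialchars($recovered, ENT_QUOTES, 'UTF-8');
$anchor = "<a href='{$safe}' target='_blank'>{$safe}</a>";
```

This only helps when the page that creates the away link is under your control; a link typed by hand with a raw # in it will still arrive truncated.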

PHP Auto-correcting URLs

I don't want to reinvent the wheel, but I couldn't find any library that does this perfectly.
In my script users can save URLs. When they give me a list like:
google.com
www.msn.com
http://bing.com/
and so on...
I want to be able to save them in the database in the "correct format".
What I do now is check whether a protocol is present, add it if it is missing, and then validate the URL against a RegExp.
For PHP's parse_url, any URL that contains a protocol is valid, so it didn't help much.
How are you doing this? Do you have an idea you would like to share with me?
Edit:
I want to filter out invalid URLs from user input (a list of URLs) and, more importantly, try to auto-correct URLs that are invalid (e.g. ones that don't contain a protocol). Once the user enters the list, it should be validated immediately (there is no time to open the URLs to check whether they really exist).
It would be great to extract the parts of a URL like parse_url does, but the problem with parse_url is that it doesn't work well with invalid URLs. I tried parsing the URL with it and adding defaults for required parts that are missing (e.g. no protocol, add http). But parse_url for "google.com" won't return "google.com" as the hostname but as the path.
This looks like a really common problem to me, but I could not find an available solution on the internet (I found some libraries that standardize a URL, but they won't fix it if it is invalid).
Is there some "smart" solution to this, or should I stick with my current approach:
Find the first occurrence of :// and check whether the text before it is a valid protocol; add a protocol if it is missing
Find the next occurrence of / and check whether the hostname is in a valid format
For good measure, validate the whole URL once more via RegExp
I just have a feeling I will reject some valid URLs with this, and for me it is better to have a false positive than a false negative.

I had the same problem with parse_url as the OP. This is my quick and dirty solution for auto-correcting URLs (keep in mind that the code is in no way perfect and does not cover all cases):
Results:
http:/wwww.example.com/lorum.html => http://www.example.com/lorum.html
gopher:/ww.example.com => gopher://www.example.com
http:/www3.example.com/?q=asd&f=#asd => http://www3.example.com/?q=asd&f=#asd
asd://.example.com/folder/folder/ => http://example.com/folder/folder/
.example.com/ => http://example.com/
example.com => http://example.com
subdomain.example.com => http://subdomain.example.com
function url_parser($url) {
// multiple /// messes up parse_url, replace 2+ with 2
$url = preg_replace('/(\/{2,})/','//',$url);
$parse_url = parse_url($url);
if(empty($parse_url["scheme"])) {
$parse_url["scheme"] = "http";
}
if(empty($parse_url["host"]) && !empty($parse_url["path"])) {
// Strip slash from the beginning of path
$parse_url["host"] = ltrim($parse_url["path"], '\/');
$parse_url["path"] = "";
}
$return_url = "";
// Check if scheme is correct
if(!in_array($parse_url["scheme"], array("http", "https", "gopher"))) {
$return_url .= 'http'.'://';
} else {
$return_url .= $parse_url["scheme"].'://';
}
// Check if the right amount of "www" is set.
$explode_host = explode(".", $parse_url["host"]);
// Remove empty entries
$explode_host = array_filter($explode_host);
// And reassign indexes
$explode_host = array_values($explode_host);
// Contains subdomain
if(count($explode_host) > 2) {
// Check if subdomain only contains the letter w(then not any other subdomain).
if(substr_count($explode_host[0], 'w') == strlen($explode_host[0])) {
// Replace with "www" to avoid "ww" or "wwww", etc.
$explode_host[0] = "www";
}
}
$return_url .= implode(".",$explode_host);
if(!empty($parse_url["port"])) {
$return_url .= ":".$parse_url["port"];
}
if(!empty($parse_url["path"])) {
$return_url .= $parse_url["path"];
}
if(!empty($parse_url["query"])) {
$return_url .= '?'.$parse_url["query"];
}
if(!empty($parse_url["fragment"])) {
$return_url .= '#'.$parse_url["fragment"];
}
return $return_url;
}
echo url_parser('http:/wwww.example.com/lorum.html'); // http://www.example.com/lorum.html
echo url_parser('gopher:/ww.example.com'); // gopher://www.example.com
echo url_parser('http:/www3.example.com/?q=asd&f=#asd'); // http://www3.example.com/?q=asd&f=#asd
echo url_parser('asd://.example.com/folder/folder/'); // http://example.com/folder/folder/
echo url_parser('.example.com/'); // http://example.com/
echo url_parser('example.com'); // http://example.com
echo url_parser('subdomain.example.com'); // http://subdomain.example.com

It's not 100% foolproof, but a 1 liner.
$URL = (((strpos($URL,'https://') === false) && (strpos($URL,'http://') === false))?'http://':'' ).$URL;
EDIT
There was apparently a problem with my initial version if the hostname contained http.
Thanks Trent
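
A variant of the same idea, as a sketch: strpos() looks for http:// anywhere in the string, so an input such as example.com/?next=http://other.test is left without a scheme. Anchoring the check to the start of the string avoids that. The function name ensureScheme is hypothetical, and defaulting to http:// is an assumption carried over from the one-liner.

```php
<?php
// Hypothetical helper: prepend "http://" unless the string already STARTS
// with http:// or https:// (case-insensitive).
function ensureScheme(string $url): string
{
    return preg_match('#^https?://#i', $url) === 1 ? $url : 'http://' . $url;
}

echo ensureScheme('google.com'), "\n";
echo ensureScheme('http://bing.com/'), "\n";
```

Like the one-liner, this does not validate anything; it only normalises the prefix before a later validation step.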

Detecting a URL using PHP preg_match that allows the absence of www

What would be the PHP URL preg_match code that allows the absence of the www. part of the link?
Normal preg_match URL code:
%^((https?://)|(www\.))([a-z0-9-].?)+(:[0-9]+)?(/.*)?$%i
I've tried this:
%^((https?://)|([a-z0-9-].?)+(:[0-9]+)?(/.*)?$%i
Extra background:
This is for an if statement that checks a user's input.
elseif (!preg_match("%^((https?://)|(www\.))([a-z0-9-].?)+(:[0-9]+)?(/.*)?$%i",$link)){
$linkErr = "Please enter a valid URL.";
}

You match either https?:// or www. Why can't there be both?
Not tested and just written down:
^(https?://)?(www\.)?[a-z]+\.[a-z]{2,3}(:[0-9]+)?(/.*)?$
https:// www. google . com :1234 /blabla.php?foo=bar

function hasWww($url) {
$data = parse_url($url);
if (isset($data['host'])) return strpos($data['host'], 'www.') === 0;
return false;
}

Try to insert a ? between ')('.
Try that:
preg_match("%^((https?://)|(www\.))?([a-z0-9-].?)+(:[0-9]+)?(/.*)?$%i", $link);
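
Note that ([a-z0-9-].?)+ contains an unescaped dot, so the pattern above also accepts inputs such as "how r u". A stricter sketch follows, assuming the host must contain at least one dot (which rules out bare names like localhost). The function name looksLikeLink is hypothetical.

```php
<?php
// Hypothetical helper: scheme and "www." are both optional; the host must be
// dot-separated labels made of letters, digits and hyphens.
function looksLikeLink(string $link): bool
{
    $pattern = '%^(https?://)?(www\.)?[a-z0-9-]+(\.[a-z0-9-]+)+(:[0-9]+)?(/.*)?$%i';
    return preg_match($pattern, $link) === 1;
}

$link    = 'example.com/page';
$linkErr = looksLikeLink($link) ? '' : 'Please enter a valid URL.';
```

This is still only a shape check; it does not verify the top-level domain or whether the host exists.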

check if a string is a URL [duplicate]

This question already has answers here:
Best way to check if a URL is valid
(12 answers)
Closed 7 years ago.
I've seen many questions about this but couldn't understand how the answers work,
as I want a simpler case.
If we have text, whatever it is, I'd like to check if it is a URL or not.
$text = "something.com"; //this is a url
if (!IsUrl($text)){
echo "No it is not url";
exit; // die well
}else{
echo "Yes it is url";
// my else codes goes
}
function IsUrl($url){
// ???
}
Is there a way to do this other than checking with JavaScript, in case JS is blocked?

The code below worked for me:
if(filter_var($text, FILTER_VALIDATE_URL))
{
echo "Yes it is url";
exit; // die well
}
else
{
echo "No it is not url";
// my else codes goes
}
You can also specify RFC compliance and other requirements on the URL using flags. See PHP Validate Filters for more details.
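
A small sketch of the flag usage, with one caveat for the question's example: FILTER_VALIDATE_URL requires a scheme, so "something.com" on its own is rejected. The function name isUrlWithScheme and its $requirePath parameter are hypothetical; FILTER_FLAG_PATH_REQUIRED is one of the documented flags.

```php
<?php
// Hypothetical wrapper around filter_var(); returns true only for strings
// that FILTER_VALIDATE_URL accepts (a scheme is always required).
function isUrlWithScheme(string $text, bool $requirePath = false): bool
{
    $flags = $requirePath ? FILTER_FLAG_PATH_REQUIRED : 0;
    return filter_var($text, FILTER_VALIDATE_URL, $flags) !== false;
}

var_dump(isUrlWithScheme('http://something.com'));       // scheme present
var_dump(isUrlWithScheme('something.com'));              // no scheme
var_dump(isUrlWithScheme('http://something.com', true)); // path required
```

If scheme-less input should count as a URL, a scheme has to be prepended before calling filter_var().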

PHP's filter_var function is what you need. Look for FILTER_VALIDATE_URL. You can also set flags to fine-tune your implementation.
No regex needed....

http://www.php.net/manual/en/function.preg-match.php#93824
<?php
$regex = "((https?|ftp)\:\/\/)?"; // SCHEME
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?"; // User and Pass
$regex .= "([a-z0-9-.]*)\.([a-z]{2,3})"; // Host or IP
$regex .= "(\:[0-9]{2,5})?"; // Port
$regex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?"; // Path
$regex .= "(\?[a-z+&\$_.-][a-z0-9;:#&%=+\/\$_.-]*)?"; // GET Query
$regex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?"; // Anchor
if(preg_match("/^$regex$/i", $url)) // `i` flag for case-insensitive
{
return true;
}
?>
However, your example URL is oversimplified; (\w+)\.(\w+) would match it. Somebody else mentioned filter_var, which is simply filter_var($url, FILTER_VALIDATE_URL), but it doesn't seem to like non-ASCII characters, so beware.

Check if it is a valid url (example.com IS NOT a valid URL)
function isValidURL($url)
{
return preg_match('|^http(s)?://[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $url);
}
How to use the function:
if(!isValidURL($fldbanner_url))
{
$errMsg .= "* Please enter valid URL including http://<br>";
}
Source: http://phpcentral.com/208-url-validation-in-php.html

Regexes are a poor way to validate something as complex as a URL.
PHP's filter_var() function offers a much more robust way to validate URLs. Plus, it's faster, since it's native code.

I don't think there is a definitive answer to this. Examples of strings that could be valid URLs:
localhost
http://xxx.xxx.xxx/alkjnsdf
abs.com
If you have only a small amount of text, you can check by making a cURL request and seeing whether it returns a valid response. Otherwise, if I enter localhost, it could be a link or it could be something else, and you wouldn't be able to check it.

You could use the following regex pattern to check whether your variable is a URL:
$pattern = "#\b(([\w-]+:\/\/?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|\/)))#"; // # added as the delimiter preg_match() requires

Something like this might work for you:
$arr = array('abc.com/foo',
'localhost',
'abc+def',
'how r u',
'https://how r u',
'ftp://abc.com',
'a.b');
foreach ($arr as $u) {
$url = $u;
if (!preg_match('#^(?:https?|ftp)://#', $url, $m))
$url = 'http://' . $url;
echo "$u => ";
var_dump(filter_var($url, FILTER_VALIDATE_URL));
}
OUTPUT:
abc.com/foo => string(18) "http://abc.com/foo"
localhost => string(16) "http://localhost"
abc+def => string(14) "http://abc+def"
how r u => bool(false)
https://how r u => bool(false)
ftp://abc.com => string(13) "ftp://abc.com"
a.b => string(10) "http://a.b"
So basically, wherever you see false as the return value, that is an INVALID URL for you.

regex to create link from url and strip www

I have a PHP function which takes a passed url and creates a clean link. It puts the full link in the anchor tags and presents just "www.domain.com" from the url. It works well but I would like to modify it so it strips out the "www." part as well.
<?php
// pass a url like: http://www.yelp.com/biz/my-business-name
// should return: yelp.com
function formatURL($url, $target=FALSE) {
if ($target) { $anchor_tag = "\\4"; }
else { $anchor_tag = "\\4"; }
$return_link = preg_replace("`(http|ftp)+(s)?:(//)((\w|\.|\-|_)+)(/)?(\S+)?`i", $anchor_tag, $url);
return $return_link;
}
?>
My regex skills are not that strong so any help greatly appreciated.

Take a look at parse_url: http://us2.php.net/manual/en/function.parse-url.php
This will simplify your logic quite a bit and can make removing the www. a simple string operation.
$link = 'http://www.yelp.com/biz/my-business-name';
$hostname = parse_url($link, PHP_URL_HOST);
if(strpos($hostname, 'www.') === 0)
{
$hostname = substr($hostname, 4);
}
I have modified my original answer to account for the issue raised in the comments. The preg_replace in the post below will also work and is a bit more concise; I will leave this here to show an alternative solution that does not require invoking the regex engine, if you prefer.

This will get you the domain name minus the www:
$url = preg_replace('/^www\./', '', parse_url($url, PHP_URL_HOST));
^ in the regex means www. is only removed from the start of the string
Working example : http://codepad.org/FTNikw8g
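
Putting the two answers together, here is a sketch of what the question's function could look like when built on parse_url instead of one large regex. The name formatUrlSketch, the nullable $target parameter and the exact anchor markup are assumptions, since the anchor tags in the question's code are not shown.

```php
<?php
// Hypothetical rewrite of the question's formatURL(): the full URL goes into
// href, and the visible label is the host without a leading "www.".
function formatUrlSketch(string $url, ?string $target = null): string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        // No host could be extracted: return the escaped input unchanged.
        return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    }

    $label      = preg_replace('/^www\./i', '', $host);
    $href       = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    $targetAttr = $target !== null
        ? ' target="' . htmlspecialchars($target, ENT_QUOTES, 'UTF-8') . '"'
        : '';

    return '<a href="' . $href . '"' . $targetAttr . '>'
        . htmlspecialchars($label, ENT_QUOTES, 'UTF-8') . '</a>';
}

echo formatUrlSketch('http://www.yelp.com/biz/my-business-name', '_blank');
```

Because parse_url only reports a host when a scheme (or a leading //) is present, inputs such as www.yelp.com/biz would fall into the first branch and come back as plain escaped text.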
