URL with query string validation using PHP - php

I need a PHP validation function for URLs with a query string (parameters separated with &). Currently I have the following function for validating URLs:
$pattern = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-F\d]{2,2})+(:([\d\w]|%[a-fA-F\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-F\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-F\d]{2,2})*)?$/';
echo preg_match($pattern, $url);
This function correctly validates input like
google.com
www.google.com
http://google.com
http://www.google.com ...etc
But it does not validate a URL that comes with parameters (a query string), e.g.:
http://google.com/index.html?prod=gmail&act=inbox
I need a function that accepts both types of URL inputs. Please help. Thanks in advance.

A simple filter_var
if(filter_var($yoururl, FILTER_VALIDATE_URL))
{
echo 'Ok';
}
might do the trick, although there are problems with URLs that lack a scheme:
http://codepad.org/1HAdufMG
You can work around the issue by placing http:// in front of URLs without one.
As suggested by @DaveRandom, you could do something like:
$parsed = parse_url($url);
if (!isset($parsed['scheme'])) $url = "http://$url";
before feeding the filter_var() function.
Overall, it's still a simpler solution than an overly complicated regex.
It also has these flags available for FILTER_VALIDATE_URL:
FILTER_FLAG_PATH_REQUIRED: Requires the URL to contain a path part.
FILTER_FLAG_QUERY_REQUIRED: Requires the URL to contain a query string.
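The steps described in this answer can be combined into one small helper. This is only a minimal sketch: the function name isValidUrl and the $requireQuery parameter are made up for illustration, and it assumes that http:// is an acceptable default scheme for input that has none.

```php
<?php
// Hypothetical helper (name and parameters are illustrative, not from the answer).
function isValidUrl(string $url, bool $requireQuery = false): bool
{
    $url = trim($url);

    // parse_url() yields null for the scheme when the input has none,
    // e.g. "google.com" or "www.google.com/index.html?a=b".
    if (parse_url($url, PHP_URL_SCHEME) === null) {
        $url = 'http://' . $url; // assumed default scheme
    }

    // Optionally insist on a query string via the flag listed above.
    $flags = $requireQuery ? FILTER_FLAG_QUERY_REQUIRED : 0;

    return filter_var($url, FILTER_VALIDATE_URL, $flags) !== false;
}
```

Called as isValidUrl($url), it accepts input with or without a query string; passing true as the second argument rejects URLs that have none.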

http://php.net/manual/en/function.parse-url.php
Some might say this is not 100% bullet-proof,
but you can give it a try as a start.

Related

Codeigniter current_url avoid jpg etc

I'm quite new to CodeIgniter. I'm using the current_url() function to preserve the URL of the previously viewed page. But the function (I think because of various AJAX calls) sometimes returns the URLs of files such as JPGs.
Like this:
/uploads/default/files/HTC.jpg
I'd like to avoid these and preserve only the URLs that appear in the browser's URL bar.
Any idea? Thanks in advance!
I assume you are using current_url() and then saving the output of this function somewhere for later retrieval.
So you could, before you save the string, perform a regex check to see if it fits the format you want:
$pattern = '~.+\.[a-zA-Z]{0,3}$~';
$string = current_url();
preg_match($pattern, $string, $matches);
if (empty($matches)) {
// We can save the url
}
The regex matches URLs that end in a dot followed by zero to three letters, so a match means the URL should not be saved:
HTC.jpg will fail
la/di.php will fail
la/items/2 will pass
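A variation on the check above, as a sketch only: it applies the same pattern to the path component alone, so a dot inside a query string does not cause a page URL to be skipped. The helper name isPageUrl is hypothetical, and the example.com URLs in the usage note are placeholders.

```php
<?php
// Hypothetical helper (the name is illustrative only).
function isPageUrl(string $url): bool
{
    // Look only at the path; parse_url() returns null when there is none.
    $path = (string) parse_url($url, PHP_URL_PATH);

    // Same pattern as in the answer: a dot followed by zero to three letters at the end.
    return preg_match('~.+\.[a-zA-Z]{0,3}$~', $path) !== 1;
}
```

With this, isPageUrl('http://example.com/la/items/2?ref=a.b') is still treated as a page, while /uploads/default/files/HTC.jpg is not.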

Regex to filter int from url

I'm trying to extract a value from a URL.
The url looks like the following:
http://userimages-akm.imvu.com/catalog/includes/modules/phpbb2/images/avatars/145870556_47076915459092eafd7b69.jpg
Now I'm trying to get only the following part of the URL: 145870556
I thought about using a regex, but the only one I have come up with is this:
^((http[s]?|ftp):\/)?\/?([^:\/\s]+)((\/\w+)*\/)([\w\-\.]+[^#?\s]+)(.*)?(#[\w\-]+)?$
Is there a better regex to use?
If the image filename always follows the same format <timestamp>_<hex-value>.<extension>, then you don't need to match the entire URL.
$url = 'http://userimages-akm.imvu.com/catalog/includes/modules/phpbb2/images/avatars/145870556_47076915459092eafd7b69.jpg';
// preg_match() is enough here; preg_match_all() would wrap each group in an extra array.
preg_match('~\/(\d+)_.*$~', $url, $matches);
// $matches[1] === '145870556'
https://regex101.com/r/vsDnoj/2
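The same idea can also be sketched without matching against the whole URL, by isolating the filename first. The function name extractLeadingId is hypothetical, and the sketch assumes the filename always starts with the digits followed by an underscore, as described above.

```php
<?php
// Hypothetical helper (the name is illustrative only).
function extractLeadingId(string $url): ?string
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // no path component to inspect
    }

    // basename() keeps only the last path segment, i.e. the filename.
    $file = basename($path);

    // Capture the run of digits before the first underscore.
    return preg_match('/^(\d+)_/', $file, $m) === 1 ? $m[1] : null;
}
```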

How to properly sanitize URL as a link in PHP?

I have a site where users can share a link to their homepage such as http://example.com/user. Currently, I am using the PHP function filter_var($_POST['url'], FILTER_VALIDATE_URL) to validate the URL before adding it to database using prepared statement.
However, I realize that the PHP filter function accepts input such as http://example.com/<script>alert('XSS');</script> which could be used for cross-site scripting. To counter that, I use htmlspecialchars on the URL within the <a> tag and rawurlencode on the href attribute of the tag.
But rawurlencode causes the / in the URL to be converted to %2f, which makes the URL unrecognizable. I am thinking of doing a preg_replace for all %2f back to /. Is this the way to sanitize the URL for display as a link?
This approach is outdated now:
I am using the PHP function filter_var($_POST['url'], FILTER_VALIDATE_URL) to validate the URL before adding it to database using prepared statement.
Instead of FILTER_VALIDATE_URL, you can use the following trick:
$url = "your URL";
$validation = "/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?\/?/i";
if((bool)preg_match($validation, $url) === false)
echo 'Not a valid URL';
I think it may work for you. All the best :)
After sanitizing the URL against XSS-related scripts, just change the %2f sequences back using
str_ireplace('%2f', '/', $result) after your code, but before the filter_var() call, and it will restore the original character (rawurlencode() emits the uppercase form %2F, hence the case-insensitive replace). So your script can go on.
Do not allow URLs with tags.
A user inserting a tag into a URL means it's probably malicious.
Having "homepages" containing tags is just wrong.
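One way to put the advice in this thread together is sketched below. It is not a complete defence, just a minimal example under two assumptions: only http and https links are wanted, and the output is UTF-8 HTML. The function name renderUserLink and the rel attribute are illustrative choices, not something from the question.

```php
<?php
// Hypothetical helper (the name is illustrative only).
function renderUserLink(string $url): ?string
{
    // Structural check first.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }

    // Allow only web schemes, so "javascript:" and similar are rejected.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if (!in_array($scheme, ['http', 'https'], true)) {
        return null;
    }

    // Escape for HTML, both in the attribute and in the link text.
    // rawurlencode() is not applied to the whole URL, so "/" stays intact.
    $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

    return '<a href="' . $safe . '" rel="nofollow noopener">' . $safe . '</a>';
}
```

Because the escaping happens at output time, the URL can be stored unmodified in the database and no %2f replacement is needed.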

Validate URL with or without protocol

Hi, I would like to validate the following URLs so that they all pass, with or without the http/www part, as long as a TLD such as .com, .net, or .org is present.
Valid URLs Should Be:
http://www.domain.com
http://domain.com
https://www.domain.com
https://domain.com
www.domain.com
domain.com
To support long tlds:
http://www.domain.com.uk
http://domain.com.uk
https://www.domain.com.uk
https://domain.com.uk
www.domain.com.uk
domain.com.uk
To support dashes (-):
http://www.domain-here.com
http://domain-here.com
https://www.domain-here.com
https://domain-here.com
www.domain-here.com
domain-here.com
Also to support numbers in domains:
http://www.domain1-test-here.com
http://domain1-test-here.com
https://www.domain1-test-here.com
https://domain1-test-here.com
www.domain1-test-here.com
domain-here.com
Also maybe allow even IPs:
127.127.127.127
(but this is extra!)
Also allow dashes (-), forgot to mention that =)
I've found many functions that validate one or another but not both at same time.
If any one knows good regex for it, please share. Thank you for your help.
Here is a solution for URL validation.
The other answer is right, but it does not work for all domains, such as .me, .it, or .in,
so please use the pattern below for URL matching (the TLD part accepts any two or more letters):
$pattern = '/(?:https?:\/\/)?(?:[a-zA-Z0-9.-]+?\.[a-zA-Z]{2,}|\d+\.\d+\.\d+\.\d+)/';
if(preg_match($pattern, "http://website.in"))
{
echo "valid";
}else{
echo "invalid";
}
When you ignore the path part and look for the domain part only, a simple rule would be
(?:https?://)?(?:[a-zA-Z0-9.-]+?\.(?:com|net|org|gov|edu|mil)|\d+\.\d+\.\d+\.\d+)
If you want to support country TLDs as well you must either supply a complete (current) list or append |.. to the TLD part.
With preg_match you must wrap it between some delimiters
$pattern = ';(?:https?://)?(?:[a-zA-Z0-9.-]+?\.(?:com|net|org|gov|edu|mil)|\d+\.\d+\.\d+\.\d+);';
$index = preg_match($pattern, $url);
Usually, you use /. But in this case, slashes are part of the pattern, so I have chosen another delimiter. Otherwise I would have to escape the slashes with \:
$pattern = '/(?:https?:\/\/)?(?:[a-zA-Z0-9.-]+?\.(?:com|net|org|gov|edu|mil)|\d+\.\d+\.\d+\.\d+)/';
Don't use a regular expression. Not every problem that involves strings needs to use regexes.
Don't write your own URL validator. URL validation is a solved problem, and there is existing code that has already been written, debugged and tested. In fact, it comes standard with PHP.
Look at PHP's built-in filtering functionality: http://us2.php.net/manual/en/book.filter.php
I think you can use flags for filter_var.
For FILTER_VALIDATE_URL there are several flags available:
FILTER_FLAG_SCHEME_REQUIRED: Requires the URL to contain a scheme part.
FILTER_FLAG_HOST_REQUIRED: Requires the URL to contain a host part.
FILTER_FLAG_PATH_REQUIRED: Requires the URL to contain a path part.
FILTER_FLAG_QUERY_REQUIRED: Requires the URL to contain a query string.
FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED are used by default; they were deprecated in PHP 7.3 and removed in PHP 8.0, where the scheme and host are always required.
Let's say you also want to require a path part. The flag argument is a bitmask, so flags are combined with |. Negating a default flag with ~ does not switch off the scheme check, so a URL without a scheme must have one prepended first:
filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED)
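The built-in filter can be combined with a small host check to cover the cases listed in the question. This is a sketch under stated assumptions: the helper name isValidHostUrl is hypothetical, input without a scheme is treated as http://, and a "TLD" is taken to mean two or more letters after the last dot rather than an entry in the official TLD list.

```php
<?php
// Hypothetical helper (the name is illustrative only).
function isValidHostUrl(string $input): bool
{
    // Prepend a scheme when the input has none (assumed default: http).
    $url = preg_match('~^https?://~i', $input) ? $input : 'http://' . $input;

    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }

    // Plain IP addresses are accepted as-is.
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return true;
    }

    // Otherwise require a dot followed by an alphabetic TLD of two or more letters.
    return preg_match('~\.[a-z]{2,}$~i', $host) === 1;
}
```

This rejects single-label hosts such as "domain", which FILTER_VALIDATE_URL alone would accept once a scheme is prepended.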

PHP url recognition

How can you check whether text typed in by a user is a URL?
Let's say I want to check whether the text is a URL and then check whether the string has "youtube.com" in it. Afterwards, I want to get the portion of the link that is of interest to me, between the substring "watch?v=" and any "&" parameters, if they exist.
parse_url() is probably a good choice here. If you get a bad URL, the function will return false. Otherwise, it will break the URL up into its component pieces and you can use the ones you need.
Example:
$urlParts = parse_url('http://www.youtube.com/watch?v=MX0D4oZwCsA');
if ($urlParts === false) echo "Bad URL";
else echo "Param string is ".$urlParts['query'];
Outputs:
Param string is v=MX0D4oZwCsA
You could split the query portion as needed using explode() for specific parameters.
Edit: Keep in mind that parse_url() tries as hard as possible to parse the string it is given, so bad URLs will often succeed, although the resulting data array will be very odd. It's obviously up to you how definitive you want your validation to be and what exactly you require out of your user input.
preg_match('#watch\?v=([^&]+)#', $url, $matches);
echo $matches[1];
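The two approaches above can be combined: parse_url() to check the host, then parse_str() to read the v parameter instead of splitting the query by hand. This is a minimal sketch; the function name youtubeVideoId is hypothetical, and it only handles the watch?v= form from the question.

```php
<?php
// Hypothetical helper (the name is illustrative only).
function youtubeVideoId(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host']) || empty($parts['query'])) {
        return null;
    }

    // Accept youtube.com itself and subdomains such as www.youtube.com.
    $host = strtolower($parts['host']);
    if ($host !== 'youtube.com' && substr($host, -12) !== '.youtube.com') {
        return null;
    }

    // parse_str() splits the query string into an associative array.
    parse_str($parts['query'], $params);

    return isset($params['v']) && is_string($params['v']) && $params['v'] !== ''
        ? $params['v']
        : null;
}
```

Any additional "&" parameters after the video ID are ignored, since only the v key is read.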
It's strongly recommended not to use parse_url() for URL validation.
Here is a nice solution.
