This is my first question, and by the way, I am not comfortable with regular expressions.
I am working on a PHP function that validates domains or URLs supplied by users. (Sub)domains will be collected via an HTML input field.
So I have to deal with different formats such as http(s)://domain.tld and domain.tld, either of which may include a path or be invalid.
The function should fix nearly correct input rather than simply return false.
In the end, I want to return the format (sub.)domain.tld, but only for domains that actually exist.
My WIP-solution is the following. What do you think about it?
function valDomain($url,$prefix=""){
$url = trim($url);
$url = str_replace(" ", "", $url);
$url = trim($url,'.');
$url = trim($url,'?');
$url = trim($url,'-');
$url = trim($url,'/');
$url = strtolower($url);
$url = substr($url,0,100);
if(strpos($url,'.') == false) {
return false;
}
if(strpos($url,'http') !== false) {
$x = parse_url($url);
if(isset($x['host'])){
$url = $x['host'];
}
}
if(strpos($url,'/') !== false) {
$x = explode("/", $url);
if(isset($x[0])){
$url = $x[0];
}
}
if(checkdnsrr($url,"A")){
return $prefix.$url;
} else {
return false;
}
}
To explain: it tidies up the user input, checks whether it can be a URL/domain at all, takes the host if it is a proper URL, strips the path, and then, when only the bare domain should be left, checks whether a DNS entry exists for it. Only if one does, it returns the validated domain. Otherwise it returns false.
Does this make sense?
(The $prefix argument can optionally be used to add a http:// to the url in order to render a hyperlink).
The returned values will be stored in a database, so they need to be hack-safe.
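One way to make the tidy-up step less dependent on chained trim() calls is to prepend a scheme when it is missing, let parse_url() find the host, and then whitelist the characters of that host. The following is only a minimal sketch under those assumptions: extractHost() is a hypothetical name, not part of the code above, and it deliberately leaves out the DNS lookup.

```php
<?php
// Hypothetical helper: reduce loose user input to a bare host name, or null.
function extractHost(string $input): ?string
{
    $input = strtolower(trim($input));
    if ($input === '') {
        return null;
    }
    // parse_url() only reports a host when a scheme is present,
    // so prepend one if the user left it out.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~', $input)) {
        $input = 'http://' . $input;
    }
    $host = parse_url($input, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // malformed URL or no host found
    }
    $host = rtrim($host, '.');
    // Whitelist: at least two dot-separated labels made of letters, digits
    // and hyphens, with no label starting or ending in a hyphen.
    $label = '[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?';
    $pattern = '~^' . $label . '(?:\.' . $label . ')+$~';
    if (strlen($host) > 253 || !preg_match($pattern, $host)) {
        return null;
    }
    return $host;
}
```

The value that comes back contains only a-z, 0-9, dots and hyphens, so checkdnsrr($host, "A") can be applied to it afterwards as in the function above. Storing it with a prepared statement is still advisable regardless of the whitelist.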
I wonder what would be the best way in php to check if provided url is valid... At first I tried with:
filter_var($url, FILTER_VALIDATE_URL) === false
But it does not accept www.example.com (without protocol). So I tried with a simple modification:
protected function checkReferrerUrl($url) {
if(strpos($url, '://') == false) {
$url = "http://".$url;
}
if(filter_var($url, FILTER_VALIDATE_URL) === false) {
return false;
}
return true;
}
Now it works fine with www.example.com, but it also accepts a plain foo because that gets converted to http://foo. I don't think that is a valid public URL, though... so what would you suggest? Go back to a traditional regexp?
I recommend that you do not use filter_var with the URL filter.
It has more side effects than the one you found.
For example, these are valid URLs according to filter_var:
http://example.com/"><script>alert(document.cookie)</script>
http://example.ee/sdsf"f
Additionally FILTER_VALIDATE_URL does not support internationalized domain names (IDN).
For security reasons, I recommend using a regex combined with some additional if checks afterwards (e.g. for the domain).
When security is not a concern, I use parse_url to extract the parts. But this function has a similar issue when the scheme (http/https) is missing.
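If you do keep filter_var despite these caveats, the two problems mentioned in this thread can be narrowed down separately: require an http(s) scheme and a dot in the host so that a bare foo is rejected, and escape the URL whenever it is printed so that an accepted URL containing markup cannot break out of the HTML. This is a rough sketch only; isPublicHttpUrl() and urlToLink() are hypothetical names, and IDN hosts are still not handled.

```php
<?php
// Hypothetical helper: accept only http(s) URLs whose host contains a dot.
function isPublicHttpUrl(string $url): bool
{
    if (strpos($url, '://') === false) {
        $url = 'http://' . $url; // allow input such as "www.example.com"
    }
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    $host   = (string) parse_url($url, PHP_URL_HOST);
    return in_array($scheme, ['http', 'https'], true)
        && strpos($host, '.') !== false;
}

// Hypothetical helper: escape at output time, whatever the validator said.
function urlToLink(string $url): string
{
    $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    return '<a href="' . $safe . '">' . $safe . '</a>';
}
```

Note the strict === false comparison on strpos(): with == false, a match at position 0 would be treated the same as no match.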
Use this
<?php
$url = 'www.example.com';
if(validateURL($url)){
echo "Valid";
}else{
echo "invalid";
}
function validateURL($URL) {
// Literal dots are escaped, and the end anchor now sits after the optional port and trailing slash.
$pattern_1 = "/^(http|https|ftp):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)+\.(com|org|net|dk|at|us|tv|info|uk|co\.uk|biz|se))(:(\d+))?\/?$/i";
$pattern_2 = "/^(www)((\.[A-Z0-9][A-Z0-9_-]*)+\.(com|org|net|dk|at|us|tv|info|uk|co\.uk|biz|se))(:(\d+))?\/?$/i";
// True if the URL matches either the scheme-prefixed or the www-prefixed form.
return preg_match($pattern_1, $URL) || preg_match($pattern_2, $URL);
}
?>
Try this one too
<?php
// Assign URL to $URL variable
$url = 'http://example.com';
// Check url using preg_match
if (preg_match("/^(https?:\/\/+[\w\-]+\.[\w\-]+)/i",$url)){
echo "Valid";
}else{
echo "invalid";
}
?>
I have a social bookmarking website where users can submit links from other websites (using a bookmarklet or bookmark button in the bookmarks bar, or by entering URLs directly).
Users have problems with some URLs when they add links with the bookmark button in their browsers. The problem occurs with URLs that contain the "&" character. Most users who work with Safari on Mac or Windows cannot add such links with the bookmark button.
The issue is that for every URL containing "&", $isLink = preg_match($pattern, $url); returns false (see the code below).
I removed part of my code (see the comments in the snippet), and that fixed the problem.
But I do not want to remove this code. How can I fix the problem without removing it?
$url = htmlspecialchars(sanitize($_POST['url'], 3));
$url = str_replace('&amp;', '&', $url);
$url = html_entity_decode($url);
if (strpos($url,'http')!==0) {
$url = "http://$url";
}
// check if URL is valid format
$pattern = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?#)?([\d\w]([-\d\w]{0,253}[\d\w])?\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.,\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.,\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.,\/\d\w]|%[a-fA-f\d]{2,2})*)?$/';
// vvv I REMOVED FROM HERE vvv
$isLink = preg_match($pattern, $url); // Returns true if a link
// ^^^ UNTIL HERE ^^^
if($url == "http://" || $url == "") {
if(Submit_Require_A_URL == false) {
$linkres->valid = true;
} else {
$linkres->valid = false;
}
$linkres->url_title = "";
} elseif ($isLink == false) {
$linkres->valid = false;
}
Website bookmark button code is:
javascript:q=(document.location.href);void(open('http://website.com/submit.php?url='+escape(q),'_self','resizable,location,menubar,toolbar,scrollbars,status'));
Why are you not using the PHP function "filter_var()" to check the url:
$url = $_POST['url'];
$isLink = filter_var($url, FILTER_VALIDATE_URL);
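To make that suggestion concrete, here is a small sketch that validates the raw submitted value before any HTML escaping, so an "&" in the query string is never turned into an entity before the check. checkSubmittedUrl() is a hypothetical name, and whether this fits with the site's own sanitize() function is an assumption that would need checking.

```php
<?php
// Hypothetical helper: validate the raw value; escape only when printing it.
function checkSubmittedUrl(string $raw): ?string
{
    $url = trim($raw);
    // Same idea as in the question: add a scheme when it is missing.
    if ($url !== '' && stripos($url, 'http') !== 0) {
        $url = 'http://' . $url;
    }
    // filter_var() returns the URL on success and false on failure.
    return filter_var($url, FILTER_VALIDATE_URL) === false ? null : $url;
}
```

A call such as checkSubmittedUrl($_POST['url'] ?? '') then yields either a usable URL or null, and htmlspecialchars() is applied only at the point where the URL is written into a page.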
I am trying to get the URL path and save it in a variable...
$setURL = true;
$getDomain = "http://".$_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];
$getSubdomain = "http://".$_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];
if ($setURL === true) {
$result = 'http://'.parse_url($getDomain, PHP_URL_HOST) . '/';
echo 'get domain';
} else {
$result = 'http://'.parse_url($getSubdomain, PHP_URL_HOST).parse_url($getSubdomain, PHP_URL_PATH);
echo 'get subdomain';
}
$siteURL = $result;
So basically, if I define the variable $setURL = true; it returns the correct URL for a simple domain... http://domain-name.com
However, the else branch does not work as I want it to... it is there for subdomains. So if I set $setURL = false; it should return the following... http://domain-name.com/path/
But unfortunately it returns more than that... It returns anything I type as the URL...
For http://domain-name.com/path/something/index.php it returns all of that as the URL!
Please help me fix this, as I have no idea how to make it work.
Formally, a subdomain precedes a domain name. For example, in ftp.debian.us, ftp is the subdomain.
It sounds like what you want is the first segment of the path in the URI. You can use PHP's explode() function to grab it.
// For a request URI such as '/path/to/somewhere/index.html':
$uriparts = explode('/', $_SERVER['REQUEST_URI']);
// Index 0 is the empty string before the leading slash, so the first segment is at index 1.
$path = $uriparts[1]; // = 'path'
I'm trying to compare two urls using PHP, ensuring that the domain name is the same. It cannot be the sub-domain. It has to literally be the same domain. Example:
http://www.google.co.uk would validate as true compared to http://www.google.co.uk/pages.html.
but
http://www.google.co.uk would validate as false compared to http://www.something.co.uk/pages.html.
Use parse_url(), and compare the "host" index in the array returned from the two calls to parse_url().
Use parse_url()
$url1 = parse_url("http://www.google.co.uk");
$url2 = parse_url("http://www.google.co.uk/pages.html");
if ($url1['host'] == $url2['host']){
//matches
}
simple, use parse_url()
$url1 = parse_url('http://www.google.co.uk');
$url2 = parse_url('http://www.google.co.uk/pages.html');
if($url1['host'] == $url2['host']){
// same domain
}
You could use parse_url for this
$url1 = parse_url('http://www.google.com/page1.html');
$domain1 = $url1['host'];
$url2 = parse_url('http://www.google.com/page2.html');
$domain2 = $url2['host'];
if($domain1 == $domain2){
// something
}
Expanding the answer given by Ariel, the code you could use is similar to the following one:
<?php
compare_host('http://www.google.co.uk', 'http://www.something.co.uk/pages.html');
function compare_host($url1, $url2)
{
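// PHP versions before 5.3.3 emit a warning if URL parsing fails, hence the @ operator.
$info = @parse_url($url1);
// Also covers URLs that parse successfully but contain no host.
if (empty($info['host'])) {
return FALSE;
}
$host1 = $info['host'];
$info = @parse_url($url2);
if (empty($info['host'])) {
return FALSE;
}
// Host names are case-insensitive, so compare them in lower case.
return (strtolower($host1) === strtolower($info['host']));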
}
I need a certain part of a URL extracted.
Example:
http://www.domain.com/blog/entry-title/?standalone=1 is the given URL.
blog/entry-title should be extracted.
However, the extraction should also work with http://www.domain.com/index.php/blog/[…] as the given URL.
This code is for a Content Management System.
What I've already come up with is this:
function getPathUrl() {
$folder = explode('/', $_SERVER['SCRIPT_NAME']);
$script_filename = pathinfo($_SERVER['SCRIPT_NAME']); // supposed to be 'index.php'
$request = explode('/', $_SERVER['REQUEST_URI']);
// first element is always ""
array_shift($folder);
array_shift($request);
// now it's only the request url. filtered out containing folders and 'index.php'.
$final_request = array_diff($request, array_intersect($folder, $request));
// the indexes are mangled up in a strange way. turn 'em back
$final_request = array_values($final_request);
// remove empty elements in array (caused by superfluous slashes, for instance)
array_clean($final_request);
// make a string out of the array
$final_request = implode('/', $final_request);
if ($_SERVER['QUERY_STRING'] || substr($final_request, -1) == '?') {
$final_request = substr($final_request, 0, - strlen($_SERVER['QUERY_STRING']) - 1);
}
return $final_request;
}
However, this code does not take care of the arguments at the end of the URL (like ?standalone=1). It works for anchors (#read-more), though.
Thanks a ton guys and have fun twisting your brains. Maybe we can do this shit with a regular expression.
There are many examples and details for what you want at:
http://php.net/manual/en/function.parse-url.php
That should do what you need:
<?php
function getPath($url)
{
$path = parse_url($url,PHP_URL_PATH);
$lastSlash = strrpos($path,"/");
return substr($path,1,$lastSlash-1);
}
echo getPath("http://www.domain.com/blog/entry-title/?standalone=1");
?>
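The function above covers the first example URL but keeps a leading index.php segment. A possible variant, sketched here under the assumption that the front controller is always the first path segment and is called index.php (both are assumptions, as is the name getContentPath()), splits the path into segments and drops that one:

```php
<?php
// Hypothetical variant of getPath(): also drops a leading front-controller
// segment such as "index.php" (the name is passed in as a parameter).
function getContentPath(string $url, string $frontController = 'index.php'): string
{
    // PHP_URL_PATH excludes the query string and the fragment.
    $path = (string) parse_url($url, PHP_URL_PATH);
    // Split on "/" and drop empty segments from leading, trailing or doubled slashes.
    $segments = array_values(array_filter(
        explode('/', $path),
        function ($segment) {
            return $segment !== '';
        }
    ));
    if ($segments !== [] && $segments[0] === $frontController) {
        array_shift($segments);
    }
    return implode('/', $segments);
}

echo getContentPath("http://www.domain.com/blog/entry-title/?standalone=1");
```

Because parse_url() separates the query string before the path is split, arguments such as ?standalone=1 never reach the segment list.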