I need to filter some email addresses based on the domain name:
Basically, if the domain name is yahoo-inc.com, facebook.com, baboo.com (and a few others), the function should do one thing, and if the domain is different it should do something else.
The only way I know to do this is to use a pattern/regex with preg_match_all and create a condition for each blacklisted domain (e.g. if (domain == yahoo-inc.com) do this, elseif (domain == facebook.com) do that, etc.). Is there a simpler, more concise way to put all the domains I want to filter in a single variable/array and then apply only two conditions (e.g. if the email is in the blacklist {do something} else {do something else})?
Extract the domain portion (i.e. everything after the last '@'), lowercase it, and then use in_array to check whether it's in your blacklist:
$blacklist = array('yahoo-inc.com', 'facebook.com', ...);
if (in_array($domain, $blacklist)) {
// bad domain
} else {
// good domain
}
Adding on to @Alnitak, here is the full code to do what you need:
$domain = explode("@", $emailAddress);
// Take the last segment and lowercase it so the comparison is case-insensitive
$domain = strtolower($domain[(count($domain)-1)]);
$blacklist = array('yahoo-inc.com', 'facebook.com', ...);
if (in_array($domain, $blacklist)) {
// bad domain
} else {
// good domain
}
Well, here's a very simple way to do this: a VALID email address should only ever contain a single @ symbol, so as long as it passes validation you can just explode the string by @ and take the second segment.
Example:
if (filter_var($user_email, FILTER_VALIDATE_EMAIL))
{
//Valid Email:
$parts = explode("@",$user_email);
/*
* You may want to use in_array if you already have a compiled array
* The switch statement is mainly used to show visually the check.
*/
switch(strtolower($parts[1]))
{
case 'facebook.com':
case 'gmail.com':
case 'googlemail.com':
//Do Something
break;
default:
//Do something else
break;
}
}
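The in_array variant mentioned in the comment above could look roughly like this. It is only a minimal sketch: the function name is_blacklisted_email and the sample blacklist are made up for illustration, and it assumes the address has already been trimmed.

```php
<?php
// Hypothetical helper (not from the answers above): returns true when the
// address's domain is on the blacklist, false when it is not, and null when
// the address does not pass validation.
function is_blacklisted_email($email, array $blacklist)
{
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return null;
    }
    // Everything after the last "@" is the domain; compare in lowercase.
    $domain = strtolower(substr($email, strrpos($email, '@') + 1));
    return in_array($domain, $blacklist, true);
}

// Sample data only; the blacklist entries must be stored in lowercase.
$blacklist = array('yahoo-inc.com', 'facebook.com', 'baboo.com');
var_dump(is_blacklisted_email('Someone@Facebook.com', $blacklist));
var_dump(is_blacklisted_email('someone@example.org', $blacklist));
```

Returning null for invalid input keeps "not blacklisted" and "not an email address" apart, which the two-branch if/else from the question can then handle as it sees fit.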
Hey, I am trying to display a different phone number for visitors who reach my website from my Google AdWords campaign.
The code below works without the else statement (so if I click through to the page from Google it displays a message, and if I visit the site normally it does not). When I added the else statement, it outputs both numbers. Thank you
<?php
// The domain list.
$domains = Array('googleadservices.com', 'google.com');
$url_info = parse_url($_SERVER['HTTP_REFERER']);
if (isset($url_info['host'])) {
foreach($domains as $domain) {
if (substr($url_info['host'], -strlen($domain)) == $domain) {
// GOOGLE NUMBER HERE
echo ('1234');
}
// REGULAR NUMBER HERE
else {
echo ('12345');
}
}
}
?>
Your logic is slightly skewed; you're checking to see if the URL from parse_url matches the domains in your array; but you're running through the whole array each time. So you get both a match and a non-match, because google.com matches one entry but not the other.
I'd suggest making your domains array into an associative array:
$domains = Array('googleadservices.com' => '1234',
'google.com' => '12345' );
Then you just need to check once:
if (isset($url_info['host'])) {
if (isset($domains[$url_info['host']])) {
echo $domains[$url_info['host']];
}
}
I've not tested this, but it should be enough for you to see the logic.
(I've also removed the substr check - you may need to put that back in, to ensure that you're getting the exact string that you need to look for)
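If the suffix check is kept, one way to avoid printing both numbers is to decide once, after the whole list has been checked, rather than echoing inside the loop. The following is a rough sketch under that assumption. The helper names host_matches and phone_for_referer are invented for illustration, and it shows the regular number when there is no referer at all, which may or may not be what you want.

```php
<?php
// Hypothetical helper: true when $host equals $domain or is a subdomain of it.
function host_matches($host, $domain)
{
    $host = strtolower($host);
    return $host === $domain
        || substr($host, -strlen('.' . $domain)) === '.' . $domain;
}

// Hypothetical helper: pick exactly one number for the given referer.
function phone_for_referer($referer, array $domains)
{
    $host = parse_url((string) $referer, PHP_URL_HOST);
    if (is_string($host)) {
        foreach ($domains as $domain) {
            if (host_matches($host, $domain)) {
                return '1234';   // Google number
            }
        }
    }
    return '12345';              // regular number
}

$domains = array('googleadservices.com', 'google.com');
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
echo phone_for_referer($referer, $domains);
```

Comparing against '.' . $domain instead of the bare domain keeps a host such as notgoogle.com from matching google.com.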
I don't want to reinvent the wheel, but I couldn't find any library that does this perfectly.
In my script users can save URLs. When they give me a list like:
google.com
www.msn.com
http://bing.com/
and so on...
I want to be able to save them in the database in the "correct format".
What I do now is check whether a protocol is present, add it if it is missing, and then validate the URL against a RegExp.
For PHP's parse_url any URL that contains a protocol is valid, so it didn't help a lot.
How are you doing this? Do you have any ideas you would like to share with me?
Edit:
I want to filter out invalid URLs from user input (a list of URLs) and, more importantly, try to auto-correct URLs that are invalid (e.g. ones that don't contain a protocol). Once the user enters the list, it should be validated immediately (there is no time to open the URLs to check that they really exist).
It would be great to extract parts from the URL, like parse_url does, but the problem with parse_url is that it doesn't work well with invalid URLs. I tried to parse the URL with it and add default values for the required parts that are missing (e.g. no protocol, add http). But parse_url for "google.com" won't return "google.com" as the hostname; it returns it as the path.
This looks like a really common problem to me, but I could not find an available solution on the internet (I found some libraries that standardize a URL, but they won't fix the URL if it is invalid).
Is there some "smart" solution to this, or should I stick with my current approach:
Find the first occurrence of :// and validate that the text before it is a valid protocol, adding the protocol if it is missing
Find the next occurrence of / and validate that the hostname is in a valid format
For good measure, validate the whole URL once more via RegExp
I just have the feeling I will reject some valid URLs with this, and for me it is better to have a false positive than a false negative.
I had the same problem with parse_url as the OP. This is my quick and dirty solution to auto-correct URLs (keep in mind that the code is in no way perfect and does not cover all cases):
Results:
http:/wwww.example.com/lorum.html => http://www.example.com/lorum.html
gopher:/ww.example.com => gopher://www.example.com
http:/www3.example.com/?q=asd&f=#asd => http://www3.example.com/?q=asd&f=#asd
asd://.example.com/folder/folder/ => http://example.com/folder/folder/
.example.com/ => http://example.com/
example.com => http://example.com
subdomain.example.com => http://subdomain.example.com
function url_parser($url) {
// multiple /// messes up parse_url, replace 2+ with 2
$url = preg_replace('/(\/{2,})/','//',$url);
$parse_url = parse_url($url);
if(empty($parse_url["scheme"])) {
$parse_url["scheme"] = "http";
}
if(empty($parse_url["host"]) && !empty($parse_url["path"])) {
// Strip slash from the beginning of path
$parse_url["host"] = ltrim($parse_url["path"], '\/');
$parse_url["path"] = "";
}
$return_url = "";
// Check if scheme is correct
if(!in_array($parse_url["scheme"], array("http", "https", "gopher"))) {
$return_url .= 'http'.'://';
} else {
$return_url .= $parse_url["scheme"].'://';
}
// Check if the right amount of "www" is set.
$explode_host = explode(".", $parse_url["host"]);
// Remove empty entries
$explode_host = array_filter($explode_host);
// And reassign indexes
$explode_host = array_values($explode_host);
// Contains subdomain
if(count($explode_host) > 2) {
// Check if subdomain only contains the letter w(then not any other subdomain).
if(substr_count($explode_host[0], 'w') == strlen($explode_host[0])) {
// Replace with "www" to avoid "ww" or "wwww", etc.
$explode_host[0] = "www";
}
}
$return_url .= implode(".",$explode_host);
if(!empty($parse_url["port"])) {
$return_url .= ":".$parse_url["port"];
}
if(!empty($parse_url["path"])) {
$return_url .= $parse_url["path"];
}
if(!empty($parse_url["query"])) {
$return_url .= '?'.$parse_url["query"];
}
if(!empty($parse_url["fragment"])) {
$return_url .= '#'.$parse_url["fragment"];
}
return $return_url;
}
echo url_parser('http:/wwww.example.com/lorum.html'); // http://www.example.com/lorum.html
echo url_parser('gopher:/ww.example.com'); // gopher://www.example.com
echo url_parser('http:/www3.example.com/?q=asd&f=#asd'); // http://www3.example.com/?q=asd&f=#asd
echo url_parser('asd://.example.com/folder/folder/'); // http://example.com/folder/folder/
echo url_parser('.example.com/'); // http://example.com/
echo url_parser('example.com'); // http://example.com
echo url_parser('subdomain.example.com'); // http://subdomain.example.com
It's not 100% foolproof, but it's a one-liner.
$URL = (((strpos($URL,'https://') === false) && (strpos($URL,'http://') === false))?'http://':'' ).$URL;
EDIT
There was apparently a problem with my initial version if the hostname contained http.
Thanks Trent
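The one-liner looks for "http://" or "https://" anywhere in the string, so a URL that carries another URL in its path or query would be left without a scheme. A possible variation, sketched here with an invented function name (add_default_scheme), only checks the start of the string and ignores case:

```php
<?php
// Hypothetical helper: prepend "http://" unless the string already starts
// with an http or https scheme (case-insensitive).
function add_default_scheme($url)
{
    $url = trim($url);
    if (preg_match('~^https?://~i', $url) === 1) {
        return $url;
    }
    return 'http://' . $url;
}

echo add_default_scheme('example.com'), "\n";
echo add_default_scheme('https://example.com'), "\n";
echo add_default_scheme('example.com/?next=http://other.example'), "\n";
```

Like the one-liner, this deliberately treats every other scheme as missing, so something like ftp://host would also get "http://" prepended.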
I am writing some code for my website and I need a way to take a user's input and determine whether it is a domain or an IP. If it is a domain, the code should reduce it from e.g. http://www.google.com to google.com, then resolve it and echo the result back into the user input field. If it is an IP, it should skip all of these steps and stay in the IP user input field.
This is what I have so far:
<?php
if( $_POST['submit'] )
{
$ip_address=$_POST["host"];
if (ctype_alpha($ip_address))
{
// get host name from URL
preg_match('#^(?:http://)?([^/]+)#i', $ip_address, $matches);
$host = $matches[1];
// get last two segments of host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
$new_ip_address = $matches;
//resolve parsed ip
$presolved = gethostbyname($new_ip_address);
}
else
{
echo "Do stuff.\n";
}
}
?>
Have you considered looking at it from the other direction? Detecting an IP address literal is much easier than parsing and validating a DNS name.
Parse the URL with parse_url
Split $url['host'] by '.'. If the result is an array of four integer elements, then it is probably an IP address.
Bonus points for checking that each integer is in the appropriate range (e.g, 1-255 for the first octet and 0-255 for the remaining octets).
In any case, use parse_url instead of trying to crack this with the regex hammer.
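The steps above can be sketched roughly as follows. Both function names (extract_host and looks_like_ipv4) are made up for this example, and the range check simply follows the ranges suggested above rather than any formal definition of a valid address.

```php
<?php
// Hypothetical helper: pull the host out of user input, with or without a scheme.
function extract_host($input)
{
    $input = trim($input);
    if (strpos($input, '://') === false) {
        $input = 'http://' . $input;   // parse_url needs a scheme to find the host
    }
    $host = parse_url($input, PHP_URL_HOST);
    return is_string($host) ? $host : false;
}

// Hypothetical helper: true when $host is four dot-separated integers,
// first octet 1-255 and the remaining octets 0-255.
function looks_like_ipv4($host)
{
    $parts = explode('.', $host);
    if (count($parts) !== 4) {
        return false;
    }
    foreach ($parts as $i => $part) {
        if (!ctype_digit($part) || strlen($part) > 3) {
            return false;
        }
        $n = (int) $part;
        if ($n > 255 || ($i === 0 && $n === 0)) {
            return false;
        }
    }
    return true;
}

$host = extract_host('http://www.google.com/search');
var_dump($host, looks_like_ipv4($host));
var_dump(looks_like_ipv4(extract_host('192.168.0.1')));
```

Anything that is not recognised as an IP literal can then be treated as a host name and passed on to the resolving step.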
Try using http://php.net/filter_var
var_dump(filter_var('http://example.com', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED));
To check if the input was an ip, you can use php's ip2long (http://php.net/manual/en/function.ip2long.php)
See the example below, which will return an IP or false.
<?php
function checkOrMakeIp($address) {
$long = ip2long($address);
if ($long !== false) { // ip2long() returns false for invalid input (PHP 5+)
// It is an IP, just return the address
return $address;
} elseif($ip = gethostbynamel($address)) {
// It is a hostname, but now we have the ip, return it
return $ip[0];
}
return false;
}
?>
Notice the gethostbynamel, it will return FALSE if the hostname could not be resolved; otherwise an array. (http://www.php.net/manual/en/function.gethostbynamel.php)
I am trying to use preg_match and a regex to validate that a domain has GET parameters, which I require it to have for my purposes.
What I have working so far is validating a domain without GET parameters, like so:
if (preg_match("/^[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}$/", 'domain.com')) {
echo 'true';
} else {
echo 'false';
}
I get true for this test.
So far so good. What I am having trouble with is adding the GET parameters. Among a number of regexes I have tried without luck is the following:
if (preg_match("/^[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}([/?].*)?$/", 'domain.com?test=test')) {
echo 'true';
} else {
echo 'false';
}
Here I get false, and hence I am not able to validate a domain with the GET parameters, which are required.
Any assistance will be much appreciated ^^
Regards
This code is not tested, but I think it should work:
// preg_match needs delimiters; "~" is used so "/" does not have to be escaped
$pattern = "~^([a-z0-9.-]*)\.([a-z]{2,3})"; //Host
$pattern .= "(\?[a-z+&\$_.-][a-z0-9;:@&%=+\/\$_.-]*)?\$~i"; //Get requests
if (preg_match($pattern, 'domain.com?test=test')) {
echo 'true';
} else {
echo 'false';
}
What is the advantage of using a REGEX?
Why not just
<?php
$xGETS = count($_GET);
if(!$xGETS)
{
echo 'false';
} else {
echo 'true';
}
// PHP 5.2+
$xGETS = filter_var('http://domain.com?test=test', FILTER_VALIDATE_URL, FILTER_FLAG_QUERY_REQUIRED);
if(!$xGETS)
{
echo 'false';
} else {
echo 'true';
}
Your first regular expression will reject some valid domain names (e.g. from the museum and travel TLDs and domain names that include upper case letters) and will recognize some invalid domain names (e.g. where a label or the whole domain name is too long).
If this is fine with you, you might just as well search for the first question mark and treat the prefix as domain name and the suffix as "GET parameters" (actually called query string).
If this is not fine with you, a simple regular expression will not suffice to validate domain names, because of the length constraints of domain names and labels.
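Splitting at the first question mark, as suggested above, might look something like this. It is a sketch only: split_domain_and_query is an invented name, and it does not validate the domain part at all.

```php
<?php
// Hypothetical helper: split "domain?query" at the first question mark.
// Returns array(domain, query), or false when either part is missing.
function split_domain_and_query($input)
{
    $pos = strpos($input, '?');
    if ($pos === false || $pos === 0 || $pos === strlen($input) - 1) {
        return false;
    }
    return array(substr($input, 0, $pos), substr($input, $pos + 1));
}

var_dump(split_domain_and_query('domain.com?test=test'));
var_dump(split_domain_and_query('domain.com'));
```

The prefix can then be run through whatever domain check you settle on, and the suffix can be handed to parse_str if the individual parameters are needed.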
I am trying to make a box where users can submit links. I've been trying all day and can't seem to get it working.
The goal is to turn all of these into example.com... (i.e. remove everything before the domain name)
Input is $url =
There are 4 types of URL:
www.example.com...
example.com...
http://www.example.com...
http://example.com...
Everything I make works on 1 or 2 types, but not all 4.
How can one do this?
You can use parse_url for that. For example:
function parse($url) {
$parts = parse_url($url);
if ($parts === false) {
return false;
}
return isset($parts['scheme'])
? $parts['host']
: substr($parts['path'], 0, strcspn($parts['path'], '/'));
}
This will leave the "www." part if it already exists, but it's trivial to cut that out with e.g. str_replace. If the url you give it is seriously malformed, it will return false.
Update (an improved solution):
I realized that the above would not work correctly if you try to trick it hard enough. So instead of whipping myself trying to compensate if it does not have a scheme, I realized that this would be better:
function parse($url) {
$parts = parse_url($url);
if ($parts === false) {
return false;
}
if (!isset($parts['scheme'])) {
$parts = parse_url('http://'.$url);
}
if ($parts === false) {
return false;
}
return $parts['host'];
}
Your input can be
www.example.com
example.com
http://www.example.com
http://example.com
$url_arr = parse_url($url);
echo $url_arr['host'];
output is example.com
There are a few steps you can take to get a clean URL.
First, you need to make sure there is a protocol so that parse_url works correctly, so you can do:
//Make sure it has a protocol
if(substr($url,0,7) != 'http://' && substr($url,0,8) != 'https://')
{
$url = 'http://' . $url;
}
Now we run it through parse_url()
$segments = parse_url($url);
But this is where it gets complicated. Domain names can have any number of levels (1, 2, 3, 4, 5, 6, ...), so you cannot detect the domain name from the URL alone. You need a precompiled list of TLDs to check against the last portion of the host; once you know the TLD you can strip it off, leaving the website's domain.
There is a list available here : http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
But you would be better off parsing this list into MySQL and then selecting the rows where the TLD matches the right-hand side of the domain string.
Then you order by length (longest first) and limit to 1. If a row is found, you can do something like:
$db_found_tld = 'co.uk';
$domain = 'a.b.c.domain.co.uk';
// Cut off the TLD and the dot in front of it
$domain_name = substr($domain, 0, -strlen($db_found_tld) - 1);
This would leave a.b.c.domain, so you have removed the TLD. Now the domain name can be extracted like so:
$parts = explode('.', $domain_name);
$base_domain = $parts[count($parts) - 1];
Now you have domain.
This seems very lengthy, but I hope you now see that it's not easy to get just the domain name without the TLD or subdomains.
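For a quick experiment without a database, the same longest-suffix idea can be tried against a plain array. This is a sketch under stated assumptions: base_domain_label is an invented name, and the three suffixes are sample data standing in for the parsed list, not the real file contents.

```php
<?php
// Hypothetical helper. $suffixes stands in for the parsed suffix list;
// the entries used below are sample data only.
function base_domain_label($domain, array $suffixes)
{
    $domain = strtolower($domain);
    $best = '';
    foreach ($suffixes as $suffix) {
        $tail = '.' . $suffix;
        // Keep the longest suffix that the domain ends with.
        if (strlen($suffix) > strlen($best)
            && strlen($domain) > strlen($tail)
            && substr($domain, -strlen($tail)) === $tail) {
            $best = $suffix;
        }
    }
    if ($best === '') {
        return false;   // no known suffix matched
    }
    // Drop ".suffix", then take the last remaining label.
    $rest = substr($domain, 0, -strlen($best) - 1);
    $parts = explode('.', $rest);
    return $parts[count($parts) - 1];
}

$suffixes = array('uk', 'co.uk', 'com');
echo base_domain_label('a.b.c.domain.co.uk', $suffixes), "\n";
echo base_domain_label('www.example.com', $suffixes), "\n";
```

Scanning an array is linear in the size of the list, so for the full list the database lookup described above (or a keyed lookup) is likely the better fit.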