Find first position where pattern matching failed.

I am trying to find the common errors users make when entering email IDs. I can always validate an email using PHP's email filter:
$email = "someone#exa mple.com";
if(!filter_var($email, FILTER_VALIDATE_EMAIL))
{
echo "E-mail is not valid";
}
else
{
echo "E-mail is valid";
}
or with pattern matching:
$email = test_input($_POST["email"]);
if (!preg_match("/([\w\-]+\#[\w\-]+\.[\w\-]+)/",$email))
{
$emailErr = "Invalid email format";
}
I agree that these are not foolproof ways to validate emails. However, they should capture 80% of cases.
What I want to know is: at which position did the email become invalid? If it is a space, at what position did the user enter the space? Or did it fail because of a "." at the end?
Any pointers?
-Ajay
PS: I have seen other threads regarding email validation. I could add complexity and make it 100%. The concern here is to capture the most common mistakes people make when entering an email ID.

This is difficult because it is not always a single character that makes an email address invalid. The example you give could easily be solved by:
$position = strpos('someone#exa mple.com', ' ');
However, it seems you are not interested in an all-encompassing solution but rather something that will catch the majority of character-based errors. I would take the approach of using the regular expression but capturing each section of the email address in a sub-pattern for further validation. For example:
$matches = null;
$result = preg_match("/(([\w\-]+)\#([\w\-]+)\.([\w\-]+))/", $email, $matches);
var_dump($matches);
By capturing sections of the regex validation in sub patterns you could then dive further into each section and run similar or different tests to determine where the user went wrong. For example you could try and match up the TLD of the email address against a whitelist. Of course there are also much more robust email validators in frameworks like Zend or Symfony that will tell you more specifically WHY an email address is not valid, but in terms of knowing which specific character position is at fault (assuming it's a character that is at fault) I think a combination of tactics would work best.
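As a rough sketch of the sub-pattern idea, preg_match can also report where each captured section starts when given the PREG_OFFSET_CAPTURE flag. The group names (local, domain, tld), the helper name and the TLD whitelist below are assumptions made for illustration, and the pattern keeps the `#` separator exactly as it appears in the examples on this page.

```php
<?php
// Same sub-patterns as in the answer above, anchored so the whole input must match.
// The group names are labels chosen for this sketch.
$pattern = '/^(?<local>[\w\-]+)\#(?<domain>[\w\-]+)\.(?<tld>[\w\-]+)$/';

/**
 * Returns each named section with its text and byte offset,
 * or null when the input does not match at all.
 */
function email_sections(string $pattern, string $email): ?array
{
    if (!preg_match($pattern, $email, $matches, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    $sections = [];
    foreach (['local', 'domain', 'tld'] as $name) {
        // With PREG_OFFSET_CAPTURE every entry is [matched text, byte offset].
        [$text, $offset] = $matches[$name];
        $sections[$name] = ['text' => $text, 'offset' => $offset];
    }
    return $sections;
}

$sections = email_sections($pattern, 'someone#example.com');

// Hypothetical whitelist for the follow-up TLD check.
$allowedTlds = ['com', 'net', 'org'];
$tldOk = $sections !== null
    && in_array(strtolower($sections['tld']['text']), $allowedTlds, true);
```

Once a section is isolated like this, a failed follow-up test can be reported relative to that section's offset in the original input.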

There is no way I know of in Java to report back the point at which a regex failed. What you could do is start building a set of common errors (as described by Manu) that you can check for (this might or might not use regular expressions). Then categorize failures into these known errors and 'other', counting the frequency of each. When an 'other' error occurs, develop a regex that would catch it.
If you want some assistance with tracking down why the regex failed, you could use a utility such as RegexBuddy, shown in this answer.
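A minimal sketch of the categorize-and-count idea, written in PHP to match the question. The category names, the helper name and the sample inputs are all invented for illustration; a real list would grow as new 'other' cases are inspected.

```php
<?php
/**
 * Puts a rejected input into one of a few known error categories.
 * The categories here are hypothetical; extend them as new mistakes show up.
 */
function classify_email_error(string $email): string
{
    if (preg_match('/\s/', $email)) {
        return 'contains whitespace';
    }
    if (substr($email, -1) === '.') {
        return 'trailing dot';
    }
    if (strpos($email, '..') !== false) {
        return 'consecutive dots';
    }
    return 'other';
}

// Sample rejected inputs, invented for this sketch.
$rejected = [
    'someone#exa mple.com',
    'someone#example.com.',
    'some one#example.com',
    'someone#example',
];

// Count how often each category occurs, most frequent first.
$counts = array_count_values(array_map('classify_email_error', $rejected));
arsort($counts);
```

Inputs that land in 'other' are the ones worth logging in full, since they point to the next check to add.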

Just implement some checks of your own:
A dot at the end:
if(substr($email, -1) == '.')
echo "Please remove the dot at the end of your email";
Spaces found:
$spacePos = strpos($email, ' ');
if($spacePos !== false)
echo "Please remove the space at pos: ".$spacePos;
And so on...
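The same kind of check can be generalized to "the first character that is not allowed at all". The sketch below assumes a helper name (first_bad_char) and an allowed character set that are not part of the original answer. The set mirrors what the question's pattern accepts, including the `#` separator as written in the examples on this page. Offsets are byte offsets, as reported by PREG_OFFSET_CAPTURE.

```php
<?php
/**
 * Returns the offset and value of the first character outside the allowed
 * set, or null when every character is allowed.
 * $allowedClass is the inside of a regex character class.
 */
function first_bad_char(string $input, string $allowedClass): ?array
{
    $pattern = '/[^' . $allowedClass . ']/';
    if (preg_match($pattern, $input, $m, PREG_OFFSET_CAPTURE)) {
        // $m[0] is [matched character, byte offset].
        return ['offset' => $m[0][1], 'char' => $m[0][0]];
    }
    return null;
}

// Word characters, dot, hyphen and the separator used in the examples here.
$allowed = '\w.\-#';
$bad = first_bad_char('someone#exa mple.com', $allowed);
```

This only finds characters that are never valid; structural problems such as a trailing dot still need their own checks.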

First of all, I would like to point out that your example does not fail because of the space as such. Your pattern is not anchored, so it only has to match somewhere in the string: the part before the space lacks a '.', and the part after it lacks a '#'.
If you input
'someone#example.co m' or 's omeone#example.com', it will succeed.
So you need 'begins with' (^) and 'ends with' ($) anchors to check strictly.
As far as I know, there is no existing method to find where a regular expression match fails, since the check only gives you the matches. But if you really want to find out, you can do so by breaking the regular expression down.
Let's take a look at your example check.
preg_match ("/^[\w\-]+\#[\w\-]+\.[\w\-]+$/",'someone#example.com.');
If it fails, you can check which of its sub-expressions succeed and find out where the problem is:
$email = "someone#example.com.";
if(!preg_match ("/^[\w\-]+\#[\w\-]+\.[\w\-]+$/",$email)){ // fails because of the final '.'
if(preg_match("/^[\w\-]+\#[\w\-]+\./",$email,$matches)){ // succeeds
$un_match = "[\w\-]+"; // The part taken off the tail of the regular expression.
foreach ($matches as $match){
$email_tail = str_replace($match,'',$email); // The email without the matching part, in this case 'com.'
if(preg_match('/^'.$un_match.'/',$email_tail,$match_tails)){ // Check for and delete the part of the tail that matches the sub-expression. In this example, 'com' matches /[\w\-]+/ but '.' doesn't.
$result = str_replace($match_tails[0],'',$email_tail);
}else{
$result = $email_tail;
}
}
}
}
var_dump($result); // you will get the last '.'
If you understand the example above, we can make the solution more general, for instance:
$email = 'som eone#example.com.';
$pattern_chips = array(
'/^[\w\-]+\#[\w\-]+\./' => '[\w\-]+',
'/^[\w\-]+\#[\w\-]+/' => '\.',
'/^[\w\-]+\#/' => '[\w\-]+',
'/^[\w\-]+/' => '\#',
);
if(!preg_match ("/^[\w\-]+\#[\w\-]+\.[\w\-]+$/",$email)){
$result = $email;
foreach ($pattern_chips as $pattern => $un_match){
if(preg_match($pattern,$email,$matches)){
$email_tail = str_replace($matches[0],'',$email);
if(preg_match('/^'.$un_match.'/',$email_tail,$match_tails)){
$result = str_replace($match_tails[0],'',$email_tail);
}else{
$result = $email_tail;
}
break;
}
}
if(empty($result)){
echo "Something more has to follow {$email}";
}else{
var_dump($result);
}
}else{
echo "success";
}
and you will get the output:
string ' eone#example.com.' (length=18)
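A variation on the same break-down idea that returns a position instead of the leftover text: keep the pattern as a list of consecutive pieces, then try ever shorter prefixes of that list until one matches the start of the input. The helper name and the way the pattern is split are assumptions made for this sketch, and for patterns where backtracking can shorten a match the reported offset is only an approximation.

```php
<?php
// The full pattern from the answer, split into consecutive pieces.
$pieces = ['[\w\-]+', '\#', '[\w\-]+', '\.', '[\w\-]+'];

/**
 * Returns null when the whole input matches, otherwise the byte offset
 * up to which the longest matching prefix of the pattern got.
 */
function failure_offset(array $pieces, string $input): ?int
{
    if (preg_match('/^' . implode('', $pieces) . '$/', $input)) {
        return null; // nothing wrong as far as this pattern is concerned
    }
    for ($n = count($pieces); $n > 0; $n--) {
        $prefix = '/^' . implode('', array_slice($pieces, 0, $n)) . '/';
        if (preg_match($prefix, $input, $m)) {
            // Everything before this offset was accepted by the first $n pieces.
            return strlen($m[0]);
        }
    }
    return 0; // not even the first piece matched
}

$trailingDot = failure_offset($pieces, 'someone#example.com.');
$spaceInside = failure_offset($pieces, 'som eone#example.com.');
```

An offset equal to the length of the input means the input ended too early rather than containing a bad character.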

Related

Use of Preg_match to Determine Mobile Number or Email

I'm asking whether there is a better way of determining whether an input string is a phone number or an email. Here is my already working code:
public function InviteFriend($invitation)
{
// Initialize Connection
$conn = $this->conn;
// Check what type of Invitation it is
if (preg_match_all('~\b\d[- /\d]*\d\b~', $invitation, $res) > 0) {
$type = 'phone';
} else if (preg_match_all('/^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,})$/i', $invitation, $res) > 0) {
$type = 'email';
}
echo $type;
}
But my concern is this: if a user typed both a phone number and an email in the same string, which of the if branches would be picked and which would be ignored? And is my way of determining the type of string proper, or is there a more efficient way?
Thanks

There are two anchors available in almost all regex flavors, which you have used in your second regex for validating an email address: ^ and $, meaning the beginning and end of the input string respectively.
You should use them for the first validation as well. Your phone number check is weak: because it is not matched against the whole string (both anchors are missing), it accepts an arbitrary sequence like 1------- --------5 that doesn't look like a phone number, and much more. So I used \d{10} to indicate a 10-digit phone number, which you may want to change to meet your own requirements.
You don't really want that kind of email validation either. Something simpler is better:
public function InviteFriend($invitation)
{
if (preg_match('~^\d{10}$~', $invitation)) {
$type = 'phone';
} else if (preg_match('~^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,})$~i', $invitation)) {
$type = 'email';
}
echo $type ?? 'Error';
}
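To see how the anchored patterns answer the "both in one string" concern, here is a small sketch that returns the type instead of echoing it. The function name and the sample inputs are assumptions; the two patterns are the ones from the answer above, with the `#` separator kept as it appears on this page.

```php
<?php
/**
 * Returns 'phone', 'email', or null when the input is neither
 * (which includes input that contains both).
 */
function invitation_type(string $invitation): ?string
{
    if (preg_match('~^\d{10}$~', $invitation)) {
        return 'phone';
    }
    if (preg_match('~^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,})$~i', $invitation)) {
        return 'email';
    }
    return null;
}

$phone = invitation_type('0123456789');
$email = invitation_type('friend#example.com');
$both  = invitation_type('0123456789 friend#example.com');
```

Because both patterns must match the entire string, an input containing a phone number and an email matches neither branch and can be rejected explicitly.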

PHP Regular find url and email

I need to find "http" and "https" and "email" inside text. I have tried:
$regex = "((https|ftp)\:\/\/)"; // http and https
$regex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?#)"; // email
if(preg_match("/^$regex$/", $comment))
{
$r = 'find';
} else {
$r = 'not find';
}
with the following text:
$comment = 'hello https:// and hello email mail#mail.com'
But it doesn't work. Probably because of wrong split.
I tried with filter like this:
if (filter_var($comment, FILTER_VALIDATE_EMAIL)) {
return new \Exception('exist url');
}
but it only works when the comment contains nothing but the email; if I have 'hello mail#mail.com', it does not find the email.
As for the FILTER_VALIDATE_URL filter, it won't find the URL in text like 'hello erl https:\hello'.
How do I write the right regex to find a URL or an email in some text?

In this line
if(preg_match("/^$regex$/", $comment))
you have ^ then $regex and $. The ^ means to only match at the beginning of the line. The $ means to only match at the end. Therefore, this would only match if that line contained only the match and nothing else.
Remove them to get matches anywhere in a line.
if(preg_match("/$regex/", $comment))
However, I would suggest to simply look for a library that offers the matching you are looking for. A library will have much more testing done to cover edge cases and other things that are easily missed.
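As an illustration of unanchored matching, preg_match_all can collect every URL-like and email-like fragment from a text. The two patterns below are deliberately loose stand-ins written for this sketch, not the ones from the question, and the email pattern uses the `#` separator as it appears in the sample text (a real address would use `@`).

```php
<?php
/**
 * Collects URL-like and email-like fragments found anywhere in $text.
 * Both patterns are rough approximations for demonstration only.
 */
function find_urls_and_emails(string $text): array
{
    // "http://" or "https://" followed by any run of non-space characters.
    preg_match_all('~\bhttps?://\S*~i', $text, $urls);
    // name, separator, then a host with at least one dot.
    preg_match_all('~[\w.+-]+#[\w-]+(?:\.[\w-]+)+~', $text, $emails);

    return ['urls' => $urls[0], 'emails' => $emails[0]];
}

$found = find_urls_and_emails('hello https:// and hello email mail#mail.com');
```

With no ^ or $ in either pattern, the surrounding words no longer prevent a match.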

PHP: How to validate UK landline and mobile numbers?

How can I validate UK telephone numbers? I copied the answer from this site, but it only accepts mobile numbers. I want to accept both landline and mobile numbers. Is it possible?
# #reference: http://stackoverflow.com/questions/8099177/validating-uk-phone-numbers-in-php
$telephone = "01752311149"; // not ok.
$telephone = "07742055388"; // ok.
$pattern = "/^(\+44\s?7\d{3}|\(?07\d{3}\)?)\s?\d{3}\s?\d{3}$/";
if (!preg_match($pattern, $telephone))
{
$error = true;
$message.='<error elementid="telephone" message="invalid" />';
}
I have tried with this regex below but it doesn't work at all,
#http://stackoverflow.com/questions/14512810/regular-expression-mobile-and-landline-number
$pattern = "/^\(0\d{1,2}\)\d{3}-\d{4}$/";

There's a selection of regular expressions for validating phone numbers at Regular Expressions for Validating and Formatting GB Telephone Numbers:
Alternatively, there's one at RegExLib.com that seems to work well:
^((\(44\))( )?|(\(\+44\))( )?|(\+44)( )?|(44)( )?)?((0)|(\(0\)))?( )?(((1[0-9]{3})|(7[1-9]{1}[0-9]{2})|(20)( )?[7-8]{1})( )?([0-9]{3}[ -]?[0-9]{3})|(2[0-9]{2}( )?[0-9]{3}[ -]?[0-9]{4}))$
Edit:
This will allow mobile, landline, and special service numbers (999, 123, etc.) -- assumes that spaces have been stripped:
'/^(?>(?>\+44|0)(?>(?!7624)(?>[12389]\d|5[56]|7[06])\d{8}|(?>(?>[58]00|1\d{2})\d{6})|(?>8001111|845464\d)|7(?>[45789]\d{8}|624\d{6}))|999|112|100|101|111|116|123|155|118\d{3}|(?>\+44|0)(?>800111|8454647))$/D'
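Since that last pattern assumes spaces have already been stripped, a minimal sketch of the full check might look like the following. The constant and function names are assumptions; the pattern itself is copied unchanged from the answer above.

```php
<?php
// Pattern copied verbatim from the answer above.
const UK_PHONE_PATTERN = '/^(?>(?>\+44|0)(?>(?!7624)(?>[12389]\d|5[56]|7[06])\d{8}|(?>(?>[58]00|1\d{2})\d{6})|(?>8001111|845464\d)|7(?>[45789]\d{8}|624\d{6}))|999|112|100|101|111|116|123|155|118\d{3}|(?>\+44|0)(?>800111|8454647))$/D';

/**
 * Strips all whitespace, then tests the result against the pattern.
 */
function is_uk_phone(string $input): bool
{
    $compact = preg_replace('/\s+/', '', $input);
    return preg_match(UK_PHONE_PATTERN, $compact) === 1;
}

$landline = is_uk_phone('01752 311149');
$mobile   = is_uk_phone('07742055388');
```

Both sample numbers from the question are accepted this way, with or without spaces.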

You'd be better off using an existing library, rather than trying your own validation, as rules are somewhat complex and likely to change with time.
For example, this PHP version of Google's libphonenumber:
$phoneUtil = PhoneNumberUtil::getInstance();
try {
$numberProto = $phoneUtil->parse('02012345678', 'GB');
} catch (NumberParseException $e) {
echo $e;
}
Then you can check the validity of the number with:
$phoneUtil->isValidNumber($numberProto);
Furthermore, this library allows you to detect the phone number type (fixed line, mobile, voip, etc.)

PHP: need explanation using [a-zA-Z0-9]

I am new to PHP (not programming overall), and having problems with this simple line of code. I want to check whether some input field has been filled as anysymbolornumber#anysymbolornumber just for checking whether correct email was typed. I don't get any error, but the whole check system doesn't work. Here is my code and thanks!
if ($email = "[a-zA-Z0-9]#[a-zA-Z0-9]")
{

Since you're new to PHP, I suggest you buy a book or read a tutorial or two.
For email validation you should use filter_var, a built-in function available in PHP 5.2 and up:
<?php
if(!filter_var("someone#example....com", FILTER_VALIDATE_EMAIL)){
echo("E-mail is not valid");
}else{
echo("E-mail is valid");
}
?>

You can use other functions instead of regular expressions:
if(filter_var($email,FILTER_VALIDATE_EMAIL)){
echo "Valid email";
}else{
echo "Not a valid email";
}

As correctly pointed out in the comments, the regex you are using isn't actually a very good way of validating the email. There are much better ways, but if you are just wanting to get a look at how regular expressions work, it is a starting point. I am not an expert in regex, but this will at least get your if statement working :)
if(preg_match("/[a-zA-Z0-9]#[a-zA-Z0-9]/",$email))
{
// Your stuff
}

It looks like you're trying to verify that an email address matches a certain pattern. But you're not using the proper function. You probably want something like preg_match( $pattern, $target ).
Also, your regex isn't doing what you would want anyway. In particular, you need some quantifiers, or else your email addresses will only be able to consist of one character ahead of the #, and one after. And you need anchors at the beginning and end of the sequence so that you're matching against the entire address, not just the two characters closest to the #.
Consider this:
if( preg_match('/^[a-zA-Z0-9._-]+#[a-zA-Z0-9._-]+$/', $email ) ) {
// Whatever
}
Keep in mind, however, that this is really a poor man's approach to validating an email address. Email addresses can contain a lot more characters than those listed in the character class I provided. Furthermore, it would also be possible to construct an invalid email address with those same character classes. It doesn't even begin to deal with Unicode. Using a regex to validate an email address is quite difficult. Friedl takes a shot at it in Mastering Regular Expressions (O'Reilly), and his effort takes a 2KB regular expression pattern. At best, this is only a basic sanity check. It's not a secure means of verifying an email address. At worst, it misses valid addresses and still matches invalid ones.
There is the mailparse_rfc822_parse_addresses function which is more reliable in detecting and matching email addresses.
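To make the point about quantifiers and anchors concrete, the sketch below runs a loose and a strict version of the pattern against the same inputs. The variable names and sample strings are chosen for illustration, and both patterns keep the `#` separator as it is written in this thread.

```php
<?php
// One character on each side of the separator, no anchors.
$loose = '/[a-zA-Z0-9]#[a-zA-Z0-9]/';
// One or more characters on each side, anchored to the whole string.
$strict = '/^[a-zA-Z0-9._-]+#[a-zA-Z0-9._-]+$/';

// The loose pattern is satisfied by the "a#b" buried in the junk...
$looseOnJunk = preg_match($loose, '!!a#b!!');
// ...while the strict pattern rejects the same input.
$strictOnJunk = preg_match($strict, '!!a#b!!');
// A plausible-looking address passes the strict pattern.
$strictOnAddress = preg_match($strict, 'someone#example.com');
```

preg_match returns 1 on a match and 0 otherwise, so the results can be used directly in an if.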

You need to use preg_match to run the regular expression.
Right now you're assigning the regular expression to $email.
It could look like:
if ( preg_match("/[a-zA-Z0-9]#[a-zA-Z0-9]/", $email ))
Also keep in mind that when comparing in an if you must use the == operator, not =.
I believe best practice would be to use filter_var instead, like:
if( ! filter_var( $email , FILTER_VALIDATE_EMAIL )) {
// Failed.
}

Another way taken from: http://www.linuxjournal.com/article/9585
function check_email_address($email) {
// First, we check that there's one # symbol,
// and that the lengths are right.
if (!ereg("^[^#]{1,64}#[^#]{1,255}$", $email)) {
// Email invalid because wrong number of characters
// in one section or wrong number of # symbols.
return false;
}
// Split it into sections to make life easier
$email_array = explode("#", $email);
$local_array = explode(".", $email_array[0]);
for ($i = 0; $i < sizeof($local_array); $i++) {
if (!ereg("^(([A-Za-z0-9!#$%&'*+/=?^_`{|}~-][A-Za-z0-9!#$%&'*+/=?^_`{|}~\.-]{0,63})|(\"[^(\\|\")]{0,62}\"))$", $local_array[$i])) {
return false;
}
}
// Check if domain is IP. If not,
// it should be valid domain name
if (!ereg("^\[?[0-9\.]+\]?$", $email_array[1])) {
$domain_array = explode(".", $email_array[1]);
if (sizeof($domain_array) < 2) {
return false; // Not enough parts to domain
}
for ($i = 0; $i < sizeof($domain_array); $i++) {
if (!ereg("^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])|([A-Za-z0-9]+))$", $domain_array[$i])) {
return false;
}
}
}
return true;
}
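Note that ereg was removed in PHP 7.0, so the function above only runs on older versions as written. As a hedged sketch of what a port involves, here is just its first check rewritten with preg_match; the only change to the pattern is the added / delimiters, the function name is an assumption, and the `#` separator is kept as it appears in the listing.

```php
<?php
/**
 * First check of the function above: exactly one separator, with 1 to 64
 * characters before it and 1 to 255 after it.
 */
function has_valid_lengths(string $email): bool
{
    // preg_match needs delimiters around the pattern; ereg did not.
    return preg_match('/^[^#]{1,64}#[^#]{1,255}$/', $email) === 1;
}

$ok = has_valid_lengths('someone#example.com');
$twoSeparators = has_valid_lengths('some#one#example.com');
```

The remaining ereg calls would need the same treatment, plus escaping of any / inside their character classes if / stays the delimiter.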

Only execute script if entered email is from a specific domain

I am trying to create a script that will only execute its actions if the email address the user enters is from a specific domain. I created a regex that seems to work when I test it in a regex utility, but when it's used in my PHP script, it tells me that valid emails are invalid. In this case, I want any email that is from #secondgearsoftware.com, #secondgearllc.com or asia.secondgearsoftware.com to echo success and all others to be rejected.
$pattern = '/\b[A-Z0-9\._%+-]+#((secondgearsoftware|secondgearllc|euro\.secondgearsoftware|asia\.secondgearsoftware)+\.)+com/';
$email = urldecode($_POST['email']);
if (preg_match($pattern, $email))
{
echo 'success';
}
else
{
echo 'opposite success';
}
I am not really sure what's futzed with the pattern. Any help would be appreciated.

Your regular expression is a bit off (it will allow foo#secondgearsoftwaresecondgearsoftware.com) and can be simplified:
$pattern = '/#((euro\.|asia\.)?secondgearsoftware|secondgearllc)\.com$/i';
I've made it case-insensitive and anchored it to the end of the string.
There doesn't seem to be a need to check what's before the "#" - you should have a proper validation routine for that if necessary, but it seems you just want to check if the email address belongs to one of these domains.

You probably need to use /\b[A-Z0-9\._%+-]+#((euro\.|asia\.)?secondgearsoftware|secondgearllc)\.com/i (note the i at the end) in order to make the regex case-insensitive. I also dropped the +s, as they allow for infinite repetition, which doesn't make sense in this case.

Here's an easy to maintain solution using regular expressions
$domains = array(
'secondgearsoftware',
'secondgearllc',
'euro\.secondgearsoftware',
'asia\.secondgearsoftware'
);
preg_match("`#(" .implode("|", $domains). ")\.com$`i", $userProvidedEmail);
Here's a couple of tests:
$tests = array(
'bob#secondgearsoftware.com',
'bob#secondgearllc.com',
'bob#Xsecondgearllc.com',
'bob#secondgearllc.net',
'bob#euro.secondgearsoftware.org',
'bob#euro.secondgearsoftware.com',
'bob#euroxsecondgearsoftware.com',
'bob#asia.secondgearsoftware.com'
);
foreach ( $tests as $test ) {
echo preg_match("`#(" .implode("|", $domains). ")\.com$`i", $test),
" <- $test\n";
}
Result (1 is passing of course)
1 <- bob#secondgearsoftware.com
1 <- bob#secondgearllc.com
0 <- bob#Xsecondgearllc.com
0 <- bob#secondgearllc.net
0 <- bob#euro.secondgearsoftware.org
1 <- bob#euro.secondgearsoftware.com
0 <- bob#euroxsecondgearsoftware.com
1 <- bob#asia.secondgearsoftware.com
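One possible refinement, sketched under the assumption that the domain list should be maintainable without hand-escaping: store the names unescaped and let preg_quote escape them when the pattern is built. The function name is hypothetical; the delimiter, the `#` separator and the trailing \.com are the same as in the answer above.

```php
<?php
// Domain names stored as plain text, without manual backslashes.
$domains = [
    'secondgearsoftware',
    'secondgearllc',
    'euro.secondgearsoftware',
    'asia.secondgearsoftware',
];

/**
 * Builds the same kind of pattern as above from unescaped names.
 */
function domain_pattern(array $domains): string
{
    $quoted = array_map(function (string $domain): string {
        // Escape regex metacharacters and the backtick delimiter.
        return preg_quote($domain, '`');
    }, $domains);

    return '`#(' . implode('|', $quoted) . ')\.com$`i';
}

$pattern = domain_pattern($domains);
$accepted = preg_match($pattern, 'bob#euro.secondgearsoftware.com');
$rejected = preg_match($pattern, 'bob#euroxsecondgearsoftware.com');
```

This keeps the dot in "euro.secondgearsoftware" literal even if someone forgets the backslash when adding a new entry.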

I suggest you drop the regex and simply use stristr to check if it matches. Something like this should work:
<?php
// Fill out as needed
$domains = array('secondgearsoftware.com', 'secondgearllc.com');
$email = urldecode($_POST['email']);
$found = false;
for($i=0;$i<count($domains);$i++)
{
if ($domains[$i] == stristr($email, $domains[$i]))
$found = true;
}
if ($found) ...
?>
The function stristr returns the part of the e-mail address from where it found a match to the end, which should be the same as the domain itself in this case. Technically there could be something prior to the domain (fkdskjfsdksfks.secondgearsoftware.com), but you can use "#domainneeded.com" as the list entry to prevent this. This code is also slightly longer, but it is easily extended with new domains without worrying about regex.
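A stricter take on the same non-regex idea is to cut the address at the last separator and compare the whole remainder against the list, which rules out extra text in front of the domain. The function name, the parameter names and the default separator are assumptions for this sketch; the separator defaults to `#` only because that is how addresses are written in this thread.

```php
<?php
/**
 * True when the part after the last separator is exactly one of the
 * allowed domains (compared case-insensitively).
 */
function email_in_domains(string $email, array $allowedDomains, string $separator = '#'): bool
{
    $pos = strrpos($email, $separator);
    if ($pos === false) {
        return false; // no separator at all
    }
    $domain = strtolower(substr($email, $pos + 1));
    return in_array($domain, $allowedDomains, true);
}

// Fill out as needed; entries are expected in lower case.
$allowed = [
    'secondgearsoftware.com',
    'secondgearllc.com',
    'euro.secondgearsoftware.com',
    'asia.secondgearsoftware.com',
];

$inList = email_in_domains('bob#ASIA.secondgearsoftware.com', $allowed);
$notInList = email_in_domains('bob#fkdskjfsdksfks.secondgearsoftware.com', $allowed);
```

Adding a domain is a one-line change to the list, and no escaping is involved.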
