managing the phone number validation exploits using regex - php

I have written a regex to validate US and UK phone numbers. It works fine, but not in all cases.
For example, it should not treat inputs such as 12345678, 123456789 or 1989 as legitimate numbers. I probably need to validate the first three digits against each US and UK area code. Am I right?
Here is the list of all UK area codes: http://www.area-codes.org.uk/ (it is a big list). Do I need to include all of them in the regex?
Issue: it should also filter out exploits like this: 203453seven67
How could this be done?
Here is the example: http://ideone.com/zwzmKU
Regex:
$pattern = '((^\(?(?:(?:0(?:0|11)\)?[\s-]?\(?|\+)44\)?[\s-]?\(?(?:0\)?[\s-]?\(?)?|0)(?:\d{2}\)?[\s-]?\d{4}[\s-]?\d{4}|\d{3}\)?[\s-]?\d{3}[\s-]?\d{3,4}|\d{4}\)?[\s-]?(?:\d{5}|\d{3}[\s-]?\d{3})|\d{5}\)?[\s-]?\d{4,5}|8(?:00[\s-]?11[\s-]?11|45[\s-]?46[\s-]?4\d))(?:(?:[\s-]?(?:x|ext\.?\s?|\#)\d+)?)$)|(\(?[2-9][0-8][0-9]\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}))';

To make sure a phone number is correct, avoid regex and use a standard library built for phone number validation.
I suggest https://code.google.com/p/libphonenumber/

Related

!preg_match - non US phone numbers

In the past I have only been required to use a preg_match for US-only numbers, such as:
elseif(!preg_match("/^[0-9]{3}-[0-9]{3}-[0-9]{4}$/", $telefon))
{
$error = "Your message has not been sent as you did not enter your telephone number, please try again.";
selected_values();
}
But I now need to expand this to incorporate UK and German numbers.
These numbers are often formatted with [spaces] and (brackets).
I tried the following with little success:
elseif(!preg_match("/^[0-9]$/", $telefon))
Can anyone help me to have a preg_match that incorporates many different variations of phone numbers?
THANKS
For UK numbers, there's a pretty good answer here. (not the accepted answer tho! as the comment says, that's not a good answer)
But if you're trying to accept international numbers in general, you need to be pretty open about what you accept -- In addition to brackets and spaces, you may also find people using plus signs, dots, hyphens, slashes, and more.
There is a recognised standard for international phone number formatting, which looks like this:
+44.1234567890
Where +44 is the international dialcode (UK in this case), followed by a dot, followed by the rest of the number (minus the leading zero, where applicable) without any other formatting.
Of course that doesn't help if you need to accept numbers being entered by users with whatever formatting they're using, but it might help if you consider this format as a target -- ie, accept whatever the user enters, and try to reformat it to this standard.
Once you've decided to do that, the process becomes simply a case of stripping off any formatting from the main number. You don't really need to worry about what formatting is supplied as long as the number of digits is correct. Then just prepend the country code and the dot, and you're done.
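The reformatting described above could be sketched roughly as follows. This is only a minimal illustration: the function name normalizePhone is made up, the country code is assumed to come from somewhere else (a country dropdown, for instance), the input is assumed to be a national number without its own country prefix, and the 6-12 digit bounds are an assumption rather than a rule.

```php
<?php
// Hypothetical helper: $countryCode is assumed to be supplied by the caller.
function normalizePhone(string $input, string $countryCode): ?string
{
    // Drop everything that is not a digit: spaces, brackets, dots, hyphens, slashes, plus signs.
    $digits = preg_replace('/\D+/', '', $input);

    // Remove a single leading zero, where applicable (e.g. UK "01234...").
    $digits = preg_replace('/^0/', '', $digits);

    // Assumed sanity bounds on the digit count; adjust to the countries you serve.
    $length = strlen($digits);
    if ($length < 6 || $length > 12) {
        return null;
    }

    // Prepend the country code and the dot.
    return '+' . $countryCode . '.' . $digits;
}
```

With this sketch, normalizePhone('(01234) 567-890', '44') gives +44.1234567890, and an input with too few digits gives null.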

Regular expression for street address

I am trying to match street addresses containing the street and number.
I need the expression to match words for the street name, followed by the number.
For example I want to match "somestreet 25", "some other street 23","a-third street 190", but not "a_fourth street 67".
I have been trying for an hour, but I am not very good with regexes.
So far all I've got is /^[a-zA-Z]+([\s][a-zA-Z]+)([\s][0-9]){1,4}$/ but needless to say, it is not working.
--- EDIT ---
I understand that there is no standard, global way of writing the street address, and that regular expressions can't really be complicated enough to cover the problem on a global scope, but the site is for a local restaurant, and all I want is the address to look like it could be an address (even then, without map and telephone verification it could still be a fake one).
There will, however, be human verification at all times before anything is sent, and also it is a rather small neighborhood, so both the delivery person and the restaurant owner know if the order is fake or not.
All I want is to keep them from getting spammed with silly !##$ characters in the address, and have a decent readable address formatting for them to work with.
This should work on your examples:
/^[a-zA-Z]([a-zA-Z-]+\s)+\d{1,4}$/
You've overcomplicated it a little bit. This is a case-insensitive expression that looks for letters with hyphens and spaces, followed by numbers, matching your stated criteria.
/^([a-z- ]+)\s+([0-9]+)$/i
But what about me? I live on 30th Ave.
By the way, I used [0-9]+ for one or more numbers at the end, instead of your {1,4} range. If you must not have more than 4, then switch it back to your range {1,4}.
This will do
/^([A-Z][-A-Z ]+)\s+(\d+)$/i
I think street names have no regular form, so a regular expression is not really applicable here.

Identifying international phone numbers in string in PHP

I am trying to write a function that will pull valid phone numbers from a string that are valid somewhere on the planet. This is for a truly international site for an organization that has locations all over the globe and users in each location accessing it.
I mainly need this for a database migration. The previous sites that I am migrating from only used a simple text field with no instructions and no filtering. So this results in the phone fields being used in all sorts of creative ways.
What I am looking for is just to identify the first phone number in the string, then possibly remove any excess characters before setting the result as user profile information.
There's a PHP port available of Google's Phone Number Library.
you could use something like this:
$pattern = '/([\+_\-\(\)a-z ]+)/';
or
$pattern = '/([^0-9]+)/i';
$phone = preg_replace($pattern,'', $phone);
or, use a php filter like:
$phone = (int) filter_var($phone, FILTER_SANITIZE_NUMBER_INT);
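Be careful with the filter version, though: FILTER_SANITIZE_NUMBER_INT keeps plus and minus signs, and the (int) cast drops a leading "0" and stops reading at the first character that is not part of an integer (so "555-1234" becomes 555).
Then, either way, check the result against a range of lengths for allowed phone numbers, roughly 6-12 digits or whatever your range covers.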
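Putting the strip-then-check-length idea together, a minimal sketch might look like this. The function name cleanPhone is hypothetical and the default 6-12 bounds are just the rough range mentioned above, not a standard. The digits are kept as a string so a leading "0" survives.

```php
<?php
// Hypothetical helper; the 6-12 default bounds are illustrative only.
function cleanPhone(string $raw, int $min = 6, int $max = 12): ?string
{
    // Remove every character that is not a digit, keeping the result as a string.
    $digits = preg_replace('/[^0-9]+/', '', $raw);

    // Accept the value only if the digit count falls inside the allowed range.
    $len = strlen($digits);
    return ($len >= $min && $len <= $max) ? $digits : null;
}
```

For example, cleanPhone('0123 456-789') keeps the leading zero and gives 0123456789, while a field with no digits gives null.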
First, you will need to compile a list of valid phone number formats.
Second, you will need to create regular expressions to identify each format.
Third, you will run the regexes against your text to locate the numbers.
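As a rough sketch of those three steps: the two formats below are invented for illustration only (a real list would have to come from the data being migrated), and PHONE_FORMATS and findFirstPhone are hypothetical names.

```php
<?php
// Steps 1 and 2: an illustrative (far from complete) list of formats and their regexes.
const PHONE_FORMATS = [
    'international' => '/\+\d{1,3}[\s.-]?\(?\d{1,4}\)?(?:[\s.-]?\d{2,4}){2,3}/',
    'us_style'      => '/\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}/',
];

// Step 3: run every regex against the text and keep the match that starts earliest.
function findFirstPhone(string $text, array $formats = PHONE_FORMATS): ?string
{
    $best = null;
    $bestOffset = PHP_INT_MAX;
    foreach ($formats as $regex) {
        // PREG_OFFSET_CAPTURE makes $m[0] an array of [matched text, byte offset].
        if (preg_match($regex, $text, $m, PREG_OFFSET_CAPTURE) === 1 && $m[0][1] < $bestOffset) {
            $bestOffset = $m[0][1];
            $best = $m[0][0];
        }
    }
    return $best;
}
```

The returned string still contains the user's formatting, so it would normally be cleaned up afterwards.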

how to detect telephone numbers in a text (and replace them)?

I know it can be done for bad words (checking an array of preset words) but how to detect telephone numbers in a long text?
I'm building a website in PHP for a client who needs to avoid people using the description field to put their mobile phone numbers..(see craigslist etc..)
Besides, he's going to need some moderation anyway, but I was wondering if there is a way to block at least the obvious ones like nnn-nnn-nnnn. I'm not asking to block other weird ways of writing numbers like HeiGHT*/four*/nine etc.
Welcome to the world of regular expressions. You're basically going to want to use preg_replace to look for (some pattern) and replace with a string.
Here's something to start you off:
$text = preg_replace('/\+?[0-9][0-9()\-\s+]{4,20}[0-9]/', '[blocked]', $text);
this looks for:
a plus symbol (optional), followed by a digit, followed by between 4 and 20 digits, brackets, dashes, spaces or plus signs, followed by a digit
and replaces with the string [blocked].
This catches all the obvious combinations I can think of:
012345 123123
+44 1234 123123
+44(0)123 123123
0123456789
Placename 123456 (although this one will leave 'Placename')
However, it will also strip out any run of 6 or more digits, which might not be desirable!
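To see both the intended behaviour and the false positive, the pattern can be wrapped in a small function and tried on a couple of strings. The wrapper name blockPhoneNumbers and the sample sentences are made up for this sketch.

```php
<?php
// Hypothetical wrapper around the pattern from the answer above.
function blockPhoneNumbers(string $text): string
{
    return preg_replace('/\+?[0-9][0-9()\-\s+]{4,20}[0-9]/', '[blocked]', $text);
}

// Intended match: an obvious phone number is replaced.
echo blockPhoneNumbers('Call +44 1234 123123 please'), "\n";

// False positive: a plain 7-digit order number is replaced as well.
echo blockPhoneNumbers('Order 1234567 has shipped'), "\n";
```

The first line comes out as "Call [blocked] please" and the second as "Order [blocked] has shipped", which is exactly the trade-off described above.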
To do so you must use regular expressions as you may know.
I found this pattern that could be useful for your project:
<?php
preg_match("/(^(([\+]\d{1,3})?[ \.-]?[\(]?\d{3}[\)]?)?[ \.-]?\d{3}[ \.-]?\d{4}$)/", $yourText, $matches);
// $matches will contain the array of matched strings.
// Note: because of the ^ and $ anchors, this only matches when $yourText is nothing but a phone number.
?>
More information about this pattern can be found here http://gskinner.com/RegExr/?2rirv where you can even test it online. It's a great tool to test regular expressions.
preg_match($pattern, $subject) will return 1 (true) if pattern is found in subject, and 0 (false) otherwise.
A pattern to match the example you give might be '/\d{3}-\d{3}-\d{4}/'
However whatever you choose for your pattern will suffer from both false positives and false negatives.
You might also consider looking for words like mob, cell or tel next to the number.
The full details of PHP pattern matching can be found at http://www.php.net/manual/en/reference.pcre.pattern.syntax.php
Ian
p.s. It can't be done for bad words, as the people in Scunthorpe will tell you.
I think that using too tight a regular expression would lead to losing a great number of detections.
You should check for stretches of 10 consecutive characters containing more than 5 digits.
Because of the computational weight, you will probably want an analysis routine queued to run after each message insertion.
After the 6 or more digits have been isolated, replace them as you prefer, including any neighbouring digits.
In any case it is better to preserve the original data, so you can test and train your detection algorithm until it works as well as possible.
Then you can also study your user data to create more complex heuristics, such as case-insensitive numbers written as letters, mixed, dot-separated, etc.
It's not about writing the most perfect regex, it's about approaching the problem statistically and dynamically.
And remember, after you take action, users will change their input habits as a consequence, so the stats will change and you will need to learn and update your heuristics.
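The window check from this answer (10 consecutive characters containing more than 5 digits) could be sketched like this. The function name is hypothetical, the thresholds are simply the ones quoted above, and the code counts bytes rather than characters, so multibyte text would need the mb_* functions instead.

```php
<?php
// Hypothetical helper: returns the start offsets of every digit-dense window.
function findDigitDenseWindows(string $text, int $window = 10, int $minDigits = 6): array
{
    $hits = [];
    $length = strlen($text);
    for ($start = 0; $start + $window <= $length; $start++) {
        $chunk = substr($text, $start, $window);
        // preg_match_all returns how many digits were found in this chunk.
        if (preg_match_all('/\d/', $chunk) >= $minDigits) {
            $hits[] = $start;
        }
    }
    return $hits;
}
```

An empty result means no window was dense enough. A non-empty result only flags candidate positions for the replacement or moderation step and is not proof of a phone number.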

PHP: How to validate a phone number if well formed?

Using PHP, how can I verify if a phone # is well formed?
It seems easiest to simply strip all non-numeric data, leaving only the numbers. Then to check if 10 digits exist.
Is this the best and easiest way?
The best? No. Issues I see with this approach:
Some area codes - like 000-###-#### - are not valid. See http://en.wikipedia.org/wiki/List_of_NANP_area_codes
Some exchanges - like ###-555-#### - are not valid. See http://en.wikipedia.org/wiki/555_%28telephone_number%29
Some people will enter a 1 before their number, i.e. 1-###-###-####.
Some people are only reachable at an extension, like ###-###-#### x####.
Some companies tack on extra digits, like 1-800-GO-FLOWERS. The additional digits are simply ignored by the phone system, but a user might expect to be able to enter the whole thing.
International phone numbers are not necessarily 10 digits, even if you discount the country codes.
Good enough? Quite possibly, but that's up to you and your app.
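If US-style numbers are all you need, a slightly stricter sketch than "strip and count to 10" could handle the leading 1 and the most basic numbering rule (area code and exchange both start with a digit from 2 to 9). The function name is made up, and this deliberately ignores extensions, vanity numbers, the 555 exchange and anything international.

```php
<?php
// Hypothetical helper for North American numbers only.
function isPlausibleNanpNumber(string $input): bool
{
    // Keep digits only.
    $digits = preg_replace('/\D+/', '', $input);

    // Accept an optional leading country code "1", as in 1-###-###-####.
    if (strlen($digits) === 11 && $digits[0] === '1') {
        $digits = substr($digits, 1);
    }

    // Area code and exchange must each start with 2-9, which rules out 000-###-####.
    return preg_match('/^[2-9]\d{2}[2-9]\d{6}$/', $digits) === 1;
}
```

For instance, 1-212-555-0123 and (415) 867-5309 pass, while 000-555-1234 does not.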
You can use a regex for it:
$pattern_phone = "|^[0-9\+][0-9\s+\-]*$|i";
if(!preg_match($pattern_phone,$phone)){
// Something's wrong
}
Haven't tested the regex, so it may not be 100% correct.
Checking for 10 digits after stripping will check the syntax but won't check the validity. For that you'd need to determine what valid numbers are available in the region/country and probably write a regex to match the patterns.
The problem with validating/filtering data like this usually comes down to the answer to this question: "How strict do I want to be?", which then devolves into a series of "feature" questions:
Are you going to accept international numbers?
Are you going to accept extensions?
Are you going to allow various formats, e.g. (111) 222-3333 vs 111.222.3333?
Depending on your business rules, the answers to these questions can vary. But to be the most flexible, I recommend 3 fields to take a phone number
Country Code (optional)
Phone Number
Extension (optional)
All 3 fields can be programmatically limited/filtered to numeric values only. You can then combine them before storing into some parse-able format, or store each value individually.
Answering if something is "the best" thing to do, is nearly impossible (unless you're the one answering your own question).
The way you propose it, stripping all non-digits and then checking whether there are 10 digits, might result in unwanted behaviour for a string like:
George Washington (February 22, 1732 –
December 14, '99) was the commander
of the Continental Army in the
American Revolutionary War and served
as the first President of the United
States of America.
since stripping all non-digits will result in the string 2217321499, which is 10 digits long, but I highly doubt that the entire string should be considered a valid phone number.
What format do you need? You can use regular expressions for this.
