!preg_match - non US phone numbers

In the past I have only needed a preg_match for US-only numbers, such as:
elseif(!preg_match("/^[0-9]{3}-[0-9]{3}-[0-9]{4}$/", $telefon))
{
$error = "Your message has not been sent as you did not enter your telephone number, please try again.";
selected_values();
}
But I now need to expand this to incorporate UK and German numbers.
These numbers are often formatted with [spaces] and (brackets).
I tried the following with little success:
elseif(!preg_match("/^[0-9]$/", $telefon))
Can anyone help me to have a preg_match that incorporates many different variations of phone numbers?
THANKS

For UK numbers, there's a pretty good answer here. (not the accepted answer tho! as the comment says, that's not a good answer)
But if you're trying to accept international numbers in general, you need to be pretty open about what you accept -- In addition to brackets and spaces, you may also find people using plus signs, dots, hyphens, slashes, and more.
There is a recognised format for international phone numbers (the one used by EPP in RFC 5733, which builds on ITU E.164), and it looks like this:
+44.1234567890
Where +44 is the international dialcode (UK in this case), followed by a dot, followed by the rest of the number (minus the leading zero, where applicable) without any other formatting.
Of course that doesn't help if you need to accept numbers being entered by users with whatever formatting they're using, but it might help if you consider this format as a target -- ie, accept whatever the user enters, and try to reformat it to this standard.
Once you've decided to do that, the process becomes simply a case of stripping off any formatting from the main number. You don't really need to worry about what formatting is supplied as long as the number of digits is correct. Then just prepend the country code and the dot, and you're done.
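The strip-and-prepend process described above could be sketched in PHP roughly as follows. This is only a minimal sketch: the function name toDottedInternational and its parameters are made up for illustration, it assumes the caller already knows the country code, and it does not check that the digit count is right for that country.

```php
<?php
// Hypothetical helper, not from any library: reformat a user-entered
// national number into the +CC.number form shown above.
function toDottedInternational(string $input, string $countryCode, bool $dropLeadingZero = true): string
{
    // Remove brackets, spaces, dots, hyphens, slashes and anything else non-numeric.
    $digits = preg_replace('/\D/', '', $input);

    // Drop one leading zero where the country's convention calls for it.
    if ($dropLeadingZero && strncmp($digits, '0', 1) === 0) {
        $digits = substr($digits, 1);
    }

    return '+' . $countryCode . '.' . $digits;
}

echo toDottedInternational('(01234) 567 890', '44'), "\n"; // +44.1234567890
```

The $dropLeadingZero switch is there because not every country drops the leading zero when a number is written internationally.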

Related

Better to store phone in three fields or as one?

I am struggling with the decision to separate phone numbers stored in a MySQL database.
One school of thought is to break out the phone as:
area code (123)
prefix (123)
suffix (1234)
Another is to simply place the number in a single field with whatever formatting is deemed appropriate:
123456789
(123) 123-4567
123-456-7890
My initial reason for thinking the first would be better is in terms of being able to quickly and easily gather statistical data based on the phone numbers collected from our members (X number of members have a 123 area code for example).
Is there really a 'right' way to do it? I do realize that paired with PHP I can retrieve and reformat any way I want but I'd like to use best practice.
Thanks for your advice
EDIT
I will only be storing North American phone numbers for the time being

I vote for one field, processing the data as you put it in so that it's in a known format. I've tried both ways, and the one-field approach seems to generate less code overall.

You want to store it in the most efficient way in the DB, precisely because it's so easy to reformat in PHP. Go for the all-numeric field, with no separators (1231231234) since that would be the best way. If you have international phone numbers, add the country code as well. Then in your code you can format it using regular expressions to look however you want it.

I would store phone numbers as strings, not numbers.
Phone numbers are identifiers that happen to use digits.
Phone numbers starting with zero are valid, but may be interpreted as octal by a programming language.
Strip the phone number to only digits and store the extension in a separate field.
This will allow for uniform formatting later.
For US numbers, strip the leading '1' digit (and determine formatting based on the length of the string: 10 digits for US).
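As a rough illustration of the points above, here is a small PHP sketch. The function name normalizeUsNumber is hypothetical, and it assumes any extension has already been split off into its own field.

```php
<?php
// Hypothetical helper: reduce a US number to its 10 significant digits,
// kept as a string so that nothing is reinterpreted as a number.
function normalizeUsNumber(string $input): ?string
{
    $digits = preg_replace('/\D/', '', $input);

    // Strip the leading "1" from an 11-digit entry.
    if (strlen($digits) === 11 && $digits[0] === '1') {
        $digits = substr($digits, 1);
    }

    // Anything that is not 10 digits at this point is not treated as a US number.
    return strlen($digits) === 10 ? $digits : null;
}

var_dump(normalizeUsNumber('1 (212) 555-0123')); // string(10) "2125550123"
var_dump((int) '0123456789');                     // int(123456789): the leading zero is lost
```

The last line shows why a string column is the safer choice: once a number with a leading zero has been through an integer, the zero is gone.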

I'm in the process of building a callcenter application (it manages queues of contact information for a group of distributed callers to contact) and the architecture specified one field, no spaces, dashes, etc. After quite a bit of analysis, I agree it seems the best.
Based on the variability of entry for phone numbers (apostrophes, dots, dashes, and combinations of each) I built a simple function that deals with user entry, stripping down all but the numbers themselves, and also a "rebuilder" that reformats the raw number into something that's more visually appealing to the user.
Since they've been helpful to me, here's what I've written so far:
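// Wrapped in a class so that "public static" is valid PHP; the class name PhoneFormatter is only a placeholder.
class PhoneFormatter {
    // Strip everything except digits from user input.
    public static function cleanPhoneNumbers($input) {
        return preg_replace("/[^0-9]/", "", $input);
    }
    // Rebuild a raw 7- or 10-digit number into a dashed, readable form.
    public static function formatPhoneNumbers($phone_number) {
        if(strlen($phone_number) == 7) {
            return preg_replace("/([0-9]{3})([0-9]{4})/", "$1-$2", $phone_number);
        } elseif(strlen($phone_number) == 10) {
            return preg_replace("/([0-9]{3})([0-9]{3})([0-9]{4})/", "$1-$2-$3", $phone_number);
        } else {
            return $phone_number;
        }
    }
}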
Some caveats: My app is not available for international customers right now (there's a voip application built into it that we don't want to allow to call outside of the US right now) so I've not taken the time to setup for international possibilities. Also, as this is in progress, I will likely return to refactor and bolster these functions later.
I've found one weakness so far that has been a bit of a pain for me. In my app, I have to disallow calls by timezone based on the time of day (for instance, don't allow someone on the West Coast to be called at 6:00am when it's 9:00am in the East). To do that, I have to join a separate area code table to my table with the phone numbers by comparing 3-digit area codes to get the timezone. But I can't simply compare the area code to my phone number field, because they'd never match. So I have to deal with additional SQL to get just the first three digits of the number. Not a game-changer, but more work and confusion nonetheless.
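For what it's worth, the same first-three-digits lookup can be sketched on the PHP side instead of in SQL. Everything named here is an assumption for illustration: the function canCallNow, the two-entry area code map standing in for the real area code table, and the 9:00 to 21:00 calling window.

```php
<?php
// Returns true when the local time for the number's area code falls inside
// the allowed calling window. $phone is assumed to be 10 stored digits.
function canCallNow(string $phone, array $timezonesByAreaCode, DateTimeImmutable $now, int $earliestHour = 9, int $latestHour = 21): bool
{
    $areaCode = substr($phone, 0, 3);
    if (!isset($timezonesByAreaCode[$areaCode])) {
        return false; // unknown area code: err on the side of not calling
    }
    $local = $now->setTimezone(new DateTimeZone($timezonesByAreaCode[$areaCode]));
    $hour  = (int) $local->format('G');

    return $hour >= $earliestHour && $hour < $latestHour;
}

// Hypothetical stand-in for the area code table.
$areaCodeTimezones = ['212' => 'America/New_York', '415' => 'America/Los_Angeles'];
$now = new DateTimeImmutable('2024-01-15 14:00:00', new DateTimeZone('UTC'));

var_dump(canCallNow('2125550123', $areaCodeTimezones, $now)); // bool(true): 09:00 in New York
var_dump(canCallNow('4155550123', $areaCodeTimezones, $now)); // bool(false): 06:00 in Los Angeles
```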

Definitely store them in one field as a text string, and only store the numbers. Think of it this way: no matter what the numbers are, it's all one telephone number. However, the segmenting of the numbers is dependent on a number of things (locality, how many numbers provided, even personal preference). It is easier to store the one string and change it later with text manipulation.

I think splitting the number into 3 fields is the best option if you want to use area codes as filters; otherwise, you should only use 1 field.
Remember to use ZEROFILL if you plan on storing them as numbers ;)

It really depends on a couple of factors:
is it possible you will have international numbers?
how much area code/city code searching/manipulation will you be doing?
No matter what, I would only store numbers, it's easy enough to format either in MySQL or PHP and add parentheses and dashes.
Unless I was going to do a lot of searching by area code, I would just put the entire phone number into a single field, since I assume most of the time you would be retrieving the entire phone number anyway.
If it's possible that you will take international numbers in the future:
You might want to add a country field though, that way you won't have to guess what country they are from when dealing with the number.

What you use depends on how you plan to use the data, and where the program will be used.
If you want to efficiently search records by area code, then split out the area code; queries will perform much faster when they're doing simple string comparisons versus string manipulation of the full phone number to get the area code.
HOWEVER, be advised that phone numbers formatted XXX-XXX-XXXX are only found in the US, Canada, and other smaller Caribbean territories that are subject to the NANPA system. Various other world regions (EU, Africa, ASEAN) have very different numbering standards. In such cases, splitting out the equivalent of the "area code" may not make sense. Also, if all you want to do is display a phone number to the user, then just store it as a string.
Whether to store a number with a format or not is mostly personal preference. Storing the raw number allows the formatting to be changed easily; you could go from XXX-XXX-XXXX to (XXX) XXX-XXXX by changing a couple lines of code instead of reformatting the 10 million numbers you already have. Removing special characters from a phone number is also a relatively simple Regex. Storing without formatting will also save you a few bytes per number and allow you to use a fixed-length field (saving further data overhead inherent in varchars). This may be of use in a mobile app where storage is at a premium. However, that 5-terabyte distributed SQL cluster in your server room is probably not gonna notice much difference between a char(10) and a varchar(15). Storing them formatted also speeds up loading the data; you don't have to format it first, just yank it out of the DB and plaster it on the page.

extracting phone numbers

I'm trying to extract phone numbers from a set of data. It has to be able to extract international and local numbers from each country.
The rules I've laid out for it are:
1. Look for the international symbol (+), indicating it's an international dialing number with a valid country code (from +1 to +999).
2. If the plus symbol is present, make sure the next following character is a number.
3. If there is none, look at the length to validate it is between 7 and 10 digits long.
4. In the event that the number is divided (correctly, per international standards) by either a hyphen (-) or a space, make sure the number of digits in each group is either 3 or 4.
What I've got so far is:
\+(?=[1-999])(\d{4}[0-9][-\s]\d{3}[0-9][-\s]\d{4}[0-9])|(\d{7,11}[0-9])
That's for international, and the local search is \d{7,10}
The thing is, that it doesn't actually pick up numbers with spaces or hyphens in it.
Can anybody give me some advice on it?

\d already means "digit", so you shouldn't put another [0-9] after it (which means the same).
In the same vein, [1-999] doesn't mean what you think it does. It in fact matches one (1) digit between 1 and 9. You probably want \d{1,3} although that would also match 0.
Then, you're only allowing one variation of dividing blocks (4-3-4) - why? This is not going to match many, many valid phone numbers.
I would suggest the following:
Search your string using the regex \+?(?=\d)[\d\s-]{7,13}\b to grab anything that remotely looks like a phone number. Perhaps you also want to include parentheses and slashes in the allowed character list: \+?(?=\d)[\d\s/()-]{7,14}\b
Then process and validate those strings separately, best after removing all punctuation/whitespace (except the +).
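The two-step approach above (grab loosely, then clean and validate) might look something like this in PHP. This is a sketch under assumptions: the function name extractPhoneCandidates is invented, and the 7 to 15 digit bounds in the second step are a guess at a reasonable sanity check, not part of the answer.

```php
<?php
// Hypothetical helper: pull loose phone-number candidates out of free text,
// then normalise and validate each one separately.
function extractPhoneCandidates(string $text): array
{
    // The permissive pattern suggested above, with "/" escaped for the delimiter.
    preg_match_all('/\+?(?=\d)[\d\s\/()-]{7,14}\b/', $text, $matches);

    $numbers = [];
    foreach ($matches[0] as $candidate) {
        // Keep a leading "+", drop all other punctuation and whitespace.
        $plus   = $candidate[0] === '+' ? '+' : '';
        $digits = preg_replace('/\D/', '', $candidate);

        // Assumed sanity bounds: 7 to 15 digits.
        $count = strlen($digits);
        if ($count >= 7 && $count <= 15) {
            $numbers[] = $plus . $digits;
        }
    }
    return $numbers;
}

print_r(extractPhoneCandidates('Call +44 1234 567890 or 555-123-4567 today.'));
```

Note that the raw matches can carry trailing whitespace, which is one more reason to do the cleanup as a separate step rather than trusting the match as is.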

I'm not sure it will be possible to create a regex to match every country - some countries have conflicting rules.
it's entirely possible to have e.g. two valid local numbers contained within 1 valid international number.
You might want to start by looking at some of the answers to this question:
A comprehensive regex for phone number validation
If you're looking to create something definitive for every country, good luck, and you'll probably need to spend a while with some technical standards...
i.e. both 177 and 186-0039-011-81-90-1177-1177 are valid phone numbers in the same country

how to detect telephone numbers in a text (and replace them)?

I know it can be done for bad words (checking an array of preset words) but how to detect telephone numbers in a long text?
I'm building a website in PHP for a client who needs to avoid people using the description field to put their mobile phone numbers..(see craigslist etc..)
Besides, he's going to need some moderation anyway, but I was wondering if there is a way to block at least the obvious formats like nnn-nnn-nnnn. I'm not asking to block other weird ways of writing numbers like HeiGHT*/four*/nine etc...

Welcome to the world of regular expressions. You're basically going to want to use preg_replace to look for (some pattern) and replace with a string.
Here's something to start you off:
$text = preg_replace('/\+?[0-9][0-9()\-\s+]{4,20}[0-9]/', '[blocked]', $text);
this looks for:
a plus symbol (optional), followed by a number, followed by between 4 and 20 numbers, brackets, dashes, plus signs or spaces, followed by a number
and replaces with the string [blocked].
This catches all the obvious combinations I can think of:
012345 123123
+44 1234 123123
+44(0)123 123123
0123456789
Placename 123456 (although this one will leave 'Placename')
however it will also strip out any succession of 6+ numbers, which might not be desirable!
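If stripping bare 6-digit runs is a problem, one possible variation is to keep the same pattern but only replace a match when it contains enough digits. This is just a sketch: the function name blockPhoneNumbers and the threshold of 7 digits are assumptions, not part of the answer above.

```php
<?php
// Hypothetical wrapper around the pattern above: only block a match when it
// contains at least $minDigits digits (7 is an assumed threshold).
function blockPhoneNumbers(string $text, int $minDigits = 7): string
{
    return preg_replace_callback(
        '/\+?[0-9][0-9()\-\s+]{4,20}[0-9]/',
        function (array $m) use ($minDigits): string {
            $digitCount = strlen(preg_replace('/\D/', '', $m[0]));
            return $digitCount >= $minDigits ? '[blocked]' : $m[0];
        },
        $text
    );
}

echo blockPhoneNumbers('Call 012345 123123 now'), "\n"; // Call [blocked] now
echo blockPhoneNumbers('Order 123456 shipped'), "\n";   // Order 123456 shipped
```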

To do so you must use regular expressions as you may know.
I found this pattern that could be useful for your project:
<?php
preg_match("/(^(([\+]\d{1,3})?[ \.-]?[\(]?\d{3}[\)]?)?[ \.-]?\d{3}[ \.-]?\d{4}$)/", $yourText, $matches);
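// $matches will contain the full match and the captured groups; because the pattern is anchored with ^ and $, it only matches when $yourText is nothing but a phone number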
?>
More information about this pattern can be found here http://gskinner.com/RegExr/?2rirv where you can even test it online. It's a great tool to test regular expressions.

preg_match($pattern, $subject) will return 1 (true) if pattern is found in subject, and 0 (false) otherwise.
A pattern to match the example you give might be '/\d{3}-\d{3}-\d{4}/'
However whatever you choose for your pattern will suffer from both false positives and false negatives.
You might also consider looking for words like mob, cell or tel next to the number.
The full details of PHP pattern matching can be found at http://www.php.net/manual/en/reference.pcre.pattern.syntax.php
Ian
p.s. It can't be done for bad words, as the people in Scunthorpe will tell you.

I think that too tight a regular expression would cause you to lose a great number of detections.
You should check for runs of 10 consecutive characters containing more than 5 digits.
You will likely want this as an analysis routine queued to run after each message insertion, because of the computational weight.
Once the 6 or more digits have been isolated, replace them as you prefer, including any sibling digits nearby.
In any case it is better to preserve the original data, so you can test and train your detection algorithm until it works as well as possible.
Then you can also study your user data to create more complex heuristics, such as case-insensitive numbers written as letters, mixed forms, dot-separated digits, etc...
It's not about writing the perfect regex, it's about approaching the problem statistically and dynamically.
And remember, after you take action, users will change their input habits as a consequence, so the stats will change and you will need to learn and update your heuristics.
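The "10 consecutive characters containing more than 5 digits" check could be sketched as a simple sliding window. The function name hasDigitDenseWindow and its defaults are assumptions made for illustration, and the sketch works on bytes, so multi-byte characters count as more than one position.

```php
<?php
// Hypothetical detector: true if any run of $window consecutive bytes
// contains at least $minDigits digits (6 means "more than 5").
function hasDigitDenseWindow(string $text, int $window = 10, int $minDigits = 6): bool
{
    $length = strlen($text);
    $count  = 0; // digits inside the current window

    for ($i = 0; $i < $length; $i++) {
        if (ctype_digit($text[$i])) {
            $count++;
        }
        // Drop the byte that has just slid out of the window.
        if ($i >= $window && ctype_digit($text[$i - $window])) {
            $count--;
        }
        if ($count >= $minDigits) {
            return true;
        }
    }
    return false;
}

var_dump(hasDigitDenseWindow('my num 555.123.4567'));    // bool(true)
var_dump(hasDigitDenseWindow('Built in 1999, 3 rooms')); // bool(false)
```

A single pass like this is cheap, so the heavier statistical work can be reserved for the messages it flags.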

Quickly checking for telephone number format in input text using PHP?

What's the most efficient way to quickly check for telephone number format from a form input text?

The easiest way is to strip out everything that is not a number and then count how many digits there are. That way you don't have to force a certain format on the user. As for how many digits you should check for, that depends on whether you are checking for just local numbers, US numbers, international numbers, etc. For instance:
$number="(123) 123-1234";
if (strlen(preg_replace('~[^0-9]~','',$number)) == 10) {
// 10 digit number (US area code plus phone number)
}

That really depends on the location. There are different conventions from place-to-place as to the length of the area code and the length of the entire number. Being more permissive would be better.
A regular expression to ensure that it is an optional '+', followed by any number of digits would be the bare minimum. Allowing optional dashes ("-") or spaces separating groups of numbers could be a nice addition. Also, what about extension numbers?
In truth, it's probably best to just check that it includes some numbers.
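A permissive check along those lines might look like the sketch below. The function name looksLikePhoneNumber is made up, and the pattern is only one possible reading of "optional '+', digits, optional dashes or spaces"; extensions are deliberately left out.

```php
<?php
// Hypothetical check: optional "+", then groups of digits separated by
// single spaces or dashes. Extensions are not handled.
function looksLikePhoneNumber(string $input): bool
{
    return preg_match('/^\+?\d+(?:[ -]\d+)*$/', trim($input)) === 1;
}

var_dump(looksLikePhoneNumber('+44 1234 567890')); // bool(true)
var_dump(looksLikePhoneNumber('555-123-4567'));    // bool(true)
var_dump(looksLikePhoneNumber('call me maybe'));   // bool(false)
```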

If you are only dealing with U.S. phone numbers, you might follow the common UI design of using three textboxes = one for area code + second one for exchange + third one for number. That way, you can test that each one contains only digits, and that the correct number of digits is entered.
You can also use the common techniques of testing each keypress for numbers (ignoring keypresses that are not digits), and advancing to the next textbox after the required number of characters have been entered.
That second feature makes it easier for users to do the data entry.
Using separate textboxes also makes it a little easier for users to read their input, easier than, say, reading that they have entered ten digits in a row correctly.
It also avoids you having to deal with people who use different punctuation in their entries -- surrounding the area code with parentheses, using hyphens or dots between sections.
Edit: If you decide to stick with just one textbox, there are some excellent approaches to using regex in this SO question.

If you are only dealing with U.S. phone numbers, you might follow the common UI design of using three textboxes = one for area code + second one for exchange + third one for number. That way, you can test that each one contains only digits, and that the correct number of digits is entered.
If you are dealing with numbers worldwide, this breaks down because some countries don't use area codes at all and number lengths vary from country to country.
Subscriber numbers may be as short as 3 digits or as long as 11 or 12 digits. Area codes can range from 1 to 6 digits where used. The data will also need to be stored with the country code in order to have any chance of correctly formatting it for display.

PHP: How to validate a phone number if well formed?

Using PHP, how can I verify if a phone # is well formed?
It seems easiest to simply strip all non-numeric data, leaving only the numbers. Then to check if 10 digits exist.
Is this the best and easiest way?

The best? No. Issues I see with this approach:
Some area codes - like 000-###-#### - are not valid. See http://en.wikipedia.org/wiki/List_of_NANP_area_codes
Some exchanges - like ###-555-#### - are not valid. See http://en.wikipedia.org/wiki/555_%28telephone_number%29
Some people will enter a 1 before their number, i.e. 1-###-###-####.
Some people are only reachable at an extension, like ###-###-#### x####.
Some companies tack on extra digits, like 1-800-GO-FLOWERS. The additional digits are simply ignored by the phone system, but a user might expect to be able to enter the whole thing.
International phone numbers are not necessarily 10 digits, even if you discount the country codes.
Good enough? Quite possibly, but that's up to you and your app.
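To make a couple of those issues concrete, here is a slightly stricter sketch for North American numbers. The function name isPlausibleNanpNumber and the extension syntax it recognises ("x123" or "ext. 123") are assumptions. It only applies the structural rule that area code and exchange each start with 2-9; it does not check whether an area code is actually assigned, and it does not reject 555 exchanges.

```php
<?php
// Hypothetical validator: tolerates a leading 1 and a trailing extension,
// then checks the 10 remaining digits against the NXX-NXX-XXXX shape.
function isPlausibleNanpNumber(string $input): bool
{
    // Remove a trailing extension such as "x123" or "ext. 123" (assumed syntax).
    $main = preg_replace('/\s*(?:x|ext\.?)\s*\d+\s*$/i', '', $input);

    $digits = preg_replace('/\D/', '', $main);
    if (strlen($digits) === 11 && $digits[0] === '1') {
        $digits = substr($digits, 1);
    }

    return preg_match('/^[2-9]\d{2}[2-9]\d{6}$/', $digits) === 1;
}

var_dump(isPlausibleNanpNumber('(212) 736-5000'));      // bool(true)
var_dump(isPlausibleNanpNumber('1-212-736-5000 x123')); // bool(true)
var_dump(isPlausibleNanpNumber('000-123-4567'));        // bool(false)
```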

You can use a regex for it:
$pattern_phone = "|^[0-9\+][0-9\s+\-]*$|i";
if(!preg_match($pattern_phone,$phone)){
// Something's wrong
}
Haven't tested the regex, so it may not be 100% correct.

Checking for 10 digits after stripping will check the syntax but won't check the validity. For that you'd need to determine what valid numbers are available in the region/country and probably write a regex to match the patterns.

The problem with validating/filtering data like this usually comes down to the answer to this question: "How strict do I want to be?" which then devolves into a series of "feature" questions
Are you going to accept international numbers?
Are you going to accept extensions?
Are you going to allow various formats i.e., (111) 222-3333 vs 111.222.3333
Depending on your business rules, the answers to these questions can vary. But to be the most flexible, I recommend 3 fields to take a phone number
Country Code (optional)
Phone Number
Extension (optional)
All 3 fields can be programmatically limited/filtered to numeric values only. You can then combine them before storing into some parse-able format, or store each value individually.

Answering if something is "the best" thing to do, is nearly impossible (unless you're the one answering your own question).
The way you propose it, stripping all non-digits and then check if there are 10 digits, might result in unwanted behaviour for a string like:
George Washington (February 22, 1732 –
December 14, '99) was the commander
of the Continental Army in the
American Revolutionary War and served
as the first President of the United
States of America.
since stripping all non-digits will result in the string 2217321499, which is 10 digits long, but I highly doubt that the entire string should be considered a valid phone number.

What format do you need? You can use regular expressions for this.
