extracting phone numbers - php

I'm trying to extract phone numbers from a set of data. It has to be able to extract international and local numbers from each country.
The rules I've laid out for it are:
1. Look for the international symbol, indicating it's an international dialing number with a valid extension(from +1 to +999).
2. If the plus symbol is present, make sure the next following character is a number.
3. If there is none, look at the length to validate it is between 7 and 10 digits long.
4. In the event that the number is divided (correctly, per international standards) by either a hyphen (-) or a space, make sure the number of digits between the separators is either 3 or 4.
What I've got so far is:
\+(?=[1-999])(\d{4}[0-9][-\s]\d{3}[0-9][-\s]\d{4}[0-9])|(\d{7,11}[0-9])
That's for international, and the local search is \d{7,10}
The thing is, that it doesn't actually pick up numbers with spaces or hyphens in it.
Can anybody give me some advice on it?

\d already means "digit", so you shouldn't put another [0-9] after it (which means the same).
In the same vein, [1-999] doesn't mean what you think it does. It in fact matches one (1) digit between 1 and 9. You probably want \d{1,3} although that would also match 0.
Then, you're only allowing one variation of dividing blocks (4-3-4) - why? This is not going to match many, many valid phone numbers.
I would suggest the following:
Search your string using the regex \+?(?=\d)[\d\s-]{7,13}\b to grab anything that remotely looks like a phone number. Perhaps you also want to include parentheses and slashes in the allowed character list: \+?(?=\d)[\d\s/()-]{7,14}\b
Then process and validate those strings separately, best after removing all punctuation/whitespace (except the +).
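A rough sketch of that two-step approach, in case it helps. The helper name extractPhoneCandidates() and the final 7 to 13 digit check are my own assumptions for illustration, so adjust them to whatever rules you settle on:

```php
<?php
// Hypothetical helper: loosely extract candidates, then normalize and validate each one.
function extractPhoneCandidates(string $text): array
{
    // Step 1: grab anything that remotely looks like a phone number.
    preg_match_all('~\+?(?=\d)[\d\s/()-]{7,14}\b~', $text, $matches);

    $candidates = [];
    foreach ($matches[0] as $raw) {
        // Step 2: remove punctuation and whitespace, keeping only a leading "+".
        $normalized = preg_replace('~(?!^\+)\D~', '', trim($raw));

        // Step 3: assumed validation rule (7 to 13 digits); replace with your own.
        if (preg_match('~^\+?\d{7,13}$~', $normalized)) {
            $candidates[] = $normalized;
        }
    }
    return $candidates;
}

print_r(extractPhoneCandidates('Call +44 1234 567890 or 555-1234 today'));
```

The loose pattern may capture trailing whitespace, which is why each match is trimmed before it is normalized.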

I'm not sure it will be possible to create a regex to match every country - some countries have conflicting rules.
It's entirely possible to have e.g. two valid local numbers contained within one valid international number.
You might want to start by looking at some of the answers to this question:
A comprehensive regex for phone number validation
If you're looking to create something definitive for every country, good luck, and you'll probably need to spend a while with some technical standards...
For example, both 177 and 186-0039-011-81-90-1177-1177 are valid phone numbers in the same country.

Related

!preg_match - non US phone numbers

in the past I have only been required to use a preg_match for US only numbers such as:
elseif(!preg_match("/^[0-9]{3}-[0-9]{3}-[0-9]{4}$/", $telefon))
{
$error = "Your message has not been sent as you did not enter your telephone number, please try again.";
selected_values();
}
But I now need to expand it to incorporate UK and German numbers.
These numbers are often formatted with [spaces] and (brackets).
I tried the following with little success:
elseif(!preg_match("/^[0-9]$/", $telefon))
Can anyone help me to have a preg_match that incorporates many different variations of phone numbers?
THANKS
For UK numbers, there's a pretty good answer here. (not the accepted answer tho! as the comment says, that's not a good answer)
But if you're trying to accept international numbers in general, you need to be pretty open about what you accept -- In addition to brackets and spaces, you may also find people using plus signs, dots, hyphens, slashes, and more.
There is a recognised standard for international phone number formatting, which looks like this:
+44.1234567890
Where +44 is the international dialcode (UK in this case), followed by a dot, followed by the rest of the number (minus the leading zero, where applicable) without any other formatting.
Of course that doesn't help if you need to accept numbers being entered by users with whatever formatting they're using, but it might help if you consider this format as a target -- ie, accept whatever the user enters, and try to reformat it to this standard.
Once you've decided to do that, the process becomes simply a case of stripping off any formatting from the main number. You don't really need to worry about what formatting is supplied as long as the number of digits is correct. Then just prepend the country code and the dot, and you're done.
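Something along these lines would do the reformatting. This is only a sketch: toDottedInternational() is a made-up name, it assumes the caller already knows the country code and that the input does not include it, and the 4 to 14 digit length check is a placeholder rather than a real rule:

```php
<?php
// Hypothetical helper: reformat user input to the "+CC.NNNNNNNNNN" target format.
function toDottedInternational(string $input, string $countryCode, bool $dropLeadingZero = true): ?string
{
    // Strip every non-digit character (spaces, brackets, dots, hyphens, ...).
    $digits = preg_replace('~\D~', '', $input);

    // Drop a single leading zero where applicable (e.g. for UK numbers).
    if ($dropLeadingZero) {
        $digits = preg_replace('~^0~', '', $digits);
    }

    // Assumed sanity check on the digit count; tune per country.
    $length = strlen($digits);
    if ($length < 4 || $length > 14) {
        return null;
    }

    return '+' . $countryCode . '.' . $digits;
}

echo toDottedInternational('(01234) 567 890', '44'); // +44.1234567890
```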

PHP: Validate string containing numbers, separated by hyphen (possible by preg_match)?

I’m trying to validate a string which contains numbers where each four numbers are separated by a hyphen, for example 1111-2222-3333-4444
I’m trying to do some kind of validating so I can guarantee that this format is being used (with 16 digits, three hyphens and nothing else). I’ve this preg_match where it checks for digits only but I need to accept hyphens and this format.
preg_match('/^[0-9]{1,}$/', $validatenumbers)
I’ve tried to do it with regex but unfortunately it isn’t my strongest side so I haven’t been able to correctly validate the numbers.
It is important that it is in PHP and not Javascript because of the ability to “turn off” javascript in a browser.
preg_match("/^([0-9]{4}-){3}[0-9]{4}$/", $input);
([0-9]{4}-){3} Matches exactly 3 groups of 4 digits followed by a hyphen. That is terminated by another group [0-9]{4} (4 digits without a hyphen).
preg_match('/^[0-9]{4}\-[0-9]{4}\-[0-9]{4}\-[0-9]{4}$/',$numbers);
I think that should work.
This looks like a credit card number. If that's the case, you should use a Luhn checksum instead of a simple regex.
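In case it really is a card number, here is a minimal sketch of a Luhn check. The function name passesLuhn() is just for illustration, and you would probably still want the format regex alongside it:

```php
<?php
// Hypothetical helper: Luhn checksum over the digits of the input.
function passesLuhn(string $number): bool
{
    // Ignore hyphens and any other non-digit characters.
    $digits = preg_replace('~\D~', '', $number);
    if ($digits === '') {
        return false;
    }

    $sum = 0;
    $double = false; // the rightmost digit is not doubled
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $d = (int) $digits[$i];
        if ($double) {
            $d *= 2;
            if ($d > 9) {
                $d -= 9; // same as adding the two digits of the product
            }
        }
        $sum += $d;
        $double = !$double;
    }

    return $sum % 10 === 0;
}

var_dump(passesLuhn('4111-1111-1111-1111')); // bool(true)
var_dump(passesLuhn('4111-1111-1111-1112')); // bool(false)
```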
try:
if (preg_match('#^\d{4}-\d{4}-\d{4}-\d{4}$#', $string)) {}
If you need to match that exact format, the pattern would be '~^\d{4}-\d{4}-\d{4}-\d{4}$~', or you can write it more generally like this: '/^(\d+-)*\d+$/' (this would match 11, 11-11111... and so on).

Format telephone number

I have to format a telephone number list, and I'd wish to extract and separate the prefix from the number for better viewing.
I have a list of all possible prefixes, but there is no regular pattern.
I mean, I could have these numbers:
00 - 12345 (short prefix)
0000 - 12345 (long prefix)
How can I manage that? Numbers are plain, without any special char (ie without / \ - . , ecc ecc).
Prefixes are like that:
030
031
0321
0322
...
...
Most of the time I have the town of the customer (it's not required) so, usually i can get the prefix from there, but that's not a sure thing, since town and telephone couldn't be linked.
== EDIT ==
The prefix list is 231 entries long. Maybe I'll add something more, so take 300 as a safe value.
Moreover, prefixes come from a single country only (Italy).
I have to save plain numbers without any separator so users can search for them. In fact, if they enter separators they will never be able to find the number again.
More info
A prefix ALWAYS starts with a leading 0, and its length ranges from 2 to 4.
But the more i study this thing, the more i think i can't work it out :(
Because of the extremely varied telephone number formats used around the world, it's probably going to be tough to correctly parse any phone number that is put into your system.
I'm not certain if it would make your task any easier, but I had the idea that parsing from right to left might be easier for you, since it's the prefix length that's unknown.
What a pain. I would use a logic funnel to narrow possible choices and finally take a best guess.
First, test if the first few numbers can match anything on your prefix list. For some, hopefully only one prefix can possibly be correct.
Then, perhaps you could use the city to eliminate prefixes from entirely different countries.
Finally, you could default to the most popular format for prefixes.
Without any other information, you can't do better than a good guess unless you want to default to no format at all.
I'm really confused. What do you mean, "extract and separate"? My guess is these phone numbers are in a MySQL database, and you get to use PHP. Are you trying to get the prefix from the numbers, and then insert them into a different field in the same row? Are you pulling these numbers from the database, and you would just like to print the prefixes to the screen?
Regardless of what you're trying to do, and taking for granted that you're using PHP and regexs, isn't this essentially what you're looking for?:
$telephone_number = '333-12345';
$matched = array();
preg_match('~^(\d+)-~', $telephone_number, $matched);
echo $matched[1]; // Should be '333'
ok, I worked it out.
I saw that there aren't any short prefixes that share their leading digits with a longer one.
I mean:
02 -> there will never be a longer prefix as 021, 022 and so on
so things are pretty easy now:
I get first 4 numbers -> is that in my prefix table?
YES: stop here
NO: get first 3
and so on..
thanks for your help
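For anyone landing here later, the longest-prefix-first lookup described above could be sketched like this. splitPrefix() and the sample prefix list are illustrative only; the real table would hold all of the Italian prefixes:

```php
<?php
// Hypothetical helper: try the longest possible prefix first (4 digits), then 3, then 2.
function splitPrefix(string $number, array $prefixes): ?array
{
    $lookup = array_flip($prefixes); // prefix => index, for fast isset() checks

    for ($len = 4; $len >= 2; $len--) {
        if (strlen($number) <= $len) {
            continue; // nothing would be left for the subscriber number
        }
        $candidate = substr($number, 0, $len);
        if (isset($lookup[$candidate])) {
            return [$candidate, substr($number, $len)];
        }
    }

    return null; // no known prefix matched
}

$prefixes = ['02', '030', '031', '0321', '0322']; // sample entries only
print_r(splitPrefix('032112345', $prefixes)); // ['0321', '12345']
print_r(splitPrefix('0212345', $prefixes));   // ['02', '12345']
```

This only works because no short prefix is the start of a longer one, as noted above.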

Quickly checking for telephone number format in input text using PHP?

What's the most efficient way to quickly check for telephone number format from a form input text?
The easiest way is to strip out everything that is not a number and then count how many digits there are. That way you don't have to force a certain format on the user. As for how many digits you should check for, that depends on whether you are checking for just local numbers, US numbers, international numbers, etc. But for instance:
$number="(123) 123-1234";
if (strlen(preg_replace('~[^0-9]~','',$number)) == 10) {
// 10 digit number (US area code plus phone number)
}
That really depends on the location. There are different conventions from place-to-place as to the length of the area code and the length of the entire number. Being more permissive would be better.
A regular expression to ensure that it is an optional '+', followed by any number of digits would be the bare minimum. Allowing optional dashes ("-") or spaces separating groups of numbers could be a nice addition. Also, what about extension numbers?
In truth, it's probably best to just check that it includes some numbers.
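As a sketch of that bare-minimum check (optional '+', then groups of digits separated by single spaces or dashes), something like this could work. looksLikePhoneNumber() is a name I made up, and extensions are deliberately not handled here:

```php
<?php
// Hypothetical helper: permissive shape check, not a validity check.
function looksLikePhoneNumber(string $input): bool
{
    // Optional "+", then digit groups separated by a single space or dash.
    return preg_match('~^\+?\d+(?:[ -]\d+)*$~', trim($input)) === 1;
}

var_dump(looksLikePhoneNumber('+44 1234 567890')); // bool(true)
var_dump(looksLikePhoneNumber('555-1234'));        // bool(true)
var_dump(looksLikePhoneNumber('12--34'));          // bool(false)
```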
If you are only dealing with U.S. phone numbers, you might follow the common UI design of using three textboxes = one for area code + second one for exchange + third one for number. That way, you can test that each one contains only digits, and that the correct number of digits is entered.
You can also use the common techniques of testing each keypress for numbers (ignoring keypresses that are not digits), and advancing to the next textbox after the required number of characters have been entered.
That second feature makes it easier for users to do the data entry.
Using separate textboxes also makes it a little easier for users to read their input, easier than, say, reading that they have entered ten digits in a row correctly.
It also avoids you having to deal with people who use different punctuation in their entries -- surrounding the area code with parentheses, using hyphens or dots between sections.
Edit: If you decide to stick with just one textbox, there are some excellent approaches to using regex in this SO question.
If you are dealing with numbers worldwide, the three-textbox design (area code, exchange, number) breaks down, because some countries don't use area codes at all and number lengths vary from country to country.
Subscriber numbers may be as short as 3 digits or as long as 11 or 12 digits. Area codes can range from 1 to 6 digits where used. The data will also need to be stored with the country code in order to have any chance of correctly formatting it for display.

PHP: How to validate a phone number if well formed?

Using PHP, how can I verify if a phone # is well formed?
It seems easiest to simply strip all non-numeric data, leaving only the numbers. Then to check if 10 digits exist.
Is this the best and easiest way?
The best? No. Issues I see with this approach:
Some area codes - like 000-###-#### - are not valid. See http://en.wikipedia.org/wiki/List_of_NANP_area_codes
Some exchanges - like ###-555-#### - are not valid. See http://en.wikipedia.org/wiki/555_%28telephone_number%29
Some people will enter a 1 before their number, i.e. 1-###-###-####.
Some people are only reachable at an extension, like ###-###-#### x####.
Some companies tack on extra digits, like 1-800-GO-FLOWERS. The additional digits are simply ignored by the phone system, but a user might expect to be able to enter the whole thing.
International phone numbers are not necessarily 10 digits, even if you discount the country codes.
Good enough? Quite possibly, but that's up to you and your app.
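If you do want to go a bit further for US/Canada numbers, here is a sketch that handles the optional leading 1 and the basic NANP shape (area code and exchange both start with 2-9). normalizeNanpNumber() is a hypothetical name, and it does not deal with extensions, vanity letters, or unassigned area codes:

```php
<?php
// Hypothetical helper: returns the 10-digit number, or null if the shape is wrong.
function normalizeNanpNumber(string $input): ?string
{
    // Strip all non-numeric characters.
    $digits = preg_replace('~\D~', '', $input);

    // Accept an optional leading country code 1, e.g. 1-###-###-####.
    if (strlen($digits) === 11 && $digits[0] === '1') {
        $digits = substr($digits, 1);
    }

    // NXX-NXX-XXXX: area code and exchange may not start with 0 or 1.
    if (preg_match('~^[2-9]\d{2}[2-9]\d{6}$~', $digits) !== 1) {
        return null;
    }

    return $digits;
}

var_dump(normalizeNanpNumber('1-212-867-5309')); // string(10) "2128675309"
var_dump(normalizeNanpNumber('000-123-4567'));   // NULL
```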
You can use a regex for it:
$pattern_phone = "|^[0-9\+][0-9\s+\-]*$|i";
if(!preg_match($pattern_phone,$phone)){
// Something's wrong
}
Haven't tested the regex, so it may not be 100% correct.
Checking for 10 digits after stripping will check the syntax but won't check the validity. For that you'd need to determine what valid numbers are available in the region/country and probably write a regex to match the patterns.
The problem with validating/filtering data like this usually comes down to the answer to this question: "How strict do I want to be?", which then devolves into a series of "feature" questions:
Are you going to accept international numbers?
Are you going to accept extensions?
Are you going to allow various formats i.e., (111) 222-3333 vs 111.222.3333
Depending on your business rules, the answers to these questions can vary. But to be the most flexible, I recommend 3 fields to take a phone number
Country Code (optional)
Phone Number
Extension (optional)
All 3 fields can be programmatically limited/filtered to numeric values only. You can then combine them before storing into some parse-able format, or store each value individually.
Answering if something is "the best" thing to do, is nearly impossible (unless you're the one answering your own question).
The way you propose it, stripping all non-digits and then check if there are 10 digits, might result in unwanted behaviour for a string like:
George Washington (February 22, 1732 –
December 14, '99) was the commander
of the Continental Army in the
American Revolutionary War and served
as the first President of the United
States of America.
since stripping all non-digits will result in the string 2217321499, which is 10 digits long, but I highly doubt that the entire string should be considered a valid phone number.
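To make that concrete, here is a small sketch comparing the plain strip-and-count check with one that first insists the input contains only digits and common separators. Both function names, and the set of allowed separators, are my own assumptions:

```php
<?php
// Naive check: strip all non-digits, then count what is left.
function isTenDigitsAfterStripping(string $input): bool
{
    return strlen(preg_replace('~\D~', '', $input)) === 10;
}

// Stricter sketch: only digits and a few separators allowed before counting.
function isPhoneShaped(string $input): bool
{
    if (preg_match('~^[\d\s().+-]+$~', $input) !== 1) {
        return false;
    }
    return isTenDigitsAfterStripping($input);
}

$text = "George Washington (February 22, 1732 - December 14, '99) was the commander";

var_dump(isTenDigitsAfterStripping($text)); // bool(true), the unwanted result
var_dump(isPhoneShaped($text));             // bool(false)
var_dump(isPhoneShaped('(123) 123-1234'));  // bool(true)
```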
What format do you need? You can use regular expressions for this.
