Build phone number regex filter - php

I need to build a phone number filter using a regex.
Here is my code:
<?php
$LIST = ['1234123', '0121234123', '123-1234', '1234-123', '0123-123-1234',
         '123 4123', '012a1234123', '123x1234', '12341-23', '012-3123-1234'];
$regex = '/(\d{3}-\d{4}$)|(\d{4}-\d{3}$)|(\d{3,4}-\d{3}-\d{4}$)|(\d{3,4}-\d{4}-\d{3}$)|(^[0-9]{7,10}$)/';
foreach ($LIST as $key => $value) {
    echo $value . ">>" . preg_match($regex, $value) . '<br/>';
}
?>
And here are the results:
1234123>>1
0121234123>>1
123-1234>>1
1234-123>>1
0123-123-1234>>1
123 4123>>0
012a1234123>>0
123x1234>>0
12341-23>>0
012-3123-1234>>1
What confuses me is the last one.
I want the last one to give 0 while keeping the rest of the results the same.

Working with phone numbers is not that easy. Phone number formats differ from country to country, from one operator to another, and so on.
Instead of using a regex, I recommend using a library for phone numbers: libphonenumber-for-php.
The library can be used for parsing, formatting, storing and validating international phone numbers. This library is based on Google's libphonenumber.
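If a regex is still wanted for the narrow list of formats in the question, the reason the last input matches is that only the final alternative carries a ^ anchor: (\d{3}-\d{4}$) finds 123-1234 at the end of 012-3123-1234. A minimal sketch that wraps the whole alternation in one anchored group follows; the constant PHONE_PATTERN and the function matches_phone_filter() are hypothetical names, and the pattern still only checks the sample formats, not whether a number is real.

```php
<?php
// Hypothetical names. The whole alternation sits inside one (?:...) group,
// so ^ and $ apply to every branch and each branch must match the full string.
const PHONE_PATTERN = '/^(?:\d{3}-\d{4}|\d{4}-\d{3}|\d{3,4}-\d{3}-\d{4}|\d{3,4}-\d{4}-\d{3}|\d{7,10})$/';

function matches_phone_filter(string $value): bool
{
    return preg_match(PHONE_PATTERN, $value) === 1;
}

$list = ['1234123', '0121234123', '123-1234', '1234-123', '0123-123-1234',
         '123 4123', '012a1234123', '123x1234', '12341-23', '012-3123-1234'];
foreach ($list as $value) {
    // Same output format as the question: 1 for a match, 0 otherwise.
    echo $value, '>>', (int) matches_phone_filter($value), PHP_EOL;
}
```

With this pattern the loop prints the same results as the question's code, except that 012-3123-1234 now gives 0.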

Related

How to verify a postcode using PHP?

I have postcode data in the format below:
I need to map each customer to a representative based on the postcode they provide.
E.g. if a customer enters postcode 36421, then they will be assigned to the related
representative, "John".
If a customer enters postcode 36222, then they will be assigned to the related
representative, "Sam".
If a customer is from Switzerland, then they will be assigned to only one
representative, as per the data in the image.
I'm confused about how to map the customers.
I also tried substr(), like this:
$postCode = $_POST['postcode'];
$postCode3Digit = substr($postCode, 0, 3);
But it breaks.
I also tried a regex, but then I think I would have to write a regex for every single postcode :(
I tried a switch statement, but it seems to take more time.
What is the best way to achieve this? Any help would be appreciated.
I would look it up using a regex, because ZIP data is often a little too complex for substr() alone. What I would do is loop through the representatives and compare each one's ZIP pattern with the ZIP input. Keep in mind that you need to replace the X (in your script) with dots, because a dot matches any single character in a regex.
$search = '34567';
$reps = array(
    'adam' => '12...',
    'bdam' => '23...',
    'cdam' => '345..',
    'ddam' => '346..',
);
// Print the first representative whose pattern matches the whole postcode.
foreach ($reps as $name => $zip) {
    if (preg_match('/^' . $zip . '$/', $search)) {
        echo $name;
        break;
    }
}
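A minimal sketch of the same loop, assuming the representative zones are stored with X as a digit placeholder (e.g. 345XX), as the question's data appears to use. Instead of editing the data by hand, it escapes each zone with preg_quote() and turns every X into [0-9]; this is a deliberate swap from the dot above, so that only digits are accepted. find_representative() and the sample zones are hypothetical.

```php
<?php
/**
 * Hypothetical helper: return the name of the first representative whose
 * zone matches the postcode, or null if none does. Zones use "X" as a
 * placeholder for exactly one digit, e.g. "345XX".
 */
function find_representative(string $postcode, array $zones): ?string
{
    $postcode = trim($postcode);
    foreach ($zones as $name => $zone) {
        // Escape any regex metacharacters in the zone, then turn X/x into [0-9].
        $pattern = '/^' . str_ireplace('X', '[0-9]', preg_quote($zone, '/')) . '$/';
        if (preg_match($pattern, $postcode) === 1) {
            return $name;
        }
    }
    return null;
}

$reps = [
    'adam' => '12XXX',
    'bdam' => '23XXX',
    'cdam' => '345XX',
    'ddam' => '346XX',
];
echo find_representative('34567', $reps) ?? 'no representative', PHP_EOL; // cdam
```

The first matching zone wins, so if zones can overlap, list the narrower ones before the broader ones.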

Ordering and Selecting frequently used tags

I have looked on Stack Overflow for a solution to this but couldn't find a good answer that covered the issues I was having. Essentially, what I'm trying to achieve is to list the 15 most frequent tags used across all my users' subjects.
This is how I currently select the data:
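// Note: the mysql_* functions were removed in PHP 7.0; mysqli or PDO are the current equivalents.
$sql = mysql_query("SELECT subject FROM `users`");
$row = mysql_fetch_array($sql); // returns only the first row of the result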
I apologise for the code looking nothing like what I'm trying to achieve; I really don't have any clue where to begin, and came here for a possible solution. Selecting the data works fine and I can loop over the subjects. My problem is that the subjects contain ordinary words along with the hashtags; for example, a room subject might look like "hey my name is example #follow me". How would I grab only #follow, and, once I've grabbed all the hashtags from all of the subjects, echo the 15 most frequent?
Again, I apologise for the code, and I appreciate anyone's help. This was the closest post I found to solving my issue, but it was not useful.
Example
Here are three room subjects:
`Hello welcome to my room #awesome #wishlist`
`Hey hows everyone doing? #friday #awesome`
`Check out my #wishlist looking #awesome`
This is how I'm trying to display them:
[3] #awesome [2] #wishlist [1] #friday
What you want to achieve here is fairly complex for an SQL query, and you are likely to run into efficiency problems from parsing the subject every time you want to run this code.
The best solution is probably to have a table that associates tags with users. You can update this table every time a user changes their subject. Getting the number of times each tag is used then becomes trivial with a GROUP BY tag query using COUNT(*), ordered by that count in descending order.
One way would be to parse the result set in PHP. Once you query your subject line from the database, let's say you have them in the array $results, then you can build a frequency distribution of words like this:
$freqDist = [];
foreach ($results as $row) {
    // Split the subject into words on single spaces.
    $words = explode(" ", $row);
    foreach ($words as $w) {
        if (array_key_exists($w, $freqDist)) {
            $freqDist[$w]++;
        } else {
            $freqDist[$w] = 1;
        }
    }
}
You can then sort in descending order and display the distribution of words like this:
arsort($freqDist); // sort by count, highest first, keeping the words as keys
foreach ($freqDist as $word => $count) {
    if (strpos($word, '#') !== false) {
        echo "$word: $count\n";
    } else {
        echo "$word: does not contain hashtag, DROPPED\n";
    }
}
You could also use preg_match() for fancier matching if you want, but I've used a naive approach with strpos(), assuming that any word containing '#' (anywhere) is a hashtag.
Other functions of possible use to you:
str_word_count(): Return information about words used in a string.
array_count_values(): Counts all the values of an array.
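Combining preg_match_all() with array_count_values(), a minimal sketch, assuming the subjects have already been fetched into a plain array of strings. A hashtag is taken to be # followed by word characters (\w+), which is an assumption; adjust the pattern if tags can contain other characters. top_hashtags() is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: count hashtags across all subjects and return the
 * $limit most frequent ones as [tag => count], most frequent first.
 */
function top_hashtags(array $subjects, int $limit = 15): array
{
    $tags = [];
    foreach ($subjects as $subject) {
        // $m[0] receives every full "#word" match found in this subject.
        if (preg_match_all('/#\w+/', $subject, $m) > 0) {
            foreach ($m[0] as $tag) {
                $tags[] = $tag;
            }
        }
    }
    $counts = array_count_values($tags); // tag => number of occurrences
    arsort($counts);                     // highest count first, keys kept
    return array_slice($counts, 0, $limit, true);
}

$subjects = [
    'Hello welcome to my room #awesome #wishlist',
    'Hey hows everyone doing? #friday #awesome',
    'Check out my #wishlist looking #awesome',
];
$parts = [];
foreach (top_hashtags($subjects) as $tag => $count) {
    $parts[] = "[$count] $tag";
}
echo implode(' ', $parts), PHP_EOL; // [3] #awesome [2] #wishlist [1] #friday
```

Only the hashtags are ever counted here, so there is no need to drop plain words afterwards.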

United Kingdom (GB) postal code validation without regex

I have tried several regexes, and some valid postal codes still get rejected.
Searching the internet, Wikipedia and SO, I could only find regex validation solutions.
Is there a validation method that does not use regex? Any language is fine; I guess it would be easy to port.
I suppose the easiest would be to compare against a postal code database, yet that would need to be maintained and updated periodically from a reliable source.
Edit: To help future visitors and keep you from posting any more regexes, here's a regex which I have tested (as of 2013-04-24) to work for all postal codes in Code Point (see Mikkel Løkke's answer):
// PHP PCRE (it was on Wikipedia, it isn't there anymore; I might have modified it, don't remember).
// The alternation is wrapped in one group so that ^ and $ apply to both branches,
// and GIR0AA is written without a space because all whitespace is stripped first.
$strPostalCode = preg_replace("/\s/", "", $strPostalCode);
$bValid = preg_match("/^(GIR0AA|(((A[BL]|B[ABDHLNRSTX]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|F[KY]|G[LUY]?|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE)[1-9]?[0-9]|((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y]|(SW|W)([2-9]|[1-9][0-9])|EC[1-9][0-9])[0-9][ABD-HJLNP-UW-Z]{2}))$/i", $strPostalCode);
I'm writing this answer based on the Wikipedia page.
Looking at its validation section, it seems that there are 6 types of formats (A = letter and 9 = digit):
Format      Removing spaces   Ordered by length
AA9A 9AA    AA9A9AA           AA9A9AA
A9A 9AA     A9A9AA            AA999AA
A9 9AA      A99AA             AA99AA
A99 9AA     A999AA            A9A9AA
AA9 9AA     AA99AA            A999AA
AA99 9AA    AA999AA           A99AA
As we can see, the length may vary from 5 to 7, and we have to take some special cases into account if we want to.
So the function we are coding has to do the following:
Remove spaces and convert to uppercase (or lower case).
Check if the input is an exception; if it is, return valid.
Check if the input's length is 4 < length < 8.
Check if it's a valid postcode.
The last part is tricky, but we will split it into 3 cases by length to get an overview:
Length = 7: AA9A9AA and AA999AA
Length = 6: AA99AA, A9A9AA and A999AA
Length = 5: A99AA
For this we will use a switch(). From then on, it's just a matter of checking, character by character, whether there is a letter or a number in the right place.
So let's take a look at our PHP implementation:
function check_uk_postcode($string) {
    // Start config
    $valid_return_value = 'valid';
    $invalid_return_value = 'invalid';
    $exceptions = array('BS981TL', 'BX11LT', 'BX21LB', 'BX32BB', 'BX55AT', 'CF101BH', 'CF991NA', 'DE993GG', 'DH981BT', 'DH991NS', 'E161XL', 'E202AQ', 'E202BB', 'E202ST', 'E203BS', 'E203EL', 'E203ET', 'E203HB', 'E203HY', 'E981SN', 'E981ST', 'E981TT', 'EC2N2DB', 'EC4Y0HQ', 'EH991SP', 'G581SB', 'GIR0AA', 'IV212LR', 'L304GB', 'LS981FD', 'N19GU', 'N811ER', 'NG801EH', 'NG801LH', 'NG801RH', 'NG801TH', 'SE18UJ', 'SN381NW', 'SW1A0AA', 'SW1A0PW', 'SW1A1AA', 'SW1A2AA', 'SW1P3EU', 'SW1W0DT', 'TW89GS', 'W1A1AA', 'W1D4FA', 'W1N4DJ');
    // Add Overseas territories?
    array_push($exceptions, 'AI-2640', 'ASCN1ZZ', 'STHL1ZZ', 'TDCU1ZZ', 'BBND1ZZ', 'BIQQ1ZZ', 'FIQQ1ZZ', 'GX111AA', 'PCRN1ZZ', 'SIQQ1ZZ', 'TKCA1ZZ');
    // End config
    // Remove the spaces and convert to uppercase.
    $string = strtoupper(preg_replace('/\s/', '', $string));
    // Check for a valid exception.
    $exceptions = array_flip($exceptions);
    if (isset($exceptions[$string])) {
        return $valid_return_value;
    }
    // Check for an invalid length.
    $length = strlen($string);
    if ($length < 5 || $length > 7) {
        return $invalid_return_value;
    }
    $letters = array_flip(range('A', 'Z')); // An array of letters as keys
    $numbers = array_flip(range(0, 9));     // An array of numbers as keys
    switch ($length) {
        case 7: // AA9A9AA or AA999AA
            if (!isset($letters[$string[0]], $letters[$string[1]], $numbers[$string[2]], $numbers[$string[4]], $letters[$string[5]], $letters[$string[6]])) {
                break;
            }
            if (isset($letters[$string[3]]) || isset($numbers[$string[3]])) {
                return $valid_return_value;
            }
            break;
        case 6: // AA99AA, A9A9AA or A999AA
            if (!isset($letters[$string[0]], $numbers[$string[3]], $letters[$string[4]], $letters[$string[5]])) {
                break;
            }
            if (isset($letters[$string[1]], $numbers[$string[2]]) || isset($numbers[$string[1]], $letters[$string[2]]) || isset($numbers[$string[1]], $numbers[$string[2]])) {
                return $valid_return_value;
            }
            break;
        case 5: // A99AA
            if (isset($letters[$string[0]], $numbers[$string[1]], $numbers[$string[2]], $letters[$string[3]], $letters[$string[4]])) {
                return $valid_return_value;
            }
            break;
    }
    return $invalid_return_value;
}
Note that I have not added British Forces Post Office (BFPO) or non-geographic codes.
Usage:
echo check_uk_postcode('AE3A 6AR').'<br>'; // valid
echo check_uk_postcode('Z9 9BA').'<br>'; // valid
echo check_uk_postcode('AE3A6AR').'<br>'; // valid
echo check_uk_postcode('EE34 6FR').'<br>'; // valid
echo check_uk_postcode('A23A 7AR').'<br>'; // invalid
echo check_uk_postcode('WA3334E').'<br>'; // invalid
echo check_uk_postcode('A2 AAR').'<br>'; // invalid
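As supplied by the UK government. Note that this pattern uses character-class subtraction ([A-Z-[QVX]] means A to Z except Q, V and X), which .NET and XML Schema regexes support but PHP's PCRE does not, so it cannot be pasted into preg_match() as-is: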
(GIR 0AA)|((([A-Z-[QVX]][0-9][0-9]?)|(([A-Z-[QVX]][A-Z-[IJZ]][0-9][0-9]?)|(([A-Z-[QVX]][0-9][A-HJKSTUW])|([A-Z-[QVX]][A-Z-[IJZ]][0-9][ABEHMNPRVWXY])))) [0-9][A-Z-[CIKMOV]]{2})
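PHP's preg_* functions use PCRE, so each class subtraction has to be spelled out as an explicit character class before this pattern can be used there. A sketch of that translation, anchored with ^...$ and expecting upper case with exactly one space before the inward code; the constant UK_GOV_POSTCODE and the function is_uk_postcode() are hypothetical names.

```php
<?php
// Hypothetical names. Class subtractions expanded for PCRE:
//   [A-Z-[QVX]]    -> [A-PR-UWYZ]       (no Q, V, X)
//   [A-Z-[IJZ]]    -> [A-HK-Y]          (no I, J, Z)
//   [A-Z-[CIKMOV]] -> [ABD-HJLNP-UW-Z]  (no C, I, K, M, O, V)
const UK_GOV_POSTCODE = '/^(?:GIR 0AA|(?:[A-PR-UWYZ][0-9][0-9]?'
    . '|[A-PR-UWYZ][A-HK-Y][0-9][0-9]?'
    . '|[A-PR-UWYZ][0-9][A-HJKSTUW]'
    . '|[A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRVWXY]) [0-9][ABD-HJLNP-UW-Z]{2})$/';

function is_uk_postcode(string $postcode): bool
{
    // The pattern is upper-case only and needs exactly one space before the inward code.
    return preg_match(UK_GOV_POSTCODE, strtoupper(trim($postcode))) === 1;
}

foreach (['SW1A 1AA', 'bn3 1ge', 'GIR 0AA', 'QA1 1AA', 'BN31GE'] as $candidate) {
    echo $candidate, ': ', is_uk_postcode($candidate) ? 'valid' : 'invalid', PHP_EOL;
}
```

Because the space is mandatory here, input without a space (BN31GE) is rejected unless a space is inserted before the last three characters first.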
I've built London-only, postcode-based apps using the postcodes I got from HERE. But to be honest, even with London postcodes only, you need a lot more storage than necessary. Sure, the idea is trivial.
Store the postcodes, take the user input, and see if you get a match. But you are complicating the solution far more than you think. I HAD to use actual postcodes to achieve what I wanted, but for simple validation purposes, as hard as "maintaining" a regex is, storing tens or hundreds of thousands of postcodes (if not more) and validating against them more or less in real time is a far more difficult task.
If a mini distributed service sounds like a more efficient solution than a regex, go for it, but I'm sure it isn't. Unless you need geo-spatial querying of your own data against UK postcodes or things like that, I doubt DB storage is a feasible solution. Just my 2 cents.
Update
According to this index, there are 1,758,417 postcodes in the UK. I can tell you that I am using a few Mongo clusters (Amazon EC2 High-Memory Instances) to provide reliable London-only services (indexing only London postcodes), and it's quite a pricey thing, even with basic storage.
Admittedly, the app is performing medium complexity geo-spatial queries, but the storage requirements alone are very expensive and demanding.
Bottom line, just stick to regex and be done with it in two minutes.
I'm looking at the "Postcodes in the United Kingdom" article on Wikipedia right now.
http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom
The Validation section lists six formats with a combination of letters and numbers, and there is more information in the notes below that. The first thing I would try is a BNF-type grammar with a tool like GoldParserBuilder. You could describe the basic formats in a more readable form and have an efficient parser and lexer generated automatically. In the past, I've successfully used such tools to avoid writing huge, ugly regexes.
From that point, the program has a properly formatted zip code of a known type. At this point, the specific numbers or letters might violate something. Each type of zip code can have a function programmed to look for violations of that specific type. The final product will consist of an automatically generated parser that passes unvalidated, but structured/identified, zip codes to a dedicated validation function. You can then refactor or optimize from there.
(You can also use the grammar itself to enforce or disallow certain literals and combinations. Whatever is more readable or comprehensible for you. Different people gravitate toward different ends of these things.)
Here's a page highlighting the advantages of the GOLD Parsing System. You can use any tool you like; I just promote this one because it's good at its job and has steadily improved over many years.
http://www.goldparser.org/about/why-use-gold.htm
I would think the regex, while long-winded, would probably be the best solution if all you want to do is validate whether something could be a valid UK postcode.
If you need absolute data, consider using Ordnance Survey OpenData initiative "Code-Point® Open" dataset, which is a CSV of lots of data points in Great Britain (so not Northern Ireland I'm guessing) one of which is postcode. Be aware that the file is 20MB, so you may have to convert it to a more manageable format.
Regexes are hard to debug, hard to port from one regex flavor to another (silent "errors"), and hard to update.
That is true for most regexes, but why don't you just split it up into multiple parts? You can easily split it into six parts for the six different general rules and maybe even more if you take all of the special cases into account.
Creating a well-commented method of 20 lines with simple regexes is easy to debug (one simple regex per line) and also easy to update. The porting problem is the same, but on the other hand you do not need to use some fancy grammar lib.
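A sketch of that split-up approach, covering only the six general formats from the Wikipedia validation section (not the special cases or the per-position letter restrictions), with one simple regex per format and an optional space. The constant UK_POSTCODE_FORMATS and the function uk_postcode_format() are hypothetical names; returning the matched format name instead of a boolean makes it easy to see which rule fired when debugging.

```php
<?php
// Hypothetical names. One simple regex per general format (A = letter, 9 = digit);
// the space before the inward code is optional.
const UK_POSTCODE_FORMATS = [
    'A9 9AA'   => '/^[A-Z][0-9] ?[0-9][A-Z]{2}$/',
    'A99 9AA'  => '/^[A-Z][0-9]{2} ?[0-9][A-Z]{2}$/',
    'AA9 9AA'  => '/^[A-Z]{2}[0-9] ?[0-9][A-Z]{2}$/',
    'AA99 9AA' => '/^[A-Z]{2}[0-9]{2} ?[0-9][A-Z]{2}$/',
    'A9A 9AA'  => '/^[A-Z][0-9][A-Z] ?[0-9][A-Z]{2}$/',
    'AA9A 9AA' => '/^[A-Z]{2}[0-9][A-Z] ?[0-9][A-Z]{2}$/',
];

/** Return the name of the first format that matches, or null if none does. */
function uk_postcode_format(string $postcode): ?string
{
    $postcode = strtoupper(trim($postcode));
    foreach (UK_POSTCODE_FORMATS as $name => $pattern) {
        if (preg_match($pattern, $postcode) === 1) {
            return $name;
        }
    }
    return null;
}

foreach (['SW1A 1AA', 'M1 1AE', 'W1A0AX', 'HELLO'] as $candidate) {
    echo $candidate, ': ', uk_postcode_format($candidate) ?? 'no match', PHP_EOL;
}
```

Extra rules (forbidden letters in particular positions, special cases) can then be added as further small checks per format rather than as one larger pattern.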
Are third party services an option?
http://www.postcodeanywhere.co.uk/address-validation/
GeoNames Database:
http://www.geonames.org/postal-codes/
+1 for the "why care" comments. I have had to use the 'official' regex in various projects and while I have never attempted to break it down, it works and it does the job. I've used it with Java and PHP code without any need to convert it between regex formats.
Is there a reason why you would have to debug it or break it down?
Incidentally, the regex rule used to be on Wikipedia, but it appears to have gone.
Edit: As for the space/no-space debate, the postcode should be valid with or without the space. As the last part of the postcode (after the space) is ALWAYS three characters (a digit followed by two letters), you can insert the space manually, which then allows you to run the input through the regex rule.
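A minimal sketch of inserting the space, assuming the input is roughly postcode-shaped to begin with; normalise_uk_postcode() is a hypothetical name. The result can then be passed to a regex that requires the space, such as the government pattern quoted in the other answers.

```php
<?php
/**
 * Hypothetical helper: strip whitespace, upper-case, and put a single space
 * before the last three characters (the inward code).
 */
function normalise_uk_postcode(string $raw): string
{
    $compact = strtoupper(preg_replace('/\s+/', '', $raw));
    if (strlen($compact) <= 3) {
        return $compact; // too short to have an outward code; leave it alone
    }
    return substr($compact, 0, -3) . ' ' . substr($compact, -3);
}

echo normalise_uk_postcode('sw1a1aa'), PHP_EOL;     // SW1A 1AA
echo normalise_uk_postcode(' BN3   1GE '), PHP_EOL; // BN3 1GE
```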
Take the list of valid postcodes and check if the one entered is in it.
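A sketch of that lookup, assuming the list of valid postcodes has already been loaded (for example from a dataset such as Code-Point Open, whose exact column layout is not covered here) into an array of strings. Both the list and the input are normalised the same way, and the list is stored as array keys so each check is a hash lookup rather than a scan. The function names are hypothetical.

```php
<?php
/**
 * Hypothetical helpers: store known postcodes as array keys (a hash set)
 * so each lookup is an isset() check instead of a linear search.
 */
function normalise_key(string $postcode): string
{
    // Ignore whitespace and case so "sw1a 1aa" and "SW1A1AA" compare equal.
    return strtoupper(preg_replace('/\s+/', '', $postcode));
}

function build_postcode_set(iterable $postcodes): array
{
    $set = [];
    foreach ($postcodes as $postcode) {
        $key = normalise_key((string) $postcode);
        if ($key !== '') {
            $set[$key] = true;
        }
    }
    return $set;
}

function is_known_postcode(string $input, array $set): bool
{
    return isset($set[normalise_key($input)]);
}

$set = build_postcode_set(['SW1A 1AA', 'BN3 1GE', 'EC1A 1BB']);
var_dump(is_known_postcode('sw1a1aa', $set)); // bool(true)
var_dump(is_known_postcode('ZZ1 1ZZ', $set)); // bool(false)
```

As the other answers point out, the full UK list is large, so keeping it in memory this way trades RAM for speed.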

How to understand if an e-mail address is an education e-mail address or not?

I want only college students to be able to sign up to my website, but I couldn't figure out how to control that. I also want .edu.fr, .edu.tr and other .edu extensions to be able to join my website, not just .edu. I was thinking about using a regex, but I couldn't find a solution. I would be glad if someone could help me.
It shouldn't be that important, but I am using PHP with the Laravel framework.
Most educational institutions have domain names that follow these patterns:
uni.edu
uni.edu.fr
uni.ac.uk
The following regular expression covers all such cases:
/(\.edu(\.[a-z]+)?|\.ac\.[a-z]+)$/
You can add cases to the regex as needed. Check that the email is real by sending an automated email with a confirmation link.
Corresponding PHP:
// $domain is assumed to hold the part of the email address after the "@".
if (preg_match('/(\.edu(\.[a-zA-Z]+)?|\.ac\.[a-zA-Z]+)$/i', $domain)) {
    // allow
}
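A minimal sketch that extracts the domain from the address and applies the same pattern, assuming the address should first pass PHP's built-in filter_var() check with FILTER_VALIDATE_EMAIL; is_education_email() is a hypothetical name. A matching domain says nothing about whether the mailbox exists, so the confirmation email is still needed.

```php
<?php
/**
 * Hypothetical helper: true if the address is syntactically valid and its
 * domain ends in .edu, .edu.<tld> or .ac.<tld>.
 */
function is_education_email(string $email): bool
{
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }
    // Everything after the last "@" is the domain part.
    $domain = strtolower(substr($email, strrpos($email, '@') + 1));
    return preg_match('/(\.edu(\.[a-z]+)?|\.ac\.[a-z]+)$/', $domain) === 1;
}

echo is_education_email('student@mit.edu') ? 'allow' : 'reject', PHP_EOL;   // allow
echo is_education_email('someone@gmail.com') ? 'allow' : 'reject', PHP_EOL; // reject
```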
There's not a great way to do it, but one possible way might be to explode the address using the @ symbol:
// Split the email address into 2 values of an array using the @ symbol as delimiter.
$emailParts = explode('@', $theEmailAddress);
$domain = trim($emailParts[1] ?? ''); // empty if the address contains no @
// If the second part (domain part) ends in .edu, period, country code or just .edu, then allow signup.
if (preg_match('/\.edu\.[^.]+$/i', $domain) || preg_match('/\.edu$/i', $domain)) {
    // Use the above if you are assuming that the country codes can be any number of characters. If you know for sure country codes are 2 chars, use this condition:
    // (preg_match('/\.edu\.[^.]{2}$/i', $domain) || preg_match('/\.edu$/i', $domain))
    // Allow signup
}
Of course, this does NOT guarantee that the domain or the email address is an existing one!

php extract UK postal code and validate it

I have some text blocks like this:
John+and+Co-Accountants-Hove-BN31GE-2959519
I need a function to extract the postcode "BN31GE". The postcode may not exist, giving a text block without one like the one below, so the function must also validate that the extracted text is a valid postcode.
John+and+Co-Accountants-Hove-2959519
The UK Government Data Standard for postcodes is:
((GIR 0AA)|((([A-PR-UWYZ][0-9][0-9]?)|(([A-PR-UWYZ][A-HK-Y][0-9][0-9]?)|(([A-PR-UWYZ][0-9][A-HJKSTUW])|([A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRVWXY])))) [0-9][ABD-HJLNP-UW-Z]{2}))
Edit: I had the above in some (personal) code with a reference to a now non-existent UK government web page. The appropriate British Standard is BS7666, and information on it is currently available here. That lists a slightly different regex.
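The Data Standard pattern requires a space, while the postcode in the question (BN31GE) has none. A sketch that reuses the same character classes for extraction from free text, with the space made optional and \b word boundaries in place of ^...$ anchors; extract_uk_postcode() is a hypothetical name, and it returns null when no postcode-shaped token is found.

```php
<?php
/**
 * Hypothetical helper: return the first UK-postcode-shaped token in $text
 * (upper-cased), or null if there is none. Same character classes as the
 * Data Standard pattern, but the space is optional and \b word boundaries
 * replace the ^...$ anchors so the postcode can sit inside other text.
 */
function extract_uk_postcode(string $text): ?string
{
    $pattern = '/\b(GIR ?0AA|(?:[A-PR-UWYZ][0-9][0-9]?'
             . '|[A-PR-UWYZ][A-HK-Y][0-9][0-9]?'
             . '|[A-PR-UWYZ][0-9][A-HJKSTUW]'
             . '|[A-PR-UWYZ][A-HK-Y][0-9][ABEHMNPRVWXY]) ?[0-9][ABD-HJLNP-UW-Z]{2})\b/i';
    if (preg_match($pattern, $text, $m) === 1) {
        return strtoupper($m[1]);
    }
    return null;
}

var_dump(extract_uk_postcode('John+and+Co-Accountants-Hove-BN31GE-2959519')); // string(6) "BN31GE"
var_dump(extract_uk_postcode('John+and+Co-Accountants-Hove-2959519'));        // NULL
```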
The code below extracts a valid UK postal code. It returns the postcode as a string if one is found, otherwise an empty string.
<?php
$getPostcode = "";
$str = "John+and+Co-Accountants-Hove-BN31GE-2959519";
$getArray = explode("-", $str);
if (is_array($getArray) && count($getArray) > 0) {
    foreach ($getArray as $key => $val) {
        if (preg_match("/^(([A-PR-UW-Z]{1}[A-IK-Y]?)([0-9]?[A-HJKS-UW]?[ABEHMNPRVWXY]?|[0-9]?[0-9]?))\s?([0-9]{1}[ABD-HJLNP-UW-Z]{2})$/i", strtoupper($val), $postcode)) {
            $getPostcode = $postcode[0];
        }
    }
}
print "<pre>";
print_r($getPostcode);
?>
Use a regex with the preg_grep() function.
I don't know the format of English postcodes, but you could go with something like:
(-[a-zA-Z0-9]+-)+
This matches
"-Accountants-"
"-BN31GE-"
You can then always take the second value, or you can refine your regex to match English postcodes exactly, maybe something like:
([A-Z0-9]{6})
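A sketch of the preg_grep() route, assuming, as in the question, that the postcode is always its own hyphen-separated token. The loose pattern here (1-2 letters, a digit, an optional letter or digit, then a digit and two letters) checks only the general UK shape; a plain six-character pattern such as ([A-Z0-9]{6}) is avoided because, unanchored, it would also match part of 2959519. find_postcode_tokens() is a hypothetical name.

```php
<?php
/**
 * Hypothetical helper: return the hyphen-separated tokens of $text that have
 * the general shape of a space-free UK postcode (a loose format check only).
 */
function find_postcode_tokens(string $text): array
{
    $tokens = explode('-', $text);
    // 1-2 letters, a digit, an optional letter or digit, then a digit and 2 letters.
    $matches = preg_grep('/^[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}$/i', $tokens);
    return array_values($matches); // preg_grep keeps the original keys
}

print_r(find_postcode_tokens('John+and+Co-Accountants-Hove-BN31GE-2959519')); // one element: BN31GE
```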
