I'm definitely not the worst when it comes down to regex, but this one has got me stumped.
In short, this is the code I currently have.
$aNumbers = array(
'612345678',
'546123465',
'131234567',
'+31(0)612345678'
);
foreach($aNumbers as $sNumber) {
$aMatches = array();
$sNumber = preg_replace('/(\(0\)|[^\d]+)/', '', $sNumber);
preg_match('/(\d{1,2})?(\d{3})(\d{3})(\d{3})$/', $sNumber, $aMatches);
var_dump($sNumber);
var_dump($aMatches);
}
Simply put, I want to match specific formats for telephone numbers to ensure a unified display.
+31(0)612345678
+31(0)131234567
Stripped of the + and the (0), they break down into these parts:
31 6 123 456 78
Country Net Number
31 13 123 456 78
Country Net Number
Now, in some cases the country code (+31, or +1, +222) is optional. The net number (6 or 13) is always included, but as a fun twist, the following format is also possible:
31 546 123 456
Country Net Number
Is this even possible with regex?
I've answered a few of these types of questions, and my strategy is to identify certain portions of formatting or number relationships that convey meaning, and get rid of the rest.
One of my examples that parses non-NANP number formatting uses a list of valid area codes in the parsing expression, and identifies country code when present. It extracts the country code, area code, and then the rest of the number.
For your country, I am assuming the list of area/net/region codes in HansM's answer is either correct or easily replaceable, so I'll guess that this modification of a regex might be useful:
^[ -]*(\+31)?[ -]*[(0)]*[ -]*(7|43|32|45|33|49|39|31|47|34|46|41|90|44|351|353|358)[ -]*((?:\d[ -]*)+)
It will first match the country code, if it is present, and store it in back-reference 1, then skip any run of parentheses and zeros (such as "(0)" or the leading 0). It will then match one of the area/net/region codes and store it in back-reference 2. Finally, it will match one or more digits, mixed with dashes (-) and/or spaces ( ), and store them in back-reference 3.
After this, you could parse the third group for validity or further reformatting.
I'm testing it on Regex 101, but I could use a list of acceptable and unacceptable input, and how it should be reformatted when acceptable...
[EDIT]
I've used this list of city codes for the Netherlands and modified the expression thusly:
^[ -]*(\+31)?[ -]*[(0)]*[ -]*([123457]0|23|24|26|35|45|71|73|570)[ -]*((?:\d[ -]*)+)
which performs the following parsing:
input                  (1)    (2)    (3)
---------------------  -----  -----  ---------------
0707123456                    70     7123456
0267-123456                   26     7-123456
0407-12 34 56                 40     7-12 34 56
0570123456                    570    123456
07312345                      73     12345
+31(0)734423211        +31    73     4423211
But I still don't know if that's helpful for you.
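To check the table, the expression can be run through preg_match() from PHP, the language of the question. This is a minimal sketch; the helper name parseNlNumber is hypothetical. Because a later group always matches, PHP fills the skipped optional group 1 with an empty string rather than leaving it out.

```php
<?php
// Hypothetical helper: runs the expression from the answer and returns
// [country code, area code, rest of number], or null if nothing matches.
function parseNlNumber(string $input): ?array
{
    $pattern = '/^[ -]*(\+31)?[ -]*[(0)]*[ -]*([123457]0|23|24|26|35|45|71|73|570)[ -]*((?:\d[ -]*)+)/';
    if (!preg_match($pattern, $input, $m)) {
        return null;
    }
    // Group 1 is "" (not absent) when there is no +31, because later groups matched.
    return [$m[1], $m[2], $m[3]];
}
```

For example, parseNlNumber('+31(0)734423211') gives ['+31', '73', '4423211']. Group 3 still contains any dashes and spaces, so a follow-up str_replace() or preg_replace() would be needed before reformatting the number.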
[EDIT 2]
Wikipedia has what appears to be a more comprehensive list of codes:
010, 0111, 0113, 0114, 0115, 0117, 0118, 013, 015, 0161, 0162, 0164, 0165, 0166, 0167, 0168, 0172, 0174, 0180, 0181, 0182, 0183, 0184, 0186, 0187, 020, 0222, 0223, 0224, 0226, 0227, 0228, 0229, 023, 024, 0251, 0252, 0255, 026, 0294, 0297, 0299, 030, 0313, 0314, 0315, 0316, 0317, 0318, 0320, 0321, 033, 0341, 0342, 0343, 0344, 0345, 0346, 0347, 0348, 035, 036, 038, 040, 0411, 0412, 0413, 0416, 0418, 043, 045, 046, 0475, 0478, 0481, 0485, 0486, 0487, 0488, 0492, 0493, 0495, 0497, 0499, 050, 0511, 0512, 0513, 0514, 0515, 0516, 0517, 0518, 0519, 0521, 0522, 0523, 0524, 0525, 0527, 0528, 0529, 053, 0541, 0543, 0544, 0545, 0546, 0547, 0548, 055, 0561, 0562, 0566, 0570, 0571, 0572, 0573, 0575, 0577, 0578, 058, 0591, 0592, 0593, 0594, 0595, 0596, 0597, 0598, 0599, 070, 071, 072, 073, 074, 075, 076, 077, 078, 079
which can be used in the code selection portion like this (if you'd prefer it to be more easily read and updated):
10|111|113|114|115|117|118|13|15|161|162|164|165|166|167|168|172|174|180|181|182|183|184|186|187|20|222|223|224|226|227|228|229|23|24|251|252|255|26|294|297|299|30|313|314|315|316|317|318|320|321|33|341|342|343|344|345|346|347|348|35|36|38|40|411|412|413|416|418|43|45|46|475|478|481|485|486|487|488|492|493|495|497|499|50|511|512|513|514|515|516|517|518|519|521|522|523|524|525|527|528|529|53|541|543|544|545|546|547|548|55|561|562|566|570|571|572|573|575|577|578|58|591|592|593|594|595|596|597|598|599|70|71|72|73|74|75|76|77|78|79
or like this (if you'd prefer a more efficient evaluation of the expression):
1([035]|1[134578]|6[124-8]|7[24]|8[0-467])|2([0346]|2[2346-9]|5[125]|9[479])|3([03568]|1[34-8]|2[01]|4[1-8])|4([0356]|1[12368]|7[58]|8[15-8]|9[23579])|5([0358]|[19][1-9]|2[1-5789]|4[13-8]|6[126]|7[0-3578])|7[0-9]
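One thing to watch when dropping the compact form into the full expression: its inner parentheses are capturing groups, so pasted in as-is they would push the rest of the number out of back-reference 3. A minimal PHP sketch (the constant NL_AREA and the function splitNlNumber are hypothetical names) that uses non-capturing (?:...) groups instead and strips the separators from the last group:

```php
<?php
// Compact Dutch area-code alternation from the answer, with the inner
// groups turned into non-capturing (?:...) groups so numbering stays 1-2-3.
const NL_AREA = '1(?:[035]|1[134578]|6[124-8]|7[24]|8[0-467])'
    . '|2(?:[0346]|2[2346-9]|5[125]|9[479])'
    . '|3(?:[03568]|1[34-8]|2[01]|4[1-8])'
    . '|4(?:[0356]|1[12368]|7[58]|8[15-8]|9[23579])'
    . '|5(?:[0358]|[19][1-9]|2[1-5789]|4[13-8]|6[126]|7[0-3578])'
    . '|7[0-9]';

// Hypothetical helper: returns [country code, area code, digits only] or null.
function splitNlNumber(string $input): ?array
{
    $pattern = '/^[ -]*(\+31)?[ -]*[(0)]*[ -]*(' . NL_AREA . ')[ -]*((?:\d[ -]*)+)/';
    if (!preg_match($pattern, $input, $m)) {
        return null;
    }
    // Remove the spaces and dashes that group 3 is allowed to contain.
    return [$m[1], $m[2], str_replace([' ', '-'], '', $m[3])];
}
```

The Dutch codes are prefix-free, so the order of the alternatives does not change the result. Because the list only contains geographic codes, a mobile number such as 0612345678 does not match (the function returns null); the question's net number 6 would have to be added as an extra alternative, for example |6.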
I have used the NuGet package libphonenumber-csharp.
That has helped me create a (Dutch) phone number validator. Here is a code snippet; without the other parts of my solution it will not compile, but at least you can get an idea of how to handle this.
public override void Validate()
{
ValidationMessages = new Dictionary<string, string>();
ErrorMessage = string.Empty;
string phoneNumber;
string countryCode = _defaultCountryCode;
// If the phone number is not required, it is allowed to be empty,
// so in that case isValid gets the default value true.
bool isValid = (!_isRequired);
if (!string.IsNullOrEmpty(_phoneNumber))
{
var phoneUtil = PhoneNumberUtil.GetInstance();
try
{
phoneNumber = PhoneNumbers.PhoneNumberUtil.Normalize(_phoneNumber);
countryCode = PhoneNumberUtil2.GetRegionCode(phoneNumber, _defaultCountryCode);
PhoneNumber oPhoneNumber = phoneUtil.Parse(phoneNumber, countryCode);
var t1 = oPhoneNumber.NationalNumber;
var t2 = oPhoneNumber.CountryCode;
var formattedNo = phoneUtil.Format(oPhoneNumber, PhoneNumberFormat.E164);
isValid = PhoneNumbers.PhoneNumberUtil.IsViablePhoneNumber(formattedNo);
}
catch (NumberParseException e)
{
var err = e.ToString();
isValid = false;
}
}
if ((isValid) && (!string.IsNullOrEmpty(_phoneNumber)))
{
Regex regexValidator = null;
string regex;
// Additional validations for Dutch phone numbers, as libphonenumber is too lenient when it comes to
// deciding whether a number is valid.
switch (countryCode)
{
case "NL":
if (_phoneNumber.StartsWith("0800") || _phoneNumber.StartsWith("0900"))
{
// 0800/0900 numbers
regex = @"((0800|0900)(-| )?[0-9]{4}([0-9]{3})?$)";
regexValidator = new Regex(regex);
isValid = regexValidator.IsMatch(_phoneNumber);
}
else
{
string phoneNumberCheck = _phoneNumber.Replace("(", "").Replace(")", "").Replace("-", "").Replace(" ", "");
regex = @"^(0031|\+31|0)[1-9][0-9]{8}$";
regexValidator = new Regex(regex);
isValid = regexValidator.IsMatch(phoneNumberCheck);
}
break;
}
}
if (!isValid)
{
ErrorMessage = string.Format(TextProvider.Get(TextProviderConstants.ValMsg_IsInAnIncorrectFormat_0),
ColumnInfoProvider.GetLabel(_labelKey));
ValidationMessages.Add(_messageKey, ErrorMessage);
}
}
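Since the question is in PHP, the two Dutch-specific checks from this validator can be sketched with preg_match() alone, without libphonenumber. This is a rough port under assumptions: isValidDutchNumber is a hypothetical name, and unlike the C# version it removes "(0)" as a unit before stripping the other separators (as the question's own preg_replace does). Removing only the parentheses turns +31(0)612345678 into +310612345678, which the ^(0031|\+31|0)[1-9][0-9]{8}$ check rejects.

```php
<?php
// Hypothetical PHP port of the "NL" branch of the C# validator above.
function isValidDutchNumber(string $number): bool
{
    // 0800/0900 service numbers: 4 or 7 digits after the prefix,
    // optionally separated by a dash or a space.
    if (preg_match('/^0[89]00/', $number)) {
        return preg_match('/^(0800|0900)(-| )?[0-9]{4}([0-9]{3})?$/', $number) === 1;
    }
    // Remove "(0)" as a unit first, then the remaining separators.
    $clean = str_replace(['(0)', '(', ')', '-', ' '], '', $number);
    // Country prefix (0031 or +31) or trunk 0, then nine digits not starting with 0.
    return preg_match('/^(0031|\+31|0)[1-9][0-9]{8}$/', $clean) === 1;
}
```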
My class PhoneNumberUtil2, which builds on the NuGet package libphonenumber-csharp, might also be useful:
// Code start
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;
using PhoneNumbers;
namespace ProjectName.Logic.Miscellaneous
{
public class PhoneNumberUtil2
{
/// <summary>
/// Returns the alphanumeric country code for a normalized phonenumber. If a phonenumber does not contain
/// an international numeric country code, the default country code for the website is returned.
/// This works for 17 countries: NL, GB, FR, DE, BE, AT, SE, NO, IT, TR, RU, CH, DK, IE, PT, ES, FI
/// </summary>
/// <param name="normalizedPhoneNumber"></param>
/// <param name="defaultCountryCode"> </param>
/// <returns></returns>
public static string GetRegionCode(string normalizedPhoneNumber, string defaultCountryCode)
{
if (normalizedPhoneNumber.Length > 10)
{
var dict = new Dictionary<string, string>();
dict.Add("7", "RU");
dict.Add("43", "AT");
dict.Add("32", "BE");
dict.Add("45", "DK");
dict.Add("33", "FR");
dict.Add("49", "DE");
dict.Add("39", "IT");
dict.Add("31", "NL");
dict.Add("47", "NO");
dict.Add("34", "ES");
dict.Add("46", "SE");
dict.Add("41", "CH");
dict.Add("90", "TR");
dict.Add("44", "GB");
dict.Add("351", "PT");
dict.Add("353", "IE");
dict.Add("358", "FI");
// First check 3-digits International Calling Codes
if (dict.ContainsKey(normalizedPhoneNumber.Substring(0, 3)))
{
return dict[normalizedPhoneNumber.Substring(0, 3)];
}
// Then 2-digits International Calling Codes
if (dict.ContainsKey(normalizedPhoneNumber.Substring(0, 2)))
{
return dict[normalizedPhoneNumber.Substring(0, 2)];
}
// And finally 1-digit International Calling Codes
if (dict.ContainsKey(normalizedPhoneNumber.Substring(0, 1)))
{
return dict[normalizedPhoneNumber.Substring(0, 1)];
}
}
return defaultCountryCode;
}
}
}
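For a PHP code base like the question's, the same longest-prefix lookup could look roughly like this. It is a sketch under assumptions: getRegionCode is a hypothetical name, and the input is assumed to be digits only already (for example after the question's preg_replace, which turns +31(0)612345678 into 31612345678), because this version does not use libphonenumber's Normalize().

```php
<?php
// Hypothetical PHP port of PhoneNumberUtil2.GetRegionCode(): longest-prefix
// lookup of the international calling code on a digits-only number.
function getRegionCode(string $normalizedNumber, string $defaultRegion): string
{
    $codes = [
        '7' => 'RU', '43' => 'AT', '32' => 'BE', '45' => 'DK', '33' => 'FR',
        '49' => 'DE', '39' => 'IT', '31' => 'NL', '47' => 'NO', '34' => 'ES',
        '46' => 'SE', '41' => 'CH', '90' => 'TR', '44' => 'GB',
        '351' => 'PT', '353' => 'IE', '358' => 'FI',
    ];
    // Same heuristic as the C# code: only numbers longer than 10 digits
    // are assumed to carry a country code.
    if (strlen($normalizedNumber) > 10) {
        foreach ([3, 2, 1] as $len) {
            $prefix = substr($normalizedNumber, 0, $len);
            if (isset($codes[$prefix])) {
                return $codes[$prefix];
            }
        }
    }
    return $defaultRegion;
}
```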
I am working on a multilingual application with a centralized language system. It's based on a language file for each language and a simple helper function:
en.php
$lang['access_denied'] = "Access denied.";
$lang['action-required'] = "You need to choose an action.";
...
return $lang;
language_helper.php
...
function __($line) {
return $lang[$line];
}
Until now, all strings were system messages addressed to the current user, so I could always do it that way. Now I need to create other messages where the string should depend on a dynamic value. E.g., in a template file I want to echo the number of action points. If the user has only 1 point, it should echo "You have 1 point.", but for zero or more than 1 point it should be "You have 12 points."
For substitution purposes (both strings and numbers) I created a new function
function __s($line, $subs = array()) {
$text = $lang[$line];
while (count($subs) > 0) {
$text = preg_replace('/%s/', array_shift($subs), $text, 1);
}
return $text;
}
Call to function looks like __s('current_points', array($points)).
$lang['current_points'] in this case would be "You have %s point(s).", which works well.
Taking it a step further, I want to get rid of the "(s)" part. So I created yet another function
function __c($line, $subs = array()) {
$text = $lang[$line];
$text = (isset($subs[0]) && $subs[0] == 1) ? $text[0] : $text[1];
while (count($subs) > 0) {
$text = preg_replace('/%d/', array_shift($subs), $text, 1);
}
return $text;
}
The function call still looks the same: __c('current_points', array($points)).
$lang['current_points'] is now array("You have %d point.","You have %d points.").
How would I now combine these two functions? E.g., if I want to print the username along with the points (like in a ranking), the function call would be something like __x('current_points', array($username,$points)) with $lang['current_points'] being array("%s has %d point.","%s has %d points.").
I tried to employ preg_replace_callback() but I am having trouble passing the substitute values to that callback function.
$text = preg_replace_callback('/%([sd])/',
create_function(
'$type',
'switch($type) {
case "s": return array_shift($subs); break;
case "d": return array_shift($subs); break;
}'),
$text);
Apparently $subs is not defined inside the callback: I am getting "out of memory" errors, as if the function never leaves the while loop.
Could anyone point me in the right direction? There's probably a completely different (and better) way to approach this problem. Also, I still want to extend it like this:
$lang['invite_party'] = "%u invited you to %g party."; should become "Adam invited you to his party." for males and "Betty invited you to her party." for females. The passed $subs value for both %u and %g would be a user object.
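The "out of memory" symptom fits the callback not seeing $subs: its array_shift() never shortens the caller's array, so a surrounding while loop never ends. Note also that preg_replace_callback() passes the whole match array, so the type letter is in element 1. Since PHP 5.3 an anonymous function with use (&$subs) can share the variable instead (create_function() was removed in PHP 8). A minimal sketch of the combined __x, assuming $lang is passed in explicitly (the helpers above may not see it) and that the value filling the first %d decides between singular and plural:

```php
<?php
// Sketch of a combined __x(): $lang is passed explicitly (assumption), and
// array entries are [singular, plural] chosen by the value for the first %d.
function __x(array $lang, string $line, array $subs = []): string
{
    $text = $lang[$line];
    if (is_array($text)) {
        // Find which substitution fills the first %d and use it as the count.
        preg_match_all('/%([sd])/', $text[1], $m);
        $index = array_search('d', $m[1], true);
        $count = ($index === false) ? null : (int) ($subs[$index] ?? 0);
        $text = ($count === 1) ? $text[0] : $text[1];
    }
    // use (&$subs): the closure shares $subs with this function, so each
    // array_shift() consumes one value; a by-value copy would restart every call.
    return preg_replace_callback('/%([sd])/', function (array $match) use (&$subs) {
        $value = array_shift($subs);
        return $match[1] === 'd' ? (string) (int) $value : (string) $value;
    }, $text);
}
```

With $lang['current_points'] = ['%s has %d point.', '%s has %d points.'], the call __x($lang, 'current_points', ['Adam', 1]) gives "Adam has 1 point.". A single preg_replace_callback() call replaces every placeholder, so no while loop is needed.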
As mentioned in the comments, gettext() is probably an alternative.
However, if you need an alternative approach, here is a workaround:
class ll
{
private $lang = array(),
$langFuncs = array(),
$langFlags = array();
function __construct()
{
$this->lang['access'] = 'Access denied';
$this->lang['points'] = 'You have %s point{{s|}}';
$this->lang['party'] = 'A %s invited you to {{his|her}} parteh !';
$this->lang['toto'] = 'This glass seems %s, {{no one drank in already|someone came here !}}';
$this->langFuncs['count'] = function($in) { return $in != 1; }; // plural unless exactly one (so 0 gives "points")
$this->langFuncs['gender'] = function($in) { return strtolower($in) == 'male'; };
$this->langFuncs['emptfull'] = function($in) { return $in == 'empty'; };
$this->langFlags['points'] = 'count';
$this->langFlags['toto'] = 'emptfull';
$this->langFlags['party'] = 'gender';
}
public function __($type,$param=null)
{
if (isset($this->langFlags[$type])) {
$f = $this->lang[$type];
preg_match("/{{(.*?)}}/",$f,$m);
list ($ifTrue,$ifFalse) = explode("|",$m[1]);
if($this->langFuncs[$this->langFlags[$type]]($param)) {
return $this->__s(preg_replace("/{{(.*?)}}/",$ifTrue,$this->lang[$type]),$param);
} else {
return $this->__s(preg_replace("/{{(.*?)}}/",$ifFalse,$this->lang[$type]),$param);
}
} else {
return $this->__s($this->lang[$type],$param);
}
}
private function __s($s,$i=null)
{
return str_replace("%s",$i,$s);
}
}
$ll = new ll();
echo "Call : access - NULL\n";
echo $ll->__('access'),"\n\n";
echo "Call : points - 1\n";
echo $ll->__('points',1),"\n\n";
echo "Call : points - 175\n";
echo $ll->__('points',175),"\n\n";
echo "Call : party - Male\n";
echo $ll->__('party','Male'),"\n\n";
echo "Call : party - Female\n";
echo $ll->__('party','Female'),"\n\n";
echo "Call : toto - empty\n";
echo $ll->__('toto','empty'),"\n\n";
echo "Call : toto - full\n";
echo $ll->__('toto','full');
This outputs:
Call : access - NULL
Access denied
Call : points - 1
You have 1 point
Call : points - 175
You have 175 points
Call : party - Male
A Male invited you to his parteh !
Call : party - Female
A Female invited you to her parteh !
Call : toto - empty
This glass seems empty, no one drank in already
Call : toto - full
This glass seems full, someone came here !
This may give you an idea of how you could centralize your language handling, writing your own functions to choose between variants of a text.
Hope this helps you.
I did something like this a while ago, but avoided all the pitfalls you are running into by separating concerns.
On the lower level, I had a formatter injected into my template that took care of everything language-specific, for example formatting numbers or dates. It had a function "plural" with three parameters, $value, $singular and $plural, and depending on $value it returned one of the latter two. It did not echo the value itself, because that was left to the number formatting.
The whole translation was done inside the template engine. It was Dwoo, which can do template inheritance, so I set up a master template with all HTML structure inside, and plenty of placeholders. Each language was inheriting this HTML master and replaced all placeholders with the right language output. But because we are still in template engine land, it was possible to "translate" the usage of the formatter functions. Dwoo would compile the template inheritance on the first call, including all subsequent calls to the formatter, including all translated parameters.
The gender problem would get basically the same solution: gender($sex, $male, $female), with $sex being the gender of the subject and the other parameters being the male or female wording.
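A minimal sketch of such a formatter in plain PHP, assuming a class named Formatter (hypothetical; the original code was not posted and was used from Dwoo templates) with the plural() and gender() signatures described above, plus a number() helper built on number_format():

```php
<?php
// Hypothetical formatter along the lines described above: plural() and
// gender() only pick the wording; the number is formatted separately.
final class Formatter
{
    public function plural(int $value, string $singular, string $plural): string
    {
        return $value === 1 ? $singular : $plural;
    }

    public function gender(string $sex, string $male, string $female): string
    {
        // Assumption: $sex is "male" or "female"; anything else gets $female.
        return strtolower($sex) === 'male' ? $male : $female;
    }

    public function number(int $value): string
    {
        return number_format($value); // e.g. 1234 -> "1,234"
    }
}
```

Keeping number() separate from plural() means a translation can place the number wherever its grammar needs it.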
Perhaps a better approach is the one used by Drupal's t() function; take a look:
http://api.drupal.org/api/drupal/includes!bootstrap.inc/function/t/7
http://api.drupal.org/api/drupal/includes!bootstrap.inc/function/format_string/7
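The format_string() page linked above describes named placeholders, such as @variable (escaped to HTML) and !variable (inserted as is), instead of positional %s/%d. Below is a rough sketch of that idea, not Drupal's code, under the hypothetical name format_named(). Because placeholders are named, a translation can reorder them, and the same call can fill in a pronoun chosen elsewhere.

```php
<?php
// Hypothetical sketch of named-placeholder substitution in the spirit of
// Drupal's format_string(); not Drupal's implementation.
function format_named(string $string, array $args = []): string
{
    foreach ($args as $key => $value) {
        if (strpos((string) $key, '@') === 0) {
            // '@' placeholders are HTML-escaped before insertion.
            $args[$key] = htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
        }
        // Other prefixes (e.g. '!') are inserted verbatim in this sketch.
    }
    // strtr() with an array replaces all placeholders in one pass.
    return strtr($string, $args);
}
```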
In my PHP application I want to get the postal code nearest to a given postal code.
That is, if I enter a postal code such as 680721, I want to get the nearest postal code to it from my database.
How can I do this?
This is the table I use to store postal codes.
Here varpin is the postal code field.
Having said all this, a quick browse through the “External Links” on the UK Postcodes Wikipedia entry turned up an article by Paul Jenkins entitled UK Post Code Distance Calculation in PHP, which is fantastic; you can even download it here (uk_postcode_calc.zip).
After a short examination it seems this does exactly what it says on the tin, and simply calculates the distance.
However, a quick Google search for php distance calculation shows that there are more refined versions of the distance calculation, so I thought it might be a good idea to use one of those instead.
After a bit of tweaking, here’s what I came up with in the end:
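function distance($lat1, $lon1, $lat2, $lon2, $u='m') {
$u=strtolower($u);
if ($u == 'k') { $u=1.609344; } // kilometers
elseif ($u == 'n') { $u=0.8684; } // nautical miles
else { $u=1; } // statute miles (default)
// Spherical law of cosines: cosine of the central angle between the two points
$d=sin(deg2rad($lat1))*sin(deg2rad($lat2))+cos(deg2rad($lat1))*cos(deg2rad($lat2))*cos(deg2rad($lon1-$lon2));
$d=min(1, max(-1, $d)); // rounding can push this just past 1 for identical points, and acos() would return NAN
$d=rad2deg(acos($d)); // central angle in degrees
$d=$d*60*1.1515; // 60 nautical miles per degree, ~1.15 statute miles per nautical mile
$d=($d*$u); // apply unit
$d=round($d); // optional
return $d;
}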
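As one example of a more refined formula, the haversine formula is better conditioned than the law of cosines for points that are close together, which is the normal case for neighbouring postcodes. This is a swapped-in technique, not the code from the article; the function name haversine_distance and the mean Earth radius of 3958.8 miles (pass 6371.0 for kilometres) are assumptions.

```php
<?php
// Haversine great-circle distance (hypothetical helper, swapped in as an
// alternative to the law of cosines). $radius defaults to miles.
function haversine_distance(float $lat1, float $lon1, float $lat2, float $lon2, float $radius = 3958.8): float
{
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    // min() guards against $a creeping above 1 through rounding.
    return 2 * $radius * asin(min(1.0, sqrt($a)));
}
```

One degree of longitude on the equator comes out at about 69.09 miles, matching the 60 * 1.1515 factor used in distance().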
So, that’s the hard parts done (database and maths), next is simply a case of using this information to “find the closest” from the postcode we input to an array of postcodes we supply…
To find the “closest” postcode, effectively what we’re trying to do is find the “shortest” distance between the postcodes, or, simply the smallest number in the results, assuming we put the results into an array with the key as the postcode and the distance as the value.
All we have to do is create a simple script that will find the smallest number in a given array, then return the appropriate key. Simple!
function closest ($needle,$haystack) {
if (!$needle || !$haystack) { return; }
if (!is_array($haystack)) { return; }
$smallest=min($haystack); //smallest value
foreach ($haystack as $key => $val) {
if ($val == $smallest) { return $key; }
}
}
The above script does exactly what we want, using the “min” function we can quickly work out what we need to return.
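closest() scans for the minimum itself. A shorter equivalent, sketched below under the hypothetical name closest_key(), lets array_search() find the key of min()'s result. The string cast matters because PHP stores numeric-string keys such as '680721' (the postal code from the question) as integers.

```php
<?php
// Hypothetical alternative to closest(): $distances maps postcode => distance.
function closest_key(array $distances): ?string
{
    if ($distances === []) {
        return null;
    }
    // array_search() returns the first key whose value is identical to the minimum.
    return (string) array_search(min($distances), $distances, true);
}
```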
The only task left is to bind all this together. We need to create two functions that will:
Get the distance using the postcode to get the longitude and latitude from the database.
Create an array with the postcodes as the keys, and the distance as the values.
Very simple!
Function 1, Postcode Distance
function postcode_distance ($from,$to) {
// Settings for if you have a different database structure
$table='postcodes_uk';
$lat='lat';
$lon='lon';
$postcode='postcode';
// Note: the mysql_* functions were removed in PHP 7; on current PHP use mysqli or PDO instead.
// This is a check to ensure we have a database connection
if (!@mysql_query('SELECT 0')) { return; }
// Simple regex to grab the first part of the postcode; bail out if there is none
if (!preg_match('/[A-Z]{1,2}[0-9R][0-9A-Z]?/',strtoupper($from),$match)) { return; }
$one=$match[0];
if (!preg_match('/[A-Z]{1,2}[0-9R][0-9A-Z]?/',strtoupper($to),$match)) { return; }
$two=$match[0];
// $one and $two only contain A-Z and 0-9 here, so they are safe to put into the SQL
$sql = "SELECT `$lat`, `$lon` FROM `$table` WHERE `$postcode`='$one'";
$query = mysql_query($sql);
$one = mysql_fetch_row($query);
$sql = "SELECT `$lat`, `$lon` FROM `$table` WHERE `$postcode`='$two'";
$query = mysql_query($sql);
$two = mysql_fetch_row($query);
$distance = distance($one[0], $one[1], $two[0], $two[1]);
// For debug only…
//echo "The distance between postcode: $from and postcode: $to is $distance miles\n";
return $distance;
}
Function 2, Postcode Closest
function postcode_closest ($needle,$haystack) {
if (!$needle || !$haystack) { return; }
if (!is_array($haystack)) { return; }
foreach ($haystack as $postcode) {
$results[$postcode]=postcode_distance($needle,$postcode);
}
return closest($needle,$results);
}
So, with that done, place the 4 above functions into a file such as “postcode.php”, ready for use in the real world…
Test case:
<?php
include_once('postcode.php');
if ($_POST) {
include_once('db.php');
$postcodes=array('TF9 9BA','ST4 3NP');
$input=strtoupper($_POST['postcode']);
$closest=postcode_closest($input,$postcodes);
}
if (isset($closest)) {
echo "The closest postcode is: $closest";
}
?>
<form action="" method="post">
Postcode: <input name="postcode" maxlength="9" /><br />
<input type="submit" />
</form>
You can download this script here: postcode_search.phps
Note: In the above test case, I have a “db.php” file which contains my database details and starts a database connection. I suggest you do the same.
Make sure your database is populated; you should be able to use Paul Jenkins’s UK Postcode CSV, which lets you use your own table structure.
Well, that’s all folks, I can now use this script to provide any locations that match the “closest” postcode.