Get first part of UK postcode only - php

I'm trying to get the first part of a UK postcode from a string that may have only the first part of the postcode or the full postcode in it. I'm struggling to make it work. I've got it working if the full postcode is entered by using a look-ahead, but I can't seem to make the look-ahead optional, so if only the first part of the postcode is entered it is matched.
My regex so far is ([A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW])(?=( ?[0-9][ABD-HJLNP-UW-Z]{2})))
I've got several postcodes that must match and these are the results using the above regex:
A10EA - Should match and does
A1 - Should match but doesn't
A10 0EA - Should match and does
A10 - Should match but doesn't
BH18 1AE - Should match and does
BH18AE - Should match and does
EC1M 6HJ - Should match and does
EC1M - Should match but doesn't
Z10 2EV - Shouldn't match and doesn't
QE3 6DA - Shouldn't match but matches E3 6DA
Can someone please help me solve this issue?
The RegEx I've been working from is the official one from the post office:
/^(GIR ?0AA|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?[0-9][ABD-HJLNP-UW-Z]{2})$/i
Before anyone flags this as a duplicate of PHP Find first part of UK postcode when full or part can be entered, it's not. The answer for that question doesn't work, see my comment to the answer.

According this wiki page the post code always ends in 'digit letter letter', that would be a regex pattern of \d\w\w$. Now we know how to spot what the end is, we just want to capture the rest.
A pattern like (\S*)\s*\d\w\w$ will work. That will capture the first half, and ensure that you do not get the last 'digit letter letter part. It will capture the first part by getting anything not white space, ie only letters and digits.
To fully explain this, the brackets () is what we are capturing. \S says 'any one non white space character, with \S*being all that we can get. so (\S*) captures everything up to a space character, but will capture everything if the user doesn't enter one. The full regex I provided will also try to capture 'any white space, one digit, two letters, end of string' which will ensure that AA999AA is split into AA99 and 9AA.
I've also just noticed though that your question states you might not actually have that second part. I think you could get around that by checking the string length. If you trim white space and the length is less than 5 characters, you must only have the first part, so no need for any regex.
disclaimer this will not work for Anguillan postcodes. To support their postcodes as well, I think (\S*)\s*(?:\d\w\w|-\d{4})$ would work.

I've been looking at this the wrong way. I want to get the first part of the postcode and remove the second part if present, so why not validate the postcode first, then check for an end and strip it if necessary.
I'm already validating the postcode, this is code I already had:
$validate = Validation::factory(array('postcode' => $postcode));
$validate->rule('postcode', 'not_empty');
$validate->rule('postcode', 'regex', array(':value', '/^(GIR ?(0AA)?|[A-PR-UWYZ]([0-9]{1,2}|([A-HK-Y][0-9]([0-9ABEHMNPRV-Y])?)|[0-9][A-HJKPS-UW]) ?([0-9][ABD-HJLNP-UW-Z]{2})?)$/i'));
if ( ! $validate->check())
{
$postcode = '';
}
So now I've added in this after it:
if ($postcode)
{
$short_postcode = $postcode;
// Check for an end section and then if present, remove it
if (preg_match('/ ?([0-9])[ABD-HJLNP-UW-Z]{2})$/i', $postcode, $match, PREG_OFFSET_CAPTURE))
{
$short_postcode = substr($postcode, 0, $match[0][1]);
}
}
and this leaves me with just the first part of the postcode, which is what I wanted. This Eval.in shows it working for all the examples in my question.

Related

PHP preg_replace: find string part not starting with an exclamation point

I am working on some very messy Excel sheets, and trying to use PHP to find clues..
I have a MySQL database with all formulas from an excel document, and as usual, the cellnames from the current sheet do not have a "sheetname!" in front of it. To make it searchable (and find dead-routes in the formulas) I like to replace all formulas in the database with their sheetname as prefix.
Example:
=+(sheet_factory_costs!A17/sheet_employees!D23)+T12+W12
The database contains the name of the current sheet, and I like to change the formula above with that sheetname (let's call it "sheet_turnover").
=+(sheet_factory_costs!A17 / sheet_employees!D23)+sheet_turnover!T12+sheet_turnover!W12
I try this in PHP with preg_replace, and I think I need the following rules:
Find one or two letters, directly followed by a number. This is always a cell-adress within formulas.
When there is a ! on the position before, there is already a sheetname. So I am only looking for the letters and numbers NOT starting with an exclamation point.
The problem seems to be that the ! is also a special sign within patterns. Even if I try to escape it, it does not work:
$newformula =
preg_replace('/(?<\!)[A-Z]{1,2}[0-9]/',
'lala',
$oldformula);
(lala is my temporary marker to see if it is selecting the right cell-adresses)
(and yes, the lala is only places over the first number, but that's no issue right now)
(and yes, all Excel $..$.. (permanent) markers have already been replaced. No need to build that in the formula)
Your negative lookbehind is corrupt, you need to define it as (?<!!). However, you also need to use either a word boundary before it, or a (?<![A-Z]) lookbehind to make sure you have no other letters before the [A-Z]{1,2}.
So, you may use
'~\b(?<!!)[A-Z]{1,2}[0-9]~'
See the regex demo. Replace with sheet_turnover!$0 where $0 is the whole match value.
Details
\b - a word boundary (it is necessary, or name!AA11 would still get matched)
(?<!!) - no ! immediately to the left of the current location
[A-Z]{1,2} - 1 or 2 letters
[0-9] - a digit.
Another approach is match and skip "wrong" contexts and then match and keep the "right" ones:
'~\w+![A-Z]{1,2}[0-9](*SKIP)(*F)|\b[A-Z]{1,2}[0-9]~'
See this regex demo.
Here, \w+![A-Z]{1,2}[0-9](*SKIP)(*F)| part matches 1 or more word chars, then 1 or 2 uppercase ASCII letters and then a digit, and (*SKIP)(*F) will omit the match and will make the engine proceed looking for matches after the end of the previous match.

Php preg_replace numbers characters

$my_string = '88888805';
echo preg_replace("/(^.|.$)(*SKIP)(*F)|(.)/","*",$,my_string);
This shows the first and last number like thus 8******5
But how can i show this number like this 888888**. (The last 2 number is hidden)
Thank you!
From this: 8******5
To: 888888**
I'm not sure if you have worked on this Regex pattern to do something unique. However, I will provide you with a general one that should fit your question without using your current pattern.
$my_string = '88888805';
echo preg_replace("/([0-9]+)[0-9]{2}$/","$1**",$,my_string);
Explanation:
The ([0-9]+) will match all digits, this could be replaced with \d+, it's between brackets to be captured as we are going to use it in the results.
[0-9]{2} is going to match the last 2 digits, again, it can be replaced with \d{2}, it's outside the brackets because we don't want to include them in the result. the $ after that is to indicate the end of the test, it's optional anyways.
Results:
Input: 88888805
Output: 888888**
echo preg_replace("/(.{2}$)(*SKIP)(*F)|(.)/","*",$my_string);
If it for a uni assignment, you'd probably want to do this. Basically says, don't match if its the last two characters, otherwise match.

Unique regex for name validation

I want to check is the name valid with regex PHP, but i need a unique regex that allows:
Letters (upper and lowercase)
Spaces (max 2)
But there can't be a space after space..
For example:
Name -> Dennis Unge Shishic (valid)
Name -> Denis(space)(space) (not valid)
Hope you guys understand me, thank you :)
First, it's worth mentioning that having such restrictive rules for the names of persons is a very bad idea. However, if you must, a simple character class like this will limit you to just uppercase and lowercase English letters:
[A-Za-z]
To match one or more, you need to add a + after it. So, this will match the first part of the name:
[A-Za-z]+
To capture a second name, you just need to do the same thing preceded by a space, so something like this will capture two names:
[A-Za-z]+ [A-Za-z]+
To make the second name optional, you need to surround it by parentheses and add a ? after it, like this:
[A-Za-z]+( [A-Za-z]+)?
And to add a third name, you just need to do it again:
[A-Za-z]+( [A-Za-z]+)? [A-Za-z]+
Or, you could specify that the latter names can repeat between 1 and 2 times, like this:
[A-Za-z]+( [A-Za-z]+){1,2}
To make the resulting code easy to understand and maintain, you could use two Regex. One checking (by requiring it to be true) that only the allowed characters are used ^[a-zA-Z ]+$ and then another one, checking (by requiring it to be false) that there are no two (or more) adjacent spaces ( ){2,}
Try following working code:
Change input to whatever you want to test and see correct validation result printed
<?php
$input_line = "Abhishek Gupta";
preg_match("/[a-zA-Z ]+/", $input_line, $nameMatch);
preg_match("/\s{2,}/", $input_line, $multiSpace);
var_dump($nameMatch);
var_dump($multiSpace);
if(count($nameMatch)>0){
if(count($multiSpace)>0){
echo "Invalid Name Multispace";
}
else{
echo "Valid Name";
}
}
else{
echo "Invalid Name";
}
?>
A regex for one to three words consisting of only Unicode letters in PHP looks like
/^\p{L}+(?:\h\p{L}+){1,2}\z/u
Description:
^ - string start
\p{L}+ - one or more Unicode letters
(?:\h\p{L}+){1,2} - one or two sequences of a horizontal whitespace followed with one or more Unicode letters
\z - end of string, even disallowing trailing newline that a dollar anchor allows.

Regex not validating consistently

I'm experiencing an usual problem and I can't seem to nail down the cause. I'm working on server side validation for a number of fields on a signup form on a site. This is the block of PHP:
if ('phone-name' == $tag->name) {
$value = $_POST[$tag->name];
if (!preg_match('/\+([0-9])([ .-]*\d){7,12}/', $value)) {
$result->invalidate($tag, "You must enter a valid number: $value is not valid");
}
}
The regex I am using should only allow the user to input:
the "+" sign
spaces, dots and hyphens
and any number from 0 to 9
However when I run the following tests I get these results:
TEST 1: INVALID INPUT
+3588<script>alert(1)</script>
TEST 2: VALID INPUT
+35388888888<script>alert(1)</script>
Is there something I am missing here? How come the regex works for TEST 2 and not for TEST 1? Any help is much appreciated.
Thanks
Because your regex is matching the number at the beginning of the string. You could use ^$ to match string length (beginning and end), truncate the string first, use strip_tags, etc.
Example:
http://www.regexr.com/3bmfm
^\+([0-9])([ .-]*\d){7,12}$
Remove the ^ and $ from beginning and end of pattern to see it match the second line.
Because you specified how many times a given pattern should be repeated by {7,12}.

Regex Capital letter combo

REGEX is something of a mystery to me. After searching on SO, I did download Espresso and went through the tutorial, but things still are not clicking for me. It may just be my specific need, but I haven't found any examples. What I want to do is find matches that are exactly two specific capital (or lowercase, mix) and then a string of numbers. Here are the cases I want to test against:
TL123
TL 123
tl123
tl 123
TLABC123
tlabc123
What I'm then trying to do is preg_replace the results for that match (and ultimately always return TL-123 - for example).
So, any letter or number combo after TL would return TL- and vice-versa. Any nudges in the right direction would be extremely helpful. Thanks!
Edit
It might actually be preg_match_all that I need for this.
To match the specified pattern, you can use:
TL(?:[^0-9]*)(\d+)
This will match a TL followed by anything that isn't a number (or nothing) and then a list of numbers.
You could use this with PHP's preg_replace() like:
$str = preg_replace('/TL(?:[^0-9]*)(\d+)/i', 'TL-$1', $str);
This example, of course, assumes that TL is the exact characters you want to match. If TL is just a placeholder and you could match anything, you could use the following:
preg_replace('/([a-z]{2})(?:[^0-9]*)(\d+)/i', '$1-$2', $str);
With this, I have it hardcoded to only allow 2 characters to match ({2}). You can modify this to any number if you need it to change.
Also, as you want the matched characters to always be uppercase, but can match lowercase, I would suggest to just use strtoupper() around the result (instead of a callback).

Categories