I need to split a UK postcode into two. I have some code that gets the first half but it doesn't cover everything (such as gir0aa). Does anyone have anything better that validates all UK postcodes then breaks it into the first and second half? Thanks.
function firstHalf($postcode) {
if(preg_match('/^(([A-PR-UW-Z]{1}[A-IK-Y]?)([0-9]?[A-HJKS-UW]?[ABEHMNPRVWXY]?|[0-9]?[0-9]?))\s?([0-9]{1}[ABD-HJLNP-UW-Z]{2})$/i',$postcode))
return preg_replace('/^([A-Z]([A-Z]?\d(\d|[A-Z])?|\d[A-Z]?))\s*?(\d[A-Z][A-Z])$/i', '$1', $postcode);
}
This will split ig62ts into ig6, or cm201ln into cm20.
The incode is always a single digit followed by two alpha characters, so the easiest way to split is to chop off the last three characters, allowing it to be validated easily.
Trim any spaces: they're used purely for ease of human readability.
The first part that remains is then the outcode. This can be a single alpha character followed by 1 or 2 digits; two alpha characters followed by 1 or 2 digits; or one or two characters followed by a single digit, followed by an additional alpha character.
There are a couple of notable exceptions: SAN TA1 is a recognised postcode, as is GIR 0AA; but these are the only two that don't follow the standard pattern.
To test if a postcode is valid, a regexp isn't really adequate... you need to do a lookup to retrieve that information.
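The chop-off-the-last-three-characters approach described above could be sketched roughly as follows. The function name splitPostcode is made up for illustration, and the shape check is only a format check under the rules listed in this answer; it does not tell you whether a postcode actually exists.

```php
<?php
// Hypothetical helper: returns [outcode, incode], or null if the shape is wrong.
function splitPostcode(string $postcode): ?array
{
    // Spaces are only there for human readability, so drop them all.
    $compact = strtoupper(preg_replace('/\s+/', '', $postcode));

    // The two exceptions mentioned above are handled explicitly.
    if ($compact === 'GIR0AA') {
        return ['GIR', '0AA'];
    }
    if ($compact === 'SANTA1') {
        return ['SAN', 'TA1'];
    }

    // The incode is the last three characters: one digit, then two letters.
    $incode  = substr($compact, -3);
    $outcode = substr($compact, 0, -3);
    if (!preg_match('/^\d[A-Z]{2}$/', $incode)) {
        return null;
    }

    // Outcode: one or two letters, a digit, then optionally a digit or a letter.
    if (!preg_match('/^[A-Z]{1,2}\d(?:\d|[A-Z])?$/', $outcode)) {
        return null;
    }

    return [$outcode, $incode];
}
```

With this sketch, splitPostcode('ig62ts') gives ['IG6', '2TS'] and splitPostcode('cm201ln') gives ['CM20', '1LN'].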
If you do not care about validation, then based on the information here (at the bottom of the page there are different regexps, including yours) http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom you can use the following for everything except Anguilla:
$str = "BX3 2BB";
// Lazy (.*?) plus \s* keeps the separating space out of the first part.
preg_match('#^(.*?)\s*(\d\w{2})$#', $str, $matches);
echo "Part #1 = " . $matches[1];
echo "<br>Part #2 = " . $matches[2];
I want to split a string as per the parameters laid out in the title. I've tried a few different things including using preg_match with not much success so far and I feel like there may be a simpler solution that I haven't clocked on to.
I have a regex that matches the "price" mentioned in the title (see below).
/(?=.)\£(([1-9][0-9]{0,2}(,[0-9]{3})*)|[0-9]+)?(\.[0-9]{1,2})?/
And here are a few example scenarios and what my desired outcome would be:
Example 1:
input: "This string should not split as the only periods that appear are here £19.99 and also at the end."
output: n/a
Example 2:
input: "This string should split right here. As the period is not part of a price or at the end of the string."
output: "This string should split right here"
Example 3:
input: "There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price"
output: "There is a price in this string £19.99, but it should only split at this point"
I suggest using
preg_split('~\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?(*SKIP)(*F)|\.(?!\s*$)~u', $string)
See the regex demo.
The pattern matches your price pattern, \£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?, and skips it with (*SKIP)(*F); otherwise, it matches a non-final . with \.(?!\s*$) (even if there are trailing whitespace chars).
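As a rough illustration of how that pattern behaves on the three examples from the question, here is a small sketch. The helper name firstSentence and the use of a limit of 2 in preg_split are my own choices for the demo, not part of the suggestion above.

```php
<?php
// Hypothetical helper: returns the text before the first qualifying period,
// or null when the string should not be split at all.
function firstSentence(string $text): ?string
{
    // Prices are matched and skipped; only a non-final period splits.
    $pattern = '~\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?(*SKIP)(*F)|\.(?!\s*$)~u';

    // Limit 2: stop after the first split point.
    $parts = preg_split($pattern, $text, 2);
    if ($parts === false || count($parts) < 2) {
        return null;
    }
    return $parts[0];
}

var_export(firstSentence('There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price'));
```

For Example 1 the helper returns null, since the only periods are inside the price and at the very end of the string.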
If you really only need to split on the first occurrence of the qualifying dot you can use a matching approach:
preg_match('~^((?:\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?|[^.])+)\.(.*)~su', $string, $match)
See the regex demo. Here,
^ - matches a string start position
((?:\£(?:[1-9]\d{0,2}(?:,\d{3})*|[0-9]+)?(?:\.\d{1,2})?|[^.])+) - Group 1: one or more occurrences of your currency pattern or any one char other than a . char
\. - a . char
(.*) - Group 2: the rest of the string.
To split a text into sentences while avoiding pitfalls such as dots or thousands separators in numbers and some abbreviations (like etc.), the best tool is IntlBreakIterator, which is designed to deal with natural language:
$str = 'There is a price in this string £19.99, but it should only split at this point. As I want it to ignore periods in a price';
$si = IntlBreakIterator::createSentenceInstance('en-US');
$si->setText($str);
$si->next();
echo substr($str, 0, $si->current());
IntlBreakIterator::createSentenceInstance returns an iterator that gives the indexes of the different sentences in the string.
It takes ?, ! and ... into account too. In addition to handling the pitfalls of numbers and prices, it also works well with this kind of string:
$str = 'John Smith, Jr. was running naked through the garden crying "catch me! catch me!", but no one was chasing him. His psychiatrist looked at him from the window with a circumspect eye.';
More about rules used by IntlBreakIterator here.
You could simply use this regex:
\.
Since you only have a space after the first sentence (and not a price), this should work just as well, right?
I am trying to use a regular expression to pick a phone number from a string, where the format of the phone number could be just about anything, or there may not be a phone number at all. For example:
$string = 'My phone number is +34 961 123456.';
$string = 'My phone number is +34 (961) 123456.';
$string = 'My phone number is 961-123456.';
$string = 'My phone number is +34.961.12.34.56.';
$string = 'Product A costs €100.00 and Product B costs €134.15.';
So far, I have got to
$number = preg_replace("/[^0-9\/\+\.\-\s]+/", "", $string);
$number = preg_replace("/[^0-9]+/", "", $number);
if (strlen($number)>8) {
/* It's a phone number, so do something with it */
}
This works for picking out all the different phone number formats that I have tried, but it also puts the prices together and assumes that they are a phone number too.
It seems that my problem is that a human can readily distinguish between a space between words and a space in the middle of a phone number, but how do I make the computer do that? Is there a way that I can replace spaces that are both preceded and followed by a number but leave other spaces intact? Is there some other way of sorting this out?
I'm afraid you aren't gonna like it. The regex I get is this:
(\+?[0-9]?[0-9]?[[:blank:],\.]?[0-9][0-9][0-9][[:blank:],\.]?[0-9][0-9][[:blank:],\.]?[0-9][0-9][[:blank:],\.]?[0-9][0-9])
Explanation:
( <-- opens a capturing group around the whole expression; probably not needed here
\+? <-- optional plus sign
[0-9]?[0-9]? <-- optional prefix code
[[:blank:],\.]? <-- optional space (or comma or dot) between the prefix code and the rest of the number
[0-9][0-9][0-9][[:blank:],\.]? <-- three-digit province code, followed by an optional separator
[0-9][0-9][[:blank:],\.]?[0-9][0-9][[:blank:],\.]?[0-9][0-9] <-- the number itself, made up of six digits in pairs with optional separators between them
Because these examples are for Spanish telephone numbers, aren't they?
In that case, you've forgotten to give us examples of other formats, like "91 123 45 67", that might complicate the solution even more.
For these cases, I humbly think the best solution is to write a small function. The regular expression is too complex to be maintainable.
Looks like you want sequences of nine to twelve digits, with nothing between them except spaces, parentheses, periods or dashes; and possibly preceded by +. Try this:
preg_match_all("/\+?(?:\d[-. ()]*){9,12}/", $string, $results);
This isn't quite perfect, since trailing punctuation (like the period that follows all your examples) will be included in the matched string. Post-process the list of results to trim it:
// preg_match_all puts the full matches in $results[0]
$trimmed = preg_replace("/[-. ]+$/", "", $results[0]);
Or you could standardize the collected phone numbers by removing all non-digits from the results, keeping just the digits and possibly an initial "+":
// preg_match_all puts the full matches in $results[0]
$numbers = preg_replace("/[-. ()]/", "", $results[0]);
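Put together, the match-then-standardize steps could look something like this. The function name extractPhoneNumbers is invented for the example, and the 9 to 12 digit range is just the assumption made in this answer.

```php
<?php
// Hypothetical helper: returns every phone-like sequence found in $text,
// reduced to digits plus an optional leading "+".
function extractPhoneNumbers(string $text): array
{
    // 9 to 12 digits, separated only by spaces, parentheses, periods or dashes.
    preg_match_all('/\+?(?:\d[-. ()]*){9,12}/', $text, $results);

    // $results[0] holds the full matches; strip the separators from each one.
    return preg_replace('/[-. ()]/', '', $results[0]);
}

print_r(extractPhoneNumbers('My phone number is +34 (961) 123456.'));
print_r(extractPhoneNumbers('Product A costs €100.00 and Product B costs €134.15.'));
```

The price string yields an empty array because neither price contains nine digits in a row, which is what keeps it from being mistaken for a phone number.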
I have a pretty large database with some data listed in this format, mixed up with another bunch of words in the keywords column.
BA 093, RJ 342, ES 324, etc.
The characters themselves always vary but the structure remains the same. I would like to change all of the strings that obey this character structure : 2 characters A-Z, space, 3 characters 0-9 to the following:
BA-093, RJ-342, ES-324, etc.
Be mindful that these strings are mixed up with a bunch of other strings, so I need to isolate them before replacing the empty space. Here is a sample string:
Km 111 aracoiaba Araçoiaba sp 270 spvias vias sao paulo Araçoiaba Bidirecional
sp 270 is the bit we want to change.
EDIT: There was also an exception which should ignore the condition in case KM are the first two characters, it was handled by one of the answers
I have written the beginning of the script, which picks up all the data and shows it in the browser while I look for a solution, but I'm unsure what to do with my if statement to isolate the strings and replace them. And since I'm using explode, it is probably splitting each of the codes above into two separate array elements, which further complicates things.
<?php
require 'includes/connect.php';
$pullkeywords = $db->query("SELECT keywords FROM main");
while ($result = $pullkeywords->fetch_object()) {
$separatekeywords = explode(" ", $result->keywords);
print_r ($separatekeywords);
echo "<br />";
}
Any help is appreciated. Thank you in advance.
This regex should do it.
([A-Z]{2})\h(\d{3})
That says: any character between A and Z, two times ({2}); a horizontal whitespace character (\h); then three ({3}) digits (\d). The ( and ) capture the values you want, so $1 and $2 hold the matched values.
Regex101 Demo: https://regex101.com/r/nU2yN0/1
PHP Usage:
$string = 'BA 093, RJ 342, ES 324';
echo preg_replace('~([A-Z]{2})\h(\d{3})~', '$1-$2', $string);
Output:
BA-093, RJ-342, ES-324
You may want (?:^|\h)([A-Z]{2})\h(\d{3}), which requires that the capital letters don't have text running into them. For example, in AB 345, cattleBE 123, BE 678, this regex wouldn't find cattleBE 123. I'm not sure what your intent is for a case like this, so I'll leave that to you.
The ?: makes the () non capturing there. The ^ is so the capital letters can be the start of the string. The | is or and the \h is another horizontal space. You could do \s in place of \h if you wanted to allow new lines as well.
Update:
(?!KM)([A-Z]{2})\h(\d{3})
This will ignore strings starting with KM. https://regex101.com/r/nU2yN0/2
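For the lowercase sample string in the question, a variant along these lines might be closer to what is needed. Note that the i and u flags and the \b word boundaries are my additions and are not part of the patterns above; the function name hyphenateCodes is also just illustrative.

```php
<?php
// Hypothetical helper: turns "sp 270" into "sp-270", but leaves "Km 111" alone.
function hyphenateCodes(string $keywords): string
{
    // \b keeps the two letters from being the tail of a longer word,
    // (?!KM) skips kilometre markers, i makes the match case-insensitive.
    return preg_replace('~\b(?!KM)([A-Z]{2})\h(\d{3})\b~iu', '$1-$2', $keywords);
}

echo hyphenateCodes('Km 111 aracoiaba Araçoiaba sp 270 spvias vias sao paulo Araçoiaba Bidirecional');
```

The original letter case is preserved, so sp 270 becomes sp-270 rather than SP-270; wrap the result in strtoupper or similar if the stored codes should be normalized.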
Is there a function or an easy way to strip phone numbers down to a specific format?
Input can be a number (mobile, different country codes)
maybe
+4917112345678
+49171/12345678
0049171 12345678
or maybe from another country
004312345678
+44...
I'm doing a
$mobile_new = preg_replace("/[^0-9]/","",$mobile);
to strip everything other than digits, because I need it in the format 49171... (without + or 00 at the beginning). But I also need to handle the cases where 00 is entered first, where someone uses +49(0)171, or where someone enters 0171 (which needs to become 49171).
So the first digits ALWAYS need to be the country code, without +/00 and without any (0) in between.
Can someone give me some advice on how to solve this?
You can use
(?:^(?:00|\+|\+\d{2}))|\/|\s|\(\d\)
to match most of your cases and simply replace them with nothing. For example:
$mobile = "+4917112345678";
$mobile_new = preg_replace("/(?:^(?:00|\+|\+\d{2}))|\/|\s|\(\d\)/","",$mobile);
echo $mobile_new;
//output: 4917112345678
regex101 Demo
Explanation:
I'm making use of OR here, matching each of your cases one by one:
(?:^(?:00|\+|\+\d{2})) matches 00, + or + followed by two numbers at the beginning of your string
\/ matches a / anywhere in the string
\s matches a whitespace character anywhere in the string (it matches the newline in the regex101 demo, but I suppose you match each number on its own)
\(\d\) matches a number enclosed in brackets anywhere in the string
The only case not covered by this regex is the input format 01712345678, as you can only guess what the country-specific prefix should be. If you want it to be 49 by default, then simply replace the leading single 0 of each such input with 49:
$mobile = "01712345678";
$mobile_new = preg_replace("/^0/","49",$mobile);
echo $mobile_new;
//output: 491712345678
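The two steps above could be combined into one small function, roughly like this. The name normalizeMobile and the $defaultCountryCode parameter are made up for the sketch, and the assumption that a single leading 0 means a national number is only as good as your input data.

```php
<?php
// Hypothetical helper: returns digits only, starting with the country code.
function normalizeMobile(string $input, string $defaultCountryCode = '49'): string
{
    // Drop an optional trunk "(0)", then everything that is not a digit or "+".
    $number = preg_replace('/\(0\)/', '', $input);
    $number = preg_replace('/[^0-9+]/', '', $number);

    // International prefix written as "+" or "00": keep what follows it.
    if (preg_match('/^(?:\+|00)(\d+)$/', $number, $m)) {
        return $m[1];
    }

    // National format with a single leading 0: assume the default country.
    if (preg_match('/^0(\d+)$/', $number, $m)) {
        return $defaultCountryCode . $m[1];
    }

    // Anything else is assumed to start with a country code already.
    return str_replace('+', '', $number);
}

echo normalizeMobile('+49(0)171 12345678');
```

Checking for the 00 prefix before the single 0 matters here, since otherwise 0049... would be treated as a national number.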
This pattern (49)\(?([0-9]{3})[\)\s\/]?([0-9]{8}) will split the number into three groups:
49 - country code
3 digits - area code
8 digits - number
After matching, you can construct the clean number by concatenating the groups as \1\2\3.
Demo: https://regex101.com/r/tE5iY3/1
If this doesn't suit you, then please explain more precisely what you want, with test input and expected output.
I recommend taking a look at LibPhoneNumber by Google and its port for PHP.
It has support for many formats and countries and is well-maintained. Better not to figure this out yourself.
https://github.com/giggsey/libphonenumber-for-php
$phoneUtil = \libphonenumber\PhoneNumberUtil::getInstance();
$usNumberProto = $phoneUtil->parse("+1 650 253 0000", "US");
I have a few thousand strings that have one of these two forms:
SomeT1tle-ThatL00ks L1k3.this - $3.57 KnownWord
SomeT1tle-ThatL00ks L1k3.that - 4.5% KnownWord
The SomeT1tle-ThatL00ks L1k3.this part may contain uppercase and lowercase characters, digits, periods, dashes, and spaces. It is always followed by a space-dash-space pattern.
I want to pull out the Title (the part before the space-dash-space separator) and the Amount, which is right before KnownWord.
So for these two strings I'd like:
SomeT1tle-ThatL00ks L1k3.this, $3.57 and
SomeT1tle-ThatL00ks L1k3.that, 4.5%.
This code works (using Perl-compatible regular expressions):
$my_string = "SomeT1tle-ThatL00ks L1k3.this - $3.57 KnownWord";
$pattern_title = "/^(.*?)\x20\x2d\x20/";
$pattern_amount = "/([0-9.$%]+) KnownWord$/";
preg_match_all($pattern_title, $my_string, $matches_title);
preg_match_all($pattern_amount, $my_string, $matches_amount);
echo $matches_title[1][0] . " " . $matches_amount[1][0] . "<br>";
I tried putting both patterns together:
$pattern_together_doesnt_work = "/^(.*?)\x20\x2d\x20([0-9.$%]+) KnownWord$/";
but the first part of the pattern always matches the whole thing, even with the "lazy" part (.*? rather than .*). I can't negative-match spaces and dashes, because the title itself can contain either.
Any hints?
Use this pattern
/^(.*?)\x20\x2d\x20([0-9.$%]+) KnownWord$/
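A quick way to check that pattern against the two sample strings, assuming the real data looks exactly like them, is something like the following. The helper name parseLine is invented for the example; the strings are single-quoted so that PHP never tries to interpolate anything after the $ sign.

```php
<?php
// Hypothetical helper: returns [title, amount], or null if the line does not match.
function parseLine(string $line): ?array
{
    // Lazy title up to " - ", then the amount right before " KnownWord".
    $pattern = '/^(.*?)\x20\x2d\x20([0-9.$%]+) KnownWord$/';
    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return [$m[1], $m[2]];
}

foreach ([
    'SomeT1tle-ThatL00ks L1k3.this - $3.57 KnownWord',
    'SomeT1tle-ThatL00ks L1k3.that - 4.5% KnownWord',
] as $line) {
    [$title, $amount] = parseLine($line);
    echo $title . ', ' . $amount . "\n";
}
```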