I want to extract the area code from a UK postcode using a regular expression. For example this would be removing "SW" from "SW11 1AW". The area is always the first characters of the string and is always followed by a number. I can't just extract the first two characters as sometimes there is only one letter, eg "E1 4PN". So it needs to match A-Z only from the start of the string until it hits a number and return just the letters. For the sake of argument the string will always be upper case.
Thanks.
(assuming PHP)
$letters = preg_replace('#^([a-z]+).*#i','$1',$postcode);
In ruby this would look like:
postcode = 'SW11 1AW'
postcode[/^[a-z]+/i] # get the area code
#=> "SW"
postcode[/^[a-z]+(.*)/i,1] # get the rest
#=> "11 1AW"
Note: the i flag (ignore case) is set, so both uppercase and lowercase letters match.
^(?i)[a-z]+(?=\d)
This matches the leading letters that come directly before the first digit (two letters if there are two, one if there is only one), regardless of case.
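As a rough PHP sketch of how a lookahead pattern like this could be used, the helper below returns just the area letters. The function name postcodeArea is made up for illustration, and the trim/strtoupper calls are assumptions about how you might want to normalise input.

```php
<?php
// Hypothetical helper: returns the leading letters of a postcode,
// or null when the string does not start with letters followed by a digit.
function postcodeArea(string $postcode): ?string
{
    // ^ anchors at the start, [a-z]+ takes the letters,
    // (?=\d) requires a digit to follow without consuming it.
    if (preg_match('/^[a-z]+(?=\d)/i', trim($postcode), $m)) {
        return strtoupper($m[0]);
    }
    return null;
}

echo postcodeArea('SW11 1AW'), "\n"; // SW
echo postcodeArea('E1 4PN'), "\n";   // E
```

Returning null for non-matching input lets the caller tell "no area found" apart from an empty string.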
We would like to validate a number field on our form with a PHP preg_match regular expression.
That “number field” must contain at most 13 digits, and only digits (no letters or anything else).
The very first digit must be “1” or “2” (nothing else).
The 4th and 5th digits together represent someone's “month of birth”, so the 4th digit must be "0" or "1", and the 5th must be between "1" and "9".
We would really appreciate help with the right regular expression syntax for preg_match to validate that field on our form! :)
Thanks to the community for your support and help!
Regards
Here is the literal regex pattern you have described to us:
^[12]\d{2}(?:0[1-9]|1[0-2])\d{8}$
Sample script:
$input = "1231212345678";
if (preg_match("/^[12]\d{2}(?:0[1-9]|1[0-2])\d{8}$/", $input)) {
echo "MATCH";
}
This regex pattern says to:
^ from the start of the string
[12] match 1 or 2 as the first digit
\d{2} then match any digits in the 2nd and 3rd position
(?:0[1-9]|1[0-2]) match 01, 02, ..., 12 as the two digit month
\d{8} then match any other 8 digits
$ end of string
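The breakdown above can be wrapped in a small validation function. This is only a sketch: the name isValidFormNumber is hypothetical, and the D modifier is an addition of mine (it stops $ from also matching just before a trailing newline).

```php
<?php
// Hypothetical wrapper around the pattern explained above.
function isValidFormNumber(string $input): bool
{
    // Single quotes keep the backslashes and the $ anchor literal in PHP.
    // D (dollar end only) is an assumption: it rejects input ending in "\n".
    return preg_match('/^[12]\d{2}(?:0[1-9]|1[0-2])\d{8}$/D', $input) === 1;
}

var_dump(isValidFormNumber('1231212345678')); // bool(true)
var_dump(isValidFormNumber('3231212345678')); // bool(false), first digit is not 1 or 2
var_dump(isValidFormNumber('1231312345678')); // bool(false), month 13 is rejected
```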
I have a value like this: 73b6424b. I want to split it into two parts, 73b6 and 424b, then swap the two parts and concatenate them to get 424b73b6. I have already done it this way:
$substr_device_value = '73b6424b';
$first_value = substr($substr_device_value,0,4);
$second_value = substr($substr_device_value,4,4);
$final_value = $second_value.$first_value;
Is there an easier way than what I have done? If so, please suggest an approach.
You may use
preg_replace('~^(.{4})(.{4})$~', '$2$1', $s)
See the regex demo
Details
^ - matches the string start position
(.{4}) - captures any 4 chars into Group 1 ($1)
(.{4}) - captures any 4 chars into Group 2 ($2)
$ - end of string.
The '$2$1' replacement pattern swaps the values.
NOTE: If you want to pre-validate the data before swapping, you may replace . pattern with a more specific one, say, \w to only match word chars, or [[:alnum:]] to only match alphanumeric chars, or [0-9a-z] if you plan to only match strings containing digits and lowercase ASCII letters.
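A minimal sketch of the swap with pre-validation, assuming the values are always 8 lowercase hex digits (that assumption comes from the sample value 73b6424b, not from a stated requirement). The function name swapHalves is made up for illustration.

```php
<?php
// Hypothetical helper: swaps the two 4-character halves of an 8-character value.
function swapHalves(string $s): ?string
{
    // Assumption: valid input is exactly 8 lowercase hex digits.
    if (!preg_match('/^[0-9a-f]{8}$/D', $s)) {
        return null; // signal invalid input instead of returning it unchanged
    }
    // Group 1 is the first half, group 2 the second; '$2$1' swaps them.
    return preg_replace('~^(.{4})(.{4})$~', '$2$1', $s);
}

echo swapHalves('73b6424b'), "\n"; // 424b73b6
```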
Is there a function or an easy way to strip down phone numbers to a specific format?
Input can be a number (mobile, different country codes)
maybe
+4917112345678
+49171/12345678
0049171 12345678
or maybe from another country
004312345678
+44...
I'm doing a
$mobile_new = preg_replace("/[^0-9]/","",$mobile);
to strip everything that is not a digit, because I need the number in the format 49171 (without + or 00 at the beginning). But I also need to handle a leading 00, input like +49(0)171, and input like 0171 (which needs to become 49171).
So the number must ALWAYS start with the country code, without +/00 and without any (0) in between.
Can someone give me advice on how to solve this?
You can use
(?:^(?:00|\+|\+\d{2}))|\/|\s|\(\d\)
to match most of your cases and simply replace them with nothing. For example:
$mobile = "+4917112345678";
$mobile_new = preg_replace("/(?:^(?:00|\+|\+\d{2}))|\/|\s|\(\d\)/","",$mobile);
echo $mobile_new;
//output: 4917112345678
regex101 Demo
Explanation:
I'm making use of OR here, matching each of your cases one by one:
(?:^(?:00|\+|\+\d{2})) matches 00, + or + followed by two numbers at the beginning of your string
\/ matches a / anywhere in the string
\s matches a whitespace character anywhere in the string (it matches the newline in the regex101 demo, but I suppose you match each number on its own)
\(\d\) matches a number enclosed in brackets anywhere in the string
The only case not covered by this regex is the input format 01712345678, as you can only take a guess what the country specific prefix can be. If you want it to be 49 by default, then simply replace each input starting with a single 0 with the 49:
$mobile = "01712345678";
$mobile_new = preg_replace("/^0/","49",$mobile);
echo $mobile_new;
//output: 491712345678
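The two steps could be combined into one helper along these lines. This is a sketch, not a complete phone parser: the name normalizeMobile and the $defaultCountry parameter are hypothetical, and defaulting to 49 is the same guess described above.

```php
<?php
// Hypothetical helper combining both replacements.
function normalizeMobile(string $mobile, string $defaultCountry = '49'): string
{
    // Drop a leading "00" or "+", any bracketed digit such as "(0)",
    // and slashes or whitespace anywhere in the string.
    $digits = preg_replace('/^(?:00|\+)|\(\d\)|[\/\s]/', '', trim($mobile));
    // Remove whatever non-digit characters are left (dashes, dots and so on).
    $digits = preg_replace('/\D/', '', $digits);
    // A single leading 0 means a national number: assume the default country.
    return preg_replace('/^0/', $defaultCountry, $digits);
}

echo normalizeMobile('+49171/12345678'), "\n";    // 4917112345678
echo normalizeMobile('0049171 12345678'), "\n";   // 4917112345678
echo normalizeMobile('+49(0)171 12345678'), "\n"; // 4917112345678
echo normalizeMobile('0171 12345678'), "\n";      // 4917112345678
```

The order matters: the international prefix is removed first, so only a genuinely national number still starts with 0 when the last replacement runs.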
This pattern (49)\(?([0-9]{3})[\)\s\/]?([0-9]{8}) splits the number into three groups:
49 - country code
3 digits - area code
8 digits - number
After matching, you can build the clean number by concatenating the groups: \1\2\3.
Demo: https://regex101.com/r/tE5iY3/1
If this does not suit you, please explain more precisely what you want, with test input and expected output.
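In PHP, the three groups could be joined like this. It is a sketch under the same assumptions as the pattern itself (country code 49, a 3-digit area code, an 8-digit number); the function name cleanGermanNumber is made up for illustration.

```php
<?php
// Hypothetical helper: extracts country code, area code and number, then joins them.
function cleanGermanNumber(string $input): ?string
{
    // The pattern is not anchored, so a leading "+" or "00" is simply skipped.
    if (preg_match('/(49)\(?([0-9]{3})[\)\s\/]?([0-9]{8})/', $input, $m)) {
        return $m[1] . $m[2] . $m[3];
    }
    return null; // input does not fit the 49 + 3 + 8 digit layout
}

echo cleanGermanNumber('+49171/12345678'), "\n";  // 4917112345678
echo cleanGermanNumber('0049171 12345678'), "\n"; // 4917112345678
```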
I recommend taking a look at LibPhoneNumber by Google and its port for PHP.
It has support for many formats and countries and is well-maintained. Better not to figure this out yourself.
https://github.com/giggsey/libphonenumber-for-php
$phoneUtil = \libphonenumber\PhoneNumberUtil::getInstance();
$usNumberProto = $phoneUtil->parse("+1 650 253 0000", "US");
As in this question, I can split strings that include upper-case letters like this:
function splitAtUpperCase($string){
return preg_replace('/([a-z0-9])?([A-Z])/','$1 $2',$string);
}
$string = 'setIfUnmodifiedSince';
echo splitAtUpperCase($string);
Output is "set If Unmodified Since"
But I need some modifications:
That code snippet doesn't handle strings containing these characters: ÇÖĞŞÜİ. I don't want to transliterate the characters, because then the word loses its meaning. I need to keep the UTF-8 characters. That code turns "HereÇonThen" into "HereÇon Then".
I also don't want to split upper-case abbreviations. "IKnowYouWillComeASAPHere" should be converted to "I Know You Will Come ASAP Here".
Don't split if all letters are upper case, as in "DONTCOMEHERE".
Also split off numeric values: "Before2013ends" becomes "Before 2013 ends".
Split if the first character is a hash sign (#).
cases and expected results
"comeHEREtomorrow" => "come HERE tomorrow"
"KissYouTODAY" => "kiss you TODAY"
"comeÜndeHere" => "come Ünde Here"
"NEVERSAYIT" => "NEVERSAYIT"
"2013willCome" => "2013 will Come"
"Before2013ends" => "Before 2013 ends"
"IKnowThat" => "I Know That"
"#whatiknow" => "# whatiknow"
For these cases I use subsequent str_replace operations. I'm looking for a short solution that doesn't need many for loops to check the words. A preg_replace or something similar would be better, if possible.
Edit: Anyone can try his solution by changing convert function inside this PHP fiddle: http://ideone.com/9gajZ8
/([[:lower:][:digit:]])?([[:upper:]]+)/u should do it.
Here /u enables Unicode matching, and ([[:upper:]]+) matches a sequence of upper-case letters.
Note: the case of a letter depends on the character set you are using.
Some notes:
Use Unicode properties to search for upper-case & lower-case letters (and even title-case ones, e.g. Dž Lj Nj Dz)
comeHEREtomorrow & IKnowThat won't work with one method, until you use some dictionaries to find exact words.
Because if you want to translate comeHEREtomorrow as come HERE tomorrow, IKnowThat will be IK now That (or even IK now T hat);
And if you want to translate IKnowThat as I Know That, comeHEREtomorrow will be come H E R E tomorrow
My solution: http://ideone.com/oALyTo (excludes non-letter & non-number characters)
Well, I matched all of your test cases, but I still don't think it's a good solution. (One of the few flaws in test driven design).
I took a slightly different approach. Instead of trying to write a regular expression for what the place between a word should look like, I wrote a regular expression that looks for everything that apparently is a word, and then imploded.
function convert($keyword) {
$wResult = preg_match_all('/(^I|[[:upper:]]{2,}|[[:upper:]][[:lower:]]*|[[:lower:]]+|\d+|#)/u', $keyword, $matches);
return implode(' ',$matches[0]);
}
As you can see, this is what I decided qualified as a word:
^I A capital I at the beginning of the string. Break point: Icons.
[[:upper:]]{2,} Consecutive capitals. Break Point: WellIKnowThat
[[:upper:]][[:lower:]]* A single Capital followed by some lower case letters
[[:lower:]]+ A string of lower case letters
\d+ A string of digits
# A literal #
It's not perfect; there are still many break points. You can continue to refine these word definitions, but frankly, there's always going to be an edge case you can't catch. Then you wind up slowly expanding this regular expression until it's totally unmanageable. You could try using a dictionary, but that breaks down eventually, too. What do you do with "whirlwind"? Or "ITan"? Is that "IT an", or "I Tan"? Case in point: here it is after I tried to catch some of my errors. It's getting so huge, and it's still trivial to come up with strings it breaks on. This function is all about degrees: how much time is it worth spending to teach your algorithm all the funny points of all the world's languages?
EDIT: After some work, and after deciding that I could be separated out as its own word if and only if it is followed immediately by one capital letter and one lower-case letter, I've updated my attempt at an answer.
function convert($keyword, $debug = false) {
$wResult = preg_match_all('/I(?=[[:upper:]][[:lower:]])|[[:upper:]]{2,}|[[:upper:]][[:lower:]]*|[[:lower:]]+|\d+|#/u', $keyword, $matches);
if($debug){
var_dump($matches);
var_dump($matches[0]);
var_dump(implode(' ',$matches[0]));
}
return implode(' ',$matches[0]);
}
I also added some new test cases:
convert("Icons") == "Icons"
convert("WellIKnowThat") == "Well I Know That"
convert("ITan") == "I Tan"
convert("whirlwind") == "whirlwind"
I think this is about as good as it's going to get today. The final set of "Word Definitions" in order of preference, is:
Upper case I, provided it's followed by an upper case letter and a lower case letter: I(?=[[:upper:]][[:lower:]])
Two or more consecutive upper case letters: [[:upper:]]{2,}
A single uppercase Letter, followed by as many Lower case letters as possible: [[:upper:]][[:lower:]]*
one or more consecutive lower case letters: [[:lower:]]+
One or more consecutive digits: \d+
A literal pound symbol: #
I've added another word definition, a test case, and refined the testing fiddle. The new word definition matches the rule for I, but with A - the only other one letter word in the English Language.
You need Unicode regex properties:
\p{Lu} for uppercase and \p{Ll} for lowercase.
Hence, your usage will look like this (the u modifier is needed so the subject is read as UTF-8):
/([\p{Ll}0-9])?([\p{Lu}])/u
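Here is one way the Unicode properties might be applied to the original splitAtUpperCase function. Note that this sketch swaps the optional capture group for a lookbehind/lookahead pair, which is my own variation: it avoids inserting a space before a leading capital. It only covers the lower-to-upper split, not the abbreviation or digit rules from the question, and it assumes the source file and input are UTF-8.

```php
<?php
// Variation on the question's function using Unicode properties and lookarounds.
function splitAtUpperCase(string $string): string
{
    // Insert a space at every position that has a lower-case letter or digit
    // behind it and an upper-case letter ahead of it. The u modifier makes
    // PCRE read the subject as UTF-8, so \p{Lu} also covers letters like Ç or Ü.
    return preg_replace('/(?<=[\p{Ll}0-9])(?=\p{Lu})/u', ' ', $string);
}

echo splitAtUpperCase('setIfUnmodifiedSince'), "\n"; // set If Unmodified Since
echo splitAtUpperCase('HereÇonThen'), "\n";          // Here Çon Then
echo splitAtUpperCase('comeÜndeHere'), "\n";         // come Ünde Here
```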
I have a string that contains 5 words. One of the words is a ham radio call sign, which can be any one of the thousands of call signs in the US. To extract the call sign from the string I need to use the patterns below. The call sign can be in any of the 5 positions in the string. The number is never the first character and never the last character. The string is actually put together from an array, since it is originally read from a text file.
$string = $word[1] $word[2] $word[3] etc....
So the search can be either done on the whole string or each piece of the array.
Patterns:
1 Number and 3 Letters Example: AB4C A4BC
1 Number and 4 Letters Example: A4BCD
1 Number and 5 Letters Example: AB4CDE
I have tried everything I can think of and searched until I can't search any more. I am sure I am overthinking this.
A two-step regular expression like this would do it:
$str = "hello A4AB there BC5AD";
$signs = array();
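// Step 1: 4 to 6 characters, starting and ending with a capital letter
preg_match_all('/[A-Z][A-Z\d]{2,4}[A-Z]/', $str, $possible_signs);
foreach ($possible_signs[0] as $possible_sign) {
    // Step 2: keep only candidates that contain exactly one digit
    if (preg_match('/^\D+\d\D+$/', $possible_sign)) {
        array_push($signs, $possible_sign);
    }
}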
print_r($signs); //Array ([0] => A4AB [1] => BC5AD)
Explanation
This is a regular expression approach, using two patterns. I don't think it could be done with one and still satisfy the exact requirements of the matching rules.
The first pattern enforces the following requirements:
substring starts and ends with a capital letter
substring contains only other capital letters or numbers between the first and last letter
substring is, overall, not more than 6 characters long
What I can't do in that same pattern, for complex REGEX reasons I won't go into (unless someone knows a way and can correct me), is enforce that only one number is contained.
@jeroen's answer does enforce this in a single pattern, but in turn does not enforce the correct length of the substring. Either way, we need a second pattern.
So after grabbing the initial matches, we loop over the results. We then apply each to a second pattern that enforces simply that there is only one number in the substring.
If so, we green-light the substring and it's added to the $signs array.
Hope this helps.
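For what it's worth, a lookahead may allow both checks in one pattern. The sketch below is an alternative to the two-step approach, not a correction of it: it assumes call signs are whole words of 4 to 6 characters made of capital letters and exactly one digit, and the function name findCallSigns is made up for illustration.

```php
<?php
// Hypothetical single-pattern variant using a lookahead for the length check.
function findCallSigns(string $text): array
{
    // (?=[A-Z\d]{4,6}\b) checks that the whole word is 4 to 6 capitals/digits;
    // [A-Z]+\d[A-Z]+ then allows exactly one digit, never first or last.
    preg_match_all('/\b(?=[A-Z\d]{4,6}\b)[A-Z]+\d[A-Z]+\b/', $text, $matches);
    return $matches[0];
}

print_r(findCallSigns('hello A4AB there BC5AD')); // A4AB and BC5AD
```

The lookahead does not consume characters, so the length rule and the one-digit rule are both applied to the same word.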
It depends on what the other words can contain, but you could use a regular expression like:
#\b[a-z]+\d[a-z]+\b#i
                    ^ case insensitive
 ^^ a word boundary
   ^^^^^^ One or more letters
         ^^ One number
You can make it more restrictive by using {1,3} instead of + for the letters so that you have a sequence of 1 to 3 letters.
The complete expression would be something like:
$success = preg_match('#\b[a-z]+\d[a-z]+\b#i', $input_string, $matches);
where $matches[0] will contain the matched value, see the manual.