I need to remove everything but the numbers and, if it exists, one letter from a string. It's a street name from which I need to extract the house number. There may be more content after the number, but not necessarily.
The original string is something like:
Wagnerstrasse 3a platz53,eingang 3,Zi.3005
I extract the street with number like this:
preg_match('/^([^\d]*[^\d\s]) *(\d.*)$/', $address, $match);
Then I do an if check on the number part of "Wagnerstrasse 3a":
if (preg_replace("/[^0-9]/","",$match[2]) == $match[2])
I need to change the regex so that it also gets one following letter, even if there is a space in between, but only if it is a single letter, so that my if is true for this condition. Better still would be a regex that removes everything except the following:
Wagnerstrasse 3a <-- expected result: 3a
Wagnerstrasse 3 a <-- expected result: 3 a
Wagnerstrasse 3 <-- expected result: 3
Wagnerstrasse 3 a bac <-- expected result: 3 a
You can try something like this that uses word boundaries:
preg_match('~\b\d+(?: ?[a-z])?\b~', $txt, $m)
The letter sits in an optional group with an optional space before it. Even if there is no letter, the final word boundary still matches between the last digit and whatever follows it (space, comma, end of the string...).
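A minimal sketch of how this pattern could be wrapped and run over the four examples from the question. The function name `extractHouseNumber` is hypothetical, and the `i` flag is an addition so that `3A` and `3 A` are accepted too; treat it as an illustration rather than a tested address parser.

```php
<?php
// Hypothetical helper: returns the first "digits + optional single letter" token, or null.
function extractHouseNumber(string $address): ?string
{
    // \b\d+         a run of digits starting at a word boundary
    // (?: ?[a-z])?  optionally one space and one letter
    // \b            the token must end a word, so "3 bac" does not yield "3 b"
    // /i (added here) also accepts uppercase letters such as "3A"
    if (preg_match('~\b\d+(?: ?[a-z])?\b~i', $address, $m)) {
        return $m[0];
    }
    return null;
}

foreach (['Wagnerstrasse 3a', 'Wagnerstrasse 3 a', 'Wagnerstrasse 3', 'Wagnerstrasse 3 a bac'] as $s) {
    echo $s, ' => ', extractHouseNumber($s), "\n";
}
```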
Note: to avoid matching a number in the street name, you can try to anchor your pattern to the first comma with a lookahead, for example:
preg_match('~\b\d+(?: ?[a-z])?\b(?= [^\s]*,)~', $txt, $m)
I'll leave it to you to adapt this subpattern to your cases.
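To illustrate what the lookahead buys you, here is a small comparison on an illustrative address whose street name itself contains a number (the sample string is made up for this sketch). The lookahead requires one space, one run of non-space characters and then a comma after the candidate:

```php
<?php
$txt = 'Strasse des 17. Juni 3a platz53,eingang 3,Zi.3005';

// Plain pattern: the first candidate is the 17 from the street name.
preg_match('~\b\d+(?: ?[a-z])?\b~', $txt, $plain);

// Anchored pattern: only a token followed by " <non-spaces>," qualifies.
preg_match('~\b\d+(?: ?[a-z])?\b(?= [^\s]*,)~', $txt, $anchored);

echo $plain[0], ' vs ', $anchored[0], "\n";
```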
<?php
$s1 = 'Wagnerstrasse 3 platz53,eingang 3,Zi.3005';
$s2 = 'Wagnerstrasse 3a platz53,eingang 3,Zi.3005';
$s3 = 'Wagnerstrasse 3A platz53,eingang 3,Zi.3005';
$s4 = 'Wagnerstrasse 3 a platz53,eingang 3,Zi.3005';
$s5 = 'Wagnerstrasse 3 A platz53,eingang 3,Zi.3005';
//test all $s
foreach ([$s1, $s2, $s3, $s4, $s5] as $s) {
    // [A-Za-z] = ASCII letters; [A-z] would also match [ \ ] ^ _ and `
    preg_match('#^(.+? [0-9]* *[A-Za-z]?)[^A-Za-z]#', $s, $m);
    //if you want only the street number
    //preg_match('#^.+? ([0-9]* *[A-Za-z]?)[^A-Za-z]#', $s, $m);
    echo $m[1], "\n";
}
?>
After more research and hours of checking addresses (so many addresses), I found a solution which, so far, hasn't failed. I might have missed something, but it seems quite good, and it's a regex you may not have seen before. The regex fails if there are no numbers in the line, so I added a hack (mind the long run of nines...).
Basically, the regex is excellent at finding numbers at the end and preserves numbers in the middle of the text, but it fails in the case mentioned above and when the street starts with a number. So I added another little hack: I move the leading number to the end of the string and catch it as the house number.
if ($this->startsWithNumber($data))
{
    // Move the leading number to the end: "3 Main St" -> "Main St 3"
    $tmp = explode(' ', $data);
    $data = trim(substr($data, strlen($tmp[0]))) . ' ' . $tmp[0];
}
// Sentinel appended to lines without any digit, so the regex below still matches
$sentinel = str_repeat('9', 70);
if (!preg_match('/[0-9]/', $data))
{
    $data .= ' ' . $sentinel;
}
$data = preg_replace("/[^ \w]+/", '', $data);
$pcre = '/\A\s*
(.*?) # street
\s*
\x2f? # slash
(
\pN+\s*[a-zA-Z]? # number + letter
(?:\s*[-\x2f\pP]\s*\pN+\s*[a-zA-Z]?)* # cut
) # number
\s*\z/ux';
preg_match($pcre, $data, $h);
if (isset($h[2]) && strpos($h[2], $sentinel) !== false) {
    $h[2] = null; // the "number" was only the sentinel
}
$this->receiverStreet[] = isset($h[1]) ? $h[1] : null;
$this->receiverHouseNo[] = isset($h[2]) ? $h[2] : null;
public function startsWithNumber($str)
{
return preg_match('/^\d/', $str) === 1;
}
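As an alternative to the sentinel run of nines, a sketch could simply check `preg_match()`'s return value, which is `0` when the line contains no number at all. The helper name `splitStreetAndNumber` is hypothetical; it reuses the regex above unchanged but skips the punctuation-stripping step, so it assumes inputs shaped like "Street 12a":

```php
<?php
// Hypothetical helper: splits "Street 12a" into [street, houseNumber].
// No sentinel: if the regex does not match, the house number is null.
function splitStreetAndNumber(string $data): array
{
    $data = trim($data);
    // Same idea as above: move a leading number ("3 Main St") to the end.
    if (preg_match('/^(\d\S*)\s+(.*)$/s', $data, $lead)) {
        $data = $lead[2] . ' ' . $lead[1];
    }
    $pcre = '/\A\s*
        (.*?)                                     # street
        \s*
        \x2f?                                     # slash
        (
            \pN+\s*[a-zA-Z]?                      # number + letter
            (?:\s*[-\x2f\pP]\s*\pN+\s*[a-zA-Z]?)* # cut
        )                                         # number
        \s*\z/ux';
    if (preg_match($pcre, $data, $h)) {
        return [$h[1], $h[2]];
    }
    return [$data, null];
}

var_export(splitStreetAndNumber('Wagnerstrasse 3 a'));
```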
How would I go about splitting the word:
oneTwoThreeFour
into an array so that I can get:
one Two Three Four
with preg_match?
I tried this, but it just gives the whole word:
$words = preg_match("/[a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/", $string, $matches);
You can use preg_split as:
$arr = preg_split('/(?=[A-Z])/',$str);
I'm basically splitting the input string just before each uppercase letter. The regex used, (?=[A-Z]), matches the position just before an uppercase letter.
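A minimal sketch of the split on the sample word. The second call is an addition: for input that starts with a capital, the position before the first letter can also match, and `PREG_SPLIT_NO_EMPTY` drops any resulting empty piece:

```php
<?php
// Zero-width split just before each uppercase letter.
$words = preg_split('/(?=[A-Z])/', 'oneTwoThreeFour');

// Leading capital: ask preg_split() to drop empty pieces.
$title = preg_split('/(?=[A-Z])/', 'StartsWithCap', -1, PREG_SPLIT_NO_EMPTY);

echo implode(' ', $words), "\n";
echo implode(' ', $title), "\n";
```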
You can also use preg_match_all as:
preg_match_all('/((?:^|[A-Z])[a-z]+)/',$str,$matches);
Explanation:
( - Start of capturing parenthesis.
(?: - Start of non-capturing parenthesis.
^ - Start anchor.
| - Alternation.
[A-Z] - Any one capital letter.
) - End of non-capturing parenthesis.
[a-z]+ - One or more lowercase letters.
) - End of capturing parenthesis.
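A small sketch of this pattern in use, reading the words from capture group 1. The second input is an illustrative addition showing a limitation: an all-caps run such as NASA can never satisfy `[A-Z][a-z]+`, so it is silently skipped:

```php
<?php
preg_match_all('/((?:^|[A-Z])[a-z]+)/', 'oneTwoThreeFour', $m);
$words = $m[1];

// Acronyms fall through: no position inside "NASA" starts a capital+lowercase run.
preg_match_all('/((?:^|[A-Z])[a-z]+)/', 'newNASAModule', $m2);
$lossy = $m2[1];

echo implode(' ', $words), ' | ', implode(' ', $lossy), "\n";
```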
I know that this is an old question with an accepted answer, but IMHO there is a better solution:
<?php // test.php Rev:20140412_0800
$ccWord = 'NewNASAModule';
$re = '/(?#! splitCamelCase Rev:20140412)
# Split camelCase "words". Two global alternatives. Either g1of2:
(?<=[a-z]) # Position is after a lowercase,
(?=[A-Z]) # and before an uppercase letter.
| (?<=[A-Z]) # Or g2of2; Position is after uppercase,
(?=[A-Z][a-z]) # and before upper-then-lower case.
/x';
$a = preg_split($re, $ccWord);
$count = count($a);
for ($i = 0; $i < $count; ++$i) {
printf("Word %d of %d = \"%s\"\n",
$i + 1, $count, $a[$i]);
}
?>
Note that this regex (like codaddict's '/(?=[A-Z])/' solution, which works like a charm for well-formed camelCase words) matches only a position within the string and consumes no text at all. This solution has the additional benefit of also working correctly for not-so-well-formed pseudo-camelCase words such as StartsWithCap and hasConsecutiveCAPS.
Input:
oneTwoThreeFour
StartsWithCap
hasConsecutiveCAPS
NewNASAModule
Output:
Word 1 of 4 = "one"
Word 2 of 4 = "Two"
Word 3 of 4 = "Three"
Word 4 of 4 = "Four"
Word 1 of 3 = "Starts"
Word 2 of 3 = "With"
Word 3 of 3 = "Cap"
Word 1 of 3 = "has"
Word 2 of 3 = "Consecutive"
Word 3 of 3 = "CAPS"
Word 1 of 3 = "New"
Word 2 of 3 = "NASA"
Word 3 of 3 = "Module"
Edited: 2014-04-12: Modified regex, script and test data to correctly split: "NewNASAModule" case (in response to rr's comment).
While ridgerunner's answer works great, it seems not to work with all-caps substrings that appear in the middle of a sentence. I use the following, and it seems to handle these just fine:
function splitCamelCase($input)
{
    return preg_split(
        '/(^[^A-Z]+|[A-Z][^A-Z]+)/',
        $input,
        -1, /* no limit on the number of pieces */
        PREG_SPLIT_NO_EMPTY /* don't return empty elements */
        | PREG_SPLIT_DELIM_CAPTURE /* also return the captured delimiters, i.e. the words themselves */
    );
}
Some test cases:
assert(splitCamelCase('lowHigh') == ['low', 'High']);
assert(splitCamelCase('WarriorPrincess') == ['Warrior', 'Princess']);
assert(splitCamelCase('SupportSEELE') == ['Support', 'SEELE']);
assert(splitCamelCase('LaunchFLEIAModule') == ['Launch', 'FLEIA', 'Module']);
assert(splitCamelCase('anotherNASATrip') == ['another', 'NASA', 'Trip']);
A functionized version of @ridgerunner's answer.
/**
 * Converts a camelCase string to have spaces between each word.
 * @param string $camelCaseString
 * @return string
 */
function fromCamelCase($camelCaseString) {
    // Only the lowercase-to-uppercase alternative of ridgerunner's regex,
    // so "NewNASAModule" becomes "New NASAModule"
    $re = '/(?<=[a-z])(?=[A-Z])/';
    $a = preg_split($re, $camelCaseString);
    return implode(' ', $a);
}
$string = preg_replace( '/([a-z0-9])([A-Z])/', "$1 $2", $string );
The trick is a repeatable pattern: every lowercase letter or digit ($1) that is directly followed by an uppercase letter ($2) is replaced with "$1 $2", i.e. the same two characters with a space between them.
For example:
In helloWorld, $1 matches "o" and $2 matches "W", so you get "hello World"; HelloWorld likewise becomes "Hello World". Then you can lowercase the words, uppercase the first one, explode them on the space, or use _ or some other character to keep them separate.
Short and simple.
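A minimal sketch of this replacement wrapped in a hypothetical helper, `separateCamel`, with a configurable separator. `${1}`/`${2}` are used instead of `$1`/`$2` so that a separator starting with a digit cannot be misread as part of a group number. As the last test shows, acronyms are not split:

```php
<?php
// Hypothetical helper: puts $sep between a lowercase letter or digit (group 1)
// and the uppercase letter right after it (group 2).
function separateCamel(string $s, string $sep = ' '): string
{
    return preg_replace('/([a-z0-9])([A-Z])/', '${1}' . $sep . '${2}', $s);
}

echo separateCamel('helloWorld'), "\n";
echo strtolower(separateCamel('helloWorld', '_')), "\n";
```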
When determining the best pattern for your project, you will need to consider the following pattern factors:
Accuracy (Robustness) -- whether the pattern is correct in all cases and is reasonably future-proof
Efficiency -- the pattern should be direct, deliberate, and avoid unnecessary labor
Brevity -- the pattern should use appropriate techniques to avoid unnecessary character length
Readability -- the pattern should be kept as simple as possible
The above factors also happen to be in the hierarchical order that I strive to obey. In other words, it doesn't make much sense to me to prioritize 2, 3, or 4 when 1 doesn't quite satisfy the requirements. Readability is at the bottom of my list because in most cases I can follow the syntax.
Capture Groups and Lookarounds often impact pattern efficiency. The truth is, unless you are executing this regex on thousands of input strings, there is no need to toil over efficiency. It is perhaps more important to focus on pattern readability which can be associated with pattern brevity.
Some patterns below will require some additional handling/flagging by their preg_ function, but here are some pattern comparisons based on the OP's sample input:
preg_split() patterns:
/^[^A-Z]+\K|[A-Z][^A-Z]+\K/ (21 steps)
/(^[^A-Z]+|[A-Z][^A-Z]+)/ (26 steps)
/[^A-Z]+\K(?=[A-Z])/ (43 steps)
/(?=[A-Z])/ (50 steps)
/(?=[A-Z]+)/ (50 steps)
/([a-z]{1})[A-Z]{1}/ (53 steps)
/([a-z0-9])([A-Z])/ (68 steps)
/(?<=[a-z])(?=[A-Z])/x (94 steps) ...for the record, the x is useless.
/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/ (134 steps)
preg_match_all() patterns:
/[A-Z]?[a-z]+/ (14 steps)
/((?:^|[A-Z])[a-z]+)/ (35 steps)
I'll point out that there is a subtle difference between the output of preg_match_all() and preg_split(). preg_match_all() will output a 2-dimensional array, in other words, all of the fullstring matches will be in the [0] subarray; if there is a capture group used, those substrings will be in the [1] subarray. On the other hand, preg_split() only outputs a 1-dimensional array and therefore provides a less bloated and more direct path to the desired output.
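A short sketch of the shape difference described above, using two of the patterns from the lists:

```php
<?php
$str = 'oneTwoThreeFour';

// preg_match_all(): 2-D array, full matches in [0], capture group 1 in [1]
preg_match_all('/((?:^|[A-Z])[a-z]+)/', $str, $matches);

// preg_split(): a flat 1-D list of the pieces
$pieces = preg_split('/(?=[A-Z])/', $str);

echo count($matches), ' subarrays vs ', count($pieces), " pieces\n";
```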
Some of the patterns are insufficient when dealing with camelCase strings that contain an ALLCAPS/acronym substring in them. If this is a fringe case that is possible within your project, it is logical to only consider patterns that handle these cases correctly. I will not be testing TitleCase input strings because that is creeping too far from the question.
New Extended Battery of Test Strings:
oneTwoThreeFour
hasConsecutiveCAPS
newNASAModule
USAIsGreatAgain
Suitable preg_split() patterns:
/[a-z]+\K|(?=[A-Z][a-z]+)/ (149 steps) *I had to use [a-z] for the demo to count properly
/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/ (547 steps)
Suitable preg_match_all() pattern:
/[A-Z]?[a-z]+|[A-Z]+(?=[A-Z][a-z]|$)/ (75 steps)
Finally, my recommendations based on my pattern principles / factor hierarchy. I also recommend preg_split() over preg_match_all() (despite the patterns taking fewer steps) as a matter of directness to the desired output structure (of course, choose whatever you like).
Code:
$noAcronyms = 'oneTwoThreeFour';
var_export(preg_split('~^[^A-Z]+\K|[A-Z][^A-Z]+\K~', $noAcronyms, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_match_all('~[A-Z]?[^A-Z]+~', $noAcronyms, $out) ? $out[0] : []);
Code:
$withAcronyms = 'newNASAModule';
var_export(preg_split('~[^A-Z]+\K|(?=[A-Z][^A-Z]+)~', $withAcronyms, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_match_all('~[A-Z]?[^A-Z]+|[A-Z]+(?=[A-Z][^A-Z]|$)~', $withAcronyms, $out) ? $out[0] : []);
I took cool guy Ridgerunner's code (above) and made it into a function:
echo deliciousCamelcase('NewNASAModule');
function deliciousCamelcase($str)
{
$formattedStr = '';
$re = '/
(?<=[a-z])
(?=[A-Z])
| (?<=[A-Z])
(?=[A-Z][a-z])
/x';
$a = preg_split($re, $str);
$formattedStr = implode(' ', $a);
return $formattedStr;
}
This will return: New NASA Module
Another option is matching /[A-Z]?[a-z]+/. If you know your input is in the right format, it should work nicely.
[A-Z]? would match an uppercase letter (or nothing). [a-z]+ would then match all following lowercase letters, until the next match.
Working example: https://regex101.com/r/kNZfEI/1
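A quick sketch of that "right format" caveat (the input `oneTwo3Four` is made up for illustration): anything that is neither an optional capital nor a run of lowercase letters, such as a digit, is simply not part of any match:

```php
<?php
preg_match_all('/[A-Z]?[a-z]+/', 'oneTwoThreeFour', $ok);
preg_match_all('/[A-Z]?[a-z]+/', 'oneTwo3Four', $withDigit);

// The '3' does not appear anywhere in $withDigit[0].
echo implode(' ', $ok[0]), ' | ', implode(' ', $withDigit[0]), "\n";
```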
You can split on a "glide" from lowercase to uppercase thus:
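$parts = preg_split('/([a-z])([A-Z])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
//PREG_SPLIT_DELIM_CAPTURE to also return bracketed things;
//both letters are captured, since an uncaptured [A-Z] would be lost
var_dump($parts);
Annoyingly, you will then have to rebuild the words from the items in $parts: glue each captured lowercase letter back onto the piece before it, and each captured uppercase letter onto the piece after it.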
Hope this helps
First of all, codaddict, thank you for your pattern, it helped a lot!
I needed a solution that also works when the one-letter word 'A' appears,
e.g. thisIsACamelcaseSentence.
I found a solution in doing a two-step preg_match_all and made a function with some options:
/*
 * input: 'thisIsACamelCaseSentence'  output: 'This Is A Camel Case Sentence'
 * options $case: 'allUpperCase' [default] >> 'This Is A Camel Case Sentence'
 *                'allLowerCase'           >> 'this is a camel case sentence'
 *                'firstUpperCase'         >> 'This is a camel case sentence'
 * @return string
 */
function camelCaseToWords($string, $case = 'allUpperCase') {
    // Find pairs of consecutive capitals, e.g. 'AC' in 'thisIsACamel...'
    preg_match_all('/[A-Z]{2}/', $string, $twoCapitals);
    // Split each pair with a 'zzzzzz' marker, e.g. 'AC' turns into 'AzzzzzzC'
    foreach ($twoCapitals[0] as $match) {
        $string = str_replace($match, $match[0] . 'zzzzzz' . $match[1], $string);
    }
    // Now split into words
    preg_match_all('/((?:^|[A-Z])[a-z]+)/', $string, $words);
    $output = [];
    foreach ($words[0] as $i => $word) {
        switch ($case) {
            case 'allUpperCase':
                $word = ucfirst($word);
                break;
            case 'allLowerCase':
                $word = strtolower($word);
                break;
            case 'firstUpperCase':
                $word = ($i === 0) ? ucfirst($word) : strtolower($word);
                break;
        }
        // Remove the 'zzzzzz' marker from the word, if it has one
        $output[] = str_replace('zzzzzz', '', $word);
    }
    return implode(' ', $output);
}
Feel free to use it, and in case there is an 'easier' way to do this in one step please comment!
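Full function based on @codaddict's answer:
function splitCamelCase($str) {
    // PREG_SPLIT_NO_EMPTY drops the empty piece a leading capital would create
    $splitCamelArray = preg_split('/(?=[A-Z])/', $str, -1, PREG_SPLIT_NO_EMPTY);
    return ucwords(implode(' ', $splitCamelArray));
}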
I have a string:
3 pk. Ready-Dough White Loaves Included $3.99 - 47500 - 00892, 48101
I want to keep only groups of 5 or more digits and, if possible, any dashes or commas between them.
e.g.
47500-00892,48101
My first step was to strip out groups of 4 digits or fewer:
preg_replace('/\d{1,4}/', '', $string);
My thinking was "replace any block of 1 to 4 digits with nothing", but that doesn't do exactly what I expected. Maybe I'm just missing an operator?
Then I was going to strip out all letters and all punctuation except , and -. In my example, I would have been left with a leading - (from "Ready-Dough"), but a trim() would have been fine to clean that up.
Any help is appreciated!
Had I been patient for 5 more minutes, I would've found the answer: \b
For some reason, working with digits didn't trigger that I needed to use 'word boundaries'.
$string = preg_replace('/\b\d{1,4}\b/', '', $string);
$string = preg_replace('/[^0-9-,]/', '', $string);
$string = trim($string, ',-');
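To make the effect of `\b` concrete, a sketch comparing the two replacements on a single 5-digit group, followed by the three steps above wrapped in a hypothetical `keepLongDigitGroups()` helper:

```php
<?php
// \d{1,4} alone also eats the first four digits of a longer run ("4750", then "0")...
$naive = preg_replace('/\d{1,4}/', '', '47500');
// ...while \b on both sides only removes stand-alone runs of 1-4 digits.
$bounded = preg_replace('/\b\d{1,4}\b/', '', '47500');

// Hypothetical helper wrapping the three steps from the answer above.
function keepLongDigitGroups(string $string): string
{
    $string = preg_replace('/\b\d{1,4}\b/', '', $string); // drop short digit groups
    $string = preg_replace('/[^0-9-,]/', '', $string);    // keep digits, '-' and ','
    return trim($string, ',-');                          // strip leftover separators at the ends
}

echo keepLongDigitGroups('3 pk. Ready-Dough White Loaves Included $3.99 - 47500 - 00892, 48101'), "\n";
```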
Since there's no reason to perform a replacement, you can use preg_match_all to take what you want and reduce the result array:
$re = '/\d{5,}(?:(?=\s*([-,])\s*\d{5}))?/';
$str = '3 pk. Ready-Dough White Loaves Included $3.99 - 47500 - 00892, 48101';
if ( preg_match_all($re, $str, $matches, PREG_SET_ORDER) ) {
$result = array_reduce($matches, function ($c,$i) { return $c . implode('', $i); });
echo $result;
}
My goal is to get something like 150.000,54 or 48.876,05, where the comma is the decimal separator.
Here's my code so far:
<?php
//cut numbers after comma if there are any, after 2 digits
$matchPattern = '/[0-9]+(?:\,[0-9]{2}){0,2}/';
//remove everything except numbers, commas and dots
$repl1 = preg_replace("/[^a-zA-Z0-9,.]/", "", $input);
//let there be a 0 before comma to have values like 0,75
$repl2 = preg_replace("/^[0]{1}$/", "",$repl1);
//now i need you here to help me for the expression putting dots after each 3 numbers, until the comma:
$repl3 = preg_replace("/regexphere$/", ".", $repl2);
preg_match($matchPattern, $repl3, $matches);
echo($matches[0]);
?>
I know running preg_replace 3 times is stupid, but I am not good at writing regular expressions. If you have a better idea, don't just share it but also explain it. I know a little of the basics: http://regexone.com/lesson/0
Thank you in advance.
--------UPDATE--------
So I need inputs like 0000,45 to become 0,45 and inputs like 010101,84 to become 1,84.
When this is done, I'm done.
$input = Input::get('userinput');
$repl1 = preg_replace("/[^0-9,.]/", "", $input);
$repl2 = preg_replace("/^0/", "",$repl1);
$repl3 = str_replace(".","",$repl2);
preg_match('/[0-9]+(?:\,[0-9]{2}){0,2}/', $repl3, $matches);
$repl4 = preg_replace('/(\d)(?=(\d{3})+(?!\d))/', '$1.', $matches[0]);
return $repl4;
----UPDATE----
Here's what I have so far: https://ideone.com/5qmslB
I just need to remove the leading zeros before the comma.
I am not sure this is the best way, but I hope it is helpful.
Here is the updated code that I used with a fake $input:
<?php
$input = "textmdwrhfejhg../,2222233333,34erw.re.ty";
//cut numbers after comma if there are any, after 2 digits
$matchPattern = '/[0-9]+(?:\,[0-9]{2}){0,2}/';
//remove everything except numbers, commas and dots
$repl1 = trim(preg_replace("/[^0-9,.]/", "", $input), ".,");
echo "$repl1" . "\n";
//let there be a 0 before comma to have values like 0,75, remove the 0
$repl2 = preg_replace("/^0/", "",$repl1);
echo "$repl2" . "\n";
//The expression putting dots after each 3 numbers, until the comma:
$repl3 = preg_replace('/(\d)(?=(?:\d{3})+(?!\d))/', '$1.', $repl2);
echo "$repl3" . "\n";
The expression putting dots after each 3 numbers is
(\d)(?=(?:\d{3})+(?!\d))
Here, you can see how it works. In plain English:
(\d) - A capturing group that we'll use in the replacement pattern, matching a single digit that....
(?=(?:\d{3})+(?!\d)) - is followed by groups of 3 digits. External (?=...) is a look-ahead construction that checks but does not consume characters, (?:\d{3})+ is a non-capturing group (no need to keep the matched text in memory) that matches 3 digits exactly (due to the limiting quantifier {3}) 1 or more times (due to the + quantifier), and (?!\d) is a negative look-ahead checking that the next character after the last matched 3-digit group is not a digit.
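A minimal sketch of the regex applied to a digits-only integer part; `addThousandsDots` is a hypothetical name, and the input is assumed to contain no decimal part:

```php
<?php
// Hypothetical helper: puts a dot after every digit that is followed by one or
// more complete groups of three digits and then a non-digit (or the end).
function addThousandsDots(string $digits): string
{
    return preg_replace('/(\d)(?=(?:\d{3})+(?!\d))/', '$1.', $digits);
}

echo addThousandsDots('150000'), "\n";
```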
This does not work in case we have more than 3 digits after a decimal separator. With regex, I can only think of a way to support 4 digits after decimal with (?<!,)(\d)(?=(?:\d{3})+(?!\d)). Not sure if there is a generic way without variable-width look-behind in PHP (as here, we also need a variable-width look-ahead, too). Thus, you might consider splitting the $repl2 value by comma, and only pass the first part to the regex. Then, combine. Something like this:
$spl = explode(',', $repl2); // $repl2 is 1234,123456
$repl3 = preg_replace('/(\d)(?=(?:\d{3})+(?!\d))/', '$1.', $spl[0]);
$repl3 .= "," . $spl[1]; // "1.234" + "," + "123456"
echo "$repl3" . "\n"; // 1.234,123456
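If the cleaned value can safely be treated as a number, a different technique avoids the regex entirely: PHP's built-in `number_format()` takes the decimal and thousands separators as arguments. A sketch with a hypothetical `formatGermanAmount()` helper, assuming digits with at most one comma as decimal separator. Note that it always rounds to 2 decimals and that the float cast loses precision beyond roughly 15 significant digits:

```php
<?php
// Hypothetical helper using number_format() instead of a regex.
function formatGermanAmount(string $clean): string
{
    // "0000,45" -> "0000.45" -> 0.45 (the float cast drops leading zeros)
    $value = (float) str_replace(',', '.', $clean);
    // 2 decimals, ',' as decimal separator, '.' as thousands separator
    return number_format($value, 2, ',', '.');
}

echo formatGermanAmount('150000,54'), "\n";
```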
Update:
The final code I have come up with:
$input = "textmdwrhfejhg../0005456,2222233333,34erw.re.ty";
//Here's mine :
$repl1 = trim(preg_replace("/[^0-9,.]/", "", $input), '.,');
//the /^0+(?!,)/ line below removes all leading zeros but keeps a single 0 right before the comma
//Input : 000549,569 Output : 549,569
echo "$repl1\n";
$repl2 = preg_replace("/^0+(?!,)/", "",$repl1);
$repl3 = str_replace(".","",$repl2);
preg_match('/[0-9]+(?:\,[0-9]{2}){0,2}/', $repl3, $matches);
$repl4 = preg_replace('/(\d)(?=(\d{3})+(?!\d))/', '$1.', $matches[0]);
echo $repl4;
I would like to delete words containing numbers (references) and short words (2 characters or fewer) from my product names, but I can't find the right regex.
Some examples:
"Chaine anti-rebond ECS-2035" should become "Chaine anti-rebond"
"Guide 35 cm Oregon Intenz" should become "Guide Oregon Intenz"
"Tronçonneuse sans fil AKE 30 LI - Guide 30 cm 36 V" should become "Tronçonneuse sans fil AKE - Guide"
I'm doing this in PHP:
preg_replace('#([^A-Za-z-]+)#', ' ',' '.wd_remove_accents($modele).' ');
You don't need to do everything in RegExp you know:
<?php
$str = "Chaine anti-rebond ECS-2035 cm 30 v";
$result = array();
$split = explode(" ", $str); //Split to an array
foreach ($split as $word) {
if ((strlen($word) <= 2) || (preg_match("|\d|", $word))) { //If word is <= 2 char long, or contains a digit
continue; //Continue to next iteration immediately
}
$result[] = $word; //Add word to result array (would only happen if the above condition was false)
}
$result = implode(" ", $result); //Implode result back to string
echo $result;
For word based string manipulation, parsing the string itself, conditioning exactly what you want on a word basis, is often much better than a string-level RegExp.
To deal with Unicode characters like the ç in Tronçonneuse, you could use:
/\b(?:[\pL-]+\pN+|\pN+[\pL-]+|\pN+|\pL{1,2})\b/u
where \pL stands for any letter, \pN stands for any number character, and the u flag makes PHP treat the pattern and subject as UTF-8.
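A sketch wrapping that pattern in a hypothetical `removeRefsAndShortWords()` helper and running it on the three product names from the question. The second `preg_replace` is an addition that collapses the runs of spaces the removals leave behind:

```php
<?php
// Hypothetical helper built around the Unicode-aware pattern above.
function removeRefsAndShortWords(string $name): string
{
    // letters+digits, digits+letters, digits only, or a 1-2 letter word
    $re = '/\b(?:[\pL-]+\pN+|\pN+[\pL-]+|\pN+|\pL{1,2})\b/u';
    $name = preg_replace($re, '', $name);
    // Collapse the gaps left behind and trim the ends.
    return trim(preg_replace('/\s{2,}/u', ' ', $name));
}

echo removeRefsAndShortWords('Tronçonneuse sans fil AKE 30 LI - Guide 30 cm 36 V'), "\n";
```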
Your requirements aren't specific enough for a final answer, but this would do it for your example:
$subject = 'Tronçonneuse sans fil AKE 30 LI - Guide 30 cm 36 V';
$regex = '/(\\s+\\w{1,2}(?!\\w))|(\\s+[a-zA-Z0-9_-]+\\d+)/'; // (?!\\w) also catches a short word at the very end, like the final "V"
$result = preg_replace($regex, '', $subject);
Well, for the combinations in your example the following regex would do:
/\b(?:[-A-Za-z]+[0-9]+|[0-9]+[-A-Za-z]+|\d{1,2}|[A-Za-z]{1,2})\b/
Then just replace the match with an empty string.
However, it doesn't allow for strings like aaa897bbb - just aaa786 or 876aaa (and an optional dash).
I don't know what it is that you require - you would have to specify the rules in more detail before the regex can be refined.
Use preg_replace_callback and filter in the callback function http://www.php.net/manual/en/function.preg-replace-callback.php
This will work for all 3 test strings:
<?php
$str = "Tronçonneuse sans fil AKE 30 LI - Guide 30 cm 36 V";
function filter_cb($matches)
{
$word = trim($matches[0]);
if ($word !== '-' && (strlen($word) <= 2 || (preg_match("/\d/", $word)))) {
return '';
}
return $matches[0];
}
$result = preg_replace_callback('/([\p{L}\p{N}-]+\s*)/u', "filter_cb", $str);
echo trim($result);
I have data in this format coming from a database...
BUS 101S Business and Society
or
BUS 101 Business and Society
Notice the optional "S" character (which can be any uppercase character)
I need to replace the "BUS 101S" part with null and here is what I have come up with...
$value = "BUS 101S Business and Society";
$sub = substr($value, 0, 3); // Gives me "BUS"
$num = substr($value, 4, 3); // Gives me "101"
$new_value = preg_replace("/$sub $num"."[A-Z]?/", null, $value);
$new_value now contains "S Business and Society", so I'm close; I just need it to replace the optional single uppercase character as well. Any ideas?
Assuming the pattern is 3 uppercase letters, a space, 3 digits and then an optional uppercase letter, just use a single preg_replace:
$new = preg_replace('/^[A-Z]{3} \d{3}[A-Z]?/', '', $old);
The ^ will only match at the beginning of a line/string. The {3} means "match the preceding token 3 times exactly". The ? means "match the preceding token zero or one times"
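A quick sketch applying the pattern to both inputs from the question; `stripCourseCode` is a hypothetical name, and the trailing `\s*` is an addition so that the space after the code is removed too:

```php
<?php
// Hypothetical helper: strips a leading "ABC 123" or "ABC 123X" code plus following spaces.
function stripCourseCode(string $value): string
{
    return preg_replace('/^[A-Z]{3} \d{3}[A-Z]?\s*/', '', $value);
}

echo stripCourseCode('BUS 101S Business and Society'), "\n";
```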
You can also do something like this, so you don't bother with substr:
preg_replace('#^[A-Z]{3} [0-9]{3}[A-Z]? (.*)$#', '$1', $value);
Or using preg_match, to get all the components of the string
if (preg_match('#^([A-Z]{3}) ([0-9]{3})([A-Z]?) (.*)$#', $value, $matches)) {
$firstMatch = $matches[1];  // BUS ($matches[0] is the whole string)
$secondMatch = $matches[2]; // 101
$thirdMatch = $matches[3];  // S or ''
$fourthMatch = $matches[4]; // the rest of the text
}
Wouldn't it just be easier to do something like:
$str = 'BUS 101S Business and Society';
$words = explode(' ', $str);
array_shift($words); // removes "BUS"
array_shift($words); // removes "101S"
$str = implode(' ', $words);