How would I go about splitting the word:
oneTwoThreeFour
into an array so that I can get:
one Two Three Four
with preg_match?
I tried this, but it just gives the whole word:
$words = preg_match("/[a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/", $string, $matches);
You can use preg_split as:
$arr = preg_split('/(?=[A-Z])/',$str);
I'm basically splitting the input string just before each uppercase letter. The regex used, (?=[A-Z]), is a lookahead: it matches the position just before an uppercase letter without consuming the letter.
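Here is a minimal sketch of the same split wrapped in a function. One caveat: when the string starts with a capital, the lookahead also matches at position 0, so preg_split() returns an empty first element. Passing PREG_SPLIT_NO_EMPTY drops it. The name splitBeforeUpper is just an illustrative choice.

```php
<?php
// Hypothetical helper: split a camelCase/PascalCase string before each capital.
function splitBeforeUpper(string $str): array
{
    // -1 = no limit; PREG_SPLIT_NO_EMPTY removes the empty piece produced
    // when the string itself begins with an uppercase letter.
    return preg_split('/(?=[A-Z])/', $str, -1, PREG_SPLIT_NO_EMPTY);
}

echo implode(' ', splitBeforeUpper('oneTwoThreeFour')), "\n"; // one Two Three Four
echo implode(' ', splitBeforeUpper('OneTwo')), "\n";          // One Two
```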
You can also use preg_match_all as:
preg_match_all('/((?:^|[A-Z])[a-z]+)/',$str,$matches);
Explanation:
( - Start of capturing parenthesis.
(?: - Start of non-capturing parenthesis.
^ - Start anchor.
| - Alternation.
[A-Z] - Any one capital letter.
) - End of non-capturing parenthesis.
[a-z]+ - One or more lowercase letters.
) - End of capturing parenthesis.
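Because of the capturing group, preg_match_all() puts the words in $matches[1]. The group spans the whole match, so they also appear in $matches[0]. Below is a sketch that uses camelWords, a made-up name. Note that an all-caps run such as CAPS is skipped entirely, because every match needs at least one lowercase letter after its capital.

```php
<?php
// Hypothetical helper around the preg_match_all() pattern above.
function camelWords(string $str): array
{
    preg_match_all('/((?:^|[A-Z])[a-z]+)/', $str, $matches);
    return $matches[1]; // the capture-group subarray
}

echo implode(' ', camelWords('oneTwoThreeFour')), "\n";    // one Two Three Four
echo implode(' ', camelWords('hasConsecutiveCAPS')), "\n"; // has Consecutive
```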
I know that this is an old question with an accepted answer, but IMHO there is a better solution:
<?php // test.php Rev:20140412_0800
$ccWord = 'NewNASAModule';
$re = '/(?#! splitCamelCase Rev:20140412)
# Split camelCase "words". Two global alternatives. Either g1of2:
(?<=[a-z]) # Position is after a lowercase,
(?=[A-Z]) # and before an uppercase letter.
| (?<=[A-Z]) # Or g2of2; Position is after uppercase,
(?=[A-Z][a-z]) # and before upper-then-lower case.
/x';
$a = preg_split($re, $ccWord);
$count = count($a);
for ($i = 0; $i < $count; ++$i) {
printf("Word %d of %d = \"%s\"\n",
$i + 1, $count, $a[$i]);
}
?>
Note that this regex (like codaddict's '/(?=[A-Z])/' solution, which works like a charm for well-formed camelCase words) matches only a position within the string and consumes no text at all. This solution has the additional benefit that it also works correctly for not-so-well-formed pseudo-camelCase words such as StartsWithCap and hasConsecutiveCAPS.
Input:
oneTwoThreeFour
StartsWithCap
hasConsecutiveCAPS
NewNASAModule
Output:
Word 1 of 4 = "one"
Word 2 of 4 = "Two"
Word 3 of 4 = "Three"
Word 4 of 4 = "Four"
Word 1 of 3 = "Starts"
Word 2 of 3 = "With"
Word 3 of 3 = "Cap"
Word 1 of 3 = "has"
Word 2 of 3 = "Consecutive"
Word 3 of 3 = "CAPS"
Word 1 of 3 = "New"
Word 2 of 3 = "NASA"
Word 3 of 3 = "Module"
Edited: 2014-04-12: Modified regex, script and test data to correctly split: "NewNASAModule" case (in response to rr's comment).
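The regex matches positions only, so you can also pass the same pattern to preg_replace() and insert a separator directly instead of splitting. The sketch below puts the same two alternatives on one line without the x-mode comments. camelToSpaced is a hypothetical name.

```php
<?php
// Hypothetical helper: insert a space at each camelCase word boundary.
function camelToSpaced(string $word): string
{
    // Alternative 1: after a lowercase letter, before an uppercase letter.
    // Alternative 2: after an uppercase letter, before upper-then-lower (end of an acronym).
    $re = '/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/';
    return preg_replace($re, ' ', $word);
}

echo camelToSpaced('NewNASAModule'), "\n";      // New NASA Module
echo camelToSpaced('hasConsecutiveCAPS'), "\n"; // has Consecutive CAPS
```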
While ridgerunner's answer works great, it seems not to work with all-caps substrings that appear in the middle of a sentence. I use the following, and it seems to deal with these just fine:
function splitCamelCase($input)
{
return preg_split(
'/(^[^A-Z]+|[A-Z][^A-Z]+)/',
$input,
-1, /* no limit on the number of pieces */
PREG_SPLIT_NO_EMPTY /* don't return empty elements */
| PREG_SPLIT_DELIM_CAPTURE /* also return the captured words (the delimiters) */
);
}
Some test cases:
assert(splitCamelCase('lowHigh') == ['low', 'High']);
assert(splitCamelCase('WarriorPrincess') == ['Warrior', 'Princess']);
assert(splitCamelCase('SupportSEELE') == ['Support', 'SEELE']);
assert(splitCamelCase('LaunchFLEIAModule') == ['Launch', 'FLEIA', 'Module']);
assert(splitCamelCase('anotherNASATrip') == ['another', 'NASA', 'Trip']);
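This shows why both flags are needed. Each word is itself the delimiter here, so the text between delimiters is empty. Without PREG_SPLIT_NO_EMPTY, the raw result interleaves empty strings with the captured words. A short sketch:

```php
<?php
$re = '/(^[^A-Z]+|[A-Z][^A-Z]+)/';

// Only DELIM_CAPTURE: the captured words come back, surrounded by the empty
// strings that lie before, between and after the matches.
$raw = preg_split($re, 'lowHigh', -1, PREG_SPLIT_DELIM_CAPTURE);
// $raw holds ['', 'low', '', 'High', '']

// Adding NO_EMPTY leaves just the words.
$clean = preg_split($re, 'lowHigh', -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
// $clean holds ['low', 'High']
```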
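A functionized version of ridgerunner's answer. Note that it uses only the first (lowercase-to-uppercase) alternative of his regex, so "NewNASAModule" becomes "New NASAModule":
/**
* Converts a camelCase string to have spaces between each word.
* @param string $camelCaseString
* @return string
*/
function fromCamelCase($camelCaseString) {
$re = '/(?<=[a-z])(?=[A-Z])/x';
$a = preg_split($re, $camelCaseString);
return implode(" ", $a); // separator first: the reversed join($a, " ") order was removed in PHP 8
}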
$string = preg_replace( '/([a-z0-9])([A-Z])/', "$1 $2", $string );
The trick is that the pattern only matches the two characters at each lowercase-to-uppercase boundary. $1 is the last lowercase letter (or digit) of one word and $2 is the first capital of the next word. The replacement "$1 $2" puts a space between them.
For example:
In helloWorld the pattern matches "oW", with $1 = "o" and $2 = "W", so you get "hello World". HelloWorld likewise becomes "Hello World". Then you can lowercase the words, uppercase the first word, explode them on the space, or use a _ or some other character instead of the space to keep them separate.
Short and simple.
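As mentioned, the same replacement works with another separator. Below is a sketch of converting to snake_case by using _ and lowercasing afterwards. camelToSnake is a hypothetical name. One caveat: $1 must be a lowercase letter or digit, so a capital that follows an acronym is not separated, and NewNASAModule becomes new_nasamodule.

```php
<?php
// Hypothetical helper: camelCase / PascalCase to snake_case.
function camelToSnake(string $s): string
{
    // Insert "_" between a lowercase letter/digit and the following capital,
    // then lowercase the whole string.
    return strtolower(preg_replace('/([a-z0-9])([A-Z])/', '$1_$2', $s));
}

echo camelToSnake('helloWorld'), "\n";     // hello_world
echo camelToSnake('HelloWorld'), "\n";     // hello_world
echo camelToSnake('version2Update'), "\n"; // version2_update
```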
When determining the best pattern for your project, you will need to consider the following pattern factors:
Accuracy (Robustness) -- whether the pattern is correct in all cases and is reasonably future-proof
Efficiency -- the pattern should be direct, deliberate, and avoid unnecessary labor
Brevity -- the pattern should use appropriate techniques to avoid unnecessary character length
Readability -- the pattern should be kept as simple as possible
The above factors also happen to be in the hierarchical order that I strive to obey. In other words, it doesn't make much sense to me to prioritize 2, 3, or 4 when 1 doesn't quite satisfy the requirements. Readability is at the bottom of the list for me because in most cases I can follow the syntax.
Capture Groups and Lookarounds often impact pattern efficiency. The truth is, unless you are executing this regex on thousands of input strings, there is no need to toil over efficiency. It is perhaps more important to focus on pattern readability which can be associated with pattern brevity.
Some patterns below will require some additional handling/flagging by their preg_ function, but here are some pattern comparisons based on the OP's sample input:
preg_split() patterns:
/^[^A-Z]+\K|[A-Z][^A-Z]+\K/ (21 steps)
/(^[^A-Z]+|[A-Z][^A-Z]+)/ (26 steps)
/[^A-Z]+\K(?=[A-Z])/ (43 steps)
/(?=[A-Z])/ (50 steps)
/(?=[A-Z]+)/ (50 steps)
/([a-z]{1})[A-Z]{1}/ (53 steps)
/([a-z0-9])([A-Z])/ (68 steps)
/(?<=[a-z])(?=[A-Z])/x (94 steps) ...for the record, the x is useless.
/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/ (134 steps)
preg_match_all() patterns:
/[A-Z]?[a-z]+/ (14 steps)
/((?:^|[A-Z])[a-z]+)/ (35 steps)
I'll point out that there is a subtle difference between the output of preg_match_all() and preg_split(). preg_match_all() fills its matches argument with a 2-dimensional array: all of the full-string matches will be in the [0] subarray, and if a capture group is used, those substrings will be in the [1] subarray. preg_split(), on the other hand, returns a 1-dimensional array and therefore provides a less bloated and more direct path to the desired output.
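To see the difference in output shape quickly, here is a sketch that runs the /[A-Z]?[a-z]+/ match pattern and the /(?=[A-Z])/ split pattern from the lists above on a short input:

```php
<?php
$str = 'oneTwo';

// preg_match_all() fills $out with one subarray per group; [0] = full matches.
preg_match_all('/[A-Z]?[a-z]+/', $str, $out);
// $out holds [['one', 'Two']]

// preg_split() returns the flat list directly.
$parts = preg_split('/(?=[A-Z])/', $str);
// $parts holds ['one', 'Two']
```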
Some of the patterns are insufficient when dealing with camelCase strings that contain an ALLCAPS/acronym substring in them. If this is a fringe case that is possible within your project, it is logical to only consider patterns that handle these cases correctly. I will not be testing TitleCase input strings because that is creeping too far from the question.
New Extended Battery of Test Strings:
oneTwoThreeFour
hasConsecutiveCAPS
newNASAModule
USAIsGreatAgain
Suitable preg_split() patterns:
/[a-z]+\K|(?=[A-Z][a-z]+)/ (149 steps) *I had to use [a-z] for the demo to count properly
/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/ (547 steps)
Suitable preg_match_all() pattern:
/[A-Z]?[a-z]+|[A-Z]+(?=[A-Z][a-z]|$)/ (75 steps)
Finally, here are my recommendations based on my pattern principles / factor hierarchy. Also, I recommend preg_split() over preg_match_all() (despite the preg_match_all() patterns taking fewer steps) as a matter of directness to the desired output structure (of course, choose whatever you like).
Code:
$noAcronyms = 'oneTwoThreeFour';
var_export(preg_split('~^[^A-Z]+\K|[A-Z][^A-Z]+\K~', $noAcronyms, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_match_all('~[A-Z]?[^A-Z]+~', $noAcronyms, $out) ? $out[0] : []);
Code:
$withAcronyms = 'newNASAModule';
var_export(preg_split('~[^A-Z]+\K|(?=[A-Z][^A-Z]+)~', $withAcronyms, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_match_all('~[A-Z]?[^A-Z]+|[A-Z]+(?=[A-Z][^A-Z]|$)~', $withAcronyms, $out) ? $out[0] : []);
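Here is a sketch that runs the acronym-aware preg_split() pattern over the whole extended battery of test strings. splitCamel is a hypothetical wrapper name.

```php
<?php
// Hypothetical wrapper around the acronym-aware preg_split() pattern above.
function splitCamel(string $str): array
{
    // Split after each run of non-capitals (\K keeps the run in the previous
    // piece), or before a capital that starts a capitalized word.
    return preg_split('~[^A-Z]+\K|(?=[A-Z][^A-Z]+)~', $str, 0, PREG_SPLIT_NO_EMPTY);
}

$battery = ['oneTwoThreeFour', 'hasConsecutiveCAPS', 'newNASAModule', 'USAIsGreatAgain'];
foreach ($battery as $str) {
    echo $str, ' => ', implode(' | ', splitCamel($str)), "\n";
}
```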
I took cool guy Ridgerunner's code (above) and made it into a function:
echo deliciousCamelcase('NewNASAModule');
function deliciousCamelcase($str)
{
$formattedStr = '';
$re = '/
(?<=[a-z])
(?=[A-Z])
| (?<=[A-Z])
(?=[A-Z][a-z])
/x';
$a = preg_split($re, $str);
$formattedStr = implode(' ', $a);
return $formattedStr;
}
This will return: New NASA Module
Another option is matching /[A-Z]?[a-z]+/. If you know your input is in the right format, it should work nicely.
[A-Z]? would match an uppercase letter (or nothing). [a-z]+ would then match all following lowercase letters, until the next match.
Working example: https://regex101.com/r/kNZfEI/1
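The sketch below shows what "the right format" means in practice. Ordinary words are found, but an all-caps acronym is silently skipped because every match needs at least one lowercase letter. matchWords is a hypothetical name.

```php
<?php
function matchWords(string $str): array
{
    preg_match_all('/[A-Z]?[a-z]+/', $str, $m);
    return $m[0]; // no capture group, so the words are in the [0] subarray
}

echo implode(' ', matchWords('oneTwoThreeFour')), "\n"; // one Two Three Four
echo implode(' ', matchWords('NewNASAModule')), "\n";   // New Module
```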
You can split on a "glide" from lowercase to uppercase thus:
$parts = preg_split('/([a-z])([A-Z])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
//PREG_SPLIT_DELIM_CAPTURE also returns the bracketed (captured) letters; the uppercase letter must be captured too, otherwise the split consumes it and it is lost
var_dump($parts);
Annoyingly, you will then have to rebuild the words from the pieces and the captured letters in $parts.
Hope this helps
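Here is a sketch of that rebuild step. It uses /([a-z])([A-Z])/, which captures both letters of the glide, so $parts repeats as piece, lowercase letter, uppercase letter, piece, and so on. rebuildGlideSplit is a hypothetical name.

```php
<?php
function rebuildGlideSplit(string $string): array
{
    $parts = preg_split('/([a-z])([A-Z])/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
    $words = [];
    $current = $parts[0];               // text before the first glide
    for ($i = 1; $i < count($parts); $i += 3) {
        // $parts[$i] ends the current word, $parts[$i + 1] starts the next one,
        // and $parts[$i + 2] is the rest of that next word.
        $words[] = $current . $parts[$i];
        $current = $parts[$i + 1] . $parts[$i + 2];
    }
    $words[] = $current;                // the last word
    return $words;
}

echo implode(' ', rebuildGlideSplit('oneTwoThreeFour')), "\n"; // one Two Three Four
```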
First of all, codaddict, thank you for your pattern; it helped a lot!
I needed a solution that works when a single-letter word such as the article 'A' appears,
e.g. thisIsACamelcaseSentence.
I found a solution in a two-step preg_match_all and made a function with some options:
/*
* input: 'thisIsACamelCaseSentence' output: 'This Is A Camel Case Sentence'
* options $case: 'allUpperCase' [default] >> 'This Is A Camel Case Sentence'
* 'allLowerCase' >> 'this is a camel case sentence'
* 'firstUpperCase' >> 'This is a camel case sentence'
* @return string
*/
function camelCaseToWords($string, $case = null){
$case = $case ?? 'allUpperCase'; // default when no option is given
// Find occurrences of two consecutive capitals
preg_match_all('/((?:^|[A-Z])[A-Z]{1})/',$string, $twoCapitals);
// Split them with the 'zzzzzz' string. e.g. 'AZ' turns into 'AzzzzzzZ'
foreach($twoCapitals[0] as $match){
$firstCapital = $match[0];
$lastCapital = $match[1];
$temp = $firstCapital.'zzzzzz'.$lastCapital;
$string = str_replace($match, $temp, $string);
}
// Now split words
preg_match_all('/((?:^|[A-Z])[a-z]+)/', $string, $words);
$output = "";
$i = 0;
foreach($words[0] as $word){
switch($case){
case 'allUpperCase':
$word = ucfirst($word);
break;
case 'allLowerCase':
$word = strtolower($word);
break;
case 'firstUpperCase':
$word = ($i == 0) ? ucfirst($word) : strtolower($word);
break;
}
// remove the 'zzzzzz' from a word if it has one
$word = str_replace('zzzzzz','', $word);
$output .= $word." ";
$i++;
}
return rtrim($output); // drop the trailing space added after the last word
}
Feel free to use it, and in case there is an 'easier' way to do this in one step please comment!
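On the one-step question, here is a sketch (not the original author's code) that assumes ridgerunner's position-only split pattern from the earlier answer. That pattern already separates a single capital such as the A in thisIsACamelCaseSentence, so no placeholder string is needed. The function name is hypothetical, and the case options mirror the ones above.

```php
<?php
function camelCaseToWordsOneStep(string $string, string $case = 'allUpperCase'): string
{
    // Split after lower-before-upper, and after upper-before-upper-then-lower.
    $words = preg_split('/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/', $string);
    $sentence = implode(' ', $words);
    switch ($case) {
        case 'allLowerCase':
            return strtolower($sentence);
        case 'firstUpperCase':
            return ucfirst(strtolower($sentence));
        default: // 'allUpperCase'
            return ucwords($sentence);
    }
}

echo camelCaseToWordsOneStep('thisIsACamelCaseSentence'), "\n"; // This Is A Camel Case Sentence
```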
Full function based on codaddict's answer:
function splitCamelCase($str) {
$splitCamelArray = preg_split('/(?=[A-Z])/', $str, -1, PREG_SPLIT_NO_EMPTY); // NO_EMPTY avoids a leading space when $str starts with a capital
return ucwords(implode(' ', $splitCamelArray));
}
I have a string like "some words 12345cm some more words"
and I want to extract the 12345cm bit from that string. So I get the position of the first number:
$position_of_first_number = strcspn( "some words 12345cm some more words" , '0123456789' );
Then the position of the first space after $position_of_first_number
$position_of_space_after_numbers = strpos("some words 12345cm some more words", " ", $position_of_first_number);
Then I want a function that returns the portion of the string between $position_of_first_number and $position_of_space_after_numbers.
How do I do it?
You can use the substr function. Note that it takes a starting position and a length, which you can calculate as the difference between the start and end positions.
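Here is a minimal sketch that reuses the two positions from the question. The function name extractBetween is hypothetical. The fallback to the end of the string when no space follows the number is an added assumption.

```php
<?php
function extractBetween(string $s): string
{
    $start = strcspn($s, '0123456789');  // position of the first digit
    $end = strpos($s, ' ', $start);      // first space after it
    if ($end === false) {
        $end = strlen($s);               // assumption: token runs to the end of the string
    }
    // substr() wants a length, so subtract the start from the end position.
    return substr($s, $start, $end - $start);
}

echo extractBetween("some words 12345cm some more words"), "\n"; // 12345cm
```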
Since you are looking for a pattern like blank-digits-letters-blank, I would recommend a regular expression using preg_match:
$s = "some words 12345cm some more words";
preg_match("/\s(?P<result>\d+[^\W\d_]+)\s/", $s, $matches);
echo $matches["result"];
12345cm
Explaining the pattern:
"/.../" delimits the pattern in PHP
\s matches any whitespace character
(?P<name>...) names the enclosed group, so it can be read back as $matches["result"]
\d+ matches 1 or more digits
[^\W\d_]+ matches 1 or more letters (i.e. any word character that is not a digit or an underscore; with the u modifier this includes Unicode letters)
\s matches the whitespace after the token
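One limitation: the pattern needs whitespace on both sides, so it misses a token at the very start or end of the string. The sketch below is a variant, not part of the answer above. It uses lookarounds as whitespace boundaries instead, so the string edges also count. extractToken is a hypothetical name.

```php
<?php
function extractToken(string $s): ?string
{
    // (?<!\S) / (?!\S): preceded / followed by whitespace or a string edge.
    if (preg_match('/(?<!\S)\d+[^\W\d_]+(?!\S)/', $s, $m)) {
        return $m[0];
    }
    return null;
}

echo extractToken("some words 12345cm some more words"), "\n"; // 12345cm
echo extractToken("length 12345cm"), "\n";                     // 12345cm
```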
I have the string below:
$LINE = "TCNU1573105 HDPE HTA108 155 155 000893520918 PAL990 25.2750 MT 28.9750 MT";
and I want to extract PAL990 from it; more generally, any string that has PAL followed by some digits, like PAL222 or PAL123.
I tried many ways and could not get the result. I used
substr($LINE, 77, 3)
but when the value is in a different position I get the wrong value.
You may use
$LINE = "TCNU1573105 HDPE HTA108 155 155 000893520918 PAL990 25.2750 MT 28.9750 MT";
if (preg_match('~\bPAL\d+\b~', $LINE, $res)) {
echo $res[0]; // => PAL990
}
Details
\b - a word boundary
PAL - a PAL substring
\d+ - 1+ digits
\b - a word boundary.
preg_match stops at the first match, so $res[0] holds the first PAL code found.
Note that if your string may contain similar values attached to hyphens or other non-word characters (e.g. ABC-PAL990), \b still matches there, so you can no longer rely on word boundaries. Use custom whitespace boundaries instead, i.e.:
'~(?<!\S)PAL\d+(?!\S)~'
EDIT
If there may be optional whitespace between PAL and the digits, you can use
preg_replace('~.*\b(PAL)\s?(\d+)\b.*~s', '$1$2', $LINE)
Or, match the string you need with spaces, and then remove them:
if (preg_match('~\bPAL ?\d+\b~', $LINE, $res)) {
echo str_replace(" ", "", $res[0]);
}
Note that ? makes the preceding pattern optional (1 or 0 occurrences are matched).
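preg_match() stops after the first hit. To collect every PAL code in a line (optional space included, then removed), a sketch with preg_match_all() could look like this. allPalCodes is a hypothetical name.

```php
<?php
function allPalCodes(string $line): array
{
    preg_match_all('~\bPAL ?\d+\b~', $line, $m);
    // str_replace() also works on arrays: strip the optional space from each code.
    return str_replace(' ', '', $m[0]);
}

echo implode(', ', allPalCodes("PAL990 x PAL 123")), "\n"; // PAL990, PAL123
```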
$string = "123ABC1234 *$%^&abc.";
$newstr = preg_replace('/[^a-zA-Z\']/','',$string);
echo $newstr;
Output: ABCabc
I have a problem converting a string to a number. I am not comfortable with patterns like !\d+!.
I used the code below, but the approach is not correct.
Thank you.
preg_match_all('!\d+!', $product_price[$i], $matches);
$price_extracted = (float)implode('.', $matches[0]);
$item['normal_price'] = $price_extracted;
if ($item['normal_price'] > 800) ......
I get these results:
1 299,99 $ (original) is converted to 1.2999 and should be 1299.99
549,99 $ (original) is converted to 549.99 and should be 549.99
44,99 $ (original) is converted to 44.99 and should be 44.99
The problem with your approach is that every run of digits becomes a separate match.
In the first string you provided, the thousands are separated by a space, so "1" is registered as a match of its own:
preg_match_all('!\d+!', '1 299,99 $', $matches) fills $matches as follows:
$matches[0][0] = 1
$matches[0][1] = 299
$matches[0][2] = 99
If you take my approach, though, and first replace all whitespace with nothing and then split the numbers into the array:
preg_match_all('!\d+!', preg_replace('/\s/', '', '1 299,99 $'), $matches) fills it as follows:
$matches[0][0] = 1299
$matches[0][1] = 99
After that you can implode them:
$price_extracted = (float)implode(".", $matches[0]);
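Here is a minimal sketch that puts the pieces together for the three sample prices. The name parsePrice is hypothetical, and the sketch assumes the last digit group is always the cents.

```php
<?php
function parsePrice(string $raw): float
{
    // Remove the whitespace used as a thousands separator: "1 299,99 $" -> "1299,99$".
    $compact = preg_replace('/\s/', '', $raw);
    // Collect the digit runs into $matches[0]: ["1299", "99"].
    preg_match_all('!\d+!', $compact, $matches);
    // Join the runs with a dot and convert: 1299.99.
    return (float)implode('.', $matches[0]);
}

echo parsePrice('1 299,99 $'), "\n"; // 1299.99
```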
EDIT
A little explanation about preg_replace, preg_match_all and regex:
The regex '!\d+!' uses ! as the pattern delimiter instead of the more common / (PHP accepts any non-alphanumeric, non-backslash, non-whitespace character as a delimiter) and searches for digits (\d). The "+" means "one or more". So the line
preg_match_all('!\d+!', 'someString', $myArray)
could be translated into english as follows:
Find all runs of one or more digits,
and put each run into its own element of $myArray[0].
The second regex used in my solution, '/\s/', searches for whitespace. preg_replace() is a regex "find and replace" function, so:
preg_replace('/\s/', '', 'someString')
translated to english:
Find all occurrences of whitespace in 'someString' and replace them with nothing
For reference:
preg_match_all
preg_replace
regex cheat sheet
You can test patterns on:
PHP Live Regex
This question already has answers here:
PHP substring extraction. Get the string before the first '/' or the whole string
I need to find a way in PHP to remove the last portions of 2 strings using regexes. This way, once they are stripped of the extra characters, I can find a match between them. Here is an example of the type of string data I am dealing with:
categories_widget-__i__
categories_widget-10
So I would like to remove:
-__i__ from the first string
-10 from the second string
Thanks in advance.
(.*)-
This simple regex can do the job if - is the splitting criterion: the greedy (.*) captures everything before the last hyphen.
See demo: http://regex101.com/r/rX0dM7/7
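In PHP, you can read the captured part from the match array. Here is a sketch (the helper name beforeLastHyphen is hypothetical):

```php
<?php
function beforeLastHyphen(string $s): string
{
    // Greedy (.*) backtracks only as far as the last "-", so group 1 holds
    // everything before it; without a hyphen, the string is returned unchanged.
    return preg_match('/(.*)-/', $s, $m) ? $m[1] : $s;
}

echo beforeLastHyphen('categories_widget-__i__'), "\n"; // categories_widget
echo beforeLastHyphen('categories_widget-10'), "\n";    // categories_widget
```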
$str1 = "categories_widget-__i__";
$str2 = "categories_widget-10";
$arr1 = explode("-", $str1);
$arr2 = explode("-", $str2);
echo $arr1[0];
echo $arr2[0];
Is the last occurrence of a hyphen the only thing that's important? If so you don't need regex:
$firstPart = substr($str, 0, strrpos($str, '-'));
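One edge case: if the string contains no hyphen, strrpos() returns false, and substr($str, 0, false) returns an empty string. Here is a guarded sketch (stripLastSegment is a hypothetical name):

```php
<?php
function stripLastSegment(string $str): string
{
    $pos = strrpos($str, '-');
    // strrpos() returns false when there is no hyphen; keep the string as-is then.
    return $pos === false ? $str : substr($str, 0, $pos);
}

echo stripLastSegment('categories_widget-10'), "\n"; // categories_widget
```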
You could try the code below to remove all the characters from the - up to the end of each line.
<?php
$text = <<<EOD
categories_widget-__i__
categories_widget-10
EOD;
echo preg_replace("~-.*$~m","",$text);
?>
Output:
categories_widget
categories_widget
- matches the literal - symbol, and .* matches any characters following it up to the end of the line. $ denotes the end of a line (the m modifier makes it match at the end of every line, not just at the end of the text). Replacing all the matched characters with an empty string gives you the desired output.