Why does this regular expression only capture one word? - php

I'm trying to learn Regular Expressions. I know the basics, and I'm not terrible at regex, I'm just no pro - hence I've got a question for you guys. If you know regex, I bet it'll be simple.
What I've got currently is this:
/(\w+)\s-{1}\s(\w+)\.{1}(\w{3,4})/
What I'm trying to do is create a little script for myself that tidies up my music collection by formatting all of the filenames. I know there's other stuff out there already, but this is a learning experience for me. I already screwed up all the titles once while trying to replace things like "Hell Aint A Bad Place To Be" with "Hell Aint a Bad Place To Be". In my wisdom I somehow ended up with "Hell Aint a ad Place to be" (I was looking for an A followed by a space and an uppercase character). Obviously that was a nightmare to fix and it had to be done manually. Needless to say, I'm testing samples first now.
Anyway, the above regex is sort of a stage 1 of many. Eventually I want to build it up, but for now I just need to get the simple bits working.
In the end I'd like to turn:
"arctic Monkeys- a fake tales of a san francisco"
into
"Arctic Monkeys - A Fake Tales of a San Francisco"
I know I'll need lookbehind assertions to grab when you're after a '-', because if the first word is 'a', 'of' etc. which I'd normally lowercase, I need to uppercase them (the above is a bad example for this use case I know).
Any way of fixing the existing regular expression would be great, and any tips on where to look on my cheatsheet to finish the rest off would be great too (I'm not looking for a fully-fledged answer, since I need to learn to do it myself; I just can't figure out why \w+ is only getting one word).

I believe there is a much simpler way of approaching this problem: split the string into words, based on a much simpler regex, and then apply whatever processing you want to those words. This will allow you to perform more complicated transformations on the text in a much cleaner way. Here's an example:
<?php
$song = "arctic Monkeys- a fake tales of a san francisco";
// Split on spaces or - (the - is still present
// because it's only a lookahead match)
$words = preg_split("/([\s]+|(?=-))/", $song);
/*
Output for print_r:
Array
(
[0] => arctic
[1] => Monkeys
[2] => -
[3] => a
[4] => fake
[5] => tales
[6] => of
[7] => a
[8] => san
[9] => francisco
)
*/
print_r($words);
$new_words = array();
foreach ($words as $k => $word) {
$new_words[] = processWord($word, $k, $words);
}
// This will output:
// Arctic Monkeys - A Fake Tales of a San Francisco
echo implode(' ', $new_words);
// You can add as many processing rules you want in here - in a very clean way
function processWord($word, $idx, $words) {
// Guard against index -1 on the first word before looking at the previous one
if ($idx > 0 && $words[$idx - 1] == '-') return ucfirst($word);
return strlen($word) > 2 ? ucfirst($word) : $word;
}
Here's an example of this code running: http://codepad.org/t6pc8WpR

I'm a little confused about what you're doing, but maybe this will help. Remember that + is 1 or more characters, * is 0 or more. So you probably want to do something like ([\s]*) to match spaces. You don't need to specify the {1} next to a single character.
So maybe something like this:
([\w\s]+)([\s]*)-([\s]*)([\w\s]+)\.([\w]{3,4})
I haven't tested this code, but I think you get the idea.

\w does not match whitespace, so \w+ stops at the first space. A working regex might be:
/^(.+?)\s*-\s*(.+)$/
Explanation:
^ - must start at the beginning of the string
(.+?) - match any character, be ungreedy
\s* - match any amount of whitespace that might exist (including none)
- - match a literal dash
\s* - any whitespace again
(.+) - remaining characters
$ - end of string
The capitalization would then happen in a separate replacement step.
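As a quick check of the pattern above, here is a minimal sketch that applies it to the sample string from the question. The variable names are placeholders of mine, and a real filename would still need its extension handled separately.

```php
<?php
// Sample title from the question (no file extension).
$song = "arctic Monkeys- a fake tales of a san francisco";

// Group 1: everything before the first dash (lazy); group 2: the rest.
if (preg_match('/^(.+?)\s*-\s*(.+)$/', $song, $m)) {
    $artist = $m[1]; // "arctic Monkeys"
    $title  = $m[2]; // "a fake tales of a san francisco"
    echo $artist . " - " . $title . "\n";
}
```

Because (.+?) is lazy, it stops at the first dash, so any later dashes end up in the second group.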

For the first part, \w doesn't match words, it matches word characters. It's equivalent to [A-Za-z0-9_].
Instead, try ([A-Za-z0-9_ ]+) as your first bit (it has an extra space inside the square brackets, and the \s that followed it is removed).

Here's what I have:
<?php
/**
* Formats a string into a title:
* * Pads all dashes with spaces.
* * Uppercase all words with 3 letters or more.
* * Uppercase first word and first words after dashes.
*
* @param $str
*
* @return string
*/
function format_title($str) {
//Remove all spaces before and after dashes.
//(These will return in the final product)
$str = preg_replace("/\s?-\s?/", "-", $str);
//Explode by dash.
$string_split_by_dash = explode("-", $str);
//For each sentence (separated by dashes)
foreach ($string_split_by_dash as &$sentence) {
//Uppercase all words.
$sentence = ucwords($sentence);
//Explode into words (by space)
$words = explode(" ", $sentence);
//For each word
foreach ($words as &$word) {
//If its length is smaller than 3
if (strlen($word) < 3) {
//Lowercase it.
$word = strtolower($word);
}
}
//Implode back into a sentence.
$sentence = implode(" ", $words);
//Uppercase the first word, regardless of length.
$sentence = ucfirst($sentence);
}
//Implode all sentences back by space-padded dash.
$str = implode(" - ", $string_split_by_dash);
return $str;
}
$str = "arctic Monkeys- a fake tales of a san francisco";
var_dump(format_title($str));
I'd argue it's more readable (and more documentable) than a regex. Probably more efficient too, (didn't check).

Related

php regex replace single capital with space capital [duplicate]

How would I go about splitting the word:
oneTwoThreeFour
into an array so that I can get:
one Two Three Four
with preg_match ?
I tried this but it just gives the whole word
$words = preg_match("/[a-zA-Z]*(?:[a-z][a-zA-Z]*[A-Z]|[A-Z][a-zA-Z]*[a-z])[a-zA-Z]*\b/", $string, $matches);
You can use preg_split as:
$arr = preg_split('/(?=[A-Z])/',$str);
I'm basically splitting the input string just before each uppercase letter. The regex used, (?=[A-Z]), matches the position just before an uppercase letter.
You can also use preg_match_all as:
preg_match_all('/((?:^|[A-Z])[a-z]+)/',$str,$matches);
Explanation:
( - Start of capturing parenthesis.
(?: - Start of non-capturing parenthesis.
^ - Start anchor.
| - Alternation.
[A-Z] - Any one capital letter.
) - End of non-capturing parenthesis.
[a-z]+ - one or more lowercase letters.
) - End of capturing parenthesis.
I know that this is an old question with an accepted answer, but IMHO there is a better solution:
<?php // test.php Rev:20140412_0800
$ccWord = 'NewNASAModule';
$re = '/(?#! splitCamelCase Rev:20140412)
# Split camelCase "words". Two global alternatives. Either g1of2:
(?<=[a-z]) # Position is after a lowercase,
(?=[A-Z]) # and before an uppercase letter.
| (?<=[A-Z]) # Or g2of2; Position is after uppercase,
(?=[A-Z][a-z]) # and before upper-then-lower case.
/x';
$a = preg_split($re, $ccWord);
$count = count($a);
for ($i = 0; $i < $count; ++$i) {
printf("Word %d of %d = \"%s\"\n",
$i + 1, $count, $a[$i]);
}
?>
Note that this regex, (like codaddict's '/(?=[A-Z])/' solution - which works like a charm for well formed camelCase words), matches only a position within the string and consumes no text at all. This solution has the additional benefit that it also works correctly for not-so-well-formed pseudo-camelcase words such as: StartsWithCap and: hasConsecutiveCAPS.
Input:
oneTwoThreeFour
StartsWithCap
hasConsecutiveCAPS
NewNASAModule
Output:
Word 1 of 4 = "one"
Word 2 of 4 = "Two"
Word 3 of 4 = "Three"
Word 4 of 4 = "Four"
Word 1 of 3 = "Starts"
Word 2 of 3 = "With"
Word 3 of 3 = "Cap"
Word 1 of 3 = "has"
Word 2 of 3 = "Consecutive"
Word 3 of 3 = "CAPS"
Word 1 of 3 = "New"
Word 2 of 3 = "NASA"
Word 3 of 3 = "Module"
Edited: 2014-04-12: Modified regex, script and test data to correctly split: "NewNASAModule" case (in response to rr's comment).
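To make the difference concrete, here is a small sketch comparing the plain lookahead split with the two-alternative pattern on the acronym test word. The variable names are mine, and the pattern is the one above written on a single line without the x flag.

```php
<?php
$word = 'NewNASAModule';

// Naive split before every capital: the acronym falls apart letter by letter.
$naive = preg_split('/(?=[A-Z])/', $word, -1, PREG_SPLIT_NO_EMPTY);

// Two-alternative split: lower|UPPER boundary, or UPPER|Upper-then-lower boundary.
$robust = preg_split('/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/', $word);

echo implode(' ', $naive), "\n";  // New N A S A Module
echo implode(' ', $robust), "\n"; // New NASA Module
```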
While ridgerunner's answer works great, it seems not to work with all-caps substrings that appear in the middle of a string. I use the following and it seems to deal with these just fine:
function splitCamelCase($input)
{
return preg_split(
'/(^[^A-Z]+|[A-Z][^A-Z]+)/',
$input,
-1, /* no limit for replacement count */
PREG_SPLIT_NO_EMPTY /*don't return empty elements*/
| PREG_SPLIT_DELIM_CAPTURE /*don't strip anything from output array*/
);
}
Some test cases:
assert(splitCamelCase('lowHigh') == ['low', 'High']);
assert(splitCamelCase('WarriorPrincess') == ['Warrior', 'Princess']);
assert(splitCamelCase('SupportSEELE') == ['Support', 'SEELE']);
assert(splitCamelCase('LaunchFLEIAModule') == ['Launch', 'FLEIA', 'Module']);
assert(splitCamelCase('anotherNASATrip') == ['another', 'NASA', 'Trip']);
A functionized version of @ridgerunner's answer.
/**
* Converts a camelCase string to have spaces between each word.
* @param $camelCaseString
* @return string
*/
function fromCamelCase($camelCaseString) {
$re = '/(?<=[a-z])(?=[A-Z])/';
$a = preg_split($re, $camelCaseString);
// implode()/join() take the separator first; the reversed order was removed in PHP 8.
return implode(" ", $a);
}
$string = preg_replace( '/([a-z0-9])([A-Z])/', "$1 $2", $string );
The trick is that the pattern repeats: it matches any lowercase letter or digit ($1) that is immediately followed by an uppercase letter ($2) and puts a space between them, at every such boundary in the string.
For example, in helloWorld $1 matches "o" and $2 matches "W", so you get "hello World"; HelloWorld becomes "Hello World" the same way. Then you can lowercase the words, uppercase the first word, explode them on the space, or use a _ or some other character to keep them separate.
Short and simple.
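A minimal sketch of that replacement wrapped in a function. The name spaceOutCamelCase is hypothetical, chosen only for this illustration.

```php
<?php
// Insert a space between a lowercase letter or digit and the uppercase letter after it.
function spaceOutCamelCase(string $s): string {
    return preg_replace('/([a-z0-9])([A-Z])/', '$1 $2', $s);
}

echo spaceOutCamelCase('helloWorld'), "\n"; // hello World
echo spaceOutCamelCase('HelloWorld'), "\n"; // Hello World
```

The result is a string rather than an array, so explode(' ', ...) is still needed if you want the individual words.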
When determining the best pattern for your project, you will need to consider the following pattern factors:
Accuracy (Robustness) -- whether the pattern is correct in all cases and is reasonably future-proof
Efficiency -- the pattern should be direct, deliberate, and avoid unnecessary labor
Brevity -- the pattern should use appropriate techniques to avoid unnecessary character length
Readability -- the pattern should be kept as simple as possible
The above factors also happen to be in the hierarchical order that I strive to obey. In other words, it doesn't make much sense to me to prioritize 2, 3, or 4 when 1 doesn't quite satisfy the requirements. Readability is at the bottom of the list for me because in most cases I can follow the syntax.
Capture Groups and Lookarounds often impact pattern efficiency. The truth is, unless you are executing this regex on thousands of input strings, there is no need to toil over efficiency. It is perhaps more important to focus on pattern readability which can be associated with pattern brevity.
Some patterns below will require some additional handling/flagging by their preg_ function, but here are some pattern comparisons based on the OP's sample input:
preg_split() patterns:
/^[^A-Z]+\K|[A-Z][^A-Z]+\K/ (21 steps)
/(^[^A-Z]+|[A-Z][^A-Z]+)/ (26 steps)
/[^A-Z]+\K(?=[A-Z])/ (43 steps)
/(?=[A-Z])/ (50 steps)
/(?=[A-Z]+)/ (50 steps)
/([a-z]{1})[A-Z]{1}/ (53 steps)
/([a-z0-9])([A-Z])/ (68 steps)
/(?<=[a-z])(?=[A-Z])/x (94 steps) ...for the record, the x is useless.
/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/ (134 steps)
preg_match_all() patterns:
/[A-Z]?[a-z]+/ (14 steps)
/((?:^|[A-Z])[a-z]+)/ (35 steps)
I'll point out that there is a subtle difference between the output of preg_match_all() and preg_split(). preg_match_all() will output a 2-dimensional array, in other words, all of the fullstring matches will be in the [0] subarray; if there is a capture group used, those substrings will be in the [1] subarray. On the other hand, preg_split() only outputs a 1-dimensional array and therefore provides a less bloated and more direct path to the desired output.
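The shape difference is easy to see side by side. This is only a sketch using two of the patterns listed above on the OP's sample input; the variable names are mine.

```php
<?php
$str = 'oneTwoThreeFour';

// preg_match_all() fills a nested array: [0] holds the full matches,
// [1] holds whatever the first capture group matched.
preg_match_all('/((?:^|[A-Z])[a-z]+)/', $str, $matches);

// preg_split() returns the pieces directly as a flat array.
$pieces = preg_split('/(?=[A-Z])/', $str);

var_export(count($matches));          // 2 (one subarray per group, plus the full match)
echo "\n";
var_export($matches[0] === $pieces);  // true
```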
Some of the patterns are insufficient when dealing with camelCase strings that contain an ALLCAPS/acronym substring in them. If this is a fringe case that is possible within your project, it is logical to only consider patterns that handle these cases correctly. I will not be testing TitleCase input strings because that is creeping too far from the question.
New Extended Battery of Test Strings:
oneTwoThreeFour
hasConsecutiveCAPS
newNASAModule
USAIsGreatAgain
Suitable preg_split() patterns:
/[a-z]+\K|(?=[A-Z][a-z]+)/ (149 steps) *I had to use [a-z] for the demo to count properly
/(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])/ (547 steps)
Suitable preg_match_all() pattern:
/[A-Z]?[a-z]+|[A-Z]+(?=[A-Z][a-z]|$)/ (75 steps)
Finally, my recommendations based on my pattern principles / factor hierarchy. Also, I recommend preg_split() over preg_match_all() (despite its patterns taking fewer steps) as a matter of directness to the desired output structure. (Of course, choose whatever you like.)
Code: (Demo)
$noAcronyms = 'oneTwoThreeFour';
var_export(preg_split('~^[^A-Z]+\K|[A-Z][^A-Z]+\K~', $noAcronyms, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_match_all('~[A-Z]?[^A-Z]+~', $noAcronyms, $out) ? $out[0] : []);
Code: (Demo)
$withAcronyms = 'newNASAModule';
var_export(preg_split('~[^A-Z]+\K|(?=[A-Z][^A-Z]+)~', $withAcronyms, 0, PREG_SPLIT_NO_EMPTY));
echo "\n---\n";
var_export(preg_match_all('~[A-Z]?[^A-Z]+|[A-Z]+(?=[A-Z][^A-Z]|$)~', $withAcronyms, $out) ? $out[0] : []);
I took cool guy Ridgerunner's code (above) and made it into a function:
echo deliciousCamelcase('NewNASAModule');
function deliciousCamelcase($str)
{
$formattedStr = '';
$re = '/
(?<=[a-z])
(?=[A-Z])
| (?<=[A-Z])
(?=[A-Z][a-z])
/x';
$a = preg_split($re, $str);
$formattedStr = implode(' ', $a);
return $formattedStr;
}
This will return: New NASA Module
Another option is matching /[A-Z]?[a-z]+/ - if you know your input is in the right format, it should work nicely.
[A-Z]? would match an uppercase letter (or nothing). [a-z]+ would then match all following lowercase letters, until the next match.
Working example: https://regex101.com/r/kNZfEI/1
You can split on a "glide" from lowercase to uppercase thus:
$parts = preg_split('/([a-z]{1})[A-Z]{1}/', $string, -1, PREG_SPLIT_DELIM_CAPTURE);
//PREG_SPLIT_DELIM_CAPTURE to also return bracketed things
var_dump($parts);
Annoyingly you will then have to rebuild the words from each corresponding pair of items in $parts
Hope this helps
First of all codaddict thank you for your pattern, it helped a lot!
I needed a solution that works in case the article 'a' exists:
e.g. thisIsACamelcaseSentence.
I found the solution in doing a two step preg_match and made a function with some options:
/*
* input: 'thisIsACamelCaseSentence' output: 'This Is A Camel Case Sentence'
* options $case: 'allUpperCase'[default] >> 'This Is A Camel Case Sentence'
* 'allLowerCase' >> 'this is a camel case sentence'
* 'firstUpperCase' >> 'This is a camel case sentence'
* @return: string
*/
function camelCaseToWords($string, $case = null){
// Fall back to the default when no case option is given.
$case = $case ?? 'allUpperCase';
// Find first occurrences of two capitals
preg_match_all('/((?:^|[A-Z])[A-Z]{1})/',$string, $twoCapitals);
// Split them with the 'zzzzzz' string. e.g. 'AZ' turns into 'AzzzzzzZ'
foreach($twoCapitals[0] as $match){
$firstCapital = $match[0];
$lastCapital = $match[1];
$temp = $firstCapital.'zzzzzz'.$lastCapital;
$string = str_replace($match, $temp, $string);
}
// Now split words
preg_match_all('/((?:^|[A-Z])[a-z]+)/', $string, $words);
$output = "";
$i = 0;
foreach($words[0] as $word){
switch($case){
case 'allUpperCase':
$word = ucfirst($word);
break;
case 'allLowerCase':
$word = strtolower($word);
break;
case 'firstUpperCase':
($i == 0) ? $word = ucfirst($word) : $word = strtolower($word);
break;
}
// remove the 'zzzzzz' placeholder from the word if it is present
$word = str_replace('zzzzzz','', $word);
$output .= $word." ";
$i++;
}
return $output;
}
Feel free to use it, and in case there is an 'easier' way to do this in one step please comment!
Full function based on @codaddict's answer:
function splitCamelCase($str) {
$splitCamelArray = preg_split('/(?=[A-Z])/', $str);
// implode() takes the separator first; the reversed order was removed in PHP 8.
return ucwords(implode(' ', $splitCamelArray));
}

Replace whole words from blacklist array instead of partial matches

I have an array of words
$banned_names = array('about','access','account');
The actual array is very long and contains bad words, so rather than risk breaking any rules I just added an example. The issue I'm having is the following:
$title = str_ireplace($filterWords, '****', $dn1['title']);
This works; however, one of my filtered words is 'rum', and if I were to post the word 'forum' it would display as 'fo****'.
So I need to only replace the word with **** if it matches the exact word from the array, if I was to give an example the phrase "Lets check the forum and see if anyone has rum", would be "Lets check the forum and see if anyone has ****".
Similar to the other answers but this uses \b in regex to match word boundaries (whole words). It also creates the regex-compatible banned list on the fly before passing to preg_replace_callback().
$dn1['title'] = 'access forum';
$banned_names = array('about','access','account','rum');
$banned_list = array_map(function($r) { return '/\b' . preg_quote($r, '/') . '\b/'; }, $banned_names);
$title = preg_replace_callback($banned_list, function($m) {
return $m[0][0].str_repeat('*', strlen($m[0])-1);
}, $dn1['title']);
echo $title; //a***** forum
You can use regex with \W to match a "non-word" character:
var_dump(preg_match('/\Wrum\W/i', 'the forum thing')); // returns 0 i.e. doesn't match
var_dump(preg_match('/\Wrum\W/i', 'the rum thing')); // returns 1 i.e. matches
The preg_replace() method takes an array of filters like str_replace() does, but you'll have to adjust the list to include the pattern delimiters and the \W on both sides. You could store the full patterns statically in your list:
$banlist = ['/\Wabout\W/i','/\Waccess\W/i', ... ];
preg_replace($banlist, '****', $text);
Or adjust the array on the fly to add those bits.
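One way to build the patterns on the fly is sketched below. Note that it swaps \W for the \b word-boundary assertion, which is my substitution and not this answer's pattern: \W has to consume a real character on each side, so it cannot match a banned word at the very start or end of the text, and it replaces the neighbouring characters along with the word. $banned and $text are placeholder names.

```php
<?php
$banned = ['about', 'access', 'rum'];
$text   = 'Rum on the forum: access denied';

// Wrap each word as /\bword\b/i; preg_quote() escapes any regex metacharacters.
$patterns = array_map(function ($w) {
    return '/\b' . preg_quote($w, '/') . '\b/i';
}, $banned);

$clean = preg_replace($patterns, '****', $text);
echo $clean, "\n"; // **** on the forum: **** denied
```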
You can use preg_replace() to look for your needles with a beginning/end of string tag after converting each string in your haystack to an array of strings, so you'll be matching on full words. Alternatively you can add spaces and continue to use str_ireplace() but that option would fail if your word is the first or last word in the string being checked.
Adding spaces (will miss first/last word, not recommended):
You'll have to modify your filtering array first of course. And yes the foreach could be simpler, but I hope this makes clear what I'm doing/why.
foreach($filterWords as $key => $value){
$filterWords[$key] = " ".$value." ";
}
$title = str_ireplace ( $filterWords, "****", $dn1['title'] );
OR
Breaking up long string (recommended):
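foreach($filterWords as $key => $value){
$filterWords[$key] = "/^".preg_quote($value, "/")."$/i"; //anchor each word to the beginning/end of the string
}
//preg_replace() returns the filtered words as an array, so join them back into a string
$title = implode(" ", preg_replace ( $filterWords, "****", explode(" ", $dn1['title']) ));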

PHP regex for math operations

So I'm trying to create a regex without success.
This is what I get as an input string:
String A: "##(ABC 50a- {+} UDF 69,22g,-) {*} 3##"
String B: "##ABC 0,10,- DEF {/} 9 ABC {*} UHG 3-##"
And this is what i need processed out of the regex:
Result A: "(50+69,22)*3"
Result B: "0,10/9*3"
I just can't get the number replacement combined with the operation symbols.
This is what i got:
'/[^0-9\+\-\*\/\(\)\.]/'
Thanks for any help.
One simple solution consists of getting rid of everything you don't want.
So replace this:
\{(.+?)\}|[^0-9,{}()]+|(?<!\d),|,(?!\d)
With $1.
Simple enough:
$input = "(ABC 50a- {+} UDF 69,22g,-) {*} 3";
$output = preg_replace('#\{(.+?)\}|[^0-9,{}()]+|(?<!\d),|,(?!\d)#', '$1', $input);
\{(.+?)\} part matches everything inside {...} and outputs it (it gets replaced by $1)
[^0-9,{}()]+ gets rid of every character not belonging to the ones we're trying to keep (it's replaced with an empty string)
(?<!\d),|,(?!\d) throws out commas which are not part of a number
Unfortunately, I can't say much else without a better spec.
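Here is the same replacement run against both sample strings from the question, including the ## wrappers. The helper name extractExpression is hypothetical; it only wraps the preg_replace() call shown above.

```php
<?php
// Hypothetical helper around the replacement above.
function extractExpression(string $input): string {
    $pattern = '#\{(.+?)\}|[^0-9,{}()]+|(?<!\d),|,(?!\d)#';
    // Only the first alternative sets $1; for the others it is empty,
    // so everything else that matches is simply deleted.
    return preg_replace($pattern, '$1', $input);
}

echo extractExpression('##(ABC 50a- {+} UDF 69,22g,-) {*} 3##'), "\n";   // (50+69,22)*3
echo extractExpression('##ABC 0,10,- DEF {/} 9 ABC {*} UHG 3-##'), "\n"; // 0,10/9*3
```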
A good start would be to write down in words the patterns that you want to match. For instance, you've said that you know the operations are inside {}, but that doesn't appear anywhere in your first attempt at a regex.
You can also break it down into separate sections, and then build it up later. So for instance you might say:
if you see parentheses, keep them in the final answer
a number is made up either of digits...
...or digits followed by a comma and more digits
an operation is always in curly braces, and is either +, -, *, or /
everything else should be thrown away
Given the above list:
matching parentheses is easy: [()]
matching a digit can be done with [0-9] or \d; at least one is +; so "digits" is \d+
comma digits is easy: ,\d+; make it optional with ? and you get \d+(,\d+)?
any of four operations is just [+*/-]; escape the / and - to get [+*\/\-]
don't forget that { and } have special meanings in regexes, so need to be escaped as \{ and \}; our list of operations in braces becomes: \{[+*\/\-]\}
Now we have to put it together; one way would be to use preg_match_all to find all occurrences of any of those patterns, in order, and then we can stick them back together. So our regex is just "this or this or this or this":
/[()]|\d+(,\d+)?|\{[+*\/\-]\}/
I haven't tested this, but given the explanation of how I arrived at it, hopefully you can figure out how to test parts of it and tweak it if necessary.
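As a starting point for that testing, here is a minimal sketch that runs the combined regex through preg_match_all() on the first sample string and glues the pieces back together. Stripping the braces with str_replace() is my addition, since the regex keeps them as part of each operator match.

```php
<?php
$input = '(ABC 50a- {+} UDF 69,22g,-) {*} 3';

// Collect parentheses, numbers (optionally with a comma part),
// and braced operators, in the order they appear.
preg_match_all('/[()]|\d+(,\d+)?|\{[+*\/\-]\}/', $input, $matches);

// $matches[0] holds the full matches; join them and drop the braces.
$expression = str_replace(['{', '}'], '', implode('', $matches[0]));

echo $expression, "\n"; // (50+69,22)*3
```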
I'm not good at regex, but I found another approach:
Do EXTRA check of input before running eval!!!
$string = "(ABC 50a- {+} UDF 69,22g) {*} 3";
$new ='';
$string = str_split($string);
foreach($string as $char) {
if(!ctype_alnum($char) || ctype_digit($char) ){
//you don't want letters, except symbols like {, ( etc
$new .=$char;
}
}
//echo $new; will output -> ( 50- {+} 69,22) {*} 3
//remove the brackets although you could put it in the if statement ...
$new = str_replace(array('{','}'),array('',''), $new);
//floating point numbers use dot not comma
$new = str_replace(',','.', $new);
$p = eval('return '.$new.';');
print $p; // -57.66
Used: ctype_digit, ctype_alnum, eval, str_split, str_replace
P.S: I assumed that the minus before the base operation is taken into account.
Just a quick try before leaving the office ;-)
$data = array(
"(ABC 50a- {+} UDF 69,22g) {*} 3",
"ABC 0,10- DEF {/} 9 ABC {*} UHG 3-"
);
foreach($data as $d) {
echo $d . " = " . extractFormula($d) . "\n";
}
function extractFormula($string) {
$regex = '/([()])|([0-9]+(,[0-9]+)?)|\{([+\*\/-])\}/';
preg_match_all($regex, $string, $matches);
$formula = implode(' ', $matches[0]);
$formula = str_replace(array('{', '}'), '', $formula); // use '' here: passing NULL as the replacement is deprecated in PHP 8.1
return $formula;
}
Output:
(ABC 50a- {+} UDF 69,22g) {*} 3 = ( 50 + 69,22 ) * 3
ABC 0,10- DEF {/} 9 ABC {*} UHG 3- = 0,10 / 9 * 3
If some one likes to fiddle around with the code, here is a live example: http://sandbox.onlinephpfunctions.com/code/373d76a9c0948314c1d164a555bed847f1a1ed0d

Regex pattern - match word that starts with #

My mobile application is just like a forum on mobile platform (WP7, Silverlight, .NET). One feature consists in tagging other users by writing "#" char followed by the username.
On server side, PHP, I'm parsing the text so that it matches tags and replace them with more readable string such as [tag](display name)|(user id)[/tag], but that's not important for our purpose.
In order to match tags, I'm replacing all special chars with a space so I can handle cases like .... #name, ..... Then I'm removing any multiple spaces that the previous command could have created. And finally I'm splitting on whitespace and then checking whether each word begins with the "#" char.
This is of course not the best method, but it's what I managed to do so far. There's a weak point: new line chars make my code fail. For example:
Hello, this is my first line
since I go to second and then I tag
#Jonh
who is a good boy
In case like this, the code I'm going to write below fails.
Where $resp is the text to parse.
if (strpos($resp,'#') !== false) {
$new_str = preg_replace('/[^a-zA-Z0-9_ \#]/', ' ', $resp);
$new_str = preg_replace('/\s+/', ' ', $new_str);
foreach(explode(' ', $new_str) as $word)
{
if (strpos($word, '#') === 0) {
//found my tag!
}
}
}
What would you advise to do?
Rather than using regex to replace everything you don't want to match, you should be able to immediately match any word with an # before it.
$subject = "..blah.. #name, ..blah..#hello,blah";
$pattern = '/#\w+/';
preg_match_all($pattern, $subject, $matches);
print_r($matches);
Output:
Array ( [0] => Array ( [0] => #name [1] => #hello ) )
/#\w+/ assumes that only numbers, letters and underscores (that's what \w matches) are valid in a name (e.g. #user123_xd). If you want to include, for example, the - (dash) in valid matches (e.g. #user1a-12), then the $pattern would be /#[\w-]+/
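The same approach copes with the multi-line example from the question, because the pattern never depends on how the words are separated. A small sketch follows; the capture group that drops the leading # is my addition.

```php
<?php
// Multi-line text like the example in the question.
$resp = "Hello, this is my first line\nsince I go to second and then I tag\n#Jonh\nwho is a good boy";

// Capture the name without the leading '#'; newlines need no special handling.
preg_match_all('/#(\w+)/', $resp, $matches);

print_r($matches[1]); // the tagged names, here just "Jonh"
```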

how to split by letter-followed-by-period?

I want to split text by the letter-followed-by-period rule. So I do this:
$text = 'One two. Three test. And yet another one';
$splitted_text = preg_split("/\w\./", $text);
print_r($splitted_text);
Then I get this:
Array ( [0] => One tw [1] => Three tes [2] => And yet another one )
But I do need it to be like this:
Array ( [0] => One two [1] => Three test [2] => And yet another one )
How to settle the matter?
It's splitting on the letter and the period together. If you want to make sure that there is a letter preceding the period without consuming it, you need to use a positive lookbehind assertion.
$text = 'One two. Three test. And yet another one';
$splitted_text = preg_split("/(?<=\w)\./", $text);
print_r($splitted_text);
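Note that the space after each period stays attached to the following piece. A minimal sketch of the lookbehind split with the pieces trimmed afterwards; the trimming step is my addition.

```php
<?php
$text = 'One two. Three test. And yet another one';

// The lookbehind checks for a word character but leaves it in the piece.
$parts = preg_split('/(?<=\w)\./', $text);

// The space after each period is still there, so trim every piece.
$parts = array_map('trim', $parts);

print_r($parts);
```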
Use explode:
$text = 'One two. Three test. And yet another one';
$splitted_text = explode(".", $text);
print_r($splitted_text);
Update
$splitted_text = explode(". ", $text);
Using ". " as the delimiter makes explode match the space as well.
The delimiter can be any string, including a whole phrase, not only a single char.
Using regex is overkill here; you can easily use explode. Since an explode-based answer has already been given, I'll give a regex-based answer:
$splitted_text = preg_split("/\.\s*/", $text);
Regex used: \.\s*
\. - A dot is a meta char. To match a literal dot we escape it.
\s* - zero or more white space.
If you use the regex: \.
You'll have some leading spaces in some of the pieces created.
