PHP regex preg_split - split by largest group only


I have the following regex
((\$|(\\\[)).*?(\$|(\\\])))
which should capture everything between a pair of $ signs or between \[ and \]. I tested it on http://gskinner.com/RegExr/ and it works.
The PHP variant (with doubled backslashes) is
((\$|(\\\\\[)).*?(\$|(\\\\\])))
and I would like to split my text based on that regex. How can I tell it to use only the first (largest) group and not the smaller ones?
preg_split('/((\$|(\\\\\[)).*?(\$|(\\\\\])))/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
So for the text "This is my $test$ for something." I should get the array
[0] => This is my
[1] => $test$
[2] => for something.
But I get
[0] => This is my
[1] => $test$
[2] => $
[3] =>
[4] => $
[5] => for something.

You would need something like this:
$text = 'This is my $test$ for \[something\] new!';
print_r(preg_split('/(\$.*?\$|\\\\\[.*?\\\\\])/', $text, -1, PREG_SPLIT_DELIM_CAPTURE));
Output:
Array
(
[0] => This is my
[1] => $test$
[2] => for
[3] => \[something\]
[4] => new!
)
IMHO, your regex is (probably) wrong: it also matches mismatched delimiters, as in Hello $there\]. If you need to capture text between two $ signs or between a pair of \[ and \], then you need a regex like:
/(\$.*?\$|\\\\\[.*?\\\\\])/
The first alternative, \$.*?\$, matches text between dollar signs; the second, \\\\\[.*?\\\\\], matches text between \[ and \].
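The answer works because its pattern has only one capturing group, and PREG_SPLIT_DELIM_CAPTURE returns the text of every capturing group in the delimiter pattern. A minimal sketch of another way to get the same effect while keeping the question's original structure is to mark the inner groups as non-capturing with (?:...). Note that this variant still accepts mismatched delimiters such as $there\], so the alternation pattern remains the safer choice.

```php
<?php
$text = 'This is my $test$ for \[something\] new!';

// (?:...) groups do not capture, so PREG_SPLIT_DELIM_CAPTURE
// returns only the outer (...) group as the delimiter.
// Caveat: like the original, this still allows "$ ... \]" mixes.
$pattern = '/((?:\$|\\\\\[).*?(?:\$|\\\\\]))/';

$parts = preg_split($pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);
print_r($parts);
```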

Related

Retrieving text outside square brackets in PHP

I need a way to capture the text outside square brackets, for example in the following string:
My [ground]name[test]Jhon[random]petor [shorts].
I'm using the preg_match_all expression below, but the result is not what I expected:
preg_match_all("/\[[^\]]*\]/", $text, $matches);
It gives me the text inside the square brackets instead.
Result:
Array (
[0] => [ground]
[1] => [test]
[2] => [random]
[3] => [shorts]
)
Expected output:
Array (
[0] => [My]
[1] => [name]
[2] => [Jhon]
[3] => [petor]
)
Any help would be great.

You can extend the pattern by adding \K, which discards what has been matched so far, and then use an alternation to match one or more word characters.
\[[^][]+]\K|\w+
$re = '/\[[^][]+]\K|\w+/';
$str = 'My [ground]name[test]Jhon[random]petor [shorts].';
preg_match_all($re, $str, $matches);
print_r(array_values(array_filter($matches[0])));
Output
Array
(
[0] => My
[1] => name
[2] => Jhon
[3] => petor
)
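Because of \K, every bracketed match is reported as an empty string, which is why the answer runs the result through array_filter(). One caveat: array_filter() without a callback also drops a word that is exactly "0". A minimal sketch that filters with strlen instead; the helper name wordsOutsideBrackets and the second input string are hypothetical.

```php
<?php
// Hypothetical helper: returns the words that sit outside [...] groups.
function wordsOutsideBrackets(string $str): array
{
    // \K resets the reported match start, so each "[...]" alternative yields
    // an empty-string match; the \w+ alternative yields the words in between.
    preg_match_all('/\[[^][]+]\K|\w+/', $str, $matches);

    // 'strlen' as the callback removes only empty strings; a bare
    // array_filter() would also remove a word that is exactly "0".
    return array_values(array_filter($matches[0], 'strlen'));
}

print_r(wordsOutsideBrackets('My [ground]name[test]Jhon[random]petor [shorts].'));
print_r(wordsOutsideBrackets('Room [a]0[b]name'));
```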

Split a string by forward slash but ignore the </ in the string

I have a string similar to this:
word1/word2/word3/<b>word3</b>
I want to explode this string by forward slash so that I get the following result:
Array = (
[0] => 'word1',
[1] => 'word2',
[2] => 'word3',
[3] => '<b>word3</b>'
);
But I'm unable to get that result. Instead I get the following:
Array = (
[0] => 'word1',
[1] => 'word2',
[2] => 'word3',
[3] => '<b>word3<',
[4] => 'b>'
);
What regular expression should I use with preg_split to achieve the expected result?

Use preg_split with a negative lookbehind:
$s = 'word1/word2/word3/<b>word3</b>';
$result = preg_split('~(?<!<)/~', $s);
print_r($result);
~ - used as the regex delimiter, so the / inside the pattern needs no escaping
(?<!<)/ - negative lookbehind assertion, assures that forward slash / is not preceded by <
The output:
Array
(
[0] => word1
[1] => word2
[2] => word3
[3] => <b>word3</b>
)
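The lookbehind only inspects the single character before each slash, so it protects closing tags such as </b> but not other slashes inside markup. A small sketch (the helper name splitOutsideClosingTags and both inputs are made-up examples) showing a case it handles and a self-closing tag like <br/>, whose slash is preceded by r and therefore still splits:

```php
<?php
// Hypothetical helper: split on "/" unless it immediately follows "<"
// (i.e. the start of a closing tag such as "</b>").
function splitOutsideClosingTags(string $s): array
{
    return preg_split('~(?<!<)/~', $s);
}

print_r(splitOutsideClosingTags('a/<i>b</i>/<b>c</b>'));

// A self-closing tag is NOT protected: "<br/>" is split into "<br" and ">".
print_r(splitOutsideClosingTags('x/<br/>'));
```

If the input can contain self-closing tags, the lookbehind alone is not enough.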

Break a string with optional space and number and a dot

I'm trying to split a string at an optional space followed by a number and a dot.
$string = "1.1Kumar/Sandeep MR*T0148.4801 12.23Pal/Sandeep MR*T643.948";
$regex1 = "/(\s*[0-9]+\.)/";
$regex2 = "/(?<=\s)[0-9]+\./";
I need to split at "1." and "12.".
The first regex gives:
Array
(
[0] =>
[1] => 1Kumar/Sandeep MR*T
[2] => 4801
[3] => 23Pal/Sandeep MR*T
[4] => 948
)
The second regex gives:
Array
(
[0] => 1.1Kumar/Sandeep MR*T0148.4801
[1] => 23Pal/Sandeep MR*T643.948
)
I am trying to get:
Array
(
[0] => 1Kumar/Sandeep MR*T0148.4801
[1] => 23Pal/Sandeep MR*T643.948
)

For your example string, this will work:
\b\d+\.
It ensures there is a word boundary before the numeric part (the start of the string or a preceding space provides one), so digits directly after a letter, as in T0148., are not a split point.
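Used directly with preg_split(), \b\d+\. still leaves an empty first element (the first split point is at offset 0) and a trailing space on the first piece. A sketch, assuming the desired output shown in the question, that also consumes the optional leading whitespace and drops empty pieces with PREG_SPLIT_NO_EMPTY:

```php
<?php
$string = '1.1Kumar/Sandeep MR*T0148.4801 12.23Pal/Sandeep MR*T643.948';

// \s* swallows the optional space before the number, \b requires a word
// boundary before the digits (so "T0148." is not a split point), and
// PREG_SPLIT_NO_EMPTY drops the empty piece produced by the split at offset 0.
$parts = preg_split('/\s*\b\d+\./', $string, -1, PREG_SPLIT_NO_EMPTY);
print_r($parts);
```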

PHP preg_match_all capture all patterns at front of string not mid string

Given the subject
AB: CD:DEF: HIJ99:message packet - no capture
I have crafted the following regex to correctly capture the 2-5 character targets, each of which is followed by a colon:
/\s{0,1}([0-9a-zA-Z]{2,5}):\s{0,1}/
It returns my matches even if erroneous spaces are added before or after the targets:
[0] => AB
[1] => CD
[2] => DEF
[3] => HIJ99
However, if the message packet contains a colon anywhere, for example
AB: CD:DEF: HIJ99:message packet no capture or: this either
the result set of course also includes [4] => or, which is not desired. I want to limit the matches to a consecutive run from the beginning of the string, and once that run is broken, stop looking for more matches in the remainder.
Edit 1:
I also tried ^(\s{0,1}([0-9a-zA-Z]{2,5}):\s{0,1}){1,5} to force matching from the beginning of the string, but then I lose the individual matches (a repeated group only keeps its last iteration):
[0] => Array
(
[0] => AB: CD:DEF: HIJ99:
)
[1] => Array
(
[0] => HIJ99:
)
[2] => Array
(
[0] => HIJ99
)
Edit 2:
Keep in mind that the subject is not fixed.
AB: CD:DEF: HIJ99:message packet - no capture
could just as easily be
ZY:xw:VU:message packet no capture or: this either
as far as the targets are concerned, and the message packet varies too. I'm just trying to rule out matching a ":" inside the message packet.

You could use \G, which anchors each match where the previous one ended, to match only a consecutive run from the start of the string.
$str = 'AB: CD:DEF: HIJ99:message packet no capture or: this either';
preg_match_all('/\G\s*([0-9a-zA-Z]{2,5}):\s*/', $str, $m);
print_r($m[1]);
Output:
Array
(
[0] => AB
[1] => CD
[2] => DEF
[3] => HIJ99
)
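Because \G only succeeds where the previous match ended (the start of the string for the first attempt), preg_match_all() stops at the first piece of text that is not a target and never reaches a colon inside the message packet. A sketch wrapping the answer's pattern in a hypothetical helper, leadingTargets, applied to the variable subjects from Edit 2 plus a made-up subject with no leading target:

```php
<?php
// Hypothetical helper: returns the leading run of 2-5 character targets,
// each followed by ":".
function leadingTargets(string $subject): array
{
    // \G: each match must start exactly where the previous one ended,
    // so matching stops at the first gap (e.g. the message packet).
    preg_match_all('/\G\s*([0-9a-zA-Z]{2,5}):\s*/', $subject, $m);
    return $m[1];
}

print_r(leadingTargets('AB: CD:DEF: HIJ99:message packet no capture or: this either'));
print_r(leadingTargets('ZY:xw:VU:message packet no capture or: this either'));

// No target at the very start means no matches at all.
print_r(leadingTargets('hello or: this either'));
```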

How about a negative lookbehind that rejects any target preceded by seven consecutive non-colon characters:
$str = 'AB: CD:DEF: HIJ99:message packet no capture or: this either';
preg_match_all('/(?<![^:]{7})([0-9a-zA-Z]{2,5}):/', $str, $m);
print_r($m);
Output:
Array
(
[0] => Array
(
[0] => AB:
[1] => CD:
[2] => DEF:
[3] => HIJ99:
)
[1] => Array
(
[0] => AB
[1] => CD
[2] => DEF
[3] => HIJ99
)
)

Get name from hashtag using regex

I have this string/content:
#Salome, #Jessi H and #O'Ren were playing at the #Lean's yard with "#Ziggy" the mouse.
I am trying to get all the names above. I use the # symbol to create a kind of hashtag on my website. Note that some names contain spaces, like #Jessi H, and some have characters before and after them, like "#Ziggy". So I don't mind if you suggest another way to manage the hashtags so that this works correctly. I was thinking that users whose names contain spaces could write the hashtag with quotes, like #"Jessi H". What do you think? Other examples:
#Lean's => #"Lean"'s
#Jessi H => #"Jessi H"
"#Jessi H" => (sorry, I don't know how to parse it)
#O'Ren => #"O'Ren"
What have I tried?
I'm just starting to use regex in PHP, but some SO questions have helped me get started. These are my attempts, using preg_match_all:
Result of /#(.*?)[,\" ]/:
Array ( [0] => Salome [1] => Jessi [2] => Charlie [3] => Lean's [4] => Ziggy" ) )
Result of /#"(.*?)"/ for names like #"name":
Empty array
I don't expect you to do it all for me. Some pseudo-code or something similar would be enough to point me in the right direction.

Try the following regex: '/#(?:"([^"]+)|([^\b]+?))\b/'
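This returns two capture groups: the first contains any quoted names (e.g. #"Jessi H" and #"O'Ren"), and the second contains any unquoted names (e.g. #Salome, #Lean). Inside a character class \b means a backspace, so [^\b]+? simply matches any characters lazily up to the next word boundary.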
$matches = array();
preg_match_all('/#(?:"([^"]+)|([^\b]+?))\b/', '#Salome, #"Jessi H" and #"O\'Ren" were playing at the #Lean\'s yard with "#Ziggy" the mouse.', $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => #Salome
[1] => #"Jessi H
[2] => #"O'Ren
[3] => #Lean
[4] => #Ziggy
)
[1] => Array
(
[0] =>
[1] => Jessi H
[2] => O'Ren
[3] =>
[4] =>
)
[2] => Array
(
[0] => Salome
[1] =>
[2] =>
[3] => Lean
[4] => Ziggy
)
)
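Each match fills exactly one of the two groups, and PHP reports the other one as an empty string. A sketch that merges them into a single list of names (the variable name $names is illustrative):

```php
<?php
$text = '#Salome, #"Jessi H" and #"O\'Ren" were playing at the #Lean\'s yard with "#Ziggy" the mouse.';
preg_match_all('/#(?:"([^"]+)|([^\b]+?))\b/', $text, $matches);

// Group 1 holds quoted names, group 2 unquoted ones; for each match,
// take whichever group is non-empty.
$names = [];
foreach ($matches[1] as $i => $quoted) {
    $names[] = $quoted !== '' ? $quoted : $matches[2][$i];
}
print_r($names);
```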

Are you setting these requirements, or can you choose them? If you can set them, I would suggest using _ instead of spaces, which would let you use a regex that captures everything up to the next whitespace:
/#(\S+)/
If spaces must be allowed and you're going with quotes, then the quotes should span the entire name, allowing for this regex:
/#"([^"]+)"/

