Regular expression to match | but not ||

My goal is to split a string such as a|b||c|d into a, b||c and d.
I tried several methods, but each one still splits the string at the double pipe:
Lookbehind:
var_dump(preg_split("/\\|(?<!\\|\\|)/", 'a|b||c|d'));
array (size=4)
0 => string 'a' (length=1)
1 => string 'b' (length=1)
2 => string '|c' (length=2)
3 => string 'd' (length=1)
Lookahead:
var_dump(preg_split("/(?!\\|\\|)\\|/", 'a|b||c|d'));
array (size=4)
0 => string 'a' (length=1)
1 => string 'b|' (length=2)
2 => string 'c' (length=1)
3 => string 'd' (length=1)
How can I make the split ignore double pipes?

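Just split your input on the regex below, which uses negative lookarounds: the lookbehind rejects a | that is preceded by another |, and the lookahead rejects a | that is followed by another |.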
(?<!\|)\|(?!\|)
| is a metacharacter in regex that acts as the alternation (logical OR) operator. To match a literal |, you need to escape it as \|

You can use this regex for splitting:
(?<!\|)\|(?!\|)
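As a quick check, the pattern can be passed to preg_split() with the string from the question. This is a minimal sketch; the variable names are illustrative, and the pattern is single-quoted so the backslashes reach the regex engine unchanged.

```php
<?php
// Split on a single pipe only: one that is neither preceded
// nor followed by another pipe.
$input = 'a|b||c|d';
$parts = preg_split('/(?<!\|)\|(?!\|)/', $input);

print_r($parts); // three elements: a, b||c, d
```

A run of three or more pipes would not be split either, since every pipe in such a run has a neighbouring pipe.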

Related

PHP preg_split() pattern

I need help finding a PCRE pattern using preg_split().
I'm using the regex pattern below to split a string based on its starting three-character code and semicolons. The pattern works fine in JavaScript, but now I need to use it in PHP. I tried preg_split() but I just get back junk.
// Each group will begin with a three letter code, have three segments separated by a semi-colon. The string will not be terminated with a semi-colon.
// Input string
$string_to_split = "AAA;RED;111;BBB;BLUE;22;CCC;GREEN;33;DDD;WHITE;44";
// This works in JS
// https://regex101.com
$pattern = "/[AAA|BBB|CCC|DDD][^;]*;[^;]*[;][^;]*/gi";
Match 1
Full match 0-11 `AAA;RED;111`
Match 2
Full match 12-23 `BBB;BLUE;22`
Match 3
Full match 24-36 `CCC;GREEN;33`
Match 4
Full match 37-49 `DDD;WHITE;44`
$pattern = "/[AAA|BBB|CCC|DDD][^;]*;[^;]*[;][^;]*/";
$split = preg_split($pattern, $string_to_split);
returns
array(5)
0:""
1:";"
2:";"
3:";"
4:""

Based on the additional information in your comments on the answers, I have updated my answer to be specific to your source format.
You might want something like this:
$subject = "AAA;RED;111;AAA;Oh my dog;12.34;AAA;Oh Long John;.4556;BBB;Oh Long Johnson;1.2323;BBB;Oh Don Piano;.33;CCC;Why I eyes ya;1.445;CCC;All the live long day;2.3343;DDD;Faith Hilling;.89";
$pattern = '/(?<=;|^)(AAA|BBB|CCC|DDD);([^;]*);((?:\d*\.)?\d+)(?=;|$)/';
preg_match_all($pattern, $subject,$matches);
var_dump($matches);
giving you
array (size=4)
0 =>
array (size=8)
0 => string 'AAA;RED;111' (length=11)
1 => string 'AAA;Oh my dog;12.34' (length=19)
2 => string 'AAA;Oh Long John;.4556' (length=22)
3 => string 'BBB;Oh Long Johnson;1.2323' (length=26)
4 => string 'BBB;Oh Don Piano;.33' (length=20)
5 => string 'CCC;Why I eyes ya;1.445' (length=23)
6 => string 'CCC;All the live long day;2.3343' (length=32)
7 => string 'DDD;Faith Hilling;.89' (length=21)
1 =>
array (size=8)
0 => string 'AAA' (length=3)
1 => string 'AAA' (length=3)
2 => string 'AAA' (length=3)
3 => string 'BBB' (length=3)
4 => string 'BBB' (length=3)
5 => string 'CCC' (length=3)
6 => string 'CCC' (length=3)
7 => string 'DDD' (length=3)
2 =>
array (size=8)
0 => string 'RED' (length=3)
1 => string 'Oh my dog' (length=9)
2 => string 'Oh Long John' (length=12)
3 => string 'Oh Long Johnson' (length=15)
4 => string 'Oh Don Piano' (length=12)
5 => string 'Why I eyes ya' (length=13)
6 => string 'All the live long day' (length=21)
7 => string 'Faith Hilling' (length=13)
3 =>
array (size=8)
0 => string '111' (length=3)
1 => string '12.34' (length=5)
2 => string '.4556' (length=5)
3 => string '1.2323' (length=6)
4 => string '.33' (length=3)
5 => string '1.445' (length=5)
6 => string '2.3343' (length=6)
7 => string '.89' (length=3)
The start marker should occur at the start of the string or immediately after a semicolon, so we use a lookbehind that looks for the start or a semicolon:
(?<=;|^)
We look for an alternative of AAA,BBB,CCC or DDD and capture it:
(AAA|BBB|CCC|DDD)
After a semicolon we look for any character except a semicolon. The quantifier * means 0 or more times. Use + if you want at least 1.
;([^;]*)
After the next semicolon we look for a number. This task has to be split into parts to fit a valid format. We first look for 0 or more digits followed by a dot:
(?:\d*\.)?
where (?:) means a non-capturing group.
After that we look for at least one digit: \d+
We want to capture both parts of the number, using parentheses after the semicolon:
;((?:\d*\.)?\d+)
This matches "1234", ".1234", "1.234", "12.34", "123.4" but not "1234." or "1.2.3"
Finally we want this to immediately occur before a semicolon or the end of string. Thus we do a lookahead:
(?=;|$)
Lookaheads and lookbehinds are zero-width: the text they check before or after the match is not part of the captured result.
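The number part can be checked in isolation against the examples listed above. This is only a sketch: the pattern is the number subpattern from this answer wrapped in ^ and $ for the test, and the variable names are illustrative.

```php
<?php
// Anchored copy of the number subpattern: optional "digits + dot",
// then at least one digit.
$numberPattern = '/^(?:\d*\.)?\d+$/';

$candidates = ['1234', '.1234', '1.234', '12.34', '123.4', '1234.', '1.2.3'];
$results = [];
foreach ($candidates as $candidate) {
    // preg_match() returns 1 on a match and 0 otherwise.
    $results[$candidate] = preg_match($numberPattern, $candidate) === 1;
}

var_dump($results); // true for the first five, false for "1234." and "1.2.3"
```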

I've modified your pattern a little, and added a couple of flags to preg_split.
The PREG_SPLIT_NO_EMPTY flag will exclude empty matches from the result, and PREG_SPLIT_DELIM_CAPTURE will include the captured value in the result.
$split = preg_split('/([abcd]{3};[^;]+;\d+);?/i', $string, -1, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
Result:
Array
(
[0] => AAA;RED;111
[1] => BBB;BLUE;22
[2] => CCC;GREEN;33
[3] => DDD;WHITE;44
)
Alternatively, and more suitably, you can use preg_match_all with the following pattern.
preg_match_all('/([abcd]{3};[^;]+;\d+);?/i', $string, $matches);
print_r($matches[0]);
Result:
Array
(
[0] => AAA;RED;111
[1] => BBB;BLUE;22
[2] => CCC;GREEN;33
[3] => DDD;WHITE;44
)

You don't want to split your string but to match elements, so use preg_match_all:
$str = "AAA;RED;111;AAA;Oh my dog;2.34;AAA;Oh Long John;.4556;BBB;Oh Long Johnson;1.2323;BBB;Oh Don Piano;.33;CCC;Why I eyes ya;1.445;CCC;All the live long day;2.3343;DDD;Faith Hilling;.89";
$res = preg_match_all('/(?:AAA|BBB|CCC|DDD);[^;]*;[^;]*;?/', $str, $m);
print_r($m[0]);
Output:
Array
(
[0] => AAA;RED;111;
[1] => AAA;Oh my dog;2.34;
[2] => AAA;Oh Long John;.4556;
[3] => BBB;Oh Long Johnson;1.2323;
[4] => BBB;Oh Don Piano;.33;
[5] => CCC;Why I eyes ya;1.445;
[6] => CCC;All the live long day;2.3343;
[7] => DDD;Faith Hilling;.89
)
Explanation:
/ : regex delimiter
(?:AAA|BBB|CCC|DDD) : non capture group AAA or BBB or CCC or DDD
; : a semicolon
[^;]* : 0 or more any character that is not a semicolon
; : a semicolon
[^;]* : 0 or more any character that is not a semicolon
;? : optional semicolon
/ : regex delimiter

Can not match the last group of numbers using php preg_match()

preg_match_all("/(\d{12})(?:,|$)/","111762396541,561572500056,561729950637,561135281443",$matches);
var_dump($matches):
array (size=2)
0 =>
array (size=4)
0 => string '561762396543,' (length=13)
1 => string '561572500056,' (length=13)
2 => string '561729950637,' (length=13)
3 => string '561135281443' (length=12)
1 =>
array (size=4)
0 => string '561762396543' (length=12)
1 => string '561572500056' (length=12)
2 => string '561729950637' (length=12)
3 => string '561135281443' (length=12)
But I want the $matches like this:
array (size=4)
0 => string '561762396543,' (length=13)
1 => string '561572500056,' (length=13)
2 => string '561729950637,' (length=13)
3 => string '561135281443' (length=12)
I want to match groups of numbers (each has 12 digits) along with a trailing comma if there is one. The exception is the last group, which has no comma to match because it reaches the end of the line.

Try this instead:
preg_match_all("/(\d{12}(?:,|$))/","111762396541,561572500056,561729950637,561135281443",$matches);
When the $ is inside character class brackets [ ], it matches a literal $ character, not the end of the line.
EDIT: If you want to include the comma in your matches, then just use the above code sample and look at $matches[0].
If you want a simpler syntax, \b matches the word boundary after the last digit, which occurs both before a comma and at the end of the line. Because \b is zero-width, the comma itself is not included in the match:
preg_match_all("/(\d{12}\b)/","111762396541,561572500056,561729950637,561135281443",$matches);
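The difference between the two patterns can be seen by running both on the same input. This is a minimal sketch with illustrative variable names; the capture group is dropped because only the full matches in index 0 are needed.

```php
<?php
$input = '111762396541,561572500056,561729950637,561135281443';

// Comma (or end of string) inside the match: the commas are kept.
preg_match_all('/\d{12}(?:,|$)/', $input, $withComma);

// Zero-width word boundary: the commas are not part of the match.
preg_match_all('/\d{12}\b/', $input, $withoutComma);

print_r($withComma[0]);    // first three entries end with a comma
print_r($withoutComma[0]); // digits only
```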

preg match all get group multiple times

I am trying to write a regular expression that captures a subgroup every time it is found. This is my code:
$string2 = 'cabbba';
preg_match_all('#c(a(b)*a)#',$string2,$result3,PREG_SET_ORDER);
var_dump($result3);
My goal is to get 'b' as a captured group each time (so 3 times). This code outputs the following:
array (size=1)
0 =>
array (size=3)
0 => string 'cabbba' (length=6)
1 => string 'abbba' (length=5)
2 => string 'b' (length=1)
I want it to show 'b' each time it appears, so something like this:
array (size=1)
0 =>
array (size=3)
0 => string 'cabbba' (length=6)
1 => string 'abbba' (length=5)
2 => array (size=3)
0 => string 'b' (length 1)
1 => string 'b' (length 1)
2 => string 'b' (length 1)
This is a simplified example, in the real code the subpattern 'b' will be different each time, but it follows the same pattern.

This is possible only with the \G anchor, because a repeated capture group keeps only its last iteration.
(?:ca|\G)(b)(?=b|(a))
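A minimal sketch of how this pattern might be used with preg_match_all(), assuming the simplified subject from the question. Each match consumes a single 'b', and \G makes the next match start exactly where the previous one ended, so group 1 is filled once per 'b'.

```php
<?php
$subject = 'cabbba';

// First match starts at "ca"; later matches continue via \G.
preg_match_all('/(?:ca|\G)(b)(?=b|(a))/', $subject, $matches);

print_r($matches[1]); // one entry per 'b' captured by group 1
```

Note that \G also matches at the very start of the subject, so a string that begins with 'b' (without the 'ca' prefix) would be matched as well; the real pattern may need an extra guard for that case.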

Did you try using a non-greedy modifier for your b*?
$string2 = 'cabbba';
preg_match_all('#c(a(b)*?a)#', $string2, $result3, PREG_SET_ORDER);
var_dump($result3);
Excuse me if it's not what you asked, I'm not sure I really understood your needs...
UPDATE:
Sorry, previous answer is wrong, please ignore it...
I'm trying to elaborate a right one...
Just trying something like
preg_match_all('#c(a(?:(b{1}))*a)#', $string2, $result3, PREG_SET_ORDER);
but it doesn't work, either... :-(
UPDATE 2:
See Avinash Raj answer, I think it's quite good...

preg_match Regex Matching Full String

I have a simple regex, but it's matching more than I want...
Basically, I'm trying to match certain operators (eg. > < != =) followed by a string.
Regex:
/^(<=|>=|<>|!=|=|<|>)(.*)/
Example subject:
>42
What I'm getting:
array (size=3)
0 => string '>42' (length=3)
1 => string '>' (length=1)
2 => string '42' (length=2)
What I'm trying to get:
array (size=2)
0 => string '>' (length=1)
1 => string '42' (length=2)
What I don't understand is that my regex works perfectly on Regex101
Edit: To clarify, how can I get rid of the full string match?

Your result is correct. Group 0 is the whole match, group 1 is the first capture group, and group 2 is the second.

You are getting all 3 groups: \0, \1, and \2 (see the group matching at the bottom of the page).
Assuming your matches are in $matches, you can run array_shift($matches) to remove the \0 match if you wish.
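A short sketch of that suggestion, using the regex and subject from the question (the $pattern variable name is illustrative):

```php
<?php
$pattern = '/^(<=|>=|<>|!=|=|<|>)(.*)/';

if (preg_match($pattern, '>42', $matches) === 1) {
    array_shift($matches); // drop index 0, the full match
    print_r($matches);     // operator first, operand second
}
```

array_shift() also renumbers the remaining entries, so the operator ends up at index 0 and the operand at index 1.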

strange behavior of preg_match_all()

Following code:
$string ='۱۲۳۴۵۶۷۸۹۰';
$regex ='#۱#';
preg_match_all($regex,$string,$match);
var_dump($match);
will output:
array(1) {
[0] =>
array(1) {
[0] =>
string(2) "۱"
}
}
but
$regex2 ='#[۱]#';
preg_match_all($regex2,$string,$match);
var_dump($match);
will output
array (size=1)
0 =>
array (size=11)
0 => string '�' (length=1)
1 => string '�' (length=1)
2 => string '�' (length=1)
3 => string '�' (length=1)
4 => string '�' (length=1)
5 => string '�' (length=1)
6 => string '�' (length=1)
7 => string '�' (length=1)
8 => string '�' (length=1)
9 => string '�' (length=1)
10 => string '�' (length=1)
In fact I want to use a regex like [۱۲۳۴۵۶۷۸۹۰], but the function gives strange results with such patterns. I am using PHP 5.4.

Try adding the Unicode flag:
$regex = '#[۱]#u';
The reason is that ۱ is actually two bytes long in UTF-8. On its own, it's harmless, because the pattern matches only that exact byte sequence. Inside a character class, however, each byte is treated as a separate member, so any one of them may match individual bytes of the other characters, which it does here because these characters are close together in the code chart and share a leading byte.
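The effect can be reproduced by running the same character class with and without the u modifier. This is a sketch that assumes the source file is saved as UTF-8; the variable names are illustrative.

```php
<?php
// Assumes this source file is saved as UTF-8.
$string = '۱۲۳۴۵۶۷۸۹۰';

// Without /u the class is a set of bytes, so single bytes match.
preg_match_all('#[۱]#', $string, $byteMatches);

// With /u the class holds one code point, so only the whole character matches.
preg_match_all('#[۱]#u', $string, $charMatches);

echo count($byteMatches[0]), "\n"; // number of single-byte matches
echo count($charMatches[0]), "\n"; // number of whole-character matches
```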
