PHP preg_match: matching special characters ([ ] ( ) { } etc.) literally

I'm new to preg_match. I know the characters [ ] have a special meaning in a pattern, but how do I treat them as literal characters that I want to match?
For example:
$word = '[Hello], Im steve';
preg_match_all('/[Hello]/', $word, $match);
print_r($match);
Output:
Array ( [0] => Array ( [0] => H [1] => e [2] => l [3] => l [4] => o [5] => e [6] => e ) )
The pattern above is treated as a character class, so it matches the individual letters H, e, l and o instead of the literal '[' and ']'.
How can I overcome this?

Just escape each bracket with a backslash (\):
preg_match_all('/\[Hello\]/', $word, $match);
print_r($match);
Update:
For a case-insensitive match, add the i modifier after the closing delimiter:
preg_match_all('/\[Hello\]/i', $word, $match);
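When the text to match comes from a variable, escaping by hand gets error-prone. A minimal sketch, assuming the literal text lives in a variable ($needle is a name made up for this example), is to let preg_quote() add the backslashes:

```php
<?php
$word = '[Hello], Im steve';
$needle = '[Hello]'; // hypothetical variable holding the literal text to find

// preg_quote() escapes regex metacharacters; the second argument also
// escapes the delimiter character used around the pattern.
$pattern = '/' . preg_quote($needle, '/') . '/i';

preg_match_all($pattern, $word, $match);
print_r($match);
```

Here $pattern ends up as /\[Hello\]/i, the same pattern as the hand-escaped one above.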

Related

regex split a string between [ and ]

My string looks something like '[15][18][22]' and I would like to split it into an array of [15], [18] and [22]. I'm trying with this regex:
\[\d+\]
But it only splits off the first one.
Thanks for any help.
You are better off using preg_match_all with what you want to capture:
if (preg_match_all('/\[\d+]/', $str, $m)) {
print_r($m[0]);
}
Output:
Array
(
[0] => [15]
[1] => [18]
[2] => [22]
)
Or else you may use this preg_split with a capture group:
$str = '[15][18][22]';
$arr = preg_split('/(\[\d+])/', $str, -1,
PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($arr);
Output:
Array
(
[0] => [15]
[1] => [18]
[2] => [22]
)
It just doesn't get any simpler than this. Three characters in the pattern. You only need to split on the zero-width position after each ]. \K resets the start of the reported match, so the ] matched before it is kept in the output rather than consumed as a delimiter.
Pattern: ~]\K~
Code:
$string = '[15][18][22]';
var_export(preg_split('~]\K~', $string, -1, PREG_SPLIT_NO_EMPTY));
Output:
array (
0 => '[15]',
1 => '[18]',
2 => '[22]',
)
This should be efficient because the pattern has no capture groups, lookarounds, or alternation to slow it down.
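The three approaches above agree on '[15][18][22]', but they are not interchangeable on other input. A small sketch for comparing them side by side (the second input string, 'a[15]b', is made up for illustration):

```php
<?php
// Run the three techniques from the answers above on the same input.
$run = function (string $s): array {
    preg_match_all('/\[\d+]/', $s, $m);
    return [
        'match_all'   => $m[0],
        'capture'     => preg_split('/(\[\d+])/', $s, -1,
                             PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY),
        'reset_match' => preg_split('~]\K~', $s, -1, PREG_SPLIT_NO_EMPTY),
    ];
};

$clean = $run('[15][18][22]'); // only bracketed numbers
$mixed = $run('a[15]b');       // hypothetical input with text around the brackets

var_export($clean);
var_export($mixed);
```

With text around the brackets, preg_match_all keeps only the bracketed parts, the capture-group split also returns the surrounding text as separate elements, and the \K split leaves the leading text attached to the bracket that follows it. Which one fits depends on what the real input can contain.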

preg_split every character, but don't split if is quote [duplicate]

I'm using the following code to split my UTF-8 strings to characters:
$characters = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);
In some cases, a character might have a single quote after it, for example: hel'lo. I want to keep that quote with the character before it.
Using the regex above, my array is this:
Array
(
[0] => h
[1] => e
[2] => l
[3] => '
[4] => l
[5] => o
)
And I want the array to be:
Array
(
[0] => h
[1] => e
[2] => l'
[3] => l
[4] => o
)
How can I do it?
Thanks!
(the single quote can be at the beginning of the string, at the end of it and in the middle of it).
Rather than split, you can do preg_match_all using
'?\p{L}'?
i.e. an optional ' before and after the Unicode letter:
preg_match_all("/'?\\p{L}'?/u", $str, $matches);
Alternatively, keep preg_split and add a negative lookahead, (?!'), so that no split happens directly before a quote:
$characters = preg_split("/(?!')/u", $word, -1, PREG_SPLIT_NO_EMPTY);
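The two answers above give the same result for hel'lo but treat a quote at the very start of the string differently. A quick sketch to check both (the second input, 'hi with a leading quote, is made up for illustration):

```php
<?php
$word = "hel'lo";

// Approach 1: match an optional quote, one Unicode letter, an optional quote.
preg_match_all("/'?\\p{L}'?/u", $word, $matches);
$byMatch = $matches[0];

// Approach 2: split at every position that is not directly followed by a quote.
$bySplit = preg_split("/(?!')/u", $word, -1, PREG_SPLIT_NO_EMPTY);

print_r($byMatch);
print_r($bySplit);

// Hypothetical input with the quote in front of the first letter.
$leading = "'hi";
preg_match_all("/'?\\p{L}'?/u", $leading, $m);
$leadMatch = $m[0];
$leadSplit = preg_split("/(?!')/u", $leading, -1, PREG_SPLIT_NO_EMPTY);

print_r($leadMatch);
print_r($leadSplit);
```

For the leading quote, preg_match_all attaches it to the following letter, while the lookahead split leaves it as an element of its own, since there is no preceding character to keep it with. Note also that the preg_match_all pattern only returns letters (\p{L}), so digits or punctuation other than the quote are skipped.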

Whitespace delimiter not being captured in preg split

<?php
$text = "Testing text splitting\nWith a newline!";
$textArray = preg_split('/\s+/', $text, 0, PREG_SPLIT_DELIM_CAPTURE);
print_r($textArray);
The above code will output the following:
Array
(
[0] => Testing
[1] => text
[2] => splitting
[3] => With
[4] => a
[5] => newline!
)
However to my knowledge the PREG_SPLIT_DELIM_CAPTURE flag should be capturing the whitespace delimiters in the array. Am I missing something?
edit: Ok, after rereading the documentation I now understand PREG_SPLIT_DELIM_CAPTURE is not meant for this case. My desired output would be something like:
Array
(
[0] => Testing
[1] => ' '
[2] => text
[3] => ' '
[4] => splitting
[5] => '\n'
[6] => With
[7] => ' '
[8] => a
[9] => ' '
[10] => newline!
)
If you read the manual entry for PREG_SPLIT_DELIM_CAPTURE once again, it says:
If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.
So the delimiter expression (in your case \s+) is captured, i.e. added to the result, only when it is wrapped in parentheses. Now you can write:
$text = "Testing text splitting\nWith a newline!";
$textArray = preg_split('/(\s+)/', $text, 0, PREG_SPLIT_DELIM_CAPTURE);
// parentheses!
print_r($textArray);
You can also use T-Regx library:
$textArray = pattern('(\s+)')->split("Testing text splitting\nWith a newline!")->inc();
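To see the effect of the parentheses side by side, here is a small sketch that runs both patterns with plain preg_split on the text from the question (the variable names $without and $with are made up for the example):

```php
<?php
$text = "Testing text splitting\nWith a newline!";

// No capture group: the flag has nothing to capture, delimiters are dropped.
$without = preg_split('/\s+/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

// Capture group around the delimiter: each whitespace run is returned too.
$with = preg_split('/(\s+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

var_export($without);
var_export($with);

// Because nothing is discarded, joining the pieces restores the input.
var_export(implode('', $with) === $text);
```

A side benefit of capturing the delimiters is that the original string can be rebuilt from the array, which is handy when the whitespace has to be preserved.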

how to split a string containing numbers inside or outside parentheses using preg_match_all

I have a string that looks something like this:
535354 345356 3543674 34667 2345347 -3536 4532452 (234536 2345634 -4513453) (2345 -13254 13545)
The text between () is always at the end of the string (at least for now).
I need to split it into an array similar to this:
[0] => [0] 535354,345356,3543674,34667,2345347,-3536,4532452
[1] => [0] 234536,2345634,-4513453
=> [1] 2345,-13254,13545
What expression should I use for preg_match_all?
The best I could get with my limited knowledge is /([0-9]{1,}){1,}.*(?=(\(.*\)))/U but I still get some unwanted elements.
You may use a regex that will match chunks of numbers outside of parentheses and those inside with "~(?<=\()\s*$numrx\s*(?=\))|\s*$numrx~" where a $numrx stands for the number regex (that can be enhanced further).
The -?\d+(?:\s+-?\d+)* matches an optional -, 1 or more digits, and then 0+ sequences of 1+ whitespaces followed with optional - and 1+ digits. (?<=\()\s*$numrx\s*(?=\)) matches the same only if preceded with ( and followed with ).
See this PHP snippet:
$s = "535354 345356 3543674 34667 2345347 -3536 4532452 (234536 2345634 -4513453) (2345 -13254 13545)";
$numrx = "-?\d+(?:\s+-?\d+)*";
preg_match_all("~(?<=\()\s*$numrx\s*(?=\))|\s*$numrx~", $s, $m);
$res = array();
foreach ($m[0] as $k) {
array_push($res,explode(" ",trim($k)));
}
print_r($res);
Output:
[0] => Array
(
[0] => 535354
[1] => 345356
[2] => 3543674
[3] => 34667
[4] => 2345347
[5] => -3536
[6] => 4532452
)
[1] => Array
(
[0] => 234536
[1] => 2345634
[2] => -4513453
)
[2] => Array
(
[0] => 2345
[1] => -13254
[2] => 13545
)
You can use this regex in preg_match_all (note that \d+ on its own does not include a leading minus sign):
$re = '/\d+(?=[^()]*[()])/';
RegEx Breakup:
\d+ # match 1 or more digits
(?= # lookahead start
[^()]* # match anything but ( or )
[()] # match ( or )
) # lookahead end
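Neither answer produces the exact nested shape the question asks for (outside numbers under index 0, parenthesized groups under index 1). One possible sketch, assuming as the question states that the groups are never nested; the helper $toList and the variable names are made up for this example:

```php
<?php
$s = "535354 345356 3543674 34667 2345347 -3536 4532452 (234536 2345634 -4513453) (2345 -13254 13545)";

// Hypothetical helper: split a chunk of whitespace-separated numbers.
$toList = function (string $chunk): array {
    return preg_split('~\s+~', trim($chunk), -1, PREG_SPLIT_NO_EMPTY);
};

// Capture the content of every (...) group; assumes no nested parentheses.
preg_match_all('~\(([^()]*)\)~', $s, $groups);

// Whatever remains after removing the groups is the "outside" part.
$outside = preg_replace('~\([^()]*\)~', '', $s);

$result = [
    [$toList($outside)],            // index 0: numbers outside parentheses
    array_map($toList, $groups[1]), // index 1: one list per parenthesized group
];

print_r($result);
```

The numbers stay as strings, minus signs included; join each list with implode(',', ...) if comma-separated strings are wanted as in the question.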

Sscanf with regex to match even an empty string

I am using this format string in sscanf:
sscanf($seat, "%d-%[^(](%[^#]#%[^)])");
It works well when I get this kind of string:
173-9B(AA#3.45 EUR#32H)
but when I get this kind of string:
173-9B(#3.14 EUR#32H)
the parsing breaks. How can I also accept an empty string between the first ( and the first #?
You would be better off using a regex in preg_match to handle optional data presence in input:
$re = '/(\d*)-([^(]*)\(([^#]*)#([^)]*)\)/';
preg_match($re, '173-9B(#3.45 EUR#32H)', $m);
unset($m[0]);
print_r($m);
Output:
Array
(
[1] => 173
[2] => 9B
[3] =>
[4] => 3.45 EUR#32H
)
And 2nd example:
preg_match($re, '173-9B(AA#3.45 EUR#32H)', $m);
unset($m[0]);
print_r($m);
Array
(
[1] => 173
[2] => 9B
[3] => AA
[4] => 3.45 EUR#32H
)
Using ([^#]*) makes that group match 0 or more characters that are not #, so an empty field is accepted.
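If numeric indexes are hard to follow, the same pattern can be written with named groups and anchors. This is only a sketch: the group names (number, seat, tag, rest) and the function parseSeat are labels invented for this example, since the question does not say what the fields mean.

```php
<?php
// Same structure as the pattern above, with hypothetical group names.
$re = '/^(?<number>\d+)-(?<seat>[^(]*)\((?<tag>[^#]*)#(?<rest>[^)]*)\)$/';

// Hypothetical helper: returns the fields, or null when the input
// does not have the expected shape.
function parseSeat(string $seat, string $re): ?array
{
    if (!preg_match($re, $seat, $m)) {
        return null;
    }
    return [
        'number' => (int) $m['number'],
        'seat'   => $m['seat'],
        'tag'    => $m['tag'],  // may be an empty string
        'rest'   => $m['rest'],
    ];
}

print_r(parseSeat('173-9B(AA#3.45 EUR#32H)', $re));
print_r(parseSeat('173-9B(#3.14 EUR#32H)', $re));
```

Returning null on a failed match makes malformed input easy to detect, which sscanf does not do as clearly when a field is missing.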
