I need help with this regex pattern

Hi I have a problem with my regex pattern:
preg_match_all('/!!\d{3}/', '!!333!!333 !!333 test', $result);
I want this to match !!333 but not !!333!!333. How can I modify this regex to match only a maximum length of 5 characters: two ! and three digits?

/^!!\d{3}$/
You need the anchors: ^, which matches the beginning of the string, and $, which matches the end. It's like saying: "It must begin at the start of the string and it must end at the end of it." If you omit one (or both), the pattern allows arbitrary characters at the beginning and/or the end.
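A minimal sketch of what the anchors change, using the subject strings from the question (the variable names are only illustrative):

```php
<?php
// Unanchored: the pattern may match anywhere inside the subject.
$unanchored = preg_match('/!!\d{3}/', '!!333!!333');
// Anchored: the whole subject must be exactly "!!" plus three digits.
$anchoredLong = preg_match('/^!!\d{3}$/', '!!333!!333');
$anchoredExact = preg_match('/^!!\d{3}$/', '!!333');
var_dump($unanchored, $anchoredLong, $anchoredExact); // int(1), int(0), int(1)
```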
Update
As I found out in the comments, the question was misleading. I now suggest splitting the string before applying the pattern:
$string = '!!333!!333 !!333 test';
$result = array();
foreach (explode(' ', $string) as $index => $item) {
if (preg_match('/^!!\d{3}$/', $item)) {
$result[$index] = $item;
}
}
This also preserves the index of each item. If you don't need it, remove the $index parts or just ignore them ;)
It's much easier than trying to find a single pattern that fulfills your request all at once.

^!!\d{3}$
You need to anchor your pattern.
If you want to match a string with !!333 in it, you may want something like:
(^|\s)!!\d{3}($|\s)
With further explanation we can have a further refinement:
(^|\s)!!\d{3}(?=$|\s)
This will not capture the trailing space, which allows adjacent matches on the same line to match one after another.
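The difference between the two variants can be seen on a subject with three adjacent tokens. This is a small sketch; the subject string and variable names are made up for illustration:

```php
<?php
$subject = '!!333 !!333 !!333';
// Consuming group: the trailing space becomes part of the match, so the
// next token has no leading whitespace left to satisfy (^|\s).
preg_match_all('/(^|\s)!!\d{3}($|\s)/', $subject, $consuming);
// Lookahead: the trailing space is only inspected, not consumed.
preg_match_all('/(^|\s)!!\d{3}(?=$|\s)/', $subject, $lookahead);
echo count($consuming[0]), ' vs ', count($lookahead[0]), PHP_EOL; // 2 vs 3
```

Note that the leading space is still part of each match in both variants, so the results may need a trim().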

I find the easiest and most descriptive way to do this is with negative lookaheads and lookbehinds.
See:
preg_match_all('/(?<![^\s])!!\d{3}(?![^\s])/', '!!333 !!333!!333 !!333 test !!333', $result);
This says: match anything of the form !![0-9][0-9][0-9] that has nothing other than whitespace (or the start or end of the string) immediately before or after it. Note that these lookaheads/lookbehinds aren't part of the match themselves. They are "zero-width assertions", so you only get "!!333" etc. in your match, not " !!333" etc.
It returns
[0] => Array
(
[0] => !!333
[1] => !!333
[2] => !!333
)
Also
preg_match_all(
'/(?<![^\s])!!\d{3}(?![^\s])/',
'!!333 !!555 !!333 !!123 !!555 !!456 !!333 !!333 !!444 !!444 !!123 !!123 !!123!!123',
$result);
returns
[0] => Array
(
[0] => !!333
[1] => !!555
[2] => !!333
[3] => !!123
[4] => !!555
[5] => !!456
[6] => !!333
[7] => !!333
[8] => !!444
[9] => !!444
[10] => !!123
[11] => !!123
)
That is, all but the last two which are too long.
See Lookahead tutorial.

Related

Match all regex start with but not end with characters

I have an array of words.
I want to match all words starting with '___',
but some words also have '___' at the end,
and I do not want to match those words.
Here is my word list:
___apis
___db_tables
___groups
___inbox_messages
___sent_messages
___todo
___users
___users_groups
____4underscorestarting
sinan
sssssssssss
test_______dfg
testttttt
tet____
tttttttttt
uuuuuuuu
vvvvvvvvvvvv
wwwwwwww
zzzzzzzzzz
I want to match only these words:
___apis
___db_tables
___groups
___inbox_messages
___sent_messages
___todo
___users
___users_groups
I do not want to match these words:
tet____
test_______dfg
____4underscorestarting
This is what it looks like when I try:

The solution using preg_grep function:
// $arr is your initial array of words
$matched = preg_grep("/^_{3}[^_].*/", $arr);
print_r($matched);
The output:
Array
(
[0] => ___apis
[1] => ___db_tables
[2] => ___groups
[3] => ___inbox_messages
[4] => ___sent_messages
[5] => ___todo
[6] => ___users
[7] => ___users_groups
)
Update: To get the opposite matches, use one of the following.
A regex pattern:
/^(?!_{3})\w*/
Or set the third argument of preg_grep to PREG_GREP_INVERT:
preg_grep("/^_{3}[^_].*/", $arr, PREG_GREP_INVERT)
http://php.net/manual/en/function.preg-grep.php
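The two options are not quite equivalent, which the following sketch tries to show on a shortened word list (the four-word array is an assumption made for brevity, taken from the question's list):

```php
<?php
$words = ['___apis', '____4underscorestarting', 'tet____', 'sinan'];
// Everything the original pattern does NOT match.
$inverted = preg_grep('/^_{3}[^_].*/', $words, PREG_GREP_INVERT);
// Everything that does not start with three underscores.
$lookahead = preg_grep('/^(?!_{3})\w*/', $words);
print_r(array_values($inverted));  // ____4underscorestarting, tet____, sinan
print_r(array_values($lookahead)); // tet____, sinan
```

PREG_GREP_INVERT returns the exact complement of the original match, while the lookahead pattern also drops words that begin with four or more underscores. preg_grep preserves the original keys, hence the array_values() calls.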

^___[a-z].*
This should do it for you. See demo.
https://regex101.com/r/hHRg8d/1

^_{3}.*[^(_{3})]$
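Starts (^) with 3 '_': _{3}
Can contain anything in the middle: .*
Does not end ($) in 3 '_': [^(_{3})]
Note that [^(_{3})] is a character class, so it matches any single character other than (, _, {, 3, } or ) rather than "not three underscores". It only rules out words whose last character is one of those, and it still accepts ____4underscorestarting.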
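One way to express "starts with exactly three underscores and does not end in three underscores" is with lookarounds. This is a sketch under that reading of the question, not the answerer's pattern; the word ___x___ is a hypothetical test case added for illustration:

```php
<?php
$words = ['___apis', '___users_groups', '___x___', '____4underscorestarting', 'tet____'];
// ^_{3}       three leading underscores ...
// (?!_)       ... not followed by a fourth one
// (?<!_{3})$  the three characters before the end are not "___"
$matched = preg_grep('/^_{3}(?!_).*(?<!_{3})$/', $words);
print_r(array_values($matched)); // ___apis, ___users_groups
```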

Using regex to not match periods between numbers

I have a regex code that splits strings between [.!?], and it works, but I'm trying to add something else to the regex code. I'm trying to make it so that it doesn't match [.] that's between numbers. Is that possible? So, like the example below:
$input = "one.two!three?4.000.";
$inputX = preg_split("~(?>[.!?]+)\K(?!$)~", $input);
print_r($inputX);
Result:
Array ( [0] => one. [1] => two! [2] => three? [3] => 4. [4] => 000. )
Need Result:
Array ( [0] => one. [1] => two! [2] => three? [3] => 4.000. )

You should be able to split on this:
(?<=(?<!\d(?=[.!?]+\d))[.!?])(?![.!?]|$)
https://regex101.com/r/kQ6zO4/1
It uses lookarounds to determine where to split. It looks behind for a character from the set [.!?], as long as that character is not sitting between two digits (preceded by a digit and followed, after any further punctuation, by a digit).
It also won't return a trailing empty element, because a split point is not allowed at the end of the string.
UPDATE:
This should be much more efficient actually:
(?!\d+\.\d+).+?[.!?]+\K(?!$)
https://regex101.com/r/eN7rS8/1
Here is another possibility using regex flags:
$input = "one.two!three???4.000.";
$inputX = preg_split("~(\d+\.\d+[.!?]+|.*?[.!?]+)~", $input, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($inputX);
It includes the delimiter in the split and ignores empty matches. The regex can be simplified to ((?:\d+\.\d+|.*?)[.!?]+), but I think what is in the code sample above is more efficient.
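As a quick check of that claim, the sketch below runs both the longer and the simplified pattern on the same input and compares the results (the variable names are illustrative):

```php
<?php
$input = 'one.two!three???4.000.';
$flags = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;
// Every "sentence" is matched as a delimiter and returned through the
// capture group; the empty pieces between delimiters are dropped.
$long = preg_split('~(\d+\.\d+[.!?]+|.*?[.!?]+)~', $input, -1, $flags);
$short = preg_split('~((?:\d+\.\d+|.*?)[.!?]+)~', $input, -1, $flags);
print_r($long);             // one. / two! / three??? / 4.000.
var_dump($long === $short); // bool(true)
```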

preg_split and multiple delimiters

Let me start by saying that the first number before the first - is the ID I need to extract. Everything from the first - to the first / is the 'name' I need to extract. I do not care about anything after that.
Test String:
1-gc-communications/edit/profile_picture
Expected Output:
Array ( [0] => 1 [1] => gc-communications [2] => /edit/profile_picture )
The best I could come up with was the following patterns (along with their results - with a limit of 3)
Pattern: /-|edit\/profile_picture/
Result: Array ( [0] => 1 [1] => gc [2] => communications/edit/profile_picture )
^ This one is flawed because it does both dashes.
Pattern: /~-~|edit\/profile_picture/
Result: Array ( [0] => 1-gc-communications/ [1] => )
^ major fail.
I know I can do a 2-element limit and just break on the first / and then do a preg_split on the result array, but I would love a way to make this work with one line.
If this is a no-go I am open to other "one liner" solutions.

Try this one
$str = '1-gc-communications/edit/profile_picture';
$match = preg_split('#([^-]+)-([^/]+)/(.*)#', $str, 0, PREG_SPLIT_DELIM_CAPTURE);
print_r($match);
It returns:
array (
0 => '',
1 => '1',
2 => 'gc-communications',
3 => 'edit/profile_picture',
4 => '',
)
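If the empty first and last elements are unwanted, preg_match with two capture groups may be a simpler fit than preg_split. This is an alternative sketch rather than part of the answer above; $id and $name are illustrative names:

```php
<?php
$str = '1-gc-communications/edit/profile_picture';
// Group 1: leading digits; group 2: everything up to the first slash.
if (preg_match('#^(\d+)-([^/]+)#', $str, $m)) {
    [, $id, $name] = $m; // skip the full match at index 0
    var_dump($id, $name); // string(1) "1", string(17) "gc-communications"
}
```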

the first number before the first - will be the ID I need to extract. from the first - to the first / will be the 'name' I need to extract. Everything after that I do not care for.
This task seems a great candidate for sscanf(), which is specifically designed for parsing (scanning) a formatted string. Not only is the syntax brief, you also know that you do not need to make repeated matches with the pattern. The output, in case it matters, can be pre-cast as an integer or string for convenience. The rest of the string, from the first slash onward, is simply ignored.
Code: (Demo)
$str = '1-gc-communications/edit/profile_picture';
var_export(
sscanf($str, '%d-%[^/]')
#             ^^ ^^^^^- greedily match one or more non-slash characters
#             ^^------- greedily match one or more numeric characters
);
Output:
array (
0 => 1, #<-- integer-typed
1 => 'gc-communications', #<-- string-typed
)

Unexpected result with very simple regexp

I am fairly new to regexp and have encountered a regexp that delivers an unexpected result when trying to match name parts in a name of the form firstname-firstname firstname:
preg_match_all('/([^- ])*/i', 'aNNA-äöå Åsa', $result);
gives a print_r($result) that looks like this:
Array
(
[0] => Array
(
[0] => aNNA
[1] =>
[2] => äöå
[3] =>
[4] => Åsa
[5] =>
)
[1] => Array
(
[0] => A
[1] =>
[2] => å
[3] =>
[4] => a
[5] =>
)
)
Now $result[0] has the items I would want and expect as the result, but where the heck do the entries in $result[1] come from? I see they are the word endings, but how come they are matched?
And as a little side question, how do I prevent the empty matches ($result[0][1], $result[0][3], ...), or better yet: why do they show up? They are not "not -" or "not space" characters either.

Have a try with:
preg_match_all('/([^- ]+)/', 'aNNA-äöå Åsa', $result);
Your regex:
/([^- ])*/i
means: find one char that is not - or space and keep it in a group, 0 or more times
This one:
/([^- ]+)/
means: find one or more chars that are not - or space and keep them in a group
Moreover, there's no need for the case-insensitive flag.

The * means "0 or more of the preceding." At a "-", the pattern can match exactly zero characters from the character class, so an empty match succeeds there. Because the "-" itself is excluded from the character class, the capture grabs nothing, leaving you an empty entry. The expression giving you the expected behavior would be:
preg_match_all('/([^- ])+/i', 'aNNA-äöå Åsa', $result);
("+" means "1 or more of the preceding.")

http://php.net/manual/en/function.preg-match-all.php says:
Orders results so that $matches[0] is an array of full pattern
matches, $matches[1] is an array of strings matched by the first
parenthesized subpattern, and so on.
Check the URL for more details
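The behavior described in the answers above can be reproduced with a short sketch. The ASCII subject strings here are assumptions chosen to keep the example encoding-independent:

```php
<?php
$subject = 'anna-bo asa';
// Quantifier outside the group: group 1 keeps only the last repetition.
preg_match_all('/([^- ])+/', $subject, $repeatedGroup);
// Quantifier inside the group: group 1 holds the whole run.
preg_match_all('/([^- ]+)/', $subject, $groupedRun);
// With *, an empty match also succeeds at each delimiter and at the end.
preg_match_all('/([^- ])*/', 'a-b', $star);
print_r($repeatedGroup[1]); // a, o, a
print_r($groupedRun[1]);    // anna, bo, asa
print_r($star[0]);          // a, (empty), b, (empty)
```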

Regular Expressions: get what is outside of the brackets

I'm using PHP and I have text like:
first [abc] middle [xyz] last
I need to get what's inside and outside of the brackets. Searching in StackOverflow I found a pattern to get what's inside:
preg_match_all('/\[.*?\]/', $m, $s)
Now I'd like to know the pattern to get what's outside.
Regards!

You can use preg_split for this as:
$input ='first [abc] middle [xyz] last';
$arr = preg_split('/\[.*?\]/',$input);
print_r($arr);
Output:
Array
(
[0] => first
[1] => middle
[2] => last
)
This allows some surrounding spaces in the output. If you don't want them you can use:
$arr = preg_split('/\s*\[.*?\]\s*/',$input);
preg_split splits the string based on a pattern. The pattern here is [ followed by anything followed by ]. The regex to match anything is .*. Also, [ and ] are regex metacharacters used for character classes, so to match them literally we need to escape them, giving \[.*\]. By default .* is greedy and will try to match as much as possible; in this case it would match abc] middle [xyz. To avoid this we make it non-greedy by appending a ?, giving \[.*?\]. Since "anything" here really means anything other than ], we can also use \[[^]]*?\]
EDIT:
If you want to extract words that are both inside and outside the [], you can use:
$arr = preg_split('/\[|\]/',$input);
which splits the string on a [ or a ]
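If the inside and outside parts are wanted as two separate arrays, one possible sketch combines preg_match_all with a capture group and the trimmed preg_split from above ($inside and $outside are illustrative names):

```php
<?php
$input = 'first [abc] middle [xyz] last';
// Group 1 captures the text between each pair of brackets.
preg_match_all('/\[(.*?)\]/', $input, $m);
$inside = $m[1];
// Splitting on the bracketed parts (plus surrounding spaces) leaves the rest.
$outside = preg_split('/\s*\[.*?\]\s*/', $input);
print_r($inside);  // abc, xyz
print_r($outside); // first, middle, last
```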

$inside = '\[.+?\]';
$outside = '[^\[\]]+';
$or = '|';
preg_match_all(
"~ $inside $or $outside~x",
"first [abc] middle [xyz] last",
$m);
print_r($m);
or less verbose
preg_match_all("~\[.+?\]|[^\[\]]+~", $str, $matches)

Use preg_split instead of preg_match.
preg_split('/\[.*?\]/', 'first [abc] middle [xyz] last');
Result:
array(3) {
[0]=>
string(6) "first "
[1]=>
string(8) " middle "
[2]=>
string(5) " last"
}
ideone

Everyone says that you should use preg_split, but only one person replied with an expression that meets your needs, and I thought it was a little too verbose, although he has since updated his answer to address that.
This expression is what most of the replies have stated.
/\[.*?\]/
But that only prints out
Array
(
[0] => first
[1] => middle
[2] => last
)
You stated you wanted what's inside and outside the brackets, so an update would be:
/[\[.*?\]]/
This gives you:
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
But as you can see, it is capturing whitespace as well, so let's go a step further and get rid of that:
/[\s]*[\[.*?\]][\s]*/
This will give you a desired result:
Array
(
[0] => first
[1] => abc
[2] => middle
[3] => xyz
[4] => last
)
This, I think, is the expression you're looking for.
Here is a LIVE Demonstration of the above Regex
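One caveat worth checking: inside the outer [ ], the characters ., * and ? are literal class members, so the pattern splits on those characters as well as on the brackets. The sketch below compares it with a plain alternation on a made-up input containing a period:

```php
<?php
$input = 'a.b [c] d';
// Character class: splits on any of [ . * ? ]
$charClass = preg_split('/[\[.*?\]]/', $input);
// Alternation: splits only on [ or ]
$alternation = preg_split('/\[|\]/', $input);
print_r($charClass);   // "a", "b ", "c", " d"
print_r($alternation); // "a.b ", "c", " d"
```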
