Regex - avoid any unnecessary searches preg_match() PHP

Hello, I have a small problem with my regular expression.
A simple example:
$pattern='/^(a([0-9]|[a-z])?|b(\=|\?)?)$/';
$subject='b=';
preg_match() returns this array:
Array
(
[0] => b=
[1] => b=
[2] =>
[3] => =
)
Index 2 in this array comes from the a(...)? branch. My question: can I keep this field out of the result? My real pattern is very long, and about 90% of the resulting array is empty. Can I remove these empty fields with some special syntax?
Edit:
My pattern contains something like this:
n(o|h)?(\+|\-|\(([+]?[0-9]+);([+]?[0-9]+)\))?
It matches strings like no+ or n(12;15). Can I write it more simply? I have more branches of the same shape, so the whole pattern looks something like this:
/^(n(o|h)?(\+|\-|\(([+]?[0-9]+);([+]?[0-9]+)\))?|i(o|h)?(\+|\-|\(([+]?[0-9]+);([+]?[0-9]+)\))?)$/
Regards

After reading your pattern, I assume that you can make it simpler with this version:
\A([in][oh]?)([+-]|\(\+?[0-9]+;\+?[0-9]+\))\z
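Note that I don't know exactly which captures you need, but you can add them as required. Unlike your original pattern, the second group is not optional here; append ? to it if a bare n or io must match too.
Details: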
\A               # anchor for the start of the string
(                # capture group 1:
    [in]         #   an 'i' or an 'n'
    [oh]?        #   an 'o' or an 'h' (optional)
)
(                # capture group 2:
    [+-]         #   a '+' or a '-'
  |              #   OR
    \(\+?[0-9]+;\+?[0-9]+\)   # two numbers in parentheses separated by ';', each with an optional leading '+'
)
\z               # anchor for the end of the string
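To address the empty slots directly, here is a minimal sketch of two ways to keep unused groups out of the result array: non-capturing groups (?:...) and a branch reset group (?|...). The variable names are illustrative only, and the shortened character classes ([0-9a-z], [=?]) are my simplification of the question's sample pattern, not something the answer above proposed.

```php
<?php
// Original sample pattern: every (...) captures, so the unused a(...)? branch
// leaves an empty slot at index 2.
preg_match('/^(a([0-9]|[a-z])?|b(\=|\?)?)$/', 'b=', $original);
print_r($original);      // [0] => b=, [1] => b=, [2] => (empty), [3] => =

// Option 1: (?:...) groups without capturing, so only the wanted part gets a number.
preg_match('/^(?:a[0-9a-z]?|b([=?])?)$/', 'b=', $nonCapturing);
print_r($nonCapturing);  // [0] => b=, [1] => =

// Option 2: (?|...) is a branch reset group; the capture of each alternative
// shares index 1, whichever branch matched.
preg_match('/^(?|a([0-9a-z])?|b([=?])?)$/', 'b=', $branchResetB);
preg_match('/^(?|a([0-9a-z])?|b([=?])?)$/', 'a7', $branchResetA);
print_r($branchResetB);  // [0] => b=, [1] => =
print_r($branchResetA);  // [0] => a7, [1] => 7

// The simplified pattern from the answer, applied to one of the sample strings.
preg_match('/\A([in][oh]?)([+-]|\(\+?[0-9]+;\+?[0-9]+\))\z/', 'n(12;15)', $simplified);
print_r($simplified);    // [0] => n(12;15), [1] => n, [2] => (12;15)
```

Which option fits depends on whether you need to know which branch matched: the branch reset hides that, while non-capturing groups simply drop the captures you do not care about.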

Related

Regex: Capturing multiple instances in one word group

I'm not good at regex and I've been trying for hours now, so I hope you can help me. I have this text:
&#x271D;his is *&#x271D;he* *in&#x271D;erne&#x271D;*
I need to capture (using PREG_OFFSET_CAPTURE) only the &#x271D; inside a word surrounded by *, so I only need to capture the last three &#x271D; in this example. The output array should look something like this:
[0] => Array
    (
        [0] => &#x271D;
        [1] => 17
    )
[1] => Array
    (
        [0] => &#x271D;
        [1] => 32
    )
[2] => Array
    (
        [0] => &#x271D;
        [1] => 44
    )
I've tried using (&#x271D;), but of course this selects all instances, including those in words without asterisks. Then I tried \*[^ ]*(&#x271D;)[^ ]*\*, but this only gives me the last instance in each word. I've tried many other variations, but all were wrong.
To clarify: the asterisks can appear anywhere in the string, but always at the beginning and end of a word. The opening asterisk is always preceded by a space, except at the beginning of the string, and the closing asterisk is always followed by a space, except at the end of the string. I must add that punctuation marks can appear inside the asterisks. &#x271D; is exactly (and only) what I need to capture, and it can be at any position in a word.

You could make use of the \G anchor to get iterative matches between the *. The anchor matches either at the start of the string or at the end of the previous match.
(?:\*|\G(?!^))[^&*]*(?>&(?!#)[^&*]*)*\K&#x271D;(?=[^*]*\*)
Explanation
(?:              Non-capturing group
  \*             Match *
  |              Or
  \G(?!^)        Assert the position at the end of the previous match, not at the start
)                Close non-capturing group
[^&*]*           Match 0+ times any char except & and *
(?>              Atomic group
  &(?!#)         Match & only when not directly followed by #
  [^&*]*         Match 0+ times any char except & and *
)*               Close atomic group and repeat 0+ times
\K               Reset the match start (forget what has been matched so far)
&#x271D;         Match the entity literally
(?=[^*]*\*)      Positive lookahead, assert a * to the right
For example
$re = '/(?:\*|\G(?!^))[^&*]*(?>&(?!#)[^&*]*)*\K&#x271D;(?=[^*]*\*)/m';
$str = '&#x271D;his is *&#x271D;he* *in&#x271D;erne&#x271D;*';
preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE);
print_r($matches[0]);
Output
Array
(
    [0] => Array
        (
            [0] => &#x271D;
            [1] => 16
        )
    [1] => Array
        (
            [0] => &#x271D;
            [1] => 31
        )
    [2] => Array
        (
            [0] => &#x271D;
            [1] => 43
        )
)
Note that each offset is 1 less than the expected value because offsets count from 0. See PREG_OFFSET_CAPTURE.
If you want to match more variations, you could use a non-capturing group and list the ones that you would accept. If you don't want to cross newline boundaries, you can exclude newlines in the negated character classes.
(?:\*|\G(?!^))[^&*\r\n]*(?>&(?!#)[^&*\r\n]*)*\K&#(?:x271D|169);(?=[^*\r\n]*\*)
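One thing worth keeping in mind with PREG_OFFSET_CAPTURE is that the offsets are byte offsets, not character offsets. The sketch below assumes the subject holds the literal UTF-8 character U+271D (three bytes) rather than an HTML entity, and that the mbstring extension is available. The simplified pattern and the helper name charOffset are my own illustrations, not part of the answer above.

```php
<?php
// Subject with the literal cross U+271D (3 bytes each in UTF-8).
$str = "\u{271D}his is *\u{271D}he* *in\u{271D}erne\u{271D}*";

// Simplified variant of the \G approach for a literal character
// (assumption: the text contains no HTML entities).
$re = '/(?:\*|\G(?!^))[^*]*?\K\x{271D}(?=[^*]*\*)/u';

// Hypothetical helper: convert a byte offset into a character offset.
function charOffset(string $subject, int $byteOffset): int
{
    return mb_strlen(substr($subject, 0, $byteOffset), 'UTF-8');
}

preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE);

// Column 1 of each match entry is the byte offset reported by PCRE.
$byteOffsets = array_column($matches[0], 1);
$charOffsets = array_map(fn ($o) => charOffset($str, $o), $byteOffsets);

print_r($byteOffsets); // 11, 21, 28
print_r($charOffsets); // 9, 17, 22
```

Even with the u modifier the reported offsets stay in bytes, so a conversion like this is only needed when the positions are later used with character-based functions.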

Php preg_match optional slash not working properly

I have this regex:
`^(?:/(?P<cat1>[^/\.]+/?)?)(?:(?P<cat2>[^/\.]+/?)?)(?:(?P<cat3>[^/\.]+/?)?)(?:(?P<cat4>[^/\.]+/?)?)(?:(?P<slug>[^/\.]+))-(?:(?P<id>[0-9]++))$`u
Which should work with
/cat-one/product-14
/cat-one/cat-two/product-14
/cat-one/cat-two/cat-three/product-14
/cat-one/cat-two/cat-three/cat-four/product-14
The problem is that only the fourth one works correctly:
Array
(
[cat1] => cat-one
[cat2] => cat-two
[cat3] => cat-three
[cat4] => cat-four
[slug] => product
[id] => 14
)
For the first three, the 'slug' parameter gets only one letter, and the category before it gets the preceding letters:
Array
(
[cat1] => cat-one
[cat2] => cat-two
[cat3] => produc
[cat4] =>
[slug] => t
[id] => 14
)
I know the optional / is causing the problem, but I need it to match something else in the code. The regex is generated dynamically, and I cannot add a special case just for this.
(?P<cat1>[^/\.]+/?)?
How can I make the / optional but still get the result I need?
Thanks!
Later edit: the problem in the possible duplicate about preg_match was that I had several optional parameters and preg_match was not matching them accordingly. This question is different: here a /? breaks my slug in two.

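Don't make the / optional, since the groups themselves are optional. With /? inside the group, a category group can also swallow part of the last segment (produc), leaving only the final letter for the slug.
Requiring the slash leaves the slug-id intact each time.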
^/(?:(?<cat1>[^/.\r\n]+/)?)(?:(?<cat2>[^/.\r\n]+/)?)(?:(?<cat3>[^/.\r\n]+/)?)(?:(?<cat4>[^/.\r\n]+/)?)(?:(?<slug>[^/.\r\n]+))-(?:(?<id>[0-9]++))$
https://regex101.com/r/Z64x8l/1
Readable regex
^ /
(?:
    (?<cat1> [^/.\r\n]+ / )?    # (1)
)
(?:
    (?<cat2> [^/.\r\n]+ / )?    # (2)
)
(?:
    (?<cat3> [^/.\r\n]+ / )?    # (3)
)
(?:
    (?<cat4> [^/.\r\n]+ / )?    # (4)
)
(?:
    (?<slug> [^/.\r\n]+ )       # (5)
)
-
(?:
    (?<id> [0-9]++ )            # (6)
)
$
Note: \r\n was added to the character classes for multiline input. If you have a single-line
string, just take it out.
Also, if you believe there may be more nesting before slug-id than you
account for, just add (?:[^/.\r\n]+/)* before the slug named group.
This will always keep the slug-id at the end.
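As a rough illustration of the pattern above in use, the following sketch wraps it in a small function. The function name parseProductPath and the shape of the returned array are hypothetical. The redundant (?:...) wrappers and the \r\n exclusions are dropped here on the assumption that the input is a single-line path.

```php
<?php
// Hypothetical helper (not from the answer): split a product path into
// its category list, slug and numeric id, or return null if it does not match.
function parseProductPath(string $path): ?array
{
    // Each category group requires its trailing slash; the group itself is optional.
    $re = '~^/(?<cat1>[^/.]+/)?(?<cat2>[^/.]+/)?(?<cat3>[^/.]+/)?(?<cat4>[^/.]+/)?'
        . '(?<slug>[^/.]+)-(?<id>[0-9]++)$~';

    if (!preg_match($re, $path, $m)) {
        return null;
    }

    $cats = [];
    foreach (['cat1', 'cat2', 'cat3', 'cat4'] as $name) {
        if ($m[$name] !== '') {              // unmatched optional groups come back as ''
            $cats[] = rtrim($m[$name], '/'); // drop the captured trailing slash
        }
    }

    return ['cats' => $cats, 'slug' => $m['slug'], 'id' => (int) $m['id']];
}

print_r(parseProductPath('/cat-one/product-14'));
print_r(parseProductPath('/cat-one/cat-two/cat-three/cat-four/product-14'));
```

Because the slash now belongs to the category capture, the sketch trims it afterwards instead of making it optional in the regex.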

Validate url parameters with preg_match

Valid example
12[red,green],13[xs,xl,xxl,some other text with chars like _&-##%]
number[anythingBut ()[]{},anythingBut ()[]{}](,number[anythingBut ()[]{},anythingBut ()[]{}]) or nothing
Full match 12[red,green]
Group 1 12
Group 2 red,green
Full match 13[xs,xl,xxl,some other text with chars like _&-##%]
Group 1 13
Group 2 xs,xl,xxl,some other text with chars like _&-##%
Not valid example
13[xs,xl,xxl 9974-?ds12[dfgd,dfgd]]
What I tried is this: (\d+(?=\[))\[([^\(\[\{\}\]\)]+)\], but it also matches wrong input like the one in the invalid example.

If you just need to validate the input, you can add some anchors:
^(?:\d+\[[^\(\[\{\}\]\)]+\](?:,|$))+$
If you also need to get all the matching parts, use a second regex; a single regex will not do both jobs well.

$in = '12[red,green],13[xs,xl,xxl,some other text with chars like _&-##%],13[xs,xl,xxl 9974-?ds12[dfgd,dfgd]]';
preg_match_all('/(\d+)\[([^][{}()]+)(?=\](?:,|$))/', $in, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => 12[red,green
[1] => 13[xs,xl,xxl,some other text with chars like _&-##%
)
[1] => Array
(
[0] => 12
[1] => 13
)
[2] => Array
(
[0] => red,green
[1] => xs,xl,xxl,some other text with chars like _&-##%
)
)
Explanation:
/               : regex delimiter
(\d+)           : group 1, 1 or more digits
\[              : open square bracket
(               : start group 2
  [^][{}()]+    : 1 or more of any character that is not a parenthesis, brace, or square bracket
)               : end group 2
(?=             : positive lookahead, make sure that what follows is
  \]            : a close square bracket
  (?:,|$)       : non-capturing group, a comma or end of string
)               : end lookahead
/               : regex delimiter
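The two-step idea (validate first, then extract) could be sketched as follows. The function name parseParams and the returned array shape are hypothetical. The validation pattern is a slightly stricter variant of the one above, chosen here so that a trailing comma is rejected, and an empty string is treated as invalid in this sketch.

```php
<?php
// Hypothetical helper: returns [number => [values...]] or null when the input is invalid.
function parseParams(string $in): ?array
{
    $item = '\d+\[[^][(){}]+\]';   // one "number[values]" item

    // Step 1: validate the whole string (items separated by single commas).
    if (!preg_match('/\A' . $item . '(?:,' . $item . ')*\z/', $in)) {
        return null;
    }

    // Step 2: extract the number and the value list from each item.
    preg_match_all('/(\d+)\[([^][(){}]+)\]/', $in, $sets, PREG_SET_ORDER);

    $result = [];
    foreach ($sets as [, $number, $values]) {
        $result[(int) $number] = explode(',', $values);
    }
    return $result;
}

print_r(parseParams('12[red,green],13[xs,xl,xxl]'));
var_dump(parseParams('13[xs,xl,xxl 9974-?ds12[dfgd,dfgd]]')); // NULL
```

Validating before extracting means the extraction pattern can stay simple, since it only ever runs on input that is already known to be well-formed.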

Extract urls from string without spaces between

Let's say I have a string like this:
$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar"
and I want to get an array like this:
array(
[0] => "http://foo.com/bar",
[1] => "https://bar.com",
[2] => "//foo.com/foo/bar"
);
I'm looking for something like:
preg_split("~((https?:)?//)~", $urlsString, PREG_SPLIT_NO_EMPTY|PREG_SPLIT_DELIM_CAPTURE);
Where PREG_SPLIT_DELIM_CAPTURE definition is:
If this flag is set, parenthesized expression in the delimiter pattern will be captured and returned as well.
That said, the above preg_split returns:
array (size=3)
0 => string '' (length=0)
1 => string 'foo.com/bar' (length=11)
2 => string 'bar.com//foo.com/foo/bar' (length=24)
Any idea of what I'm doing wrong or any other idea?
PS: I was using this regex until I've realized that it doesn't cover this case.
Edit:
As @sidyll pointed out, I'm missing the $limit argument in the preg_split call. Anyway, there is also something wrong with my regex, so I will use @WiktorStribiżew's suggestion.

You may use a preg_match_all with the following regex:
'~(?:https?:)?//.*?(?=$|(?:https?:)?//)~'
Details:
(?:https?:)? - https: or http:, optional (1 or 0 times)
// - a double /
.*? - any 0+ chars other than line break chars, as few as possible, up to the first position where the following lookahead matches
(?=$|(?:https?:)?//) - a positive lookahead that requires either of the two:
    $ - end of string
    (?:https?:)?// - https: or http:, optional (1 or 0 times), followed by a double /
Below is a PHP demo:
$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";
preg_match_all('~(?:https?:)?//.*?(?=$|(?:https?:)?//)~', $urlsString, $urls);
print_r($urls);
// => Array ( [0] => http://foo.com/bar [1] => https://bar.com [2] => //foo.com/foo/bar )
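For comparison, the preg_split attempt from the question can also be made to work. This is only a sketch under two assumptions of mine: the flags are moved to the fourth argument (the question passed them as $limit), and the split happens at zero-width positions, so nothing is consumed. The lookbehind (?<!:) is my addition to avoid splitting at the // that follows http: or https:.

```php
<?php
$urlsString = "http://foo.com/barhttps://bar.com//foo.com/foo/bar";

// Split at zero-width positions where a URL starts:
//   (?=https?://)   right before "http://" or "https://"
//   (?<!:)(?=//)    right before a protocol-relative "//"
//                   (but not the "//" that directly follows a colon)
$urls = preg_split(
    '~(?=https?://)|(?<!:)(?=//)~',
    $urlsString,
    -1,                    // $limit: -1 means "no limit"; flags belong in the 4th argument
    PREG_SPLIT_NO_EMPTY    // drop the empty piece before the first URL
);

print_r($urls);
// Array ( [0] => http://foo.com/bar [1] => https://bar.com [2] => //foo.com/foo/bar )
```

The preg_match_all version above is arguably easier to read; this variant mainly shows where the original call went wrong.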

preg_match_all for words in and outside of brackets

I have been sitting for hours trying to figure out a regex for a preg_match_all call in PHP.
My problem is that I want two different things from the string.
Say you have the string "Code is fun [and good for the brain.] But the [brain is] tired."
What I need from this is an array of all the words outside the brackets, plus the text inside each pair of brackets as one string.
Something like this:
[0] => Code
[1] => is
[2] => fun
[3] => and good for the brain.
[4] => But
[5] => the
[6] => brain is
[7] => tired.
Help much appreciated.

You could also try the regex below:
(?<=\[)[^\]]*|[.\w]+
Code:
<?php
$data = "Code is fun [and good for the brain.] But the [brain is] tired.";
$regex = '~(?<=\[)[^\]]*|[.\w]+~';
preg_match_all($regex, $data, $matches);
print_r($matches);
?>
Output:
Array
(
[0] => Array
(
[0] => Code
[1] => is
[2] => fun
[3] => and good for the brain.
[4] => But
[5] => the
[6] => brain is
[7] => tired.
)
)
The first alternative, (?<=\[)[^\]]*, uses a lookbehind to match all the characters inside the square brackets []; the second, [.\w]+, matches one or more word characters or dots in the remaining string.

You can use the following regex:
(?:\[([\w .!?]+)\]|(\w+))
The regex contains two alternatives: one to match everything inside a pair of square brackets, and one to capture every other word.
This assumes that the part inside the square brackets contains only letters, digits, _, spaces, !, ., and ?. If you need more punctuation, it is easy to add it to the character class.
If you don't want to be that specific about what should be captured, you can use a negated character class instead: specify what not to match rather than what to match. The expression then becomes: (?:\[([^\[\]]+)\]|(\w+))
Explanation:
(?:              # Begin non-capturing group
  \[             # Match a literal '['
  (              # Begin capturing group 1
    [\w .!?]+    # Match everything in between '[' and ']'
  )              # End capturing group 1
  \]             # Match a literal ']'
|                # OR
  (              # Begin capturing group 2
    \w+          # Match the remaining words
  )              # End capturing group 2
)                # End non-capturing group
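With two separate capturing groups, each match fills either group 1 or group 2, so the results still have to be merged. One possible way around that, sketched here as an assumption rather than as part of the answer above, is a branch reset group (?|...), which lets both alternatives write to group 1. The second alternative is also widened from \w+ to [^\s\[\]]+ so that "tired." keeps its trailing dot.

```php
<?php
$data = "Code is fun [and good for the brain.] But the [brain is] tired.";

// (?|...) is a branch reset group: both alternatives store their capture in group 1.
//   \[([^\[\]]+)\]   text between square brackets, captured without the brackets
//   ([^\s\[\]]+)     any other run of non-space, non-bracket characters
preg_match_all('~(?|\[([^\[\]]+)\]|([^\s\[\]]+))~', $data, $matches);

// $matches[1] holds the wanted pieces in order of appearance.
print_r($matches[1]);
```

Here $matches[1] should contain the eight entries listed in the question, in order, with no merging step needed.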
