Regex is highlighting the wrong words like Hell«o» and ignoring the correct words «Hello» or Hello,
My pattern works fine in my JavaScript code, but when I try it in PHP it also highlights the following string, which it shouldn't:
'«This is the point of sale» ';
here is my regex: https://regex101.com/r/SqCR1y/14
PHP Code:
$re = '/^(?:.*[[{(«][^\]})»\n]*|[^[{(«\n]*[\]})»].*|.*\w[[{(«].*|.*[\]})»]\w.*)$/m';
$str = '«This is the point of sale»';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
//Output
array(1) {
[0]=>
array(1) {
[0]=>
string(29) "«This is the point of sale»"
}
}
Expected: an empty array.
The same pattern works fine in my jsfiddle.
Thanks in advance
You're not using the right pattern. Try this:
$re = '/^
(?:
\([^)\n]* | [^(\n]*\).* |
\[[^]\n]* | [^[\n]*\].* |
{[^}\n]* | [^{\n]*}.* |
«[^»\n]* | [^«\n]*».* |
.*?\w[[{(«].* | .*?[\]})»]\w.*
)
$/mxu';
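One detail in the pattern above that is easy to miss is the u modifier. Without it, PCRE treats the subject as a sequence of bytes, so a two-byte UTF-8 character such as « or » inside a character class is split into two separate bytes. Both characters start with the same byte, so the first byte of « can satisfy a class that was meant to contain ». The following sketch, which assumes the script file is saved as UTF-8, runs the pattern from the question with and without u (the variable names are only illustrative):

```php
<?php
// The pattern body from the question, without delimiters or modifiers.
$body = '^(?:.*[[{(«][^\]})»\n]*|[^[{(«\n]*[\]})»].*|.*\w[[{(«].*|.*[\]})»]\w.*)$';
$str  = '«This is the point of sale»';

// Byte mode: « and » are each two bytes, and they share their first byte.
$bytes = preg_match_all('/' . $body . '/m', $str, $byteMatches);

// UTF-8 mode: « and » are each treated as one character.
$unicode = preg_match_all('/' . $body . '/mu', $str, $unicodeMatches);

var_dump($bytes);   // int(1): the string is wrongly reported as invalid
var_dump($unicode); // int(0): no match, which is the expected result
```

If this holds for your setup, adding u to the original pattern may already be enough to make the PHP result agree with the JavaScript one.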
What about a string like "(not) balanced)" ? Should that be legal?
This type of pattern isn't explicit in your test input, but since none of your "good" strings are imbalanced, you could consider covering these cases by using regex recursion to match balanced bracket expressions and targeting valid strings instead of invalid ones:
$re = '/
^
(?!.*\w[{}«»\(\)\[\]]\w) # disallow brackets inside words
(?:
[^\n{}«»\(\)\[\]]| # non-bracket character, OR:
( # capture group 1, the recursive subpattern: one of the following balanced groups
(\((?:(?>[^\n«»\(\){}\[\]]|(?1))*)\))| # balanced paren groups
(\[(?:(?>[^\n«»\(\){}\[\]]|(?1))*)\])| # balanced bracket groups
(«(?:(?>[^\n«»\(\){}\[\]]|(?1))*)»)| # balanced chevron groups
({(?:(?>[^\n«»\(\){}\[\]]|(?1))*)}) # balanced curly bracket groups
)
)+ # repeat "non-bracket character or balanced group" until end of string
$
/mxu';
The recursion takes this form:
[openbracket]([nonbracket] | [open/close pattern again via recursion])*[closebracket]
To reuse part of the pattern recursively, refer to the capture group that encloses it with (?N), where N is the number of that group.
*The initial negative lookahead will fail any "word boundary" violations before going into the recursive stuff
*This regex looks to be about 35% faster than the original approach, as seen here: https://regex101.com/r/MBITHe/4
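To see the recursion mechanism in isolation, here is a minimal sketch reduced to parentheses only. The pattern and the first two test strings are illustrative and not taken from the answer above; the third is the "(not) balanced)" string raised earlier.

```php
<?php
// Group 1: "(", then any mix of non-parenthesis characters or nested
// copies of group 1 (the (?1) call), then ")".
$balanced = '/^(\((?:[^()\n]|(?1))*\))$/';

$results = [];
foreach (['(a(b)c)', '((a)', '(not) balanced)'] as $candidate) {
    $results[$candidate] = preg_match($balanced, $candidate) === 1;
}

var_export($results);
// '(a(b)c)' => true, '((a)' => false, '(not) balanced)' => false
```

The full pattern applies the same idea to each of the four bracket types, and lets group 1 stand for "any one balanced group" so that different bracket types can nest inside each other.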
Related
I have a string, from which I want to keep text inside a pair of brackets and remove everything outside of the brackets:
Hello [123] {45} world (67)
Hello There (8) [9] {0}
Desired output:
[123] {45} (67) (8) [9] {0}
Code tried but fails:
$re = '/[^()]*+(\((?:[^()]++|(?1))*\))[^()]*+/';
$text = preg_replace($re, '$1', $text);
If the brackets in the string always come in matched pairs and are never nested, you can match every bracket pair that you want to keep and skip it, then match all the other characters (anything that is not a bracket) so they can be removed.
(?:\[[^][]*]|\([^()]*\)|{[^{}]*})(*SKIP)(*F)|[^][(){}]+
Explanation
(?: Non-capturing group
\[[^][]*] Match from [...]
| Or
\([^()]*\) Match from (...)
| Or
{[^{}]*} Match from {...}
) Close the non-capturing group
(*SKIP)(*F)| Consume the characters that you want to keep, so that they cannot be part of the match result, then try the alternative
[^][(){}]+ Match 1+ times any character other than the listed brackets
Example code
$re = '/(?:\[[^][]*]|\([^()]*\)|{[^{}]*})(*SKIP)(*F)|[^][(){}]+/m';
$str = 'Hello [123] {45} world (67)
Hello There (8) [9] {0}';
$result = preg_replace($re, '', $str);
echo $result;
Output
[123]{45}(67)(8)[9]{0}
If you want to remove all other values:
(?:\[[^][]*]|\([^()]*\)|{[^{}]*})(*SKIP)(*F)|.
It looks like you want to handle nested brackets as well. There are already questions about how to match balanced parentheses. Adjust one of those patterns to fit your needs, for example:
$pattern = '/\((?:[^)(]*(?R)?)*+\)|\{(?:[^}{]*+(?R)?)*\}|\[(?:[^][]*+(?R)?)*\]/';
You can try this on Regex101. Extract those with preg_match_all and implode the matches.
if(preg_match_all($pattern, $str, $out) > 0)
echo implode(' ', $out[0]);
If you need to match the text outside the brackets instead, you can combine this pattern with (*SKIP)(*F), which @Thefourthbird also used in his detailed answer. For skipping the bracketed parts, see this other demo.
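As a runnable sketch of the extract-and-implode approach, here is the recursive pattern applied to the sample input from the question. The second input string is a made-up example, added only to show that a nested group comes back as a single match.

```php
<?php
// Each alternative matches one bracket type and recurses into the whole
// pattern with (?R) to allow nested groups of any type.
$pattern = '/\((?:[^)(]*(?R)?)*+\)|\{(?:[^}{]*+(?R)?)*\}|\[(?:[^][]*+(?R)?)*\]/';
$str = "Hello [123] {45} world (67)\nHello There (8) [9] {0}";

$kept = '';
if (preg_match_all($pattern, $str, $out) > 0) {
    $kept = implode(' ', $out[0]);
}
echo $kept, PHP_EOL; // [123] {45} (67) (8) [9] {0}

// Hypothetical nested input: the outer group is returned whole.
preg_match_all($pattern, 'a (b [c] (d)) e', $nested);
print_r($nested[0]); // one element: (b [c] (d))
```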
If the brackets are not nested, the following should suffice:
[^[{(\]})]+(?=[[{(]|$)
Breakdown:
[^[{(\]})]+ # Match one or more characters except for opening/closing bracket chars.
(?=[[{(]|$) # A positive Lookahead to ensure that the match is either followed by
# an opening bracket char or is at the end of the string.
I have an input that goes like this
[d/D/d1/d2/d3/d4/d5/d6/d7/D1/D2/D3/D4/D5/D6/D7]+[\.]+[r1/r2/r3/r4/r5/r6/R1/R2/R3/R4/R5/R6]+[\.]+[number 1 to 37]+[#]+[number 0 - 9 ]
An example would be "d2.r1.4#100.37#1.9#2.3#1" (it can have as many "1-37 # 0-9" groups as needed).
How do I write a regex that allows the last part of the string to be dynamic, that is, to match as many of these groups as the input contains?
I've tried this expression:
[dD1-7]+\.[rR1-5]+\.
and I'm not sure how to match the dynamic group that comes after the "d2.r1." part.
Assuming you merely need to validate the string (and not capture/extract specific substrings), the following pattern provides the same result as Emma's answer but with a tighter syntax.
The i pattern modifier means you only have to write the two letters in lowercase. I don't use any excess non-capturing groups. Two-character character classes don't need a hyphen. \d is the shorter way of expressing [0-9].
Wrapping the final/repeating characters in parentheses then writing * means the sequence in the parentheses may repeat zero or more times.
Code:
$inputs = [
'd2.r1.4#100.37#1.9#2.3#1',
'd2.r1.4#100.37#1.9#2.38#1.8#22',
'd2.r1.4#100.37#1.9#2.3#1.12#2.30#2',
];
$pattern = '/^d[1-7]\.r[1-6](?:\.(?:3[0-7]|[12]\d|[1-9])#\d+)*$/i';
foreach ($inputs as $input) {
echo "\n{$input}: ";
var_export((bool)preg_match($pattern, $input));
}
Output:
d2.r1.4#100.37#1.9#2.3#1: true
d2.r1.4#100.37#1.9#2.38#1.8#22: false
d2.r1.4#100.37#1.9#2.3#1.12#2.30#2: true
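The pattern above only validates. If the individual number pairs are also needed, one possible approach (an assumption on my part, not something the question asked for) is to validate first and then collect the pairs with a second, simpler pattern. The names $pairs, $slot and $value are hypothetical.

```php
<?php
$input = 'd2.r1.4#100.37#1.9#2.3#1';
$valid = '/^d[1-7]\.r[1-6](?:\.(?:3[0-7]|[12]\d|[1-9])#\d+)*$/i';

$pairs = [];
if (preg_match($valid, $input)) {
    // The string is known to be well formed here, so a loose pattern is
    // enough to pick out every ".number#number" group.
    preg_match_all('/\.(\d+)#(\d+)/', $input, $matches, PREG_SET_ORDER);
    foreach ($matches as [, $slot, $value]) {
        $pairs[] = [(int) $slot, (int) $value];
    }
}

print_r($pairs); // [4, 100], [37, 1], [9, 2], [3, 1]
```

Validating first keeps the range rules (1 to 37) in one place, so the extraction pattern does not need to repeat them.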
I'm guessing that maybe some expression similar to,
^[dD][1-7]\.[rR][1-6](?:(?:\.(?:3[0-7]|[1-2]\d|[1-9]))#[0-9]+)*$
or with some slight changes, would likely work here.
Test
$re = '/^[dD][1-7]\.[rR][1-6](?:(?:\.(?:3[0-7]|[1-2]\d|[1-9]))#[0-9]+)*$/m';
$str = 'd2.r1.4#100.37#1.9#2.3#1
d2.r1.4#100.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1
d2.r1.4#100.38#1.9#2.3#1
d2.r1.4#100.0#1.9#2.3#1
';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Output
array(2) {
[0]=>
array(1) {
[0]=>
string(24) "d2.r1.4#100.37#1.9#2.3#1"
}
[1]=>
array(1) {
[0]=>
string(63) "d2.r1.4#100.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1.37#1.9#2.3#1"
}
}
If you wish to simplify/modify/explore the expression, it's been explained on the top right panel of regex101.com. If you'd like, you can also watch in this link, how it would match against some sample inputs.
I'm trying to decode the content-disposition header (from curl) to get the filename using the following regular expression:
<?php
$str = 'attachment;filename="unnamed.jpg";filename*=UTF-8\'\'unnamed.jpg\'';
preg_match('/^.*?filename=(["\'])([^"\']+)\1/m', $str, $matches);
print_r($matches);
This matches when the filename is in single or double quotes, but it fails when there are no quotes around the filename (which can happen):
$str = 'attachment;filename=unnamed.jpg;filename*=unnamed.jpg';
Right now I'm using two regular expressions (with if-else), but I'd like to know whether this can be done in a single regex. This is mainly for my own learning.
I would use the branch reset feature (?|...|...|...), which gives a more readable pattern and avoids creating a capture group for the quotes. In a branch-reset group, the capture groups in each alternative share the same numbers:
if ( preg_match('~filename=(?|"([^"]*)"|\'([^\']*)\'|([^;]*))~', $str, $match) )
echo $match[1], PHP_EOL;
Whatever the alternative that succeeds, the capture is always in group 1.
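A short sketch that runs the branch-reset pattern against all three quoting styles. The first and third headers come from the question; the single-quoted header is a made-up variant added for illustration.

```php
<?php
$pattern = '~filename=(?|"([^"]*)"|\'([^\']*)\'|([^;]*))~';

$headers = [
    'attachment;filename="unnamed.jpg";filename*=UTF-8\'\'unnamed.jpg\'',
    "attachment;filename='unnamed.jpg'", // hypothetical single-quoted form
    'attachment;filename=unnamed.jpg;filename*=unnamed.jpg',
];

$names = [];
foreach ($headers as $header) {
    if (preg_match($pattern, $header, $match)) {
        // Group 1 holds the name no matter which alternative matched.
        $names[] = $match[1];
    }
}

print_r($names); // unnamed.jpg three times
```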
Just to put my two cents in - you could use a conditional regex:
filename=(['"])?(?(1)(.+?)\1|([^;]+))
Broken down, this says:
filename= # match filename=
(['"])? # capture " or ' into group 1, optional
(?(1) # if group 1 was set ...
(.+?)\1 # ... then match up to \1
| # else
([^;]+) # not a semicolon
)
Afterwards, you need to check if group 2 or 3 was present.
Alternatively, go for @Casimir's answer using the (often overlooked) branch reset.
See a demo on regex101.com.
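The group check mentioned above could look like the following sketch. It relies on preg_match reporting an unmatched group that is followed by a matched one as an empty string; the variable names are illustrative.

```php
<?php
$pattern = '~filename=([\'"])?(?(1)(.+?)\1|([^;]+))~';

$headers = [
    'attachment;filename="unnamed.jpg";filename*=UTF-8\'\'unnamed.jpg\'',
    'attachment;filename=unnamed.jpg;filename*=unnamed.jpg',
];

$found = [];
foreach ($headers as $header) {
    if (preg_match($pattern, $header, $m)) {
        // Quoted names land in group 2. Unquoted names land in group 3,
        // in which case group 2 is an empty string.
        $found[] = ($m[2] ?? '') !== '' ? $m[2] : $m[3];
    }
}

print_r($found); // unnamed.jpg twice
```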
One approach is to use an alternation in a single regex to match either a single/double quoted filename, or a filename which is completely unquoted. Note that one side effect of this approach is that we introduce more capture groups into the regex. So we need a bit of extra logic to handle this.
<?php
$str = 'attachment;filename=unnamed.jpg;filename*=UTF-8\'\'unnamed.jpg\'';
$result = preg_match('/^.*?filename=(?:(?:(["\'])([^"\']+)\1)|([^"\';]+))/m',
$str, $matches);
print_r($matches);
$index = count($matches) == 3 ? 2 : 3;
if ($result) {
echo $matches[$index];
}
else {
echo "filename not found";
}
?>
You could make the quote-capturing group and its backreference optional, (["\'])? and \1?, and end the pattern with a non-capturing group (?:;|$) that requires either a semicolon or the end of the line:
^.*?filename=(["\'])?([^"\']+)\1?(?:;|$)
$str = 'attachment;filename=unnamed.jpg;filename*=UTF-8\'\'unnamed.jpg\'';
preg_match('/^.*?filename=(["\'])?([^"\']+)\1?(?:;|$)/m', $str, $matches);
print_r($matches);
You can also use \K to reset the starting point of the reported match and then match until you encounter a double quote or a semicolon [^";]+. This will only return the filename.
^.*?filename="?\K[^";]+
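$strings = [
    'attachment;filename="unnamed.jpg";filename*=UTF-8\'\'unnamed.jpg\'',
    'attachment;filename=unnamed.jpg;filename*=unnamed.jpg',
];
foreach ($strings as $string) {
    preg_match('/^.*?filename="?\K[^";]+/m', $string, $matches);
    print_r($matches);
}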
Hi I'm starting to learn php regex and have the following problem:
I need to extract the numbers inside $string.
The regex I use returns "NULL".
$string = 'Clasificación</a> (2194) </li>';
$regex = '/Clasificación</a>((.*?))</li>/';
preg_match($regex , $string, $match);
var_dump($match);
Thanks in advance.
There are three problems with your regex:
You aren't escaping the forward slash. You're using the forward slash as the delimiter, so to use it as a literal character inside the expression you need to escape it (or pick a different delimiter).
((.*?)) doesn't do what you think it does. It creates two capturing groups -- one nested inside the other. I assume, you're trying to capture what's inside the parentheses. For that, you'll need to escape the ( and ) characters. The expression would become: \((.*?)\)
Your expression doesn't handle whitespace. In the string you've given, there is whitespace between the </a> and the beginning of the number -- </a> (2194). To ignore the whitespace and capture just the number, you need to use \s (which matches any whitespace character). For that, you need to write \s*\((.*?)\)\s*.
The final regular expression after fixing all the above errors, will look like:
$regex = '~Clasificación</a>\s*\((.*?)\)\s*</li>~';
Full code:
$string = 'Clasificación</a> (2194) </li>';
$regex = '~Clasificación</a>\s*\((.*?)\)\s*</li>~';
preg_match($regex , $string, $match);
var_dump($match);
Output:
array(2) {
[0]=>
string(32) "Clasificación (2194) "
[1]=>
string(4) "2194"
}
You forgot to escape the / in your regex. Since you're using / as the delimiter, it has to be escaped:
$regex = '/Clasificación<\/a>((.*?))<\/li>/';
// ^ delimiter ^^ ^ delimiter
// ^^ / in a string which is escaped
Another way is to change the delimiter, so that you no longer have to escape the slash:
$regex = '#Clasificación</a>((.*?))</li>#';
See the PHP documentation for more information.
You will have to escape the special characters that you want to match literally:
$regex = '/Clasificación<\/a> \((.*?)\) <\/li>/';
You may also want to make the match a little more specific where it matters (depending on your use case):
$regex = '/Clasificación<\/a>\s*\(([0-9]+)\)\s*<\/li>/';
This allows zero or more whitespace characters before and after the (1234), and only matches if the parentheses contain nothing but digits.
I just tried this in php:
php > preg_match($regex , $string, $match);
php > var_dump($match);
array(2) {
[0]=>
string(30) "Clasificacin</a> (2194) </li>"
[1]=>
string(4) "2194"
}
Can a regex find a pattern for this?
{{foo.bar1.bar2.bar3}}
where the groups would be
$1 = foo $2 = bar1 $3 = bar2 $4 = bar3 and so on..
It would be like re-applying the expression over and over again until it fails to find a match.
The current expression I am working on is
(?:\{{2})([\w]+).([\w]+)(?:\}{2})
Here's a link from regexr.
http://regexr.com?3203h
--
OK, I guess I didn't explain well what I'm trying to achieve here.
Let's say I am trying to replace all
.barX inside a {{foo . . . }}
My expected result would be
$foo->bar1->bar2->bar3
This should work, assuming no braces are allowed within the match:
preg_match_all(
'%(?<= # Assert that the previous character(s) are either
\{\{ # {{
| # or
\. # .
) # End of lookbehind
[^{}.]* # Match any number of characters besides braces/dots.
(?= # Assert that the following regex can be matched here:
(?: # Try to match
\. # a dot, followed by
[^{}]* # any number of characters except braces
)? # optionally
\}\} # Match }}
) # End of lookahead%x',
$subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
I'm not a PHP person, but I managed to construct this piece of code here:
preg_match_all("([a-z0-9]+)",
"{{foo.bar1.bar2.bar3}}",
$out, PREG_PATTERN_ORDER);
foreach($out[0] as $val)
{
echo($val);
echo("<br>");
}
The code above prints the following:
foo
bar1
bar2
bar3
It should allow you to exhaustively search a given string by using a simple regular expression. I think that you should also be able to get what you want by removing the braces and splitting the string.
I don't think so, but it's relatively painless to just split the string on periods like so:
$str = "{{foo.bar1.bar2.bar3}}";
$str = str_replace(array("{","}"), "", $str);
$values = explode(".", $str);
print_r($values); // Yields an array with values foo, bar1, bar2, and bar3
EDIT: In response to your question edit, you could replace all barX in a string by doing the following:
$str = "{{foo.bar1.bar2.bar3}}";
$newStr = preg_replace("#bar\d#", "hi", $str);
echo $newStr; // outputs "{{foo.hi.hi.hi}}"
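If the goal from the edited question is to turn {{foo.bar1.bar2.bar3}} into $foo->bar1->bar2->bar3, one possible sketch is to match the whole placeholder once and rewrite the dots inside a callback. The "Value: " prefix and the variable names are only illustrative, and the pattern assumes that each segment consists of word characters.

```php
<?php
$template = 'Value: {{foo.bar1.bar2.bar3}}';

$result = preg_replace_callback(
    // Capture "name.name.name" between the double braces as group 1.
    '/\{\{(\w+(?:\.\w+)*)\}\}/',
    function (array $m): string {
        // Prefix a dollar sign and turn every dot into an arrow.
        return '$' . str_replace('.', '->', $m[1]);
    },
    $template
);

echo $result, PHP_EOL; // Value: $foo->bar1->bar2->bar3
```

This avoids needing one capture group per segment, so it works for any number of .barX parts.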
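I don't know the correct syntax in PHP for pulling out the results, but you could do:
\{{2}(\w+)(?:\.(\w+))*\}{2}
That captures the first name in the first capturing group and the following names in the second capturing group. Be aware that PCRE, which PHP uses, keeps only the last repetition of a repeated group, so in PHP group 2 would end up holding just bar3, whereas .NET exposes every capture. regexr.com does not seem to be able to show this as far as I can see. Try out Expresso, and you'll see what I mean.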
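As a quick check of how PHP treats the repeated group in this pattern, here is a small sketch using the sample string from the question:

```php
<?php
preg_match(
    '/\{{2}(\w+)(?:\.(\w+))*\}{2}/',
    '{{foo.bar1.bar2.bar3}}',
    $groups
);

print_r($groups);
// [0] => {{foo.bar1.bar2.bar3}}
// [1] => foo
// [2] => bar3   (only the last repetition of group 2 survives)
```

So in PHP this pattern is useful for validating the shape of the placeholder, but the individual segments are easier to get by splitting group-free text on dots.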