Negate character group: match "abc'," but not "abc\'," - php

I need a pattern that negates a character group and also negates a character inside that negated group.
The following pattern works, but I want to do a bit more:
(?:(?!'\,).)+
Here I don't want to match strings that contain ',
But what I really need is to nest a negation inside the negated group, something like this:
(?:(?![^\\]'\,).)+
I don't want to match any escaped quote signs
Match: abc',
Don't match: abc\',
$str = "'abc\',',asdf";
preg_match("/^('(?:(?!',).)+')/", $str, $matches);
echo '<pre>';
print_r($matches);
echo '</pre>';
This should output abc\', but it outputs abc\

Judging by your last comment, I think you're trying to match a single-quoted string literal, which might contain single-quotes escaped with backslashes. For example, in this string:
'abc\',','xyz'
...you want to match 'abc\',' and 'xyz'. That's easy enough:
$source = "'abc\',','xyz'";
print "$source\n\n";
preg_match_all("/'(?:[^'\\\\]++|\\\\.)*+'/", $source, $matches);
print_r($matches);
output:
'abc\',','xyz'
Array
(
[0] => Array
(
[0] => 'abc\','
[1] => 'xyz'
)
)
see it on ideone
But maybe you want to match all the items in a comma-separated list, which may or may not be quoted--in other words, CSV (or something very similar). If that's the case, you should use a dedicated CSV processing tool; there are many of them out there. In fact, PHP has one built in: http://php.net/manual/en/function.fgetcsv.php

/^(?:(?!\\\\',).)+$/ appears to do what you want. Note that you have to escape the single quote ''. See http://ideone.com/ypln2
If you don't necessarily want to match the full string, remove the ^ and $. See http://ideone.com/G67RV
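As a quick check of the negated pattern above, here is a minimal sketch that runs it against the two sample inputs from the question. The variable names and the loop are illustrative assumptions, not part of the original answer.

```php
<?php
// Regex after PHP string unescaping: ^(?:(?!\\',).)+$
// It matches only if no position in the string starts with \',
$pattern = '/^(?:(?!\\\\\',).)+$/';

$samples = [
    "abc',",     // plain quote followed by a comma
    'abc\\\',',  // backslash-escaped quote followed by a comma
];

$results = [];
foreach ($samples as $sample) {
    $results[$sample] = preg_match($pattern, $sample) === 1;
    echo $sample, ' => ', $results[$sample] ? 'match' : 'no match', "\n";
}
```

The first sample should be reported as a match and the second as no match, because the lookahead rejects the position where the backslash, quote and comma begin.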

Related

How do I match this pattern using preg_match in PHP?

I'm writing a simple quiz engine in PHP and supply the question text in this format
question|correct/feedback|wrong/feedback|wrong/feedback
There can be as many wrong/feedback options as necessary. I want to use preg_match to return the results so I can display them. For instance:
q|aaa/aaa|bbb/bbb|ccc/ccc
...should return...
array(
0 => q|aaa/aaa|bbb/bbb|ccc/ccc
1 => q
2 => aaa/aaa
3 => bbb/bbb
4 => ccc/ccc
)
So far, I've got this regular expression, which matches the question and the correct/feedback combination...
([^\|]+)\|([^\/]+\/[^\|$]+)
...but I have no idea how to match the remaining wrong/feedback strings
You can also use the "glue" feature in your pattern with preg_match_all; this way you can check that the syntax is correct and extract each part at the same time.
The glue feature ensures that each match immediately follows the previous one, with no gap. To do that I use the A modifier (anchored at the start of the string or at the position right after the previous match).
$s = 'q|aaa/aaa|bbb/bbb|ccc/ccc';
$pat = '~ (?!\A) \| \K [^|/]+ / [^|/]+ (?: \z (*:END) )? | \A [^|/]+ ~Ax';
if ( preg_match_all($pat, $s, $m) && isset($m['MARK']) ) {
$result = $m[0];
print_r($result);
}
I also use a marker (*:END) to make sure that the end of the string is actually reached, whatever the pattern constraints. If this marker exists in the matches array, it proves that the syntax is correct. Advantage: you only have to parse the string once (you don't even need to check the whole string syntax in a lookahead assertion anchored at the start of the string).
If you want the whole question as first item in the result array, just write:
$result = array_merge([$s], $m[0]);
So, after the advice, I've decided to use preg_match to check the syntax and then explode to split the string.
This regex seems to match the string format up until any mismatch occurs.
^[^\|/]+(?:\|[^\|/]+/[^\|/]+)+
If I check that the length of the match is the same as the original string I think this will tell me the syntax is correct. Does this sound feasible?
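One way to sketch that idea: if the pattern is anchored at both ends with \A and \z, the length comparison becomes unnecessary, because the match can only succeed when it spans the whole string. The function name parseQuizLine is hypothetical and only serves the illustration.

```php
<?php
/**
 * Validate a quiz line of the form
 *   question|correct/feedback|wrong/feedback|...
 * and split it into its parts. Returns null when the syntax is invalid.
 * (parseQuizLine is a hypothetical helper name.)
 */
function parseQuizLine(string $line): ?array
{
    // \A and \z anchor the match to the whole string.
    $pattern = '~\A[^|/]+(?:\|[^|/]+/[^|/]+)+\z~';

    if (preg_match($pattern, $line) !== 1) {
        return null;
    }

    // The syntax is known to be valid, so a plain split is safe.
    return explode('|', $line);
}

print_r(parseQuizLine('q|aaa/aaa|bbb/bbb|ccc/ccc'));
var_dump(parseQuizLine('q|aaa|bbb/bbb')); // second field lacks "/feedback"
```

If the whole question is also wanted as the first item, array_merge([$line], explode('|', $line)) would give the layout shown in the question.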

Trouble Using the preg_replace Option

I have a regex:
preg_match_all('#^(((?:-?>?(?:[A-Z]{3})?\d{3})+)-([0-9]{2})([0-9]{2})([0-9]{2})-\n\/O.([A-Z]{3}).KCLE.([A-Z]{2}).([A-Z]).([0-9]{4}).[0-9]{6}T[0-9]{4}Z-([0-9]{2})([0-9]{4}T[0-9]{4}Z[\/]))#', '', $matches)
that runs against a string(s) on a webpage. An example of a possible string:
OHZ012>018-PAZ015-060815-
/O.EXP.KCLE.BH.S.0015.000000T0000Z-170806T0700Z/
This will correctly match the string. However, for $matches[2] it will output
OHZ012>018-PAZ015
I want this line to read: 012>018-015 (i.e. remove the letters from that group).
I have tried the following using preg_replace:
$matches = preg_replace('/([A-Z]{3})/','',$matches);
Now if I print out $matches[2] it just gives me the 3rd character as opposed to the group. So for example, it will print out "2" instead of "012>018-015". Any idea why it isn't printing out the entire group as I would expect?
preg_match_all populates your $matches variable with an array of arrays. The third parameter of preg_replace should be either a string or an array of strings, so that is probably where you were running into the issue.
$matches[2], however, is an array of strings, so you can call preg_replace passing it as the third parameter and get your results.
$matches[2] = preg_replace('/([A-Z]{3})/','',$matches[2]);
If you would like a more generic letter replacement regex, you can use /[A-Z]/i to remove all letters in the strings.
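A minimal sketch of the point above, assuming $groups stands in for $matches[2] as filled by preg_match_all. The first value comes from the question; the second is a made-up example in the same format.

```php
<?php
// Stand-in for $matches[2]: a flat array of strings, one per match.
$groups = [
    'OHZ012>018-PAZ015',   // from the question
    'OHZ003-PAZ001>004',   // hypothetical second match
];

// preg_replace accepts an array of strings as its subject and
// returns an array with the replacement applied to each element.
$stripped = preg_replace('/[A-Z]{3}/', '', $groups);

print_r($stripped);
```

Each element keeps its position, so the result can be assigned straight back to $matches[2].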

preg_replace - similar patterns

I have a string that contains something like "LAB_FF, LAB_FF12" and I'm trying to use preg_replace to look for both patterns and replace them with different strings, using this pattern:
/LAB_[0-9A-F]{2}|LAB_[0-9A-F]{4}/
So input would be
LAB_FF, LAB_FF12
and the output would need to be
DAB_FF, HAD_FF12
Problem is, for the second string, it interprets it as "LAB_FF" instead of "LAB_FF12" and so the output is
DAB_FF, DAB_FF
I've tried splitting the input line out using 2 different preg_match statements, the first looking for the {2} pattern and the second looking for the {4} pattern. This sort of works in that I can get the correct output into 2 separate strings but then can't combine the two strings to give the single amended output.
\b is a word boundary: the match must start and end at the edge of a word, so LAB_FF is no longer matched inside LAB_FF12.
https://regex101.com/r/upY0gn/1
$pattern = "/\bLAB_[0-9A-F]{2}\b|\bLAB_[0-9A-F]{4}\b/";
In response to the comment on the other answer about how to replace the string, this is one way.
The pattern creates an empty entry in the output array for each alternative that fails; in this case one (the first).
Then it's just a matter of substr.
$re = '/(\bLAB_[0-9A-F]{2}\b)|(\bLAB_[0-9A-F]{4}\b)/';
$str = 'LAB_FF12';
preg_match($re, $str, $matches);
var_dump($matches);
$substitutes = ["", "DAB", "HAD"];
for ($i = 1; $i < count($matches); $i++) {
    if ($matches[$i] != "") {
        $result = $substitutes[$i] . substr($matches[$i], 3);
        break;
    }
}
echo $result;
https://3v4l.org/gRvHv
You can specify exact amounts in one set of curly braces, e.g. {2,4}.
Just tested this and seems to work:
/LAB_[0-9A-F]{2,4}/
LAB_FF, LAB_FFF, LAB_FFFF
EDIT: My mistake, that actually matches between 2 and 4. If you change the order of your selections it matches the first it comes to, e.g.
/LAB_([0-9A-F]{4}|[0-9A-F]{2})/
LAB_FF, LAB_FFFF
EDIT2: The following will match LAB_even_amount_of_characters:
/LAB_([0-9A-F]{2})+/
LAB_FF, LAB_FFFF, LAB_FFFFFF...
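To get the single amended output the question asks for, the word boundaries and the longest-alternative-first ordering from the answers above can be combined with preg_replace_callback. This is only a sketch: the mapping (two hex digits to DAB, four to HAD) is inferred from the example in the question.

```php
<?php
$input = 'LAB_FF, LAB_FF12';

$output = preg_replace_callback(
    // Try the 4-digit form first; \b stops partial matches inside longer words.
    '/\bLAB_([0-9A-F]{4}|[0-9A-F]{2})\b/',
    function (array $m): string {
        // Choose the new prefix from the length of the hex part
        // (assumed mapping, taken from the question's example).
        $prefix = strlen($m[1]) === 4 ? 'HAD' : 'DAB';
        return $prefix . '_' . $m[1];
    },
    $input
);

echo $output, "\n";
```

Because every match is replaced in a single pass over the input, there is no need to run two separate patterns and merge the results afterwards.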

Regex grab all text between brackets, and NOT in quotes

I'm attempting to match all text between {brackets}, however not if it is in quotation marks:
For example:
$str = 'value that I {want}, vs value "I do {NOT} want" ';
My results should snatch "want", but omit "NOT". I've searched Stack Overflow desperately for a regex that could perform this operation, with no luck. I've seen answers that get the text between quotes, but none that get text in brackets only when it is outside quotes. Is this even possible?
And if so how is it done?
So far this is what I have:
preg_match_all('/{([^}]*)}/', $str, $matches);
But unfortunately it only gets all text inside brackets, including {NOT}
It's quite tricky to get this done in one go. I also wanted to make it compatible with nested brackets, so let's use a recursive pattern:
("|').*?\1(*SKIP)(*FAIL)|\{(?:[^{}]|(?R))*\}
OK, let's explain this mysterious regex:
("|') # match either a single quote or a double quote and put it in group 1
.*? # match anything ungreedy until ...
\1 # match what was matched in group 1
(*SKIP)(*FAIL) # make it skip this match since it's a quoted set of characters
| # or
\{(?:[^{}]|(?R))*\} # match a pair of brackets (even if they are nested)
Some php code:
$input = <<<INP
value that I {want}, vs value "I do {NOT} want".
Let's make it {nested {this {time}}}
And yes, it's even "{bullet-{proof}}" :)
INP;
preg_match_all('~("|\').*?\1(*SKIP)(*FAIL)|\{(?:[^{}]|(?R))*\}~', $input, $m);
print_r($m[0]);
Sample output:
Array
(
[0] => {want}
[1] => {nested {this {time}}}
)
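The recursive pattern returns each match with its braces. If only the inner text is needed, as in the question ("want" rather than "{want}"), one possible variation, assuming nesting is not required, replaces the recursion with a capture group. The group numbering below follows from that modified pattern and is not part of the original answer.

```php
<?php
$str = 'value that I {want}, vs value "I do {NOT} want" ';

// Group 1: the opening quote (used by the skip branch).
// Group 2: the text between braces, without the braces themselves.
$pattern = '~("|\').*?\1(*SKIP)(*FAIL)|\{([^{}]*)\}~';

preg_match_all($pattern, $str, $m);

// $m[2] holds the inner text of every unquoted {…} pair.
print_r($m[2]);
```

Quoted sections are consumed by the first alternative and discarded by (*SKIP)(*FAIL), so {NOT} never reaches the second alternative.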
Personally I'd process this in two passes. The first to strip out everything in between double quotes, the second to pull out the text you want.
Something like this perhaps:
$str = 'value that I {want}, vs value "I do {NOT} want" ';
// Get rid of everything in between double quotes
$str = preg_replace("/\".*\"/U","",$str);
// Now I can safely grab any text between curly brackets
preg_match_all("/\{(.*)\}/U",$str,$matches);
Working example here: http://3v4l.org/SRnva

PHP- Regular expression - how to read from Right to left

I have below example
$game = "hello999hello888hello777last";
preg_match('/hello(.*?)last/', $game, $match);
The above code returns 999hello888hello777, but what I need is the value just before last, i.e. 777. So I need the regular expression to read from right to left.
$game = strrev($game);
How about that? :D
Then just reverse the regular expression ^__^
Why not just reverse the string? Use PHP's strrev and then just reverse your regular expression.
$game = "hello999hello888hello777last";
preg_match('/tsal(.*?)olleh/', strrev($game), $match);
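One detail to keep in mind with this approach: the captured text comes back reversed as well, which is easy to miss with a palindromic value such as 777. A small sketch with a different final value (123 is an assumption chosen for the illustration):

```php
<?php
// Same shape as the question's string, but the last value is not a palindrome.
$game = 'hello999hello888hello123last';

// strrev($game) is "tsal321olleh888olleh999olleh",
// so "hello" must appear reversed as "olleh" in the pattern.
preg_match('/tsal(.*?)olleh/', strrev($game), $match);

// The capture is reversed too; flip it back.
$value = strrev($match[1]);

echo $value, "\n";
```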
This will return the last set of digits before the string last
$game = "hello999hello888hello777last";
preg_match('/hello(\d+)last$/', $game, $match);
print_r($match);
Output Example:
Array
(
[0] => hello777last
[1] => 777
)
So you would need $match[1]; for the 777 value
Your problem is that although .*? matches reluctantly, i.e. as few characters as possible, it still starts matching right after the first hello, and since it matches any character, it will match right across "boundaries" (last and hello in your case).
Therefore you need to be more explicit about the fact that it's not legal to match across boundaries, and that's what lookahead assertions are for:
preg_match('/hello((?:(?!hello|last).)*)last(?!.*(?:hello|last))/', $game, $match);
Now the match between hello and last is prohibited from containing hello and/or last, and it's not allowed to have hello or last after the match.
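For reference, a minimal runnable sketch of this lookahead approach applied to the string from the question. Note that the trailing lookahead needs its own closing parenthesis for the pattern to compile; the variable $value is just an illustrative name.

```php
<?php
$game = 'hello999hello888hello777last';

// (?:(?!hello|last).)* consumes one character at a time, but only while
// neither "hello" nor "last" starts at the current position.
// (?!.*(?:hello|last)) rejects a match that is followed by another boundary.
$pattern = '/hello((?:(?!hello|last).)*)last(?!.*(?:hello|last))/';

$value = preg_match($pattern, $game, $match) === 1 ? $match[1] : null;

var_dump($value);
```

The attempts starting at the first two hello occurrences fail because the next boundary after the captured digits is hello rather than last, so only the final hello…last pair can match.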
