PHP regular expression match text not within strings

I have content like foo == 'bar test baz' and test.asd = "buz foo". I need to match the "identifiers", the ones on the left that are not within double/single quotes. This is what I have now:
preg_replace_callback('#([a-zA-Z\\.]+)#', function($matches) {
var_dump($matches);
}, $subject);
It currently also matches the words inside the strings. How would I write a pattern that skips the quoted ones?
Another example: foo == 5 AND bar != 'buz' OR fuz == 'foo bar fuz luz'. So in essence, match a-zA-Z that are not inside strings.

/^[^'"=]*/
would work on your examples. It matches any number of characters (starting at the start of the string) that are neither quotes nor equals signs.
/^[^'"=\s]*/
additionally avoids matching whitespace which may or may not be what you need.
Edit:
You're asking how to match letters (and possibly dots?) outside of quoted sections anywhere in the text. This is more complicated. A regex that can correctly identify whether it's currently outside of a quoted string (by making sure that the number of quotes, excluding escaped quotes and nested quotes, is even) looks like this as a PHP regex:
'/(?:
(?= # Assert even number of (relevant) single quotes, looking ahead:
(?:
(?:\\\\.|"(?:\\\\.|[^"\\\\])*"|[^\\\\\'"])*
\'
(?:\\\\.|"(?:\\\\.|[^"\'\\\\])*"|[^\\\\\'])*
\'
)*
(?:\\\\.|"(?:\\\\.|[^"\\\\])*"|[^\\\\\'])*
$
)
(?= # Assert even number of (relevant) double quotes, looking ahead:
(?:
(?:\\\\.|\'(?:\\\\.|[^\'\\\\])*\'|[^\\\\\'"])*
"
(?:\\\\.|\'(?:\\\\.|[^\'"\\\\])*\'|[^\\\\"])*
"
)*
(?:\\\\.|\'(?:\\\\.|[^\'\\\\])*\'|[^\\\\"])*
$
)
([A-Za-z.]+) # Match ASCII letters/dots
)+/x'
But a regex probably isn't the right tool for this.
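As a rough illustration of the non-regex route, here is a minimal sketch of a single-pass scanner. It assumes that a backslash escapes the next character inside a string and that identifiers are runs of ASCII letters and dots (the [A-Za-z.] class from the question); the function name extractIdentifiers() is hypothetical.

```php
<?php
// Characters allowed in an identifier: ASCII letters and dots, like [A-Za-z.].
const IDENT_CHARS = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ.';

// Hypothetical helper: scan once, tracking whether we are inside a '...' or
// "..." string, and collect identifier runs found outside any string.
// Assumes a backslash inside a string escapes the next character.
function extractIdentifiers(string $subject): array
{
    $identifiers = [];
    $current = '';
    $quote = null; // quote character of the string we are in, or null
    $len = strlen($subject);

    for ($i = 0; $i < $len; $i++) {
        $ch = $subject[$i];

        if ($quote !== null) {
            if ($ch === '\\') {
                $i++; // skip the escaped character
            } elseif ($ch === $quote) {
                $quote = null; // closing quote: back outside a string
            }
            continue;
        }

        if (strpos(IDENT_CHARS, $ch) !== false) {
            $current .= $ch; // extend the current identifier
            continue;
        }

        // Any other character ends the current identifier.
        if ($current !== '') {
            $identifiers[] = $current;
            $current = '';
        }
        if ($ch === '"' || $ch === "'") {
            $quote = $ch; // opening quote: now inside a string
        }
    }

    if ($current !== '') {
        $identifiers[] = $current; // identifier at the very end of the input
    }
    return $identifiers;
}

print_r(extractIdentifiers('foo == \'bar test baz\' and test.asd = "buz foo"'));
```

Unlike a lookahead-based regex, this reads the input only once, and the escape rule is explicit in the code rather than buried in the pattern.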

You could also try this:
preg_match_all('/[\w.]+(?=(?:[^\'"]|[\'"][^\'"]*["\'])*$)/', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
# Matched text = $result[0][$i];
}
This matches all runs of letters, digits, _ and dots that lie outside your quotes. You can allow more characters by adding them to [\w.].
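A quick way to check the pattern is to wrap it in a small function and run it on the two samples from the question. This is a sketch; the function name wordsOutsideQuotes() is made up for the example. Note that \w also matches digits, so 5 is returned, and keywords such as and are returned too.

```php
<?php
// Hypothetical wrapper around the lookahead pattern above. The lookahead
// succeeds only if the rest of the subject contains balanced quotes,
// i.e. only if the current word is outside a string.
function wordsOutsideQuotes(string $subject): array
{
    preg_match_all('/[\w.]+(?=(?:[^\'"]|[\'"][^\'"]*["\'])*$)/', $subject, $result);
    return $result[0];
}

print_r(wordsOutsideQuotes('foo == \'bar test baz\' and test.asd = "buz foo"'));
print_r(wordsOutsideQuotes("foo == 5 AND bar != 'buz' OR fuz == 'foo bar fuz luz'"));
```

Because the lookahead re-scans the rest of the subject for every candidate word, the cost grows roughly quadratically with the input length. The class ["\'] also lets a ' be closed by a ", which may or may not matter for your data.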

The trick I use here is to give the regex a branch that consumes an entire quoted string whenever it encounters a quote, and then ignore matches from that branch.
$subject = <<<END
foo == 'bar test baz' and test.asd = "buz foo"
foo == 5 AND bar != 'buz' OR fuz == 'foo bar fuz luz'
END;
$regexp = '/(?:["\'][^"\']+["\']|([a-zA-Z\\.]+\b))/';
preg_replace_callback($regexp, function($matches) {
if( count($matches) >= 2 ) {
print trim($matches[1]).' ';
}
return $matches[0]; // keep the subject unchanged; we only print
}, $subject);
// Output: 'foo and test.asd foo AND bar OR fuz '
The main part of the regexp is
(?: anything between quotes | any word consisting of a-zA-Z and dots )
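The same alternation can also be used with preg_match_all instead of printing from a callback. When the quoted-string branch matches, group 1 does not take part, and preg_match_all (in its default PREG_PATTERN_ORDER) stores an empty string for it, so filtering out empty entries leaves only the words. A sketch; identifiersOutsideQuotes() is a hypothetical name.

```php
<?php
// Hypothetical helper: collect group 1 of every match. Matches from the
// "quoted string" branch leave group 1 as '', which array_filter drops.
function identifiersOutsideQuotes(string $subject): array
{
    $regexp = '/(?:["\'][^"\']+["\']|([a-zA-Z.]+\b))/';
    preg_match_all($regexp, $subject, $matches);
    $words = array_filter($matches[1], function ($w) {
        return $w !== '';
    });
    return array_values($words); // reindex 0..n-1
}

$subject = <<<END
foo == 'bar test baz' and test.asd = "buz foo"
foo == 5 AND bar != 'buz' OR fuz == 'foo bar fuz luz'
END;

echo implode(' ', identifiersOutsideQuotes($subject)), PHP_EOL;
// foo and test.asd foo AND bar OR fuz
```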

Related

Find and replace string with condition in php

I am a PHP newbie. I want to replace certain characters in a string. My code is below:
$str="this 'is' a new 'string and i wanna' replace \"in\" \"it here\"";
$find = [
'\'',
'"'
];
$replace = [
['^', '*'],
['#', '#']
];
$result = null;
$odd = true;
for ($i=0; $i < strlen($str); $i++) {
if (in_array($str[$i], $find)) {
$key = array_search($str[$i], $find);
$result .= $odd ? $replace[$key][0] : $replace[$key][1];
$odd = !$odd;
} else {
$result .= $str[$i];
}
}
echo $result;
The output of the above code is:
this ^is* a new ^string and i wanna* replace #in# #it here#.
but I want the output to be:
this ^is* a new 'string and i wanna' replace #in# "it here".
That is, the characters should be replaced for both quotation marks (the left and the right quote), for both ' and ". A string should not be replaced if it has only a left or only a right quote; it should be replaced only when it has both.

Ok, I don't know what all that code is trying to accomplish.
But anyway, here is my go at it:
$str = "this 'is' a new 'string and i wanna' replace \"in\" \"it here\"";
$str = preg_replace(["/'([^']+)'/",'/"([^"]+)"/'], ["^$1*", "#$1#"], $str, 1);
print_r($str);
Output
this ^is* a new 'string and i wanna' replace #in# "it here"
Using preg_replace and a fairly simple regular expression, we can replace the quotes. The trick here is the fourth parameter of preg_replace, $limit, which is defined as:
limit The maximum possible replacements for each pattern in each subject string. Defaults to -1 (no limit).
Therefore, setting it to 1 limits each pattern to its first match only. Because we pass an array of patterns, each pattern is treated as a separate operation, so each one is allowed $limit replacements, or 1 in this case. (The optional fifth parameter, $count, is the one that gets filled with the number of replacements done.)
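To see both parameters at work, this sketch passes $limit = 1 and also collects $count, which preg_replace fills by reference with the total number of replacements across all patterns.

```php
<?php
$str = "this 'is' a new 'string and i wanna' replace \"in\" \"it here\"";

$result = preg_replace(
    ["/'([^']+)'/", '/"([^"]+)"/'],
    ['^$1*', '#$1#'],
    $str,
    1,      // $limit: at most one replacement per pattern
    $count  // $count: filled in by reference with the total replacements
);

echo $result, PHP_EOL; // this ^is* a new 'string and i wanna' replace #in# "it here"
echo $count, PHP_EOL;  // 2
```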
Whether or not this fits every use case you have I cannot say, but it's the most straightforward way to do it for the example you provided.
As for the match itself /'([^']+)'/
/ opening and closing "delimiters" for the expression (they are required, although they don't have to be /)
' literal match, matches ' one time (the opening quote)
( ... ) capture group (group1) so we can use it in the replacement, as $1
[^']+ character set with a [^ not modifier, match anything not in the set, so anything that is not a ' one or more times, greedy
' literal match, matches ' one time (the ending quote)
The replacement "^$1*"
^ literal, adds this char in
$1 use the contents of the capture group (group1)
* literal, adds the char in
Hope that helps understand how it works.
UPDATE
Ok I think I finally deciphered what you want:
The string should be replaced if a word has both a left and a right quote. For example, 'word' here should be changed, but 'word... should not be changed, and neither should word'.
It seems you mean only "whole" words, with no spaces.
So in that case we have to adjust our regular expression like this:
$str = preg_replace(["/'([-\w]+)'/",'/"([-\w]+)"/'], ["^$1*", "#$1#"], $str);
So we removed the $limit and made the character class stricter:
[-\w]+ \w is the word-character class, in other words a-zA-Z0-9_, and the - is a literal hyphen (placing it first in the class keeps it from being read as a range)
What we are saying is: match only strings that start and end with a quote (single or double), and only if the text between them consists of word characters plus the hyphen. This does not include the space. In the first case, your example, it produces the same result, but if you were to flip it to
//[ORIGINAL] this 'is' a new 'string and i wanna' replace "in" "it here"
this a new 'string and i wanna' replace 'is' "it here" "in"
You would get this output
this a new 'string and i wanna' replace ^is* "it here" #in#
Before this change you would have gotten
this a new ^string and i wanna* replace 'is' #it here# "in"
In other words it would have only replaced the first occurrence, now it will replace anything between the quotes if and only if it's a whole word.
As a final note, you can be even stricter if you only want alphabetic characters by changing the character class to [a-zA-Z]+; then it matches only a to z, upper or lower case. The pattern above, by contrast, also matches the digits 0 to 9, the - hyphen and the _ underscore.
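A small sketch applying the stricter patterns (with no limit) to both the original string and the reordered one; replaceQuotedWords() is a hypothetical wrapper name.

```php
<?php
// Hypothetical wrapper: replace quoted single words only (no spaces inside).
function replaceQuotedWords(string $str): string
{
    return preg_replace(["/'([-\w]+)'/", '/"([-\w]+)"/'], ['^$1*', '#$1#'], $str);
}

echo replaceQuotedWords("this 'is' a new 'string and i wanna' replace \"in\" \"it here\""), PHP_EOL;
// this ^is* a new 'string and i wanna' replace #in# "it here"
echo replaceQuotedWords("this a new 'string and i wanna' replace 'is' \"it here\" \"in\""), PHP_EOL;
// this a new 'string and i wanna' replace ^is* "it here" #in#
```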
Hope that is what you need.

Regex rules in an array

This may not be solvable the way I want, but maybe you can help me, guys.
I have a lot of malformed words in the name of my products.
Some of them have a leading ( and a trailing ), or only one of the two; the same goes for / and " characters.
I explode the product name on spaces and examine each word.
I want to replace these characters with nothing. But a product could be a 40GB ATA 3.5" hard drive. I need to process every word, but I cannot use the same method for 3.5" as for () or // because this 3.5" is valid.
So I only need to remove the quotes when there is one at the start of the string AND one at the end of the string.
$cases = [
'(testone)',
'(testtwo',
'testthree)',
'/otherone/',
'/othertwo',
'otherthree/',
'"anotherone',
'anothertwo"',
'"anotherthree"',
];
$patterns = [
'/^\(/',
'/\)$/',
'~^/~',
'~/$~',
//Here is what I can not imagine, how to add the rule for `"`
];
$result = preg_replace($patterns, '', $cases);
This works well, but can it be done with one regex in a single preg_replace() call? If so, can someone help me with the pattern for the quotes?
Result for quotes should be this:
'"anotherone', //no quote at end leave the leading
'anothertwo"', //no quote at start, leave the trailing
'anotherthree', //there are quotes on start and end so remove them.

You may use another approach: rather than defining an array of patterns, use a single alternation-based regex:
preg_replace('~^[(/]|[/)]$|^"(.*)"$~s', '$1', $s)
Details:
^[(/] - a literal ( or / at the start of the string
| - or
[/)]$ - a literal ) or / at the end of the string
| - or
^"(.*)"$ - a " at the start of the string, then any 0+ characters (due to /s option, the . matches a linebreak sequence, too) that are captured into Group 1, and " at the end of the string.
The replacement pattern is $1 that is empty when the first 2 alternatives are matched, and contains Group 1 value if the 3rd alternative is matched.
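Because preg_replace also accepts an array as the subject, the single pattern can be applied to the whole $cases array from the question in one call. A sketch; the extra 3.5" entry is added here only to show that a lone trailing quote is kept.

```php
<?php
$cases = [
    '(testone)', '(testtwo', 'testthree)',
    '/otherone/', '/othertwo', 'otherthree/',
    '"anotherone', 'anothertwo"', '"anotherthree"',
    '3.5"', // added for illustration: a valid trailing quote
];

// One pattern, one call; an array subject returns an array of results.
$result = preg_replace('~^[(/]|[/)]$|^"(.*)"$~s', '$1', $cases);
print_r($result);
```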
Note: If you need to keep replacing until no match is found (for example, when the slashes are themselves wrapped in quotes), use preg_match together with preg_replace:
$s = '"/some text/"';
$re = '~^[(/]|[/)]$|^"(.*)"$~s';
$tmp = '';
while (preg_match($re, $s) && $tmp != $s) {
$tmp = $s;
$s = preg_replace($re, '$1', $s);
}
echo $s;

This also works, using two anchored patterns applied in order:
preg_replace(['~^[(/]?(.+?)[/)]?$~', '~^"(.+)"$~'], '$1', $string)

Find words starting and ending with dollar signs $ in PHP

I am looking to find and replace words in a long string. I want to find words that look like this: $test$ and replace them with nothing.
I have tried a lot of things and can't figure out the regular expression. This is the last one I tried:
preg_replace("/\b\\$(.*)\\$\b/im", '', $text);
No matter what I do, I can't get it to replace words that begin and end with a dollar sign.

Use single quotes instead of double quotes and remove the double escape.
$text = preg_replace('/\$(.*?)\$/', '', $text);
Also, a word boundary \b does not consume any characters; it asserts that there is a word character on one side and a non-word character on the other. $ is not a word character, so you need to remove the word boundaries for this to work. Your regular expression contains no letters, so the i modifier is useless here, and it has no ^ or $ anchors, so remove the m (multiline) modifier as well.
Also, * is a greedy quantifier: .* matches as much as it can while still allowing the rest of the regular expression to match. On the example below, it removes the entire string:
$text = '$fooo$ bar $baz$ quz $foobar$';
var_dump(preg_replace('/\$(.*)\$/', '', $text));
# => string(0) ""
I recommend using the non-greedy (lazy) quantifier *? here. The question mark tells the engine not to be greedy: as soon as it finds an ending $, it stops.
$text = '$fooo$ bar $baz$ quz $foobar$';
var_dump(preg_replace('/\$(.*?)\$/', '', $text));
# => string(10) " bar  quz "
Edit
If the text can also contain a lone $ (such as a price), use \S, which matches any non-whitespace character, so a match cannot run across a space:
$text = '$20.00 is the $total$';
var_dump(preg_replace('/\$\S+\$/', '', $text));
# string(14) "$20.00 is the "

There are three different positions that qualify as word boundaries \b:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
$ is not a word character, so don't use \b or it won't work. Also, there is no need for the double escaping and no need for the im modifiers:
preg_replace('/\$(.*)\$/', '', $text);
But since .* is greedy, that removes everything from the first $ to the last one, so I would use:
preg_replace('/\$[^$]+\$/', '', $text);
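The difference between these patterns shows up on text that also contains a lone $, like the price in the earlier example. A sketch comparing the lazy .*?, the [^$]+ class and \S+ on that string:

```php
<?php
// Compare three of the suggested patterns on a string with a lone $.
$text = '$20.00 is the $total$';

$results = [];
foreach (['/\$.*?\$/', '/\$[^$]+\$/', '/\$\S+\$/'] as $pattern) {
    $results[$pattern] = preg_replace($pattern, '', $text);
}

var_dump($results);
// .*? and [^$]+ both match "$20.00 is the $", leaving "total$";
// \S+ cannot cross the space after "20.00", so only "$total$" is removed.
```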

You can use preg_quote to help you out on 'quoting':
$t = preg_replace('/' . preg_quote('$', '/') . '.*?' . preg_quote('$', '/') . '/', '', $text);
echo $t;
From the docs:
This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
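preg_quote is most useful when the markers come from run-time input. Here is a sketch of a hypothetical helper, removeDelimited(), that removes everything between an opening and a closing marker; passing '/' as the second argument also escapes the delimiter, which matters for markers such as /* and */.

```php
<?php
// Hypothetical helper: remove every span that starts with $open and ends with
// the next $close. preg_quote() escapes regex metacharacters in both markers,
// and its second argument also escapes the '/' delimiter used below.
function removeDelimited(string $text, string $open, string $close): string
{
    $pattern = '/' . preg_quote($open, '/') . '.*?' . preg_quote($close, '/') . '/';
    return preg_replace($pattern, '', $text);
}

var_dump(preg_quote('$', '/'));                              // string(2) "\$"
var_dump(removeDelimited('$fooo$ bar $baz$', '$', '$'));     // string(5) " bar "
var_dump(removeDelimited('keep /*drop*/ this', '/*', '*/')); // string(10) "keep  this"
```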

Contrary to your use of word boundary markers (\b), you actually want the inverse effect (\B)-- you want to make sure that there ISN'T a word character next to the non-word character $.
You also don't need to use capturing parentheses because you are not using a backreference in your replacement string.
\S+ means one or more non-whitespace characters, matched greedily (the engine backtracks as needed so that the final \$ can still match).
Code:
$text = '$foo$ boo hi$$ mon$k$ey $how thi$ $baz$ bar $foobar$';
var_export(
preg_replace(
'/\B\$\S+\$\B/',
'',
$text
)
);
Output:
' boo hi$$ mon$k$ey $how thi$  bar '

Count the number of quotes in a string that's not preceded by a backslash

I'm using the following expression to find the number of occurrences of ' and " in a string. I don't want the count to include \' or \".
$subStr = 'asdf"asdf""a\\"sdf\'asdf\'\'a\\\'sdf';
preg_match_all('/[^\\\\]\'|[^\\\\]\"/', $subStr, $matches);
echo count($matches[0]);
I expect it to return 6 but it only returns 4. I think this is because the pairs "" and '' are only counted once.
This is what $matches contain:
Array
(
[0] => Array
(
[0] => f"
[1] => f"
[2] => f'
[3] => f'
)
)
Is there any way I can get the count of 6? Note that I also need to exclude the \" and \'.

Why doesn't it work
You can't use a character class to match a character not preceded by another character. This is because a character class (negated or not) must still match a character. For example, [^a]b does not mean "b not preceded by a". It means: "a character that's not a followed by b".
The Solution
If you want to match a single-quote or double-quote character not preceded by a backslash, then you'll have to use a lookaround expression (a negative lookbehind, specifically).
The regex you're looking for is (?<!\\\\)[\'"].
Autopsy:
(?<! - start of the lookbehind expression
\\\\ - match a literal backslash character
) - end of the lookbehind expression
[\'"] - character class that matches a single character from the list "'
This effectively matches any single-quote / double-quote character that is not preceded by a literal backslash character.
Using the above expression with preg_match_all is simple:
$subStr = 'asdf"asdf""a\\"sdf\'asdf\'\'a\\\'sdf';
preg_match_all('/(?<!\\\\)[\'"]/', $subStr, $matches);
echo count($matches[0]); // => 6
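One limitation: the lookbehind treats any preceding backslash as an escape, so in a\\" (an escaped backslash followed by a real quote) the quote is not counted. If your strings can contain escaped backslashes, a single pass may be simpler than a regex. A sketch, assuming a backslash escapes whichever character follows it; countUnescapedQuotes() is a hypothetical name.

```php
<?php
// Hypothetical helper: count ' and " that are not escaped, where a backslash
// escapes exactly one following character (so \\ is an escaped backslash).
function countUnescapedQuotes(string $s): int
{
    $count = 0;
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        if ($s[$i] === '\\') {
            $i++; // skip the escaped character, whatever it is
        } elseif ($s[$i] === '"' || $s[$i] === "'") {
            $count++;
        }
    }
    return $count;
}

echo countUnescapedQuotes('asdf"asdf""a\\"sdf\'asdf\'\'a\\\'sdf'), PHP_EOL; // 6
echo countUnescapedQuotes('a\\\\"'), PHP_EOL; // 1 (the subject is a\\")
```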

To count every quote, escaped or not:
preg_match_all('/([\'"])/', $subStr, $matches);
Alternatively:
print count(preg_split('/[\'"]/', $subStr)) - 1;
Update: if you want to exclude \' and \":
preg_match_all('/(?<!\\\\)([\'"])/', $subStr, $matches);

You could, of course, go for a non-regex approach too, although this also counts escaped quotes:
$number = substr_count($string, '"') + substr_count($string, "'");

If the string contains no escaped quotes, a plain character class is enough. Try this...
$subStr = 'asdf"asdf""asdf\'asdf\'\'asdf';
preg_match_all('/["\']/', $subStr, $matches);
echo count($matches[0]);

Regex: match two adjacent strings

I have some code such as:
if('hello' == 2 && 'world' !== -1){
return true;
}
I'm having some trouble matching the condition in the if statement. The first regex I thought of was /'.*'/, but this matches:
'hello'
' == 2 && '
'world'
which isn't what I was hoping for. I only want to match the single quotes and the text inside.
'hello'
'world'
Does anybody have an idea?

Try this
preg_match_all('/\'[^\'\r\n]*\'/m', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
# Matched text = $result[0][$i];
}
Explanation
' # Match the character “'” literally
[^'\r\n] # Match a single character NOT present in the list below
# The character “'”
# A carriage return character
# A line feed character
* # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
' # Match the character “'” literally

The two matching groups in this should pick up your quoted values, assuming the line contains exactly two quoted strings:
^.*(\'.*?\').*(\'.*?\').*$

For your specific case
\'[a-z]*?\'
For the entire code, if you have uppercase characters in the quotes, you can use
\'[a-zA-Z]*?\'
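However, if you have special characters in the quotes as well, you can use what Chris Cooper suggested. Depending on your needs, a variety of answers are possible.
Note: the ? after * makes * non-greedy (lazy), so it won't try to search until the last quote. In these two patterns it is redundant, because [a-z] and [a-zA-Z] cannot match a quote anyway; it matters with patterns such as '.*?'.
It also matters which regex function you use: preg_match returns only the first match, while preg_match_all returns all of them.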

Here's what I came up with!
preg_match_all("#'[^'\n\r]*'#", $subject, $matches);
Match '.
Match any character that is not ', new line, or carriage return.
Match '.
Without all of the escaping, I think it's a bit more readable—for being a regular expression, anyway.
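A quick check of the negated-class approach on the code from the question, next to the original /'.*'/ for contrast (a sketch; the $subject string is reconstructed from the question):

```php
<?php
// The code from the question, as a PHP string.
$subject = "if('hello' == 2 && 'world' !== -1){\n    return true;\n}";

// Negated class: a match cannot run past the next ' (or a line break).
preg_match_all("#'[^'\n\r]*'#", $subject, $matches);

// The original greedy attempt for comparison: .* runs to the last ' on the line.
preg_match_all("/'.*'/", $subject, $greedy);

print_r($matches[0]); // 'hello' and 'world' as separate matches
print_r($greedy[0]);  // a single match: 'hello' == 2 && 'world'
```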
