I'm trying to remove all quote characters from a string but not those that are escaped.
Example:
#TEST string "quoted part\" which escapes" other "quoted string"
Should result in:
#TEST string quoted part\" which escapes other quoted string
I tried to achieve this using
$string = '#TEST string "quoted part\" which escapes" other "quoted string"';
preg_replace("/(?>=\\)([\"])/","", $string);
But I can't seem to find a pattern that matches.
Any help or tips on another approach would be appreciated.
A very good example for (*SKIP)(*FAIL):
\\['"](*SKIP)(*FAIL)|["']
Replace this with an empty string and you're fine. See a demo on regex101.com.
In PHP this would be (you need to escape the backslash as well):
<?php
$string = <<<DATA
#TEST string "quoted part\" which escapes" other "quoted string"
DATA;
$regex = '~\\\\[\'"](*SKIP)(*FAIL)|["\']~';
$string = preg_replace($regex, '', $string);
echo $string;
?>
See a demo on ideone.com.
While (*SKIP)(*F) is a good technique all in all, it seems you may use a mere negative lookbehind in this case, where no other escape entities may appear but escaped quotes:
preg_replace("/(?<!\\\\)[\"']/","", $string);
See the regex demo.
Here, the regex matches...
(?<!\\\\) - a position inside the string that is not immediately preceded with a literal backslash (note that in PHP string literals, you need two backslashes to define a literal backslash, and to match a literal backslash with a regex pattern, the literal backslash in the string literal must be doubled since the backslash is a special regex metacharacter)
[\"'] - a double or single quote.
PHP demo:
$str = '#TEST string "quoted part\\" which escapes" other "quoted string"';
$res = preg_replace('/(?<!\\\\)[\'"]/', '', $str);
echo $res;
// => #TEST string quoted part\" which escapes other quoted string
If backslashes themselves may be escaped in the input, a quote preceded by an even number of backslashes (for example \\") is not escaped and must still be removed, while the backslashes before it are kept:
preg_replace("/(?<!\\\\)((?:\\\\{2})*)[\"']/",'$1', $string);
The ((?:\\\\{2})*) part captures any pairs of backslashes before the " or ' and puts them back with the help of the $1 backreference.
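A quick way to check the pattern above against the three cases it has to distinguish is sketched below. The sample inputs are made up for illustration and the variable names ($re, $cases) are hypothetical; only the regex comes from the answer.

```php
<?php
// Regex as PCRE sees it: (?<!\\)((?:\\{2})*)["']
$re = '/(?<!\\\\)((?:\\\\{2})*)["\']/';

// Illustrative inputs (single-quoted, so \\ is one literal backslash).
$cases = [
    'a"b',      // plain quote
    'a\\"b',    // escaped quote: a\"b
    'a\\\\"b',  // escaped backslash, then an unescaped quote: a\\"b
];

foreach ($cases as $input) {
    // $1 puts back any paired backslashes captured before the quote.
    echo $input, ' => ', preg_replace($re, '$1', $input), PHP_EOL;
}
// a"b => ab
// a\"b => a\"b
// a\\"b => a\\b
```

Only quotes preceded by an even number of backslashes (including zero) are removed; the escaped quote in the second case is left alone.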
Maybe this:
$str = '#TEST string "quoted part\" which escapes" other "quoted string"';
echo preg_replace("#([^\\\])\"#", "$1", $str);
There is a string in format:
else if($rule=='somerule1')
echo '{"s":1,"n":"name surname"}';
else if($rule=='somerule2')
echo '{"s":1,"n":"another text here"}';
...
"s" can have only number, "n" any text.
In input I have $rule value, and I need to remove the else if block that corresponds to this value. I am trying this:
$str = preg_replace("/else if\(\$rule=='$rule'\)\necho '{\"s\":[0-9],\"n\":\".*\"/", "", $str);
where $str is a string, that contains blocks I mentioned above, $rule is a string with rule I need to remove. But the function returns $str without changes.
What am I doing wrong?
For example, script to change "s" value to 1 works nice:
$str = preg_replace("/$rule'\)\necho '{\"s\":[0-9]/", $rule."')\necho '{\"s\":1", $str);
So I am probably making a mistake with the = symbol, or maybe with the space, or with .*.
The regex pattern can be much less strict, much simpler, and far easier to read/maintain.
You need to literally match the first line (conditional expression) with the only dynamic component being the $rule variable, then match the entire line that immediately follows it.
Code: (Demo)
$contents = <<<'TEXT'
else if($rule=='somerule1')
echo '{"s":1,"n":"name surname"}';
else if($rule=='somerule2')
echo '{"s":1,"n":"another text here"}';
TEXT;
$rule = "somerule1";
echo preg_replace("~\Qelse if(\$rule=='$rule')\E\R.+~", "", $contents);
Output:
else if($rule=='somerule2')
echo '{"s":1,"n":"another text here"}';
So, what have I done? Here's the official pattern demo.
\Q...\E means "treat everything in between these two metacharacters literally"
Then the only character that needs escaping is the first $. This is not to stop it from being interpreted as an end-of-string metacharacter (\Q...\E already takes care of that), but to stop PHP from interpolating it as the $rule variable, because the pattern is wrapped in double quotes.
The second occurrence of $rule in the pattern DOES need to be interpreted as the variable so it is not escaped.
\R is an escape sequence that matches any newline sequence, including \n, \r and \r\n.
Finally match all of the next line with the "any character" . with a one or more quantifier (+).
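If you would rather not rely on \Q...\E, roughly the same idea can be sketched with preg_quote(). This is only an illustration: the function name removeRuleBlock and the trailing \R? (which also consumes the line break after the removed block) are assumptions of this sketch, not part of the original answer.

```php
<?php
// Hypothetical helper: removes the "else if" header for $rule and the line after it.
function removeRuleBlock(string $contents, string $rule): string
{
    // Build the literal header, then let preg_quote() escape every metacharacter.
    $header = preg_quote("else if(\$rule=='{$rule}')", '~');

    // \R matches the line break, .+ the echo line, \R? an optional trailing break.
    return preg_replace('~' . $header . '\R.+\R?~', '', $contents);
}

$contents = <<<'TEXT'
else if($rule=='somerule1')
echo '{"s":1,"n":"name surname"}';
else if($rule=='somerule2')
echo '{"s":1,"n":"another text here"}';
TEXT;

echo removeRuleBlock($contents, 'somerule1');
```

Because the rule name goes through preg_quote(), a value containing regex metacharacters is still matched literally.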
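The pattern does not match because, inside a double-quoted PHP string, \$ collapses to a bare $, which the regex engine reads as an end-of-string anchor. To match a literal dollar sign you need the double escape \\\$.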
Apart from that, you are not matching the whole line as this part ".*\" stops at the double quote before }';
$str = 'else if($rule==\'somerule1\')
echo \'{"s":1,"n":"name surname"}\';
else if($rule==\'somerule2\')
echo \'{"s":1,"n":"another text here"}\';';
$rule = "somerule1";
$pattern = "/else if\(\\\$rule=='$rule'\)\necho '{\"s\":\d+,\"n\":\"[^\"]*\"}';/";
$str = preg_replace($pattern, "", $str);
echo $str;
Output
else if($rule=='somerule2')
echo '{"s":1,"n":"another text here"}';
Php demo
I have a file that contains a collection of strings. All of the strings begin with the same set of characters and end with the same character. I need to find all of the strings that match a certain pattern, and then remove particular characters from them before saving the file. Each string looks like this:
Data_*: " ... "
where Data_ is the same for each string, the asterisk is an incrementing integer that is either two or three digits, and the colon and the double quotation marks are the same for each string. The ... is completely different in every string and it's the part of each I need to work with. I need to remove all double quotation marks from the ... , preserving the enclosing double quotation marks. I don't need to replace them, just remove them.
So for example, I need this...
Data_83: "He said, "Yes!" to the question"
to become this...
Data_83: "He said, Yes! to the question"
I am familiar with PHP and would like to use this. I know how to do something like...
<?php
$filename = 'path/to/file';
$content = file_get_contents($filename);
$new_content = str_replace('"', '', $content);
file_put_contents($filename, $new_content);
And I'm pretty sure a regular expression will be what I'm wanting to use to find the strings and remove the extra double quotation marks. But I'm very new to regular expressions and need some help here.
EDIT:
I should have mentioned, the file is a PHP file containing an object. It looks a bit like this:
<?php
$thing = {
Data_83: "He said, "Yes!" to the question",
Data_84: "Another string with "unwanted" quotes"
}
You may use preg_replace_callback with a regex like
'~^(\h*Data_\d{2,}:\h*")(.*)"~m'
Note that you may make it safer if you specify an optional , at the end of the line: '~^(\h*Data_\d{2,}:\h*")(.*)",?\h*$~m' but you might need to introduce another capturing group then (around ,?\h*, and then append $m[3] in the preg_replace_callback callback function).
Details
^ - start of the line (m is a multiline modifier)
(\h*Data_\d{2,}:\h*") - Group 1 ($m[1]):
\h* - 0+ horizontal whitespaces
Data_ - Data_ substring
\d{2,} - 2 or more digits
: - a colon
\h* - 0+ horizontal whitespaces
" - double quote
(.*) - Group 2 ($m[2]): any 0+ chars other than line break chars, as many as possible, up to the last...
" - double quote (on a line).
$m is the array of matches passed to the callback; you only need to remove the " characters inside $m[2], the second capture group.
See the PHP demo:
$content = preg_replace_callback('~^(\h*Data_\d{2,}:\h*")(.*)"~m', function($m) {
return $m[1] . str_replace('"', '', $m[2]) . '"';
}, $content);
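The stricter variant mentioned earlier in this answer, with an optional trailing comma and a third capturing group, might look like the sketch below. The sample $content and the variable name $fixed are assumptions for illustration, and the input is assumed to use \n line endings.

```php
<?php
// Illustrative input modelled on the question's file contents.
$content = <<<'TEXT'
Data_83: "He said, "Yes!" to the question",
Data_84: "Another string with "unwanted" quotes"
TEXT;

$fixed = preg_replace_callback(
    // Group 1: prefix up to the opening quote; group 2: the text between the
    // enclosing quotes; group 3: optional comma and trailing horizontal whitespace.
    '~^(\h*Data_\d{2,}:\h*")(.*)"(,?\h*)$~m',
    function (array $m): string {
        // Strip inner quotes only, then restore the closing quote and the tail.
        return $m[1] . str_replace('"', '', $m[2]) . '"' . $m[3];
    },
    $content
);

echo $fixed;
```

Anchoring the match at the end of the line means a line that does not end in a quote (optionally followed by a comma) is left untouched.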
Not as elegant, but you could create a user-defined function:
function RemoveNestedQuotes($string)
{
$firstPart = explode(":", $string)[0];
preg_match('/"(.*)"/', $string, $matches, PREG_OFFSET_CAPTURE);
$tmpString = $matches[1][0];
return $firstPart . ': "' . preg_replace('/"/', '', $tmpString) . '"';
}
example:
$string = 'Data_83: "He said, "Yes!" to the question"';
echo RemoveNestedQuotes($string);
// Data_83: "He said, Yes! to the question"
Remove every quote with str_replace, then restore the enclosing pair with explode and implode:
<?php
$string = 'Data_83: "He said, "Yes!" to the question"';
$string = str_replace('"', '', $string);
echo $string =implode(': "',explode(': ',$string)).'"';
?>
Demo : https://eval.in/912466
Program Output
Data_83: "He said, Yes! to the question"
To simply remove every " character (including the enclosing ones):
<?php
$string = 'Data_83: "He said, "Yes!" to the question"';
echo preg_replace('/"/', '', $string);
?>
Demo : https://eval.in/912457
The way I see it, you don't need to make any preg_replace_callback() calls or a convoluted run of explosions and replacements. You merely need to disqualify the 2 double quotes that you wish to retain and match the rest for removal.
Code: (Demo)
$string = 'Data_83: "He said, "Yes!" to the question",
Data_184: "He said, "WTF!" to the question"';
echo preg_replace('/^[^"]+"(*SKIP)(*FAIL)|"(?!,\R|$)/m','',$string);
Output:
Data_83: "He said, Yes! to the question",
Data_184: "He said, WTF! to the question"
Pattern Demo
/^[^"]+"(*SKIP)(*FAIL)|"(?!,?$)/m
This pattern says:
match from the start of each line until you reach the first double quote, then DISQUALIFY it.
then, after the |, match every double quote that is not followed by an optional comma and the end of the line.
While this pattern worked on regex101 with my sample input, when I transferred it to the php sandbox to whack together a demo, I needed to add \R to maintain accuracy. You can test to see which is appropriate for your server/environment.
To get a double quoted string (which I cannot change) parsed correctly, I have to do the following:
$string = '15 Rose Avenue\n Irlam\n Manchester';
$string = str_replace('\n', "\n", $string);
print nl2br($string); // demonstrates that the \n's are now linebreak characters
So far, so good.
But in my given string there are characters like \xC3\xA4. There are many characters like this (beginning with \x..)
How can I get them correctly parsed as shown above with the linebreak?
You can use
$str = stripcslashes($str);
You can escape a \ in single quotes:
$string = str_replace('\\n', "\n", $string);
But you're going to have a lot of potential replaces if you need to do \\xC3, etc.... best use a preg_replace_callback() with a function(callback) to translate them to bytes
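As a rough sketch of that callback approach: the function name decodeEscapes and the sample string are hypothetical, and only \n and \xHH sequences are handled here (escaped backslashes and other sequences are ignored). The built-in stripcslashes() covers more cases.

```php
<?php
// Hypothetical helper: turns literal "\n" and "\xHH" sequences into real bytes.
function decodeEscapes(string $s): string
{
    return preg_replace_callback(
        // Regex as PCRE sees it: \\(?:x([0-9A-Fa-f]{2})|n)
        '/\\\\(?:x([0-9A-Fa-f]{2})|n)/',
        function (array $m): string {
            // Group 1 is only present when the \xHH branch matched.
            return isset($m[1]) ? chr(hexdec($m[1])) : "\n";
        },
        $s
    );
}

// Single quotes keep the backslashes literal, as in the question.
$raw = 'M\xC3\xA4rz\n2024';
echo decodeEscapes($raw);
```

Here \xC3\xA4 becomes the two bytes of the UTF-8 character ä, and \n becomes a real line break.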
I am looking to find and replace words in a long string. I want to find words that look like this: $test$ and replace them with nothing.
I have tried a lot of things and can't figure out the regular expression. This is the last one I tried:
preg_replace("/\b\\$(.*)\\$\b/im", '', $text);
No matter what I do, I can't get it to replace words that begin and end with a dollar sign.
Use single quotes instead of double quotes and remove the double escape.
$text = preg_replace('/\$(.*?)\$/', '', $text);
Also, a word boundary \b does not consume any characters; it asserts that there is a word character on one side and not on the other. Since $ is not a word character, you need to remove the word boundaries for this to work. The pattern contains no letters, so the i modifier is useless here, and it has no anchors, so remove the m (multi-line) modifier as well.
Also, * is a greedy quantifier. Therefore, .* will match as much as it can while still allowing the remainder of the regular expression to match. In the example below, it removes the entire string:
$text = '$fooo$ bar $baz$ quz $foobar$';
var_dump(preg_replace('/\$(.*)\$/', '', $text));
# => string(0) ""
I recommend using the non-greedy quantifier *? here. The question mark tells the engine not to be greedy: as soon as it finds an ending $, it stops.
$text = '$fooo$ bar $baz$ quz $foobar$';
var_dump(preg_replace('/\$(.*?)\$/', '', $text));
# => string(10) " bar  quz "
Edit
To fix your problem, you can use \S which matches any non-white space character.
$text = '$20.00 is the $total$';
var_dump(preg_replace('/\$\S+\$/', '', $text));
# string(14) "$20.00 is the "
There are three different positions that qualify as word boundaries \b:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
$ is not a word character, so don't use \b or it won't work. Also, there is no need for the double escaping and no need for the im modifiers:
preg_replace('/\$(.*)\$/', '', $text);
I would use:
preg_replace('/\$[^$]+\$/', '', $text);
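The effect of the word boundaries can be checked with a small sketch like the one below. The sample strings are made up for illustration.

```php
<?php
$text = 'pay $total$ now';

// \b needs a word character on exactly one side. Around each "$" here,
// both neighbours are non-word characters, so the pattern cannot match.
var_dump(preg_match('/\b\$total\$\b/', $text)); // int(0)

// Without the boundaries the token is found.
var_dump(preg_match('/\$total\$/', $text));     // int(1)

// [^$]+ cannot run past the next "$", so each token is removed on its own.
var_dump(preg_replace('/\$[^$]+\$/', '', '$fooo$ bar $baz$')); // string(5) " bar "
```

The negated character class gives the same per-token behaviour as a non-greedy .*? without relying on backtracking.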
You can use preg_quote to help you out on 'quoting':
$t = preg_replace('/' . preg_quote('$', '/') . '.*?' . preg_quote('$', '/') . '/', '', $text);
echo $t;
From the docs:
This is useful if you have a run-time string that you need to match in some text and the string may contain special regex characters.
The special regular expression characters are: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : -
Contrary to your use of word boundary markers (\b), you actually want the inverse effect (\B)-- you want to make sure that there ISN'T a word character next to the non-word character $.
You also don't need to use capturing parentheses because you are not using a backreference in your replacement string.
\S+ means one or more non-whitespace characters, matched greedily (with backtracking, not possessively).
Code: (Demo)
$text = '$foo$ boo hi$$ mon$k$ey $how thi$ $baz$ bar $foobar$';
var_export(
preg_replace(
'/\B\$\S+\$\B/',
'',
$text
)
);
Output:
' boo hi$$ mon$k$ey $how thi$  bar '
After having some trouble building a json string I discovered some text in my database containing double quotes. I need to replace the quotes with their escaped equivalents. This works:
function escape( $str ) {
return preg_replace('/"/',"\\\"",$str);
}
but it doesn't take into account that a quote may already be escaped. How can I modify the expression so that it only matches a non-escaped quote?
You need to use a negative lookbehind here
function escape( $str ) {
return preg_replace('/(?<!\\\\)"/',"\\\"",$str);
}
Try first removing the '\' from all escaped double quotes, then escaping all double quotes.
str_replace(array('\"', '"'), array('"', '\"'), $str);
Try preg_replace('/([^\\\])"/', '$1\\"', $str);
I believe this will work
regex:
(?<!\\)((?:\\\\)*)"
code:
$re = '/(?<!\\\\)((?:\\\\\\\\)*)"/';
preg_replace($re, '$1\\"', 'foo"bar'); // foo\"bar -- slash added
preg_replace($re, '$1\\"', 'foo\\"bar'); // foo\"bar -- already escaped, nothing added
preg_replace($re, '$1\\"', 'foo\\\\"bar'); // foo\\\"bar -- not escaped, extra slash added
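Wrapped back into the question's function, this might look like the following sketch. The name escapeQuotes and the sample string are assumptions for illustration. A useful property to check is that running it twice changes nothing the second time.

```php
<?php
// Hypothetical wrapper around the pattern above.
function escapeQuotes(string $str): string
{
    // Regex as PCRE sees it: (?<!\\)((?:\\\\)*)"
    // $1 keeps any paired (escaped) backslashes; \" is the escaped quote.
    return preg_replace('/(?<!\\\\)((?:\\\\\\\\)*)"/', '$1\\"', $str);
}

$once  = escapeQuotes('say "hi"');
$twice = escapeQuotes($once);

echo $once, PHP_EOL;       // say \"hi\"
var_dump($once === $twice); // bool(true): already-escaped quotes are left alone
```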