How to match a quoted string with escaped quotes in it? - php

/^"((?:[^"]|\\.)*)"/
Against this string:
"quote\_with\\escaped\"characters" more
It only matches until the \", although I've clearly defined \ as an escape character (and it matches \_ and \\ fine...).

It works correctly if you flip the order of your two alternatives:
/^"((?:\\.|[^"])*)"/
The problem is that otherwise the important \ character gets eaten up before it tries matching \". It worked before for \\ and \_ only because both characters in either pair get matched by your [^"].
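The effect of the alternative order can be checked with a small sketch. It uses Python's re module rather than PHP's PCRE, on the assumption that both engines try alternatives left to right for this pattern; the variable names are only illustrative.

```python
import re

subject = r'"quote\_with\\escaped\"characters" more'

# [^"] is tried first, so it consumes each backslash on its own.
quote_first = re.compile(r'^"((?:[^"]|\\.)*)"')
# \\. is tried first, so a backslash always takes the next character with it.
escape_first = re.compile(r'^"((?:\\.|[^"])*)"')

short_match = quote_first.match(subject).group(0)
full_match = escape_first.match(subject).group(0)

print(short_match)  # stops at the escaped quote
print(full_match)   # runs to the real closing quote
```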

Using Python with raw-string literals to ensure no further interpretation of escape sequences is taking place, the following variant does work:
import re
x = re.compile(r'^"((?:[^"\\]|\\.)*)"')
s = r'"quote\_with\\escaped\"characters" more"'
mo = x.match(s)
print(mo.group())
emits "quote\_with\\escaped\"characters"; I believe that in your version (which also cuts the match short if substituted in here) the "not a doublequote" subexpression ([^"]) is swallowing the backslashes that you intend to escape the immediately following characters. All I'm doing here is excluding the backslash from that character class so that such backslashes are NOT swallowed in this way, and, as I said, it works with this change.

Not meant to confuse, just another approach I've played around with. The regexp (PCRE) below tries not to match wrong syntax (e.g. a string ending with \") and can be used with both ' and ".
/('|").*\\\1.*?[^\\]\1/
To use it with PHP:
<?php if (preg_match('/(\'|").*\\\\\1.*?[^\\\\]\1/', $subject)) return true; ?>
For:
"quote\_with\\escaped\"characters" "aaa"
'just \'another\' quote "example\"'
"Wrong syntax \"
"No escapes, no match here"
This only matches:
"quote\_with\\escaped\"characters" and
'just \'another\' quote "example\"'
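A quick way to check that claim is to run the four sample lines through the same pattern. This is a sketch in Python's re module rather than PCRE, assuming backreferences and lazy quantifiers behave the same way for this pattern; QUOTED, samples and results are illustrative names.

```python
import re

# Opening quote, an escaped copy of that quote somewhere inside,
# then a closing copy that is not preceded by a backslash.
QUOTED = re.compile(r'''('|").*\\\1.*?[^\\]\1''')

samples = [
    r'"quote\_with\\escaped\"characters" "aaa"',
    r"""'just \'another\' quote "example\"'""",
    r'"Wrong syntax \"',
    '"No escapes, no match here"',
]

# True where the pattern finds a quoted string containing an escaped quote.
results = [QUOTED.search(s) is not None for s in samples]
print(results)
```

Note that the last sample fails only because the pattern requires at least one escaped quote inside the string.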

Related

php - validate street address with regular expression

I am using php version 7.0.14. In spite of several good examples on stackoverflow, I cannot get my php regex to work. I've been trying for hours every conceivable combination. The problem comes in trying to allow periods and slashes, which must be escaped. I have tried enclosing the regex in double and single quotes. I have tried escaping with one backslash, two, three, four. It either errors out, lets everything through (like $) or does not allow periods and slashes.
$strStreet = "123 1/2 S. Main St. Apt. 1";
#$strRegEx = "/^[a-z0-9 ,#-'\/]{3,50}$/i";
$strRegEx = '/^[a-z0-9 ,#-\'\/]{3,50}$/i';
if (preg_match($strRegEx, $strStreet) === 0 ) {
print "bad address";
}
Thanks in advance for any help.
You have two issues here:
there is no . in your character class (in a character class the . doesn't need to be escaped)
the - in a character class needs to be escaped, or moved to the front or end (in other regex engines the escaping isn't available). On its own it creates a range of the characters on each side of it.
so:
^[-a-z0-9 ,#'\/.]{3,50}$
should work for you. (also if you use a different delimiter the forward slash won't need to be escaped)
Demo: https://regex101.com/r/PfZAlO/1/
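The range problem is easy to see in a small sketch. It is written with Python's re module rather than preg_match, assuming character-class ranges work the same way in both; loose and strict are illustrative names.

```python
import re

# The asker's class: "#-'" is read as the range U+0023..U+0027, i.e. # $ % & '
loose = re.compile(r"^[a-z0-9 ,#-'/]{3,50}$", re.IGNORECASE)
# The corrected class: hyphen moved to the front, dot added.
strict = re.compile(r"^[-a-z0-9 ,#'/.]{3,50}$", re.IGNORECASE)

street = "123 1/2 S. Main St. Apt. 1"

loose_accepts_street = loose.match(street) is not None     # no "." in the class
loose_accepts_dollars = loose.match("$$$") is not None     # "$" sits inside the range
strict_accepts_street = strict.match(street) is not None
strict_accepts_dollars = strict.match("$$$") is not None

print(loose_accepts_street, loose_accepts_dollars)
print(strict_accepts_street, strict_accepts_dollars)
```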

addslashes() only if slashes are not added from before

I'm getting a lot of text values from my database that I need to output with slashes added before characters that need to be quoted.
Problem is that some of the data already has the slashes added there from before, whilst some of it doesn't.
How can I add slashes using for example addslashes() - but at the same time make sure that it doesn't add an extra slash in the cases where the slash is already added?
Example:
Input: test
Output should be: test
Input: test
Output should be: test
This is PHP 5.3.10.
If you know that you don't have any double slashes, simply run addslashes() and then replace all \\ with \.
If you have something like this:
test
Using addslashes(), the output will be:
test
So, you may need to replace every occurrence of more than one \ to be sure
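// Named addslashes_once here because PHP does not allow redeclaring the built-in addslashes().
function addslashes_once($string) {
// Adds a backslash only before double quotes that are not already escaped,
// including a quote at the very start of the string.
return preg_replace('/(^|[^\\\\])"/', '$1\\\\"', $string);
}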
The answer of Qaflanti is correct but I would like to make it more complete, if you want to escape both single and double quotes.
First option :
function escape_quotes($string) {
return preg_replace("/(^|[^\\\\])(\"|')/","$1\\\\$2", $string);
}
Input
I love \"carots\" but "I" don't like \'cherries\'
Output
I love \"carots\" but \"I\" don\'t like \'cherries\'
Explanation :
The \ has a special meaning inside a quoted PHP string and escapes the following character, so while you would write \\ in a regex to search for a backslash character, in PHP you also need to escape those two backslashes, for a total of four.
So with that in mind, the first capturing group matches either the start of the string or a single character that is not a backslash (and not two or four backslashes, misleading as the pattern looks).
The second capturing group matches a double or a single quote exactly once.
So this finds unescaped quotes (double and single) and adds a backslash before each one, thus escaping it.
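The same idea can be sketched outside PHP. The version below uses Python's re.sub with the same two capturing groups; escape_quotes mirrors the PHP function name above, and the Python port itself is an assumption rather than part of the original answer.

```python
import re

def escape_quotes(text):
    # Group 1: start of string or one non-backslash character.
    # Group 2: a single or double quote that is therefore unescaped.
    return re.sub(r'''(^|[^\\])(["'])''', r'\1\\\2', text)

source = r"""I love \"carots\" but "I" don't like \'cherries\'"""
print(escape_quotes(source))
```

Because each match consumes the character before the quote, two quotes directly next to each other would leave the second one unescaped; a lookbehind would avoid that, at the cost of a less readable pattern.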
Second option :
Or it might just be best for you to convert them to html entities from the start :
function htmlentities_quotes($string) {
return str_replace(array('"', "'"), array("&quot;", "&apos;"), $string);
}
And then you just have to use the PHP function htmlspecialchars_decode($string, ENT_QUOTES | ENT_HTML5); to revert it back to how it was (with the default flags, &apos; may be left undecoded).
Input
I love "carots" but "I" don't like 'cherries'
Output
I love &quot;carots&quot; but &quot;I&quot; don&apos;t like &apos;cherries&apos;

what does the following php regular expression evaluates to?

I come across a php regular expression, mentioned below, I am not sure why \q\ is used in it, can anybody help me to understand this?
$strBuildTitle="SOME URL";
$patterns[0] = "/[^a-zA-Z0-9\q\ ]/";
$replacements[0] = " ";
$strBuildTitle = preg_replace($patterns, $replacements, $strBuildTitle);
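I believe it tries to remove any non-alphanumeric character from the given URL, but I'm not sure why \q\ is used here. Is it related to the removal of quotes?
\q and "\ " (a backslash followed by a space) aren't escape sequences that PHP recognises.
In double-quoted strings, PHP leaves unrecognised escape sequences untouched, backslash included, so the regex engine receives them as written. PCRE treats the escaped space as a literal space, and older PCRE versions treat \q as a literal q, which [a-z] already covers. Newer PCRE2-based PHP builds may instead reject \q as an unrecognised escape.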

regex with special characters?

I am looking for a regex that can contain special characters like / \ . ' "
In short, I would like a regex that can match the following:
may contain lowercase
may contain uppercase
may contain a number
may contain space
may contain / \ . ' "
I am making a PHP script to check whether a certain string consists of the above or not, like a validation check.
The regular expression you are looking for is
^[a-z A-Z0-9\/\\.'"]+$
Remember if you are using PHP you need to use \ to escape the backslashes and the quotation mark you use to encapsulate the string.
In PHP using preg_match it should look like this:
preg_match("/^[a-z A-Z0-9\\/\\\\.'\"]+$/",$value);
This is a good place to find the regular expressions you might want to use.
http://regexpal.com/
You can always escape the special characters by putting a \ in front of them.
try this:
preg_match("/[A-Za-z0-9\/\\.'\"]/", ...)
NikoRoberts is 100% correct.
I would only add the following suggestion: when creating a PHP regex pattern string, always use single quotes. There are far fewer chars which need to be escaped (i.e. only the single quote and the backslash itself need to be escaped, and the backslash only needs to be escaped if it appears at the end of the string or directly before a single quote or another backslash).
When dealing with backslash soup, it helps to print out the (interpreted) regex string. This shows you exactly what is being presented to the regex engine.
Also, a "number" might have an optional sign? Yes? Here is my solution (in the form of a tested script):
<?php // test.php 20110311_1400
$data_good = 'abcdefghijklmnopqrstuvwxyzABCDE'.
'FGHIJKLMNOPQRSTUVWXYZ0123456789+- /\\.\'"';
$data_bad = 'abcABC012~!###$%^&*()';
$re = '%^[a-zA-Z0-9+\- /\\\\.\'"]*$%';
echo($re ."\n");
if (preg_match($re, $data_good)) {
echo("CORRECT: Good data matches.\n");
} else {
echo("ERROR! Good data does NOT match.\n");
}
if (preg_match($re, $data_bad)) {
echo("ERROR! Bad data matches.\n");
} else {
echo("CORRECT: Bad data does NOT match.\n");
}
?>
The following regex will match a single character that fits the description you gave:
[a-zA-Z0-9\ \\\/\.\'\"]
If your point is to insure that ONLY characters in this range of characters are used in your string, then you can use the negation of this which would be:
[^a-zA-Z0-9\ \\\/\.\'\"]
In the second case, you could use your regex to find the bad stuff (that you don't want to be included), and if it didn't find anything then your string pattern must be kosher, because I'm assuming that if you find one character that is not in the proper range, then your string is not valid.
so to put it in PHP syntax:
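// preg_match needs pattern delimiters; ~ is used so the / inside the class does not end the pattern
$regex = "~[^a-zA-Z0-9\ \\\/\.\'\"]~";
// $value is the string being validated
if (preg_match($regex, $value)) {
// handle the bad stuff
}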
Edit 1:
I completely ignored the fact that backslashes are special in PHP double-quoted strings, so here is a corrected version of the above pattern (with the delimiters that preg_match requires):
$regex = "~[^a-zA-Z0-9\\ \\\\\\/\\.\\'\\\"]~";
If that doesn't work, it shouldn't take too much for someone to debug how many of the backslashes need to be escaped with a backslash, and which other characters also need to be escaped.
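The "look for one bad character" approach can be sketched as follows. This is Python rather than PHP, which sidesteps the double-quoted-string escaping discussed above by using a raw string; BAD_CHAR and first_bad_char are hypothetical names.

```python
import re

# Negated class: anything that is NOT a letter, digit, space, / \ . ' or "
BAD_CHAR = re.compile(r"""[^a-zA-Z0-9 /\\.'"]""")

def first_bad_char(value):
    """Return the first disallowed character, or None if the string is clean."""
    found = BAD_CHAR.search(value)
    return found.group(0) if found else None

print(first_bad_char('abc ABC 012 /\\.\'"'))
print(first_bad_char('abc~def'))
```

Returning the offending character rather than a plain boolean makes validation errors easier to report.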

How do I match a square bracket literal using RegEx?

What's the regex to match a square bracket? I'm using \\] in a pattern in eregi_replace, but it doesn't seem to be able to find a ]...
\] is correct, but note that PHP itself ALSO has \ as an escape character, so you might have to use \\] (or a different kind of string literal).
Works flawlessly:
<?php
$hay = "ab]cd";
echo eregi_replace("\]", "e", $hay);
?>
Output:
abecd
There are two ways of doing this:
/ [\]] /x;
/ \] /x;
While you may consider the latter the better option, and indeed I would use it in simpler regexps, I consider the former the better option for larger regexps. Consider the following:
/ (\w*) ( [\d\]] ) /x;
/ (\w*) ( \d | \] ) /x;
In this example, the former is my preferred solution. It does a better job of combining the separate entities, which may each match at the given location. It may also have some speed benefits, depending on implementation.
Note: This is in Perl syntax, partly to ensure proper highlighting.
In PHP you may need to double up on the back-slashes.
"[\\]]" and "\\]"
You don't need to escape it: if isolated, a ] is treated as a regular character.
Tested with eregi_replace and preg_replace.
[ is another beast, you have to escape it. Looks like single and double quotes, single or double escape are all treated the same by PHP, for both regex families.
Perhaps your problem is elsewhere in your expression, you should give it in full.
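The difference between the two brackets can be checked with a short sketch. It uses Python's re module, not the ereg or preg families tested above, on the assumption that the treatment of a lone ] and a lone [ is the same; the variable names are illustrative.

```python
import re

hay = "ab]cd"

escaped = re.sub(r"\]", "e", hay)     # escaped closing bracket
bare = re.sub("]", "e", hay)          # a lone ] is an ordinary character
in_class = re.sub(r"[\]]", "e", hay)  # closing bracket inside a character class

# A lone [ opens a character class, so it must be escaped to compile.
try:
    re.compile("[")
    open_bracket_ok = True
except re.error:
    open_bracket_ok = False

print(escaped, bare, in_class, open_bracket_ok)
```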
In .NET you escape special characters by putting a backslash in front of them, so [ becomes \[.
Since you normally write the pattern in a string literal, though, you would have to use either a verbatim string like this:
@"\["
or a doubled backslash like this:
"\\["
Your problem may come from the fact that you are using eregi_replace with the first parameter enclosed in single quotes:
'\['
In double quotes it could also work, depending on the context, since the quoting style changes how the parameter is passed to the function (single quotes pass the string with almost no interpretation, whereas double quotes interpret escape sequences, hence the need to double the "\" character).
Here, if "\[" is interpreted as an escape sequence, you still need to double the "\".
Note: based on your comment, you may try the regex
<\s*(?:br|p)\s*\/?\s*\>\s*\[
in order to detect a [ right after a <br> or a <p>.
