i am looking for a regex that can contain special chracters like / \ . ' "
in short i would like a regex that can match the following:
may contain lowercase
may contain uppercase
may contain a number
may contain space
may contain / \ . ' "
i am making a php script to check if a certain string have the above or not, like a validation check.
The regular expression you are looking for is
^[a-z A-Z0-9\/\\.'"]+$
Remember if you are using PHP you need to use \ to escape the backslashes and the quotation mark you use to encapsulate the string.
In PHP using preg_match it should look like this:
preg_match("/^[a-z A-Z0-9\\/\\\\.'\"]+$/",$value);
This is a good place to find the regular expressions you might want to use.
http://regexpal.com/
You can always escape them by appending a \ in front of the special characters.
try this:
preg_match("/[A-Za-z0-9\/\\.'\"]/", ...)
NikoRoberts is 100% correct.
I would only add the following suggestion: When creating a PHP regex pattern string, always use: single-quotes. There are far fewer chars which need to be escaped (i.e. only the single quote and the backslash itself needs to be escaped (and the backslash only needs to be escaped if it appears at the end of the string)).
When dealing with backslash soup, it helps to print out the (interpreted) regex string. This shows you exactly what is being presented to the regex engine.
Also, a "number" might have an optional sign? Yes? Here is my solution (in the form of a tested script):
<?php // test.php 20110311_1400
$data_good = 'abcdefghijklmnopqrstuvwxyzABCDE'.
'FGHIJKLMNOPQRSTUVWXYZ0123456789+- /\\.\'"';
$data_bad = 'abcABC012~!###$%^&*()';
$re = '%^[a-zA-Z0-9+\- /\\\\.\'"]*$%';
echo($re ."\n");
if (preg_match($re, $data_good)) {
echo("CORRECT: Good data matches.\n");
} else {
echo("ERROR! Good data does NOT match.\n");
}
if (preg_match($re, $data_bad)) {
echo("ERROR! Bad data matches.\n");
} else {
echo("CORRECT: Bad data does NOT match.\n");
}
?>
The following regex will match a single character that fits the description you gave:
[a-zA-Z0-9\ \\\/\.\'\"]
If your point is to insure that ONLY characters in this range of characters are used in your string, then you can use the negation of this which would be:
[^a-zA-Z0-9\ \\\/\.\'\"]
In the second case, you could use your regex to find the bad stuff (that you don't want to be included), and if it didn't find anything then your string pattern must be kosher, because I'm assuming that if you find one character that is not in the proper range, then your string is not valid.
so to put it in PHP syntax:
$regex = "[^a-zA-Z0-9\ \\\/\.\'\"]"
if preg_match( $regex, ... ) {
// handle the bad stuff
}
Edit 1:
I've completely ignored the fact that backslashes are special in php double-quoted strings, so here is a correcting to the above code:
$regex = "[^a-zA-Z0-9\\ \\\\\\/\\.\\'\\\"]"
If that doesn't work it shouldn't take too much for someone to debug how many of the backslashes need to be escaped with a backslash, and what other characters need also to be escaped....
Related
So I'm trying to check for match and if match, extract a variable name out of a string. The variable name should be preceded by "$" and cannot be escaped with "\", so for example "$name" should extract "name" and "\$name" or "name" shouldn't match. Heres the command:
$match = preg_match("/^(?<!\\)(\$.*)$/", $potential, $name);
I constructed and tested it using regex101.com and it works there, however, I'm getting an error from PHP saying
"preg_match(): Compilation failed: missing ) at offset 13 in ..."
and I have no clue what its referring to.
My thought is that you will need to escape certain characters to consume the regular expression in PHP
$match = preg_match('/^(?<!\\\\)(\$.*)$/', $potential, $name);
Edit: the backslash is the escape character in both Regex and PHP, you will need to doubly escape the slashes.
You've escaped a bracket:
preg_match('/^(?<!\\) <----HERE
FYI you can use several other delimiters to make your regex's more readable. Because so often we have slashes and escaped chars, then using '/' makes it hard to read. Consider using '#' or '~' or even '#' to increase readability.
Also reL your online regex tool of choice, it depends on which regular expression implementation (and version) the service uses, as to how accurate your results. I always use rubular.com (Uses PCRE) but for PHP you can use phpliveregex.com
I am porting code from Node.js to PHP and keep getting errors with this regular expression:
^/[a-z0-9]{6}([^0-9a-z]|$)
PHP complains about a dollar sign:
Unknown modifier '$'
In JavaScript I was able to check if a string was ending with [^0-9a-z] or END OF STRING.
How do I do this in PHP with preg_match()?
My PHP code looks like this:
<?
$sExpression = '^/[a-z0-9]{6}([^0-9a-z]|$)';
if (preg_match('|' . $sExpression . '|', $sUrl)) {
// ...
}
?>
The JavaScript code was similar to this:
var sExpression = '^/[a-z0-9]{6}([^0-9a-z]|$)';
var oRegex = new RegExp(sExpression);
if (oRegex.test(sUrl)) {
// ...
}
To match a string that starts with a slash, followed by six alphanumerics and is then followed by either the end-of-string or something that's not alphanumeric:
preg_match('~^/[a-z0-9]{6}([^0-9a-z]|$)~i', $str);
The original JavaScript probably used new RegExp(<expression>), but PCRE requires a proper enclosure of the expression; those are the ~ characters I've put in the above code. Btw, I've made the expression case insensitive by using the i modifier; feel free to remove it if not desired.
You were using | as the enclosure; as such, you should have escaped the pipe character inside the expression, but by doing so you would have changed the meaning. It's generally best to choose delimiters that do not have a special meaning in an expression; it also helps to choose delimiters that don't occur as such in the expression, e.g., my choice of ~ avoids having to escape any character.
Expressions in PCRE can be generalised as:
<start-delimiter> stuff <end-delimiter> modifiers
Typically, the starting delimiter is the same as the ending delimiter, except for cases such as [expression]i or {expression}i whereby the opening brace is matched with the closing brace :)
Fix the regular expression first:
^/[a-z0-9]{6}([^0-9a-z]|$)
Try this.
As others pointed out, I'm an idiot and saw a / as a \ ... LOL.
Ok, well go at this again,
I’d avoid using the "|" and just do it this way.
if (preg_match('/^\/[a-z0-9]{6}([^0-9a-z]|$)/', $sUrl)) { ... }
Reducing this to just matching a particular character or end of string (PHP),
\D777(\D|$)\
This will match:
xxx777xxx or xxx777 but not xxx7777 or xxx7777xxx
I'm getting a lot of text values from my database that I need to output with slashes added before characters that need to be quoted.
Problem is that some of the data already has the slashes added there from before, whilst some of it doesn't.
How can I add slashes using for example addslashes() - but at the same time make sure that it doesn't add an extra slash in the cases where the slash is already added?
Example:
Input: test
Output should be: test
Input: test
Output should be: test
This is PHP 5.3.10.
If you know that you don't have any double slashes, simply run addslashes() and then replace all \\ with \.
If you have something like this:
test
Using addslashes(), the output will be:
test
So, you may need to replace every occurrence of more than one \ to be sure
function addslashes($string) {
return preg_replace('/([^\\\\])\"/','$1\"',$string);
}
The answer of Qaflanti is correct but I would like to make it more complete, if you want to escape both single and double quotes.
First option :
function escape_quotes($string) {
return preg_replace("/(^|[^\\\\])(\"|')/","$1\\\\$2", $string);
}
Input
I love \"carots\" but "I" don't like \'cherries\'
Output
I love \"carots\" but \"I\" don\'t like \'cherries\'
Explanation :
The \ has a special meaning inside a quoted expression and will escape the following character in the string, so while you would need to write \\ in a regex to search for the backslash character, in php you need to escape those two backslashes also, adding up to a total of 4 backslashes.
So with that in mind, the first capturing group then searches for a single character that is not a backslash (and not two or four backslashes as misleading as it is)
The second capturing group will search for a double or a single quote
exactly once.
So this finds unescaped quotes (double and single) and add a backslash before the quote thus escaping it.
Second option :
Or it might just be best for you to convert them to html entities from the start :
function htmlentities_quotes($string) {
return str_replace(array('"', "'"), array(""", "'"), $string);
}
And then you just have to use the php function htmlspecialchars_decode($string); to revert it back to how it was.
Input
I love "carots" but "I" don't like 'cherries'
Output
I love "carots" but "I" don't like
'cherries'
I would like to replace extra spaces (instances of consecutive whitespace characters) with one space, as long as those extra spaces are not in double or single quotes (or any other enclosures I may want to include).
I saw some similar questions, but I could not find a direct response to my needs above. Thank you!
Hope you're still looking, or come back to check! This seems to work for me:
'/\s+((["\']).*?(?=\2)\2)|\s\s+/'
...and replace with $1
EDIT
Also, if you need to allow for escaped quotes like \" or \', you could use this expression:
'/\s+((["\'])(\\\\\2|(?!\2).)*?(?=\2)\2)|\s\s+/'
It gets a bit stickier if you want to add support for "balanced" quotes like brackets (e.g. () or {})
END EDIT
Let me know if you find problems or would like some explanation!
HOPEFULLY FINAL EDIT AND WARNINGS
Potential problem: If a quoted string starts at the beginning of the string variable (or file), it will either not count as a quoted string (and have any whitespace reduced) or it will throw off the whole thing, making anything NOT in quotes get treated as though it was in quotes and vice versa -
A potential change that might remedy this is to use the following match expression
/(?:^|\s+)((["\'])(\\\\\2|(?!\2).)*?(?=\2)\2)|\s\s+/
this replaces \s+ with (?:^|\s+) at the beginning of the expression
this will add a space at the beginning of the variable if the string starts with a quote - just trim() or remove that whitespace to continue
I seem to have used the "line by line" approach (like sed, if I'm not mistaken) to reach my original results - if you use the "whole file" or "whole string" setting or approach, carriage-return-line-feed seems to count as two whitespace characters (can't imagine why...), thus turning any newlines into single spaces (unless they are inside quotes and "dot-matches-newline" is used, of course)
this could be resolved by replacing the . and \s shorthand character classes with the specific characters you want to match, like the following:
/(?:^|[ \t]+)((["\'])(\\\\\2|(?!\2)[\s\S])*?(?=\2)\2)|[ \t]{2,}/
this does not require the dot-matches-newline switch and only replaces multiple spaces or tabs - not newlines - with a single space (and of course, only if they are not quoted)
EXAMPLE
This link shows an example of the first expression and last expression in use on sample text on http://codepad.viper-7.com
You could do it in several steps. Consider the following example:
$str = 'This is a string with "Bunch of extra spaces". Leave them "untouched !".';
$id = 0;
$buffer = array();
$str = preg_replace_callback('|".*?"|', function($m) use (&$id, &$buffer) {
$buffer[] = $m[0];
return '__' . $id++;
}, $str);
$str = preg_replace('|\s+|', ' ', $str);
$str = preg_replace_callback('|__(\d+)|', function($m) use ($buffer) {
return $buffer[$m[1]];
}, $str);
echo $str;
This will output the string:
This is a string with "Bunch of extra spaces". Leave them "untouched !".
Although this is is not the prettiest solution.
/^"((?:[^"]|\\.)*)"/
Against this string:
"quote\_with\\escaped\"characters" more
It only matches until the \", although I've clearly defined \ as an escape character (and it matches \_ and \\ fine...).
It works correctly if you flip the order of your two alternatives:
/^"((?:\\.|[^"])*)"/
The problem is that otherwise the important \ character gets eaten up before it tries matching \". It worked before for \\ and \_ only because both characters in either pair get matched by your [^"].
Using Python with raw-string literals to ensure no further interpretation of escape sequences is taking place, the following variant does work:
import re
x = re.compile(r'^"((?:[^"\\]|\\.)*)"')
s = r'"quote\_with\\escaped\"characters" more"'
mo = x.match(s)
print mo.group()
emits "quote\_with\\escaped\"characters"; I believe that in your version (which also interrupts the match precociously if substituted in here) the "not a doublequote" subexpression ([^"]) is swallowing the backslashes that you intend to be taken as escaping the immediately-following characters. All I'm doing here is ensuring that such backslashes are NOT swallowed in this way, and, as I said, it seems to work with this change.
Not intend to confuse, just another information I've played around with. Below regexp(PCRE) try to not match wrong syntax (eg. end with \") and can use with both ' or "
/('|").*\\\1.*?[^\\]\1/
to use with php
<?php if (preg_match('/(\'|").*\\\\\1.*?[^\\\\]\1/', $subject)) return true; ?>
For:
"quote\_with\\escaped\"characters" "aaa"
'just \'another\' quote "example\"'
"Wrong syntax \"
"No escapes, no match here"
This only match:
"quote\_with\\escaped\"characters" and
'just \'another\' quote "example\"'