preg_match for . / or \ in PHP - php

I am trying to match . \ or / using preg_match in PHP.
I thought this would do it but it's matching all strings.
$string = '';
$chars = '/(\.|\\|\/)/';
if (preg_match($chars, $string) != 0) {
echo 'Chars found.';
}

The argument given to preg_match() is a PHP string, and PHP processes backslash escapes in string literals before the regexp engine ever sees the pattern. For example, if you write {\\\\} in your source, PHP first parses it into {\\} (each \\ is replaced by \).
Next, the regexp engine parses what PHP handed over. It sees {\\}, treats the first \ as the escape character, and so matches a literal \ character.
In your case, the source looks like /(\.|\\|\/)/. PHP hands the regexp engine /(\.|\|\/)/, which matches either . or |/ (notice that the | character ended up escaped).
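One way to see the two parsing stages described above is to print each pattern exactly as the regexp engine receives it. This is only a small sketch; the variable names $broken and $fixed are made up for the illustration.

```php
<?php
// $broken and $fixed are hypothetical names used only for this demonstration.
$broken = '/(\.|\\|\/)/';     // two backslashes in the source
$fixed  = '/(\.|\\\\|\/)/';   // four backslashes in the source

// What the regexp engine actually receives after PHP string parsing:
echo $broken, "\n"; // /(\.|\|\/)/
echo $fixed, "\n";  // /(\.|\\|\/)/

// The broken pattern matches "." or "|/", but not a lone backslash.
var_dump(preg_match($broken, '\\')); // int(0)
var_dump(preg_match($broken, '|/')); // int(1)

// The fixed pattern matches a lone backslash.
var_dump(preg_match($fixed, '\\'));  // int(1)
```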
Personally, I try to avoid escaping meta-characters with backslashes, especially given how the two layers of parsing interact. I usually write [.] instead, which is more readable. Your regexp written this way would look like #([.]|\\\\|[/])# (with / as the delimiter, the / inside [/] would still have to be escaped, because PHP ends the pattern at the first unescaped delimiter, so a different delimiter is used here).
A few optimizations are possible. It's a personal preference, but I like to use {} as delimiters (yes, you can use bracket pairs). Also, your regexp matches single characters, so you could simply write it as {[.\\\\/]}, which is very readable in my opinion (notice the four backslashes; they are needed because both PHP and the regexp engine parse backslashes).
Also, preg_match() returns 1 if the pattern matches, 0 if it does not, and false on error, so you can treat the result as a boolean and avoid writing != 0. To test for "no match", put ! in front of the call. Be careful not to reverse the condition: comparing with == 0 makes the branch run when nothing matches. Valid code below:
$string = '';
$chars = '{[.\\\\/]}';
if (preg_match($chars, $string)) {
echo 'Chars found.';
}
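To check the character-class version against a few inputs and to see the possible return values of preg_match(), something like the following could be used. The sample strings are made up for this sketch.

```php
<?php
// Character class matching a dot, a backslash, or a forward slash.
$chars = '{[.\\\\/]}';

// Hypothetical sample inputs, chosen only for this demonstration.
$samples = ['', 'abc', 'a.b', 'a/b', 'a\\b'];

foreach ($samples as $sample) {
    // preg_match() yields 1 on a match and 0 otherwise.
    printf("%-8s => %d\n", var_export($sample, true), preg_match($chars, $sample));
}

// An invalid pattern yields false (the warning is suppressed here with @).
var_dump(@preg_match('/[/', 'abc')); // bool(false)
```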

Your if logic is flawed. preg_match returns 1 when the pattern matches and 0 when it does not (false on error). Therefore, == 0 means "no match".
That said, single quoted strings don't expand escape sequences except \' and \\. You need to double your backslash escape for it to appear in the regex as expected. Change your code to:
$string = '';
$chars = '/(\.|\\\\|\/)/';
if (preg_match($chars, $string) != 0) {
echo 'Chars found.';
}
Here's a test case:
$strings = array('', '.', '/', '\\', 'abc');
$pattern = '/(\.|\\\\|\/)/';
foreach ($strings as $string) {
    if (preg_match($pattern, $string) > 0) {
        printf("String \"%s\" matched!\n", $string);
    }
}

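The issue is probably with how PHP handles backslashes in string literals before the regex engine sees them. A backslash meant for the regex engine has to survive PHP's own string parsing first.
As that probably didn't make sense, have an example.
$string = "\." works by accident: PHP does not recognize \. as an escape sequence, so it leaves the backslash in place and the regex engine receives \. (a literal dot). A backslash followed by another backslash is different: PHP collapses \\ into a single \, so to match a literal backslash you need to write "\\\\", which reaches the regex engine as \\.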
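A quick check of how double-quoted strings treat these escapes, as a sketch with made-up sample inputs:

```php
<?php
// Double-quoted strings: an unrecognized escape such as \. keeps its backslash.
var_dump("\." === '\.');                 // bool(true)

// "\\\\" is two characters after PHP parsing, which the regex engine
// reads as one escaped, literal backslash.
var_dump(strlen("\\\\"));                // int(2)
var_dump(preg_match("/\\\\/", 'a\\b'));  // int(1)

// "\\\." reaches the engine as \\. (a backslash followed by any character),
// so it does not match a plain dot.
var_dump(preg_match("/\\\./", 'a.b'));   // int(0)
```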

When trying to match slashes with a regex, I would strongly suggest using a delimiter other than '/'. It reduces the amount of escaping you need to do and makes the pattern much more readable (the literal backslash still needs four backslashes in the PHP source):
$chars = '%(\.|\\\\|/)%';

Try this:
$chars = '%(\.|\\\\|/)%';

Related

php preg_replace wont accept outside reference

I have the following function which, as you can see, strips from a string any characters not in the permitted set, yet it only works when I enter the pattern as a literal string, as in the first commented-out line. I put an echo in there to check the value coming back and it is as it should be, so I don't know what's going on. Has anyone any clues?
private function check_string( $s )
{
//return preg_replace( '/[^a-z 0-9~%\.:_\\-()"]/i', '', $s );
// a-z 0-9~%\.:_\\-()"
echo $this->permitted_uri_chars;
// /[^a-z 0-9~%\.:_\\-()"]/i
$pattern = '/[^'. $this->permitted_uri_chars .']/i';
return preg_replace( $pattern, '', $s );
}
The error I get is
Message: preg_replace(): Compilation failed: range out of order in character class at offset 18
ANSWER
Thanks to Jason McCreary
$pattern = '/[^'. preg_quote($this->config->item('permitted_uri_chars'), '/') .']+/i';
It is working in the first example because you properly escaped characters for both PHP and the Regular Expression. (i.e. \\).
When using a string, you have only escaped for PHP. So when you use this string in your Regular Expression it is no longer escaped.
This is demonstrated by the following example:
echo '/[^a-z 0-9~%\.:_\\-()"]/i';
// becomes: /[^a-z 0-9~%\.:_\-()"]/i
A few options would be:
Double escape.
Avoid the Regular Expression escaping by placing the dash at the end: /[^a-z 0-9~%.:_()"-]/
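Use preg_quote() if you're going to accept strings that may contain regular expression syntax. Be aware that preg_quote() also escapes the dash (since PHP 5.3), so ranges such as a-z in the quoted string turn into the three literal characters a, - and z.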
Note: I'd encourage you to read about escaping inside character classes.
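As a rough sketch of the "dash at the end" option: the $permitted value below is a made-up stand-in for the configured character list, not the asker's actual setting. The last line shows what preg_quote() does to a range.

```php
<?php
// Hypothetical whitelist, standing in for the configured value.
// The dash sits last, so it is a literal and needs no escaping.
$permitted = 'a-z 0-9~%.:_()"-';

$pattern = '/[^' . $permitted . ']/i';
echo preg_replace($pattern, '', 'Hello <World>! a-b_c'), "\n"; // Hello World a-b_c

// preg_quote() escapes the dash, so the ranges stop being ranges.
echo preg_quote('a-z 0-9', '/'), "\n"; // a\-z 0\-9
```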

Regex preg_match change only special character between quote

$value = "abrak'adabra' baba";
$pattern = array();
$replacement = array();
$pattern[] = '/(\'[^\']+\')|(a)/e';
$replacement = "strlen('\\2') ? 'i' : '\\0'";
The code above changes abrak'adabra' baba into ibrik'adabra' bibi.
What I want to do is change abrak'adabra' baba into abrak'idibri' baba. How do I do that?
Honestly I don't even really understand the regex pattern above.
Here is what I do and don't understand about the code:
The $pattern says: (any text enclosed in two quotes with no quote in between) or (the character "a"). In the replacement, PHP code such as strlen will work because the /e modifier is used. But I can't understand why there is "or" logic there.
If the length of the second part of the pattern (the a character) is more than zero, then replace it with "i", else do something else (I don't understand what \0 means).
I'd appreciate any help. This regex stuff has been frustrating me :(
Using the e modifier (eval) in patterns is dangerous, as someone could potentially execute malicious code on your server (see the manual's section on that for more). It was deprecated in PHP 5.5 and removed in PHP 7.0.
Instead, if you need to do extra processing on matched items, you can use preg_replace_callback:
// Find all characters between single quotes
$result = preg_replace_callback('/\'(.*?)\'/', function($matches){
// Replace 'a' with 'i' in found matches
return '\''.str_replace('a', 'i', $matches[1]).'\'';
}, $value);
If all you're doing is replacing a with i between the quotes, there may be more optimal ways to go about it, but this way you have room for more advanced processing on the strings found between quotes.
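For reference, here is a self-contained variant of the callback approach run against the string from the question. It is a sketch that swaps in [^']* and strtr() in place of .*? and str_replace(); either combination should behave the same on this input.

```php
<?php
$value = "abrak'adabra' baba";

// Match each single-quoted chunk; group 1 holds the text between the quotes.
$result = preg_replace_callback("/'([^']*)'/", function (array $m): string {
    // Replace every "a" with "i" inside the quoted chunk only.
    return "'" . strtr($m[1], 'a', 'i') . "'";
}, $value);

echo $result, "\n"; // abrak'idibri' baba
```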

preg_match a key and date from string

I am working on a project that involves a type of caching to be performed. Multiple caches can be done for situations based on different cache names. In the files I am storing a cache like so:
{cache:2011-12-11 02:01:47}
And when I search for it, I am trying to preg_match it like this:
$match = "{cache:/\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}/}";
$str = 'FIND ME! {cache:2011-12-11 02:01:47}';
if (preg_match($match, $str, $matches)) {
print "it's a match";
print_r($match);
}
The problem is, it never finds it. But this will work if I do:
$match = "/\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}/";
What am I doing wrong with my preg_match statement? And is there some type of string search I could use that is faster than preg_match?
Your in-code regex will not work, because you copy&pasted the delimiters where they don't belong:
$match = "{cache:/\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}/}";
                 ^                                     ^
               Eeek!                                 Eeek!
This way your { and } became the regex delimiters, and the inner slashes were interpreted as literal characters to search for.
It rather should have been:
$match = "/\{cache:\d{4}\-\d{2}\-\d{2} \d{2}:\d{2}:\d{2}}/";
Note also the escaped leading \{ curly.
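A runnable sketch of the corrected pattern follows. The capturing group around the timestamp and the escaped closing brace are additions for this illustration, not part of the answer above.

```php
<?php
// Slashes are the delimiters; the braces are escaped so they match literally.
// The parentheses (an addition for this sketch) capture the timestamp.
$match = '/\{cache:(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\}/';
$str = 'FIND ME! {cache:2011-12-11 02:01:47}';

if (preg_match($match, $str, $matches)) {
    echo "it's a match\n";
    echo $matches[1], "\n"; // 2011-12-11 02:01:47
}
```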

regex with special characters?

I am looking for a regex that can contain special characters like / \ . ' "
In short, I would like a regex that can match the following:
may contain lowercase
may contain uppercase
may contain a number
may contain space
may contain / \ . ' "
I am making a PHP script to check whether a certain string consists of the above or not, like a validation check.
The regular expression you are looking for is
^[a-z A-Z0-9\/\\.'"]+$
Remember if you are using PHP you need to use \ to escape the backslashes and the quotation mark you use to encapsulate the string.
In PHP using preg_match it should look like this:
preg_match("/^[a-z A-Z0-9\\/\\\\.'\"]+$/",$value);
This is a good place to find the regular expressions you might want to use.
http://regexpal.com/
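You can always escape them by putting a \ in front of the special characters. Inside a double-quoted PHP string, a literal backslash in the pattern has to be written as four backslashes.
Try this:
preg_match("/^[A-Za-z0-9 \/\\\\.'\"]+$/", ...)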
NikoRoberts is 100% correct.
I would only add the following suggestion: When creating a PHP regex pattern string, always use single quotes. Far fewer characters need to be escaped: only the single quote and the backslash itself (and the backslash strictly needs escaping only when it precedes a quote or another backslash, or appears at the end of the string).
When dealing with backslash soup, it helps to print out the (interpreted) regex string. This shows you exactly what is being presented to the regex engine.
Also, a "number" might have an optional sign? Yes? Here is my solution (in the form of a tested script):
<?php // test.php 20110311_1400
$data_good = 'abcdefghijklmnopqrstuvwxyzABCDE'.
'FGHIJKLMNOPQRSTUVWXYZ0123456789+- /\\.\'"';
$data_bad = 'abcABC012~!###$%^&*()';
$re = '%^[a-zA-Z0-9+\- /\\\\.\'"]*$%';
echo($re ."\n");
if (preg_match($re, $data_good)) {
echo("CORRECT: Good data matches.\n");
} else {
echo("ERROR! Good data does NOT match.\n");
}
if (preg_match($re, $data_bad)) {
echo("ERROR! Bad data matches.\n");
} else {
echo("CORRECT: Bad data does NOT match.\n");
}
?>
The following regex will match a single character that fits the description you gave:
[a-zA-Z0-9\ \\\/\.\'\"]
If your point is to ensure that ONLY characters in this range of characters are used in your string, then you can use the negation of this, which would be:
[^a-zA-Z0-9\ \\\/\.\'\"]
In the second case, you could use your regex to find the bad stuff (that you don't want to be included), and if it didn't find anything then your string pattern must be kosher, because I'm assuming that if you find one character that is not in the proper range, then your string is not valid.
So, to put it in PHP syntax (with ~ as the delimiter):
$regex = "~[^a-zA-Z0-9\ \\\/\.\'\"]~";
if (preg_match($regex, ...)) {
    // handle the bad stuff
}
Edit 1:
I've completely ignored the fact that backslashes are special in PHP double-quoted strings, so here is a correction to the above code:
$regex = "~[^a-zA-Z0-9\\ \\\\\\/\\.\\'\\\"]~";
If that doesn't work it shouldn't take too much for someone to debug how many of the backslashes need to be escaped with a backslash, and what other characters need also to be escaped....
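The negated-class check can be sketched as a small helper. The function name has_only_allowed_chars is hypothetical, and the pattern is written in a single-quoted string to keep the backslash count down.

```php
<?php
// Hypothetical helper: returns true when $value contains no character
// outside letters, digits, space, / \ . ' and ".
function has_only_allowed_chars(string $value): bool
{
    // The negated class finds the first disallowed character, if any.
    // In this single-quoted string, \\\\ becomes \\ (a literal backslash
    // for the regex engine) and \' becomes a plain single quote.
    return preg_match('~[^a-zA-Z0-9 /\\\\.\'"]~', $value) === 0;
}

var_dump(has_only_allowed_chars('abc DEF 123 /\\.\'"')); // bool(true)
var_dump(has_only_allowed_chars('bad~char'));            // bool(false)
```

Note that an empty string passes this check, since it contains no disallowed character; add a length test if empty input should be rejected.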

Replacing HTML attributes using a regex in PHP

OK,I know that I should use a DOM parser, but this is to stub out some code that's a proof of concept for a later feature, so I want to quickly get some functionality on a limited set of test code.
I'm trying to strip the width and height attributes from chunks of HTML, in other words, replace
width="number" height="number"
with a blank string.
The function I'm trying to write looks like this at the moment:
function remove_img_dimensions($string,$iphone) {
$pattern = "width=\"[0-9]*\"";
$string = preg_replace($pattern, "", $string);
$pattern = "height=\"[0-9]*\"";
$string = preg_replace($pattern, "", $string);
return $string;
}
But that doesn't work.
How do I make that work?
PHP is unique among the major languages in that, although regexes are specified in the form of string literals like in Python, Java and C#, you also have to use regex delimiters like in Perl, JavaScript and Ruby.
Be aware, too, that you can use single-quotes instead of double-quotes to reduce the need to escape characters like double-quotes and backslashes. It's a good habit to get into, because the escaping rules for double-quoted strings can be surprising.
Finally, you can combine your two replacements into one by means of a simple alternation:
$pattern = '/(width|height)="[0-9]*"/i';
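Put together, the combined pattern might be used as follows. This is a sketch: the function name strip_img_dimensions is hypothetical, and the leading \s* (added here to also swallow the space before each attribute) is not part of the pattern above.

```php
<?php
// Hypothetical helper for quick stubbing only; a DOM parser is the
// robust choice for real HTML.
function strip_img_dimensions(string $html): string
{
    // \s* also removes the whitespace in front of each attribute.
    // preg_replace() returns null on error, so fall back to the input.
    return preg_replace('/\s*(width|height)="[0-9]*"/i', '', $html) ?? $html;
}

echo strip_img_dimensions('<img src="a.png" width="100" height="50">'), "\n";
// <img src="a.png">
```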
Your pattern needs the start/end pattern character. Like this:
$pattern = "/height=\"[0-9]*\"/";
$string = preg_replace($pattern, "", $string);
"/" is the usual character, but most characters would work ("|pattern|","#pattern#",whatever).
I think you're missing the delimiters (which can be //, || or various other pairs of characters) that need to surround a regular expression in the string. Try changing your $pattern assignments to this form:
$pattern = "/width=\"[0-9]*\"/";
...if you want to be able to do a case-insensitive comparison, add an 'i' at the end of the string, thus:
$pattern = "/width=\"[0-9]*\"/i";
Hope this helps!
David
