What does the following PHP regular expression evaluate to?

I came across the PHP regular expression below, and I am not sure why \q\ is used in it. Can anybody help me understand this?
$strBuildTitle="SOME URL";
$patterns[0] = "/[^a-zA-Z0-9\q\ ]/";
$replacements[0] = " ";
$strBuildTitle = preg_replace($patterns, $replacements, $strBuildTitle);
I believe it tries to remove any non-alphanumeric character from the given URL, but I'm not sure why \q\ is used here. Is it related to the removal of quotes?

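\q and "\ " (a backslash followed by a space) aren't valid escape sequences in PHP strings.
In double-quoted strings, PHP leaves unrecognized escape sequences untouched, backslash included, so the regex engine receives the pattern exactly as written. PCRE then treats "\ " as a literal space, and older PCRE versions treat \q as a plain q, which [a-z] already covers. Newer PCRE2-based builds may instead reject the unknown escape \q. Either way it has nothing to do with quotes: the pattern replaces every character that is not a letter, a digit, or a space with a space.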
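As a rough illustration of what the pattern effectively does, here is a small Python sketch. It uses Python's re module as a stand-in for PCRE, which is an assumption since the two engines differ in places. The names sanitize_title and compiles, and the sample URL, are made up for this example. Python's engine happens to be one of the strict ones: it accepts the escaped space but refuses \q.

```python
import re

# Stand-in for the PHP pattern with the redundant \q dropped:
# anything that is not a letter, a digit or a space becomes a space.
CLEANED = re.compile(r"[^a-zA-Z0-9 ]")


def sanitize_title(title):
    """Replace every character outside [a-zA-Z0-9 ] with a space."""
    return CLEANED.sub(" ", title)


def compiles(pattern):
    """Return True if Python's re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    print(sanitize_title("http://example.com/a_b"))
    # The escaped space compiles; the unknown escape \q does not (in Python).
    print(compiles(r"[^a-zA-Z0-9\ ]"), compiles(r"[^a-zA-Z0-9\q\ ]"))
```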

Related

Regexes work in PHP and don't in Erlang. Why?

I tried to rewrite a URL parsing function written in PHP in Erlang, and I found that these regexes don't work in Erlang but work fine in the PHP code. Can you tell me why, and how to make them work in Erlang?
Loose = "^(?:(?![^:#]+:[^:#\/]*#)([^:\/?#.]+):)?(?:\/\/\/?)?((?:(([^:#]*):?([^:#]*))?#)?([^:\/?#]*)(?::(\d*))?)(((?:\/(\w:))?(\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)".
re:compile( Loose ).
{error,{"nothing to repeat",166}}
Strict = "^(?:([^:\/?#]+):)?(?:\/\/\/?((?:(([^:#]*):?([^:#]*))?#)?([^:\/?#]*)(?::(\d*))?))?(((?:\/(\w:))?((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)".
re:compile( Strict ).
{error,{"nothing to repeat",114}}
But this code works fine:
$url = "http://gazeta.ru/";
$loose = '/^(?:(?![^:#]+:[^:#\/]*#)([^:\/?#.]+):)?(?:\/\/\/?)?((?:(([^:#]*):?([^:#]*))?#)?([^:\/?#]*)(?::(\d*))?)(((?:\/(\w:))?(\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/';
preg_match($loose, $url, $match);
var_dump( $match );
The character "\" is special in Erlang strings. Some characters must be preceded by a backslash to appear in a string, including the double quote and the backslash itself. This technique is called escaping, and the backslash is called the escape character. So a "\" must always be followed by another character. For example, to include one backslash in a string you should write "\\":
CorrectString = "C:\\windows" %% Correct
WrongString = "C:\windows" %% Wrong
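Hence you have to change every single backslash in your regexp to a double backslash. Otherwise Erlang consumes the backslashes before re:compile ever sees them: "\?" becomes a bare "?", so (?:\?([^#]*))? turns into (?:?([^#]*))?, which is exactly where "nothing to repeat" is reported. Here is an example in the Erlang shell: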
3> Loose = "^(?:(?![^:#]+:[^:#\\/]*#)([^:\\/?#.]+):)?(?:\\/\\/\\/?)?((?:(([^:#]*):?([^:#]*))?#)?([^:\\/?#]*)(?::(\\d*))?)(((?:\\/(\\w:))?(\\/(?:[^?#](?![^?#\\/]*\\.[^?#\\/.]+(?:[?#]|$)))*\\/?)?([^?#\\/]*))(?:\\?([^#]*))?(?:#(.*))?)".
4> re:compile(Loose).
{ok,{re_pattern,14,0,
<<69,82,67,80,147,2,0,0,16,0,0,0,1,0,0,0,14,0,0,0,0,0,0,
...>>}}
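The "nothing to repeat" failure can be reproduced outside Erlang. The Python sketch below is only an illustration: it takes the query-string fragment of the pattern, strips the backslash the way the Erlang string literal does, and compiles both versions with Python's re module (an assumed stand-in for the PCRE engine behind re:compile). The helper try_compile is a made-up name.

```python
import re


def try_compile(pattern):
    """Return None on success, or the error message if compilation fails."""
    try:
        re.compile(pattern)
        return None
    except re.error as exc:
        return str(exc)


# What the author intended: an optional query string introduced by a literal "?".
intended = r"(?:\?([^#]*))?"
# What the engine receives once the string literal has swallowed the backslash.
collapsed = intended.replace("\\?", "?")

if __name__ == "__main__":
    print(repr(intended), "->", try_compile(intended))
    print(repr(collapsed), "->", try_compile(collapsed))
```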

Can't get Regex working in PHP, works in RegEXP program

Here is the input I am searching:
\u003cspan class=\"prs\">email_address#me.com\u003c\/span>
Trying to just return email_address#me.com.
My regex class=\\"prs\\">(.*?)\\ returns "class=\"prs\">email_address#me.com\" in RegExp which is OK, I can work with that result.
But I can't get it to work in PHP.
$regex = "/class=\\\"prs\\\">(.*?)\\/";
Gives me an error "No ending delimiter"
Can someone please help?
Your original code:
$regex = "/class=\\\"prs\\\">(.*?)\\/";
The reason you get No ending delimiter is that although you are escaping the backslash prior to the closing forward slash, what you have done is escaped it in the context of the PHP string, not in the context of the regex engine.
So the PHP string escaping mechanism does its thing, and by the time the regex engine gets it, it will look like this:
/class=\"prs\">(.*?)\/
This means that the regular expression engine will see the backslash at the end of the expression as escaping the forward slash that you are intending to use to close the expression.
The usual PHP solution to this kind of thing is to switch to using single-quoted string instead of a double-quoted one, but this still won't work, as \\ is an escaped backslash in both single and double quoted strings.
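What you need to do is double up the number of backslash characters at the end of your string. Note also that \\\" in a double-quoted string reaches the regex engine as \", which matches a bare quote only; to match the literal backslash in front of each quote in your input you need \\\\\" there as well. So your code needs to look like this:
$regex = "/class=\\\\\"prs\\\\\">(.*?)\\\\/";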
The way to prove what it's doing is to print the contents of the $regex variable, so you can see what the string will look like to the regex engine. These kinds of errors are actually very hard to spot, but looking at the actual content of the string will help you spot them.
Hope that helps.
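If you change to single quotes, the escaped quotes come out right, but you still need four backslashes at the end so that the regex engine sees an escaped backslash before the closing delimiter:
$regex = '/class=\\\"prs\\\">(.*?)\\\\/';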
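To check the pattern itself, independently of PHP's string escaping, it can help to try it in a language with raw string literals. The Python sketch below is an illustration under that assumption; extract is a hypothetical helper, and the pattern is the one from the question exactly as the regex engine is meant to see it.

```python
import re

# The pattern as the regex engine should see it:
#   class=\\"prs\\">(.*?)\\
# A raw string keeps Python from adding an escaping layer of its own.
PATTERN = re.compile(r'class=\\"prs\\">(.*?)\\')

# The input from the question, backslashes included.
TEXT = r'\u003cspan class=\"prs\">email_address#me.com\u003c\/span>'


def extract(text):
    """Return the captured address, or None if the pattern does not match."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract(TEXT))
```

Whatever PHP string literal you end up with, printing it should show the same characters as the comment at the top of this sketch, wrapped in the / delimiters.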

Regex pattern matching literal repeated \n

Given a literal string such as:
Hello\n\n\n\n\n\n\n\n\n\n\n\nWorld
I would like to reduce the repeated \n's to a single \n.
I'm using PHP, and been playing around with a bunch of different regex patterns. So here's a simple example of the code:
$testRegex = '/(\\n){2,}/';
$test = 'Hello\n\n\n\n\n\n\n\n\nWorld';
$test2 = preg_replace($testRegex ,'\n',$test);
echo "<hr/>test regex<hr/>".$test2;
I'm new to PHP, not that new to regex, but it seems '\n' conforms to special rules. I'm still trying to nail those down.
Edit: I've placed the literal code I have in my php file here, if I do str_replace() I can get good things to happen, but that's not a complete solution obviously.
To match a literal \n with regex, your string literal needs four backslashes to produce a string with two backslashes, which the regex engine interprets as an escape for one backslash.
$testRegex = '/(\\\\n){2,}/';
$test = 'Hello\n\n\n\n\n\n\n\n\n\n\n\nWorld';
$test2 = preg_replace($testRegex, '\n', $test);
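The same idea, sketched in Python with raw strings so that only the regex layer of escaping is left. This is an illustration rather than a translation of the PHP code; collapse is a made-up name, and a function is used as the replacement to avoid any escape processing in the replacement string.

```python
import re

# Two or more consecutive literal "\n" sequences (backslash followed by n),
# not real newline characters.
REPEATED = re.compile(r"(?:\\n){2,}")


def collapse(text):
    """Reduce each run of repeated literal \\n sequences to a single one."""
    # The lambda returns the two characters backslash and n.
    return REPEATED.sub(lambda m: "\\n", text)


if __name__ == "__main__":
    print(collapse(r"Hello\n\n\n\n\n\nWorld"))
```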
Perhaps you need to double up the escape in the regular expression?
$pattern = "/\\n+/";
$awesome_string = preg_replace($pattern, "\n", $string);
Edit: Just read your comment on the accepted answer. Doesn't apply, but is still useful.
If you're intending on expanding this logic to include other forms of white-space too:
$output = preg_replace('%(\s)+%', '$1', $input);
Reduces all repeated white-space characters to single instances of the matched white-space character.
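It does indeed conform to special rules, although the "multiline" modifier, m, is not one of them: m only changes where ^ and $ match, so it has no effect on a pattern without anchors. With the modifier your pattern would look like
$pattern = '/(\n)+/m';
which matches one or more real newline characters, because PCRE itself interprets \n inside a single-quoted PHP string. See the doc for all modifiers and their detailed meaning.
If your input contains real newlines rather than the two characters \ and n, the pattern above should work with the rest of your code. Good luck!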
Try this regular expression:
/[\n]*/

regex with special characters?

I am looking for a regex that can contain special characters like / \ . ' "
in short i would like a regex that can match the following:
may contain lowercase
may contain uppercase
may contain a number
may contain space
may contain / \ . ' "
i am making a php script to check if a certain string have the above or not, like a validation check.
The regular expression you are looking for is
^[a-z A-Z0-9\/\\.'"]+$
Remember if you are using PHP you need to use \ to escape the backslashes and the quotation mark you use to encapsulate the string.
In PHP using preg_match it should look like this:
preg_match("/^[a-z A-Z0-9\\/\\\\.'\"]+$/",$value);
This is a good place to find the regular expressions you might want to use.
http://regexpal.com/
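For comparison, here is the same whitelist check as a short Python sketch. It is only an illustration of the character class, not of PHP's quoting rules; is_valid is a hypothetical helper, and the triple-quoted raw string is used so that neither quote character needs escaping.

```python
import re

# One or more of: letters, digits, space, / \ . ' "
ALLOWED = re.compile(r"""[a-z A-Z0-9/\\.'"]+""")


def is_valid(value):
    """Return True if the whole string consists of allowed characters."""
    return ALLOWED.fullmatch(value) is not None


if __name__ == "__main__":
    print(is_valid(r"""C/dir\file.txt 'a' "b" 9"""))
    print(is_valid("bad~char"))
```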
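You can always escape them by putting a \ in front of the special characters. Inside a double-quoted PHP string the backslash itself has to be written as \\\\ to reach the regex engine as \\.
Try this:
preg_match("/[A-Za-z0-9 \\/\\\\.'\"]/", ...)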
NikoRoberts is 100% correct.
I would only add the following suggestion: When creating a PHP regex pattern string, always use: single-quotes. There are far fewer chars which need to be escaped (i.e. only the single quote and the backslash itself needs to be escaped (and the backslash only needs to be escaped if it appears at the end of the string)).
When dealing with backslash soup, it helps to print out the (interpreted) regex string. This shows you exactly what is being presented to the regex engine.
Also, a "number" might have an optional sign? Yes? Here is my solution (in the form of a tested script):
<?php // test.php 20110311_1400
$data_good = 'abcdefghijklmnopqrstuvwxyzABCDE'.
'FGHIJKLMNOPQRSTUVWXYZ0123456789+- /\\.\'"';
$data_bad = 'abcABC012~!###$%^&*()';
$re = '%^[a-zA-Z0-9+\- /\\\\.\'"]*$%';
echo($re ."\n");
if (preg_match($re, $data_good)) {
echo("CORRECT: Good data matches.\n");
} else {
echo("ERROR! Good data does NOT match.\n");
}
if (preg_match($re, $data_bad)) {
echo("ERROR! Bad data matches.\n");
} else {
echo("CORRECT: Bad data does NOT match.\n");
}
?>
The following regex will match a single character that fits the description you gave:
[a-zA-Z0-9\ \\\/\.\'\"]
If your point is to ensure that ONLY characters in this range of characters are used in your string, then you can use the negation of this which would be:
[^a-zA-Z0-9\ \\\/\.\'\"]
In the second case, you could use your regex to find the bad stuff (that you don't want to be included), and if it didn't find anything then your string pattern must be kosher, because I'm assuming that if you find one character that is not in the proper range, then your string is not valid.
so to put it in PHP syntax:
$regex = "/[^a-zA-Z0-9\ \\\/\.\'\"]/";
if (preg_match($regex, ...)) {
// handle the bad stuff
}
Edit 1:
I completely ignored the fact that backslashes are special in PHP double-quoted strings, so here is a correction to the above code:
$regex = "/[^a-zA-Z0-9\\ \\\\\\/\\.\\'\\\"]/";
If that doesn't work it shouldn't take too much for someone to debug how many of the backslashes need to be escaped with a backslash, and what other characters need also to be escaped....
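The negated-class approach can be sketched in Python as follows. This is an illustration only: find_bad_chars is a made-up helper, and it returns the offending characters instead of just a yes/no answer so that the result is easy to inspect.

```python
import re

# Any single character that is NOT a letter, digit, space, / \ . ' or "
DISALLOWED = re.compile(r"""[^a-zA-Z0-9 /\\.'"]""")


def find_bad_chars(value):
    """Return every character in value that falls outside the allowed set."""
    return DISALLOWED.findall(value)


if __name__ == "__main__":
    print(find_bad_chars("abcABC012~!#"))
    print(find_bad_chars(r"a/b\c"))
```

An empty list means the string only contains allowed characters.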

How to match a quoted string with escaped quotes in it?

/^"((?:[^"]|\\.)*)"/
Against this string:
"quote\_with\\escaped\"characters" more
It only matches until the \", although I've clearly defined \ as an escape character (and it matches \_ and \\ fine...).
It works correctly if you flip the order of your two alternatives:
/^"((?:\\.|[^"])*)"/
The problem is that otherwise the important \ character gets eaten up before it tries matching \". It worked before for \\ and \_ only because both characters in either pair get matched by your [^"].
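The effect of the ordering is easy to see side by side. A small Python sketch, assuming Python's backtracking engine tries alternatives left to right in the same way here; the constant names are made up for the example.

```python
import re

SAMPLE = r'"quote\_with\\escaped\"characters" more'

# [^"] is tried first, so it swallows each backslash on its own.
PLAIN_FIRST = re.compile(r'^"((?:[^"]|\\.)*)"')
# \\. is tried first, so a backslash always takes the next character with it.
ESCAPE_FIRST = re.compile(r'^"((?:\\.|[^"])*)"')

if __name__ == "__main__":
    print(PLAIN_FIRST.match(SAMPLE).group(0))
    print(ESCAPE_FIRST.match(SAMPLE).group(0))
```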
Using Python with raw-string literals to ensure no further interpretation of escape sequences is taking place, the following variant does work:
import re
x = re.compile(r'^"((?:[^"\\]|\\.)*)"')
s = r'"quote\_with\\escaped\"characters" more"'
mo = x.match(s)
print(mo.group())
emits "quote\_with\\escaped\"characters"; I believe that in your version (which also cuts the match short if substituted in here) the "not a doublequote" subexpression ([^"]) is swallowing the backslashes that you intend to be taken as escaping the immediately-following characters. All I'm doing here is ensuring that such backslashes are NOT swallowed in this way, and, as I said, it seems to work with this change.
Not intending to confuse, just some more information I've played around with. The regexp below (PCRE) tries not to match wrong syntax (e.g. a string ending with \") and can be used with both ' and "
/('|").*\\\1.*?[^\\]\1/
To use it with PHP:
<?php if (preg_match('/(\'|").*\\\\\1.*?[^\\\\]\1/', $subject)) return true; ?>
For:
"quote\_with\\escaped\"characters" "aaa"
'just \'another\' quote "example\"'
"Wrong syntax \"
"No escapes, no match here"
This only matches:
"quote\_with\\escaped\"characters" and
'just \'another\' quote "example\"'
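The four sample lines can be checked with a short Python sketch. It assumes Python's re handles this particular pattern the same way PCRE does (backreference \1 and the escaped backslashes); the names QUOTED and LINES are made up for the example.

```python
import re

# A quote, anything, a backslash followed by the same quote, then a lazy run
# that ends in a non-backslash character followed by the same quote.
QUOTED = re.compile(r"""('|").*\\\1.*?[^\\]\1""")

LINES = [
    r'''"quote\_with\\escaped\"characters" "aaa"''',
    r"""'just \'another\' quote "example\"'""",
    r'''"Wrong syntax \"''',
    r'''"No escapes, no match here"''',
]

if __name__ == "__main__":
    for line in LINES:
        print(bool(QUOTED.search(line)), line)
```

Note that the pattern requires at least one escaped quote, which is why the last line is not matched.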
