I'm trying to write a regex to convert \[#twitter:1234\] into [#twitter:1234] i.e. unescape the square brackets for specific tags like Twitter, video, etc. I wrote up my expression and have tested it in Regex101 and PHPLiveRegex and it looks good but it still fails to get a match in my runtime. My actual implementation code is:
$content = preg_replace( "/\\\[#((?:twitter|video|instagram|cneembed):.*?)\\\]/i", "[#$1]", $content );
If anyone has any idea why the expression isn't working, your guidance would be much appreciated. I'm generally pretty good at this stuff but I feel like I've gone blind on this one. I'm pretty certain the issue is how I'm escaping my backslashes, since I can easily get the expression to match [#twitter:1234\], just not the leading backslash. Thanks!
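The problem is that the backslash is an escape character both in PHP strings and in regular expressions. To match one literal backslash the regex engine needs \\, and to match a literal bracket it needs \[, so the engine must receive \\\[. Inside a double-quoted PHP string each of those backslashes has to be doubled again so that it passes through to the regex engine: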
$content = preg_replace( "/\\\\\\[#((?:twitter|video|instagram|cneembed):.*?)\\\\\\]/i", "[#$1]", $content );
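A quick way to check the doubling is to print the pattern string and see what the regex engine receives. This is a minimal sketch; the sample input and the variable names $pattern and $result are assumptions for illustration, not part of the original post.

```php
<?php
// Hypothetical sample input containing escaped brackets.
$content = 'Intro \[#twitter:1234\] outro';

// Six backslashes in the PHP string become three for the regex engine:
// "\\" matches one literal backslash and "\[" matches a literal bracket.
$pattern = "/\\\\\\[#((?:twitter|video|instagram|cneembed):.*?)\\\\\\]/i";

// Printing the pattern shows what the regex engine actually sees.
echo $pattern, PHP_EOL;

$result = preg_replace($pattern, '[#$1]', $content);
echo $result, PHP_EOL; // Intro [#twitter:1234] outro
```

If the printed pattern does not show \\\[ and \\\], a backslash was consumed by the PHP string parser before the regex engine saw it.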
Related
Lately I've been studying regex (more in practice than in theory, to tell the truth), and I'm noticing its power. Through an earlier question of mine (link), I became aware of backreferences. I think I understand how they work, yet my pattern works in JavaScript but not in PHP.
For example I have this string:
[b]Text B[/b]
[i]Text I[/i]
[u]Text U[/u]
[s]Text S[/s]
And use the following regex:
\[(b|i|u|s)\]\s*(.*?)\s*\[\/\1\]
Testing this on regex101.com works, and so does JavaScript, but it does not work in PHP.
Example of preg_replace (not working):
echo preg_replace(
"/\[(b|i|u|s)\]\s*(.*?)\s*\[\/\1\]/i",
"<$1>$2</$1>",
"[b]Text[/b]"
);
While this way works:
echo preg_replace(
"/\[(b|i|u|s)\]\s*(.*?)\s*\[\/(b|i|u|s)\]/i",
"<$1>$2</$1>",
"[b]Text[/b]"
);
I can't figure out where I'm going wrong. Thanks to everyone who helps.
It is because you use a double-quoted string. Inside a double-quoted string, \1 is read as the octal notation of a character (the control character SOH = start of heading), not as a backslash followed by 1, so the regex engine never sees the backreference.
So there are two ways to fix it:
use a single-quoted string:
'/\[(b|i|u|s)\]\s*(.*?)\s*\[\/\1\]/i'
or escape the backslash to obtain a literal backslash (for the string, not for the pattern):
"/\[(b|i|u|s)\]\s*(.*?)\s*\[\/\\1\]/i"
As an aside, you can write your pattern like this:
$pattern = '~\[([bius])]\s*(.*?)\s*\[/\1]~i';
// with the \g{n} backreference notation (Perl/PCRE syntax)
$pattern = '~\[([bius])]\s*(.*?)\s*\[/\g{1}]~i';
// the same notation, but relative:
// (the second group on the left from the current position)
$pattern = '~\[([bius])]\s*(.*?)\s*\[/\g{-2}]~i';
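The difference between the two quoting styles can be checked directly. The following is a small sketch, with $html as an illustrative variable name, that compares how PHP parses \1 in each kind of string and then runs the single-quoted pattern.

```php
<?php
// In a double-quoted string, "\1" is an octal escape for the byte with value 1.
var_dump("\1" === chr(1));   // bool(true)

// In a single-quoted string, '\1' stays a backslash followed by "1".
var_dump('\1' === "\\1");    // bool(true)

// Single-quoted pattern: the backreference reaches the regex engine intact.
$html = preg_replace(
    '/\[(b|i|u|s)\]\s*(.*?)\s*\[\/\1\]/i',
    '<$1>$2</$1>',
    '[b]Text[/b]'
);
echo $html, PHP_EOL;         // <b>Text</b>
```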
Here is the input I am searching:
\u003cspan class=\"prs\">email_address#me.com\u003c\/span>
Trying to just return email_address#me.com.
My regex class=\\"prs\\">(.*?)\\ returns "class=\"prs\">email_address#me.com\" in RegExp which is OK, I can work with that result.
But I can't get it to work in PHP.
$regex = "/class=\\\"prs\\\">(.*?)\\/";
Gives me an error "No ending delimiter"
Can someone please help?
Your original code:
$regex = "/class=\\\"prs\\\">(.*?)\\/";
The reason you get No ending delimiter is that although you are escaping the backslash prior to the closing forward slash, what you have done is escaped it in the context of the PHP string, not in the context of the regex engine.
So the PHP string escaping mechanism does its thing, and by the time the regex engine gets it, it will look like this:
/class=\"prs\">(.*?)\/
This means that the regular expression engine will see the backslash at the end of the expression as escaping the forward slash that you are intending to use to close the expression.
The usual PHP solution to this kind of thing is to switch to using single-quoted string instead of a double-quoted one, but this still won't work, as \\ is an escaped backslash in both single and double quoted strings.
What you need to do is double up the number of backslash characters at the end of your string, so your code needs to look like this:
$regex = "/class=\\\"prs\\\">(.*?)\\\\/";
The way to prove what it's doing is to print the contents of the $regex variable, so you can see what the string will look like to the regex engine. These kinds of errors are actually very hard to spot, but looking at the actual content of the string will help you spot them.
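As a sketch of that check, the snippet below prints both versions of the pattern. It also includes a variant that assumes the input really contains literal backslashes before the quotes, as the sample in the question suggests; in that case the regex needs \\ in front of each quote too. The names $broken, $fixed, $input and $m are illustrative.

```php
<?php
// The question's original pattern and the corrected one, both double-quoted.
$broken = "/class=\\\"prs\\\">(.*?)\\/";
$fixed  = "/class=\\\"prs\\\">(.*?)\\\\/";

// Print what the regex engine will actually receive.
echo $broken, PHP_EOL;  // /class=\"prs\">(.*?)\/
echo $fixed, PHP_EOL;   // /class=\"prs\">(.*?)\\/

// Assumption: the input holds literal backslashes before the quotes,
// so the pattern also needs "\\" (at regex level) in front of each quote.
$input = '\u003cspan class=\"prs\">email_address#me.com\u003c\/span>';
$regex = '/class=\\\\"prs\\\\">(.*?)\\\\/';

if (preg_match($regex, $input, $m)) {
    echo $m[1], PHP_EOL;  // email_address#me.com
}
```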
Hope that helps.
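If you change to single quotes, \" is no longer a string escape, so \\\" reaches the regex engine as \\" (a literal backslash followed by a quote). The trailing backslash still has to be doubled, though, because \\ is an escape in single-quoted strings as well:
$regex = '/class=\\\"prs\\\">(.*?)\\\\/';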
Given a literal string such as:
Hello\n\n\n\n\n\n\n\n\n\n\n\nWorld
I would like to reduce the repeated \n's to a single \n.
I'm using PHP, and been playing around with a bunch of different regex patterns. So here's a simple example of the code:
$testRegex = '/(\\n){2,}/';
$test = 'Hello\n\n\n\n\n\n\n\n\nWorld';
$test2 = preg_replace($testRegex ,'\n',$test);
echo "<hr/>test regex<hr/>".$test2;
I'm new to PHP, not that new to regex, but it seems '\n' conforms to special rules. I'm still trying to nail those down.
Edit: I've placed the literal code I have in my php file here, if I do str_replace() I can get good things to happen, but that's not a complete solution obviously.
To match a literal \n with regex, your string literal needs four backslashes to produce a string with two backslashes, which the regex engine then interprets as an escape for one backslash.
$testRegex = '/(\\\\n){2,}/';
$test = 'Hello\n\n\n\n\n\n\n\n\n\n\n\nWorld';
$test2 = preg_replace($testRegex, '\n', $test);
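To make the distinction concrete, here is a short sketch that contrasts the literal two-character sequence (backslash plus n) with real newline characters. The variables $real and $collapsed are assumptions added for the comparison.

```php
<?php
// Single-quoted: the subject holds a literal backslash followed by "n", not newlines.
$test = 'Hello\n\n\n\n\n\n\n\n\n\n\n\nWorld';

// Four backslashes in the PHP string -> two for the regex engine -> one literal backslash.
$testRegex = '/(\\\\n){2,}/';

// The single-quoted replacement '\n' is also a literal backslash followed by "n".
$test2 = preg_replace($testRegex, '\n', $test);
echo $test2, PHP_EOL;  // Hello\nWorld

// For comparison: a double-quoted subject contains real newline characters,
// and the regex engine itself understands \n as a newline.
$real = "Hello\n\n\nWorld";
$collapsed = preg_replace('/\n{2,}/', "\n", $real);
var_dump($collapsed === "Hello\nWorld");  // bool(true)
```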
Perhaps you need to double up the escape in the regular expression?
$pattern = "/\\n+/";
$awesome_string = preg_replace($pattern, "\n", $string);
Edit: Just read your comment on the accepted answer. Doesn't apply, but is still useful.
If you're intending on expanding this logic to include other forms of white-space too:
$output = preg_replace('%(\s)+%', '$1', $input);
This reduces each run of white-space characters to a single white-space character (the last one matched in the run).
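It does conform to special rules: the regex engine itself understands \n as a newline, so even a single-quoted pattern can match real newline characters. The "multiline" modifier, m, is harmless here but not required, since it only changes where ^ and $ match. So your pattern could look like
$pattern = '/(\n)+/m';
which matches runs of real newlines. See the doc for all modifiers and their detailed meaning.
Since you're trying to reduce all newlines to one, the pattern above should work with the rest of your code as long as the input contains real newlines rather than a literal backslash followed by n. Good luck!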
Try this regular expression:
/[\n]+/
I'm trying to use preg_match to get some data from a remote page, but I'm having trouble getting the pattern right.
function getData($Url){
    $str = file_get_contents($Url);
    if(strlen($str)>0){
        preg_match("/\<span class=\"SectionHeader\"\>title\</span>/<br/>/\<div class=\"header2\"\>(.*)\</div\></span\>/",$str,$title);
        return $title[1];
    }
}
Here's the HTML as is before I ended up throwing a million slashes at it (looks like I forgot a part or two):
<span class="cell CellFullWidth"><span class="SectionHeader">mytitle</span><br/><div class="Center">Event Name</div></span>
Where Event Name is the data I want to return in my function.
Thanks a lot guys, this is a pain in the ass.
While I am inclined to agree with the commenters that this is not a pretty solution, here's my untested revision of your statement:
preg_match('#\<span class="SectionHeader"\>title\</span\>/\<br/\>/\<div class="header2"\>(.*)\</div\>\</span\>#',$str,$title);
I changed the double-quoted string to single-quoted as you aren't using any of the variable-substitution features of double-quoted strings and this avoids having to backslash-escape double-quotes as well as avoiding any ambiguity about backslashes (which perhaps should have been doubled to produce the proper strings--see the php manual on strings). I changed the slash / delimiters to hash # because of the number of slashes appearing in the match pattern (some of which were not backslash-escaped in your version).
There are quite a few things wrong with your expression:
You're using / as the delimiter, but then use / unescaped in various places.
You're escaping < and > seemingly at random. They shouldn't be escaped at all.
You have some rogue /s around the <br/> for some reason.
The class name for the div is specified as header2 in the regex but Center in the sample HTML
The title is mytitle in the HTML and title in the regex
With all of these corrected, you get:
preg_match('(<span class="SectionHeader">mytitle</span><br/><div class="Center">(.*)</div></span>)', $str, $title);
If you want to match any title instead of the specific title mytitle, just replace that with .*?.
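Put together, the corrections might look like the sketch below. It runs against the sample HTML from the question, uses # as the delimiter (as suggested in the other answer) and .*? in place of the fixed title; $pattern is an illustrative name.

```php
<?php
// Sample HTML copied from the question.
$str = '<span class="cell CellFullWidth"><span class="SectionHeader">mytitle</span>'
     . '<br/><div class="Center">Event Name</div></span>';

// "#" as the delimiter avoids having to escape the slashes in the closing tags.
// The first ".*?" accepts any title; the capture group holds the div's text.
$pattern = '#<span class="SectionHeader">.*?</span><br/><div class="Center">(.*?)</div></span>#';

if (preg_match($pattern, $str, $title)) {
    echo $title[1], PHP_EOL;  // Event Name
}
```

Because the pattern is single-quoted and contains no backslashes at all, the string PHP passes to the regex engine is exactly what is written.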