I'm having some problems with the BBCode I created to use with the SyntaxHighlighter
function bb_parse_code($str) {
while (preg_match_all('`\[(code)=?(.*?)\]([\s\S]*)\[/code\]`', $str, $matches)) foreach ($matches[0] as $key => $match) {
list($tag, $param, $innertext) = array($matches[1][$key], $matches[2][$key], $matches[3][$key]);
switch ($tag) {
case 'code': $replacement = '<pre class="brush: '.$param.'">'.str_replace(" ", " ", str_replace(array("<br>", "<br />"), "\n", $innertext))."</pre>"; break;
}
$str = str_replace($match, $replacement, $str);
}
return $str;
}
And I have the bbcode:
[b]bold[/b]
[u]underlined[/u]
[code=js]function (lol) {
alert(lol);
}[/code]
[b]bold2[/b]
[code=php]
<? echo 'lol' ?>
[/code]
Which returns this:
I know the problem is on the ([\s\S]*) of the regex that allows any character, but how do to make the code work with line breaks?
You should use the following pattern:
`\[(code)=?(.*?)\](.*?)\[/code\]`s
A couple of changes:
The switch to .*? to make the quantifier lazy.
The s modifier at the end, which causes . to match new lines too.
Related
I am using a WordPress plugin named Acronyms (https://wordpress.org/plugins/acronyms/). This plugin replaces acronyms with their description. It uses a PHP PREG_REPLACE function.
The issue is that it replaces the acronyms contained in a <pre> tag, which I use to present a source code.
Could you modify this expression so that it won't replace acronyms contained inside <pre> tags (not only directly, but in any moment)? Is it possible?
The PHP code is:
$text = preg_replace(
"|(?!<[^<>]*?)(?<![?.&])\b$acronym\b(?!:)(?![^<>]*?>)|msU"
, "<acronym title=\"$fulltext\">$acronym</acronym>"
, $text
);
You can use a PCRE SKIP/FAIL regex trick (also works in PHP) to tell the regex engine to only match something if it is not inside some delimiters:
(?s)<pre[^<]*>.*?<\/pre>(*SKIP)(*F)|\b$acronym\b
This means: skip all substrings starting with <pre> and ending with </pre>, and only then match $acronym as a whole word.
See demo on regex101.com
Here is a sample PHP demo:
<?php
$acronym = "ASCII";
$fulltext = "American Standard Code for Information Interchange";
$re = "/(?s)<pre[^<]*>.*?<\\/pre>(*SKIP)(*F)|\\b$acronym\\b/";
$str = "<pre>ASCII\nSometext\nMoretext</pre>More text \nASCII\nMore text<pre>More\nlines\nASCII\nlines</pre>";
$subst = "<acronym title=\"$fulltext\">$acronym</acronym>";
$result = preg_replace($re, $subst, $str);
echo $result;
Output:
<pre>ASCII</pre><acronym title="American Standard Code for Information Interchange">ASCII</acronym><pre>ASCII</pre>
It is also possible to use preg_split and keep the code block as a group, only replace the non-code block part then combine it back as a complete string:
function replace($s) {
return str_replace('"', '"', $s); // do something with `$s`
}
$text = 'Your text goes here...';
$parts = preg_split('#(<\/?[-:\w]+(?:\s[^<>]+?)?>)#', $text, null, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
$text = "";
$x = 0;
foreach ($parts as $v) {
if (trim($v) === "") {
$text .= $v;
continue;
}
if ($v[0] === '<' && substr($v, -1) === '>') {
if (preg_match('#^<(\/)?(?:code|pre)(?:\s[^<>]+?)?>$#', $v, $m)) {
$x = isset($m[1]) && $m[1] === '/' ? 0 : 1;
}
$text .= $v; // this is a HTML tag…
} else {
$text .= !$x ? replace($v) : $v; // process or skip…
}
}
return $text;
Taken from here.
I am using a WordPress plugin named Acronyms (https://wordpress.org/plugins/acronyms/). This plugin replaces acronyms with their description. It uses a PHP PREG_REPLACE function.
The issue is that it replaces the acronyms contained in a <pre> tag, which I use to present a source code.
Could you modify this expression so that it won't replace acronyms contained inside <pre> tags (not only directly, but in any moment)? Is it possible?
The PHP code is:
$text = preg_replace(
"|(?!<[^<>]*?)(?<![?.&])\b$acronym\b(?!:)(?![^<>]*?>)|msU"
, "<acronym title=\"$fulltext\">$acronym</acronym>"
, $text
);
You can use a PCRE SKIP/FAIL regex trick (also works in PHP) to tell the regex engine to only match something if it is not inside some delimiters:
(?s)<pre[^<]*>.*?<\/pre>(*SKIP)(*F)|\b$acronym\b
This means: skip all substrings starting with <pre> and ending with </pre>, and only then match $acronym as a whole word.
See demo on regex101.com
Here is a sample PHP demo:
<?php
$acronym = "ASCII";
$fulltext = "American Standard Code for Information Interchange";
$re = "/(?s)<pre[^<]*>.*?<\\/pre>(*SKIP)(*F)|\\b$acronym\\b/";
$str = "<pre>ASCII\nSometext\nMoretext</pre>More text \nASCII\nMore text<pre>More\nlines\nASCII\nlines</pre>";
$subst = "<acronym title=\"$fulltext\">$acronym</acronym>";
$result = preg_replace($re, $subst, $str);
echo $result;
Output:
<pre>ASCII</pre><acronym title="American Standard Code for Information Interchange">ASCII</acronym><pre>ASCII</pre>
It is also possible to use preg_split and keep the code block as a group, only replace the non-code block part then combine it back as a complete string:
function replace($s) {
return str_replace('"', '"', $s); // do something with `$s`
}
$text = 'Your text goes here...';
$parts = preg_split('#(<\/?[-:\w]+(?:\s[^<>]+?)?>)#', $text, null, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
$text = "";
$x = 0;
foreach ($parts as $v) {
if (trim($v) === "") {
$text .= $v;
continue;
}
if ($v[0] === '<' && substr($v, -1) === '>') {
if (preg_match('#^<(\/)?(?:code|pre)(?:\s[^<>]+?)?>$#', $v, $m)) {
$x = isset($m[1]) && $m[1] === '/' ? 0 : 1;
}
$text .= $v; // this is a HTML tag…
} else {
$text .= !$x ? replace($v) : $v; // process or skip…
}
}
return $text;
Taken from here.
How to make preg find all possible solutions for regular expression pattern?
Here's the code:
<?php
$text = 'Amazing analyzing.';
$regexp = '/(^|\\b)([\\S]*)(a)([\\S]*)(\\b|$)/ui';
$matches = array();
if (preg_match_all($regexp, $text, $matches, PREG_SET_ORDER)) {
foreach ($matches as $match) {
echo "{$match[2]}[{$match[3]}]{$match[4]}\n";
}
}
?>
Output:
Am[a]zing
an[a]lyzing.
Output that i need:
[A]mazing
Am[a]zing
[A]nalyzing.
an[a]lyzing.
You have to use look behind/ahead zero-length assertions (instead of a normal pattern which consumes the characters around what your are looking for): http://www.regular-expressions.info/lookaround.html
Lookaround assertions won't help, for two reasons:
Since they are zero-length, they won't return characters that you need.
As Avinash Raj noted, PHP lookbehind doesn't allow *.
This yields the output that you need:
<?php
$text = 'Amazing analyzing.';
foreach (preg_split('/\s+/', $text) as $word)
{
$matches = preg_split('/(a)/i', $word, 0, PREG_SPLIT_DELIM_CAPTURE);
for ($match = 1; $match < count($matches); $match += 2)
{
$prefix = join(array_slice($matches, 0, $match));
$suffix = join(array_slice($matches, $match+1));
echo "{$prefix}[{$matches[$match]}]{$suffix}\n";
}
}
?>
I am trying to Ucfirst all strings within <strong> in a sentence. Tried this without any luck:
function getTextBetweenTags($string, $tagname)
{
$pattern = "/<$tagname>(.*?)<\/$tagname>/";
preg_match($pattern, $string, $matches);
return ucfirst($matches[1]);
}
$sentence = "Yellow pitty lies <strong>about</strong> the life.";
$finalsentence = getTextBetweenTags($sentence,"strong");
What is the correct way to do that ?
There is a simpler way. Instead of using php you could use only css, for instance:
strong:first-letter{
text-transform: capitalize
}
You need to include matching for the text before and after the tags.
function getTextBetweenTags($string, $tagname)
{
$pattern = "/(.*<$tagname>)(.*?)(<\/$tagname>.*)/";
preg_match($pattern, $string, $matches);
return $matches[1] . ucfirst($matches[2]) . $matches[3];
}
$sentence = "Yellow pitty lies <strong>about</strong> the life.";
$finalsentence = getTextBetweenTags($sentence,"strong");
i'm not sure how i could have phrased the title better, but my issue is that the highlight function doesn't highlight the search keywords which are at the end of the word. for example, if the search keyword is 'self', it will highlight 'self' or 'self-lessness' or 'Self' [with capital S] but it will not highlight the self of 'yourself' or 'himself' etc. .
this is the highlight function:
function highlightWords($text, $words) {
preg_match_all('~\w+~', $words, $m);
if(!$m)
return $text;
$re = '~\\b(' . implode('|', $m[0]) . ')~i';
$string = preg_replace($re, '<span class="highlight">$0</span>', $text);
return $string;
}
It seems you might have a \b at the beginning of your regex, which means a word boundary. Since the 'self' in 'yourself' doesn't start at a word boundary, it doesn't match. Get rid of the \b.
Try something like this:
function highlight($text, $words) {
if (!is_array($words)) {
$words = preg_split('#\\W+#', $words, -1, PREG_SPLIT_NO_EMPTY);
}
$regex = '#\\b(\\w*(';
$sep = '';
foreach ($words as $word) {
$regex .= $sep . preg_quote($word, '#');
$sep = '|';
}
$regex .= ')\\w*)\\b#i';
return preg_replace($regex, '<span class="highlight">\\1</span>', $text);
}
$text = "isa this is test text";
$words = array('is');
echo highlight($text, $words); // <span class="highlight">isa</span> <span class="highlight">this</span> <span class="highlight">is</span> test text
The loop, is so that every search word is properly quoted...
EDIT: Modified function to take either string or array in $words parameter.