I'm writing a regex to filter content and format its typography. So far, my code seems to filter the content properly using preg_replace, but I can't figure out how to skip content wrapped in certain tags, say <pre>.
For reference, this is to be used within WordPress's the_content filter, so my current code looks like this:
function my_typography( $str ) {
$ignore_elements = array("code", "pre");
$rules = array(
"?" => array("before"=> " ", "after"=>""),
// the others are stripped out for simplicity
);
foreach($rules as $rule=>$params) {
// Pseudo :
// if( !in_array( $parent_tag, $ignore_elements) {
// /Pseudo
$formatted = $params['before'] . $rule . $params['after'];
$str = preg_replace( $rule, $formatted, $str );
// Pseudo :
// }
// /Pseudo
}
return $str;
}
add_filter( 'the_content', 'my_typography' );
Basically:
<p>Was this filtered? I hope so</p>
<pre>Was this filtered? I hope not.</pre>
should become
<p>Was this filtered ? I hope so</p>
<pre>Was this filtered? I hope not.</pre>
You need to wrap the search regex in regex delimiters for preg_replace, and you must call preg_quote to escape all special regex characters such as ?, ., *, + etc:
$str = preg_replace( '~' . preg_quote($rule, '~') . '~', $formatted, $str );
Full Code:
function my_typography( $str ) {
$ignore_elements = array("code", "pre");
$rules = array(
"?" => array("before"=> " ", "after"=>""),
// the others are stripped out for simplicity
);
foreach($rules as $rule=>$params) {
// Pseudo :
// if( !in_array( $parent_tag, $ignore_elements) {
// /Pseudo
$formatted = $params['before'] . $rule . $params['after'];
$str = preg_replace( '~' . preg_quote($rule, '~') . '~', $formatted, $str );
// Pseudo :
// }
// /Pseudo
}
return $str;
}
Output:
<p>Was this filtered ? I hope so</p>
<pre>Was this filtered ? I hope not.</pre>
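Note that in the output above the <pre> block is still modified. One possible way to leave it alone, as a rough sketch rather than a definitive fix, is to split the content on the ignored elements and apply the rules only to the pieces in between. The function name my_typography_outside and its parameters are made up for this illustration, and the regex assumes the ignored elements are not nested and are well formed.

```php
<?php
// Hypothetical helper (not part of WordPress): apply the typography rules
// only to text that is NOT inside one of the ignored elements.
function my_typography_outside(string $html, array $rules, array $ignore = ['pre', 'code']): string
{
    if (!$ignore) {
        $parts = [$html]; // nothing to protect, treat everything as one piece
    } else {
        $alternatives = [];
        foreach ($ignore as $tag) {
            $t = preg_quote($tag, '~');
            $alternatives[] = "<{$t}\\b[^>]*>.*?</{$t}>";
        }
        // One capturing group, so the result alternates:
        // even index = outside text, odd index = protected element.
        $pattern = '~(' . implode('|', $alternatives) . ')~is';
        $parts = preg_split($pattern, $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    }

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            continue; // protected <pre>/<code> block, leave untouched
        }
        foreach ($rules as $rule => $params) {
            // Plain string replacement, so no regex escaping is needed here.
            $parts[$i] = str_replace($rule, $params['before'] . $rule . $params['after'], $parts[$i]);
        }
    }

    return implode('', $parts);
}

$rules = ['?' => ['before' => ' ', 'after' => '']];
echo my_typography_outside(
    "<p>Was this filtered? I hope so</p>\n<pre>Was this filtered? I hope not.</pre>",
    $rules
);
```

Be aware that this sketch still touches characters inside the attributes of other tags (for example a ? in an href), so it only addresses the <pre>/<code> part of the question.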
I would like to replace words outside of HTML-tags.
So if I got
Hello
and I want to replace "Hello" with "Bye" I would like to get this result:
Bye.
Well, I learned that I have to use a DOM-parser to achieve that.
So I used https://github.com/sunra/php-simple-html-dom-parser and included it.
Now I did
$test = $dom->find('text');
To get the text of the dom.
Now I can loop through the results:
foreach($test as $t) {
if (strpos($t->innertext,$word)!==false) {
$t->innertext = preg_replace(
'/\b' . preg_quote( $word, "/" ) . '\b/i',
"<a href='$url' target='$target' data-uk-tooltip title='$item->title'>\$0</a>",
$t->innertext,1
);
}
}
But unfortunately, if $item->title contains $word, the HTML-structure is smashed.
It looks like there is some confusion here. According to the docs, $dom->find($tag) returns an array of all matching tags, but you are looking for a tag called text?
Maybe you should try $test = $dom->find('a'); instead?
Also, in your code it is not clear where the variables $url, $target and $item come from:
foreach($test as $t) {
if (strpos($t->innertext,$word)!==false) {
$t->innertext = preg_replace(
'/\b' . preg_quote( $word, "/" ) . '\b/i',
"<a href='$url' target='$target' data-uk-tooltip title='$item->title'>\$0</a>",
$t->innertext,1
);
}
}
This should work better:
foreach($test as $t) {
if (strpos($t->innertext,$word)!==false) {
$t->innertext = preg_replace(
'/\b' . preg_quote( $word, "/" ) . '\b/i',
"Replacement",
$t->innertext,1
);
}
}
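For comparison, here is a sketch that does not rely on the DOM parser at all. It splits the markup into tags and text with preg_split and only touches the text pieces, and it escapes the attribute values so that a title containing quotes cannot break the markup. The function name and its parameters are invented for this example, and the simple <[^>]*> tag pattern is an assumption that breaks on markup with a > inside an attribute value.

```php
<?php
// Hypothetical helper: wrap the first whole-word occurrence of $word that
// appears outside of any HTML tag in a link.
function link_first_word_outside_tags(string $html, string $word, string $url, string $title): string
{
    // Odd indexes hold the tags (captured delimiters), even indexes hold text.
    $parts = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $pattern = '/\b' . preg_quote($word, '/') . '\b/i';

    // Escape attribute values so quotes in $url or $title stay inside the attribute.
    $open = '<a href="' . htmlspecialchars($url, ENT_QUOTES)
          . '" title="' . htmlspecialchars($title, ENT_QUOTES) . '">';

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            continue; // a tag: attribute values such as title="..." are never touched
        }
        $count = 0;
        // A callback avoids "$1"-style sequences in $title being read as backreferences.
        $parts[$i] = preg_replace_callback(
            $pattern,
            function (array $m) use ($open): string {
                return $open . $m[0] . '</a>';
            },
            $part,
            1,
            $count
        );
        if ($count > 0) {
            break; // only the first occurrence is linked
        }
    }

    return implode('', $parts);
}

echo link_first_word_outside_tags('<p title="Hello">Hello world</p>', 'Hello', '/bye', 'Say Hello');
```

Because tags are skipped as a whole, a title attribute that contains the word is left alone. The sketch does not know whether a text piece already sits inside an <a> element, so running it repeatedly on its own output can produce nested links.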
I am trying to run the following regular-expression-based function in PHP, which returns the modified output at the end.
function vg_excerpt_more( $output ) {
$string = $output;
$pattern_auto_excerpt = '#([...]</p>)$#';
$pattern_manual_excerpt = '#(</p>)$#';
$replacement = ' [Continue...]</p>';
if ( preg_match( $pattern_auto_excerpt, $string ) ) {
$pattern = $pattern_auto_excerpt;
} else if ( preg_match( $pattern_manual_excerpt, $string ) ) {
$pattern = $pattern_manual_excerpt;
}
$output = preg_replace( $pattern, $replacement, $string );
return $output;
}
add_filter( 'the_excerpt', 'vg_excerpt_more' );
add_filter( 'excerpt_more', 'vg_excerpt_more' );
Well, the string could either end in [...]</p> OR </p> so I have to check the two cases.
The problem is, it is throwing warnings as -
WARNING: PREG_MATCH(): COMPILATION FAILED: POSIX COLLATING ELEMENTS
ARE NOT SUPPORTED AT OFFSET 1 in - 'preg_match( $pattern_auto_excerpt,
$string )'
and
WARNING: PREG_REPLACE(): EMPTY REGULAR EXPRESSION in - '$output =
preg_replace( $pattern, $replacement, $string );'
EDIT:
After useful replies by @user1852180 I moved ahead and did this -
function vg_excerpt_more( $output ) {
$string = $output;
$pattern = '';
// $pattern_auto_excerpt = '#(\[...\]</p>)$#';
$pattern_auto_excerpt = '#(\[(?:\.|…)+\])#';
$pattern_manual_excerpt = '#(</p>)$#';
$replacement = ' [Continue...]</p>';
if ( preg_match( $pattern_auto_excerpt, $string ) ) {
$pattern = '#(\[(?:\.|…)+\]</p>)$#';
if ( preg_match( $pattern, $string ) ) {
return preg_replace( $pattern, $replacement, $string ) . "Dummy2";
}
} else if ( preg_match( $pattern_manual_excerpt, $string ) ) {
$pattern = $pattern_manual_excerpt;
return preg_replace( $pattern, $replacement, $string ) . "Dummy";
}
return $output;
}
add_filter( 'the_excerpt', 'vg_excerpt_more' );
add_filter( 'excerpt_more', 'vg_excerpt_more' );
But I am still seeing [...] in the frontend along with the replacement.
PS. It also never prints 'Dummy2', always 'Dummy'.
You need to escape the brackets in the first pattern, and the dot:
$pattern_auto_excerpt = '#(\[(?:\.|…)+\]</p>)$#';
You don't need the if/else to check whether it has [...]; make that part optional in the regex with a question mark:
function vg_excerpt_more( $output ) {
$pattern = '#(?:\[(?:\.|…)+\])?</p>$#';
$replacement = ' [Continue...]</p>';
return preg_replace( $pattern, $replacement, $output );
}
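As a side note on the two warnings from the question: an unescaped [. starts a POSIX collating element, which is why the first pattern fails to compile, and the "empty regular expression" warning appears when neither branch assigns $pattern. A minimal sketch of one way to avoid both, assuming the excerpt ends in a literal [...] (the … variant is not covered here), is to build the pattern with preg_quote and to return the input unchanged when nothing matches. The function name vg_excerpt_more_sketch and the extra \s* are my own additions.

```php
<?php
// Hypothetical variant of the question's filter, for illustration only.
function vg_excerpt_more_sketch(string $output): string
{
    $replacement = ' [Continue...]</p>';
    $patterns = [
        // preg_quote escapes "[", "." and "]"; \s* also swallows the space before "[...]".
        '#\s*' . preg_quote('[...]', '#') . '</p>$#',
        '#</p>$#',
    ];

    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $output)) {
            return preg_replace($pattern, $replacement, $output);
        }
    }

    // No pattern matched: never call preg_replace with an empty pattern.
    return $output;
}

echo vg_excerpt_more_sketch('<p>Some text [...]</p>'), "\n";
echo vg_excerpt_more_sketch('<p>Some text</p>'), "\n";
```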
I had this regex to fix corrupted serialized objects
$data = preg_replace(
'!s:(\d+):"(.*?)";!se',
"'s:' . strlen('$2') . ':\"$2\";'",
$data
);
But I recently updated the code for PHP 5.5+ because the /e modifier has been deprecated:
$data = preg_replace_callback(
'/s:(\d+):"(.*?)";/',
create_function(
'$matches',
'return "s:".strlen($matches[2]).":\"".( $matches[2] )."\";";'
),
$data
);
I have analyzed the data returned by both versions, and it seems the new one removes additional slashes.
result for 1
<a title=\\"A sample title\\" href=\\"http://sitei-url.com/\\">text</a>
result for 2
<a title=\"A sample title\" href=\"http://sitei-url.com/\">text</a>
When I try to unserialize the returned data, the first one works fine but the second one does not.
I'd appreciate some help with this!
Thanks
edit
This one seems to work like the first one. I added the s modifier.
$data = preg_replace_callback(
'/s:(\d+):"(.*?)";/s',
create_function(
'$matches',
'return "s:".strlen($matches[2]).":\"".( $matches[2] )."\";";'
),
$data
);
Thanks to everyone for their answers!
An example with a closure :
$data = preg_replace_callback(
'/s:(\d+):"(.*?)";/',
function($matches) {
return "s:" . strlen($matches[2]) . ":\\\"" . $matches[2] . "\\\";";
},
$data
);
EDIT For PHP 5.2 Compatibility
function pregCallback($matches) {
return "s:" . strlen($matches[2]) . ":\\\"" . $matches[2] . "\\\";";
}
$data = preg_replace_callback('/s:(\d+):"(.*?)";/', 'pregCallback', $data);
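For newer PHP versions, where create_function is no longer available, a self-contained sketch could look like the following. It is an illustration under assumptions, not a drop-in replacement for the snippets above: it assumes the serialized data is not slash-escaped (so it does not add backslashes the way the callbacks above do), it keeps the s modifier from the question's edit so strings containing newlines are matched, and the function name is made up.

```php
<?php
// Hypothetical helper: recompute the byte length of every string entry
// in a serialized value whose stored lengths are wrong.
function fix_serialized_lengths(string $data): string
{
    $fixed = preg_replace_callback(
        '/s:(\d+):"(.*?)";/s',
        function (array $m): string {
            // strlen() counts bytes, which is what serialize() stores.
            return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
        },
        $data
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $fixed ?? $data;
}

$broken = 'a:1:{i:0;s:3:"hello";}';
var_dump(unserialize(fix_serialized_lengths($broken)));
```

Like the original regex, this cannot tell a real terminator from a "; sequence inside a string value, so such values will still be cut short.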
I have this interesting function that I'm using to create new lines into paragraphs. I'm using it instead of the nl2br() function, as it outputs better formatted text.
function nl2p($string, $line_breaks = true, $xml = true) {
$string = str_replace(array('<p>', '</p>', '<br>', '<br />'), '', $string);
// It is conceivable that people might still want single line-breaks
// without breaking into a new paragraph.
if ($line_breaks == true)
return '<p>'.preg_replace(array("/([\n]{2,})/i", "/([^>])\n([^<])/i"), array("</p>\n<p>", '<br'.($xml == true ? ' /' : '').'>'), trim($string)).'</p>';
else
return '<p>'.preg_replace(
array("/([\n]{2,})/i", "/([\r\n]{3,})/i","/([^>])\n([^<])/i"),
array("</p>\n<p>", "</p>\n<p>", '<br'.($xml == true ? ' /' : '').'>'),
trim($string)).'</p>';
}
The problem is that whenever I try to create a single line break, it inadvertently removes the first character of the paragraph below it. I'm not familiar enough with regex to understand what is causing the problem.
Here is another approach that doesn't use regular expressions. Note, this function will remove any single line-breaks.
function nl2p($string)
{
$paragraphs = '';
foreach (explode("\n", $string) as $line) {
if (trim($line)) {
$paragraphs .= '<p>' . $line . '</p>';
}
}
return $paragraphs;
}
If you only need to do this once in your app and don't want to create a function, it can easily be done inline:
<?php foreach (explode("\n", $string) as $line): ?>
<?php if (trim($line)): ?>
<p><?=$line?></p>
<?php endif ?>
<?php endforeach ?>
The problem is with your match for single line breaks. It matches the last character before the line break and the first after. Then you replace the match with <br>, so you lose those characters as well. You need to keep them in the replacement.
Try this:
function nl2p($string, $line_breaks = true, $xml = true) {
$string = str_replace(array('<p>', '</p>', '<br>', '<br />'), '', $string);
// It is conceivable that people might still want single line-breaks
// without breaking into a new paragraph.
if ($line_breaks == true)
return '<p>'.preg_replace(array("/([\n]{2,})/i", "/([^>])\n([^<])/i"), array("</p>\n<p>", '$1<br'.($xml == true ? ' /' : '').'>$2'), trim($string)).'</p>';
else
return '<p>'.preg_replace(
array("/([\n]{2,})/i", "/([\r\n]{3,})/i","/([^>])\n([^<])/i"),
array("</p>\n<p>", "</p>\n<p>", '$1<br'.($xml == true ? ' /' : '').'>$2'),
trim($string)).'</p>';
}
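An alternative to putting $1 and $2 back into the replacement is to use lookarounds, which check the neighbouring characters without making them part of the match, so there is nothing to restore. This is only a sketch of that idea for the single-line-break rule; the function name is invented, and excluding \n in the lookarounds (so that blank lines are left for the paragraph rule) is my own assumption.

```php
<?php
// Hypothetical helper: turn single newlines into <br> without consuming
// the characters on either side of the newline.
function single_breaks_to_br(string $text, bool $xml = true): string
{
    $br = $xml ? '<br />' : '<br>';

    // (?<=[^>\n]) : the previous character is neither ">" nor a newline
    // (?=[^<\n])  : the next character is neither "<" nor a newline
    return preg_replace('/(?<=[^>\n])\n(?=[^<\n])/', $br, $text);
}

echo single_breaks_to_br("first line\nsecond line");
```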
I also wrote a very simple version:
function nl2p($text)
{
return '<p>' . str_replace(["\r\n", "\r", "\n"], '</p><p>', $text) . '</p>';
}
@Laurent's answer wasn't working for me: the else branch was doing what the $line_breaks == true branch should have been doing, and it was turning multiple line breaks into <br> tags, which PHP's native nl2br() already does.
Here's what I managed to get working with the expected behavior:
function nl2p( $string, $line_breaks = true, $xml = true ) {
// Remove current tags to avoid double-wrapping.
$string = str_replace( array( '<p>', '</p>', '<br>', '<br />' ), '', $string );
// Default: Use <br> for single line breaks, <p> for multiple line breaks.
if ( $line_breaks == true ) {
$string = '<p>' . preg_replace(
array( "/([\n]{2,})/i", "/([\r\n]{3,})/i", "/([^>])\n([^<])/i" ),
array( "</p>\n<p>", "</p>\n<p>", '$1<br' . ( $xml == true ? ' /' : '' ) . '>$2' ),
trim( $string ) ) . '</p>';
// Use <p> for all line breaks if $line_breaks is set to false.
} else {
$string = '<p>' . preg_replace(
array( "/([\n]{1,})/i", "/([\r]{1,})/i" ),
"</p>\n<p>",
trim( $string ) ) . '</p>';
}
// Remove empty paragraph tags.
$string = str_replace( '<p></p>', '', $string );
// Return string.
return $string;
}
Here's an approach that comes with a reverse method to replace paragraphs back to regular line breaks and vice versa.
These are useful when building a form input. When saving a user's input you may want to convert line breaks to paragraph tags; however, when editing the text in a form, you may not want the user to see any HTML. In that case we convert the paragraphs back to line breaks.
// This function will convert newlines to HTML paragraphs
// without paying attention to HTML tags. Feed it a raw string and it will
// simply return that string sectioned into HTML paragraphs
function nl2p($str) {
$arr=explode("\n",$str);
$out='';
for($i=0;$i<count($arr);$i++) {
if(strlen(trim($arr[$i]))>0)
$out.='<p>'.trim($arr[$i]).'</p>';
}
return $out;
}
// Return paragraph tags back to line breaks
function p2nl($str)
{
$str = preg_replace("/<p[^>]*?>/", "", $str);
$str = str_replace("</p>", "\r\n", $str);
return $str;
}
Expanding upon @NaturalBornCamper's solution:
function nl2p( $text, $class = '' ) {
$string = str_replace( array( "\r\n\r\n", "\n\n" ), '</p><p>', $text);
$string = str_replace( array( "\r\n", "\n" ), '<br />', $string);
return '<p' . ( $class ? ' class="' . $class . '"' : '' ) . '>' . $string . '</p>';
}
This takes care of both cases: double line breaks are converted to paragraphs, and single line breaks are converted to <br />.
Just type this between your lines:
echo '<br>';
This will give you a new line.