PHP preg_match ignore within certain elements


I'm writing a regex to filter content and format its typography. So far, my code seems to filter the content properly using preg_replace, but I can't figure out how to skip content wrapped within certain tags, say <pre>.
As a reference, this is to be used within WordPress's the_content filter, so my current code looks like so:
function my_typography( $str ) {
$ignore_elements = array("code", "pre");
$rules = array(
"?" => array("before"=> " ", "after"=>""),
// the others are stripped out for simplicity
);
foreach($rules as $rule=>$params) {
// Pseudo :
// if( !in_array( $parent_tag, $ignore_elements) {
// /Pseudo
$formatted = $params['before'] . $rule . $params['after'];
$str = preg_replace( $rule, $formatted, $str );
// Pseudo :
// }
// /Pseudo
}
return $str;
}
add_filter( 'the_content', 'my_typography' );
Basically:
<p>Was this filtered? I hope so</p>
<pre>Was this filtered? I hope not.</pre>
should become
<p>Was this filtered ? I hope so</p>
<pre>Was this filtered? I hope not.</pre>

You need to wrap the search regex in regex delimiters for preg_replace, and you must call preg_quote to escape all special regex characters such as ?, ., *, + etc.:
$str = preg_replace( '~' . preg_quote($rule, '~') . '~', $formatted, $str );
Full Code:
function my_typography( $str ) {
$ignore_elements = array("code", "pre");
$rules = array(
"?" => array("before"=> " ", "after"=>""),
// the others are stripped out for simplicity
);
foreach($rules as $rule=>$params) {
// Pseudo :
// if( !in_array( $parent_tag, $ignore_elements) {
// /Pseudo
$formatted = $params['before'] . $rule . $params['after'];
$str = preg_replace( '~' . preg_quote($rule, '~') . '~', $formatted, $str );
// Pseudo :
// }
// /Pseudo
}
return $str;
}
Output:
<p>Was this filtered ? I hope so</p>
<pre>Was this filtered ? I hope not.</pre>
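Note that the output above still changes the text inside <pre>. One way to leave such blocks alone is to let the regex consume them first and then discard that match with PCRE's (*SKIP)(*FAIL) verbs. The following is only a minimal sketch under assumptions: the function name my_typography_skip_tags is made up for illustration, the ignored elements are assumed not to be nested inside themselves, and an unclosed <pre> or <code> is treated as ordinary text.

```php
<?php
// Hypothetical variant of my_typography() that leaves <code>/<pre> blocks untouched.
function my_typography_skip_tags(string $str): string
{
    $ignore_elements = array('code', 'pre');
    $rules = array(
        '?' => array('before' => ' ', 'after' => ''),
    );

    // Build "code|pre", escaping each tag name for use inside the pattern.
    $tags = implode('|', array_map(function ($tag) {
        return preg_quote($tag, '~');
    }, $ignore_elements));

    // First alternative: match a whole ignored element, then throw the match
    // away and resume scanning after it. The (?is:...) flags apply only here.
    $skip = '(?is:<(' . $tags . ')\b[^>]*>.*?</\1>)(*SKIP)(*FAIL)|';

    foreach ($rules as $rule => $params) {
        $formatted = $params['before'] . $rule . $params['after'];
        // A callback avoids "$" or "\" in $formatted being read as backreferences.
        $str = preg_replace_callback(
            '~' . $skip . preg_quote($rule, '~') . '~',
            function ($m) use ($formatted) {
                return $formatted;
            },
            $str
        );
    }
    return $str;
}

echo my_typography_skip_tags("<p>Was this filtered? I hope so</p>\n<pre>Was this filtered? I hope not.</pre>");
```

In WordPress this would be registered the same way as the original, with add_filter( 'the_content', 'my_typography_skip_tags' ).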

Related

Migrate create_function() which is not supported since PHP 7.2

I have create_function() in my PHP code:
function encode_code_in_comment( $source ) { $encoded = preg_replace_callback( '/\[(php|html|javascript|css|nginx|apache|terminal)\](.*?)\[\/\1\]/ims',
create_function(
'$matches',
'$matches[2] = preg_replace(
array("/^[\r|\n]+/i", "/[\r|\n]+$/i"), "",
$matches[2]);
return "<pre class=\"language-" . $matches[1] . "\"><code>" . esc_html( $matches[2] ) . "</code></pre>";'
),
$source );
if ( $encoded ) {
return $encoded;
} else {
return $source;
}}
I know that there are duplicate threads on the subject, but nevertheless, I'm really struggling to convert this to an anonymous function. How do I rewrite it?

Your main problem is that your code is badly formatted, making it hard to see where the create_function call begins and ends; here it is with some more logical linebreaks and indents:
function encode_code_in_comment( $source ) {
$encoded = preg_replace_callback(
'/\[(php|html|javascript|css|nginx|apache|terminal)\](.*?)\[\/\1\]/ims',
create_function(
'$matches',
'
$matches[2] = preg_replace(
array("/^[\r|\n]+/i", "/[\r|\n]+$/i"),
"",
$matches[2]
);
return "<pre class=\"language-" . $matches[1] . "\"><code>" . esc_html( $matches[2] ) . "</code></pre>";
'
),
$source
);
if ( $encoded ) {
return $encoded;
} else {
return $source;
}
}
From this and the documentation of create_function, we can see that the created function needs one argument, $matches, and to have a body starting $matches[2] = and ending </pre>";
Looking at the manual for anonymous functions we see that the new syntax is function(arguments) { body }, so instead of:
create_function('$matches', ... )
you want:
function($matches) { ... }
and in between, instead of:
'
$matches[2] = ...
...
... </pre>";
'
you want to just remove the quotes and leave the code:
$matches[2] = ...
...
... </pre>";
The body is in single quotes, and there are no escaped single quotes in there, so the code doesn't need any other changes.
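Putting those steps together, the converted function might look like the sketch below. The esc_html() fallback at the top is an assumption added only so the snippet runs outside WordPress; inside WordPress the real esc_html() is already defined and the fallback is skipped.

```php
<?php
// Stand-in for WordPress's esc_html(), used only when WordPress is not loaded (assumption).
if (!function_exists('esc_html')) {
    function esc_html($text)
    {
        return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    }
}

function encode_code_in_comment($source)
{
    $encoded = preg_replace_callback(
        '/\[(php|html|javascript|css|nginx|apache|terminal)\](.*?)\[\/\1\]/ims',
        // The anonymous function replaces create_function('$matches', '...').
        function ($matches) {
            // Strip leading and trailing line breaks from the captured code.
            $matches[2] = preg_replace(
                array("/^[\r|\n]+/i", "/[\r|\n]+$/i"),
                "",
                $matches[2]
            );
            return "<pre class=\"language-" . $matches[1] . "\"><code>" . esc_html($matches[2]) . "</code></pre>";
        },
        $source
    );
    if ($encoded) {
        return $encoded;
    } else {
        return $source;
    }
}

echo encode_code_in_comment("[php]\necho 1 < 2;\n[/php]");
```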

Wordpress: Automatically change specific URLs in posts

I have found a solution to change links in my WordPress theme, but not the links in the content. How can I get the URLs in the content so I can change them as well?
I need to use the content filter. But how can I change URLs like apple.com/test/, apple.com/test-123/, apple.com, microsoft.com, and microsoft.com/test/? The function should correctly change every matched URL in the content.
add_filter('the_content ', 'function_name');
The answer to a similar question unfortunately doesn't work.
This is my working solution, which changes links but not the links in the content.
add_filter('rh_post_offer_url_filter', 'link_change_custom');
function link_change_custom($offer_post_url){
$shops= array(
array('shop'=>'apple.com','id'=>'1234'),
array('shop'=>'microsoft.com','id'=>'5678'),
array('shop'=>'dell.com','id'=>'9876'),
);
foreach( $shops as $rule ) {
if (!empty($offer_post_url) && strpos($offer_post_url, $rule['shop']) !== false) {
$offer_post_url = 'https://www.network.com/promotion/click/id='.$rule['id'].'-yxz?param0='.rawurlencode($offer_post_url);
}
}
$shops2= array(
array('shop'=>'example.com','id'=>'1234'),
array('shop'=>'domain2.com','id'=>'5678'),
array('shop'=>'domain3','id'=>'9876'),
);
foreach( $shops2 as $rule ) {
if (!empty($offer_post_url) && strpos($offer_post_url, $rule['shop']) !== false) {
$offer_post_url = 'https://www.second-network.com/promotion/click/id='.$rule['id'].'-yxz?param0='.rawurlencode($offer_post_url);
}
}
return $offer_post_url;
}

If I understood you correctly, this is what you need:
add_filter( 'the_content', 'replace_links_by_promotions' );
function replace_links_by_promotions( $content ) {
$shop_ids = array(
'apple.com' => '1234',
'microsoft.com' => '5678',
'dell.com' => '9876',
);
preg_match_all( '/https?:\/\/(www\.)?([-a-zA-Z0-9#:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6})\b([-a-zA-Z0-9()#:%_\+.~#?&\/=]*)/', $content, $matches, PREG_OFFSET_CAPTURE );
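// Walk the matches from last to first so earlier offsets stay valid after each replacement.
foreach ( array_reverse( $matches[2], true ) as $index => $match ) {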
if ( ! isset( $shop_ids[ $match[0] ] ) ) {
continue;
}
$offer_post_url = 'https://www.network.com/promotion/click/id=' . $shop_ids[ $match[0] ] . '-yxz?param0=' . rawurlencode( $matches[0][ $index ][0] );
$content = substr_replace( $content, $offer_post_url, $matches[0][ $index ][1], strlen( $matches[0][ $index ][0] ) );
}
return $content;
}
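An alternative that avoids offset bookkeeping altogether is to do the replacement in a single pass with preg_replace_callback. This is a sketch, not a drop-in: the function name replace_links_single_pass and the simplified URL pattern are assumptions, and, like the code above, it only recognizes hosts that match a key of $shop_ids exactly after an optional "www." (so store.apple.com is left alone).

```php
<?php
// Hypothetical single-pass variant: each URL is rewritten as it is found.
function replace_links_single_pass(string $content, array $shop_ids): string
{
    // Group 1 captures the host without a leading "www."; the tail is the path/query.
    $pattern = '!https?://(?:www\.)?([-a-zA-Z0-9.]+\.[a-zA-Z]{2,6})\b[-a-zA-Z0-9()@:%_+.~#?&/=]*!';

    return preg_replace_callback($pattern, function ($m) use ($shop_ids) {
        $host = strtolower($m[1]);
        if (!isset($shop_ids[$host])) {
            return $m[0]; // Unknown shop: leave the URL untouched.
        }
        return 'https://www.network.com/promotion/click/id=' . $shop_ids[$host]
            . '-yxz?param0=' . rawurlencode($m[0]);
    }, $content);
}

$shops = array('apple.com' => '1234', 'microsoft.com' => '5678', 'dell.com' => '9876');
echo replace_links_single_pass('<a href="https://www.apple.com/test/">Apple</a>', $shops);
```

Inside WordPress, a small closure passed to add_filter( 'the_content', ... ) could call this helper with the shop list.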

I think this works. Note that, as written, it will match every "apple.", "dell.", and "microsoft." link in every type of content that uses the content filter (posts, pages, excerpts, many custom post types, etc.). If you don't want that, and you very well may not, the main replacement function will have to be made conditional and the regex more precisely targeted, and that can get complicated.
(Also, come to think of it, I'm not sure whether the quotes in the anchor tags that the Regex finds will require special handling. If this doesn't work, we can look at that, too. Or maybe switch to a DOM parser, like maybe I should have started out by doing... )
/** INITIATE FILTER FUNCTION **/
add_filter( 'the_content', 'wpso_change_urls' ) ;
/**
* PREG CALLBACK FUNCTION
* Match Matches to id #s
* and return replacement urls enclosed in quotes (as found)
*/
function wpso_found_urls( $matches ) {
//someone else probably has a v clever parsimonious way to do this next part
//but at least this makes what's happening easy to read
if ( strpos( $matches[0], 'apple' ) ) {
$id = '1234' ;
}
if ( strpos( $matches[0], 'microsoft' ) ) {
$id = '5678' ;
}
if ( strpos( $matches[0], 'dell' ) ) {
$id = '9876' ;
}
$raw_url = trim( $matches[0], '"' ) ;
return '"https://www.network.com/promotion/click/id='. $id .'-yxz?param0='.rawurlencode( $raw_url) . '"' ;
}
/** ENDURING A DREADFUL FATE USING REGEX TO PARSE HTML **/
function wpso_change_urls( $content ) {
$find_urls = array(
'/"+(http|https)(\:\/\/\S*apple.\S*")/',
'/"+(http|https)(\:\/\/\S*microsoft.\S*")/',
'/"+(http|https)(\:\/\/\S*dell.\S*")/',
);
return preg_replace_callback( $find_urls, 'wpso_found_urls', $content ) ;
}

You might try using something like the_content filter to do this:
add_filter('the_content', function($content){
// filter $content and replace urls
$content = str_replace('http://old-url', 'http://new-url', $content);
return $content;
});
More: https://developer.wordpress.org/reference/hooks/the_content/

PHP : How can I Highlight searched words in results and keep the words original text case?

I am displaying search results on a site where users can search for specific keywords.
On the results page I am trying to highlight the searched words in each result,
so the user can see which words matched where.
For example:
if the user searches for: mango
the original result item: This Post contains Mango.
the highlighted output I want: This Post contains <strong>Mango</strong>
I am using it like this:
<?php
//highlight all words
function highlight_words( $title, $searched_words_array) {
// loop through searched_words_array
foreach( $searched_words_array as $searched_word ) {
$title = highlight_word( $title, $searched_word); // highlight word
}
return $title; // return highlighted data
}
//highlight single word with color
function highlight_word( $title, $searched_word) {
$replace = '<strong>' . $searched_word . '</strong>'; // create replacement
$title = str_ireplace( $searched_word, $replace, $title ); // replace content
return $title; // return highlighted data
}
I am getting the searched words from the Sphinx search engine; the issue is that Sphinx returns the entered/matched words in lowercase.
So with the above code, my
result becomes: This Post contains <strong>mango</strong>
*Notice that the M in Mango became lowercase.
So my question is: how can I highlight a word, i.e. wrap <strong> and </strong> around the words matching the searched words,
without losing its original case?
*Note: this is not the same question as how to highlight search results. My keywords array is in lowercase, and with the above method the original word gets replaced by the lowercase word.
So how can I stop that?
The approach in the other question has the same problem, because the searched keywords are in lowercase: str_ireplace matches the word and replaces it with the lowercase version.
Update:
I have combined various code snippets to get the behavior I was expecting.
For now it's working great.
function strong_words( $title, $searched_words_array) {
//for all words in array
foreach ($searched_words_array as $word){
$lastPos = 0;
$positions = array();
//find all positions of word
while (($lastPos = stripos($title, $word, $lastPos))!== false) {
$positions[] = $lastPos;
$lastPos = $lastPos + strlen($word);
}
//reverse sort numeric array
rsort($positions);
// highlight all occurances
foreach ($positions as $pos) {
$title = strong_word($title , $word, $pos);
}
}
//apply strong html code to occurances
$title = str_replace('#####','</strong>',$title);
$title = str_replace('*****','<strong>',$title);
return $title; // return highlighted data
}
function strong_word($title , $word, $pos){
//ugly hack to not use <strong> , </strong> here directly, as it can get replaced if searched word contains charcters from strong
$title = substr_replace($title, '#####', $pos+strlen($word) , 0) ;
$title = substr_replace($title, '*****', $pos , 0) ;
return $title;
}
$title = 'This is Great Mango00lk mango';
$words = array('man','a' , 'go','is','g', 'strong') ;
echo strong_words($title,$words);

Regex solution:
function highlight_word( $title, $searched_word) {
return preg_replace('#('.preg_quote($searched_word, '#').')#i','<strong>\1</strong>',$title) ;
}
Be wary of special characters in $searched_word that may be interpreted as regex metacharacters; preg_quote escapes them.
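To handle a whole list of search words in one pass while keeping the original case, the words can be escaped with preg_quote and joined into a single alternation. This is a minimal sketch under assumptions: the function name highlight_words_keep_case is made up, the input is treated as single-byte text (multibyte text would likely also want the u modifier), and longer words are tried first so that "mango" is preferred over "man".

```php
<?php
// Hypothetical helper: wrap every case-insensitive match in <strong>, keeping its original case.
function highlight_words_keep_case(string $title, array $words): string
{
    // Drop empty entries; an empty alternative would match at every position.
    $words = array_values(array_filter($words, function ($w) {
        return is_string($w) && $w !== '';
    }));
    if (count($words) === 0) {
        return $title;
    }

    // Longest first, so a longer word wins over a shorter word it starts with.
    usort($words, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    // Escape regex metacharacters (and the "#" delimiter) in each word.
    $alternatives = array_map(function ($w) {
        return preg_quote($w, '#');
    }, $words);

    // $1 is the text as it appears in $title, so its case is preserved.
    return preg_replace('#(' . implode('|', $alternatives) . ')#i', '<strong>$1</strong>', $title);
}

echo highlight_words_keep_case('This Post contains Mango.', array('mango'));
```

Because everything is matched in one pass, a later word can never match inside the <strong> tags inserted for an earlier one.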

Here's a code snippet I wrote a while back that's working to do exactly what you want:
if(stripos($result->question, $word) !== FALSE){
$word_to_highlight = substr($result->question, stripos($result->question, $word), strlen($word));
$result->question = str_replace($word_to_highlight, '<span class="search-term">'.$word_to_highlight.'</span>', $result->question);
}

//will find all occurances of all words and make them strong in html
function strong_words( $title, $searched_words_array) {
//for all words in array
foreach ($searched_words_array as $word){
$lastPos = 0;
$positions = array();
//find all positions of word
while (($lastPos = stripos($title, $word, $lastPos))!== false) {
$positions[] = $lastPos;
$lastPos = $lastPos + strlen($word);
}
//reverse sort numeric array
rsort($positions);
// highlight all occurances
foreach ($positions as $pos) {
$title = strong_word($title , $word, $pos);
}
}
//apply strong html code to occurances
$title = str_replace('#####','</strong>',$title);
$title = str_replace('*****','<strong>',$title);
return $title; // return highlighted data
}
function strong_word($title , $word, $pos){
//ugly hack to not use <strong> , </strong> here directly, as it can get replaced if searched word contains charcters from strong
$title = substr_replace($title, '#####', $pos+strlen($word) , 0) ;
$title = substr_replace($title, '*****', $pos , 0) ;
return $title;
}
$title = 'This is Great Mango00lk mango';
$word = array('man','a' , 'go','is','g', 'strong') ;
echo strong_words($title,$word);
This code will find all occurrences of all words and make them strong in html while keeping original text case.

function highlight_word( $content, $word, $color ) {
$replace = '<span style="background-color: ' . $color . ';">' . $word . '</span>'; // create replacement
$content = str_replace( $word, $replace, $content ); // replace content
return $content; // return highlighted data
}
function highlight_words( $content, $words, $colors ) {
$color_index = 0; // index of color (assuming it's an array)
// loop through words
foreach( $words as $word ) {
$content = highlight_word( $content, $word, $colors[$color_index] ); // highlight word
$color_index = ( $color_index + 1 ) % count( $colors ); // get next color index
}
return $content; // return highlighted data
}
// words to find
$words = array(
'normal',
'text'
);
// colors to use
$colors = array(
'#88ccff',
'#cc88ff'
);
// faking your results_text
$results_text = array(
array(
'ab' => 'AB #1',
'cd' => 'Some normal text with normal words isn\'t abnormal at all'
), array(
'ab' => 'AB #2',
'cd' => 'This is another text containing very normal content'
)
);
// loop through results (assuming $output1 is true)
foreach( $results_text as $result ) {
$result['cd'] = highlight_words( $result['cd'], $words, $colors );
echo '<fieldset><p>ab: ' . $result['ab'] . '<br />cd: ' . $result['cd'] . '</p></fieldset>';
}

New line to paragraph function

I have this interesting function that I'm using to create new lines into paragraphs. I'm using it instead of the nl2br() function, as it outputs better formatted text.
function nl2p($string, $line_breaks = true, $xml = true) {
$string = str_replace(array('<p>', '</p>', '<br>', '<br />'), '', $string);
// It is conceivable that people might still want single line-breaks
// without breaking into a new paragraph.
if ($line_breaks == true)
return '<p>'.preg_replace(array("/([\n]{2,})/i", "/([^>])\n([^<])/i"), array("</p>\n<p>", '<br'.($xml == true ? ' /' : '').'>'), trim($string)).'</p>';
else
return '<p>'.preg_replace(
array("/([\n]{2,})/i", "/([\r\n]{3,})/i","/([^>])\n([^<])/i"),
array("</p>\n<p>", "</p>\n<p>", '<br'.($xml == true ? ' /' : '').'>'),
trim($string)).'</p>';
}
The problem is that whenever I try to create a single line break, it inadvertently removes the first character of the paragraph below it. I'm not familiar enough with regex to understand what is causing the problem.

Here is another approach that doesn't use regular expressions. Note, this function will remove any single line-breaks.
function nl2p($string)
{
$paragraphs = '';
foreach (explode("\n", $string) as $line) {
if (trim($line)) {
$paragraphs .= '<p>' . $line . '</p>';
}
}
return $paragraphs;
}
If you only need to do this once in your app and don't want to create a function, it can easily be done inline:
<?php foreach (explode("\n", $string) as $line): ?>
<?php if (trim($line)): ?>
<p><?=$line?></p>
<?php endif ?>
<?php endforeach ?>

The problem is with your match for single line breaks. It matches the last character before the line break and the first after. Then you replace the match with <br>, so you lose those characters as well. You need to keep them in the replacement.
Try this:
function nl2p($string, $line_breaks = true, $xml = true) {
$string = str_replace(array('<p>', '</p>', '<br>', '<br />'), '', $string);
// It is conceivable that people might still want single line-breaks
// without breaking into a new paragraph.
if ($line_breaks == true)
return '<p>'.preg_replace(array("/([\n]{2,})/i", "/([^>])\n([^<])/i"), array("</p>\n<p>", '$1<br'.($xml == true ? ' /' : '').'>$2'), trim($string)).'</p>';
else
return '<p>'.preg_replace(
array("/([\n]{2,})/i", "/([\r\n]{3,})/i","/([^>])\n([^<])/i"),
array("</p>\n<p>", "</p>\n<p>", '$1<br'.($xml == true ? ' /' : '').'>$2'),
trim($string)).'</p>';
}
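A related detail: because the pattern consumes the characters on both sides of the line break, two single breaks separated by one character ("a\nb\nc") cannot both match, even with $1 and $2 restored. Lookarounds check the neighbouring characters without consuming them, which sidesteps both problems. The sketch below is an illustration only; single_breaks_to_br is a made-up name, and it covers just the single-line-break step, not the paragraph wrapping.

```php
<?php
// Hypothetical helper: turn single line breaks into <br> without consuming neighbours.
function single_breaks_to_br(string $text, bool $xml = true): string
{
    $br = $xml ? '<br />' : '<br>';
    // (?<=...) and (?=...) only look at the adjacent characters; they are not part
    // of the match, so nothing is lost and adjacent matches cannot overlap.
    // Excluding \n on both sides leaves double line breaks for the paragraph rule.
    return preg_replace('/(?<=[^>\n])\n(?=[^<\n])/', $br, $text);
}

echo single_breaks_to_br("a\nb\nc");
```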

I also wrote a very simple version:
function nl2p($text)
{
// Double quotes are required here: in single quotes, '\n' is a literal backslash followed by "n".
return '<p>' . str_replace(["\r\n", "\r", "\n"], '</p><p>', $text) . '</p>';
}

@Laurent's answer wasn't working for me - the else statement was doing what the $line_breaks == true statement should have been doing, and it was making multiple line breaks into <br> tags, which PHP's native nl2br() already does.
Here's what I managed to get working with the expected behavior:
function nl2p( $string, $line_breaks = true, $xml = true ) {
// Remove current tags to avoid double-wrapping.
$string = str_replace( array( '<p>', '</p>', '<br>', '<br />' ), '', $string );
// Default: Use <br> for single line breaks, <p> for multiple line breaks.
if ( $line_breaks == true ) {
$string = '<p>' . preg_replace(
array( "/([\n]{2,})/i", "/([\r\n]{3,})/i", "/([^>])\n([^<])/i" ),
array( "</p>\n<p>", "</p>\n<p>", '$1<br' . ( $xml == true ? ' /' : '' ) . '>$2' ),
trim( $string ) ) . '</p>';
// Use <p> for all line breaks if $line_breaks is set to false.
} else {
$string = '<p>' . preg_replace(
array( "/([\n]{1,})/i", "/([\r]{1,})/i" ),
"</p>\n<p>",
trim( $string ) ) . '</p>';
}
// Remove empty paragraph tags.
$string = str_replace( '<p></p>', '', $string );
// Return string.
return $string;
}

Here's an approach that comes with a reverse method to replace paragraphs back to regular line breaks and vice versa.
These are useful when building a form input. When saving a user's input you may want to convert line breaks to paragraph tags; however, when editing the text in a form, you may not want the user to see any HTML. In that case we convert the paragraphs back to line breaks.
// This function will convert newlines to HTML paragraphs
// without paying attention to HTML tags. Feed it a raw string and it will
// simply return that string sectioned into HTML paragraphs
function nl2p($str) {
$arr=explode("\n",$str);
$out='';
for($i=0;$i<count($arr);$i++) {
if(strlen(trim($arr[$i]))>0)
$out.='<p>'.trim($arr[$i]).'</p>';
}
return $out;
}
// Return paragraph tags back to line breaks
function p2nl($str)
{
$str = preg_replace("/<p[^>]*?>/", "", $str);
$str = str_replace("</p>", "\r\n", $str);
return $str;
}

Expanding upon @NaturalBornCamper's solution:
function nl2p( $text, $class = '' ) {
$string = str_replace( array( "\r\n\r\n", "\n\n" ), '</p><p>', $text);
$string = str_replace( array( "\r\n", "\n" ), '<br />', $string);
return '<p' . ( $class ? ' class="' . $class . '"' : '' ) . '>' . $string . '</p>';
}
This handles both cases: double line breaks are converted to paragraphs, and single line breaks are converted to <br />.

Just type this between your lines:
echo '<br>';
This will give you a new line.

Find URLs, @replies and #hashtags from Tweets

I'm building a little Twitter thing in PHP and I'm trying to parse URLs, @replies and #hashtags and make them into clickable links.
The @replies would link to http://twitter.com/replies
Hashtags would link to http://search.twitter.com/search?q=%23hashtags
I've found a class for parsing URLs and I'm wondering if it could also be used to parse @replies and #hashtags:
// http://josephscott.org/archives/2008/11/makeitlink-detecting-urls-in-text-and-making-them-links/
class MakeItLink {
protected function _link_www( $matches ) {
$url = $matches[2];
$url = MakeItLink::cleanURL( $url );
if( empty( $url ) ) {
return $matches[0];
}
return "{$matches[1]}<a href='{$url}'>{$url}</a>";
}
public function cleanURL( $url ) {
if( $url == '' ) {
return $url;
}
$url = preg_replace( "|[^a-z0-9-~+_.?#=!&;,/:%@$*'()\\x80-\\xff]|i", '', $url );
$url = str_replace( array( "%0d", "%0a" ), '', $url );
$url = str_replace( ";//", "://", $url );
/* If the URL doesn't appear to contain a scheme, we
* presume it needs http:// appended (unless a relative
* link starting with / or a php file).
*/
if(
strpos( $url, ":" ) === false
&& substr( $url, 0, 1 ) != "/"
&& !preg_match( "|^[a-z0-9-]+?.php|i", $url )
) {
$url = "http://{$url}";
}
// Replace ampersans and single quotes
$url = preg_replace( "|&([^#])(?![a-z]{2,8};)|", "&#038;$1", $url );
$url = str_replace( "'", "&#039;", $url );
return $url;
}
public function transform( $text ) {
$text = " {$text}";
$text = preg_replace_callback(
'#(?<=[\s>])(\()?([\w]+?://(?:[\w\\x80-\\xff\#$%&~/\-=?@\[\](+]|[.,;:](?![\s<])|(?(1)\)(?![\s<])|\)))*)#is',
array( 'MakeItLink', '_link_www' ),
$text
);
$text = preg_replace( '#(<a( [^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i', "$1$3</a>", $text );
$text = trim( $text );
return $text;
}
}

I think what you're looking to do is essentially what I've included below. You'd add these two statements in the transform method, just before the return statement.
$text = preg_replace('#@(\w+)#', '<a href="http://twitter.com/$1">$0</a>', $text);
$text = preg_replace('/#(\w+)/', '<a href="http://search.twitter.com/search?q=%23$1">$0</a>', $text);
Is that what you're looking for?
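As a standalone illustration, the same idea can be sketched with lookbehind guards so that e-mail addresses and numeric HTML entities such as &#039; are not turned into links. The function name link_mentions_and_hashtags and the guards are assumptions added here; the link targets follow the URL patterns given in the question. A "#" that directly follows a "/" inside an already generated href would still be matched, so this sketch is best applied to text that has not been linkified yet.

```php
<?php
// Hypothetical helper: link @mentions and #hashtags in plain tweet text.
function link_mentions_and_hashtags(string $text): string
{
    // @name, unless preceded by a word character or "@" (e.g. inside user@example.com).
    $text = preg_replace(
        '/(?<![\w@])@(\w+)/',
        '<a href="http://twitter.com/$1">$0</a>',
        $text
    );
    // #tag, unless preceded by a word character or "&" (e.g. the entity &#039;).
    $text = preg_replace(
        '/(?<![\w&])#(\w+)/',
        '<a href="http://search.twitter.com/search?q=%23$1">$0</a>',
        $text
    );
    return $text;
}

echo link_mentions_and_hashtags('Hi @bob, see #php');
```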

Twitter recently open-sourced both Java and Ruby (gem) implementations of the code they use for finding user names, hashtags, lists and URLs.
It is very regular expression oriented.
