WordPress: Automatically change specific URLs in posts - php

I have found a solution to change links in my WordPress theme, but not the links in the content. How can I get the URLs in the content so that I can change them as well?
I need to use the content filter. But how can I change URLs like apple.com/test/, apple.com/test-123/, apple.com, microsoft.com, and microsoft.com/test/? The function should correctly change every matched URL in the content.
add_filter('the_content', 'function_name');
The answer to a similar question unfortunately doesn't work.
This is my working solution to change links, but not the links in the content.
add_filter('rh_post_offer_url_filter', 'link_change_custom');
function link_change_custom($offer_post_url){
$shops= array(
array('shop'=>'apple.com','id'=>'1234'),
array('shop'=>'microsoft.com','id'=>'5678'),
array('shop'=>'dell.com','id'=>'9876'),
);
foreach( $shops as $rule ) {
if (!empty($offer_post_url) && strpos($offer_post_url, $rule['shop']) !== false) {
$offer_post_url = 'https://www.network.com/promotion/click/id='.$rule['id'].'-yxz?param0='.rawurlencode($offer_post_url);
}
}
$shops2= array(
array('shop'=>'example.com','id'=>'1234'),
array('shop'=>'domain2.com','id'=>'5678'),
array('shop'=>'domain3','id'=>'9876'),
);
foreach( $shops2 as $rule ) {
if (!empty($offer_post_url) && strpos($offer_post_url, $rule['shop']) !== false) {
$offer_post_url = 'https://www.second-network.com/promotion/click/id='.$rule['id'].'-yxz?param0='.rawurlencode($offer_post_url);
}
}
return $offer_post_url;
}

If I understood you correctly, that is what you need
add_filter( 'the_content', 'replace_links_by_promotions' );
function replace_links_by_promotions( $content ) {
$shop_ids = array(
'apple.com' => '1234',
'microsoft.com' => '5678',
'dell.com' => '9876',
);
preg_match_all( '/https?:\/\/(www\.)?([-a-zA-Z0-9#:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6})\b([-a-zA-Z0-9()#:%_\+.~#?&\/=]*)/', $content, $matches, PREG_OFFSET_CAPTURE );
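// Walk the matches from last to first: the captured offsets refer to the original
// $content, so replacing from the end keeps the earlier offsets valid.
foreach ( array_reverse( $matches[2], true ) as $index => $match ) {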
if ( ! isset( $shop_ids[ $match[0] ] ) ) {
continue;
}
$offer_post_url = 'https://www.network.com/promotion/click/id=' . $shop_ids[ $match[0] ] . '-yxz?param0=' . rawurlencode( $matches[0][ $index ][0] );
$content = substr_replace( $content, $offer_post_url, $matches[0][ $index ][1], strlen( $matches[0][ $index ][0] ) );
}
return $content;
}

I think this works. Note that, as written, it will match every "apple.", "dell.", and "microsoft." link in every type of content that passes through the content filter (posts, pages, excerpts, many custom post types, etc.). If you don't want that, and you very well may not, the main replacement function will have to be made conditional and the regex more precisely targeted, and that can get complicated.
(Also, come to think of it, I'm not sure whether the quotes in the anchor tags that the regex finds will require special handling. If this doesn't work, we can look at that, too. Or maybe switch to a DOM parser, as maybe I should have done from the start.)
/** INITIATE FILTER FUNCTION **/
add_filter( 'the_content', 'wpso_change_urls' ) ;
/**
* PREG CALLBACK FUNCTION
* Match Matches to id #s
* and return replacement urls enclosed in quotes (as found)
*/
function wpso_found_urls( $matches ) {
//someone else probably has a v clever parsimonious way to do this next part
//but at least this makes what's happening easy to read
if ( strpos( $matches[0], 'apple' ) ) {
$id = '1234' ;
}
if ( strpos( $matches[0], 'microsoft' ) ) {
$id = '5678' ;
}
if ( strpos( $matches[0], 'dell' ) ) {
$id = '9876' ;
}
$raw_url = trim( $matches[0], '"' ) ;
return '"https://www.network.com/promotion/click/id='. $id .'-yxz?param0='.rawurlencode( $raw_url) . '"' ;
}
/** ENDURING A DREADFUL FATE USING REGEX TO PARSE HTML **/
function wpso_change_urls( $content ) {
$find_urls = array(
'/"+(http|https)(\:\/\/\S*apple.\S*")/',
'/"+(http|https)(\:\/\/\S*microsoft.\S*")/',
'/"+(http|https)(\:\/\/\S*dell.\S*")/',
);
return preg_replace_callback( $find_urls, 'wpso_found_urls', $content ) ;
}
[Example output and the original post-editor content are not preserved here. Note: the output example was captured before the quotes were trimmed from the "raw URL" prior to encoding.]
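The DOM parser alternative mentioned above could look roughly like the sketch below. It is not the answer author's code: the function name wpso_rewrite_links_dom and the wrapper id wpso-root are made up for illustration, while the shop IDs and the network URL are copied from the question. It assumes PHP's DOM extension is available and only rewrites href attributes of anchor tags, matching on the parsed host rather than on the raw text.

```php
<?php
// Sketch only: rewrite <a href> values with DOMDocument instead of a regex.
// wpso_rewrite_links_dom and the "wpso-root" wrapper id are hypothetical names.
function wpso_rewrite_links_dom( $content ) {
    $shop_ids = array(
        'apple.com'     => '1234',
        'microsoft.com' => '5678',
        'dell.com'      => '9876',
    );
    if ( trim( $content ) === '' ) {
        return $content;
    }

    $previous = libxml_use_internal_errors( true ); // silence warnings on HTML5 tags
    $doc      = new DOMDocument();
    // The XML declaration hints UTF-8; the wrapper div lets us extract the fragment again.
    $doc->loadHTML( '<?xml encoding="UTF-8"><div id="wpso-root">' . $content . '</div>' );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    foreach ( $doc->getElementsByTagName( 'a' ) as $anchor ) {
        $href = $anchor->getAttribute( 'href' );
        $host = parse_url( $href, PHP_URL_HOST );
        if ( ! is_string( $host ) ) {
            continue; // relative link, mailto:, or unparsable URL
        }
        $host = strtolower( $host );
        foreach ( $shop_ids as $shop => $id ) {
            // Match the shop domain itself or any subdomain of it (www., store., ...).
            $suffix = '.' . $shop;
            if ( $host === $shop || substr( $host, -strlen( $suffix ) ) === $suffix ) {
                $anchor->setAttribute(
                    'href',
                    'https://www.network.com/promotion/click/id=' . $id . '-yxz?param0=' . rawurlencode( $href )
                );
                break;
            }
        }
    }

    // Serialize only the children of the wrapper, not the html/body that libxml adds.
    $html = '';
    foreach ( $doc->getElementById( 'wpso-root' )->childNodes as $child ) {
        $html .= $doc->saveHTML( $child );
    }
    return $html;
}
```

In WordPress this would be hooked with add_filter( 'the_content', 'wpso_rewrite_links_dom' ). Because the host is parsed from the href, link text and URLs that merely contain "apple" somewhere in the path are left alone; the serializer may normalize the surrounding markup slightly.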

Might try using something like the_content filter to do this:
add_filter('the_content', function($content){
// filter $content and replace urls
$content = str_replace('http://old-url', 'http://new-url', $content);
return $content;
});
More: https://developer.wordpress.org/reference/hooks/the_content/

Related

PHP preg_match ignore within certain elements

I'm writing a regex where I need to filter content to format its typography. So far, my code seems to be filtering my content properly using preg_replace, but I can't figure out how to avoid this for content wrapped within certain tags, say <pre>.
As a reference, this is to be used within WordPress's the_content filter, so my current code looks like so:
function my_typography( $str ) {
$ignore_elements = array("code", "pre");
$rules = array(
"?" => array("before"=> " ", "after"=>""),
// the others are stripped out for simplicity
);
foreach($rules as $rule=>$params) {
// Pseudo :
// if( !in_array( $parent_tag, $ignore_elements) {
// /Pseudo
$formatted = $params['before'] . $rule . $params['after'];
$str = preg_replace( $rule, $formatted, $str );
// Pseudo :
// }
// /Pseudo
}
return $str;
}
add_filter( 'the_content', 'my_typography' );
Basically:
<p>Was this filtered? I hope so</p>
<pre>Was this filtered? I hope not.</pre>
should become
<p>Was this filtered ? I hope so</p>
<pre>Was this filtered? I hope not.</pre>
You need to wrap the search regex in regex delimiters in preg_replace, and you must call preg_quote to escape all special regex characters such as ?, ., *, + etc.:
$str = preg_replace( '~' . preg_quote($rule, '~') . '~', $formatted, $str );
Full Code:
function my_typography( $str ) {
$ignore_elements = array("code", "pre");
$rules = array(
"?" => array("before"=> " ", "after"=>""),
// the others are stripped out for simplicity
);
foreach($rules as $rule=>$params) {
// Pseudo :
// if( !in_array( $parent_tag, $ignore_elements) {
// /Pseudo
$formatted = $params['before'] . $rule . $params['after'];
$str = preg_replace( '~' . preg_quote($rule, '~') . '~', $formatted, $str );
// Pseudo :
// }
// /Pseudo
}
return $str;
}
Output:
<p>Was this filtered ? I hope so</p>
<pre>Was this filtered ? I hope not.</pre>
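As the output above shows, the preg_quote fix still rewrites the text inside <pre>. One possible way to leave the ignored elements untouched is to split the string on those elements first and apply the rule only to the pieces in between. The sketch below is an assumption-laden illustration, not part of the answer: the function name my_typography_skip_tags is hypothetical, only the "?" rule from the question is applied, and nested or mismatched tags are not handled.

```php
<?php
// Sketch: apply the "?" rule only outside <code>/<pre> blocks.
// my_typography_skip_tags is a hypothetical name, not from the original post.
function my_typography_skip_tags( $str, array $ignore_elements = array( 'code', 'pre' ) ) {
    $tags    = implode( '|', array_map( 'preg_quote', $ignore_elements ) );
    $pattern = '~(<(?:' . $tags . ')\b[^>]*>.*?</(?:' . $tags . ')>)~is';

    // With PREG_SPLIT_DELIM_CAPTURE the ignored blocks land on the odd indexes,
    // and the text between them on the even indexes.
    $parts = preg_split( $pattern, $str, -1, PREG_SPLIT_DELIM_CAPTURE );

    foreach ( $parts as $i => $part ) {
        if ( $i % 2 === 0 ) {
            $parts[ $i ] = preg_replace( '~' . preg_quote( '?', '~' ) . '~', ' ?', $part );
        }
    }
    return implode( '', $parts );
}
```

For anything beyond simple, well-formed markup, a DOM parser would be the more robust choice.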

PHP regex bug not respecting linebreak

Okay, so I've got something of a weird edge case bug that I can't seem to squash.
I've got a textarea form input where users can type status updates. I've built a method to parse through this and autolink http-links (except for a few domains where I use the Essence library to do some oEmbed magic).
But in a very specific edge case the autolinking breaks completely.
Specifically, this happens when there's a URL pointing to a subdirectory, without a trailing slash, and the user presses return immediately after the URL and keeps typing on the new line.
When this happens the first word on the new line is included in the url being matched.
The function looks like this:
function autolink( $text, $attributes=array() ) {
$regex = "/(http|https)\:\/\/[a-z0-9\-\.]+\.[a-z0-9]{2,99}(\/\S*)?/i";
$urls = array();
if( preg_match_all( $regex, $text, $urls, PREG_PATTERN_ORDER ) ) {
foreach($urls[0] as $url) {
$parsed_url = parse_url($url);
if( in_array( $parsed_url['host'], array( 'youtube.com', 'vimeo.com', 'soundcloud.com', 'www.youtube.com', 'www.vimeo.com', 'www.soundcloud.com' ) ) ) {
$essence = Essence\Essence::instance();
$media = $essence->embed( $url );
$text = str_replace($url, '<div class="embed-container">'.$media->html.'</div>', $text);
} else {
$attrs = '';
foreach( $attributes as $attribute => $value ) {
$attrs .= " {$attribute}=\"{$value}\"";
}
$text = str_replace($url,'<a href="'.$url.'"'.$attrs.'>'.$url.'</a>', $text);
}
}
}
$text = '<pre>'.print_r($urls, true).'</pre>'.$text;
$text = trim( $text );
return $text;
}

Need help with preg_replace interpreting {variables} with parameters

I want to replace
{youtube}Video_ID_Here{/youtube}
with the embed code for a youtube video.
So far I have
preg_replace('/{youtube}(.*){\/youtube}/iU',...)
and it works just fine.
But now I'd like to be able to interpret parameters like height, width, etc. So could I have one regex for this, whether it does or doesn't have parameters? It should be able to interpret all of these below...
{youtube height="200px" width="150px" color1="#eee" color2="rgba(0,0,0,0.5)"}Video_ID_Here{/youtube}
{youtube height="200px"}Video_ID_Here{/youtube}
{youtube}Video_ID_Here{/youtube}
{youtube width="150px" showborder="1"}Video_ID_Here{/youtube}
Try this:
function createEmbed($videoID, $params)
{
// $videoID contains the videoID between {youtube}...{/youtube}
// $params is an array of key value pairs such as height => 200px
return 'HTML...'; // embed code
}
if (preg_match_all('/\{youtube(.*?)\}(.+?)\{\/youtube\}/', $string, $matches)) {
foreach ($matches[0] as $index => $youtubeTag) {
$params = array();
// break out the attributes
if (preg_match_all('/\s([a-z0-9]+)="([^\s]+?)"/', $matches[1][$index], $rawParams)) {
for ($x = 0; $x < count($rawParams[0]); $x++) {
$params[$rawParams[1][$x]] = $rawParams[2][$x];
}
}
// replace {youtube}...{/youtube} with embed code
$string = str_replace($youtubeTag, createEmbed($matches[2][$index], $params), $string);
}
}
This code matches the {youtube}...{/youtube} tags first and then splits the attributes out into an array, passing both them (as key/value pairs) and the video ID to a function. Just fill in the function definition to make it validate the params you want to support and build up the appropriate HTML code.
You probably want to use preg_replace_callback, as the replacing can get quite convoluted otherwise.
preg_replace_callback('/{youtube(.*)}(.*){\/youtube}/iU',...)
And in your callback, check $match[1] for something like the /(width|showborder|height|color1)="([^"]+)"/i pattern. A simple preg_match_all inside a preg_replace_callback keeps all portions nice & tidy and above all legible.
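A minimal sketch of that preg_replace_callback approach follows. The function names render_youtube_tags and build_youtube_embed, the whitelist contents, and the placeholder div markup are all assumptions for illustration; the real embed HTML is not given in the thread and would have to be filled in.

```php
<?php
// Sketch: one regex for tags with or without parameters, parsed in a callback.
// render_youtube_tags / build_youtube_embed are hypothetical names.
function render_youtube_tags( $text ) {
    $allowed = array( 'width', 'height', 'showborder', 'color1', 'color2' );

    return preg_replace_callback(
        '/\{youtube(.*?)\}(.*?)\{\/youtube\}/is',
        function ( $m ) use ( $allowed ) {
            $params = array();
            // $m[1] holds the raw attribute string, $m[2] the video ID.
            if ( preg_match_all( '/([a-z0-9]+)="([^"]*)"/i', $m[1], $pairs, PREG_SET_ORDER ) ) {
                foreach ( $pairs as $pair ) {
                    $name = strtolower( $pair[1] );
                    if ( in_array( $name, $allowed, true ) ) {
                        $params[ $name ] = $pair[2]; // unknown parameters are dropped
                    }
                }
            }
            return build_youtube_embed( trim( $m[2] ), $params );
        },
        $text
    );
}

// Placeholder markup only: swap in the actual embed code here.
function build_youtube_embed( $video_id, array $params ) {
    $attrs = '';
    foreach ( $params as $name => $value ) {
        $attrs .= ' data-' . $name . '="' . htmlspecialchars( $value, ENT_QUOTES ) . '"';
    }
    return '<div class="youtube" data-id="' . htmlspecialchars( $video_id, ENT_QUOTES ) . '"' . $attrs . '></div>';
}
```

Whitelisting parameter names and escaping every value on output keeps unexpected attributes from reaching the generated HTML.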
I would do it something like this:
preg_match_all("/{youtube(.*?)}(.*?){\/youtube}/is", $content, $matches);
for($i=0;$i<count($matches[0]);$i++)
{
$params = $matches[1][$i];
$youtubeurl = $matches[2][$i];
$paramsout = array();
if(preg_match("/height\s*=\s*('|\")([0-9]+px)('|\")/i", $params, $match))
{
$paramsout[] = "height=\"{$match[2]}\"";
}
//process others
//setup new code
$tagcode = "<object ..." . implode(" ", $paramsout) ."... >"; //I don't know what the code is to display a youtube video
//replace original tag
$content = str_replace($matches[0][$i], $tagcode, $content);
}
You could just look for params after "{youtube" and before "}", but you open yourself up to XSS problems. The best way would be to look for a specific set of parameters and verify them. Don't allow things like < and > to be passed inside your tags, as someone could put do_something_nasty(); or something.
I'd not use regex at all, since they are notoriously bad at parsing markup.
Since your input format is so close to HTML/XML in the first place, I'd rely on that
$tests = array(
'{youtube height="200px" width="150px" color1="#eee" color2="rgba(0,0,0,0.5)"}Video_ID_Here{/youtube}'
, '{youtube height="200px"}Video_ID_Here{/youtube}'
, '{youtube}Video_ID_Here{/youtube}'
, '{youtube width="150px" showborder="1"}Video_ID_Here{/youtube}'
, '{YOUTUBE width="150px" showborder="1"}Video_ID_Here{/youtube}' // deliberately invalid
);
echo '<pre>';
foreach ( $tests as $test )
{
try {
$youtube = SimpleXMLYoutubeElement::fromUserInput( $test );
print_r( $youtube );
}
catch ( Exception $e )
{
echo $e->getMessage() . PHP_EOL;
}
}
echo '</pre>';
class SimpleXMLYoutubeElement extends SimpleXMLElement
{
public static function fromUserInput( $code )
{
$xml = @simplexml_load_string(
str_replace( array( '{', '}' ), array( '<', '>' ), strip_tags( $code ) ), __CLASS__
);
if ( !$xml || 'youtube' != $xml->getName() )
{
throw new Exception( 'Invalid youtube element' );
}
return $xml;
}
public function toEmbedCode()
{
// write code to convert this to proper embed code
}
}

Strip tags that are NOT in a [code] [/code] segment

I'm trying to find a way to strip tags from a user-inputted string except from tags that are wrapped in the [code] [/code] BB style tag.
For example, a user may enter this:
<script>alert("hacked");</script>
[code]<script>alert("hello");</script>[/code]
What I would like is the "hacked" alert to be removed, but not the "Hello" alert.
I would like to remove ALL tags (php, html, css, js) outside of the [code] but allow anything within them.
So far, I've got the following code to do the reverse of what I would like:
preg_replace('/\[code\](.*?)\[\/code\]/ise','strip_tags(\'$1\')',$code)
I'm not sure if this is the best algorithm, but here's an idea.
1. Remove all the [code] blocks into an array
2. Strip tags from the remaining string
3. Re-insert the previously removed [code] blocks.
Voila!
Here's a stab at that algo
<?php
header( 'Content-Type: text/plain' );
$input = <<<BB
[code]<script>alert("hello");</script>[/code]
some text <script>alert("hacked");</script> some other text
[code]<script>alert("hello");</script>[/code]
some text <script>alert("hacked");</script> some other text
[code]<script>alert("hello");</script>[/code]
BB;
echo strip_custom( $input );
function strip_custom( $content )
{
$pattern = "#\\[code].*?\\[/code]#i";
if ( preg_match_all( $pattern, $content, $codeBlocks ) )
{
return array_join( $codeBlocks[0], array_map( 'strip_tags', preg_split( $pattern, $content ) ) );
}
return strip_tags( $content );
}
function array_join( array $glue, array $pieces )
{
$glue = array_values( $glue );
$pieces = array_values( $pieces );
$piecesSize = count( $pieces );
if ( count( $glue ) + 1 != $piecesSize )
{
return false;
}
$joined = array();
for ( $i = 0; $i < $piecesSize; $i++ )
{
$joined[] = $pieces[$i];
if ( isset( $glue[$i] ) )
{
$joined[] = $glue[$i];
}
}
return implode( '', $joined );
}
This is where regular expressions are not ideal. Regular expressions are superb when you know "what you want" but not "what you don't want". My suggestion is that you try to find an alternative way of doing the same thing, but without regular expressions.
You want to use an HTML Parser for this job.
I don't know PHP but Google found this HTML Parser for PHP.
Use a simple parser like this:
stack-pointer = 0
while not finished:
stack-pointer-n = code-start-matched or endl
tag-free-str = regex-magic-to-strip-tags(extract-str(stack-pointer, stack-pointer-n))
preserve-str = extract-str(stack-pointer-n, code-endl-matched or endl)
stack-pointer = code-endl-matched + 1
push(tag-free-str)
push(preserve-str)
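The pseudocode above could be written in PHP roughly as follows. This is a sketch under stated assumptions: the function name strip_tags_outside_code is hypothetical, [code] blocks are assumed not to nest, and an unclosed [code] is treated as running to the end of the input (you may prefer to strip it instead).

```php
<?php
// Sketch of the scanner: strip tags between [code] blocks, keep the blocks verbatim.
// strip_tags_outside_code is a hypothetical name.
function strip_tags_outside_code( $input ) {
    $open  = '[code]';
    $close = '[/code]';
    $out   = '';
    $pos   = 0;
    $len   = strlen( $input );

    while ( $pos < $len ) {
        $start = stripos( $input, $open, $pos );
        if ( $start === false ) {
            // No more [code] blocks: strip the remainder and stop.
            $out .= strip_tags( substr( $input, $pos ) );
            break;
        }
        // Text before the block gets its tags stripped.
        $out .= strip_tags( substr( $input, $pos, $start - $pos ) );

        // The block itself is copied through untouched.
        $end = stripos( $input, $close, $start );
        $end = ( $end === false ) ? $len : $end + strlen( $close );
        $out .= substr( $input, $start, $end - $start );

        $pos = $end;
    }
    return $out;
}
```

Note that strip_tags removes the tags but keeps the text between them, so the "hacked" script body survives as plain text rather than disappearing entirely.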

PHP parse links/emails

I am wondering if there is a simple snippet which converts links of any kind:
http://www.cnn.com to http://www.cnn.com
cnn.com to cnn.com
www.cnn.com to www.cnn.com
abc@def.com to mailto:abc@def.com
I do not want to use any PHP5 specific library.
Thank you for your time.
UPDATE I have updated the above text to what I want to convert it to. Please note that the href tag and the text are different for cases 2 and 3.
UPDATE2 How does Gmail chat do it? Theirs is pretty smart and works only for real domain names, e.g. a.ly works but a.cb does not.
Yes:
http://www.gidforums.com/t-1816.html
<?php
/**
NAME : autolink()
VERSION : 1.0
AUTHOR : J de Silva
DESCRIPTION : returns VOID; handles converting
URLs into clickable links off a string.
TYPE : functions
======================================*/
function autolink( &$text, $target='_blank', $nofollow=true )
{
// grab anything that looks like a URL...
$urls = _autolink_find_URLS( $text );
if( !empty($urls) ) // i.e. there were some URLS found in the text
{
array_walk( $urls, '_autolink_create_html_tags', array('target'=>$target, 'nofollow'=>$nofollow) );
$text = strtr( $text, $urls );
}
}
function _autolink_find_URLS( $text )
{
// build the patterns
$scheme = '(http:\/\/|https:\/\/)';
$www = 'www\.';
$ip = '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}';
$subdomain = '[-a-z0-9_]+\.';
$name = '[a-z][-a-z0-9]+\.';
$tld = '[a-z]+(\.[a-z]{2,2})?';
$the_rest = '\/?[a-z0-9._\/~#&=;%+?-]+[a-z0-9\/#=?]{1,1}';
$pattern = "$scheme?(?(1)($ip|($subdomain)?$name$tld)|($www$name$tld))$the_rest";
$pattern = '/'.$pattern.'/is';
$c = preg_match_all( $pattern, $text, $m );
unset( $text, $scheme, $www, $ip, $subdomain, $name, $tld, $the_rest, $pattern );
if( $c )
{
return( array_flip($m[0]) );
}
return( array() );
}
function _autolink_create_html_tags( &$value, $key, $other=null )
{
$target = $nofollow = null;
if( is_array($other) )
{
$target = ( $other['target'] ? " target=\"$other[target]\"" : null );
// see: http://www.google.com/googleblog/2005/01/preventing-comment-spam.html
$nofollow = ( $other['nofollow'] ? ' rel="nofollow"' : null );
}
$value = "<a href=\"$key\"$target$nofollow>$key</a>";
}
?>
Try this out. (for links not email)
$newTweet = preg_replace('!http://([a-zA-Z0-9./-]+[a-zA-Z0-9/-])!i', '<a href="\\0">\\0</a>', $tweet->text);
I know this is 5 years late, but I needed a similar solution, and the best answer I found was from the user erwan-dupeux-maire:
Answer
I wrote this function. It replaces all the links in a string. Links can be in the following formats:
www.example.com
http://example.com
https://example.com
example.fr
The second argument is the target for the link ('_blank', '_top'... can be set to false). Hope it helps...
public static function makeLinks($str, $target='_blank')
{
if ($target)
{
$target = ' target="'.$target.'"';
}
else
{
$target = '';
}
// find and replace link
$str = preg_replace('#((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)#', '<a href="$1" '.$target.'>$1</a>', $str);
// add "http://" if not set
$str = preg_replace('/<a\s[^>]*href\s*=\s*"((?!https?:\/\/)[^"]*)"[^>]*>/i', '<a href="http://$1" '.$target.'>', $str);
return $str;
}
Here's the email snippet:
$email = "abc@def.com";
$pos = strrpos($email, "@");
if ($pos !== false) {
// This is an email address!
$email = "mailto:" . $email;
}
What exactly are you looking to do with the links? Strip the www or http? Or add http://www to any link if required?
