PHP regex bug not respecting linebreak

Okay, so I've got something of a weird edge-case bug that I can't seem to squash.
I've got a textarea form input where users can type status updates. I've built a method that parses this text and autolinks HTTP links (except for a few domains where I use the Essence library to do some oEmbed magic).
But in one very specific edge case the autolinking completely breaks.
Specifically, it happens when there's a URL to a subdirectory, without a trailing slash, and immediately after the URL the user hits return and keeps typing on a new line.
When this happens, the first word on the new line is included in the matched URL.
The function looks like this:
function autolink( $text, $attributes=array() ) {
$regex = "/(http|https)\:\/\/[a-z0-9\-\.]+\.[a-z0-9]{2,99}(\/\S*)?/i";
$urls = array();
if( preg_match_all( $regex, $text, $urls, PREG_PATTERN_ORDER ) ) {
foreach($urls[0] as $url) {
$parsed_url = parse_url($url);
if( in_array( $parsed_url['host'], array( 'youtube.com', 'vimeo.com', 'soundcloud.com', 'www.youtube.com', 'www.vimeo.com', 'www.soundcloud.com' ) ) ) {
$essence = Essence\Essence::instance();
$media = $essence->embed( $url );
$text = str_replace($url, '<div class="embed-container">'.$media->html.'</div>', $text);
} else {
$attrs = '';
foreach( $attributes as $attribute => $value ) {
$attrs .= " {$attribute}=\"{$value}\"";
}
$text = str_replace($url,'<a href="'.$url.'"'.$attrs.'>'.$url.'</a>', $text);
}
}
}
$text = '<pre>'.print_r($urls, true).'</pre>'.$text;
$text = trim( $text );
return $text;
}
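A quick way to narrow this down is to run the pattern on its own, outside the function. The sketch below uses made-up sample strings (not from the original post) and compares a real line break with an escaped one. It rests on an assumption that is not confirmed anywhere in the question: that the text might reach autolink() after some escaping step has turned the line break into a literal backslash followed by "n".

```php
<?php
// Same pattern as in autolink() above.
$regex = "/(http|https)\:\/\/[a-z0-9\-\.]+\.[a-z0-9]{2,99}(\/\S*)?/i";

// Case 1: a real CR/LF after the URL. \S cannot match "\r" or "\n",
// so the match has to stop before the line break.
$real = "See http://example.com/sub\r\nnext line";
preg_match( $regex, $real, $m_real );

// Case 2 (assumption): the line break has been escaped upstream and is now
// a literal backslash plus "n". Single quotes keep it literal in PHP.
$escaped = 'See http://example.com/sub\nnext line';
preg_match( $regex, $escaped, $m_escaped );

var_dump( $m_real[0] );    // the URL only
var_dump( $m_escaped[0] ); // the URL plus the escaped break and the next word
```

If the first result stops at the line break, the regex itself handles real newlines correctly, and it is worth dumping the raw text right before it is passed to autolink() to see what the line break actually looks like at that point.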

Related

str_replace doesn't replace accurately

I am working on a multilingual WordPress website where I need to append a locale to every internal URL in a web page.
I have all of the content of the web page in the $content variable. Now:
preg_match_all( '/<a\s+(?:[^>]*?\s+)?href=([\"\'])(.*?)\1/', $content, $matches );
$localized_url_arr = [];
$url_arr = [];
if ( ! empty( $matches[2] ) ) {
$current_locale = get_current_locale();
foreach ( $matches[2] as $url ) {
if ( preg_match( '/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/si', $url ) ) {
continue;
}
$new_url = add_locale_to_url( $url, $current_locale ); // this is adding locale to url eg => www.example.com --> www.example.com/us for us locale
if ( $new_url !== $url ) {
$localized_url_arr[] = [
$url => $new_url
];
}
}
}
$arr = array_merge(...$localized_url_arr);
$content = str_replace( array_keys($arr), array_values($arr), $content );
Ideally this should only change the URLs that don't already have a locale in them, but it is appending the locale to all of the URLs. $arr contains only the URLs that need a locale appended, yet my str_replace changes every URL in the $matches[2] array.

When you replace www.example.com with www.example.com/us, it will replace it anywhere it appears, even if there's already /us after it.
You can use a regular expression with a negative lookahead to replace a string only if it's not followed by some other pattern.
preg_match_all( '/<a\s+(?:[^>]*?\s+)?href=([\"\'])(.*?)\1/', $content, $matches );
$localized_url_arr = [];
if ( ! empty( $matches[2] ) ) {
    $current_locale = get_current_locale();
    foreach ( $matches[2] as $url ) {
        // Skip email addresses.
        if ( preg_match( '/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/si', $url ) ) {
            continue;
        }
        $new_url = add_locale_to_url( $url, $current_locale ); // this is adding locale to url eg => www.example.com --> www.example.com/us for us locale
        if ( $new_url !== $url ) {
            // Escape the URL for use in a regex and match it only when it is not followed by /us/.
            $url_pattern = '#' . preg_quote( $url, '#' ) . '(?!/us/)#si';
            $localized_url_arr[ $url_pattern ] = $new_url;
        }
    }
    // The array keys are the patterns, the values are the localized URLs.
    $content = preg_replace( array_keys( $localized_url_arr ), array_values( $localized_url_arr ), $content );
}
The regular expression matches each URL unless it's followed by /us/, and will replace it with $new_url.
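To see the lookahead in isolation, here is a minimal sketch with made-up content. The sample HTML, the example.com URLs, and the hardcoded "us" locale are assumptions for illustration only; no WordPress functions are involved.

```php
<?php
// Hypothetical sample content; "us" is the assumed locale segment.
$content = '<a href="http://www.example.com/about">About</a> '
         . '<a href="http://www.example.com/us/contact">Contact</a>';
$url     = 'http://www.example.com';

// preg_quote() escapes the dots in the URL (and the "#" delimiter, if present).
// (?!/us/) makes the match fail when the URL is already followed by /us/.
$pattern = '#' . preg_quote( $url, '#' ) . '(?!/us/)#i';

$localized = preg_replace( $pattern, $url . '/us', $content );
echo $localized, "\n";
```

Only the first link gets the locale added; the second one already contains /us/ and is left alone. A plain str_replace would have rewritten both.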

Wordpress: Automatically change specific URLs in posts

I have found a solution to change links in my WordPress theme, but not the links in the content. How can I get the URLs in the content, so I can also change them?
I need to use the content filter. But how can I change URLs like apple.com/test/, apple.com/test-123/, apple.com, microsoft.com, microsoft.com/test/? The function should also correctly change every matched URL in the content.
add_filter('the_content ', 'function_name');
The answer to a similar question unfortunately doesn't work.
This is my working solution to change links, but not the links in the content.
add_filter('rh_post_offer_url_filter', 'link_change_custom');
function link_change_custom($offer_post_url){
$shops= array(
array('shop'=>'apple.com','id'=>'1234'),
array('shop'=>'microsoft.com','id'=>'5678'),
array('shop'=>'dell.com','id'=>'9876'),
);
foreach( $shops as $rule ) {
if (!empty($offer_post_url) && strpos($offer_post_url, $rule['shop']) !== false) {
$offer_post_url = 'https://www.network.com/promotion/click/id='.$rule['id'].'-yxz?param0='.rawurlencode($offer_post_url);
}
}
$shops2= array(
array('shop'=>'example.com','id'=>'1234'),
array('shop'=>'domain2.com','id'=>'5678'),
array('shop'=>'domain3','id'=>'9876'),
);
foreach( $shops2 as $rule ) {
if (!empty($offer_post_url) && strpos($offer_post_url, $rule['shop']) !== false) {
$offer_post_url = 'https://www.second-network.com/promotion/click/id='.$rule['id'].'-yxz?param0='.rawurlencode($offer_post_url);
}
}
return $offer_post_url;
}

If I understood you correctly, that is what you need
add_filter( 'the_content', 'replace_links_by_promotions' );
function replace_links_by_promotions( $content ) {
$shop_ids = array(
'apple.com' => '1234',
'microsoft.com' => '5678',
'dell.com' => '9876',
);
preg_match_all( '/https?:\/\/(www\.)?([-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6})\b([-a-zA-Z0-9()@:%_\+.~#?&\/=]*)/', $content, $matches, PREG_OFFSET_CAPTURE );
// Walk the matches from last to first so the captured offsets stay valid after each replacement.
foreach ( array_reverse( $matches[2], true ) as $index => $match ) {
if ( ! isset( $shop_ids[ $match[0] ] ) ) {
continue;
}
$offer_post_url = 'https://www.network.com/promotion/click/id=' . $shop_ids[ $match[0] ] . '-yxz?param0=' . rawurlencode( $matches[0][ $index ][0] );
$content = substr_replace( $content, $offer_post_url, $matches[0][ $index ][1], strlen( $matches[0][ $index ][0] ) );
}
return $content;
}

I think this works. Note that, as written, it will match every "apple.", "dell.", and "microsoft." link in every type of content that uses the content filter - posts, pages, excerpts, many custom post types, etc. - so, if you don't really want that, and you very well may not, then the main replacement function will have to be conditionalized, and the regex function more precisely targeted..., and that can get complicated.
(Also, come to think of it, I'm not sure whether the quotes in the anchor tags that the Regex finds will require special handling. If this doesn't work, we can look at that, too. Or maybe switch to a DOM parser, like maybe I should have started out by doing... )
/** INITIATE FILTER FUNCTION **/
add_filter( 'the_content', 'wpso_change_urls' ) ;
/**
* PREG CALLBACK FUNCTION
* Match Matches to id #s
* and return replacement urls enclosed in quotes (as found)
*/
function wpso_found_urls( $matches ) {
//someone else probably has a v clever parsimonious way to do this next part
//but at least this makes what's happening easy to read
if ( strpos( $matches[0], 'apple' ) ) {
$id = '1234' ;
}
if ( strpos( $matches[0], 'microsoft' ) ) {
$id = '5678' ;
}
if ( strpos( $matches[0], 'dell' ) ) {
$id = '9876' ;
}
$raw_url = trim( $matches[0], '"' ) ;
return '"https://www.network.com/promotion/click/id='. $id .'-yxz?param0='.rawurlencode( $raw_url) . '"' ;
}
/** ENDURING A DREADFUL FATE USING REGEX TO PARSE HTML **/
function wpso_change_urls( $content ) {
$find_urls = array(
'/"+(http|https)(\:\/\/\S*apple.\S*")/',
'/"+(http|https)(\:\/\/\S*microsoft.\S*")/',
'/"+(http|https)(\:\/\/\S*dell.\S*")/',
);
return preg_replace_callback( $find_urls, 'wpso_found_urls', $content ) ;
}
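For comparison, the DOM parser route mentioned above could look roughly like this. It is only a sketch under a few assumptions: the function name rewrite_shop_links() is made up, the PHP DOM extension is available, the content is ASCII or otherwise handled for encoding, and only href attributes of anchor tags should be rewritten. In WordPress it would be called from a the_content filter.

```php
<?php
// Hypothetical helper; $shop_ids maps a host name (without "www.") to a network id.
function rewrite_shop_links( string $html, array $shop_ids ): string {
    $doc = new DOMDocument();
    libxml_use_internal_errors( true ); // keep warnings about loose HTML quiet
    // Wrap in a <div> so there is a single root; the flags stop libxml
    // from adding <html>/<body> and a doctype around the fragment.
    $doc->loadHTML( '<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD );
    libxml_clear_errors();

    foreach ( $doc->getElementsByTagName( 'a' ) as $a ) {
        $href = $a->getAttribute( 'href' );
        $host = parse_url( $href, PHP_URL_HOST );
        if ( ! is_string( $host ) ) {
            continue; // relative link or unparsable URL
        }
        $host = preg_replace( '/^www\./i', '', $host );
        if ( isset( $shop_ids[ $host ] ) ) {
            $a->setAttribute(
                'href',
                'https://www.network.com/promotion/click/id=' . $shop_ids[ $host ]
                    . '-yxz?param0=' . rawurlencode( $href )
            );
        }
    }

    // Serialize the children of the wrapper <div>, not the wrapper itself.
    $out = '';
    foreach ( $doc->documentElement->childNodes as $child ) {
        $out .= $doc->saveHTML( $child );
    }
    return $out;
}

$html = '<p>Buy at <a href="https://www.apple.com/test/">Apple</a> or <a href="https://example.org/">elsewhere</a>.</p>';
echo rewrite_shop_links( $html, array( 'apple.com' => '1234' ) ), "\n";
```

Because the host is compared exactly, a link such as https://pineapple.com/ is not touched, which the strpos-based check cannot guarantee.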

Might try using something like the_content filter to do this:
add_filter('the_content', function($content){
// filter $content and replace urls
$content = str_replace('http://old-url', 'http://new-url', $content);
return $content;
});
More: https://developer.wordpress.org/reference/hooks/the_content/

how to skip images with certain class in wordpress function

I have the following function in my theme's function page. basically what it does is look for any image in the post page and add some spans with css to dynamically create a pinterest button.
function insert_pinterest($content) {
global $post;
$posturl = urlencode(get_permalink()); //Get the post URL
$pinspan = '<span class="pinterest-button">';
$pinurlNew = '<a href="#" onclick="window.open("http://pinterest.com/pin/create/button/?url='.$posturl.'&media=';
$pindescription = '&description='.urlencode(get_the_title());
$options = '","Pinterest","scrollbars=no,menubar=no,width=600,height=380,resizable=yes,toolbar=no,location=no,status=no';
$pinfinish = '");return false;" class="pin-it"></a>';
$pinend = '</span>';
$pattern = '/<img(.*?)src="(.*?).(bmp|gif|jpeg|jpg|png)"(.*?) \/>/i';
$replacement = $pinspan.$pinurlNew.'$2.$3'.$pindescription.$options.$pinfinish.'<img$1src="$2.$3" $4 />'.$pinend;
$content = preg_replace( $pattern, $replacement, $content );
//Fix the link problem
$newpattern = '/<a(.*?)><span class="pinterest-button"><a(.*?)><\/a><img(.*?)\/><\/span><\/a>/i';
$replacement = '<span class="pinterest-button"><a$2></a><a$1><img$3\/></a></span>';
$content = preg_replace( $newpattern, $replacement, $content );
return $content;
}
add_filter( 'the_content', 'insert_pinterest' );
It does everything just fine, but is there a way to have it skip over an image with a certain class name in it, like "noPin"?

I would use preg_replace_callback to check if a matched image contains noPin.
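// Replaces the first preg_replace() call inside insert_pinterest().
// The closure imports the variables it needs, since a callback cannot see the function's locals.
$content = preg_replace_callback(
    $pattern,
    function ( $matches ) use ( $pinspan, $pinurlNew, $pindescription, $options, $pinfinish, $pinend ) {
        // Leave images whose tag mentions "noPin" untouched.
        if ( strpos( $matches[0], 'noPin' ) !== false ) {
            return $matches[0];
        }
        $src = $matches[2] . '.' . $matches[3];
        return $pinspan . $pinurlNew . $src . $pindescription . $options . $pinfinish
            . '<img' . $matches[1] . 'src="' . $src . '" ' . $matches[4] . ' />' . $pinend;
    },
    $content
);
Another image attribute could conceivably contain noPin. If you are concerned about that edge case, just make the test in the if statement more specific.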

You have to exclude the class noPin from the $pattern regexp :
$pattern = '/<img(.*?)src="(.*?).(bmp|gif|jpeg|jpg|png)"(.*?) \/>/i';
Has to become something like
$pattern = '/<img(.*?)src="(.*?).(bmp|gif|jpeg|jpg|png)"(.*?) (?!class="noPin") \/>/i';
Please check the regexp syntax, but the idea is to exclude class="noPin" from the searched pattern. Then your replacement will not be added to these images.
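One way the idea could be written out is with a negative lookahead placed directly after "<img", so the class can appear anywhere in the tag. This is a sketch rather than a tested drop-in: the sample HTML is made up, the dot before the extension is escaped here (unlike in the question's pattern), and it assumes tags do not span multiple lines.

```php
<?php
// (?![^>]*\bnoPin\b) rejects any <img ...> tag that contains the word
// "noPin" somewhere before its closing ">".
$pattern = '/<img(?![^>]*\bnoPin\b)(.*?)src="(.*?)\.(bmp|gif|jpeg|jpg|png)"(.*?) \/>/i';

// Hypothetical sample content: the first image opts out, the second does not.
$html = '<img class="noPin" src="a.png" alt="" /> <img src="b.jpg" alt="" />';

preg_match_all( $pattern, $html, $matches );
print_r( $matches[2] ); // file names (without extension) of the images that would get a button
```

With this pattern in place of $pattern, the existing preg_replace call can stay as it is, since images carrying noPin never match in the first place.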

Php parse links/emails

I am wondering if there is a simple snippet which converts links of any kind:
http://www.cnn.com to http://www.cnn.com
cnn.com to cnn.com
www.cnn.com to www.cnn.com
abc@def.com to mailto:abc@def.com
I do not want to use any PHP5 specific library.
Thank you for your time.
UPDATE I have updated the above text to what i want to convert it to. Please note that the href tag and the text are different for case 2 and 3.
UPDATE2 Hows does gmail chat do it? Theirs is pretty smart and works only for real domains names. e.g. a.ly works but a.cb does not work.

yes ,
http://www.gidforums.com/t-1816.html
<?php
/**
NAME : autolink()
VERSION : 1.0
AUTHOR : J de Silva
DESCRIPTION : returns VOID; handles converting
URLs into clickable links off a string.
TYPE : functions
======================================*/
function autolink( &$text, $target='_blank', $nofollow=true )
{
// grab anything that looks like a URL...
$urls = _autolink_find_URLS( $text );
if( !empty($urls) ) // i.e. there were some URLS found in the text
{
array_walk( $urls, '_autolink_create_html_tags', array('target'=>$target, 'nofollow'=>$nofollow) );
$text = strtr( $text, $urls );
}
}
function _autolink_find_URLS( $text )
{
// build the patterns
$scheme = '(http:\/\/|https:\/\/)';
$www = 'www\.';
$ip = '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}';
$subdomain = '[-a-z0-9_]+\.';
$name = '[a-z][-a-z0-9]+\.';
$tld = '[a-z]+(\.[a-z]{2,2})?';
$the_rest = '\/?[a-z0-9._\/~#&=;%+?-]+[a-z0-9\/#=?]{1,1}';
$pattern = "$scheme?(?(1)($ip|($subdomain)?$name$tld)|($www$name$tld))$the_rest";
$pattern = '/'.$pattern.'/is';
$c = preg_match_all( $pattern, $text, $m );
unset( $text, $scheme, $www, $ip, $subdomain, $name, $tld, $the_rest, $pattern );
if( $c )
{
return( array_flip($m[0]) );
}
return( array() );
}
function _autolink_create_html_tags( &$value, $key, $other=null )
{
$target = $nofollow = null;
if( is_array($other) )
{
$target = ( $other['target'] ? " target=\"$other[target]\"" : null );
// see: http://www.google.com/googleblog/2005/01/preventing-comment-spam.html
$nofollow = ( $other['nofollow'] ? ' rel="nofollow"' : null );
}
$value = "<a href=\"$key\"$target$nofollow>$key</a>";
}
?>

Try this out. (for links not email)
$newTweet = preg_replace('!http://([a-zA-Z0-9./-]+[a-zA-Z0-9/-])!i', '<a href="\\0">\\0</a>', $tweet->text);

I know this is 5 years late; however, I needed a similar solution, and the best answer I found was from the user erwan-dupeux-maire.
Answer
I wrote this function. It replaces all the links in a string. Links can be in the following formats:
www.example.com
http://example.com
https://example.com
example.fr
The second argument is the target for the link ('_blank', '_top'... can be set to false). Hope it helps...
public static function makeLinks($str, $target='_blank')
{
if ($target)
{
$target = ' target="'.$target.'"';
}
else
{
$target = '';
}
// find and replace link
$str = preg_replace('#((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)#', '<a href="$1" '.$target.'>$1</a>', $str);
// add "http://" if not set
$str = preg_replace('/<a\s[^>]*href\s*=\s*"((?!https?:\/\/)[^"]*)"[^>]*>/i', '<a href="http://$1" '.$target.'>', $str);
return $str;
}

Here's the email snippet:
$email = "abc@def.com";
$pos = strrpos($email, "@");
if ($pos !== false) {
    // This is an email address!
    $email = "mailto:" . $email;
}
What exactly are you looking to do with the links? Strip the www or http? Or add http://www to any link if required?
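If the goal is a clickable mailto link, a slightly stricter sketch could validate the address with PHP's built-in filter_var() instead of only looking for an "@". The function name email_to_mailto() is made up for this example, and it assumes the input is a single token rather than a whole paragraph.

```php
<?php
// Hypothetical helper: wraps a string in a mailto link only if it is a valid email address.
function email_to_mailto( string $candidate ): string {
    if ( filter_var( $candidate, FILTER_VALIDATE_EMAIL ) === false ) {
        return $candidate; // not an email address, leave it as it is
    }
    $safe = htmlspecialchars( $candidate );
    return '<a href="mailto:' . $safe . '">' . $safe . '</a>';
}

echo email_to_mailto( 'abc@def.com' ), "\n";
echo email_to_mailto( 'cnn.com' ), "\n";
```

The first call returns an anchor tag, the second returns the input unchanged because it is not an email address.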

Find URLs, @replies and #hashtags from Tweets

I'm building a little Twitter thing in PHP and I'm trying to parse URLs, @replies and #hashtags and make them into clickable links.
The @replies would link to http://twitter.com/replies
Hashtags would link to http://search.twitter.com/search?q=%23hashtags
I've found a class for parsing URLs and I'm wondering if this could also be used to parse @replies and #hashtags as well:
// http://josephscott.org/archives/2008/11/makeitlink-detecting-urls-in-text-and-making-them-links/
class MakeItLink {
protected static function _link_www( $matches ) {
$url = $matches[2];
$url = MakeItLink::cleanURL( $url );
if( empty( $url ) ) {
return $matches[0];
}
return "{$matches[1]}<a href='{$url}'>{$url}</a>";
}
public static function cleanURL( $url ) {
if( $url == '' ) {
return $url;
}
$url = preg_replace( "|[^a-z0-9-~+_.?#=!&;,/:%@$*'()\\x80-\\xff]|i", '', $url );
$url = str_replace( array( "%0d", "%0a" ), '', $url );
$url = str_replace( ";//", "://", $url );
/* If the URL doesn't appear to contain a scheme, we
* presume it needs http:// appended (unless a relative
* link starting with / or a php file).
*/
if(
strpos( $url, ":" ) === false
&& substr( $url, 0, 1 ) != "/"
&& !preg_match( "|^[a-z0-9-]+?\.php|i", $url )
) {
$url = "http://{$url}";
}
// Replace ampersans and single quotes
$url = preg_replace( "|&([^#])(?![a-z]{2,8};)|", "&#038;$1", $url );
$url = str_replace( "'", "&#039;", $url );
return $url;
}
public function transform( $text ) {
$text = " {$text}";
$text = preg_replace_callback(
'#(?<=[\s>])(\()?([\w]+?://(?:[\w\\x80-\\xff\#$%&~/\-=?@\[\](+]|[.,;:](?![\s<])|(?(1)\)(?![\s<])|\)))*)#is',
array( 'MakeItLink', '_link_www' ),
$text
);
$text = preg_replace( '#(<a( [^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i', "$1$3</a>", $text );
$text = trim( $text );
return $text;
}
}

I think what you're looking to do is essentially what I've included below. You'd add these two statements in the transform method, just before the return statement.
$text = preg_replace('#@(\w+)#', '<a href="http://twitter.com/$1">$0</a>', $text);
$text = preg_replace('/#(\w+)/', '<a href="http://search.twitter.com/search?q=%23$1">$0</a>', $text);
Is that what you're looking for?
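As a standalone illustration of the same idea, the sketch below links @replies and #hashtags using the two URL formats from the question. The function name link_mentions_and_hashtags() is made up, and the lookbehinds are an addition of this sketch: they are meant to skip email addresses and URL fragments such as page#top, which a bare pattern would also pick up.

```php
<?php
// Hypothetical helper, independent of the MakeItLink class.
function link_mentions_and_hashtags( string $text ): string {
    // @name, but not when preceded by a word character (e.g. inside an email address).
    $text = preg_replace(
        '/(?<![\w@])@(\w+)/',
        '<a href="http://twitter.com/$1">$0</a>',
        $text
    );
    // #tag, but not when preceded by a word character or "&" (URL fragments, HTML entities).
    $text = preg_replace(
        '/(?<![\w&])#(\w+)/',
        '<a href="http://search.twitter.com/search?q=%23$1">$0</a>',
        $text
    );
    return $text;
}

echo link_mentions_and_hashtags( 'Hi @bob, see #php' ), "\n";
```

In the replacement strings, $1 is the captured name without its prefix and $0 is the whole match, so the visible link text keeps the "@" or "#".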

Twitter recently released to open source both java and ruby (gem) implementations of the code they use for finding user names, hash tags, lists and urls.
It is very regular expression oriented.
