Implementing web address regular expression - php

I found the following online, but I'm having trouble implementing it:
(http|ftp|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?
This is what I want the PHP code to do:
Take the following: Look here: http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php
And turn it into: Look here: http://www.rocketlanguages.com/span...anish_accents.php
If the URL is long, the link text is shortened with a ... in the middle.

Try this:
// URL regex from here:
// http://daringfireball.net/2010/07/improved_regex_for_matching_urls
define( 'URL_REGEX', <<<'_END'
~(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))~
_END
);
// PHP 5.3 or higher can use closures (anonymous functions)
function replace_urls_with_anchor_tags( $string,
                                        $length = 50,
                                        $elision_string = '...' ) {
    $replace_function = function( $matches ) use ( $length, $elision_string ) {
        $matched_url = $matches[ 0 ];
        // Escape the URL so characters such as & and " stay valid inside the HTML
        return '<a href="' . htmlspecialchars( $matched_url ) . '">' .
            htmlspecialchars( abbreviated_url( $matched_url, $length, $elision_string ) ) .
            '</a>';
    };
    return preg_replace_callback(
        URL_REGEX,
        $replace_function,
        $string
    );
}
function abbreviated_url( $url, $length = 50, $elision_string = '...' ) {
    if ( strlen( $url ) <= $length ) {
        return $url;
    }
    $width_either_side = (int) ( ( $length - strlen( $elision_string ) ) / 2 );
    $left = substr( $url, 0, $width_either_side );
    $right = substr( $url, strlen( $url ) - $width_either_side );
    return $left . $elision_string . $right;
}
The function replace_urls_with_anchor_tags takes a string and turns every URL matched in it into an anchor tag, shortening long URLs by eliding the middle with an ellipsis. The optional $length and $elision_string arguments let you change the maximum displayed length and replace the ellipsis with another string.
Here's a usage example:
// Test it out
$test = <<<_END
Look here:
http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php
And here:
http://stackoverflow.com/questions/12385770/implementing-web-address-regular-expression
_END;
echo replace_urls_with_anchor_tags( $test, 50, '...' );
// OUTPUT:
// Look here:
// http://www.rocketlangua...ion_spanish_accents.php
//
// And here:
// http://stackoverflow.co...ress-regular-expression
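abbreviated_url() counts bytes with strlen() and substr(), so a URL containing multibyte UTF-8 characters could be cut in the middle of a character. Below is a minimal sketch of a character-aware variant, assuming the input is valid UTF-8. The name abbreviated_url_utf8 is hypothetical; it splits the string into characters with preg_split() and the u modifier, so it does not need the mbstring extension, and otherwise uses the same width arithmetic as abbreviated_url().

```php
<?php
// Hypothetical UTF-8-aware variant of abbreviated_url().
// Assumes $url and $elision_string are valid UTF-8.
function abbreviated_url_utf8( $url, $length = 50, $elision_string = '...' ) {
    // Split into characters (not bytes); the u modifier makes PCRE UTF-8 aware
    $chars = preg_split( '//u', $url, -1, PREG_SPLIT_NO_EMPTY );
    if ( count( $chars ) <= $length ) {
        return $url;
    }
    $elision_length = count( preg_split( '//u', $elision_string, -1, PREG_SPLIT_NO_EMPTY ) );
    // Same arithmetic as abbreviated_url(): share the remaining width equally
    $width_either_side = (int) ( ( $length - $elision_length ) / 2 );
    $left  = implode( '', array_slice( $chars, 0, $width_either_side ) );
    $right = implode( '', array_slice( $chars, count( $chars ) - $width_either_side ) );
    return $left . $elision_string . $right;
}

// prints http://e...éééééééé
echo abbreviated_url_utf8( 'http://example.com/' . str_repeat( 'é', 40 ), 20 ), "\n";
```

With plain ASCII URLs it returns exactly what abbreviated_url() returns; the difference only shows once the URL contains non-ASCII characters.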
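Note that if you are using PHP 5.2 or lower, you must rewrite replace_urls_with_anchor_tags to use create_function instead of a closure, because closures were introduced in PHP 5.3. The nowdoc syntax (<<<'_END') used to define URL_REGEX also requires PHP 5.3, so on PHP 5.2 you have to define the pattern as an ordinary single-quoted string with its ' characters escaped. create_function itself was deprecated in PHP 7.2 and removed in PHP 8.0, so use this version only on legacy PHP: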
// No closures in PHP 5.2, must use create_function()
function replace_urls_with_anchor_tags( $string,
                                        $length = 50,
                                        $elision_string = '...' ) {
    $replace_function = create_function(
        '$matches',
        'return "<a href=\"$matches[0]\">" .
            abbreviated_url( $matches[ 0 ], ' .
            $length . ', ' .
            '"' . $elision_string . '"' .
            ') . "</a>";'
    );
    return preg_replace_callback(
        URL_REGEX,
        $replace_function,
        $string
    );
}
Note that I replaced the URL regex you found with one linked from the page DaveRandom referred to in his comment. It's more complete, and there is actually a mistake in the regex you were using: a couple of '/' characters are not escaped (in [\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#]), which breaks the pattern when '/' is used as the delimiter. Also, it doesn't detect port numbers like 80 or 8080.
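Another way to sidestep the escaping problem is to pick a delimiter that does not occur anywhere in the pattern. A minimal sketch using '!' as the delimiter; it assumes the commonly circulated form of this pattern, whose character classes include '@'. The pattern contains both '~' and '#', so those two delimiters would not work here.

```php
<?php
// '!' does not appear in the pattern, so the '/' characters need no escaping
$pattern = '!(http|ftp|https)://[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?!';

$text = 'Look here: http://www.rocketlanguages.com/spanish/resources/pronunciation_spanish_accents.php';
if ( preg_match( $pattern, $text, $m ) ) {
    echo $m[0], "\n"; // the whole URL, without the "Look here: " prefix
}
```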
Hope this helps.

I use this regular expression and it works fine for me; try it if you like:
(http|https|ftp):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?
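To use this pattern with preg_match() it still needs delimiters. The sketch below wraps it in '~' and adds the i modifier (an assumption on my part) so upper-case hosts match too. Be aware that the trailing (\/.*)? is greedy: once a '/' follows the host, everything up to the end of the line, spaces included, is treated as part of the URL.

```php
<?php
// The pattern from the answer, wrapped in ~ delimiters; i = case-insensitive
$pattern = '~(http|https|ftp):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?~i';

foreach ( array( 'Look here: http://www.cnn.com', 'See http://example.com:8080/a b' ) as $text ) {
    if ( preg_match( $pattern, $text, $m ) ) {
        echo $m[0], "\n";
    }
}
// http://www.cnn.com
// http://example.com:8080/a b   (the path part also swallowed " b")
```

If you need the URL to stop at whitespace, replacing (\/.*)? with (\/\S*)? is one option.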


Migrate create_function(), which is deprecated as of PHP 7.2

I have create_function() in my PHP code:
function encode_code_in_comment( $source ) { $encoded = preg_replace_callback( '/\[(php|html|javascript|css|nginx|apache|terminal)\](.*?)\[\/\1\]/ims',
create_function(
'$matches',
'$matches[2] = preg_replace(
array("/^[\r|\n]+/i", "/[\r|\n]+$/i"), "",
$matches[2]);
return "<pre class=\"language-" . $matches[1] . "\"><code>" . esc_html( $matches[2] ) . "</code></pre>";'
),
$source );
if ( $encoded ) {
return $encoded;
} else {
return $source;
}}
I know there are duplicate threads on the subject, but I'm still really struggling to convert this to an anonymous function. How do I rewrite it?
Your main problem is that your code is badly formatted, making it hard to see where the create_function call begins and ends; here it is with some more logical linebreaks and indents:
function encode_code_in_comment( $source ) {
    $encoded = preg_replace_callback(
        '/\[(php|html|javascript|css|nginx|apache|terminal)\](.*?)\[\/\1\]/ims',
        create_function(
            '$matches',
            '
            $matches[2] = preg_replace(
                array("/^[\r|\n]+/i", "/[\r|\n]+$/i"),
                "",
                $matches[2]
            );
            return "<pre class=\"language-" . $matches[1] . "\"><code>" . esc_html( $matches[2] ) . "</code></pre>";
            '
        ),
        $source
    );
    if ( $encoded ) {
        return $encoded;
    } else {
        return $source;
    }
}
From this and the documentation of create_function, we can see that the created function needs one argument, $matches, and a body that starts with $matches[2] = and ends with </pre>";
Looking at the manual for anonymous functions, we see that the new syntax is function(arguments) { body }, so instead of:
create_function('$matches', ... )
you want:
function($matches) { ... }
and in between, instead of:
'
$matches[2] = ...
...
... </pre>";
'
you want to just remove the quotes and leave the code:
$matches[2] = ...
...
... </pre>";
The body is in single quotes, and there are no escaped single quotes in there, so the code doesn't need any other changes.
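Putting those steps together, the converted function could look like the sketch below (PHP 5.3+). The esc_html() definition at the top is only a stand-in so the snippet runs outside WordPress (an assumption for testing); inside WordPress the real esc_html() already exists and the function_exists() guard skips the stand-in.

```php
<?php
// Stand-in for WordPress's esc_html() when running outside WordPress
if ( ! function_exists( 'esc_html' ) ) {
    function esc_html( $text ) {
        return htmlspecialchars( $text, ENT_QUOTES, 'UTF-8' );
    }
}

function encode_code_in_comment( $source ) {
    $encoded = preg_replace_callback(
        '/\[(php|html|javascript|css|nginx|apache|terminal)\](.*?)\[\/\1\]/ims',
        // The former create_function() body, now a real anonymous function
        function ( $matches ) {
            $matches[2] = preg_replace(
                array( "/^[\r|\n]+/i", "/[\r|\n]+$/i" ),
                "",
                $matches[2]
            );
            return "<pre class=\"language-" . $matches[1] . "\"><code>" . esc_html( $matches[2] ) . "</code></pre>";
        },
        $source
    );
    if ( $encoded ) {
        return $encoded;
    } else {
        return $source;
    }
}

// prints <pre class="language-php"><code>echo 1;</code></pre>
echo encode_code_in_comment( "[php]\necho 1;\n[/php]" ), "\n";
```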

WordPress: Automatically change specific URLs in posts

I have found a solution for changing links in my WordPress theme, but not the links in the content. How can I get the URLs in the content so I can change them too?
I need to use the content filter, but how can I change URLs like apple.com/test/, apple.com/test-123/, apple.com, microsoft.com and microsoft.com/test/? The function should correctly change every matched URL in the content.
add_filter('the_content', 'function_name');
Unfortunately, the answer to a similar question doesn't work.
This is my working solution for changing links, but it does not change the links in the content:
add_filter('rh_post_offer_url_filter', 'link_change_custom');
function link_change_custom($offer_post_url){
    $shops= array(
        array('shop'=>'apple.com','id'=>'1234'),
        array('shop'=>'microsoft.com','id'=>'5678'),
        array('shop'=>'dell.com','id'=>'9876'),
    );
    foreach( $shops as $rule ) {
        if (!empty($offer_post_url) && strpos($offer_post_url, $rule['shop']) !== false) {
            $offer_post_url = 'https://www.network.com/promotion/click/id='.$rule['id'].'-yxz?param0='.rawurlencode($offer_post_url);
        }
    }
    $shops2= array(
        array('shop'=>'example.com','id'=>'1234'),
        array('shop'=>'domain2.com','id'=>'5678'),
        array('shop'=>'domain3','id'=>'9876'),
    );
    foreach( $shops2 as $rule ) {
        if (!empty($offer_post_url) && strpos($offer_post_url, $rule['shop']) !== false) {
            $offer_post_url = 'https://www.second-network.com/promotion/click/id='.$rule['id'].'-yxz?param0='.rawurlencode($offer_post_url);
        }
    }
    return $offer_post_url;
}
If I understood you correctly, this is what you need:
add_filter( 'the_content', 'replace_links_by_promotions' );
function replace_links_by_promotions( $content ) {
    $shop_ids = array(
        'apple.com'     => '1234',
        'microsoft.com' => '5678',
        'dell.com'      => '9876',
    );
    preg_match_all( '/https?:\/\/(www\.)?([-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6})\b([-a-zA-Z0-9()@:%_\+.~#?&\/=]*)/', $content, $matches, PREG_OFFSET_CAPTURE );
    // Walk the matches from last to first, so that each substr_replace()
    // does not shift the offsets of the matches still to be processed
    foreach ( array_reverse( $matches[2], true ) as $index => $match ) {
        if ( ! isset( $shop_ids[ $match[0] ] ) ) {
            continue;
        }
        $offer_post_url = 'https://www.network.com/promotion/click/id=' . $shop_ids[ $match[0] ] . '-yxz?param0=' . rawurlencode( $matches[0][ $index ][0] );
        $content = substr_replace( $content, $offer_post_url, $matches[0][ $index ][1], strlen( $matches[0][ $index ][0] ) );
    }
    return $content;
}
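A point worth keeping in mind whenever you combine PREG_OFFSET_CAPTURE with substr_replace(): every replacement changes the length of the string, so the byte offsets recorded for later matches become stale. A minimal, self-contained sketch of the safe pattern applies the replacements from the last match back to the first. The helper name replace_matches_from_end and the bracket-wrapping callback are purely illustrative.

```php
<?php
// Hypothetical helper: replace every match of $pattern in $subject with
// $callback( $matched_text ), working backwards so earlier offsets stay valid.
function replace_matches_from_end( $pattern, $callback, $subject ) {
    if ( ! preg_match_all( $pattern, $subject, $matches, PREG_OFFSET_CAPTURE ) ) {
        return $subject;
    }
    // $matches[0] holds array( matched_text, byte_offset ) pairs, in order
    foreach ( array_reverse( $matches[0] ) as $match ) {
        list( $text, $offset ) = $match;
        $subject = substr_replace( $subject, call_user_func( $callback, $text ), $offset, strlen( $text ) );
    }
    return $subject;
}

$wrap = function ( $text ) {
    return '[' . $text . ']'; // changes the length, which is what breaks forward loops
};
// prints see [apple.com] and [dell.com]
echo replace_matches_from_end( '/[a-z]+\.com/', $wrap, 'see apple.com and dell.com' ), "\n";
```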
I think this works. Note that, as written, it will match every "apple.", "dell.", and "microsoft." link in every type of content that uses the content filter (posts, pages, excerpts, many custom post types, etc.). If you don't want that, and you very well may not, the main replacement function will have to be made conditional and the regex more precisely targeted, which can get complicated.
(Also, come to think of it, I'm not sure whether the quotes in the anchor tags that the regex finds will require special handling. If this doesn't work, we can look at that too, or switch to a DOM parser, which is maybe what I should have started with.)
/** INITIATE FILTER FUNCTION **/
add_filter( 'the_content', 'wpso_change_urls' );
/**
 * PREG CALLBACK FUNCTION
 * Match matches to id #s
 * and return replacement URLs enclosed in quotes (as found)
 */
function wpso_found_urls( $matches ) {
    // someone else probably has a very clever, parsimonious way to do this next part,
    // but at least this makes what's happening easy to read
    $id = '';
    if ( strpos( $matches[0], 'apple' ) !== false ) {
        $id = '1234';
    }
    if ( strpos( $matches[0], 'microsoft' ) !== false ) {
        $id = '5678';
    }
    if ( strpos( $matches[0], 'dell' ) !== false ) {
        $id = '9876';
    }
    $raw_url = trim( $matches[0], '"' );
    return '"https://www.network.com/promotion/click/id=' . $id . '-yxz?param0=' . rawurlencode( $raw_url ) . '"';
}
/** ENDURING A DREADFUL FATE USING REGEX TO PARSE HTML **/
function wpso_change_urls( $content ) {
    $find_urls = array(
        '/"+(http|https)(\:\/\/\S*apple.\S*")/',
        '/"+(http|https)(\:\/\/\S*microsoft.\S*")/',
        '/"+(http|https)(\:\/\/\S*dell.\S*")/',
    );
    return preg_replace_callback( $find_urls, 'wpso_found_urls', $content );
}
[Screenshots omitted: the filtered output (captured before the quotes were trimmed from the raw URL prior to encoding) and the original post-editor content it was produced from.]
You might try using the the_content filter to do this:
add_filter('the_content', function($content){
    // filter $content and replace urls
    $content = str_replace('http://old-url', 'http://new-url', $content);
    return $content;
});
More: https://developer.wordpress.org/reference/hooks/the_content/

PHP preg_match ignore within certain elements

I'm writing a regex that filters content to format its typography. So far, my code seems to filter my content properly using preg_replace, but I can't figure out how to skip content wrapped in certain tags, say <pre>.
As a reference, this is to be used within WordPress's the_content filter, so my current code looks like so:
function my_typography( $str ) {
    $ignore_elements = array("code", "pre");
    $rules = array(
        "?" => array("before"=> " ", "after"=>""),
        // the others are stripped out for simplicity
    );
    foreach($rules as $rule=>$params) {
        // Pseudo :
        // if( !in_array( $parent_tag, $ignore_elements) {
        // /Pseudo
        $formatted = $params['before'] . $rule . $params['after'];
        $str = preg_replace( $rule, $formatted, $str );
        // Pseudo :
        // }
        // /Pseudo
    }
    return $str;
}
add_filter( 'the_content', 'my_typography' );
Basically:
<p>Was this filtered? I hope so</p>
<pre>Was this filtered? I hope not.</pre>
should become
<p>Was this filtered ? I hope so</p>
<pre>Was this filtered? I hope not.</pre>
You need to wrap the search pattern in regex delimiters for preg_replace, and you must call preg_quote to escape special regex characters such as ?, ., * and +:
$str = preg_replace( '~' . preg_quote($rule, '~') . '~', $formatted, $str );
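To see what preg_quote() contributes here: it backslash-escapes regex metacharacters, and its optional second argument also escapes the chosen delimiter. A small sketch reusing the '~' delimiter from the line above:

```php
<?php
// preg_quote() escapes ?, ., *, + and similar, plus the delimiter passed in
$rule    = '?';
$pattern = '~' . preg_quote( $rule, '~' ) . '~';   // becomes ~\?~

echo $pattern, "\n";
// prints Was this filtered ?
echo preg_replace( $pattern, ' ?', 'Was this filtered?' ), "\n";

// Without preg_quote(), '~?~' is an invalid pattern (a quantifier with
// nothing to repeat), so preg_replace() emits a warning and returns null.
```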
Full Code:
function my_typography( $str ) {
    $ignore_elements = array("code", "pre");
    $rules = array(
        "?" => array("before"=> " ", "after"=>""),
        // the others are stripped out for simplicity
    );
    foreach($rules as $rule=>$params) {
        // Pseudo :
        // if( !in_array( $parent_tag, $ignore_elements) {
        // /Pseudo
        $formatted = $params['before'] . $rule . $params['after'];
        $str = preg_replace( '~' . preg_quote($rule, '~') . '~', $formatted, $str );
        // Pseudo :
        // }
        // /Pseudo
    }
    return $str;
}
Output:
<p>Was this filtered ? I hope so</p>
<pre>Was this filtered ? I hope not.</pre>
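The output above shows that the <pre> content is still changed, because the rule is applied to the whole string. One way to leave <pre> and <code> elements alone is PCRE's (*SKIP)(*FAIL) idiom: the first alternative matches a whole ignored element and then deliberately fails, and (*SKIP) makes the engine resume after that element instead of looking inside it. This is a sketch under the assumptions that tag names are lower-case and that an element is not nested inside another element of the same type; the function name my_typography_skip_tags is hypothetical.

```php
<?php
// Hypothetical variant of my_typography() that ignores <pre> and <code> content.
function my_typography_skip_tags( $str ) {
    $ignore_elements = array( 'code', 'pre' );
    $rules = array(
        '?' => array( 'before' => ' ', 'after' => '' ),
    );
    // Matches one complete ignored element, e.g. <pre class="x">...</pre>
    $skip = '<(' . implode( '|', $ignore_elements ) . ')\b[^>]*>.*?</\1>';
    foreach ( $rules as $rule => $params ) {
        $formatted = $params['before'] . $rule . $params['after'];
        // Alternative 1 consumes an ignored element, then (*SKIP)(*FAIL) discards
        // it and resumes after it; alternative 2 is the rule itself.
        $pattern = '~' . $skip . '(*SKIP)(*FAIL)|' . preg_quote( $rule, '~' ) . '~s';
        // A callback avoids $ or \ in $formatted being read as backreferences
        $str = preg_replace_callback( $pattern, function ( $matches ) use ( $formatted ) {
            return $formatted;
        }, $str );
    }
    return $str;
}

echo my_typography_skip_tags( "<p>Was this filtered? I hope so</p>\n<pre>Was this filtered? I hope not.</pre>" ), "\n";
// <p>Was this filtered ? I hope so</p>
// <pre>Was this filtered? I hope not.</pre>
```

Because it still works on raw HTML with a regex, it can be fooled by unusual markup; a DOM-based approach is sturdier if that matters.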

Two questions regarding regular expressions

I currently use this:
$text = preg_replace('/' . $line . '/', '[x]\\0[/x]', $text);
$line is a simple regular expression:
https?://(?:.+?\.)?dailymotion\.com/video/[A-Za-z0-9]+
This is working fine so far, but there are two things I need that I can't figure out how to do:
... I don't want to perform the replacement if that string is contained within a BBCode tag, i.e.
[bla]http://www.dailymotion.com/video/xuams9[/bla]
or
[bla=http://www.dailymotion.com/video/xuams9]trololo[/bla]
or
[bla='http://www.dailymotion.com/video/xuams9']http://www.dailymotion.com/video/xuams9[/bla]
The second thing is that I only want to match up to the first space. This is what I currently use:
$text = preg_replace('/' . $line . '(?:[^ ]+)?/', '[x]\\0[/x]', $text);
I don't know if I should do it like this or if there's a better way.
So, basically, I'm just trying to match
http://www.dailymotion.com/video/test4
from this:
[tagx='http://www.dailymotion.com/video/test1']http://www.dailymotion.com/video/test2[/tagx] | [tagy]Hello http://www.dailymotion.com/video/test3 World[/tagy] | [tagz]Hello World[/tagz] http://www.dailymotion.com/video/test4
EDIT:
This is what I have so far (it only partially works):
(?:(?<!(\[\/url\]|\[\/url=))(\s|^))' . $line . '(?:[^ ]+)(?:(?<![[:punct:]])(\s|\.?$))?
You can use lookbehind assertions to do this.
http://php.net/manual/en/regexp.reference.assertions.php
By putting the following lookbehind before $line
(?<!\[bla]|\[bla=|\[bla=')
it will only match $line when it is not immediately preceded by [bla], [bla= or [bla='.
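A small sketch of that lookbehind placed in front of the question's $line pattern. It assumes '~' as the delimiter (so the unescaped slashes in $line need no escaping), and it only inspects the characters immediately before the URL, so a URL that follows [bla='...'] or sits elsewhere inside a tag is still replaced.

```php
<?php
$line = 'https?://(?:.+?\.)?dailymotion\.com/video/[A-Za-z0-9]+';
// Negative lookbehind: do not match right after [bla], [bla= or [bla='
$pattern = '~(?<!\[bla]|\[bla=|\[bla=\')' . $line . '~';

$text = "[bla]http://www.dailymotion.com/video/xuams9[/bla] and http://www.dailymotion.com/video/abc";
echo preg_replace( $pattern, '[x]\\0[/x]', $text ), "\n";
// [bla]http://www.dailymotion.com/video/xuams9[/bla] and [x]http://www.dailymotion.com/video/abc[/x]
```

PCRE accepts alternatives of different fixed lengths at the top level of a lookbehind, which is why [bla] and [bla=' can share one assertion.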
Alternatively, try this:
$text = array();
$text[ 0 ] = "[bla]http://www.dailymotion.com/video/xuams9[/bla]";
$text[ 1 ] = "[bla=http://www.dailymotion.com/video/xuams9]trololo[/bla]";
$text[ 2 ] = "http://www.dailymotion.com/video/xuams9";
$text[ 3 ] = "A http://www.dailymotion.com/video/xuams9 B C";
$line = "/http:\/\/www.dailymotion\.com\/video\/[A-Za-z0-9]+/";
$tag = array();
$tag[ 0 ] = "/\[[A-Za-z]{1,12}\]http:\/\/www.dailymotion\.com\/video\/[A-Za-z0-9]+\[\/[A-Za-z]{1,12}\]/";
$tag[ 1 ] = "/\[[A-Za-z]{1,12}=http:\/\/www.dailymotion\.com\/video\/[A-Za-z0-9]+\][A-Za-z0-9]{0,}\[\/[A-Za-z]{1,12}\]/";
foreach( $text as $k=>$v ) {
    if( preg_match( $tag[ 0 ], $v ) == false && preg_match( $tag[ 1 ], $v ) == false ) {
        echo '!';
        $output = preg_replace( $line, '[x]\\0[/x]', $v );
    }
    else { $output = $v; };
    echo "Text #" . ( $k + 1 ) . ": {$output}<br />";
}
Result:
Text #1: [bla]http://www.dailymotion.com/video/xuams9[/bla]
Text #2: [bla=http://www.dailymotion.com/video/xuams9]trololo[/bla]
!Text #3: [x]http://www.dailymotion.com/video/xuams9[/x]
!Text #4: A [x]http://www.dailymotion.com/video/xuams9[/x] B C

PHP parse links/emails

I am wondering if there is a simple snippet which converts links of any kind:
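http://www.cnn.com to <a href="http://www.cnn.com">http://www.cnn.com</a>
cnn.com to <a href="http://cnn.com">cnn.com</a>
www.cnn.com to <a href="http://www.cnn.com">www.cnn.com</a>
abc@def.com to <a href="mailto:abc@def.com">abc@def.com</a>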
I do not want to use any PHP5 specific library.
Thank you for your time.
UPDATE: I have updated the text above to show what I want each case converted to. Note that the href and the link text differ for cases 2 and 3.
UPDATE2: How does Gmail chat do it? Theirs is pretty smart and works only for real domain names, e.g. a.ly works but a.cb does not.
Yes, see:
http://www.gidforums.com/t-1816.html
<?php
/**
NAME        : autolink()
VERSION     : 1.0
AUTHOR      : J de Silva
DESCRIPTION : returns VOID; handles converting
              URLs into clickable links off a string.
TYPE        : functions
======================================*/
function autolink( &$text, $target='_blank', $nofollow=true )
{
    // grab anything that looks like a URL...
    $urls = _autolink_find_URLS( $text );
    if( !empty($urls) ) // i.e. there were some URLS found in the text
    {
        array_walk( $urls, '_autolink_create_html_tags', array('target'=>$target, 'nofollow'=>$nofollow) );
        $text = strtr( $text, $urls );
    }
}
function _autolink_find_URLS( $text )
{
    // build the patterns
    $scheme    = '(http:\/\/|https:\/\/)';
    $www       = 'www\.';
    $ip        = '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}';
    $subdomain = '[-a-z0-9_]+\.';
    $name      = '[a-z][-a-z0-9]+\.';
    $tld       = '[a-z]+(\.[a-z]{2,2})?';
    $the_rest  = '\/?[a-z0-9._\/~#&=;%+?-]+[a-z0-9\/#=?]{1,1}';
    $pattern   = "$scheme?(?(1)($ip|($subdomain)?$name$tld)|($www$name$tld))$the_rest";
    $pattern   = '/'.$pattern.'/is';
    $c = preg_match_all( $pattern, $text, $m );
    unset( $text, $scheme, $www, $ip, $subdomain, $name, $tld, $the_rest, $pattern );
    if( $c )
    {
        return( array_flip($m[0]) );
    }
    return( array() );
}
function _autolink_create_html_tags( &$value, $key, $other=null )
{
    $target = $nofollow = null;
    if( is_array($other) )
    {
        $target = ( $other['target'] ? " target=\"$other[target]\"" : null );
        // see: http://www.google.com/googleblog/2005/01/preventing-comment-spam.html
        $nofollow = ( $other['nofollow'] ? ' rel="nofollow"' : null );
    }
    $value = "<a href=\"$key\"$target$nofollow>$key</a>";
}
?>
Try this out (for links, not emails):
$newTweet = preg_replace('!http://([a-zA-Z0-9./-]+[a-zA-Z0-9/-])!i', '<a href="\\0">\\0</a>', $tweet->text);
I know this is five years late, but I needed a similar solution, and the best answer I found was from the user erwan-dupeux-maire:
I wrote this function. It replaces all the links in a string. Links can be in the following formats:
www.example.com
http://example.com
https://example.com
example.fr
The second argument is the link target ('_blank', '_top', ...); it can be set to false to omit the attribute. Hope it helps.
public static function makeLinks($str, $target='_blank')
{
    if ($target)
    {
        $target = ' target="'.$target.'"';
    }
    else
    {
        $target = '';
    }
    // find and replace link
    $str = preg_replace('#((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)#', '<a href="$1" '.$target.'>$1</a>', $str);
    // add "http://" if not set
    $str = preg_replace('/<a\s[^>]*href\s*=\s*"((?!https?:\/\/)[^"]*)"[^>]*>/i', '<a href="http://$1" '.$target.'>', $str);
    return $str;
}
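Because makeLinks() is declared public static, it has to live inside a class before you can call it. A minimal wrapper for trying it out; the class name LinkHelper is hypothetical and the method's logic is unchanged.

```php
<?php
// Hypothetical class wrapper so the static method can be called directly
class LinkHelper
{
    public static function makeLinks($str, $target = '_blank')
    {
        if ($target) {
            $target = ' target="' . $target . '"';
        } else {
            $target = '';
        }
        // find and replace link
        $str = preg_replace('#((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)#', '<a href="$1" ' . $target . '>$1</a>', $str);
        // add "http://" if not set
        $str = preg_replace('/<a\s[^>]*href\s*=\s*"((?!https?:\/\/)[^"]*)"[^>]*>/i', '<a href="http://$1" ' . $target . '>', $str);
        return $str;
    }
}

// www.example.com gets an http:// href but keeps its original link text
echo LinkHelper::makeLinks('Visit www.example.com'), "\n";
echo LinkHelper::makeLinks('See https://example.com', false), "\n";
```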
Here's the email snippet:
$email = "abc@def.com";
$pos = strrpos($email, "@");
if ($pos !== false) {
    // This is an email address!
    $email = "mailto:" . $email;
}
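Checking for a single "@" is a very loose test for an e-mail address. A slightly more robust sketch validates the address with filter_var() and FILTER_VALIDATE_EMAIL before building the mailto: link. Note that filter_var() needs PHP 5.2 or later, which may conflict with the question's wish to avoid PHP 5-specific features. The function name email_to_link is hypothetical.

```php
<?php
// Hypothetical helper: turn a valid e-mail address into a mailto: link,
// leaving anything else as (escaped) plain text.
function email_to_link( $email ) {
    $safe = htmlspecialchars( $email, ENT_QUOTES, 'UTF-8' );
    if ( filter_var( $email, FILTER_VALIDATE_EMAIL ) === false ) {
        return $safe;
    }
    return '<a href="mailto:' . $safe . '">' . $safe . '</a>';
}

echo email_to_link( 'abc@def.com' ), "\n"; // <a href="mailto:abc@def.com">abc@def.com</a>
echo email_to_link( 'cnn.com' ), "\n";     // cnn.com
```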
What exactly are you looking to do with the links? Strip the www or http://? Or add http://www to any link that needs it?
