I am working on a multilingual WordPress website where I need to append a locale to every internal URL in a web page.
I have all of the page's content in the $content variable. My code so far:
preg_match_all( '/<a\s+(?:[^>]*?\s+)?href=([\"\'])(.*?)\1/', $content, $matches );
$localized_url_arr = [];
$url_arr = [];
if ( ! empty( $matches[2] ) ) {
$current_locale = get_current_locale();
foreach ( $matches[2] as $url ) {
if ( preg_match( '/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/si', $url ) ) {
continue;
}
$new_url = add_locale_to_url( $url, $current_locale ); // this is adding locale to url eg => www.example.com --> www.example.com/us for us locale
if ( $new_url !== $url ) {
$localized_url_arr[] = [
$url => $new_url
];
}
}
}
$arr = array_merge(...$localized_url_arr);
$content = str_replace( array_keys($arr), array_values($arr), $content );
Ideally this should add the locale only to URLs that don't already have one, but it is appending the locale to every URL. $arr contains only the URLs that need a locale appended, yet my str_replace changes all of the URLs in $matches[2].
When you replace www.example.com with www.example.com/us, it will replace it anywhere it appears, even if there's already /us after it.
You can use a regular expression with a negative lookahead to replace a string only if it's not followed by some other pattern.
preg_match_all( '/<a\s+(?:[^>]*?\s+)?href=([\"\'])(.*?)\1/', $content, $matches );
$localized_url_arr = [];
if ( ! empty( $matches[2] ) ) {
    $current_locale = get_current_locale();
    foreach ( $matches[2] as $url ) {
        // Skip email addresses.
        if ( preg_match( '/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/si', $url ) ) {
            continue;
        }
        $new_url = add_locale_to_url( $url, $current_locale ); // adds the locale to the URL, e.g. www.example.com --> www.example.com/us for the us locale
        // Match the URL only when it is not already followed by the locale segment (e.g. /us).
        $url_pattern = '#' . preg_quote( $url, '#' ) . '(?!/' . preg_quote( $current_locale, '#' ) . '\b)#si';
        if ( $new_url !== $url ) {
            $localized_url_arr[ $url_pattern ] = $new_url;
        }
    }
    $content = preg_replace( array_keys( $localized_url_arr ), array_values( $localized_url_arr ), $content );
}
The regular expression matches each URL unless it is already followed by the locale segment (/us in this example), and replaces it with $new_url.
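Another way to avoid the double replacement is to rewrite each href in place with preg_replace_callback, so every link is visited exactly once. The following is only a sketch: add_locale_to_url_sketch() is a hypothetical stand-in for the question's add_locale_to_url(), and it assumes the locale is always the last path segment.

```php
<?php
// Hypothetical stand-in for the question's add_locale_to_url(): appends
// "/<locale>" unless the URL already ends with that segment.
function add_locale_to_url_sketch( string $url, string $locale ): string {
    $trimmed = rtrim( $url, '/' );
    if ( preg_match( '#/' . preg_quote( $locale, '#' ) . '$#i', $trimmed ) ) {
        return $url; // already localized
    }
    return $trimmed . '/' . $locale;
}

// Rewrite only the href value of each <a> tag, one occurrence at a time.
function localize_links( string $content, string $locale ): string {
    $result = preg_replace_callback(
        '/(<a\s+(?:[^>]*?\s+)?href=)(["\'])(.*?)\2/i',
        function ( array $m ) use ( $locale ) {
            $url = $m[3];
            // Leave mailto: links and bare email addresses untouched.
            if ( stripos( $url, 'mailto:' ) === 0 || strpos( $url, '@' ) !== false ) {
                return $m[0];
            }
            return $m[1] . $m[2] . add_locale_to_url_sketch( $url, $locale ) . $m[2];
        },
        $content
    );
    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $content;
}

$html = '<a href="www.example.com">A</a> <a href="www.example.com/us">B</a>';
echo localize_links( $html, 'us' ), "\n";
// <a href="www.example.com/us">A</a> <a href="www.example.com/us">B</a>
```

Because the callback sees each href separately, an already localized link is never touched again, whatever other URLs appear in the page.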
Okay, so I've got something of a weird edge case bug that I can't seem to squash.
I've got a textarea form input where users can type status updates. I've built a method to parse through this and autolink http-links (except for a few domains where I use the Essence library to do some oEmbed magic).
But in a very specific edge case the autolinking completely buggers out.
Specifically, it happens when there's a URL to a subdirectory, without a trailing slash, and immediately after the URL the user enters a carriage return and keeps typing on the new line.
When this happens the first word on the new line is included in the url being matched.
The function looks like this:
function autolink( $text, $attributes=array() ) {
$regex = "/(http|https)\:\/\/[a-z0-9\-\.]+\.[a-z0-9]{2,99}(\/\S*)?/i";
$urls = array();
if( preg_match_all( $regex, $text, $urls, PREG_PATTERN_ORDER ) ) {
foreach($urls[0] as $url) {
$parsed_url = parse_url($url);
if( in_array( $parsed_url['host'], array( 'youtube.com', 'vimeo.com', 'soundcloud.com', 'www.youtube.com', 'www.vimeo.com', 'www.soundcloud.com' ) ) ) {
$essence = Essence\Essence::instance();
$media = $essence->embed( $url );
$text = str_replace($url, '<div class="embed-container">'.$media->html.'</div>', $text);
} else {
$attrs = '';
foreach( $attributes as $attribute => $value ) {
$attrs .= " {$attribute}=\"{$value}\"";
}
$text = str_replace($url,'<a href="'.$url.'"'.$attrs.'>'.$url.'</a>', $text);
}
}
}
$text = '<pre>'.print_r($urls, true).'</pre>'.$text;
$text = trim( $text );
return $text;
}
It looks like fopen can't open files with spaces.
For example:
$url = 'http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main 616x200.jpg';
fopen($url, 'r');
This returns false (mind the space in the URL), but the file is accessible in browsers.
I've also tried escaping the URL with urlencode and rawurlencode, with no luck. How do I properly escape the spaces?
You can use this code:
$arr = parse_url ( 'http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main 616x200.jpg' );
$parts = explode ( '/', $arr['path'] );
$fname = $parts[count($parts)-1];
unset($parts[count($parts)-1]);
$url = $arr['scheme'] . '://' . $arr['host'] . join('/', $parts) . '/' . urlencode ( $fname );
var_dump( $url );
Alternative & Shorter Answer (Thanks to @Dziamid)
$url = 'http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main 616x200.jpg';
$parts = pathinfo($url);
$url = $parts['dirname'] . '/' . urlencode($parts['basename']);
var_dump( $url );
OUTPUT:
string(76) "http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main+616x200.jpg"
rawurlencode is the way to go, but do not escape the full URL. Only escape the filename. You will end up with http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main%20616x200.jpg
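As a minimal sketch of that advice: encode_file_name_in_url() is a hypothetical helper name, and it assumes the URL has no query string or fragment after the file name.

```php
<?php
// Encode only the last path segment (the file name) of a URL.
// Assumes there is no query string or fragment after the file name.
function encode_file_name_in_url( string $url ): string {
    $slash = strrpos( $url, '/' );
    if ( $slash === false ) {
        return rawurlencode( $url );
    }
    $dir  = substr( $url, 0, $slash + 1 ); // everything up to and including the last "/"
    $file = substr( $url, $slash + 1 );
    return $dir . rawurlencode( $file );
}

$url = 'http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main 616x200.jpg';
echo encode_file_name_in_url( $url ), "\n";
// http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main%20616x200.jpg

// urlencode() turns the space into "+", which is only meant for query strings.
echo urlencode( 'main 616x200.jpg' ), "\n";    // main+616x200.jpg
echo rawurlencode( 'main 616x200.jpg' ), "\n"; // main%20616x200.jpg
```

The result can then be passed to fopen( ..., 'r' ) in place of the unescaped URL.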
All solutions proposed here are wrong because they don't escape the query string or the directory part of the path. They also don't take the user, pass and fragment parts of the URL into consideration.
To correctly escape a valid URL you have to separately escape the path parts and the query parts.
So the solution is to extract the url parts, escape each part and rebuild the url.
Here is a simple code snippet:
function safeUrlEncode( $url ) {
    $urlParts = parse_url( $url );
    // parse_url() omits keys for parts that are absent, so check before encoding.
    if ( isset( $urlParts['path'] ) ) {
        $urlParts['path'] = safeUrlEncodePath( $urlParts['path'] );
    }
    if ( isset( $urlParts['query'] ) ) {
        $urlParts['query'] = safeUrlEncodeQuery( $urlParts['query'] );
    }
    return http_build_url( $urlParts );
}
function safeUrlEncodePath( $path ) {
    if ( strlen( $path ) == 0 || strpos( $path, "/" ) === false ) {
        return "";
    }
    // Encode each segment separately so the "/" separators are preserved.
    $pathParts = array_map( 'rawurlencode', explode( "/", $path ) );
    return implode( "/", $pathParts );
}
function safeUrlEncodeQuery( $query ) {
    $queryParts = array();
    parse_str( $query, $queryParts );
    // http_build_query() already URL-encodes keys and values (nested arrays included),
    // so encoding the elements beforehand would double-encode them.
    return http_build_query( $queryParts );
}
Usage would simply be:
$encodedUrl = safeUrlEncode( $originalUrl );
Side note
In my code snippet I'm making use of http_build_url (http://php.net/manual/it/function.http-build-url.php), which is available through a PECL extension. If you don't have the PECL extension on your server, you can simply include the pure PHP implementation: http://fuelforthefire.ca/free/php/http_build_url/
Cheers :)
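For readers without the PECL extension, the rebuild step mentioned in the side note can be sketched in plain PHP. build_url_from_parts() below is a hypothetical helper, not the linked implementation; it only reassembles the keys that parse_url() produces and does no merging or flag handling.

```php
<?php
// Hypothetical pure-PHP stand-in for http_build_url(): reassembles the array
// returned by parse_url() into a URL string.
function build_url_from_parts( array $p ): string {
    $url = '';
    if ( isset( $p['scheme'] ) ) {
        $url .= $p['scheme'] . '://';
    }
    if ( isset( $p['user'] ) ) {
        $url .= $p['user'];
        if ( isset( $p['pass'] ) ) {
            $url .= ':' . $p['pass'];
        }
        $url .= '@';
    }
    $url .= $p['host'] ?? '';
    if ( isset( $p['port'] ) ) {
        $url .= ':' . $p['port'];
    }
    $url .= $p['path'] ?? '';
    if ( isset( $p['query'] ) ) {
        $url .= '?' . $p['query'];
    }
    if ( isset( $p['fragment'] ) ) {
        $url .= '#' . $p['fragment'];
    }
    return $url;
}

// Encode the path segment by segment and the query via http_build_query(), then rebuild.
$parts = parse_url( 'http://example.com/my dir/file name.jpg?q=a b#top' );
$parts['path'] = implode( '/', array_map( 'rawurlencode', explode( '/', $parts['path'] ) ) );
parse_str( $parts['query'], $query );
$parts['query'] = http_build_query( $query );
echo build_url_from_parts( $parts ), "\n";
```

The sketch assumes the input URL is not already percent-encoded; encoding an already encoded path would escape the "%" signs a second time.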
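$url = 'http://gatewaypeople.com/images/articles/cntrbutnssttmnts12_main 616x200.jpg';
// Encode only the file name; urlencode() on the whole URL would also escape "://" and "/".
fopen( dirname( $url ) . '/' . rawurlencode( basename( $url ) ), 'r' );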
I need to search a string of words against a dictionary of words(txt file) and capitalize any word that is not found.
I'm trying to split the string into an array of words and check them against the Unix /usr/dict/words dictionary. If a match is found, the word gets lcfirst( $word ); if there is no match, it gets ucfirst( $word ).
The dictionary is opened and put into an array using fgetcsv (I also tried using fgets and exploding on end of line).
function wnd_title_case( $string ) {
$file = fopen( "/users/chris/sites/wp-dev/trunk/core/words.txt", "rb" );
while ( !feof( $file ) ) {
$line_of_text = fgetcsv( $file );
$exceptions = array( $line_of_text );
}
fclose( $file );
$delimiters = array(" ", "-", "O'");
foreach ( $delimiters as $delimiter ) {
$words = explode( $delimiter, $string );
$newwords = array();
foreach ($words as $word) {
if ( in_array( strtoupper( $word ), $exceptions ) ) {
// check exceptions list for any words that should be lower case
$word = lcfirst( $word );
} elseif ( !in_array( $word, $exceptions ) ) {
// everything else capitalized
$word = ucfirst( $word );
}
array_push( $newwords, $word );
}
$string = join( $delimiter, $newwords );
}
$string = ucfirst( $string );
return $string;
}
I have verified that the file gets opened.
The desired output: Sentence case title string with proper nouns capitalized.
The current output: Title string with every word capitalized
Edit:
Using Jay's answer below I came up with a workable solution. My first problem was that my words dictionary contained both capitalized and non-capitalized words, so I found a proper names dictionary to check against using a regex callback. It's not perfect, but it gets it right most of the time.
function title_case( $string ) {
    $fp = @fopen( THEME_DIR . "/_/inc/propernames", "r" );
    $exceptions = array();
    if ( $fp ) {
        while ( ( $buffer = fgets( $fp ) ) !== false ) {
            $name = trim( $buffer );
            if ( $name !== '' ) {
                // Escape regex metacharacters in each name.
                $exceptions[] = preg_quote( $name, '~' );
            }
        }
        fclose( $fp );
    }
    $content = strtolower( $string );
    if ( $exceptions ) {
        // Group the alternatives so that \b applies to every name, not only the first and last.
        $pattern = '~\b(?:' . implode( '|', $exceptions ) . ')\b~i';
        $content = preg_replace_callback( $pattern, 'regex_callback', $content );
    }
    return ucfirst( $content );
}
function regex_callback( $data ) {
    // Only capitalize names longer than three characters.
    if ( strlen( $data[0] ) > 3 ) {
        return ucfirst( strtolower( $data[0] ) );
    }
    return $data[0];
}
The simplest way to do this with regex is as follows:
1. Convert your text so that every word starts with an uppercase letter: $content = ucwords($original_content);
2. Using your array of dictionary words, create a regex by imploding all the words with a pipe character |, wrapping them in a group surrounded by word-boundary markers, and adding delimiters followed by the case-insensitive flag, so you end up with ~\b(?:word1|word2|word3)\b~i (obviously with your large list).
3. Create a function that lowercases the matched value using strtolower, to be used with preg_replace_callback.
An example of a working demo is this
function regex_callback($data) {
return strtolower($data[0]);
}
$original_content = 'hello my name is jay gilford';
$words = array('hello', 'my', 'name', 'is');
$content = ucwords($original_content);
$pattern = '~\b(?:' . implode('|', $words) . ')\b~i';
$content = preg_replace_callback($pattern, 'regex_callback', $content);
echo $content;
You could also optionally use strtolower to begin with on the content for consistency. The above code outputs hello my name is Jay Gilford
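To see why the alternatives should be grouped, here is a small sketch comparing an ungrouped pattern with a grouped one. build_word_pattern() and lower_match() are hypothetical helper names; the helper also quotes each word and drops empty entries, which matters when the list comes from a file.

```php
<?php
// Build a whole-word, case-insensitive pattern from a word list.
function build_word_pattern( array $words ): string {
    $quoted = array_map(
        function ( $w ) {
            return preg_quote( $w, '~' );
        },
        array_filter( $words, 'strlen' ) // drop empty entries, e.g. a trailing blank line
    );
    return '~\b(?:' . implode( '|', $quoted ) . ')\b~i';
}

// Callback that lowercases whatever the pattern matched.
function lower_match( array $m ): string {
    return strtolower( $m[0] );
}

$text = 'This Is My Island';

// Ungrouped: \b binds only to the first and last alternative,
// so the "Is" at the start of "Island" is matched too.
echo preg_replace_callback( '~\bis|my\b~i', 'lower_match', $text ), "\n";
// This is my island

// Grouped: only the whole words "Is" and "My" are matched.
echo preg_replace_callback( build_word_pattern( array( 'is', 'my', '' ) ), 'lower_match', $text ), "\n";
// This is my Island
```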
I am wondering if there is a simple snippet which converts links of any kind:
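http://www.cnn.com to <a href="http://www.cnn.com">http://www.cnn.com</a>
cnn.com to <a href="http://www.cnn.com">cnn.com</a>
www.cnn.com to <a href="http://www.cnn.com">www.cnn.com</a>
abc@def.com to <a href="mailto:abc@def.com">mailto:abc@def.com</a>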
I do not want to use any PHP5 specific library.
Thank you for your time.
UPDATE I have updated the above text to show what I want to convert it to. Please note that the href attribute and the link text are different for cases 2 and 3.
UPDATE2 How does Gmail chat do it? Theirs is pretty smart and works only for real domain names, e.g. a.ly works but a.cb does not.
Yes:
http://www.gidforums.com/t-1816.html
<?php
/**
NAME : autolink()
VERSION : 1.0
AUTHOR : J de Silva
DESCRIPTION : returns VOID; handles converting
URLs into clickable links off a string.
TYPE : functions
======================================*/
function autolink( &$text, $target='_blank', $nofollow=true )
{
// grab anything that looks like a URL...
$urls = _autolink_find_URLS( $text );
if( !empty($urls) ) // i.e. there were some URLS found in the text
{
array_walk( $urls, '_autolink_create_html_tags', array('target'=>$target, 'nofollow'=>$nofollow) );
$text = strtr( $text, $urls );
}
}
function _autolink_find_URLS( $text )
{
// build the patterns
$scheme = '(http:\/\/|https:\/\/)';
$www = 'www\.';
$ip = '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}';
$subdomain = '[-a-z0-9_]+\.';
$name = '[a-z][-a-z0-9]+\.';
$tld = '[a-z]+(\.[a-z]{2,2})?';
$the_rest = '\/?[a-z0-9._\/~#&=;%+?-]+[a-z0-9\/#=?]{1,1}';
$pattern = "$scheme?(?(1)($ip|($subdomain)?$name$tld)|($www$name$tld))$the_rest";
$pattern = '/'.$pattern.'/is';
$c = preg_match_all( $pattern, $text, $m );
unset( $text, $scheme, $www, $ip, $subdomain, $name, $tld, $the_rest, $pattern );
if( $c )
{
return( array_flip($m[0]) );
}
return( array() );
}
function _autolink_create_html_tags( &$value, $key, $other=null )
{
$target = $nofollow = null;
if( is_array($other) )
{
$target = ( $other['target'] ? " target=\"$other[target]\"" : null );
// see: http://www.google.com/googleblog/2005/01/preventing-comment-spam.html
$nofollow = ( $other['nofollow'] ? ' rel="nofollow"' : null );
}
$value = "<a href=\"$key\"$target$nofollow>$key</a>";
}
?>
Try this out (for links, not email):
$newTweet = preg_replace('!http://([a-zA-Z0-9./-]+[a-zA-Z0-9/-])!i', '<a href="\\0">\\0</a>', $tweet->text);
I know this is 5 years late; however, I needed a similar solution and the best answer I got was from the user erwan-dupeux-maire.
Answer
I wrote this function. It replaces all the links in a string. Links can be in the following formats:
www.example.com
http://example.com
https://example.com
example.fr
The second argument is the target for the link ('_blank', '_top'... can be set to false). Hope it helps...
public static function makeLinks($str, $target='_blank')
{
if ($target)
{
$target = ' target="'.$target.'"';
}
else
{
$target = '';
}
// find and replace link
$str = preg_replace('#((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)#', '<a href="$1" '.$target.'>$1</a>', $str);
// add "http://" if not set
$str = preg_replace('/<a\s[^>]*href\s*=\s*"((?!https?:\/\/)[^"]*)"[^>]*>/i', '<a href="http://$1" '.$target.'>', $str);
return $str;
}
Here's the email snippet:
$email = "abc@def.com";
$pos = strrpos($email, "@");
if ($pos !== false) {
    // This is an email address!
    $email = "mailto:" . $email;
}
What exactly are you looking to do with the links? strip the www or http? or add http://www to any link if required?
I'm building a little Twitter thing in PHP and I'm trying to parse URLs, @replies and #hashtags and make them into clickable links.
The @replies would link to http://twitter.com/replies
Hashtags would link to http://search.twitter.com/search?q=%23hashtags
I've found a class for parsing URLs and I'm wondering if this could also be used to parse @replies and #hashtags:
// http://josephscott.org/archives/2008/11/makeitlink-detecting-urls-in-text-and-making-them-links/
class MakeItLink {
    // Callback for transform(): wraps one matched URL in an anchor tag.
    protected static function _link_www( $matches ) {
        $url = $matches[2];
        $url = self::cleanURL( $url );
        if ( empty( $url ) ) {
            return $matches[0];
        }
        return "{$matches[1]}<a href='{$url}'>{$url}</a>";
    }
    public static function cleanURL( $url ) {
        if ( $url == '' ) {
            return $url;
        }
        $url = preg_replace( "|[^a-z0-9-~+_.?#=!&;,/:%@$*'()\\x80-\\xff]|i", '', $url );
        $url = str_replace( array( "%0d", "%0a" ), '', $url );
        $url = str_replace( ";//", "://", $url );
        /* If the URL doesn't appear to contain a scheme, we
         * presume it needs http:// prepended (unless a relative
         * link starting with / or a php file).
         */
        if (
            strpos( $url, ":" ) === false
            && substr( $url, 0, 1 ) != "/"
            && !preg_match( "|^[a-z0-9-]+?\\.php|i", $url )
        ) {
            $url = "http://{$url}";
        }
        // Replace ampersands and single quotes with HTML entities
        $url = preg_replace( "|&([^#])(?![a-z]{2,8};)|", "&#038;$1", $url );
        $url = str_replace( "'", "&#039;", $url );
        return $url;
    }
    public function transform( $text ) {
        $text = " {$text}";
        $text = preg_replace_callback(
            '#(?<=[\s>])(\()?([\w]+?://(?:[\w\\x80-\\xff\#$%&~/\-=?@\[\](+]|[.,;:](?![\s<])|(?(1)\)(?![\s<])|\)))*)#is',
            array( 'MakeItLink', '_link_www' ),
            $text
        );
        $text = preg_replace( '#(<a( [^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i', "$1$3</a>", $text );
        $text = trim( $text );
        return $text;
    }
}
I think what you're looking to do is essentially what I've included below. You'd add these two statements to the transform method, just before the return statement.
$text = preg_replace('#@(\w+)#', '<a href="http://twitter.com/$1">$0</a>', $text);
$text = preg_replace('/#(\w+)/', '<a href="http://search.twitter.com/search?q=%23$1">$0</a>', $text);
Is that what you're looking for?
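One thing to watch with patterns like these is that "@" also appears in email addresses and "#" in URL fragments and HTML entities. A hedged sketch that guards against those cases with lookbehinds follows; link_mentions_and_hashtags() is a hypothetical function name, and the link formats are the ones given in the question.

```php
<?php
// Link @mentions and #hashtags in plain text.
function link_mentions_and_hashtags( string $text ): string {
    // The lookbehind skips an "@" that follows a word character, "@" or "/",
    // so email addresses such as me@example.com are left alone.
    $text = preg_replace(
        '/(?<![\w@\/])@(\w+)/',
        '<a href="http://twitter.com/$1">@$1</a>',
        $text
    );
    // The lookbehind skips a "#" inside HTML entities (&#039;) and URL fragments (page#top).
    $text = preg_replace(
        '/(?<![\w&\/#])#(\w+)/',
        '<a href="http://search.twitter.com/search?q=%23$1">#$1</a>',
        $text
    );
    return $text;
}

echo link_mentions_and_hashtags( 'Thanks @jack, loving #php. Mail me: me@example.com' ), "\n";
// Thanks <a href="http://twitter.com/jack">@jack</a>, loving <a href="http://search.twitter.com/search?q=%23php">#php</a>. Mail me: me@example.com
```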
Twitter recently open-sourced both Java and Ruby (gem) implementations of the code they use for finding user names, hashtags, lists and URLs.
It relies heavily on regular expressions.