I use this function to turn URLs into clickable links, but when a URL contains a Unicode character, only the part before that character becomes a link.
Function:
function clickable($text) {
    $text = eregi_replace('(((f|ht){1}tp://)[-a-zA-Z0-9@:%_\+.~#?&//=]+)',
        '<a class="und" href="\\1">\\1</a>', $text);
    $text = eregi_replace('([[:space:]()[{}])(www.[-a-zA-Z0-9@:%_\+.~#?&//=]+)',
        '\\1\\2', $text);
    $text = eregi_replace('([_\.0-9a-z-]+@([0-9a-z][0-9a-z-]+\.)+[a-z]{2,3})',
        '\\1', $text);
    return $text;
}
How to fix this problem?
First of all, don't use eregi_replace. I don't think it can handle Unicode, and it has been deprecated since PHP 5.3. Use preg_replace instead.
Then you can try something like this:
$text = preg_replace("/(https?|ftps?|mailto):\/\/([-\w\p{L}\.]+)+(:\d+)?(\/([\w\p{L}\/_\.#]*(\?\S+)?)?)?/u", '<a href="$0">$0</a>', $text);
EDIT - updated expression to include # character
Try using \p{L} instead of a-zA-Z and \p{Ll} instead of a-z
You can find details of unicode handling in regular expressions here
And get in the habit of using the preg functions rather than the deprecated ereg functions
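Putting the advice above together, here is a minimal sketch of what a preg-based clickable() could look like. The character class and the use of preg_replace_callback with htmlspecialchars are my assumptions, not the asker's code; only the "und" class is taken from the question. It handles the http/https/ftp case only.

```php
<?php
// Sketch only: a Unicode-aware take on clickable(). The pattern is an assumption.
function clickable(string $text): string
{
    // \p{L} = letters, \p{N} = digits, \p{M} = combining marks.
    // /u treats pattern and subject as UTF-8, /i ignores case in the scheme.
    $pattern = '!\b(?:https?|ftp)://[-\p{L}\p{N}\p{M}#:%_+.~?&/=]+!iu';

    return preg_replace_callback($pattern, function (array $m): string {
        // Escape the URL before placing it into HTML.
        $url = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
        return '<a class="und" href="' . $url . '">' . $url . '</a>';
    }, $text);
}

echo clickable('See http://example.com/Celebração/page now'), "\n";
```

Like the original, this does not strip trailing punctuation from a URL at the end of a sentence.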
Related
I receive plain-text descriptions for some Instagram posts and try to highlight the hashtags. I use this code:
$caption = preg_replace('/(?<!\S)#([0-9a-zA-Z_.]+)/', '#$1', $caption);
There are two problems. First, this doesn't work with non-Latin characters such as "ş" or "ö". Second, it doesn't work with hashtags that have no space between them: for "#quote#quoteoftheday #myquote", my regular expression highlights only "#quote" and "#myquote". Can I solve both problems with a single regular expression?
You could remove the (?<!\S) part so that # also matches when it is preceded by a non-whitespace character, and add the Unicode flag /u.
You can shorten 0-9a-zA-Z_ to \w, so your expression might look like:
#([\w.]+)
$caption = "#quote#öquoteoftheday #şmyquote";
$caption = preg_replace('/#([\w.]+)/u', '#$1', $caption);
echo $caption;
Result:
#quote#öquoteoftheday #şmyquote
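To make the effect visible, here is a small sketch that wraps each hashtag in a link. The /tag/ URL prefix and the callback are hypothetical choices of mine for illustration; the pattern is the one from the answer.

```php
<?php
// Sketch only: wrap every hashtag in a link, including hashtags joined without spaces.
// The "/tag/" prefix is a made-up placeholder.
$caption = '#quote#öquoteoftheday #şmyquote';

// With /u, PHP reads the subject as UTF-8 and \w also matches non-ASCII letters.
$linked = preg_replace_callback('/#([\w.]+)/u', function (array $m): string {
    $label = htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8');
    return '<a href="/tag/' . rawurlencode($m[1]) . '">#' . $label . '</a>';
}, $caption);

echo $linked, "\n";
```

Since # is not part of [\w.], each hashtag ends where the next one starts, so all three get their own link.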
I am using the following code to convert URLs within a text into real HTML links:
$string = preg_replace("!(http|https|ftp|ftps)://([.]?[&;%#:=a-zA-Z0-9_/?-])*!", "\\0", $string);
$string = preg_replace("!(^| |\n)(www([.]?[&;%#:=a-zA-Z0-9_/?-])+)!", "\\1\\2", $string);
Now there is a problem with URLs containing the # symbol (and perhaps other special characters too), and I need to call urlencode() somehow from within this preg_replace() call.
Any ideas?
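One possible approach, as a sketch: preg_replace_callback lets you run any PHP function on each match. The helper name linkify() is hypothetical, the pattern is the first one from the question, and I use htmlspecialchars() instead of urlencode() because urlencode() on a full URL would also encode the ":" and "/" characters.

```php
<?php
// Sketch only: process every matched URL with a callback.
// linkify() is a hypothetical helper name.
function linkify(string $string): string
{
    $pattern = '!(http|https|ftp|ftps)://([.]?[&;%#:=a-zA-Z0-9_/?-])*!';

    return preg_replace_callback($pattern, function (array $m): string {
        // $m[0] is the whole matched URL; escape it for use in HTML.
        $url = htmlspecialchars($m[0], ENT_QUOTES, 'UTF-8');
        return '<a href="' . $url . '">' . $url . '</a>';
    }, $string);
}

echo linkify('Docs: http://example.com/page?a=1&b=2#top'), "\n";
```

If only one part of the URL needs encoding (for example a query value), the callback is the place to split the match and apply urlencode() to that part alone.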
I am trying to convert hashtags in users' posts (text) into clickable links, using PHP.
From what I found, hashtags should only contain alpha-numeric characters.
$text = 'Testing#one #two #three.test';
$text = preg_replace('/#([0-9a-zA-Z]+)/i', '#$1', $text);
It places links on all three (#one, #two, #three), but I think #one should not be converted because it directly follows another alphanumeric character. How can I adjust the regex to fix that?
The third one is OK: it matches just #three, which I think is correct.
You could modify your regex to include a negative lookbehind for a non-whitespace character, like so:
(?<!\S)#([0-9a-zA-Z]+)
Working regex example:
http://regex101.com/r/mR4jZ7
PHP:
$text = preg_replace('/(?<!\S)#([0-9a-zA-Z]+)/', '#$1', $text);
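As a quick check of the lookbehind, this sketch lists which hashtags the pattern accepts in the sample text from the question. It only uses preg_match_all; nothing here is specific to any framework.

```php
<?php
// Sketch only: show which hashtags the lookbehind pattern accepts.
$text = 'Testing#one #two #three.test';

// (?<!\S) fails when the character before "#" is non-whitespace,
// so the "#one" in "Testing#one" is skipped.
preg_match_all('/(?<!\S)#([0-9a-zA-Z]+)/', $text, $matches);

print_r($matches[0]);
```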
Edit:
And to make the expression work with other languages (non-English characters), use \p{L}; in PHP this also needs the /u modifier:
(?<!\S)#([0-9\p{L}]+)
Working example:
https://regex101.com/r/Pquem3/1
A Unicode-aware pattern that skips HTML-encoded entities (a # preceded by &) and also matches hashtags that are joined together: ~(?<!&)#([\pL\d]+)~u
Sample input: Here are some tags like #tag1 #tag2#tag3 etc.
I finally found a solution similar to the hashtag-to-URL handling on Facebook and other sites; it may help you too. This code also works with Unicode. I tested it with some Bangla text; let me know whether other languages work as well. I think it will work with any language.
$str = '#Your Text #Unicode #ফ্রিকেলস বা #তিল মেলানিনের #অতিরিক্ত উৎপাদনের জন্য হয় যা #সূর্যালোকে #বাড়ে';
$regex = '/(?<!\S)#([0-9a-zA-Z\p{L}\p{M}]+)/mu';
$text = preg_replace($regex, '#$1', $str);
echo $text;
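The \p{M} part of that pattern matters for scripts like Bangla, where vowel signs are combining marks rather than letters. Here is a small sketch of the difference, using the word from the example above written with \u{...} escapes so the code points are unambiguous (the trailing " text" is just filler I added).

```php
<?php
// Sketch only: compare \p{L} alone with \p{L}\p{M} on a Bangla hashtag.
// U+09A4 and U+09B2 are letters; U+09BF is a vowel sign (a mark, not a letter).
$str = "#\u{09A4}\u{09BF}\u{09B2} text";

// Letters only: the capture stops in front of the vowel sign.
preg_match('/(?<!\S)#([0-9\p{L}]+)/u', $str, $lettersOnly);

// Letters plus marks: the whole hashtag is captured.
preg_match('/(?<!\S)#([0-9\p{L}\p{M}]+)/u', $str, $withMarks);

echo $lettersOnly[1], ' vs ', $withMarks[1], "\n";
```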
To catch the second and third hashtags without the first one, you need to specify that the hashtag must start at the beginning of the line or be preceded by one or more whitespace characters, as follows:
$text = 'Testing#one #two #three.test';
$text = preg_replace('/(^|\s+)#([0-9a-zA-Z]+)(\b|$)/', '$1#$2', $text);
The \b in the third group defines a word boundary, which allows the pattern to match #three when it is immediately followed by a non-word character.
Edit: MElliott's answer above is more efficient, for the record.
I was using this function to find links in a string and convert them to HTML links:
function makeClickableLinks($s) {
    return preg_replace('@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@', '$1', $s);
}
The problem is that it doesn't work with URLs containing non-ASCII characters, like this one:
https://www.facebook.com/pages/Celebração/123434584839
for which the result is
https://www.facebook.com/pages/Celebra��ão/123434584839
Any help?
Try to use regex pattern
(?:(^)|(?<=(.)))((?<!^)https?://.*?(?=\1)|https?://.*?(?=\s|$))
having url in $2
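To match accented Latin characters (or letters from any other script), you should use a Unicode-aware regex with the /u modifier. Something like this should work (\pL matches letters and \pN matches digits):
@(https?://([-\pL\pN\.]+[-\pL\pN])+(:\pN+)?(/([\pL\pN/_\.#-]*(\?\S+)?[^\.\s])?)?)@u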
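A simpler variant, as a sketch: keep the question's pattern and only add the /u modifier, since in PHP /u also makes \w and \d Unicode-aware. Without /u the engine works on bytes, so [^\.\s] can match a single byte of a multi-byte character, which would probably explain the broken output in the question. The @ delimiters and the <a href> replacement are my assumptions; the original replacement markup is not shown.

```php
<?php
// Sketch only: the question's function with @ delimiters and /u added.
// The replacement markup is an assumption.
function makeClickableLinks(string $s): string
{
    // With /u, "ç" and "ã" are matched as whole characters by \w.
    $pattern = '@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@u';

    return preg_replace($pattern, '<a href="$1">$1</a>', $s);
}

echo makeClickableLinks('See https://www.facebook.com/pages/Celebração/123434584839 today'), "\n";
```

Using a delimiter that does not occur inside the pattern (here @ rather than #) avoids having to escape the # in the path character class.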
I'm trying to run this PHP command:
preg_replace($regexp, $replace, $text, $maxsingle);
Where the vars are:
$regexp = '/(?!(?:[^<\\[]+[>\\]]|[^>\\]]+<\\/a>))\\b(שלום)\\b/imsU';
$replace = '<a title="$1" href="http://stackoverflow.com">$1</a>';
$text is a long post
$maxsingle = 3;
When the text I'm trying to match (in the above case "שלום") is in English, everything works. However, when the text is Hebrew, it doesn't match anything...
Any ideas how to make Hebrew work with preg_replace?
Thanks.
Try using the /u (UTF-8) modifier, so that the pattern and the subject are treated as UTF-8.
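Here is a small sketch of why /u makes the difference for \b. It leaves out the link-skipping lookahead from the question to keep things short, and the sample text is made up for illustration.

```php
<?php
// Sketch only: \b around a Hebrew word with and without the /u modifier.
$word = "\u{05E9}\u{05DC}\u{05D5}\u{05DD}"; // the word from the question
$text = $word . ' world';                   // made-up sample text

// Without /u the Hebrew bytes are not word characters,
// so \b finds no word boundary next to them.
$withoutU = preg_match('/\b(' . $word . ')\b/i', $text);

// With /u the subject is read as UTF-8 and Hebrew letters count as word characters.
$withU = preg_match('/\b(' . $word . ')\b/iu', $text);

var_dump($withoutU, $withU);
```

In the question's pattern this would mean changing the modifiers from /imsU to /imsUu.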