Convert links in string with non-latin chars using regex - php

I was using this function to find links in a string and convert them to html links
function makeClickableLinks($s) {
    return preg_replace('#(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.\#-]*(\?\S+)?[^\.\s])?)?)#', '<a href="$1">$1</a>', $s);
}
The problem is that it's not working with URLs that contain non-Latin chars, like this one:
https://www.facebook.com/pages/Celebração/123434584839
for which the result is
https://www.facebook.com/pages/Celebra��ão/123434584839
Any help?

Try this regex pattern:
(?:(^)|(?<=(.)))((?<!^)https?://.*?(?=\1)|https?://.*?(?=\s|$))
The URL is captured in $3 (groups 1 and 2 are the helper captures at the start of the pattern).

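To match non-ASCII characters such as ç and ã you should use a Unicode-aware regex: add the u modifier and replace \w with Unicode property classes (\pL for letters, \pN for digits). Something like this should work:
#(https?://([-\pL\pN\.]+[-\pL\pN])+(:\pN+)?(/([\pL\pN/_\.\#-]*(\?\S+)?[^\.\s])?)?)#u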
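As a minimal sketch, the function from the question could be rewritten along these lines. It assumes the replacement string was a plain <a href="$1">$1</a> anchor, adds \pN so that digits in the host and path still match, and uses ~ as the delimiter so the # inside the character class needs no escaping. The name makeClickableLinksUnicode is hypothetical, and the source file must be saved as UTF-8.

```php
<?php
// Sketch only: Unicode-aware variant of makeClickableLinks().
// Assumption: links are wrapped in a plain <a href="..."> anchor.
function makeClickableLinksUnicode(string $s): string
{
    // \pL = any Unicode letter, \pN = any Unicode digit; the u modifier
    // makes PCRE treat pattern and subject as UTF-8 instead of raw bytes.
    $pattern = '~(https?://([-\pL\pN.]+[-\pL\pN])+(:\d+)?(/([\pL\pN/_.#-]*(\?\S+)?[^.\s])?)?)~u';

    return preg_replace($pattern, '<a href="$1">$1</a>', $s);
}

echo makeClickableLinksUnicode('See https://www.facebook.com/pages/Celebração/123434584839 now'), PHP_EOL;
// See <a href="https://www.facebook.com/pages/Celebração/123434584839">https://www.facebook.com/pages/Celebração/123434584839</a> now
```

Without the u modifier the pattern works byte by byte, which is how a multi-byte character such as ç can end up split across the end of the link.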

Related

PHP: make a string clickable

For a Mandarin learning tool I would like to create a link for each Chinese character within a word. For example, I have the Chinese word "自行车" (bicycle), and I would like to make each of its three characters clickable.
$word = '自行车';
And the output should be:
$output = "<a href='?char=自'>自</a>
<a href='?char=行'>行</a>
<a href='?char=车'>车</a>";
Does anyone have an idea how to do this?
With this regex you can split the string into its characters:
preg_split('//u', "自行车", -1, PREG_SPLIT_NO_EMPTY);
The result is an array of three characters.
You can also use preg_replace with the regular expression /(\p{Han})/u to match each Chinese character and replace it with whatever you need:
preg_replace("/(\p{Han})/u","<a href='?char=$1'>$1</a>",'ss自行车ss');
output:
ss<a href='?char=自'>自</a><a href='?char=行'>行</a><a href='?char=车'>车</a>ss
Refer to Php - regular expression to check if the string has chinese chars
You can extract chars in a for loop and generate anchor tags.
<a href='<link?char=<extracted char>'><extracted char></a>
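The loop approach could be sketched roughly as follows. The helper name linkEachChar is hypothetical, and percent-encoding the character in the href is an assumption about what the receiving page expects; the question itself shows the raw character in the URL.

```php
<?php
// Sketch only: wrap every character of a UTF-8 word in its own link.
function linkEachChar(string $word): string
{
    // Split into Unicode characters; the u modifier keeps multi-byte
    // characters intact instead of splitting them into bytes.
    $chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

    $links = [];
    foreach ($chars as $char) {
        $links[] = sprintf(
            "<a href='?char=%s'>%s</a>",
            rawurlencode($char),                          // safe inside a URL
            htmlspecialchars($char, ENT_QUOTES, 'UTF-8')  // safe as link text
        );
    }

    return implode("\n", $links);
}

echo linkEachChar('自行车'), PHP_EOL;
```

Each link is printed on its own line, with the character percent-encoded in the href and shown as-is in the link text.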

Get hashtags from string, replace some symbols in it and convert it to link PHP

I have a string with hashtags that look like #user_name.
Currently I convert all hashtags to links this way:
$text = preg_replace ("/#(\\w+)/", '#$1 ', $text);
As you can see, every hashtag becomes a subdomain name. As you may know, the _ symbol causes problems in subdomain names (some browsers do not support it; IE supports it but does not set cookies, and so on). So I need to replace the _ symbols with - (minus) in the subdomain, but keep the _ symbols in the displayed hashtag. That is the kind of link I need for #user_name. How can I do this?
You can use preg_replace_callback() like this:
$text = preg_replace_callback ("/#(\\w+)/", function ($matches) {
return ''.$matches[1].' '; }, "test #user_john here");
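A fuller sketch of the callback approach might look like this. The link markup is not visible in the snippets above, so the base domain example.com, the http scheme, and the helper name linkHashtags are all assumptions made for illustration.

```php
<?php
// Hypothetical base domain; replace with the real one.
const BASE_DOMAIN = 'example.com';

// Sketch only: turn #hashtags into subdomain links, using "-" instead
// of "_" in the host name while keeping the visible hashtag unchanged.
function linkHashtags(string $text): string
{
    return preg_replace_callback('/#(\w+)/', function (array $m): string {
        $subdomain = str_replace('_', '-', $m[1]); // host-name-safe form
        return sprintf('<a href="http://%s.%s">#%s</a>', $subdomain, BASE_DOMAIN, $m[1]);
    }, $text);
}

echo linkHashtags('test #user_john here'), PHP_EOL;
// test <a href="http://user-john.example.com">#user_john</a> here
```

The callback receives the whole match in $m[0] and the tag name in $m[1], so the same captured text can be transformed for the URL and reused untouched for the label.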
Well, it looks like you already have an answer...
I was going to suggest this regex to use:
(https?:..[^\.]*)(_)([^\.]*)

UTF 8 in preg_match

sorry for my English.
I’m trying to use preg_match with utf-8 in PHP.
preg_match("/\bjaunā\b Iel.*/iu", "Jaunā Iela");
The function returns 0, but
preg_match("/\bjauna\b Iel.*/iu", "Jauna Iela");
works fine.
Why?
Thanks.
Word boundaries do not work correctly with non-ASCII characters when \w only covers ASCII letters: ā is then not a word character, so in the text Jaunā Iela the word boundaries fall at: \bJaun\bā \bIela\b
So instead of using word boundaries, try a look-behind and look-ahead assertion for a space (or the beginning of the string), like so:
The regex:
(?<=^|\s)Jaunā(?=\s) Iel.*
PHP:
preg_match("/(?<=^|\s)Jaunā(?=\s) Iel.*/i", "Jaunā Iela");
Working regex example:
http://regex101.com/r/tV6yR9
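A small sketch of the lookaround approach, wrapped in a hypothetical helper named matchesPhrase. It keeps the i and u modifiers from the question and assumes both the source file and the subject string are UTF-8.

```php
<?php
// Sketch only: match "jaunā Iel..." without relying on \b.
function matchesPhrase(string $subject): bool
{
    // (?<=^|\s) : the word must start at the string start or after whitespace
    // (?=\s)    : the word must be followed by whitespace
    return preg_match('/(?<=^|\s)jaunā(?=\s) Iel.*/iu', $subject) === 1;
}

var_dump(matchesPhrase('Jaunā Iela'));   // bool(true)
var_dump(matchesPhrase('xJaunā Iela'));  // bool(false): no space before the word
```

The lookarounds check the neighbouring characters without consuming them, so they behave like a word boundary defined purely by whitespace.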

php preg_match url regex not working

I can't get this function working correctly:
function isValidURL($url){
return preg_match('%http://domain\.com/([A-Za-z0-9.-_]+)/([A-Za-z0-9.-_]+)%', $url);
}
The url:
http://domain.com/anything-12/anything-12/
can contain numbers, letters and symbols _ -
I assume it has to do with the first part of the regex, as these work:
http://domain.com/anything12/anything12/
http://domain.com/anything12/anything-12/
http://domain.com/anything12/any-thing-12/
http://domain.com/anything_12/any-thing-12/
As always all help is appreciated and thanks in advance.
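You need to escape the - in the character class of your regex. Unescaped, .-_ is read as a range from . to _, which does not include the hyphen itself.
You also need to anchor your regex so that it tries to match the entire input string and not just part of it.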
The modified regex is:
'%^http://domain\.com/([A-Za-z0-9.\-_]+)/([A-Za-z0-9.\-_]+)/$%'
You can shorten your regex by noting that [A-Za-z0-9_] is same as \w and also there is a repeating sub-regex.
'%^http://domain\.com(/[\w.-]+){2}/$%'
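Dropped into the function from the question, the shortened pattern could be used like this. This is a sketch: the boolean return type is an addition, and the sample URLs are the ones from the question plus one made-up negative case.

```php
<?php
// Sketch only: isValidURL() with the anchored, shortened pattern.
function isValidURL(string $url): bool
{
    // [\w.-] : letters, digits, underscore, dot, and a literal hyphen
    //          (a hyphen placed last in the class cannot form a range)
    // {2}    : exactly two path segments, followed by a trailing slash
    return preg_match('%^http://domain\.com(/[\w.-]+){2}/$%', $url) === 1;
}

var_dump(isValidURL('http://domain.com/anything-12/anything-12/'));  // bool(true)
var_dump(isValidURL('http://domain.com/anything_12/any-thing-12/')); // bool(true)
var_dump(isValidURL('http://domain.com/only-one/'));                 // bool(false)
```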

Regex to match all characters except letters and numbers

I want to clean the filenames of all uploaded files. I want to remove all characters except periods, letters and numbers. I'm not good with regex so I thought I would ask here.
Can someone show me how to put this together? I'm using PHP.
$newfilename=preg_replace('/[^a-zA-Z0-9.]/','',$filename);
s/[^.a-zA-Z\d]//g
That is the Perl form of the expression. In PHP you would write:
$output = preg_replace('/[^.a-zA-Z\d]/', '', $input);
Try to use this:
$cleanString = preg_replace('#\W#', '', $string);
It will remove everything except letters, digits, and underscores. Note that it also removes periods, which the question wants to keep.
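To compare the suggestions side by side, here is a short sketch. The helper name cleanFilename and the sample filenames are made up for illustration.

```php
<?php
// Sketch only: keep ASCII letters, digits, and periods; drop everything else.
function cleanFilename(string $filename): string
{
    return preg_replace('/[^a-zA-Z0-9.]/', '', $filename);
}

echo cleanFilename('my photo (1).JPG'), PHP_EOL;                 // myphoto1.JPG

// For comparison: \W keeps underscores and strips the period.
echo preg_replace('#\W#', '', 'my_photo (1).JPG'), PHP_EOL;      // my_photo1JPG
```

The negated character class states exactly which characters survive, which makes it the closer fit for the requirement in the question.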
