Modifying a function to convert text URLs to links and limit name - php

So I have this function to convert text URLs to links. What do I need to add to preg_replace, to limit a long URL, currently it displays domain only, I would like to add a 4th attribute to this, that shows limit from full URL, limits the URL, I have no idea how to do this. This function is tested and works very well this is why I want to keep using it, how to I add this extra snippet for 4th attribute.
function autolink($str, $attributes=array()) {
$attrs = '';
foreach ($attributes as $attribute => $value) {
$attrs .= " {$attribute}=\"{$value}\"";
}
$str = ' ' . $str;
$str = preg_replace(
'|([\w\d]*)\s(https?://([\d\w\.-]+\.[\w\.]{2,6})[^\s\]\[\<\>]*/?)|i',
'$1 $3',
$str
);
return $str;
}
Can someone please help me out.
INPUT:
$cont=autolink("https://www.example.com/test.php?ss=a https://www.example.com/2 https://www.exxxxxxxxxxxxxxxxxxaaample.com/3");
Output current:
www.example.com www.example.com www.exxxxxxxxxxxxxxxxxxaaample.com
WANTED output:
www.example.com/test... www.example.com/2 www.exxxxxxxxxxxxx.....
I should be able to limit the URL name to like 40 chars. Anyways the third attribute is not needed and would be modified, not returning the domain, but the real name of URL limited to 40 chars.

I hope it's final: answer is *preg_replace_callback*and this is the best regex, used here, best way to make text URL to HTML link, easy to modify.
function autolink($textOriginal)
{
$textOriginal=" $textOriginal";
$urlPattern = '|([\w]*)\s(https?://([\w.-]+\.[\w.]{2,6})[^\s\]\[\<\>]*/?)|i';
$textNew = preg_replace_callback($urlPattern, "replaceLinkParts", $textOriginal);
return $textNew;
}
function replaceLinkParts($matches)
{
$thefullurl=''.$matches[2].'';
$whatisthis=''.$matches[1].'';
//////////
$myDomainx = parse_url("$thefullurl");
$simpleurl="" . $myDomainx["host"] . "";
$simpleurlp = "" . $myDomainx["path"] . "";
if ($simpleurlp == "/") {
$simpleurlp = "";
}
$newurl="$simpleurl$simpleurlp";
////////////////
$lastpart= "";
$limichars="30";
$limitlast="10";
$calcit=$limichars-$limitlast;
$nmre=strlen($newurl);
if ($nmre > $limichars) {
$lastpart=$nmre - $limitlast;
$lastpart= substr($newurl, $lastpart, $limitlast);
$newurl= substr($newurl, 0, $calcit);
$newurl= ''.$newurl.'...';
}
////////////
$retthis=''.$whatisthis.' <a target="_blank" title="'.$thefullurl.'" rel="nofollow" href="'.$thefullurl.'">'.$newurl.''.$lastpart.'</a>';
return $retthis;
}
Outputs HTML link: www.example.com...ifverylongurl

Related

How to change href property checking urls

I need to verify a text to show it in the page of a website. I need to transform all urls links of the the same website(not others urls of other websites) in links. I need to involve all them with the tag <a>. The problem is is the property href, that I need to put the correct url inside it. I am trying to verify all the the text and if I find a url, I need to verify if it contains the substring "http://". If not, I must put it in the href property. I did some attempt, but all their aren't working yet :( . Any idea how can I do this?
My function is below:
$string = "This is a url from my website: http://www.mysite.com.br and I have a article interesting there, the link is http://www.mysite.com.br/articles/what-is-psychology/205967. I need that the secure url link works too https://www.mysite.com.br/articles/what-is-psychology/205967. the following urls must be valid too: www.mysite.com.br and mysite.com.br";
function urlMySite($string){
$verifyUrl = '';
$urls = array("mysite.com.br");
$text = explode(" ", $string);
$alltext = "";
for($i = 0; $i < count($texto); $i++){
foreach ($urls as $value){
$pos = strpos($text[$i], $value);
if (!($pos === false)){
$verifyUrl = " <a href='".$text[$i]."' target='_blank'>".$text[$i]."</a> ";
if (strpos($verifyUrl, 'http://') !== true) {
$verifyUrl = " <a href='http://".$text[$i]."' target='_blank'>".$text[$i]."</a> ";
}
$alltext .= $verifyUrl;
} else {
$alltext .= " ".$text[$i]." ";
}
}
}
return $alltext;
}
You should use PREG_MATCH_ALL to find all occurances of the URL and replace each of the Matches with a clickable Link.
You could use this function:
function augmentText($text){
$pattern = "~(https?|file|ftp)://[a-z0-9./&?:=%-_]*~i";
preg_match_all($pattern, $text, $matches);
if( count($matches[0]) > 0 ){
foreach($matches[0] as $match){
$text = str_replace($match, "<a href='" . $match . "' target='_blank'>" . $match . "</a>", $text);
}
}
return $text;
}
Change the reguylar expression pattern to match only the URL's you want to make clickable.
Good luck

URL conversion (into link and urldecode)

I want to ask 2 questions about url conversion in php.
1 question: I need to convert text into link. I've done my own preg and also read many forums, but all solutions are connected with www. or (ht|f)tp(s), but I need preg that will convert domain names even without www and http in text, for example:
I like stackoverflow.com very much
into
I like <a href='http://stackoverflow.com'>stackoverflow.com</a> very much
Sure it must consider points and commas and etc., like:
I like stackoverflow.com.
into
I like <a href='http://stackoverflow.com'>stackoverflow.com</a>.
And one more question: links with url-encoded symbols on wiki are displayed as they are, but on other sites they are displayed like url-encoded string (%XX%XX%XX). How did wiki do this? Thanks!
For your first question, I would not recommend you to that, it is very difficult to know if the a word containing a dot is a domain name or not, and people often forget to put a space after the dot in the middle of a paragraph.
for your second question, it is simple, you url encode the link in the href but not between the open and close a tag. For example :
http://site.na/test.php?t=sqdl&54"dfd°=+
The function auto_link from CodeIgniter URL helper can help you:
if ( ! function_exists('auto_link'))
{
function auto_link($str, $type = 'both', $popup = FALSE)
{
if ($type != 'email')
{
if (preg_match_all("#(^|\s|\()((http(s?)://)|(www\.))(\w+[^\s\)\<]+)#i", $str, $matches))
{
$pop = ($popup == TRUE) ? " target=\"_blank\" " : "";
for ($i = 0; $i < count($matches['0']); $i++)
{
$period = '';
if (preg_match("|\.$|", $matches['6'][$i]))
{
$period = '.';
$matches['6'][$i] = substr($matches['6'][$i], 0, -1);
}
$str = str_replace($matches['0'][$i],
$matches['1'][$i].'<a href="http'.
$matches['4'][$i].'://'.
$matches['5'][$i].
$matches['6'][$i].'"'.$pop.'>http'.
$matches['4'][$i].'://'.
$matches['5'][$i].
$matches['6'][$i].'</a>'.
$period, $str);
}
}
}
if ($type != 'url')
{
if (preg_match_all("/([a-zA-Z0-9_\.\-\+]+)#([a-zA-Z0-9\-]+)\.([a-zA-Z0-9\-\.]*)/i", $str, $matches))
{
for ($i = 0; $i < count($matches['0']); $i++)
{
$period = '';
if (preg_match("|\.$|", $matches['3'][$i]))
{
$period = '.';
$matches['3'][$i] = substr($matches['3'][$i], 0, -1);
}
$str = str_replace($matches['0'][$i], safe_mailto($matches['1'][$i].'#'.$matches['2'][$i].'.'.$matches['3'][$i]).$period, $str);
}
}
}
return $str;
}
}

str_replace not replacing correctly

I have the following, simple code:
$text = str_replace($f,''.$u.'',$text);
where $f is a URL, like http://google.ca, and $u is the name of the URL (my function names it 'Google').
My problem is, is if I give my function a string like
http://google.ca http://google.ca
it returns
Google" target="_blank">Google</a> Google" target="_blank">Google</a>
Which obviously isn't what I want. I want my function to echo out two separate, clickable links. But str_replace is replacing the first occurrence (it's in a loop to loop through all the found URLs), and that first occurrence has already been replaced.
How can I tell str_replace to ignore that specific one, and move onto the next? The string given is user input, so I can't just give it a static offset or anything with substr, which I have tried.
Thank you!
One way, though it's a bit of a kludge: you can use a temporary marker that (hopefully) won't appear in the string:
$text = str_replace ($f, '' . $u . '',
$text);
That way, the first substitution won't be found again. Then at the end (after you've processed the entire line), simply change the markers back:
$text = str_replace ('XYZZYPLUGH', $f, $text);
Why not pass your function an array of URLs, instead?
function makeLinks(array $urls) {
$links = array();
foreach ($urls as $url) {
list($desc, $href) = $url;
// If $href is based on user input, watch out for "javascript: foo;" and other XSS attacks here.
$links[] = '<a href="' . htmlentities($href) . '" target="_blank">'
. htmlentities($desc)
. '</a>';
}
return $links; // or implode('', $links) if you want a string instead
}
$urls = array(
array('Google', 'http://google.ca'),
array('Google', 'http://google.ca')
);
var_dump(makeLinks($urls));
If i understand your problem correctly, you can just use the function sprintf. I think something like this should work:
function urlize($name, $url)
{
// Make sure the url is formatted ok
if (!filter_var($url, FILTER_VALIDATE_URL))
return '';
$name = htmlspecialchars($name, ENT_QUOTES);
$url = htmlspecialchars($url, ENT_QUOTES);
return sprintf('%s', $url, $name);
}
echo urlize('my name', 'http://www.domain.com');
// my name
I havent test it though.
I suggest you to use preg_replace instead of str_replace here like this code:
$f = 'http://google.ca';
$u = 'Google';
$text='http://google.ca http://google.ca';
$regex = '~(?<!<a href=")' . preg_quote($f) . '~'; // negative lookbehind
$text = preg_replace($regex, ''.$u.'', $text);
echo $text . "\n";
$text = preg_replace($regex, ''.$u.'', $text);
echo $text . "\n";
OUTPUT:
Google Google
Google Google

How to wrap user mentions in a HTML link on PHP?

Im working on a commenting web application and i want to parse user mentions (#user) as links. Here is what I have so far:
$text = "#user is not #user1 but #user3 is #user4";
$pattern = "/\#(\w+)/";
preg_match_all($pattern,$text,$matches);
if($matches){
$sql = "SELECT *
FROM users
WHERE username IN ('" .implode("','",$matches[1]). "')
ORDER BY LENGTH(username) DESC";
$users = $this->getQuery($sql);
foreach($users as $i=>$u){
$text = str_replace("#{$u['username']}",
"<a href='#' class='ct-userLink' rel='{$u['user_id']}'>#{$u['username']}</a> ", $text);
}
$echo $text;
}
The problem is that user links are being overlapped:
<a rel="11327" class="ct-userLink" href="#">
<a rel="21327" class="ct-userLink" href="#">#user</a>1
</a>
How can I avoid links overlapping?
Answer Update
Thanks to the answer picked, this is how my new foreach loop looks like:
foreach($users as $i=>$u){
$text = preg_replace("/#".$u['username']."\b/",
"<a href='#' title='{$u['user_id']}'>#{$u['username']}</a> ", $text);
}
Problem seems to be that some usernames can encompass other usernames. So you replace user1 properly with <a>user1</a>. Then, user matches and replaces with <a><a>user</a>1</a>. My suggestion is to change your string replace to a regex with a word boundary, \b, that is required after the username.
The Twitter widget has JavaScript code to do this. I ported it to PHP in my WordPress plugin. Here's the relevant part:
function format_tweet($tweet) {
// add #reply links
$tweet_text = preg_replace("/\B[#@]([a-zA-Z0-9_]{1,20})/",
"#<a class='atreply' href='http://twitter.com/$1'>$1</a>",
$tweet);
// make other links clickable
$matches = array();
$link_info = preg_match_all("/\b(((https*\:\/\/)|www\.)[^\"\']+?)(([!?,.\)]+)?(\s|$))/",
$tweet_text, $matches, PREG_SET_ORDER);
if ($link_info) {
foreach ($matches as $match) {
$http = preg_match("/w/", $match[2]) ? 'http://' : '';
$tweet_text = str_replace($match[0],
"<a href='" . $http . $match[1] . "'>" . $match[1] . "</a>" . $match[4],
$tweet_text);
}
}
return $tweet_text;
}
instead of parsing for '#user' parse for '#user ' (with space in the end) or ' #user ' to even avoid wrong parsing of email addresses (eg: mailaddress#user.com) maybe ' #user: ' should also be allowed. this will only work, if usernames have no whitespaces...
You can go for a custom str replace function which stops at first replace.. Something like ...
function str_replace_once($needle , $replace , $haystack){
$pos = strpos($haystack, $needle);
if ($pos === false) {
// Nothing found
return $haystack;
}
return substr_replace($haystack, $replace, $pos, strlen($needle));
}
And use it like:
foreach($users as $i=>$u){
$text = str_replace_once("#{$u['username']}",
"<a href='#' class='ct-userLink' rel='{$u['user_id']}'>#{$u['username']}</a> ", $text);
}
You shouldn’t replace one certain user mention at a time but all at once. You could use preg_split to do that:
// split text at mention while retaining user name
$parts = preg_split("/#(\w+)/", $text, -1, PREG_SPLIT_DELIM_CAPTURE);
$n = count($parts);
// $n is always an odd number; 1 means no match found
if ($n > 1) {
// collect user names
$users = array();
for ($i=1; $i<$n; $i+=2) {
$users[$parts[$i]] = '';
}
// get corresponding user information
$sql = "SELECT *
FROM users
WHERE username IN ('" .implode("','", array_keys($users)). "')";
$users = array();
foreach ($this->getQuery($sql) as $user) {
$users[$user['username']] = $user;
}
// replace mentions
for ($i=1; $i<$n; $i+=2) {
$u = $users[$parts[$i]];
$parts[$i] = "<a href='#' class='ct-userLink' rel='{$u['user_id']}'>#{$u['username']}</a>";
}
// put everything back together
$text = implode('', $parts);
}
I like dnl solution of parsing ' #user', but maybe is not suitable for you.
Anyway, did you try to use strip_tags function to remove the anchor tags? That way you have the string without the links, and you can parse it building the links again.
strip_tags

PHP Remove URL from string

If I have a string that contains a url (for examples sake, we'll call it $url) such as;
$url = "Here is a funny site http://www.tunyurl.com/34934";
How do i remove the URL from the string?
Difficulty is, urls might also show up without the http://, such as ;
$url = "Here is another funny site www.tinyurl.com/55555";
There is no HTML present. How would i start a search if http or www exists, then remove the text/numbers/symbols until the first space?
I re-read the question, here is a function that would work as intended:
function cleaner($url) {
$U = explode(' ',$url);
$W =array();
foreach ($U as $k => $u) {
if (stristr($u,'http') || (count(explode('.',$u)) > 1)) {
unset($U[$k]);
return cleaner( implode(' ',$U));
}
}
return implode(' ',$U);
}
$url = "Here is another funny site www.tinyurl.com/55555 and http://www.tinyurl.com/55555 and img.hostingsite.com/badpic.jpg";
echo "Cleaned: " . cleaner($url);
Edit #2/#3 (I must be bored). Here is a version that verifies there is a TLD within the URL:
function containsTLD($string) {
preg_match(
"/(AC($|\/)|\.AD($|\/)|\.AE($|\/)|\.AERO($|\/)|\.AF($|\/)|\.AG($|\/)|\.AI($|\/)|\.AL($|\/)|\.AM($|\/)|\.AN($|\/)|\.AO($|\/)|\.AQ($|\/)|\.AR($|\/)|\.ARPA($|\/)|\.AS($|\/)|\.ASIA($|\/)|\.AT($|\/)|\.AU($|\/)|\.AW($|\/)|\.AX($|\/)|\.AZ($|\/)|\.BA($|\/)|\.BB($|\/)|\.BD($|\/)|\.BE($|\/)|\.BF($|\/)|\.BG($|\/)|\.BH($|\/)|\.BI($|\/)|\.BIZ($|\/)|\.BJ($|\/)|\.BM($|\/)|\.BN($|\/)|\.BO($|\/)|\.BR($|\/)|\.BS($|\/)|\.BT($|\/)|\.BV($|\/)|\.BW($|\/)|\.BY($|\/)|\.BZ($|\/)|\.CA($|\/)|\.CAT($|\/)|\.CC($|\/)|\.CD($|\/)|\.CF($|\/)|\.CG($|\/)|\.CH($|\/)|\.CI($|\/)|\.CK($|\/)|\.CL($|\/)|\.CM($|\/)|\.CN($|\/)|\.CO($|\/)|\.COM($|\/)|\.COOP($|\/)|\.CR($|\/)|\.CU($|\/)|\.CV($|\/)|\.CX($|\/)|\.CY($|\/)|\.CZ($|\/)|\.DE($|\/)|\.DJ($|\/)|\.DK($|\/)|\.DM($|\/)|\.DO($|\/)|\.DZ($|\/)|\.EC($|\/)|\.EDU($|\/)|\.EE($|\/)|\.EG($|\/)|\.ER($|\/)|\.ES($|\/)|\.ET($|\/)|\.EU($|\/)|\.FI($|\/)|\.FJ($|\/)|\.FK($|\/)|\.FM($|\/)|\.FO($|\/)|\.FR($|\/)|\.GA($|\/)|\.GB($|\/)|\.GD($|\/)|\.GE($|\/)|\.GF($|\/)|\.GG($|\/)|\.GH($|\/)|\.GI($|\/)|\.GL($|\/)|\.GM($|\/)|\.GN($|\/)|\.GOV($|\/)|\.GP($|\/)|\.GQ($|\/)|\.GR($|\/)|\.GS($|\/)|\.GT($|\/)|\.GU($|\/)|\.GW($|\/)|\.GY($|\/)|\.HK($|\/)|\.HM($|\/)|\.HN($|\/)|\.HR($|\/)|\.HT($|\/)|\.HU($|\/)|\.ID($|\/)|\.IE($|\/)|\.IL($|\/)|\.IM($|\/)|\.IN($|\/)|\.INFO($|\/)|\.INT($|\/)|\.IO($|\/)|\.IQ($|\/)|\.IR($|\/)|\.IS($|\/)|\.IT($|\/)|\.JE($|\/)|\.JM($|\/)|\.JO($|\/)|\.JOBS($|\/)|\.JP($|\/)|\.KE($|\/)|\.KG($|\/)|\.KH($|\/)|\.KI($|\/)|\.KM($|\/)|\.KN($|\/)|\.KP($|\/)|\.KR($|\/)|\.KW($|\/)|\.KY($|\/)|\.KZ($|\/)|\.LA($|\/)|\.LB($|\/)|\.LC($|\/)|\.LI($|\/)|\.LK($|\/)|\.LR($|\/)|\.LS($|\/)|\.LT($|\/)|\.LU($|\/)|\.LV($|\/)|\.LY($|\/)|\.MA($|\/)|\.MC($|\/)|\.MD($|\/)|\.ME($|\/)|\.MG($|\/)|\.MH($|\/)|\.MIL($|\/)|\.MK($|\/)|\.ML($|\/)|\.MM($|\/)|\.MN($|\/)|\.MO($|\/)|\.MOBI($|\/)|\.MP($|\/)|\.MQ($|\/)|\.MR($|\/)|\.MS($|\/)|\.MT($|\/)|\.MU($|\/)|\.MUSEUM($|\/)|\.MV($|\/)|\.MW($|\/)|\.MX($|\/)|\.MY($|\/)|\.MZ($|\/)|\.NA($|\/)|\.NAME($|\/)|\.NC($|\/)|\.NE($|\/)|\.NET($|\/)|\.NF($|\/)|\.NG($|\/)|\.NI($|\/)|\.NL($|\/)|\.NO($|\/)|\.NP($|\/)|\.NR($|\/)|\.NU($|\/)|\.NZ($|\/)|\.OM($|\/)|\.ORG($|\/)|\.PA($|\/)|\.PE($|\/)|\.PF($|\/)|\.PG($|\/)|\.PH($|\/)|\.PK($|\/)|\.PL($|\/)|\.PM($|\/)|\.PN($|\/)|\.PR($|\/)|\.PRO($|\/)|\.PS($|\/)|\.PT($|\/)|\.PW($|\/)|\.PY($|\/)|\.QA($|\/)|\.RE($|\/)|\.RO($|\/)|\.RS($|\/)|\.RU($|\/)|\.RW($|\/)|\.SA($|\/)|\.SB($|\/)|\.SC($|\/)|\.SD($|\/)|\.SE($|\/)|\.SG($|\/)|\.SH($|\/)|\.SI($|\/)|\.SJ($|\/)|\.SK($|\/)|\.SL($|\/)|\.SM($|\/)|\.SN($|\/)|\.SO($|\/)|\.SR($|\/)|\.ST($|\/)|\.SU($|\/)|\.SV($|\/)|\.SY($|\/)|\.SZ($|\/)|\.TC($|\/)|\.TD($|\/)|\.TEL($|\/)|\.TF($|\/)|\.TG($|\/)|\.TH($|\/)|\.TJ($|\/)|\.TK($|\/)|\.TL($|\/)|\.TM($|\/)|\.TN($|\/)|\.TO($|\/)|\.TP($|\/)|\.TR($|\/)|\.TRAVEL($|\/)|\.TT($|\/)|\.TV($|\/)|\.TW($|\/)|\.TZ($|\/)|\.UA($|\/)|\.UG($|\/)|\.UK($|\/)|\.US($|\/)|\.UY($|\/)|\.UZ($|\/)|\.VA($|\/)|\.VC($|\/)|\.VE($|\/)|\.VG($|\/)|\.VI($|\/)|\.VN($|\/)|\.VU($|\/)|\.WF($|\/)|\.WS($|\/)|\.XN--0ZWM56D($|\/)|\.XN--11B5BS3A9AJ6G($|\/)|\.XN--80AKHBYKNJ4F($|\/)|\.XN--9T4B11YI5A($|\/)|\.XN--DEBA0AD($|\/)|\.XN--G6W251D($|\/)|\.XN--HGBK6AJ7F53BBA($|\/)|\.XN--HLCJ6AYA9ESC7A($|\/)|\.XN--JXALPDLP($|\/)|\.XN--KGBECHTV($|\/)|\.XN--ZCKZAH($|\/)|\.YE($|\/)|\.YT($|\/)|\.YU($|\/)|\.ZA($|\/)|\.ZM($|\/)|\.ZW)/i",
$string,
$M);
$has_tld = (count($M) > 0) ? true : false;
return $has_tld;
}
function cleaner($url) {
$U = explode(' ',$url);
$W =array();
foreach ($U as $k => $u) {
if (stristr($u,".")) { //only preg_match if there is a dot
if (containsTLD($u) === true) {
unset($U[$k]);
return cleaner( implode(' ',$U));
}
}
}
return implode(' ',$U);
}
$url = "Here is another funny site badurl.badone somesite.ca/worse.jpg but this badsite.com www.tinyurl.com/55555 and http://www.tinyurl.com/55555 and img.hostingsite.com/badpic.jpg";
echo "Cleaned: " . cleaner($url);
returns:
Cleaned: Here is another funny site badurl.badone but this and and
$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|$!:,.;]*[A-Z0-9+&##\/%=~_|$]/i', '', $string);
Parsing text for URLs is hard and looking for pre-existing, heavily tested code that already does this for you would be better than writing your own code and missing edge cases. For example, I would take a look at the process in Django's urlize, which wraps URLs in anchors. You could port it over to PHP, and--instead of wrapping URLs in an anchor--just delete them from the text.
thanks mike,
update a bit, it return notice error,
'/\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|$!:,.;]*[A-Z0-9+&##\/%=~_|$]/i'
$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|$!:,.;]*[A-Z0-9+&##\/%=~_|$]/i', '', $string);
$url = "Here is a funny site http://www.tunyurl.com/34934";
$replace = 'http www .com .org .net';
$with = '';
$clean_url = clean($url,$replace,$with);
echo $clean_url;
function clean($url,$replace,$with) {
$replace = explode(" ",$replace);
$new_string = '';
$check = explode(" ",$url);
foreach($check AS $key => $value) {
foreach($replace AS $key2 => $value2 ) {
if (-1 < strpos( strtolower($value), strtolower($value2) ) ) {
$value = $with;
break;
}
}
$new_string .= " ".$value;
}
return $new_string;
}
You would need to write a regular expression to extract out the urls.

Categories