I need to process a text before showing it on a page of my website. I have to turn every URL that belongs to the same website (not URLs of other websites) into a link by wrapping it in an <a> tag. The problem is the href attribute: I need to put the correct URL inside it. I go through the whole text and, whenever I find a URL, I check whether it contains the substring "http://". If it does not, I have to prepend it when building the href value. I made some attempts, but none of them are working yet :( . Any idea how I can do this?
My function is below:
$string = "This is a url from my website: http://www.mysite.com.br and I have a article interesting there, the link is http://www.mysite.com.br/articles/what-is-psychology/205967. I need that the secure url link works too https://www.mysite.com.br/articles/what-is-psychology/205967. the following urls must be valid too: www.mysite.com.br and mysite.com.br";
function urlMySite($string){
    $verifyUrl = '';
    $urls = array("mysite.com.br");
    $text = explode(" ", $string);
    $alltext = "";
    for($i = 0; $i < count($text); $i++){
        foreach ($urls as $value){
            $pos = strpos($text[$i], $value);
            if (!($pos === false)){
                $verifyUrl = " <a href='".$text[$i]."' target='_blank'>".$text[$i]."</a> ";
                if (strpos($verifyUrl, 'http://') !== true) {
                    $verifyUrl = " <a href='http://".$text[$i]."' target='_blank'>".$text[$i]."</a> ";
                }
                $alltext .= $verifyUrl;
            } else {
                $alltext .= " ".$text[$i]." ";
            }
        }
    }
    return $alltext;
}
You should use preg_match_all to find all occurrences of the URL and replace each of the matches with a clickable link.
You could use this function:
function augmentText($text){
    // the "-" is kept at the end of the character class so it is treated as a literal dash
    $pattern = "~(https?|file|ftp)://[a-z0-9./&?:=%_-]*~i";
    preg_match_all($pattern, $text, $matches);
    if( count($matches[0]) > 0 ){
        foreach($matches[0] as $match){
            $text = str_replace($match, "<a href='" . $match . "' target='_blank'>" . $match . "</a>", $text);
        }
    }
    return $text;
}
Change the regular expression pattern to match only the URLs you want to make clickable.
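For example, if the only domain you care about is mysite.com.br (taken from the question; everything else here is just a sketch, and trailing punctuation right after a URL is not handled), the pattern and replacement could look something like this:
// Sketch only: match mysite.com.br with or without a scheme or "www.",
// and add "http://" to the href when the scheme is missing.
function linkMySiteUrls($text){
    $pattern = "~(?:https?://)?(?:www\.)?mysite\.com\.br[^\s<]*~i";
    return preg_replace_callback($pattern, function ($m) {
        $url  = $m[0];
        $href = (stripos($url, 'http') === 0) ? $url : 'http://' . $url;
        return "<a href='" . $href . "' target='_blank'>" . $url . "</a>";
    }, $text);
}
echo linkMySiteUrls($string); // should wrap only the mysite.com.br URLs from the sample text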
Good luck
Related
I've been searching around for this, but all I could find were broken scripts; plus, I might have a method that is quite simple.
I'm trying to use a for () loop for this one.
This is what I've got:
<?php
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$makerepstring = "Here is a link: http://youtube.com and another: http://google.com";
if(preg_match_all($reg_exUrl, $makerepstring, $url)) {
    // make the url into link
    for($i=0; $i < count(array_keys($url[0])); $i++){
        $makerepstring = preg_replace($reg_exUrl, '<a href="'.$url[0][$i].'" target="_blank" rel="nofollow">'.$url[0][$i].'</a> ', $makerepstring);
    }
}
echo $makerepstring;
?>
However this fails brutally for some reason I can't comprehend.
The output from echo $makerepstring; is as follows (from the source code):
http://google.com " target="_blank" rel="nofollow">http://google.com </a> http://google.com " target="_blank" rel="nofollow">http://google.com </a>
I'd really like to do it with a for()... Could somebody try and figure out how to get this to work with me?
Thanks in advance!
/J
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$makerepstring = "http://youtube.com http://google.com";
$url = array();
$instances = preg_match_all($reg_exUrl, $makerepstring, $url);
if ($instances > 0) {
    // make the url into link
    for($i=0; $i < count(array_keys($url[0])); $i++){
        $makerepstring = preg_replace($reg_exUrl, '<a href="'.$url[0][$i].'" target="_blank" rel="nofollow">'.$url[0][$i].'</a> ', $makerepstring);
        /*echo $url[0][$i]."<br />";
        echo $i."<br />";
        print_r($url);
        echo "<br />";*/
    }
}
echo $makerepstring;
This does not work either, although I'm not quite sure how you meant I should do this.
EDIT:
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$makeurl = "http://google.com http://youtube.com";
if(preg_match($reg_exUrl, $makeurl, $url)) {
    echo preg_replace($reg_exUrl, '<a href="'.$url[0].'" target="_blank" rel="nofollow">'.$url[0].'</a> ', $makeurl);
} else {
    echo $makeurl;
}
Would give:
http://google.com http://google.com
That's not how preg_match_all works. http://php.net/manual/en/function.preg-match-all.php shows you that the matches go into an array that you pass along, and the function returns the number of matches instead. So first call
...
$matches = array();
$instances = preg_match_all(..., $matches);
if ($instances > 0) {
// and then your code
}
...
And then iterate over the $matches array, which now has content.
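Putting that together into one runnable piece (reusing your pattern and sample string; the anchor attributes are just the ones from your earlier attempt, so treat this as a sketch):
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$makerepstring = "Here is a link: http://youtube.com and another: http://google.com";

$matches = array();
$instances = preg_match_all($reg_exUrl, $makerepstring, $matches);
if ($instances > 0) {
    // replace each distinct matched URL once instead of running preg_replace again
    foreach (array_unique($matches[0]) as $match) {
        $link = '<a href="' . $match . '" target="_blank" rel="nofollow">' . $match . '</a>';
        $makerepstring = str_replace($match, $link, $makerepstring);
    }
}
echo $makerepstring;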
You are performing the match twice:
in the preg_match_all function
then again in preg_replace, which should not happen here
Use string concatenation instead:
$makerepstring = "Here is a link: http://youtube.com and another: http://google.com";
$new_str = '';
if(preg_match_all($reg_exUrl, $makerepstring, $url)) {
    var_dump($url[0]);
    // make the url into link
    for($i=0; $i < count(array_keys($url[0])); $i++){
        $new_str .= '<a href="'.$url[0][$i].'" target="_blank" rel="nofollow">'.$url[0][$i].'</a> ';
    }
}
echo $new_str;
I am using the following code to add links to urls in text...
if (preg_match_all("#((http(s?)://)|www\.)?([a-zA-Z0-9\-\.])(\w+[^\s\)\<]+)#i", $str, $matches))
{
    ?><pre><?php
    print_r($matches);
    ?></pre><?php
    for ($i = 0; $i < count($matches[0]); $i++)
    {
        $url = $matches[0][$i];
        $parsed = parse_url($url);
        $prefix = '';
        if (!isset($parsed["scheme"])){
            $prefix = 'http://';
        }
        $url = $prefix.$url;
        $replace = '<a href="'.$url.'">'.$matches[0][$i].'</a>';
        $str = str_replace($matches[0][$i], '<a href="'.$url.'">'.$matches[0][$i].'</a>', $str);
    }
}
The problem comes when the same URL appears twice anywhere in the text.
For example:
google.com text text google.com
It will add a link to the first one, then find google.com again, which is now inside the link it just inserted, and try to add another link in there.
How can I make sure it adds the links separately, without problems?
You can use preg_replace_callback() to reliably work on individual matches.
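A minimal sketch of what that could look like (the pattern here is a rough, illustrative one, not the exact pattern from the question):
$str = "google.com text text google.com";

// rough pattern: scheme and "www." are optional, but at least one dot is required
$pattern = '#\b(?:https?://)?(?:www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)+(?:/[^\s<]*)?#i';

// preg_replace_callback visits each match of the original text exactly once,
// so the anchors it inserts are never scanned again and never re-replaced
$str = preg_replace_callback($pattern, function ($m) {
    $url  = $m[0];
    $href = (stripos($url, 'http') === 0) ? $url : 'http://' . $url;
    return '<a href="' . $href . '">' . $url . '</a>';
}, $str);

echo $str;
// <a href="http://google.com">google.com</a> text text <a href="http://google.com">google.com</a>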
I have the following, simple code:
$text = str_replace($f, '<a href="'.$f.'" target="_blank">'.$u.'</a>', $text);
where $f is a URL, like http://google.ca, and $u is the name of the URL (my function names it 'Google').
My problem is, if I give my function a string like
http://google.ca http://google.ca
it returns
Google" target="_blank">Google</a> Google" target="_blank">Google</a>
Which obviously isn't what I want. I want my function to echo out two separate, clickable links. But str_replace is replacing the first occurrence (it's in a loop to loop through all the found URLs), and that first occurrence has already been replaced.
How can I tell str_replace to ignore that specific one, and move onto the next? The string given is user input, so I can't just give it a static offset or anything with substr, which I have tried.
Thank you!
One way, though it's a bit of a kludge: you can use a temporary marker that (hopefully) won't appear in the string:
$text = str_replace ($f, '<a href="XYZZYPLUGH" target="_blank">' . $u . '</a>', $text);
That way, the first substitution won't be found again. Then at the end (after you've processed the entire line), simply change the markers back:
$text = str_replace ('XYZZYPLUGH', $f, $text);
Why not pass your function an array of URLs, instead?
function makeLinks(array $urls) {
$links = array();
foreach ($urls as $url) {
list($desc, $href) = $url;
// If $href is based on user input, watch out for "javascript: foo;" and other XSS attacks here.
$links[] = '<a href="' . htmlentities($href) . '" target="_blank">'
. htmlentities($desc)
. '</a>';
}
return $links; // or implode('', $links) if you want a string instead
}
$urls = array(
array('Google', 'http://google.ca'),
array('Google', 'http://google.ca')
);
var_dump(makeLinks($urls));
If I understand your problem correctly, you can just use the function sprintf. I think something like this should work:
function urlize($name, $url)
{
    // Make sure the url is formatted ok
    if (!filter_var($url, FILTER_VALIDATE_URL))
        return '';
    $name = htmlspecialchars($name, ENT_QUOTES);
    $url = htmlspecialchars($url, ENT_QUOTES);
    return sprintf('<a href="%s" target="_blank">%s</a>', $url, $name);
}
echo urlize('my name', 'http://www.domain.com');
// <a href="http://www.domain.com" target="_blank">my name</a>
I haven't tested it, though.
I suggest you use preg_replace instead of str_replace here, like this:
$f = 'http://google.ca';
$u = 'Google';
$text='http://google.ca http://google.ca';
$regex = '~(?<!<a href=")' . preg_quote($f) . '~'; // negative lookbehind
$text = preg_replace($regex, '<a href="'.$f.'">'.$u.'</a>', $text);
echo $text . "\n";
$text = preg_replace($regex, '<a href="'.$f.'">'.$u.'</a>', $text);
echo $text . "\n";
OUTPUT:
<a href="http://google.ca">Google</a> <a href="http://google.ca">Google</a>
<a href="http://google.ca">Google</a> <a href="http://google.ca">Google</a>
I'm working on a commenting web application and I want to parse user mentions (#user) as links. Here is what I have so far:
$text = "#user is not #user1 but #user3 is #user4";
$pattern = "/\#(\w+)/";
preg_match_all($pattern,$text,$matches);
if($matches){
    $sql = "SELECT *
            FROM users
            WHERE username IN ('" .implode("','",$matches[1]). "')
            ORDER BY LENGTH(username) DESC";
    $users = $this->getQuery($sql);
    foreach($users as $i=>$u){
        $text = str_replace("#{$u['username']}",
            "<a href='#' class='ct-userLink' rel='{$u['user_id']}'>#{$u['username']}</a> ", $text);
    }
    echo $text;
}
The problem is that the user links end up nested inside one another:
<a rel="11327" class="ct-userLink" href="#">
<a rel="21327" class="ct-userLink" href="#">#user</a>1
</a>
How can I avoid links overlapping?
Answer Update
Thanks to the picked answer, this is what my new foreach loop looks like:
foreach($users as $i=>$u){
$text = preg_replace("/#".$u['username']."\b/",
"<a href='#' title='{$u['user_id']}'>#{$u['username']}</a> ", $text);
}
The problem seems to be that some usernames contain other usernames. So you replace user1 properly with <a>user1</a>. Then user matches inside that and gets replaced as well, producing <a><a>user</a>1</a>. My suggestion is to change your string replace to a regex with a word boundary, \b, required after the username.
The Twitter widget has JavaScript code to do this. I ported it to PHP in my WordPress plugin. Here's the relevant part:
function format_tweet($tweet) {
// add #reply links
$tweet_text = preg_replace("/\B[#@]([a-zA-Z0-9_]{1,20})/",
"#<a class='atreply' href='http://twitter.com/$1'>$1</a>",
$tweet);
// make other links clickable
$matches = array();
$link_info = preg_match_all("/\b(((https*\:\/\/)|www\.)[^\"\']+?)(([!?,.\)]+)?(\s|$))/",
$tweet_text, $matches, PREG_SET_ORDER);
if ($link_info) {
foreach ($matches as $match) {
$http = preg_match("/w/", $match[2]) ? 'http://' : '';
$tweet_text = str_replace($match[0],
"<a href='" . $http . $match[1] . "'>" . $match[1] . "</a>" . $match[4],
$tweet_text);
}
}
return $tweet_text;
}
Instead of parsing for '#user', parse for '#user ' (with a space at the end) or ' #user ' to also avoid wrongly matching email addresses (e.g. mailaddress#user.com); maybe ' #user: ' should be allowed as well. This will only work if usernames contain no whitespace...
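A rough sketch of that suggestion (purely illustrative; it assumes mentions are delimited by whitespace, simple punctuation, or the ends of the string):
$text = "#user is not #user1 but #user3 is #user4";

// (^|\s) insists on a space (or the start of the string) before the mention, and the
// lookahead insists on whitespace, simple punctuation or the end of the string after it,
// so "#user" never matches inside "#user1" or inside something like mailaddress#user.com
$pattern = '/(^|\s)#(\w+)(?=[\s.,:;!?]|$)/';
preg_match_all($pattern, $text, $matches);
print_r($matches[2]); // user, user1, user3, user4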
You can go for a custom str_replace function which stops at the first replacement. Something like:
function str_replace_once($needle , $replace , $haystack){
$pos = strpos($haystack, $needle);
if ($pos === false) {
// Nothing found
return $haystack;
}
return substr_replace($haystack, $replace, $pos, strlen($needle));
}
And use it like:
foreach($users as $i=>$u){
$text = str_replace_once("#{$u['username']}",
"<a href='#' class='ct-userLink' rel='{$u['user_id']}'>#{$u['username']}</a> ", $text);
}
You shouldn't replace one particular user mention at a time but all of them at once. You could use preg_split to do that:
// split text at mention while retaining user name
$parts = preg_split("/#(\w+)/", $text, -1, PREG_SPLIT_DELIM_CAPTURE);
$n = count($parts);
// $n is always an odd number; 1 means no match found
if ($n > 1) {
// collect user names
$users = array();
for ($i=1; $i<$n; $i+=2) {
$users[$parts[$i]] = '';
}
// get corresponding user information
$sql = "SELECT *
FROM users
WHERE username IN ('" .implode("','", array_keys($users)). "')";
$users = array();
foreach ($this->getQuery($sql) as $user) {
$users[$user['username']] = $user;
}
// replace mentions
for ($i=1; $i<$n; $i+=2) {
$u = $users[$parts[$i]];
$parts[$i] = "<a href='#' class='ct-userLink' rel='{$u['user_id']}'>#{$u['username']}</a>";
}
// put everything back together
$text = implode('', $parts);
}
I like dnl's solution of parsing ' #user', but maybe it is not suitable for you.
Anyway, did you try using the strip_tags function to remove the anchor tags? That way you have the string without the links, and you can parse it and build the links again.
strip_tags
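A minimal sketch of that strip_tags idea (the markup and the pattern here are illustrative):
$html = "text with <a href='#' class='ct-userLink' rel='11327'>#user</a>1 in it";

// 1) drop the existing anchors, leaving only the plain text
$plain = strip_tags($html);   // "text with #user1 in it"

// 2) re-link the mentions on the clean text; \w+ is greedy, so #user1 is
//    captured as a whole and never split into #user plus a stray "1"
$linked = preg_replace('/#(\w+)/', "<a href='#' class='ct-userLink'>#$1</a>", $plain);
echo $linked;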
If I have a string that contains a URL (for example's sake, we'll call it $url), such as:
$url = "Here is a funny site http://www.tunyurl.com/34934";
how do I remove the URL from the string?
The difficulty is that URLs might also show up without the http://, such as:
$url = "Here is another funny site www.tinyurl.com/55555";
There is no HTML present. How would I search for whether http or www exists, then remove the text/numbers/symbols up to the first space?
I re-read the question; here is a function that should work as intended:
function cleaner($url) {
$U = explode(' ',$url);
$W =array();
foreach ($U as $k => $u) {
if (stristr($u,'http') || (count(explode('.',$u)) > 1)) {
unset($U[$k]);
return cleaner( implode(' ',$U));
}
}
return implode(' ',$U);
}
$url = "Here is another funny site www.tinyurl.com/55555 and http://www.tinyurl.com/55555 and img.hostingsite.com/badpic.jpg";
echo "Cleaned: " . cleaner($url);
Edit #2/#3 (I must be bored). Here is a version that verifies there is a TLD within the URL:
function containsTLD($string) {
preg_match(
"/(AC($|\/)|\.AD($|\/)|\.AE($|\/)|\.AERO($|\/)|\.AF($|\/)|\.AG($|\/)|\.AI($|\/)|\.AL($|\/)|\.AM($|\/)|\.AN($|\/)|\.AO($|\/)|\.AQ($|\/)|\.AR($|\/)|\.ARPA($|\/)|\.AS($|\/)|\.ASIA($|\/)|\.AT($|\/)|\.AU($|\/)|\.AW($|\/)|\.AX($|\/)|\.AZ($|\/)|\.BA($|\/)|\.BB($|\/)|\.BD($|\/)|\.BE($|\/)|\.BF($|\/)|\.BG($|\/)|\.BH($|\/)|\.BI($|\/)|\.BIZ($|\/)|\.BJ($|\/)|\.BM($|\/)|\.BN($|\/)|\.BO($|\/)|\.BR($|\/)|\.BS($|\/)|\.BT($|\/)|\.BV($|\/)|\.BW($|\/)|\.BY($|\/)|\.BZ($|\/)|\.CA($|\/)|\.CAT($|\/)|\.CC($|\/)|\.CD($|\/)|\.CF($|\/)|\.CG($|\/)|\.CH($|\/)|\.CI($|\/)|\.CK($|\/)|\.CL($|\/)|\.CM($|\/)|\.CN($|\/)|\.CO($|\/)|\.COM($|\/)|\.COOP($|\/)|\.CR($|\/)|\.CU($|\/)|\.CV($|\/)|\.CX($|\/)|\.CY($|\/)|\.CZ($|\/)|\.DE($|\/)|\.DJ($|\/)|\.DK($|\/)|\.DM($|\/)|\.DO($|\/)|\.DZ($|\/)|\.EC($|\/)|\.EDU($|\/)|\.EE($|\/)|\.EG($|\/)|\.ER($|\/)|\.ES($|\/)|\.ET($|\/)|\.EU($|\/)|\.FI($|\/)|\.FJ($|\/)|\.FK($|\/)|\.FM($|\/)|\.FO($|\/)|\.FR($|\/)|\.GA($|\/)|\.GB($|\/)|\.GD($|\/)|\.GE($|\/)|\.GF($|\/)|\.GG($|\/)|\.GH($|\/)|\.GI($|\/)|\.GL($|\/)|\.GM($|\/)|\.GN($|\/)|\.GOV($|\/)|\.GP($|\/)|\.GQ($|\/)|\.GR($|\/)|\.GS($|\/)|\.GT($|\/)|\.GU($|\/)|\.GW($|\/)|\.GY($|\/)|\.HK($|\/)|\.HM($|\/)|\.HN($|\/)|\.HR($|\/)|\.HT($|\/)|\.HU($|\/)|\.ID($|\/)|\.IE($|\/)|\.IL($|\/)|\.IM($|\/)|\.IN($|\/)|\.INFO($|\/)|\.INT($|\/)|\.IO($|\/)|\.IQ($|\/)|\.IR($|\/)|\.IS($|\/)|\.IT($|\/)|\.JE($|\/)|\.JM($|\/)|\.JO($|\/)|\.JOBS($|\/)|\.JP($|\/)|\.KE($|\/)|\.KG($|\/)|\.KH($|\/)|\.KI($|\/)|\.KM($|\/)|\.KN($|\/)|\.KP($|\/)|\.KR($|\/)|\.KW($|\/)|\.KY($|\/)|\.KZ($|\/)|\.LA($|\/)|\.LB($|\/)|\.LC($|\/)|\.LI($|\/)|\.LK($|\/)|\.LR($|\/)|\.LS($|\/)|\.LT($|\/)|\.LU($|\/)|\.LV($|\/)|\.LY($|\/)|\.MA($|\/)|\.MC($|\/)|\.MD($|\/)|\.ME($|\/)|\.MG($|\/)|\.MH($|\/)|\.MIL($|\/)|\.MK($|\/)|\.ML($|\/)|\.MM($|\/)|\.MN($|\/)|\.MO($|\/)|\.MOBI($|\/)|\.MP($|\/)|\.MQ($|\/)|\.MR($|\/)|\.MS($|\/)|\.MT($|\/)|\.MU($|\/)|\.MUSEUM($|\/)|\.MV($|\/)|\.MW($|\/)|\.MX($|\/)|\.MY($|\/)|\.MZ($|\/)|\.NA($|\/)|\.NAME($|\/)|\.NC($|\/)|\.NE($|\/)|\.NET($|\/)|\.NF($|\/)|\.NG($|\/)|\.NI($|\/)|\.NL($|\/)|\.NO($|\/)|\.NP($|\/)|\.NR($|\/)|\.NU($|\/)|\.NZ($|\/)|\.OM($|\/)|\.ORG($|\/)|\.PA($|\/)|\.PE($|\/)|\.PF($|\/)|\.PG($|\/)|\.PH($|\/)|\.PK($|\/)|\.PL($|\/)|\.PM($|\/)|\.PN($|\/)|\.PR($|\/)|\.PRO($|\/)|\.PS($|\/)|\.PT($|\/)|\.PW($|\/)|\.PY($|\/)|\.QA($|\/)|\.RE($|\/)|\.RO($|\/)|\.RS($|\/)|\.RU($|\/)|\.RW($|\/)|\.SA($|\/)|\.SB($|\/)|\.SC($|\/)|\.SD($|\/)|\.SE($|\/)|\.SG($|\/)|\.SH($|\/)|\.SI($|\/)|\.SJ($|\/)|\.SK($|\/)|\.SL($|\/)|\.SM($|\/)|\.SN($|\/)|\.SO($|\/)|\.SR($|\/)|\.ST($|\/)|\.SU($|\/)|\.SV($|\/)|\.SY($|\/)|\.SZ($|\/)|\.TC($|\/)|\.TD($|\/)|\.TEL($|\/)|\.TF($|\/)|\.TG($|\/)|\.TH($|\/)|\.TJ($|\/)|\.TK($|\/)|\.TL($|\/)|\.TM($|\/)|\.TN($|\/)|\.TO($|\/)|\.TP($|\/)|\.TR($|\/)|\.TRAVEL($|\/)|\.TT($|\/)|\.TV($|\/)|\.TW($|\/)|\.TZ($|\/)|\.UA($|\/)|\.UG($|\/)|\.UK($|\/)|\.US($|\/)|\.UY($|\/)|\.UZ($|\/)|\.VA($|\/)|\.VC($|\/)|\.VE($|\/)|\.VG($|\/)|\.VI($|\/)|\.VN($|\/)|\.VU($|\/)|\.WF($|\/)|\.WS($|\/)|\.XN--0ZWM56D($|\/)|\.XN--11B5BS3A9AJ6G($|\/)|\.XN--80AKHBYKNJ4F($|\/)|\.XN--9T4B11YI5A($|\/)|\.XN--DEBA0AD($|\/)|\.XN--G6W251D($|\/)|\.XN--HGBK6AJ7F53BBA($|\/)|\.XN--HLCJ6AYA9ESC7A($|\/)|\.XN--JXALPDLP($|\/)|\.XN--KGBECHTV($|\/)|\.XN--ZCKZAH($|\/)|\.YE($|\/)|\.YT($|\/)|\.YU($|\/)|\.ZA($|\/)|\.ZM($|\/)|\.ZW)/i",
$string,
$M);
$has_tld = (count($M) > 0) ? true : false;
return $has_tld;
}
function cleaner($url) {
$U = explode(' ',$url);
$W =array();
foreach ($U as $k => $u) {
if (stristr($u,".")) { //only preg_match if there is a dot
if (containsTLD($u) === true) {
unset($U[$k]);
return cleaner( implode(' ',$U));
}
}
}
return implode(' ',$U);
}
$url = "Here is another funny site badurl.badone somesite.ca/worse.jpg but this badsite.com www.tinyurl.com/55555 and http://www.tinyurl.com/55555 and img.hostingsite.com/badpic.jpg";
echo "Cleaned: " . cleaner($url);
returns:
Cleaned: Here is another funny site badurl.badone but this and and
$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|$!:,.;]*[A-Z0-9+&##\/%=~_|$]/i', '', $string);
Parsing text for URLs is hard, and looking for pre-existing, heavily tested code that already does this would be better than writing your own and missing edge cases. For example, I would take a look at the approach in Django's urlize, which wraps URLs in anchors. You could port it to PHP and, instead of wrapping the URLs in an anchor, just delete them from the text.
Thanks mike, I updated it a bit since it returned a notice error:
'/\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|$!:,.;]*[A-Z0-9+&##\/%=~_|$]/i'
$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&##\/%?=~_|$!:,.;]*[A-Z0-9+&##\/%=~_|$]/i', '', $string);
$url = "Here is a funny site http://www.tunyurl.com/34934";
$replace = 'http www .com .org .net';
$with = '';
$clean_url = clean($url,$replace,$with);
echo $clean_url;
function clean($url, $replace, $with) {
    $replace = explode(" ", $replace);
    $new_string = '';
    $check = explode(" ", $url);
    foreach ($check AS $key => $value) {
        foreach ($replace AS $key2 => $value2) {
            // drop the word as soon as it contains one of the URL markers
            if (strpos(strtolower($value), strtolower($value2)) !== false) {
                $value = $with;
                break;
            }
        }
        $new_string .= " " . $value;
    }
    return $new_string;
}
You would need to write a regular expression to extract the URLs.
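For instance, a rough sketch (the pattern is illustrative and far from exhaustive) that extracts the URLs and then strips them from the string:
$url = "Here is a funny site http://www.tunyurl.com/34934 and another www.tinyurl.com/55555";

// crude pattern: an explicit scheme or www. prefix, or a bare domain with at least one dot
$pattern = '~(?:https?://|www\.)[^\s]+|[a-z0-9-]+(?:\.[a-z0-9-]+)+(?:/[^\s]*)?~i';

preg_match_all($pattern, $url, $matches);
print_r($matches[0]); // the extracted URLs

// remove them and tidy up the leftover whitespace
$clean = trim(preg_replace('/\s+/', ' ', preg_replace($pattern, '', $url)));
echo $clean; // "Here is a funny site and another"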