I'm using FQL to run this query:
"SELECT post_id, actor_id, description, created_time, message, type, attachment FROM stream WHERE source_id = $page_id AND type > 0 LIMIT 0,9"
It returns 10 items with a lot of information that I don't use, so I'd like some help and guidelines for stripping it down to something like:
{
"image" : '...',
"text" : '...',
"username" : '...',
"userurl" : '...',
"userpic" : '...'
}
Can someone give me some tips on reformatting a JSON object?
Thanks
It works for me in a way like this:
https://graph.facebook.com/fql?q=SELECT%20aid%2C%20owner%2C%20name%2C%20object_id%20FROM%20album%20WHERE%20aid%3D%2220531316728_324257%22
I figured it out myself: I created a simple PHP class to hold the variables I need, and the resulting objects are added to an array.
For anyone interested, here's the main part of the code.
Class:
class Item{
public $image;
public $link;
public $text;
public $username;
public $userurl;
public $userpic;
}
Being used:
$feed = json_decode($feed);
$data = array();
foreach ($feed->data as $post){
$item = new Item;
if ($post->attachment->media){
if (isset($post->attachment->media[0]->src)){
$item->image = $post->attachment->media[0]->src;
}else if (isset($post->attachment->media[0]->photo->images[1]->src)){
$item->image = $post->attachment->media[0]->photo->images[1]->src;
}
$item->link = $post->attachment->media[0]->href;
}
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$text = $post->message;
if(preg_match($reg_exUrl, $text, $url)){
$text = preg_replace($reg_exUrl, "".$url[0]." ", $text);
}
$item->text = $text;
$puser = number_format($post->actor_id,0,'','');
$url = "https://graph.facebook.com/$puser?fields=picture,name,link&access_token=$at";
$puser = file_get_contents($url);
$puser = json_decode($puser);
$item->userpic = $puser->picture->data->url;
$item->username = $puser->name;
$item->userurl = $puser->link;
$item->platform = "facebook";
$data[] = $item;
}
$this->response($data, 200);
}
Hope this helps anyone else in the same situation.
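For readers who only need the reshaping step, here is a minimal sketch that does the same kind of mapping with plain arrays instead of a class. The sample JSON and the slimPost() helper are made up for illustration; real stream rows carry more fields, and the per-user lookup from the code above is left out.

```php
<?php
// Hypothetical sample shaped like the FQL "stream" result used above.
$feed = '{"data":[{"post_id":"1_2","actor_id":42,"message":"Hello",
  "attachment":{"media":[{"src":"http://example.com/a.jpg","href":"http://example.com/a"}]}},
  {"post_id":"1_3","actor_id":43,"message":"No picture","attachment":{}}]}';

// slimPost() is an illustrative helper, not part of any Facebook SDK.
function slimPost(array $post): array
{
    // Fall back to an empty array when the post has no media attachment.
    $media = $post['attachment']['media'][0] ?? [];
    return [
        'image' => $media['src'] ?? null,
        'link'  => $media['href'] ?? null,
        'text'  => $post['message'] ?? '',
    ];
}

$decoded = json_decode($feed, true);   // true => associative arrays
$items   = array_map('slimPost', $decoded['data'] ?? []);
echo json_encode($items);
```

Keeping only the keys you list in the helper means any extra fields in the response are dropped automatically.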
I wrote three simple functions to scrape the title, description, and keywords of a simple HTML page.
This is the first function, which scrapes the title:
function getPageTitle ($url)
{
$content = $url;
if (eregi("<title>(.*)</title>", $content, $array)) {
$title = $array[1];
return $title;
}
}
and it works fine.
These are the two functions that scrape the description and keywords, and they are not working:
function getPageKeywords($url)
{
$content = $url;
if ( preg_match('/<meta[\s]+[^>]*?name[\s]?=[\s\"\']+keywords[\s\"\']+content[\s]?=[\s\"\']+(.*?)[\"\']+.*?>/i', $content, $array)) {
$keywords = $array[1];
return $keywords;
}
}
function getPageDesc($url)
{
$content = $url;
if ( preg_match('/<meta[\s]+[^>]*?name[\s]?=[\s\"\']+description[\s\"\']+content[\s]?=[\s\"\']+(.*?)[\"\']+.*?>/i', $content, $array)) {
$desc = $array[1];
return $desc;
}
}
I know there may be something wrong with the preg_match line, but I really don't know what.
I have tried many things, but it doesn't work.
Why not use get_meta_tags()? See the PHP documentation.
<?php
// Assuming the above tags are at www.example.com
$tags = get_meta_tags('http://www.example.com/');
// Notice how the keys are all lowercase now, and
// how . was replaced by _ in the key.
echo $tags['author']; // name
echo $tags['keywords']; // php documentation
echo $tags['description']; // a php manual
echo $tags['geo_position']; // 49.33;-86.59
?>
NOTE The parameter can be either a URL or the path to a local file.
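To try this without network access, here is a small sketch that writes a page to a temporary file and reads its meta tags back. The page content is invented for the example; only get_meta_tags() itself comes from the answer above.

```php
<?php
// Write a small page to a temporary file so the example needs no network access.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, '<html><head>
<meta name="keywords" content="php documentation">
<meta name="description" content="a php manual">
<meta name="geo.position" content="49.33;-86.59">
</head><body></body></html>');

$tags = get_meta_tags($file);   // accepts a local path or a URL
unlink($file);

echo $tags['keywords'], "\n";
echo $tags['description'], "\n";
echo $tags['geo_position'], "\n";   // "." in the name becomes "_"
```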
It's better to use PHP's native DOMDocument to parse HTML than regex. These days a lot of sites no longer include the keywords and description tags, so you can't rely on them always being there. Here is how you can do it with DOMDocument:
<?php
$source = file_get_contents('http://php.net');
$dom = new DOMDocument("1.0","UTF-8");
@$dom->loadHTML($source);
$dom->preserveWhiteSpace = false;
//Get Title
$title = $dom->getElementsByTagName('title')->item(0)->nodeValue;
$description = '';
$keywords = '';
foreach($dom->getElementsByTagName('meta') as $metas) {
if($metas->getAttribute('name') =='description'){ $description = $metas->getAttribute('content'); }
if($metas->getAttribute('name') =='keywords'){ $keywords = $metas->getAttribute('content'); }
}
print_r($title);
print_r($description);
print_r($keywords);
?>
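A variation on the same idea, as a rough sketch: it parses an HTML string instead of fetching a URL, uses libxml_use_internal_errors() rather than error suppression to silence parser warnings, and lowercases the meta names so that "Description" and "description" are treated alike. The sample markup is made up.

```php
<?php
$source = '<html><head><title>Example page</title>
<meta name="Description" content="A short summary">
<meta name="keywords" content="one, two">
</head><body><p>Hi</p></body></html>';

libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($source);
libxml_clear_errors();

// The <title> element may be missing, so check before reading it.
$titleNode = $dom->getElementsByTagName('title')->item(0);
$title = $titleNode ? trim($titleNode->textContent) : '';

$meta = [];
foreach ($dom->getElementsByTagName('meta') as $node) {
    $name = strtolower($node->getAttribute('name'));
    if ($name !== '') {
        $meta[$name] = $node->getAttribute('content');
    }
}
echo $title, ' | ', $meta['description'] ?? '', ' | ', $meta['keywords'] ?? '';
```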
I am currently writing a webapp in which some pages rely heavily on pulling in the correct YouTube video and playing it. The YouTube URLs are supplied by users, so they come in several variants. One may look like this:
http://www.youtube.com/watch?v=y40ND8kXDlg
while the other may look like this:
http://www.youtube.com/watch/v/y40ND8kXDlg
Currently I am able to pull the ID from the latter using the code below:
function get_youtube_video_id($video_id)
{
// Did we get a URL?
if ( FALSE !== filter_var( $video_id, FILTER_VALIDATE_URL ) )
{
// http://www.youtube.com/v/abcxyz123
if ( FALSE !== strpos( $video_id, '/v/' ) )
{
list( , $video_id ) = explode( '/v/', $video_id );
}
// http://www.youtube.com/watch?v=abcxyz123
else
{
$video_query = parse_url( $video_id, PHP_URL_QUERY );
parse_str( $video_query, $video_params );
$video_id = $video_params['v'];
}
}
return $video_id;
}
How can I deal with URLS that use the ?v version rather than the /v/ version?
Like this:
$link = "http://www.youtube.com/watch?v=oHg5SJYRHA0";
$video_id = explode("?v=", $link);
$video_id = $video_id[1];
Here is a universal solution:
$link = "http://www.youtube.com/watch?v=oHg5SJYRHA0&lololo";
$video_id = explode("?v=", $link); // For videos like http://www.youtube.com/watch?v=...
if (empty($video_id[1]))
$video_id = explode("/v/", $link); // For videos like http://www.youtube.com/watch/v/..
$video_id = explode("&", $video_id[1]); // Deleting any other params
$video_id = $video_id[0];
Or just use this regex:
(\?v=|/v/)([-_a-zA-Z0-9]+)
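A short sketch of how such a pattern might be applied with preg_match(). The character class here includes "_", which YouTube IDs can contain, and the sample links are taken from elsewhere in this thread.

```php
<?php
// Non-capturing group for the prefix, one capturing group for the ID.
$pattern = '~(?:\?v=|/v/)([-_a-zA-Z0-9]+)~';
$links = [
    'http://www.youtube.com/watch?v=y40ND8kXDlg',
    'http://www.youtube.com/watch/v/y40ND8kXDlg',
    'https://www.youtube.com/watch?v=uKW_FPsFiB8&feature=related',
];
$ids = [];
foreach ($links as $link) {
    // $m[1] holds the ID when the pattern matches; null marks "not found".
    $ids[] = preg_match($pattern, $link, $m) ? $m[1] : null;
}
print_r($ids);
```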
<?php
// Here is a sample of the URLs this regex matches: (there can be more content after the given URL that will be ignored)
// http://youtu.be/dQw4w9WgXcQ
// http://www.youtube.com/embed/dQw4w9WgXcQ
// http://www.youtube.com/watch?v=dQw4w9WgXcQ
// http://www.youtube.com/?v=dQw4w9WgXcQ
// http://www.youtube.com/v/dQw4w9WgXcQ
// http://www.youtube.com/e/dQw4w9WgXcQ
// http://www.youtube.com/user/username#p/u/11/dQw4w9WgXcQ
// http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/0/dQw4w9WgXcQ
// http://www.youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ
// http://www.youtube.com/?feature=player_embedded&v=dQw4w9WgXcQ
// It also works on the youtube-nocookie.com URL with the same above options.
// It will also pull the ID from the URL in an embed code (both iframe and object tags)
preg_match('%(?:youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i', $url, $match);
$youtube_id = $match[1];
?>
<?php
$your_url='https://www.youtube.com/embed/G_5-SqD2gtA';
function get_youtube_id_from_url($url)
{
if (stristr($url,'youtu.be/'))
{preg_match('/(https:|http:|)(\/\/www\.|\/\/|)(.*?)\/(.{11})/i', $url, $final_ID); return $final_ID[4]; }
else
{@preg_match('/(https:|http:|):(\/\/www\.|\/\/|)(.*?)\/(embed\/|watch.*?v=|)([a-z_A-Z0-9\-]{11})/i', $url, $IDD); return $IDD[5]; }
}
echo get_youtube_id_from_url($your_url)
?>
Try:
function youtubeID($url){
$res = explode("v",$url);
if(isset($res[1])) {
$res1 = explode('&',$res[1]);
if(isset($res1[1])){
$res[1] = $res1[0];
}
$res1 = explode('#',$res[1]);
if(isset($res1[1])){
$res[1] = $res1[0];
}
return substr($res[1],1,11);
}
return false;
}
$url = "http://www.youtube.com/watch/v/y40ND8kXDlg";
echo youtubeID($url);
Should work for both
Okay, this is a much better answer than my previous one:
$link = 'http://www.youtube.com/watch?v=oHg5SJYRHA0&player=normal';
strtok($link, '?');
parse_str(strtok(''), $params);
echo $params['v'];
Passing a result array to parse_str() keeps the parsed variables out of the current scope; calling it without one is deprecated since PHP 7.2 and no longer works in PHP 8.
This may no longer be needed by the asker, but other people might be looking for an answer, so here is how to get a YouTube ID from a URL.
P.S.: This works for every URL I tested that carries the ID in a v= query parameter.
function getYouTubeID($URL){
$VideoID = null;
if(preg_match('![?&]{1}v=([^&]+)!', $URL . '&', $Data)){
$VideoID = $Data[1];
}
return $VideoID;
}
Or just use the preg_match function itself;
If(preg_match('![?&]{1}v=([^&]+)!', $URL . '&', $Data)){
$VideoID = $Data[1];
}
Hope this helps someone :)!
Simplest method I know with YouTube.
function GetYouTubeId($url)
{
preg_match('%(?:youtube(?:-nocookie)?\.com/(?:[^/]+/.+/|(?:v|e(?:mbed)?)/|.*[?&]v=)|youtu\.be/)([^"&?/ ]{11})%i', $url, $match);
$youtube_id = $match[1];
return $youtube_id;
}
$parts = explode('=', $link);
// $parts[1] will be y40ND8kXDlg
This example works only if there's one '=' in the URL. Ever likely to be more?
I would just search for the last "/" or the last "="; the video ID always comes right after it.
preg_match("#([\w\-]{11})#is", 'http://www.youtube.com/watch?v=y40ND8kXDlg', $matches);
echo $matches[1];
This is the best way to get a YouTube video ID, or any other field in a URL; just change the index 'v' in $ID_youtube['v'] to whatever you want.
function getID_youtube($url)
{
parse_str(parse_url($url, PHP_URL_QUERY), $ID_youtube);
return $ID_youtube['v'];
}
<?php
$link = "http://www.youtube.com/watch?v=oHg5SJYRHA0";
$video_id = str_replace('http://www.youtube.com/watch?v=', '', $link);
echo $video_id;
?>
Output:
oHg5SJYRHA0
<?php
$url = "https://www.youtube.com/watch?v=uKW_FPsFiB8&feature=related";
parse_str( parse_url( $url, PHP_URL_QUERY ), $vid );
echo $vid['v'];
?>
Output: uKW_FPsFiB8
This will work for urls like https://www.youtube.com/watch?v=uKW_FPsFiB8&feature=related or https://www.youtube.com/watch?v=vzH8FH1HF3A&feature=relmfu or only https://www.youtube.com/watch?v=uKW_FPsFiB8
All YouTube video IDs are 11 characters long. I wrote a regex based on that:
<?php
$url = "https://www.youtube.com/watch?v=gooWdc6kb80";
preg_match('/(?:\/|=)(.{11})(?:$|&|\?)/', $url, $matches);
echo $matches[1];
?>
It can match different YouTube video formats:
// http://youtu.be/dQw4w9WgXcQ
// http://www.youtube.com/embed/dQw4w9WgXcQ
// http://www.youtube.com/watch?v=dQw4w9WgXcQ
// http://www.youtube.com/?v=dQw4w9WgXcQ
// http://www.youtube.com/v/dQw4w9WgXcQ
// http://www.youtube.com/e/dQw4w9WgXcQ
// http://www.youtube.com/user/username#p/u/11/dQw4w9WgXcQ
// http://www.youtube.com/sandalsResorts#p/c/54B8C800269D7C1B/0/dQw4w9WgXcQ
// http://www.youtube.com/watch?feature=player_embedded&v=dQw4w9WgXcQ
// http://www.youtube.com/?feature=player_embedded&v=dQw4w9WgXcQ
// https://www.youtube.com/embed/dQw4w9WgXcQ?feature=oembed
// https://www.youtube.com/embed/dQw4w9WgXcQ?start=16&feature=oembed
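As an alternative to one large regex, here is a rough sketch that combines parse_url() and parse_str() from the answers above with a short path check. The function name youtubeIdFromUrl() is made up, the 11-character rule is the assumption stated above, and the fragment-style URLs (#p/u/11/...) and mobile hosts are deliberately not handled.

```php
<?php
// youtubeIdFromUrl() is an illustrative name; it returns null when nothing is found.
function youtubeIdFromUrl(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return null;
    }
    $host = preg_replace('/^www\./i', '', strtolower($parts['host']));
    $path = $parts['path'] ?? '';
    $candidate = null;

    if ($host === 'youtu.be') {
        // Short links carry the ID as the whole path.
        $candidate = ltrim($path, '/');
    } elseif ($host === 'youtube.com' || $host === 'youtube-nocookie.com') {
        parse_str($parts['query'] ?? '', $query);
        if (isset($query['v']) && is_string($query['v'])) {
            $candidate = $query['v'];
        } elseif (preg_match('~/(?:embed|v|e)/([^/?#]+)~', $path, $m)) {
            $candidate = $m[1];
        }
    }
    // Assumption: IDs are 11 characters from [A-Za-z0-9_-].
    return ($candidate !== null && preg_match('/^[\w-]{11}$/', $candidate)) ? $candidate : null;
}

echo youtubeIdFromUrl('http://www.youtube.com/watch?v=y40ND8kXDlg'), "\n";
echo youtubeIdFromUrl('http://www.youtube.com/watch/v/y40ND8kXDlg'), "\n";
```

Checking the host first means a look-alike URL on another domain is rejected instead of yielding a random 11-character string.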
The function below is designed to apply rel="nofollow" attributes to all external links and no internal links unless the path matches a predefined root URL defined as $my_folder below.
So given the variables...
$my_folder = 'http://localhost/mytest/go/';
$blog_url = 'http://localhost/mytest';
And the content...
internal
internal cloaked link
external
The end result, after replacement should be...
internal
internal cloaked link
external
Notice that the first link is not altered, since it's an internal link.
The link on the second line is also internal, but since it matches our $my_folder string, it gets nofollow too.
The third link is the easiest: since it does not match $blog_url, it's obviously an external link.
However, in the script below, ALL of my links are getting nofollow. How can I fix the script to do what I want?
function save_rseo_nofollow($content) {
$my_folder = $rseo['nofollow_folder'];
$blog_url = get_bloginfo('url');
preg_match_all('~<a.*>~isU',$content["post_content"],$matches);
for ( $i = 0; $i <= sizeof($matches[0]); $i++){
if ( !preg_match( '~nofollow~is',$matches[0][$i])
&& (preg_match('~' . $my_folder . '~', $matches[0][$i])
|| !preg_match( '~'.$blog_url.'~',$matches[0][$i]))){
$result = trim($matches[0][$i],">");
$result .= ' rel="nofollow">';
$content["post_content"] = str_replace($matches[0][$i], $result, $content["post_content"]);
}
}
return $content;
}
Here is the DOMDocument solution...
$str = 'internal
internal cloaked link
external
external
external
external
';
$dom = new DOMDocument();
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML($str);
$a = $dom->getElementsByTagName('a');
$host = strtok($_SERVER['HTTP_HOST'], ':');
foreach($a as $anchor) {
$href = $anchor->attributes->getNamedItem('href')->nodeValue;
if (preg_match('/^https?:\/\/' . preg_quote($host, '/') . '/', $href)) {
continue;
}
$noFollowRel = 'nofollow';
$oldRelAtt = $anchor->attributes->getNamedItem('rel');
if ($oldRelAtt == NULL) {
$newRel = $noFollowRel;
} else {
$oldRel = $oldRelAtt->nodeValue;
$oldRel = explode(' ', $oldRel);
if (in_array($noFollowRel, $oldRel)) {
continue;
}
$oldRel[] = $noFollowRel;
$newRel = implode(' ', $oldRel);
}
$newRelAtt = $dom->createAttribute('rel');
$noFollowNode = $dom->createTextNode($newRel);
$newRelAtt->appendChild($noFollowNode);
$anchor->appendChild($newRelAtt);
}
var_dump($dom->saveHTML());
Output
string(509) "<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
internal
internal cloaked link
external
external
external
external
</body></html>
"
Try to make it more readable first, and only afterwards make your if rules more complex:
function save_rseo_nofollow($content) {
$content["post_content"] =
preg_replace_callback('~<(a\s[^>]+)>~isU', "cb2", $content["post_content"]);
return $content;
}
function cb2($match) {
list($original, $tag) = $match; // regex match groups
$my_folder = "/hostgator"; // re-add quirky config here
$blog_url = "http://localhost/";
if (strpos($tag, "nofollow")) {
return $original;
}
elseif (strpos($tag, $blog_url) && (!$my_folder || !strpos($tag, $my_folder))) {
return $original;
}
else {
return "<$tag rel='nofollow'>";
}
}
Gives following output:
[post_content] =>
internal
<a href="http://localhost/mytest/go/hostgator" rel=nofollow>internal cloaked link</a>
<a href="http://cnn.com" rel=nofollow>external</a>
The problem in your original code might have been $rseo which wasn't declared anywhere.
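To see why an undefined $rseo would mark every link, note that the folder pattern then collapses to an empty regex, and an empty regex matches any string. A minimal sketch, with a made-up anchor tag standing in for one of the matches:

```php
<?php
// Inside the function $rseo is undefined, so $rseo['nofollow_folder'] evaluates to null.
$my_folder = null;
$tag = '<a href="http://localhost/mytest/about">';

// '~' . null . '~' is the empty pattern '~~', which matches everything.
$emptyMatches = preg_match('~' . $my_folder . '~', $tag);

// With the intended folder (escaped for use inside a regex) the tag no longer matches.
$quoted = preg_quote('http://localhost/mytest/go/', '~');
$realMatches = preg_match('~' . $quoted . '~', $tag);

var_dump($emptyMatches, $realMatches);
```

Passing the folder into the function as a parameter (or declaring the variable global) avoids the empty pattern.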
Try this one (PHP 5.3+). It can skip a selected address, and it leaves any manually set rel parameter alone.
The code:
function nofollow($html, $skip = null) {
return preg_replace_callback(
"#(<a[^>]+?)>#is", function ($mach) use ($skip) {
return (
!($skip && strpos($mach[1], $skip) !== false) &&
strpos($mach[1], 'rel=') === false
) ? $mach[1] . ' rel="nofollow">' : $mach[0];
},
$html
);
}
Examples:
echo nofollow('something');
// stays the same because it already contains a rel parameter
echo nofollow('something');
// adds the rel="nofollow" parameter to the anchor
echo nofollow('something', 'localhost');
// skips this link as an internal link
Using regular expressions to do this job properly would be quite complicated. It would be easier to use an actual parser, such as the one from the DOM extension. DOM isn't very beginner-friendly, so what you can do is load the HTML with DOM then run the modifications with SimpleXML. They're backed by the same library, so it's easy to use one with the other.
Here's what it can look like:
$my_folder = 'http://localhost/mytest/go/';
$blog_url = 'http://localhost/mytest';
$html = '<html><body>
internal
internal cloaked link
external
</body></html>';
$dom = new DOMDocument;
$dom->loadHTML($html);
$sxe = simplexml_import_dom($dom);
// grab all <a> nodes with an href attribute
foreach ($sxe->xpath('//a[@href]') as $a)
{
if (substr($a['href'], 0, strlen($blog_url)) === $blog_url
&& substr($a['href'], 0, strlen($my_folder)) !== $my_folder)
{
// skip all links that start with the URL in $blog_url, as long as they
// don't start with the URL from $my_folder;
continue;
}
if (empty($a['rel']))
{
$a['rel'] = 'nofollow';
}
else
{
$a['rel'] .= ' nofollow';
}
}
$new_html = $dom->saveHTML();
echo $new_html;
As you can see, it's really short and simple. Depending on your needs, you may want to use preg_match() in place of the substr() comparisons, for example:
// change the regexp to your own rules, here we match everything under
// "http://localhost/mytest/" as long as it's not followed by "go"
if (preg_match('#^http://localhost/mytest/(?!go)#', $a['href']))
{
continue;
}
Note
I missed the last code block in the OP when I first read the question. The code I posted (and basically any solution based on DOM) is better suited to processing a whole page than an HTML block. Otherwise, DOM will attempt to "fix" your HTML and may add a <body> tag, a DOCTYPE, etc.
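If you do need to process a fragment, one possible workaround is sketched below: wrap the fragment in a single <div>, load it with the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD options, and serialize only the wrapper's children. The function name and the $internalPrefix parameter are invented for this example, the $my_folder exception is left out for brevity, and the options assume a reasonably recent libxml.

```php
<?php
// addNofollowToFragment() is an illustrative name; links starting with
// $internalPrefix are treated as internal and left alone.
function addNofollowToFragment(string $html, string $internalPrefix): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    // Wrap in a <div> so the fragment has a single root, and ask libxml
    // not to add <html>/<body> or a DOCTYPE.
    $dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href === '' || strpos($href, $internalPrefix) === 0) {
            continue;
        }
        // Keep existing rel values and append nofollow only once.
        $rel = preg_split('/\s+/', $a->getAttribute('rel'), -1, PREG_SPLIT_NO_EMPTY);
        if (!in_array('nofollow', $rel, true)) {
            $rel[] = 'nofollow';
            $a->setAttribute('rel', implode(' ', $rel));
        }
    }

    // Serialize the children of the wrapper, not the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo addNofollowToFragment(
    '<a href="http://localhost/mytest/page">internal</a> <a href="http://cnn.com">external</a>',
    'http://localhost/mytest'
);
```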
Thanks @alex for your nice solution. However, I had a problem with Japanese text, which I fixed as follows. This code can also skip multiple domains via the $whiteList array.
public function addRelNoFollow($html, $whiteList = [])
{
$dom = new \DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
$a = $dom->getElementsByTagName('a');
/** @var \DOMElement $anchor */
foreach ($a as $anchor) {
$href = $anchor->attributes->getNamedItem('href')->nodeValue;
$domain = parse_url($href, PHP_URL_HOST);
// Skip whiteList domains
if (in_array($domain, $whiteList, true)) {
continue;
}
// Check & get existing rel attribute values
$noFollow = 'nofollow';
$rel = $anchor->attributes->getNamedItem('rel');
if ($rel) {
$values = explode(' ', $rel->nodeValue);
if (in_array($noFollow, $values, true)) {
continue;
}
$values[] = $noFollow;
$newValue = implode(' ', $values);
} else {
$newValue = $noFollow;
}
// Create new rel attribute
$rel = $dom->createAttribute('rel');
$node = $dom->createTextNode($newValue);
$rel->appendChild($node);
$anchor->appendChild($rel);
}
// There is a problem with saveHTML() and saveXML(), both of them do not work correctly in Unix.
// They do not save UTF-8 characters correctly when used in Unix, but they work in Windows.
// So we need to do as follows. @see https://stackoverflow.com/a/20675396/1710782
return $dom->saveHTML($dom->documentElement);
}
<?
$str='internal
internal cloaked link
external';
function test($x){
if (preg_match('#localhost/mytest/(?!go/)#i',$x[0])>0) return $x[0];
return 'rel="nofollow" '.$x[0];
}
echo preg_replace_callback('/href=[\'"][^\'"]+/i', 'test', $str);
?>
Here is another solution, which has a whitelist option and can add a target="_blank" attribute.
It also checks whether a rel attribute already exists before adding a new one.
function Add_Nofollow_Attr($Content, $Whitelist = [], $Add_Target_Blank = true)
{
$Whitelist[] = $_SERVER['HTTP_HOST'];
foreach ($Whitelist as $Key => $Link)
{
$Host = preg_replace('#^https?://#', '', $Link);
$Host = "https?://". preg_quote($Host, '/');
$Whitelist[$Key] = $Host;
}
if(preg_match_all("/<a .*?>/", $Content, $matches, PREG_SET_ORDER))
{
foreach ($matches as $Anchor_Tag)
{
$IS_Rel_Exist = $IS_Follow_Exist = $IS_Target_Blank_Exist = $Is_Valid_Tag = false;
if(preg_match_all("/(\w+)\s*=\s*['|\"](.*?)['|\"]/",$Anchor_Tag[0],$All_matches2))
{
foreach ($All_matches2[1] as $Key => $Attr_Name)
{
if($Attr_Name == 'href')
{
$Is_Valid_Tag = true;
$Url = $All_matches2[2][$Key];
// bypass #.. or internal links like "/"
if(preg_match('/^\s*[#|\/].*/', $Url))
{
continue 2;
}
foreach ($Whitelist as $Link)
{
if (preg_match("#$Link#", $Url)) {
continue 3;
}
}
}
else if($Attr_Name == 'rel')
{
$IS_Rel_Exist = true;
$Rel = $All_matches2[2][$Key];
preg_match("/[n|d]ofollow/", $Rel, $match, PREG_OFFSET_CAPTURE);
if( count($match) > 0 )
{
$IS_Follow_Exist = true;
}
else
{
$New_Rel = 'rel="'. $Rel . ' nofollow"';
}
}
else if($Attr_Name == 'target')
{
$IS_Target_Blank_Exist = true;
}
}
}
$New_Anchor_Tag = $Anchor_Tag;
if(!$IS_Rel_Exist)
{
$New_Anchor_Tag = str_replace(">",' rel="nofollow">',$Anchor_Tag);
}
else if(!$IS_Follow_Exist)
{
$New_Anchor_Tag = preg_replace("/rel=[\"|'].*?[\"|']/",$New_Rel,$Anchor_Tag);
}
if($Add_Target_Blank && !$IS_Target_Blank_Exist)
{
$New_Anchor_Tag = str_replace(">",' target="_blank">',$New_Anchor_Tag);
}
$Content = str_replace($Anchor_Tag,$New_Anchor_Tag,$Content);
}
}
return $Content;
}
To use it:
$Page_Content = 'internal
internal
google
example
stackoverflow';
$Whitelist = ["http://yoursite.com","http://localhost"];
echo Add_Nofollow_Attr($Page_Content,$Whitelist,true);
WordPress solution:
function replace__method($match) {
list($original, $tag) = $match; // regex match groups
$my_folder = "/articles"; // re-add quirky config here
$blog_url = 'https://'.$_SERVER['SERVER_NAME'];
if (strpos($tag, "nofollow")) {
return $original;
}
elseif (strpos($tag, $blog_url) && (!$my_folder || !strpos($tag, $my_folder))) {
return $original;
}
else {
return "<$tag rel='nofollow'>";
}
}
add_filter( 'the_content', 'add_nofollow_to_external_links', 1 );
function add_nofollow_to_external_links( $content ) {
$content = preg_replace_callback('~<(a\s[^>]+)>~isU', "replace__method", $content);
return $content;
}
A good script that adds nofollow automatically and keeps the other attributes:
function nofollow(string $html, ?string $baseUrl = null) {
return preg_replace_callback(
'#<a([^>]*)>(.+)</a>#isU', function ($mach) use ($baseUrl) {
list ($a, $attr, $text) = $mach;
if (preg_match('#href=["\']([^"\']*)["\']#', $attr, $url)) {
$url = $url[1];
if (is_null($baseUrl) || !str_starts_with($url, $baseUrl)) {
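// Start with no rel attribute; fill these in only if one is present.
$rel = '';
$relAttr = null;
if (preg_match('#rel=["\']([^"\']*)["\']#', $attr, $m)) {
$relAttr = $m[0];
$rel = $m[1];
}
// strpos() can return 0, so compare against false explicitly.
$rel = 'rel="' . ($rel !== '' ? (strpos($rel, 'nofollow') !== false ? $rel : $rel . ' nofollow') : 'nofollow') . '"';
$attr = $relAttr !== null ? str_replace($relAttr, $rel, $attr) : $attr . ' ' . $rel;
$a = '<a' . $attr . '>' . $text . '</a>';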
}
}
return $a;
},
$html
);
}
I am wondering if there is a simple snippet which converts links of any kind:
http://www.cnn.com to http://www.cnn.com
cnn.com to cnn.com
www.cnn.com to www.cnn.com
abc@def.com to mailto:abc@def.com
I do not want to use any PHP5 specific library.
Thank you for your time.
UPDATE I have updated the above text to show what I want to convert it to. Please note that the href attribute and the text are different for cases 2 and 3.
UPDATE2 How does Gmail chat do it? Theirs is pretty smart and works only for real domain names, e.g. a.ly works but a.cb does not.
Yes, see:
http://www.gidforums.com/t-1816.html
<?php
/**
NAME : autolink()
VERSION : 1.0
AUTHOR : J de Silva
DESCRIPTION : returns VOID; handles converting
URLs into clickable links off a string.
TYPE : functions
======================================*/
function autolink( &$text, $target='_blank', $nofollow=true )
{
// grab anything that looks like a URL...
$urls = _autolink_find_URLS( $text );
if( !empty($urls) ) // i.e. there were some URLS found in the text
{
array_walk( $urls, '_autolink_create_html_tags', array('target'=>$target, 'nofollow'=>$nofollow) );
$text = strtr( $text, $urls );
}
}
function _autolink_find_URLS( $text )
{
// build the patterns
$scheme = '(http:\/\/|https:\/\/)';
$www = 'www\.';
$ip = '\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}';
$subdomain = '[-a-z0-9_]+\.';
$name = '[a-z][-a-z0-9]+\.';
$tld = '[a-z]+(\.[a-z]{2,2})?';
$the_rest = '\/?[a-z0-9._\/~#&=;%+?-]+[a-z0-9\/#=?]{1,1}';
$pattern = "$scheme?(?(1)($ip|($subdomain)?$name$tld)|($www$name$tld))$the_rest";
$pattern = '/'.$pattern.'/is';
$c = preg_match_all( $pattern, $text, $m );
unset( $text, $scheme, $www, $ip, $subdomain, $name, $tld, $the_rest, $pattern );
if( $c )
{
return( array_flip($m[0]) );
}
return( array() );
}
function _autolink_create_html_tags( &$value, $key, $other=null )
{
$target = $nofollow = null;
if( is_array($other) )
{
$target = ( $other['target'] ? " target=\"$other[target]\"" : null );
// see: http://www.google.com/googleblog/2005/01/preventing-comment-spam.html
$nofollow = ( $other['nofollow'] ? ' rel="nofollow"' : null );
}
$value = "<a href=\"$key\"$target$nofollow>$key</a>";
}
?>
Try this out. (for links not email)
$newTweet = preg_replace('!http://([a-zA-Z0-9./-]+[a-zA-Z0-9/-])!i', '<a href="\\0">\\0</a>', $tweet->text);
I know this is 5 years late, but I needed a similar solution, and the best answer I found was from the user erwan-dupeux-maire.
Answer:
I wrote this function. It replaces all the links in a string. Links can be in the following formats:
www.example.com
http://example.com
https://example.com
example.fr
The second argument is the target for the link ('_blank', '_top'... can be set to false). Hope it helps...
public static function makeLinks($str, $target='_blank')
{
if ($target)
{
$target = ' target="'.$target.'"';
}
else
{
$target = '';
}
// find and replace link
$str = preg_replace('#((https?://)?([-\w]+\.[-\w\.]+)+\w(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)*)#', '<a href="$1" '.$target.'>$1</a>', $str);
// add "http://" if not set
$str = preg_replace('/<a\s[^>]*href\s*=\s*"((?!https?:\/\/)[^"]*)"[^>]*>/i', '<a href="http://$1" '.$target.'>', $str);
return $str;
}
Here's the email snippet:
$email = "abc@def.com";
$pos = strrpos($email, "@");
if ($pos !== false) {
// This is an email address!
$email = "mailto:" . $email;
}
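If checking for an "@" is too loose, a stricter variant could lean on filter_var(), as in the sketch below. Note that this needs PHP 5.2 or later, which may not suit the asker's constraint, and that toMailto() is just a name chosen for this example.

```php
<?php
// toMailto() is an illustrative helper name.
function toMailto(string $text): string
{
    $text = trim($text);
    // filter_var() returns the address on success and false otherwise.
    if (filter_var($text, FILTER_VALIDATE_EMAIL) !== false) {
        return 'mailto:' . $text;
    }
    // Anything that is not a valid address is returned unchanged.
    return $text;
}

echo toMailto('abc@def.com'), "\n";
echo toMailto('www.cnn.com'), "\n";
```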
What exactly are you looking to do with the links? Strip the www or http? Or add http://www to any link if required?