Create link from Plaintext [duplicate] - php

Let's say that $content is the content of a textarea:
/*Convert the http/https to link */
$content = preg_replace('!((https://|http://)+[a-z0-9_./?=&-]+)!i', '<a target="_blank" href="$1">$1</a> ', nl2br($_POST['helpcontent'])." ");
/*Convert the www. to link prepending http://*/
$content = preg_replace('!((www\.)+[a-z0-9_./?=&-]+)!i', '<a target="_blank" href="http://$1">$1</a> ', $content." ");
This was working OK for links, but I realised that it was breaking the markup when an image is within the text...
I am trying like this now:
$content = preg_replace('!\s((https?://|http://)+[a-z0-9_./?=&-]+)!i', ' <a target="_blank" href="$1">$1</a> ', nl2br($_POST['content'])." ");
$content = preg_replace('!((www\.)+[a-z0-9_./?=&-]+)!i', '<a target="_blank" href="http://$1">$1</a> ', $content." ");
As it is, the images are respected, but the problem is that URLs in http:// or https:// format are no longer converted:
google.com -> Not converted (as expected)
www.google.com -> Well Converted
http://google.com -> Not converted (unexpected)
https://google.com -> Not converted (unexpected)
What am I missing?
-EDIT-
Current almost working solution:
$content = preg_replace('!(\s|^)((https?://)+[a-z0-9_./?=&-]+)!i', ' $2 ', nl2br($_POST['content'])." ");
$content = preg_replace('!(\s|^)((www\.)+[a-z0-9_./?=&-]+)!i', '<a target="_blank" href="http://$2" target="_blank">$2</a> ', $content." ");
The thing here is that if this is the input:
www.funcook.com http://www.funcook.com https://www.funcook.com
funcook.com http://funcook.com https://funcook.com
All the URLs I want (all except name.domain) are converted as expected, but this is the output:
www.funcook.com http://www.funcook.com https://www.funcook.com ;
funcook.com http://funcook.com https://funcook.com
Note that a ; is inserted. Any idea why?

Try this:
preg_replace('!(\s|^)((https?://|www\.)+[a-z0-9_./?=&-]+)!i', ' $2 ',$text);
It will pick up links beginning with http://, https:// or www.

You can't do this with 100% accuracy, because there may be links such as stackoverflow.com which do not start with www.
If you're only targeting those links:
!(www\.\S+)!i
Should work well enough for you.
EDIT: As for your newest question, as to why http links don't get converted but https ones do: your first pattern only searches for https://, or for http://. (with a trailing dot), which never occurs. Simplify it by replacing:
(https://|http://\.)
with
(https?://)
which makes the s optional.

Another way to add hyperlinks is to take the text that you want to parse for links and explode it into an array. Then loop through it using foreach (a very fast construct, see http://www.phpbench.com/) and change anything that starts with http://, https:// or www., or ends with .com, .org, etc., into a link.
I'm thinking maybe something like this:
$userTextArray = explode(" ", $userText);
foreach ($userTextArray as &$word) {
    // if statements to test whether it starts with www. or ends with .com or whatever else
    // change $word so that it is a link
}
unset($word); // break the reference once the loop is done
Your changes will be reflected in the array because of the "&" before $word in the foreach statement.
Now just implode the array back into a string and you're good to go.
This made sense in my head... But I'm not 100% sure that this is what you're looking for.
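The explode/foreach idea above could be filled in roughly as follows. This is only a sketch: the function name linkify_words is made up for illustration, it assumes the input is plain text (every word is HTML-escaped), and it only recognises words that start with http://, https:// or www.

```php
<?php
// Hypothetical helper illustrating the explode/foreach approach; the name is an assumption.
function linkify_words($userText) {
    $words = explode(' ', $userText);
    foreach ($words as &$word) {
        // Escape first so that plain-text input cannot inject markup
        $safe = htmlspecialchars($word, ENT_QUOTES, 'UTF-8');
        if (preg_match('!^https?://!i', $word)) {
            $word = '<a href="' . $safe . '">' . $safe . '</a>';
        } elseif (stripos($word, 'www.') === 0) {
            // No protocol given, so prepend http:// in the href only
            $word = '<a href="http://' . $safe . '">' . $safe . '</a>';
        } else {
            $word = $safe; // not a link, keep the (escaped) word
        }
    }
    unset($word); // break the reference left over from the loop
    return implode(' ', $words);
}

echo linkify_words('see www.example.com or http://example.org now');
```

Splitting on a single space means URLs followed by a newline or punctuation are not handled specially; a regex-based split would be needed for that.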

I had a similar problem. Here is a function which helped me. Maybe it will fit your needs too:
function clHost($Address) {
    $parseUrl = parse_url(trim($Address));
    // parse_url() only sets the keys it finds, so fall back to empty strings
    $host = isset($parseUrl['host']) ? $parseUrl['host'] : '';
    $path = isset($parseUrl['path']) ? $parseUrl['path'] : '';
    return str_replace("www.", "", trim(trim($host.$path), '/'));
}
This function returns the host (and path, if any) without the protocol and "www.", so you can add them yourself later.
For example:
$url = "http://www.". clHost($link);
I did it like that because I couldn't find a good regexp.

\s((https?://|www\.)+[a-z0-9_./?=&-]+)
The problem is that your leading \s forces the match to start with whitespace, so if that leading whitespace is missing, the match fails. The regex is fine (without the \s), but to avoid replacing the images you need to add something that keeps it from matching them.
If the images are pure HTML, use this:
(?<!src=")((https?://|www\.)+[a-z0-9_./?=&-]+)
The negative lookbehind checks for src=" immediately before the URL and skips the match if it is there.
If you use another markup, tell me and I'll try to find another way to avoid the images.
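To see the lookbehind in action, here is a small sketch on made-up input (the sample string and variable names are assumptions). One caveat worth knowing: the lookbehind only inspects the five characters directly before the match, so an image URL that contains www. (for example src="http://www.example.com/a.png") would still be picked up at the www. part.

```php
<?php
// Sample markup (made up): one image and one bare URL
$html = '<img src="http://example.com/a.png"> see http://example.org/page';

// Negative lookbehind: do not start a match right after src="
$pattern = '!(?<!src=")((https?://|www\.)+[a-z0-9_./?=&-]+)!i';

$out = preg_replace($pattern, '<a target="_blank" href="$1">$1</a>', $html);
echo $out;
```

Here the image tag is left alone and only the second URL is wrapped in a link.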

Related

PHP: replace relative top URL "../" with absolute domain URL

I want to convert relative URLs that starts with ../stuff/more.php to http://www.example.com/stuff/more.php in my RSS feed.
The PHP code I used to do so is the following:
$content = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)([^\"'>]+)([\"'>]+)#", '$1http://www.example.com/$2$3', $content);
The result is wrong though; it returns the URL like this:
http://www.example.com/../stuff/more.php
Notice the ../ part hasn't been removed. Please help!
So basically:
This is what I have: ../stuff/more.php
This is what I get (after running the code above): http://www.example.com/../stuff/more.php
This is what I WANT: http://www.example.com/stuff/more.php
Adding (\.|\.\.|\/)* should work.
$content = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)(\.|\.\.|\/)*([^\"'>]+)([\"'>]+)#", '$1http://www.example.com/$3$4', $content);
Also, note $2$3 has been changed to $3$4
Edit:
Reduced to one alternative:
$content = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)(\.\.\/)*([^\"'>]+)([\"'>]+)#", '$1http://www.example.com/$3$4', $content);
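As a quick sanity check of the single-alternative pattern, the sketch below runs it on made-up markup. The sample string is an assumption, and the group that swallows ../ is written as non-capturing here, so the replacement uses $2$3 rather than $3$4.

```php
<?php
// Sample input (made up): one relative link with two ../ segments, one absolute link
$content = '<a href="../../stuff/more.php">More</a> <a href="http://other.example/x">X</a>';

$content = preg_replace(
    // (?:\.\./)* consumes any number of leading ../ segments without capturing them
    "#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)(?:\.\./)*([^\"'>]+)([\"'>]+)#",
    '$1http://www.example.com/$2$3',
    $content
);

echo $content;
```

The absolute link is skipped by the (?!http) lookahead and stays unchanged.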
Why don't you just replace the first 2 dots with the domain?
$result = preg_replace('/\.\./', 'http://www.example.com', $content, 1); // str_replace() has no limit argument, so preg_replace() with a limit of 1 is used
Use $_SERVER['HTTP_HOST'] and $_SERVER['REQUEST_URI'], which are entries of a PHP superglobal, to build the absolute URL.
Well, I'll start looking at the regex. Most of it looks good (in fact, you've got a good enough regex here I'm a little surprised you're having trouble otherwise!) but the end is a bit weird -- better like this:
#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)([^\"'>]+)([\"']>)#
(Technically it would be better to capture the starting quote and make sure the ending quote matches it, but chances are you won't have any problems there.)
To remove the ../ I would do it apart from regex entirely:
foreach (array("<a href=\"http://../foo/bar\">",
               "<a href=\"../foo/bar\">") as $content) {
    echo "A content=$content<br />\n";
    ########## copy from here down to...
    if (preg_match("#(<\s*a\s+[^>]*?href\s*=\s*[\"'])(?!http)([^\"'>]+)([\"']>)#", $content, $m)) {
        echo "m=<pre>".print_r($m,true)."</pre><br />\n";
        if (substr($m[2], 0, 3) == '../')
            $m[2] = substr($m[2], 3);
        $content = $m[1].'http://www.example.com/'.$m[2].$m[3];
    }
    ######### copy from above down to HERE
    echo "B content=$content<br />\n";
}
(I included a mini-test suite around what you're looking for - you will need to take just the marked lines inside for your code.)
I found the solution thanks to everyone who helped me on this.
Here's the code I used:
$content = preg_replace("#(<a href=\"\.\.\/)#", '<a href="http://www.example.com/', $content);
It searches for <a href="../ and replaces it with <a href="http://www.example.com/. It's not general, but it works for me.

Wrap string URL into HTML link tag

I am trying to turn a URL (like www.google.com) into an HTML link tag like this:
return preg_replace('"\b(www.\S+)"', '$1', $url);
This is a modified preg_replace from somewhere else, and I still don't fully understand how it works.
If I use this, I get:
<a href"www.google.com" style="color:white;">www.google.com</a>
This means the browser will look for www.google.com in my web directory instead of opening the www.google.com website.
What I am trying to achieve is:
<a href"http://www.google.com" style="color:white;">google</a>
So how can I achieve something like that? It would be even better if, when I type
http://www.google.com
it also replaced
http://www.
too.
My current configuration only replaces www., so if I type the URL with http://, it only wraps www.google.com and leaves the http:// behind.
Edit:
I want a flexible one, so that if I type www.google.com or http://www.google.com it will catch it and turn it into a link tag.
Also, if I type something like https://stackoverflow.com/posts/26154124/, it should only show
<a href"https://stackoverflow.com/posts/26154124/" style="color:white;">stackoverflow</a>
2nd edit: Sorry, I forgot to say that I also want to keep the other text around the link. So if I type "here is a big news in www.xxxxx.com", it will be displayed as "here is a big news in xxxxx", where xxxxx is a clickable link.
I am trying to do this because I am going to use it for the announcement part of my website.
Can we do something like that?
Update:
I have been thinking about an idea: first we use preg_replace to catch http:// and remove it; after that we catch www. to wrap it in an HTML link tag, and also catch everything after .com, .co or .whatever and remove it all until the next space.
But I don't know how to do that with preg_replace, since I have no idea how it works.
This is about the closest I could come up with, in order to do what you want to achieve.
There are others ways I'm sure, but you can give the following a try.
You could have the following inside a text file: (you can have 1 or more lines)
bookmark.txt (in the exact format shown) - You don't need to put the http:// it's automatic.
Sidenote: You can use google.com or www.google.com - Only the word Google will appear as the link, but the full href will appear in the HTML source and be a valid URL.
Google google.com
Yahoo yahoo.com
stackoverflow stackoverflow.com/posts/26154124
Then use the following PHP:
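<?php
$file = "bookmark.txt";
$lines = file($file);
foreach($lines as $line)
{
    $piece=explode(" ",$line);
    $link=trim($piece[1]);
    $text=trim($piece[0]);
    // http:// is prepended automatically; only the label is shown as link text
    echo '<a href="http://'.$link.'">'.$text.'</a><br>';
}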
Notes: The " " in $piece=explode(" ",$line); represents a space being the separator.
You can change it to $piece=explode("|",$line); then changing the text file to:
Google|google.com
Yahoo|yahoo.com
stackoverflow|stackoverflow.com/posts/26154124
should you decide to use something like Google Search
I.e.:
Google Search|google.com
Yahoo|yahoo.com
stackoverflow link|stackoverflow.com/posts/26154124
You can also try:
$content = "http://stackoverflow.com/posts/26154124";
$content = preg_replace('$(https?://[a-z0-9_./?=&#-]+)(?![^<>]*>)$i', ' <a href="$1">StackOverflow</a> ', $content." ");
echo $content;
But at this point, having read your question over and over, including your edit, I am not entirely sure that this is what you want; the question is very broad and beyond the scope of my knowledge of PHP.
You mean this?
$text = ' This is a text with the link of google.com. And another phrase with google.com url.';
echo str_replace('google.com','<a href="http://google.com" target=_blank>google</a>',$text);
You were almost right; there was just a simple typing mistake:
<a href"http://stackoverflow.com/posts/26154124/" style="color:white;">stackoverflow</a>
You missed the '=' sign after href. Please try this instead:
<a href="http://stackoverflow.com/posts/26154124/" style="color:white;">stackoverflow</a>
Hope it helps.
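For the flexible variant asked about in the question (catch both www.google.com and http://www.google.com, keep the surrounding text, and show only a short name), something along these lines might work. It is a sketch under assumptions: the function name is made up, and the label is simply the first host label after an optional www., so http://docs.google.com would be shown as "docs".

```php
<?php
// Hypothetical helper; the name and the rule for picking the label are assumptions.
function link_with_short_label($text) {
    // Match URLs that start with http://, https:// or www.
    // Group 1 is the first host label after the optional www. prefix.
    $pattern = '~\b(?:https?://(?:www\.)?|www\.)([a-z0-9-]+)(?:\.[a-z0-9-]+)+(?:/\S*)?~i';

    return preg_replace_callback($pattern, function ($m) {
        // Prepend http:// when the typed URL has no protocol
        $href = preg_match('~^https?://~i', $m[0]) ? $m[0] : 'http://' . $m[0];
        return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '" style="color:white;">'
             . htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8') . '</a>';
    }, $text);
}

echo link_with_short_label('here is a big news in www.xxxxx.com');
```

Using preg_replace_callback instead of a plain replacement string makes it possible to build the href and the label differently from the same match.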

Show another site in my page and route all links through mine (like a proxy)

I want to make something like a proxy page (not really a proxy), and as far as I know I need to rewrite all URLs (src, link and so on): styles and images should be fetched from the right place, and links should go through my page via $_GET["url"], which then gives me the next page.
But I have tried to preg_replace() each element, and I am not very good with it; if it works on one website, on another I can't see the CSS, for example...
The first question: are there any PHP classes or scripts that make this easy? (I have been googling for hours.)
If not, please help me with the following code:
<?php
$url = $_GET["url"];
$text = file_get_contents($url);
$data = parse_url($url);
$url=$data['scheme'].'://'.$data['host'];
$text = preg_replace('|<iframe [^>]*[^>]*|', '', $text);
$text = preg_replace('/<a(.*?)href="([^"]*)"(.*?)>/','<a $1 href="http://my.site/?url='.$url.'$2" $3>',$text);
$text = preg_replace('/<link(.*?)href="(?!http:\/\/)([^"]+)"(.*?)/', "<link $1 href=\"".$url."/\\2\"$3", $text);
$text = preg_replace('/src="(?!http:\/\/)([^"]+)"/', "src=\"".$url."/\\1\"", $text);
$text = preg_replace('/background:url\(([^"]*)\)/',"background:url(".$url."$1)", $text);
echo $text;
?>
In replacement no. 4 (src) I need to skip the replacement when the value starts with a double slash, because it can look like 'src="//somethingdomain"' and must not be rewritten.
I also need to skip replacement no. 2 when the href already points to the same domain; otherwise it ends up looking like need.site/news.need.site/324244.
And is it possible to pass a form's action through my script? For example, a Google search query.
One more small problem: one website used to open correctly, but after I opened it hundreds of times with this script I am getting unknown symbols (without any divs, body etc.) like ��S�n�#��. I tried converting between UTF-8 and ANSI, but the symbols just change.
Maybe they banned me?
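function link_replace($url, $myurl) {
    $content = file_get_contents($url);
    // Rewrite every href in a single pass so that no link is processed twice
    return preg_replace_callback('#href="(.*?)"#is', function ($m) use ($url, $myurl) {
        // Absolute links are passed through as-is; relative ones get the target site's URL prepended
        $target = preg_match('#^https?://#i', $m[1]) ? $m[1] : $url.$m[1];
        return 'href="'.$myurl.'?url='.$target.'"';
    }, $content);
}
echo link_replace($url, $myurl);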
I'm not absolutely sure, but I guess the result is just compressed, e.g. with gzip. Try removing the Accept-Encoding headers while proxying the request.
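If removing the headers is not an option, a different approach is to detect a gzip-compressed body and decode it before rewriting the links. The sketch below does that; note that it swaps in a decode step rather than the header change suggested above, the function name is made up, and gzdecode() requires PHP's zlib extension.

```php
<?php
// Hypothetical helper: the name is an assumption; gzdecode() needs the zlib extension.
function maybe_gunzip($body) {
    // gzip streams start with the magic bytes 0x1f 0x8b
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    // Not gzip (or decoding failed): return the body untouched
    return $body;
}

echo maybe_gunzip(gzencode('<html>ok</html>'));
```

In the proxy script this would be applied to the result of file_get_contents() before any preg_replace() call.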

PHP: finding, replacing, shortening, and prettifying user links with <a> tags, ellipses, and link icons

When a user enters a URL, e.g. http://www.google.com, I would like to be able to parse that text using PHP, find any links, and replace them with <a> tags that include the original URL as an HREF.
In other words, http://www.google.com will become
<a href="http://www.google.com">http://www.google.com</a>
I'd like to be able to do this for all URLs of these forms (with .com interchangeable with any TLD):
http://www.google.com
www.google.com
google.com
docs.google.com
What's the most performant way to do this? I could try writing some really fancy regex, but I doubt that's the best method available to me.
For bonus points, I'd also like to prepend http:// to any URL lacking it, and strip the display text itself down to something of the form http://www.google.com/reallyLongL... and display an external link icon afterwards.
Trying to find links in the format domain.com is going to be a pain in the butt. It would require keeping track of all TLDs and using them in the search.if you didn't, the end of the last sentence I typed and the beginning of this sentence would be a link to http://search.if. Even if you did, .in is a valid TLD and a common word.
I'd recommend telling your users they have to begin links with www. or http:// then write a simple regex to capture them and add the links.
www.google.com
This is not a URL, it's a hostname. It's generally not a good idea to start marking up bare hostnames in arbitrary text, because in the general case any word or sequence of dot-separated words is a perfectly valid hostname. That means you end up with horrible hacks like looking for a leading www. (and you'll get questions like “why can I link to www.stackoverflow.com but not stackoverflow.com?”) or trailing TLDs (which gets more and more impractical as more new TLDs are introduced; “why can I link to ncm.com but not ncm.museum?”), and you'll often mark up things that aren't supposed to be links.
I could try writing some really fancy regex
Well I can't see how you'd do it without regex.
The trick is coping with markup. If you can have <, & and " characters in the input, you mustn't let them into HTML output. If your input is plain text, you can do that by calling htmlspecialchars() before applying a simple replacement on a pattern like that in nico's answer.
(If the input already contains markup, you've got problems and you'd probably need an HTML parser to determine which bits are markup, to avoid adding more markup inside them. Similarly, if you're doing more processing after this, inserting more tags, those steps may have the same difficulty. In ‘bbcode’-like languages this often leads to bugs and security problems.)
Another problem is trailing punctuation. It's common for people to put a full stop, comma, close bracket, exclamation mark etc after a link, which aren't supposed to be part of the link but which are actually valid characters. It's useful to strip these off and not put them in the link. But then you break Wiki links that end in ), so maybe you want to not treat ) as a trailing character if there's a ( in the link, or something like that. This sort of thing can't be done in a simple regex replace, but you can in a replacement callback function.
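The two points above (escape plain-text input, then deal with trailing punctuation) could be combined roughly like this. It is only a sketch: the function name and the exact punctuation rule are assumptions, it handles http:// and https:// URLs only, and it uses preg_split with captured delimiters instead of a replacement callback so that the text between URLs can be escaped separately.

```php
<?php
// Hypothetical helper; the name and the punctuation rule are assumptions.
function linkify_plain_text($text) {
    // With one capture group, odd indexes of $parts hold the URLs, even indexes the text between them
    $parts = preg_split('~(\bhttps?://[^\s<>"]+)~i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $out .= htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
            continue;
        }
        $url = $part;
        $trail = '';
        // Peel trailing punctuation off the link, one character at a time
        while ($url !== '' && strpos('.,!?;:)', substr($url, -1)) !== false) {
            $last = substr($url, -1);
            // Keep a closing bracket that balances an opening one inside the URL
            if ($last === ')' && substr_count($url, '(') >= substr_count($url, ')')) {
                break;
            }
            $trail = $last . $trail;
            $url = substr($url, 0, -1);
        }
        $href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
        $out .= '<a href="' . $href . '">' . $href . '</a>' . $trail;
    }
    return $out;
}

echo linkify_plain_text('See (http://example.com/a_(b)) or http://example.com/a?b=1&c=2. <b>');
```

The bracket check is the "don't treat ) as trailing if there's a ( in the link" idea from the paragraph above, applied by counting brackets.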
HTML Purifier has a built-in linkify function to save you all the headaches.
Its other features are also simply too useful to pass up if you're dealing with any kind of user input that you also have to display.
Not so fancy regexps that should work
/\b(https?:\/\/[^\s+\"\<\>]+)/i
/\b(www\.[^\s+\"\<\>]+)/i
Note that the last two forms in your list (google.com and docs.google.com) would be impossible to handle correctly, as you cannot distinguish google.com from something like this.Where I finish one sentence and don't put a space after the full stop.
As for shortening the URLs, having your URL in $url:
if (strlen($url) > 20) // Or whatever length you like
{
$shortURL = substr($url, 0, 20)."…";
}
else
{
$shortURL = $url;
}
echo '<a href="'.$url.'" >'.$shortURL.'</a>';
From http://www.exorithm.com/algorithm/view/markup_urls
function markup_urls ($text)
{
    // split the text into words
    $words = preg_split('/([\s\n\r]+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    $text = "";
    // iterate through the words
    foreach($words as $word) {
        // chopword = the portion of the word that will be replaced
        $chopword = $word;
        $chopword = preg_replace('/^[^A-Za-z0-9]*/', '', $chopword);
        if ($chopword <> '') {
            // linkword = the text that will replace chopword in the word
            $linkword='';
            // does it start with http://abc. ?
            if (preg_match('/^(http:\/\/)[a-zA-Z0-9_]{2,}.*/', $chopword)) {
                $chopword = preg_replace('/[^A-Za-z0-9\/]*$/', '', $chopword);
                $linkword = '<a href="'.$chopword.'">'.$chopword.'</a>';
            // does it equal abc.def.ghi ?
            } else if (preg_match('/^[a-zA-Z]{2,}\.([a-zA-Z0-9_]+\.)+[a-zA-Z]{2,}(\/.*)?/', $chopword)) {
                $chopword = preg_replace('/[^A-Za-z0-9\/]*$/', '', $chopword);
                $linkword = '<a href="http://'.$chopword.'">'.$chopword.'</a>';
            // does it start with abc@def.ghi ?
            } else if (preg_match('/^[a-zA-Z0-9_\.]+\@([a-zA-Z0-9_]{2,}\.)+[a-zA-Z]{2,}.*/', $chopword)) {
                $chopword = preg_replace('/[^A-Za-z0-9]*$/', '', $chopword);
                $linkword = '<a href="mailto:'.$chopword.'">'.$chopword.'</a>';
            }
            // replace chopword with linkword in word (if linkword was set)
            if ($linkword <> '') {
                $word = str_replace($chopword, $linkword, $word);
            }
        }
        // append the word
        $text = $text.$word;
    }
    return $text;
}
I got this working exactly the way I want here:
<?php
$input = <<<EOF
http://www.example.com/
http://example.com
www.example.com
http://iamanextremely.com/long/link/so/I/will/be/trimmed/down/a/bit/so/i/dont/mess/up/text/wrapping.html
EOF;
function trimlong($match)
{
    $url = $match[0];
    $display = $url;
    if ( strlen($display) > 30 ) {
        $display = substr($display,0,30)."...";
    }
    return '<a href="'.$url.'">'.$display.'</a> <img src="http://static.goalscdn.com/img/external-link.gif" height="10" width="11" />';
}
// Pass the function name as the callback; $this is only available inside a class method
$output = preg_replace_callback('#(http://|www\\.)[^\\s<]+[^\\s<,.]#i',
    'trimlong', $input);
echo $output;
