Find url from string with php - php

following code is used to find url from a string with php. Here is the code:
$string = "Hello http://www.bytes.com world www.yahoo.com";
preg_match('/(http:\/\/[^\s]+)/', $string, $text);
$hypertext = "" . $text[0] . "";
$newString = preg_replace('/(http:\/\/[^\s]+)/', $hypertext, $string);
echo $newString;
Well, it shows a link but if i provide few link it doesn't work and also if i write without http:// then it doesn't show link. I want whatever link is provided it should be active, Like stackoverflow.com.
Any help please..

A working method for linking with http/https/ftp/ftps/scp/scps:
$newStr = preg_replace('!(http|ftp|scp)(s)?:\/\/[a-zA-Z0-9.?&_/]+!', "\\0",$str);
I strongly advise NOT linking when it only has a dot, because it will consider PHP 5.2, ASP.NET, etc. links, which is hardly acceptable.
Update: if you want www. strings as well, take a look at this.

If you want to detect something like stackoverflow.com, then you're going to have to check for all possible TLDs to rule out something like Web 2.0, which is quite a long list. Still, this is also going to match something as ASP.NET etc.
The regex would looks something like this:
$hypertext = preg_replace(
'{\b(?:http://)?(www\.)?([^\s]+)(\.com|\.org|\.net)\b}mi',
'$1$2$3',
$text
);
This only matches domains ending in .com, .org and .net... as previously stated, you would have to extend this list to match all TLDs

#axiomer your example wasn't work if link will be in format:
https://stackoverflow.com?val1=bla&val2blablabla%20bla%20bla.bl
correct solution:
preg_replace('!(http|ftp|scp)(s)?:\/\/[a-zA-Z0-9.?%=&_/]+!', "\\0", $content);
produces:
https://stackoverflow.com?val1=bla&val2blablabla%20bla%20bla.bl

Related

PHP / RegEx - Convert URLs to links by detecting .com/.net/.org/.edu etc

I know there have been many questions asking for help converting URLs to clickable links in strings, but I haven't found quite what I'm looking for.
I want to be able to match any of the following examples and turn them into clickable links:
http://www.domain.com
https://www.domain.net
http://subdomain.domain.org
www.domain.com/folder
subdomain.domain.net
subdomain.domain.edu/folder/subfolder
domain.net
domain.com/folder
I do not want to match random.stuff.separated.with.periods.
EDIT: Please keep in mind that these URLs need to be found within larger strings of 'normal' text. For example, I want to match 'domain.net' in "Hello! Come check out domain.net!".
I think this could be accomplished with a regex that can determine whether the matching url contains .com, .net, .org, or .edu followed by either a forward slash or whitespace. Other than a user typo, I can't imagine any other case in which a valid URL would have one of those followed by anything else.
I realize there are many valid domain extensions out there, but I don't need to support them all. I can just choose which to support with something like (com|net|org|edu) in the regex. Unfortunately, I'm not skilled enough with regex yet to know how to properly implement this.
I'm hoping someone can help me find a regular expression (for use with PHP's preg_replace) that can match URLs based on just about any text connected by one or more dots and either ending with one of the specified extensions followed by whitespace OR containing one of the specified extensions followed by a slash and possibly folders.
I did several searches and so far have not found what I'm looking for. If there already exists a SO post that answers this, I apologize.
Thanks in advance.
--- EDIT 3 ---
After days of trial and error and some help from SO, here's what works:
preg_replace_callback('#(\s|^)((https?://)?(\w|-)+(\.(\w+|-)*)+(?<=\.net|org|edu|com|cc|br|jp|dk|gs|de)(\:[0-9]+)?(?:/[^\s]*)?)(?=\s|\b)#is',
create_function('$m', 'if (!preg_match("#^(https?://)#", $m[2]))
return $m[1]."".$m[2].""; else return $m[1]."".$m[2]."";'),
$event_desc);
This is a modified version of anubhava's code below and so far seems to do exactly what I want. Thanks!
You can use this regex:
#(\s|^)((?:https?://)?\w+(?:\.\w+)+(?<=\.(net|org|edu|com))(?:/[^\s]*|))(?=\s|\b)#is
Code:
$arr = array(
'http://www.domain.com/?foo=bar',
'http://www.that"sallfolks.com',
'This is really cool site: https://www.domain.net/ isn\'t it?',
'http://subdomain.domain.org',
'www.domain.com/folder',
'Hello! You can visit vertigofx.com/mysite/rocks for some awesome pictures, or just go to vertigofx.com by itself',
'subdomain.domain.net',
'subdomain.domain.edu/folder/subfolder',
'Hello! Check out my site at domain.net!',
'welcome.to.computers',
'Hello.Come visit oursite.com!',
'foo.bar',
'domain.com/folder',
);
foreach($arr as $url) {
$link = preg_replace_callback('#(\s|^)((?:https?://)?\w+(?:\.\w+)+(?<=\.(net|org|edu|com))(?:/[^\s]*|))(?=\s|\b)#is',
create_function('$m', 'if (!preg_match("#^(https?://)#", $m[2]))
return $m[1]."".$m[2].""; else return $m[1]."".$m[2]."";'),
$url);
echo $link . "\n";
OUTPUT:
http://www.domain.com/?foo=bar
http://www.that"sallfolks.com
This is really cool site: https://www.domain.net/ isn't it?
http://subdomain.domain.org
www.domain.com/folder
Hello! You can visit vertigofx.com/mysite/rocks for some awesome pictures, or just go to vertigofx.com by itself
subdomain.domain.net
subdomain.domain.edu/folder/subfolder
Hello! Check out my site at domain.net!
welcome.to.computers
Hello.Come visit oursite.com!
foo.bar
domain.com/folder
PS: This regex only supports http and https scheme in URL. So eg: if you want to support ftp also then you need to modify the regex a little.
'/(http(s)?:\/\/)?[\w\/\.]+(\.((com)|(edu)|(net)|(org)))[\w\/]*/'
That works for your examples. You might want to add extra characters support for "-", "&", "?", ":", etc in the last bracket.
'/(http(s)?:\/\/)?[\w\/\.]+(\.((com)|(edu)|(net)|(org)))[\w\/\?=&-;]*/'
This will support parameters and port numbers.
eg.: www.foo.ca:8888/test?param1=val1&param2=val2
Thanks a ton. I modified his final solution to allow all domains (.ca, .co.uk), not just the specified ones.
$html = preg_replace_callback('#(\s|^)((https?://)?(\w|-)+(\.[a-z]{2,3})+(\:[0-9]+)?(?:/[^\s]*)?)(?=\s|\b)#is',
create_function('$m', 'if (!preg_match("#^(https?://)#", $m[2])) return $m[1]."".$m[2].""; else return $m[1]."".$m[2]."";'),
$url);

Make certain text in outlook incoming e-mails into links?

I'd like to do some operations on incoming e-mails. Namely transform all 6 digit numbers into links which lead to a url based on the number.
I don't want to open a huge can of worms, in terms of APIs or languages besides PHP, this isn't that much of a timesaver, but it would be nice. Anyone done anything like this? Just looking to get pointed in the right direction !
You can use a regex to find your numbers and replace them with your links. Since I do not know your link structure, I made one up.
Here is a simple example:
$str = "Testing 385758 String";
preg_replace( '/(\d{6})/', '$1', $str);
This will turn $str into:
Testing 385758 String
Demo

Removing 'http://' from link via REGEX

What I would like to do is remove the "http://" part of these autogenerated links, below is an example of it.
http://google.com/search?gc...
Here are the regexes I am using in PHP to generate these links from a URL.
$patterns_sp[5] = '~([\S]+)~';
$replaces_sp[5] = '<a href=\1 target="_blank">\1<br/>';
$patterns_sp[6] = '~(?<=\>)([\S]{1,25})[^\s]+~';
$replaces_sp[6] = '\1...</a><br/>';
When these patterns are run on a URL like this:
http://www.google.com/search?gcx=c&ix=c1&sourceid=chrome&ie=UTF-8&q=regex
the REGEX gives me:
http://google.com/search?gc...
Where I am stuck:
There is no obvious reason why I cannot modify the fourth line of code to read like this:
$patterns_sp[6] = '~(?<=\>http\:\/\/)([\S]{1,25})[^\s]+~';
However, the REGEX still seems to capture the "http://" part of the address, thus making a long list of these very redundant looking. What I am left with is the same thing as in the first example.
Replace...
$patterns_sp[5] = '~([\S]+)~';
...with...
$patterns_sp[5] = '~^(?:https?|ftp):([\S]+)~';
Then you can access the protocol-less version with $1 and the whole link with $0.
Optionally, you can remove a leading protocol with something like...
preg_replace('/^(?:https?|ftp):/', '', $str);
I suggest not writing your own regex, instead have a look at http://php.net/manual/en/function.parse-url.php
Retrieve the components of the URL, then compose a new version that only contains the parts you want.

Extract URL from string

I'm trying to find a reliable solution to extract a url from a string of characters. I have a site where users answer questions and in the source box, where they enter their source of information, I allow them to enter a url. I want to extract that url and make it a hyperlink. Similar to how Yahoo Answers does it.
Does anyone know a reliable solution that can do this?
All the solutions I have found work for some URL's but not for others.
Thanks
John Gruber has spent a fair amount of time perfecting the "one regex to rule them all" for link detection. Using preg_replace() as mentioned in the other answers, using the following regex should be one of the most accurate, if not the most accurate, method for detecting a link:
(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
If you only wanted to match HTTP/HTTPS:
(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'".,<>?«»“”‘’]))
$string = preg_replace('/https?:\/\/[^\s"<>]+/', '$0', $string);
It only matches http/https, but that's really the only protocol you want to turn into a link. If you want others, you can change it like this:
$string = preg_replace('/(https?|ssh|ftp):\/\/[^\s"]+/', '$0', $string);
There are a lot of edge cases with urls. Like url could contain brackets or not contain protocol etc. Thats why regex is not enough.
I created a PHP library that could deal with lots of edge cases: Url highlight.
You could extract urls from string or directly highlight them.
Example:
<?php
use VStelmakh\UrlHighlight\UrlHighlight;
$urlHighlight = new UrlHighlight();
// Extract urls
$urlHighlight->getUrls("This is example http://example.com.");
// return: ['http://example.com']
// Make urls as hyperlinks
$urlHighlight->highlightUrls('Hello, http://example.com.');
// return: 'Hello, http://example.com.'
For more details see readme. For covered url cases see test.
Yahoo! Answers does a fairly good job of link identification when the link is written properly and separate from other text, but it isn't very good at separating trailing punctuation. For example The links are http://example.com/somepage.php, http://example.com/somepage2.php, and http://example.com/somepage3.php. will include commas on the first two and a period on the third.
But if that is acceptable, then patterns like this should do it:
\<http:[^ ]+\>
It looks like stackoverflow's parser is better. Is is open source?
This code is worked for me.
function makeLink($string){
/*** make sure there is an http:// on all URLs ***/
$string = preg_replace("/([^\w\/])(www\.[a-z0-9\-]+\.[a-z0-9\-]+)/i", "$1http://$2",$string);
/*** make all URLs links ***/
$string = preg_replace("/([\w]+:\/\/[\w-?&;#~=\.\/\#]+[\w\/])/i","<a target=\"_blank\" href=\"$1\">$1</a>",$string);
/*** make all emails hot links ***/
$string = preg_replace("/([\w-?&;#~=\.\/]+\#(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?))/i","$1",$string);
return $string;
}

Detecting .com / .co.uk etc etc

I currently have a preg_match to detect http:// and www. etc..... but I want to detect domain.com or domain.co.uk from a string
example string: "Hey hows it going,
check out domain.com" And I want to
detect domain.com
What I want is to detect any major domains form this string i.e. .com .co.uk .eu etc... from the form example.com example2.co.uk and then return true or false to handle it. In this case it would find domain.com.
However I do NOT want it to detect something like:
"hey.i love this site"
Whereby this is obviously an error in typing a space from the full stop!
Any ideas i need to scratch up on my regex!
Thanks,
Stefan
After they introduced non-Latin urls, it will be close to impossible to use regex to get a completely working filter. So I'd say it's not even worth trying to use regex for this anymore. Doubt parse_url() has support for it yet either, but using it means someone else have to work out the problems with non-Latin urls, which is always a bonus :) So use that
http://au.php.net/parse_url
http://thenextweb.com/me/2010/05/06/monumental-day-internet-nonlatin-domain-names-live/
Edit:
Ok, from a string, split it into words like this
$array = explode(" ", $string);
for(int i = 0; i < count($array);i++)
{
if(parse_url($array[i]) != false)
{
$url[] = $array[i];
}
}
Ok, parse_url() isn't supposed to be used like this, but there is no other function built into php to do url filtering as far as I can see.
Here is regexp that would match a provided list of domain zones:
[a-z0-9\-\.]+\.(com|co\.uk|net|org)

Categories