Linkify Regex Function PHP Daring Fireball Method - php

So, I know there are a ton of related questions on SO, but none of them are quite what I'm looking for. I'm trying to implement a PHP function that will convert text URLs from a user-generated post into links. I'm using the 'improved' Regex from Daring Fireball towards the bottom of the page: http://daringfireball.net/2010/07/improved_regex_for_matching_urls
The function does not return anything, and I'm not sure why.
<?php
if ( false === function_exists('linkify') ):
function linkify($str) {
$pattern = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';
return preg_replace($pattern, "\\0", $str);
}
endif;
?>
Can someone please help me get this to work?
Thanks!

Try this:
$pattern = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`\!()\[\]{};:\'".,<>?«»“”‘’]))';
return preg_replace("!$pattern!i", "\\0", $str);
PHP's preg function do need delimiters. The i at the end makes it case-insensitive
Update
If you use # as the delimiter, you wan't need to escape the ! in the pattern as such use the original pattern string (the pattern does not have a #): "#$pattern#i"
Update 2
To ensure that the links are correct, do this:
$pattern = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';
return preg_replace_callback("#$pattern#i", function($matches) {
$input = $matches[0];
$url = preg_match('!^https?://!i', $input) ? $input : "http://$input";
return '' . "$input";
}, $str);
This will now append http:// to the urls so that browser doesn't think it is a relative link.

I was looking to just get the urls from a string using the same regex from the answer above by d_inevitable and wasn't looking to turn them into links or care about the rest of the string, I only wanted the urls with in the string so this is what I did. Hope it helps.
/**
* Returns the urls in an array from a string.
* This dos NOT return the string, only the urls with-in.
*/
function get_urls($str){
$regex = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';
preg_match_all("#$regex#i", $str, $matches);
$urls = $matches[0];
return $urls;
}

Related

substr() to preg_replace() matches php

I have two functions in PHP, trimmer($string,$number) and toUrl($string). I want to trim the urls extracted with toUrl(), to 20 characters for example. from https://www.youtube.com/watch?v=HU3GZTNIZ6M to https://www.youtube.com/wa...
function trimmer($string,$number) {
$string = substr ($string, 0, $number);
return $string."...";
}
function toUrl($string) {
$regex="/[^\W ]+[^\s]+[.]+[^\" ]+[^\W ]+/i";
$string= preg_replace($regex, "<a href='\\0'>".trimmer("\\0",20)."</a>",$string);
return $string;
}
But the problem is that the value of the match return \\0 not a variable like $url which could be easily trimmed with the function trimmer().
The Question is how do I apply substr() to \\0 something like this substr("\\0",0,20)?
What you want is preg_replace_callback:
function _toUrl_callback($m) {
return "" . trimmer($m[0], 20) ."";
}
function toUrl($string) {
$regex = "/[^\W ]+[^\s]+[.]+[^\" ]+[^\W ]+/i";
$string = preg_replace_callback($regex, "_toUrl_callback", $string);
return $string;
}
Also note that (side notes wrt your question):
You have a syntax error, '$regex' is not going to work (they don't replace var names in single-quoted strings)
You may want to look for better regexps to match URLs, you'll find plenty of them with a quick search
You may want to run through htmlspecialchars() your matches (mainly problems with "&", but that depends how you escape the rest of the string.
EDIT: Made it more PHP 4 friendly, requested by the asker.

PHP get specific string from url before and after unknown characters

I know it may sound as a common question but I have difficulty understanding this process.
So I have this string:
http://domain.com/campaign/tgadv?redirect
And I need to get only the word "tgadv". But I don't know that the word is "tgadv", it could be whatever.
Also the url itself may change and become:
http://domain.com/campaign/tgadv
or
http://domain.com/campaign/tgadv/
So what I need is to create a function that will get whatever word is after campaign and before any other particular character. That's the logic..
The only certain thing is that the word will come after the word campaign/ and that any other character that will be after the word we are searching is a special one ( i.e. / or ? )
I tried understanding preg_match but really cannot get any good result from it..
Any help would be highly appreciated!
I would not use a regex for that. I would use parse_url and basename:
$bits = parse_url('http://domain.com/campaign/tgadv?redirect');
$filename = basename($bits['path']);
echo $filename;
However, if want a regex solution, use something like this:
$pattern = '~(.*)/(.*)(\?.*)~';
preg_match($pattern, 'http://domain.com/campaign/tgadv?redirect', $matches);
$filename = $matches[2];
echo $filename;
Actually, preg_match sounds like the perfect solution to this problem. I assume you are having problems with the regex?
Try something like this:
<?php
$url = "http://domain.com/campaign/tgadv/";
$pattern = "#campaign/([^/\?]+)#";
preg_match($pattern, $url, $matches);
// $matches[1] will contain tgadv.
$path = "http://domain.com/campaign/tgadv?redirect";
$url_parts = parse_url($path);
$tgadv = strrchr($url_parts['path'], '/');
You don't really need a regex to accomplish this. You can do it using stripos() and substr().
For example:
$str = '....Your string...';
$offset = stripos($str, 'campaign/');
if ( $offset === false ){
//error, end of h4 tag wasn't found
}
$offset += strlen('campaign/');
$newStr = substr($str, $offset);
At this point $newStr will have all the text after 'campaign/'.
You then just need to use a similar process to find the special character position and use substr() to strip the string you want out.
You can also just use the good old string functions in this case, no need to involve regexps.
First find the string /campaign/, then take the substring with everything after it (tgadv/asd/whatever/?redirect), then find the next / or ? after the start of the string, and everything in between will be what you need (tgadv).

What is wrong with my Regex Function?

It seems like such a simple matter. Anyways, at the bottom of my code is an array which uses the outcome of the function. Now it doesn't seem to be working and i think it might be my regex for scanning Meta Keywords. So basically, I want to know what my function is doing wrong or how to create a fully working regex code
function getKeywords($link) {
$str2 = file_get_contents($link);
if (strlen($str2)>0) {
preg_match_all( '(?i)<meta\\s+name=\"keywords\"\\s+content=\"(.*?)\">', $str2, $keywords);
return $keywords[1];
}
}
try this:
function getKeywords($link) {
$str2 = file_get_contents($link);
if (strlen($str2)>0) {
if(preg_match( '/<meta\s+name="keywords"\s+content="(.*?)">/i', $str2, $keywords))
return $keywords[1];
else
return "";
}
}
You had multiple problems with your expression:
1). To many escape chars for \s and \"
2). You didn't lead with a / and end with a /
3). You were using preg_match_all instead of preg_match
4). You didn't handle the case when no keyword meta tag can be found.

regex for breadcrumb in php

I am currently building breadcrumb. It works for example for
http://localhost/researchportal/proposal/
<?php
$url_comp = explode('/',substr($url,1,-1));
$end = count($url_comp);
print_r($url_comp);
foreach($url_comp as $breadcrumb) {
$landing="http://localhost/";
$surl .= $breadcrumb.'/';
if(--$end)
echo '
<a href='.$landing.''.$surl.'>'.$breadcrumb.'</a>»';
else
echo '
'.$breadcrumb.'';
};?>
But when I typed in http://localhost////researchportal////proposal//////////
All the formatting was gone as it confuses my code.
I need to have the site path in an array like ([1]->researchportal, [2]->proposal)
regardless of how many slashes I put.
So can $url_comp = explode('/',substr($url,1,-1)); be turned into a regular expression to get my desired output?
You don't need regex. Look at htmlentities() and stripslashes() in the PHP manual. A regex will return a boolean value of whatever it says, and won't really help you achieve what you are trying to do. All the regex can let you do is say if the string matches the regex do something. If you put in a regex requiring at least 2 characters between each slash, then any time anyone puts more than one consecutive slash in there, the if statement will stop.
http://ca3.php.net/manual/en/function.stripslashes.php
http://ca3.php.net/manual/en/function.htmlentities.php
Found this on the php manual.
It uses simple str_replace statements, modifying this should achieve exactly what your post was asking.
<?
function stripslashes2($string) {
$string = str_replace("\\\"", "\"", $string);
$string = str_replace("\\'", "'", $string);
$string = str_replace("\\\\", "\\", $string);
return $string;
}
?>

Determine User Input Contains URL

I have a input form field which collects mixed strings.
Determine if a posted string contains an URL (e.g. http://link.com, link.com, www.link.com, etc) so it can then be anchored properly as needed.
An example of this would be something as micro blogging functionality where processing script will anchor anything with a link. Other sample could be this same post where 'http://link.com' got anchored automatically.
I believe I should approach this on display and not on input. How could I go about it?
You can use regular expressions to call a function on every match in PHP. You can for example use something like this:
<?php
function makeLink($match) {
// Parse link.
$substr = substr($match, 0, 6);
if ($substr != 'http:/' && $substr != 'https:' && $substr != 'ftp://' && $substr != 'news:/' && $substr != 'file:/') {
$url = 'http://' . $match;
} else {
$url = $match;
}
return '' . $match . '';
}
function makeHyperlinks($text) {
// Find links and call the makeLink() function on them.
return preg_replace('/((www\.|(http|https|ftp|news|file)+\:\/\/)[_.a-z0-9-]+\.[a-z0-9\/_:#=.+?,##%&~-]*[^.|\'|\# |!|\(|?|,| |>|<|;|\)])/e', "makeLink('$1')", $text);
}
?>
You will want to use a regular expression to match common URL patterns. PHP offers a function called preg_match that allows you to do this.
The regular expression itself could take several forms, but here is something to get you started (also maybe just Google 'URL regex':
'/^(((http|https|ftp)://)?([[a-zA-Z0-9]-.])+(.)([[a-zA-Z0-9]]){2,4}([[a-zA-Z0-9]/+=%&_.~?-]))$/'
So your code should look something this:
$matches = array(); // will hold the results of the regular expression match
$string = "http://www.astringwithaurl.com";
$regexUrl = '/^(((http|https|ftp):\/\/)?([[a-zA-Z0-9]\-\.])+(\.)([[a-zA-Z0-9]]){2,4}([[a-zA-Z0-9]\/+=%&_\.~?\-]*))*$/';
preg_match($regexUrl, $string, $matches);
print_r($matches); // an array of matched patterns
From here, you just want to wrap those URL patterns in an anchor/href tag and you're done.
Just how accurate do you want to be? Given just how varied URLs can be, you're going to have to draw the line somewhere. For instance. www.ca is a perfectly valid hostname and does bring up a site, but it's not something you'd EXPECT to work.
You should investigate regular expressions for this.
You will build a pattern that will match the part of your string that looks like a URL and format it appropriately.
It will come out something like this (lifted this, haven't tested it);
$pattern = "((https?|ftp|gopher|telnet|file|notes|ms-help):((//)|(\\\\))+[\w\d:##%/;$()~_?\+-=\\\.&]*)";
preg_match($pattern, $input_string, $url_matches, PREG_OFFSET_CAPTURE, 3);
$url_matches will contain an array of all of the parts of the input string that matched the url pattern.
You can use $_SERVER['HTTP_HOST'] to get the host information.
<?php
$host = $SERVER['HTTP_HOST'];
?>
Post

Categories