PHP regexp string to url problem [duplicate] - php

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How do I linkify urls in a string with php?
It would be delightful if I could solve this problem once and for all.
I need to be able to create links from strings like http://www.google.com and also www.google.com
function hyperlink($text)
{
// match protocol://address/path/
$text = ereg_replace("[a-zA-Z]+://([.]?[a-zA-Z0-9_/-])*", "<a href=\"\\0\">\\0</a>", $text);
// match www.something
$text = ereg_replace("(^| )(www([.]?[a-zA-Z0-9_/-])*)", "\\1<a href=\"http://\\2\">\\2</a>", $text);
// return $text
return $text;
}

You will find many good answers in the PHP manual. Even where the examples there use ereg_replace, you should use preg_replace instead, since the ereg functions are deprecated (and were removed in PHP 7).
$text = preg_replace('![a-z]+://[a-z0-9_/.-]+!i', '<a href="$0">$0</a>', $text);
$text = preg_replace('!(^| )(www[a-z0-9_/.-]+)!i', '$1<a href="http://$2">$2</a>', $text);
Note: with preg you can use arbitrary delimiters, not just the standard / at the start and end of your expression. I used ! as it does not appear in the expression and this way you don't have to escape /. Also note that the i makes the expression case-insensitive so a-z is enough instead of a-zA-Z.
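As a minimal sketch, the two preg_replace calls can be wrapped in a function and tried on a sample string. The function name hyperlinkPreg and the sample text are assumptions for illustration; the anchor markup is one plausible choice, not the only one.

```php
<?php
// hyperlinkPreg is a hypothetical name; it wraps the two preg_replace calls above.
function hyperlinkPreg(string $text): string
{
    // protocol://address/path
    $text = preg_replace('![a-z]+://[a-z0-9_/.-]+!i', '<a href="$0">$0</a>', $text);
    // www.something, only at the start of the string or after a space,
    // so the "www" inside an already linked http://www... is left alone
    $text = preg_replace('!(^| )(www[a-z0-9_/.-]+)!i', '$1<a href="http://$2">$2</a>', $text);
    return $text;
}

echo hyperlinkPreg('Visit http://www.google.com or www.google.com today');
// Visit <a href="http://www.google.com">http://www.google.com</a> or <a href="http://www.google.com">www.google.com</a> today
```

The second pattern relies on the space (or start of string) in front of www to avoid linking the same URL twice.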

Alex: so let me get this right: you've got a string, which could be anything, and you wish to convert every URL in it into a link to that URL?
What I don't get is that it already works with the regex you're trying:
$string = "
<p>This string has http://www.google.com/ and has www.google.com it should match both</p>
";
$string = preg_replace("/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&##\/%?=~_|!:,.;]*[-a-z0-9+&##\/%=~_|]/i","<a href='$0'>$0</a>", $string);
echo $string;
This converts both URLs into links, as expected. I've not made any changes.
I think I'm missing what you mean; maybe you could paste some errors so we can see what the problem really is.

I've just tested your solution and it works a treat, except when the URL has a query string, e.g. www.example.com/search.php?q=ipod+nano&something=nothing will not get translated correctly.
I've made the relevant changes to your function below; this should now work more consistently:
function hyperlink($text)
{
// match protocol://address/path/
$text = ereg_replace("[a-zA-Z]+://([.]?[a-zA-Z0-9_/-])*", "<a href=\"\\0\">\\0</a>", $text);
// match www.something (the hyphen stays last in the bracket so it is not read as a range)
$text = ereg_replace("(^| )(www([.]?[a-zA-Z0-9_/?&=+%-])*)", "\\1<a href=\"http://\\2\">\\2</a>", $text);
// return $text
return $text;
}
Just so you know, I added ?&=+% to the second regex.
You should test this across many more URL combinations.
But this should suffice for now.
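To test across more URL combinations, a small script along these lines may help. It is only a sketch: preg_replace stands in for ereg_replace (which no longer exists in PHP 7 and later), and the function name hyperlinkWithQuery and the sample strings are assumptions.

```php
<?php
// hyperlinkWithQuery is a hypothetical name; preg_replace is used as an
// assumption because ereg_replace is not available in PHP 7+.
function hyperlinkWithQuery(string $text): string
{
    // Allowed URL characters, including the query string ones; hyphen last so it is literal.
    $chars = 'a-z0-9_/.?&=+%-';
    $text = preg_replace("![a-z]+://[{$chars}]+!i", '<a href="$0">$0</a>', $text);
    $text = preg_replace("!(^| )(www[{$chars}]+)!i", '$1<a href="http://$2">$2</a>', $text);
    return $text;
}

// Sample inputs (made up for illustration); print each result for inspection.
$samples = [
    'www.example.com/search.php?q=ipod+nano&something=nothing',
    'http://example.com/a?b=c',
    'no links here',
];
foreach ($samples as $sample) {
    echo hyperlinkWithQuery($sample), "\n";
}
```

Extending the $samples list is the cheapest way to find the next character that needs adding to the class.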

Related

Space in # mention username and lowercase in link

I am trying to create a mention system, and so far I've converted the #username into a link. But I wanted to see if it is possible for it to recognise whitespace in names, for example #Marie Lee instead of #MarieLee.
Also, I'm trying to convert the name in the link into lowercase letters (like profile?id=marielee) while leaving the displayed mention with its uppercase letters, but haven't been able to.
This is my code so far:
<?php
function convertHashtags($str) {
$regex = '/#+([a-zA-Z0-9_0]+)/';
$str = preg_replace($regex, strtolower('$0'), $str);
return($str);
}
$string = 'I am #Marie Lee, nice to meet you!';
$string = convertHashtags($string);
echo $string;
?>
You may use this code with preg_replace_callback and an enhanced regex that will match all space-separated words:
define("REGEX", '/#(\w+(?:\h+\w+)*)/');
function convertHashtags($str) {
return preg_replace_callback(REGEX, function ($m) {
// $m[0] is the mention as typed, $m[1] the name without the leading #
$id = strtolower(preg_replace('/\h+/', '', $m[1]));
return '<a href="profile?id=' . $id . '">' . $m[0] . '</a>';
}, $str);
}
If you want to allow only 2 words then you may use:
define("REGEX", '/#(\w+(?:\h+\w+)?)/');
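A self-contained sketch of the callback approach, limited to two words, might look like this. The function name linkMentions is hypothetical, and the profile?id= link format is taken from the question.

```php
<?php
// linkMentions is a hypothetical name; the profile?id= format comes from the question.
function linkMentions(string $text): string
{
    // "#" plus one word, optionally followed by one more space-separated word
    return preg_replace_callback('/#(\w+(?:\h+\w+)?)/', function (array $m): string {
        // Lowercase the name and strip the spaces for the id; display the mention as typed.
        $id = strtolower(preg_replace('/\h+/', '', $m[1]));
        return '<a href="profile?id=' . $id . '">' . $m[0] . '</a>';
    }, $text);
}

echo linkMentions('I am #Marie Lee, nice to meet you!');
// I am <a href="profile?id=marielee">#Marie Lee</a>, nice to meet you!
```

Note that the word after the first is always treated as part of the name, so "#Marie and" would also be linked as one mention; the regex alone cannot tell a surname from ordinary text.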
You can restrict usernames to letters, digits, underscores and spaces, since there is nothing else to extract. Make sure at least one word character is matched before any spaces, so that a lone # followed by a space does not match. This works correctly for a username of up to 2 space-separated words that is followed by a non-word character other than a space.
<?php
function convertHashtags($str) {
$regex = '/#([a-zA-Z0-9_]+[\sa-zA-Z0-9_]*)/';
if(preg_match($regex,$str,$matches) === 1){
list($username,$name) = [$matches[0] , strtolower(str_replace(' ','',$matches[1]))];
return "<a href='profile?id=$name'>$username</a>";
}
throw new Exception('Unable to find username in the given string');
}
$string = 'I am #Marie Lee, nice to meet you!';
$string = convertHashtags($string);
echo $string;
Demo: https://3v4l.org/e2S8C
If you want the text to appear as is in the innerHTML of the anchor tag, you need to change
list($username,$name) = [$matches[0] , strtolower(str_replace(' ','',$matches[1]))];
to
list($username,$name) = [$str , strtolower(str_replace(' ','',$matches[1]))];
Demo: https://3v4l.org/dCQ4S

Regex match any url's in string with and without www and create a clickable URL

There are many similar questions, but I still have not found a solution to what I try to achieve in php. I preg_match_all a string which can contain URLs written in various ways, but also contains text which should not match. What I need to match is:
www.something.com
https://something.com
http://something.com
https://www.something.com
http://www.something.com
And any /..../.... after the URL, but not:
www.something.com</p> // this should match everything until the '</p>'
www.something.com. // this should match everything until the '.'
What I have so far is
/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:#\-_=#]+\.([a-zA-Z0-9\&\.\/\?\:#\-_=#])*/
and the function
if(preg_match_all("/((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:#\-_=#]+\.([a-zA-Z0-9\&\.\/\?\:#\-_=#])*/",$text,$urls)){
foreach($urls[0]as $url ){
$text = str_replace($url,'<a href="'.$url.'">'.$url.'</a>',$text);
}
}
but this gives a problem with http://www.... (the http:// won't be included in the displayed text), and for a URL without http or https the created link is relative to the domain the page is shown on. Suggestions?
Here's a live Demo
Edit: my best regex so far for any URL with http or https is /(http|https)\:\/\/[a-zA-Z0-9\-\.]+(\.[a-zA-Z]{2,3})?(\/[A-Za-z0-9-._~!$&()*+,;=:]*)*/. Now I just need a way to match the URLs that start with only www.something... and transform them into http://www.something... in the href.
Here's another live demo with different examples.
Edit 2: the answer from this question is quite good. The only problems I still encounter are with </p> directly after the URL, and with words before and after a dot (this.for example).
$url = '#(http)?(s)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])#';
$string = preg_replace($url, '$0', $string);
echo $string;
Maybe this one fits your needs:
$text = preg_replace_callback('~(https?://|www)[a-z\d.-]+[\w/.?=&%:#]*\w~i', function($m) {
$prefix = stripos($m[0], 'www') === 0 ? 'http://' : '';
return "<a href='{$prefix}{$m[0]}'>{$m[0]}</a>";
}, $text);
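As a quick check, the callback above can be wrapped in a function and run against a string containing all three URL forms. The wrapper name linkifyUrls and the sample string are assumptions for illustration.

```php
<?php
// linkifyUrls is a hypothetical wrapper around the preg_replace_callback call above.
function linkifyUrls(string $text): string
{
    return preg_replace_callback('~(https?://|www)[a-z\d.-]+[\w/.?=&%:#]*\w~i', function (array $m): string {
        // Only bare www... matches need a scheme added to the href.
        $prefix = stripos($m[0], 'www') === 0 ? 'http://' : '';
        return "<a href='{$prefix}{$m[0]}'>{$m[0]}</a>";
    }, $text);
}

echo linkifyUrls('<p>Some string www.test.com with urls http://test.com in it http://www.test.com. </p>');
// <p>Some string <a href='http://www.test.com'>www.test.com</a> with urls <a href='http://test.com'>http://test.com</a> in it <a href='http://www.test.com'>http://www.test.com</a>. </p>
```

Because the pattern must end in \w, the trailing full stop after the last URL stays outside the link.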
$text = "<p>Some string www.test.com with urls http://test.com in it http://www.test.com. </p>";
$text = preg_replace_callback("#(http)?(s)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])#", 'replace_callback', $text);
function replace_callback($matches){
return '<a href="' . $matches[0] . '">' . $matches[0] . '</a>';
}
Your regex was almost correct!
You were matching a literal dot \. followed by zero or more characters from a class that includes the dot.
So I changed it to match a literal dot followed by one or more characters from a class that excludes the dot, which seems to be what you want. Here is the final regex:
((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:#\-_=#]+\.([a-zA-Z0-9\&\/\?\:#\-_=#])+
See it in action:
https://regex101.com/r/h5pUvC/3/

PHP regEx help needed with /*<##> </##>*/

I am struggling with a regex and cannot get it to work.
I have already tried a related SO question, an online tool, and:
$text = preg_replace("%/\*<##>(?:(?!\*/).)</##>*\*/%s", "new", $text);
But nothing works.
My input string is:
$input = "something /*<##>old or something else</##>*/ something other";
and expected result is:
something /*<##>new</##>*/ something other
I see two issues here: there are no capturing groups to put the delimiter markers back in your replacement, and the negative lookahead group is missing a repetition operator. Your pattern also has a stray * after </##>, which is not needed.
$text = preg_replace('%(/\*<##>)(?:(?!\*/).)*(</##>\*/)%s', '$1new$2', $text);
Alternatively, you can replace the lookahead with .*?, since you are using the s (dotall) modifier.
$text = preg_replace('%(/\*<##>).*?(</##>\*/)%s', '$1new$2', $text);
Or use \K together with a lookahead to do this without capturing groups.
$text = preg_replace('%/\*<##>\K.*?(?=</##>\*/)%s', 'new', $text);
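The capturing-group and \K variants can be compared side by side in a short script. This is a sketch using the input string from the question; the variable names are made up.

```php
<?php
$input = "something /*<##>old or something else</##>*/ something other";

// Variant 1: capture the markers and put them back around the new text.
$withGroups = preg_replace('%(/\*<##>).*?(</##>\*/)%s', '$1new$2', $input);

// Variant 2: \K drops the opening marker from the match, and the lookahead
// keeps the closing marker out of it, so only the inner text is replaced.
$withK = preg_replace('%/\*<##>\K.*?(?=</##>\*/)%s', 'new', $input);

echo $withGroups, "\n", $withK, "\n";
// both lines: something /*<##>new</##>*/ something other
```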
Tested:
$input = "something /*<##>old or something else</##>*/ something other";
echo preg_replace('%(/\*<##>)(.*)(</##>\*/)%', '$1new$3', $input);

preg_replace everything but # sign

I've searched for an example of this, but can't seem to find one.
I'm looking to remove everything from a string except the #texthere
$Input = this is #cool isn't it?
$Output = #cool
I can remove the #cool using preg_replace("/#(\w+)/", "", $Input); but can't figure out how to do the opposite
You could match #\w+ and then replace the original string. Or, if you need to use preg_replace, you should be able to replace everything with the first capture group:
$output = preg_replace('/.*(#\w+).*/', '\1', $input);
Solution using preg_match (I assume this will perform better):
$matches = array();
preg_match('/#\w+/', $input, $matches);
$output = $matches[0];
Neither pattern above addresses how to handle inputs that match multiple times, such as this is #cool and #awesome, right?
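One way to cover inputs with several matches is preg_match_all, sketched below. The helper name extractTags and the choice to join the results with a space are assumptions; the question does not say how multiple tags should be combined.

```php
<?php
// extractTags is a hypothetical helper name.
function extractTags(string $input): string
{
    // preg_match_all collects every full match into $matches[0];
    // with no match, $matches[0] is an empty array and the result is ''.
    preg_match_all('/#\w+/', $input, $matches);
    return implode(' ', $matches[0]);
}

echo extractTags("this is #cool and #awesome, right?"), "\n"; // #cool #awesome
echo extractTags("this is #cool isn't it?"), "\n";            // #cool
```

Unlike reading $matches[0] after a plain preg_match, this also behaves sensibly when the input contains no tag at all.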

regular expression in PHP to create wiki-style links

I'm developing a site which is going to use wiki-style links to internal content eg [[Page Name]]
I'm trying to write a regex to achieve this and I've got as far as turning it into a link and replacing spaces with dashes (this is our space substitute rather than underscores) but only for page names of two words.
I could write a separate regex for all likely numbers of words (say from 10 downwards) but I'm sure there must be a neater way of doing it.
Here's what I have at the moment:
$regex = "#[\[][\[]([^\s\]]*)[\s]([^\s\]]*)[\]][\]]#";
$description = preg_replace($regex,"$1 $2",$description);
If someone can advise me how I can modify this regex so it works for any number of words that would be really helpful.
You can use the preg_replace_callback() function which accepts a callback to process the replacement string. You can also use lazy quantifiers in the pattern instead of a lot of negations inside character classes.
The external preg_replace_callback will extract the matched text and pass it to the callback function, which will return the properly modified version.
$str = '[[Page Name with many words]]';
echo preg_replace_callback('/\[\[(.*?)\]\]/', 'parse_tags', $str);
function parse_tags($match) {
$text = $match[1];
$slug = preg_replace('/\s+/', '-', $text);
return "<a href='$slug'>$text</a>";
}
You should use a callback function to do the replacement (using preg_replace_callback):
$str = preg_replace_callback('/\[\[([^\]]+)\]\]/', function($matches) {
return '<a href="' . preg_replace('/\s+/', '-', $matches[1]) . '">' . $matches[1] . '</a>';
}, $str);
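Put together as a runnable sketch, the callback approach handles page names of any length. The function name wikiLinks and the sample text are assumptions, and the htmlspecialchars calls are an added precaution that neither answer includes.

```php
<?php
// wikiLinks is a hypothetical name; the dash-separated href follows the question's convention.
function wikiLinks(string $text): string
{
    return preg_replace_callback('/\[\[([^\]]+)\]\]/', function (array $m): string {
        // Collapse each run of whitespace in the page name into a single dash.
        $slug = preg_replace('/\s+/', '-', trim($m[1]));
        // Escaping is an extra safety step, not part of the original answers.
        return '<a href="' . htmlspecialchars($slug) . '">' . htmlspecialchars($m[1]) . '</a>';
    }, $text);
}

echo wikiLinks('See [[Page Name with many words]] and [[Home]].');
// See <a href="Page-Name-with-many-words">Page Name with many words</a> and <a href="Home">Home</a>.
```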
