Link converting pregmatch working with markdown - php

function makeLinks($text) {
    $text = preg_replace('%(?<!href=")(((f|ht){1}(tp://|tps://))[-a-zA-^Z0-9#:\%_\+.~#?&//=]+)%i',
        '<a href="\\1">\\1</a>', $text);
    $text = preg_replace('%([:space:]()[{}])(www.[-a-zA-Z0-9#:\%_\+.~#?&//=]+)%i',
        '\\1<a href="http://\\2">\\2</a>', $text);
    return $text;
}
It misses links like this: - www.website.org (a hyphen, then a space) at the beginning of a line. If I have - www.website.org - www.website.org on one line, it catches the second one.
Shouldn't that be covered by the space in the second preg_replace?
I also tried %(\s\n\r(){})
I am running the text through Markdown, but only afterwards, as markdown(makeLinks($foo)), so I thought it shouldn't interfere. However, when I take the Markdown call off and everything echoes out on one line, it does make links out of them. If I use makeLinks(markdown($foo)) instead, it behaves the same as initially: it does not make links out of URLs that begin with www at the start of list items.

That's some pretty dodgy regex work there. Here is a regex I would recommend instead for URL detection:
%(?<!href="?)(((f|ht)(tp://|tps://))?[a-zA-Z0-9-].[-a-zA-^Z0-9#:\%_\+.~#?&//=]+)%i
It should be a lot more reliable than the two you have now.

Related

Replace spaces in all URLs with %20 using Regex

I have a large block of HTML that contains multiple URLs with spaces in them. How do I use a regex to replace any space that occurs in a URL with '%20'? The good thing is that all of the URLs end with '.pdf'.
I'm looking for something I could run in BBEdit/TextWrangler, or even PHP.
Example: http://www.site-name.com/dir/file name here.pdf
Need to return: http://www.site-name.com/dir/file%20name%20here.pdf
Instead of a regex, you could use urlencode in PHP to achieve this, which escapes the URL for you, similar to encodeURI in JavaScript. Note that urlencode encodes a space as +; rawurlencode is the variant that produces %20.
I was faced with exactly the same problem. I solved it with this:
$text = preg_replace("/http(.*) (.*)\.pdf/U", "http$1%20$2.pdf", $text);
This looks for a space between http and pdf and then replaces the space with %20.
If your URLs have multiple spaces, then simply run the code over and over until all the spaces are gone:
// Repeat until no URL ending in .pdf contains a space any more.
while (preg_match("/http(.*) (.*)\.pdf/U", $text)) {
    $text = preg_replace("/http(.*) (.*)\.pdf/U", "http$1%20$2.pdf", $text);
}
However, I've found this will overwrite text if there are two or more URLs on the same line. I haven't found a solution for this yet.
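One way the overwrite problem with two URLs on one line might be avoided is to match each URL on its own and replace the spaces inside that single match. The following is only a sketch: the function name encodePdfUrlSpaces is hypothetical, and it assumes every URL starts with http:// or https://, ends at the first ".pdf", and contains no quotes, angle brackets, or line breaks.

```php
<?php
// Hypothetical helper: percent-encode spaces inside URLs that end in ".pdf".
function encodePdfUrlSpaces(string $text): string
{
    // Lazily match from a scheme to the first ".pdf". The (?!https?://) check
    // stops one match from running into the start of the next URL.
    $pattern = '~https?://(?:(?!https?://)[^"\'<>\r\n])*?\.pdf~i';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Only the spaces inside this one URL are touched.
            return str_replace(' ', '%20', $m[0]);
        },
        $text
    );
}
```

Because each URL is handled in its own callback, a single pass is enough and no loop is needed. A non-PDF URL followed later on the same line by the words of a PDF file name would still confuse this pattern, so it should be checked against real input first.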

Automatic hyperlink adding an extra HTTP:// to the beginning?

I've the following script which hyperlinks any links posted on my site:
$text = trim($text);
while ($text != stripslashes($text)) { $text = stripslashes($text); }
$text = strip_tags($text,"<b><i><u>");
$text = preg_replace("/(?<!http:\/\/)www\./","http://www.",$text);
$text = preg_replace( "/((http|ftp)+(s)?:\/\/[^<>\s]+)/i", "<a href=\"\\0\">\\0</a>",$text);
However, for some reason if I add a https://www.test.com link it ends up displaying like this - https://http://www.test.com - what am I doing wrong? How can I make it work with https links as well? It works fine with http links. Thank you! :-)
The lookbehind that you have here, (?<!http:\/\/)www\. is only matching http://, but your test input (that's failing) is https://.
You can add a second lookbehind chained with the current one to specify the alternative https:// version too:
(?<!http:\/\/)(?<!https:\/\/)www\.
This would make your full line look like:
$text = preg_replace("/(?<!http:\/\/)(?<!https:\/\/)www\./","http://www.",$text);
The last I checked, PHP does not support variable-length lookbehinds, so things that may be familiar such as http[s]?:// wouldn't work here - hence the second pattern.
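As a quick check of the chained lookbehinds, here is a minimal sketch. The wrapper name addScheme is hypothetical and exists only so the behavior can be tested in isolation.

```php
<?php
// Hypothetical wrapper around the corrected replacement.
function addScheme(string $text): string
{
    // Prepend http:// to a bare "www." only when it is not already
    // preceded by http:// or https:// (two fixed-length lookbehinds).
    return preg_replace(
        '/(?<!http:\/\/)(?<!https:\/\/)www\./',
        'http://www.',
        $text
    );
}
```

With this, a bare www.test.com gains a scheme, while http://www.test.com and https://www.test.com are left untouched.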

Preg_replace for url and links

Right now I'm using
$content = preg_replace('#(https?://([-\w\.]+)+(:\d+)?((/[\w/_\.%\-+~]*)?(\?\S+)?)?)#', '<a href="$1">$1</a>', $content);
to replace URLs with links, but it doesn't work with some symbols such as # and many others.
Also, if the content already appears as a link, like this:
<a href="http://www.abc.com/">http://www.abc.com/</a>
then I want preg_replace to skip it; otherwise it will duplicate the link and produce a wrong result.
The text helper class from Kohana has a function for this that would probably be a good starting point: https://github.com/kohana/core/blob/3.2/master/classes/kohana/text.php#L362
Why not just look for anything starting with http:// or https:// up until any whitespace character?
https?://[^\s]+
That is obviously pretty forgiving; the only problem is that you might get some false positives.
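A minimal sketch of that forgiving approach, extended with two lookbehinds so that URLs already inside an anchor are skipped. The function name linkify is hypothetical, and the assumption that an already-linked URL always sits directly after href=" or > is a heuristic, not a guarantee.

```php
<?php
// Hypothetical helper: wrap bare http(s) URLs in anchors.
function linkify(string $content): string
{
    // Skip a URL that directly follows href=" or > (assumed to be inside
    // an existing anchor); stop at whitespace, a double quote, or <.
    return preg_replace(
        '~(?<!href=")(?<!>)https?://[^\s<"]+~i',
        '<a href="$0">$0</a>',
        $content
    );
}
```

Characters such as # stay inside the link because everything up to the next whitespace is accepted. That is also where the false positives come from, for example punctuation that directly follows a URL.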

How to make this linking making script behave with markdown?

I'm using PHP Markdown, but I also need a script to convert plain-text links into clickable ones. Both work independently, but when I try to run them together there is a problem: if I run Markdown first, makeLinks still processes the generated HTML and breaks it, and vice versa. Any idea how to stop it from doing that? I can't figure out a regex that ignores the Markdown-style links.
function makeLinks($text) {
    $text = preg_replace('%(((f|ht){1}tp://)[-a-zA-^Z0-9#:\%_\+.~#?&//=]+)%i',
        '<a href="\\1">\\1</a>', $text);
    $text = preg_replace('%([[:space:]()[{}])(www.[-a-zA-Z0-9#:\%_\+.~#?&//=]+)%i',
        '\\1<a href="http://\\2">\\2</a>', $text);
    return $text;
}
sample text:
###[Title Section](http://domain/folder/page.html)
- Blah blah some text and then a link: www.webpage.org.
The double-linkify problem can be solved best with guesswork and workarounds. (We have some duplicate questions, but I can never find a good one..)
Since already converted http:// URLs only occur right after href=" or a >, you can use those for negative assertions.
(?<!href="|>)
Should be written at the start of your first regex:
$text = preg_replace('%(?<!href="|>)(((f|ht){1}tp://)...
Your second regex uses [[:space:]] as an anchor, so it should be fault-tolerant already.
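Put together, the suggestion might look like the sketch below. It assumes makeLinks runs after Markdown, so existing links already appear as href="..." attributes or anchor text. The character class is simplified, the delimiter is changed to @ to avoid escaping %, and https is accepted as well; none of that comes from the original code.

```php
<?php
// Sketch of makeLinks with the negative lookbehind added to the first regex.
function makeLinks(string $text): string
{
    // Characters allowed in the rest of a URL (simplified assumption).
    $chars = '[-a-zA-Z0-9#:%_+.~?&/=]+';

    // Link http/https/ftp URLs unless they directly follow href=" or >.
    $text = preg_replace(
        '@(?<!href="|>)((?:f|ht)tps?://' . $chars . ')@i',
        '<a href="\\1">\\1</a>',
        $text
    );

    // Link bare www. addresses that follow whitespace or an opening bracket.
    $text = preg_replace(
        '@([\s()\[{}])(www\.' . $chars . ')@i',
        '\\1<a href="http://\\2">\\2</a>',
        $text
    );

    return $text;
}
```

Note that a period directly after a URL, as in the sample text, is still swallowed into the link because the dot is part of the allowed character set.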

Regex to conditionally replace Twitter hashtags with hyperlinks

I'm writing a small PHP script to grab the latest half dozen Twitter status updates from a user feed and format them for display on a webpage. As part of this I need a regex replace to rewrite hashtags as hyperlinks to search.twitter.com. Initially I tried to use:
<?php
$strTweet = preg_replace('/(^|\s)#(\w+)/', '\1<a href="http://search.twitter.com/search?q=%23\2">#\2</a>', $strTweet);
?>
(taken from https://gist.github.com/445729)
In the course of testing I discovered that #test is converted into a link on the Twitter website, however #123 is not. After a bit of checking on the internet and playing around with various tags I came to the conclusion that a hashtag must contain alphabetic characters or an underscore in it somewhere to constitute a link; tags with only numeric characters are ignored (presumably to stop things like "Good presentation Bob, slide #3 was my favourite!" from being linked). This makes the above code incorrect, as it will happily convert #123 into a link.
I've not done much regex in a while, so in my rustiness I came up with the following PHP solution:
<?php
$test = 'This is a test tweet to see if #123 and #4 are not encoded but #test, #l33t and #8oo8s are.';
// Get all hashtags out into an array
if (preg_match_all('/(^|\s)(#\w+)/', $test, $arrHashtags) > 0) {
    foreach ($arrHashtags[2] as $strHashtag) {
        // Check each tag to see if there are letters or an underscore in there somewhere
        if (preg_match('/#\d*[a-z_]+/i', $strHashtag)) {
            $test = str_replace($strHashtag, '<a href="http://search.twitter.com/search?q=%23'.substr($strHashtag, 1).'">'.$strHashtag.'</a>', $test);
        }
    }
}
echo $test;
?>
It works; but it seems fairly long-winded for what it does. My question is, is there a single preg_replace similar to the one I got from gist.github that will conditionally rewrite hashtags into hyperlinks ONLY if they DO NOT contain just numbers?
(^|\s)#(\w*[a-zA-Z_]+\w*)
PHP
$strTweet = preg_replace('/(^|\s)#(\w*[a-zA-Z_]+\w*)/', '\1<a href="http://search.twitter.com/search?q=%23\2">#\2</a>', $strTweet);
This regular expression says a # followed by 0 or more characters [a-zA-Z0-9_], followed by an alphabetic character or an underscore (1 or more), followed by 0 or more word characters.
http://rubular.com/r/opNX6qC4sG <- test it here.
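To see the pattern in action, here is a small sketch. The function name linkHashtags is hypothetical, and the search.twitter.com URL format is an assumption based on the question's description.

```php
<?php
// Hypothetical helper: link hashtags that contain a letter or underscore.
function linkHashtags(string $tweet): string
{
    // \w*[a-zA-Z_]+\w* needs at least one letter or underscore somewhere
    // in the tag, so purely numeric tags such as #123 never match.
    return preg_replace(
        '/(^|\s)#(\w*[a-zA-Z_]+\w*)/',
        '\1<a href="http://search.twitter.com/search?q=%23\2">#\2</a>',
        $tweet
    );
}
```

Run on the test tweet from the question, #123 and #4 stay plain text while #test, #l33t and #8oo8s become links.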
It's actually better to search for characters that aren't allowed in a hashtag; otherwise tags like "#Trentemøller" won't work.
The following works well for me...
preg_match('/([ ,.]+)/', $string, $matches);
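Another way to handle non-ASCII tags such as "#Trentemøller" could be to keep the same pattern shape but use Unicode properties with the /u modifier. This is a sketch under assumptions: the input is valid UTF-8, the function name linkUnicodeHashtags is hypothetical, and the search URL format is assumed rather than taken from the answers above.

```php
<?php
// Hypothetical helper: Unicode-aware variant of the hashtag linker.
function linkUnicodeHashtags(string $tweet): string
{
    // \p{L} is any letter and \p{N} any digit; the tag still needs at
    // least one letter or underscore, so #123 is left alone.
    $pattern = '/(^|\s)#([\p{L}\p{N}_]*[\p{L}_][\p{L}\p{N}_]*)/u';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Percent-encode the tag for the URL; show it unchanged as text.
            $url = 'http://search.twitter.com/search?q=%23' . rawurlencode($m[2]);
            return $m[1] . '<a href="' . $url . '">#' . $m[2] . '</a>';
        },
        $tweet
    );
}
```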
I have devised this: /(^|\s)#([[:alnum:]])+/gi
I found Gazler's answer to work, although the regex added a blank space at the beginning of the hashtag, so I removed the first part:
(^|\s)
This works perfectly for me now:
#(\w*[a-zA-Z_0-9]+\w*)
Example here: http://rubular.com/r/dS2QYZP45n
