I use this code to make urls clickable as anchor links in forum posts.
function makelink($str) {
    $str = preg_replace_callback('/((((http)(s)?:\/\/)|www\.)[-0-9æøåa-zA-Z?-??-?\(\)%_+\.~#?&;:#\/\/=]+)(?<!\.)/i', function($matches) {
        // Prepend a scheme to bare "www." links so the href is absolute
        if (strtolower(substr($matches[0], 0, 4)) == 'www.') {
            $matches[0] = 'http://' . $matches[0];
        }
        return '<a href="'.$matches[0].'">'.$matches[0].'</a>';
    }, $str);
    return trim($str);
}
It works fine. Now I also need to turn YouTube links into embed codes placed underneath the link (appended to the link, I guess).
It's fine if this needs an extra replacement routine.
How could I write code that replaces the resulting anchor (if it's a YouTube link):
<a href="https://www.youtube.com/watch?v=LOLAy72Tv24">https://www.youtube.com/watch?v=LOLAy72Tv24</a>
With this:
<a href="https://www.youtube.com/watch?v=LOLAy72Tv24">https://www.youtube.com/watch?v=LOLAy72Tv24</a>
<br />
<iframe class="youtube" width="350" height="250" src="https://www.youtube.com/embed/LOLAy72Tv24/" allowfullscreen></iframe>
So it just needs to take the video id and put out embed code underneath the original link, while outputting both together.
Here's my proposed solution:
function makelink($str) {
    $pattern = '/((((http)(s)?:\/\/)|www\.)[-0-9æøåa-zA-Z?-??-?\(\)%_+\.~#?&;:#\/\/=]+)(?<!\.)/i';
    $str = preg_replace_callback($pattern, function($matches) {
        if (strtolower(substr($matches[0], 0, 4)) == 'www.') {
            $matches[0] = 'http://' . $matches[0];
        }
        // store anchor tag html in a variable instead of returning immediately
        $html = '<a href="'.$matches[0].'">'.$matches[0].'</a>';
        if (isYouTubeVideoUrl($matches[0])) {
            $html .= '<br />'.makeiFrame($matches[0]);
        }
        return $html;
    }, $str);
    return trim($str);
}
function isYouTubeVideoUrl(string $url): bool
{
    // Short URLs carry the id in the path, so they have no v= parameter to check
    if (isYouTubeShortUrl($url)) {
        return true;
    }
    // Cast to string: parse_url() returns null when there is no query string
    $query = (string) parse_url($url, PHP_URL_QUERY);
    return parse_url($url, PHP_URL_HOST) === 'www.youtube.com'
        && strpos($query, 'v=') !== false;
}
function isYouTubeShortUrl(string $url): bool
{
    return parse_url($url, PHP_URL_HOST) === 'youtu.be';
}
function makeiFrame(string $url): string {
    $embedUrl = 'https://www.youtube.com/embed/'.getYouTubeVideoId($url).'/';
    return '<iframe class="youtube" width="350" height="250" src="'.$embedUrl.'" allowfullscreen></iframe>';
}
function getYouTubeVideoId(string $url): string
{
    if (isYouTubeShortUrl($url)) {
        // the id is everything after the last slash
        preg_match('/[^\/]+$/', $url, $matches);
        return $matches[0] ?? '';
    }
    // the id is the value of the v parameter
    preg_match('/(?<=v=)(.*?)(?=(&|$))/', $url, $matches);
    return $matches[0] ?? '';
}
It is designed to work both with regular YouTube URLs (https://www.youtube.com/watch?v=LOLAy72Tv24) and short YouTube URLs (https://youtu.be/LOLAy72Tv24). It also supports the v parameter being anywhere in the query string for regular URLs.
Most of the code is pretty straightforward; the key lies in extracting the video id.
In short URLs the id follows the last slash, so [^\/]+$ looks for any characters that are not a slash at the end of the string:
[^\/] matches any character not a slash
+ is a quantifier for one or more, greedy
$ asserts the position at the end of the string
Regular URLs have the format where the id is in a parameter named v, so (?<=v=)(.*?)(?=(&|$)) looks for everything between v= and either & or the end of the string:
(?<=v=) is a positive lookbehind, assuring that we look for a string right after v=
(.*?) matches zero or more characters (any, except for line terminators), lazy
(?=(&|$)) is a positive lookahead, assuring that we look for a string right before an ampersand (&) or the end of a string ($)
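As an alternative to the lookaround patterns explained above, the id can also be read with PHP's own URL functions. This is only a sketch: extractYouTubeId is a hypothetical helper name that does not appear in the answer, and accepting the bare youtube.com host is an assumption on my part.

```php
<?php
// Hypothetical helper (not part of the answer above): reads the video id
// with parse_url() and parse_str() instead of regex lookarounds.
function extractYouTubeId(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);

    if ($host === 'youtu.be') {
        // Short form: the id is the first path segment.
        $path = trim((string) parse_url($url, PHP_URL_PATH), '/');
        return $path !== '' ? explode('/', $path)[0] : null;
    }

    // Assumption: treat youtube.com with and without "www." the same way.
    if ($host === 'www.youtube.com' || $host === 'youtube.com') {
        // parse_str() fills $params, so "v" may sit anywhere in the query.
        parse_str((string) parse_url($url, PHP_URL_QUERY), $params);
        $id = $params['v'] ?? null;
        return is_string($id) && $id !== '' ? $id : null;
    }

    return null; // not a YouTube URL
}
```

Returning null for anything that is not a recognisable video URL lets the caller decide whether to append an iframe at all.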
I have some inherited code whose purpose is to identify URLs in a string and prepend the http:// protocol to them if it is missing.
return preg_replace_callback(
'/((https?:\/\/)?\w+(\.\w{2,})+[\w?&%=+\/]+)/i',
function ($match) {
if (stripos($match[1], 'http://') !== 0 && stripos($match[1], 'https://') !== 0) {
$match[1] = 'http://' . $match[1];
}
return $match[1];
},
$string);
It's working, except when a domain has a hyphen in it. So, for instance, the following string will only partially work.
$string = "In front mfever.com/1 middle http://mf-ever.com/2 at the end";
Can any regex genius see what's wrong with it?
You just need to add the optional dash:
((https?:\/\/)?\w+\-?\w+(\.\w{2,})+[\w?&%=+\/]+)
See it work here https://regex101.com/r/Tkdapj/1
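The optional dash above allows a single hyphen in the first label. A slightly more general variant, sketched here under my own assumptions, lets each host label contain any number of hyphens by using [\w-] in place of \w. The function name prependHttp is hypothetical; the rest of the pattern is kept as in the question.

```php
<?php
// Hypothetical wrapper around the question's callback; only the host part
// of the pattern is changed, from \w to [\w-].
function prependHttp(string $string): string
{
    return preg_replace_callback(
        '/((https?:\/\/)?[\w-]+(\.[\w-]{2,})+[\w?&%=+\/]+)/i',
        function (array $match): string {
            // Leave URLs alone when they already start with a scheme.
            if (preg_match('~^https?://~i', $match[1])) {
                return $match[1];
            }
            return 'http://' . $match[1];
        },
        $string
    ) ?? $string; // fall back to the input if the regex engine reports an error
}

echo prependHttp('In front mfever.com/1 middle http://mf-ever.com/2 at the end'), "\n";
```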
Currently, I use this code to detect URLs in text and convert them to links.
Now I need to keep this behavior but also detect and convert images.
public function convert_to_link($text)
{
$reg_user = '!#(.+)(?:\s|$)!U';
if (preg_match_all($reg_user, $text, $matches))
{
return preg_replace($reg_user, '$0', $text);
}
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
if(preg_match($reg_exUrl, $text, $url)) {
// make the urls hyper links
if (preg_match("/\[img=([^\s'\"<>]+?)\]/i", $text))
{
//This is not working
return preg_replace("/\[img=([^\s'\"<>]+?)\]/i", "<img border=0 src=\"\\1\">", $text);
}
else
{
return preg_replace($reg_exUrl, '<a href="$0">$0</a> ', $text);
}
}
else
{
return $text;
}
}
I also use [img][/img] tags to convert images to <img> elements, but with the code above the result is bad:
Looks like your regex is looking for this format
[img=http://example.com/name.png]
but your example is in a different format
[img]http://example.com/name.png[/img]
a regex to identify the second form would be
"/^\[img\]([^\[]+)\[\/img\]$/i"
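Note that the ^ and $ anchors in the pattern above only match when the whole string is a single tag. For tags embedded in longer text, a sketch without the anchors could look like this. The function name convertImgTags is hypothetical, and restricting the URL to http(s) and escaping it for the attribute are my own assumptions, not something the answer states.

```php
<?php
// Hypothetical helper: converts [img]url[/img] anywhere in the text.
function convertImgTags(string $text): string
{
    return preg_replace_callback(
        '~\[img\]\s*(https?://[^\s\[\]"\'<>]+)\s*\[/img\]~i',
        function (array $m): string {
            // Escape the URL before placing it in an HTML attribute.
            return '<img src="' . htmlspecialchars($m[1], ENT_QUOTES, 'UTF-8') . '" alt="">';
        },
        $text
    ) ?? $text; // fall back to the input if the regex engine reports an error
}

echo convertImgTags('See [img]http://example.com/name.png[/img] here'), "\n";
```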
I'm trying to turn youtube links into embed iframes, to play the videos. However, my current code is replacing the entire sentence with the embed code. What I want to do is to just convert the youtube link to an embed code, and leave the rest of the text unharmed.
Example: This is a youtube link: https://www.youtube.com/watch?v=T-8XurAKMkU and some text after.
Turned into: This is a youtube link: <embed> and some text after.
My current code:
$testing = "This is a youtube link: https://www.youtube.com/watch?v=T-8XurAKMkU and some text after.";
echo $core->convertyoutube($testing);
And the function:
public function convertyoutube($link) {
if (strpos($link, 'youtube.com/watch?v=') == true) {
$url = $link;
parse_str(parse_url($url, PHP_URL_QUERY), $youtube_array);
$videoid = $youtube_array['v'];
$embed = "<iframe width='420' height='315' src='https://www.youtube.com/embed/".$videoid."'></iframe>"; // what it should create with the extracted code
return $embed;
}
}
Well, you are returning only the embed code:
return $embed;
You need to replace only the YouTube part:
public function convertyoutube($link) {
$position = strpos($link, 'youtube.com/watch?v=');
if ($position !== false) {
$chunks = explode(' ', $link);
foreach ($chunks as &$chunk) {
$isYoutubeLink = strpos($chunk, 'youtube.com/watch?v=');
if ($isYoutubeLink !== false) {
$url = $chunk;
parse_str(parse_url($url, PHP_URL_QUERY), $youtube_array);
$videoid = $youtube_array['v'];
$chunk = "<iframe width='420' height='315' src='https://www.youtube.com/embed/".$videoid."'></iframe>"; // what it should create with the extracted code
}
}
return implode(' ', $chunks);
}
}
It works with multiple links in a sentence. There is probably a "better" way using a regexp, but I am not very good at regexps and prefer not to use them where they are not mandatory.
You could actually do this all with a single regex.
echo preg_replace('/https?:\/\/(?:www\.)?youtube\.com\/watch\?v=(.+?)(?:&|\s|$)/',
'<iframe width="420" height="315" src="https://www.youtube.com/embed/$1"></iframe>',
'This is a youtube link: https://www.youtube.com/watch?v=T-8XurAKMkU and some text after.');
Output:
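This is a youtube link: <iframe width="420" height="315" src="https://www.youtube.com/embed/T-8XurAKMkU"></iframe>and some text after.
(The pattern also consumes the whitespace that follows the URL, so no space remains before "and".)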
Regex101 Demo: https://regex101.com/r/cT2mW1/2
It would be better to convert the links on the front end with JavaScript, to avoid extra load on the server, but that's your choice.
A single YouTube video can have different links, like these:
1) https://www.youtube.com/watch?v=lfKON5AMvTM
2) https://youtu.be/lfKON5AMvTM
For this reason, you should write your code like this to catch both types of YouTube link:
public function convertYouTube($content) {
$content = preg_replace("/http(s)?:\/\/youtu\.be\/([^\40\t\r\n\<]+)/i", '<iframe width="420" height="315" src="https://www.youtube.com/embed/$2"></iframe>', $content);
$content = preg_replace("/http(s)?:\/\/(w{3}\.)?youtube\.com\/watch\/?\?v=([^\40\t\r\n\<]+)/i", '<iframe width="420" height="315" src="https://www.youtube.com/embed/$3"></iframe>', $content);
return $content;
}
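The two replacements can also be folded into one pattern. The following is a sketch, not the method from either answer: embedYouTubeLinks is a hypothetical name, and limiting the id to word characters and hyphens ([\w-]+) is an assumption about what YouTube ids look like. Unlike the patterns above, it leaves the whitespace after the URL in place and drops any extra query parameters.

```php
<?php
// Hypothetical helper: handles youtube.com/watch?v=ID and youtu.be/ID
// in a single pass and keeps the surrounding text intact.
function embedYouTubeLinks(string $content): string
{
    // Group 1 is the video id. The trailing [^\s<]* swallows the rest of
    // the URL (for example "&t=10") but stops at whitespace or "<".
    $pattern = '~https?://(?:www\.)?(?:youtube\.com/watch\?(?:[^\s<]*?&)?v=|youtu\.be/)([\w-]+)[^\s<]*~i';
    $iframe  = '<iframe width="420" height="315" src="https://www.youtube.com/embed/$1"></iframe>';

    return preg_replace($pattern, $iframe, $content) ?? $content;
}

echo embedYouTubeLinks('This is a youtube link: https://www.youtube.com/watch?v=T-8XurAKMkU and some text after.'), "\n";
```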
I parse through a text that contains several links. Some of them contain white spaces but have a file ending. My current pattern is:
preg_match_all('#\bhttps?://[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/))#', $links, $match);
This works the same way:
preg_match_all('/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/', $links, $match);
I don't know much about the patterns and didn't find a good tutorial that explains the meaning of all possible patterns and shows examples.
How could I filter a URL like this:
http://my-url.com/my doc.doc or even http://my-url.com/my doc with more white spaces.doc
The \s in those preg_match_all patterns stands for whitespace. But how could I check whether there is a file extension after one or more whitespace characters?
Is it possible?
Why not just make use of PHP's filter functions?
<?php
$url = "http://my-url.com/my doc.doc";
if(!filter_var($url, FILTER_VALIDATE_URL))
{
echo "URL is not valid";
}
else
{
echo "URL is valid";
}
OUTPUT :
URL is not valid
This might be what you are looking for, which uses urlencode:
$file = "my doc with more white spaces.doc";
echo " http://my-url.com/" . urlencode($file);
which produces:
http://my-url.com/my+doc+with+more+white+spaces.doc
or, with rawurlencode($file) in place of urlencode($file), produces:
http://my-url.com/my%20doc%20with%20more%20white%20spaces.doc
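The difference between the two functions can be seen side by side in a short sketch. The variable names $formEncoded and $pathEncoded are only illustrative.

```php
<?php
$file = 'my doc with more white spaces.doc';

// urlencode() uses form encoding: spaces become "+"
$formEncoded = urlencode($file);

// rawurlencode() uses RFC 3986 percent-encoding: spaces become "%20"
$pathEncoded = rawurlencode($file);

echo 'http://my-url.com/' . $formEncoded, "\n";
echo 'http://my-url.com/' . $pathEncoded, "\n";
```

For a file name in the path part of a URL, the %20 form is the one to use; "+" is only interpreted as a space in query strings.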
EDIT: Something like the following might help to parse your urls with parse_url
$url = 'http://my-url.com/my doc with more white spaces.doc';
$purl = parse_url($url);
$rurl = "";
if(isset($purl['scheme'])){
$rurl .= $purl['scheme'] . "://";
}
if(isset($purl['host'], $purl['path'])){
$rurl .= $purl['host'] . rawurlencode($purl['path']);
}
if($rurl === ""){
$rurl = $url;#error parsing error/invalid url?
}
For subdirectories, you can encode each path segment separately so the slashes stay intact:
$purl['path'] = implode('/', array_map(function($value){return rawurlencode($value);}, explode('/', $purl['path'])));
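Putting the pieces of this answer together, a small function could look like the sketch below. encodeUrlPath is a hypothetical name, and for brevity the sketch rebuilds only scheme, host and path; a port, query string or fragment would be dropped, which is a limitation of this sketch rather than of the approach.

```php
<?php
// Hypothetical helper combining parse_url() with per-segment rawurlencode().
// Limitation: port, query string and fragment are not carried over.
function encodeUrlPath(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // could not be parsed: return the input unchanged
    }

    // Encode every segment on its own so the "/" separators survive.
    $segments = explode('/', $parts['path'] ?? '');
    $path = implode('/', array_map('rawurlencode', $segments));

    return $parts['scheme'] . '://' . $parts['host'] . $path;
}

echo encodeUrlPath('http://my-url.com/some dir/my doc.doc'), "\n";
```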
I don't know much about php but this regex
(http|ftp)(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?
will match every url even with spaces
I think this regex will do.
use this regex
preg_match_all("/^(?si)(?>\s*)(((?>https?:\/\/(?>www\.)?)?(?=[\.-a-z0-9]{2,253}(?>$|\/|\?|\s))[a-z0-9][a-z0-9-]{1,62}(?>\.[a-z0-9][a-z0-9-]{1,62})+)(?>(?>\/|\?).*)?)?(?>\s*)$/", $input_lines, $output_array);
Alright after doing this really helpful tutorial I finally know how the regex syntax works. After finishing it I experimented a bit on this site
It was pretty easy after figuring out that all hyperlinks in my parsed document were in between quotation marks so I just had to change the regex to:
preg_match_all('#\bhttps?://[^()<>"]+#', $links, $match);
so that after the " it is looking for the next match that begins with http.
But that's not the full solution yet. The user Class was right: without applying rawurlencode to the filenames it won't work.
So the next step was this:
function endsWith($haystack, $needle)
{
return $needle === "" || substr($haystack, -strlen($needle)) === $needle;
}
if(endsWith($textlink, ".doc") || endsWith($textlink, ".docx") || endsWith($textlink, ".pdf") || endsWith($textlink, ".jpg") || endsWith($textlink, ".jpeg") || endsWith($textlink, ".png")){
$file = substr( $textlink, strrpos( $textlink, '/' )+1 );
$rest_url=substr($textlink, 0, strrpos($textlink, '/' )+1 );
$textlink=$rest_url.rawurlencode($file);
}
That extracts the filenames from the URLs and rawurlencodes them so that the output links are correct.
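The chain of endsWith() calls can be shortened with pathinfo() and a list of extensions. This is a sketch under my own assumptions: encodeFileName is a hypothetical name, and the extension check is made case-insensitive, which the original code does not do.

```php
<?php
// Hypothetical helper: rawurlencodes the file name of a link when its
// extension is in the allowed list (compared case-insensitively).
function encodeFileName(string $link, array $extensions = ['doc', 'docx', 'pdf', 'jpg', 'jpeg', 'png']): string
{
    $ext = strtolower(pathinfo($link, PATHINFO_EXTENSION));
    $pos = strrpos($link, '/');

    if ($pos === false || !in_array($ext, $extensions, true)) {
        return $link; // nothing to encode
    }

    // Keep everything up to the last slash, encode what follows it.
    return substr($link, 0, $pos + 1) . rawurlencode(substr($link, $pos + 1));
}

echo encodeFileName('http://my-url.com/my doc with more white spaces.doc'), "\n";
```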
I think this should work:
$url = '...';
$url_new = '';
$array = explode(' ',$url);
foreach($array as $name => $val){
if ($val!=' '){
$url_new = $url_new.$val;
}
}
I have a Glype proxy and I don't want it to parse external URLs. All URLs on the page are automatically converted to: http://proxy.com/browse.php?u=[URL HERE]. Example: if I visit The Pirate Bay on my proxy, I don't want the following URLs to be rewritten:
ByteLove.com (Not to: http://proxy.com/browse.php?u=http://bytelove.com&b=0)
BayFiles.com (Not to: http://proxy.com/browse.php?u=http://bayfiles.com&b=0)
BayIMG.com (Not to: http://proxy.com/browse.php?u=http://bayimg.com&b=0)
PasteBay.com (Not to: http://proxy.com/browse.php?u=http://pastebay.com&b=0)
Ipredator.com (Not to: http://proxy.com/browse.php?u=https://ipredator.se&b=0)
etc.
Of course I want to keep the internal URLs, so:
thepiratebay.se/browse (To: http://proxy.com/browse.php?u=http://thepiratebay.se/browse&b=0)
thepiratebay.se/top (To: http://proxy.com/browse.php?u=http://thepiratebay.se/top&b=0)
thepiratebay.se/recent (To: http://proxy.com/browse.php?u=http://thepiratebay.se/recent&b=0)
etc.
Is there a preg_replace that replaces all URLs except thepiratebay.se and its subdomains (as in the example)? Another function is also welcome (such as DOMDocument, QueryPath, substr or strpos; not str_replace, because then I would have to define all URLs).
I've found something, but I'm not familiar with preg_replace:
$exclude = '.thepiratebay.se';
$pattern = '(https?\:\/\/.*?\..*?)(?=\s|$)';
$message= preg_replace("~(($exclude)?($pattern))~i", '$2$5$6', $message);
I guess you would need to provide a whitelist that tells which domains should be proxied:
$whitelist = array();
$whitelist[] = "internal1.se";
$whitelist[] = "internal2.no";
$whitelist[] = "internal3.com";
// and so on...
$string = 'External link 1<br>';
$string .= 'Internal link 1<br>';
$string .= 'Internal link 2<br>';
$string .= 'External link 2<br>';
//Assuming the URL always is inside '' or "" you can use this pattern:
$pattern = '#(https?://proxy\.org/browse\.php\?u=(https?[^&|\"|\']*)(&?[^&|\"|\']*))#i';
$string = preg_replace_callback($pattern, "my_callback", $string);
//I had only PHP 5.2 on my server, so I decided to use a callback function.
function my_callback($match) {
    global $whitelist;
    // by default, bypass the proxy and return the plain URL
    $returnstring = urldecode($match[2]);
    foreach ($whitelist as $white) {
        // check if URL matches whitelist
        if (stripos($match[2], $white) !== false) {
            $returnstring = $match[0];
            break;
        }
    }
    return $returnstring;
}
echo "NEW STRING[:\n" . $string . "\n]\n";
You can use preg_replace_callback() to execute a callback function for every match. In that function you can determine if the matched string should be converted or not.
<?php
$string = 'http://foobar.com/baz and http://example.org/bumm';
$pattern = '#(https?\:\/\/.*?\..*?)(?=\s|$)#i';
$string = preg_replace_callback($pattern, function($match) {
if (stripos($match[0], 'example.org/') !== false) {
// exclude all URLs containing example.org
return $match[0];
} else {
return 'http://proxy.com/?u=' . urlencode($match[0]);
}
}, $string);
echo $string, "\n";
(Example is using PHP 5.3 closure notation)
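Matching on a substring such as 'example.org/' also matches hosts like notexample.org. A stricter check compares the parsed host name instead. The sketch below does that for the case in the question, where only one domain and its subdomains should go through the proxy. isInternalHost and proxifyInternalLinks are hypothetical names, and the proxy prefix and the &b=0 suffix are taken from the question's examples; URL-encoding the u parameter is my own assumption.

```php
<?php
// Hypothetical helper: true when the URL's host is $domain or a subdomain of it.
function isInternalHost(string $url, string $domain): bool
{
    $host   = strtolower((string) parse_url($url, PHP_URL_HOST));
    $domain = strtolower($domain);
    $suffix = '.' . $domain;

    return $host === $domain || substr($host, -strlen($suffix)) === $suffix;
}

// Hypothetical helper: rewrites internal URLs to go through the proxy
// and leaves external URLs untouched.
function proxifyInternalLinks(string $text, string $domain, string $proxy): string
{
    return preg_replace_callback(
        '~https?://[^\s"\'<>]+~i',
        function (array $m) use ($domain, $proxy): string {
            return isInternalHost($m[0], $domain)
                ? $proxy . urlencode($m[0]) . '&b=0'
                : $m[0];
        },
        $text
    ) ?? $text; // fall back to the input if the regex engine reports an error
}

echo proxifyInternalLinks(
    'http://thepiratebay.se/top and http://bayimg.com',
    'thepiratebay.se',
    'http://proxy.com/browse.php?u='
), "\n";
```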