How to separate possible URI from other content in PHP?


What is the simplest and fastest way to check whether a string is a single URL or text (which might contain URLs)?
Possible scenarios:
// successful scenario
$example[] = 'http://sub-domain.my-domain.com/folder/file.php?some=param';
// successful scenario
$example[] = '/assets/scripts/jquery.min.js?v=1.4';
// successful scenario
$example[] = 'jquery.min.js';
// this scenario should fail validation
$example[] = "http://www.domain.com welcome text\n and some other http://www.domain.com";
// this scenario should fail validation
$example[] = "scriptVar=50;";
I have tried native PHP functions such as parse_url and filter_var, but none of them works as expected.
UPDATE 1
To make it clearer: I'm trying to separate a possible URI from script content that will be inserted as a DOM element. Every URL goes into the SRC attribute and everything else goes in as content, for example:
<script type="text/javascript" src="{$string}"></script>
<script type="text/javascript">{$string}</script>
UPDATE 2
After analysing the possible content, I concluded that a string containing a whitespace character or a semicolon cannot be a URI. I presume this pattern could solve my problem:
preg_match('/[\s]|[;]/', $string);
Would it cover all possible JavaScript/CSS code?
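A minimal sketch of that heuristic, assuming (as the question does) that any whitespace or semicolon rules out a URI. The function name looks_like_uri is hypothetical. It is a heuristic rather than a validator: a semicolon is legal in a URI, and a short script expression without spaces or semicolons would still be classified as a URI.

```php
<?php
// Hypothetical helper: treats a string as a possible URI only if it is
// non-empty and contains no whitespace character and no semicolon.
function looks_like_uri(string $string): bool
{
    return $string !== '' && preg_match('/[\s;]/', $string) === 0;
}

$examples = [
    'http://sub-domain.my-domain.com/folder/file.php?some=param',
    '/assets/scripts/jquery.min.js?v=1.4',
    'jquery.min.js',
    "http://www.domain.com welcome text\n and some other http://www.domain.com",
    'scriptVar=50;',
];

foreach ($examples as $example) {
    // Print each input together with the branch it would take.
    echo var_export($example, true), ' => ',
        looks_like_uri($example) ? 'src attribute' : 'inline content', "\n";
}
```

The first three examples take the src branch and the last two are treated as inline content, which matches the scenarios listed in the question.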

$exampleData = Array(
'http://sub-domain.my-domain.com/folder/file.php?some=param',
'/assets/scripts/jquery.min.js?v=1.4',
'<a href="/assets/scripts/jquery.min.js?v=1.4">',
'<a href="assets/scripts/jquery.min.js?v=1.4">',
'http://www.domain.com welcome text\n and some other http://www.domain.com',
);
foreach($exampleData as $example)
{
echo "Trying \"" . $example . "\" -> ";
echo (preg_match('%((http(s)?://|www\.)[^ \r\n]+|<a.+?href=(\'|")(http(s)?://|www\.|[^#])[^\4\r\n]*?\4.*?>)%i', $example)) ?
"Match" : "No match";
echo "\r\n";
}
This would produce:
Trying "http://sub-domain.my-domain.com/folder/file.php?some=param" -> Match
Trying "/assets/scripts/jquery.min.js?v=1.4" -> No match
Trying "<a href="/assets/scripts/jquery.min.js?v=1.4">" -> Match
Trying "<a href="assets/scripts/jquery.min.js?v=1.4">" -> Match
Trying "http://www.domain.com welcome text\n and some other http://www.domain.com" -> Match
Update:
After reading your last update: if you want to parse HTML, use a DOM parser such as:
http://simplehtmldom.sourceforge.net/
Example:
include_once('simple_html_dom.php');
$dom = file_get_html('http://www.stackoverflow.com/');
foreach($dom->find('script') as $scriptElement)
{
if(strlen(trim($scriptElement->src)) > 0)
{
// Script with URI set
echo "<strong>Found script with URI</strong>";
echo "<p>" . $scriptElement->src . "</p>";
}
else
{
// Script with content
echo "<strong>Found script with content</strong>";
echo("<p>" . nl2br(htmlspecialchars($scriptElement->innertext)) . "</p>");
}
}
This would output something like the following (HTML stripped):
Found script with URI
http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js
Found script with URI
http://sstatic.net/js/master.min.js?v=afc76d4deac3
Found script with content
var imagePath='http://sstatic.net/stackoverflow/img/';
var inboxUnviewedCount = -1;
...etc
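The same separation can be sketched with PHP's built-in DOM extension instead of the third-party parser. This swaps in DOMDocument, which is not what the answer uses, and the sample markup in $html is an assumption made up for the sketch.

```php
<?php
// Sample markup standing in for a fetched page (an assumption for this sketch).
$html = '<html><head>'
    . '<script type="text/javascript" src="/assets/scripts/jquery.min.js?v=1.4"></script>'
    . '<script type="text/javascript">var scriptVar = 50;</script>'
    . '</head><body></body></html>';

libxml_use_internal_errors(true); // keep parser warnings about imperfect HTML quiet
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Sort every <script> element into "has a src URI" or "has inline content".
$scripts = ['uri' => [], 'content' => []];
foreach ($doc->getElementsByTagName('script') as $script) {
    $src = trim($script->getAttribute('src'));
    if ($src !== '') {
        $scripts['uri'][] = $src;
    } else {
        $scripts['content'][] = $script->textContent;
    }
}

print_r($scripts);
```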

This function returns true if the passed text is a URL. It is based on a regex seen here on SO.
function validate_url ($url)
{
$regex = '/^(https?|ftp):\/\/'; //protocol
$regex .= '(([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+'; //username
$regex .= '(:([a-z0-9$_\.\+!\*\'\(\),;\?&=-]|%[0-9a-f]{2})+)?'; //password
$regex .= '@)?'; //auth requires @
$regex .= '((([a-z0-9][a-z0-9-]*[a-z0-9]\.)*'; //domain segments AND
$regex .= '[a-z][a-z0-9-]*[a-z0-9]'; //top level domain OR
$regex .= '|((\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])\.){3}';
$regex .= '(\d|[1-9]\d|1\d{2}|2[0-4][0-9]|25[0-5])'; //IP address
$regex .= ')(:\d+)?'; //port
$regex .= ')(((\/+([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)*'; //path
$regex .= '(\?([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)'; //query string
$regex .= '?)?)?'; //path and query string optional
$regex .= '(#([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)?'; //fragment
$regex .= '$/i';
return (preg_match($regex, $url) ? true : false);
}
You can try it here: http://www.exorithm.com/algorithm/view/validate_url
EDIT: In response to a comment, this function validates URL fragments like /index.php or index.php:
function validate_url_fragment ($url)
{
$regex = '/^(((\/?([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)*'; //path
$regex .= '(\?([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)'; //query string
$regex .= '?)?)?'; //path and query string optional
$regex .= '(#([a-z0-9$_\.\+!\*\'\(\),;:#&=-]|%[0-9a-f]{2})*)?'; //fragment
$regex .= '$/i';
return (preg_match($regex, $url) ? true : false);
}
if (validate_url_fragment($url) || validate_url($url)) {
//is url
} else {
//not url
}
(Note that the empty string is valid, so you may want a special case for that.)

filter_var should do what you want for a single URL:
<?php
$safe_url = filter_var( $unsafe_url, FILTER_SANITIZE_URL );
?>
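Note that FILTER_SANITIZE_URL only strips characters that are not allowed in a URL; it does not check that the result is a URL. A short sketch comparing it with FILTER_VALIDATE_URL on inputs taken from the question (the variable names are illustrative):

```php
<?php
$inputs = [
    'http://sub-domain.my-domain.com/folder/file.php?some=param',
    '/assets/scripts/jquery.min.js?v=1.4',
    'http://www.domain.com welcome text',
];

foreach ($inputs as $input) {
    // Sanitizing removes disallowed characters (such as spaces) and returns a string.
    $sanitized = filter_var($input, FILTER_SANITIZE_URL);
    // Validating returns the input unchanged on success and false on failure.
    $validated = filter_var($input, FILTER_VALIDATE_URL);
    var_dump($sanitized, $validated);
}
```

FILTER_VALIDATE_URL requires a scheme, so the relative path from the question is rejected, and the sanitizer silently glues the text with spaces into one string. Neither filter alone covers all of the scenarios in the question.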

Related

Prevent a link from becoming double encoded in PHP

I have the following URL in a MySQL database for a PHP application. Part of our system allows users to edit a previous post containing these links and save it. However, the URL gets encoded again each time a user edits the post, which breaks the URL as shown below.
Is there an easy way, or an existing PHP function, to determine whether the string has already been encoded, and to remove the unwanted characters so that it matches the expected output below?
Expected output
url:https://r5uy4lmtdqka6a1rzyexlusfl-902rjcrzfe6k93co7a644-tom.s3.eu-west-2.amazonaws.com/Carbon%20Monoxide/Summer%20CO%20Campaign/CO%20Summer%202022/CO%20Summer%20you%20can%20smell%20the%20BBQ%20-%20600x600.jpg
Actual output
url:https://r5uy4lmtdqka6a1rzyexlusfl-902rjcrzfe6k93co7a644-tom.s3.eu-west-2.amazonaws.com/Carbon%2520Monoxide/Summer%2520CO%2520Campaign/CO%2520Summer%25202022/CO%2520Summer%2520you%2520can%2520smell%2520the%2520BBQ%2520-%2520600x600.jpg

As suggested in the comments: decode repeatedly until nothing changes, then encode once (only the path part).
<?php
$str = "https://r5uy4lmtdqka6a1rzyexlusfl-902rjcrzfe6k93co7a644-tom.s3.eu-west-2.amazonaws.com/Carbon%2520Monoxide/Summer%2520CO%2520Campaign/CO%2520Summer%25202022/CO%2520Summer%2520you%2520can%2520smell%2520the%2520BBQ%2520-%2520600x600.jpg";
$str = "https://r5uy4lmtdqka6a1rzyexlusfl-902rjcrzfe6k93co7a644-tom.s3.eu-west-2.amazonaws.com/Carbon%20Monoxide/Summer%20CO%20Campaign/CO%20Summer%202022/CO%20Summer%20you%20can%20smell%20the%20BBQ%20-%20600x600.jpg";
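function fix_url($str)
{
// split into scheme, empty part, host and the rest (the path)
$arr = explode('/', $str, 4);
if (!isset($arr[3])) {
return $str; // no path, nothing to fix
}
$path = $arr[3];
// decode repeatedly until the string stops changing
// (a literal % in a file name would be decoded as well)
while (($decoded = rawurldecode($path)) !== $path) {
$path = $decoded;
}
// encode each path segment once; rawurlencode turns spaces into %20
$encoded = implode('/', array_map('rawurlencode', explode('/', $path)));
return $arr[0] . '//' . $arr[2] . '/' . $encoded;
}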
echo fix_url($str);

String to URL but detect if url is image?

I'm trying to do something in PHP.
I want to take the link of an image and store it in my DB, but I'd like the user to be able to put text before and after it. I've got my hands on a similar function for links, but the image part is missing.
As you can see, turnUrlIntoHyperlink runs a regex over the entire argument and turns any URL in the text into a link, so users can post something like
Hey check this cool site "https://stackoverflow.com" its dope!
and the entire argument is posted to my database.
However, I can't get the same approach working for convertImg: when I tried, it simply wouldn't post, and it removed the text before/after the image.
How would I do this correctly, and can I combine these 2 functions into 1 function?
function convertImg($string) {
return preg_replace('/((https?):\/\/(\S*)\.(jpg|gif|png)(\?(\S*))?(?=\s|$|\pP))/i', '<img src="$1" />', $string);
}
function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $url)) {
// Loop through all matches
foreach($url[0] as $newLinks){
if(strstr( $newLinks, ":" ) === false){
$link = 'http://'.$newLinks;
}else{
$link = $newLinks;
}
// Create Search and Replace strings
$search = $newLinks;
$replace = '<a href="'.$link.'">'.$link.'</a>';
$string = str_replace($search, $replace, $string);
}
}
//Return result
return $string;
}
Explained in more detail:
When I post a link like https://google.com/, I'd like it to become an a href.
But if I post an image like https://image.shutterstock.com/image-photo/duck-on-white-background-260nw-1037486431.jpg , I'd like it to become an img src.
Currently, I'm storing it in my DB and echoing it to a little debug panel.

Do you mean that you want to put an <img> inside an <a> element?
Your turnUrlIntoHyperlink function already captures the URL successfully, so we can just use explode to get the string before and after the link.
$exploded = explode($link, $string);
$string_before = $exploded[0];
$string_after = $exploded[1];
Code example:
<?php
function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $url)) {
// add http protocol if the url does not already contain it
$newLinks = $url[0][0];
if(strstr( $newLinks, ":" ) === false){
$link = 'http://'.$newLinks;
}else{
$link = $newLinks;
}
$exploded = explode($link, $string);
$string_before = $exploded[0];
$string_after = $exploded[1];
return $string_before.'<img src="'.$link.'">'.$string_after;
}
return $string;
}
echo turnUrlIntoHyperlink('Hey check this cool site https://stackoverflow.com/img/myimage.png its dope!');
Output:
Hey check this cool site <img src="https://stackoverflow.com/img/myimage.png"> its dope!
Edit: the question has been edited.
Since an image URL is just another kind of link/URL, your logic should follow this pseudocode:
if link is image and link is url
print <img src=link> tag
else if link is url and link is not image
print <a href=link> tag
else
print link
So you can just write a new function to "merge" those two functions:
function convertToImgOrHyperlink($string) {
$result = convertImg($string);
if($result != $string) return $result;
$result = turnUrlIntoHyperlink($string);
if($result != $string) return $result;
return $string;
}
echo convertToImgOrHyperlink('Hey check this cool site https://stackoverflow.com/img/myimage.png its dope!');
echo "\r\n\r\n";
echo convertToImgOrHyperlink('Hey check this cool site https://stackoverflow.com/ its dope!');
echo "\r\n\r\n";
Output:
Hey check this cool site <img src="https://stackoverflow.com/img/myimage.png" /> its dope!
Hey check this cool site https://stackoverflow.com/ its dope!
The basic idea is that, since an image URL is also a link, the image check must be done first. If it has an effect (the input and the return value differ), the <img> conversion is used. Otherwise the <a> conversion is tried.
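The pseudocode can also be applied per URL rather than per string, so that a post containing both an image and an ordinary link gets both conversions. The following is only a sketch under stated assumptions: the function name convert_urls, the simplified URL pattern and the extension list are all hypothetical, and trailing punctuation directly after a URL is not stripped.

```php
<?php
// Hypothetical helper: turns image URLs into <img> tags and every other
// http(s) URL into an <a> tag, leaving the surrounding text untouched.
function convert_urls(string $text): string
{
    $imageExtensions = ['jpg', 'jpeg', 'gif', 'png'];

    $result = preg_replace_callback(
        '~\bhttps?://[^\s<>"\']+~i', // simplified pattern, an assumption of this sketch
        function (array $m) use ($imageExtensions): string {
            $url = $m[0];
            // Look only at the path, so a query string cannot hide the extension.
            $path = (string) parse_url($url, PHP_URL_PATH);
            $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
            $safe = htmlspecialchars($url, ENT_QUOTES);

            return in_array($ext, $imageExtensions, true)
                ? '<img src="' . $safe . '" />'
                : '<a href="' . $safe . '">' . $safe . '</a>';
        },
        $text
    );

    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

echo convert_urls('Hey check this cool site https://stackoverflow.com/img/myimage.png its dope!'), "\n";
echo convert_urls('Hey check this cool site https://stackoverflow.com/ its dope!'), "\n";
```

Deciding inside the callback avoids running two separate passes over the text, where the second pass could match URLs that the first pass already wrapped in a tag.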

PHP find all links in the text

I want to find all links in the text like this:
Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test
I know I need to use preg_match_all, but I only have a rough idea: start the match at http|https|ftp and end it where a space or the end of the text appears. That's all I really need for all links to be found properly.
Can anyone help me with a PHP regexp pattern?
I think I need to use assertions at the end of the pattern, but I don't understand their proper usage yet.
Any ideas? Thanks!

I'd go with something simple like ~[a-z]+://\S+~i
[a-z]+:// matches the protocol at the start
\S+ matches the one or more non-whitespace characters that follow, where \S is shorthand for [^ \t\r\n\f]
the i modifier (PCRE_CASELESS) makes the match case-insensitive (possibly not really necessary)
So it could look like this:
$pattern = '~[a-z]+://\S+~i';
$str = 'Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test';
if($num_found = preg_match_all($pattern, $str, $out))
{
echo "FOUND ".$num_found." LINKS:\n";
print_r($out[0]);
}
outputs:
FOUND 4 LINKS:
Array
(
[0] => http://hello.world
[1] => http://google.com/file.jpg
[2] => https://hell.o.wor.ld/test?qwe=qwe
[3] => http://test.test/test
)
Test on eval.in
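Because \S+ runs up to the next whitespace, punctuation that directly follows a link (a comma, a closing bracket, a full stop) ends up inside the match. A small sketch of one way to trim it afterwards; the sample sentence and the set of trimmed characters are assumptions:

```php
<?php
$str = 'See http://hello.world, then (http://google.com/file.jpg) and http://test.test/test.';

preg_match_all('~[a-z]+://\S+~i', $str, $out);

// Strip trailing punctuation that more likely belongs to the sentence than to the link.
$links = array_map(function (string $link): string {
    return rtrim($link, '.,;:!?)]}\'"');
}, $out[0]);

print_r($links);
```

This also trims a closing bracket or full stop that genuinely belongs to the URL, so it is a trade-off rather than a complete fix.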

function turnUrlIntoHyperlink($string){
//The Regular Expression filter
$reg_exUrl = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
// Check if there is a url in the text
if(preg_match_all($reg_exUrl, $string, $url)) {
// Loop through all matches
foreach($url[0] as $newLinks){
if(strstr( $newLinks, ":" ) === false){
$link = 'http://'.$newLinks;
}else{
$link = $newLinks;
}
// Create Search and Replace strings
$search = $newLinks;
$replace = '<a href="'.$link.'">'.$link.'</a>';
$string = str_replace($search, $replace, $string);
}
}
//Return result
return $string;
}

<?php
// The Regular Expression filter
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
// The Text you want to filter for urls
$text = "The text you want to filter goes here. http://google.com";
// Check if there is a url in the text
if(preg_match($reg_exUrl, $text, $url)) {
// make the urls hyper links
echo preg_replace($reg_exUrl, "<a href=\"{$url[0]}\">{$url[0]}</a> ", $text);
} else {
// if no urls in the text just return the text
echo $text;
}
?>
Reference:http://css-tricks.com/snippets/php/find-urls-in-text-make-links/

Works like a charm. use this.
$str= "Test text http://hello.world";
preg_match_all('/\b(?:(?:https?|ftp|file):\/\/|www\.|ftp\.)[-A-Z0-9+&@#\/%=~_|$?!:,.]*[A-Z0-9+&@#\/%=~_|$]/i', $str, $result, PREG_PATTERN_ORDER);
print_r($result[0]);

The suggested answers are great, but one of them misses the www. case and the other misses http://.
So, let's combine them:
$text = 'Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test';
preg_match_all('/(((http|https|ftp|ftps)\:\/\/)|(www\.))[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\:[0-9]+)?(\/\S*)?/', $text, $results, PREG_PATTERN_ORDER);
print_r($results[0]);
The return value for PREG_PATTERN_ORDER will be Array of Arrays (results) so that $results[0] is an array of full pattern matches, $results[1] is an array of strings matched by the first parenthesized subpattern, and so on.

function turnUrlIntoHyperlink($string)
{
// The Regular Expression filter
$reg_exUrl = "/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
// Check if there is a url in the text
if (preg_match($reg_exUrl, $string, $url)) {
// make the urls hyper links
echo preg_replace($reg_exUrl, "<a target='_blank' href='{$url[0]}'>{$url[0]}</a>", $string);
} else {
// if no urls in the text just return the text
echo $string;
}
}

To convert URLs to <a> tags and to recognize URLs without http/https, try the code below. It uses preg_replace_callback to avoid the issue in one of the other answers when the same URL appears multiple times:
private function convertUrls($string) {
$url_pattern = '/(((http|https)\:\/\/)|(www\.))[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\:[0-9]+)?(\/\S*)?/';
return preg_replace_callback($url_pattern,
function($matches) {
$match = $matches[0];
if (strstr($match, ":") === false) {
$url = "https://$match";
} else {
$url = $match;
}
return '<a href="' . $url . '">' . $url . '</a>';
},
$string);
}

An alternative to regexp is to use this library.
It works very well, but not for very complex code.
foreach($html->find('a') as $element)
echo $element->href . '<br>';
And it is easy to use. No regular expression skills required :-)

Not regexp, but it finds them all and makes sure they are not already wrapped in a tag. It also checks that the link isn't enclosed in (), [], "" or anything else with an opening and closing character.
$txt = "Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test I am already linked up
It was also done in 1927 (http://test.com/reference) Also check this out:http://test/index&t=27";
$holder = explode("http",$txt);
for($i = 1; $i < (count($holder));$i++) {
if (substr($holder[$i-1],-6) != 'href="') { // this means that the link is not already in an a tag.
if (strpos($holder[$i]," ")!==false) //if the link is not the last item in the text block, stop at the first space
$href = substr($holder[$i],0,strpos($holder[$i]," "));
else //else it is the last item, take it
$href = $holder[$i];
if (ctype_punct(substr($holder[$i-1],strlen($holder[$i-1])-1)) && ctype_punct(substr($holder[$i],strlen($holder[$i])-1)))
$href = substr($href,0,-1); //if both the front and back of the link are enclosed in punctuation, truncate the link by one
$holder[$i] = implode("$href\" target=\"_blank\" class=\"link\">http$href</a>",explode($href,$holder[$i]));
$holder[$i-1] .= "<a href=\"";
}
}
$txt = implode("http",$holder);
echo $txt;
Output:
Test text http://hello.world Test text
http://google.com/file.jpg Test text https://hell.o.wor.ld/test?qwe=qwe Test text
test text http://test.test/test I am already linked up
It was also done in 1927 (http://test.com/reference) Also check this out:http://test/index&t=27

I use this function:
<?php
function deteli($string){
$pos = strpos($string, 'http');
// no link in the text: return it unchanged
if ($pos === false) {
return $string;
}
// the link ends at the next space, or at the end of the string
$spos = strpos($string, ' ', $pos);
if ($spos === false) {
$spos = strlen($string);
}
$bef = substr($string, 0, $pos);
$link = substr($string, $pos, $spos - $pos);
$aft = substr($string, $spos);
// only the first link in the text is converted
return $bef . "<a href='" . $link . "' class='link' target='_blank'>link</a>" . $aft;
}?>

Could anyone say what my mistake is in this regex?

I want to use this regex to validate my URLs in PHP with the preg_match function, but when I use it, it says "Unknown modifier '&'".
What is the problem?
$urlregex = "/^(http|ftp|https)\:\/\/";
// USER AND PASS (optional)
$urlregex .= "([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?";
// HOSTNAME OR IP
$urlregex .= "[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)+"; // http://x.x = minimum
// PORT (optional)
$urlregex .= "(\:[0-9]{2,5})?";
// PATH (optional)
$urlregex .= "(\/([a-z0-9+\$_-]\.?)+)*\/?";
// GET Query (optional)
$urlregex .= "(\?[a-z+&\$_.-][a-z0-9;:#/&%=+\$_.-]*)?";
// ANCHOR (optional)
$urlregex .= "(#[a-z_.-][a-z0-9+\$_.-]*)?\$/";
if(preg_match($urlregex, $url) !== 1)
{
$errors[] = "URL_ISNOTVALID";
$ok = false;
}

Looks like you forgot to escape a forward slash:
$urlregex .= "(\?[a-z+&\$_.-][a-z0-9;:#/&%=+\$_.-]*)?";
should be
$urlregex .= "(\?[a-z+&\$_.-][a-z0-9;:#\/&%=+\$_.-]*)?";

The / (slash) in your GET Query line is seen as the end of the regex, rather than the / at the end of your regex added in the ANCHOR line. Everything after it is read as modifiers, hence "Unknown modifier '&'".
So you need to escape that / in front of the &.
$urlregex .= "(\?[a-z+&\$_.-][a-z0-9;:#/&%=+\$_.-]*)?";
becomes
$urlregex .= "(\?[a-z+&\$_.-][a-z0-9;:#\/&%=+\$_.-]*)?";
That's all.
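Another way to avoid this class of error is to pick a delimiter that does not occur in the pattern, or to let preg_quote escape literal text. The pattern below is a deliberately simplified stand-in for the question's regex, not a replacement for it, and the variable names are illustrative.

```php
<?php
// With ~ as the delimiter, "/" inside the pattern needs no escaping.
$pattern = '~^(https?|ftp)://[a-z0-9.-]+(/\S*)?$~i';

$isUrl = preg_match($pattern, 'http://example.com/a/b?c=d') === 1;
$isNotUrl = preg_match($pattern, 'not a url') === 1;
var_dump($isUrl, $isNotUrl);

// When literal text is embedded in a pattern, preg_quote escapes regex
// metacharacters and, via its second argument, the delimiter as well.
$quoted = preg_quote('example.com/path', '/');
$matches = preg_match('/^http:\/\/' . $quoted . '$/', 'http://example.com/path') === 1;
var_dump($quoted, $matches);
```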

You could save some trouble by using filter_var instead.
if (false !== ($url = filter_var($url, FILTER_VALIDATE_URL))) {
echo "$url is a valid url";
}
You can optionally pass these flags as the third parameter (use bitwise OR to combine them):
FILTER_FLAG_PATH_REQUIRED
FILTER_FLAG_QUERY_REQUIRED
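A short sketch of how the two flags could be combined; the example.com URLs are made up for illustration:

```php
<?php
// Require both a path and a query string.
$flags = FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED;

$urls = [
    'http://example.com',            // no path, no query
    'http://example.com/page',       // path only
    'http://example.com/page?id=1',  // path and query
];

$results = [];
foreach ($urls as $url) {
    // filter_var returns the URL on success and false on failure.
    $results[$url] = filter_var($url, FILTER_VALIDATE_URL, $flags) !== false;
    echo $url, ' => ', $results[$url] ? 'valid' : 'invalid', "\n";
}
```

With both flags set, only the last URL passes.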

Preg-replace - replace all URLs except a domain and its subdomains

I have a Glype proxy and I don't want it to parse external URLs. All URLs on the page are automatically converted to: http://proxy.com/browse.php?u=[URL HERE]. Example: if I visit The Pirate Bay through my proxy, I don't want the following URLs to be rewritten:
ByteLove.com (Not to: http://proxy.com/browse.php?u=http://bytelove.com&b=0)
BayFiles.com (Not to: http://proxy.com/browse.php?u=http://bayfiles.com&b=0)
BayIMG.com (Not to: http://proxy.com/browse.php?u=http://bayimg.com&b=0)
PasteBay.com (Not to: http://proxy.com/browse.php?u=http://pastebay.com&b=0)
Ipredator.com (Not to: http://proxy.com/browse.php?u=https://ipredator.se&b=0)
etc.
Of course I want to keep the internal URLs, so:
thepiratebay.se/browse (To: http://proxy.com/browse.php?u=http://thepiratebay.se/browse&b=0)
thepiratebay.se/top (To: http://proxy.com/browse.php?u=http://thepiratebay.se/top&b=0)
thepiratebay.se/recent (To: http://proxy.com/browse.php?u=http://thepiratebay.se/recent&b=0)
etc.
Is there a preg_replace that replaces all URLs except thepiratebay.se and its subdomains (as in the example)? Another function is also welcome (such as DOMDocument, QueryPath, substr or strpos; not str_replace, because then I would have to define all URLs).
I've found something, but I'm not familiar with preg_replace:
$exclude = '.thepiratebay.se';
$pattern = '(https?\:\/\/.*?\..*?)(?=\s|$)';
$message= preg_replace("~(($exclude)?($pattern))~i", '$2$5$6', $message);

I guess you would need to provide a whitelist of the domains that should be proxied:
$whitelist = array();
$whitelist[] = "internal1.se";
$whitelist[] = "internal2.no";
$whitelist[] = "internal3.com";
// and so on...
$string = 'External link 1<br>';
$string .= 'Internal link 1<br>';
$string .= 'Internal link 2<br>';
$string .= 'External link 2<br>';
//Assuming the URL always is inside '' or "" you can use this pattern:
$pattern = '#(https?://proxy\.org/browse\.php\?u=(https?[^&|\"|\']*)(&?[^&|\"|\']*))#i';
$string = preg_replace_callback($pattern, "my_callback", $string);
//I had only PHP 5.2 on my server, so I decided to use a callback function.
function my_callback($match) {
global $whitelist;
// set return bypass proxy URL
$returnstring = urldecode($match[2]);
foreach ($whitelist as $white) {
// check if URL matches whitelist
if (stripos($match[2], $white) > 0) {
$returnstring = $match[0];
break; } }
return $returnstring;
}
echo "NEW STRING[:\n" . $string . "\n]\n";

You can use preg_replace_callback() to execute a callback function for every match. In that function you can decide whether the matched string should be converted or not.
<?php
$string = 'http://foobar.com/baz and http://example.org/bumm';
$pattern = '#(https?\:\/\/.*?\..*?)(?=\s|$)#i';
$string = preg_replace_callback($pattern, function($match) {
if (stripos($match[0], 'example.org/') !== false) {
// exclude all URLs containing example.org
return $match[0];
} else {
return 'http://proxy.com/?u=' . urlencode($match[0]);
}
}, $string);
echo $string, "\n";
(Example is using PHP 5.3 closure notation)
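A substring test such as stripos also matches when the domain merely appears somewhere in the path or query string. For the "a domain and its subdomains" requirement, comparing the parsed host may be more precise. A sketch, with the hypothetical helper name is_internal_url:

```php
<?php
// Hypothetical helper: true when the URL's host is $domain or one of its subdomains.
function is_internal_url(string $url, string $domain): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // no host could be extracted
    }

    $host = strtolower($host);
    $domain = strtolower($domain);
    $suffix = '.' . $domain;

    // Exact match, or the host ends in ".<domain>".
    return $host === $domain || substr($host, -strlen($suffix)) === $suffix;
}

var_dump(is_internal_url('http://thepiratebay.se/browse', 'thepiratebay.se'));
var_dump(is_internal_url('http://static.thepiratebay.se/img.png', 'thepiratebay.se'));
var_dump(is_internal_url('http://bayimg.com/?ref=thepiratebay.se', 'thepiratebay.se'));
```

Inside the callback, such a check could replace the stripos test to decide whether the proxied or the original URL is returned.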
