preg_replace out CSS comments? - php

I'm writing a quick preg_replace to strip comments from CSS. CSS comments usually have this syntax:
/* Development Classes*/
/* Un-comment me for easy testing
(will make it simpler to see errors) */
So I'm trying to kill everything between /* and */, like so:
$pattern = "#/\*[^(\*/)]*\*/#";
$replace = "";
$v = preg_replace($pattern, $replace, $v);
No dice! It seems to be choking on the forward slashes, because I can get it to remove the text of comments if I take the /s out of the pattern. I tried some simpler patterns to see if I could just lose the slashes, but they return the original string unchanged:
$pattern = "#/#";
$pattern = "/\//";
Any ideas on why I can't seem to match those slashes? Thanks!

Here's a solution:
$regex = array(
"`^([\t\s]+)`ism"=>'',
"`^\/\*(.+?)\*\/`ism"=>"",
"`([\n\A;]+)\/\*(.+?)\*\/`ism"=>"$1",
"`([\n\A;\s]+)//(.+?)[\n\r]`ism"=>"$1\n",
"`(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+`ism"=>"\n"
);
$buffer = preg_replace(array_keys($regex),$regex,$buffer);
Taken from the Script/Stylesheet Pre-Processor in Samstyle PHP Framework
See: http://code.google.com/p/samstyle-php-framework/source/browse/trunk/sp.php
csstest.php:
<?php
$buffer = file_get_contents('test.css');
$regex = array(
"`^([\t\s]+)`ism"=>'',
"`^\/\*(.+?)\*\/`ism"=>"",
"`([\n\A;]+)\/\*(.+?)\*\/`ism"=>"$1",
"`([\n\A;\s]+)//(.+?)[\n\r]`ism"=>"$1\n",
"`(^[\r\n]*|[\r\n]+)[\s\t]*[\r\n]+`ism"=>"\n"
);
$buffer = preg_replace(array_keys($regex),$regex,$buffer);
echo $buffer;
?>
test.css:
/* testing to remove this */
.test{}
Output of csstest.php:
.test{}

I don't believe you can use grouping within a negated character class like you have there. Inside [...] the parentheses are literal characters, so [^(\*/)] simply excludes the single characters (, *, / and ). What you want instead are assertions, of which there are two types: "look-ahead" and "look-behind".
The pattern you're looking for, in English, is basically: "forward slash, literal asterisk, then zero or more of: any character that isn't followed by a forward slash, or any character other than a literal asterisk that is followed by a forward slash, or a forward slash that isn't preceded by a literal asterisk; then literal asterisk, forward slash".
<?php
$str = '/* one */ onemore
/*
* a
* b
**/
stuff // single line
/**/';
preg_match_all('#/\*(?:.(?!/)|[^\*](?=/)|(?<!\*)/)*\*/#s', $str, $matches);
print_r($matches);
?>
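To see why the pattern from the question fails, it can help to run it next to a simpler lazy alternative. This is only a sketch: the sample stylesheet in $css is made up for illustration, and the lazy pattern #/\*.*?\*/#s is one common option, not the pattern used in the answer above.

```php
<?php
// Sample input, made up for illustration.
$css = "/* Development Classes*/\n.a{}\n/* Un-comment me\n(will make it simpler to see errors) */\n.b{}";

// Inside [...] the parentheses are literal, so this class excludes the
// single characters ( * / ) and the match stops at the first one of them.
$broken = preg_replace('#/\*[^(\*/)]*\*/#', '', $css);

// Lazy quantifier plus the s modifier: match up to the first closing */,
// even when the comment spans several lines.
$lazy = preg_replace('#/\*.*?\*/#s', '', $css);

echo $broken, "\n---\n", $lazy, "\n";
```

The first call removes only the comment without parentheses; the second comment survives because it contains "(".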

I had the same issue.
To solve it, I first simplified the code by replacing "/*" and "*/" with different identifiers and then used those as the start and end markers.
$code = str_replace("/*","_COMSTART",$code);
$code = str_replace("*/","COMEND_",$code);
$code = preg_replace("/_COMSTART.*?COMEND_/s","",$code);
The s modifier makes . match newlines as well, so the search can run across multiple lines.
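A quick way to check what the s modifier changes is to run the same pattern with and without it. The input string in $code is made up for illustration.

```php
<?php
// A comment that spans two lines (sample input, made up for illustration).
$code = "a /* first\nsecond */ b";

// Without s, the dot stops at the newline, so nothing matches.
$withoutS = preg_replace('#/\*.*?\*/#', '', $code);

// With s, the dot also matches the newline and the comment is removed.
$withS = preg_replace('#/\*.*?\*/#s', '', $code);

var_dump($withoutS === $code); // the string is unchanged
var_dump($withS);
```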

There's a number of suggestions out there, but this one seems to work for me:
$v=preg_replace('!/\*[^*]*\*+([^/][^*]*\*+)*/!', '', $v);
so
"/* abc */.test { color:white; } /* XYZ */.test2 { padding:1px; /* DEF */} /* QWERTY */"
gives
.test { color:white; } .test2 { padding:1px; }
see https://onlinephp.io/c/2ae1c for working test
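For readers who find that one-liner hard to parse, here is the same pattern spelled out with the x modifier, which allows whitespace and comments inside the regex. This is a sketch for explanation only; the only deliberate changes are the ~ delimiter and a non-capturing group, since the captured text is not used.

```php
<?php
$pattern = '~
    /\*                 # opening slash and asterisk
    [^*]*  \*+          # anything up to and including a run of asterisks
    (?:                 # if that run is not followed by a slash ...
        [^/] [^*]* \*+  # ... consume on to the next run of asterisks
    )*
    /                   # the slash that closes the comment
~x';

$v = '/* abc */.test { color:white; } /* XYZ */.test2 { padding:1px; /* DEF */} /* QWERTY */';
$v = preg_replace($pattern, '', $v);

echo trim($v), "\n";
```

Because every part of the pattern consumes characters that cannot start the closing */, it needs no lazy quantifier and no s modifier.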

Just for fun (and for a small project, of course) I made a non-regexp version of this code (I hope it's faster):
function removeCommentFromCss( $textContent )
{
    $clearText = "";
    $charsInCss = strlen( $textContent );
    $searchForStart = true;
    for( $index = 0; $index < $charsInCss; $index++ )
    {
        if ( $searchForStart )
        {
            // Outside a comment: look for the opening "/*".
            if ( $textContent[ $index ] == "/" && (( $index + 1 ) < $charsInCss ) && $textContent[ $index + 1 ] == "*" )
            {
                $searchForStart = false;
                $index++; // skip the "*" too, so that "/*/" is not taken for a complete comment
                continue;
            }
            else
            {
                $clearText .= $textContent[ $index ];
            }
        }
        else
        {
            // Inside a comment: drop everything until the closing "*/".
            if ( $textContent[ $index ] == "*" && (( $index + 1 ) < $charsInCss ) && $textContent[ $index + 1 ] == "/" )
            {
                $searchForStart = true;
                $index++;
                continue;
            }
        }
    }
    return $clearText;
}


Custom realpath() using regex

I want to write my own realpath() function that uses regex and doesn't require the file to exist.
What I did so far
function my_realpath (string $path): string {
if ($path[0] != '/') {
$path = __DIR__.'/../../'.$path;
}
$path = preg_replace("~/\./~", '', $path);
$path = preg_replace("~\w+/\.\./~", '', $path); // removes ../ from path
return $path;
}
What is not correct
The problem is if I have this string:
"folders/folder1/folder5/../../folder2"
it removes only the first occurrence (folder5/../):
"folders/folder1/../folder2"
Question
How do I remove (with regex) all folders followed by the same number of "../" after them?
Examples
"folders/folder1/folder5/../../folder2" -> "folders/folder2"
"folders/folder1/../../../folder2" -> "../folder2"
"folders/folder1/folder5/../folder2" -> "folders/folder1/folder2"
Can we tell regex that: "~(\w+){n}/(../){n}~", n being greedy but same in both groups?
You can use a recursion-based pattern like
preg_replace('~(?<=/|^)(?!\.\.(?![^/]))[^/]+/(?R)?\.\.(?:/|$)~', '', $url)
See the regex demo. Details:
(?<=/|^) - immediately to the left, there must be / or start of string (if the strings are served as separate strings, this is equal to the more efficient (?<![^/]))
(?!\.\.(?![^/])) - immediately to the right, there should be no .. that are followed with / or end of string
[^/]+ - one or more chars other than /
/ - a / char
(?R)? - recurse the whole pattern, optionally
\.\.(?:/|$) - .. followed with a / char or end of string.
See the PHP demo:
$strings = ["folders/folder1/folder5/../../folder2", "folders/folder1/../../../folder2", "folders/folder1/folder5/../folder2"];
foreach ($strings as $url) {
echo preg_replace('~(?<=/|^)(?!\.\.(?![^/]))[^/\n]+/(?R)?\.\.(?:/|$)~', '', $url) . PHP_EOL;
}
// => folders/folder2, ../folder2, folders/folder1/folder2
Alternatively, you can use
(?<![^/])(?!\.\.(?![^/]))[^/]+/\.\.(?:/|$)
See the regex demo. Details:
(?<![^/]) - immediately to the left, there must be start of string or a / char
(?!\.\.(?![^/])) - immediately to the right, there should be no .. that are followed with / or end of string
[^/]+ - one or more chars other than /
/\.\. - /.. substring followed with...
(?:/|$) - / or end of string.
See the PHP demo:
$strings = ["folders/folder1/folder5/../../folder2", "folders/folder1/../../../folder2", "folders/folder1/folder5/../folder2"];
foreach ($strings as $url) {
$count = 0;
do {
$url = preg_replace('~(?<![^/])(?!\.\.(?![^/]))[^/]+/\.\.(?:/|$)~', '', $url, -1, $count);
} while ($count > 0);
echo "$url" . PHP_EOL;
}
The $count argument in preg_replace('~(?<![^/])(?!\.\.(?![^/]))[^/]+/\.\.(?:/|$)~', '', $url, -1, $count) keeps the number of replacements, and the replacing goes on until no match is found.
Output:
folders/folder2
../folder2
folders/folder1/folder2
You could as well use a non-regex approach:
<?php
$strings = ["folders/folder1/folder5/../../folder2", "folders/folder1/../../../folder2", "folders/folder1/folder5/../folder2"];
function make_path($string) {
$parts = explode("/", $string);
$new_folder = [];
for ($i=0; $i<count($parts); $i++) {
if (($parts[$i] == "..") and count($new_folder) >= 1) {
array_pop($new_folder);
} else {
$new_folder[] = $parts[$i];
}
}
return implode("/", $new_folder);
}
$new_folders = array_map('make_path', $strings);
print_r($new_folders);
?>
This yields
Array
(
[0] => folders/folder2
[1] => ../folder2
[2] => folders/folder1/folder2
)
See a demo on ideone.com.
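One case the loop above does not cover is a path that already starts with several "../" segments: the second ".." pops the first one off the stack. A possible variant, sketched under the assumption that "." and empty segments should also be dropped and that a leading "/" marks an absolute path (normalize_path is a hypothetical name, not from the answers above):

```php
<?php
// Hypothetical helper: resolves "." and ".." segments without touching the filesystem.
function normalize_path(string $path): string
{
    $absolute = $path !== '' && $path[0] === '/';
    $stack = [];
    foreach (explode('/', $path) as $part) {
        if ($part === '' || $part === '.') {
            continue; // skip empty and current-directory segments
        }
        if ($part === '..') {
            if ($stack && end($stack) !== '..') {
                array_pop($stack);   // step out of the last real folder
            } elseif (!$absolute) {
                $stack[] = '..';     // nothing left to pop: keep the ".."
            }
            continue;
        }
        $stack[] = $part;
    }
    return ($absolute ? '/' : '') . implode('/', $stack);
}

echo normalize_path('folders/folder1/../../../folder2'), "\n";
echo normalize_path('../../x'), "\n";
```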

PHP Preg Replace. Remove strings inside {~ string ~} pattern, but skip <pre>{~ string ~}</pre> [duplicate]

I am using a WordPress plugin named Acronyms (https://wordpress.org/plugins/acronyms/). This plugin replaces acronyms with their description. It uses a PHP PREG_REPLACE function.
The issue is that it replaces the acronyms contained in a <pre> tag, which I use to present a source code.
Could you modify this expression so that it won't replace acronyms contained inside <pre> tags (not only directly, but in any moment)? Is it possible?
The PHP code is:
$text = preg_replace(
"|(?!<[^<>]*?)(?<![?.&])\b$acronym\b(?!:)(?![^<>]*?>)|msU"
, "<acronym title=\"$fulltext\">$acronym</acronym>"
, $text
);
You can use a PCRE SKIP/FAIL regex trick (also works in PHP) to tell the regex engine to only match something if it is not inside some delimiters:
(?s)<pre[^<]*>.*?<\/pre>(*SKIP)(*F)|\b$acronym\b
This means: skip all substrings starting with <pre> and ending with </pre>, and only then match $acronym as a whole word.
See demo on regex101.com
Here is a sample PHP demo:
<?php
$acronym = "ASCII";
$fulltext = "American Standard Code for Information Interchange";
$re = "/(?s)<pre[^<]*>.*?<\\/pre>(*SKIP)(*F)|\\b$acronym\\b/";
$str = "<pre>ASCII\nSometext\nMoretext</pre>More text \nASCII\nMore text<pre>More\nlines\nASCII\nlines</pre>";
$subst = "<acronym title=\"$fulltext\">$acronym</acronym>";
$result = preg_replace($re, $subst, $str);
echo $result;
Output:
<pre>ASCII
Sometext
Moretext</pre>More text 
<acronym title="American Standard Code for Information Interchange">ASCII</acronym>
More text<pre>More
lines
ASCII
lines</pre>
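Since $acronym is interpolated straight into the pattern, an acronym containing regex metacharacters or the delimiter would break it. A minimal sketch of one way to guard against that, assuming a hypothetical wrapper function wrap_acronym() and the / delimiter used in the demo above:

```php
<?php
// Hypothetical wrapper around the SKIP/FAIL pattern shown above.
function wrap_acronym(string $text, string $acronym, string $fulltext): string
{
    // Escape metacharacters and the "/" delimiter in the acronym.
    $quoted = preg_quote($acronym, '/');
    $re = '/(?s)<pre[^<]*>.*?<\/pre>(*SKIP)(*F)|\b' . $quoted . '\b/';

    $subst = '<acronym title="' . htmlspecialchars($fulltext) . '">'
           . htmlspecialchars($acronym) . '</acronym>';

    // A callback avoids "$1"-style sequences in $subst being read as backreferences.
    return preg_replace_callback($re, function () use ($subst) {
        return $subst;
    }, $text);
}

echo wrap_acronym('<pre>I/O</pre> Disk I/O is slow', 'I/O', 'input/output'), "\n";
```

Note that \b only works as intended when the acronym starts and ends with a word character.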
It is also possible to use preg_split and keep the code block as a group, only replace the non-code block part then combine it back as a complete string:
function replace($s) {
return str_replace('"', '&quot;', $s); // do something with `$s`
}
$text = 'Your text goes here...';
$parts = preg_split('#(<\/?[-:\w]+(?:\s[^<>]+?)?>)#', $text, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
$text = "";
$x = 0;
foreach ($parts as $v) {
if (trim($v) === "") {
$text .= $v;
continue;
}
if ($v[0] === '<' && substr($v, -1) === '>') {
if (preg_match('#^<(\/)?(?:code|pre)(?:\s[^<>]+?)?>$#', $v, $m)) {
$x = isset($m[1]) && $m[1] === '/' ? 0 : 1;
}
$text .= $v; // this is a HTML tag…
} else {
$text .= !$x ? replace($v) : $v; // process or skip…
}
}
return $text;
Taken from here.

How to strip all HTML tags from a string except tags contained inside backticks?

I'm using PHP's strip_tags() function to strip tags from a string. For example:
$text = strip_tags( $text );
My aim is to strip all tags unless the tags happen to be contained inside backticks. If tags are contained inside backticks, I don't want to strip them.
My first thought was to try using the second parameter of strip_tags(). This will let me specify allowable tags which are not to be removed. For example, strip_tags( $text, '<strong>'). However, this doesn't quite do what I'm looking for.
How can I strip all HTML tags from a string except tags that happen to be contained inside backticks?
Ref: http://php.net/manual/en/function.strip-tags.php
To back my comment with an answer, something like:
function strip($input)
{
preg_match_all('/`([^`]+)`/', $input, $retain);
for($i = 0; $i < count($retain[0]); $i++)
{
// Replace HTML wrapped in backticks with match index.
$input = str_replace($retain[0][$i], "{{$i}}", $input);
}
// Strip tags.
$input = strip_tags($input);
for($i = 0; $i < count($retain[0]); $i++)
{
// Replace previous replacements with relevant data.
$replace = $retain[1][$i];
// Do some stuff with $replace here - maybe check that it's a tag
// you're comfortable with else use htmlspecialchars(), etc.
// ...
$input = str_replace("{{$i}}", $replace, $input);
}
return $input;
}
With a test:
echo strip("Hello <strong>there</strong>, what's `<em>`up`</em>`?");
// Output: Hello there, what's <em>up</em>?
If your escape sequence is fixed as a `, you can do something much simpler than obfuscation (which Marty suggests in his comment, and which is one of my favorite techniques if I'm being perfectly honest). Even if you were to use obfuscation or a preg_replace, you would still need to account for escaped ticks.
Instead, you can do something like:
$strippeddown = array();
$breakdown = explode('`', $text);
$j = 1;
foreach ($breakdown AS $i => $gather)
{
if ($j > 1)
{
$j--;
unset($breakdown["$i"]);
continue;
}
$j = 1;
while (substr($gather, -1) === '\\' AND isset($breakdown[$i + $j]))
{
$gather = $breakdown[$i + $j];
$breakdown["$i"] .= '`' . $gather;
$j++;
}
}
$breakdown = array_values($breakdown);
foreach ($breakdown AS $i => $gather)
{
if (!$i OR !($i % 2))
{
$strippeddown[] = strip_tags($gather);
}
else
{
$strippeddown[] = $gather;
}
}
$text = implode('`', $strippeddown);
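If escaped backticks are not a concern, the explode-based loop above can be shortened with preg_split. This is a sketch under that assumption; strip_outside_backticks is a hypothetical name, and the backticks themselves are kept in the output.

```php
<?php
// Hypothetical helper: strips tags everywhere except inside `...` spans.
function strip_outside_backticks(string $text): string
{
    // With PREG_SPLIT_DELIM_CAPTURE and one capture group, the backtick
    // spans land at the odd indexes and the text between them at the even ones.
    $parts = preg_split('/(`[^`]*`)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $parts[$i] = strip_tags($part);
        }
    }
    return implode('', $parts);
}

echo strip_outside_backticks("Hello <strong>there</strong>, what's `<em>`up`</em>`?"), "\n";
```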

URL conversion (into link and urldecode)

I want to ask two questions about URL conversion in PHP.
First question: I need to convert text into a link. I've written my own regex and also read many forums, but all the solutions depend on www. or (ht|f)tp(s), and I need a regex that will convert domain names in text even without www and http, for example:
I like stackoverflow.com very much
into
I like <a href='http://stackoverflow.com'>stackoverflow.com</a> very much
Of course, it must take trailing periods, commas, etc. into account, like:
I like stackoverflow.com.
into
I like <a href='http://stackoverflow.com'>stackoverflow.com</a>.
And one more question: links with URL-encoded symbols are displayed in readable form on wiki, but on other sites they are displayed as a URL-encoded string (%XX%XX%XX). How does wiki do this? Thanks!
For your first question, I would not recommend doing that: it is very difficult to know whether a word containing a dot is a domain name or not, and people often forget to put a space after the period in the middle of a paragraph.
For your second question, it is simple: you URL-encode the link in the href, but not the text between the opening and closing a tags. For example:
http://site.na/test.php?t=sqdl&54"dfd°=+
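A minimal sketch of that idea, assuming the URL arrives already percent-encoded: the href keeps the encoded form, while the visible label is decoded with rawurldecode(). The function name wiki_style_link and the example URL are hypothetical.

```php
<?php
// Hypothetical helper: encoded URL in the href, readable text as the label.
function wiki_style_link(string $encodedUrl): string
{
    $label = rawurldecode($encodedUrl);
    return '<a href="' . htmlspecialchars($encodedUrl, ENT_QUOTES) . '">'
         . htmlspecialchars($label, ENT_QUOTES) . '</a>';
}

echo wiki_style_link('https://example.org/wiki/%C3%9Cber'), "\n";
```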
The function auto_link from CodeIgniter URL helper can help you:
if ( ! function_exists('auto_link'))
{
function auto_link($str, $type = 'both', $popup = FALSE)
{
if ($type != 'email')
{
if (preg_match_all("#(^|\s|\()((http(s?)://)|(www\.))(\w+[^\s\)\<]+)#i", $str, $matches))
{
$pop = ($popup == TRUE) ? " target=\"_blank\" " : "";
for ($i = 0; $i < count($matches['0']); $i++)
{
$period = '';
if (preg_match("|\.$|", $matches['6'][$i]))
{
$period = '.';
$matches['6'][$i] = substr($matches['6'][$i], 0, -1);
}
$str = str_replace($matches['0'][$i],
$matches['1'][$i].'<a href="http'.
$matches['4'][$i].'://'.
$matches['5'][$i].
$matches['6'][$i].'"'.$pop.'>http'.
$matches['4'][$i].'://'.
$matches['5'][$i].
$matches['6'][$i].'</a>'.
$period, $str);
}
}
}
if ($type != 'url')
{
if (preg_match_all("/([a-zA-Z0-9_\.\-\+]+)@([a-zA-Z0-9\-]+)\.([a-zA-Z0-9\-\.]*)/i", $str, $matches))
{
for ($i = 0; $i < count($matches['0']); $i++)
{
$period = '';
if (preg_match("|\.$|", $matches['3'][$i]))
{
$period = '.';
$matches['3'][$i] = substr($matches['3'][$i], 0, -1);
}
$str = str_replace($matches['0'][$i], safe_mailto($matches['1'][$i].'@'.$matches['2'][$i].'.'.$matches['3'][$i]).$period, $str);
}
}
}
return $str;
}
}
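auto_link only catches URLs that start with http(s):// or www., so it does not cover the bare-domain case from the question. One hedged way to approach that case is to accept only a short whitelist of TLDs, which limits (but does not remove) the false positives mentioned earlier. The function name link_bare_domains and the TLD list are assumptions for illustration.

```php
<?php
// Hypothetical helper: links bare domains that end in a whitelisted TLD.
function link_bare_domains(string $text): string
{
    // The lookbehind skips domains that are already part of a URL or an
    // e-mail address; the final \b keeps a trailing period out of the link.
    $pattern = '~(?<![\w@./-])((?:[a-z0-9-]+\.)+(?:com|org|net))\b~i';
    return preg_replace($pattern, '<a href=\'http://$1\'>$1</a>', $text);
}

echo link_bare_domains('I like stackoverflow.com.'), "\n";
```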

PHP Remove URL from string

If I have a string that contains a URL (for example's sake, we'll call it $url) such as:
$url = "Here is a funny site http://www.tunyurl.com/34934";
How do I remove the URL from the string?
The difficulty is that URLs might also show up without the http://, such as:
$url = "Here is another funny site www.tinyurl.com/55555";
There is no HTML present. How would I start a search if http or www exists, then remove the text/numbers/symbols until the first space?
I re-read the question, here is a function that would work as intended:
function cleaner($url) {
$U = explode(' ',$url);
$W =array();
foreach ($U as $k => $u) {
if (stristr($u,'http') || (count(explode('.',$u)) > 1)) {
unset($U[$k]);
return cleaner( implode(' ',$U));
}
}
return implode(' ',$U);
}
$url = "Here is another funny site www.tinyurl.com/55555 and http://www.tinyurl.com/55555 and img.hostingsite.com/badpic.jpg";
echo "Cleaned: " . cleaner($url);
Edit #2/#3 (I must be bored). Here is a version that verifies there is a TLD within the URL:
function containsTLD($string) {
preg_match(
"/(AC($|\/)|\.AD($|\/)|\.AE($|\/)|\.AERO($|\/)|\.AF($|\/)|\.AG($|\/)|\.AI($|\/)|\.AL($|\/)|\.AM($|\/)|\.AN($|\/)|\.AO($|\/)|\.AQ($|\/)|\.AR($|\/)|\.ARPA($|\/)|\.AS($|\/)|\.ASIA($|\/)|\.AT($|\/)|\.AU($|\/)|\.AW($|\/)|\.AX($|\/)|\.AZ($|\/)|\.BA($|\/)|\.BB($|\/)|\.BD($|\/)|\.BE($|\/)|\.BF($|\/)|\.BG($|\/)|\.BH($|\/)|\.BI($|\/)|\.BIZ($|\/)|\.BJ($|\/)|\.BM($|\/)|\.BN($|\/)|\.BO($|\/)|\.BR($|\/)|\.BS($|\/)|\.BT($|\/)|\.BV($|\/)|\.BW($|\/)|\.BY($|\/)|\.BZ($|\/)|\.CA($|\/)|\.CAT($|\/)|\.CC($|\/)|\.CD($|\/)|\.CF($|\/)|\.CG($|\/)|\.CH($|\/)|\.CI($|\/)|\.CK($|\/)|\.CL($|\/)|\.CM($|\/)|\.CN($|\/)|\.CO($|\/)|\.COM($|\/)|\.COOP($|\/)|\.CR($|\/)|\.CU($|\/)|\.CV($|\/)|\.CX($|\/)|\.CY($|\/)|\.CZ($|\/)|\.DE($|\/)|\.DJ($|\/)|\.DK($|\/)|\.DM($|\/)|\.DO($|\/)|\.DZ($|\/)|\.EC($|\/)|\.EDU($|\/)|\.EE($|\/)|\.EG($|\/)|\.ER($|\/)|\.ES($|\/)|\.ET($|\/)|\.EU($|\/)|\.FI($|\/)|\.FJ($|\/)|\.FK($|\/)|\.FM($|\/)|\.FO($|\/)|\.FR($|\/)|\.GA($|\/)|\.GB($|\/)|\.GD($|\/)|\.GE($|\/)|\.GF($|\/)|\.GG($|\/)|\.GH($|\/)|\.GI($|\/)|\.GL($|\/)|\.GM($|\/)|\.GN($|\/)|\.GOV($|\/)|\.GP($|\/)|\.GQ($|\/)|\.GR($|\/)|\.GS($|\/)|\.GT($|\/)|\.GU($|\/)|\.GW($|\/)|\.GY($|\/)|\.HK($|\/)|\.HM($|\/)|\.HN($|\/)|\.HR($|\/)|\.HT($|\/)|\.HU($|\/)|\.ID($|\/)|\.IE($|\/)|\.IL($|\/)|\.IM($|\/)|\.IN($|\/)|\.INFO($|\/)|\.INT($|\/)|\.IO($|\/)|\.IQ($|\/)|\.IR($|\/)|\.IS($|\/)|\.IT($|\/)|\.JE($|\/)|\.JM($|\/)|\.JO($|\/)|\.JOBS($|\/)|\.JP($|\/)|\.KE($|\/)|\.KG($|\/)|\.KH($|\/)|\.KI($|\/)|\.KM($|\/)|\.KN($|\/)|\.KP($|\/)|\.KR($|\/)|\.KW($|\/)|\.KY($|\/)|\.KZ($|\/)|\.LA($|\/)|\.LB($|\/)|\.LC($|\/)|\.LI($|\/)|\.LK($|\/)|\.LR($|\/)|\.LS($|\/)|\.LT($|\/)|\.LU($|\/)|\.LV($|\/)|\.LY($|\/)|\.MA($|\/)|\.MC($|\/)|\.MD($|\/)|\.ME($|\/)|\.MG($|\/)|\.MH($|\/)|\.MIL($|\/)|\.MK($|\/)|\.ML($|\/)|\.MM($|\/)|\.MN($|\/)|\.MO($|\/)|\.MOBI($|\/)|\.MP($|\/)|\.MQ($|\/)|\.MR($|\/)|\.MS($|\/)|\.MT($|\/)|\.MU($|\/)|\.MUSEUM($|\/)|\.MV($|\/)|\.MW($|\/)|\.MX($|\/)|\.MY($|\/)|\.MZ($|\/)|\.NA($|\/)|\.NAME($|\/)|\.NC($|\/)|\.NE($|\/)|\.NET($|\/)|\.NF($|\/)|\.NG($|\/)|\.
NI($|\/)|\.NL($|\/)|\.NO($|\/)|\.NP($|\/)|\.NR($|\/)|\.NU($|\/)|\.NZ($|\/)|\.OM($|\/)|\.ORG($|\/)|\.PA($|\/)|\.PE($|\/)|\.PF($|\/)|\.PG($|\/)|\.PH($|\/)|\.PK($|\/)|\.PL($|\/)|\.PM($|\/)|\.PN($|\/)|\.PR($|\/)|\.PRO($|\/)|\.PS($|\/)|\.PT($|\/)|\.PW($|\/)|\.PY($|\/)|\.QA($|\/)|\.RE($|\/)|\.RO($|\/)|\.RS($|\/)|\.RU($|\/)|\.RW($|\/)|\.SA($|\/)|\.SB($|\/)|\.SC($|\/)|\.SD($|\/)|\.SE($|\/)|\.SG($|\/)|\.SH($|\/)|\.SI($|\/)|\.SJ($|\/)|\.SK($|\/)|\.SL($|\/)|\.SM($|\/)|\.SN($|\/)|\.SO($|\/)|\.SR($|\/)|\.ST($|\/)|\.SU($|\/)|\.SV($|\/)|\.SY($|\/)|\.SZ($|\/)|\.TC($|\/)|\.TD($|\/)|\.TEL($|\/)|\.TF($|\/)|\.TG($|\/)|\.TH($|\/)|\.TJ($|\/)|\.TK($|\/)|\.TL($|\/)|\.TM($|\/)|\.TN($|\/)|\.TO($|\/)|\.TP($|\/)|\.TR($|\/)|\.TRAVEL($|\/)|\.TT($|\/)|\.TV($|\/)|\.TW($|\/)|\.TZ($|\/)|\.UA($|\/)|\.UG($|\/)|\.UK($|\/)|\.US($|\/)|\.UY($|\/)|\.UZ($|\/)|\.VA($|\/)|\.VC($|\/)|\.VE($|\/)|\.VG($|\/)|\.VI($|\/)|\.VN($|\/)|\.VU($|\/)|\.WF($|\/)|\.WS($|\/)|\.XN--0ZWM56D($|\/)|\.XN--11B5BS3A9AJ6G($|\/)|\.XN--80AKHBYKNJ4F($|\/)|\.XN--9T4B11YI5A($|\/)|\.XN--DEBA0AD($|\/)|\.XN--G6W251D($|\/)|\.XN--HGBK6AJ7F53BBA($|\/)|\.XN--HLCJ6AYA9ESC7A($|\/)|\.XN--JXALPDLP($|\/)|\.XN--KGBECHTV($|\/)|\.XN--ZCKZAH($|\/)|\.YE($|\/)|\.YT($|\/)|\.YU($|\/)|\.ZA($|\/)|\.ZM($|\/)|\.ZW)/i",
$string,
$M);
$has_tld = (count($M) > 0) ? true : false;
return $has_tld;
}
function cleaner($url) {
$U = explode(' ',$url);
$W =array();
foreach ($U as $k => $u) {
if (stristr($u,".")) { //only preg_match if there is a dot
if (containsTLD($u) === true) {
unset($U[$k]);
return cleaner( implode(' ',$U));
}
}
}
return implode(' ',$U);
}
$url = "Here is another funny site badurl.badone somesite.ca/worse.jpg but this badsite.com www.tinyurl.com/55555 and http://www.tinyurl.com/55555 and img.hostingsite.com/badpic.jpg";
echo "Cleaned: " . cleaner($url);
returns:
Cleaned: Here is another funny site badurl.badone but this and and
$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i', '', $string);
Parsing text for URLs is hard and looking for pre-existing, heavily tested code that already does this for you would be better than writing your own code and missing edge cases. For example, I would take a look at the process in Django's urlize, which wraps URLs in anchors. You could port it over to PHP, and--instead of wrapping URLs in an anchor--just delete them from the text.
Thanks Mike.
I updated it a bit, because the original returned a notice error:
'/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i'
$string = preg_replace('/\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|$!:,.;]*[A-Z0-9+&@#\/%=~_|$]/i', '', $string);
$url = "Here is a funny site http://www.tunyurl.com/34934";
$replace = 'http www .com .org .net';
$with = '';
$clean_url = clean($url,$replace,$with);
echo $clean_url;
function clean($url,$replace,$with) {
$replace = explode(" ",$replace);
$new_string = '';
$check = explode(" ",$url);
foreach($check AS $key => $value) {
foreach($replace AS $key2 => $value2 ) {
if (strpos( strtolower($value), strtolower($value2) ) !== false) {
$value = $with;
break;
}
}
$new_string .= " ".$value;
}
return $new_string;
}
You would need to write a regular expression to extract out the urls.
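A minimal sketch of such an expression, assuming (as in the question) that a URL starts with http://, https:// or www. and runs until the next whitespace. The function name remove_urls is hypothetical, and bare domains without www are deliberately not handled.

```php
<?php
// Hypothetical helper: removes http(s):// and www. URLs from plain text.
function remove_urls(string $text): string
{
    // Drop everything from the scheme or "www." up to the next whitespace.
    $text = preg_replace('~\b(?:https?://|www\.)\S+~i', '', $text);
    // Collapse the gaps left behind and trim the ends.
    return trim(preg_replace('~\s{2,}~', ' ', $text));
}

echo remove_urls('Here is a funny site http://www.tunyurl.com/34934'), "\n";
echo remove_urls('Visit www.tinyurl.com/55555 now'), "\n";
```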
