Replace // comments by /* comments */ Except in URLs [duplicate] - php

I need to remove the comment lines from my code.
preg_replace('!//(.*)!', '', $test);
It works fine. But it removes the website url also and left the url like http:
So to avoid this I put the same like preg_replace('![^:]//(.*)!', '', $test);
It's work fine. But the problem is if my code has the line like below
$code = 'something';// comment here
It will replace the comment line with the semicolon. that is after replace my above code would be
$code = 'something'
So it generates error.
I just need to delete the single line comments and the url should remain same.
Please help. Thanks in advance

try this
preg_replace('#(?<!http:)//.*#','',$test);
also read more about PCRE assertions http://cz.php.net/manual/en/regexp.reference.assertions.php

If you want to parse a PHP file, and manipulate the PHP code it contains, the best solution (even if a bit difficult) is to use the Tokenizer : it exists to allow manipulation of PHP code.
Working with regular expressions for such a thing is a bad idea...
For instance, you thought about http:// ; but what about strings that contain // ?
Like this one, for example :
$str = "this is // a test";

This can get complicated fast. There are more uses for // in strings. If you are parsing PHP code, I highly suggest you take a look at the PHP tokenizer. It's specifically designed to parse PHP code.
Question: Why are you trying to strip comments in the first place?
Edit: I see now you are trying to parse JavaScript, not PHP. So, why not use a javascript minifier instead? It will strip comments, whitespace and do a lot more to make your file as small as possible.

Related

Make certain part of string bold using preg_replace()

I have a string of a comment that looks like this:
**#First Last** rest of comment info.
What I need to do is replace the ** with <b> and </b> so that the name is bold.
I currently have this:
$result = preg_replace('/\*\*/i', '<b>$0</b>', $comment);
But that results in:
"<b>**</b>#First Last<b>**</b> rest of comment info"
Which is obviously not what I need. Does anyone have any suggestions on how to achieve this?
A better idea would be to use a markdown parser library which does this for you.
Improper use of the regular expressions can lead to security vulnerabilities with unclosed tags etc..
However, a simple approach could be this:
$result = preg_replace('/\*\*(.+)\*\*/sU', '<b>$1</b>', $comment);
The U means ungreedy and takes the first possible match.
The s modifies . to be any character, including newline.
(see modifiers page)
This also works with an example text like this:
**#First Last** rest of comment info **also bold**.
The next line also **has a bold example**. This spans **two
lines**.
I am sure there are counter examples which are broken with this. Again, please use a proper markdown library to do this for you (or copy their implementation if the LICENSE allows it).

Settings that could influence PHP str_replace behaviour

I am currently working on a replacement tool that will dynamically replace certain strings (including html) in a website using a smarty outputfilter.
For the replacement to take place, I am using PHP's str_ireplace method, which reads the code that is supposed to be replaced and the replacement code from a database, and then pass the result to the smarty output (using an output filter), in a similar way as the below.
$tpl_source = str_ireplace($replacements['sourceHTML'], $replacements['replacementHTML'], $tpl_source);
The problem is, that although it works great on my dev server, once uploaded to the live server replacements occasionally fail. The same replacements work just fine on my dev version though. After some examinations and googling there was not much I could find out regarding this issue. So my question is, what could influence str_replace's behavour?
Thanks
Edit with replacement example:
$htmlsource = file_get_contents('somefile.html');
$newstr = str_replace('Some text', 'sometext', $htmlsource); // the text to be replaced does exist in the html source
fails to replace. After some checking, it looks like the combination of "> creates a problem. But just the combination of it. If I try to change only (") it works, if I try to change only (>) it works.
It might be that special chars like umlauts do not display on the live server correctly and so str_replace() would fail, if there are specialchars inside the string you want to replace.
Is the input string identical on both systems? Have you verified this? Are you sure?
Things to check:
Are the HTML attributes in the same order?
Are the attribute values using the same kind quote marks? (eg <a href='#'> vs <a href="#">)
Is there any other stray HTML code getting in there?
Is the entity encoding the same? (eg vs   - same character; different HTML)
Is the character-set the same? (eg utf-8 vs ISO 8859-1: Accented characters will be encoded differently)
Any of these things will affect the result and produce the failures you're describing.
This was a trikcy problem, and it ended up having nothing to do with the str_replace method itself;
We are using smarty as a tamplating system. The str_replace method was used by a smarty ouput filter in order to change the html in some ocassions, just before it was delivered to the user.
Here is the Smarty outputfilter Code:
function smarty_outputfilter_replace($tpl_source, &$smarty)
{
$replacements = Content::getReplacementsForPage();
if (is_array($replacements))
{
foreach ($replacements as $replacementData)
{
$tpl_source = str_replace($replacementData['sourcecode'], $replacementData['replacementcode'], $tpl_source);
}
}
return ($tpl_source);
}
So this code failed now and then for now apparent reason... until I realized that the HTML code in the smarty template was being manipulated by an Apache filter.
This resulted into the source code in the browser (which we were using as the code to be replaced by something else) not being identical to the template code (which smarty was trying to modify). Result? str_replace failed :)

how to remove the comment line starts with // and not the url like http:// using preg_replace

I need to remove the comment lines from my code.
preg_replace('!//(.*)!', '', $test);
It works fine. But it removes the website url also and left the url like http:
So to avoid this I put the same like preg_replace('![^:]//(.*)!', '', $test);
It's work fine. But the problem is if my code has the line like below
$code = 'something';// comment here
It will replace the comment line with the semicolon. that is after replace my above code would be
$code = 'something'
So it generates error.
I just need to delete the single line comments and the url should remain same.
Please help. Thanks in advance
try this
preg_replace('#(?<!http:)//.*#','',$test);
also read more about PCRE assertions http://cz.php.net/manual/en/regexp.reference.assertions.php
If you want to parse a PHP file, and manipulate the PHP code it contains, the best solution (even if a bit difficult) is to use the Tokenizer : it exists to allow manipulation of PHP code.
Working with regular expressions for such a thing is a bad idea...
For instance, you thought about http:// ; but what about strings that contain // ?
Like this one, for example :
$str = "this is // a test";
This can get complicated fast. There are more uses for // in strings. If you are parsing PHP code, I highly suggest you take a look at the PHP tokenizer. It's specifically designed to parse PHP code.
Question: Why are you trying to strip comments in the first place?
Edit: I see now you are trying to parse JavaScript, not PHP. So, why not use a javascript minifier instead? It will strip comments, whitespace and do a lot more to make your file as small as possible.

Geshi with Markdown

Im trying to get GeSHi to work with markdown.
A simple use for Geshi is as follows:
$geshi = new GeSHi($message, 'c');
print $geshi->parse_code();
The above code takes in the whole of message and turns it into Highlighted code
I also have my Markdown Function
print Markdown($message);
I was trying to use call back function to preg_match the <pre> tags returned from markdown and run the geshi->parse_code(); function on the returned values
Here is my code
print preg_replace_callback(
'/<pre.*?>(.*?[<pre.*?>.*<\/pre>]*)<\/pre>/gism',
create_function(
// single quotes are essential here,
// or alternative escape all $ as \$
'$matches',
'$geshi = new GeSHi($matches[0], \'php\'); return $geshi->parse_code()'
),
Markdown($blog_res['message']));
Am i on the right track?
Is My Regex right? it works on http://gskinner.com/RegExr/
Thanks for the help
for future reference, you might want to check out my plugin for this:
https://github.com/drm/Markdown_Geshi
It is based on the regular markdown plugin adding a block marked with a shebang to highlight code, like this:
#!php
<?php print('This is PHP code'); ?>
Works pretty well and I use it on my own blog regularly.
it was the regex :(
instead of
/<pre.*?>(.*?[<pre.*?>.*<\/pre>]*)<\/pre>/gism
use (remove the global flag)
/<pre.*?>(.*?[<pre.*?>.*<\/pre>]*)<\/pre>/ism
But if you are using markdown, you have to remember to compensate for the code blocks that are on thier own therefore you need to replace only the ones that are in the format of <pre><code>...MyCode</code></pre> and leave out Hello <code>MyCode</code> Therefore you need the following
'/<pre.*?><code.*?>(.*?[<pre.*?><code.*?>.*<\/code><\/pre>]*)<\/code><\/pre>/ism',
I understand that you [were] looking to extend Markdown, adding support for GeSHi syntax highlighting. Beautify does this and more. For example, it can render graphs in DOT.
Beautify's approach to GeSHi code blocks differs from drm/Markdown_Geshi in that "fences" are used. For example:
~~~ php
<?php print('This is PHP code'); ?>
~~~
I'm not sure if Beautify was around back when this question was active, but it seemed worthy of mention in an answer.

PHP regex for filtering out urls from specific domains for use in a vBulletin plug-in

I'm trying to put together a plug-in for vBulletin to filter out links to filesharing sites. But, as I'm sure you often hear, I'm a newb to php let alone regexes.
Basically, I'm trying to put together a regex and use a preg_replace to find any urls that are from these domains and replace the entire link with a message that they aren't allowed. I'd want it to find the link whether it's hyperlinked, posted as plain text, or enclosed in [CODE] bb tags.
As for regex, I would need it to find URLS with the following, I think:
Starts with http or an anchor tag. I believe that the URLS in [CODE] tags could be processed the same as the plain text URLS and it's fine if the replacement ends up inside the [CODE] tag afterward.
Could contain any number of any characters before the domain/word
Has the domain somewhere in the middle
Could contain any number of any characters after the domain
Ends with a number of extentions such as (html|htm|rar|zip|001) or in a closing anchor tag.
I have a feeling that it's numbers 2 and 4 that are tripping me up (if not much more). I found a similar question on here and tried to pick apart the code a bit (even though I didn't really understand it). I now have this which I thought might work, but it doesn't:
<?php
$filterthese = array('domain1', 'domain2', 'domain3');
$replacement = 'LINKS HAVE BEEN FILTERED MESSAGE';
$regex = array('!^http+([a-z0-9-]+\.)*$filterthese+([a-z0-9-]+\.)*(html|htm|rar|zip|001)$!',
'!^<a+([a-z0-9-]+\.)*$filterthese+([a-z0-9-]+\.)*</a>$!');
$this->post['message'] = preg_replace($regex, $replacement, $this->post['message']);
?>
I have a feeling that I'm way off base here, and I admit that I don't fully understand php let alone regexes. I'm open to any suggestions on how to do this better, how to just make it work, or links to RTM (though I've read up a bit and I'm going to continue).
Thanks.
You can use parse_url on the URLs and look into the hashmap it returns. That allows you to filter for domains or even finer-grained control.
I think you can avoid the overhead of this in using the filter_var built-in function.
You may use this feature since PHP 5.2.0.
$good_url = filter_var( filter_var( $raw_url, FILTER_SANITIZE_URL), FILTER_VALIDATE_URL);
Hmm, my first guess: You put $filterthese directly inside a single-quoted string. That single quotes don't allow for variable substitution. Also, the $filterthese is an array, that should first be joined:
var $filterthese = implode("|", $filterthese);
Maybe I'm way off, because I don't know anything about vBulletin plugins and their embedded magic, but that points seem worth a check to me.
Edit: OK, on re-checking your provided source, I think the regexp line should read like this:
$regex = '!(?#
possible "a" tag [start]: )(<a[^>]+href=["\']?)?(?#
offending link: )https?://(?#
possible subdomains: )(([a-z0-9-]+\.)*\.)?(?#
domains to block: )('.implode("|", $filterthese).')(?#
possible path: )(/[^ "\'>]*)?(?#
possible "a" tag [end]: )(["\']?[^>]*>)?!';

Categories