Related
I'm trying to develop a PHP application where it takes comments from users and then match the string to check if the comment is positive or negative. I have list of negative words in negative.txt file. If a word is matched from the word list, then I want a simple integer counter to increment by 1. I tried the some links and created the a code to check if the comment has is negative or positive but it is only matching the last word of the file.Here's the code what i have done.
<?php
function teststringforbadwords($comment)
{
$file="BadWords.txt";
$fopen = fopen($file, "r");
$fread = fread($fopen,filesize("$file"));
fclose($fopen);
$newline_ele = "\n";
$data_split = explode($newline_ele, $fread);
$new_tab = "\t";
$outoutArr = array();
//process uploaded file data and push in output array
foreach ($data_split as $string)
{
$row = explode($new_tab, $string);
if(isset($row['0']) && $row['0'] != ""){
$outoutArr[] = trim($row['0']," ");
}
}
//---------------------------------------------------------------
foreach($outoutArr as $word) {
if(stristr($comment,$word)){
return false;
}
}
return true;
}
if(isset($_REQUEST["submit"]))
{
$comments = $_REQUEST["comments"];
if (teststringforbadwords($comments))
{
echo 'string is clean';
}
else
{
echo 'string contains banned words';
}
}
?>
Link Tried : Check a string for bad words?
I added the strtolower function around both your $comments and your input from the file. That way if someone spells STUPID, instead of stupid, the code will still detect the bad word.
I also added trim to remove unnecessary and disruptive whitespace (like newline).
Finally, I changed the way how you check the words. I used a preg_match to split about all whitespace so we are checking only full words and don't accidentally ban incorrect strings.
<?php
function teststringforbadwords($comment)
{
$comment = strtolower($comment);
$file="BadWords.txt";
$fopen = fopen($file, "r");
$fread = strtolower(fread($fopen,filesize("$file")));
fclose($fopen);
$newline_ele = "\n";
$data_split = explode($newline_ele, $fread);
$new_tab = "\t";
$outoutArr = array();
//process uploaded file data and push in output array
foreach ($data_split as $bannedWord)
{
foreach (preg_split('/\s+/',$comment) as $commentWord) {
if (trim($bannedWord) === trim($commentWord)) {
return false;
}
}
}
return true;
}
1) Your storing $row['0'] only why not others index words. So problem is your ignoring some of word in text file.
Some suggestion
1) Insert the text in text file one by one i.e new line like this so you can access easily explode by newline to avoiding multiple explode and loop.
Example: sss.txt
...
bad
stupid
...
...
2) Apply trim and lowercase function to both comment and bad string.
Hope it will work as expected
function teststringforbadwords($comment)
{
$file="sss.txt";
$fopen = fopen($file, "r");
$fread = fread($fopen,filesize("$file"));
fclose($fopen);
foreach(explode("\n",$fread) as $word)
{
if(stristr(strtolower(trim($comment)),strtolower(trim($word))))
{
return false;
}
}
return true;
}
I am using the following code to place some ad code inside my content .
<?php
$content = apply_filters('the_content', $post->post_content);
$content = explode (' ', $content);
$halfway_mark = ceil(count($content) / 2);
$first_half_content = implode(' ', array_slice($content, 0, $halfway_mark));
$second_half_content = implode(' ', array_slice($content, $halfway_mark));
echo $first_half_content.'...';
echo ' YOUR ADS CODE';
echo $second_half_content;
?>
How can i modify this so that the 2 paragraphs (top and bottom) enclosing the ad code should not be the one having images. If the top or bottom paragraph has image then try for next 2 paragraphs.
Example: Correct Implementation on the right.
preg_replace version
This code steps through every paragraph ignoring those that contain image tags. The $pcount variable is incremented for every paragraph found without an image, if an image is encountered however, $pcount is reset to zero. Once $pcount reaches the point where it would hit two, the advert markup is inserted just before that paragraph. This should leave the advert markup between two safe paragraphs. The advert markup variable is then nullified so only one advert is inserted.
The following code is just for set up and could be modified to split the content differently, you could also modify the regular expression that is used — just in case you are using double BRs or something else to delimit your paragraphs.
/// set our advert content
$advert = '<marquee>BUY THIS STUFF!!</marquee>' . "\n\n";
/// calculate mid point
$mpoint = floor(strlen($content) / 2);
/// modify back to the start of a paragraph
$mpoint = strripos($content, '<p', -$mpoint);
/// split html so we only work on second half
$first = substr($content, 0, $mpoint);
$second = substr($content, $mpoint);
$pcount = 0;
$regexp = '/<p>.+?<\/p>/si';
The rest is the bulk of the code that runs the replacement. This could be modified to insert more than one advert, or to support more involved image checking.
$content = $first . preg_replace_callback($regexp, function($matches){
global $pcount, $advert;
if ( !$advert ) {
$return = $matches[0];
}
else if ( stripos($matches[0], '<img ') !== FALSE ) {
$return = $matches[0];
$pcount = 0;
}
else if ( $pcount === 1 ) {
$return = $advert . $matches[0];
$advert = '';
}
else {
$return = $matches[0];
$pcount++;
}
return $return;
}, $second);
After this code has been executed the $content variable will contain the enhanced HTML.
PHP versions prior to 5.3
As your chosen testing area does not support PHP 5.3, and so does not support anonymous functions, you need to use a slightly modified and less succinct version; that makes use of a named function instead.
Also, in order to support content that may not actually leave space for the advert in it's second half I have modified the $mpoint so that it is calculated to be 80% from the end. This will have the effect of including more in the $second part — but will also mean your adverts will be generally placed higher up in the mark-up. This code has not had any fallback implemented into it, because your question does not mention what should happen in the event of failure.
$advert = '<marquee>BUY THIS STUFF!!</marquee>' . "\n\n";
$mpoint = floor(strlen($content) * 0.8);
$mpoint = strripos($content, '<p', -$mpoint);
$first = substr($content, 0, $mpoint);
$second = substr($content, $mpoint);
$pcount = 0;
$regexp = '/<p>.+?<\/p>/si';
function replacement_callback($matches){
global $pcount, $advert;
if ( !$advert ) {
$return = $matches[0];
}
else if ( stripos($matches[0], '<img ') !== FALSE ) {
$return = $matches[0];
$pcount = 0;
}
else if ( $pcount === 1 ) {
$return = $advert . $matches[0];
$advert = '';
}
else {
$return = $matches[0];
$pcount++;
}
return $return;
}
echo $first . preg_replace_callback($regexp, 'replacement_callback', $second);
You could try this:
<?php
$ad_code = 'SOME SCRIPT HERE';
// Your code.
$content = apply_filters('the_content', $post->post_content);
// Split the content at the <p> tags.
$content = explode ('<p>', $content);
// Find the mid of the article.
$content_length = count($content);
$content_mid = floor($content_length / 2);
// Save no image p's index.
$last_no_image_p_index = NULL;
// Loop beginning from the mid of the article to search for images.
for ($i = $content_mid; $i < $content_length; $i++) {
// If we do not find an image, let it go down.
if (stripos($content[$i], '<img') === FALSE) {
// In case we already have a last no image p, we check
// if it was the one right before this one, so we have
// two p tags with no images in there.
if ($last_no_image_p_index === ($i - 1)) {
// We break here.
break;
}
else {
$last_no_image_p_index = $i;
}
}
}
// If no none image p tag was found, we use the last one.
if (is_null($last_no_image_p_index)) {
$last_no_image_p_index = ($content_length - 1);
}
// Add ad code here with trailing <p>, so the implode later will work correctly.
$content = array_slice($content, $last_no_image_p_index, 0, $ad_code . '</p>');
$content = implode('<p>', $content);
?>
It will try to find a place for the ad from the mid of your article and if none is found the ad is put to the end.
Regards
func0der
I think this will work:
First explode the paragraphs, then you have to loop it and check if you find img inside them.
If you find it inside, you try the next.
Think of this as psuedo-code, since it's not tested. You will have to make a loop too, comments in the code :) Sorry if it contains bugs, it's written in Notepad.
<?php
$i = 0; // counter
$arrBoolImg = array(); // array for the paragraph booleans
$content = apply_filters('the_content', $post->post_content);
$contents = str_replace ('<p>', '<explode><p>', $content); // here we add a custom tag, so we can explode
$contents = explode ('<explode>', $contents); // then explode it, so we can iterate the paragraphs
// fill array with boolean array returned
$arrBoolImg = hasImages($contents);
$halfway_mark = ceil(count($contents) / 2);
/*
TODO (by you):
---
When you have $arrBoolImg filled, you can itarate through it.
You then simply loop from the middle of the array $contents (plural), that is exploded from above.
The startingpoing for your loop is the middle, the upper bounds is the +2 or what ever :-)
Then you simply insert your magic.. And then glue it back together, as you did before.
I think this will work. even though the code may have some bugs, since I wrote it in Notepad.
*/
function hasImages($contents) {
/*
This function loops through the $contents array and checks if they have images in them
The return value, is an array with boolean values, so one can iterate through it.
*/
$arrRet = array(); // array for the paragraph booleans
if (count($content)>=1) {
foreach ($contents as $v) { // iterate the content
if (strpos($v, '<img') === false) { // did not find img
$arrRet[$i] = false;
}
else { // found img
$arrRet[$i] = true;
}
$i++;
} // end for each loop
return $arrRet;
} // end if count
} // end hasImages func
?>
[This is just an idea, I don't have enough reputation to comment...]
After calling #Olavxxx's method and filling your boolean array you could just loop through that array in an alternating manner starting in the middle: Let's assume your array is 8 entries long. Calculating the middle using your method you get 4. So you check the combination of values 4 + 3, if that doesn't work, you check 4 + 5, after that 3 + 2, ...
So your loop looks somewhat like
$middle = ceil(count($content) / 2);
$i = 1;
while ($i <= $middle) {
$j = $middle + (-1) ^ $i * $i;
$k = $j + 1;
if (!$hasImagesArray[$j] && !$hasImagesArray[$k])
break; // position found
$i++;
}
It's up to you to implement further constraints to make sure the add is not shown to far up or down in the article...
Please note that you need to take care of special cases like too short arrays too in order to prevent IndexOutOfBounds-Exceptions.
I have a simple regex which checks an entire string for a function declaration. So in this code:
public function Test($name)
{
echo 'In test';
}
It will find the first part:
function Test($name)
{
And it replaces that with a custom piece:
function Test($name)
{
echo 'New piece';
Which eventually makes my code look like this:
public function Test($name)
{
echo 'New piece';
echo 'In test';
}
This all works perfectly fine with this regex:
preg_match_all ( '/function(.*?)\{/s', $source, $matches )
The problem is, is that i want to ignore everything when the regex sees a script tag. So in this case, this source:
public function Test($name) //<--- Match found!
{
echo 'In test';
}
<script type="text/javascript"> //<--- Script tag found, dont do any matches!
$(function() {
function Test()
{
var bla = "In js";
}
});
</script> //<--- Closed tag, start searching for matches again.
public function Test($name) //<--- Match found!
{
echo 'In test';
}
How can i do this in my regex?
As mentioned in the comments:
If your php functions always have a visibility modifier like public you could do:
(?:public|protected|private)\s+function\s+\w+\(.*?\)\s*\{
Otherwise, you could strip the script part first.
Something like:
$text = preg_replace('/<script(?:(?!<\/script>).)*<\/script>/s','',$text);
I don't know python, but I know regex:
Your original regex is not so good, since it matches
// This is a functional comment { isn't it? }
^^^^^^^^...........^
Maybe if you make it more robust it will solve your problem:
^\s*(public|protected|private)\s+function\s+\(.*?\).*?{
This will ensure it is a function declaration for 99% of the cases. There are still some unusual cases where you can fool it.
No amount of regex is going to achieve a decent fail-proof solution.
The right way to do this is with php tokenizer.
<?php
$code = <<<END
<?php
public function Test(\$name) //<--- Match found!
{
echo 'In test';
}
?>
<script type="text/javascript"> //<--- Script tag found, dont do any matches!
$(function() {
function Test()
{
var bla = "In js";
}
});
</script> //<--- Closed tag, start searching for matches again.
<?
public function Bla(\$name) //<--- Match found!
{
echo 'In test';
}
END;
function injectCodeAtFunctionsStart ($originalCode, $code)
{
$tokens = token_get_all ($originalCode);
$newTokenTree = '';
// iterate tokens
for ($i = 0, $total = count($tokens); $i < $total; $i++)
{
$node = $tokens[$i];
$newTokenTree[] = $node;
if (is_array ($node))
{
// function start
if ($node[0] == T_FUNCTION)
{
// walk to first brace
while ($tokens[$i] !== '{') {
$newTokenTree[] = $tokens[$i];
$i++;
}
$i++;
// keep space
$space = $tokens[$i];
$newTokenTree[] = $space;
// add new piece
$newTokenTree[] = $code;
$newTokenTree[] = $space;
}
}
}
// rebuild code from tokens
$content = '';
foreach ($newTokenTree as $node) {
$content .= is_scalar ($node) ? $node : $node[1];
}
return $content;
}
echo injectCodeAtFunctionsStart ($code, 'echo "new piece";');
I have a huge HTML code in a PHP variable like :
$html_code = '<div class="contianer" style="text-align:center;">The Sameple text.</div><br><span>Another sample text.</span>....';
I want to display only first 500 characters of this code. This character count must consider the text in HTML tags and should exclude HTMl tags and attributes while measuring the length.
but while triming the code, it should not affect DOM structure of HTML code.
Is there any tuorial or working examples available?
If its the text you want, you can do this with the following too
substr(strip_tags($html_code),0,500);
Ooohh... I know this I can't get it exactly off the top of my head but you want to load the text you've got as a DOMDOCUMENT
http://www.php.net/manual/en/class.domdocument.php
then grab the text from the entire document node (as a DOMnode http://www.php.net/manual/en/class.domnode.php)
This won't be exactly right, but hopefully this will steer you onto the right track.
Try something like:
$html_code = '<div class="contianer" style="text-align:center;">The Sameple text.</div><br><span>Another sample text.</span>....';
$dom = new DOMDocument();
$dom->loadHTML($html_code);
$text_to_strip = $dom->textContent;
$stripped = mb_substr($text_to_strip,0,500);
echo "$stripped"; // The Sameple text.Another sample text.....
edit ok... that should work. just tested locally
edit2
Now that I understand you want to keep the tags, but limit the text, lets see. You're going to want to loop the content until you get to 500 characters. This is probably going to take a few edits and passes for me to get right, but hopefully I can help. (sorry I can't give undivided attention)
First case is when the text is less than 500 characters. Nothing to worry about. Starting with the above code we can do the following.
if (strlen($stripped) > 500) {
// this is where we do our work.
$characters_so_far = 0;
foreach ($dom->child_nodes as $ChildNode) {
// should check if $ChildNode->hasChildNodes();
// probably put some of this stuff into a function
$characters_in_next_node += str_len($ChildNode->textcontent);
if ($characters_so_far+$characters_in_next_node > 500) {
// remove the node
// try using
// $ChildNode->parentNode->removeChild($ChildNode);
}
$characters_so_far += $characters_in_next_node
}
//
$final_out = $dom->saveHTML();
} else {
$final_out = $html_code;
}
i'm pasting below a php class i wrote a long time ago, but i know it works. its not exactly what you're after, as it deals with words instead of a character count, but i figure its pretty close and someone might find it useful.
class HtmlWordManipulator
{
var $stack = array();
function truncate($text, $num=50)
{
if (preg_match_all('/\s+/', $text, $junk) <= $num) return $text;
$text = preg_replace_callback('/(<\/?[^>]+\s+[^>]*>)/','_truncateProtect', $text);
$words = 0;
$out = array();
$text = str_replace('<',' <',str_replace('>','> ',$text));
$toks = preg_split('/\s+/', $text);
foreach ($toks as $tok)
{
if (preg_match_all('/<(\/?[^\x01>]+)([^>]*)>/',$tok,$matches,PREG_SET_ORDER))
foreach ($matches as $tag) $this->_recordTag($tag[1], $tag[2]);
$out[] = trim($tok);
if (! preg_match('/^(<[^>]+>)+$/', $tok))
{
if (!strpos($tok,'=') && !strpos($tok,'<') && strlen(trim(strip_tags($tok))) > 0)
{
++$words;
}
else
{
/*
echo '<hr />';
echo htmlentities('failed: '.$tok).'<br /)>';
echo htmlentities('has equals: '.strpos($tok,'=')).'<br />';
echo htmlentities('has greater than: '.strpos($tok,'<')).'<br />';
echo htmlentities('strip tags: '.strip_tags($tok)).'<br />';
echo str_word_count($text);
*/
}
}
if ($words > $num) break;
}
$truncate = $this->_truncateRestore(implode(' ', $out));
return $truncate;
}
function restoreTags($text)
{
foreach ($this->stack as $tag) $text .= "</$tag>";
return $text;
}
private function _truncateProtect($match)
{
return preg_replace('/\s/', "\x01", $match[0]);
}
private function _truncateRestore($strings)
{
return preg_replace('/\x01/', ' ', $strings);
}
private function _recordTag($tag, $args)
{
// XHTML
if (strlen($args) and $args[strlen($args) - 1] == '/') return;
else if ($tag[0] == '/')
{
$tag = substr($tag, 1);
for ($i=count($this->stack) -1; $i >= 0; $i--) {
if ($this->stack[$i] == $tag) {
array_splice($this->stack, $i, 1);
return;
}
}
return;
}
else if (in_array($tag, array('p', 'li', 'ul', 'ol', 'div', 'span', 'a')))
$this->stack[] = $tag;
else return;
}
}
truncate is what you want, and you pass it the html and the number of words you want it trimmed down to. it ignores html while counting words, but then rewraps everything in html, even closing trailing tags due to the truncation.
please don't judge me on the complete lack of oop principles. i was young and stupid.
edit:
so it turns out the usage is more like this:
$content = $manipulator->restoreTags($manipulator->truncate($myHtml,$numOfWords));
stupid design decision. allowed me to inject html inside the unclosed tags though.
I'm not up to coding a real solution, but if someone wants to, here's what I'd do (in pseudo-PHP):
$html_code = '<div class="contianer" style="text-align:center;">The Sameple text.</div><br><span>Another sample text.</span>....';
$aggregate = '';
$document = XMLParser($html_code);
foreach ($document->getElementsByTagName('*') as $element) {
$aggregate .= $element->text(); // This is the text, not HTML. It doesn't
// include the children, only the text
// directly in the tag.
}
What’s the best way to remove comments from a PHP file?
I want to do something similar to strip-whitespace() - but it shouldn't remove the line breaks as well.
For example,
I want this:
<?PHP
// something
if ($whatsit) {
do_something(); # we do something here
echo '<html>Some embedded HTML</html>';
}
/* another long
comment
*/
some_more_code();
?>
to become:
<?PHP
if ($whatsit) {
do_something();
echo '<html>Some embedded HTML</html>';
}
some_more_code();
?>
(Although if the empty lines remain where comments are removed, that wouldn't be OK.)
It may not be possible, because of the requirement to preserve embedded HTML - that’s what’s tripped up the things that have come up on Google.
I'd use tokenizer. Here's my solution. It should work on both PHP 4 and 5:
$fileStr = file_get_contents('path/to/file');
$newStr = '';
$commentTokens = array(T_COMMENT);
if (defined('T_DOC_COMMENT')) {
$commentTokens[] = T_DOC_COMMENT; // PHP 5
}
if (defined('T_ML_COMMENT')) {
$commentTokens[] = T_ML_COMMENT; // PHP 4
}
$tokens = token_get_all($fileStr);
foreach ($tokens as $token) {
if (is_array($token)) {
if (in_array($token[0], $commentTokens)) {
continue;
}
$token = $token[1];
}
$newStr .= $token;
}
echo $newStr;
Use php -w <sourcefile> to generate a file stripped of comments and whitespace, and then use a beautifier like PHP_Beautifier to reformat for readability.
$fileStr = file_get_contents('file.php');
foreach (token_get_all($fileStr) as $token ) {
if ($token[0] != T_COMMENT) {
continue;
}
$fileStr = str_replace($token[1], '', $fileStr);
}
echo $fileStr;
Here's the function posted above, modified to recursively remove all comments from all PHP files within a directory and all its subdirectories:
function rmcomments($id) {
if (file_exists($id)) {
if (is_dir($id)) {
$handle = opendir($id);
while($file = readdir($handle)) {
if (($file != ".") && ($file != "..")) {
rmcomments($id . "/" . $file); }}
closedir($handle); }
else if ((is_file($id)) && (end(explode('.', $id)) == "php")) {
if (!is_writable($id)) { chmod($id, 0777); }
if (is_writable($id)) {
$fileStr = file_get_contents($id);
$newStr = '';
$commentTokens = array(T_COMMENT);
if (defined('T_DOC_COMMENT')) { $commentTokens[] = T_DOC_COMMENT; }
if (defined('T_ML_COMMENT')) { $commentTokens[] = T_ML_COMMENT; }
$tokens = token_get_all($fileStr);
foreach ($tokens as $token) {
if (is_array($token)) {
if (in_array($token[0], $commentTokens)) { continue; }
$token = $token[1]; }
$newStr .= $token; }
if (!file_put_contents($id, $newStr)) {
$open = fopen($id, "w");
fwrite($open, $newStr);
fclose($open);
}
}
}
}
}
rmcomments("path/to/directory");
A more powerful version: remove all comments in the folder
<?php
$di = new RecursiveDirectoryIterator(__DIR__, RecursiveDirectoryIterator::SKIP_DOTS);
$it = new RecursiveIteratorIterator($di);
$fileArr = [];
foreach($it as $file) {
if(pathinfo($file, PATHINFO_EXTENSION) == "php") {
ob_start();
echo $file;
$file = ob_get_clean();
$fileArr[] = $file;
}
}
$arr = [T_COMMENT, T_DOC_COMMENT];
$count = count($fileArr);
for($i=1; $i < $count; $i++) {
$fileStr = file_get_contents($fileArr[$i]);
foreach(token_get_all($fileStr) as $token) {
if(in_array($token[0], $arr)) {
$fileStr = str_replace($token[1], '', $fileStr);
}
}
file_put_contents($fileArr[$i], $fileStr);
}
/*
* T_ML_COMMENT does not exist in PHP 5.
* The following three lines define it in order to
* preserve backwards compatibility.
*
* The next two lines define the PHP 5 only T_DOC_COMMENT,
* which we will mask as T_ML_COMMENT for PHP 4.
*/
if (! defined('T_ML_COMMENT')) {
define('T_ML_COMMENT', T_COMMENT);
} else {
define('T_DOC_COMMENT', T_ML_COMMENT);
}
/*
* Remove all comment in $file
*/
function remove_comment($file) {
$comment_token = array(T_COMMENT, T_ML_COMMENT, T_DOC_COMMENT);
$input = file_get_contents($file);
$tokens = token_get_all($input);
$output = '';
foreach ($tokens as $token) {
if (is_string($token)) {
$output .= $token;
} else {
list($id, $text) = $token;
if (in_array($id, $comment_token)) {
$output .= $text;
}
}
}
file_put_contents($file, $output);
}
/*
* Glob recursive
* #return ['dir/filename', ...]
*/
function glob_recursive($pattern, $flags = 0) {
$file_list = glob($pattern, $flags);
$sub_dir = glob(dirname($pattern) . '/*', GLOB_ONLYDIR);
// If sub directory exist
if (count($sub_dir) > 0) {
$file_list = array_merge(
glob_recursive(dirname($pattern) . '/*/' . basename($pattern), $flags),
$file_list
);
}
return $file_list;
}
// Remove all comment of '*.php', include sub directory
foreach (glob_recursive('*.php') as $file) {
remove_comment($file);
}
If you already use an editor like UltraEdit, you can open one or multiple PHP file(s) and then use a simple Find&Replace (Ctrl + R) with the following Perl regular expression:
(?s)/\*.*\*/
Beware the above regular expression also removes comments inside a string, i.e., in echo "hello/*babe*/"; the /*babe*/ would be removed too. Hence, it could be a solution if you have few files to remove comments from. In order to be absolutely sure it does not wrongly replace something that is not a comment, you would have to run the Find&Replace command and approve each time what is getting replaced.
Bash solution: If you want to remove recursively comments from all PHP files starting from the current directory, you can write this one-liner in the terminal. (It uses temp1 file to store PHP content for processing.)
Note that this will strip all white spaces with comments.
find . -type f -name '*.php' | while read VAR; do php -wq $VAR > temp1 ; cat temp1 > $VAR; done
Then you should remove temp1 file after.
If PHP_BEAUTIFER is installed then you can get nicely formatted code without comments with
find . -type f -name '*.php' | while read VAR; do php -wq $VAR > temp1; php_beautifier temp1 > temp2; cat temp2 > $VAR; done;
Then remove two files (temp1 and temp2).
Following upon the accepted answer, I needed to preserve the line numbers of the file too, so here is a variation of the accepted answer:
/**
* Removes the php comments from the given valid php string, and returns the result.
*
* Note: a valid php string must start with <?php.
*
* If the preserveWhiteSpace option is true, it will replace the comments with some whitespaces, so that
* the line numbers are preserved.
*
*
* #param string $str
* #param bool $preserveWhiteSpace
* #return string
*/
function removePhpComments(string $str, bool $preserveWhiteSpace = true): string
{
$commentTokens = [
\T_COMMENT,
\T_DOC_COMMENT,
];
$tokens = token_get_all($str);
if (true === $preserveWhiteSpace) {
$lines = explode(PHP_EOL, $str);
}
$s = '';
foreach ($tokens as $token) {
if (is_array($token)) {
if (in_array($token[0], $commentTokens)) {
if (true === $preserveWhiteSpace) {
$comment = $token[1];
$lineNb = $token[2];
$firstLine = $lines[$lineNb - 1];
$p = explode(PHP_EOL, $comment);
$nbLineComments = count($p);
if ($nbLineComments < 1) {
$nbLineComments = 1;
}
$firstCommentLine = array_shift($p);
$isStandAlone = (trim($firstLine) === trim($firstCommentLine));
if (false === $isStandAlone) {
if (2 === $nbLineComments) {
$s .= PHP_EOL;
}
continue; // Just remove inline comments
}
// Stand-alone case
$s .= str_repeat(PHP_EOL, $nbLineComments - 1);
}
continue;
}
$token = $token[1];
}
$s .= $token;
}
return $s;
}
Note: this is for PHP 7+ (I didn't care about backward compatibility with older PHP versions).
For Ajax and JSON responses, I use the following PHP code, to remove comments from HTML/JavaScript code, so it would be smaller (about 15% gain for my code).
// Replace doubled spaces with single ones (ignored in HTML any way)
$html = preg_replace('#(\s){2,}#', '\1', $html);
// Remove single and multiline comments, tabs and newline chars
$html = preg_replace(
'#(/\*([^*]|[\r\n]|(\*+([^*/]|[\r\n])))*\*+/)|((?<!:)//.*)|[\t\r\n]#i',
'',
$html
);
It is short and effective, but it can produce unexpected results, if your code has bad syntax.
Run the command php --strip file.php in a command prompt (for example., cmd.exe), and then browse to WriteCodeOnline.
Here, file.php is your own file.
In 2019 it could work like this:
<?php
/* hi there !!!
here are the comments */
//another try
echo removecomments('index.php');
/* hi there !!!
here are the comments */
//another try
function removecomments($f){
$w=Array(';','{','}');
$ts = token_get_all(php_strip_whitespace($f));
$s='';
foreach($ts as $t){
if(is_array($t)){
$s .=$t[1];
}else{
$s .=$t;
if( in_array($t,$w) ) $s.=chr(13).chr(10);
}
}
return $s;
}
?>
If you want to see the results, just let's run it first in XAMPP, and then you get a blank page, but if you right click and click on view source, you get your PHP script ... it's loading itself and it's removing all comments and also tabs.
I prefer this solution too, because I use it to speed up my framework one file engine "m.php" and after php_strip_whitespace, all source without this script I observe is slowest: I did 10 benchmarks, and then I calculate the math average (I think PHP 7 is restoring back the missing cr_lf's when it is parsing or it is taking a while when these are missing).
php -w or php_strip_whitespace($filename);
documentation
The catch is that a less robust matching algorithm (simple regex, for instance) will start stripping here when it clearly shouldn't:
if (preg_match('#^/*' . $this->index . '#', $this->permalink_structure)) {
It might not affect your code, but eventually someone will get bit by your script. So you will have to use a utility that understands more of the language than you might otherwise expect.