Easiest way to remove all whitespace from a code file? - php

I'm participating in one of the Code Golf competitions where the smaller your file size is, the better.
Rather than manually removing all whitespace, etc., I'm looking for a program or website which will take a file, remove all whitespace (including new lines) and return a compact version of the file. Any ideas?

You could use:
sed 's/\s\s+/ /g' youfile > yourpackedfile`
There is also this online tool.
You can even do it in PHP (how marvelous is life):
$data = file_get_contents('foobar.php');
$data = preg_replace('/\s\s+/', ' ', $data);
file_put_contents('foobar2.php', $data);
You have to note this won't take care of a string variable like $bar = ' asd aa a'; it might be a problem depending on what you are doing. The online tool seems to handle this properly.

$ tr -d ' \n' <oldfile >newfile

In PowerShell (v2) this can be done with the following little snippet:
(-join(gc my_file))-replace"\s"
or longer:
(-join (Get-Content my_file)) -replace "\s"
It will join all lines together and remove all spaces and tabs.
However, for some languages you probably don't want to do that. In PowerShell for example you don't need semicolons unless you put multiple statements on a single line so code like
while (1) {
"Hello World"
$x++
}
would become
while(1){"HelloWorld"$x++}
when applying aforementioned statements naïvely. It both changed the meaning and the syntactical correctness of the program. Probably not too much to look out for in numerical golfed solutions but the issue with lines joined together still remains, sadly. Just putting a semicolon between each line doesn't actually help either.

This is a PHP function that will do the work for you:
function compress_php_src($src) {
// Whitespaces left and right from this signs can be ignored
static $IW = array(
T_CONCAT_EQUAL, // .=
T_DOUBLE_ARROW, // =>
T_BOOLEAN_AND, // &&
T_BOOLEAN_OR, // ||
T_IS_EQUAL, // ==
T_IS_NOT_EQUAL, // != or <>
T_IS_SMALLER_OR_EQUAL, // <=
T_IS_GREATER_OR_EQUAL, // >=
T_INC, // ++
T_DEC, // --
T_PLUS_EQUAL, // +=
T_MINUS_EQUAL, // -=
T_MUL_EQUAL, // *=
T_DIV_EQUAL, // /=
T_IS_IDENTICAL, // ===
T_IS_NOT_IDENTICAL, // !==
T_DOUBLE_COLON, // ::
T_PAAMAYIM_NEKUDOTAYIM, // ::
T_OBJECT_OPERATOR, // ->
T_DOLLAR_OPEN_CURLY_BRACES, // ${
T_AND_EQUAL, // &=
T_MOD_EQUAL, // %=
T_XOR_EQUAL, // ^=
T_OR_EQUAL, // |=
T_SL, // <<
T_SR, // >>
T_SL_EQUAL, // <<=
T_SR_EQUAL, // >>=
);
if(is_file($src)) {
if(!$src = file_get_contents($src)) {
return false;
}
}
$tokens = token_get_all($src);
$new = "";
$c = sizeof($tokens);
$iw = false; // Ignore whitespace
$ih = false; // In HEREDOC
$ls = ""; // Last sign
$ot = null; // Open tag
for($i = 0; $i < $c; $i++) {
$token = $tokens[$i];
if(is_array($token)) {
list($tn, $ts) = $token; // tokens: number, string, line
$tname = token_name($tn);
if($tn == T_INLINE_HTML) {
$new .= $ts;
$iw = false;
}
else {
if($tn == T_OPEN_TAG) {
if(strpos($ts, " ") || strpos($ts, "\n") || strpos($ts, "\t") || strpos($ts, "\r")) {
$ts = rtrim($ts);
}
$ts .= " ";
$new .= $ts;
$ot = T_OPEN_TAG;
$iw = true;
} elseif($tn == T_OPEN_TAG_WITH_ECHO) {
$new .= $ts;
$ot = T_OPEN_TAG_WITH_ECHO;
$iw = true;
} elseif($tn == T_CLOSE_TAG) {
if($ot == T_OPEN_TAG_WITH_ECHO) {
$new = rtrim($new, "; ");
} else {
$ts = " ".$ts;
}
$new .= $ts;
$ot = null;
$iw = false;
} elseif(in_array($tn, $IW)) {
$new .= $ts;
$iw = true;
} elseif($tn == T_CONSTANT_ENCAPSED_STRING
|| $tn == T_ENCAPSED_AND_WHITESPACE)
{
if($ts[0] == '"') {
$ts = addcslashes($ts, "\n\t\r");
}
$new .= $ts;
$iw = true;
} elseif($tn == T_WHITESPACE) {
$nt = #$tokens[$i+1];
if(!$iw && (!is_string($nt) || $nt == '$') && !in_array($nt[0], $IW)) {
$new .= " ";
}
$iw = false;
} elseif($tn == T_START_HEREDOC) {
$new .= "<<<S\n";
$iw = false;
$ih = true; // in HEREDOC
} elseif($tn == T_END_HEREDOC) {
$new .= "S;";
$iw = true;
$ih = false; // in HEREDOC
for($j = $i+1; $j < $c; $j++) {
if(is_string($tokens[$j]) && $tokens[$j] == ";") {
$i = $j;
break;
} else if($tokens[$j][0] == T_CLOSE_TAG) {
break;
}
}
} elseif($tn == T_COMMENT || $tn == T_DOC_COMMENT) {
$iw = true;
} else {
if(!$ih) {
$ts = strtolower($ts);
}
$new .= $ts;
$iw = false;
}
}
$ls = "";
}
else {
if(($token != ";" && $token != ":") || $ls != $token) {
$new .= $token;
$ls = $token;
}
$iw = true;
}
}
return $new;
}
// This is an example
$src = file_get_contents('foobar.php');
file_put_contents('foobar3.php',compress_php_src($src));

If your code editor programs supports regular expressions, you can try this:
Find this: [\r\n]{2,}
Replace with this: \n
Then Replace All

Notepad++ is quite a nice editor if you are on Windows, and it has a lot of predefined macros, trimming down code and removing whitespace among them.
It can do regular expressions and has a plethora of features to help the code hacker or script kiddie.
Notepad++ website

Run php -w on it!
php -w myfile.php
Unlike a regular expression, this is smart enough to leave strings alone, and it removes comments too.

Related

How does this code "know how to escape special characters"?

Hello Stack Overflow Experts. I found this post and the author states, "... function knows how to escape special characters". I am still learning and I'm having difficulty seeing where the actual "handling" of special characters takes place and how they are "handled." Could someone point me in the right direction? Thank you. Here is the listed code:
$dataArray = csvstring_to_array( file_get_contents('Address.csv'));
function csvstring_to_array($string, $separatorChar = ',', $enclosureChar = '"', $newlineChar = "\n") {
// #author: Klemen Nagode
$array = array();
$size = strlen($string);
$columnIndex = 0;
$rowIndex = 0;
$fieldValue="";
$isEnclosured = false;
for($i=0; $i<$size;$i++) {
$char = $string{$i};
$addChar = "";
if($isEnclosured) {
if($char==$enclosureChar) {
if($i+1<$size && $string{$i+1}==$enclosureChar){
// escaped char
$addChar=$char;
$i++; // dont check next char
}else{
$isEnclosured = false;
}
}else {
$addChar=$char;
}
}else {
if($char==$enclosureChar) {
$isEnclosured = true;
}else {
if($char==$separatorChar) {
$array[$rowIndex][$columnIndex] = $fieldValue;
$fieldValue="";
$columnIndex++;
}elseif($char==$newlineChar) {
echo $char;
$array[$rowIndex][$columnIndex] = $fieldValue;
$fieldValue="";
$columnIndex=0;
$rowIndex++;
}else {
$addChar=$char;
}
}
}
if($addChar!=""){
$fieldValue.=$addChar;
}
}
if($fieldValue) { // save last field
$array[$rowIndex][$columnIndex] = $fieldValue;
}
return $array;
}

Shortening an IPv6 address ending in 0's via PHP

I'm using an IPv6 class found on GitHub to do some IP manipulation but I noticed that there is an issue with shortening certain address, typically ending in 0.
When I enter the address 2001::6dcd:8c74:0:0:0:0, it results in 2001::6dcd:8c74::::.
$address = '2001::6dcd:8c74:0:0:0:0';
// Check to see if address is already compacted
if (strpos($address, '::') === FALSE) {
$parts = explode(':', $address);
$new_parts = array();
$ignore = FALSE;
$done = FALSE;
for ($i = 0; $i < count($parts); $i++) {
if (intval(hexdec($parts[$i])) === 0 && $ignore == FALSE && $done == FALSE) {
$ignore = TRUE;
$new_parts[] = '';
if ($i == 0) {
$new_parts = '';
}
} else if (intval(hexdec($parts[$i])) === 0 && $ignore == TRUE && $done == FALSE) {
continue;
} else if (intval(hexdec($parts[$i])) !== 0 && $ignore == TRUE) {
$done = TRUE;
$ignore = FALSE;
$new_parts[] = $parts[$i];
} else {
$new_parts[] = $parts[$i];
}
}
// Glue everything back together
$address = implode(':', $new_parts);
}
// Remove the leading 0's
$new_address = preg_replace("/:0{1,3}/", ":", $address);
// $this->compact = $new_address;
// return $this->compact;
echo $new_address; // Outputs: 2001::6dcd:8c74::::
To compress ipv6 addresses, use php functions inet_ntop and inet_pton to do the formatting, by converting the ipv6 to binary and back.
Sample ipv6 - 2001::6dcd:8c74:0:0:0:0
Test usage:
echo inet_ntop(inet_pton('2001::6dcd:8c74:0:0:0:0'));
Output : 2001:0:6dcd:8c74::
Without that problem line at the bottom you get 2001::6dcd:8c74:0:0:0:0.
Now, before the leading 0's are all replaced, the function checks to see if the address ends in :0 before removing all leading 0's.
if (substr($address, -2) != ':0') {
$new_address = preg_replace("/:0{1,3}/", ":", $address);
} else {
$new_address = $address;
}
Another check is added to catch other possible valid IPv6 addresses from being malformed.
if (isset($new_parts)) {
if (count($new_parts) < 8 && array_pop($new_parts) == '') {
$new_address .= ':0';
}
}
The new full function looks like this:
// Check to see if address is already compacted
if (strpos($address, '::') === FALSE) {
$parts = explode(':', $address);
$new_parts = array();
$ignore = FALSE;
$done = FALSE;
for ($i = 0; $i < count($parts); $i++) {
if (intval(hexdec($parts[$i])) === 0 && $ignore == FALSE && $done == FALSE) {
$ignore = TRUE;
$new_parts[] = '';
if ($i == 0) {
$new_parts = '';
}
} else if (intval(hexdec($parts[$i])) === 0 && $ignore == TRUE && $done == FALSE) {
continue;
} else if (intval(hexdec($parts[$i])) !== 0 && $ignore == TRUE) {
$done = TRUE;
$ignore = FALSE;
$new_parts[] = $parts[$i];
} else {
$new_parts[] = $parts[$i];
}
}
// Glue everything back together
$address = implode(':', $new_parts);
}
// Check to see if this ends in a shortened :0 before replacing all
// leading 0's
if (substr($address, -2) != ':0') {
// Remove the leading 0's
$new_address = preg_replace("/:0{1,3}/", ":", $address);
} else {
$new_address = $address;
}
// Since new_parts isn't always set, check to see if it's set before
// trying to fix possibly broken shortened addresses ending in 0.
// (Ex: Trying to shorten 2001:19f0::0 will result in unset array)
if (isset($new_parts)) {
// Some addresses (Ex: starting addresses for a range) can end in
// all 0's resulting in the last value in the new parts array to be
// an empty string. Catch that case here and add the remaining :0
// for a complete shortened address.
if (count($new_parts) < 8 && array_pop($new_parts) == '') {
$new_address .= ':0';
}
}
// $this->compact = $new_address;
// return $this->compact;
echo $new_address; // Outputs: 2001::6dcd:8c74:0:0:0:0
It's not the cleanest solution and could possibly have holes in its logic depending on what the address is. If I find any other issues with it I will update this question/answer.

PHP4 (legacy servers) XML=>JSON converion, a la simplexml_file_load() or simpleXMLelement

I'm developing on a server using an old version of PHP (4.3.9), and I'm trying to convert an XML string into a JSON string. This is easy in PHP5, but a lot tougher with PHP4.
I've tried:
Zend JSON.php
require_once 'Zend/Json.php';
echo Zend_Json::encode($sxml);
Error:
PHP Parse error: parse error, unexpected T_CONST, expecting T_OLD_FUNCTION or T_FUNCTION or T_VAR or '}'
simplexml44
$impl = new IsterXmlSimpleXMLImpl;
$NDFDxmltemp = $impl->load_file($NDFDstring);
$NDFDxml = $NDFDxmltemp->asXML();
Error:
WARNING isterxmlexpatnonvalid->parse(): nothing to read
xml_parse
$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "_start_element", "_end_element");
xml_set_character_data_handler($xml_parser, "_character_data");
xml_parse($xml_parser, $NDFDstring);
Error:
PHP Warning: xml_parse(): Unable to call handler _character_data() in ...
PHP Warning: xml_parse(): Unable to call handler _end_element() in ...
Does anyone have any other alternatives to simplexml_file_load() and new simpleXMLelement in PHP4?
Upgrading PHP is not an option in this particular case, so do not bother bringing it up. Yes, I know its old.
NOTE: This is the XML I'm trying to parse into a multidimensional array OR json.
http://graphical.weather.gov/xml/sample_products/browser_interface/ndfdXMLclient.php?lat=40&lon=-120&product=time-series&begin=2013-10-30T00:00:00&end=2013-11-06T00:00:00&maxt=maxt&mint=mint&rh=rh&wx=wx&wspd=wspd&wdir=wdir&icons=icons&wgust=wgust&pop12=pop12&maxrh=maxrh&minrh=minrh&qpf=qpf&snow=snow&temp=temp&wwa=wwa
XML are quite easy to parse by yourself, anyway here is exists xml_parse in php 4 and domxml_open_file
and here is json for php4 and another one
according parsing xml:
if you know the structure of xml file, you can go even with RegExp, as XML is strict format (I mean all tags must be closed, all attributes in quotes, all special symbols always escaped)
if you parse arbitrary xml file, here is student sample, which works with php4, do not understand all XML features, but can give you "brute-force" like idea:
<?php
define("LWG_XML_ELEMENT_NULL", "0");
define("LWG_XML_ELEMENT_NODE", "1");
function EntitiesToString($str)
{
$s = $str;
$s = eregi_replace(""", "\"", $s);
$s = eregi_replace("<", "<", $s);
$s = eregi_replace(">", ">", $s);
$s = eregi_replace("&", "&", $s);
return $s;
}
class CLWG_dom_attribute
{
var $name;
var $value;
function CLWG_dom_attribute()
{
$name = "";
$value = "";
}
}
class CLWG_dom_node
{
var $m_Attributes;
var $m_Childs;
var $m_nAttributesCount;
var $m_nChildsCount;
var $type;
var $tagname;
var $content;
function CLWG_dom_node()
{
$this->m_Attributes = array();
$this->m_Childs = array();
$this->m_nAttributesCount = 0;
$this->m_nChildsCount = 0;
$this->type = LWG_XML_ELEMENT_NULL;
$this->tagname = "";
$this->content = "";
}
function get_attribute($attr_name)
{
//echo "<message>Get Attribute: ".$attr_name." ";
for ($i=0; $i<sizeof($this->m_Attributes); $i++)
if ($this->m_Attributes[$i]->name == $attr_name)
{
//echo $this->m_Attributes[$i]->value . "</message>\n";
return $this->m_Attributes[$i]->value;
}
//echo "[empty]</message>\n";
return "";
}
function get_content()
{
//echo "<message>Get Content: ".$this->content . "</message>\n";
return $this->content;
}
function attributes()
{
return $this->m_Attributes;
}
function child_nodes()
{
return $this->m_Childs;
}
function loadXML($str, &$i)
{
//echo "<debug>DEBUG: LoadXML (".$i.": ".$str[$i].")</debug>\n";
$str_len = strlen($str);
//echo "<debug>DEBUG: start searching for tag (".$i.": ".$str[$i].")</debug>\n";
while ( ($i<$str_len) && ($str[$i] != "<") )
$i++;
if ($i == $str_len) return FALSE;
$i++;
while ( ($i<strlen($str)) && ($str[$i] != " ") && ($str[$i] != "/") && ($str[$i] != ">") )
$this->tagname .= $str[$i++];
//echo "<debug>DEBUG: Tag: " . $this->tagname . "</debug>\n";
if ($i == $str_len) return FALSE;
switch ($str[$i])
{
case " ": // attributes comming
{
//echo "<debug>DEBUG: Tag: start searching attributes</debug>\n";
$i++;
$cnt = sizeof($this->m_Attributes);
while ( ($i<strlen($str)) && ($str[$i] != "/") && ($str[$i] != ">") )
{
$this->m_Attributes[$cnt] = new CLWG_dom_attribute;
while ( ($i<strlen($str)) && ($str[$i] != "=") )
$this->m_Attributes[$cnt]->name .= $str[$i++];
if ($i == $str_len) return FALSE;
$i++;
while ( ($i<strlen($str)) && ($str[$i] != "\"") )
$i++;
if ($i == $str_len) return FALSE;
$i++;
while ( ($i<strlen($str)) && ($str[$i] != "\"") )
$this->m_Attributes[$cnt]->value .= $str[$i++];
$this->m_Attributes[$cnt]->value = EntitiesToString($this->m_Attributes[$cnt]->value);
//echo "<debug>DEBUG: Tag: Attribute: '".$this->m_Attributes[$cnt]->name."' = '".$this->m_Attributes[$cnt]->value."'</debug>\n";
if ($i == $str_len) return FALSE;
$i++;
if ($i == $str_len) return FALSE;
while ( ($i<strlen($str)) && ($str[$i] == " ") )
$i++;
$cnt++;
}
if ($i == $str_len) return FALSE;
switch ($str[$i])
{
case "/":
{
//echo "<debug>DEBUG: self closing tag with attributes (".$this->tagname.")</debug>\n";
$i++;
if ($i == $str_len) return FALSE;
if ($str[$i] != ">") return FALSE;
$i++;
return TRUE;
break;
}
case ">";
{
//echo "<debug>DEBUG: end of attributes (".$this->tagname.")</debug>\n";
$i++;
break;
}
}
break;
}
case "/": // self closing tag
{
//echo "<debug>DEBUG: self closing tag (".$this->tagname.")</debug>\n";
$i++;
if ($i == $str_len) return FALSE;
if ($str[$i] != ">") return FALSE;
$i++;
return TRUE;
break;
}
case ">": // end of begin of node
{
//echo "<debug>DEBUG: end of begin of node</debug>\n";
$i++;
break;
}
}
if ($i == $str_len) return FALSE;
$b = 1;
while ( ($i<$str_len) && ($b) )
{
//echo "<debug>DEBUG: searching for content</debug>\n";
while ( ($i<strlen($str)) && ($str[$i] != "<") )
$this->content .= $str[$i++];
//echo "<debug>DEBUG: content: ".$this->content."</debug>\n";
if ($i == $str_len) return FALSE;
$i++;
if ($i == $str_len) return FALSE;
if ($str[$i] != "/") // new child
{
$cnt = sizeof($this->m_Childs);
//echo "<debug>DEBUG: Create new child (" . $cnt . ")</debug>\n";
$this->m_Childs[$cnt] = new CLWG_dom_node;
$this->m_Childs[$cnt]->type = LWG_XML_ELEMENT_NODE;
$i--;
if ($this->m_Childs[$cnt]->loadXML($str, $i) === FALSE)
return FALSE;
}
else
$b = 0;
}
$i++;
$close_tag = "";
while ( ($i<strlen($str)) && ($str[$i] != ">") )
$close_tag .= $str[$i++];
//echo "<debug>DEBUG: close tag: ".$close_tag." - ".$this->tagname."</debug>\n";
if ($i == $str_len) return FALSE;
$i++;
$this->content = EntitiesToString($this->content);
//echo "<debug>DEBUG: content: ".$this->content."</debug>\n";
return ($close_tag == $this->tagname);
}
}
class CLWG_dom_xml
{
var $m_Root;
function CLWG_dom_xml()
{
$this->m_Root = 0;
}
function document_element()
{
return $this->m_Root;
}
function loadXML($xml_string)
{
// check xml tag
if (eregi("<\\?xml", $xml_string))
{
// check xml version
$xml_version = array();
if ( (eregi("<\\?xml version=\"([0-9\\.]+)\".*\\?>", $xml_string, $xml_version)) && ($xml_version[1] == 1.0) )
{
// initialize root
$this->m_Root = new CLWG_dom_node;
$i = 0;
return $this->m_Root->loadXML(eregi_replace("<\\?xml.*\\?>", "", $xml_string), $i);
}
else
{
echo "<error>Cannot find version attribute in xml tag</error>";
return FALSE;
}
}
else
{
echo "<error>Cannot find xml tag</error>";
return FALSE;
}
}
}
function lwg_domxml_open_mem($xml_string)
{
global $lwg_xml;
$lwg_xml = new CLWG_dom_xml;
if ($lwg_xml->loadXML($xml_string))
return $lwg_xml;
else
return 0;
}
?>
PHP4 has no support for ArrayAccess nor does it have the needed __get(), __set() and __toString() magic methods which effectively prevents you from creating an object mimicking that feature of SimpleXMLElement.
Also I'd say creating an in-memory structure yourself which is done by Simplexml is not going to work out well with PHP 4 because of it's limitation of the OOP-object-model and garbage collection. Especially as I would prefer something like the Flyweight pattern here.
Which brings me to the point that you're probably more looking for an event or pull-based XML parser.
Take a look at the XML Parser Functions which are the PHP 4 way to parse XML with PHP. You find it well explained in years old PHP training materials also with examples and what not.

Php function UTF-8 characters issue

Here is my function that makes the first character of the first word of a sentence uppercase:
function sentenceCase($str)
{
$cap = true;
$ret = '';
for ($x = 0; $x < strlen($str); $x++) {
$letter = substr($str, $x, 1);
if ($letter == "." || $letter == "!" || $letter == "?") {
$cap = true;
} elseif ($letter != " " && $cap == true) {
$letter = strtoupper($letter);
$cap = false;
}
$ret .= $letter;
}
return $ret;
}
It converts "sample sentence" into "Sample sentence". The problem is, it doesn't capitalize UTF-8 characters. See this example.
What am I doing wrong?
The most straightforward way to make your code UTF-8 aware is to use mbstring functions instead of the plain dumb ones in the three cases where the latter appear:
function sentenceCase($str)
{
$cap = true;
$ret = '';
for ($x = 0; $x < mb_strlen($str); $x++) { // mb_strlen instead
$letter = mb_substr($str, $x, 1); // mb_substr instead
if ($letter == "." || $letter == "!" || $letter == "?") {
$cap = true;
} elseif ($letter != " " && $cap == true) {
$letter = mb_strtoupper($letter); // mb_strtoupper instead
$cap = false;
}
$ret .= $letter;
}
return $ret;
}
You can then configure mbstring to work with UTF-8 strings and you are ready to go:
mb_internal_encoding('UTF-8');
echo sentenceCase ("üias skdfnsknka");
Bonus solution
Specifically for UTF-8 you can also use a regular expression, which will result in less code:
$str = "üias skdfnsknka";
echo preg_replace_callback(
'/((?:^|[!.?])\s*)(\p{Ll})/u',
function($match) { return $match[1].mb_strtoupper($match[2], 'UTF-8'); },
$str);

php regular comment remove

how remove multi line comment ? ( /* comment php */) from file .php
how remove single comment ? ( // coment ) from file.php
how remove enter end line ?
sample :
single comment line
$G["url"] = "http://".$_SERVER["HTTP_HOST"]
multi comment line
/*
* 310
* - "::"
* - $col
*/
Try using the code posted below, it even ignores comment tokens on strings like " /* comment2 */ "
<?php
$fileIn = "src.php";
$fileOut = "srcCompressed.php";
$text = file_get_contents($fileIn);
$text = compress_php_src($text);
file_put_contents($fileOut, $text);
function compress_php_src($src) {
// Whitespaces left and right from this signs can be ignored
static $IW = array(
T_CONCAT_EQUAL, // .=
T_DOUBLE_ARROW, // =>
T_BOOLEAN_AND, // &&
T_BOOLEAN_OR, // ||
T_IS_EQUAL, // ==
T_IS_NOT_EQUAL, // != or <>
T_IS_SMALLER_OR_EQUAL, // <=
T_IS_GREATER_OR_EQUAL, // >=
T_INC, // ++
T_DEC, // --
T_PLUS_EQUAL, // +=
T_MINUS_EQUAL, // -=
T_MUL_EQUAL, // *=
T_DIV_EQUAL, // /=
T_IS_IDENTICAL, // ===
T_IS_NOT_IDENTICAL, // !==
T_DOUBLE_COLON, // ::
T_PAAMAYIM_NEKUDOTAYIM, // ::
T_OBJECT_OPERATOR, // ->
T_DOLLAR_OPEN_CURLY_BRACES, // ${
T_AND_EQUAL, // &=
T_MOD_EQUAL, // %=
T_XOR_EQUAL, // ^=
T_OR_EQUAL, // |=
T_SL, // <<
T_SR, // >>
T_SL_EQUAL, // <<=
T_SR_EQUAL, // >>=
);
if(is_file($src)) {
if(!$src = file_get_contents($src)) {
return false;
}
}
$tokens = token_get_all($src);
$new = "";
$c = sizeof($tokens);
$iw = false; // ignore whitespace
$ih = false; // in HEREDOC
$ls = ""; // last sign
$ot = null; // open tag
for($i = 0; $i < $c; $i++) {
$token = $tokens[$i];
if(is_array($token)) {
list($tn, $ts) = $token; // tokens: number, string, line
$tname = token_name($tn);
if($tn == T_INLINE_HTML) {
$new .= $ts;
$iw = false;
} else {
if($tn == T_OPEN_TAG) {
if(strpos($ts, " ") || strpos($ts, "\n") || strpos($ts, "\t") || strpos($ts, "\r")) {
$ts = rtrim($ts);
}
$ts .= " ";
$new .= $ts;
$ot = T_OPEN_TAG;
$iw = true;
} elseif($tn == T_OPEN_TAG_WITH_ECHO) {
$new .= $ts;
$ot = T_OPEN_TAG_WITH_ECHO;
$iw = true;
} elseif($tn == T_CLOSE_TAG) {
if($ot == T_OPEN_TAG_WITH_ECHO) {
$new = rtrim($new, "; ");
} else {
$ts = " ".$ts;
}
$new .= $ts;
$ot = null;
$iw = false;
} elseif(in_array($tn, $IW)) {
$new .= $ts;
$iw = true;
} elseif($tn == T_CONSTANT_ENCAPSED_STRING
|| $tn == T_ENCAPSED_AND_WHITESPACE)
{
if($ts[0] == '"') {
$ts = addcslashes($ts, "\n\t\r");
}
$new .= $ts;
$iw = true;
} elseif($tn == T_WHITESPACE) {
$nt = #$tokens[$i+1];
if(!$iw && (!is_string($nt) || $nt == '$') && !in_array($nt[0], $IW)) {
$new .= " ";
}
$iw = false;
} elseif($tn == T_START_HEREDOC) {
$new .= "<<<S\n";
$iw = false;
$ih = true; // in HEREDOC
} elseif($tn == T_END_HEREDOC) {
$new .= "S;";
$iw = true;
$ih = false; // in HEREDOC
for($j = $i+1; $j < $c; $j++) {
if(is_string($tokens[$j]) && $tokens[$j] == ";") {
$i = $j;
break;
} else if($tokens[$j][0] == T_CLOSE_TAG) {
break;
}
}
} elseif($tn == T_COMMENT || $tn == T_DOC_COMMENT) {
$iw = true;
} else {
if(!$ih) {
$ts = strtolower($ts);
}
$new .= $ts;
$iw = false;
}
}
$ls = "";
} else {
if(($token != ";" && $token != ":") || $ls != $token) {
$new .= $token;
$ls = $token;
}
$iw = true;
}
}
return $new;
}
?>
Credits to gelamu function compress_php_src().
There's no reason to remove comments from your code. The overhead from processing them is so incredibly minimal that it doesn't justify the effort.
I think you are probably looking for php_strip_whitespace().

Categories