Remove useless files from code base

Remove useless files from code base - php

Is there any tool out there which could tell the useless files in the code base?
We have a big code base (PHP, HTML, CSS, JS files) and I want to be able to remove the not needed files. Any help would be appreciated.

I'm guessing deleting files and running your phpunit tests is a none starter.
If your files are not already in a version-control system - add them. Having the files in a version control system (such as svn or git) is crucial to allow you to recover from deleting any files that you thought were not being used but you later find out were.
Then, you can delete anything you think may not be being used, and if it doesn't affect the running of your application you can conclude that the files aren't used. If adverse effects show up - you can restore them from your repository with ease.
The above is most appropriate (probably) for frontend files (css, js, images). Any files you delete that are requested will show up in your webserver error log giving you a quick reference for files that nolonger exist that you need to restore.
For your php files, that's quite a bit more tricky, How did you arrive at a position where you have php files which you aren't using? Anyway you could for example:
Use xdebug
Enable profiling
Use append mode (one profile)
Use all the functions of your application
and you would then have a profile which includes all files you loaded. Scanning the generated profile for each php file in your codebase will give you some indication of which files you didn't use.
If you are only looking for unused files, don't be tempted to use code coverage analysis - it is very intensive and not the level of detail you're asking for.
A slightly less risky way would be to log whenever a file is loaded. e.g. put this at line one of each file:
<?php file_put_contents('/some/location/fileaccess.log', __FILE__, FILE_APPEND); ?>
and simply leave your application to be used for a while (days, weeks). Thereafter just scan that log, for any file that is named - remove the above line of code. For any that are not - delete (preferably after looking for the filename in your whole sourcecode and confirming it's nowhere).
OR: you could use a shutdown function which dumps the response of get_included_files() to a log file. This would allow you to achieve the same without editing all php files in your source tree.
Caveat: Be careful deleting your php files. Whereas a missing css/js/image will probably mean your application still works, a missing php file of course will have rather more impact :).

If it is in Git why not delete the local file and then do a git rm <file name> to remove it from that branch.

Agree with everything said by #AD7six.
What you might like to try with PHP is to log the use of the files in someway (logging to flat file or database).
This technique does not have to be in place for long you can do it with an include and require_once at the top of each file.
That technique also works for javascript functions you can just print to the console each function, and then unit test your site. You can probably clean out a lot of redundant code that way.
The rest is not so easy, but version tracking is the way to go.

Related

How to track the file that loads an specific line from a rendered HTML

I got a website in Wordpress and recently we discovered that it was infected by several malware scripts that insert scripts using the common base64 and eval functions like this:
We were able to solve most of the infected files but there are still some scripts being injected into the index.html, like these:
All these scripts marked in red make a requests to sites that immediately trigger my computer antivirus.
So question here is, how can I track which file loads these lines? How can I know which file prints them? I can't just search for the string since the code is encrypted like on the first image...

The truth is, it's probably going to be more than one file, and/or it's going to be something hidden deep in a plugin/upload folder.
This is going to be a bit time-consuming, but these are generally the steps I follow when fixing a hacked site to narrow things down and make sure I got all the crap out:
1) Before you do anything else, make sure you have a backup of both the files and db. That way, if you accidentally delete something, it's easy to restore.
2) Delete any unused themes or plugins, and make sure all existing plugins are up-to-date.
3) Update WordPress to the current version. Seriously. Keeping up-to-date is important. If you're more than two major releases behind, you'll want to update incrementally. (https://codex.wordpress.org/Upgrading_WordPress_-_Extended_Instructions)
4) After you've updated, connect via FTP and look for files older than when you updated. Look for extra files that shouldn't be there--this can be tricky, because hacked files are usually named things like wp-shortcode-s.php. I usually have a copy of WP core files open in a window beside my FTP client as a reference.
5) Check the first few lines of code on php and js files in your plugins folder for malicious code. Again, you might want to have a freshly downloaded copy of the plugin to compare files to.
6) Check the uploads folder and subfolders for malicious files.
I also keep checking my hacked site here to see how I'm doing:
http://isithacked.com/
And when you're finished, you might want to read up on how to harden WP to make it more difficult to hack.

Depending on the source of the malware, it's hard to give you a precise hint. There are a few more in-depth walk-through about the topic you can find on Google, here are some good examples which could help:
https://www.wordfence.com/docs/how-to-clean-a-hacked-wordpress-site-using-wordfence/
https://blog.sucuri.net/2011/02/cleaning-up-an-infected-web-site-part-i-wordpress-and-the-pharma-hack.html
Also if you are on a shared host, potentially the issue could be coming from an other compromised user. Hopefully you have a clean version of the site so that potentially moving to an other host (and upgrading) is an option.

Potential Security Issues with PHP's ZipArchive

I want to allow members the option of uploading content using a zip file. Once uploaded, I want to use PHP's ZipArchive class to decompress the zip file contents to a directory, and then move the files into our system.
I'm concerned about the potential security risks though, and I can't find any documentation on php.net. The first (Well, the only) risk that comes to mind, is someone creating a zip file with relative paths like "../../etc/passwd" (If they assume I decompress the file in /tmp/somedir).
I'm actually having a hard time creating a relative path in a zip file, so I can't test if such a thing would be possible. I also can't find any way to extract the contents of the zip file using ZipArchive, and have it ignore directories (Decompress all the files, but don't create the directory structure inside the zip).
Can anyone tell me if such an exploit is possible, and/or how to ignore the directory structure in a zip file using ZipArchive?

Interesting question, but I urge you to go about this a different way. I would highly recommend you run your web process with least privileges in a chroot jail. Assuming you do that, the WORST thing that can happen is your website get's defaced, and then you restore a backup and do some forensics to plug that specific hole.
New holes are discovered constantly, you will have a very difficult time completely securing your website going after hunches like these. Minimizing the attacker's sandbox really goes a long way.

I had the same concerns and had a look at the PHP 5.3 source code where I found this:
/* Clean/normlize the path and then transform any path (absolute or relative)
to a path relative to cwd (../../mydir/foo.txt > mydir/foo.txt)
*/
virtual_file_ex(&new_state, file, NULL, CWD_EXPAND TSRMLS_CC);
path_cleaned = php_zip_make_relative_path(new_state.cwd, new_state.cwd_length);
if(!path_cleaned) {
return 0;
}
Looks fine to me. Checkout PHP and see ./ext/zip/php_zip.c for details.

You need to make sure that the extracted contents are not served directly by your application server. So if someone has a php file in his archive that he cant execute it via your webserver.
Another thing is you should keep things safe from being included in user generated content. But this should be considered also without having zip archives in place.

In the end I'm going with Pekka's solution, of using the command line unzip utility. It provides switches to ignore directories in the zip file. The concerns others have pointed out aren't an issue here. Once the files are unzipped, we add them to the system using the same process as our regular uploads, which means each file is scrutinized using the security measures we already have in place.

Is it possible to create a self-installing PHP framework?

Ok this might seems a bad idea or an obvious one. But let's imagine a CMS like PHPBB. And let's imagine you'd build one. I'd create just 1 file called PHPBB.install.php and running it it will create all folders and files needed with PHP. I mean, the user run it just once and every file and folder of the app is created via the PHP file.
Why to do this?
Well mostly because it's cleaner and you are pretty much sure it creates everything as you wish (obliviously checking everything about the server first). Also, having all the files backed-up inside a file you would be able to restore it very easily by deleting everything and reinstalling it running again PHPBB.install.php. Backing-up files like this will allow you to also prevent errors: How? When an error occurred in a file, this file is restored as it was and automatically re-run.
It would be too heavy!
The installation would happen only once and you'd be sure the user will not forget to place the files correctly. The error-preventing will worth the cause and it would also happen only once.
Now the questions:
Does this technique exists? If so, What's its name?
Why would you discourage it?

As others have said, an installer.
It requires the web server to have permission to write to the filesystem, and ends up having the files owned by the user the web server runs as. Even when one has the ability to change filesystem permissions, it's usually a longer process than just extracting an archive and having the initial setup verify permissions.

Does this technique exists? If so, What's its name?
I'd advise to read about __halt_compiler(). It allows you to mix PHP code with non-php data which is not parsed, so you may have PHP code ("installer") and binary data (e.g., compressed contents of all the files) in single PHP file.

1 - Yes, there is a single install file in PHPBB. You run through an online wizard defining your settings and then it installs automatically.
http://www.phpbb.com/support/documents.php?mode=install&version=3&sid=908f5766fc04868ccb985c1b1e6dee4b#quickinstall
2 - The only reason to discourage it would be if you want the user to understand exactly how the system works. Automatically installing it means the user has no need to understand the nitty gritty of it all - of course, many see this as a good thing.

Mapping PHP script and file dependency structure

I have recently become an intern on a startup online classroom system. So now, I'm scrambling to learn the system, and get to know the code for the program, which is written in PHP. This program spans around 3000 PHP files and associated images, html pages, CSS files and so forth, across over a hundred folders.
I was wondering if there was some program or utility that could parse the files and directories and create a map of sorts, showing which PHP files include which other files, so that I could see quickly which files and scripts are no longer in use or obsolete, and which files depend on other files, and so forth. In other words, I can see the file and directory structure. I would now like to see the dependency structure, in terms of includes. Without having to open each file individually and track down the includes statements.
Any help would be appreciated!

It's not exactly what you want, but the "inclued" PECL extension is almost certainly going to help you. It works on a per-request basis, and maps out the file inclusion chain. It can even make pretty graphs!
Because it works on a request basis, unfortunately it can't map out your entire codebase for you.

How to self-update PHP+MySQL CMS?

I'm writing a CMS on PHP+MySQL. I want it to be self-updatable (throw one click in admin panel). What are the best practices?
How to compare current version of cms and a version of the update (application itself and database). Should it just download zip archive, upzip it and overwrite files? (but what to do with files that are no longer used). How to check if an update is downloaded correctly? Also it supports modules and I want this modules to be downloadable from the admin panel of cms.
And how should I update MySQL tables?

Keep your code in a separate location from configuration and otherwise variable files (uploaded images, cache files, etc.)
Keep the modules separate from the main code as well.
Make sure your code has file system permissions to change itself (use SuPHP for example).
If you do these, simplest would be to completely download the new version (no incremental patches), and unzip it to a directory adjacent to the one containing the current version. Because there won't be variable files inside the code directory, you can just remove or rename the old one and rename the new one to replace it.
You can keep the version number in a global constant in the code.
As for MySQL, there's no other way than making an upgrade script for every version that changes the DB layout. Even automatic solutions to change the table definition can't know how to update the existing data.

A slightly more experimental solution could be to use something like the phpsvnclient library.
With features:
List all files in a given SVN repository directory
Retrieve a given revision of a file
Retrieve the log of changes made in a repository or in a given file between two revisions
Get the repository latest revision
This way you can see if there are new files, removed files or updated files and only change those in your local application.
I recon this will be a little harder to implement, but the benefit would probably be that it is easier and quicker to add updates to your CMS.

You have two scenarios to deal with:
The web server can write to files.
The web server can not write to files.
This just dictates if you will be decompressing a ZIP file or using FTP to update the files. In ether case, your first step is to take a dump of the database and a backup of the existing files, so that the user can roll back if something goes horribly wrong. As others have said, its important to keep anything that the user will likely customize out of the scope of the update. Wordpress does this nicely. If a user has made changes to core logic code, they are likely smart enough to resolve any merge conflicts on their own (and smart enough to know that a one click upgrade is probably going to lose their modifications).
Your second step is to make sure that your script doesn't die if the browser is closed. This is a process that really should not be interrupted. You could accomplish this via ignore_user_abort(true);, or some other means. Or, if you like, allow the user to check a box that says "Keep going even if I get disconnected". I'm assuming that you'll be handling errors internally.
Now, depending on permissions, you can either:
Compress the files to be updated to the system /tmp directory
Compress the files to be updated to a temporary file in the home directory
Then you are ready to:
Download and decompress the update en situ , or in place.
Download and decompress the update to the system's /tmp directory and use FTP to update the files in the web root
You can then:
Apply any SQL changes as needed
Ask the user if everything went OK
Roll back if things went badly
Clean up your temp directory in the system /tmp directory, or any staging files in the user's web root / home directory.
The most important aspect is making sure you can roll back changes if things went bad. The other thing to ensure is that if you use /tmp, be sure to check permissions of your staging area. 0600 should do nicely.
Take a look at how Wordpress and others do it. If your choice of licenses and their's agree, you might even be able to re-use some of that code.
Good luck with your project.

There is a SQL library called SQLOO (that I created) that attempts to solve this problem. It's a little rough still, but the basic idea is that you setup the SQL schema in PHP code and then SQLOO changes the current database schema to match the code. This allows for the SQL schema and attached PHP code to be changed together and in much smaller chunks.
http://code.google.com/p/sqloo/
http://code.google.com/p/sqloo/source/browse/#svn/trunk/example <- examples

Based on experience with a number of applications, CMS and otherwise, this is a common pattern:
Upgrades are generally one-way. It's possible to take a snapshot of full system state for a restore upon failure, but to restore usually entails losing any data/content/logs added to the system since the upgrade. Performing an incremental rollback can put data at risk if something were not converted properly (e.g. database table changes, content conversions, foreign key constraints, index creation, etc.) This is especially true if you've made customizations that rollback scripts couldn't possibly account for.
Upgrade files are packaged with some means of authentication/verification, such as md5 or sha1 hashes and/or digital signature to ensure it came from a trusted source and was not tampered. This is particularly important for automated upgrade processes. Suppose a hacker exploited a vulnerability and told it to upgrade from a rogue source.
Application should be in an offline mode during the upgrade.
Application should perform a self-check after an upgrade.

I agree with Bart van Heukelom's answer, it's the most usual way of doing it.
The only other option would be to turn your CMS into a bunch of remote Web Services/scripts and external CSS/JS files that you host in one location only.
Then everyone using your CMS would connect to your central "CMS server" and all that would be on their (calling) server is a bunch of scripts to call your Web Services/scripts that do all the processing and output. If you went down this route you'd need to identify/authenticate each request so that you returned the corresponding data for the given CMS user.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.