Mapping PHP script and file dependency structure - php

I recently started as an intern at a startup that builds an online classroom system, so I'm scrambling to learn the system and get to know its code, which is written in PHP. The program spans around 3000 PHP files plus associated images, HTML pages, CSS files and so forth, across more than a hundred folders.
I was wondering if there is a program or utility that can parse the files and directories and create a map of sorts, showing which PHP files include which other files, so that I can quickly see which files and scripts are obsolete or no longer in use, which files depend on other files, and so forth. In other words, I can already see the file and directory structure; what I would like to see now is the dependency structure in terms of includes, without having to open each file individually and track down its include statements.
Any help would be appreciated!

It's not exactly what you want, but the "inclued" PECL extension is almost certainly going to help you. It works on a per-request basis, and maps out the file inclusion chain. It can even make pretty graphs!
Because it works on a request basis, unfortunately it can't map out your entire codebase for you.
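Since inclued only sees what a single request loads, a static scan can complement it. The sketch below is not part of inclued; it is a minimal illustration using PHP's built-in tokenizer (`token_get_all`) to list the include/require targets of every `.php` file under a directory. It assumes paths are either string literals or `__DIR__ . 'literal'`; anything built at runtime (variables, constants, `dirname(__FILE__)`) is reported as `(dynamic)`. The function names `findIncludes`, `resolveInclude` and `mapIncludes` are hypothetical.

```php
<?php
// Static sketch: list the include/require targets of every .php file under a
// directory using PHP's tokenizer. Only literal paths ('x.php') and
// __DIR__ . 'x.php' are resolved; anything computed at runtime is reported
// as '(dynamic)'.

const INCLUDE_TOKENS = [T_INCLUDE, T_INCLUDE_ONCE, T_REQUIRE, T_REQUIRE_ONCE];

function findIncludes(string $file): array
{
    $tokens = token_get_all(file_get_contents($file));
    $count = count($tokens);
    $found = [];
    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || !in_array($tokens[$i][0], INCLUDE_TOKENS, true)) {
            continue;
        }
        // Collect the path expression up to ';' or a closing ?> tag.
        $expr = [];
        for ($j = $i + 1; $j < $count; $j++) {
            $t = $tokens[$j];
            if ($t === ';' || (is_array($t) && $t[0] === T_CLOSE_TAG)) {
                break;
            }
            if (is_array($t) && in_array($t[0], [T_WHITESPACE, T_COMMENT], true)) {
                continue;
            }
            if ($t !== '(' && $t !== ')') { // require_once('x.php') same as require_once 'x.php'
                $expr[] = $t;
            }
        }
        $found[] = resolveInclude($expr, dirname($file));
    }
    return $found;
}

function resolveInclude(array $expr, string $baseDir): string
{
    $literal = function ($t): ?string {
        return is_array($t) && $t[0] === T_CONSTANT_ENCAPSED_STRING ? substr($t[1], 1, -1) : null;
    };
    if (count($expr) === 1 && ($path = $literal($expr[0])) !== null) {
        return $path;            // left as written; PHP resolves it via include_path at runtime
    }
    if (count($expr) === 3 && is_array($expr[0]) && $expr[0][0] === T_DIR
        && $expr[1] === '.' && ($path = $literal($expr[2])) !== null) {
        return $baseDir . $path; // __DIR__ . '/lib/x.php'
    }
    return '(dynamic)';
}

function mapIncludes(string $root): array
{
    $map = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $info) {
        if ($info->isFile() && strtolower($info->getExtension()) === 'php') {
            $map[$info->getPathname()] = findIncludes($info->getPathname());
        }
    }
    ksort($map);
    return $map;
}
```

Usage would be something like `print_r(mapIncludes('/path/to/project'));` (path hypothetical). Files that never appear as a target in the resulting map are candidates for closer inspection, not automatic deletion, since dynamic includes are invisible to this scan.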


Dependencies graph for large PHP application

I've recently inherited a large PHP application with NO objects/modules/namespaces... only a lot of files containing functions.
Of course, there are a LOT of dependencies (and almost all files are always included).
I'm looking for a tool that could analyse the files and generate a dependency graph. It would then be easier to detect independent files/sets of files and refactor the whole thing.
So far the best solution I've found would be to write a CodeSniffer sniff to detect all function calls and then use that to generate the graph.
This seems like something others would find useful, so I'm sure tools for it already exist.
What would you recommend?
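The idea in the question (find every function call, then map it to the file that defines the function) can be sketched without CodeSniffer, using PHP's built-in tokenizer instead; this is a swapped-in technique, not a sniff. It assumes a purely procedural codebase as described: methods, namespaced calls, `function &name()` and functions defined twice are not handled. `scanFunctions` and `functionCallGraph` are hypothetical names.

```php
<?php
// Sketch: build a file-level dependency graph from plain function calls.
// Assumes a procedural codebase (no classes/namespaces), as in the question.

function scanFunctions(string $file): array
{
    // Drop whitespace and comments so neighbouring tokens are meaningful.
    $tokens = array_values(array_filter(
        token_get_all(file_get_contents($file)),
        fn($t) => !is_array($t) || !in_array($t[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true)
    ));
    $defines = [];
    $calls = [];
    foreach ($tokens as $i => $t) {
        if (!is_array($t) || $t[0] !== T_STRING) {
            continue;
        }
        $name = strtolower($t[1]); // PHP function names are case-insensitive
        $prev = $tokens[$i - 1] ?? null;
        $next = $tokens[$i + 1] ?? null;
        if (is_array($prev) && $prev[0] === T_FUNCTION) {
            $defines[$name] = true;
        } elseif ($next === '(' && !(is_array($prev)
            && in_array($prev[0], [T_OBJECT_OPERATOR, T_DOUBLE_COLON, T_NEW], true))) {
            $calls[$name] = true;  // name( ... ) that is not a method call or "new"
        }
    }
    return [array_keys($defines), array_keys($calls)];
}

function functionCallGraph(array $files): array
{
    $definedIn = [];
    $callsBy = [];
    foreach ($files as $file) {
        [$defines, $calls] = scanFunctions($file);
        foreach ($defines as $fn) {
            $definedIn[$fn] = $file;
        }
        $callsBy[$file] = $calls;
    }
    $graph = [];
    foreach ($callsBy as $file => $calls) {
        $deps = [];
        foreach ($calls as $fn) {
            // Built-in functions have no entry in $definedIn and are skipped.
            if (isset($definedIn[$fn]) && $definedIn[$fn] !== $file) {
                $deps[$definedIn[$fn]] = true;
            }
        }
        $graph[$file] = array_keys($deps);
    }
    return $graph;
}
```

The result maps each file to the files whose functions it calls; files with no incoming edges from anywhere are the "independent" candidates the question is after.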
I think the best solution is to use a documentation generator plus Graphviz. phpDocumentor appears to have a Graphviz extension at https://github.com/phpDocumentor/GraphViz
This is an example made with phpDocumentor:
http://demo.phpdoc.org/Clean/graphs/classes.svg
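If you already have a dependency map of your own (for example file => list of files it depends on), Graphviz can draw it directly from DOT text. This is a minimal sketch, independent of phpDocumentor; `toDot` and its `$root` prefix-stripping parameter are hypothetical.

```php
<?php
// Sketch: render a dependency map (file => list of files it depends on)
// as Graphviz DOT text, which `dot -Tsvg deps.dot -o deps.svg` can draw.

function toDot(array $graph, string $root = ''): string
{
    // Strip a common path prefix so node labels stay short.
    $label = function (string $path) use ($root): string {
        return $root !== '' && strpos($path, $root) === 0 ? substr($path, strlen($root)) : $path;
    };
    // DOT quoted strings only need '"' and '\' escaped.
    $quote = fn(string $s): string => '"' . addcslashes($s, "\"\\") . '"';

    $lines = ['digraph deps {', '    rankdir=LR;'];
    foreach ($graph as $file => $deps) {
        $lines[] = '    ' . $quote($label($file)) . ';'; // keep files with no edges visible
        foreach ($deps as $dep) {
            $lines[] = '    ' . $quote($label($file)) . ' -> ' . $quote($label($dep)) . ';';
        }
    }
    $lines[] = '}';
    return implode("\n", $lines) . "\n";
}
```

Writing the result with `file_put_contents('deps.dot', toDot($graph, '/path/to/project/'))` and running the `dot` command above produces an SVG similar in spirit to the phpDocumentor example.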
You can also use a hierarchical profiler such as xhprof (https://github.com/facebook/xhprof), which can draw a tree of all function calls made during an execution.
(Example: an xhprof call graph drawn with Graphviz.)
I can recommend a lightweight project I wrote a few days ago. Basically, I had a PHP project with 300+ files and I wanted to detect which files each file requires/includes, and vice versa. Moreover, for each individual file I wanted to check which files it requires/includes (directly or indirectly, i.e. through nested includes) and, vice versa, which files include that particular file. For any combination of these I wanted an interactive dependency graph (based on file inclusion, not on class/function calls/usage).
Check out the project's sandbox and its source code.
Note that the whole thing was written in only 2 days, so don't judge it too harshly. What's important is that it does its job!
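The "directly or indirectly" and "vice versa" questions above boil down to graph reachability over the direct include map. A minimal sketch (not the project's actual code; `transitiveDeps` and `reverseGraph` are hypothetical names) assuming you already have `file => [files it includes]`:

```php
<?php
// Sketch: given direct edges (file => files it includes), compute
// (a) everything a file pulls in directly or indirectly, and
// (b) which files include a given file (the reverse map).

function transitiveDeps(array $graph, string $start): array
{
    $seen = [];
    $stack = $graph[$start] ?? [];
    while ($stack) {                      // iterative DFS; $seen guards against cycles
        $node = array_pop($stack);
        if (isset($seen[$node]) || $node === $start) {
            continue;
        }
        $seen[$node] = true;
        foreach ($graph[$node] ?? [] as $next) {
            $stack[] = $next;
        }
    }
    $result = array_keys($seen);
    sort($result);
    return $result;
}

function reverseGraph(array $graph): array
{
    $rev = array_fill_keys(array_keys($graph), []); // every file, even if nobody includes it
    foreach ($graph as $file => $deps) {
        foreach ($deps as $dep) {
            $rev[$dep][] = $file;
        }
    }
    return $rev;
}
```

Running `transitiveDeps(reverseGraph($graph), $file)` answers "who ends up loading this file?", and an empty entry in `reverseGraph($graph)` marks a file that nothing includes: either an entry point or a candidate for removal.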

File and Folder Attributes - Programming API

I know that PHP can read file contents in different ways, for example: fread, file_get_contents, file, readfile, etc.
Currently, I am looking for an API that can read the real index of files and folders in a specific partition or folder, for example:
drive D:\ on Windows contains three folders (folder1, folder2, folder3), and each folder contains some files. We can get this directory structure using PHP (opendir, scandir, readdir, etc.) and list it however we want. However, Windows stores file and folder names on the hard disk together with their attributes (size, last modified, created on, etc.).
How can I read the hard disk using PHP and retrieve all file and folder attributes for a specific path?
For instance, for the last modified time we can use the filemtime() function, but this attribute is not saved inside the file; it is saved somewhere else on the hard drive, and other attributes are likewise stored outside the file.
When a Windows user copies a file from a flash drive to a local hard drive, Windows copies all file and folder attributes and saves them on the local hard drive. When PHP copies a file, it relies on the OS to handle the job; as far as I can tell, PHP has no native support of its own for these file and folder operations.
Do you have any idea?
There are many recovery programs that use this technique to read hard-drive indexes; for PHP, however, I can't find any source for this problem.
What I could do with a correct answer:
I could check whether a file has been securely deleted from my hard drive, and I could create a secure-delete application in PHP, or clear the hard-drive index entries for a given file.
Your help is appreciated.
Problems with the proposition
The attributes of files, such as timestamps, permission flags etc., are stored in the file system (FAT, NTFS, Ext3 etc.). As you say, some of them can be read using PHP's various file and directory functions, but these all act through the OS file system abstraction and can't access block-level information on the disk, such as the precise byte on disk that stores the archive flag for file X. The whole point of the OS and FS is to abstract this information away from user/client programs.
As suggested, there are external tools, written in C or similar, that do have this access and that you can call from inside PHP. If you want a 'native' PHP way of doing this, you'll have to compile a C extension for PHP that exposes these low-level functions to you.
I'd say external tools are the way to go if you want to stick with PHP, but for the task at hand, as far as we can see from your description, I'd go with another language that has more low-level access, like C or C++. PHP is a high-level language for HTML preprocessing and as such is a poor choice for low-level system programming.
Practical advice
After looking through the PHP documentation and assorted third party libraries:
An off-the-shelf solution for reading file system information at the file allocation table level doesn't exist for PHP. The lowest level you get is the fstat() function, and that is still far from what you want.
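To make "the lowest level you get" concrete: `stat()` (and `fstat()` on an open handle, which returns the same array) exposes what the OS reports, not where the file system keeps it. A minimal sketch; `listAttributes` is a hypothetical helper, and the `ctime` meaning differs by platform as noted in the comment.

```php
<?php
// Sketch of what PHP's own stat layer exposes. The OS fills these values in;
// PHP never sees where on disk the file system stores them.

function listAttributes(string $dir): array
{
    $rows = [];
    foreach (scandir($dir) as $name) {
        if ($name === '.' || $name === '..') {
            continue;
        }
        $path = $dir . DIRECTORY_SEPARATOR . $name;
        $st = stat($path);
        if ($st === false) {
            continue; // unreadable entry
        }
        $rows[$name] = [
            'type'  => is_dir($path) ? 'dir' : 'file',
            'size'  => $st['size'],
            'mtime' => date('c', $st['mtime']), // last modified
            'atime' => date('c', $st['atime']), // last accessed (may be coarse or disabled)
            'ctime' => date('c', $st['ctime']), // creation time on Windows, inode change time on Unix
            'inode' => $st['ino'],
            'perms' => substr(sprintf('%o', $st['mode']), -4),
        ];
    }
    return $rows;
}
```

Nothing here reveals whether the underlying blocks of a deleted file were overwritten, which is why the answer points toward external tools for that part.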
External tools
No matter exactly what you want to do, there is probably a small binary that does it. PHP can be integrated with these programs, as suggested elsewhere, via the exec() function. This is probably the easiest approach for you unless you have serious amounts of time and/or development resources to devote to this problem.
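A minimal sketch of that integration, assuming the tool takes its inputs as command-line arguments. `runTool` is a hypothetical wrapper; the important part is passing every argument through `escapeshellarg()` so file names cannot inject extra shell commands.

```php
<?php
// Sketch: call an external command-line tool from PHP and capture its output.
// Every argument is quoted with escapeshellarg() so file names with spaces
// or shell metacharacters are passed literally.

function runTool(string $binary, array $args = []): array
{
    $cmd = escapeshellarg($binary);
    foreach ($args as $arg) {
        $cmd .= ' ' . escapeshellarg($arg);
    }
    $output = [];
    $exitCode = 0;
    exec($cmd . ' 2>&1', $output, $exitCode); // merge stderr so errors are visible
    return [$exitCode, $output];              // output lines without trailing newlines
}
```

Checking the exit code rather than only the output is what lets the calling PHP code tell a failed tool run from an empty result.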
Wrapping a library
There are libraries that solve this problem for you, written in low-level languages. An open source library can be wrapped with SWIG to expose it to PHP. This will give you access to the low-level methods you need, but it's a non-trivial task. These kinds of libraries also often require sole access to the device while they work on it, something that is difficult to achieve in most normal operating environments.
Note also that you will probably need a library per file system. Microsoft's VFAT extension to FAT12/16/32 requires a license to use, so if you want to work with FAT and have files with long names (not 8.3 format) you'll have to fork up some dough to be legit.
Low level implementation
A last middle ground would be to write your own CLI tool that uses an external library to access the low level FS functions. You can then use exec() from inside PHP to interact with your own implementation.
This might be a reasonable path if you can't find an existing tool that solves your problem and you are not willing to spend the time to wrap a library.
In closing
You give a very narrow problem description with little to go on as to what the application is about. A broader discussion (in another forum) might yield better results, since the problem might be better solved in another way entirely.
I found something on PHP.net which appears to do what you want:
http://php.net/manual/en/function.readdir.php#103418
Edit: I misunderstood the question. Attributes such as the last modified time, last accessed date and the like are stored in the file system's master file table. As far as I can tell, this isn't accessible with PHP, and if you were to write your own method to do this, you'd also have to account for different file systems, as they all handle the storage of these attributes in their own way.
It may be that getting all of the information you're looking for is not possible with PHP without writing some form of extension to PHP itself.
Edit 2: Upon researching a little more...
http://php.net/manual/en/function.fileinode.php
This function could be an interesting one to look at.
Well, if I understand correctly, you just want to securely delete a file. You can just call shred (http://linux.die.net/man/1/shred) via system() or exec() if you are on Linux, and you are good to go.
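A minimal sketch of calling GNU shred from PHP, assuming Linux with coreutils installed; `shredFile` and its `$passes` parameter are hypothetical. Note that the shred man page itself warns it assumes the file system overwrites data in place, which journaling, copy-on-write and snapshotting setups may not do.

```php
<?php
// Sketch: overwrite and then unlink a file with GNU shred (Linux/coreutils).
// -n sets the number of random overwrite passes, -z adds a final pass of
// zeros, and -u removes the file after overwriting.

function shredFile(string $path, int $passes = 3): bool
{
    if (!is_file($path)) {
        return false;
    }
    $cmd = sprintf('shred -n %d -z -u %s 2>&1', $passes, escapeshellarg($path));
    exec($cmd, $output, $exitCode);
    clearstatcache(true, $path); // make file_exists() re-check the disk
    return $exitCode === 0 && !file_exists($path);
}
```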

Remove useless files from code base

Is there any tool out there that can identify the useless files in a code base?
We have a big code base (PHP, HTML, CSS, JS files) and I want to be able to remove the not needed files. Any help would be appreciated.
I'm guessing deleting files and running your phpunit tests is a non-starter.
If your files are not already in a version-control system - add them. Having the files in a version control system (such as svn or git) is crucial to allow you to recover from deleting any files that you thought were not being used but you later find out were.
Then, you can delete anything you think may not be being used, and if it doesn't affect the running of your application you can conclude that the files aren't used. If adverse effects show up - you can restore them from your repository with ease.
The above is most appropriate (probably) for frontend files (CSS, JS, images). Any deleted file that is still requested will show up in your webserver error log, giving you a quick reference for files that no longer exist and that you need to restore.
For your PHP files, that's quite a bit trickier. How did you arrive at a position where you have PHP files which you aren't using? Anyway, you could for example:
Use xdebug
Enable profiling
Use append mode (one profile)
Use all the functions of your application
and you would then have a profile which includes all files you loaded. Scanning the generated profile for each php file in your codebase will give you some indication of which files you didn't use.
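Scanning the profile can itself be scripted. The sketch below assumes Xdebug's cachegrind output, where file names appear on `fl=`/`cfl=` lines either plain (`fl=/path/x.php`) or in the compressed form (`fl=(3) /path/x.php`, later referenced as just `fl=(3)`), and internal functions use the name `php:internal`; adjust the regex if your Xdebug version writes something different. `filesInProfile` and `unusedFiles` are hypothetical names.

```php
<?php
// Sketch: list the PHP files that appear in an Xdebug profiler output
// (cachegrind format), then report codebase files that never showed up.

function filesInProfile(string $profile): array
{
    $seen = [];
    $fh = fopen($profile, 'r');
    while (($line = fgets($fh)) !== false) {
        // "fl=/p/x.php" or "fl=(3) /p/x.php"; bare back-references like
        // "fl=(3)" carry no name and do not match.
        if (preg_match('/^c?f[lie]=(?:\(\d+\)\s+)?([^(\s].*)$/', rtrim($line, "\r\n"), $m)
            && $m[1] !== 'php:internal') {
            $seen[$m[1]] = true;
        }
    }
    fclose($fh);
    return array_keys($seen);
}

function unusedFiles(array $allPhpFiles, array $profiled): array
{
    return array_values(array_diff($allPhpFiles, $profiled));
}
```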
If you are only looking for unused files, don't be tempted to use code coverage analysis - it is very intensive and not the level of detail you're asking for.
A slightly less risky way would be to log whenever a file is loaded. e.g. put this at line one of each file:
<?php file_put_contents('/some/location/fileaccess.log', __FILE__ . PHP_EOL, FILE_APPEND | LOCK_EX); ?>
and simply leave your application in use for a while (days, weeks). Afterwards, scan that log: for every file that is named in it, remove the above line of code; for any file that is not, delete it (preferably after searching for the filename across your whole source code and confirming it's referenced nowhere).
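The scan step can be automated. This sketch assumes each logged path ends with a newline (i.e. `__FILE__ . PHP_EOL` is written, otherwise the paths run together) and that `__FILE__`, which is absolute with symlinks resolved, is compared against `SplFileInfo::getRealPath()`. `neverLoaded` is a hypothetical name.

```php
<?php
// Sketch: compare an access log (one absolute path per line, e.g. written by
//   file_put_contents($log, __FILE__ . PHP_EOL, FILE_APPEND | LOCK_EX);
// at the top of each file) with every .php file under the project root.

function neverLoaded(string $root, string $logFile): array
{
    $lines = is_file($logFile)
        ? file($logFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
        : [];
    $logged = array_flip(array_map('trim', $lines)); // set of paths seen at least once
    $unused = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $info) {
        if ($info->isFile() && strtolower($info->getExtension()) === 'php') {
            $path = $info->getRealPath(); // symlinks resolved, like __FILE__
            if (!isset($logged[$path])) {
                $unused[] = $path;
            }
        }
    }
    sort($unused);
    return $unused;
}
```

The returned list is the "not named in the log" set; it still deserves the manual filename search before anything is deleted.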
OR: you could use a shutdown function that writes the result of get_included_files() to a log file. This would achieve the same without editing every PHP file in your source tree.
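A minimal sketch of that shutdown-function approach. It assumes the snippet lives in a file every request already loads (a shared config or bootstrap), or is loaded through the `auto_prepend_file` php.ini setting; the log location and the name `logIncludedFiles` are hypothetical.

```php
<?php
// Sketch: record every file a request loaded, without editing each file.

function logIncludedFiles(string $logFile): void
{
    $files = get_included_files(); // absolute paths, in load order
    if ($files) {
        file_put_contents($logFile, implode(PHP_EOL, $files) . PHP_EOL, FILE_APPEND | LOCK_EX);
    }
}

// Hypothetical log location; use a writable path outside the web root.
register_shutdown_function('logIncludedFiles', sys_get_temp_dir() . '/included_files.log');
```

Because the function runs at shutdown, it also captures files included late in the request, which a line at the top of the entry script would miss.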
Caveat: Be careful deleting your php files. Whereas a missing css/js/image will probably mean your application still works, a missing php file of course will have rather more impact :).
If it is in Git, why not delete the local file and then run git rm <file name> to remove it from that branch?
Agree with everything said by @AD7six.
What you might like to try with PHP is to log the use of the files in some way (logging to a flat file or database).
This technique does not have to stay in place for long; you can do it with an include (e.g. require_once) at the top of each file.
The technique also works for JavaScript functions: you can print each function's name to the console and then unit test your site. You can probably clean out a lot of redundant code that way.
The rest is not so easy, but version tracking is the way to go.

How to call a CSS file from the right place

I'm starting a project in PHP, and I want to structure my files properly from the start (unlike my last project, which had almost every file in a single directory). The problem is the following, which I will describe with an example:
Take the following files: index.php, includes/header.php, and css/common.css. index.php 'includes' the header (as will many other php files). The header then calls common.css so that its html elements can be placed properly. common.css will also provide styling for general elements in index.php and other files.
Notice that since the header is being included, when the header calls common.css, it does so from the location of the file calling it; in this case, index.php. But if I add, say, modules/friends.php and call the header with it, it will be looking for the CSS file in the wrong spot!
Initially I tried to remedy this by using the actual path for when I call CSS files. However, my local machine and web server have a different layout of directories, and therefore I cannot simply call /var/www/whatever.
Can anyone help me or redirect me to a place where this sort of thing is documented?
Thanks,
Paragon
Always specify absolute paths to all your resources: .css, .js, images, etc...
http://en.wikipedia.org/wiki/Absolute_path
However, my local machine and web server have a different layout of directories, and therefore I cannot simply call /var/www/whatever.
You can. Web paths are not the same thing as local filesystem paths. In a URL, a leading / refers to the web root (the directory your project is served from), not to your filesystem root.
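A minimal sketch of the root-relative approach. `asset_url` and the `ASSET_BASE` setting are hypothetical: `ASSET_BASE` stays empty when the site is served from the web root and could be set to something like `/myproject` if a local setup serves it from a sub-directory.

```php
<?php
// Sketch: build root-relative URLs for static assets so the same <link> tag
// works from index.php, modules/friends.php or any other page.

const ASSET_BASE = ''; // hypothetical: '' at the web root, e.g. '/myproject' in a sub-directory

function asset_url(string $path, string $base = ASSET_BASE): string
{
    // Normalise slashes so 'css/x.css' and '/css/x.css' give the same URL.
    return rtrim($base, '/') . '/' . ltrim($path, '/');
}
```

In includes/header.php the stylesheet would then be emitted as `<link rel="stylesheet" href="<?= htmlspecialchars(asset_url('css/common.css')) ?>">`. On the filesystem side, `require __DIR__ . '/../includes/header.php';` from modules/friends.php resolves relative to that file rather than to the current working directory.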
Congratulations on recognizing a huge problem.
Yes, this is always the big, important question that you need to answer at the start.
I've finally learned -- and this is after quite a few years -- to try my best to make the file structure on the development machine (my PC, say) be exactly like the file structure on the host machine (a Linux host, for example). That one thing alone has saved me unending hours of grief.
If you can accomplish that, then the rest is a piece of cake, believe me. You can put files in whatever directories you want, wherever it makes sense to you, on both machines. You can figure out what files should go where.
If you don't bother to try for near-identical file-directory setups on both machines, you are forever going to be wondering, as you edit away, "Hey, what machine am I on? If I'm on the host, then very-important-file.php is in /toplevel, and everything else is under it. But if I'm on the PC, then very-important-file.php is over here in /my-files, see, and then other files are on different levels and did I delete that file and ..." My God, don't make me think, much less think about that mindless crap.
You can handle and remember just the root being in different spots on different machines, but other than that, forget it.
Now when you come to run your stuff, you will always know where the pieces of that stuff are: CSS files, JS files, whatever. PLUS you can (maybe; if you're lucky) debug your code on the PC or the host equally well, with no differences and with no changes anywhere. PLUS when you upload your new code, you can FTP it up to the host in one big chunk rooted where you like. (Which has the very nice ancillary benefit of your being able to move files around wherever you want on the development machine.)
Piece of cake! Don't pass up this chance to save yourself days or weeks (literally) of time.
Always IMHO.

PHP core code directory structure

I'm looking to centralize a lot of my web application's code, so that multiple components have access to the same core functionality. This is how I have the website set up:
/var/www/website - domain.com
/var/www/subdomain1 - subdomain1.domain.com
/var/www/subdomain2 - subdomain2.domain.com
Naturally I've had a lot of trouble when it comes to the duplication of common functionality, as any changes made to one area would also need to be applied to other areas. My proposed solution is to create a new directory in /var/www which will contain all of the core scripts:
/var/www/code - core code
I would then set the PHP include directory to /var/www/code, so scripts can include these files without having to specify the absolute path.
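Setting the include directory can be done in php.ini (`include_path`) or at runtime. A minimal runtime sketch using the real `get_include_path()`/`set_include_path()` functions; `addSharedCodeDir` is a hypothetical helper that avoids adding the same directory twice.

```php
<?php
// Sketch: make a shared code directory searchable by include/require
// without absolute paths, e.g. addSharedCodeDir('/var/www/code').

function addSharedCodeDir(string $dir): void
{
    $paths = explode(PATH_SEPARATOR, get_include_path());
    if (!in_array($dir, $paths, true)) {
        set_include_path(get_include_path() . PATH_SEPARATOR . $dir);
    }
}
```

After that, `require_once 'database.php';` (a hypothetical file in the shared directory) is found through the include path. Relative paths starting with `./` or `../` bypass include_path, so shared files should be referenced by bare names.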
Can you think of any more efficient ways of centralizing the core code?
Many thanks!
Your approach is good enough for this purpose.
Little suggestion:
Store your front-end scripts in a directory like /var/www/website/www instead of /var/www/website; that is where the index file, AJAX handlers and similar scripts go. Your project-specific include files (as well as other miscellaneous stuff) would be stored in a directory like /var/www/website/includes, outside the document root. It is a simple yet effective defense against attacks on your include files.
so, your document roots will be in /var/www/website/www (domain) and /var/www/website/subdomain/www/ (subdomain)
It seems that you are thinking correctly:
Share Code between multiple PHP sites
It's only a suggestion, but you should put the public content in /var/www/*, which may end up being publicly accessible (either because of your HTTP server or because of some misconfiguration), and create separate directories for your shared code/libs, like /usr/local/lib/php/*.
For more security, restrict it with open_basedir, adding the private and public dirs as well as the upload and session dirs.
And don't forget to version your libs, e.g.:
/usr/local/lib/php/myLib-1.0
/usr/local/lib/php/myLib-1.2
etc.
Thus, you'll be able to make changes without breaking everything.
