Deleting large chunks of PHP effectively

I've just inherited a project, and been told that an entire folder, "includes/" needs to be removed due to licensing issues -- We don't have the right to redistribute the files in that folder, so we need to cut our dependencies on them, and fix whatever breaks. I've been told "Less than 5% of the lines in that folder are ever even called by our program", but I have no way of verifying this.
There are about 50 files in the folder, each with a couple hundred lines of code. There is no unit testing currently in place. There's one master file, include.php, that require()s all 49 other files, so I can't just grep for files that include() or require() anything from includes/ directly.
This is about as much detail as I've really figured out at this point. I spent all last week reading through the files in the includes/ folder, and it won't be hard to rewrite any of this, but I'm having trouble deciding where to start. I tried deleting the folder and slowly fixing things that break, but I'm afraid that this route will cause me to miss some crucial functions in my rewrite.
Can anyone point me in a direction to get started? Are there tools that will simplify this process? I'm looking at xdebug right now, but I'm not sure exactly how I'd use it for this.

You may want to search for "php code coverage." That should help you figure out which code is actually used. For instance, this looks like it might help:
http://www.xdebug.org/docs/code_coverage
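As a rough sketch of what that gives you (assuming Xdebug is installed; the bootstrap below is hypothetical), coverage can be collected directly from PHP:

<?php
// Start collecting coverage, including lines that are never executed.
xdebug_start_code_coverage(XDEBUG_CC_UNUSED | XDEBUG_CC_DEAD_CODE);

require 'include.php'; // the master file that pulls in includes/
// ... exercise the application here ...

// Returns an array of file => line => status; -1 marks unexecuted lines.
var_dump(xdebug_get_code_coverage());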

Your initial approach isn't bad at all. It's certainly a reasonable place to start:
1. Delete the code that isn't allowed.
2. Try to run what's left.
3. If things break: create a stub for each method that is now missing, and have it return some sensible "default" value for now.
4. Go to step 2.
Then, itemize all the things that were missing, and make a sensible schedule to re-implement each thing.
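For step 3, a stub can be as small as this (the function name and default value are hypothetical, standing in for whatever went missing from includes/):

<?php
// Temporary stub for a function that used to live in includes/.
// TODO: re-implement properly; returns a neutral default for now.
function format_invoice_number($id)
{
    return (string) $id;
}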

I would start by grepping for files that reference include.php. Check through them, one by one, if they're manageable. Then I'd grep for each of the functions defined in the includes/*.php files. See if they're called anywhere; find 'em, replace 'em.
Because PHP is so dynamically typed, I don't think there's going to be a tool for this.
(Eagerly awaiting someone to prove me wrong because I have similar tasks all the time... )
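That said, PHP's Reflection API can at least map every user-defined function back to its defining file. A minimal sketch, assuming include.php is the single entry point as the question describes:

<?php
require 'include.php'; // pulls in all 49 other files

// List every user-defined function that lives under includes/.
$funcs = get_defined_functions();
foreach ($funcs['user'] as $name) {
    $fn = new ReflectionFunction($name);
    if (strpos($fn->getFileName(), '/includes/') !== false) {
        printf("%s => %s:%d\n", $name, $fn->getFileName(), $fn->getStartLine());
    }
}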

See the SD PHP Test Coverage Tool. It will provide a visual view of what code actually executes, as well as a report on what parts of files are used (including "no parts", which is your cue that the code is a likely candidate to delete).
It doesn't require any hand-modifications of your code, or any unit tests to run it.

To answer my own question, I wound up using xdebug profiler to do the job, as I was initially investigating (after a friend's suggestion prompted me to take a second look).
In my /etc/php5/apache2/conf.d/xdebug.ini (on ubuntu 9.10), I set xdebug.profiler_enable=1 and xdebug.profiler_output_dir=/var/log/xdebug/, then loaded up the resulting cachegrind files with KCacheGrind and just ran a search on filenames for "includes/".
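For reference, the two lines in xdebug.ini were:

xdebug.profiler_enable=1
xdebug.profiler_output_dir=/var/log/xdebug/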
Now I have a mountain of work ahead of me to remove all this, but at least I've got a good overview of what I'll be modifying!

Related

phpdoc does not update my documentation

phpDocumentor v1.4.4
Fedora 24
Command line: phpdoc -d ./docsrc -t ./output
I am running phpDocumentor on Fedora 24 and have successfully generated documentation for my project one time.
I added a docblock to a function, and ran phpdoc again. But the output has not been updated. I verified the time stamps of the files and they have been regenerated, but do not reflect the changes.
I subsequently made numerous changes, and reran phpdoc after each change, but the generated documentation does not update.
I erased all the output files, renamed the directory of the input files, in short have done all I can to persuade phpdoc to generate new documentation that reflects the changes to my php files to no avail.
It would seem that phpdoc is caching the output somewhere but I cannot find where. I searched every path on my disk containing phpdoc then searched for the word "cache" in each path but it does not occur.
I tried changing the template with the --template option, but phpdoc does not recognise it.
I have also tried the --force option, but it does not recognise that either.
Can someone enlighten me?
Cheers,
Peter
This sounds like one of those times where I would just walk through the process from the beginning:
Am I modifying source in the ./docsrc directory tree? Verify by opening the source member in vi/vim/nano/some-other-editor just to be sure the source has changed.
Have I modified the source using the correct docblock syntax? (Please post some code that shows documentation that isn't being updated. A minimal example of the expected syntax appears after this list.)
Modify documentation in another file with a simple change and see if that simple change appears when I regenerate my documentation.
Am I explicitly --ignore-ing the file or directory I'm expecting to change? (You don't appear to be)
Do I have a phpdoc.xml or phpdoc.dist.xml file with an <ignore> directive?
Do I have the necessary permissions to create/update files in the ./output directory?
After I've executed phpdoc -d ./docsrc -t ./output do I see the expected change when using vi/vim/nano/some-other-editor?
Is my browser caching previous versions of the documentation? (I know you've already ruled this out Peter, I'm just trying to make my answer complete)
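On the syntax point above: a minimal docblock that phpDocumentor should pick up looks like this (the function and tags are illustrative only):

<?php
/**
 * Calculate an order total.
 *
 * @param float $subtotal Pre-tax amount.
 * @param float $rate     Tax rate, e.g. 0.07.
 *
 * @return float Total including tax.
 */
function order_total($subtotal, $rate)
{
    return $subtotal * (1 + $rate);
}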
This is EXACTLY one reason why I created PHPFUI/InstaDoc! The problem with most documentation is that it is static. While that is great for libraries that don't change, if you want to document your own code, guess what? It tends to change every day! With InstaDoc, you can see the documentation instantly on your local machine before you even check it in. InstaDoc creates the documentation when you request the page. It is hands down the fastest documentation system out there. Most documentation systems create static pages and brag about how fast they can create the documentation. But guess what? Who cares? What you want is to see the documentation of your current code base right now. It turns out it only takes a few seconds to scan through all the files of the libraries you are using. InstaDoc caches that information, so you only have a long scan (and even then it's only seconds) the first time, or whenever you add a new library.
Once you have a library scanned, the documentation comes up instantly, since it uses PHP reflection classes to read the file and display the documentation. So that file you just modified is completely, 100% documented. Don't like the comments? Change them and refresh the page. See an issue? Correct it and refresh the page. Notice something could be better? Refresh the page. Want to check out the docs on a PR? Easy: just delete the cached index and refresh the page.
InstaDoc is open source and still young. Check it out and submit comments or PRs if it does not meet your needs, but it is the future of documentation. It will also generate static files for high-volume sites, but the most important feature is that it gives you an instant reflection of your just-edited code, and that is what makes it awesome.

How to debug a PHP script that never finishes loading?

I have been tasked with setting up a website on various environments for different stages of evaluation (dev/test/staging/etc).
On our staging environment however, it seems there is some difference preventing the PHP script from finishing, so the page is never delivered to the browser.
I'm wondering if there is a way I can log some sort of stack trace or backtrace when the connection is cut, or is there some other method to find out exactly what PHP is doing at any given point in the script's life cycle?
It's a Drupal site, so it involves a lot of code I'm not familiar with, and could take hours to sprinkle die; commands throughout to see where the script is loading to.
I understand I should probably be looking at the differences in environments; however, all should have very similar configuration (Ubuntu 11.04), and the staging environment seems entirely happy to serve other PHP sites while this particular site is refusing to finish. If anything, this staging site has more resources available than the other environments, which are not having problems.
UPDATE: Sorry all, found the problem in the end. The staging environment was on a VLAN that was not permitted to access itself via public IP, and for whatever reason (still confused about this) it was trying to access itself as part of the page load and never completing the request. Setting a hosts file entry for 127.0.0.1 fixed the issue.
Debugging an issue like this step-by-step using a tool like xDebug is an option, but will probably take a long time -- finding where to put the breakpoints is on about the same level as working out where to put die statements around the code. The debugger is a better way of doing it, but won't save much in comparison when you have a problem like this, with an unknown blocker somewhere in a large amount of unfamiliar code.
But xDebug also has a profiler tool which can show you what functions were called during the program run, how long they took, and highlight where the bottlenecks are. This will probably be a better place to start. Just configure xDebug to generate a profiler trace, and then use kCacheGrind to view the trace in a graphical environment.
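A minimal php.ini sketch for that (Xdebug 2.x directive names; the paths are just examples):

zend_extension=/path/to/xdebug.so
xdebug.profiler_enable_trigger=1   ; profile only when XDEBUG_PROFILE is passed
xdebug.profiler_output_dir=/tmp/xdebug

Then request the page with XDEBUG_PROFILE=1 in the query string and open the resulting cachegrind.out.* file in KCacheGrind.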
If your program is getting stuck in a loop or something specific is taking a long time to complete, this will pinpoint the problem almost straight away; you'll be able to see exactly which function is taking the time, and what the call chain looks like to get to it.
It's quite possible that once you've seen that, you'll be able to find the problem just by looking at the relevant code. But if you can't, you can then use xDebug's step-through debugger to analyse the function as it runs and see what the variables are set to, to work out why it's looping.
xDebug can be found here: http://www.xdebug.org/
Use xDebug.
It's very easy to install and use.
It has features such as breakpoints and step-by-step execution to track the status of a PHP script before it finishes loading.
You can download xDebug from here: http://www.xdebug.org/
A step-by-step tutorial for setting up xdebug is available at sachithsays.blogspot.com/

xdebug, having problems with profiler output

Right, since watching Rasmus Lerdorf's talk on PHP performance I've been wanting to profile the ERP / accounting application I am working on, not least because I know there are performance issues with it; profiling should highlight the major problems for me to investigate.
So I downloaded xdebug and put the following few lines in my php.ini file:
zend_extension="/usr/lib/php5/20090626+lfs/xdebug.so"
xdebug.profiler_output_dir="/home/me/xdebug/profiles/"
xdebug.profiler_enable_trigger=On
With this I simply aim my browser at my app with &XDEBUG_PROFILE in the query string, and the profiling begins. The problem is that the output I am viewing with KCacheGrind doesn't include any of the functions from within my application, and shows no flow between entities.
While the page was executing, I copied (in the terminal) the profile file to separate files, to capture its state throughout the run. I loaded each of these separately into KCacheGrind, and they all show the full profile of the application; all, that is, but the last one.
Can anyone tell me why the full profile isn't being output? Looking at the sizes of my copied files, the first few are rather large, but the last one is significantly smaller. Is xdebug rewriting the file after the data has been captured?
Many thanks :-)
EDIT
Just to help, this is what I see when I open up one of the copied profiles (taken before the profile had completed); I'm sure there is much more to this.
And this is what I get from the final profile: no relationships, just a bunch of PHP functions. I want to see the full profile.
EDIT 2
So here I am repeatedly running the ls -als command; the last listing is the cut-down version, and the previous one is the last ls where the file was at its full size.
I cannot upload the large file as it's over 3 million lines long; if it helps, here is the xdebug phpinfo section.
Right, I've actually solved the problem myself, I added this option to my php.ini file:
xdebug.profiler_append=1
This will append the data to the same filename if it exists, so I'll need to make sure the filename option is set correctly, but I think that has solved my problem for now.
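If it helps anyone else: on Xdebug 2.x the filename is controlled by xdebug.profiler_output_name, which accepts format specifiers, so something like this keeps separate runs from colliding (the specifier choice is just an example):

xdebug.profiler_append=1
xdebug.profiler_output_name=cachegrind.out.%t.%p   ; %t = timestamp, %p = pid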
Thanks to those that answered :-)

Will including unnecessary php files slow down website?

The question might prompt some people to say a definitive YES or NO almost immediately, but please read on...
I have a simple website where there are 30 php pages (each has some php server side code + HTML/CSS etc...). No complicated hierarchy, nothing. Just 30 pages.
I also have a set of purely back-end php files - the ones that have code for saving stuff to database, doing authentication, sending emails, processing orders and the like. These will be reused by those 30 content-pages.
I have a master php file to which I send a parameter. This specifies which one of those 30 files is needed and it includes the appropriate content-page. But each one of those may require a variable number of back-end files to be included. For example one content page may require nothing from back-end, while another might need the database code, while something else might need the emailer, database and the authentication code etc...
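To illustrate the setup (a sketch with hypothetical file names, plus a whitelist so the parameter can't be abused to include arbitrary files):

<?php
// master.php?page=orders
$pages = array('home', 'orders', 'contact'); // ... the 30 content pages
$page  = isset($_GET['page']) ? $_GET['page'] : 'home';

if (!in_array($page, $pages, true)) {
    $page = 'home'; // unknown parameter: fall back to a safe default
}

require 'pages/' . $page . '.php';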
I guess whatever back-end file is required can be included in the appropriate content page, but then one small change in a path and I have to edit tens of files. It would also be too cumbersome to check which content page is requested (switch-case type of thing) and include the appropriate back-end files in the master php file; again, I'd have to make many changes if a single path changes.
Being lazy, I included ALL back-end files in the master file so that no content page can request something that is not included.
First question: is this a good practice? Is it done by anyone at all?
Second, will there be a performance problem or any kind of problem due to me including all the back-end files regardless of whether they are needed?
EDIT
The website gets anywhere between 3000 - 4000 visits a day.
You should benchmark. Time the execution of the same page with different includes. But I guess it won't make much difference with 30 files.
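A crude timing sketch (the file names are hypothetical):

<?php
$start = microtime(true);

require 'database.php';
require 'emailer.php';
require 'auth.php';
// ... the rest of the back-end files ...

printf("includes took %.4f seconds\n", microtime(true) - $start);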
But you can save yourself the time and just enable APC in the php.ini (it is a PECL extension, so you need to install it). It will cache the parsed content of your files, which will speed things up significantly.
BTW: There is nothing wrong with laziness, it's even a virtue ;)
If your site is object-oriented I'd recommend using auto-loading (http://php.net/manual/en/language.oop5.autoload.php).
This uses a magic method (__autoload) to look for a class when needed (it's lazy, just like you!), so if a particular page doesn't need all the classes, it doesn't have to get them!
Again, though, this depends on if it is object-oriented or not...
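For what it's worth, spl_autoload_register() is the preferred way to hook this up these days; a minimal sketch (the class-to-path mapping is an assumption about your layout):

<?php
spl_autoload_register(function ($class) {
    // Map the class name to a file in the back-end directory (hypothetical layout).
    $file = dirname(__FILE__) . '/backend/' . $class . '.php';
    if (is_file($file)) {
        require $file;
    }
});

// The Mailer class file is only loaded when the class is first used.
$mailer = new Mailer();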
It will slow down your site, though probably not by a noticeable amount. It doesn't seem like a healthy way to organize your application, though; I'd rethink it. Try to separate the application logic (e.g. most of the server-side code) from the presentation layer (e.g. the HTML/CSS).
It's not a bad practice if the files are small and contain just definitions and settings.
If they actually run code, or are extremely large, it will cause a performance issue.
Now, if your site has 3 visitors an hour, who cares; if you have 30,000, that's another issue, and you need to work harder to minimize the cost.
You can mitigate some of the disadvantages of PHP code compilation by using XCache. This PHP module will cache the PHP opcode, which reduces compile time and improves performance.
Considering the size of your website; if you haven't noticed a slowdown, why try to fix it?
When it comes to larger sites, the first thing you should do is install APC. Even though your current method of including files might not benefit as much from APC as it could, APC will still do an amazing job speeding stuff up.
If response speed is still problematic, you should consider including all your files. APC will keep a cached version of your source files in memory, but can only do this well if there are no conditional includes.
Only when your PHP application is at a size where memory exhaustion is a big risk (note that for most large-scale websites Memory is not the bottleneck) you might want to conditionally include parts of your application.
Rasmus Lerdorf (the man behind PHP) agrees: http://pooteeweet.org/blog/538
As others have said, it shouldn't slow things down much, but it's not 'ideal'.
If the main issue is that you're too lazy to go changing the paths for all the included files (if the path ever needs to be updated in the future), then you can use a constant to define the path in your main file, and use the constant any time you need to include/require a file.
define('PATH_TO_FILES', '/var/www/html/mysite/includes/go/in/here/');
require_once PATH_TO_FILES.'database.php';
require_once PATH_TO_FILES.'sessions.php';
require_once PATH_TO_FILES.'otherstuff.php';
That way if the path changes, you only need to modify one line of code.
It will indeed slow down your website, mostly because of the relatively slow loading and processing of PHP. The more code you include, the slower the application will get.
I live by "include as little as possible, as much as necessary", so I usually just include my config and session handling for everything, and then each page includes just what it needs using an include path defined in the config include; that way, for path changes you still only need to change one file.
If you include everything, the slowdown won't be noticeable until you get a lot of page hits (several hits per second), so in your case just including everything might be OK.

Best Practices for locating a function definition in PHP

Is there a simple way to find the file path to where a function is defined? I currently use Dreamweaver's FIND in an entire directory. It would be nice to have something that doesn't require downloading the entire site, though.
Any suggestions?
Personally I use an IDE like Netbeans or Eclipse PDT. In the case of Netbeans you can ctrl-click on a function and it'll take you to the definition. Sometimes there is a choice in which case it'll make you select one.
But it's generally bad form to reuse a function name in different files within your code. It can lead to hard-to-find bugs, because it's hard for any program to figure out exactly which function is actually getting called when source files can be included dynamically.
Would be nice to have something that doesn't require downloading the entire site, though.
I hope this doesn't mean that you're modifying the site remotely.
Have a local working copy, make the changes, test them locally, then upload the changes.
A simple combo of vim and ctags makes the "go to definition" task a piece of cake.
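For example (Exuberant Ctags, run from the project root):

ctags -R .
# then inside vim: put the cursor on a function call and press Ctrl-],
# or jump directly with :tag function_name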
You can't search for something (and expect to find it) unless you have a copy of all the files it might be in.
A number of IDEs have the ability to click and go from a use of a variable or function to its definition. If not that, then a multi-file searching tool within your editor, or something from a command line (such as ack) that is a little more specialised at searching source code can help. Good naming conventions can also help a lot for consistency.
It's beside the question, but why don't you have a copy of the site locally? And while you are at it, keep it in version control as well.
I'd sure like this get_functionPath() ability, and anyone who has had to work extensively on other people's code would probably find it incredibly useful. We have function_exists(); if that could simply return the file a user-defined function is defined in, it would save a TON of trouble. No, not all of us use IDEs, and yes, some of us have been doing this long enough to code on the production machine. Test boxes and sandboxes are for rookies.
One trick is to purposely trigger an error in the function you are trying to locate. Can save a ton of time.
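For what it's worth, the Reflection API gets you most of the wished-for get_functionPath() behaviour for user-defined functions (the function name below is hypothetical):

<?php
$fn = new ReflectionFunction('some_user_function');
printf("%s:%d\n", $fn->getFileName(), $fn->getStartLine());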
You'd need to use some kind of tool that could build an index on a remote filesystem that you could download and perform local lookup and search upon. I don't know of anything that can do this and a few moments with Google didn't turn up anything.
Maybe a good idea for an open source project? hint hint
So there is no function that would do this? Something like get_class(), which outputs the parent class, but in this case returning the file path on the server...
