I ran composer in order to use guzzle. It resulted in these directories:
composer
gabrieldarezzo
guzzlehttp
psr
ralouphie
symfony
I noticed the file car.png in gabrieldarezzo/colorizzar/. That whole directory seemed useless for guzzle so I deleted it and the code still works. I tried deleting some of the other directories, one at a time, but the code failed. Is there a way to know which files are actually required?
Edited after comments:
The purpose of this question is to ask whether all of the files Composer adds are necessary. I re-ran Composer in a new location and it installed version 6.5.8. The gabrieldarezzo directory was not included, so I must have run Composer for some other package at some point. From all of the replies I can see that the answer to my question is yes, they are required. I appreciate all of the replies to this.
That whole directory seemed useless for guzzle so I deleted it and the code still works.
This statement is meaningless without talking about what code still worked - in other words, which files are required depends on what you're actually doing.
If you ask Composer to install Guzzle and then write a PHP file that just says echo 'Hello world'; then you could delete the whole of the vendor directory, and clearly nothing would break. Or you could write echo \GuzzleHttp\RequestOptions::ALLOW_REDIRECTS; and delete everything except for vendor/guzzlehttp/guzzle/src/RequestOptions.php, where that constant is defined (plus whatever you use to load that file).
Is there a way to know which files are actually required?
In theory, you could statically analyze a piece of code, recursively identifying which pieces of code were reachable, and therefore what the minimum set of files used would be. You could also monitor a running application and see which files it opened, at the PHP autoloader level or even at the OS / file system level.
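A minimal sketch of monitoring at the autoloader level, assuming you can run a little code early in each request (for example at the top of your front controller). AutoloadRecorder is a hypothetical name, and the second autoloader only stands in for Composer's so that the demo runs on its own:

```php
<?php
// Hypothetical helper that records every class name PHP asks the autoloaders for.
// It loads nothing itself, so the next autoloader in the chain
// (normally Composer's) still does the real work.
final class AutoloadRecorder
{
    /** @var string[] class names in the order they were requested */
    public static array $requested = [];

    public static function register(): void
    {
        // spl_autoload_register($callback, $throw, $prepend): prepend = true puts
        // this callback in front of any autoloader registered earlier.
        spl_autoload_register(static function (string $class): void {
            self::$requested[] = $class;
        }, true, true);
    }
}

// Stand-in for Composer's autoloader so the demo is self-contained:
// it declares any class in the (made-up) Demo\ namespace on the fly.
spl_autoload_register(static function (string $class): void {
    if (strncmp($class, 'Demo\\', 5) === 0) {
        eval('namespace Demo; class ' . substr($class, 5) . ' {}');
    }
});

AutoloadRecorder::register();

new Demo\Client();                  // first use: one autoload request
new Demo\Client();                  // already declared: no new request
class_exists('Demo\Unused', false); // $autoload = false: no request either

print_r(AutoloadRecorder::$requested);
```

This only sees classes that go through autoloading; files pulled in with a plain require or include need a file-level approach such as get_included_files().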
But the question is why do you care?
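It's important to understand that no file will actually be read and loaded into memory unless it is referenced in some way; this is the purpose of autoloading. (The exception is files that a package lists under "files" in its autoload configuration, which Composer includes on every request.) So deleting unused files will not make any difference to the compilation or execution speed of your application.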
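The laziness is easy to observe. This self-contained demo (not Composer's own loader; the class name and temporary directory are invented) registers a tiny autoloader and checks get_included_files() before and after the class is first used:

```php
<?php
// Self-contained demo: a class file is only read when the class is first used.
// The class, directory and autoloader are invented for this illustration.
$dir = sys_get_temp_dir() . '/lazy_autoload_demo_' . uniqid();
mkdir($dir, 0777, true);
$classFile = $dir . '/LazyDemo.php';
file_put_contents(
    $classFile,
    "<?php\nclass LazyDemo { public function hi(): string { return 'loaded'; } }\n"
);

// Minimal autoloader mapping a class name to "<dir>/<ClassName>.php".
spl_autoload_register(static function (string $class) use ($dir): void {
    $path = $dir . '/' . $class . '.php';
    if (is_file($path)) {
        require $path;
    }
});

// Has a file named LazyDemo.php been included yet?
$isLoaded = static function (): bool {
    return in_array('LazyDemo.php', array_map('basename', get_included_files()), true);
};

$before   = $isLoaded();            // false: nothing has referenced LazyDemo yet
$greeting = (new LazyDemo())->hi(); // first use triggers the autoloader
$after    = $isLoaded();            // true: the file is now included

var_dump($before, $greeting, $after);

// Tidy up the demo files.
unlink($classFile);
rmdir($dir);
```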
Deleting the files will reduce the disk space needed to store the application, but it would be rare for the space involved to be a significant proportion of what you have available. It would also reduce the bandwidth needed to deploy it, but source code generally compresses well, so once bundled into something like a tar.gz, this saving is generally also insignificant.
A final note which might be relevant is that none of these files should be committed in your version history. You should commit composer.json and composer.lock, and mark the entirety of the vendor directory as "ignored" (e.g. in a .gitignore file). You can then get the exact dependencies used by any version by running composer install, which reads the versions from composer.lock.
Whenever you require third-party packages without deeply reviewing them and track-listing every single file with cryptographically secure content hashes of the review's outcome, you can easily run into the situation you describe.
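As a rough idea of what track-listing files with content hashes could look like, here is a sketch that walks a directory such as vendor/ and records a SHA-256 hash per file. buildManifest is a hypothetical helper, and storing its output next to your review notes is just one possible workflow, not something Composer does for you:

```php
<?php
/**
 * Hypothetical helper: map every file under $root to its SHA-256 hash,
 * keyed by path relative to $root (e.g. "guzzlehttp/guzzle/src/Client.php").
 */
function buildManifest(string $root): array
{
    $root = rtrim($root, '/\\');
    $manifest = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        /** @var SplFileInfo $file */
        if (!$file->isFile()) {
            continue;
        }
        $relative = str_replace('\\', '/', substr($file->getPathname(), strlen($root) + 1));
        $manifest[$relative] = hash_file('sha256', $file->getPathname());
    }
    ksort($manifest, SORT_STRING); // stable order: easy to diff and to review
    return $manifest;
}

// Demo on a throwaway directory; in practice you would point it at vendor/.
$demo = sys_get_temp_dir() . '/manifest_demo_' . uniqid();
mkdir($demo . '/src', 0777, true);
file_put_contents($demo . '/src/a.php', "<?php echo 'a';\n");
file_put_contents($demo . '/README', "hello\n");

$manifest = buildManifest($demo);
echo json_encode($manifest, JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES), PHP_EOL;
```

Re-running it after a composer update and diffing the JSON shows exactly which files changed since the last review.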
The problem with "randomly" deleting such directories is that it does not replace proper dependency checking; it merely prevents the (now removed) code from being loaded onto production systems to be analysed and executed in memory (or, in the case of car.png, used for some other reason).
Between those two poles there is a lot of room, and commonly development projects (and the people running them) aren't important enough for dependencies to actually get reviewed thoroughly, even though most packages come without any warranty of fitness for a particular purpose and carry a disclaimer in very bold letters.
However, if you look into a project and find that some files look fishy, report the issue to the project if you care and it means something to you.
Sometimes projects do not make (extended) use of the dist distribution of a Composer package, and there is room for improvement (it should go without saying that this implies no judgement of any of those projects' practices), e.g. to exclude development resources from production use. This has the benefit that you don't need to remove files "randomly"; instead, you cooperate with the actual developers and software distributors.
Related
Is there any tool out there which could tell the useless files in the code base?
We have a big code base (PHP, HTML, CSS, JS files) and I want to be able to remove the not needed files. Any help would be appreciated.
I'm guessing deleting files and running your phpunit tests is a non-starter.
If your files are not already in a version-control system - add them. Having the files in a version control system (such as svn or git) is crucial to allow you to recover from deleting any files that you thought were not being used but you later find out were.
Then, you can delete anything you think may not be being used, and if it doesn't affect the running of your application you can conclude that the files aren't used. If adverse effects show up - you can restore them from your repository with ease.
The above is (probably) most appropriate for frontend files (CSS, JS, images). Any deleted files that are still requested will show up in your webserver error log, giving you a quick reference for files that no longer exist and that you need to restore.
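If you want to pull the missing files out of the log automatically, a sketch along these lines may help. It assumes Apache-style "File does not exist: /path" messages; the exact wording, and whether missing files are logged at all at your configured log level, vary by server and version, so check your own log first. The helper name is hypothetical:

```php
<?php
/**
 * Hypothetical helper: collect the paths an error log reports as missing.
 * Assumes Apache-style messages containing "File does not exist: /path"
 * (optionally followed by ", referer: ..."); adjust the pattern to whatever
 * your server actually writes.
 */
function missingFilesFromErrorLog(iterable $lines): array
{
    $missing = [];
    foreach ($lines as $line) {
        if (preg_match('/File does not exist: ([^,\s]+)/', $line, $m)) {
            $missing[$m[1]] = true; // array keys de-duplicate repeated requests
        }
    }
    $paths = array_keys($missing);
    sort($paths);
    return $paths;
}

// Made-up sample lines; in practice you might pass
// file('/path/to/error_log', FILE_IGNORE_NEW_LINES) instead.
$sample = [
    '[error] [client 10.0.0.5] File does not exist: /var/www/css/old.css',
    '[error] [client 10.0.0.7] File does not exist: /var/www/img/logo.png, referer: http://example.com/',
    '[error] [client 10.0.0.5] File does not exist: /var/www/css/old.css',
    '[error] [client 10.0.0.9] some unrelated message',
];

print_r(missingFilesFromErrorLog($sample));
```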
For your PHP files it's quite a bit more tricky. How did you arrive at a position where you have PHP files which you aren't using? Anyway, you could for example:
Use Xdebug
Enable profiling
Use append mode (one profile)
Use all the functions of your application
and you would then have a profile which includes all files you loaded. Scanning the generated profile for each php file in your codebase will give you some indication of which files you didn't use.
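Scanning the profile can be automated roughly like this. The format details (fl=/cfl= lines, and the "(n)" name compression Xdebug uses) are my understanding of the cachegrind output, so check them against one of your own profile files; the helper name is hypothetical:

```php
<?php
/**
 * Hypothetical helper: list the source files named in an Xdebug profile
 * (cachegrind format). Handles both "fl=/path/file.php" and the compressed
 * "fl=(12) /path/file.php" form; a bare back-reference such as "fl=(12)"
 * carries no path and is skipped, as are internal functions.
 */
function filesInProfile(iterable $lines): array
{
    $files = [];
    foreach ($lines as $line) {
        // fl=, fi=, fe= and cfl= (older profiles: cfi=) lines name source files.
        if (preg_match('/^c?f[lie]=(?:\(\d+\)\s+)?([^(].*)$/', rtrim($line), $m)
            && $m[1] !== 'php:internal') {
            $files[$m[1]] = true;
        }
    }
    $paths = array_keys($files);
    sort($paths);
    return $paths;
}

// Made-up excerpt; in practice: file('/path/to/cachegrind.out.12345', FILE_IGNORE_NEW_LINES)
$profile = [
    'version: 1',
    'fl=(1) /var/www/index.php',
    'fn=(1) {main}',
    'cfl=(2) /var/www/lib/Db.php',
    'fl=(1)',
    'fl=(3) php:internal',
    'fl=/var/www/lib/Util.php',
];

$used = filesInProfile($profile);
print_r($used);
// Comparing $used with a list of every .php file in the codebase gives the
// files that were never loaded while profiling.
```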
If you are only looking for unused files, don't be tempted to use code coverage analysis - it is very intensive and not the level of detail you're asking for.
A slightly less risky way would be to log whenever a file is loaded, e.g. put this on the first line of each file:
<?php file_put_contents('/some/location/fileaccess.log', __FILE__ . PHP_EOL, FILE_APPEND | LOCK_EX); ?>
and simply leave your application to be used for a while (days, weeks). Then scan that log: for any file that is named, remove the above line of code; for any that is not, delete it (preferably after searching for the filename in your whole source code and confirming it's referenced nowhere).
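The scanning step could be automated with something like the following. It assumes the log holds one full path (the value of __FILE__) per line; unusedPhpFiles is a hypothetical helper, and anything it reports is only a candidate to check by hand:

```php
<?php
/**
 * Hypothetical helper: PHP files under $root whose full path never appears
 * in the access log (assumed to hold one __FILE__ path per line).
 * The result is a list of candidates to investigate, not proof of disuse.
 */
function unusedPhpFiles(string $root, string $logFile): array
{
    $seen = [];
    if (is_file($logFile)) {
        foreach (file($logFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $path) {
            $seen[$path] = true;
        }
    }

    $unused = [];
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        /** @var SplFileInfo $file */
        if ($file->isFile() && $file->getExtension() === 'php'
            && !isset($seen[$file->getRealPath()])) {
            $unused[] = $file->getRealPath();
        }
    }
    sort($unused);
    return $unused;
}

// Demo: two PHP files, only one of which shows up in the log.
$root = sys_get_temp_dir() . '/unused_demo_' . uniqid();
mkdir($root, 0777, true);
file_put_contents($root . '/used.php', "<?php\n");
file_put_contents($root . '/old.php', "<?php\n");
$log = $root . '/fileaccess.log';
file_put_contents($log, realpath($root . '/used.php') . PHP_EOL);

print_r(unusedPhpFiles($root, $log));
```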
Or you could use a shutdown function which dumps the return value of get_included_files() to a log file. This would let you achieve the same result without editing every PHP file in your source tree.
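A minimal sketch of the shutdown-function variant. It assumes the code sits in a file that every request loads, for example one configured via the auto_prepend_file ini setting; the helper name and log path are placeholders:

```php
<?php
/**
 * Hypothetical helper: append every file included during this request to a log.
 * Meant to live in one file loaded for every request (for example via the
 * auto_prepend_file ini setting), so no other PHP file needs editing.
 */
function logIncludedFiles(string $logFile): void
{
    $lines = implode(PHP_EOL, get_included_files()) . PHP_EOL;
    // LOCK_EX stops lines from concurrent requests interleaving.
    file_put_contents($logFile, $lines, FILE_APPEND | LOCK_EX);
}

// Runs after the script has finished, when every included file is known.
$includedLog = sys_get_temp_dir() . '/included_files.log';
register_shutdown_function('logIncludedFiles', $includedLog);
```

Since every request appends its full list, de-duplicating the log (for example with array_unique() or sort -u) gives the set of files that were ever used.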
Caveat: Be careful deleting your php files. Whereas a missing css/js/image will probably mean your application still works, a missing php file of course will have rather more impact :).
If it is in Git, why not delete the local file and then run git rm <file name> to remove it from that branch?
Agree with everything said by @AD7six.
What you might like to try with PHP is to log the use of the files in some way (logging to a flat file or a database).
This technique does not have to be in place for long; you can do it with an include (or require_once) at the top of each file.
The same technique also works for JavaScript functions: just print each function to the console, then unit test your site. You can probably clean out a lot of redundant code that way.
The rest is not so easy, but version tracking is the way to go.
Recently I've read this article: http://www.smashingmagazine.com/2009/09/25/svn-strikes-back-a-serious-vulnerability-found/
Developers of many popular sites like apache.org, php.net (http://ru2.php.net/.svn/entries), classmates.com and the Russian Yandex use SVN, but do not follow the recommendation given by SVN (to use the export command).
So, what are the reasons for not using svn export instead of updating the public working copy, as they all do?
Some people, not including myself, think that to deploy onto production you should just issue an svn up. If you do an export, you lose the versioning metadata, so you can't do that and have to use another mechanism for tracking which version is where. It is an easy solution, but I think it can make for lazy packaging and also for "fixing in production", since if you do this you can also check changes back in from production...
From my perspective, what I do is lock off/block access to any .svn files on the server (either Apache2 or IIS). This way the hidden folders are not accessible externally, and it allows version tracking for the sites we run that do not require compiling before rollout.
Languages like:
PHP
ASP (not .NET)
Plain HTML
ColdFusion
PDF / IMAGE versioning (if needed, in my case we needed it for updated PDF docs for customers).
So certainly you can use SVN for web development, but you do need to be cautious, because you will expose your .svn folders to the world if you are not careful. Otherwise it is a tool that can make your job easier and more efficient.
With that said, we simply run an svn update on our production server to update changed files, and since only a few developers work on one piece of code at a time (as in my case), we don't get mix-ups with the wrong things getting deployed. Plus, to be safe, always do an SVN "Check for Modifications" first to see what is going to be updated, and if you do make a mistake, roll it back.
With svn export, files can never get deleted, only added and modified. This can sometimes be an issue.
When the entire website is open source and available for download from a public resource (like PHP's), protecting the .svn directories so others can't get the source code is probably not worth the effort compared with simply doing an svn up.
I am developing (as a solo web developer) a rather large web-based system which needs to run at various different locations. Unfortunately, because some clients are on dial-up, we have had to do this rather than have one central server for them all. Each client is part of our VPN, and those on dial-up/ISDN get dialed on demand from our Cisco router. All clients are accessible within a matter of seconds.
I was wondering what the best way to release an update to all these clients at once would be. Automation would be great, as there are 23+ locations to deploy the system to, each of which is used on a very regular basis. Because of this, when deploying, I need to display an 'updating' page so that the clients don't try to access the system while the update is partially complete.
Any thoughts on what would be the best solution?
EDIT: Found FileSyncTask which allows me to rsync with Phing. Going to use that.
There's also a case here for maintaining a "master" code repository (in SVN, CVS or maybe Git). This isn't your standard "keep editions of your code in the repo and allow rollbacks" repository; this repo holds your current production code (only). Once an update is ready, you check the working updated code into the master repo. All of your servers check the repo on a scheduled basis to see if it has changed, downloading the new code if a change is found. That check process could even include turning on the maintenance.php file (that symcbean suggested) before starting the repo download and removing the file once the download is complete.
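The scheduled check could be sketched like this. The two callbacks are placeholders for however your servers query and fetch the master repository, and the state file, lock content and function name are all assumptions:

```php
<?php
/**
 * Hypothetical cron job for one client server. $remoteRevision and $download
 * stand in for however you query and fetch the master repository
 * (svn, git, rsync...); they are placeholders, not a real API.
 */
function checkForUpdate(
    string $webRoot,
    string $stateFile,
    callable $remoteRevision,
    callable $download
): bool {
    $local  = is_file($stateFile) ? trim((string) file_get_contents($stateFile)) : '';
    $remote = (string) $remoteRevision();
    if ($remote === $local) {
        return false; // nothing new
    }

    // Turn on the maintenance page (the file the auto-prepend script checks for).
    $lock = $webRoot . '/maintenance.php';
    file_put_contents($lock, "<p>The system is being updated. Please try again shortly.</p>\n");

    // If $download throws, the lock is deliberately left in place so users
    // never see a half-updated system.
    $download($remote);

    file_put_contents($stateFile, $remote . PHP_EOL);
    unlink($lock);
    return true;
}

// Demo with stand-in callbacks; the state file lives outside the web root.
$webRoot = sys_get_temp_dir() . '/update_demo_' . uniqid();
mkdir($webRoot, 0777, true);
$stateFile = $webRoot . '.deployed-revision';
$downloads = [];
$fetch = function (string $revision) use (&$downloads, $webRoot): void {
    // Record the revision and whether the lock was active during the download.
    $downloads[] = [$revision, is_file($webRoot . '/maintenance.php')];
};

var_dump(checkForUpdate($webRoot, $stateFile, fn() => 'r42', $fetch)); // bool(true): updated
var_dump(checkForUpdate($webRoot, $stateFile, fn() => 'r42', $fetch)); // bool(false): up to date
```

Leaving the lock in place when the download fails is a deliberate choice: a site stuck on the maintenance page is easier to notice and fix than a half-updated one.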
At the company I work for, we work with huge web-based systems which are both Java and PHP. For all systems we have our development environments and production environments.
This company has over 200 developers, so I guess you can imagine the size of the products we develop.
What we have done is use Ant and RPM build archives to create deployment packages. This is done quite easily. I haven't done this myself, but it might be worth looking into.
Because we use Linux systems, we can easily deploy RPM packages; the setup scripts within an RPM package can make sure everything gets to the correct place. You also get more proper version handling and a proper release process.
Hope this helped you.
Br,
Paul
There are two parts to this; let's deal with the simple one first:
I need to display a 'updating' page
If you need to disable the entire site while maintaining transactional integrity, and publish a message to users from the server being updated, then the only practical way to do this is via auto_prepend_file. This needs to be configured in advance (note: I believe this can be done in a .htaccess file without having to restart the webserver for a new PHP config):
<?php
// Auto-prepended to every request: if the lock file exists, show it and stop.
if (file_exists($_SERVER['DOCUMENT_ROOT'] . '/maintenance.php')) {
    include_once($_SERVER['DOCUMENT_ROOT'] . '/maintenance.php');
    exit;
}
Then just drop maintenance.php into your webroot and that file will be displayed instead of the expected one. Note that it should probably include a session_start() and an auto-refresh so that the session does not expire. You might want to extend the above to allow a grace period in which POSTs are still processed, e.g. by adding a second PHP file.
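One possible shape for that grace-period extension, as a sketch: the second flag file's name (maintenance_grace.php) and the helper are invented, and the commented block shows how the prepended file might use it together with the session_start()/auto-refresh suggestion:

```php
<?php
/**
 * Hypothetical extension of the auto-prepended check: serve the maintenance
 * page unless a second flag file (name invented here) is present and the
 * request is a POST, so in-flight form submissions can still complete.
 */
function shouldShowMaintenance(string $docRoot, string $method): bool
{
    if (!file_exists($docRoot . '/maintenance.php')) {
        return false;
    }
    $graceActive = file_exists($docRoot . '/maintenance_grace.php');
    return !($graceActive && strtoupper($method) === 'POST');
}

// In the prepended file this could replace the plain file_exists() check:
//
// if (shouldShowMaintenance($_SERVER['DOCUMENT_ROOT'], $_SERVER['REQUEST_METHOD'] ?? 'GET')) {
//     http_response_code(503);   // "temporarily unavailable" for clients and crawlers
//     header('Refresh: 60');     // the browser reloads the page periodically...
//     session_start();           // ...and each reload should keep the session alive
//     include $_SERVER['DOCUMENT_ROOT'] . '/maintenance.php';
//     exit;
// }
```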
In terms of deploying to remote sites, I'd recommend using rsync over ssh to copy content files, invoked from a controlling script which:
Applies the lock file(s) as shown above
Runs rsync to replicate files
Runs any database deployment script
Removes the lock file(s)
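A sketch of such a controlling script in PHP. Host names, paths and the database command are placeholders; rsync's -a, -z, --delete, --exclude and -e ssh options are standard, but check them against your rsync version. Passing the command runner in as a callable makes a dry run easy:

```php
<?php
/**
 * Hypothetical controlling script for one remote site, following the four
 * steps above. Hosts, paths and the database command are placeholders.
 * $run executes one shell command and returns its exit code, so a recorder
 * can be passed in for a dry run.
 */
function deploySite(
    string $host,
    string $maintenancePage,
    string $localDir,
    string $remoteRoot,
    string $dbCommand,
    callable $run
): void {
    $ssh  = static fn(string $remoteCmd): string =>
        'ssh ' . escapeshellarg($host) . ' ' . escapeshellarg($remoteCmd);
    $lock = rtrim($remoteRoot, '/') . '/maintenance.php';

    $steps = [
        // 1. apply the lock file
        'scp ' . escapeshellarg($maintenancePage) . ' ' . escapeshellarg("$host:$lock"),
        // 2. replicate files; the lock is excluded so --delete leaves it alone
        'rsync -az --delete --exclude=/maintenance.php -e ssh '
            . escapeshellarg(rtrim($localDir, '/') . '/') . ' '
            . escapeshellarg($host . ':' . rtrim($remoteRoot, '/') . '/'),
        // 3. run the database deployment script
        $ssh($dbCommand),
        // 4. remove the lock file
        $ssh('rm -f ' . escapeshellarg($lock)),
    ];

    foreach ($steps as $cmd) {
        if ($run($cmd) !== 0) {
            // Stop here: if the lock was applied, it stays in place.
            throw new RuntimeException("Deployment step failed: $cmd");
        }
    }
}

// Real runner: shows the command's output and returns its exit code.
$shellRun = static function (string $cmd): int {
    passthru($cmd, $exitCode);
    return $exitCode;
};

// Dry run that only records the commands.
$commands = [];
$dryRun = static function (string $cmd) use (&$commands): int {
    $commands[] = $cmd;
    return 0;
};
deploySite('client07.vpn', '/srv/deploy/maintenance.php', '/srv/build/site',
    '/var/www/site', 'php /var/www/site/deploy/migrate.php', $dryRun);
echo implode(PHP_EOL, $commands), PHP_EOL;
```

The script stops at the first failing step, so a broken rsync or migration leaves the lock file in place rather than exposing a half-deployed site.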
If each site has a different setup, then I'd recommend either managing the site-specific stuff via a hierarchy of include paths, or even maintaining a complete image of each site locally.
C.
I'm writing a CMS in PHP+MySQL. I want it to be self-updatable (through one click in the admin panel). What are the best practices?
How do I compare the current version of the CMS with the version of the update (both the application itself and the database)? Should it just download a zip archive, unzip it and overwrite files? (But what to do with files that are no longer used?) How do I check that an update was downloaded correctly? The CMS also supports modules, and I want these modules to be downloadable from its admin panel.
And how should I update MySQL tables?
Keep your code in a separate location from configuration and otherwise variable files (uploaded images, cache files, etc.)
Keep the modules separate from the main code as well.
Make sure your code has file system permissions to change itself (use SuPHP for example).
If you do these, simplest would be to completely download the new version (no incremental patches), and unzip it to a directory adjacent to the one containing the current version. Because there won't be variable files inside the code directory, you can just remove or rename the old one and rename the new one to replace it.
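A sketch of the download-then-swap idea, assuming the zip extension is available and that the code directory really contains no user data. The helper names are hypothetical:

```php
<?php
/**
 * Hypothetical helpers for "unzip next to the current version, then swap".
 * Assumes the code directory holds no user data (uploads, config, cache).
 */
function extractRelease(string $zipFile, string $targetDir): void
{
    $zip = new ZipArchive(); // needs the zip extension
    if ($zip->open($zipFile) !== true) {
        throw new RuntimeException("Cannot open $zipFile");
    }
    $ok = $zip->extractTo($targetDir);
    $zip->close();
    if (!$ok) {
        throw new RuntimeException("Cannot extract $zipFile to $targetDir");
    }
}

/** Move the current version aside and the new one into its place. */
function swapInRelease(string $currentDir, string $newDir): string
{
    $backupDir = $currentDir . '.old-' . date('YmdHis');
    if (!rename($currentDir, $backupDir)) {
        throw new RuntimeException("Cannot move $currentDir aside");
    }
    if (!rename($newDir, $currentDir)) {
        rename($backupDir, $currentDir); // put the old version back
        throw new RuntimeException("Cannot move $newDir into place");
    }
    return $backupDir; // keep for a rollback, delete once the update is confirmed
}

// Demo with two tiny "versions" (normally cms.new would come from extractRelease()).
$base = sys_get_temp_dir() . '/swap_demo_' . uniqid();
mkdir($base . '/cms', 0777, true);
mkdir($base . '/cms.new', 0777, true);
file_put_contents($base . '/cms/version.txt', '1.0');
file_put_contents($base . '/cms.new/version.txt', '1.1');

$backup = swapInRelease($base . '/cms', $base . '/cms.new');
echo file_get_contents($base . '/cms/version.txt'), PHP_EOL; // 1.1
```

Between the two rename() calls the code directory briefly does not exist; if that matters, serving the site through a symlink and switching the link instead avoids the gap.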
You can keep the version number in a global constant in the code.
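For the comparison itself, PHP's built-in version_compare() handles dotted version strings, which a plain string comparison gets wrong ("1.10.0" would sort before "1.9.0"). The constant and function names below are hypothetical:

```php
<?php
// Hypothetical constant; bump it with every release.
const CMS_VERSION = '1.4.2';

/** True if $available is newer than the installed version. */
function updateAvailable(string $available, string $installed = CMS_VERSION): bool
{
    // version_compare() compares dotted versions part by part: 1.10.0 > 1.9.0.
    return version_compare($available, $installed, '>');
}

var_dump(updateAvailable('1.10.0')); // bool(true)
var_dump(updateAvailable('1.4.2'));  // bool(false): same version
```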
As for MySQL, there's no other way than making an upgrade script for every version that changes the DB layout. Even automatic solutions to change the table definition can't know how to update the existing data.
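A sketch of the "one upgrade script per version" approach, assuming each script is simply a list of SQL statements keyed by the version that introduced it. The helpers, tables and columns are made up; the UPDATE step illustrates the kind of data conversion that automatic schema changes can't infer:

```php
<?php
/**
 * Hypothetical layout: one upgrade script (a list of SQL statements) per
 * version that changed the schema, keyed by that version.
 * Returns the versions to run, oldest first, to go from $installed to $target.
 */
function pendingUpgrades(array $upgrades, string $installed, string $target): array
{
    $pending = array_values(array_filter(
        array_map('strval', array_keys($upgrades)),
        static fn(string $v): bool =>
            version_compare($v, $installed, '>') && version_compare($v, $target, '<=')
    ));
    usort($pending, 'version_compare'); // "1.10.0" sorts after "1.3.0"
    return $pending;
}

/** Run the pending scripts through $exec (for example [$pdo, 'exec']). */
function runUpgrades(array $upgrades, string $installed, string $target, callable $exec): array
{
    $versions = pendingUpgrades($upgrades, $installed, $target);
    foreach ($versions as $version) {
        foreach ($upgrades[$version] as $sql) {
            $exec($sql);
        }
        // A real upgrader would record $version as applied at this point,
        // so a failure part-way through can resume from the right place.
    }
    return $versions;
}

// Table and column names are made up for the example.
$upgrades = [
    '1.2.0'  => ['ALTER TABLE pages ADD COLUMN slug VARCHAR(255)'],
    '1.10.0' => ['CREATE TABLE modules (id INT PRIMARY KEY, name VARCHAR(100))'],
    '1.3.0'  => ["UPDATE pages SET slug = LOWER(REPLACE(title, ' ', '-'))"],
];

$executed = [];
$ran = runUpgrades($upgrades, '1.2.0', '1.10.0', function (string $sql) use (&$executed): void {
    $executed[] = $sql; // stand-in for a real database call
});
print_r($ran); // 1.3.0, then 1.10.0
```

MySQL implicitly commits around most DDL statements such as ALTER TABLE, so wrapping an upgrade in a transaction won't make it atomic; a backup taken beforehand remains the real safety net.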
A slightly more experimental solution could be to use something like the phpsvnclient library.
With features:
List all files in a given SVN repository directory
Retrieve a given revision of a file
Retrieve the log of changes made in a repository or in a given file between two revisions
Get the repository latest revision
This way you can see if there are new files, removed files or updated files and only change those in your local application.
I reckon this will be a little harder to implement, but the benefit would probably be that it is easier and quicker to add updates to your CMS.
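The "new, removed or updated" comparison itself doesn't depend on phpsvnclient. This sketch (not phpsvnclient's API) assumes you have, for each side, a map of relative path to a hash or revision; the helper name is hypothetical and the values in the demo are placeholders:

```php
<?php
/**
 * Hypothetical helper: compare two manifests (relative path => hash or revision),
 * e.g. the installed version against the one on the update server.
 */
function diffManifests(array $old, array $new): array
{
    $changed = [];
    foreach (array_intersect_key($new, $old) as $path => $hash) {
        if ($old[$path] !== $hash) {
            $changed[] = $path;
        }
    }
    return [
        'added'   => array_keys(array_diff_key($new, $old)),
        'removed' => array_keys(array_diff_key($old, $new)),
        'changed' => $changed,
    ];
}

// Placeholder hashes, just to show the shape of the result.
$installed = ['index.php' => 'aa11', 'lib/old.php' => 'bb22', 'lib/db.php' => 'cc33'];
$available = ['index.php' => 'aa11', 'lib/db.php' => 'dd44', 'lib/cache.php' => 'ee55'];

print_r(diffManifests($installed, $available));
```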
You have two scenarios to deal with:
The web server can write to files.
The web server can not write to files.
This just dictates whether you will be decompressing a ZIP file or using FTP to update the files. In either case, your first step is to take a dump of the database and a backup of the existing files, so that the user can roll back if something goes horribly wrong. As others have said, it's important to keep anything that the user will likely customize out of the scope of the update. WordPress does this nicely. If a user has made changes to core logic code, they are likely smart enough to resolve any merge conflicts on their own (and smart enough to know that a one-click upgrade is probably going to lose their modifications).
Your second step is to make sure that your script doesn't die if the browser is closed. This is a process that really should not be interrupted. You could accomplish this via ignore_user_abort(true) or some other means. Or, if you like, allow the user to check a box that says "Keep going even if I get disconnected". I'm assuming that you'll be handling errors internally.
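A minimal sketch of the "keep going" part. ignore_user_abort() and set_time_limit() are standard PHP functions; the wrapper name is hypothetical, and a web server or proxy timeout may still cut the request short, so very long updates may be better run from a CLI process:

```php
<?php
/**
 * Hypothetical wrapper: call it at the start of the update request so that
 * closing the browser tab does not abort the update half-way.
 */
function prepareForLongUpdate(): void
{
    ignore_user_abort(true); // keep running after the client disconnects
    set_time_limit(0);       // no max_execution_time limit for this request
}

prepareForLongUpdate();
var_dump(ignore_user_abort()); // int(1): disconnects are now ignored
```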
Now, depending on permissions, you can either:
Compress the files to be updated to the system /tmp directory
Compress the files to be updated to a temporary file in the home directory
Then you are ready to:
Download and decompress the update in situ, or in place.
Download and decompress the update to the system's /tmp directory and use FTP to update the files in the web root
You can then:
Apply any SQL changes as needed
Ask the user if everything went OK
Roll back if things went badly
Clean up your temp directory in the system /tmp directory, or any staging files in the user's web root / home directory.
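The most important aspect is making sure you can roll back changes if things go badly. The other thing to ensure is that if you use /tmp, you check the permissions of your staging area: 0700 for the staging directory itself (a directory needs the execute bit to be entered) and 0600 for the files in it should do nicely.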
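A sketch of creating such a staging area. Note that a directory needs the execute bit to be entered, so for the directory itself this means 0700, with 0600 for the files inside it. The helper name and prefix are hypothetical, and the permission check only applies on POSIX systems:

```php
<?php
/**
 * Hypothetical helper: create a private staging directory under the system
 * temp dir. A random name means nobody can pre-create it, and mkdir() fails
 * rather than reusing something that already exists.
 */
function createStagingDir(string $prefix = 'cms_update_'): string
{
    $dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . $prefix . bin2hex(random_bytes(8));
    if (!mkdir($dir, 0700)) {
        throw new RuntimeException("Cannot create $dir");
    }
    chmod($dir, 0700); // mkdir()'s mode is reduced by the umask; make it explicit
    if (DIRECTORY_SEPARATOR === '/' && (fileperms($dir) & 0777) !== 0700) {
        throw new RuntimeException("Unexpected permissions on $dir");
    }
    return $dir;
}

$staging = createStagingDir();
// Files written into it can be restricted to owner read/write only.
file_put_contents($staging . '/update.zip', 'placeholder');
chmod($staging . '/update.zip', 0600);
printf("%s %o\n", $staging, fileperms($staging) & 0777); // path, then 700 on POSIX systems
```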
Take a look at how WordPress and others do it. If your choice of license and theirs agree, you might even be able to reuse some of that code.
Good luck with your project.
There is a SQL library called SQLOO (that I created) that attempts to solve this problem. It's a little rough still, but the basic idea is that you set up the SQL schema in PHP code and then SQLOO changes the current database schema to match the code. This allows the SQL schema and the attached PHP code to be changed together and in much smaller chunks.
http://code.google.com/p/sqloo/
http://code.google.com/p/sqloo/source/browse/#svn/trunk/example <- examples
Based on experience with a number of applications, CMS and otherwise, this is a common pattern:
Upgrades are generally one-way. It's possible to take a snapshot of full system state for a restore upon failure, but to restore usually entails losing any data/content/logs added to the system since the upgrade. Performing an incremental rollback can put data at risk if something were not converted properly (e.g. database table changes, content conversions, foreign key constraints, index creation, etc.) This is especially true if you've made customizations that rollback scripts couldn't possibly account for.
Upgrade files are packaged with some means of authentication/verification, such as md5 or sha1 hashes and/or digital signature to ensure it came from a trusted source and was not tampered. This is particularly important for automated upgrade processes. Suppose a hacker exploited a vulnerability and told it to upgrade from a rogue source.
Application should be in an offline mode during the upgrade.
Application should perform a self-check after an upgrade.
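For the verification point, a minimal sketch assuming the expected SHA-256 hash reaches you over a trusted channel (for example published separately over HTTPS); a hash downloaded from the same place as the package proves little. SHA-256 is used here instead of md5/sha1, which are no longer considered safe against deliberate tampering; for digital signatures, PHP's openssl_verify() is the usual building block. The function name is hypothetical:

```php
<?php
/**
 * Hypothetical check for a downloaded update package against a SHA-256 hash
 * obtained over a trusted channel.
 */
function verifyPackage(string $file, string $expectedSha256): bool
{
    $actual = hash_file('sha256', $file);
    // hash_equals() compares in constant time; both values are lowercase hex.
    return $actual !== false && hash_equals(strtolower($expectedSha256), $actual);
}

// Demo with a fake package.
$package = sys_get_temp_dir() . '/update_demo_' . uniqid() . '.zip';
file_put_contents($package, 'fake archive bytes');
$published = hash('sha256', 'fake archive bytes'); // pretend this came from the vendor

var_dump(verifyPackage($package, $published)); // bool(true)
file_put_contents($package, 'tampered bytes');
var_dump(verifyPackage($package, $published)); // bool(false)
```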
I agree with Bart van Heukelom's answer, it's the most usual way of doing it.
The only other option would be to turn your CMS into a bunch of remote Web Services/scripts and external CSS/JS files that you host in one location only.
Then everyone using your CMS would connect to your central "CMS server" and all that would be on their (calling) server is a bunch of scripts to call your Web Services/scripts that do all the processing and output. If you went down this route you'd need to identify/authenticate each request so that you returned the corresponding data for the given CMS user.
At my company we have a group of 8 web developers for our business web site (entirely written in PHP, but that shouldn't matter). Everyone in the group is working on different projects at the same time, and whenever they're done with their task, they immediately deploy it (because business moves fast these days).
Currently the development happens on one shared server with all developers working on the same code base (using RCS to "lock" files away from others). When deployment is due, the changed files are copied over to a "staging" server and then a sync script uploads the files to our main webserver from where it is distributed over to the other 9 servers.
Quite happily, the web dev team asked us for help in improving the process (after we had complained for a while), and now our idea for setting up their dev environment is as follows:
A dev server with virtual directories, so that everybody has their own codebase
SVN (or any other VCS) to keep track of changes
a central server for testing holding the latest checked in code
The question now is: how do we deploy the changed files to the server without accidentally uploading bugs from other projects? My first idea was to simply export the latest revision from the repository, but that would not give full control over the files.
How do you manage such a situation? What kind of deployment scripts do you have in action?
(As a special challenge: the website has organically grown over the last 10 years, so the projects are not split up in small chunks, but files for one specific feature are spread all over the directory tree.)
Cassy - you obviously have a long way to go before you'll get your source code management entirely in order, but it sounds like you are on your way!
Having individual sandboxes will definitely help on things. Next then make sure that the website is ALWAYS just a clean checkout of a particular revision, tag or branch from subversion.
We use git, but we have a similar setup. We tag a particular version with a version number (in git we also get to add a description to the tag; good for release notes!) and then we have a script that anyone with access to "do a release" can run that takes two parameters -- which system is going to be updated (the datacenter and if we're updating the test or the production server) and then the version number (the tag).
The script uses sudo to run the release script in a shared account. It does a checkout of the relevant version, minifies JavaScript and CSS[1], pushes the code to the relevant servers for the environment and then restarts what needs to be restarted. The last line of the release script connects to one of the webservers and tails the error log.
On our websites we include an HTML comment at the bottom of each page with the current server name and the version -- makes it easy to see "What's running right now?"
[1] And a bunch of other housekeeping tasks like that...
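The version comment could be produced along these lines. The constant is a placeholder for wherever your release script records the deployed tag, and the function name is hypothetical:

```php
<?php
// Hypothetical: the release script writes the deployed tag into this constant
// (or a small generated file) when it checks out that tag.
const RELEASE_VERSION = 'v1.42.0';

/** HTML comment saying which server and release produced this page. */
function buildInfoComment(): string
{
    $host = gethostname() ?: 'unknown-host';
    // Collapse runs of "-" so the text can never contain "--" and end the comment early.
    $text = preg_replace('/-{2,}/', '-', $host . ' ' . RELEASE_VERSION);
    return '<!-- ' . $text . ' -->';
}

echo buildInfoComment(), PHP_EOL; // prints something like <!-- web03 v1.42.0 -->
```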
You should consider using branching and merging for individual projects (on the same codebase), if they make huge changes to the shared codebase.
We usually have a local dev environment (meaning a local webserver) for testing uncommitted code (you don't want to commit non-functioning code at all), but that dev environment could even be on a separate server using shared folders.
However, committed code should be deployed to a staging server for testing before putting it into production.
You can probably use Capistrano; even though it is aimed more at Ruby, there are some articles that describe how to use it for PHP.
I had read that Phing can be used with CVS but not with SVN; however, Phing does provide Subversion tasks too (for example SvnExportTask).
There are also some projects around that mimic Capistrano but are written in PHP.
Otherwise there is also a custom-made solution:
tag the files you want to deploy
check out the files using the tag into a specific directory
symlink the directory to the current document root (easy to roll back to the previous version)
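The symlink step can be made atomic, so requests never see a missing document root. A sketch for POSIX systems, with a hypothetical helper name: it creates the new link under a temporary name and rename()s it over the old one; rolling back is the same call with the previous release directory:

```php
<?php
/**
 * Hypothetical helper: point the document-root symlink at $releaseDir.
 * The new link is created under a temporary name and then renamed over the
 * old one; on POSIX filesystems rename() replaces it atomically.
 */
function activateRelease(string $releaseDir, string $docrootLink): void
{
    $tmpLink = $docrootLink . '.tmp-' . bin2hex(random_bytes(4));
    if (!symlink($releaseDir, $tmpLink)) {
        throw new RuntimeException("Cannot create symlink $tmpLink");
    }
    if (!rename($tmpLink, $docrootLink)) {
        unlink($tmpLink);
        throw new RuntimeException("Cannot switch $docrootLink to $releaseDir");
    }
}

// Demo: two checked-out releases and a "current" link that the web server
// would use as its document root.
$base = sys_get_temp_dir() . '/releases_demo_' . uniqid();
mkdir($base . '/r100', 0777, true);
mkdir($base . '/r101', 0777, true);

activateRelease($base . '/r100', $base . '/current');
activateRelease($base . '/r101', $base . '/current'); // deploy
echo readlink($base . '/current'), PHP_EOL;           // ends in /r101
activateRelease($base . '/r100', $base . '/current'); // roll back
```

PHP's realpath cache and OPcache may keep resolving the old target briefly after the switch, so it's worth checking how your setup behaves.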
Naturally check out SVN for the repository, Trac to track things, and Apache Ant to deploy.
The basic process is managing the code in Subversion, tracking the repository and developers in Trac, and using Ant deployment scripts to push your site out with the settings needed. Ant allows you to easily deploy a project to a specific location (dev/test/prod, etc.).
You need to look at:
Continuous Integration
Running unit tests on check-in of code to check that it passes
Potentially rejecting code if its tests reveal a bug
Having nightly builds
Releasing only the last build that passed all its tests
You may not get to a perfect solution, especially not at first, but the more you use your chosen solution, the more comfortable everyone will get and be able to make suggestions on improving it.
We check stability with Ant every night, and use an Ant script to deploy. It is very easy to configure and use.
I gave a similar answer yesterday to another question. Basically you can work in branches and integrate before going live.
The biggest thing you will have to get your head round is that you are dealing with changes to files, rather than individual files. Once you have branches there isn't really a current version there are just versions with different changes in.