What are best practices for self-updating PHP+MySQL applications?

It is pretty standard practice now for desktop applications to be self-updating. On the Mac, every non-Apple program that uses Sparkle is, in my book, an instant win. For Windows developers, this has already been discussed at length. I have not yet found information on self-updating web applications, and I hope you can help.
I am building a web application that is meant to be installed like Wordpress or Drupal - unzip it in a directory, hit some install page, and it's ready to go. In order to have broad server compatibility, I've been asked to use PHP and MySQL -- is that LAMP? In any event, it has to be broadly cross-platform. For context, this is basically a unified web messaging application for small businesses. It's not another CMS platform; think webmail.
I want to know about self-updating web applications. First of all, (1) is this a bad idea? As of Wordpress 2.7 the automatic update is a single button, which seems easy, and yet I can imagine so many ways this could go terribly, terribly wrong. Also, isn't making the web files writable by the web process itself a security hole?
(2) Is it worth the development time? There are probably millions of WP installs in the world, so it's probably worth the time it took the WP team to make it easy, saving millions of man hours worldwide. I can only imagine a few thousand installs of my software -- is building self-upgrade worth the time investment, or can I assume that users sophisticated enough to download and install web software in the first place could go through an upgrade checklist?
If it's not a security disaster or waste of time, then (3) I'm looking for suggestions from anyone who has done it before. Do you keep a version table in your database? How do you manage DB upgrades? What method do you use for rolling back a partial upgrade in the context of a self-updating web application? Did using an ORM layer make it easier or harder? Do you keep a delta of version changes or do you just blow out the whole thing every time?
I appreciate your thoughts on this.

Frankly, it really does depend on your userbase. There are tons of PHP applications that don't automatically upgrade themselves. Their users are either technical enough to handle the upgrade process, or just don't upgrade.
I propose two steps:
1) Seriously ask yourself what your users are likely to really need. Will self-updating provide enough of a boost to adoption to justify the additional work? If you're confident the answer is yes, just do it.
Since you're asking here, I'd guess that you don't know yet. In that case, I propose step 2:
2) Release version 1.0 without the feature. Wait for user feedback. Your users may immediately cry for a simpler upgrade process, in which case you should prioritize it. Alternately, you may find that your users are much more concerned with some other feature.
Guessing at what your users want without asking them is a good way to waste a lot of development time on things people don't actually need.

I've been thinking about this lately in regards to database schema changes. At the moment I'm digging into WordPress to see how they've handled database changes between revisions. Here's what I've found so far:
$wp_db_version is loaded from wp-includes/version.php. This variable corresponds to a Subversion revision number, and is updated when wp-admin/includes/schema.php is changed. (Possibly through a hook? I'm not sure.) When wp-admin/admin.php is loaded, the WordPress option named db_version is read from the database. If this number is not equal to $wp_db_version, wp-admin/upgrade.php is loaded.
wp-admin/includes/upgrade.php includes a function called dbDelta(). dbDelta() scans $wp_queries (a string of SQL queries that will create the most recent database schema from scratch) and compares it to the schema in the database, altering the tables as necessary so that the schema is brought up-to-date.
upgrade.php then runs a function called upgrade_all() which runs specific upgrade_NNN() functions if the installed database version is less than target values. (e.g. upgrade_250(), the WordPress 2.5.0 upgrade, will be run if the database version is less than 7499.) Each of these functions runs its own data migration and population procedures, some of which are also called during the initial database setup script. This nicely cuts down on duplicate code.
So, that's one way to do it.
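Translated into a minimal sketch (with a hypothetical options table and hypothetical migrations, not WordPress's actual code), that version-gated migration pattern looks roughly like this:

<?php
// Hypothetical sketch of the version-gated migration pattern described above.
// APP_DB_VERSION ships with the code; the installed version lives in the DB.
define('APP_DB_VERSION', 3);

function maybe_upgrade(PDO $db): void {
    $installed = (int) $db->query(
        "SELECT value FROM options WHERE name = 'db_version'"
    )->fetchColumn();

    if ($installed >= APP_DB_VERSION) {
        return; // schema is already current
    }
    // Each block runs exactly once, in order, like upgrade_NNN().
    if ($installed < 2) {
        $db->exec("ALTER TABLE messages ADD COLUMN read_at DATETIME NULL");
    }
    if ($installed < 3) {
        $db->exec("CREATE INDEX idx_read_at ON messages (read_at)");
    }
    $db->prepare("UPDATE options SET value = ? WHERE name = 'db_version'")
       ->execute([APP_DB_VERSION]);
}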

Yes, it would be a security hole if PHP went and overwrote its files from some place on the internet with no warning. There's no guarantee that the server is connecting correctly to your update server (it might download code crafted by someone else if DNS poisoning occurred), giving someone else access to your clients' data. Therefore digital signing would be important.
The user could control updates by setting permissions on the web directory so that PHP only has read access to the files - this procedure could simply be documented with your program.
One question remains (I really don't know the answer to): can PHP overwrite files if it's currently using them (e.g. if the update.php file itself needed to be updated)? Worth testing.
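On the digital-signing point, here's a rough sketch of verifying a detached signature before touching any files (the URLs and key path are hypothetical):

<?php
// Hypothetical sketch: refuse to apply an update unless the package
// verifies against the vendor's public key that ships with the app.
$package   = file_get_contents('https://updates.example.com/app-1.2.zip');
$signature = file_get_contents('https://updates.example.com/app-1.2.zip.sig');
$publicKey = openssl_pkey_get_public(file_get_contents(__DIR__ . '/vendor-pub.pem'));

if (openssl_verify($package, $signature, $publicKey, OPENSSL_ALGO_SHA256) !== 1) {
    throw new RuntimeException('Update failed signature check; aborting.');
}
// Only now is it safe to unpack and install the update.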

I suppose you've already ruled this out, but you could host it as a service. (Think wordpress.com)

I'd suggest that you package your application with PEAR and set up a channel. Your users can then upgrade the application through a standard interface (the pear installer). It's not entirely automatic (unless the users have some kind of automation running on top of pear), but it's standard, so any sysadmin can maintain it.

I think your best option is an update checking mechanism that will alert the administrator when there are update(s).
As you mention, there are a number of potential security problems. Due to those alone, I would suggest not doing this. Instead, try creating a fairly smart upgrading script.

Just my 2 cents: I'd consider an automatically self-updating application within my CMS a security hole, so if you decide to code this feature, you should consider implementing different levels of this behavior:
Automatically update
Check for updates and notify
Disable
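The middle level ("check for updates and notify") can be tiny; a hypothetical sketch of a version check that only notifies:

<?php
// Hypothetical sketch: compare the installed version to the latest
// version published at a vendor endpoint, and notify instead of updating.
const INSTALLED_VERSION = '1.4.2';

$latest = trim((string) @file_get_contents('https://updates.example.com/latest.txt'));
if ($latest !== '' && version_compare($latest, INSTALLED_VERSION, '>')) {
    // Show a dashboard banner or mail the admin; do not touch any files.
    error_log("Update available: {$latest} (installed: " . INSTALLED_VERSION . ')');
}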


eval Remote Code From My Server

I am building a site platform similar to Wordpress that allows my users to download a .zip file, upload it onto their server, and be good to go.
I know everyone says eval() is evil - but the code will not include any user or variable input.
The benefit here is that updates will occur automatically. I can just change the code being grabbed on my server.
My clients using the code will have pretty low traffic sites - so I'm not worried about overloading their server. Most of the heavy lifting will be done by us.
Here's the basic code concept:
$code = file_get_contents("http://myserver.com/code.txt");
eval($code); // executes whatever the remote server returned, on every request
Is this a realistic option? What security holes do I need to worry about?
It's "realistic" in the sense that it will work, but at the same time it sounds like a sysadmin's nightmare. If you are meaning to have a client download and execute remote code every time a request is made, your clients are at your whim if the master server goes down or is unreachable at any point. It's now a mission-critical service you'll have to keep running forever for as long as your clients need it.
You list automatic updates as a benefit, but is it? In nearly every software platform, the features users depend on can change over time; function signatures can change, or functionality may be dropped entirely in favour of a more refined alternative. Since it sounds like you're writing some form of framework, can you guarantee that future versions will always be backwards-compatible? Not everyone uses the cutting-edge version of a piece of software in production, for a reason -- they want what they are using to be stable. If an upgraded version of your platform rolls out overnight and breaks some custom code written by the client (at least one of them will try writing custom code, even if you don't want them to), or even old, standard functionality that was deprecated but still worked with the previous release, how are they going to roll back to a version that works?
It just sounds like something that will eventually incur a ton of technical debt.

Does provisioning have a place in the Drupal world?

My team is relatively new to Drupal. One thing we have struggled to understand is how to work with it from a DevOps point of view. I realize this is too large a subject for one question so I have a more specific question that gets at the heart of the matter.
How does one provision a Drupal instance? By "provision", I mean create a provisioning script that builds my CMS (we're only using Drupal for that purpose) starting with a clean virtual machine with only an OS and web server. The script would install and configure Drupal and its modules and connect to an existing database containing my content. Or perhaps I could even have it add my content to a Drupal instance with an empty database. I'm just not sure what makes sense.
What I am trying to avoid is the uncertainty and non-reproducibility that comes with doing everything interactively via Drupal's UI. I realize that Drupal has lots of techniques for exporting various things but there doesn't appear to be any coherent overall picture. Every bit of advice is of the form, "If you want to do (some specific thing), this is how you might do it." Or, even worse, "This worked for me." Neither of these things gives me much confidence or, more importantly, gives decent "best practices" advice that tells us what Drupal's designers intended.
There are some Drupal "best practices" articles but they don't go much beyond advice such as, "Do a backup before changing anything." I need more useful advice.
Yes. In Drupal 7, the Features module allows exporting configuration to code. Then, on deployment, a single command (executable from the CLI using Drush) can be used to sync the in-database configuration from the code.
This capability is native in Drupal 8, where it is called configuration management.
Drupal is a database-driven application, so a snapshot of the database has to be released as well.
A pre-deployment step would be to create the database and load the snapshot.
You can then run post-deployment scripts to apply environment-specific configuration.

Planning Ahead For Website Upgrades

I've noticed while developing my first site, that the smallest changes in database columns, connection options, and various other components cause the website to fail until I correct the problem which may or may not require a lot of time (woe is me). I'm wondering what steps I can take now in order to prevent these headaches, as I've delayed the website launch in order to keep upgrading. Once I'm done implementing the changes I would like to see, I know I won't truly be done, but at some point I have to move on to the next stage.
Yes, I know there is probably no one good solution, and ultimately a self-correcting design is more trouble than it's worth at this point. But if any greybeards have tips that they could offer based on their own experiences in web development, particularly with LAMP stacks, I would greatly appreciate it.
Specifically, I would like to know what to look out for when modifying databases and website code after customer information is in active use, in order to prevent errors, and how to roll out the changes.
EDIT 1:
Yes, so the answer seems to be that I need to copy the live site to my testing environment. I'm looking into some of the already suggested development solutions. Regular backups are crucial, but I can just see inserting new columns and modifying queries as a cause for mis-ordered tables and such. "That's where being a good programmer and testing diligently comes in handy," someone in the corner said. As I look into the proposed solutions, I welcome all others in the meantime. A real-time copy of the live site that I could create on the fly during testing would be nice.
The above answers are all very valid and in the end, they represent your target solution.
In the meantime, you may already do a lot for your website, even with a gradual migration to those practices.
In order to do so, I suggest you install PHPUnit (or whatever xUnit variant comes with the web languages you use). There are also "graphical" versions of it, like VisualPHPUnit, if that's more to your taste.
These tools are not the permanent solution. You should actually aim at adding them to your permanent solution, that is, setting up a development server and so on.
However, even as interim solution they help you reach a fairly stable degree of quality for your software components and avoid 80-90% of the surprises that come with coding on a live server.
You can develop your code in a separate directory and test it before you move it into production. You can create mock objects which your code under test may freely interact with, without fear of repercussions. Your tests may load their own alternate configuration so they run against a separate copy of the database.
Moving even further, you may include your website into tests itself. There are several applications like Selenium that allow you both to automate and test your production website, so that you can reliably know that your latest changes did not negatively affect your website semantics.
In short, while you should certainly aim at getting a proper development environment running, you can do something very good even today, with few hours of study.
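As a flavor of what that looks like, here's a hypothetical PHPUnit test that loads its own configuration and runs against a copy database (the class under test and the credentials are made up):

<?php
use PHPUnit\Framework\TestCase;

// Hypothetical sketch: the test connects to a copy database,
// never production, and starts from a known state.
final class MessageStoreTest extends TestCase
{
    private PDO $db;

    protected function setUp(): void
    {
        $this->db = new PDO('mysql:host=localhost;dbname=app_test', 'test', 'secret');
        $this->db->exec('DELETE FROM messages'); // known starting state
    }

    public function testSavedMessageCanBeFetched(): void
    {
        $store = new MessageStore($this->db); // hypothetical class under test
        $id = $store->save('hello');
        $this->assertSame('hello', $store->fetch($id)->body);
    }
}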
Start using some (maybe simplified) sort of release management:
Maintain a development environment. Either locally, or in a second .htaccess-protected folder online. Make it use its own DB.
Test every change in this dev environment. Once you are satisfied, move it to the production environment.
Use git or svn (svn might be simpler to learn, but opinions vary; check out TortoiseSVN) to save a snapshot ("commit") of every change you make. That way you can diff through your latest commits if anything goes wrong.
Maintain regular backups.

Beta testing new website features with live data and real customers

The main goal of this question is to determine the pitfalls of deploying a slightly modified version of a website alongside a live website.
This secondary website would be pulling from the same database as the live but would have modified features for beta testers.
The end goal is to allow certain customers test our new features with their data.
So:
They don't have to do things twice by going to a copied version of the site.
They are using familiar data sets
Another possibility would be setting a flag per user account to allow them to see certain features (see the sketch below), but this would require a lot of extra work. Also, once a feature is ready for release, we would have to remove all the extra checks.
I am having a hard time seeing the disadvantages of this, but I know there has to be some glaring at me. Thank you for any assistance.
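Regarding the per-account flag idea mentioned above: in practice it can be a single column and one check, as in this hypothetical sketch (which assumes a beta_tester column on the users table):

<?php
// Hypothetical sketch of a per-account beta flag. Once the feature
// ships for everyone, the checks (and the column) can be removed.
function userInBeta(PDO $db, int $userId): bool {
    $stmt = $db->prepare('SELECT beta_tester FROM users WHERE id = ?');
    $stmt->execute([$userId]);
    return (bool) $stmt->fetchColumn();
}

if (userInBeta($db, $currentUserId)) {
    // render the new beta feature
} else {
    // render the stable behavior
}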
Git version controlled, Capistrano Deployment workflow, Cakephp framework, MySql
We currently have local and testing servers that are separate from our production servers.
EDIT 12-20-2012 10:30am EST
Based on some comments and one answer I have an update based on feedback.
Meticulous internal testing should be done before 'beta'/user feedback testing. (which we already do)
If we take these precautions and the code base seems solid, the risk in deploying alongside the production server could be manageable. We are working within a framework here, so the likelihood of mass deletion and bad SQL is relatively low.
All that being said, I would rather not take this approach because it still has inherent risk. Does anyone do beta testing with live server data another way?
It depends...
If this is a beta to get customer feedback, on a product that has been fully tested and is known to be stable, the risks are relatively manageable (though see points below). This is the way Google defines "beta".
If "beta" means code complete, and sorta-kinda tested, but who knows what bugs are in there, you risk corrupting your live database. No matter how clever your backup strategy, if something goes wrong, the best case scenario is that the beta users face data loss or corruption; the worst case is that all your users lose data (I've seen broken "where" clauses in delete or update statements do all kinds of entertaining damage).
Another issue to consider is whether the database is backward and forward compatible between versions - can you migrate your beta users back to the mainstream version if they don't like the upgrade, or if something goes wrong? This is a far bigger deal if "beta" means "untested", of course.
In general, it's a lot easier to deal with one-way compatibility - allowing users to upgrade, but not downgrade - another strong argument for "beta" to mean "user feedback"...

How do I manage development and deployment of my website as part of a group?

I've been reading this site here and there and appears as though you guys have a wonderful community.
As for my background, I am a sophomore at university familiar with SQL, C++, Visual Basic, and some PHP. One of my school projects for the summer term involves building a web application that allows users to log in and schedule specific timeslots over the internet. Typically, I have been the only person working on a project, but in this case I will be part of a group. Since we're all relatively new to working as a team, I would like to set up source control for my group so we're not all working off a shared drive somewhere. Additionally, I would like to make sure that all of us are able to test our changes in some sort of development server that hosts an instance of our website.
My actual question is in regards to the toolset that we should use to achieve this. As a group, we are most familiar with PHP and MySQL so we'll end up using that for the code and database. I have used SVN in the past for my own personal use, but my group members aren't very familiar with source control. We'll probably stick with something simple like Excel for the project management and bug tracking side of things. Ideally, we would like the tools to be free and open source.
How as a group should we manage the construction of the actual application? Are there methods out there that I can use that will allow any one of us to move the files to our development machine and keep track of who did it so we don't end up overwriting each other's changes? If this is not possible, one of us will write some scripts to handle it - but I would like to avoid building basically a separate software application that will only be used to manage our project. Another issue I foresee will be updating the database running on the development machine. Are there any standardised methods that we can use to manage our SQL scripts among the four of us?
I do not expect a really long winded answer here (after all, this is our project!), but any helpful tips would be greatly appreciated. Once I return from holiday I am looking forward to getting started! Thanks!
I recommend your group use source control to synchronize your code. You can either set up your own server or just use a free provider such as GitHub, Google Code, or Bitbucket.
If you do decide to use one of these sites, a nice feature is that they provide free issue tracking as well, so you can use that instead of Excel.
The best way to manage the SQL scripts is to break them out into separate files and place them under source control as well. You can either create plain .sql files, or use a tool to manage these changes - for example, have a look at Ruby on Rails' Migrations. This may take some effort to set up, but you'll thank yourself later if you are working on a project of any size...
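For a rough idea of the .sql-files approach without Rails, here's a hypothetical runner that applies numbered scripts in order and records which ones have already run (paths and credentials are made up):

<?php
// Hypothetical sketch: apply migrations/001_*.sql, 002_*.sql, ... in order,
// recording applied filenames so each script runs exactly once per database.
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');
$db->exec('CREATE TABLE IF NOT EXISTS schema_migrations (filename VARCHAR(255) PRIMARY KEY)');

$applied = $db->query('SELECT filename FROM schema_migrations')
              ->fetchAll(PDO::FETCH_COLUMN);

$files = glob(__DIR__ . '/migrations/*.sql');
sort($files); // numeric prefixes keep the order

foreach ($files as $file) {
    $name = basename($file);
    if (in_array($name, $applied, true)) {
        continue; // already applied to this database
    }
    $db->exec(file_get_contents($file));
    $db->prepare('INSERT INTO schema_migrations (filename) VALUES (?)')
       ->execute([$name]);
    echo "applied {$name}\n";
}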
Draw up a plan for how you would do it if it were just you.
Split the plan up into tasks that take around 3-4 hours to complete. Make sure each task has a measurable objective.
Divvy out the tasks. Try to sort them if possible to maximize developer efficiency.
Teach them to use source control. Explain to them that they will use this (maybe not svn, but SOMETHING) in a few years, so they might as well learn how now. Additionally, this will help in every group project they do down the road.
Make a script for building and running your tests. Also script your deployment. This will ensure you have the same mechanism going to live as you do going to test, which increases the number of defects found in testing (as opposed to letting them exist but go unfound in testing).
You mentioned updating the development database. It would be entirely reasonable to dump the development database often with a refresh from live. You may want to make 3 environments. Development, staging, and production. The development database would contain fabricated test data. The staging database would have a copy of live (recent to within a few days maybe.) And of course live is live.
Excel works fine as a "bug database." Consider putting it in source control that you manipulate and commit. This will give you a good idea of what happened over time, and you can correct mistakes quicker.
As far as source/version control goes, I would recommend Subversion. There are some GUI tools they might use, or even WebDAV to access the SVN repository. This will allow users to edit files collaboratively and also give you details as to who edited what, when, and why... SVN will also do a pretty good job of merging files that happen to be saved at the same time.
It's not the easiest concept to wrap your head around, but it's not very complicated once you get running.
I suggest having everyone read the first chapter from: http://svnbook.red-bean.com/en/1.5/
and they should have a good idea of what's happening.
I am also curious to see what people have to say about the database.
How as a group should we manage the construction of the actual application? Are there methods out there that I can use that will allow any one of us to move the files to our development machine and keep track of who did it so we don't end up overwriting each other's changes?
It sounds like you're looking for build management. In the case of PHP, a true "build" is as simple as a collection of source files because the language is interpreted; there is no compilation.
It just so happens that I am one of the developers for BuildMaster, a tool which basically solves every problem you have listed in your question... and it also sounds like it would be free in your case under the Community Edition license. I'll try to address some of your individual pain points and how BuildMaster could be used as a solution.
Source Control
As suggested by others, you must use it. The trick when it comes to deployment is to set up some form of continuous integration so that every time someone checks in, a new "build" is created. In BuildMaster, you can set this up for any source control provider you want.
Issue/Bug Tracking
Excel will work, but it's not an optimal solution. There are plenty of free issue tracking tools you can use to manage your bugs and features. With BuildMaster, you can link your bugs and features list with the application by their release number so you could view them within the tool at any time. It can also modify issue statuses and add descriptions automatically if you want.
Deployments
Using BuildMaster, you can create automated deployment plans for your development environment, e.g.:
Get Latest Source Code
Create Artifact
Copy Files To Development Machine
Deploy Configuration Files
Update Database
The best part is, once you set these up for other environments (glowcoder's point #6), pushing all of your code and database updates is as simple as clicking a button.
Another issue I foresee will be updating the database running on the development machine. Are there any standardised methods that we can use to manage our SQL scripts among the four of us?
Database Updates
Not surprisingly, BuildMaster handles these as well by using the change scripts module. When a member of your team creates a script (e.g. ALTER TABLE ADD [Blah] INT NOT NULL) he can upload it into BuildMaster, then run it on any environment you have created.
The best part is that you can add a step in your automated deployment and never worry about it again. As Justin mentions, you can use .sql files for your object code (stored procedures, views, triggers, etc.) and have those executed on every build since they are essentially code anyway. You can keep those in source control.
Configuration Files
One aspect of all this you may have neglected (but will inevitably run into) is dealing with configuration files. With PHP, you may have an .htaccess file, a php.ini file, a prepend.php, or roll your own custom config file. Since by definition configuration files need to change between your personal machine and the development machine, grabbing them from source control wouldn't necessarily work without some bit of hacking a la:
if (DEV) {
    // do one thing
} elseif (PROD) {
    // do another
}
With BuildMaster, you can templatize your configuration files and associate them with an environment so they can be deployed automatically. It will also maintain a history of changes for you.
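Without a deployment tool, a common lightweight alternative is one config file per environment, selected at runtime; a hypothetical sketch:

<?php
// Hypothetical sketch: keep config/dev.php, config/staging.php, and
// config/prod.php in source control and pick one by environment variable.
$env    = getenv('APP_ENV') ?: 'dev';
$config = require __DIR__ . "/config/{$env}.php"; // each file returns an array

$db = new PDO($config['dsn'], $config['user'], $config['password']);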
Automated Testing
If you want the full ALM effect, you can automatically unit test your code during an automated build and be notified if anything fails, so you know as soon as possible that something is broken.
Apologies for the "long winded" response, but I feel like you're already ahead of the game by observing the problems you might run into in the future and really believe BuildMaster will make all of this deployment stuff simple for your team so you can focus on the fun part, coding!
