PHP 5.3 has a new feature called PHAR, similar to JAR in Java. It's basically an archive of PHP files. What are its advantages? I can't understand how it can be helpful in a web scenario.
Is there any use other than "ease of deployment" - deploying an entire application by just copying one file?
There are tremendous benefits for open source projects (in no particular order).
Easier deployment means easier adoption. Imagine: You install a CMS, forum, or blog system on your website by dragging it into your FTP client. That's it.
Easier deployment means easier security. Updating to the latest version of a software package will be much less complicated if you have only one file to worry about.
Faster deployment. If your webhost doesn't give you shell access, you don't need to unzip before uploading, which cuts out per-file transfer overhead.
Innate compartmentalization. Files that are part of the package are clearly distinguished from additions or customizations. You know you can easily replace the archive but you need to backup your config and custom templates (and they aren't all mixed together).
Easier libraries. You don't need to figure out how to use the PEAR installer, or find out whether this or that library has a nested directory structure, or whether you have to include X, Y, or Z (in that order?). Just upload, include the archive, start coding (see the sketch at the end of this answer).
Easier to maintain. Not sure whether updating a library will break your application? Just replace it. Broken? Revert one file. You don't even need to touch your application.
What you see is what you get. Chances are, someone is not going to go to the trouble of fudging with an archive, so if you see one installed on a system you maintain, you can be fairly confident that it doesn't have someone's subtly buggy random hacks thrown in. And a hash can quickly tell you what version it is or whether it's been changed.
Don't pooh-pooh making it easier to deploy things. It won't make any difference for homegrown SaaS, but for anyone shipping or installing PHP software packages it's a game-changer.
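To make the "Easier libraries" point concrete, here is a minimal sketch, assuming a hypothetical archive lib/somelib.phar whose stub bootstraps the library:

    <?php
    // Load a whole library packaged as a single archive (hypothetical name).
    require 'lib/somelib.phar';

    // Individual files inside the archive can also be loaded directly
    // through the phar:// stream wrapper (the inner path is illustrative).
    require 'phar://lib/somelib.phar/src/functions.php';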
In theory it should also improve loading speed. If you have a lot of files that need to be included, replacing them with a single include saves time on file-open operations.
In my experience, loosely packaged PHP source files sitting in a production environment invite tinkering with live code when a fix is needed. Deploying in a .phar file discourages this behaviour and helps reinforce better practices, i.e. build and test in a local environment, then deploy to production.
The advantage is mainly ease of deployment. You deploy an entire application by just copying one file.
Libraries can also be used without being expanded.
Any tool that works on a single file "suddenly" works with all files of an application at once.
E.g. transport: You can upload the entire application through a single input/file element without additional steps.
E.g. signing an application: checksum/sign the file -> checksum/signature for the whole application.
...
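For instance, a rough sketch of the checksum/signature idea, using a placeholder archive name; the phar extension verifies the embedded signature when the archive is opened:

    <?php
    // One checksum covers the whole application.
    $path = 'app.phar'; // placeholder name
    echo hash_file('sha256', $path), PHP_EOL;

    // The archive also carries its own signature; opening it verifies the
    // signature, and getSignature() reports the hash and the algorithm used.
    $phar = new Phar($path);
    print_r($phar->getSignature()); // e.g. ['hash' => '...', 'hash_type' => 'SHA-256']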
Related
Our current deployment process entails moving only the differences between the current SVN revision and the SVN revision of the last deployment, and it works flawlessly in my project.
Other projects complain about this method and want to deploy everything: move every file from the development environment to the other environments, like testing, staging, or live.
The Java team lead and the PHP team lead agree on this. I am a PHP developer and find this approach inefficient and useless; we shouldn't have to spend that much time and bandwidth copying everything whenever we deploy to live.
When we deploy using SVN differences, the server admins save a compressed file containing all of the modified files related to the current deployment, so it's easy to revert when we want to.
I just want some good reasons to present to the company's manager, who is technically aware of the problems with the deployment process, to make him understand that when something gets messed up after a deployment, it's because the developers didn't do it right, not because we have to deploy everything in order for things to work. I want to convince him that deploying using SVN is far better than deploying everything (a primitive copy/paste) without relying on SVN at all.
(I had to use an answer because there was not enough space in the comments.)
Interesting question.
I guess that Java developers (as I am) are just used to deploying the whole application each time (and the same probably goes for any language that doesn't run from source the way PHP does).
In a former company where I was employed, that was the way to release an update, and since the application WAR was more than one hundred megabytes, that always took a couple of hours for the whole process, even when just a couple of classes had changed.
In the company where I'm employed right now, they instead put together a system that works with differences, in a way similar to what you described (although Java class files have to be replaced wholly, of course).
I think that's a far better approach, much easier and more lightweight to work with.
Since PHP relies on source files even at runtime, I think that a difference-based approach like what you already have is better. So +1 for your current approach.
So, I think that faster deployment, easier backup, and the other things you mention in your question are good enough reasons to keep the current approach.
Of course, it is important that a fully functional version can be produced and deployed from SVN at any time and that it can replace the corresponding delta-based version on the server without any fault (but I'm sure you already have that).
About the people that have opinions against yours: ask them to prove (with real-world examples) where your approach is faulty.
(Maybe this would find a better fit on programmers.stackexchange.com?)
A sketch of our deployment script (we use Git; with Subversion the algorithm is the same, only the actual commands differ). We use a working copy (a local repository with Git) and another directory (named export) where the next version of the live code is prepared (a kind of staging area, if you prefer):
update the local copy of the code (it's git pull for Git or svn update for Subversion);
clean up the export directory, then copy the code into it; we use rsync instead of cp because it's easier to give it a list of directories and files it should ignore (.git, .svn, and so on);
apply any needed configuration settings to the files in the export directory; for example, we don't keep sensitive data (users, passwords) in the code stored in the repository, only placeholder values; this step replaces the placeholders with the actual users, passwords, keys, and so on;
do other needed fixups; for example, we use symlinks that point to directories containing data uploaded by users; in the code repository we have empty directories for them; in the fixup phase these directories are removed from export and symlinks with the same names are created, pointing to permanent directories, external to the web root, where the data is stored; we also use symlinks to 3rd-party libraries - they are not stored in the repo and their deployment follows a different pattern (they are usually frozen at the version they had when the project started, to avoid incompatibilities);
use rsync with the appropriate parameters (--archive and others) to make the live version of the code identical to the version just prepared in the local export directory.
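A condensed sketch of those steps as a single PHP driver script; every path, the placeholder token, and the rsync destination are made up for illustration, and a plain shell script would work just as well:

    <?php
    // Run a shell command and stop the deployment if it fails.
    function run($cmd) {
        passthru($cmd, $status);
        if ($status !== 0) {
            exit("Deployment step failed: $cmd\n");
        }
    }

    // 1. update the local copy of the code
    run('cd /srv/deploy/repo && git pull');

    // 2. clean up export, then copy the code into it, skipping VCS metadata
    run('rm -rf /srv/deploy/export && mkdir /srv/deploy/export');
    run('rsync -a --exclude=.git --exclude=.svn /srv/deploy/repo/ /srv/deploy/export/');

    // 3. replace placeholder values with the real configuration
    $config = file_get_contents('/srv/deploy/export/config.php');
    $config = str_replace('%%DB_PASSWORD%%', getenv('DB_PASSWORD'), $config);
    file_put_contents('/srv/deploy/export/config.php', $config);

    // 4. fixups: swap the empty uploads directory for a symlink to permanent storage
    run('rm -rf /srv/deploy/export/uploads && ln -s /srv/data/uploads /srv/deploy/export/uploads');

    // 5. make the live tree identical to the prepared export
    run('rsync -a --delete /srv/deploy/export/ /var/www/app/');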
The problem you are most likely experiencing is not "copying everything every time", but releasing individual file fixes as a release instead of an entire build as a release. The best practice is to capture build artifacts for the entire application or application component; once you've gotten to that point, it is irrelevant whether you copy all the files or just the files that have changed, since that is now an implementation detail of your deployment software (whether that's a rudimentary file copy, FTP, rsync, or an enterprise-level tool like my company's product BuildMaster).
First Some Background
I'm planning out the architecture for a new PHP web application and trying to make it as easy as possible to install. As such, I don't care what web server the end user is running so long as they have access to PHP (setting my requirement at PHP5).
But the app will need some kind of database support. Rather than working with MySQL, I decided to go with an embedded solution. A few friends recommended SQLite - and I might still go that direction - but I'm hesitant since it needs additional modules in PHP to work.
Remember, the aim is ease of installation ... most lay users won't know what PHP modules their server has or even how to find their php.ini file, let alone enable additional tools.
My Current Objective
So my current leaning is to go with a filesystem-based data store. The "database" would be a folder, each "table" would be a specific subfolder, and each "row" would be a file within that subfolder. For example:
/public_html
/application
/database
/table
1.data
2.data
/table2
1.data
2.data
There would be other files in the database as well to define schema requirements, relationships, etc. But this is the basic structure I'm leaning towards.
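As a minimal sketch of how a "row" might be written and read under this layout (JSON is just one illustrative encoding; the paths mirror the structure above):

    <?php
    $base = dirname(__FILE__) . '/application/database';

    // "INSERT": each row is a single file named after its id in the table's folder.
    $row = array('id' => 3, 'title' => 'Hello');
    file_put_contents("$base/table/3.data", json_encode($row), LOCK_EX);

    // "SELECT": read a row back by its id.
    $found = json_decode(file_get_contents("$base/table/3.data"), true);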
I've been pretty happy with the way Microsoft built their Office Open XML file format (.docx/.xlsx/etc.). Each file is really a ZIP archive of a set of XML files that define the document.
It's clean, easy to parse, and easy to understand.
I'd like to actually set up my directory structure so that /database is really a ZIP archive that resides on the server - a single, portable file.
But as the data store grows in size, won't this begin to affect performance on the server? Will PHP need to read the entire archive into memory to extract it and read its constituent files?
What alternatives could I use to implement this kind of file structure but still make it as portable as possible?
SQLite has been enabled by default since PHP 5, so almost all PHP 5 users should have it.
I think there will be tons of problems with the zip approach; for example, adding a file to a relatively large zip archive is very time-consuming, and I expect horrible concurrency and locking issues.
Reading zip files requires a PHP extension anyway, unless you go with a pure-PHP solution. The downside is that most pure-PHP solutions will want to read the whole zip into memory, and will also be much slower than something written in C and compiled, like PHP's zip extension.
I'd choose another approach, or make SQLite/MySQL a requirement. If you use PDO in PHP, you can let the user choose SQLite or MySQL, and your code doesn't change as far as issuing queries goes. I think 99%+ of web hosts out there support MySQL anyway.
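A minimal sketch of that PDO point; the DSNs, credentials, and table are placeholders, and only the connection line differs between the two drivers:

    <?php
    $driver = 'sqlite'; // or 'mysql', e.g. read from a config file

    if ($driver === 'sqlite') {
        $pdo = new PDO('sqlite:' . dirname(__FILE__) . '/data/app.sqlite');
    } else {
        $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');
    }
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // The query code is identical either way.
    $stmt = $pdo->prepare('SELECT id, title FROM posts WHERE id = ?');
    $stmt->execute(array(1));
    var_dump($stmt->fetch(PDO::FETCH_ASSOC));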
Using a real database will also help your performance. It's worth loading the extra modules (and most PHP installations have at least the mysql module, and probably sqlite as well), because those modules are written in C, run much faster than PHP, and have been optimized for speed. Using SQLite will help keep your web app portable, if you're willing to deal with SQLite's quirks.
Zip archives are great for data exchange. They aren't great for fast access, though, and they're awful for rewriting content. Both of these are extremely important for a database used by a web application.
Your proposed solution also has some specific performance issues -- the list of files in a zip archive is internally stored as a "flat" list, so accessing a file by name takes O(n) time relative to the number of files in the archive.
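For what it's worth, here is a sketch of reading one entry without extracting the whole archive (the archive and entry names are placeholders); locating the entry still means searching the archive's file list, which is the linear lookup mentioned above:

    <?php
    // Via the zip extension's ZipArchive class.
    $zip = new ZipArchive();
    if ($zip->open('database.zip') === true) {
        $row = $zip->getFromName('table/1.data'); // entry contents, or false if missing
        $zip->close();
    }

    // Or via the zip:// stream wrapper.
    $row = file_get_contents('zip://' . dirname(__FILE__) . '/database.zip#table/1.data');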
In my past projects I have failed to invest time in setting up my workspace correctly.
For backups and version control I simply copy my web directory's files to a separate folder on my hard disk. If I find I have made a mistake somewhere, I reload a previous backup and start again from that point, often wasting precious time repeating work I have already done.
My IDE has no FTP functionality, so I have to manually copy the files from my desktop to my web server, constantly overwriting and duplicating files.
I am certain there is a better, more efficient way of doing the above. I have read about Git for version control and know I should be using it.
What is the suggested way to work efficiently (OS is Windows) with an IDE, version control and FTP that will save me sweat, tears and data loss?
EDIT: I am currently using the NetBeans IDE
Version control - when used properly - is much better than simply copying files around. In a single-user environment, version control gives you fine-grained control over your versioning, often in a way that is more space-efficient than full file copies because the old versions are stored as diffs.
In a multi-developer environment, version control provides the same benefits, but you also have to consider the case when multiple people edit the same file at the same time. In the simplest case, two people edit the same file in different places and you can safely take both modifications in sequence. In more complex cases, two or more developers make changes to the same region of code and it needs to be manually merged.
Git is different from traditional revision control systems in that it was designed to be used in a distributed fashion. That is - each developer has their own repository, and merges happen when they need to happen. You can have an authoritative central server if you want one, but you don't need to push every commit to that server all the time. This makes Git particularly suitable for individual or remote development. Git doesn't require a heavy server on your desktop, just one small binary.
There are a lot of tutorials out there on git. Some of them are:
Getting into SCM with git
GitHub's Introduction to Git
A Visual Git Reference
Git Tutorial
Here's how my setup works - this may or may not be feasible for you, but I hope it helps somehow. I no longer use FTP for anything.
You should get a DVCS setup, and which one you choose is entirely up to you. Any of them will be better than manually copying or not having anything at all. I suggest taking a look at both Git and Mercurial and making a decision from there. In my opinion, if you're using Windows primarily, Mercurial might be a better choice. If not, I'd say go for Git. You could always try both!
I set up a gitolite server that acts as a central repository for all of my Git projects. It is great to have a remote central repository because your entire codebase is backed up in case your workstation fails - and on top of that, you can use it to coordinate moving your files around (and stop using FTP).
Once that is set up, I start pulling from and pushing to it. You mention your IDE here, and there are a lot of Git IDE options, but I just use the command line - I find it faster. Again, it's up to you how to incorporate that.
In terms of web development, I set up my gitolite server to use Git hooks to propagate changes to my servers. They all have the Git client installed, so the codebase usually sits in the webroot. When a change is pushed from my workstation to the gitolite server, it fires off some commands that automatically update the production server. Not only is it convenient, but it also puts a copy of the codebase and its versions on your servers as well. Be careful with this, though; you need to make sure you aren't exposing your .git directory publicly.
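As a rough illustration (not my exact setup), a post-receive hook on the central server could be as small as this; the host, user, and path are placeholders, and it assumes key-based SSH access from the Git server to the web server:

    #!/usr/bin/env php
    <?php
    // Hypothetical post-receive hook: update the production checkout after a push.
    exec("ssh deploy@www.example.com 'cd /var/www/app && git pull' 2>&1", $output, $status);

    if ($status !== 0) {
        fwrite(STDERR, "Deployment failed:\n" . implode("\n", $output) . "\n");
        exit(1);
    }
    echo "Production updated.\n";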
The basic idea is to improve your development ecosystem. My Git setup is perfect for that. You might need to assess your entire workflow and make adjustments based on your needs.
Here's the git plugin for NetBeans. I suggest using the command line when you get started, though.
In the past, I have been developing in a very amateurish fashion, meaning I had a local machine where I developed and tested code and a production machine to which I copied the code when I was done. Recently I modified this slightly to where I developed locally, checked the code into SVN and then updated the production machine through SVN.
Now I would like to start a new project and improve my workflow. Ideally I had the following in mind:
Have one or more local dev environments
Develop and test on local machine(s)
Use SVN (or Git) as code repository
Use a build tool to set up new environments (either dev, staging or production) and deploy code
Since I am not very familiar with this process, I am looking for suggestions on how to best set this idea up and the tools to use, especially when it comes to the build tools. I was looking into Ant and Phing (possibly make), but I am so new to this that I would really like to get some guidance. Are there any good tutorials or books about PHP deployment, especially for beginners? What I am especially interested in are the following topics:
Deployment to different types of servers with different settings (e.g. dev uses different db, db passwords, PHP error reporting than production or staging).
Deployment that automatically pulls code from SVN.
Deployment that temporarily sets a "Maintenance" page for production environment.
Once I mastered the above, maybe even do some testing in the build process.
I know my question might sound quite confused... I admit, I am new to this and might be a little off the target in what I really need. That's why any help is greatly appreciated.
I would suggest making your testing deployment strategy a production-ready install-script -- since you're going to need one of those anyway eventually.
A few tips that may seem obvious to some, but are worth pointing out:
Your config file saved in your VCS should be a template, and should be named differently from the file that will eventually contain the actual settings. E.g. config-dist.php or config-sample.conf or sample/config-mysql.php or something along those lines. Otherwise you will end up accidentally checking in a server-specific configuration file over your template.
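For example, a made-up config-dist.php (the keys are illustrative); the installer copies it to config.php and fills in the real values:

    <?php
    // Template only -- commit this file, not the filled-in config.php.
    return array(
        'db_host' => 'localhost',
        'db_name' => 'CHANGE_ME',
        'db_user' => 'CHANGE_ME',
        'db_pass' => 'CHANGE_ME',
        'debug'   => false,
    );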
For PHP deployment, anticipate that some users will not be able to run server-side scripts through any mechanism other than the web server itself. A PHP-based installer is almost non-negotiable.
You should include a consumer-friendly update mechanism, and for that, WordPress is a great example of a project to emulate. A PHP script can (a) download the latest build, (b) use the FTP functions to update your application's files, and (c) execute an update script which makes the appropriate changes to the database, etc.
For heaven's sake don't do like [redacted] and make your users download and install separate patches for each point release. Have them download the latest (final) release which contains all the updates to date, and applies the correct ALTER TABLE functions in sequence.
Whether the files are deployed via SVN or through FTP, the install/update mechanism should be the same: get the latest files, run the update script. The updater compares the version listed in the PHP script against the version stored in the DB, and uses that knowledge to apply the appropriate DB patches in order. As for how to generate those patches, there are other questions here that you can refer to for more info.
As for the "Maintenance" page, just use the version trick mentioned above to trigger it (compare the version in the DB against the version in the PHP code; there's a rough sketch of this after these tips). It's also useful to be able to mark a site as "down" to the public but make it visible to admins (like Joomla does), which you can trigger through database or filesystem flags.
As for automatically pulling code from SVN, I'd say you're better off with either a cron script or with commit triggers than working that into your application, since it wouldn't be relevant to end users.
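Here is the rough sketch of the version-comparison trigger promised above; the constant, the settings table, and the maintenance file are all made-up names:

    <?php
    define('APP_CODE_VERSION', '1.4.0');

    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');
    $dbVersion = $pdo->query("SELECT value FROM settings WHERE name = 'schema_version'")
                     ->fetchColumn();

    if (version_compare($dbVersion, APP_CODE_VERSION, '<')) {
        // The code is newer than the schema: show the maintenance page to the
        // public and let an admin run the updater, which applies the pending
        // DB patches in order and then bumps schema_version.
        header('HTTP/1.1 503 Service Unavailable');
        require dirname(__FILE__) . '/maintenance.php';
        exit;
    }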
This isn't exactly part of your question, but it's relevant:
If you get into distributing code intended for a wide audience, I would advise building and distributing OpenSSL-signed PHAR packages. You can distribute them over HTTP without a problem, and because they're OpenSSL-signed, you also mitigate the risk of man-in-the-middle attacks and protect end users/customers/clients from someone injecting code if you want to set up an automatic or one-click update.
There's a set of tools I've contributed to in the past that work great for this, but you'll either need PHP 5.3, or you'll need PHP 5.2 with PHAR installed via PECL. https://github.com/koto/phar-util
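A rough sketch of building such a package with the Phar extension (this needs phar.readonly=0 in php.ini; all the paths and names are placeholders):

    <?php
    $phar = new Phar('build/myapp.phar');
    $phar->buildFromDirectory(dirname(__FILE__) . '/src');
    $phar->setStub($phar->createDefaultStub('index.php'));

    // Sign the whole archive with a private key; ship the matching public key
    // next to it as build/myapp.phar.pubkey so the phar extension can verify it.
    $phar->setSignatureAlgorithm(Phar::OPENSSL, file_get_contents('/secure/private.pem'));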
As far as testing goes, PHPUnit is the de facto standard.
If you are interested in using Git then you should check out this build system from CodeMeme. From what you described, it sounds like it would be a good fit. You can add it to any project as a submodule, and with the included code you can tailor a build script that will deploy to multiple servers in multiple environments. It uses Git to build the code for deployment, but unfortunately SVN is not supported.
https://github.com/CodeMeme/Phingistrano
I have heard that uploading your website with FTP is now for n00bs, but it's the only way I've known how for the 8 or so years I've been building websites. Apparently all the buzz now is using a version control system, like SVN or Git, and somehow using SSH to upload only the files that have changed (if I understand correctly). I'm wondering if someone could explain or point me to a "comprehensive" guide. What do I need to install, and will my shared host (Dreamhost) be compatible? I currently develop on a WAMP environment. I have never used version control and wouldn't know where to start. I usually develop with a framework such as CakePHP or Zend.
You've got things mixed up a bit. A version control system is used internally to keep track of your code during development. With centralized systems like SVN, you regularly upload your code to an SVN server, which keeps track of what has changed, makes sure conflicting changes are merged correctly, and keeps a history so you can roll back changes.
Decentralized or distributed version control systems eliminate the one central server, instead allowing every single copy of the code to track its own change history, and then letting you merge and combine these separate branches at will.
But once you have a complete product, you push it out to the production server any way you like. FTP is certainly one option for doing that.
For the file uploads, what you are looking for is rsync. There is a Windows wrapper for this called DeltaCopy and the DreamHost wiki has instructions.
First you'll want to decide what you want to use for version control. I hear great things about Git, but am still an SVN user myself.
Dreamhost actually lets you create SVN repositories through their web panel, which is very handy, and if I remember correctly they have some additional really nice features to help as well.
I would suggest reading, or at least skimming, http://www.svnbook.org - it is very comprehensive if you plan to actually use SVN over Git.
Everyone is completely missing the point. Development using a version control system is a great thing and has massive upside even for developers working on their own. The question here is about deployment using version control systems.
This is a newer and great idea. Consider something like Magento, which has 6,744 files in the base install, not to mention your own skins, which usually run to around 500 files. Using version control to DEPLOY something like this saves major time over uploading that many tiny files via FTP, since only the modified ones are sent.
Aside from this, I've never actually tried deploying like this, so I can't offer any real-world experience; however, there are several good articles on how to get this set up - a good one can be seen here.
Here's a wiki page that should give you all the information you need on adding Subversion to Dreamhost.
http://wiki.dreamhost.com/index.php/Subversion
I've used Subversion now for my sites, and it does make it much easier. I use Aptana on my Windows machine and upload everything through that program. It allows me to compare old versions, revert to them, branch off new versions, etc...
It's a huge timesaver!
Eric Sink's articles about source control are a great place to learn about the basic concepts.
http://www.ericsink.com/scm/source_control.html
I also develop with Zend Framework and here is how I use FTP and Version control.
On my local machine I have Subversion and TortoiseSVN installed.
If I start a new project, I set up an SVN repository the way I like it (I use the trunk/branches/tags system).
I checkout an initial working copy from the trunk to a project folder in my local webroot.
I create a new project in Aptana and set the project path to my project folder on the localhost.
Aptana understands that this project is versioned and shows appropriate icons on each file. I can do many of the version control functions directly in my file tree in Aptana, no need for any shell or even Tortoise.
Once I have a stable, deployable version of my app, I create a version control tag. Then I do an export of that (unversioning it).
The exported app is then uploaded via FTP.
That's how I do it at the moment anyway; maybe it clarifies some things. Tips on improving the procedure are welcome!
As others have said, you can set up version control locally ... or on your host. I recommend you do whatever works best for you.
You mention using Dreamhost. I support one small site there, and know that they do allow uploading via scp and sftp. This would allow you to upload your files with your password encrypted. (And you don't have to adopt a version control method if you don't want to! ;-) Scroll down the sftp page I linked to and you'll find some suggestions for scp & sftp clients.
FWIW, if you're using Windows, I've used WinSCP for years and liked it. Also, if you want full login access, I suggest PuTTY; its full download also includes command line based clients for sftp and scp.
There is no problem using FTP to upload. The only disadvantage is that the password is transferred as plain text.
It would be good to have a local version control system, that would allow you to easily see changes between versions, and quickly revert to an older version, and much more...
I don't think there is a need to install a version control system on your shared host. It can be handy only if you want to access the version control system from different places (at home, at work, while traveling).
There is an awesome plugin for bzr called bzr-upload, designed exactly for your kind of use case. bzr is very lightweight (no need to set up any repository) and super easy to start using, even if you haven't used any kind of source control before. Every time you make a commit on your local machine, it will SFTP/FTP the changed files up to your web host. It doesn't push up all the version control info, just the files themselves.