I have a few dozen PHP apps that I want to dockerize. I am wondering what the best design would be, both for management and for performance.
1. one big container with all services included (php-fpm, mysql, nginx etc)
2. separate containers for all services:
   - container-php-fpm-app1
   - container-nginx-app1
   - container-mysql-app1
   - container-php-fpm-app2
   - container-nginx-app2
   - container-mysql-app2
3. one container per service, where that service hosts all apps:
   - container-php-fpm - for all php-fpm pools
   - container-nginx - for all nginx virtual hosts
   - container-mysql - for all databases
I understand that running separate containers lets you make changes to one service without impacting another. You can run different PHP configurations, extensions and versions without worrying about the other services being affected. All of my apps are WordPress based, so configuration will (or should) be consistent across the board.
For now I am leaning toward separation, however I am not sure if this is the best approach.
What do you guys think?
You should run one service in a container, that's how it's designed. So 1 is out the door.
If you look at option 3, you have tight coupling between your apps. If you want to migrate app1 to a new PHP version, or it needs a different dependency, you're in trouble, so that's not a good one.
The standard is to do 2. A container per service.
Per the Docker documentation on multi-service containers:
It is generally recommended that you separate areas of concern by
using one service per container. That service may fork into multiple
processes (for example, Apache web server starts multiple worker
processes). It’s ok to have multiple processes, but to get the most
benefit out of Docker, avoid one container being responsible for
multiple aspects of your overall application. You can connect multiple
containers using user-defined networks and shared volumes.
Also based on their best practices:
Each container should have only one concern
Decoupling applications into multiple containers makes it much easier
to scale horizontally and reuse containers.
I would suggest using option 2 (separate containers for all services).
The most common pattern that I have seen is a separate container per application. That being said, there is also value in having related containers near one another but still distinct, hence the concept of Pods used in Kubernetes.
I would recommend one container per application.
We have a pretty large Symfony 2 web application which has many different endpoints and features:
api for data from our legacy product
web components for use in our legacy product
api to our new iOS POS
api to loyalty end-user portal
web interface for loyalty end-user portal
web interface for (separate) invoice end-user portal
big admin area with configuration for all of the above
The database layer (in Doctrine) on this is tightly coupled. Transactions from both the POS and our legacy product are used in the loyalty end-user portals and invoices are based on the same transactions. Obviously there's also many entities that are solely for specific parts of the application.
We originally decided on the single app+bundle approach for ease of programming, which has served us well in developing the whole platform. Unfortunately the main drawbacks are:
very bad performance (although things like further caching, minimizing assets etc. can help, we think that having such a bloated bundle, which needs to handle everything and also includes different 3rd-party libraries only used in specific parts of the application, is slowing everything down.)
we use continuous integration, and generating new builds and running all the functional tests takes 20+ minutes, while we still have many classes lacking (proper) tests.
when we change part of the application, another part breaks easily. Although more and more decoupling and functional tests help with that, it's still far from ideal.
I've done some research into splitting a Symfony project into multiple projects (each with its own GitHub repository) and using SOA to connect them. My personal experience so far with SOA is that it makes things very hard to test fully and adds lots of overhead when migrating from standard Symfony 2 forms (which I totally love).
I was also thinking of another solution: creating a shared bundle with the shared entities and repositories. This would make it much easier to test code and share common services (managers), although I've also heard arguments against big managers. The big downside is that we then cannot simply use doctrine:schema:update, because the database is shared: updating it from a project with a lower version of the shared bundle will remove fields, causing loss of data. I have also been unable to find any examples or use-cases for this approach, which leads me to wonder whether it has many more downsides.
So my question is: what are common approaches and solutions for splitting a big project like this? And: are there reasons that maybe it should not be split at all?
Although I'm answering your question, it's kind of hard to come up with a magical solution for your problems. This is not an attempt to solve all of them, nor to impose anything on you. It is not the only possible solution, and it might not even solve your problems. That said, let's begin.
I'd split the project in 4 layers:
Presentation Layer: Desktop applications, Web interfaces (no matter if
it is PHP or C#, or if it uses Symfony or any other framework and
third-party library components), Mobile Apps, everything end users can
see and interact with (also known as GUI). These only communicate with
the Application/Service layer to request something, like a list of
available products, updating some data somewhere, or sending an e-mail
to customers. The key here is that they really don't know how and where
the Application/Service layer is going to perform the requested actions.
Application/Service Layer: I'd treat this as controllers which can receive requests from the Presentation Layer, and from external webservices as well. They look like APIs, and decide whether to access/manipulate data through a Repository, or send e-mails using some SMTP service. They just handle the communication between the GUI (or external webservices which might consume your APIs) and the Domain/Infra layers. Yet they don't actually know which SMTP service they are using, or where and how data is going to be stored (in MySQL through Doctrine? in SQL Server through Entity Framework? in a NoSQL database? txt files?).
Application layers usually have their own models (also known as ViewModels), which are exposed to the world and returned to the requester (GUI or external webservice), representing part of the domain models. This mapping (converting Domain classes to Application classes) can be done with patterns like Facade and Adapter (also called the Anti-corruption layer), and there are plenty of packages for this (for C# there is AutoMapper; for PHP something similar probably exists). Why would you need this? To avoid exposing your full domain to the world. Suppose you have Invoice and Loyalty end-users, but you want to treat them as one domain class "User" holding the properties of both. You could create LoyaltyUser and InvoiceUser classes in your application, each containing only the properties necessary for that purpose, then use this mapping technique to map the domain User class to each of them.
The application layer usually also contains authentication and authorization rules, so only the Loyalty end-user would have permission to access the controller actions which deal with the LoyaltyUser model. Inside a single action in a controller, you shouldn't take different paths depending on the requester (for mobile, do this; for website, do that). Instead, you might have different actions for each one, and the Presentation layer knows which one it wants to request.
Domain Layer: This is your core, containing all business logic. This is what provides value to your business. The Domain layer contains
models/classes representing real entities from your world, and interfaces
for services and repositories. The Domain must be as clean and
natural as possible. It can't know which application is asking for
something, nor what type of infrastructure is being used. It just does business
logic. The Domain layer doesn't know if you are using Doctrine or Laravel's Eloquent as an ORM, nor if the application is a PHP website built with Symfony or a native Android app.
Infra Layer: Here you implement things like the database, the SMTP service, logging, and other things your application might need.
Doctrine would reside here. You would therefore create Repository
classes implementing the repository interfaces of your domain. The
Repository implementation uses Doctrine to do its work. These
implementations are provided to the Application Layer (normally via
Dependency Injection). This means the Application Layer shouldn't
know whether it is Doctrine or Eloquent behind the scenes; that's why the
Application uses the Repository (so the logic to access the database is encapsulated).
Your web interfaces would reside in Presentation. If the framework you use for your web interface requires MVC and therefore has controllers, these controllers should dispatch to the Application Layer (I know it sounds redundant). Your APIs would reside in the Application Layer.
This is very decoupled: if you need to change from Doctrine to Eloquent, you don't need to change your Domain or your apps. If you need to change from Symfony to anything else, or even move your website from PHP to ASP or Java, your domain doesn't have to change.
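To make the separation more concrete, here is a minimal sketch of how the Domain, Infra and Application layers could relate. All class names (User, UserRepository, InMemoryUserRepository, LoyaltyUserService) are made up for illustration, and an in-memory array stands in for a Doctrine-backed repository, so treat it as one possible shape rather than a prescribed implementation.

```php
<?php
declare(strict_types=1);

// Domain layer: plain entity and repository interface, no ORM or framework code.
final class User
{
    public int $id;
    public string $name;
    public int $loyaltyPoints;
    public string $billingAddress;

    public function __construct(int $id, string $name, int $loyaltyPoints, string $billingAddress)
    {
        $this->id = $id;
        $this->name = $name;
        $this->loyaltyPoints = $loyaltyPoints;
        $this->billingAddress = $billingAddress;
    }
}

interface UserRepository
{
    public function findById(int $id): ?User;
}

// Infra layer: one possible implementation. A Doctrine-backed class would
// implement the same interface; an in-memory array stands in for it here.
final class InMemoryUserRepository implements UserRepository
{
    /** @var array<int, User> */
    private array $users = [];

    public function add(User $user): void
    {
        $this->users[$user->id] = $user;
    }

    public function findById(int $id): ?User
    {
        return $this->users[$id] ?? null;
    }
}

// Application layer: exposes only the fields the loyalty portal needs.
final class LoyaltyUserService
{
    private UserRepository $users;

    public function __construct(UserRepository $users)
    {
        $this->users = $users; // injected, so the service never names the ORM
    }

    /** @return array{name: string, points: int}|null */
    public function loyaltyView(int $id): ?array
    {
        $user = $this->users->findById($id);
        if ($user === null) {
            return null;
        }
        // The billing address stays inside the domain object.
        return ['name' => $user->name, 'points' => $user->loyaltyPoints];
    }
}

$repo = new InMemoryUserRepository();
$repo->add(new User(1, 'Alice', 120, '1 Example Street'));
$service = new LoyaltyUserService($repo);
$view = $service->loyaltyView(1);
```

Swapping the storage then means writing another class that implements UserRepository; LoyaltyUserService and User stay untouched.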
Adding more layers, mapping objects and using DI shouldn't make requests noticeably slower; considering the price and capacity of hardware nowadays, the difference in time is almost imperceptible. You should put your effort into improving your domain, which is what brings value to the business. Separating layers improves decoupling, reduces the chance that changing one part of the application breaks other parts, increases your flexibility in scaling the app, and makes testing easier.
Rein, what was the solution you've finally ended up with? Have you actually split your project?
There is really a lack of information in this area, I just found one reasonable article https://ig.nore.me/presentations/2015/04/splitting-a-symfony-project-into-separate-tiers/
I was working with Silex and Doctrine ORM. To make my database queries faster, I wanted to have a caching of some sort.
I looked at PhpFastCache - which provides a good caching framework - but does not really integrate with Doctrine. The best part about this is that I can have a local cache independent of any external service - like memcached. Since I have a small site which is hosted on shared host, I cannot spend money on having a service on cloud.
I also looked at existing cache providers for Doctrine ORM and all of them use external cache service.
The last thing I know I would have to do is write a provider myself using the PhpFastCache, but just wanted to make sure that there is no alternative online that I can use. I have tried my best by searching online all day today, but I just wanted to make sure.
Just to add: I have looked at APC and Memcache, but I have my site on shared hosting, and I would need a dedicated hosting for installing the PECL modules for APC/Memcache :(.
Doctrine includes quite a few cache drivers that do not seem to be documented. There is not one for PhpFastCache, but there are two that cache directly to the filesystem. Check out FilesystemCache and PhpFileCache. You can see the full list in the repository.
If I had to guess, I'd say that FilesystemCache is what you want. It stores serialized data in a plain file. PhpFileCache stores it as a PHP file, and then uses include to read it later. That means it has to be parsed by PHP on read, which is probably slower unless you use a PHP bytecode cache like APC.
Neither solution will be as fast as something like Memcache since they both read from the filesystem instead of memory, but they should provide an optimization for slow database queries that are run often.
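The difference between the two file-based strategies can be sketched in plain PHP. This is not Doctrine's actual code; the function names and the sha1-based file naming are assumptions made only to illustrate "serialize to a plain file" versus "write a PHP file and include it".

```php
<?php
declare(strict_types=1);

// Strategy 1: serialized data in a plain file (the idea behind FilesystemCache).
function saveSerialized(string $dir, string $key, $value): void
{
    file_put_contents($dir . '/' . sha1($key) . '.cache', serialize($value), LOCK_EX);
}

function fetchSerialized(string $dir, string $key)
{
    $file = $dir . '/' . sha1($key) . '.cache';
    if (!is_file($file)) {
        return null; // cache miss
    }
    return unserialize(file_get_contents($file));
}

// Strategy 2: a PHP file that returns the value (the idea behind PhpFileCache).
function savePhpFile(string $dir, string $key, $value): void
{
    $code = '<?php return ' . var_export($value, true) . ';';
    file_put_contents($dir . '/' . sha1($key) . '.php', $code, LOCK_EX);
}

function fetchPhpFile(string $dir, string $key)
{
    $file = $dir . '/' . sha1($key) . '.php';
    if (!is_file($file)) {
        return null; // cache miss
    }
    return include $file; // parsed by PHP on every read unless an opcode cache is active
}

// Usage: cache the result of a (pretend) slow query in a temporary directory.
$dir = sys_get_temp_dir() . '/cache_sketch_' . uniqid();
mkdir($dir);
$rows = [['id' => 1, 'title' => 'Hello']];
saveSerialized($dir, 'posts', $rows);
savePhpFile($dir, 'posts', $rows);
$fromSerialized = fetchSerialized($dir, 'posts');
$fromPhpFile = fetchPhpFile($dir, 'posts');
```

Neither function handles expiry or invalidation, which a real cache driver would need.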
Edit: As Kiran Madipally pointed out, it should be easy to create your own PhpFastCache driver by extending CacheProvider.
I quickly wrote a provider for PhpFastCache. I have added the gist here:
https://gist.github.com/thephoenics/ee7de9f95bfdf5f6c24f
I have a folder of PHP scripts, they are mostly utility scripts. How to share those scripts among different PHP applications so that reuse and deployment are easy?
I would have to package my app into an installer, and let the user install it.
I could put the lib somewhere and hardcode the include path, but that means I have to change the PHP code every time I deploy the web application to a new customer. This is not desirable.
Another route I consider is to copy the lib to other apps, but still, since the lib is constantly updating, that means that I need to constantly do the copying, and this will introduce a lot of problems. I want an automated way to do this.
Edit: Some of the applications are Symfony, some are not.
You could create a PEAR package.
See Easy PEAR Package Creation for more information on how to do this.
This assumes that when you say anyone, you mean outside your immediate organisation.
Updated: You do not need to upload to a website to install the PEAR package. Just extract your archive into the pear folder to use in a PHP application.
Added: Why not create a new SVN repository for your library? Let's say you create a library called FOO. Inside the repository you could use the folder hierarchy trunk\lib\foo. Your modules could then go into trunk\lib\foo\modules, with a file called trunk\lib\foo\libfoo.php. Now libfoo.php can include_once or require_once all the modules as required.
PHP now supports Phar archives. There's full documentation on php.net.
There's a complete tutorial on the IBM website as well.
One neat thing you can do with Phar archives is package an entire application and distribute it that way.
http://php.net/phar
http://www.ibm.com/developerworks/opensource/library/os-php-5.3new4/index.html
Ahh, libraries...
There are two conflicting purposes here:
Sanity when updating scripts (ie. not breaking 10 other apps).
Keeping things in one organized logical place for developer efficiency.
I suggest you take a close look at git and git submodules
We use git submodules extensively for this very purpose. It allows the best of both worlds because shared scripts can be upgraded at will in any project, and then that change can be moved to the other projects (deliberately) when you have time to do so and test correctly.
Of course, you need to be using git to take advantage of submodules, but if you are not using git, and you start, you'll eventually wonder how you ever lived without it.
Edit: Since the original poster is using svn, consider using SVN Externals.
UPDATED:
you just have to put the lib in some place reachable by your apps (in a place where you can reach it via http or ftp or https or something else) and include it.
If you have to update it often you can package your library in a single phar file and you can then provide your client a function to pull the library from some remote path and update a parameter in their local configuration accordingly, like:
function updateLocalLibrary($remoteLibraryRepository, $libraryPharFile, $localLibraryPath) {
    // read the remote library into a variable
    $file = file_get_contents($remoteLibraryRepository . $libraryPharFile);
    if ($file === false) {
        return false; // download failed, keep using the current library
    }
    // give it a unique name
    $newLibraryName = $libraryPharFile . "_" . date('YmdHis');
    // store the library in a local file
    file_put_contents($localLibraryPath . $newLibraryName, $file);
    // update the configuration, letting your app point to the new library
    updateLatestLibraryPathInConfig($newLibraryName);
    // possibly delete the old lib
    return true;
}
In your include statement you then don't necessarily have to hardcode a path; you can include a parameter based on your config, like:
include(getLatestLibraryPathFromConfig());
(you are responsible to secure the retrieval in order to let only your clients see the library)
Your config can live in a database, so that when you call updateLatestLibraryPathInConfig() you can perform an atomic operation and be sure that no client reads dirty data.
The clients can then update their library as needed. They may even schedule regular updates.
There are a lot of options:
tar + ftp/scp
PEAR (see above #Wayne)
SVN
rsync
NFS
I recommend to use a continuous integration software (Atlassian Bamboo, CruiseControl); check out your repository, build a package, and then use rsync. Automatically.
You should also look into using namespaces in order to avoid conflicts with other libraries you might use. PEAR is probably a good idea for the delivery method; however, you can also just place the library in the standard path /usr/share/php/, or any other place that is set as the include path in your PHP settings file.
Good question, and probably one that doesn't have a definite answer. You can basically pick between two different strategies for distributing your code: Either you put commonly used code in one place and let individual applications load from the same shared place, or you use a source-control-system to synchronise between local copies. They aren't mutually exclusive, so you'll often see both patterns in use at the same time.
Using the file system to share code
You can layer the include_path to create varying scopes of inclusion. The most obvious application of this pattern is a globally maintained PEAR repository and a local application. If your it-system consists of multiple applications that share a common set of libraries, you can add a layer in between these (a framework layer). If you structure the include_path such that the local paths come before the global paths, you can use this to make local overrides of files. This is a rather crude way to extend code, since it works per-file, but it can be useful in some cases.
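A small sketch of the layered include_path idea follows. The directory layout (app/lib and shared/lib under a temporary folder) and the file names are invented for the demonstration; only the ordering of the paths matters.

```php
<?php
declare(strict_types=1);

// Hypothetical layout: an application-local lib directory and a shared one.
$base = sys_get_temp_dir() . '/include_path_sketch_' . uniqid();
$local = $base . '/app/lib';
$shared = $base . '/shared/lib';
mkdir($local, 0777, true);
mkdir($shared, 0777, true);

// Both layers ship helpers.php; only the shared layer ships util.php.
file_put_contents($shared . '/helpers.php', "<?php return 'shared helpers';");
file_put_contents($local . '/helpers.php', "<?php return 'local override';");
file_put_contents($shared . '/util.php', "<?php return 'shared util';");

// Local path first, then shared, then whatever was configured before.
set_include_path($local . PATH_SEPARATOR . $shared . PATH_SEPARATOR . get_include_path());

$helpers = include 'helpers.php'; // resolved from the local layer
$util = include 'util.php';       // falls through to the shared layer
```

Because resolution is per file, the local helpers.php replaces the shared one entirely, which is the crude per-file override mentioned above.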
Use source-control
Another strategy is to make a lot of local checkouts of a single shared repository. One benefit over the layered-include pattern is that you can make more fine-grained local changes. It can be a bit of a challenge to manage the separation between application layers (infrastructure, framework, application). svn:externals can work, but has some limitations. It's also slightly more complicated to propagate global changes to all applications. An automated deployment process can help with that.
From my experience, one of the bigger problems we come across during our webdevelopment process is keeping different setups updated and secure across different servers.
My company has its own CMS which is currently installed across 100+ servers. At the moment, we use a hack-ish FTP-based approach, combined with upgrade scripts at specific locations to upgrade all of our CMS setups. Efficiently managing these setups becomes increasingly difficult and risky when there are several custom modules involved.
What is the best way to keep multiple setups of a web application secure and up-to-date?
How do you do it?
Are there any specific tips regarding modularity in applications, in order to maintain flexibility towards our clients, but still being able to efficiently manage multiple "branches" of an application?
Some contextual information: we mainly develop on the LAMP stack. One of the main factors that helps us sell our CMS is that we can plug in pretty much anything our client wants. This can vary from 10 to 10,000 lines of custom code.
A lot of custom work consists of very small pieces of code; managing all these small pieces of code in Subversion seems quite tedious and inefficient to me (since we deliver around 2 websites every week, this would result in a lot of branches).
If there is something I am overlooking, I'd love to hear it from you.
Thanks in advance.
Roundup: first of all, thanks for all of your answers. All of these are really helpful.
I will most likely use a SVN-based approach, which makes benlumley's solution closest to what I will use. Since the answer to this question might differ in other usecases, I will accept the answer with the most votes at the end of the run.
Please examine the answers and vote for the ones that you think have the most added value.
I think using a version control system and "branching" the parts of the code that you have to modify could turn out to be the best approach in terms of robustness and efficiency.
A distributed version system could be best suited to your needs, since it would allow you to update your "core" features seamlessly on different "branches" while keeping some changes local if need be.
Edit: I'm pretty sure that keeping all that up to date with a distributed version system would be far less tedious than you seem to expect: you can keep local the changes you are sure you're never going to need elsewhere, and the distributed aspect means each of your deployed applications is actually independent from the others, so only the fixes you mean to propagate will propagate.
If customizing your application involves changing many little pieces of code, this may be a sign that your application's design is flawed. Your application should have a set of stable core code, extensibility points for custom libraries to plug into, the ability to change appearance using templates, and the ability to change behavior and install plugins using configuration files. In this way, you don't need a separate SVN branch for every client. Rather, keep the core code and extension plugin libraries in source control as normal. In another repository, create a folder for each client and keep all their templates and configuration files there.
For now, creating SVN branches may be the only solution that helps you keep your sanity. In your current state, it's almost inevitable that you'll make a mistake and mess up a client's site. At least with branches you are guaranteed to have a stable code base for each client. The only gotcha with SVN branches is if you move or rename a file in a branch, it's impossible to merge that change back down to the trunk (you'd have to do it manually).
Good luck!
EDIT: For an example of a well-designed application using all the principles I outlined above, see Magento E-Commerce. Magento is the most powerful, extensible and easy to customize web application I've worked with so far.
I may be wrong, but it seems to me what Aron is after is not version control. Versioning is great, and I'm sure they're using it already, but for managing updates on hundreds of customized installations, you need something else.
I'm thinking something along the lines of a purpose-built package system. You'll want every version of a module to keep track of its individual dependencies and 'guaranteed compatibilities', and use this information to automatically update only the 'safe' modules.
E.g. let's say you've built a new version 3 of your 'Wiki' module. You want to propagate the new version to all the servers running your application, but you've made changes to one of the interfaces within the Wiki module since version 2. Now, for all default installations, that is no problem, but it would break installations with custom extensions on top of the old interface. A well-planned package system would take care of this.
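As a rough sketch of the check such a package system would perform, the function below decides whether a module release is safe for a given installation. The array shapes, the field names (interface, requiresInterface) and the "wiki-export" extension are all assumptions for illustration, not an existing package format.

```php
<?php
declare(strict_types=1);

/**
 * Decide whether a module release is safe for one installation.
 *
 * $release:      ['module' => string, 'version' => int, 'interface' => int]
 * $installation: ['extensions' => [['name' => string, 'module' => string,
 *                                   'requiresInterface' => int], ...]]
 */
function isSafeUpdate(array $release, array $installation): bool
{
    foreach ($installation['extensions'] as $extension) {
        if ($extension['module'] === $release['module']
            && $extension['requiresInterface'] !== $release['interface']) {
            return false; // a custom extension was built against another interface
        }
    }
    return true;
}

// Wiki version 3 ships interface 2; one site still has an extension built on interface 1.
$wikiV3 = ['module' => 'wiki', 'version' => 3, 'interface' => 2];
$defaultSite = ['extensions' => []];
$customSite = ['extensions' => [
    ['name' => 'wiki-export', 'module' => 'wiki', 'requiresInterface' => 1],
]];

$safeDefault = isSafeUpdate($wikiV3, $defaultSite);
$safeCustom = isSafeUpdate($wikiV3, $customSite);
```

An updater could then roll the release out automatically to installations where the check passes and flag the rest for manual work.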
To address the security question, you should look into using digital signatures on your patches. There are lots of good libraries available for public-key-based signatures, so just go with whatever seems to be the standard for your chosen platform.
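One way to sign and verify a patch in PHP is with the sodium extension (bundled with PHP since 7.2, though it may need enabling on some builds). This is a sketch under that assumption; the patch contents are a placeholder string and patchIsAuthentic is a made-up helper name.

```php
<?php
declare(strict_types=1);

// Vendor side: generate a key pair once; the secret key never leaves your build machine.
$keyPair = sodium_crypto_sign_keypair();
$secretKey = sodium_crypto_sign_secretkey($keyPair);
$publicKey = sodium_crypto_sign_publickey($keyPair);

// Sign the patch archive contents (a string stands in for the file here).
$patch = 'contents of a patch archive';
$signature = sodium_crypto_sign_detached($patch, $secretKey);

// Client side: verify with the public key before applying anything.
function patchIsAuthentic(string $patch, string $signature, string $publicKey): bool
{
    return sodium_crypto_sign_verify_detached($signature, $patch, $publicKey);
}

$ok = patchIsAuthentic($patch, $signature, $publicKey);
$tampered = patchIsAuthentic($patch . 'x', $signature, $publicKey);
```

The public key would be shipped with each installation, and the signature distributed alongside the patch.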
Not sure whether someone's said this, there are a lot of long responses here, and I've not read them all.
I think a better approach to your version control would be to have your CMS sat on its own in its own repository and each project in its own. (or, all of these could be subfolders within one repo i guess)
You can then use its trunk (or a specific branch/tag if you prefer) as an svn:external in each project that requires it. This way, any updates you make to the CMS can be committed back to its repository, and will be pulled into other projects as and when they are svn updated (or the external is switched to another location).
As part of making this easier, you will need to make sure the CMS and the custom functionality sit in different folders, so that svn externals works properly.
IE:
project
project/cms <-- cms here, via svn external
project/lib <-- custom bits here
project/www <-- folder to point apache/iis at
(you could have cms and lib under the www folder if needed)
This will let you branch/tag each project as you wish. You can also switch the svn:external location on a per branch/tag basis.
In terms of getting changes live, I'd suggest that you immediately get rid of ftp and use rsync or svn checkout/exports. Both work well, the choice is up to you.
I've got most experience with the rsync route, rsyncing an svn export to the server. If you go down this route, write some shell scripts, and you can create a test shell script to show you the files it will upload without uploading them as well, using the -n flag. I generally use a pair of scripts for each environment - one a test, and one to actually do it.
Shared key authentication, so that you don't need a password to send uploads, may also be useful, depending on how tightly access to the server needs to be controlled.
You could also maintain another shell script for doing bulk upgrades, which simply calls the relevant shell script for each project you want to upgrade.
Have you looked at Drupal? No, not to deploy and replace what you have, but to see how they handle customizations and site-specific modules?
Basically, there's a "sites" folder which has a directory for every site you're hosting. Within each folder is a separate settings.php which allows you to specify a different database. Finally, you can (optionally) have "themes" and "modules" folders within sites.
This allows you to do site-specific customizations of particular modules and limit certain modules to those sites. As a result, you end up with sites where the vast majority of the code is identical and only the differences get duplicated. Combine that with the way it handles upgrades and updates and you might have a viable model.
Build into the code a self-updating process.
It will check for updates and run them when/where/how you have configured it for the client.
You will have to create some sort of a list of modules (custom or not) that need to be tested with the new build prior to roll-out. When deploying an update you will have to ensure these are tested and integrated correctly. Hopefully your design can handle this.
Updates are ideally a few key steps.
a) Backup so you can back out. You should be able to back out the entire update at any time. So, that means creating a local archive of the application and database first.
b) Update Monitoring Process - Have the CMS system phone home to look for a new build.
c) Schedule Update on availability - Chances are you don't want the update to run the second it is available. This means you will have to create a cron/agent of some kind to do the system update automatically in the middle of the night. You can also consider client requirements to update on weekends, or on specific days. You can also stagger rolling out your updates so you don't update 1000 clients in 1 day and get tech support hell. Staggered roll-out of some kind might be beneficial for you.
d) Add maintenance mode to update the site -- Kick the site into maintenance mode.
e) SVN checkout or downloadable packages -- ideally you can deploy via svn checkout, and if not, setup your server to deliver svn generated packages into an archive that can be deployed on client sites.
f) Deploy DB Scripts - Backup the databases, update them, populate them
g) Update site code - All this work for one step.
h) Run some tests on it. If your code has self-tests built in, it would be ideal.
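The steps above can be sketched as a small runner that takes a backup first and restores it as soon as any step fails. Every callable here is a placeholder for site-specific logic (maintenance mode, database scripts, code deployment, self-tests); the function name runUpdate and the step names are invented for the example.

```php
<?php
declare(strict_types=1);

/**
 * Run an update as ordered steps; if any step fails, restore the backup.
 *
 * @param callable   $backup  creates the local archive of code and database
 * @param callable[] $steps   step name => callable returning true on success
 * @param callable   $restore backs the whole update out again
 * @return string[] log of what happened, in order
 */
function runUpdate(callable $backup, array $steps, callable $restore): array
{
    $backup();
    $log = ['backup'];
    foreach ($steps as $name => $step) {
        $log[] = $name;
        if ($step() !== true) {
            $restore();
            $log[] = 'restore';
            return $log;
        }
    }
    $log[] = 'done';
    return $log;
}

// Placeholders: three steps succeed, the self-test fails.
$noop = static function (): void {};
$ok = static function (): bool { return true; };
$fail = static function (): bool { return false; };

$log = runUpdate($noop, [
    'maintenance' => $ok,
    'database' => $ok,
    'code' => $ok,
    'self-test' => $fail,
], $noop);
```

Scheduling and staggering (steps b and c) would sit outside this function, for example in the cron job that decides when to call it.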
Here's what I do...
Client-specific include path
Shared, common code is in shared/current_version/lib/
Site specific code is in clients/foo.com/lib
The include path is set to search clients/foo.com/lib first, and then shared/current_version/lib
The whole thing is in a version control system
This ensures that the code uses shared files wherever possible, but if I need to override a particular class or file for some reason, I can write a client specific version in their folder.
Alias common files
My virtual host configuration will contain a line like
Alias /common <path>/shared/current_version/public_html/common
Which allows common UI elements, icons, etc to be shared across projects
Tag the common code with each site release
After each site release, I tag the common code by creating a branch to effectively freeze that point in time. This allows me to deploy /shared/version_xyz/ to the live server. Then I can have a virtual host use a particular version of the common files, or leave it pointing at the current_version if I want it to pick up the latest updates.
Have you looked at tools such as Puppet (for system administration incl. app deployment) or Capistrano (deployment of apps in RubyOnRails but not limited to these)?
One option would be to set up a read-only version control system (Subversion). You could integrate access to the repository into your CMS and invoke the updates through a menu, or automatically if you do not want the user to have a choice about an update (could be critical). Using a version control system would also allow you to keep different branches easily.
As people have already mentioned, using version control (I prefer Subversion due to its functionality) and branching would be the best option. There is also an open source tool available on SourceForge called CruiseControl. It's amazing: you configure CruiseControl with Subversion in such a way that whenever code is modified or added in Subversion, CruiseControl notices automatically and runs a build for you. It will save you a hell of a lot of time.
I have done the same at my company. We have four projects and have to deploy them on different servers. I have set up CruiseControl in such a way that any modification in the code base triggers an automatic build, and another script deploys that build on the server. Then you are good to go.
If you use a LAMP stack I would definitely turn the solution's files into a package for your distribution and use it to propagate changes. For that I recommend Redhat/Fedora because of RPM, which is what I have experience with. You can use any Debian-based distribution too.
Some time ago I made a LAMP solution for managing an ISP's hosting servers. They had multiple servers for web hosting and I needed a way to deploy changes to my manager application, because every machine was self-contained and had an online manager. I made an RPM package containing the solution files (PHP mostly) and some deployment scripts that ran with the RPM.
For automated updating we had our own RPM repository configured on every server in yum.conf. I set up a crontab job to update the servers daily with the latest RPMs from that trusted repository.
Trust can be achieved too, because RPM supports signed packages: sign them with your private key and configure the servers to accept only packages that verify against your public key.
Hm could it be an idea to add configuration files? You wrote that a lot of small script are doing something. Now if you'd build them into the sources and steered them with configuration files shouldn't that "ease" that?
On the other hand having branches for every customer looks like an exponential growth to me. And how would you "know" which areas you've done something and do not forget to "make" changes in all other branches also. That looks quite ugly to me.
It seems a combination of revision control, configuration options and/or deployment recipes would be a "good" idea.
With that many variations on your core software, I think you really need a version control system to stay on top of pushing updates from the trunk to the individual client sites.
So if you think Subversion would be tedious, you've got a good sense for what the pain points will be... Personally, I wouldn't recommend Subversion for this, since it's not really that good at managing & tracking branches. Although benlumley's suggestion to use externals for your core software is a good one, this breaks down if you need to tweak the core code for your client sites.
Look into Git for version control, it's built for branching, and it's fast.
Check out Capistrano for managing your deployments. It's a Ruby tool, often used with Rails, but it can be used for all sorts of file management on remote servers, even for non-Ruby sites. It can get the content to the remote end through various strategies including ftp, scp and rsync, as well as automatically checking out the latest version from your repository. The nice features it provides include callback hooks for every step of the deploy process (e.g. so you can copy your site-specific configuration files which might not be in version control), and a release log system, done through symlinks, so you can quickly roll back to a previous release in case of trouble.
I'd recommend a config file with the list of branches and their hosted location, then run through that with a script that checks out each branch in turn and uploads the latest changes. This could be cron'd to do nightly updates automatically.
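A sketch of such a script, assuming Subversion branches and rsync uploads as discussed in other answers: it only builds and prints the commands, in dry-run mode, so nothing is uploaded. The config entries, URLs, hosts and the /tmp/deploy working directory are all hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Build the shell commands for one deployment target.
 * svn export writes a clean copy; the n in -azn makes rsync a dry run.
 *
 * @return string[]
 */
function deployCommands(array $target, string $workDir, bool $dryRun): array
{
    $export = $workDir . '/' . $target['name'];
    $flags = $dryRun ? '-azn' : '-az';
    return [
        'svn export --force ' . escapeshellarg($target['branch']) . ' ' . escapeshellarg($export),
        'rsync ' . $flags . ' ' . escapeshellarg($export . '/') . ' ' . escapeshellarg($target['host']),
    ];
}

// Hypothetical config: one entry per client branch and its hosted location.
$targets = [
    [
        'name' => 'client-a',
        'branch' => 'https://svn.example.com/cms/branches/client-a',
        'host' => 'deploy@a.example.com:/var/www/',
    ],
];

foreach ($targets as $target) {
    foreach (deployCommands($target, '/tmp/deploy', true) as $command) {
        // Printing only; run the commands for real once the dry run looks right.
        echo $command, PHP_EOL;
    }
}
```

Run from cron, a loop like this would give the nightly updates described above, with the dry-run flag switched off once the output has been checked.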