We periodically need to modify the structure of our production database. In many cases, these schema changes correspond to code changes that reference the new structure.
We usually end up putting up a fail whale page for a minute while we pull the new code and run the SQL, but I'm interested in writing a component or something to run the SQL queries on the fly, so that no downtime would be required.
I haven't tried it yet, but here's my plan:
Write a component or something to run a specific query or queries. The queries will probably have checks like IF EXISTS so they run only once. I think it will also have to clear the model / persistent caches. (A rough sketch of what I mean is below, after the plan.)
Call the above component/query from inside the AppController's beforeFilter.
Pull the changes onto the live site (with the above code in place).
Wait a few seconds for the app to run once (triggered either by another user or by us).
Remove the beforeFilter code which triggered the SQL queries to run.
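To make the plan concrete, here is roughly what I have in mind (an untested sketch assuming CakePHP 2.x conventions, since we use AppController/beforeFilter; the table, column, and component names are made up):

    // app/Controller/Component/SchemaUpdateComponent.php (hypothetical name)
    App::uses('Component', 'Controller');
    App::uses('ConnectionManager', 'Model');

    class SchemaUpdateComponent extends Component {

        public function run() {
            $db = ConnectionManager::getDataSource('default');

            // Guard so the change only runs once, even if several requests race here.
            $exists = $db->query("SHOW COLUMNS FROM `users` LIKE 'new_flag'");
            if (empty($exists)) {
                $db->query("ALTER TABLE `users` ADD COLUMN `new_flag` TINYINT NOT NULL DEFAULT 0");

                // Clear the cached model schemas so the new column is picked up immediately.
                Cache::clear(false, '_cake_model_');
            }
        }
    }

    // In AppController::beforeFilter(), to be removed once the deployment is done:
    // $this->SchemaUpdate->run();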
Is this a crazy idea? Here are my questions:
A. Will something like this work?
B. Is there a better way to do this that I'm missing?
C. What do I need to know about the model caches to keep from throwing errors? Presumably, the debug level will be set to 0 (because the site will be in production).
By the way, it seems relevant to mention that we are not on a load balanced system. We're on a single dedicated server.
Will something like this work?
Sort of. You will likely still suffer downtime, even if only briefly while the queries finish running. You may also have concurrency issues if the site has heavy traffic during your deployment.
Is there a better way to do this that I'm missing?
With a load balanced system you could update one system at a time while diverting traffic to another system until all systems are updated.
You could feature-flag your code: the new code paths will not run until the feature is enabled. So launch your code, and when your database updates are complete, enable the feature.
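A minimal illustration of the idea; where the flag lives (config file, settings table, environment variable) is up to you, and the names here are made up:

    function featureEnabled($name) {
        // e.g. read flags from a small ini file (or a settings table)
        $flags = parse_ini_file('/etc/myapp/features.ini');
        return !empty($flags[$name]);
    }

    if (featureEnabled('new_billing_schema')) {
        // new code path that relies on the new columns/tables
    } else {
        // old code path, still safe against the old schema
    }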
What do I need to know about the model caches to keep from throwing errors?
I am not familiar enough with the caches to say for sure, but caching could provide the illusion that the site is available, while any dynamic request (e.g. a form submission) could still result in an error.
As an aside, take a look at Schema Migrations if you haven't already.
Will something like this work?
I'm sure it would work, but you can also pound a nail with a shoe if you hit it hard enough.
Is there a better way to do this?
Jason McCreary is absolutely correct with his comment that code shouldn't deploy itself and that you should be using a deployment process - whether that be (at the very least) custom scripts, migrations, or even a dedicated tool like my company's BuildMaster, which handles database deployments as a first-class concept.
Using a tool like this to flesh out your process will allow you to incrementally build up to a fully automated system, so that if you ever do add load balancing (or some other additional infrastructure) you don't have to spend a whole lot of time updating your deployment process all over the place. You can also plan for rollbacks, advanced deployment scenarios, and other cool things that cannot be accomplished simply by putting deployment code within application code.
For database deployments specifically, BuildMaster can manage the scripts and will deploy them in the same order automatically when the appropriate deployment actions are used. This will minimize any downtime you may experience trying to run them manually. You can also put in stop/start application actions which will clear any server-side caches. There will always be the problem of ViewState or other client side persistence however, but that's a different issue altogether.
Related
I am a somewhat experienced web developer who is interested in gaining more experience with Laravel. For this purpose, I am hoping to write a Laravel-based control panel for my home server. Again, this is more for the learning experience than anything else. My question is this: What is the correct way to go about controlling system-level services without creating a massive security hole?
I know all about exec and the like, but I'm wondering what is the generally accepted way of accomplishing this? I've considered a couple options:
First, I considered writing a 'front-end-ish' Laravel app that the user can interact with, click buttons, etc., and that issues commands through a Unix socket to a 'back-end' service (probably in Python) that reads the commands and executes those that are white-listed - thus bypassing (or reducing) the command-injection issue. This would allow me to give php-fpm very limited rights, but would be A LOT more work.
On the flip side I've considered just sanitizing the hell out of any user input, and giving php-fpm elevated rights on the system. Obviously this would be faster, and easier to manage, BUT would run the risk of opening a major security hole.
Ultimately I'm curious if someone with more experience could weigh in on this? Am I missing a better approach? What is the standard way to do this? Or is there no standard way? Everything is running on Ubuntu Server, with Nginx and PHP-FPM. I've done a fair amount of reading on the subject, but have yet to find answers to the questions above. Also, I am a pretty savvy Linux admin - but if I'm doing something stupid, don't be afraid to say so :)
Thank you for your time!
Back in the day we solved this by separating things out into two applications. Rather than call tasks directly, your web application would write state down to dispatch tables. In over simplified terms, if you needed the frontend application to call
cp /file/one /file/two
Your application would write data out to a table that looked something like this (again, over simplified)
    command | arguments           | job_type      | has_run
    cp      | /file/one,/file/two | account_setup | 0
Then, there'd be a separate application whose job it was to monitor this table and run any commands that needed running. This removed the need to give the PHP process any sort of elevated privileges. It also gave us the ability to have the application control servers it wasn't on. The separate application for running jobs also gave us a second chance at sanitizing strings, and a de-facto log of any attempts to attack the application. Finally, with a separate application for running jobs we could use different user accounts to run different sorts of jobs, and give those accounts only the permissions they needed.
If you wanted an ultra-modern approach I imagine you'd try utilizing a message queue rather than the poor man's "queue via MySQL table and cron" -- the important bit is your PHP application shouldn't run the commands, your PHP application should send an event that a command needs to run.
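In PHP terms the front-end half might look something like this (a sketch only, using PDO, and assuming the jobs table above also has an auto-increment id column):

    $pdo = new PDO('mysql:host=localhost;dbname=app', 'appuser', 'secret');

    // Front end: record that a job needs to run instead of running it.
    $stmt = $pdo->prepare(
        'INSERT INTO jobs (command, arguments, job_type, has_run) VALUES (?, ?, ?, 0)'
    );
    $stmt->execute(['cp', '/file/one,/file/two', 'account_setup']);

    // Worker (separate process under its own user account, run from cron or a loop):
    foreach ($pdo->query('SELECT * FROM jobs WHERE has_run = 0') as $job) {
        // Re-validate here: only run whitelisted commands with sanitized arguments.
        // ...execute the job, then mark it as done:
        $pdo->prepare('UPDATE jobs SET has_run = 1 WHERE id = ?')->execute([$job['id']]);
    }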
Hope that helps!
If you don't want a giant security hole, don't accept any user input. Only have predefined shell scripts that execute the commands.
You can use shell_exec() and only execute those shell scripts on the server.
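For example (a sketch; the script paths are made up, and user input is only ever used to pick an entry from the whitelist):

    $scripts = [
        'restart_nginx' => '/usr/local/bin/restart-nginx.sh',
        'backup_home'   => '/usr/local/bin/backup-home.sh',
    ];

    $action = isset($_POST['action']) ? $_POST['action'] : '';

    if (isset($scripts[$action])) {
        // The command string is fixed; no user input is interpolated into it.
        $output = shell_exec($scripts[$action]);
    } else {
        http_response_code(400);
        $output = 'Unknown action';
    }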
You should look into the Symfony Process component, which already ships as a dependency of Laravel. However, you still need to plan the implementation carefully; user input should only ever be used for command options/arguments.
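Roughly like this (a sketch; the service whitelist is invented, and the array-style constructor is how recent versions of the component avoid shell interpolation):

    use Symfony\Component\Process\Process;

    $allowed = ['nginx', 'php7.4-fpm', 'mysql'];
    $service = isset($_POST['service']) ? $_POST['service'] : '';

    if (in_array($service, $allowed, true)) {
        // User input only ever appears as an argument, never as part of the command.
        // (A tightly scoped sudoers entry for the php-fpm user would still be needed.)
        $process = new Process(['sudo', 'systemctl', 'restart', $service]);
        $process->run();

        if (!$process->isSuccessful()) {
            // e.g. log $process->getErrorOutput()
        }
    }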
I have created a PHP+MYSQL web app and I am trying to implement now a logging system to store and track some actions of each user.
The purpose of this is the following: track the activity of each user's session by logging the IP+time+action, then see which pages he accessed later on by logging time+pagename; for each user there will be a file in the format: log{userid}_{month}.log
Each log will then be viewed only by the website owner, through a custom admin panel, and the data will be used only for security purposes (as in to show to the user if he logged in from a different IP or if someone else logged in from a different IP and to see which areas of the website the user accessed during his login session).
Currently, I have a MySQL MyISAM table where I store the userid, IP, time and action. The app is still not launched, but we intend to have very many users (over 100k), and using a database for this feels like suicide.
So what do you suggest? How should the logging be done? Using files, using a table in the current database, using a separate database? Are there any file-logging frameworks available for PHP?
How should the reading of the file be done then? Read the results by row?
Thank you
You have many options, so I'll speak from my experience running a startup with about 500k users, 100k active every month, which seems to be in your range.
We logged user actions in a MySQL database.
Querying your data is very easy and fast (provided good indexes)
We ran on Azure, and had a dedicated MySQL (with slaves, etc) for storing all user data, including logs. Space was not an issue.
Logging to MySQL can be slow, depending on everything you are logging, so we just pushed each log entry to Redis and had a Python app read it from Redis and insert it into MySQL in the background. This meant that logging had basically no impact on loading times.
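In PHP the request-time half of that can be as small as this (a sketch, assuming the phpredis extension; with Predis the calls are nearly identical):

    $redis = new Redis();
    $redis->connect('127.0.0.1', 6379);

    // Request time: just push the entry onto a list, which is very cheap.
    $redis->rPush('user_action_log', json_encode([
        'user_id' => $userId,
        'ip'      => $_SERVER['REMOTE_ADDR'],
        'time'    => time(),
        'action'  => 'login',
    ]));

    // A background worker (ours was Python, but PHP works too) then pops entries
    // off the list and does the actual INSERTs into MySQL.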
We decided to log in MySQL for user actions because:
We wanted to run queries on anything at any time without much effort. The structured format of the user action logs made that incredibly easy to do.
It also allows you to display certain logs to users, if you would require it.
When we introduced badges, we had no need to parse text logs to award badges to those who performed a specific action X number of times. We simply wrote a query against the user action logs, and the badges were awarded. So adding features based on actions was easy as well.
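For example, awarding a badge to everyone who performed a given action at least 100 times was a single query along these lines (table, column and helper names invented to match the structure described above):

    $stmt = $pdo->prepare(
        'SELECT user_id, COUNT(*) AS n
           FROM user_actions
          WHERE action = ?
          GROUP BY user_id
         HAVING n >= ?'
    );
    $stmt->execute(['answer_posted', 100]);

    foreach ($stmt as $row) {
        awardBadge($row['user_id'], 'centurion');  // hypothetical helper
    }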
We did use file logging for a couple of application logs - things we did not query on a daily basis - such as the logs of the Python app writing to the database, webserver access and error logs, etc.
We used Logstash to process those logs. It can simply hook into a log file and stream it to your Logstash server. Logstash can also query your logs, which is pretty cool.
Advanced uses
We used Slack for team communications and integrated the Python database-writing app with it; this allowed us to send critical errors to a channel (via their API) where someone could action a fix immediately.
Closing
My suggestion would be to not over think it for now, log to MySQL, query and see the stats. Make updates, rinse and repeat. You want to keep the cycle between deploy and update quick, so making decisions from a quick SQL query makes it easy.
Basically, what you want to avoid is logging into a server, finding a log, and grepping your way through it to find something; the above achieved that.
This is what we did, it is still running like that and we have no plans to change it soon. We haven't had any issues where we could not find anything that we needed. If there is a massive burst of users and we scale to 1mil monthly active users, then we might change it.
Please note: whichever way you decide to log, if you are saving POST data, be sure never to do that for credit card info unless you are PCI compliant - or rather, use Stripe's JavaScript libraries.
If you are sure that reading the log will mainly target one user at a time, you should consider partitioning your log table:
http://dev.mysql.com/doc/refman/5.1/en/partitioning-range.html
using your user_id as partitioning key.
The maximum number of partitions being 1024, you could use around 1,000 partitions, each storing roughly 1/1000th of your 100k users (about 100 users per partition), which is something reasonable.
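As an illustration only (HASH partitioning on user_id is the simplest way to get roughly a thousand partitions; the page linked above covers RANGE partitioning):

    $pdo->exec("
        CREATE TABLE user_action_log (
            user_id INT UNSIGNED NOT NULL,
            ip      VARBINARY(16) NOT NULL,
            time    DATETIME      NOT NULL,
            action  VARCHAR(64)   NOT NULL,
            KEY idx_user_time (user_id, time)
        ) ENGINE=InnoDB
        PARTITION BY HASH(user_id)
        PARTITIONS 1000
    ");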
Are there any file-logging frameworks available for PHP?
There is psr/log, which is available on Packagist: https://packagist.org/packages/psr/log
Note that it's not a file logging framework but an API for a logger based on the PSR-3 standard from FIG. So, if you like, it's the "standard" logger interface for PHP. You can build a logger that implements this interface or search around on packagist for other loggers that implement that interface (either file or MySQL based). There are a few other loggers on packagist (teacup, forestry) but it would be preferable to use one that sticks to the PSR standard.
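A minimal example with Monolog, which is one widely used logger that implements the PSR-3 interface (file path and channel name are just examples):

    use Monolog\Logger;
    use Monolog\Handler\StreamHandler;

    $log = new Logger('security');
    $log->pushHandler(new StreamHandler('/var/log/myapp/security.log', Logger::INFO));

    // Anywhere in your app you can type-hint Psr\Log\LoggerInterface and call
    // the standard methods:
    $log->info('User login', ['user_id' => $userId, 'ip' => $_SERVER['REMOTE_ADDR']]);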
We do logging with the great tool Graylog.
It scales as high as you want it, has great tools for data visualization, is incredibly fast even for complex queries and huge datasets, and the underlying search engine (Elasticsearch) is schemaless. The latter may be an advantage, as you get more possibilities for extending your logs without the hassle MySQL schemas can give you.
Graylog, Elasticsearch and MongoDB (which is used to store the configuration of Graylog and its web interface) are easily deployable via tools like Puppet, Chef and the like.
Actually logging to Graylog is easy with the already-mentioned PHP library Monolog.
Of course, the great disadvantage here is that you have to learn a bunch of new tools and software. But it is worth it in my opinion.
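Roughly like this (a sketch, assuming the graylog2/gelf-php package alongside Monolog; the exact class names can differ between versions, so treat it as a pointer rather than a recipe):

    use Gelf\Publisher;
    use Gelf\Transport\UdpTransport;
    use Monolog\Handler\GelfHandler;
    use Monolog\Logger;

    $transport = new UdpTransport('graylog.example.internal', 12201);

    $logger = new Logger('app');
    $logger->pushHandler(new GelfHandler(new Publisher($transport)));

    $logger->info('User login', ['user_id' => $userId]);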
The crux of the matter is that the data you are writing is not going to be changed. In my experience, in this scenario I would use either:
MySQL with the BLACKHOLE storage engine. Set it up right and it's blisteringly fast!
A Riak cluster (a NoSQL solution) - though this may be a learning curve for you, it might be one you need to take eventually anyway.
Use SysLog ;)
Set it up on another server and it can log all of your processes separately (such as networking, servers, SQL, Apache, and your PHP).
It can be useful for you and decrease the time spent debugging. :)
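For completeness, the application side is just PHP's built-in syslog functions; forwarding to the separate log server is then rsyslog/syslog-ng configuration rather than application code:

    openlog('mywebapp', LOG_PID, LOG_LOCAL0);
    syslog(LOG_INFO, sprintf('user=%d ip=%s action=%s',
        $userId, $_SERVER['REMOTE_ADDR'], 'login'));
    closelog();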
I have added a chat capability to a site using jQuery and PHP and it seems to generally work well, but I am worried about scalability. I wonder if anyone has some advice. The key area for me, I think, is efficiently managing awareness of who is online.
detail:
I haven't implemented long-polling (yet) and I'm worried about the raw number of long-running processes in PHP (Apache) getting out of control.
My code runs a periodic jquery ajax poll (4secs), that first updates the db to say I am active and sets a timestamp.
Then there is a routine that checks the timestamp for all active users and sets those outside (10mins) to inactive.
This is fairly normal from my research so far. However, I am concerned that if I allow every active user to check every other active user, and then everyone updates the db to kick off inactive users, I will get duplicated effort, record locks and unnecessary server load.
So I have implemented an idea of the role of a 'sweeper'. This is just one of the online users, who inherits the role of the person doing the cleanup. Everyone else just checks whether there is a 'sweeper' in existence (DB read) and carries on. If there is no sweeper when they check, they make themselves sweeper (DB write for their own record). If there are more than one, make yourself 'non-sweeper', sleep for a random period and check again.
My theory is that this way there is only one user regularly writing updates to several records on the relevant table and everyone else is either reading or just writing to their own record.
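In rough terms, each poll handler does something like this (a simplified sketch; the table and column names are invented for illustration):

    // Runs on every 4-second AJAX poll.
    $pdo->prepare('UPDATE chat_users SET last_seen = NOW() WHERE user_id = ?')
        ->execute([$userId]);

    $sweeper = $pdo->query(
        'SELECT user_id FROM chat_users
          WHERE is_sweeper = 1 AND last_seen > NOW() - INTERVAL 10 MINUTE'
    )->fetchColumn();

    if ($sweeper === false) {
        // No live sweeper: volunteer (and on a later poll, step down, sleep a
        // random period and re-check if more than one user volunteered).
        $pdo->prepare('UPDATE chat_users SET is_sweeper = 1 WHERE user_id = ?')
            ->execute([$userId]);
    } elseif ((int) $sweeper === (int) $userId) {
        // I am the sweeper: expire everyone who has gone quiet.
        $pdo->exec('UPDATE chat_users SET active = 0
                     WHERE last_seen < NOW() - INTERVAL 10 MINUTE');
    }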
So it works OK, but the problem possibly is that the process requires a few DB reads and may actually be less efficient than just letting everyone do the cleanup, as in the other approaches I mentioned from my research.
I have had over 100 concurrent users running OK so far, but the client wants to scale up to several hundred, even over 1,000, and I have no way of knowing at this stage whether this idea is good or not.
Does anyone know whether this is a good approach or not, whether it is scalable to hundreds of active users, or whether you can recommend a different approach?
As an aside, long polling / Comet for the actual chat messages seems simple and I have found a good resource for the code, but there are several blog comments that suggest it's dangerous with PHP and Apache specifically (active threads, etc.), with the impact minimised by usleep() and session_write_close().
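For reference, the kind of long-poll endpoint I mean looks roughly like this (a sketch; fetchNewMessagesFor() is a placeholder for the real DB check):

    session_start();
    $userId = $_SESSION['user_id'];
    session_write_close();        // release the session lock so this user's other requests aren't blocked

    $deadline = time() + 25;      // give up well before typical Apache/proxy timeouts
    while (time() < $deadline) {
        $messages = fetchNewMessagesFor($userId);   // placeholder helper
        if ($messages) {
            echo json_encode($messages);
            exit;
        }
        usleep(500000);           // sleep 0.5s between DB checks
    }
    echo json_encode([]);         // client immediately re-issues the request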
Again, does anyone have any practical experience of a PHP long-polling setup for hundreds of active users? Maybe you can put my mind at ease! Do I really have to look at migrating this to Node.js (no experience)?
Thank you in advance
Tony
My advice would be to do this with the Meteor framework, which should make it pretty trivial even if you are not an expert, and then simply load the chat into your PHP website via an iframe.
It will be scalable, won't consume many resources, and it will only get better in the future, I presume.
And it sure beats both PHP Comet solutions and jQuery/AJAX timeout-based calls to the server.
I even believe you could find a more or less complete solution on GitHub that just requires tweaking.
But of course, do read the docs before you implement it.
If you are worried about security issues, read up on security with Meteor.
Long polling is indeed pretty disastrous for PHP. PHP always runs with a limited number of concurrent processes, and it will scale great as long as you optimize for handling each request as quickly as possible.
Long polling and similar solutions will quickly fill up your pipe.
It could be argued that PHP is simply not the right technology for this type of stuff, with the current tools out there. If you insist on using PHP you could try ReactPHP, which is a framework for PHP quite similar to how NodeJS is built. The implication with React is also that it's expected to run as a separate daemon, and not within a webserver such as Apache. I have no experience of the stability of this, or how well it scales, so you will have to do the testing yourself.
NodeJS is not hard to get into if you know JavaScript well. NodeJS + socket.io make it really easy to write the chat server and client with websockets. This would be my recommendation. When I started with this, I had something nice up and running within several hours.
If you want to keep your application stack in PHP, you want the chat application running in your actual web app (not an iframe), and you're concerned about scaling your realtime infrastructure, then I'd recommend you look at a hosted service for the realtime updates, such as Pusher, who I work for. This way the hosted service handles the scaling of the realtime infrastructure for you and lets you concentrate on building your application functionality.
This way you only need to handle the chat message requests - sanitize/verify the content - and then push the information through Pusher to the thousands of connected clients.
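On the server, that push is a couple of lines with the official pusher/pusher-php-server library (the constructor arguments have changed between versions, so check the current docs; channel and event names here are made up):

    $pusher = new Pusher\Pusher($appKey, $appSecret, $appId, ['cluster' => 'eu']);

    // After sanitizing/verifying the submitted message:
    $pusher->trigger('chat-room-42', 'new-message', [
        'from' => $userName,
        'body' => $messageText,
    ]);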
The quick start guide is available here:
http://pusher.com/docs/quickstart
I've a full list of hosted services on my realtime web tech guide.
Let's say you're writing a PHP application that will be hosted in a load-balanced/multi-server setup. What are the things you need to know in order to ensure smooth operation? Right now the only thing I think will be an issue is PHP sessions (i.e., you must use a custom database handler for them). Anything else?
Let's turn this into an answer:
In my experience, the overwhelming majority of PHP applications are constrained not (or not only) by PHP horsepower on the web server, but at least as much by the backing store, i.e. the database and/or files.
So load balancing a PHP application without careful analysis has the potential to make things worse: it hits the weakest link in the chain with more and more load.
So the first - and IMHO most important - "thing to know when writing a web app hosted in a load-balanced server" is the load pattern and its potential for balancing. If your app performs badly and you load-balance it across more servers, only to find out you now have more servers waiting for the DB, you are in trouble.
Here is an out-of-the-blue checklist; please regard it as a brainstorm (or a brainfart) only:
First: Are you really CPU-bound?
Which pages are hit most (see your log)
For the top N of these (with a suitable N) check the processing pattern: Where do the CPU cycles go?
What would be the side effects of making sessions, uploads, file storage (add whatever you use) shared and would it be offset by the load balancing?
Comments welcome, I am very sure to have not even scratched the surface!
Edit
Just thought of something that bit me once in this context: resource locking. Brace yourself for a higher degree of concurrency if you go multi-server.
File uploads/downloads could also be an issue - you would probably need them to be visible on all servers.
I am building a web-application and have a couple of quick questions. From what I learnt, one should not worry about scalability when initially building the app and should only start worrying when the traffic increases. However, this being my first web-application, I am not quite sure if I should take an approach where I design things in an ad-hoc manner and later "fix" them. I have been reading stories about how people start off with an app that gets millions of users in a week or two. Not that I will face the same situation but I can't help but wonder, how do these people do it?
Currently, I bought a shared hosting account on Lunarpages and that got me started in building and testing the application. However, I am interested in learning how to build the same application in a scalable-manner using the cloud, for instance, Amazon's EC2. From my understanding, I can see a couple of components:
There is a load balancer that first receives requests and then decides where to route each request
This request is then handled by a server replica that then processes the request and updates (if required) the database and sends back the response to the client
If a similar request comes in, then a caching mechanism like memcached comes into the picture and returns objects from the cache
A blackbox that handles database replication
Specifically, I am trying to do the following:
Setting up a load balancer (my homework revealed that HAProxy is one such load balancer)
Setting up replication so that databases can be synchronized
Using memcached (a read-through sketch follows this list)
Configuring Apache to work with multiple web servers
Partitioning the application to use Amazon EC2 and Amazon S3 (my application will need a great deal of storage)
Finally, how can I avoid burning myself when using Amazon services? Because this is just a learning phase, I can probably do with 2-3 servers with a simple load balancer and replication, but I want to avoid accidentally paying loads of money.
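For the memcached part, the read-through pattern I have in mind looks roughly like this (a sketch; the helper and key names are made up):

    $cache = new Memcached();
    $cache->addServer('cache1.internal', 11211);

    $key  = 'user_profile_' . $userId;
    $data = $cache->get($key);

    if ($data === false) {                        // cache miss: hit the database
        $data = loadUserProfileFromDb($userId);   // placeholder helper
        $cache->set($key, $data, 300);            // cache for 5 minutes
    }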
I am able to find resources on individual topics but am unable to find something that starts off from the big picture. Can someone please help me get started?
Personally, I think you should be considering how your app will scale initially - as otherwise you'll run into problems down the line.
I'm not saying you need to build it initially as a multi-server system, but if you think you'll need to do it later, be mindful of the concerns now.
In my experience, this includes things like:
Sessions. Unless you use 'sticky' load balancing, you will have to have some way of sharing session state between servers. This probably means storing session data on either shared storage, or in a DB (see the sketch after this list).
File uploads and replication. If you allow users to upload files, or you have a CMS that allows you to upload images/documents, it needs to cater for the fact that these files will also need to find their way onto other nodes in your cluster. However, if you've gone down the shared storage route mentioned above, this should cover it.
DB scalability. If you're using traditional DB servers, you might want to think about how you'll implement scalability at that level. This may mean coding your app so you use one connection string for reads, and another for writes. Then, you are free to implement replication with one master node handling the inserts/updates and cascading the changes to read-only nodes that handle the bulk of the work.
Middleware. You might even want to go down the route of implementing some kind of message oriented middleware solution to completely hand off business logic functions - this will give you a great level of flexibility in how you wish to scale this business logic layer in the future. Although initially this will be a lot of complication and work for not a great deal of payoff.
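As a concrete example of the first point, a DB-backed session handler can be as small as this (a sketch; the sessions table and its columns are made up, and PHP's SessionHandlerInterface does the wiring):

    class DbSessionHandler implements SessionHandlerInterface
    {
        private $pdo;

        public function __construct(PDO $pdo) { $this->pdo = $pdo; }

        public function open($savePath, $sessionName) { return true; }
        public function close() { return true; }

        public function read($id)
        {
            $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
            $stmt->execute([$id]);
            return (string) $stmt->fetchColumn();
        }

        public function write($id, $data)
        {
            return $this->pdo
                ->prepare('REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, NOW())')
                ->execute([$id, $data]);
        }

        public function destroy($id)
        {
            return $this->pdo->prepare('DELETE FROM sessions WHERE id = ?')->execute([$id]);
        }

        public function gc($maxLifetime)
        {
            return $this->pdo
                ->prepare('DELETE FROM sessions WHERE updated_at < NOW() - INTERVAL ? SECOND')
                ->execute([$maxLifetime]);
        }
    }

    session_set_save_handler(new DbSessionHandler($pdo), true);
    session_start();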
Have you considered playing around with VMs first? You can run 2-3 VMs on your local machine and set them up like you would actual servers, they just won't be able to handle real traffic levels. If all you're looking for is the learning experience, it might be an ideal way to go about it.