This is probably an odd question, but it's something I have been wondering about lately.
I have an application that requests a page (a PHP script that works like an API and outputs a simple string) from my web server every second. That seems like a lot of traffic, and I was wondering whether any issues could arise from it.
For instance, I should probably keep an eye on the web server's logging to make sure it doesn't fill the disk. RAM/CPU isn't a problem at this point, APC is enabled, and the scripts are optimized. What else should I look into, if anything?
This is probably the same situation I would encounter with a lot of visitors coming to my site, but I haven't had that experience yet.
Thanks!
Every second? That's 86,400 requests a day per client. That's a lot for PHP, but it should be okay unless you have multiple clients or some kind of I/O-heavy or database-backed system behind it.
Otherwise, PHP 5 [with PHP-FPM] and APC on nginx sounds well suited for this use, if you must use PHP.
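If the script does any real work per hit, caching its output in APC keeps a once-per-second poll cheap. A minimal sketch, assuming the classic APC extension (apc_fetch/apc_store); the cache key and compute_status() are hypothetical stand-ins for your own code:

```php
<?php
// Hypothetical polled endpoint: the expensive part runs at most once every
// few seconds; every other hit just echoes the cached string.
$status = apc_fetch('api_status');
if ($status === false) {
    $status = compute_status();           // stand-in for the real work (DB query, etc.)
    apc_store('api_status', $status, 5);  // cache the result for 5 seconds
}
header('Content-Type: text/plain');
echo $status;
```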
If this component of your application aggregates data without a database, by mining other data sources over the internet, you may want to check with the data providers that realtime polling is permissible and to ensure your addresses are whitelisted explicitly.
Firewalls aren't to be forgotten: use a permit-by-exception security policy, e.g. iptables -t filter -P INPUT DROP, fine-tuned down to the packet level using the iptables raw table as well. One of the greatest threats to mission-critical web server performance is an adversary's ability to identify a node as critical by analyzing traffic frequency and volume. Closing all non-critical ports at the lowest level is an easy defense.
Another option is automated failover, strung together with node monitoring for this server and rapid deployment of a drop-in replacement appliance using a cloud VPS provider such as Digital Ocean or Amazon Web Services. This is an alternative to running redundant servers (or instances) permanently, and fun to set up.
Applications that require realtime request processing with failover are often seen in the financial industry in high-value risk environments, as well as in the security and transportation industries in safety-critical risk environments. If either of these scenarios applies to you, you may wish to consider rebuilding this component of your application from the ground up in a language suited to the purpose, such as Ada, Erlang, or Haskell. This would allow you to optimize resource utilization at a lower level and therefore obtain optimum performance. Depending on your risk environment, this may or may not be worthwhile for you.
Related
Currently I'm working with PHP, and I find that I can serve a web page just by using the PHP CLI (its built-in web server), so I don't understand exactly why we have to install an additional server like Apache or Nginx.
I don't know why your question was voted down. I see it as an opportunity to focus on a slightly broader but highly related question: Why should we be extremely careful about allowing only specific software onto public-facing infrastructure? Even more generally, what sort of software is okay to place on public-facing infrastructure? And its corollary: what does good server software look like?
First off, there is no such thing as secure software. This means you should always hold a very skeptical view of anything that opens even a single port on a computer to enable network connections (in either direction). However, there is a very small set of software that has had enough eyeballs on it to guarantee a certain minimum level of assurance that things will probably not go horribly wrong. Apache is the most battle-tested server out there, and Nginx comes in a close second as far as modern web servers are concerned. The built-in PHP HTTP server is not a good choice for a public-facing system, let alone for testing production software, as it lacks the qualities of good network server design and may have undiscovered security vulnerabilities. For those and other reasons, the developers include a warning against using the built-in PHP server. It was added because users kept asking for it, but that doesn't mean it should be used.
It is also a good idea not to trust network servers written by someone who doesn't know what they are doing. I frequently see ill-conceived network servers written in Node or Go, typically WebSocket-based solutions or servers used to work around some issue with another piece of software, that implicitly open security holes in the infrastructure even if the author didn't intend to do so. Just because someone can do something doesn't mean they should, and when it comes to writing network servers, they usually shouldn't. Frequently those servers are proxied behind Apache or Nginx, which affords some defense against standard attacks. However, once an attacker gets past the defenses of Apache or Nginx, it's up to the software to provide its own defenses, which, sadly, are almost always significantly lacking. As a result, any time I see a proxied service running on a host, I brace myself for the inevitable security disaster that awaits, with Ruby, Node, and Go developers being the biggest offenders.

The moment a developer decides to write a network server is the moment they have probably chosen the wrong strategy, unless they have a very specific reason to do so and are aware of, and prepared to defend against, a wide range of attack scenarios. A developer needs to be well versed in a wide variety of disciplines before taking on the extremely difficult task of writing a network server, scalable or otherwise. It is my experience that few developers are actually capable of that task without introducing major security holes into their own or their users' infrastructure. While the PHP core developers generally know what they are doing elsewhere, I have personally found several critical bugs in their core networking logic, which shows that they are collectively lacking in that department. Therefore their built-in web server should be used sparingly, if at all.
Beyond security, Apache and Nginx are designed to handle "load" far better than the built-in PHP server. "Load" here means the answer to the question, "How many requests per second can be serviced?" The answer is actually extremely complicated. Depending on code complexity, what is being hosted, what hardware is in use, and what is running at any point in time, a single host can handle anywhere from 20 to 20,000 requests per second, and that number can vary greatly from moment to moment. Apache comes with a tool called Apache Bench (ab) that can be used to benchmark the performance of a web server. However, benchmarks should always be taken with a grain of salt and viewed from the perspective of "Can we get this application to go any faster?" rather than "My application is faster than yours."
As far as developing software in PHP goes (since SO is a programming question site), I recommend mirroring your production environment as closely as possible. If Apache will be running remotely, then running Apache locally provides the best simulation of the real thing, so that there aren't a bunch of last-minute surprises. PHP code running under the Apache module may behave significantly differently from PHP code running under the built-in PHP server (e.g. $_SERVER differences)!
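A quick, non-authoritative way to see which SAPI a piece of code is actually running under, which is useful when chasing those differences:

```php
<?php
// 'cli-server' = built-in PHP server, 'apache2handler' = Apache module,
// 'fpm-fcgi' = PHP-FPM behind Nginx or Apache.
echo php_sapi_name(), "\n";
echo isset($_SERVER['SERVER_SOFTWARE']) ? $_SERVER['SERVER_SOFTWARE'] : '(none)', "\n";
```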
If you are like me and don't like setting up Apache and PHP and don't need Apache running all the time, I maintain a set of scripts for setting up portable versions of Apache, PHP, and MariaDB (roughly equivalent to MySQL) for Windows over here:
https://github.com/cubiclesoft/portable-apache-maria-db-php-for-windows/
If your software application is actually intended to be run using the built-in PHP server (e.g. a localhost only server), then I highly recommend introducing a buffer layer such as the CubicleSoft WebServer class:
https://github.com/cubiclesoft/ultimate-web-scraper/
By using a PHP userland class like that one, you can gain certain assurances that the built-in PHP server cannot provide while still having a pure PHP solution (i.e. no extra dependencies): fewer, if any, buffer-overflow opportunities; the server is interpreted through the Zend Engine, resulting in fewer rogue code-execution opportunities; and it has more features than the built-in server, including complete customization of the request/response cycle itself. PHP itself can start such a server at OS boot by using a tool similar to Service Manager:
https://github.com/cubiclesoft/service-manager/
Of course, all of that means a user has to trust your application's code, which opened a port, to run on their computer. For example, what happens if a website starts port-scanning localhost ports via the user's web browser? And if it does find the port your software is running on, can that website start deleting files or run code that installs malware? It's the unusual exploits that will really trip you up. A "zero open ports" strategy with a "disconnected network cable/disabled WiFi" policy is the only known way to truly secure a device. Every open port and established connection carries risk.
Good network-enabled software will have been battle-tested and hardened against a wide range of attacks. Writing such software is a responsibility that takes a lot of time to get right and it will generally show if it is done wrong. PHP's built-in server feels sloppy and lacks basic configuration options. I can't recommend its use for any reasonable purpose.
If you refer to the PHP documentation:
Warning
This web server was designed to aid application development. It may
also be useful for testing purposes or for application demonstrations
that are run in controlled environments. It is not intended to be a
full-featured web server. It should not be used on a public network.
http://php.net/manual/en/features.commandline.webserver.php
So yes, as it states, this is a good tool for testing purposes. You can quickly start a server and test your scripts in your browser. But that does not mean it provides all of the features you get with a production-level server like Apache or Nginx :)
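For local development, the built-in server is typically started from the command line, optionally with a small router script. A minimal sketch (file and path names are illustrative):

```php
<?php
// router.php -- start the development server with:
//   php -S localhost:8000 router.php
// Returning false tells the built-in server to serve the requested file as-is.
if (preg_match('/\.(?:png|jpe?g|gif|css|js)$/', $_SERVER['REQUEST_URI'])) {
    return false;
}
require __DIR__ . '/index.php';   // route everything else to the front controller
```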
You can use the built-in server in your local development environment, but you should use a more secure, feature-rich web server in your production environment, which demands much more in terms of security, handling a large number of requests, and so on.
I have developed a social networking website based on a WAMP (Windows, Apache, MySQL, PHP) server. I put it up on a free host (the hosting serves it on LAMP), and it works fine.
Now, I have researched a little and found out that PHP applications can be difficult to scale and require a lot of parallel processing. I would like to test how many users the web host supports for my website, and how many my localhost does.
It's a social network like any other, involving:
Posting data on the main page (with images).
Chat between users (with polling every 3 seconds, similar to the one on Facebook).
A question-and-answer forum (just like this one, or Yahoo Answers, including upvotes, downvotes, points, etc.).
Two HTML5 server-sent event (SSE) loops running indefinitely (see the sketch after this list).
Many AJAX requests for retrieving data from the MySQL database.
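For context on the scaling cost of those SSE loops: each connected client keeps one PHP worker busy for as long as the loop runs. A minimal sketch of such an endpoint (the payload is illustrative):

```php
<?php
// Minimal server-sent events endpoint. Each connected browser holds a PHP
// worker open for the lifetime of this loop, which is what makes long-running
// SSE or long-polling endpoints expensive under Apache + mod_php.
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');

while (!connection_aborted()) {
    $payload = json_encode(array('time' => time()));  // stand-in for real data
    echo "data: {$payload}\n\n";
    @ob_flush();
    flush();
    sleep(3);
}
```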
For now, I haven't applied any caching, which I plan to add later. Also, the chat application has to be switched from polling to WebSockets (HTML5).
I estimate the user base will grow to well over 100,000 users, which may require some serious scalability.
I need to know what kind of server I would need for this. Should it be a dedicated server, two of them, or even more?
I tried ab.exe, located in Apache's bin folder, but it only tests the URL you give it manually. A social network needs login information to access all the data, which unfortunately limits ab.exe to checking the availability of the "Welcome" page, and tells me nothing about the AJAX and HTML5 features I mentioned above.
So how exactly should I test the scalability of the website on hardware like my laptop (Windows, Intel i5, 4 GB RAM, 2.0 GHz), and what about scalability on the shared servers available out there, or even dedicated ones?
Simply put: You are counting your chickens before they hatch. If you become overwhelmed by a bunch of new users, then that is what we tend to call a "good problem". If you're worried about scalability along the way, then you should look into:
Caching with Memcached or Redis (see the sketch after this list).
Load balancing.
Switching from Apache to Nginx.
Providing a proper CDN.
Since you're using PHP, you should install an opcache.
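As an illustration of the caching point, here is a cache-aside sketch using the PHP Memcached extension; the key name and the load_front_page_data() helper are hypothetical stand-ins:

```php
<?php
// Cache-aside: try the cache first, fall back to MySQL, then populate the cache.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

$data = $cache->get('front_page_data');
if ($data === false) {
    $data = load_front_page_data();              // hypothetical: runs the real MySQL queries
    $cache->set('front_page_data', $data, 60);   // keep the result for 60 seconds
}
```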
There are a lot of different ways to squeeze out more performance. Until you need them, I'd suggest sticking to best practices (normalization, etc.).
If you are worried about the capacity of your hosting company to cope with your application then the first thing to do is to contact them and discuss what capacity they have available and how scalable their environment is.
Unless you are hosting it yourself, you have virtually no control over the situation.
But if you believe the number of users is going to grow very quickly then it would be wise to enter a dialogue very early with your provider.
I was thinking about web security, and then this thought popped into my head.
Say there's a jerk who hates me and knows how to program, and I am managing a nice website/blog with a considerable amount of traffic. Then that jerk creates a program that automatically requests my website over and over again.
So if I am hosting my website with a shared hosting provider, then obviously my website will stop responding.
This type of attack may not be common, but if someone attempts something like that against my website, I must do something about it. I don't think popular CMSs like WordPress or Drupal do anything about this type of attack.
My assumption is:
If a user makes more than x requests (let's say 50) in one minute, block that user (stop responding); see the sketch below.
My questions are:
Is my assumption OK? If not, what should I do about it?
Do websites like Google, Facebook, YouTube, etc. do something about this type of attack?
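For reference, a minimal sketch of the per-IP counter described in my assumption, assuming the APCu extension is available (though this kind of check is arguably better handled in front of PHP, at the web server or firewall):

```php
<?php
// Crude per-IP throttle: more than 50 requests within a 60-second window
// gets a 429 response. Requires the APCu extension; a shared cache or the
// web server itself is a better place for this in production.
$key = 'hits_' . $_SERVER['REMOTE_ADDR'];
apcu_add($key, 0, 60);          // creates the counter with a 60s TTL if it doesn't exist yet
if (apcu_inc($key) > 50) {
    http_response_code(429);
    exit('Too many requests');
}
```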
What you are describing is a DoS (Denial of Service) attack, where one system keeps sending packets to your web server until it becomes unresponsive.
You mentioned a single jerk, but what if the same jerk has many friends? Then you have a DDoS (Distributed DoS) attack, and that realistically can't be fully prevented.
A quick fix from the Apache docs for DoS, but not for DDoS:
All network servers can be subject to denial of service attacks that
attempt to prevent responses to clients by tying up the resources of
the server. It is not possible to prevent such attacks entirely, but
you can do certain things to mitigate the problems that they create.
Often the most effective anti-DoS tool will be a firewall or other
operating-system configurations. For example, most firewalls can be
configured to restrict the number of simultaneous connections from any
individual IP address or network, thus preventing a range of simple
attacks. Of course this is no help against Distributed Denial of
Service attacks (DDoS).
Source
The issue is partly one of rejecting bad traffic, and partly one of improving the performance of your own code.
Being hit with excess traffic out of malicious intent is called a Denial of Service attack. The idea is to hit the site with so much traffic that the server can't cope with the load and stops responding, so that no one can get through and the site goes offline.
But you can also be hit with too much traffic simply because your site becomes popular. This can easily happen overnight and without warning, for example if someone posts a link to your site on another popular site. This traffic might actually be genuine and wanted (hundreds of extra sales! yay!), but it can have the same effect on your server if you're not prepared for it.
As others have said, it is important to configure your web server to cope with high traffic volumes; I'll let the other answers speak for themselves on this, and it is an important point, but there are things you can do in your own code to improve things too.
One of the main reasons that a server fails to cope with increased load is because of the processing time taken by the request.
Your web server will only have the ability to handle a certain number of requests at once, but the key word here is "simultaneous", and the key to reducing the number of simultaneous requests is to reduce the time it takes for your program to run.
Imagine your server can handle ten simultaneous requests, and your page takes one second to load.
If you get up to ten requests per second, everything will work seamlessly, because the server can cope with it. But if you go just slightly over that, the eleventh request will either fail or have to wait until the other ten have finished. It will then run, but will eat into the next second's ten requests. By the time ten seconds have gone by, you're a whole second behind on your response times, and it keeps getting worse as long as the requests keep pouring in at the same rate. It doesn't take long for the server to get overwhelmed, even when it's only a fraction over its capacity.
Now imagine the same page could be optimised to take less time, let's say half a second. Your same server can now cope with 20 requests per second, simply because the PHP code is quicker. But it will also be easier for it to recover from excess traffic. And because the PHP code takes less time to run, there is less chance of any two given requests being simultaneous anyway.
In short, the server's capacity to cope with high traffic volumes increases enormously as you reduce the time taken to process a request.
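The same back-of-the-envelope arithmetic as a quick sketch, using the numbers from the example above:

```php
<?php
// Rough capacity estimate: simultaneous workers divided by seconds per request.
$workers = 10;
foreach (array(1.0, 0.5) as $secondsPerRequest) {
    printf("%.1fs per request => about %d requests/second sustained\n",
           $secondsPerRequest, $workers / $secondsPerRequest);
}
```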
So this is the key to a site surviving a surge of high traffic: Make it run faster.
Caching: CMSs like Drupal and WordPress have caching built in. Make sure it's enabled. For even better performance, consider a server-level cache such as Varnish. For a CMS-type system where you don't change the page content much, this is the single biggest thing you can do to improve your performance.
Optimise your code: while you can't be expected to fix performance issues in third-party software like Drupal, you can analyse the performance of your own code, if you have any. Custom Drupal modules, maybe? Use a profiler tool to find your bottlenecks. Very often, this kind of analysis can reveal that a single bottleneck is responsible for 90% of the page load time. Don't bother with optimising the small stuff, but if you can find and fix one or two big bottlenecks like this, it can have a dramatic effect.
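If you don't have a profiler handy, even crude wall-clock timing around a suspected hotspot will often point at the big bottleneck; build_front_page() below is a hypothetical stand-in for your own code, and a real profiler (Xdebug, XHProf) is the proper next step:

```php
<?php
// Crude first-pass timing around one suspected hotspot.
$start = microtime(true);

$result = build_front_page();   // hypothetical stand-in for the code under suspicion

error_log(sprintf('build_front_page took %.1f ms', (microtime(true) - $start) * 1000));
```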
Hope that helps.
These types of attacks are called (D)DoS (Distributed Denial of Service) attacks and are usually mitigated by the web server hosting your PHP application. Since Apache is the most widely used, here is an article you might find interesting: http://www.linuxforu.com/2011/04/securing-apache-part-8-dos-ddos-attacks/.
The article states that Apache has multiple modules available that were created specifically to prevent (D)DoS attacks. These still need to be installed and configured to match your needs.
I do believe that Facebook, Google, etc. have their own similar mechanisms to prevent DoS attacks. I know for a fact that Google Search presents a CAPTCHA if a lot of search requests come from the same network.
The reason it is not wise to prevent DoS within a PHP script is that the PHP interpreter still needs to be started whenever a request is made, which causes a lot of overhead. By letting the web server handle this, you have less overhead.
EDIT:
As stated in another answer, it is also possible to prevent common DoS attacks by configuring the server's firewall. Checking for attacks with firewall rules happens before the web server is hit, so there is even less overhead. Furthermore, you can detect attacks on other ports as well (such as port scans). I believe a combination of the two works best, as they complement each other.
In my opinion, the best way to prevent DoS is to filter at the lowest level: at the entry point of the server. By setting up some network firewall rules with iptables, you can drop packets from senders that are hitting your server too hard.
It will be more efficient than going through PHP and Apache, since they need relatively many processes to do the checking, and they may still bring your website down even if you do detect your attacker(s).
You can check on this topic for more information: https://serverfault.com/questions/410604/iptables-rules-to-counter-the-most-common-dos-attacks
What are the usual bottlenecks (and what tends to break first) for LAMP-based sites on EC2 when the number of users increases?
Assuming:
- Decent DB design.
- There are some RAM- and CPU-intensive processes on cron, but no RAM/CPU-intensive work during normal use.
Good question. We replaced the A (Apache) with Nginx, and our PHP is FPM'd now. That allows us to set up more app balancers to handle traffic spikes and all that. We also moved the main database to CouchDB (BigCouch), but generally there is no recipe for avoiding disaster without knowing what your application does.
EC2 bottlenecks
EC2 bottlenecks or issues are easier to generalize and pin down.
Disk I/O
For example, a very common bottleneck is disk I/O.
Even though EBS is faster than instance storage and also persistent, it's still slow. There are ways to get more performance out of EBS using RAID setups, but they'll never get you near the speed of SAS.
Network latency
Another bottleneck is internal network latency. You shouldn't rely on anything being instant, and I guess that's the general rule of thumb with cloud computing. It really is eventually consistent, which also requires your app to adjust to that and behave differently.
Capacity
Last but not least: capacity errors. They happen; e.g., you can't start another instance in the same zone. I've also had instances reboot themselves or disappear. All of these things happen in the cloud and need to be dealt with.
Automate, automate!
The biggest change when moving to EC2 is letting go of physical servers and automating instance bootstrapping. Before, I would go to the data center for half a day to rack new hardware, install servers, and so on.
Being able to start up and terminate application servers, load balancers, etc. is the biggest change and also the greatest advantage of the cloud. It helps you deal with many, many issues easily.
You really need to tell us more about your application. What breaks depends entirely on how it uses resources.
Since you've switched to lighttpd, the web server itself is going to use fewer resources than Apache would, but Apache is rarely the bottleneck anyway unless you've run out of RAM or seriously misconfigured it.
Have you tried actually testing your application using ab? Load it up and see what happens.
Consider a web app in which a call to the app consists of a PHP script running several MySQL queries, some of them memcached.
The PHP does not do a very complex job; it mainly serves the MySQL data with some formatting.
In the past it used to be recommended to put MySQL and the app engine (PHP/Apache) on separate boxes.
However, when the data can be divided horizontally (for example, when there are ten different customers using the service and it is possible to divide the data per customer), and when Nginx + FastCGI is used instead of the heavier Apache, doesn't it make sense to put Nginx, memcached, and MySQL on the same box? Then, as more customers come, add similar boxes?
Background: We are moving to Amazon EC2. A separate box for MySQL and the app server means double the EBS volumes (needed on the app servers to keep the code persistent, as it changes often). Also, if something happens to the database box, more customers are affected.
Clarification: Currently the app is running with LAMP on a single server (before moving to EC2).
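For concreteness, "dividing the data per customer" could look like routing each request to that customer's own database. A hypothetical sketch (the hostname pattern and the resolve_customer() helper are made up):

```php
<?php
// Hypothetical per-customer routing: each customer's data lives in its own
// MySQL instance, so adding a customer can mean adding a box.
$customer = resolve_customer($_SERVER['HTTP_HOST']);   // e.g. derived from the subdomain
$dsn = sprintf('mysql:host=db-%s.internal;dbname=app', $customer);
$db  = new PDO($dsn, 'app_user', 'secret');
```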
If your application architecture is already designed to support Nginx and MySQL on separate instances, you may want to host all your services on the same instance until you receive enough traffic that justifies the separation.
In general, creating new identical instances with the full stack (Nginx + Your Application + MySQL) will make your setup much more difficult to maintain. Think about taking backups, releasing application updates, patching the database engine, updating the database schema, generating reports on all your clients, etc. If you opt for this method, you would really need to find some big advantages in order to offset all the disadvantages.
You need to measure carefully how much memory overhead everything has. I can't see nginx vs. Apache making much difference; it's PHP that will use all the RAM (this in turn depends on how many processes the web server chooses to run, but that's more of a tuning issue).
Personally I'd stay away from nginx on the grounds that it is too risky to run such an unusual server in production.
Databases always need lots of RAM, and the only way you can sensibly tune the memory buffers is to have them on dedicated servers. This is assuming you have big data.
If you have very small data, you could keep it on the same box.
Likewise, memcached makes almost no sense if you're not running it on dedicated boxes. Taking memory from MySQL to give to memcached is really robbing Peter to pay Paul. MySQL can cache data in its innodb_buffer_pool quite efficiently (this saves I/O, but may end up using more CPU, as you won't be caching presentation logic etc., which may be possible with memcached).
Memcached is only sensible if you're running it on dedicated boxes with lots of RAM; it is also only sensible if your DB servers don't have enough grunt to serve the read workload of your app. Think about this before deploying it.
If your application is able to work with PHP and MySQL on different servers (I don't see why this wouldn't work, actually), then it'll also work with PHP and MySQL on the same server.
The real question is: will your servers be able to handle the load of Apache/nginx/PHP, MySQL, and memcached combined?
And there is only one way to answer that question: you have to test with a "real", "production-like" configuration to determine how loaded your servers are, or use a tool like ab, siege, or OpenSTA to simulate that load.
If there is not too much load with everything on the same server... well, go with it, if it makes the hosting of your application cheaper ;-)