How much overhead does the NewRelic PHP agent add? - php

By no means, NewRelic is taking the world by storm with many successful deployments.
But what are the cons of using it in production?
PHP monitoring agent works as a .so extension. If I understand correctly, it connects to another system aggregation service, which filters data out and pushes them into the NewRelic cloud.
This simply means that it works transparently under the hood. However, is this actually true?
Any monitoring, profiling or api service adds some overhead to the entire stack.
The extension itself is 0.6 MB, which adds up to each php process, this isn't much so my concern is rather CPU and IO.
The image shows CPU Utilization on a production EC2 t1.micro instances with NewRelic agent (top blue one) and w/o the agent (other lines)
What does NewRelic really do what cause the additional overhead?
What are other negative sides when using it?

Your mileage may vary based on the settings, your particular site's code base, etc...
The additional overhead you're seeing is less the memory used, but the tracing and profiling of your php code and gathering analytic data on it as well as DB request profiling. Basically some additional overhead hooked into every php function call. You see similar overhead if you left Xdebug or ZendDebugger running on a machine or profiling. Any module will use some resources, ones that hook deep in for profiling can be the costliest, but I've seen new relic has config settings to dial back how intensively it profiles, so you might be able to lighten it's hit more than say Xdebug.
All that being said, with the newrelic shared PHP module loaded with the default setup and config from their site my company's website overall server response latency went up about 15-20% across the board when we turned it on for all our production machines. I'm only talking about the time it takes for php-fpm to generate an initial response. Our site is http://www.nara.me. The newrelic-daemon and newrelic-sysmon services running as well, but I doubt they have any impact on response time.
Don't get me wrong, I love new relic, but the perfomance hit in my specific situation hit doesn't make me want to keep the PHP module running on all our live load balanced machines. We'll probably keep it running on one machine all the time. We do plan to keep the sysmon stuff going 100% and keep the module disabled in case we need it for troubleshooting.
My advice is this:
Wrap any calls to new relic functions in if(function_exists ( $function_name )) blocks so your code can run without error if the new relic module isn't loaded
If you've multiple identical servers behind a loadbalancer sharing the same code, only enable the php module on one image to save performance. You can keep the sysmon stuff running if you use new relic for this.
If you've just one server, only enable the shared php module when you need it--when you're actually profiling your code or mysql unless a 10-20% performance hit isn't a problem.
One other thing to remember if your main source of info is the new relic website: they get paid by the number of machines you're monitoring, so don't expect them to convince you to not use it on anything less than 100% of your machines even if it not needed. I think one of their FAQ's or blogs state basically you should expect some performance impact, but if you use it as intended and fix the issues you see from it, you should recoup the latency lost. I agree, but I think once you fix the issues, limit the exposure to the smallest needed number of servers.

The agent shouldn't be adding much overhead the way it is designed. Because of the level of detail required to adequately troubleshoot the problem, this seems like a good question to ask at https://support.newrelic.com

Related

Server side minify leads to PHP process bottleneck on high traffic site. What are my options?

I am currently tasked with finding a solution for a serious PHP bottleneck which is apparently caused by server-side minification of CSS and JS when our sites are under high load.
Some details and what I have found out so far
I inherited a web application running on Wordpress and which uses a complex constellation of Doctrine, Memcached and W3 Total Cache for minification and caching. When under heavy load our application begins to slow down rapidly. So far we have narrowed part of the problem down to the server-side minification process. Preliminary analysis has shown that the number PHP processes start to stack up under load, and when reaching the process limit of 500 processes, start to slow everything down. Something which is also mentioned by the author of the minify library.
Solutions I have evaluated so far
Pre-minification
The most logical solution would be to pre-minify any of the files before going live. Unfortunately our workflow demands that non-developers should be able to edit said files on our production servers (i.e. after the web app has gone live). Therefore I think that pre-processing is out of the question, as it limits the editability of minified files.
Serving unminified files
75% of our users are accessing our web application with their mobile devices, especially smartphones. Unminified JS and CSS amounts to 432KB and is reduced by 60-80% in size when minified. Therefore serving unminified files, while solving the performance and editability problem, is for the sake of mobile users out of the question.
I understand that this is as much a technical problem as it is a workflow problem and I guess we are open to working on both as long as we end up with a better overall performance.
My questions
Is there a reasonable compromise which solves the PHP bottleneck
problem, allows for non-devs to make changes to live CSS/JS and
still serves reasonably sized files to clients.
If there is no such one-size-fits-all solution, what can I do to
better our workflow and / or server-side behaviour?
EDIT: Because there were some questions / comments regarding the server configuration, our servers run Debian and are equipped with 32GB of RAM and 24 core CPUs.
You can run a css/javascript compilation service like Gulp or Grunt via Node.js that minifies all your js and css assets on change.
This service can run in production but that is not recommended without some architectural setup ( having multiple versioned compiled files and auto-checking them via gulp or another extension ).
I emphasize that patching features into production and directly
editing it is strongly discouraged as it can present live issues to
your visitors reducing your credibility.
http://gulpjs.com/
Using Gulp/Grunt would require you to change how you write your css/javascript files.
I would solve this with 2 solutions - first, removing any WP-CRON operation that runs every time a user runs the application and move that to actual CRON on the server. Second I would use load balancing so that a single server is not taking the load of the work. That is your real problem and even if you fix your perceived code issues you are still faced with the load issue.
I don't believe you need to change your workflow at all or go down the road of major modification to your existing system.
The WP-CRON tasks that runs each time a page is loaded causes significant load and slowness. You can shift this from the users visiting running that process to your server just running it at the server level. This reduces load. It is also most likely running these processes that you believe are slowing down the site.
See this guide:
http://www.inmotionhosting.com/support/website/wordpress/disabling-the-wp-cronphp-in-wordpress
Next - load balancing. Having a single server supplying all your users when you have a lot of traffic is a terrible idea. You need to split the webservers load.
I'm not sure where or how you are hosted but I would move things to AWS. Setup my WordPress site for load balancing # AWS: http://www.mornin.org/blog/scalable-wordpress-amazon-web-services/
This will involve:
Load Balancer
EC2 Instances running PHP/Apache
RDS for your database storage for all EC2 instances
S3 Storage for the sites media
For user sessions I suggest you just setup stickiness on the load balancer so users are continually served the same node they arrived on.
You can get a detailed guide on how to do this here:
http://www.mornin.org/blog/scalable-wordpress-amazon-web-services/
Or at server fault another approach:
https://serverfault.com/questions/571658/load-balancing-wordpress-on-amazon-web-services-managing-changes
The assumption here is that if you are high traffic you are making revenue from this high traffic so anytime your service responds slowly it will turn away users or possibly discourage them from returning. Changing the software could help - but you're treating the symptom not the illness. The illness is that your server comes under heavy load. This isn't uncommon with WordPress and high traffic, so you need to spread the load instead of try and micro-optimize. The difference is the optimizations will be small gains while the load balancing and spread of load actually solves the problem.
Finally - consider using a CDN for serving all of your media. This loads media faster and it removes load from your system by reducing the amount of requests to the server and it's output to the clients. It also loads pages faster consistently for people wherever they are visiting from by supplying media from nodes closest to them. At AWS this is called CloudFront. WordPress also offers this service free via Jetpack (I believe) but it does not handle all media from my understanding.
I like the idea of using GulpJS. One thing you might consider is to have a wp-cron or even just a system cron that runs every 5 minutes or so and then runs a gulp task to minify and concatenate your css and js files.
Another option that doesn't require scheduling but is based off of watching the file system for changes and then triggering a Gulp build to happen is to use incron (inotify cron). Check out the incron man page. Incron is great in that it triggers actions based on file system events such as file changes. You could use this to trigger a gulp build when any css file changes on the file system.
One caveat is that this is a Linux solution so if you're hosting on Windows you might have to look for something similar.
Edit:
Incron Documentation

Internal Server Error uploading a file

Error
I have a web app with a mass uploader (Plupload) for photos and when I upload say twenty photos, about six (around 30 %) will fail with an Internal Server Error. I have checked the Apache error.log for this domain and it has nothing new (I know I'm looking at the right error.log since older errors did show here).
This only happens on my VPS on Dreamhost (my hosting provider) servers while on my development server it runs silky smooth.
Oh, and things used to work just fine a month ago and then just started to fail. Back then I was using Uploadify and since that used Flash, it was impossible for me to debug where the upload failed.
Files and script
Uploaded files are photos, all about 100 kB big, even though I've successfully uploaded (and still can) 3 MB photos. My .htaccess naturally doesn't change during uploads. On the server side is a PHP script that uses GD2 library to move and resize the photo.
Server state
I have recently upgraded my VPS from 300 to 400 MB of RAM. This thing used to work and I upgraded it just so that memory is ruled out as a reason. Also my memory limit for PHP is at 200 MB, so this should sufice.
I am getting mighty frustrated that Dreamhost does not want to help, stating that "we can not be responsible for an error your code causes" and "We still will not be able to assist you in debugging the issue unfortunately."
It has been a week of sparse "support" while my app doesn't work and my clients are frustrated.
Questions
Is this kind of "You're on your own" support a standard across the
industry, i.e. would your host handle this differently?
How exactly can I debug this?
I'm going to assume that you have a standard Apache + PHP setup. One possible configuration is the pre-forked setup; in this case Apache will adapt to system load by forking more children of itself.
With only 400 MB of RAM you're pretty tight, so if you're running 20 processes that each take 200MB (assuming every process handles pretty big files using GD) you're getting into some hot waters with the memory manager.
I would reduce the total number of instances to 2 first to see how this will go; also keep an eye on the memory usage by running top.
Regardless, it might be beneficial for you to run a separate task manager such as Gearman to perform resize tasks so that the upload only has to focus on moving the uploaded file and run the resize task; this way you can greatly reduce the memory required to run your PHP instances.
As to your Q1: the simple answer is that you get what you pay for. A 300Mb RAM Dreamhost VPS costs ~$360 per annum. For this you get the VPS service and responses on service failure relating to the provision of the virtual environment. The OS, the software stack and the applications are outside this service scope. Why? This sort of custom knowledge-base support could cost $50-300 per hour. You are being unreasonable and deluding yourself if you expect Dreamhost to provide such services pro-bono. That's what sites like this one do.
So my suggestion is that you suck up that anger and frustration and work out how to help yourself.
As to your Q2. (i) You need to understand where your Apache errors go to; (ii) Ditto any SQL errors if you are using a D/B. (iii) You need to ensure that PHP error logging is enable and verify where the PHP logs are going to. (iv) You need to inspect those logs, and verify that logging is working correctly, by using a small script which generates runtime errors.
You should also consider using enhanced facilities such as php_xdebug to enhance logging levels and introducing application logging.
In my experience systems and functions rarely die silently. However, applications programmers often ignore return statuses, etc. For example in the GD library, imagecopyresized() can fail and it returns a status code to tell the application when it has, but if the application doesn't test this status and act accordingly then it can end up going down bizarre execution paths silently, and just appear to the user (or developer) as "it just stopped working".
My last comment is that you should really consider setting up a private VPS within your development environment which mirrors your Dreamhost production config, and use this for integration, acceptance test and support. This is pretty easy to do and you can mess this and add debug / what if options and then roll back without polluting your production environment. Tools like VMare Appliances and VirtualBox make this easy. See this blog post for a description of how I did this for my hosted service.
trying to respond the question 2: if you checked all your code and you didn't see any bug, I thing that the best thing that you can do is to check the version of all the programs running on the server (apache, php, ...), e.g., I remember that I had a problem with a web service it was running on apache and php, the php version was 5.2.8, and after a lot of investigation I found out that that version had a problem parsing xml data.
Regarding the first part of the question: Dreamhost do offer a paid support service with "call back". We used this once to get the low down on something. They are very good with general support (better than many hosts IMO) but you can't expect dedicated service, and they must handle a lot of piddling questions. But pay for a call back and, in about 2 minutes on the phone, you can get the answer you want, plus they get their $10 (recurring) for the time. You both win. Just remember to cancel the recurring charges.
Regarding the second part of the question, we had this very same issue with them. Their response (as suggested by Linus in the comments) was that they keep a tally of the CPU use of all processes used by your "user". If that total exceeds a threshold, they will simply kill the process(es) to get the cycles down. No error messages, no warnings, no nothings. Processes can include MySQL, CGI (perl) or PHP. No way to monitor or predict, and we couldn't program round it. Solution... not DreamHost, unfortunately. (webhostingtalk.com will give you loads of host ideas). So we use for some sites, but not for others.

deploying WAMP -> live site - any random tips?

In the next few weeks I'll be taking my site from the localhost (WAMP) and puting it on a new server. This will be the first site, on my first server, so basically...i'm a noob!
This must be an important moment for any independent web developer / small business so i'd love to hear about some experiences, mistakes and system default security holes that one should fix straight away...
I'm using php, mysql, cpanel and WHM, and looking for tips like "Turn off error reporting in PHP"
First and foremost if you are worried about security then you should use LAMP. As long as the Linux platform is using AppArmor or SELinux (Ubuntu and fedora respectively), then you are much better off than any version of Windows. I know this from first hand experience of developing exploit code for the two platforms.
Before you lock your system down, test your code for vulnerablites using Wapiti. Acunetix is also good, but expensive. This type of testing, especially sql injection testing must be done with dispaly_errors=On set in your php.ini
There is a lot that can go wrong with PHP Configuration that makes your system less secure. You should run PHPSecInfo and remove all red. dispaly_errors=Off is what you want, and phpsecinfo tests for it.
You should also use a web application firewall like Mod_secuirty.
It's actually quite a huge undertaking, but well worth the experience. Here are just one or two suggestions...
Site security also means being heavily involved in managing your sometimes scarce resources. Just as important is obeying any limits your host has, and guessing all possible ways your site users can push you over those limits, leaving you responsible to pay a hefty bill. IE downloading or uploading large files over and over, spamming email lists, repeatedly requesting pages using too many database connections and queries, etc. Get overusage limits and fees in writing from your host before you begin, and have response plans ready. Really, this part is like buying a cellphone service.
A lot would also depend on what features you'll have on your site. File uploads? Forum? Logins? Email? Etc? For example - If you're running a file-sharing site: along with upload/download rate limiting, I suggest you first check available disk space before permitting any file to be uploaded, or do regular audits so you're prepared to archive or delete old and unused files. It's a quick check just to make sure you're not caught by surprise a year down the road when you suddenly start getting disk full errors or get shafted by your host with a large bill.
There are literally a hundred more issues to consider. Gather up a complete overview - an itemized list - of all features and functions of your site. Google each one to get more ideas on handling security. Your host should also publish their own security considerations and have a handy manual for operating with all of their services. If they don't, well, I wouldn't personally feel comfortable with them.

Does a separate MySQL server make sense when using Nginx instead of Apache?

Consider a web app in which a call to the app consists of PHP script running several MySQL queries, some of them memcached.
The PHP does not do very complex job. It is mainly serving the MySQL data with some formatting.
In the past it used to be recommended to put MySQL and the app engine (PHP/Apache) on separate boxes.
However, when the data can be divided horizontally (for example when there are ten different customers using the service and it is possible to divide the data per customer) and when Nginx +FastCGI is used instead of heavier Apache, doesn't it make sense to put Nginx Memcache and MySQL on the same box? Then when more customers come, add similar boxes?
Background: We are moving to Amazon Ec2. And a separate box for MySQL and app server means double EBS volumes (needed on app servers to keep the code persistent as it changes often). Also if something happens to the database box, more customers will fail.
Clarification: Currently the app is running with LAMP on a single server (before moving to EC2).
If your application architecture is already designed to support Nginx and MySQL on separate instances, you may want to host all your services on the same instance until you receive enough traffic that justifies the separation.
In general, creating new identical instances with the full stack (Nginx + Your Application + MySQL) will make your setup much more difficult to maintain. Think about taking backups, releasing application updates, patching the database engine, updating the database schema, generating reports on all your clients, etc. If you opt for this method, you would really need to find some big advantages in order to offset all the disadvantages.
You need to measure carefully how much memory overhead everything has - I can't see enginex vs Apache making much difference, it's PHP which will use all the RAM (this in turn depends on how many processes the web server chooses to run, but that's more of a tuning issue).
Personally I'd stay away from enginex on the grounds that it is too risky to run such a weird server in production.
Databases always need lots of ram, and the only way you can sensibly tune the memory buffers is to have them on dedicated servers. This is assuming you have big data.
If you have very small data, you could keep it on the same box.
Likewise, memcached makes almost no sense if you're not running it on dedicated boxes. Taking memory from MySQL to give to memcached is really robbing Peter to pay Paul. MySQL can cache stuff in its innodb_buffer_pool quite efficiently (This saves IO, but may end up using more CPU as you won't cache presentation logic etc, which may be possible with memcached).
Memcached is only sensible if you're running it on dedicated boxes with lots of ram; it is also only sensible if you don't have enough grunt in your db servers to serve the read-workload of your app. Think about this before deploying it.
If your application is able to work with PHP and MySQL on different servers (I don't see why this wouldn't work, actually), then, it'll also work with PHP and MySQL on the same server.
The real question is : will your servers be able to handle the load of both Apache/nginx/PHP, MySQL, and memcached ?
And there is only one way to answer that question : you have to test in a "real" "production" configuration, to determine own loaded your servers are -- or use some tool like ab, siege, or OpenSTA to "simulate" that load.
If there is not too much load with everything on the same server... Well, go with it, if it makes the hosting of your application cheapier ;-)

Best practices for withstanding launch day traffic burst

We are working on a website for a client that (for once) is expected to get a fair amount of traffic on day one. There are press releases, people are blogging about it, etc. I am a little concerned that we're going to fall flat on our face on day one. What are the main things you would look at to ensure (in advance without real traffic data) that you can stay standing after a big launch?
Details: This is a L/A/M/PHP stack, using an internally developed MVC framework. This is currently being launched on one server, with Apache and MySQL both on it, but we can break that up if need be.
We are already installing Memcached and doing as much PHP-level caching as we can think of. Some of the pages are rather query intensive, and we are using Smarty as our template engine. Keep in mind there is no time to change any of these major aspects--this is the just the setup. What sorts of things should we watch out for?
Measure first, and then optimize. Have you done any load testing? Where are the bottlenecks?
Once you know your bottlenecks then you can intelligently decide if you need additional database boxes or web boxes. Right now you'd just be guessing.
Also, how does your load testing results compare against your expected traffic? Can you handle two times the expected traffic? Five times? How easy/fast can you acquire and release extra hardware? I'm sure the business requirement is to not fail during launch, so make sure you have lots of capacity available. You can always release it afterwards when the load has stabilized and you know what you need.
I would at least factor out all static content. Set up another vhost somewhere else and load all the graphics, CSS, and JavaScript onto it. You can buy some extra cycles, offloading the serving of that type of content. If you're really concerned, you can signup and use a content distribution service. There are lots now similar to Akamai and quite cheap.
Another idea might be to utilize Apache mod_proxy to keep the generated page output for a specific amount of time. APC would also be quite usable... You could employ output buffering capture + the last modified time of related data on the page, and use the APC cached version. If the page isn't valid any more, you regenerate and store in APC again.
Good luck. It'll be a learning experience!
Have a beta period where you allow in as many users as you can handle, measure your site's performance, and work out bugs before you go live.
You can either control the number of users explicitly in a private beta, or a Google-style semi-public beta where each user has a number of referrals that they can offer to their friends.
To prepare or handle a spike (or peak) performance, I would first determine whether you are ready through some simple performance testing with something like jmeter.
It is easy to set up and get started and will give you early metrics whether you will handle an expected peak load.
However, given your time constraints, other steps to take would be to prepare static versions of content that will attract the highest attention (such as press releases, if your launch day). Also ensure that you are making the best use of client-side caching (one fewer request to your server can make all the difference). The web is already designed for extremely high scalability and effective use content caching is your best friend in these situations.
There is an excellent podcast on high scalability on software engineering radio on the design of the new Guardian website when things calm down.
Good luck on the launch.
I'd, personally, do a few things
1) Put in some sort of load balancer/database replication system
This means that you can have your service spread across multiple servers. Can't afford to have more than one server permanently? Use Amazon E3 - It's good for putting in place for things like this (switch on a few more servers to handle the load)
2) Code in some "High Load" restrictions
For example, if your searching is inefficient - switch it off when load gets to a certain level. "Sorry, we're busy, try again later for searching"
3) Load test... Use something like ApacheBench to stress test your servers.
4) Personally, I think that switching "Keep-Alive" Connections off is better. It may slightly reduce overall performance, but - it means that instead of having something where the site works well for a few people, and the others get timeouts, everyone gets inconsistent service, if it gets to that level
Linux Format did a good article on "How to survive a slashdotting"... which I've found useful in the past. It's available online as a PDF
Basic first steps to harden your site for high traffic.
Use a low-cost tool like https://browsermob.com/ to load-test your site. At a minimum, you should be looking at 100K unique visitors per hour. If you get an ad off of the MSN home page, look to be able to handle 500K unique visitors per hour.
Move all static graphic/video content to a CDN. Edgecast and Amazon are two excellent choices.
Use Jet Profiler to profile your MySQL server to analyze any slow performing queries. Minor changes can have huge benefits.
Look into using Varnish - it's a caching reverse proxy server (like Squid, but much more single purpose).
I've run some pretty big sites behind it, and it seemed to work really well.

Categories