Horrible performance on Azure App Service - WordPress - PHP

Evening All,
At my absolute wits' end and hoping someone may be able to save me! I am in the process of migrating a number of PHP applications into Azure. I am using:
Linux-based App Service running PHP 7.4 (2 vCPUs, 8 GB RAM) at a cost of £94 a month.
Azure Database for MySQL 8.0 (2 vCPUs) at £114 a month.
My PHP apps run well, with decent load times of under 1 second per page. WordPress performance, however, is awful. I am going from a 1-second page load to around 10 seconds, particularly on the back end. I have read all of the Azure guides and have implemented the following obvious points:
Both the App Service and the MySQL install are in the same data center
App Service is set to 'Always On'
Connection Redirection is set to Preferred and tested as working
The same app runs fine on a very basic £10 or so a month shared hosting package. I have also tried the same setup in Amazon Web Services today and page load is back to a second or so.
In the Chrome console, the delay is in TTFB. I have disabled all the plugins and none stand out as making a huge difference. Each adds a second or so of page load, suggesting to me a consistent issue whenever a page requires a number of database calls.
What is going on with Azure and the awful Wordpress performance?! Is there anything else I can investigate or try? Really keen to stay with Azure but can't cope with the huge increase in cost for a performance hit.
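One way to confirm the TTFB delay outside the browser is curl's timing variables (the URL below is a placeholder for your App Service hostname):

```shell
# Measure where the time goes for a single request (URL is a placeholder).
# time_starttransfer is effectively the TTFB; if it is large while
# time_connect is small, the delay is in the backend, not the network.
curl -s -o /dev/null \
  -w 'dns: %{time_namelookup}s  connect: %{time_connect}s  ttfb: %{time_starttransfer}s  total: %{time_total}s\n' \
  https://example.azurewebsites.net/
```

Running this from another Azure VM in the same region also rules the public internet in or out.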

The issue turned out to be the way the file system works in App Service. It is NOT an issue with the database. The App Service architecture is currently just too slow at file reads/writes, of which WordPress does a lot. I investigated the various file cache options but none improved things enough.
I ended up setting up a fairly basic, and significantly cheaper, virtual machine running against the same database, and performance is hugely improved.
Not a great answer, but App Services are not up to WordPress at present!
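For anyone wanting to verify the file-system theory themselves, here is a rough sketch: time repeated small-file reads on the App Service's shared /home mount versus local disk from an SSH session. The helper below is a generic probe, not an Azure tool; paths on your instance may differ.

```shell
# probe_read_latency <dir> <iterations> -> prints elapsed milliseconds.
# WordPress touches hundreds of small PHP files per request, so small-file
# read latency is the number that matters here.
probe_read_latency() {
  local f="$1/iotest.$$" t0 t1 i=0
  dd if=/dev/zero of="$f" bs=4k count=1 2>/dev/null
  t0=$(date +%s%N)
  while [ "$i" -lt "$2" ]; do cat "$f" > /dev/null; i=$((i + 1)); done
  t1=$(date +%s%N)
  rm -f "$f"
  echo $(( (t1 - t0) / 1000000 ))
}

# On App Service (via SSH/Kudu), compare the network share with local disk:
#   probe_read_latency /home/site/wwwroot 200   vs   probe_read_latency /tmp 200
echo "/tmp: $(probe_read_latency /tmp 200) ms for 200 reads"
```

If the share is an order of magnitude slower than /tmp, that matches the behaviour described above.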

The comments below are correct. The "problem" is the database. You can either move MySQL to a Virtual Machine (which will give you better performance) or try cache plugins such as WP Super Cache, as well as decreasing the number of requests.
You can find a full list of tips in the following link:
https://azure.microsoft.com/en-us/blog/10-ways-to-speed-up-your-wordpress-site-on-azure-websites/
PS: ignore the date, it's still relevant
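If you have SSH access and WP-CLI available, trying a page-cache plugin takes only a couple of commands. This is a sketch: the plugin slug and the WP_CACHE constant are standard WordPress, but whether WP-CLI is installed on your host is an assumption.

```shell
# Install and activate a page cache, then enable WordPress's cache constant
# in wp-config.php (--raw writes it as the boolean true, not the string "true").
wp plugin install wp-super-cache --activate
wp config set WP_CACHE true --raw
```

A page cache sidesteps most database round trips for anonymous visitors, though it will not help the wp-admin back end, which bypasses the cache.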

Related

App Engine Standard connection to Cloud SQL Latency Randomly

I have a pretty "basic" app that we designed, originally on a local Plesk server, which we migrated to GAE/GSQL/GCS (App Engine, Cloud SQL, Cloud Storage).
Here's some background info:
The app is PHP based and runs great on the local server. When we migrated to the cloud, we noticed this random yet extreme latency. It's so bad that the app times out and gives a SPDY timeout error. We use Cloudflare for SPDY assistance, so we started there, and they said it's the server. Then we went to Google. We've been going back and forth, and I am looking for other avenues of help.
I am running an app on a F2 standard GAE instance and a G1-small CloudSQL instance (gen 2). All same region/zone. There is also a failover sql instance.
There is really no pattern to it, but users of the app hit a bad timeout very frequently, and it dies after 60 seconds (which points to a PHP timeout, right? We checked the code and it runs fine on the local server).
I don't have a whole lot of traffic on this app yet (maybe a few users a day), so I don't know if it's traffic load. Here's some basic stats for you:
https://imgur.com/a/U1tk5ak
Some Google engineers said our app has trouble scaling (QPS never gets above 1):
https://imgur.com/a/XWh44bm
And asked if we are threading. We are not. We do not use memcache yet either.
I also see a ton of these:
https://imgur.com/a/eVSNqc3
Which looks like this bug: https://github.com/GoogleCloudPlatform/cloudsql-proxy/issues/126
But I am unsure if this is all related.
We've tried going through Google's tech support; they said we have "manual locks", but our dev team doesn't agree, nor do they know what this really means. Again, the same app framework (session handling etc.) is used in many apps with a ton of users (non-GAE; they're on compute on AWS), so this is our first venture into GAE.
We connect using standard MySQL connection parameters and use the same framework in a lot of applications, where it runs fine. We use the required proxy to connect to Cloud SQL.
The speed and constant lag shouldn't be there. We don't know what this issue could be. My questions are:
1) Do you see any issues here? All database logs and summaries are above.
2) Can you help me understand what may be wrong here?
Thank you!
The biggest latency spike I can see in your screenshot is about 20 seconds at 9:00 am, which is about the same time you have the largest number of queries, read/write operations, and memory usage.
Even though you have a small number of users, they can be doing many queries. If GCP support suggests the app has problems scaling, you can check the auto-scale property and see if it is enabled.
From what I can see in your images, and looking through the Cloud SQL docs, I would suggest a horizontal scale of your Cloud SQL instance.
Also take a look at the diagnose-issues docs; maybe you can get more info on what's causing the MySQL aborted connections error.
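As a sketch of that last check, you can query the aborted-connection counters and the current process list directly. The host and credentials below are placeholders; connect through the Cloud SQL proxy as you normally would.

```shell
# Look for aborted connections and the longest-running active queries.
# A climbing Aborted_connects/Aborted_clients count matches the proxy
# errors in the logs; a long-lived query matches the 9:00 am spike.
mysql -h 127.0.0.1 -u root -p -e "
  SHOW GLOBAL STATUS LIKE 'Aborted%';
  SELECT id, user, time, state, LEFT(info, 80) AS query
    FROM information_schema.processlist
    WHERE command <> 'Sleep'
    ORDER BY time DESC LIMIT 10;"
```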
There was a query we found running that caused a huge database lag.

AWS Site Slowness (LAMP)

My site has been experiencing slowness ever since we moved to AWS from a different server provider. The site is built on a LAMP stack and we're not sure what is causing the slowness. We did not experience such significant lag on the previous server.
Some notes:
The site runs fine in a localhost environment, which is why we think it's something with our AWS setup (however, feel free to correct me)
For reference, the site IS image-heavy, uses PHP 5.6 (so it's outdated), and DOES use a few SELECT * queries on various pages, etc.
We just don't believe the above is the issue, but please let me know if there are often discrepancies between localhost and production environments, because this is the first time I'm experiencing this.
As noted, we did not experience such significant lag on the previous server.
We have tried asking AWS support, who made a suggestion that produced a noticeable difference, but the site still experiences around 5 seconds of lag per page. They now say they can't find any reason for the lag, but at the same time, because of the lag, our site is seeing much lower traffic (most of the lag occurred at high traffic volume).
We've updated the AWS settings a few different times in a few different ways, such as:
Updating the EC2 instance
Updating RDS
Implementing this: https://aws.amazon.com/elasticache/
Some people have reported that there's no lag unless a user logs into the site
Any suggestions are helpful! We are hoping to not have to move the site's server again.
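To test whether those SELECT * queries are actually the slow part, MySQL's slow query log is a low-effort first step. A sketch below: the endpoint is a placeholder, and on RDS these variables normally have to be set in a DB parameter group rather than with SET GLOBAL.

```shell
# Turn on the slow query log and inspect the worst offenders.
# Logged-in users bypassing a page cache would show up here as repeated
# slow queries that anonymous traffic never triggers.
mysql -h your-rds-endpoint -u admin -p -e "
  SET GLOBAL slow_query_log = 'ON';
  SET GLOBAL long_query_time = 1;    -- log anything over 1 second
  SET GLOBAL log_output = 'TABLE';   -- read back via mysql.slow_log
  SELECT start_time, query_time, LEFT(sql_text, 80) AS query
    FROM mysql.slow_log
    ORDER BY start_time DESC LIMIT 10;"
```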

SilverStripe save and publish is taking over 60 seconds on Plesk

I have a Plesk reseller VPS and have set it up for running SilverStripe sites. For the most part the sites load well (under a second on average) and work as expected.
The admin also performs as expected, with the exception of Save and Publish. Doing a save and publish (even with one small change) takes over 60 seconds (whereas a standard write action takes maybe a few seconds). This is happening for all page types. We are not using any custom onBeforeWrite or onAfterWrite calls and are not using static publisher.
On our development server (also Apache based), save and publish time is less than 10 seconds. Switching to Dev mode on live seems to make no difference.
I am kind of at a loss as to why this is happening or how to diagnose the issue. Has anyone else had this issue?
I am running SilverStripe 3.5, PHP 5.6.3 and MySQL 5.5.
EDIT: I have checked all the logs; the only thing being logged that I can see was the timeout error (which goes away when I increase the script execution time).
UPDATE - 13/06/17: I have now installed a smaller (largely vanilla) SilverStripe site on the same server, Save and Publish for this site works as expected (and is very snappy).
I am assuming that this is a module causing the error. I have also contacted support; the only thing they can think of that could cause this would be a script accessing a third-party server (which is being stopped by a network firewall). The only module that springs to mind is the Live SEO module (as this talks to Google for its scoring system).
OK, finally managed to get this sorted. The hosting provider were quite helpful (they usually are); it turned out to be a routing issue in their data centre.
They told me they had identified an issue with IPv6 routing that was causing a timeout in some instances. They have resolved this, and "Save and Publish" is now working as expected.
If anyone else gets this issue and has ruled out the items mentioned above, I would advise contacting your hosting provider; it may be an external issue.
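If you suspect a similar issue, one rough way to check for broken IPv6 routing from the server is to compare forced-IPv4 and forced-IPv6 requests to a host the site talks to (the URL is just an example):

```shell
# Compare IPv4 vs IPv6 connect times to an external host the CMS calls.
# A hang or timeout on -6 with a fast -4 matches the routing problem above.
for flag in -4 -6; do
  echo "curl $flag:"
  curl "$flag" -s -o /dev/null -m 10 \
    -w '  connect: %{time_connect}s  total: %{time_total}s\n' \
    https://www.google.com/ || echo '  failed/timed out'
done
```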

How much overhead does the NewRelic PHP agent add?

There's no denying it: NewRelic is taking the world by storm, with many successful deployments.
But what are the cons of using it in production?
The PHP monitoring agent works as a .so extension. If I understand correctly, it connects to a separate aggregation service, which filters the data and pushes it into the NewRelic cloud.
This implies it works transparently under the hood. However, is this actually true?
Any monitoring, profiling or api service adds some overhead to the entire stack.
The extension itself is 0.6 MB, which is loaded into each PHP process; this isn't much, so my concern is rather CPU and IO.
The image shows CPU utilization on production EC2 t1.micro instances with the NewRelic agent (top blue line) and without the agent (other lines).
What does NewRelic really do that causes the additional overhead?
What are other negative sides when using it?
Your mileage may vary based on the settings, your particular site's code base, etc...
The additional overhead you're seeing is less about the memory used and more about the tracing and profiling of your PHP code, the analytics gathered on it, and the DB request profiling. Basically, there's some additional overhead hooked into every PHP function call. You'd see similar overhead if you left Xdebug or ZendDebugger running or profiling on a machine. Any module will use some resources; ones that hook in deep for profiling can be the costliest, but NewRelic has config settings to dial back how intensively it profiles, so you might be able to lighten its hit more than, say, Xdebug's.
All that being said, with the NewRelic shared PHP module loaded with the default setup and config from their site, my company's website's overall server response latency went up about 15-20% across the board when we turned it on for all our production machines. I'm only talking about the time it takes for php-fpm to generate an initial response. Our site is http://www.nara.me. The newrelic-daemon and newrelic-sysmon services are running as well, but I doubt they have any impact on response time.
Don't get me wrong, I love NewRelic, but the performance hit in my specific situation doesn't make me want to keep the PHP module running on all our live load-balanced machines. We'll probably keep it running on one machine all the time. We do plan to keep the sysmon stuff going 100% and keep the module disabled in case we need it for troubleshooting.
My advice is this:
Wrap any calls to NewRelic functions in if (function_exists($function_name)) blocks so your code can run without error if the NewRelic module isn't loaded
If you have multiple identical servers behind a load balancer sharing the same code, only enable the PHP module on one image to save performance. You can keep the sysmon stuff running everywhere if you use NewRelic for that.
If you have just one server, only enable the shared PHP module when you need it (when you're actually profiling your code or MySQL), unless a 10-20% performance hit isn't a problem.
One other thing to remember if your main source of info is the NewRelic website: they get paid by the number of machines you're monitoring, so don't expect them to convince you not to use it on anything less than 100% of your machines, even where it's not needed. I think one of their FAQs or blog posts states, basically, that you should expect some performance impact, but that if you use it as intended and fix the issues you see from it, you should recoup the latency lost. I agree, but I think once you fix the issues, you should limit the exposure to the smallest necessary number of servers.
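One way to follow that advice without uninstalling anything is to leave the extension in place but switch the agent off via its ini setting. The paths and service name below vary by distro and PHP version and are assumptions; newrelic.enabled itself is a documented agent setting.

```shell
# Keep the module installed but idle until you need to profile something.
# Drop-in ini path and FPM service name are examples; adjust for your system.
echo 'newrelic.enabled = false' | sudo tee /etc/php.d/zz-newrelic-off.ini
sudo systemctl restart php-fpm
php -m | grep -i newrelic   # extension still loads; the agent just sits idle
```

Deleting the drop-in file and restarting php-fpm re-enables profiling when you actually need it.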
The agent shouldn't be adding much overhead, given the way it is designed. Because of the level of detail required to adequately troubleshoot the problem, this seems like a good question to ask at https://support.newrelic.com

Magento 500 Errors, Slowness

We're running a Magento web store on KnownHost (VPS).
Most of the time the site works fine. Occasionally (every few hours?) the site will get very slow and unresponsive, and will throw '500 Internal Server Error' responses. There doesn't seem to be anything relevant in the web server logs or the Magento system/exception logs.
Also, it seems that we're seeing high CPU usage on this account.
I have increased the memory limit to 512 MB and tried everything else I could find. No dice.
We have a managed VPS, so we can change pretty much everything. We had our hosting provider install ImageMagick after reading a suggestion online; it didn't help.
Any ideas?
(website is available at myerstownsheds.com if anyone would like a look)
TL;DR: You have an under-resourced server. Any code or configuration steps you take to reduce load are only going to postpone the inevitable.
It's impossible to provide a concrete answer to your question with the information given. If you could look at your server logs and find the full error message being generated, it would be a big help. "Server logs" probably means "Apache logs" in this situation, since the error text you provided is a standard Apache/PHP error, not a Magento error.
All that said, the most likely culprit is a PHP out-of-memory error. Magento's performance profile is different from that of most LAMP stack applications, and most generic VPS hosts are unable or unwilling to make the tweaks needed to run it. If you want to solve this problem long term, you need a web host that specializes in Magento. I recommend Nexcess (affiliate link) these days, but Magento has a list of recommended hosting partners, and the Magento Speed Test site offers a nice breakdown of the top Magento hosts.
Take a look at your host's plans.
The highest-level plan tops out at 4 GB of RAM (4096 MB).
Now take a look at the starting Nexcess plans.
The entry-level plan provides 16 GB: four times as much RAM as your current host's top tier. Magento is a RAM-hungry application, and your current host isn't equipped to handle it. Any code or configuration steps you take to reduce load are only going to postpone the inevitable.
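A quick sanity check along these lines is to compare PHP's memory ceiling with what the box actually has. The error-log path below is an example (it varies by distro and web server).

```shell
# Compare the PHP memory_limit with the machine's real RAM, and look for
# the telltale out-of-memory line in the Apache/PHP error log, e.g.:
#   PHP Fatal error: Allowed memory size of N bytes exhausted
if command -v php >/dev/null 2>&1; then
  php -r 'echo "memory_limit: ", ini_get("memory_limit"), "\n";'
fi
free -m | awk 'NR==2 {print "total RAM (MB): " $2 "  available (MB): " $7}'
grep -i 'Allowed memory size' /var/log/httpd/error_log 2>/dev/null | tail -n 5
```

If available RAM is routinely near zero while Magento is serving traffic, no memory_limit tweak will save you; the box is simply too small.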
I followed the instructions posted by allendar:
Back up and delete app/etc/local/local.xml
Go to the site in a browser and follow the configuration process
So far, everything seems to be working fine! It's a little hard to say this soon, since the issue was so intermittent, but our site has remained responsive for nearly two hours.
I'm going to mark this as Answered in a couple days. I'll look into getting a better hosting plan.
Thanks everyone!
