I have a GCP app running in the App Engine standard environment on PHP 7.4, using mysqli to access Google Cloud SQL. The instance class is F2 and has been for years. This morning, out of nowhere and with no code or configuration changes, memory usage spiked to the point where it exceeds the quota and restarts the service so often the system has become unusable. It doesn't matter how many people are logged in. Pages that don't access the database seem unaffected, so it appears to be related to SQL.
One other note: my WordPress site calls into this system via XMLHttpRequest, and that query is working fine.
I am at a loss and my clients are dead in the water. How can I figure out what's causing this? It sure seems like they must have changed something on the platform side.
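One cheap way to narrow this down is to instrument the database call sites and compare SQL pages against non-SQL pages. The helper below is a sketch (the name and log format are my own, not anything App Engine provides): it logs elapsed time and memory growth around a DB call.

```php
<?php
// Hypothetical instrumentation helper: wrap each mysqli call site and log
// elapsed time plus memory growth, then compare pages that touch SQL
// against ones that don't.
function log_db_memory(callable $dbWork, string $label)
{
    $memBefore = memory_get_usage(true);
    $t0 = microtime(true);
    $result = $dbWork();
    error_log(sprintf(
        '%s: %.1f ms, +%.1f MB, peak %.1f MB',
        $label,
        (microtime(true) - $t0) * 1000,
        (memory_get_usage(true) - $memBefore) / 1048576,
        memory_get_peak_usage(true) / 1048576
    ));
    return $result;
}
```

If the peak jumps only on SQL pages, it's worth checking whether a table has suddenly grown: by default mysqli buffers the entire result set in PHP memory, so a query that used to return a few rows and now returns millions can blow an F2 instance's quota with no code change at all.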
Evening All,
I'm at my absolute wits' end and hoping someone may be able to save me! I am in the process of migrating a number of PHP applications into Azure. I am using:
A Linux-based App Service running PHP 7.4 (2 vCPUs, 8 GB RAM) at a cost of £94 a month.
Azure Database for MySQL 8.0 (2 vCPUs) at £114 a month.
My PHP apps run well, with a decent load time of under 1 second per page. WordPress performance, however, is awful. I am going from a 1-second page load to around 10 seconds, particularly on the back end. I have read all of the Azure guides and have implemented the following obvious points:
Both the App Service and the MySQL install are in the same data center
App Service is set to 'Always On'
Connection Redirection is set to Preferred and tested as working
The same app runs fine on a very basic shared hosting package costing £10 or so a month. I have also tried the same setup in Amazon Web Services today, and page load is back to a second or so.
In the Chrome console, the delay is in TTFB. I have disabled all the plugins and none stand out as making a huge difference. Each adds a second or so of page load time, suggesting to me a consistent issue whenever a page requires a number of database calls.
What is going on with Azure and the awful WordPress performance?! Is there anything else I can investigate or try? Really keen to stay with Azure, but I can't cope with a huge increase in cost for a performance hit.
The issue turned out to be the way the file system runs in the App Service. It is NOT an issue with the database. The App Service architecture is just too slow at present with file reads/writes, of which WordPress does a lot. I investigated the various file cache options, but none improved things enough.
I ended up setting up a fairly basic, and significantly cheaper, virtual machine running against the same database, and performance is hugely improved.
Not a great answer, but App Services are not up to WordPress at present!
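For anyone wanting to verify the file-system bottleneck on their own plan before migrating, a rough probe like this (a sketch; the payload size and iteration count are arbitrary) compares average write+read latency when run on each host:

```php
<?php
// Rough probe: average latency of a small write + read + delete cycle.
// Run the same script on the App Service and on the shared host to compare.
function probe_file_io(string $dir, int $iterations = 50): float
{
    $t0 = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $path = $dir . '/io_probe_' . $i . '.tmp';
        file_put_contents($path, str_repeat('x', 4096)); // 4 KB payload
        file_get_contents($path);
        unlink($path);
    }
    return (microtime(true) - $t0) * 1000 / $iterations; // avg ms per cycle
}

echo probe_file_io(sys_get_temp_dir()), " ms per cycle\n";
```

A large gap between the two numbers would point at storage latency rather than the database, matching what I found.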
The comments below are correct. The "problem" is the database. You can either move MySQL to a Virtual Machine (which will give you better performance) or try cache plugins such as WP Super Cache and decrease the number of requests.
You can find a full list of tips in the following link:
https://azure.microsoft.com/en-us/blog/10-ways-to-speed-up-your-wordpress-site-on-azure-websites/
PS: ignore the date, it's still relevant
I have a pretty "basic" app that we designed, originally hosted on a local Plesk server, which we migrated to GAE/GSQL/GCS (App Engine, Cloud SQL, Cloud Storage).
Here's some background info:
The app is PHP-based and runs great on the local server. When we migrated to the cloud we noticed random yet extreme latency. It's so bad that the app times out and gives a SPDY timeout error. We use Cloudflare for SPDY assistance, so we started there, and they said it's the server. Then we went to Google. We've been going back and forth, and I am now looking for other avenues of help.
I am running the app on an F2 standard GAE instance and a G1-small Cloud SQL instance (gen 2), all in the same region/zone. There is also a failover SQL instance.
There is really no pattern to it, but users of the app hit a bad timeout very frequently, and it dies after 60 seconds (which points to a PHP timeout, right? We checked the code and it runs fine on the local server).
I don't have a whole lot of traffic on this app yet (maybe a few users a day), so I don't know if it's traffic load. Here are some basic stats for you:
https://imgur.com/a/U1tk5ak
Some Google engineers said our app has trouble scaling (QPS will never get above 1):
https://imgur.com/a/XWh44bm
And asked if we are threading. We are not. We do not use memcache yet either.
I also see a ton of these:
https://imgur.com/a/eVSNqc3
Which looks like this bug: https://github.com/GoogleCloudPlatform/cloudsql-proxy/issues/126
But I am unsure if this is all related.
We've tried going through Google's tech support; they said we have "manual locks", but our dev team doesn't agree, nor do they know what this really means. Again, the same app framework (session handling, etc.) is used in many apps with a ton of users (non-GAE; they're on compute on AWS), so this is our first venture into GAE.
We connect using standard MySQL connection parameters and use the same framework in a lot of applications, where it runs fine. We use the required proxy to connect to Cloud SQL.
The slowness and constant lag shouldn't be there, and we don't know what the issue could be. My questions are:
1) Do you see any issues here? All the database logs and summaries are above.
2) Can you help me understand what may be wrong here?
Thank you!
The biggest latency spike I can see in your screenshot is about 20 seconds at 9:00 AM, which is about the same time you have the largest number of queries, read/write operations, and memory usage.
Even though you have a small number of users, they can be doing many queries. If GCP support suggests the app has problems scaling, you can check the autoscaling settings and see whether they are enabled.
From what I can see in your images, and looking through the Cloud SQL docs, I would suggest scaling your Cloud SQL instance horizontally.
Also take a look at the diagnose-issues docs; maybe you can get more info on what's causing the MySQL aborted-connections error.
There was a query we found running that caused a huge database lag.
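For others hitting the same wall, one low-effort way to catch a runaway query like that from the application side is a timing wrapper (a sketch; the threshold and log format are my own assumptions, and MySQL's slow query log or SHOW FULL PROCESSLIST will show the same thing from the server side):

```php
<?php
// Sketch: run a query through a callable and log anything over a threshold,
// so a single runaway statement stands out in the logs.
function timed_query(callable $runQuery, string $sql, float $slowMs = 500.0)
{
    $t0 = microtime(true);
    $result = $runQuery($sql);
    $elapsedMs = (microtime(true) - $t0) * 1000;
    if ($elapsedMs > $slowMs) {
        error_log(sprintf('SLOW QUERY (%.1f ms): %s', $elapsedMs, $sql));
    }
    return $result;
}

// Usage with mysqli (assuming $mysqli is an open connection):
// $res = timed_query(function ($q) use ($mysqli) { return $mysqli->query($q); }, $sql);
```

Once the slow statement is identified, EXPLAIN will usually show why it lags (missing index, full table scan, and so on).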
My site has been experiencing slowness ever since we moved to AWS from a different server provider. The site is built on a LAMP stack and we're not sure what is causing the slowness. We did not experience such significant lag on the previous server.
Some notes:
The site runs fine in a localhost environment, which is why we think it's something in our AWS setup (however, feel free to correct me).
For reference, the site IS image-heavy, uses PHP 5.6 (so it's outdated), and DOES use a few SELECT * queries on various pages, etc.
We just don't believe the above is the issue, but please let me know if there are often discrepancies between localhost and production environments, because this is the first time I'm experiencing one.
As noted, we did not experience such significant lag on the previous server.
We have tried asking AWS support, who made a suggestion that made a noticeable difference, but the site still experiences around 5 seconds of lag per page. They now say they can't find any reason for the lag, but at the same time, because of it, our site is seeing much lower traffic (most of the lag occurred at high traffic volume).
We've updated the AWS settings a few different times in a few different ways, such as:
Updating the EC2 instance
Updating RDS
Implementing Amazon ElastiCache: https://aws.amazon.com/elasticache/
Some people have reported that there's no lag time unless a user logs into the site.
Any suggestions are helpful! We are hoping to not have to move the site's server again.
I am currently using an AWS micro instance as a web server for a website that allows users to upload photos. Two questions:
1) When looking at my CloudWatch metrics, I have recently noticed CPU spikes, the website receives very little traffic at the moment, but becomes utterly unusable during these spikes. These spikes can last several hours and resetting the server does not eliminate the spikes.
2) Although seemingly unrelated, whenever I post a link to my website on Twitter, the server crashes (i.e., "Error Establishing a Database Connection"). Once I restart Apache and MySQL, the website returns to normal functionality.
My only guess is that the issue is somehow the result of deficiencies in the micro instance. Unfortunately, when I upgraded to a small instance, the site was actually slower, due to the fact that micro instances can burst to two EC2 compute units.
Any suggestions?
If you want to stay in the free tier of AWS (micro instance), you should offload as much as possible away from your EC2 instance.
I would suggest uploading the images directly to S3 instead of going through your web server (see an example here: http://aws.amazon.com/articles/1434).
S3 can also be used to serve most of your static assets (images, JS, CSS, ...) instead of your weak web server. You can also use these S3 files as the origin of an Amazon CloudFront (CDN) distribution to improve your application's performance.
Another service that can help you offload work is SQS (Simple Queue Service). Instead of handling every request inline, you can send some requests (upload done, for example) as messages to SQS and have a reader process those messages at its own pace. This is a good way to handle momentary load caused by several users working with your service simultaneously.
Another service is DynamoDB (a managed NoSQL DB service). You can move much of your current MySQL data and queries to DynamoDB. Amazon DynamoDB also has a free tier that you can enjoy.
With the combination of the above, you can have your micro instance handling the few remaining dynamic pages until you need to scale your service with your growing success.
Wait… I'm sorry. Did you say you were running both Apache and MySQL Server on a micro instance?
First of all, that's never a good idea. Secondly, as documented, micro instances have low I/O and can only burst to 2 ECUs.
If you want to continue using a resource-constrained micro instance, you need to (a) put MySQL somewhere else, and (b) use something like Nginx instead of Apache as it requires far fewer resources to run. Otherwise, you should seriously consider sizing up to something larger.
I had the same issue. As far as I understand, the problem is that AWS will throttle you when you exceed a predefined usage. This means they allow a small burst, but after that things become horribly slow.
You can test that by logging in and doing something. If you use the CPU for a couple of seconds, the whole box becomes extremely slow. After that you'll have to wait, without doing anything at all, for things to get back to "normal".
That was the main reason I went for VPS instead of AWS.
Error
I have a web app with a mass uploader (Plupload) for photos, and when I upload, say, twenty photos, about six (around 30%) fail with an Internal Server Error. I have checked the Apache error.log for this domain and it has nothing new (I know I'm looking at the right error.log since older errors did show up there).
This only happens on my VPS on Dreamhost (my hosting provider) servers while on my development server it runs silky smooth.
Oh, and things used to work just fine a month ago and then simply started to fail. Back then I was using Uploadify, and since that used Flash, it was impossible for me to debug where the upload failed.
Files and script
The uploaded files are photos, all about 100 kB each, even though I've successfully uploaded (and still can upload) 3 MB photos. My .htaccess naturally doesn't change during uploads. On the server side is a PHP script that uses the GD2 library to move and resize the photo.
Server state
I recently upgraded my VPS from 300 to 400 MB of RAM. This thing used to work, and I upgraded just so that memory is ruled out as a reason. Also, my PHP memory limit is 200 MB, so this should suffice.
I am getting mighty frustrated that Dreamhost does not want to help, stating that "we can not be responsible for an error your code causes" and "We still will not be able to assist you in debugging the issue unfortunately."
It has been a week of sparse "support" while my app doesn't work and my clients are frustrated.
Questions
Is this kind of "you're on your own" support standard across the industry, i.e., would your host handle this differently?
How exactly can I debug this?
I'm going to assume that you have a standard Apache + PHP setup. One common configuration is the pre-fork setup; in this case Apache adapts to system load by forking more children of itself.
With only 400 MB of RAM you're pretty tight, so if you're running 20 processes that can each take 200 MB (assuming every process handles pretty big files with GD), you're getting into hot water with the memory manager.
I would reduce the total number of instances to 2 first and see how it goes; also keep an eye on memory usage by running top.
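With the prefork MPM, that cap is set in the Apache config; a sketch along these lines (the path and numbers are assumptions to tune against top, and on Apache 2.2 the directive is MaxClients rather than MaxRequestWorkers):

```apacheconf
# e.g. /etc/apache2/mods-available/mpm_prefork.conf (path varies by distro)
<IfModule mpm_prefork_module>
    StartServers            2
    MinSpareServers         1
    MaxSpareServers         3
    MaxRequestWorkers       2   # cap children so peak RSS fits in ~400 MB
    MaxConnectionsPerChild  500 # recycle children so leaks can't accumulate
</IfModule>
```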
Regardless, it might be beneficial to run a separate task manager such as Gearman to perform the resize tasks, so the upload request only has to move the uploaded file and enqueue the resize task; this way you can greatly reduce the memory required by your PHP instances.
As to your Q1: the simple answer is that you get what you pay for. A 300 MB RAM Dreamhost VPS costs ~$360 per annum. For this you get the VPS service and responses on service failures relating to the provision of the virtual environment. The OS, the software stack, and the applications are outside this service scope. Why? This sort of custom knowledge-base support could cost $50-300 per hour. You are being unreasonable and deluding yourself if you expect Dreamhost to provide such services pro bono. That's what sites like this one do.
So my suggestion is that you suck up that anger and frustration and work out how to help yourself.
As to your Q2: (i) you need to understand where your Apache errors go; (ii) ditto any SQL errors if you are using a D/B; (iii) you need to ensure that PHP error logging is enabled and verify where the PHP logs are going; (iv) you need to inspect those logs, and verify that logging is working correctly, by using a small script which generates runtime errors.
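Step (iv) can be as small as this (the log path is an assumption; use any file the web user can write to):

```php
<?php
// Smoke test for PHP error logging: point the log at a known file,
// raise a notice, then confirm it landed there.
ini_set('log_errors', '1');
ini_set('display_errors', '0');
ini_set('error_log', '/tmp/php_debug.log'); // assumed writable path
error_reporting(E_ALL);

trigger_error('logging smoke test', E_USER_NOTICE);
// Now `tail /tmp/php_debug.log` should show the notice with a timestamp.
```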
You should also consider using enhanced facilities such as php_xdebug to enhance logging levels and introducing application logging.
In my experience, systems and functions rarely die silently. However, application programmers often ignore return statuses, etc. For example, in the GD library, imagecopyresized() can fail, and it returns a status code to tell the application when it has; but if the application doesn't test this status and act accordingly, it can end up going down bizarre execution paths silently and just appear to the user (or developer) to have "just stopped working".
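To make that concrete, a resize helper along these lines (a sketch, not the poster's actual script) checks every GD return value instead of assuming success:

```php
<?php
// Sketch: every GD call below can return false; check each one and log the
// failure point instead of letting the script die silently.
function resize_image(string $src, string $dst, int $w, int $h): bool
{
    $in = @imagecreatefromjpeg($src);
    if ($in === false) {
        error_log("resize: cannot decode $src");
        return false;
    }
    $out = imagecreatetruecolor($w, $h);
    if ($out === false
        || !imagecopyresized($out, $in, 0, 0, 0, 0, $w, $h, imagesx($in), imagesy($in))) {
        error_log("resize: copy failed for $src");
        return false;
    }
    if (!imagejpeg($out, $dst, 85)) {
        error_log("resize: cannot write $dst");
        return false;
    }
    return true;
}
```

With logging verified as above, each error_log() line pins down exactly which step failed for the photos that 500.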
My last comment is that you should really consider setting up a private VPS within your development environment which mirrors your Dreamhost production config, and use this for integration, acceptance testing, and support. This is pretty easy to do, and you can mess with it, add debug/what-if options, and then roll back without polluting your production environment. Tools like VMware appliances and VirtualBox make this easy. See this blog post for a description of how I did this for my hosted service.
Trying to respond to question 2: if you have checked all your code and didn't see any bug, I think the best thing you can do is check the versions of all the programs running on the server (Apache, PHP, ...). For example, I once had a problem with a web service running on Apache and PHP; the PHP version was 5.2.8, and after a lot of investigation I found out that that version had a problem parsing XML data.
Regarding the first part of the question: Dreamhost does offer a paid support service with "call back". We used this once to get the low-down on something. They are very good with general support (better than many hosts, IMO), but you can't expect dedicated service, and they must handle a lot of piddling questions. Pay for a call back, though, and in about 2 minutes on the phone you can get the answer you want, plus they get their $10 (recurring) for the time. You both win. Just remember to cancel the recurring charges.
Regarding the second part of the question: we had this very same issue with them. Their response (as suggested by Linus in the comments) was that they keep a tally of the CPU use of all processes run by your "user". If that total exceeds a threshold, they simply kill the process(es) to get the cycles down. No error messages, no warnings, nothing. The processes can include MySQL, CGI (Perl), or PHP. There is no way to monitor or predict this, and we couldn't program around it. The solution... not DreamHost, unfortunately (webhostingtalk.com will give you loads of host ideas). So we use them for some sites, but not for others.