I have a pretty "basic" app that we designed; it was originally on a local Plesk server, and we migrated it to GAE/GSQL/GCS (App Engine, Cloud SQL for MySQL, Cloud Storage).
Here's some background info:
The app is PHP-based and runs great on the local server. When we migrated to the cloud, we noticed random but extreme latency. It's so bad that the app times out and gives a SPDY timeout error. We use Cloudflare for SPDY support, so we started there, and they said it's the server. Then we went to Google. We've been going back and forth, and I am looking for other avenues of help.
I am running the app on an F2 standard GAE instance and a G1-small Cloud SQL instance (second generation), all in the same region/zone. There is also a failover SQL instance.
There is really no pattern to it, but users of the app hit a bad timeout very frequently, and the request dies after 60 seconds. (That points to a PHP timeout, right? We checked the code, and it runs fine on the local server.)
I don't have a lot of traffic on this app yet (maybe a few users a day), so I don't know whether it's a traffic-load issue. Here are some basic stats:
https://imgur.com/a/U1tk5ak
Some Google engineers said our app has trouble scaling (QPS never gets above 1):
https://imgur.com/a/XWh44bm
They also asked whether we use threading. We do not. We do not use memcache yet either.
I also see a ton of these:
https://imgur.com/a/eVSNqc3
Which looks like this bug: https://github.com/GoogleCloudPlatform/cloudsql-proxy/issues/126
But I am unsure if this is all related.
We've tried going through Google's tech support; they said we have "manual locks", but our dev team neither agrees nor knows what this really means. Again, the same app framework (session handling, etc.) is used in many apps with a lot of users (not on GAE; they run on compute instances on AWS), so this is our first venture into GAE.
We connect using standard MySQL connection parameters and use the same framework in a lot of applications and it runs fine. We use the required proxy to connect to CloudSQL.
The speed and constant lag shouldn't be there. We don't know what this issue could be. My questions are:
1) Do you see any issues here? All the database logs and summaries are above.
2) Can you help me understand what may be wrong here?
Thank you!
The biggest latency spike I can see in your screenshot is about 20 seconds at 9:00 am, which is about the same time you have the highest number of queries, read/write operations, and memory usage.
Even though you have a small number of users, they can be running many queries. If GCP support suggests that the app has problems scaling, check the auto-scale property and see whether it is enabled.
From what I can see in your images, and after looking through the Cloud SQL docs, I would suggest scaling your Cloud SQL instance horizontally.
Also take a look at the diagnose-issues docs; they may give you more information on what's causing the MySQL aborted-connections error.
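One way to follow up on the aborted-connections errors is to tally them by user, host, and reason, so you can see whether they cluster around one client (for example, the proxy) or one failure mode. A minimal Python sketch, assuming you have exported the instance's MySQL error log as text lines and that they use MySQL's standard "Aborted connection N to db: ... user: ... host: ... (reason)" wording; the function name is hypothetical:

```python
import re
from collections import Counter

# Matches MySQL's "Aborted connection" message, e.g.
# Aborted connection 42 to db: 'app' user: 'web' host: '10.0.0.5' (Got timeout reading communication packets)
ABORTED_RE = re.compile(
    r"Aborted connection (?P<id>\d+) to db: '(?P<db>[^']*)' "
    r"user: '(?P<user>[^']*)' host: '(?P<host>[^']*)' \((?P<reason>[^)]*)\)"
)


def summarize_aborted(lines):
    """Count aborted connections by (user, host, reason); other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = ABORTED_RE.search(line)
        if match:
            counts[(match["user"], match["host"], match["reason"])] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; replace them with lines exported from your error log.
    sample = [
        "[Note] Aborted connection 1201 to db: 'app' user: 'web' host: '10.0.0.5' (Got an error reading communication packets)",
        "[Note] Aborted connection 1202 to db: 'app' user: 'web' host: '10.0.0.5' (Got an error reading communication packets)",
        "[Note] Aborted connection 1210 to db: 'app' user: 'web' host: '10.0.0.5' (Got timeout reading communication packets)",
    ]
    for (user, host, reason), n in summarize_aborted(sample).most_common():
        print(f"{n:3d}  {user}@{host}  {reason}")
```

If most entries share a "timeout" reason, comparing their timestamps with the latency spikes in the monitoring graphs is a reasonable next step.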
We found a running query that was causing huge database lag.
I have a GCP app running in the standard environment on PHP 7.4 and using mysqli to access Google Cloud SQL. The instance class is F2 and has been for years. This morning, out of nowhere and with no code or configuration changes, memory usage spiked to the point where it exceeds the quota and the service restarts so often that the system has become unusable. It doesn't matter how many people are logged in. Pages that don't access the database seem unaffected, so it appears to be related to SQL.
One other note: my WordPress site calls into this system using XMLHttp, and that query is working fine.
I am at a loss and my clients are dead in the water. How can I figure out what's causing this? It sure seems like they must have changed something on the platform side.
Evening All,
At my absolute wits' end and hoping someone may be able to save me! I am in the process of migrating a number of PHP applications to Azure. I am using:
Linux-based App Service running PHP 7.4 (2 vCPUs, 8 GB RAM) at a cost of £94 a month.
Azure Database on MySQL 8.0 (2 vCPUs) at £114 a month.
My PHP apps run well, with a decent load time of under 1 second per page. WordPress performance, however, is awful: page loads go from 1 second to around 10 seconds, particularly on the back end. I have read all of the Azure guides and have implemented the following obvious points:
Both the App Service and the MySQL install are in the same data center
App Service is set to 'Always On'
Connection Redirection is set to Preferred and tested as working
The same app runs fine on a very basic £10 or so a month shared hosting package. I have also tried the same setup in Amazon Web Services today and page load is back to a second or so.
In the Chrome console, the delay is in TTFB. I have disabled all the plugins, and none stands out as making a huge difference. Each adds a second or so to the page load, which suggests to me a consistent issue whenever a page requires a number of database calls.
What is going on with Azure and the awful WordPress performance?! Is there anything else I can investigate or try? I'm really keen to stay with Azure, but I can't cope with a huge increase in cost for a performance hit.
The issue turned out to be the way the file system works in App Service. It is NOT an issue with the database. The App Service architecture is currently just too slow at file reads/writes, which WordPress does a lot of. I investigated the various file-cache options, but none improved things enough.
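To check a file-system explanation like this on your own hosts, you can time many small file writes and reads, which roughly resembles how a PHP app touches lots of small files. A rough Python micro-benchmark, offered as a sketch only; the function name is hypothetical, and you would point `directory` at a folder on the same storage the site runs from and compare the numbers between App Service, the VM, and shared hosting:

```python
import os
import tempfile
import time


def small_file_benchmark(n_files=500, size=4096, directory=None):
    """Time writing and then reading back many small files.

    Returns (write_seconds, read_seconds). `directory` must already exist;
    None means the system temp directory.
    """
    payload = os.urandom(size)
    with tempfile.TemporaryDirectory(dir=directory) as tmp:
        paths = [os.path.join(tmp, f"f{i}.bin") for i in range(n_files)]

        start = time.perf_counter()
        for path in paths:
            with open(path, "wb") as fh:
                fh.write(payload)
        write_s = time.perf_counter() - start

        # Reads right after writes may be served from the OS cache, so treat
        # the read figure as a best case.
        start = time.perf_counter()
        for path in paths:
            with open(path, "rb") as fh:
                fh.read()
        read_s = time.perf_counter() - start
    return write_s, read_s


if __name__ == "__main__":
    w, r = small_file_benchmark()
    print(f"write: {w * 1000:.1f} ms  read: {r * 1000:.1f} ms")
```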
I ended up setting up a fairly basic, and significantly cheaper, virtual machine using the same database, and performance is hugely improved.
Not a great answer, but App Service is not up to WordPress at present!
The comments below are correct. The "problem" is the database. You can either move MySQL to a virtual machine (which will give you better performance) or try a cache plugin such as WP Super Cache to decrease the number of requests.
You can find a full list of tips in the following link:
https://azure.microsoft.com/en-us/blog/10-ways-to-speed-up-your-wordpress-site-on-azure-websites/
PS: ignore the date, it's still relevant
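The reason a full-page cache plugin helps is that repeat requests for the same URL are answered from stored HTML instead of re-running PHP and the database queries. The class below is a minimal Python sketch of that idea with a time-to-live; it is hypothetical and is not how WP Super Cache itself is implemented:

```python
import time


class PageCache:
    """Minimal full-page cache with a time-to-live (illustrative only).

    Rendered HTML is stored per URL, so repeat visits within the TTL
    skip the expensive render (PHP + database work in a real site).
    """

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # url -> (expires_at, html)

    def get_or_render(self, url, render):
        now = self.clock()
        hit = self._store.get(url)
        if hit and hit[0] > now:
            return hit[1]          # cache hit: no render, no DB queries
        html = render(url)         # cache miss: do the expensive work once
        self._store[url] = (now + self.ttl, html)
        return html


if __name__ == "__main__":
    calls = []

    def render(url):
        calls.append(url)
        return f"<html>{url}</html>"

    cache = PageCache(ttl_seconds=60)
    cache.get_or_render("/", render)
    cache.get_or_render("/", render)
    print(len(calls))  # 1: the second request was served from the cache
```

Logged-in and back-end pages generally cannot be cached this way, which is consistent with the back end staying slow even with a cache plugin.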
I am currently using an AWS micro instance as a web server for a website that allows users to upload photos. Two questions:
1) Looking at my CloudWatch metrics, I have recently noticed CPU spikes. The website receives very little traffic at the moment, but it becomes utterly unusable during these spikes. They can last several hours, and restarting the server does not eliminate them.
2) Although it seems unrelated, whenever I post a link to my website on Twitter, the server crashes (i.e., "Error Establishing a Database Connection"). After I restart Apache and MySQL, the website returns to normal.
My only guess is that the issue somehow results from the micro instance's limitations. Unfortunately, when I upgraded to the small instance, the site was actually slower, because micro instances can burst up to two EC2 compute units.
Any suggestions?
If you want to stay in the AWS free tier (micro instance), you should offload as much as possible from your EC2 instance.
I suggest uploading the images directly to S3 instead of going through your web server (see an example here: http://aws.amazon.com/articles/1434).
S3 can also serve most of your static web content (images, JS, CSS...) instead of your weak web server. You can also use S3 as the origin for an Amazon CloudFront (CDN) distribution to improve your application's performance.
Another service that can help with offloading work is SQS (Simple Queue Service). Instead of handling every user request online, you can send some requests (for example, "upload done") as messages to SQS and have a reader process those messages at its own pace. This is a good way to handle momentary load caused by several users working with your service simultaneously.
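The pattern is producer/consumer: the web request only enqueues a message and returns, and a separate worker drains the queue at its own pace. A minimal in-process Python sketch, where the standard-library `queue.Queue` stands in for SQS and `handle_upload_done` is a hypothetical task; with real SQS, the producer and consumer would instead use an AWS SDK's send/receive/delete message calls, typically from separate processes:

```python
import queue
import threading


def handle_upload_done(message):
    """Hypothetical slow task, e.g. generating thumbnails for an uploaded photo."""
    return f"processed {message['photo_id']}"


def worker(jobs, results):
    # The consumer drains messages at its own pace, independent of web traffic.
    while True:
        message = jobs.get()
        if message is None:          # sentinel: stop the worker
            jobs.task_done()
            break
        results.append(handle_upload_done(message))
        jobs.task_done()


def main():
    jobs = queue.Queue()             # stands in for an SQS queue
    results = []
    consumer = threading.Thread(target=worker, args=(jobs, results))
    consumer.start()

    # The web request handler only enqueues and returns immediately.
    for photo_id in range(5):
        jobs.put({"photo_id": photo_id})

    jobs.put(None)
    consumer.join()
    return results


if __name__ == "__main__":
    print(main())
```

A burst of uploads then only lengthens the queue instead of tying up the web server's CPU while users wait.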
Another option is DynamoDB (a managed NoSQL DB service). You can move most of your current MySQL data and queries to DynamoDB. Amazon DynamoDB also has a free tier you can use.
With the combination of the above, you can have your micro instance handling the few remaining dynamic pages until you need to scale your service with your growing success.
Wait… I'm sorry. Did you say you were running both Apache and MySQL Server on a micro instance?
First of all, that's never a good idea. Secondly, as documented, micros have low I/O and can only burst to 2 ECUs.
If you want to continue using a resource-constrained micro instance, you need to (a) put MySQL somewhere else, and (b) use something like Nginx instead of Apache as it requires far fewer resources to run. Otherwise, you should seriously consider sizing up to something larger.
I had the same issue. As far as I understand, the problem is that AWS slows you down when you reach a predefined usage level: they allow a short burst, but after that things become horribly slow.
You can test this by logging in and doing something. If you use the CPU for a couple of seconds, the whole box becomes extremely slow. After that, you have to wait without doing anything at all to get things back to "normal".
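The burst-then-crawl behavior described above can be modeled as a token bucket: credits accumulate while the box is idle, a burst spends them at full speed, and once they run out only a slow baseline remains. The Python class below is a toy model with made-up numbers to illustrate that shape. It is an assumption for illustration, not AWS's actual accounting for micro instances:

```python
class CpuCreditBucket:
    """Toy token-bucket model of a burstable instance (illustrative only).

    One work unit = one second of full-speed CPU. Credits accrue at
    `refill_per_s` while idle, up to `capacity`. Once credits are gone,
    work proceeds only at the refill rate.
    """

    def __init__(self, capacity, refill_per_s):
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.credits = capacity

    def idle(self, seconds):
        """Sit idle and earn credits, capped at capacity."""
        self.credits = min(self.capacity, self.credits + seconds * self.refill_per_s)

    def run(self, work_units):
        """Return wall-clock seconds needed for `work_units` of CPU work.

        Simplification: credits earned during the run itself are ignored.
        """
        burst = min(work_units, self.credits)
        self.credits -= burst
        remaining = work_units - burst
        return burst + remaining / self.refill_per_s


if __name__ == "__main__":
    box = CpuCreditBucket(capacity=10, refill_per_s=0.1)
    print(box.run(5))    # 5.0: fully within the burst allowance
    print(box.run(10))   # 55.0: 5 s of burst, then 5 units at 10% speed
    box.idle(100)        # waiting "without doing anything" refills credits
    print(box.run(5))    # 5.0 again
```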
That was the main reason I went for VPS instead of AWS.
I'm using the Force.com PHP toolkit to integrate with the SFDC API.
I have the app hosted in a client's office, and it gets periodic errors where the connection times out and can't reach the Salesforce host.
The app runs fine on my laptop, regardless of the internet connection I'm using. I've also used this library in numerous environments, so I'm confident this is not a code issue.
Can someone recommend a way I can analyze what's happening in terms of connectivity on this box, so I can prove/articulate the problem to the client's Sys Admin? We're in a bit of a finger-pointing situation now, want to find a resolution.
I realize I can increase the timeout of the app, but for the app to be practical for the client, they can't be waiting long periods for the connections to be made while running through a wizard.
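One way to gather evidence for the sysadmin is to run a small probe on the client's box that logs, at regular intervals, how long DNS resolution and a TCP connect to the Salesforce host take, and which error occurs when they fail. A minimal Python sketch; `probe` and `monitor` are hypothetical helpers, and the host name you pass in (for example `login.salesforce.com`) is an assumption you should replace with the endpoint your toolkit actually connects to:

```python
import socket
import time
from datetime import datetime, timezone


def probe(host, port=443, timeout=5.0):
    """Time DNS resolution and a TCP connect to host:port.

    Returns a dict with timings in seconds, or the error that occurred.
    """
    result = {"time": datetime.now(timezone.utc).isoformat(), "host": host, "port": port}
    try:
        t0 = time.perf_counter()
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["dns_s"] = time.perf_counter() - t0
        family, socktype, proto, _, addr = infos[0]
        result["addr"] = addr[0]

        t1 = time.perf_counter()
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(addr)
        result["connect_s"] = time.perf_counter() - t1
    except OSError as exc:  # includes DNS failures, timeouts, and refused connections
        result["error"] = repr(exc)
    return result


def monitor(host, port=443, interval_s=30.0, count=10):
    """Probe repeatedly and print one line per attempt (redirect to a log file)."""
    for _ in range(count):
        print(probe(host, port))
        time.sleep(interval_s)
```

Running `monitor(...)` from the client's box and from another network at the same time makes it easier to show whether failures line up with DNS problems, slow connects, or outright timeouts on that network.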
I had the same problem, and when I contacted SFDC support, they said it was caused by performance degradation on the SFDC servers.
When I checked http://trust.salesforce.com to verify the system status, there was no record of degradation.
When I asked, SFDC support said it was not listed on http://trust.salesforce.com because it lasted only a very short time.
They also recommended that the calling PHP script have a retry mechanism, so that it tries 3 times or so before giving up.
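The retry mechanism suggested above usually looks like this: call, catch a transient error, wait a little longer each time, and give up after a fixed number of attempts. A minimal Python sketch of retry with exponential backoff and jitter; `call_with_retry` is a hypothetical helper, and in the PHP app the same pattern would wrap the toolkit call in a try/catch loop:

```python
import random
import time


def call_with_retry(func, attempts=3, base_delay_s=1.0, retry_on=(OSError,), sleep=time.sleep):
    """Call func(); on a retryable error, wait and try again, up to `attempts` times.

    Delays grow exponentially (1 s, 2 s, 4 s, ...) plus up to 10% random jitter.
    The last error is re-raised if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            delay = base_delay_s * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))
```

Keeping `attempts` and `base_delay_s` small matters here, since the user is waiting in a wizard: three attempts with a 1 s base adds at most a few seconds before the error is shown.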
Hope this helps.
This may seem like an obvious question, but we have a PHP/MySQL app that runs on Windows Server 2008. The server hosts about 10 different sites in total. Admin options on the site in question let an administrator run reports (through the site) that are huge and can take about 10 minutes in some cases. These reports are huge MySQL queries that display the data on screen. While these reports are running, the entire site slows down for all users. So my questions are:
Is there a simple way to allocate server resources so if a (website) administrator runs reports, other users can still access the site without performance issues?
Even though running the report kills the website for all users of that site, it doesn't affect other sites on the same server. Why is that?
As mentioned, the report can take about 10 minutes to generate. Is it bad practice to make these kinds of reports available on the website? Would these typically be generated by overnight scheduled tasks?
Many thanks in advance.
The load you're putting on the server most likely has less to do with the applications than with the MySQL table you are probably slamming. Most people get around this by generating reports during downtime or by using MySQL replication to have a second database used purely for reporting.
I recommend setting up some server monitoring to see what is actually going on. I think New Relic just released Windows versions of its platform, and I think you can try it for free for 30 days.
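Generating reports during downtime usually means a scheduled job runs the heavy aggregate once and stores the result in a summary table, so the admin page only reads precomputed rows. A minimal Python sketch using the standard-library `sqlite3` module as a stand-in for MySQL; the `orders` schema and the `report_daily_sales` table are hypothetical:

```python
import sqlite3


def build_daily_report(conn):
    """Hypothetical nightly job: precompute the heavy aggregate into a summary table."""
    with conn:  # commits on success
        conn.execute("DROP TABLE IF EXISTS report_daily_sales")
        conn.execute(
            """CREATE TABLE report_daily_sales AS
               SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
               FROM orders GROUP BY order_date"""
        )


def read_report(conn):
    """What the admin page runs during the day: a cheap read of precomputed rows."""
    return conn.execute(
        "SELECT order_date, orders, revenue FROM report_daily_sales ORDER BY order_date"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (order_date, amount) VALUES (?, ?)",
        [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)],
    )
    build_daily_report(conn)   # e.g. run by an overnight scheduled task
    print(read_report(conn))   # [('2024-01-01', 2, 15.0), ('2024-01-02', 1, 7.5)]
```

The trade-off is freshness: the report reflects the data as of the last run, which is often acceptable for admin reporting.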
There's the LOW_PRIORITY flag, but I'm not sure it would have any positive effect, since you're most likely experiencing a table/row locking issue. You can get an idea of what's going on with the SHOW PROCESSLIST; query.
If other websites run fine, it's even more likely that this is due to database locks (causing your web processes to wait for the lock to get released).
Lastly, it's always advisable to run big reporting queries overnight (or when server load is minimal). Having a read-replica slave would also help.
I strongly suggest installing a replicated MySQL server and running large administrator queries (SELECT-only, naturally) on it, so your website isn't blocked!
If there aren't too many transactions per second, you could even run the replica on a desktop computer, away from your production server, and thus have an off-site backup of your DB!
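With a replica in place, the application has to decide where each statement goes: read-only queries (such as the admin reports) to the replica, everything else to the primary. A minimal Python sketch of such a router; `ReplicaRouter` is a hypothetical helper, not a library API, and `primary`/`replica` are assumed to be any objects with an `execute(sql, params)` method, such as two DB-API connections:

```python
class ReplicaRouter:
    """Send read-only statements to a replica and everything else to the primary."""

    READ_VERBS = ("SELECT", "SHOW", "EXPLAIN")

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica

    def target_for(self, sql):
        text = sql.strip().upper()
        first_word = text.split(None, 1)[0] if text else ""
        # SELECT ... FOR UPDATE takes locks, so it must run on the primary.
        if first_word in self.READ_VERBS and "FOR UPDATE" not in text:
            return self.replica
        return self.primary

    def execute(self, sql, params=()):
        return self.target_for(sql).execute(sql, params)
```

Because replicas can lag behind the primary, this kind of routing suits reports well but is less suitable for reads that must see a write made a moment earlier.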
Are you 100% sure you have added all the necessary indexes?
You need an insanely large website to have these kinds of problems, unless you are missing indexes.
Make sure you have the right indexes, and make sure the fields you join on are not VARCHARs, which are not very fast.
I have a database with quite a few large tables and millions of records that is working 24/7.
It has lots of activity and automated services processing it, without issues, thanks to proper indexing.
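A quick way to confirm whether a query uses an index is to ask the database for its query plan before and after adding one. In MySQL, that is `EXPLAIN` on the query (its `key` column shows the index chosen). The Python sketch below demonstrates the same check with the standard-library `sqlite3` module as a stand-in, where `EXPLAIN QUERY PLAN` reports a full table scan before the index exists and an index search afterwards; the table and index names are hypothetical:

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column is the human-readable detail


def main():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(i % 1000, float(i)) for i in range(10_000)],
    )
    sql = "SELECT * FROM orders WHERE customer_id = ?"

    before = query_plan(conn, sql, (42,))
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    after = query_plan(conn, sql, (42,))
    return before, after


if __name__ == "__main__":
    before, after = main()
    print("before:", before)  # detail mentions a SCAN of orders
    print("after: ", after)   # detail mentions idx_orders_customer
```

The exact wording of the plan text varies between SQLite versions, so compare the plans rather than relying on specific strings.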