Performance testing my PHP / MySQL application for AWS

I am writing a PHP application which uses MySQL in the backend. I am expecting about 800 users a second to be hitting our servers, with requests coming from an iOS app.
The application is spread across about 8 different PHP scripts which run very simple SELECT queries (occasionally with one join) and simple INSERT queries, inserting one row at a time (with less than 10 KB of data per row on average). There's roughly a 50/50 split between SELECTs and INSERTs.
The plan is to use Amazon Web Services, hosting the application on EC2 instances to spread the CPU load and using RDS (with MySQL) for the database, but I'm aware RDS doesn't scale out, only up. So, before committing to an AWS solution, I need to benchmark my application on our development server (not a million miles off the medium RDS instance spec) to see roughly how many requests a second my application and MySQL can handle (ballpark figures), before doing an actual benchmark on AWS itself.
I believe I only really need to performance test the queries within the PHP scripts, as EC2 should handle the CPU load, but I do need to see if and how RDS (MySQL) copes with that many users.
Any advice on how to handle this situation would be appreciated.
Thank you in advance!

Have you considered using Apache Bench (ab)? It should do the job here. I've also heard good things about Siege, but I haven't tested it yet.
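If you want a quick sanity check from PHP itself before setting up ab or Siege, a rough curl_multi loop like this can fire batches of concurrent requests at one of your endpoints and report throughput (the URL, request count and concurrency are placeholders to adjust for your setup):

```php
<?php
// Rough concurrent load sketch using curl_multi. The URL, total request
// count and concurrency below are placeholders - adjust for your setup.
$url         = 'http://dev.example.com/api/endpoint.php'; // hypothetical endpoint
$total       = 1000;   // total requests to send
$concurrency = 50;     // requests in flight per batch

$start = microtime(true);
$sent  = 0;

while ($sent < $total) {
    $batch   = min($concurrency, $total - $sent);
    $mh      = curl_multi_init();
    $handles = [];

    for ($i = 0; $i < $batch; $i++) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        curl_multi_add_handle($mh, $ch);
        $handles[] = $ch;
    }

    // Run all handles in this batch until they finish.
    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh);
    } while ($running > 0);

    foreach ($handles as $ch) {
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    $sent += $batch;
}

$elapsed = microtime(true) - $start;
printf("%d requests in %.2fs (%.1f req/s)\n", $total, $elapsed, $total / $elapsed);
```

ab or Siege will give you better numbers (latency distribution, keep-alive behaviour), but this is handy for quick comparisons between individual scripts.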

If you have 800 user hits per second, it could be a good idea to consider sharding from the start. Designing and implementing sharding right at the beginning lets you start with a small number of hosts and scale out more easily later. If you design for only one server, even if it handles the load for now, you will soon need to scale, and switching to a sharded architecture once the application is already in production is much more complex.
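To make that concrete, here is a minimal sketch of hash-based shard routing in PHP; the shard list, credentials and the choice of user ID as the shard key are assumptions for illustration, not a prescription:

```php
<?php
// Hypothetical shard map: each entry is a MySQL host holding a slice of users.
$shards = [
    ['host' => 'db-shard-0.example.com', 'name' => 'app_shard_0'],
    ['host' => 'db-shard-1.example.com', 'name' => 'app_shard_1'],
];

// Pick a shard deterministically from the user ID, so the same user
// always lands on the same database.
function shardFor($userId, array $shards) {
    return $shards[$userId % count($shards)];
}

function shardConnection($userId, array $shards) {
    $shard = shardFor($userId, $shards);
    $dsn = sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $shard['host'], $shard['name']);
    return new PDO($dsn, 'app_user', 'secret', [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
}

// Usage: all queries for user 42 go to the same shard.
$pdo  = shardConnection(42, $shards);
$stmt = $pdo->prepare('SELECT * FROM events WHERE user_id = ?');
$stmt->execute([42]);
```

Plain modulo routing makes rebalancing painful when you later add shards; a lookup table or consistent hashing is usually preferred, but the routing idea stays the same.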

Related

Looking for the best server setup solution for my web application

I have 3 CodeIgniter-based application instances on two separate servers.
Server 1:
The first instance is the application, the second is a REST API; both use the same database. (I know there is no benefit to having two instances on the same machine other than cleanliness, and that is why I have it this way.)
Server 2:
This server holds only a REST API with a whole bunch of PHP data-processing functions. I call it the worker server because that is all it does.
It acts as an endpoint for the many external API services I connect to.
Its first job is to receive requests from the application, sometimes processing them before doing anything else.
It then sends requests on to the external API service; once that is done, the session is over.
Shortly afterwards the API service responds with results, which this server processes and then sends on to the application.
The application is at times heavy on the number of very simple SQL queries, mostly inserts/updates on a single table. The number of outgoing requests is kept to a minimum as well, because for the most part I send data as many requests bundled into one; I call this a bulk request.
What is very heavy is the number of responses I get: I can get up to 1000 responses to a single request within a few seconds (I can't reduce that, because I need every single one), and each response is also followed by two identical responses just to make sure I got it, which I treat as duplicates and discard as soon as I can.
Then I process every response with PHP (not too heavy, just matching result arrays) and post it to my REST API on the application server to update the application tables.
Now, when I run, say, one request that returns 1000 responses, the application processes the data fine with correct results, but the server is pretty much inaccessible to other users during that time.
Everything runs on a LAMP stack: Ubuntu 16.04 with MySQL and Apache.
The framework is the latest CodeIgniter.
Currently my setup is...
...for the application server
2 vCPUs
4GB RAM
...for the worker API server
1 vCPU
1GB RAM
I know the server setup is very weak and definitely bottlenecks, but it was only for the development stage.
Now I am moving into production and would like to hear your opinions on how best to approach this.
I am a programmer first and a server administrator second.
I was debating switching to NGINX; I think I will definitely go with PHP-FPM, and maybe MariaDB, although I have read that thread management matters there. This app will not run heavy all the time (probably 50/50), so I may not be able to tune it optimally for every situation anyway and might end up with no better performance in the end.
Then I will probably have to add more servers and set up load balancing, as well as high availability.
I am not sure about all this.
I don't think just upgrading the servers to the maximum will help, though. I can go all the way up to 64 GB RAM and 32 vCPUs per server.
Can I hear your opinions please?
Maybe share some experience?
Links to resources if you have some good ones?
Thank you very much. I hope you can help me.
None of your questions matter. Well, that is an exaggeration. Machines today are not different enough to worry about picking the "best" setup on day one. Instead, implement something, run with it for a while, then see where your bottlenecks are in order to decide what to do next.
Probably you won't have any bottlenecks for a long time.

Can I use cron jobs for my application (needs to be extremely scalable)?

I'm about to undertake a large project where I'll need scheduled tasks (cron jobs) to run a script that loops through my entire database of entities and makes calls to multiple APIs such as Facebook, Twitter and Foursquare every 10 minutes. I need this application to be scalable.
I can already foresee a few potential pitfalls...
Fetching data from APIs is slow.
With thousands of records (constantly increasing) in my database, it's going to take too much time to process every record within 10 minutes.
Some shared servers stop scripts after only 30 seconds.
Server issues due to intensive scripts running constantly.
My question is how to structure my application.
Could I create multiple cron jobs to handle small segments of my database (this would have to be automated)?
That would require potentially thousands of cron jobs. Is that sustainable?
How do I get around the 30-second limit on some servers?
Is there a better way to go about this?
Thanks!
Your best option is to design the application to make use of a distributed database, and deploy it on multiple servers.
You can design it to work in two "ranks" of servers, not unlike the map-reduce approach: lightweight servers that only perform queries and "pre-digest" some data ("map"), and servers that aggregate the data ("reduce").
Once you do that, you can establish a performance baseline and calculate, say, that if you can generate 2000 queries per minute and handle as many responses, you need a new server for every 20,000 users (at a 10-minute refresh cycle, 2000 queries per minute is 20,000 queries per cycle, i.e. one query per user per cycle). In that "generate 2000 queries per minute" you need to factor in:
data retrieval from the database
traffic bandwidth from and to the control servers
traffic bandwidth to Facebook, Foursquare, Twitter etc.
necessity to log locally (and maybe distill and upload log digests to Command and Control)
An advantage of this architecture is that you can start small: a testbed can be built on a single machine running the Connector, Mapper, Reducer, Command and Control, and Persistence together. When you grow, you simply move the different services to different servers.
On several distributed computing platforms this also lets you run queries faster by judiciously allocating Mappers geographically or connectivity-wise, and reduce traffic costs between your various platforms by playing with, e.g., Amazon availability zones (Amazon also has a message service that you might find valuable for communication between the tasks).
One note: I'm not sure that PHP is the right tool for this whole thing. I'd rather think Python.
At the 20,000-users-per-instance traffic level, though, I think you'd better take this up with the people at Facebook, Foursquare, etc. At a minimum you might glean some strategies, such as running the connector scripts as independent tasks, each connector sorting its queue by that service's user IDs to leverage what little data locality there might be, and taking advantage of pipelining to squeeze more bandwidth out of less server load. At best, they might point you to bulk APIs or different protocols, or buy you out for a trillion bucks :-)
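To make the "small segments" idea from the question concrete, here is a hedged sketch of a connector-style worker that only processes its own slice of the entity table, so several cron entries (or machines) can run in parallel without overlapping; the table and column names are invented:

```php
<?php
// Invoked from cron as, e.g.:  php worker.php 0 4   (worker 0 of 4)
// Each worker claims only the rows whose id falls in its modulo class,
// so the whole table is covered with no overlap between workers.
$workerId    = isset($argv[1]) ? (int) $argv[1] : 0;
$workerCount = isset($argv[2]) ? (int) $argv[2] : 1;

$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', 'app_user', 'secret',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$stmt = $pdo->prepare('SELECT id, twitter_handle FROM entities WHERE MOD(id, :count) = :worker');
$stmt->execute(['count' => $workerCount, 'worker' => $workerId]);

foreach ($stmt as $row) {
    // Placeholder for the real API calls (Facebook, Twitter, Foursquare).
    // Keep each call short, and log failures so a missed entity is simply
    // retried on the next 10-minute run instead of blocking this one.
    error_log('processing entity ' . $row['id']);
}
```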
See http://php.net/manual/en/function.set-time-limit.php to bypass the 30 second limit.
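A minimal sketch of how that looks in practice (whether it actually takes effect depends on the host: some shared hosts disable set_time_limit(), and web-server timeouts still apply):

```php
<?php
// Lift PHP's execution time limit for a long-running script.
// 0 means "no limit". Note: some shared hosts disable set_time_limit()
// entirely, and web-server/proxy timeouts still apply to web requests.
set_time_limit(0);

// Or cap it at something generous but finite instead:
// set_time_limit(600); // 10 minutes

// CLI scripts (php script.php) already default to no time limit, which is
// one more reason to run heavy jobs from cron rather than over HTTP.
echo ini_get('max_execution_time'), "\n"; // "0" after the call above
```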
For scheduling jobs in PHP look at:
http://www.phpjobscheduler.co.uk/
http://www.zend.com/en/products/server/zend-server-job-queue
I personally would look at a more robust framework that handles job scheduling (see Grails with Quartz) instead of reinventing the wheel and writing your own job scheduler. Don't forget that you are probably going to need to be checking on the status of tasks from time to time so you will need a logging solution around the tasks.

Keeping two distant programs in sync using MySQL

I am trying to write a client-server app.
Basically, there is a master program that needs to maintain a MySQL database keeping track of the processing done on the server side, and a slave program that queries the database to see what to do to stay in sync with the master. There can be many slaves at the same time.
All the programs must be able to run from anywhere in the world.
For now, I have tried setting up a MySQL database on a shared hosting server and writing C++ programs for the master and the slave that use the cURL library to make requests to a PHP file (e.g. www.myserver.com/check.php) located on that hosting server.
The master program calls the URL every second, and some PHP code is executed to keep the database up to date. I did a test with a single slave program that also calls the URL every second and executes PHP code that queries the database.
With that setup, however, my web host suspended my account and told me that I was 'using too much CPU resources' and that I would need a dedicated server ($200 per month rather than $10), based on their analysis of the CPU resources needed. And that was with one master and only one slave, so no more than 5-6 MySQL queries per second. What would it be with 10 slaves, then?
Am I missing something?
Would there be a better setup than what I was planning in order to achieve the syncing mechanism I need between two or more far-apart programs?
I would use Google App Engine for storing the data; you can read about its free quotas and pricing on the App Engine pricing page.
I think the syncing approach you are taking is probably fine.
The more significant question you need to ask yourself is: what is the maximum acceptable time between syncs? If you truly need virtually real-time syncing between two databases on opposite sides of the world, then you will be using significant bandwidth, and you will unfortunately have to pay for it, as your host pointed out.
Figure out what is acceptable to you in terms of time. Is it okay for the databases to sync only once a minute? Once every 5 minutes?
Also, when running syncs like this in rapid succession, it is important to make sure they don't overlap: before a sync starts, check whether a sync is already in progress and has not finished yet. If a sync is still happening, don't start another; if not, go ahead. This will prevent a lot of unnecessary overhead and syncs piling up on top of each other.
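One simple way to implement that guard from PHP is an advisory lock; this sketch uses MySQL's GET_LOCK(), though a lock file works just as well (connection details are placeholders):

```php
<?php
// Skip this run entirely if the previous sync is still going.
$pdo = new PDO('mysql:host=localhost;dbname=sync_db;charset=utf8mb4', 'sync_user', 'secret',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

// GET_LOCK returns 1 if we obtained the named lock, 0 if another session holds it.
$gotLock = (int) $pdo->query("SELECT GET_LOCK('master_sync', 0)")->fetchColumn();

if (!$gotLock) {
    exit("Previous sync still running, skipping this run.\n");
}

try {
    // ... do the actual sync work here ...
} finally {
    $pdo->query("SELECT RELEASE_LOCK('master_sync')");
}
```

The lock is also released automatically when the connection closes, so a crashed sync cannot block future runs forever.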
Are you using a shared web host? What you are doing sounds like excessive use for a shared (cPanel-type) host; use a VPS instead. You can get an unmanaged VPS with 512 MB of RAM for 10-20 USD per month depending on spec.
Edit: if your bottleneck is CPU rather than bandwidth, have you tried bundling updates inside a transaction? Let's say you are getting 10 updates per second, and you decide you are happy with a propagation delay of 2 seconds. Rather than opening a connection and a transaction for each of those 20 statements, bundle them together in a single transaction that executes every two seconds. That would substantially reduce your CPU usage.
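A hedged sketch of that bundling idea with PDO; the table, column names and flush interval are invented for illustration:

```php
<?php
// Collect updates for ~2 seconds, then flush them in one transaction,
// instead of opening a connection and a transaction per statement.
$pdo = new PDO('mysql:host=localhost;dbname=sync_db;charset=utf8mb4', 'sync_user', 'secret',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$pending    = [];   // [status, id] pairs waiting to be written
$flushEvery = 2;    // seconds between flushes
$lastFlush  = microtime(true);

function flushUpdates(PDO $pdo, array &$pending)
{
    if (!$pending) {
        return;
    }
    $stmt = $pdo->prepare('UPDATE jobs SET status = ? WHERE id = ?');
    $pdo->beginTransaction();
    foreach ($pending as $update) {
        $stmt->execute($update);   // $update is [status, id]
    }
    $pdo->commit();
    $pending = [];
}

// In the master's main loop, queue updates instead of writing them immediately:
while (true) {
    // $pending[] = ['done', 123];  // wherever an update is produced
    if (microtime(true) - $lastFlush >= $flushEvery) {
        flushUpdates($pdo, $pending);
        $lastFlush = microtime(true);
    }
    usleep(100000); // 100 ms
}
```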

How many members will an AJAX chat be able to handle on a dedicated server before it overloads and becomes slow?

The details of the dedicated server (at the time the site starts) are as follows:
OS: Linux CentOS
CPU: Intel® Pentium 4 - 3.0 GHz
RAM: 2 GB
Storage: 2 x 120 GB hard drives
Bandwidth: 500 GB per month
The AJAX chat is custom coded. It works by writing JavaScript commands to and reading them from the database, and then evaluating them.
The chat refresh rate will probably be somewhere around 250ms, although the answers here may change the decision.
If you want to implement a browser-based chat application that will work on a relatively cheap server and be able to serve lots of users (say, 500 at a time) without crashing, your approach isn't effective.
Reasons: using the DB to send JS to clients, who then evaluate the code, isn't really safe. It's also expensive. It also means that for every line of chat you need to hit the DB at least once.
That implies A LOT of I/O for the RDBMS.
If I were you, I'd check out Node.js.
Node.js allows you to write chat servers in JavaScript. The JS itself is not executed by the browser but by the server.
It is extremely I/O-efficient. It is also simple enough to allow a non-expert programmer to create a proper chat app.
Using PHP for a chat server isn't a good idea. Polling puts too much unnecessary load on the server and doesn't scale well. In my opinion you should rethink the whole architecture.
Polling is bad for chat - use Comet (long polling) instead.
With Comet you will need a fast HTTP server which can handle thousands of concurrent connections - use NGINX.
PHP isn't a good solution for a chat app - a non-blocking, event-loop-style solution is much better - look at Tornado and its chat example.
From there you must decide whether you need to scale. If not, you can write a single-threaded server on top of Tornado and store all data in memory - of course, you will need to delete old, unneeded data. If you do need to scale, choose a fast, scalable datastore like MongoDB - you'll still have to poll the database, but it will be a single query for ALL clients connected to a single Tornado instance, not one per client.
Facebook used (or still uses) this kind of architecture (with a MySQL backend) for their needs.

Should I be concerned about performance with connections to multiple databases?

I have a PHP app that is running on an Apache server with MySQL databases.
Based on the subdomain that users access, I am connecting them to a database (sub1.domain.com connects to database_sub1 and sub2.domain.com connects to database_sub2). Right now there are 10 subdomain-database combos, but that number could potentially grow to well over 100.
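Simplified, the connection logic looks something like this (database names and credentials are placeholders):

```php
<?php
// Simplified version of the routing: the subdomain picks the database.
$parts     = explode('.', $_SERVER['HTTP_HOST']); // e.g. "sub1.domain.com"
$subdomain = $parts[0];                           // "sub1"

// Whitelist of known subdomain -> database mappings (placeholder entries).
$databases = [
    'sub1' => 'database_sub1',
    'sub2' => 'database_sub2',
    // ...
];

if (!isset($databases[$subdomain])) {
    header('HTTP/1.0 404 Not Found');
    exit('Unknown subdomain');
}

$link = mysql_connect('localhost', 'app_user', 'secret');
mysql_select_db($databases[$subdomain], $link);
```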
So, is this a bad thing?
Considering my situation, is mysql_pconnect the way to go?
Thanks, and please let me know if more info would be helpful.
Josh
Is this an app you have written?
If so, from a maintenance standpoint this may turn into a nightmare.
What happens when you alter the program and need to change the database schema?
Unless you have a sweet migration tool to help you migrate all your databases to the new schema, you may find yourself in a world of hurt.
I realize you may be too far into this project now, but if a little additional relation were added to the schema to differentiate between the domains (companies/users), you could run them all off one database with little additional overhead.
If performance really becomes a problem, you can implement clustering or another elegant solution, but at least you won't have 100+ databases to maintain.
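For illustration, the single-database variant would just carry a tenant column that every query filters on; a rough sketch with invented table and column names:

```php
<?php
// Single shared database: every tenant-owned table carries a tenant_id
// column, and every query filters on it. Schema sketch:
//
//   CREATE TABLE orders (
//       id        INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
//       tenant_id INT UNSIGNED NOT NULL,
//       total     DECIMAL(10,2) NOT NULL,
//       INDEX (tenant_id)
//   );

$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8', 'app_user', 'secret',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

// Resolve the tenant once from the subdomain, then scope every query to it.
$parts = explode('.', $_SERVER['HTTP_HOST']);
$stmt  = $pdo->prepare('SELECT id FROM tenants WHERE subdomain = ?');
$stmt->execute([$parts[0]]);
$tenantId = (int) $stmt->fetchColumn();

$orders = $pdo->prepare('SELECT id, total FROM orders WHERE tenant_id = ?');
$orders->execute([$tenantId]);
```

That way a schema change is applied once rather than 100+ times.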
It partly depends on the rest of your configuration, but as long as each transaction only involves one connection then the database client code should perform as you would expect - about the same as with a single database but with more possibilities to improve the performance of the database servers, up to the limit of the network bandwidth.
If more than one connection participates in a transaction, then you probably need an XA compliant transaction manager, and these usually carry a significant performance overhead.
No, it's not a bad thing.
It's rather a question of the total number of parallel connections. This is defined by "max_connections" in the MySQL settings (the default has been 151 since MySQL 5.1.15) and is limited by the capabilities of your platform (e.g. around 2048 on Windows, more on Linux), hardware (RAM) and system settings (mainly the limit on open files). It can become a bottleneck if you have many parallel users; the number of databases is not important.
I made a script which connects to 400+ databases in one execution (one after another, not in parallel) and found MySQL + PHP handled it very well (no significant memory leaks, no big overhead), so I assume there will be no problem with your configuration.
And finally: mysql_pconnect is generally not a good thing in web development unless connecting to the database is itself a significant overhead. You have to manage it really carefully to avoid problems with max_connections, locks, pending scripts, etc. I think pconnect has limited use (e.g. a cron job run every second, or something like that).
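If you want to see how close you actually get to those limits on your own server, a quick check like this (credentials are placeholders) shows the configured ceiling and the actual usage:

```php
<?php
// Inspect the connection ceiling and how much of it is actually in use.
$pdo = new PDO('mysql:host=localhost', 'app_user', 'secret',
    [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$max  = $pdo->query("SHOW VARIABLES LIKE 'max_connections'")->fetch(PDO::FETCH_ASSOC);
$now  = $pdo->query("SHOW STATUS LIKE 'Threads_connected'")->fetch(PDO::FETCH_ASSOC);
$peak = $pdo->query("SHOW STATUS LIKE 'Max_used_connections'")->fetch(PDO::FETCH_ASSOC);

printf("max_connections: %s, connected now: %s, peak since restart: %s\n",
    $max['Value'], $now['Value'], $peak['Value']);
```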
