I'm wondering how the "Maintain client load" test works and how to properly configure our environment (LAMP + nginx) to get the best results. Can anyone explain this test to me?
loader.io engineer here. I fully expect this question to be closed by noon, but I'll take a stab at explaining it anyway.
"maintain load" tests are kind of a strange beast. It may help to think of any loader test in terms of a "workload", which consists of the list of URLs you are testing.
In loader, you specify a number of clients for your test, and each client takes a copy of the workload and runs it. If the client is in "maintain load" mode, it iterates over the URLs in the workload repeatedly - maintaining its load. All the other clients do the same.
Below is a visualization of what the pattern of requests looks like, taken from a loader.io blog post
This has some interesting side-effects. If you configure your test to ramp up the number of clients, what we see quite often is that response times at the beginning of a test are low, so clients are iterating fast over their workload. As more clients are added, responses get slower, effectively slowing down the request rate. This can make maintain load tests difficult to reason about, and that's why I personally don't recommend starting with maintain load tests.
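To make the mechanics concrete, here is a rough sketch of what a single "maintain load" client effectively does (this is illustrative PHP, not loader.io's actual code; the URL list and duration are made-up assumptions):

<?php
// Illustrative sketch of one "maintain load" client: it loops over its copy
// of the workload for the whole test duration. URLs and duration are made up.
$workload = ['https://example.com/', 'https://example.com/about'];
$duration = 60; // seconds
$start    = microtime(true);

while (microtime(true) - $start < $duration) {
    foreach ($workload as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_exec($ch); // the slower each response, the fewer requests per second this client makes
        curl_close($ch);
    }
}

Notice that the request rate is an outcome, not an input: it falls out of how quickly your server answers.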
As far as configuring your stack for best results, it depends on what "best results" means for you and what you are even doing with your stack. There is no silver bullet. If you're serving a static website then cache the heck out of it for best performance. If you have a complex app making database queries on every request and rendering things - profile your code, db queries, and everything else to tune your performance.
Define some requirements and set some performance goals - e.g. do you expect to serve a hundred page views within an hour? A minute? Figure out what those requirements are and then go ahead and test against them.
Once you have your requirements, you can use loader.io and/or other load testing tools in a much more meaningful way. If your current performance doesn't match your requirements and goals, you can use these tools to check your progress. Start with small tests that your servers handle easily and increase the load until things break. Then optimize your code/database queries/etc and test again to see how much you've improved.
Related
If I have a website where users log in and log out, and each user has 4 session variables being used, how will this affect my site?
Say I have 100,000 active members; then that would effectively be 400,000 session variables in use at the same time. Will this affect the loading of my site? I understand PHP has a memory limit but do not fully understand it.
Thanks
4 variables per user is nothing, but my suggestion would be on a different level: focus on what's causing actual bottlenecks in your web site. This issue is probably not relevant right now, and it's really easy to switch away from in the future if it's what slows you down. (And it won't be.)
I bet you have much more important stuff to work on than worrying about another variable, and by the time you get to that number of active users, your whole structure will probably have changed, including servers and solutions. Good luck!
To answer briefly: yes, it will affect the loading of your site if you have 100k users. But it won't be only because of sessions; they will just be one part of the bottleneck.
It's easy to calculate the possible memory consumption, and based on that you should decide how to scale your site.
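As a rough illustration, a back-of-the-envelope estimate might look like the sketch below; every number in it is an assumption you should replace with your own measurements:

<?php
// Back-of-the-envelope session storage estimate; all numbers are assumptions.
$concurrentUsers  = 100000;
$varsPerSession   = 4;
$bytesPerVariable = 256; // assumed average serialized size of one session variable
$overheadPerUser  = 512; // assumed per-session storage overhead

$totalBytes = $concurrentUsers * ($varsPerSession * $bytesPerVariable + $overheadPerUser);
printf("~%.1f MB of session data\n", $totalBytes / (1024 * 1024)); // ~146.5 MB here

Also keep in mind that with PHP's default file-based session handler this data lives on disk, not inside the PHP memory limit; only the sessions touched by a given request are loaded into memory.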
Scaling options are practically endless (well, strictly speaking there's a finite number of ways to scale a program, but there are still many to choose from).
If it happens that you attract that many users, chances are you will be able to afford professional help when it comes to scaling your site.
If you're wondering what those options are, it might be best to ask a question with specific things in mind - the things that trouble you when determining when and where the bottlenecks might be.
Also, you don't deploy a site with that many active users on a single server using the default PHP configuration, especially the default session settings.
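For example, to let several web servers share session state you can point PHP's session handler at memcached instead of local files. A minimal sketch (it assumes the memcached PECL extension is installed; the host name is a placeholder):

<?php
// Sketch: share sessions across web nodes via memcached.
// Assumes the "memcached" PECL extension; host/port are placeholders.
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', 'sessions.internal:11211');
session_start();

The same settings can of course go into php.ini instead of being set at runtime.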
The answer, of course, is "it depends".
STRONG SUGGESTION:
1) Establish a "performance baseline"
2) Consider resources like CPU, memory, network and disk
3) Consider OS, Web server, Application and database
4) Run stress tests, and compare your performance under "normal loads" and "high loads" (a minimal baseline-measurement sketch follows this list)
5) Identify the bottlenecks, and deal with them appropriately
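As a starting point for item 1, even a crude script that times a batch of requests against your own pages gives you a baseline number to compare future tests against. A minimal sketch (the URL and request count are placeholders):

<?php
// Minimal baseline sketch: time N sequential requests and report avg/max.
// URL and request count are placeholders -- point it at your own pages.
$url   = 'http://localhost/';
$n     = 100;
$times = [];

for ($i = 0; $i < $n; $i++) {
    $t0 = microtime(true);
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_exec($ch);
    curl_close($ch);
    $times[] = microtime(true) - $t0;
}

printf("avg %.1f ms, max %.1f ms over %d requests\n",
       array_sum($times) / $n * 1000, max($times) * 1000, $n);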
Assuming you're running Linux/Apache/MySql (a total guess on my part - you didn't say), here's an excellent three-part article that might get you started in the right direction:
Tuning LAMP systems, Part 1: Understanding the LAMP architecture
But trust me: worrying about minutiae like whether you have 5 session variables instead of 4, instead of trying to gather solid baseline statistics, is NOT going to help you scale to 100,000 users :)!
I am writing a Ruby on Rails application and one of the most important features of the website is live voting. We fully expect that we will get 10k voting requests in as little as 1 minute. Along with other requests, that means we could be getting a ton of requests.
My initial idea is to set up the server to use Apache + Phusion; however, for the voting specifically I'm thinking about writing a PHP script on the side that reads/writes the information in memcached. The data only needs to persist for about 15 minutes, so writing to the database 10,000 times in 1 minute seems pointless. We also need to record the IP of each user so they don't vote twice, which makes the memcached approach a bit more complicated.
If anyone has any suggestions or ideas to make this work as best as possible, please help.
If you're architecting an app for this kind of massive influx, you're going to need to strip down the essential components of it to the absolute minimum.
Using a full Rails stack for that kind of intensity isn't really practical, nor necessary. It would be much better to build a very thin Rack layer that handles the voting by making direct DB calls, skipping even an ORM, basically being a wrapper around an INSERT statement. This is something Sinatra and Sequel, which serves as an efficient query generator, might help with.
You should also be sure to tune your database properly, plus run many load tests against it to be sure it performs as expected, with a healthy margin for higher loading.
Making 10,000 DB calls in a minute isn't a big deal; each call will take only a fraction of a millisecond on a properly tuned stack. Memcached could offer higher performance, especially if the results are not intended to be permanent. Memcached has an atomic increment operator, which is exactly what you're looking for when simply tabulating votes. Redis is also a very capable temporary store.
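Since the asker mentioned a PHP script plus memcached, here is a rough sketch of that route using the PHP memcached extension: add() is atomic and fails if the key already exists (one vote per IP), and increment() bumps the counter atomically. The key names and the 15-minute TTL are my assumptions:

<?php
// Sketch of memcached-backed vote tallying with the PHP "memcached" extension.
// Key names and the 900-second TTL are assumptions based on the question.
$m = new Memcached();
$m->addServer('127.0.0.1', 11211);

function castVote(Memcached $m, $contestId, $entryId, $ip)
{
    $ttl = 900; // the data only needs to live ~15 minutes

    // add() fails if the key already exists -> one vote per IP
    if (!$m->add("voted:$contestId:$ip", 1, $ttl)) {
        return false; // this IP has already voted
    }

    // make sure the counter exists, then bump it atomically
    $m->add("votes:$contestId:$entryId", 0, $ttl);
    $m->increment("votes:$contestId:$entryId");
    return true;
}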
Another idea is to scrap the DB altogether and write a persistent server process that speaks a simple JSON-based protocol. Eventmachine is great for throwing these things together if you're committed to Ruby, as is NodeJS if you're willing to build out a specialized tally server in JavaScript.
10,000 operations in a minute is easily achievable even on modest hardware using a specialized server process without the overhead of a full DB stack.
You will just have to be sure that your scope is very well defined so you can test and heavily abuse your implementation prior to deploying it.
Since what you're describing is, at the very core, something equivalent to a hash lookup, the essential code is simply:
# contests is assumed to be an in-memory hash of contest state, e.g.
# contests = Hash.new { |h, id| h[id] = { voted: {}, votes: Hash.new(0) } }
contest = contests[contest_id]

unless contest[:voted][ip]          # one vote per IP
  contest[:voted][ip] = true        # remember that this IP has voted
  contest[:votes][entry_id] += 1    # tally the vote for this entry
end
Running this several hundred thousand times in a second is entirely practical, so the only overhead would be wrapping a JSON layer around it.
This is a very general question from a newbie thinking about web application scalability. I am hosting my PHP-based web application on a single Microsoft IIS server. How do I determine the maximum number of connections that an IIS server can support without affecting performance? Also, the main performance criterion for a web application in this situation would be the HTTP response time, correct? I have a MySQL database that does some expensive joins. So, my question really is: how do I figure out the maximum number of connections the server can handle? And how do I speed up database performance? I'm looking for general recommendations.
Ufff, this is a really generic question.
Regarding the maximum number of requests the server can serve: try using some tool to stress it. I would recommend JMeter.
Regarding scalability:
Use optimized indexes
Cache as much as you can: scripts, pages, images, etc.
Optimize your site
But remember that premature optimization is the root of all evil and can cost you more than you think
To stress test you can use: http://support.microsoft.com/kb/231282/en-us
As for the database, the only way (if you want to stick with one server) is to do fewer queries per request and maybe use materialized views (be aware of table updates at that point).
The best option, of course, is to cache your HTML, so that when users request your pages you don't even need the DB connection; you just send the cached HTML.
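A minimal version of that idea might look like the sketch below; the cache path, TTL, and the render_page() function are placeholders, not a drop-in implementation:

<?php
// Minimal full-page cache sketch: serve cached HTML when fresh, otherwise
// render, store, and serve. Assumes a writable cache/ directory exists.
$cacheFile = __DIR__ . '/cache/' . md5($_SERVER['REQUEST_URI']) . '.html';
$ttl = 300; // seconds

if (is_file($cacheFile) && time() - filemtime($cacheFile) < $ttl) {
    readfile($cacheFile); // cache hit: no DB connection needed at all
    exit;
}

ob_start();
render_page(); // hypothetical function that queries the DB and prints the HTML
$html = ob_get_clean();

file_put_contents($cacheFile, $html);
echo $html;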
First you need to understand what performance is acceptable for your user experience. That usually breaks down to the response time of the server. If your maximum response time cannot exceed 1 second for users to have a good experience, then you figure out how many queries per second the server can handle, end to end, without violating the 1-second response time for 99% of the queries. Once it violates that, it's time to add more capacity in the form of servers.
I constantly read on the Internet how it's important to correctly architect my PHP applications so that they can scale.
I have built a simple/small CMS that is written in PHP (think of Wordpress, but waaaay simpler).
I essentially have URLs like such: http://example.com/?page_id=X where X is the id in my MySQL database that has the page content.
How can I configure my application to be load balanced when I'm simply performing PHP read activities?
Would something like Nginx as the front door, routing traffic to multiple nodes running my same code to handle example.com/?page_id=X, be enough to "load balance" my site?
Obviously, MySQL is not being load balanced in this situation, though for simplicity that makes it out of scope for this question.
These are some well known techniques for scaling such an app.
Reduce DB hits
Most often the bottleneck will be your DB, so cache recent pages to reduce DB activity, perhaps in something like memcached.
Design your schema such that it is partitionable.
In the simplest case, separate your data into logical partitions, and store each partition in a separate mysql DB. Craigslist, for example, partitions data by city, and in some cases, by section within that. In your case, you could partition by Id quite simply.
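For instance, a very simple partition-by-id scheme could pick the database from the page id; the host names, credentials, and shard count below are made up for illustration:

<?php
// Sketch: choose a MySQL shard from the page id. Hosts, credentials, and
// shard count are placeholders; a real setup also needs a re-sharding plan.
$shards = [
    0 => 'mysql:host=db0.internal;dbname=cms',
    1 => 'mysql:host=db1.internal;dbname=cms',
];

function connectionForPage(array $shards, int $pageId): PDO
{
    $dsn = $shards[$pageId % count($shards)];
    return new PDO($dsn, 'cms_user', 'secret'); // placeholder credentials
}

$pageId = (int) ($_GET['page_id'] ?? 0);
$db     = connectionForPage($shards, $pageId);
$stmt   = $db->prepare('SELECT content FROM pages WHERE id = ?');
$stmt->execute([$pageId]);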
Manage PHP sessions
Putting nginx in front of a PHP website will not work if you use sessions. Load balancing PHP does have issues, because sessions are persisted in local storage by default. Therefore you need to do session management explicitly. The traditional solution is to use memcached to store session data and look it up by the session cookie.
Don't optimize prematurely.
Focus on getting your application out the door so that the next order of magnitude of users gets an optimal experience.
Note: Your main potential pain points are discussed here on SO
No, it is not at all important to scale your application if you don't need to.
My view on this is:
Make it work
Make sure it works correctly - testability, robustness
Make it work efficiently enough to be cost effective to run
Then, if you have so much traffic that your system cannot handle it, AND you've already thrown all the hardware that (sensible) money can buy at it, then you need to scale. Not sooner.
Yes, it is relatively easy to scale read workloads, because you can simply perform reads against read-only database replicas. The challenge is to scale write workloads.
A lot of sites have few writes, even if they're really busy.
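In PHP terms, read/write splitting can be as simple as keeping two connections and sending SELECTs to the replica; the host names, credentials, and queries below are placeholders, and replication lag is the detail to watch:

<?php
// Sketch of read/write splitting with two PDO connections.
// Host names, credentials, and the queries are placeholders.
$primary = new PDO('mysql:host=db-primary.internal;dbname=app', 'user', 'secret');
$replica = new PDO('mysql:host=db-replica.internal;dbname=app', 'user', 'secret');

// Reads go to the read-only replica...
$pages = $replica->query('SELECT id, title FROM pages')->fetchAll();

// ...writes must go to the primary, and may not be visible on the
// replica immediately because of replication lag.
$stmt = $primary->prepare('UPDATE pages SET title = ? WHERE id = ?');
$stmt->execute(['New title', 42]);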
The correct approach is to use some kind of load balancer such as:
http://www.softwareprojects.com/resources/programming/t-how-to-install-and-configure-haproxy-as-an-http-loa-1752.html
What this does is forward a given user session only to a certain server, hence you don't have to worry about sessions and where they are stored at all. What you do have to worry about is how to distribute the filesystem if the 2 servers are running on two different machines, especially if you make heavy use of the filesystem. Hope the article above helps...
I'm developing a web app that will access and work with large amounts of data in a MySQL database, something like a dictionary/thesaurus. I need to test the performance of the DB as its size increases, so I know how slow each request will be in the future.
Any ideas? Like are there specific tools to check DB performance for a particular query, etc?
Do you know what, specifically, you're testing? Measuring "performance" is almost always useless unless you know exactly what it is you want.
For example, are you looking for low latency on query result retrieval? Perhaps high throughput on data retrieval? Perhaps you care more about fast insertions into the database, and less about fast query results? Perhaps you care about different things on different tables (in fact, that's almost always the case).
My advice will probably be ignored, but I'll say it anyway:
Don't optimise before you know what you want.
Don't optimise as you write the code.
When you do get around to optimising your database, make sure you optimise for the right things. Use realistic data - if you're testing dictionary-sized hunks of text, don't test with binary data (for example).
Anyway, I realise you were probably looking for a more technical answer, but hey...
You can use Maatkit's query profiler to measure the impact of data volume on MySQL performance.
And generatedata.com to generate the data you need to test your app.
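A crude but effective way to see how a query behaves as the table grows is to load batches of generated rows and re-run the same query after each batch; in the sketch below the table, columns, and query are illustrative assumptions:

<?php
// Sketch: grow a test table in steps and time the same query at each size.
// Table/column names, the query, and the batch sizes are assumptions.
$db = new PDO('mysql:host=127.0.0.1;dbname=test', 'user', 'secret');

for ($batch = 1; $batch <= 10; $batch++) {
    // insert 10k generated rows (a generatedata.com CSV could be loaded instead)
    $ins = $db->prepare('INSERT INTO words (word, definition) VALUES (?, ?)');
    for ($i = 0; $i < 10000; $i++) {
        $ins->execute([bin2hex(random_bytes(5)), str_repeat('x', 200)]);
    }

    $t0 = microtime(true);
    $db->query("SELECT * FROM words WHERE word LIKE 'ab%' LIMIT 50")->fetchAll();
    printf("%d rows: query took %.2f ms\n", $batch * 10000, (microtime(true) - $t0) * 1000);
}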
You can also test your application's responsiveness using HTTP testing tools like:
Apache's bundled 'ab' tool (Apache Bench)
JMeter
Selenium
A good tool to use is Apache's ab, which comes standard with the Apache httpd server. This tool can make multiple connections to a web server and benchmark its performance. While Firebug is a good way to see in what order things load, how long each item takes to load, etc., you're only seeing one user's experience. Against an unloaded test server, that information can only take you so far. ab simulates multiple users connecting and will give a more realistic picture of how a particular page handles concurrent users.
Which leads me to a limitation of ab: it only tests one URL. I often get around this by whipping up a simple test webpage that makes a random selection from a list of predefined URLs that I want to test - for example: the login page, a search result, posting a comment, and so on. ab hits the test page, and the test page simply calls one of the test URLs (possibly with a randomized parameter) and returns that page. In this manner, you get a better idea of how your whole site handles concurrent users.
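A sketch of such a test page is below; the URLs and the randomized parameters are placeholders. ab hits test.php, and test.php fetches one of the real URLs server-side and returns the result, so ab's timing covers the downstream request:

<?php
// test.php -- pick a random URL from a predefined list, fetch it, and return
// the result so ab's timing includes the downstream request.
// The URLs and randomized parameters are placeholders.
$urls = [
    'http://localhost/login.php',
    'http://localhost/search.php?q=' . rand(1, 1000),
    'http://localhost/comment.php?post=' . rand(1, 50),
];

$target = $urls[array_rand($urls)];

$ch = curl_init($target);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
echo curl_exec($ch);
curl_close($ch);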
P.S. Your OS question is unanswerable; you'll have to figure that out yourself based on how your application is written, the layout of your data, the configuration of the web server and the database server, etc.