I have a theoretical question I'm hoping you can help me with.
Alright, I have a home-based server on a 300 Mbps down / 30 Mbps up Internet connection (the best I can get where I am).
I have a static IP and permission from my ISP to host said server.
So everything's in place, right? No, well, maybe.
Since my server isn't top of the line, I worry about one single thing once I release my app (using my server as a back-end).
Say I have 20 thousand active users daily, each active for about an hour per day. What's the likelihood of hundreds of people submitting posts (think Twitter posts, text only) within the same 0-300 millisecond window?
What I mean is: think of MySQL running queries. Would 500 people each posting 140 characters of text drop the system to a crawl, even with (in a perfect world) perfectly designed queries? And what would the likelihood be of 500 people submitting within the same 0-300 ms? To me it doesn't seem very likely until you get into hundreds of thousands of people.
In other words, it's a question of timing:
20K active users daily.
What's the estimated theoretical number of queries per second if they are all active at one given time? Let's say the average time each person spends per post ranges from 5-90 seconds (reading, then posting).
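A rough back-of-envelope sketch of that (the concurrency and posting-pace numbers below are illustrative assumptions, not measurements):

<?php
// Back-of-envelope: expected posts landing in any given 300 ms window.
function postsPer300ms(int $concurrentUsers, float $secondsPerPost): float {
    $postsPerSecond = $concurrentUsers / $secondsPerPost;   // average write rate
    return $postsPerSecond * 0.3;                           // scale to a 300 ms window
}

printf("%.0f\n", postsPer300ms(20000, 90));   // all 20K online, slow pace: ~67 per 300 ms
printf("%.0f\n", postsPer300ms(20000, 5));    // all 20K online, frantic pace: ~1200 per 300 ms
printf("%.0f\n", postsPer300ms(2000, 47.5));  // 10% online at once, mid-range pace: ~13 per 300 ms

So the answer swings from about a dozen to over a thousand posts per 300 ms, depending almost entirely on how many of the 20K are online at the same instant and how fast they post.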
I just don't see an issue, but something in the back of my head is making me overthink this. The reason this came up is that I found out my web host (HostGator) had a limit of around 25 max connections to MySQL, which freaked me out and made me wonder: what's the likelihood of 25 people hitting that limit at any given time on a small app (20K active users)?
I haven't set any max connections on my server yet; I will eventually (for obvious reasons). I just want to make sure I set it up optimally, and to do that I need to at least estimate how many queries per second I can expect on average for a given number of people, e.g. 20K.
To me 20K users isn't that many, but at the same time I'm not very good at averaging things out in this kind of situation, because it's one big unknown. How can I truly predict something like this without being in a live production environment? But being in a live production environment and having it smack me right back in the face, ruining my credibility with end users, is the worst that can happen.
Server specs (to be upgraded when needed):
Windows 10
8 GB of RAM
Intel Core i3
Apache + PHP + MySQL (not XAMPP or WAMP - each set up individually)
1 TB SSD
Nothing else running on it - dedicated solely to this purpose.
Yes, I know, it's not very strong, but it's all I have and can afford at the moment.
There's no way in the world you'll catch me spending $174 a month for a decent dedicated server when I can save up a few months and just outright buy a new one later.
So this is just a temporary solution until I can afford something better.
Thanks guys/gals.
First off, I would drop the Windows 10 OS and go with Ubuntu 16.
Be realistic when buying a server, as some companies advertise servers in the low hundreds, and after you really get things built out correctly, you are in the thousands of dollars.
Think of it like you are getting a 'chassis only'.
Second, don't try to emulate what good IP companies do for pennies on the dollar - you'll go broke and not have enough time to focus on your core application.
Once it takes off and starts generating revenue, then put your first dollars into infrastructure improvements.
Take it from someone who has gone down this path before ;)
Good luck!
Related
We have a PHP/MySQL/Apache web app which holds a rating system. From time to time we do full recalculations of the ratings, which means about 500 iterations of calculation, each taking 4-6 minutes and depending on the results of the previous iteration (i.e., parallel solutions are not possible). The time is mostly taken by MySQL queries and by loops over each rated player (about 100,000 players per iteration, and the complex logic linking players together rules out parallelization there as well).
The problem is: when we start the recalculation in the plain old way (one PHP POST request), it dies about 30-40 minutes after starting (which gives only 10-15 completed iterations). The question "why does it die?" and other optimization issues are kind of out of scope right now - the logic is too complex and needs refactoring, maybe even a rewrite in another language/infrastructure, but we have no resources (time/people) for that now. We just need to make things work in the least annoying way.
So, the question: what is the best way to organize such a recalculation, if possible, so that the site admin can start it with one click, forget about it for a day, and it still gets done?
I found a few suggestions on the web for similar problems, but no silver bullet:
move the iterations (and therefore the timeouts) from the server to the client by using AJAX requests instead of a plain old PHP request - this could freeze the browser (and AJAX's async nature is a poor fit for sequential iterations);
have PHP start a backend service which does the work (like advised here) - it sounds like a lot of work and I have no idea how to implement it (one possible shape is sketched below).
So, I humbly ask for any advice possible in such a situation.
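One common shape for that second option, as a minimal sketch (the file names, the recalc_progress table, the placeholder credentials, and run_single_iteration() are mine): the one-click admin endpoint launches a detached CLI worker, and the worker, which has no execution time limit by default, records its progress as it goes.

<?php
// start_recalc.php - hit by the admin's single click.
// Launches the worker as a detached background process, so this web request
// returns immediately and PHP's max_execution_time no longer matters.
exec('nohup php /var/www/scripts/recalc_worker.php > /var/log/recalc.log 2>&1 &');
echo 'Recalculation started; check the recalc_progress table for status.';

<?php
// recalc_worker.php - run from the CLI, where max_execution_time is unlimited by default.
set_time_limit(0);
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');   // placeholder credentials

for ($i = 1; $i <= 500; $i++) {
    run_single_iteration($db, $i);   // the existing 4-6 minute iteration, unchanged
    // Record progress so the admin can check on it at any time.
    $db->prepare('REPLACE INTO recalc_progress (id, last_iteration, updated_at) VALUES (1, ?, NOW())')
       ->execute([$i]);
}

function run_single_iteration(PDO $db, int $i): void {
    /* existing recalculation logic for iteration $i goes here */
}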
I currently have a PHP script that collects similar data from various sources; each data source is scraped and parsed every 120 seconds. At the moment I have 20 data sources, but I expect to integrate another 100 over the coming weeks.
Currently each data source is scraped in its own thread: one main PHP script executes other scripts to do the scraping work. This approach allows all sources to be scraped at the same time, but it also puts a strain on the server and creates a bottleneck at the database (MySQL).
I'm looking for a way to scale my current application. Could I do something like this with AWS? Perhaps each of these scraping scripts could run in its own small server instance, with each instance created automatically by a "main" instance and then killed once the script has finished. I don't have any experience with AWS, so I'm not entirely sure whether this is possible, or maybe it's just a bad idea.
The main question here is: How can I scale my current scraping script to allow for many new data sources? I'm interested in any solution even if I need to buy additional services.
You need a queueing system
You're describing a sort of worker / queue pattern, with your main server performing both the en-queueing and the worker execution, which of course is going to be a huge strain on your server.
First and foremost, your workers need to be asynchronous: you shouldn't be waiting for something that may or may not come back. You really should take a look at ZeroMQ which, I might add, has some of the best documentation on the planet. If you're willing to learn, take a look at how it works and follow some tutorials - there are plenty out there. Have the queue, hosted on your main server, take on new jobs and dispatch them elsewhere (i.e. to other boxes).
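A minimal push/pull sketch of that idea, assuming the php-zmq extension is installed (the port, the host name, $dataSources, and the scrape_and_store() helper are placeholders):

<?php
// producer.php - runs on the main server and pushes scrape jobs onto the queue.
$context = new ZMQContext();
$push = $context->getSocket(ZMQ::SOCKET_PUSH);
$push->bind('tcp://*:5557');

foreach ($dataSources as $source) {               // $dataSources: your list of sources
    $push->send(json_encode(['url' => $source]));
}

<?php
// worker.php - runs on each worker box, pulls jobs and scrapes them.
$context = new ZMQContext();
$pull = $context->getSocket(ZMQ::SOCKET_PULL);
$pull->connect('tcp://main-server:5557');         // placeholder host name

while (true) {
    $job = json_decode($pull->recv(), true);
    scrape_and_store($job['url']);                // your existing scrape + DB insert
}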
Horizontal Scaling
You can create some sort of Instance Controller to handle AWS instances. You really just need to sit down and think about your logic (when do I want this many boxes, when do I want to shut them down?). The API is pretty simple to use once you get your head around it. Here's some code I wrote a while back to wrap Amazon's SDK for PHP. I'm not sure if it's working 100% with the latest version (I used it around a year ago), but the concepts are there - you have simple methods like startBox() or stopBox() that you call from your queue, and have your box automatically start doing its stuff once it starts up.
You could use the t1.micro instances from Amazon (pricing is on their site), which are covered by the free tier up to a certain limit.
Get it working properly, with a loop on your main server deciding how many boxes you need working at any one time given certain circumstances (no. of jobs in your database table, for example), and you'll have theoretically infinite scaling. Here's how I did it for my code (a rough sketch of this logic follows the tiers below):
Tier 1: > 5 jobs, < 10 jobs = 1 box
Tier 2: > 10 jobs, < 20 jobs = 2 boxes
etc. etc.
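A rough sketch of that tier logic (the jobs table, the $db connection, countRunningBoxes(), and the startBox()/stopBox() wrapper methods mentioned above are placeholders):

<?php
// scaler.php - run in a loop (or from cron) on the main server.
// Maps the number of pending jobs to a desired number of worker boxes.
function desiredBoxCount(int $pendingJobs): int {
    if ($pendingJobs <= 5)  return 0;            // nothing worth spinning up for
    if ($pendingJobs <= 10) return 1;            // Tier 1
    if ($pendingJobs <= 20) return 2;            // Tier 2
    return (int) ceil($pendingJobs / 10);        // etc. etc.
}

$pending = (int) $db->query("SELECT COUNT(*) FROM jobs WHERE status = 'pending'")->fetchColumn();
$wanted  = desiredBoxCount($pending);
$running = countRunningBoxes();                  // ask the instance controller

for ($i = $running; $i < $wanted; $i++)  startBox();
for ($i = $wanted;  $i < $running; $i++) stopBox();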
Advice
Log everything. Log every box coming up, every box coming down. Calculate your costs in your code and store them, maybe in a database, or log them, so you know exactly how much you're spending - you don't want things to get out of hand.
Make sure you open up your DB ports so your instances can talk to your DB to say when a job is done or anything else you need to pass between your "master" box and your "slave" boxes.
Also, if you're paying for web servers, you'll be billed by the hour with AWS, so note the time you start each box and, when it's time to shut one down, only actually shut it down once 55 minutes or so have passed - you might as well get those extra minutes for what you're paying.
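A tiny sketch of that shutdown check (assuming you stored each box's launch timestamp when you started it; $noMoreJobs, $box, and stopBox() are placeholders as above):

<?php
// Only stop a box near the end of the hour you have already paid for.
function shouldStopNow(int $launchTimestamp): bool {
    $minutesIntoCurrentHour = (int) floor((time() - $launchTimestamp) / 60) % 60;
    return $minutesIntoCurrentHour >= 55;
}

if ($noMoreJobs && shouldStopNow($box['launched_at'])) {
    stopBox($box['id']);
}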
I can't really think of anything else. Do your research, figure out the best way to build a queueing system, and build it with scalability in mind (it can react and change to numbers that you control).
Split your scraping up across multiple instances (say 5 per server) and have them talk to a central DB like Amazon RDS.
No need to kill the instances after you have finished scraping if you're doing this every 120 seconds.
Hey,
I currently have over 300 qps on my MySQL server. There are roughly 12,000 UIP a day and no cron jobs, on fairly heavy PHP websites. I know it's pretty hard to judge whether it's OK without seeing the website, but do you think that it is total overkill?
What is your experience? If I optimize the scripts, do you think I would be able to get the qps substantially lower? I mean, if I only get down to 200 qps, that won't help me much. Thanks.
currently have over 300 qps on my MySQL
Your website can run on a Via C3 - good for you!
do you think that it is total overkill?
That depends. If it's:
1 page/s doing 300 queries, yeah, you've got a problem.
30-60 pages/s doing 5-10 queries each, then you've got no problem.
12,000 UIP a day
We had a site with 50-60,000, and it ran on a Via C3 (your toaster is a datacenter compared to that crap server), but the torrent tracker used about 50% of the CPU, so only half of that tiny CPU was available to the website, which never seemed to use any significant fraction of it anyway.
What is your experience?
If you want to know whether you are going to kill your server, or whether your website is optimized, the following have close to zero information content:
UIP (unless you get Facebook-like numbers)
queries/s (unless you're above 10,000 - I've seen a cheap dual core blast 20,000 qps using Postgres)
But the following are extremely important (a rough way to measure the per-page numbers is sketched after this list):
dynamic pages/second served
number of queries per page
time duration of each query (ALL OF THEM)
server architecture
vmstat, iostat outputs
database logs
webserver logs
database's own slow_query, lock, and IO logs and statistics
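For the per-page items (number of queries per page and per-query time), a crude way to get real numbers is to route every query through a small timing wrapper - a minimal sketch, assuming mysqli and helper names of my own choosing:

<?php
// timed_query.php - include this and use timed_query() instead of calling $db->query() directly.
$GLOBALS['query_log'] = [];

function timed_query(mysqli $db, string $sql) {
    $start  = microtime(true);
    $result = $db->query($sql);
    $GLOBALS['query_log'][] = ['sql' => $sql, 'ms' => (microtime(true) - $start) * 1000];
    return $result;
}

// At the end of each request, log how many queries the page ran and how long they took.
register_shutdown_function(function () {
    $count = count($GLOBALS['query_log']);
    $total = array_sum(array_column($GLOBALS['query_log'], 'ms'));
    error_log(sprintf('%s: %d queries, %.1f ms in MySQL', $_SERVER['REQUEST_URI'] ?? 'cli', $count, $total));
});

MySQL's own slow query log (already in the list above) covers the per-query timing from the database's end.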
You're not focusing on the right metric...
I think you are missing the point here. Whether 300+ qps is too much depends heavily on the website itself, on the users per second visiting it, on the background scripts running concurrently, and so on. You should be able to test and/or compute an average query throughput for your server to understand whether 300+ qps is fair or not. And, by the way, it depends on what these queries are asking for (a couple of fields, or large amounts of binary data?).
Surely, if you optimize the scripts and/or reduce the number of queries, you can lower the load on the database, but without specific data we cannot properly answer your question. To lower a 300+ qps load to under 200 qps, you would have to cut your total queries by at least a third.
Optimizing a script can do wonders. I've taken scripts that took 3 minutes down to 0.5 seconds simply by optimizing how the calls were made to the server. That is an extreme case, of course. I would focus mainly on minimizing the number of queries by combining them where possible. Maybe get creative with your queries so each hit brings back more information.
And going from 300 to 200 qps is actually a huge improvement. That's a 33% drop in query traffic to your server... that's significant.
You should not focus on the scripts; focus on the server.
You are not saying whether these 300+ queries are causing issues. If your server is not dying, there is no reason to lower the amount. And if you have already done the optimization, focus on the server: upgrade it or buy more servers.
I'm currently rewriting my site using my own framework (it's very simple and does exactly what I need; I have no need for something like Zend or CakePHP). I've done a lot of work making sure everything is cached properly - caching pages to files to avoid SQL queries and generally limiting the number of SQL queries.
Overall it looks like it's very speedy. The average time taken for the front page (taken over 100 times) is 0.046152 microseconds.
But one thing I'm not sure about is whether I've done enough to reduce PHP memory usage. The only time I've ever encountered problems with it is when uploading large files.
Using memory_get_peak_usage(TRUE), which I THINK returns the highest amount of memory used while the script has been running, the average (taken over 100 times) is 1,572,864 bytes (about 1.5 MB).
Is that good?
I realise you don't know what it is I'm doing (it's rather simple: get the 10 latest articles, the comment count for each, the user controls, popular tags in the sidebar, etc.). But would you be at all worried about a script using that sort of memory getting hit 50,000 times a day? Or once every second at peak times?
I realise that this is a very open-ended question. Hopefully you can understand that it's a bit of a stab in the dark, and I'm really just looking for some reassurance that it's not going to die horribly come re-launch day.
EDIT: Just a mini experiment I did for myself: I downloaded and installed WordPress, and a default installation with no extra add-ons, just one user and one post, used 10.5 megabytes of memory, or "11010048 bytes". Quite pleased with my 1.5 MB now.
Memory usage values can vary heavily and are subject to fluctuation, but as you already say in your update, a regular WordPress instance is much, much fatter than that. I have had great trouble getting the WordPress backend running with a memory_limit of sixteen megabytes - let alone once plug-ins come into play. So from that, I'd say a peak of 1.5 megabytes while performing normal tasks is quite okay.
Generation time depends heavily on the hardware your site runs on, obviously. However, a generation time of 0.046152 seconds (I assume you mean seconds here) sounds perfectly okay to me under normal circumstances.
It is a subjective question. PHP has a lot of overhead, and when you call the function with TRUE, that overhead is included. You'll see what I mean if you call the function in a simple Hello World script. Also keep in mind that results can differ greatly depending on whether PHP runs as an Apache module or via FastCGI.
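Something like this gives you the baseline (memory_get_peak_usage(true) reports the memory actually allocated from the system, PHP's own overhead included):

<?php
// hello.php - a bare script, to see how much of your peak is just PHP itself.
echo "Hello World\n";
printf("Peak memory (real): %d bytes (%.2f MB)\n",
       memory_get_peak_usage(true),
       memory_get_peak_usage(true) / 1048576);

Whatever this bare script reports is the floor that your 1.5 MB figure sits on.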
Unfortunately, no one can provide assurances. There will always be unforeseen variables that can bring down a site. Perform load testing. Use a code profiler to narrow down the location of any bottlenecks and see if there are ways to make those code blocks more efficient.
Encyclopaedia Britannica thought they were prepared when they launched their ad-supported encyclopedia ten years ago. The developers didn't know they would be announcing it on Good Morning America the day of the launch. The whole thing came crashing down for days.
As long as your systems aren't swapping, your memory usage is reasonable. Any additional concern is just premature optimization.
I am developing an image bank site that will hold royalty-free images for download. I want to slow down anyone using a bot or downloading too often, so I have a daily file limit and have incorporated a variable sleep into the script that delivers the files. I do that by writing the completion time of the last download to a database, then checking the elapsed time when the next download begins. If that is less than N seconds, I delay the download by M seconds, doubling M on successive infractions. That works fine until the script hits the server's execution time limit.
My hosting company confirms that sleep time counts towards execution time.
Am I being over-cautious at the development stage?
Any suggestions on how to detect and slow down users who are abusing the site without using PHP's sleep()?
I don't think you're being over-cautious, but I do think that this is a bad way to be cautious. If sleep time counts toward execution time, aren't you paying for that? It probably also counts toward CPU usage and a bunch of other cost factors too. Additionally, slowly choking off service doesn't give your user any indication that they are doing something wrong, it just makes your service seem slow.
You'd probably be better off serving a friendly message image letting the person know what's going on so they can modify their behaviour (this is particularly good given that some people might trigger it by accident while performing completely innocent activities). If they keep pulling your message image more than five or ten times, then it's definitely a script, so just stop answering their requests entirely.
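A minimal sketch of that approach with no sleep() at all (the downloads table, the thresholds, the placeholder credentials, $userId, and $requestedImagePath are all assumptions of mine): check the elapsed time, and if the client is too fast, serve the notice image - or, past a strike limit, refuse outright.

<?php
// download.php - serve the file only if this client isn't hammering the site.
$db = new PDO('mysql:host=localhost;dbname=imagebank', 'user', 'pass');   // placeholder credentials

$stmt = $db->prepare('SELECT last_download, strikes FROM downloads WHERE user_id = ?');
$stmt->execute([$userId]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);

$minInterval = 10;   // N seconds required between downloads
$maxStrikes  = 10;   // after this many infractions, stop answering entirely

if ($row && (time() - strtotime($row['last_download'])) < $minInterval) {
    $db->prepare('UPDATE downloads SET strikes = strikes + 1 WHERE user_id = ?')->execute([$userId]);
    if ($row['strikes'] + 1 > $maxStrikes) {
        http_response_code(429);              // definitely a script: refuse outright
        exit;
    }
    header('Content-Type: image/png');        // friendly "please slow down" notice image
    readfile('/var/www/static/slow-down.png');
    exit;
}

// Within limits: record this download and send the real file.
$db->prepare('REPLACE INTO downloads (user_id, last_download, strikes) VALUES (?, NOW(), 0)')
   ->execute([$userId]);
header('Content-Type: image/jpeg');
readfile($requestedImagePath);

The request either finishes immediately or not at all, so the execution time limit never comes into play.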
Why don't you simply make the user aware of what he/she is doing "wrong" and display an error?
This way, the user will know what is going on and might decide to correct the behaviour. With random delays, I would suspect something was wrong with your server and would probably just look for a competing offering that works more reliably.
Use a div with a time counter and implement this timing mechanism in JavaScript (example: www.rapidshare.com). If sleep time is counted as execution time, you have a pretty high chance of crossing the execution time limit.
If any one delay is much longer than the script execution timeout, you might want to block that user entirely for some period of time (24 hours?).
How are you deciding exactly who is aggressively downloading? The IP address is not 100% reliable, as you might have a number of people behind NAT that all appear to come from the same IP address.