So, for a simple test game, I'm working on generating user images based on their current in-game avatar. I got this idea from Club Penguin and GTA V. They both generate images of the current in-game avatar.
I created a script to simply put a few images together and print out the final image to the client. It's similar to how Club Penguin does it, I believe: http://cdn.avatar.clubpenguin.com/%7B13bcb2a5-2e21-442c-b8e4-10516be6abc6%7D/cp?size=300
As you can see, the penguin is wearing multiple clothing items. The items are each different images located at http://mobcdn.clubpenguin.com/game/items/images/paper/image/300/ (ex: http://mobcdn.clubpenguin.com/game/items/images/paper/image/300/210.png)
Anyway, I've already made the script and all, but I have a few questions.
When going to Club Penguin's or Grand Theft Auto's avatar generator, you'll notice it finishes the request so fast. Even when it's a new user, (so before it has a chance to cache the image since it hasn't been generated yet), it finishes in under a second.
How could I possibly speed up the image generation process? Right now I'm just using PHP, but I could definitely switch over to another language. I know a few others too and I'm willing to learn. Which language can provide the fastest web-image generator (it has to connect to a database first to grab the user avatar info)?
For server specs, how much RAM and all that fun stuff would be an okay amount? Right now I'm using an OVH cloud server (VPS Cloud 2) to test it and it's fine and all. But, if someone with experience with this could help, what might happen if I started getting a lot more traffic and there were people with 100+ image requests being made per client when they first log in (relationship system that shows their friend's avatar). I'll probably use Cloudflare and other caching tools to help so that most of them get cached for a maximum of 24 hours, but I can't completely rely on that.
tl;dr:
Two main questions:
What's the fastest way to generate avatars on the web (right now I'm using PHP)?
What are some good server specs for around 100+ daily unique clients (at minimum) using this server for generating these avatars?
Edit: Another question, which webserver could process more requests for this? Right now I'm using Apache for this server, but my other servers are using nginx for other API things (like logging users in, getting info, etc).
IMHO, language is not the bottleneck. PHP is fast enough for real-time small images processing. You just need right algorithm. Also, check out bytecode caching engines such as APC, or XCache, or even HHVM. They can significantly improve PHP performance.
I think, any VPS can do the job until you have >20 concurrent requests. The more clients use service at the same time the more RAM you need. You can easily determine your script memory needs and other performance info by using profiler, such as XHProf.
Nginx or Lighttpd in FastCGI mode use less RAM than Apache http server and they can handle more concurrent connections. But is's not important until you have many concurrent connections.
Yes, PHP is can do this job fast and flexible(example generate.php?size=32)
I know only German webspaces, but they have also an English interface. www.nitrado.net
Related
I have developed a Social Networking website based on WAMP(Windows, Apache, MySQL, PHP) server. I put it up on a Free Host (the hosting serves it in LAMP), and it works fine.
Now, I researched a little and found out that PHP applications are difficult to scale, and take a lot of parallel algorithms. I would like to test how many users does the webhost support for my website, and how much does my localhost.
It's a social network like any other, involving:
Posting data on main page(with images).
Chat between users (with polling every 3 seconds, consider as one in
facebook).
A question-answer forum(just as this one, or yahoo answers -
including upvotes,downvotes, points etc)
Two HTML5 server sent event loops running infinitely.
Many AJAX requests for retrieving data from MySQL database.
As for now, I haven't applied any Cache options, which I plan for later. Also, the chat application has to be switched from Polling to Websockets(HTML5).
My estimated user database goes to be a lot more than 100,000 users. That may need some serious scalability.
I need to know what kind of server may I need for the same. Should it be a dedicated server, should it be 2 of 'em or even more?
I tried this ab.exe located in bin folder of Apache, but it tests the location we provide manually. A social network needs login information to access all the data, which unfortunately limits the functionality of ab.exe only to availability of the "Welcome" page, and nothing towards the AJAX and HTML5 features which I mentioned above.
So, how exactly should I test the scalibilty of website for a hardware same as my laptop(Windows, Intel i5, 4gb Ram, 2.0 GHz), and what about scalability on Shared servers available out there, or even the dedicated ones.
Simply put: You are counting your chickens before they hatch. If you become overwhelmed by a bunch of new users, then that is what we tend to call a "good problem". If you're worried about scalability along the way, then you should look into:
Caching with Memcached or Redis.
Load balancing.
Switching from Apache to Nginx.
Providing a proper CDN.
Since you're using PHP, you should install an opcache.
There are a lot of different ways to squeeze out results. Until you need them, I'd suggest sticking by best practices (normalization, etc).
If you are worried about the capacity of your hosting company to cope with your application then the first thing to do is to contact them and discuss what capacity they have available and how scalable their environment is.
Unless you are hosting yourself, then you have virtually no control over the situation.
But if you believe the number of users is going to grow very quickly then it would be wise to enter a dialogue very early with your provider.
I have added a chat capability to a site using jquery and PHP and it seems to generally work well, but I am worried about scalability. I wonder if anyone has some advice. The key area for me I think is efficiently managing awareness of who is onine.
detail:
I haven't implemented long-polling (yet) and I'm worried about the raw number of long-running processes in PHP (Apache) getting out of control.
My code runs a periodic jquery ajax poll (4secs), that first updates the db to say I am active and sets a timestamp.
Then there is a routine that checks the timestamp for all active users and sets those outside (10mins) to inactive.
This is fairly normal from my research so far. However, I am concenred that if I allow every active user to check every other active user and then everyone update the db to kick off inactive users, then I will get duplicated effort, record locks and unnecessary server load.
So I have implemented an idea of the role of a 'sweeper'. This is just one of the online users, who inherits the role of the person doing the cleanup. Everyone else just checks whether there is a 'sweeper' in existence (DB read) and carries on. If there is no sweeper when they check, they make themselves sweeper (DB write for their own record). If there are more than one, make yourself 'non-sweeper', sleep for a random period and check again.
My theory is that this way there is only one user regularly writing updates to several records on the relevant table and everyone else is either reading or just writing to their own record.
So it works OK, but the problem possibly is that the process requires a few DB reads and may actually be less efficient than just letting everyone do the cleanup as with other research as I mentioned.
I have had over 100 concurrent users running OK so far, but the client wants to scale up to several 100's, even over 1,000 and I have no idea of knowing at this stage whether this idea is good or not.
Does anyone know whether this is a good approach or not, whether it is scalable to hundreds of active users, or whether you can recommend a different approach?
AS an aside, long polling / comet for the actual chat messages seems simple and I have found a good resource for the code, but there are several blog comments that suggest it's dangerous with PHP and apache specifically. active threads etc. Impact minimsed with usleep and session_write_close.
Again does anyone have any practical experience of a PHP long polling set up for hundreds of active users, maybe you can put my mind at ease ! Do I really ahve to look to migrate this to node.js (no experience) ?
Thank you in advance
Tony
My advice would be to do this with meteor framework, which should be pretty trivial to do, even if you are not an expert, and then simply load such chat into your PHP website via iframe.
It will be scalable, won't consume much resources, and it will get only better in the future, I presume.
And it sure beats both PHP comet solutions and jquery & ajax timeout based calls to server.
I even believe you could find on github more or less a completed solution that just requires tweaking.
But of course, do read the docs before you implement it.
If you worry about security issues, read security with meteor
Long polling is indeed pretty disastrous for PHP. PHP is always runs with limited concurrent processes, and it will scale great as long as you optimize for handling each request as quickly as possible.
Long polling and similar solutions will quickly fill up your pipe.
It could be argued that PHP is simply not the right technology for this type of stuff, with the current tools out there. If you insist on using PHP you could try ReactPHP, which is a framework for PHP quite similar to how NodeJS is built. The implication with React is also that it's expected to run as a separate deamon, and not within a webserver such as apache. I have no experience on the stability of this, and how well it scales, so you will have to do the testing yourself.
NodeJS is not hard to get into, if you know javascript well. NodeJS + socket.io make it really easy to write the chat-server and client with websockets. This would be my recommendations. When I started with this is, I had something nice up and running within several hours.
If you want to keep your application stack using PHP, you want the chat application running in your actual web app (not an iframe) and your concerned about scaling your realtime infrastructure then I'd recommend you look at a hosted service for the realtime updates, such as Pusher who I work for. This way the hosted service handles the scaling of the realtime infrastructure for you and lets you concentrate on building your application functionality.
This way you only need to handle the chat message requests - sanitize/verify the content - and then push the information through Pusher to the 1000's of connected clients.
The quick start guide is available here:
http://pusher.com/docs/quickstart
I've a full list of hosted services on my realtime web tech guide.
I am designing a file download network.
The ultimate goal is to have an API that lets you directly upload a file to a storage server (no gateway or something). The file is then stored and referenced in a database.
When the file is requsted a server that currently holds the file is selected from the database and a http redirect is done (or an API gives the currently valid direct URL).
Background jobs take care of desired replication of the file for durability/scaling purposes.
Background jobs also move files around to ensure even workload on the servers regarding disk and bandwidth usage.
There is no Raid or something at any point. Every drive ist just hung into the server as JBOD. All the replication is at application level. If one server breaks down it is just marked as broken in the database and the background jobs take care of replication from healthy sources until the desired redundancy is reached again.
The system also needs accurate stats for monitoring / balancing and maby later billing.
So I thought about the following setup.
The environment is a classic Ubuntu, Apache2, PHP, MySql LAMP stack.
An url that hits the currently storage server is generated by the API (thats no problem far. Just a classic PHP website and MySQL Database)
Now it gets interesting...
The Storage server runs Apache2 and a PHP script catches the request. URL parameters (secure token hash) are validated. IP, Timestamp and filename are validated so the request is authorized. (No database connection required, just a PHP script that knows a secret token).
The PHP script sets the file hader to use apache2 mod_xsendfile
Apache delivers the file passed by mod_xsendfile and is configured to have the access log piped to another PHP script
Apache runs mod_logio and an access log is in Combined I/O log format but additionally estended with the %D variable (The time taken to serve the request, in microseconds.) to calculate the transfer speed spot bottlenecks int he network and stuff.
The piped access log then goes to a PHP script that parses the url (first folder is a "bucked" just as google storage or amazon s3 that is assigned one client. So the client is known) counts input/output traffic and increases database fields. For performance reasons i thought about having daily fields, and updating them like traffic = traffic+X and if no row has been updated create it.
I have to mention that the server will be low budget servers with massive strage.
The can have a close look at the intended setup in this thread on serverfault.
The key data is that the systems will have Gigabit throughput (maxed out 24/7) and the fiel requests will be rather large (so no images or loads of small files that produce high load by lots of log lines and requests). Maby on average 500MB or something!
The currently planned setup runs on a cheap consumer mainboard (asus), 2 GB DDR3 RAM and a AMD Athlon II X2 220, 2x 2.80GHz tray cpu.
Of course download managers and range requests will be an issue, but I think the average size of an access will be around at least 50 megs or so.
So my questions are:
Do I have any sever bottleneck in this flow? Can you spot any problems?
Am I right in assuming that mysql_affected_rows() can be directly read from the last request and does not do another request to the mysql server?
Do you think the system with the specs given above can handle this? If not, how could I improve? I think the first bottleneck would be the CPU wouldnt it?
What do you think about it? Do you have any suggestions for improvement? Maby something completely different? I thought about using Lighttpd and the mod_secdownload module. Unfortunately it cant check IP adress and I am not so flexible. It would have the advantage that the download validation would not need a php process to fire. But as it only runs short and doesnt read and output the data itself i think this is ok. Do you? I once did download using lighttpd on old throwaway pcs and the performance was awesome. I also thought about using nginx, but I have no experience with that. But
What do you think ab out the piped logging to a script that directly updates the database? Should I rather write requests to a job queue and update them in the database in a 2nd process that can handle delays? Or not do it at all but parse the log files at night? My thought that i would like to have it as real time as possible and dont have accumulated data somehwere else than in the central database. I also don't want to keep track on jobs running on all the servers. This could be a mess to maintain. There should be a simple unit test that generates a secured link, downlads it and checks whether everything worked and the logging has taken place.
Any further suggestions? I am happy for any input you may have!
I am also planning to open soure all of this. I just think there needs to be an open source alternative to the expensive storage services as amazon s3 that is oriented on file downloads.
I really searched a lot but didnt find anything like this out there that. Of course I would re use an existing solution. Preferrably open source. Do you know of anything like that?
MogileFS, http://code.google.com/p/mogilefs/ -- this is almost exactly thing, that you want.
I'm wrapping up development on an iPhone game right now that uses data from a PHP/MYSQL database. I'm currently (pre-release) hosting all the data on a non-dedicated web hosting service, but I have no idea how that will scale once the game goes live. I'm a bit worried it will crumble to it's knees if the game is moderately popular.
The game doesn't pull in a lot of data. The average user will ping the database 3-4 times a minute just to grab a tiny amount of data (a few text strings). Everything works fine with just a couple people using it, but I don't understand MYSQL well enough to know how it will scale to potentially hundreds of simultaneous connections.
I'm skeptical to move it to a dedicated server because they're damn expensive and I have no idea if the game will tank out of the gate or if it even needs a dedicated server.
Any advice? And sorry if anything I've said here is just plain stupid. This isn't my area of expertise.
I would stay away from shared hosting for any real application like this. Dedicated servers are expensive, but you can get reliable and relatively inexpensive service from a virtual private server. I use a VPS from linode.com for all my dev work, the basic plan costs $20 a month and you can upgrade your plan very quickly (matter of minutes) if needed.
Load test it first!
You didn't indicate how the data is pulled from the MySQL database to the iPhones, so I am going to assume that it's using HTTP requests in some form. This means you can use a load testing tool, such as Apache's Benchmarking tool ab, to generate many concurrent requests to your server-side application and see if it handles the load.
If the application is just reading small amounts of data and you have indexed your tables properly you may be fine. But, as others have noted, a VPS is probably your best bet.
We are working on a website for a client that (for once) is expected to get a fair amount of traffic on day one. There are press releases, people are blogging about it, etc. I am a little concerned that we're going to fall flat on our face on day one. What are the main things you would look at to ensure (in advance without real traffic data) that you can stay standing after a big launch?
Details: This is a L/A/M/PHP stack, using an internally developed MVC framework. This is currently being launched on one server, with Apache and MySQL both on it, but we can break that up if need be.
We are already installing Memcached and doing as much PHP-level caching as we can think of. Some of the pages are rather query intensive, and we are using Smarty as our template engine. Keep in mind there is no time to change any of these major aspects--this is the just the setup. What sorts of things should we watch out for?
Measure first, and then optimize. Have you done any load testing? Where are the bottlenecks?
Once you know your bottlenecks then you can intelligently decide if you need additional database boxes or web boxes. Right now you'd just be guessing.
Also, how does your load testing results compare against your expected traffic? Can you handle two times the expected traffic? Five times? How easy/fast can you acquire and release extra hardware? I'm sure the business requirement is to not fail during launch, so make sure you have lots of capacity available. You can always release it afterwards when the load has stabilized and you know what you need.
I would at least factor out all static content. Set up another vhost somewhere else and load all the graphics, CSS, and JavaScript onto it. You can buy some extra cycles, offloading the serving of that type of content. If you're really concerned, you can signup and use a content distribution service. There are lots now similar to Akamai and quite cheap.
Another idea might be to utilize Apache mod_proxy to keep the generated page output for a specific amount of time. APC would also be quite usable... You could employ output buffering capture + the last modified time of related data on the page, and use the APC cached version. If the page isn't valid any more, you regenerate and store in APC again.
Good luck. It'll be a learning experience!
Have a beta period where you allow in as many users as you can handle, measure your site's performance, and work out bugs before you go live.
You can either control the number of users explicitly in a private beta, or a Google-style semi-public beta where each user has a number of referrals that they can offer to their friends.
To prepare or handle a spike (or peak) performance, I would first determine whether you are ready through some simple performance testing with something like jmeter.
It is easy to set up and get started and will give you early metrics whether you will handle an expected peak load.
However, given your time constraints, other steps to take would be to prepare static versions of content that will attract the highest attention (such as press releases, if your launch day). Also ensure that you are making the best use of client-side caching (one fewer request to your server can make all the difference). The web is already designed for extremely high scalability and effective use content caching is your best friend in these situations.
There is an excellent podcast on high scalability on software engineering radio on the design of the new Guardian website when things calm down.
Good luck on the launch.
I'd, personally, do a few things
1) Put in some sort of load balancer/database replication system
This means that you can have your service spread across multiple servers. Can't afford to have more than one server permanently? Use Amazon E3 - It's good for putting in place for things like this (switch on a few more servers to handle the load)
2) Code in some "High Load" restrictions
For example, if your searching is inefficient - switch it off when load gets to a certain level. "Sorry, we're busy, try again later for searching"
3) Load test... Use something like ApacheBench to stress test your servers.
4) Personally, I think that switching "Keep-Alive" Connections off is better. It may slightly reduce overall performance, but - it means that instead of having something where the site works well for a few people, and the others get timeouts, everyone gets inconsistent service, if it gets to that level
Linux Format did a good article on "How to survive a slashdotting"... which I've found useful in the past. It's available online as a PDF
Basic first steps to harden your site for high traffic.
Use a low-cost tool like https://browsermob.com/ to load-test your site. At a minimum, you should be looking at 100K unique visitors per hour. If you get an ad off of the MSN home page, look to be able to handle 500K unique visitors per hour.
Move all static graphic/video content to a CDN. Edgecast and Amazon are two excellent choices.
Use Jet Profiler to profile your MySQL server to analyze any slow performing queries. Minor changes can have huge benefits.
Look into using Varnish - it's a caching reverse proxy server (like Squid, but much more single purpose).
I've run some pretty big sites behind it, and it seemed to work really well.