I am starting to redesign my current website project and would like to replace it with an architecture that scales easily and performs well.
Our prototype runs on a PHP framework (CakePHP) + MySQL server + a 1 GHz virtual Windows Server 2008 machine (feasibility test).
This system will not go online since it would not be able to meet the requirements.
It should be able to handle LOTS of HTTP requests per second (scalability via Hadoop, maybe?); this may become a bottleneck.
It should be able to handle many simultaneous uploads per minute (media files): filesystem reads and writes plus media conversion (at the moment something like the LAME encoder; are there faster tools?) => bottleneck.
The database gets many queries per second (some kind of clustering; SQL Server clustering, or is there a cheaper product available?).
Use of a CDN for static media files.
A UNIX system?
Should file compression be used? CPU vs. bandwidth cost?
The scary parts are the HTTP requests and the media upload & conversion.
I started researching on www.highscalability.com for some good ebooks, and would appreciate it if some pros out here could give me advice on this, or helpful links.
You could take a look at these books:
High Performance Web Sites: Essential Knowledge for Front-End Engineers (Steve Souders)
Even Faster Web Sites: Performance Best Practices for Web Developers (Steve Souders)
Building Scalable Web Sites (Cal Henderson)
Use Linux as the operating system.
Nginx as the web server (several instances can be load-balanced); Nginx can handle a lot of HTTP requests!
Redis for caching.
Solr or Sphinx for searching.
You could use MongoDB to store the files, or take a look at MogileFS.
For media conversions you could try node.js:
http://www.benfarrell.com/2012/06/14/morph-your-media-with-node-js/.
There are a few conversion libraries on GitHub:
http://github.com/TooTallNate/node-lame
Perhaps a few servers with Gearman installed could handle the conversion as well.
Use Linux (CentOS or Debian), not Windows.
Use pseudo-static URLs.
Use caching.
If browsing speed is not fast enough, you need a CDN.
If you have enough money, more CPUs are perfect.
You can use traffic-monitoring software to judge whether you need more bandwidth or not.
I'm running an app like 9gag where users can upload and watch images and videos. The same images and videos are requested up to 100 times per minute, which puts a big workload on the SSD, so storing recently used media in RAM and serving it from there would be better.
I've read that memcached and Redis aren't good for that, but without good explanations of why not; can someone explain? Is Varnish a better solution, and does it work with PHP?
I need the best solution, preferably using PHP.
I would definitely not advise you to store these types of workloads in Memcached or Redis and I would also not advise you to have these workloads processed by PHP.
Varnish is indeed the way to go here.
Why not Memcached & Redis?
Memcached and Redis are distributed key value stores. They are extremely fast and scalable and are perfect to store small values that change on a regular basis.
Image and video files are quite large and wouldn't really fit well in these memory-only databases. Keep in mind that Redis and Memcached aren't directly accessible from the web, they are caches that you would call from a web application.
That means there is additional latency running them through an application runtime like PHP.
Why not PHP?
Don't get me wrong, I'm a huge PHP fan and have been part of the PHP community since 2007. PHP is great for building web pages, but not so great for processing binary data.
These types of workloads that you're looking to process can easily overwhelm a PHP-FPM or PHP-CLI runtime.
It is possible to use PHP, but you'll need so many servers to handle video and image processing at large scale, that it will become an operational burden.
Why Varnish?
Varnish is a reverse caching proxy that sits in front of your web application, unlike distributed caches like Memcached and Redis that sit behind your web application.
This means you can just store images and videos on the disk of your web server, and Varnish will cache requested content in memory without having to access the web server upon every request.
Varnish is built for large-scale HTTP processing and is extremely good at handling HTTP responses of any size at large scale.
Varnish is software that is used by CDNs and OTT video streaming platforms to deliver imagery and online video.
Using video protocols like HLS, MPEG-DASH or CMAF, these streaming videos are chunked up in segments and indexed in manifest files.
A single Varnish server can serve these with sub-millisecond latency with a bandwidth up to 500 Gbps and a concurrency of about 100,000 requests.
The number of machines you need will be far lower than if you did this in PHP.
The Varnish Configuration Language, which is the domain-specific programming language that comes with Varnish, can also be used to perform certain customization tasks within the request/response flow.
The VCL code is only required to extend standard behavior, whereas in regular development languages like PHP you have to define all the behavior in code.
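As an illustration, here is a minimal VCL sketch (VCL 4.0 syntax; the backend address, file extensions, and TTL are assumptions) that caches media responses and strips cookies so they become cacheable:

```vcl
vcl 4.0;

backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_backend_response {
    # Cache images and video segments for a day (the TTL is an assumption)
    if (bereq.url ~ "\.(jpg|png|gif|mp4|ts|m4s)$") {
        set beresp.ttl = 1d;
        unset beresp.http.Set-Cookie;
    }
}
```

Everything not covered by this snippet falls through to Varnish's built-in default behavior, which is exactly the "only extend standard behavior" point made above.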
Here are a couple of Varnish-related resources:
The Varnish Developer Portal: https://www.varnish-software.com/developers/
The Varnish documentation: http://varnish-cache.org/docs/
The Varnish 6 By Example book that I wrote: https://info.varnish-software.com/resources/varnish-6-by-example-book
Maybe even Varnish Enterprise?
The only challenge is caching massive amounts of image/video content. Because Varnish stores everything in memory, you'll need enough memory to store all the content.
Although you can scale Varnish horizontally and use consistent hashing algorithms to balance the content across multiple Varnish servers, you'll probably still need quite a number of servers. This depends on the amount of content that needs to be stored in cache at all times.
If your origin web platform is powerful enough to handle requests for uncached long-tail content, Varnish could keep the hot content in memory and pass cache misses for long-tail content through to the origin. That way you might not need a lot of caching servers. This mainly depends on the traffic patterns of your platform.
The open source version of Varnish does have a file storage engine, but it behaves really poorly and is prone to disk fragmentation at large scale. This will slow you down quite significantly as write operations increase.
To tackle this issue Varnish Software, the commercial entity behind the open source project, came up with the Massive Storage Engine (MSE). MSE tackles the typical issues that come with file caching in a very powerful way.
The technology is used by some of the biggest video streaming platforms in the world.
See https://docs.varnish-software.com/varnish-cache-plus/features/mse/ for more information about MSE.
Varnish Enterprise and MSE are not free and open source. It's up to you to figure out what would be the cheaper solution from a total cost of ownership point of view: managing a lot of memory-based open source Varnish servers or paying the license fees of a limited amount of Varnish Enterprise servers with MSE.
I need a web application performance measurement tool. Can you suggest some good ones?
Purpose: the app is built on Lumen and the dashboard is built on Laravel. I want to measure the performance of all requests to the app so I can note down each request's time consumption; based on that, the app can be optimized.
I did some googling and found that JMeter is most people's choice, since it's from Apache and does the job, but it looks a little complex. I also found https://locust.io/ interesting, so I'm going to give it a try.
But I would like to get experts' suggestions or advice on this.
Thanks!
There are quite a number of free load-testing tools, and the vast majority of them support the HTTP protocol, so feel free to choose any.
Regarding JMeter and Locust, if you can develop code in Python - go for Locust as you won't have to learn new things and will be able to start right away.
If your Python programming skills are not that good I would recommend reconsidering JMeter as it is not that complex at all:
JMeter is GUI-based, so you can create your test using the mouse.
JMeter comes with the HTTP(S) Test Script Recorder, so you will be able to create a test plan "skeleton" in a few minutes using your favourite browser.
JMeter supports many more protocols, e.g. you can load-test databases via JDBC, mail servers via SMTP/IMAP/POP, MQ servers via JMS, etc., while Locust is more HTTP-oriented; if you need more, you have to code it yourself.
If the above points sound promising, check out the JMeter Academy: the fastest and most efficient way of ramping up on JMeter as of now.
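Whichever tool you pick, its raw output is essentially a list of per-request timings, and the question above is about noting down each request's time consumption. A minimal stdlib sketch for turning those timings into headline numbers (the function name and the percentile choice are my own, not from any tool):

```python
import statistics

def summarize(durations_ms):
    """Aggregate per-request timings (in milliseconds) into headline stats."""
    ordered = sorted(durations_ms)
    # Index of the value below which ~95% of requests fall
    p95_index = max(0, int(len(ordered) * 0.95) - 1)
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[p95_index],
        "max": ordered[-1],
    }
```

Comparing median against p95 is usually more informative than the mean alone: a healthy median with a bad p95 points at a subset of slow endpoints rather than uniform slowness.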
You can use XHProf to check every function's execution time! It can show you the results in a web GUI!
https://pecl.php.net/package/xhprof
XHProf is a function-level hierarchical profiler for PHP with a simple HTML-based navigational interface. The raw data collection component is implemented in C (as a PHP extension); the reporting/UI layer is all in PHP. It can report function-level inclusive and exclusive wall times, memory usage, CPU times, and the number of calls for each function. Additionally, it supports comparing two runs (hierarchical DIFF reports) and aggregating results from multiple runs.
I am developing an iPhone app and would like to create some sort of RESTful API so different users of the app can share information/data. To create a community of sorts.
Say my app is some sort of game, and I want the user to be able to post their highscore on a global leaderboard as well as maintain a list of friends and see their scores. My app is nothing like this but it shows the kind of collective information access I need to implement.
The way I could implement this is to set up a PHP and MySQL server and have a php script that interacts with the database and mediates the requests between the DB and each user on the iPhone, by taking a GET request and returning a JSON string.
Is this a good way to do it? Seems to me like using PHP is a slow way to implement this as opposed to say a compiled language. I could be very wrong though. I am trying to keep my hosting bills down because I plan to release the app for free. I do recognise that an implementation that performs better in terms of CPU cycles and RAM usage (e.g. something compiled written in say C#?) might require more expensive hosting solutions than say a LAMP server so might actually end up being more expensive in terms of $/request.
I also want my implementation to be scalable in the rare case that a lot of people start using the app. Does the usage volume shift the performance/$ ratio towards a different implementation? I.e. if I have 1k request/day it might be cheaper to use PHP+MySQL, but 1M requests/day might make using something else cheaper?
To summarise, how would you implement a (fairly simple) remote database that would be accessed remotely using HTTP(S) in order to minimise hosting bills? What kind of hosting solution and what kind of platform/language?
UPDATE: per Karl's suggestion I tried: Ruby (language) + Sinatra (framework) + Heroku (app hosting) + Amazon S3 (static file hosting). To anyone reading this who might have the same dilemma I had, this setup is amazing: effortlessly scalable (to "infinity"), affordable, easy to use. Thanks Karl!
Can't comment on DB specifics because I haven't implemented that part yet, although for my simple query requirements, CouchDB and MongoDB seem like good choices, and they are integrated with Heroku.
Have you considered using Sinatra and hosting it on Heroku? This is exactly what Sinatra excels at (REST services). And hosting with Heroku may be free, depending on the amount of data you need to store. Just keep all your supporting files (images, JavaScript, CSS) on S3. You'll be in the cloud and flying in no time.
This may not fit with your PHP desires, but honestly, it doesn't get any easier than Sinatra.
It comes down to a tradeoff between cost and experience.
If you have the expertise, I would definitely look into some form of cloud-based infrastructure, something like Google App Engine. Which cloud platform you go with depends on your experience with different languages (App Engine only works with Python/Java, for example). Generally though, scalable cloud platforms have more "gotchas" and need more know-how, because they are specifically tuned for high-end scalability (and thus require knowledge of enterprise-level concepts in some cases).
If you want to be up and running as quickly and simply as possible, I would personally go for a CakePHP install. Set up the model data to represent the basic entities you are managing, then use CakePHP's wonderful convention-loving magic to expose CRUD updates on these models with ease!
The technology you use to implement the REST services will have a far less significant impact on performance and hosting costs than the way you use HTTP. Learning to take advantage of HTTP is far more than simply learning how to use GET, PUT, POST and DELETE.
Use whatever server side technology you already know and spend some quality time reading RFC2616. You'll save yourself a ton of time and money.
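As an illustration of what "taking advantage of HTTP" buys you: honoring conditional GETs means repeat requests for unchanged data cost almost no bandwidth, whatever language serves them. A framework-agnostic sketch in Python (the function names are hypothetical, not from any framework):

```python
import hashlib

def etag_for(body: bytes) -> str:
    """Derive a cache validator from the response body."""
    return '"' + hashlib.md5(body).hexdigest() + '"'

def respond(body, if_none_match=None):
    """Return (status, headers, payload), honoring a conditional GET."""
    tag = etag_for(body)
    headers = {"ETag": tag, "Cache-Control": "max-age=300"}
    if if_none_match == tag:
        return 304, headers, b""  # client's cached copy is still valid
    return 200, headers, body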
In your case it's the database server that's accessed on each request, so even a compiled language (say C# or Java) won't matter much (unless you are doing some data transformation or processing).
So the DB server has to scale well; your choice of language and DB should be well configured for the host OS.
In short, PHP+MySQL is good if you are sending/receiving JSON strings and storing/retrieving them in the DB with minimal data processing.
Next, if your app gets popular and doesn't require frequent updates to existing data, you can move such data to a highly scalable database like MongoDB (JSON-friendly).
I have a PHP script which requires no web hosting or disk space, etc.
The script simply retrieves information from a server, does some processing on it,
and then passes it on to a client (an iPhone app in this case).
The only thing is that if traffic gets high, there is a high demand for bandwidth and speed.
Does anyone know of a service with high speed and bandwidth (apart from web hosting services) that lets you host such a PHP script on a static IP?
Thanks.
You may want to try some sort of cloud service where you can set up the environment you actually need. Let's say your script need a lot of RAM but only little CPU power (or the other way around) you can have exactly such a system. Amazon EC2 is one of many cloud computing providers out there.
Hmm, on the performance point you can use something like Facebook's HipHop to compile your PHP script to C++; then you have the performance you need.
A cloud solution is perfect. You can even write shell scripts to increase or decrease RAM whenever demand changes.
Like everyone here mentioned, cloud hosting is your best bet. It's slightly more expensive for resources and bandwidth than a dedicated server, but is superior in performance/latency/scalability. I have a similar application set up for a current project, and I am running on the Rackspace cloud with 100K+ active users on a daily basis and have had no problems (it has been running for 6 months).
If your code is simple, don't use PHP!
You can consider:
Python
G-WAN server + C
Java
PHP is good for big projects because it's simple and fast to use/test/debug...
I recently experienced a flood of traffic on a Facebook app I created (mostly for the sake of education, not with any intention of marketing)
Needless to say, I did not think about scalability when I created the app. I'm now in a position where my meager virtual server hosted by MediaTemple isn't cutting it at all, and it's really coming down to raw I/O of the machine. Since this project has been so educating to me so far, I figured I'd take this as an opportunity to understand the Amazon EC2 platform.
The app itself is created in PHP (using Zend Framework) with a MySQL backend. I use application caching wherever possible with memcached. I've spent the weekend playing around with EC2, spinning up instances, installing the packages I want, and mounting an EBS volume to an instance.
But what's the next logical step that is going to yield good results for scalability? Do I fire up an AMI instance for MySQL and one for the Apache service? Or do I just replicate the instances out as many times as I need them and do some sort of load balancing on the front end? Ideally, I'd like to have a centralized database because I aggregate statistics across all database rows; however, this is not a hard requirement (there are probably some application-specific solutions I could come up with to work around this).
I know there is probably no straightforward answer here, so opinions and suggestions are welcome.
So many questions - all of them good though.
In terms of scaling, you've a few options.
The first is to start with a single box and scale upwards with a more powerful one. EC2 has various instance sizes. This involves a server migration each time you want a bigger box.
Easier is to add servers. You can start with a single instance for Apache & MySQL. Then when traffic increases, create a separate instance for MySQL and point your application to this new instance. This creates a nice layer between application and database. It sounds like this is a good starting point based on your traffic.
Next you'll probably need more application power (web servers) or more database power (a MySQL cluster, etc.). You can have your DNS records pointing to a couple of front boxes running some load-balancing software (try Pound). These load-balancing servers distribute requests to your web servers. EC2 has Elastic Load Balancing, which is an alternative to managing this yourself and is probably easier; I haven't used it personally.
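As a sketch, the load-balancing tier described above could look like this in nginx (the backend addresses are hypothetical; Pound or Elastic Load Balancing would play the same role):

```nginx
upstream app_servers {
    # One line per web server; add entries as you grow
    server 10.0.0.11:80;
    server 10.0.0.12:80;
}

server {
    listen 80;

    location / {
        proxy_pass http://app_servers;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $remote_addr;
    }
}
```

By default nginx round-robins across the upstream entries and skips backends that stop responding, which is all the balancing most setups at this stage need.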
Something else to be aware of - EC2 has no persistent storage. You have to manage persistent data yourself using the Elastic Block Store. This guide is an excellent tutorial on how to do this, with automated backups.
I recommend that you purchase some reserved instances if you decide EC2 is the way forward. You'll save yourself about 50% over 3 years!
Finally, you may be interested in services like RightScale which offer management services at a cost. There are other providers available.
First step is to separate concerns. I'd split off with a separate MySQL server and possibly a dedicated memcached box, depending on how high your load is there. Then I'd monitor memory and CPU usage on each box and see where you can optimize where possible. This can be done with spinning off new Media Temple boxes. I'd also suggest Slicehost for a cheaper, more developer-friendly alternative.
Some more low-budget PHP deployment optimizations:
Using a more efficient web server like nginx to handle static file serving and then reverse proxy app requests to a separate Apache instance
Implement PHP with FastCGI on top of nginx using something like PHP-FPM, getting rid of Apache entirely. This may be a great alternative if your Apache needs don't extend far beyond mod_rewrite and simpler Apache modules.
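The nginx + PHP-FPM setup from the second bullet could be sketched like this (the document root and socket path are assumptions; adjust for your distribution):

```nginx
server {
    listen 80;
    root /var/www/app/public;
    index index.php;

    # Serve static files directly; everything else falls through to PHP
    location / {
        try_files $uri $uri/ /index.php?$args;
    }

    # Hand PHP requests to the PHP-FPM pool over FastCGI
    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass unix:/run/php-fpm.sock;
    }
}
```

This removes Apache from the picture entirely; the trade-off is that Apache-specific features like .htaccess files and mod_rewrite rules have to be translated into nginx directives.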
If you prefer a more high-level, do-it-yourself approach, you may want to check out Scalr (code at Google Code). It's worth watching the video on their web site. It facilitates a scalable hosting environment using Amazon EC2. The technology is open source, so you can download it and implement it yourself on your own management server. (Your Media Temple box, perhaps?) Scalr has pre-built AMIs (EC2 appliances) available for some common use cases.
web: Utilizes nginx and its many capabilities: software load balancing, static file serving, etc. You'd probably only have one of these, and it would probably implement some sort of connection to Amazon's EBS, or persistent storage solution, as mentioned by dcaunt.
app: An application server with Apache and PHP. You'd probably have many of these, and they'd get created automatically if more load needed to be handled. This type of server would hold copies of your ZF app.
db: A database server with MySQL. Again, you'd probably have many of these, and more slave instances would get created automatically if more load needed to be handled.
memcached: A dedicated memcached server you can use to have centralized caching, session management, et cetera across all your app instances.
The Scalr option will probably require some more configuration changes, but if you feel your scaling needs are accelerating quickly, it may be worth the time and effort.