How can I handle thousands of requests per second using PHP and MySQL?

I would like to implement an API using PHP and MySQL that can handle several thousand requests per second.
I haven't built this kind of API before. If you have experience implementing a similar task, could you please tell me what the steps are?
How can I implement an API that can handle thousands of requests per second?
I would be glad if you could explain with sample code.
Thanks in advance for your help.

Based on the details described in the post, you likely want an asynchronous, stateless architecture, so requests don't block resources and scale more easily (always sounds easier than it actually is ;)).
Without knowing what other services these servers would connect to (which certainly doesn't make things easier), I'd go for Elixir/Erlang as the programming language and Phoenix as the framework.
You get a robust functional language that comes with a lot of great built-in features/modules (e.g. Mnesia, rolling versions forward/back while live) and scales well (good at utilizing all cores of your server).
If you need to queue up requests to the second-tier servers, an AMQP client/server (e.g. RabbitMQ) might be a good option (it holds/stores the requests for the servers).
That works pretty well if it's stateless, in the form of: the client asks one thing, the server responds once and is done with the task. If you have many requests because the clients ask for updates every single second, then you're better off switching to a stateful connection and using WebSockets, so the server can push updates back to a lot of clients and cut out a lot of chatter.
All of this is written from a 'high up' view. In the end, it depends on what kind of services you want to provide, as that narrows down what the 'suitable tool' would be. My suggestion is one possibility which I think isn't far off (Node.js, mentioned before, is also more than valid).
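For the queueing suggestion above, here is a minimal sketch in PHP (the question's language) of publishing a job to RabbitMQ. It assumes the php-amqplib package is installed and a local broker with default credentials; the queue name and payload are hypothetical.

```php
<?php
// Publish an incoming API job to RabbitMQ so a worker can process it
// later, instead of blocking the request. Assumes:
//   composer require php-amqplib/php-amqplib
// and a RabbitMQ broker on localhost with default credentials.
require __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// Durable queue so queued jobs survive a broker restart.
$channel->queue_declare('api_jobs', false, true, false, false);

// Hypothetical job payload.
$payload = json_encode(['task' => 'resize_image', 'image_id' => 42]);
$message = new AMQPMessage($payload, [
    'delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT,
]);
$channel->basic_publish($message, '', 'api_jobs');

$channel->close();
$connection->close();
```

A separate worker process consumes 'api_jobs' and does the slow work, so the API request itself can return immediately.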

Well, you need to consider several factors, such as:
Authenticating the API. Your API should only be callable by valid users who are authorized and authenticated.
Caching API results. Your API should cache the results of API calls. This allows it to answer requests more quickly and handle more requests per second. Memcached can be used to cache the results of API calls (see the caching sketch after this list).
The API architecture. RESTful APIs have less overhead than SOAP-based APIs. SOAP-based APIs have better support for authentication and are also more rigidly structured than RESTful APIs.
API documentation. Your API should be well documented and easy for users to understand.
API scope. Your API should have a well-defined scope. For example, will it be used over the internet as a public API, or as a private API inside a corporate intranet?
Device support. When designing your API, keep in mind the devices that will consume it: for example smartphones, desktop applications, browser-based applications, server applications, etc.
API output format. When designing your API, keep in mind the format of the output. For example, will the output contain user-interface-related data or just plain data? One popular approach is known as separation of concerns (https://en.wikipedia.org/wiki/Separation_of_concerns), for example separating the backend and frontend logic.
Rate limiting and throttling. Your API should implement rate limiting and throttling to prevent overuse and misuse.
API versioning and backward compatibility. Your API should be carefully versioned. For example, if you update your API, the new version should still support clients of the older version. Your API should continue to support the old clients until they have all migrated to the new version.
API pricing and monitoring. Usage of your API should be monitored, so you know who is using it and how. You may also charge users for using your API.
Metrics for success. You should also decide which metric to use for measuring the success of your API, for example the number of API calls per second or the monetary earnings from it. Development activities such as research, publication of articles, open-source code, participation in online forums, etc. may also be considered when determining its success.
Estimation of costs. You should also calculate the cost of developing and deploying your API: for example, how much time it will take to produce a usable version, how much of your development time the API takes, etc.
Updating your API. You should also decide how often to update it, for example how often new features should be added. Keep backward compatibility in mind, so that updating your API does not negatively affect your clients.
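As referenced in the caching point above, here is a minimal sketch of result caching with the Memcached extension (the answer only names "Memcache" generically; the key, TTL, and table are hypothetical):

```php
<?php
// Cache an expensive product query so repeated API calls skip MySQL.
$cache = new Memcached();
$cache->addServer('127.0.0.1', 11211);

function getProducts(Memcached $cache, PDO $db): array
{
    $key = 'api:products:v1';        // versioned key for easy invalidation
    $products = $cache->get($key);
    if ($products !== false) {       // cache hit (false means miss here,
        return $products;            // assuming we never cache false itself)
    }

    // Cache miss: run the query once and store the result for 60 seconds.
    $stmt = $db->query('SELECT id, name, price FROM products');
    $products = $stmt->fetchAll(PDO::FETCH_ASSOC);
    $cache->set($key, $products, 60);
    return $products;
}
```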

Good answer. I think one thing to keep in mind is where the bottleneck is. Many times the bottleneck isn't the API server itself but the data access patterns against the persistence layer.
Think about how you access your data. When posting new items, a lot of the processing can often be deferred and handled asynchronously to the original request. For example, if you're resizing an image or sending an email, you can integrate RabbitMQ or SQS to queue up a job, which can be processed later by workers. Queues are great at buffering work, so that if a server goes down, the work is simply queued up and processed once it's back online.
On the query side, it is important to understand how indexing works and how data is stored. There are different types of indexes; hash tables, for example, can give you constant access time, but you cannot perform range queries with them. The easiest case is simple, denormalized data objects queried by identifiers that can be served from an index. If your data is more complex and you need to do heavy joins or aggregations, then you can look at precomputed values stored in something like Redis or Memcached.
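To make the indexing point concrete, a small sketch (the table, columns, and index name are hypothetical): a composite B-tree index covers both the equality filter and the range condition below, which a hash index could not serve.

```php
<?php
// Range query served by a composite B-tree index on (user_id, created_at).
// Run once, e.g. in a migration:
//   CREATE INDEX idx_orders_user_created ON orders (user_id, created_at);
$db = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

$stmt = $db->prepare(
    'SELECT id, total FROM orders
     WHERE user_id = :uid AND created_at >= :since'
);
$stmt->execute(['uid' => 42, 'since' => '2023-01-01']);
$orders = $stmt->fetchAll(PDO::FETCH_ASSOC);
// Prefix EXPLAIN to the SELECT to confirm MySQL actually uses the index.
```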

Related

RESTful API in PHP - optimising successive requests?

I'm working (for the first time) on developing a PHP application driven by a PHP RESTful API (probably using peej/Tonic). Coming from an application with direct database access, which might make 20 different database calls during the course of a page load, I am trying to reconcile the fact that 20 API calls = 20x handshakes (which can be improved with Guzzle persistent connections) but also 20x connections to the database.
I believe that with better programming and planning, I can get my required API calls down to 4-5 per page. At this point:
a) Is it not worth considering the latency of 5x database connections + 5x handshakes per page load on account of all the other available optimisations?
b) Is there an existing method by which this can be mitigated that I've thus far failed to find?
c) I believe it violates the principles of RESTful programming, but if I had a single API method which itself gathered information from other API endpoints (for instance, GET suppliers WHERE x=y, then GET products for each supplier), is there a documented method for internal API interaction (particularly within peej/Tonic or other frameworks)?
Thank you all for your wisdom in advance.
Remember that the client should be "making 'a request' of the server," which is obliged to fulfill "that request." The server might "execute 20 different database queries" to prepare its response, and the client need not know nor care.
The client's point-of-view becomes, "I care what you tell me, not how you did it."
If you did wish to send query-responses directly to the client, for the client to "do the dirty-work" with these data, then you could still design your server request so that the server did many queries, all at once, and sent all of the result-sets back ... in just one exchange.
Your first priority should be to effectively minimize the number of exchanges that take place. The amount of data returned (within reason) is secondary.
Also consider that, "when the server does it, the work is naturally synchronized." When the client issues multiple asynchronous requests, those requests are, well, "asynchronous." Consider which strategy will be easier for you to debug.
If the server is given "a request to do," it can validate the request (thus checking for client bugs), and perform any number of database operations, perhaps in a TRANSACTION. This strategy, which puts the server into a very active role, is often much less complex than an interaction that is driven by the client with the server taking a passive role.
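A minimal sketch of that idea, using the question's supplier/products example: one endpoint runs both queries server-side and ships the combined result sets back in a single exchange (the table names, columns, and endpoint are hypothetical).

```php
<?php
// One request in, several queries server-side, one combined response out.
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', [
    PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
]);

// Validate the request before touching the database.
$supplierId = (int) ($_GET['supplier_id'] ?? 0);
if ($supplierId <= 0) {
    http_response_code(400);
    exit;
}

$stmt = $db->prepare('SELECT id, name FROM suppliers WHERE id = ?');
$stmt->execute([$supplierId]);
$supplier = $stmt->fetch(PDO::FETCH_ASSOC);

$stmt = $db->prepare('SELECT id, name, price FROM products WHERE supplier_id = ?');
$stmt->execute([$supplierId]);
$products = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Both result sets go back in just one exchange.
header('Content-Type: application/json');
echo json_encode(['supplier' => $supplier, 'products' => $products]);
```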

Server-side analytics and server-side logging in particular (PHP)

I need to log details in PHP of analytics and usage.
I'm looking at various possibilities:
- Google Analytics server-side
- segment.io
- Just adding a record to a DB with PHP
My concern is how much additional processing this will take on my server. Of course Google Analytics' JavaScript implementation won't put any load on my server, but a server-side method obviously will.
I also notice that on https://segment.io/docs/integrations/google-analytics they mention that "Server-side Google Analytics is being deprecated due to difficulty of use" - what does this mean?
So basically, I want to implement some basic analytics storing (count number of hits to a URL + some other basic info) server-side - what's the best way to do this considering all things? I only use the PHP language.
It seems that adding a record to the DB every page view might be a little too much.
Segment.io can actually give you the flexibility of all three of these. Using the PHP library (https://segment.io/libraries/php) you can start sending events from your server. The library is designed to queue and batch events to maximize server efficiency.
Once the events leave your server, they'll go to Segment.io's servers. Once there, we can route the data to Google Analytics.
Additionally, you could use the "Webhooks" integration on Segment.io to set your own server as a receiving endpoint for the data in real-time, so that you could host your own analytics DB separately from the rest of your infrastructure quite easily/cleanly.
https://segment.io/docs/integrations/webhooks
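If you instead go with the plain "add a record to a DB" option, you can address the "a record every page view might be a little too much" concern by aggregating: one row per URL per day, bumped with an upsert. A minimal sketch (the schema is hypothetical):

```php
<?php
// Count hits per URL per day with a single upsert per page view.
// Hypothetical schema:
//   CREATE TABLE page_hits (
//     url  VARCHAR(255) NOT NULL,
//     day  DATE NOT NULL,
//     hits INT UNSIGNED NOT NULL DEFAULT 1,
//     PRIMARY KEY (url, day)
//   );
$db = new PDO('mysql:host=localhost;dbname=analytics', 'user', 'pass');

$stmt = $db->prepare(
    'INSERT INTO page_hits (url, day, hits) VALUES (:url, CURDATE(), 1)
     ON DUPLICATE KEY UPDATE hits = hits + 1'
);
$stmt->execute(['url' => substr($_SERVER['REQUEST_URI'], 0, 255)]);
```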

Real time activity feed - code / platform implementation?

I am defining specs for a live activity feed on my website. I have the backend data model done, but the open area is the actual code development, where my development team is lost on the best way to make the feeds work in real time. Is this purely done by writing custom code, or do we need to use existing frameworks to make the feeds work in real time? One suggestion thrown at me was to use reverse AJAX. Someone mentioned having the client poll the server every x seconds, but I don't like this because it creates unwanted server traffic when there are no updates. A push engine like Lightstreamer was also mentioned, to push from server to browser.
So in the end: what is the way to go? Is it custom code, purely pushing SQL queries, using frameworks, using platforms, etc.?
My platform is written in PHP with CodeIgniter, and the DB is MySQL.
The activity stream will have lots of activities. There are 42 components on the social network I am developing, and each component has roughly 30 unique activities that can be streamed.
Check out http://www.stream-hub.com/
I have been using superfeedr.com with Rails and I can tell you it works really well. Here are a few facts about it:
Pros
Julien, the lead developer, is very helpful when you encounter a problem.
Immediate push of new entries for feeds that support PubSubHubbub.
JSON responses, which are easy to parse however you'd like.
A retrieve API in case the update callback fails and you need to fetch the latest entries for a given feed.
Cons
The documentation is not up to the standard I would like, so you'll likely end up searching the web for obscure implementation details.
You can't control how often superfeedr fetches each feed; they use a secret algorithm to determine that.
The web interface lets you manage your feeds but becomes difficult to use when you subscribe to a lot of them.
The subscription verification mechanism works synchronously, so you need to make sure the object URL is ready for the superfeedr callback to hit it (they do provide an async option, which does not seem to work well).
Overall I would recommend superfeedr as a good solution for what you need.
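If you want to prototype the push side in plain PHP before committing to a platform (neither answer above prescribes this; it is a hedged sketch of one alternative), Server-Sent Events let each browser hold a single open connection that the server writes to, instead of polling every x seconds. Table and column names are hypothetical.

```php
<?php
// feed.php - Server-Sent Events endpoint: the browser connects once
// (new EventSource('/feed.php') in JS) and receives new activities
// as they appear, resuming from Last-Event-ID on reconnect.
set_time_limit(0);
header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');

$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$lastId = (int) ($_SERVER['HTTP_LAST_EVENT_ID'] ?? 0);

while (!connection_aborted()) {
    $stmt = $db->prepare(
        'SELECT id, body FROM activities WHERE id > ? ORDER BY id LIMIT 50'
    );
    $stmt->execute([$lastId]);
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $lastId = (int) $row['id'];
        echo "id: {$lastId}\n";
        echo 'data: ' . json_encode($row) . "\n\n";
    }
    @ob_flush();
    flush();
    sleep(2); // the server still polls MySQL, but clients don't poll HTTP
}
```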

REST API for a PHP Web application

I am working on an API for my web application, which is written in CodeIgniter. This is my first time writing an API.
What is the best way of imposing a rate limit on the API?
Thanks for your time
Log the user's credentials (if they have to provide them) or their IP address, the request (optionally), and a timestamp in a database.
Now, for every request, delete the records whose timestamp is more than an hour old, check how many requests from that user are still in the table, and if that is more than your limit, deny the request.
It's a simple solution; keep in mind, though, there might be more performant solutions out there.
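A minimal sketch of that approach (the api_requests table and the hourly limit are hypothetical):

```php
<?php
// Sliding-window limit as described above: purge entries older than an
// hour, count what's left for this user, deny if over the limit.
// Hypothetical schema:
//   CREATE TABLE api_requests (
//     user_id      INT NOT NULL,
//     requested_at DATETIME NOT NULL,
//     INDEX (user_id, requested_at)
//   );
function allowRequest(PDO $db, int $userId, int $limitPerHour = 1000): bool
{
    $db->prepare('DELETE FROM api_requests WHERE requested_at < NOW() - INTERVAL 1 HOUR')
       ->execute();

    $stmt = $db->prepare('SELECT COUNT(*) FROM api_requests WHERE user_id = ?');
    $stmt->execute([$userId]);
    if ((int) $stmt->fetchColumn() >= $limitPerHour) {
        return false;                 // over the limit: deny
    }

    $db->prepare('INSERT INTO api_requests (user_id, requested_at) VALUES (?, NOW())')
       ->execute([$userId]);
    return true;                      // record the request and allow it
}

// Usage in the endpoint:
// if (!allowRequest($db, $userId)) { http_response_code(429); exit; }
```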
Pretty straightforward. If that doesn't answer your question, please provide more details.
I don't see how this is CodeIgniter-related, for example.
You can use my REST_Controller to do basically all of this for you:
http://net.tutsplus.com/tutorials/php/working-with-restful-services-in-codeigniter-2/
I recently added some key logging and request limiting features, so all of this can be done through config.
One thing you can do is consider using an external service to impose API limits and provide API management functionality in general.
For example, my company, WebServius ( http://www.webservius.com ) provides a layer that sits in front of your API and can provide per-user throttling (e.g. requests per API key per hour), API-wide throttling (e.g. total requests per hour), adaptive throttling (where throttling limits decrease as API response time increases), etc, with other features coming soon (e.g. IP-address-based throttling). It also provides a page for user registration / issuing API keys, and many other useful features.
Of course, you may also want to look at our competitors, such as Mashery or Apigee.

Best Practices For Secure APIs?

Let's say I have a website that has a lot of information on our products. I'd like some of our customers (including us!) to be able to look up our products in various ways, including:
1) Pulling data from AJAX calls that return data in cool, JavaScripty ways;
2) Creating iPhone applications that use that data;
3) Having other web applications use that data for their own ends.
Normally, I'd just create an API and be done with it. However, this data is in fact mildly confidential - which is to say that we don't want our competitors to be able to look up all our products every morning and then automatically set their prices to undercut us. And we also want to be able to look at who might be abusing the system, so if someone's making ten million complex calls to our API a day and bogging down our server, we can cut them off.
My next logical step would be then to create a developers' key to restrict access - which would work fine for web apps, but not so much for any AJAX calls. (As I see it, they'd need to provide the key in the JavaScript, which is in plaintext and easily seen, and hence there's actually no security at all. Particularly if we'd be using our own developers' keys on our site to make these AJAX calls.)
So my question: after looking around at Oauth and OpenID for some time, I'm not sure there is a solution that would handle all three of the above. Is there some sort of canonical "best practices" for developers' keys, or can Oauth and OpenID handle AJAX calls easily in some fashion I have yet to grok, or am I missing something entirely?
I think that 2-legged OAuth is what you want to satisfy #2 and #3. For #1 I would suggest that instead of the customer making JS requests directly against your application, they could instead proxy those requests through their own web application.
A midway solution is to require an API key, and then demand that whoever uses it doesn't actually use it directly with the AJAX but wraps their calls in a server-side request, e.g.:
AJAX -> customer server -> your server -> customer server -> user
Creating a simple PHP API for interested parties shouldn't be too tricky, and your own iPhone applications would obviously cut out the middle man, shipping with their own API key.
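A minimal sketch of that proxy pattern: the customer's server keeps the key and forwards the browser's AJAX call, so the key never appears in client-side JavaScript (the upstream endpoint, auth header scheme, and environment variable are hypothetical).

```php
<?php
// proxy.php on the customer's server - the browser calls this instead
// of the product API directly; the API key stays server-side.
$apiKey = getenv('PRODUCTS_API_KEY');
$query  = http_build_query(['q' => $_GET['q'] ?? '']);

$ch = curl_init("https://api.example.com/products?$query");
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => ["Authorization: Bearer $apiKey"],
    CURLOPT_TIMEOUT        => 5,
]);
$response = curl_exec($ch);
$status   = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);

// Relay the upstream status and JSON body to the browser.
http_response_code($status ?: 502);
header('Content-Type: application/json');
echo $response !== false ? $response : json_encode(['error' => 'upstream failed']);
```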
OAuth and OpenID are unlikely to have much to do with the AJAX calls directly. Most likely, you'll have some sort of authorization filter in front of your AJAX handler that checks a cookie, and maybe that cookie is set as a result of an OpenID authentication.
It seems like this is coming down to a question of "how do I prevent screen scraping." If only logged-in customers get to see the prices, that's one thing, but assuming you're like most retail sites and your barrier to customer sign-up is as low as possible, that doesn't really help.
And, hey, if your prices aren't available, you don't get to show up in search engines like Froogle or Nextag or PriceGrabber. But that's more of a business strategy decision, not a programming one.
