Are there best practices for storing/limiting API usage? I'm looking to build a basic API using Laravel, but when trying to think about limiting daily API usage I'm getting stuck on the best approach.
Do I log each call in the database and use that to aggregate total API calls for the day to determine if the limit has been reached? What about concurrent API requests? If I want to limit to 1 API call every 5 seconds, is it best to do a database query and determine that?
Any advice would be much appreciated!
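For what it's worth, recent Laravel versions ship a built-in throttle middleware that covers both of these cases without any hand-rolled logging. A minimal sketch, assuming Laravel 5.4+ route syntax (the routes and limits here are illustrative):

```php
<?php
// routes/api.php — Laravel's "throttle" middleware, with the signature
// throttle:maxAttempts,decayMinutes. Hit counts live in the configured
// cache store, so no per-call database logging is needed.

use Illuminate\Support\Facades\Route;

// Roughly one call every 5 seconds: 12 attempts per 1-minute window
// (an average over the window, not a strict 5-second spacing).
Route::middleware('throttle:12,1')->get('/quotes', 'QuoteController@index');

// A daily cap: 1000 attempts per 1440-minute (24-hour) window.
Route::middleware('throttle:1000,1440')->get('/reports', 'ReportController@index');
```

When a client exceeds the limit, Laravel answers with HTTP 429 and a Retry-After header, so well-behaved consumers know when to try again.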
This is one of those things that I like to leave up to an existing provider.
To keep things simple, put something in front of your API which has the sole responsibility of rate limiting and controlling API consumers. This is often done with an API proxy of some kind.
3scale is a good solution. It's effectively Nginx with a module for doing all the heavy lifting. http://www.3scale.net/ It's also cheap (or free depending on your load).
There are others out there like Mashery, but frankly I've had terrible luck with Mashery since Intel bought them. DNS resolution issues, skyrocketing prices, etc.
Related
I would like to implement an API using PHP and MySQL that can handle several thousand requests per second.
I haven't built this kind of API before. If you have experience implementing a similar task, could you please tell me what the steps are?
How can I implement an API that can handle thousands of requests per second?
I would be glad if you could explain with sample code.
Thanks in advance for your help.
Based on the details described in the post, you likely want an asynchronous, stateless architecture, so requests don't block resources and the whole thing scales more easily (it always sounds easier than it actually is ;)).
Without knowing which other services these servers would connect to (that certainly doesn't make things easier), I'd go for Elixir/Erlang as the programming language and Phoenix as the framework.
You get a robust functional language which comes with a lot of great built-in features/modules (e.g. Mnesia, hot-swapping code versions while live) and scales well (it is good at utilizing all cores of your server).
If you need to queue up requests to the second-tier servers, an AMQP broker (e.g. RabbitMQ) might be a good option; it holds/stores the requests until those servers can process them.
That works well as long as things stay stateless: the client asks one thing, the server responds once and is done with the task. If you have many requests because clients ask for updates every single second, you're better off switching to a stateful connection and using WebSockets, so the server can push updates out to a lot of clients and cut a lot of the chatter.
All of this is from a high-level view. In the end, it depends on what kind of services you want to provide, as that narrows down what the suitable tool would be. My suggestion is one possibility which I think isn't far off (Node.js, mentioned before, is also more than valid).
Well, you need to consider several factors, such as:
Authenticating the API. Your API should only be callable by valid users that are authorized and authenticated.
Caching API results. Your API should cache the results of API calls; this allows it to respond more quickly and handle more requests per second. Memcache can be used to cache results of API calls (see the sketch after this list).
The API architecture. RESTful APIs have less overhead compared to SOAP-based APIs. SOAP-based APIs have better support for authentication, and they are also more rigidly structured than RESTful APIs.
API documentation. Your API should be well documented and easy for users to understand.
API scope. Your API should have a well-defined scope. For example, will it be used over the internet as a public API, or will it be used as a private API inside a corporate intranet?
Device support. When designing your API, you should keep in mind the devices that will consume it: for example smartphones, desktop applications, browser-based applications, server applications, etc.
API output format. When designing your API, you should keep in mind the format of the output. For example, will the output contain user-interface-related data or just plain data? One popular approach is known as separation of concerns (https://en.wikipedia.org/wiki/Separation_of_concerns): for example, separating the backend and frontend logic.
Rate limiting and throttling. Your API should implement rate limiting and throttling to prevent overuse and misuse of the API.
API versioning and backward compatibility. Your API should be carefully versioned. For example, if you update your API, the new version should still support clients of the older versions; keep supporting them until all the API clients have migrated to the new version.
API pricing and monitoring. The usage of your API should be monitored, so you know who is using your API and how it is being used. You may also charge users for using your API.
Metric for success. You should also decide which metric to use for measuring the success of your API: for example, the number of API calls per second, or the monetary earnings from your API. Development activities such as research, publication of articles, open-source code, participation in online forums, etc. may also be considered when determining the success of your API.
Estimation of cost involved. You should also calculate the cost of developing and deploying your API: for example, how much time it will take you to produce a usable version, and how much of your ongoing development time the API will consume.
Updating your API. You should also decide how often to update your API, for example how often new features should be added. Keep backward compatibility in mind here too, so that updating your API does not negatively affect your clients.
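To make the caching point above concrete, here is a minimal cache-aside sketch using PHP's Memcached extension; the key scheme, the TTL, and the fetchFromDatabase() helper are assumptions for illustration:

```php
<?php
// Cache-aside for API results with the Memcached extension.

$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

function getProduct(Memcached $memcached, int $id): array
{
    $key = "api:product:$id";

    $cached = $memcached->get($key);
    if ($cached !== false) {
        return $cached;                        // cache hit: skip the database
    }

    $product = fetchFromDatabase($id);         // expensive query (hypothetical)
    $memcached->set($key, $product, 300);      // keep for 5 minutes
    return $product;
}
```

Every request served from the cache is one less query against the persistence layer, which is usually where the requests-per-second ceiling sits.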
Good answer. I think one thing to keep in mind is where the bottleneck is. Many times, the bottleneck isn't the API server itself but the data access patterns with the persistence layer.
Think about how you access your data. When posting new items, a lot of the processing can often be delayed and handled asynchronously to the original request. For example, if resizing an image or sending an email, you can integrate RabbitMQ or SQS to queue up a job which is processed later by workers. Queues are great at buffering work so that if a server goes down, the work is simply queued up and processed once it is back online.
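As a concrete sketch of that deferral, here is how a job could be queued with the php-amqplib library (the queue name and payload shape are assumptions; a separate worker process consumes the queue later):

```php
<?php
// Publish an image-resize job to RabbitMQ
// (composer require php-amqplib/php-amqplib).
require __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// Durable queue: queued jobs survive a broker restart.
$channel->queue_declare('image_resize', false, true, false, false);

$job = json_encode(['image_id' => 42, 'width' => 800]);
$message = new AMQPMessage($job, ['delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT]);

// The default exchange routes by queue name.
$channel->basic_publish($message, '', 'image_resize');

$channel->close();
$connection->close();
```

The HTTP request can return immediately after publishing; the worker picks the job up whenever it has capacity.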
On the query side, it is important to understand how indexing works and how data is stored. There are different types of indices; for example, hash tables can give you constant access time, but you cannot perform range queries with them. The easiest case is simple, denormalized data objects queried by identifiers that can be stored in an index. If your data is more complex and you need heavy joins or aggregations, then you can look at precomputed values stored in something like Redis or Memcached.
I am looking for "best practices", or just recommended methods, of tracking stats. I am developing a site that has YouTube-like page stat tracking (views, visits, etc.). It is pretty important that I have realtime statistics, but I want to avoid issues when scaling, and I was wondering if there are other methods to solve this besides caching.
I plan to use Google Analytics for most of the statistics, but Google only updates once every 3-4 hours.
I am a little worried about scalability. Some stats need to be realtime: how does a site like YouTube handle it? Do they count stats in memory and then defer the database write to once every 30 minutes, or do they just cache read requests and update those every few hours? What would you recommend doing?
Thanks again SO, I'm so glad that the rest of you can share the wealth of experience that I lack.
I can recommend one method of tracking web statistics: if you have an iPhone, you might want to look at TeddyTrack. (Full disclosure: I worked on the project.) It is as realtime as it gets; in fact, if you shake your iPhone, it updates your stats instantly. :-) You only get four graphs, but they include (complex) weekly cohort graphs, and it is far less complex to set up and manage than Google Analytics. But why choose? Why not get Google Analytics, Piwik, AWStats and TeddyTrack, use all of them, and see which ones you like best? TeddyTrack might suit you because it is very lightweight and it keeps your data on your own server. Links:
awstats.sourceforge.net, piwik.org, www.google.com/analytics, teddytrackapp.com.
You can also use non-free services such as chartbeat.com, but they cost serious money.
Not sure which version of Google Analytics you are using, but the newer versions support real-time stats. Check out http://analytics.blogspot.com/2011/09/whats-happening-on-your-site-right-now.html
Also, you could check what SiteCatalyst from Omniture (since acquired by Adobe) has to offer. It's been a while since I worked with it, but it is truly enterprise-grade and scalable.
All the best!
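To sketch the "count in memory, defer the database write" pattern the question asks about, assuming the phpredis extension (the key names, schema, and flush interval are illustrative):

```php
<?php
// Real-time counting with phpredis: increment in memory on every view,
// flush to the database on a schedule (e.g. a cron job every 30 minutes).

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// On every page view ($pageId comes from the request) — cheap, atomic,
// and no database write:
$redis->incr("views:page:$pageId");

// In the periodic flush job ($pdo and the pages table are hypothetical):
foreach ($redis->keys('views:page:*') as $key) {
    $count  = (int) $redis->getSet($key, 0);   // read and reset atomically
    $pageId = (int) substr($key, strlen('views:page:'));
    $stmt = $pdo->prepare('UPDATE pages SET views = views + ? WHERE id = ?');
    $stmt->execute([$count, $pageId]);
}
```

Real-time reads then combine the persisted total with the live Redis counter. Note that KEYS is a full scan, so a production version would track the dirty keys in a set instead.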
I have done a fair amount of reading on this and I am not quite sure what the correct way to go about this is.
I am accessing a website's API that provides information I am using on my site. On average I will be making over 400 different API requests, which means over 400 curl requests. What is the proper way to make my code pause for an amount of time and then continue? The site does not limit the number of hits, so I will not get banned for just pulling all of the stuff at once, but I would not want to be that server when 10,000 people like me do the same thing. What I am trying to do is pause my code and politely use the service they offer.
What is the best method to pause PHP execution with resource consumption in mind?
What is the most courteous number of requests per wait cycle?
What is the most courteous amount of wait per cycle?
With all of these questions in mind, I would also like to obtain the information as fast as possible while staying within the bounds above.
[sample EVE Central API response]
Thank you in advance for your time and patience.
Here's a thought: have you asked? If an API has trouble handling a high load, the provider usually includes a limit in their terms. If not, I'd recommend emailing the service provider, explaining what you want to do, and asking what they think would be a reasonable load. It's quite possible that their servers can handle any load you might reasonably want to give them, which is why they don't specify one.
If you want to do good by the service provider, don't just guess what they want. Ask, and then you'll know exactly how far you can go without upsetting the people who built the API.
For the actual mechanics of pausing, I'd use the method alex suggested (but has since deleted): PHP's usleep().
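For those mechanics, a minimal sketch of a paced fetch loop; the batch size and pause are placeholders until the provider tells you their preference, and note that usleep() takes microseconds:

```php
<?php
// Fetch ~400 API URLs politely: small bursts with a pause in between.
// usleep() suspends the process without busy-waiting, so waiting is
// nearly free in CPU terms (the script just holds some memory).

$urls = [/* ~400 API URLs */];
$batchSize   = 10;      // requests per wait cycle (assumption)
$pauseMicros = 2000000; // 2-second pause between batches (assumption)

foreach (array_chunk($urls, $batchSize) as $batch) {
    foreach ($batch as $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        $response = curl_exec($ch);
        curl_close($ch);
        // ... process $response ...
    }
    usleep($pauseMicros); // be polite between bursts
}
```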
I have an algorithm that receives input and delivers output, which I would like developers to use like an API. To prevent denial-of-service attacks and excessive overuse, I want some rate limits or protection. What options do I have? Do I provide accounts and API keys? How would that generally work? And what other ideas are possible for this scenario?
Accounts and API keys do sound like a good idea; if nothing else, they stop people other than your intended developers from being able to access your API.
It should be fairly straightforward to have a simple database table logging the last time a particular API was accessed, and denying re-use if it is accessed too many times in a certain time frame. If possible, return the next time the API will be available for re-use in the output, so developers can throttle accordingly, instead of having to go for a trial and error approach.
Are you expecting the same inputs to be used over and over again or will it be completely random? What about caching the output and only serving the cache to the developer(s) until the API is ready for re-use? This approach is far less dependent on accounts and keys too.
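A minimal sketch of that caching idea using APCu, where the cache TTL doubles as the re-use interval (computeResult() and the key scheme are hypothetical):

```php
<?php
// Serve the cached output until the API is "ready for re-use":
// the TTL is effectively the rate limit for identical inputs.

function handleRequest(string $apiKey, string $input): array
{
    $cacheKey = 'api:' . $apiKey . ':' . md5($input);
    $ttl = 5; // seconds before the same input may hit the backend again

    $cached = apcu_fetch($cacheKey, $found);
    if ($found) {
        return $cached;               // repeat input within the window
    }

    $result = computeResult($input);  // the expensive algorithm (hypothetical)
    apcu_store($cacheKey, $result, $ttl);
    return $result;
}
```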
API keys can definitely be a good way to go. There is also OAuth (http://oauth.net) for scenarios where end users will be accessing the service via apps built by third parties.
If you don't want to code the rate limits / key management yourself, it's worth taking a look at http://www.3scale.net/ which does a lot of this out of the box as a service, for free (plus other stuff including a developer portal, billing and so on). As a disclaimer, I work there, so I might have some bias, but we try to make exactly this as simple as possible!
I should add, there's a PHP plugin for 3scale which you can drop into your code and that'll enable all the rate limits etc.
Another option that is slightly less complex, at the expense of accuracy, is using the IP address (e.g. $_SERVER['REMOTE_ADDR'] in PHP). Obviously this is easier to circumvent, but for the average user who does not know what an IP address is, it works. It is also easy to set up.
It all depends on the complexity of the app and the amount of time you have to do it in.
I am working on an API for my web application written in CodeIgniter. This is my first time writing an API.
What is the best way of imposing a usage limit on the API?
Thanks for your time
Log the user's credentials (if they have to provide them) or their IP address, the request (optional), and a timestamp in a database.
Now, for every request, delete the records whose timestamp is more than an hour old, check how many requests for that user are still in the table, and if that is more than your limit, deny the request.
It's a simple solution; keep in mind, though, that there may be more performant approaches out there.
Pretty straightforward. If that doesn't answer your question, please provide more details.
I don't see how this is CodeIgniter-related, for example.
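A minimal sketch of exactly that scheme with PDO and MySQL (the table name, schema, and limit are assumptions); it also hints to the client when to retry:

```php
<?php
// Sliding one-hour window in a database table, as described above.
// Assumed schema: api_hits(user_id VARCHAR, hit_at DATETIME).

function allowRequest(PDO $pdo, string $userId, int $limit = 100): bool
{
    // 1. Drop records with a timestamp more than an hour old.
    $pdo->prepare('DELETE FROM api_hits WHERE hit_at < NOW() - INTERVAL 1 HOUR')
        ->execute();

    // 2. Count how many requests this user still has in the window.
    $stmt = $pdo->prepare('SELECT COUNT(*) FROM api_hits WHERE user_id = ?');
    $stmt->execute([$userId]);
    if ((int) $stmt->fetchColumn() >= $limit) {
        header('Retry-After: 3600'); // worst-case hint for the client
        return false;                // over the limit: deny
    }

    // 3. Record this hit and allow the request through.
    $pdo->prepare('INSERT INTO api_hits (user_id, hit_at) VALUES (?, NOW())')
        ->execute([$userId]);
    return true;
}
```

Running the DELETE on every request keeps the table small but costs a write; at higher volumes you would move that cleanup into a cron job or switch to an in-memory store.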
You can use my REST_Controller to do basically all of this for you:
http://net.tutsplus.com/tutorials/php/working-with-restful-services-in-codeigniter-2/
I recently added key logging and request limiting features, so this can all be done through config.
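If memory serves, turning those on looks roughly like this in the library's rest.php config, plus a per-method limit in the controller; treat the option names and limit semantics below as assumptions against whatever version you install:

```php
<?php
// application/config/rest.php — assumed REST_Controller option names
$config['rest_enable_keys']   = TRUE;  // require an API key on each request
$config['rest_enable_limits'] = TRUE;  // log usage and enforce limits

// In a controller extending REST_Controller:
class Api extends REST_Controller
{
    // Assumed semantics: cap index_get at 100 calls per hour per key.
    protected $methods = array('index_get' => array('limit' => 100));

    public function index_get()
    {
        $this->response(array('status' => 'ok'));
    }
}
```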
One thing you can do is consider using an external service to impose API limits and provide API management functionality in general.
For example, my company, WebServius ( http://www.webservius.com ) provides a layer that sits in front of your API and can provide per-user throttling (e.g. requests per API key per hour), API-wide throttling (e.g. total requests per hour), adaptive throttling (where throttling limits decrease as API response time increases), etc, with other features coming soon (e.g. IP-address-based throttling). It also provides a page for user registration / issuing API keys, and many other useful features.
Of course, you may also want to look at our competitors, such as Mashery or Apigee.