Best Practices For Secure APIs? - php

Let's say I have a website that has a lot of information on our products. I'd like some of our customers (including us!) to be able to look up our products through various methods, including:
1) Pulling data from AJAX calls that return data in cool, JavaScripty ways;
2) Creating iPhone applications that use that data;
3) Having other web applications use that data for their own end.
Normally, I'd just create an API and be done with it. However, this data is in fact mildly confidential - which is to say that we don't want our competitors to be able to look up all our products every morning and then automatically set their prices to undercut us. And we also want to be able to look at who might be abusing the system, so if someone's making ten million complex calls to our API a day and bogging down our server, we can cut them off.
My next logical step would be then to create a developers' key to restrict access - which would work fine for web apps, but not so much for any AJAX calls. (As I see it, they'd need to provide the key in the JavaScript, which is in plaintext and easily seen, and hence there's actually no security at all. Particularly if we'd be using our own developers' keys on our site to make these AJAX calls.)
So my question: after looking around at OAuth and OpenID for some time, I'm not sure there is a solution that would handle all three of the above. Is there some sort of canonical "best practices" for developers' keys, can OAuth and OpenID handle AJAX calls easily in some fashion I have yet to grok, or am I missing something entirely?

I think that 2-legged OAuth is what you want to satisfy #2 and #3. For #1, I would suggest that instead of the customer making JS requests directly against your application, they proxy those requests through their own web application.

A midway solution is to require an API key, and then insist that whoever uses it doesn't call it directly from the AJAX, but instead wraps their calls in a server-side request, e.g.:
AJAX -> customer server -> your server -> customer server -> user
Creating a simple PHP API for interested parties shouldn't be too tricky, and your own iPhone applications would obviously cut out the middle man, shipping with their own API key.
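As a rough sketch of that customer-side proxy: the browser calls the customer's own server, which adds the key and forwards the request, so the key never appears in the JavaScript. The X-Api-Key header and endpoint URL below are placeholders:

    <?php
    // proxy.php on the customer's server; the browser hits this via AJAX,
    // so the API key never leaves the server.
    $apiKey   = getenv('PRODUCTS_API_KEY');            // hypothetical key storage
    $endpoint = 'https://api.example.com/products';    // placeholder endpoint

    // Forward only a whitelisted set of query parameters.
    $allowed = ['page', 'count', 'category'];
    $params  = array_intersect_key($_GET, array_flip($allowed));

    $ch = curl_init($endpoint . '?' . http_build_query($params));
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER     => ['X-Api-Key: ' . $apiKey],
        CURLOPT_TIMEOUT        => 10,
    ]);
    $response = curl_exec($ch);
    curl_close($ch);

    header('Content-Type: application/json');
    echo $response !== false ? $response : json_encode(['error' => 'upstream unavailable']);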

OAuth and OpenID are unlikely to have much to do with the AJAX calls directly. Most likely, you'll have some sort of authorization filter in front of your AJAX handler that checks a cookie, and maybe that cookie is set as a result of an OpenID authentication.
It seems like this is coming down to a question of "how do I prevent screen scraping." If only logged-in customers get to see the prices, that's one thing, but assuming you're like most retail sites and your barrier to customer sign-up is as low as possible, that doesn't really help.
And, hey, if your prices aren't available, you don't get to show up in search engines like Froogle or Nextag or PriceGrabber. But that's more of a business strategy decision, not a programming one.

Related

How can I handle thousands of requests per second using PHP and MySQL?

I would like to implement an API using PHP and MySQL that can handle several thousand requests per second.
I haven't built this kind of API before. If you have experience implementing a similar task, could you please tell me what the steps are?
How can I implement such an API that can handle thousands of requests per second?
I would be glad if you could explain with sample codes.
Thanks in advance for your help.
Based on the details described in the post, you likely want an asynchronous, stateless architecture, so requests don't block resources and can scale more easily (that always sounds easier than it actually is ;)).
Without knowing what other services these servers would connect to (which certainly doesn't make things easier), I'd go for Elixir/Erlang as the programming language and use Phoenix as a framework.
You get a robust functional language which comes with a lot of great built-in features/modules (e.g. Mnesia, hot upgrades while live) and scales well (good at utilizing all cores of your server).
If you need to queue up requests to second-tier servers, an AMQP broker (e.g. RabbitMQ) might be a good option (it holds/stores the requests for the servers).
That works pretty well if it's stateless: the client asks for one thing and the server responds once and is done with the task. If you have many requests because clients ask for updates every single second, then you're better off switching to a stateful connection and using WebSockets, so the server can push updates out to many clients and cut down a lot of the chatter.
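If you'd rather stay in PHP for the push side, Ratchet is one commonly used WebSocket library; below is a rough sketch of a broadcast server (the class name and port are made up), not a drop-in implementation:

    <?php
    // Rough sketch of a Ratchet WebSocket broadcast server
    // (composer require cboden/ratchet).
    use Ratchet\MessageComponentInterface;
    use Ratchet\ConnectionInterface;
    use Ratchet\Server\IoServer;
    use Ratchet\Http\HttpServer;
    use Ratchet\WebSocket\WsServer;

    class UpdateFeed implements MessageComponentInterface
    {
        protected $clients;

        public function __construct()
        {
            $this->clients = new \SplObjectStorage();
        }

        public function onOpen(ConnectionInterface $conn)
        {
            $this->clients->attach($conn);
        }

        public function onMessage(ConnectionInterface $from, $msg)
        {
            // Push each incoming update to every connected client.
            foreach ($this->clients as $client) {
                $client->send($msg);
            }
        }

        public function onClose(ConnectionInterface $conn)
        {
            $this->clients->detach($conn);
        }

        public function onError(ConnectionInterface $conn, \Exception $e)
        {
            $conn->close();
        }
    }

    IoServer::factory(new HttpServer(new WsServer(new UpdateFeed())), 8080)->run();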
All of this is from a high-level view. In the end, it depends on what kind of services you want to provide, as that narrows down what the suitable tool would be. My suggestion is one possibility which I think isn't far off (Node.js, mentioned before, is also more than valid).
Well, you need to consider several factors, such as:
Authenticating the API. Your API should only be callable by users who are authenticated and authorized.
Caching API results. Your API should cache the results of calls so it can respond more quickly and handle more requests per second. Memcached can be used to cache results (see the sketch after this list).
The API architecture. RESTful APIs have less overhead than SOAP-based APIs. SOAP-based APIs have better support for authentication, and they are also more strictly structured than RESTful APIs.
API documentation. Your API should be well documented and easy for users to understand.
API scope. Your API should have a well-defined scope. For example, will it be used over the internet as a public API, or inside a corporate intranet as a private API?
Device support. When designing your API you should keep in mind the devices that will consume it: for example smartphones, desktop applications, browser-based applications, server applications, etc.
API output format. When designing your API you should keep in mind the format of the output. For example, will the output contain user-interface-related data or just plain data? One popular approach is known as separation of concerns (https://en.wikipedia.org/wiki/Separation_of_concerns): for example, separating the backend and frontend logic.
Rate limiting and throttling. Your API should implement rate limiting and throttling to prevent overuse and misuse of the API.
API versioning and backward compatibility. Your API should be carefully versioned. For example, if you update your API, the new version should still support older versions of API clients, and should continue to do so until all the API clients have migrated to the new version.
API pricing and monitoring. The usage of your API should be monitored, so you know who is using your API and how it is being used. You may also charge users for using your API.
Metric for success. You should also decide which metric to use for measuring the success of your API: for example, the number of API calls per second, or the monetary earnings from your API. Development activities such as research, publication of articles, open-source code, participation in online forums, etc. may also be considered when determining the success of your API.
Estimation of cost involved. You should also calculate the cost of developing and deploying your API. For example, how much time will it take you to produce a usable version, and how much of your development time does the API take?
Updating your API. You should also decide how often to update your API. For example how often should new features be added. You should also keep in mind the backward compatibility of your API, so updating your API should not negatively affect your clients.
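As a sketch of the caching point above (see "Caching API results"), assuming Memcached and a hypothetical fetchProductsFromDatabase() helper:

    <?php
    // Cache an expensive API result in Memcached for 60 seconds.
    $memcached = new Memcached();
    $memcached->addServer('127.0.0.1', 11211);

    function getProducts(Memcached $memcached, int $page): array
    {
        $key = 'products_page_' . $page;

        $cached = $memcached->get($key);
        if ($cached !== false) {
            return $cached; // cache hit: skip the database entirely
        }

        $products = fetchProductsFromDatabase($page); // hypothetical DB query
        $memcached->set($key, $products, 60);
        return $products;
    }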
Good answer, I think one thing to keep in mind is where the bottleneck is. Many times, the bottleneck isn't the API server itself but the data access patterns with the persistence layer.
Think about how you access your data. When posting new items, a lot of the processing can often be deferred and handled asynchronously from the original request. For example, if resizing an image or sending an email, you can integrate RabbitMQ or SQS to queue up a job, which can be processed later by workers. Queues are great at buffering work so that if a server goes down, work just queues up to be processed once it is back online.
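For example, with the php-amqplib client, queuing a resize job instead of doing it inline might look roughly like this (queue name and payload are illustrative):

    <?php
    // Publish a background job to RabbitMQ instead of processing inline
    // (composer require php-amqplib/php-amqplib).
    use PhpAmqpLib\Connection\AMQPStreamConnection;
    use PhpAmqpLib\Message\AMQPMessage;

    $connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
    $channel    = $connection->channel();

    // Durable queue: pending jobs survive a broker restart.
    $channel->queue_declare('image_resize', false, true, false, false);

    $job = json_encode(['image_id' => 42, 'width' => 800, 'height' => 600]);
    $msg = new AMQPMessage($job, ['delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT]);
    $channel->basic_publish($msg, '', 'image_resize');

    $channel->close();
    $connection->close();

A worker process then consumes the queue at its own pace, so a traffic spike only grows the queue instead of overloading the web tier.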
On the query side, it is important to understand how indexing works and how data is stored. There are different types of indexes; for example, hash tables give you constant access time, but you cannot perform range queries with them. The easiest case is simple denormalized data objects queried by identifiers that can be stored in an index. If your data is more complex and you need to do heavy joins or aggregations, then you can look at precomputed values stored in something like Redis or Memcached.

Symfony2 RESTful API + AngularJS

I've been working on an e-commerce project built on Symfony2 (for the backend) and AngularJS for the frontend. Currently the Symfony part is used only as an API, which has three different user levels (guest, customer & admin). Different actions that can be done within the system (like add/remove data) are secured by:
Symfony2 firewall with user roles/access control
JMSSecurityExtraBundle (@PreAuthorize expressions)
For the parts that are secure everything works as intended and I'm very happy with the way things work.
Problem:
There are parts of the API which are public (like retrieving product information, categories, etc.). I'm retrieving such data in Angular with Ajax calls to my API that returns the data in JSON format. One example would be:
/api/product/get-all/?page=1&count=10&sorting[id]=asc
The problem is that anyone could look at the requests in browser and copy the path and have access to all the data (such as all the products) and could just download a JSON of all the information. Although this data is "public", I don't want to give others such an easy way of "stealing" my data.
Ideas & possible solutions:
I was looking at the JWT (JSON Web Token) standard to try to secure the public calls to my API, and implement it in such a way that I generate a token for "real" users that are on the website, thus limiting direct access to the public API links.
What do you think? Would this be a possible solution?
I was also reading in another question on Stack Overflow that I could check the HTTP_X_REQUESTED_WITH header of the request, but we all know this can be easily spoofed by an attacker.
Finally, I read about an approach similar to "solution" 1) here: http://engineering.talis.com/articles/elegant-api-auth-angular-js/ but I'm not entirely sure it fits my purpose.
Additional notes:
I don't want to make this bullet-proof, but I also don't want to give people the option to click 2 buttons and get all my data. I know that eventually all the information can be "stolen" (e.g. by using a web scraper), but "securing" the system in such a way that people would have to make a bit of an effort is what I have in mind.
I can't really re-model my API too much at this stage, but any ideas would be appreciated.
Thanks for taking the time to read my question and I'm looking forward to any feedback.
You can limit the abuse of your system in a number of ways, including:
Limit the total number of requests that the API will return before requiring CAPTCHA or some other validation method. This can be limited by IP, browser fingerprint, authentication token, etc.
Make it difficult for an abuser to guess IDs of products, categories, etc. by using GUIDs or other randomly generated IDs.
Use an API management proxy such as Azure API Management for more enterprise-level management of the APIs (http://justazure.com/azure-api-management-part-one-introduction/)
You could try something like:
To access the site, anonymous users first need to fill in a CAPTCHA to get a temporary token.
Add a referrer check.
Limit the amount of data anonymous users can view, for instance to the first 50 products.
This way, everyone who wants to steal your data first needs to get an anonymous temporary token by filling in the CAPTCHA, and spoof the referrer.
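A rough sketch of that flow, with made-up helper names: once the CAPTCHA is solved, hand out a short-lived token, and have the public endpoints require it:

    <?php
    // issue_token.php: called after the CAPTCHA response has been verified.
    session_start();

    if (captchaIsValid($_POST['captcha'] ?? '')) {        // hypothetical CAPTCHA check
        $_SESSION['api_token']         = bin2hex(random_bytes(16));
        $_SESSION['api_token_expires'] = time() + 900;    // valid for 15 minutes
        echo json_encode(['token' => $_SESSION['api_token']]);
    }

    // In each public API endpoint:
    function requireAnonymousToken(): void
    {
        // assumes session_start() has already run for this request
        $sent = $_GET['token'] ?? '';
        if (empty($_SESSION['api_token'])
            || !hash_equals($_SESSION['api_token'], $sent)
            || time() > $_SESSION['api_token_expires']) {
            http_response_code(403);
            exit(json_encode(['error' => 'token required']));
        }
    }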
Try DunglasAngularCsrfBundle.

Securing flash and php (AMF) communication

I am currently building a Flex 4 web app using PHP as my backend. I am using AMF to let the backend and flex application talk to each other.
How can I protect my AMF endpoint? Users can just decompile my Flex application, find the URI of my endpoint and call its methods. I need to ensure that all calls to the endpoint are made from within my application.
I would like to prevent something like this from happening: http://musicmachinery.com/2009/04/15/inside-the-precision-hack/
What are the best ways to achieve that?
Thanks :)
URLs aren't important. They're very easy to find out from any web application, and yet you still need them to be publicly accessible. There are a few things to do: first, if you're interested in the security of the data itself, you'll probably want your server running over HTTPS instead of HTTP. If data security isn't crucial, however (and it often isn't), you just need a quick and dirty authentication system.
I'm sure you can find many articles online, or even frameworks, for authentication in PHP. In the past, when I needed very simple authentication, I would have my client send a username and SHA1-hashed password to an open authentication function in PHP, which would then create, store, and return a session ID. That session ID would then be the first parameter of all the other PHP functions. Those functions would check the DB to see if the session ID is there and still valid (within 15 minutes of the last time it was used) and, if it is, go ahead with the function.
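A sketch of that scheme (table and column names invented for illustration; plain SHA1 hashing is simply what is described above, and today you would reach for password_hash() instead):

    <?php
    // Open authentication function: the Flex client sends username + sha1(password).
    function authenticate(PDO $db, string $username, string $passwordSha1): ?string
    {
        $stmt = $db->prepare('SELECT id FROM users WHERE username = ? AND password_sha1 = ?');
        $stmt->execute([$username, $passwordSha1]);
        if (!$stmt->fetch()) {
            return null; // bad credentials
        }

        $sessionId = bin2hex(random_bytes(16));
        $db->prepare('INSERT INTO sessions (id, last_used) VALUES (?, NOW())')
           ->execute([$sessionId]);
        return $sessionId;
    }

    // Every other service method takes the session ID as its first parameter.
    function checkSession(PDO $db, string $sessionId): bool
    {
        $stmt = $db->prepare(
            'SELECT 1 FROM sessions WHERE id = ? AND last_used > NOW() - INTERVAL 15 MINUTE'
        );
        $stmt->execute([$sessionId]);
        if (!$stmt->fetch()) {
            return false; // unknown or expired session
        }
        // Refresh the sliding 15-minute window.
        $db->prepare('UPDATE sessions SET last_used = NOW() WHERE id = ?')
           ->execute([$sessionId]);
        return true;
    }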
This is just a very simplistic way of doing things and will be good for a lot of small websites. If you need more security, send all of this over HTTPS to prevent sniffers from grabbing the session ID off the wire. After that, you're getting into enterprise security, which is probably overkill for what you want to do and will cost you an arm, a leg and your left testicle :P

Real time activity feed - code / platform implementation?

I am writing out the specs for a live activity feed on my website. I have the backend data model done, but the open question is the actual code development, where my development team is unsure of the best way to make the feeds work. Is this purely done by writing custom code, or do we need to use existing frameworks to make the feeds work in real time? One suggestion thrown at me was to use reverse AJAX for this. Someone mentioned having the client poll the server every x seconds, but I don't like this because it creates unwanted server traffic when there are no updates. A push engine like Lightstreamer, which pushes from server to browser, was also mentioned.
So in the end: what is the way to go? Is it custom code, purely pushing SQL queries, using frameworks, using platforms, etc.?
My platform is written in PHP (CodeIgniter) and the DB is MySQL.
The activity stream will have lots of activities. There are 42 components on the social network I am developing, and each component has approximately 30 unique activities that can be streamed.
Check out http://www.stream-hub.com/
I have been using superfeedr.com with Rails and I can tell you it works really well. Here are a few facts about it:
Pros
Julien, the lead developer, is very helpful when you encounter a problem.
Immediate push of new feed entries for feeds which support PubSubHubbub.
JSON responses, which are easy to parse however you'd like.
Retrieve API in case the update callback fails and you need to retrieve the latest entries for a given feed.
Cons
Documentation is not up to the standards I would like, so you'll likely end up searching the web to find obscure implementation details.
You can't control how often Superfeedr fetches each feed; they use a secret algorithm to determine that.
The web interface allows you to manage your feeds but becomes difficult to use when you subscribe to a lot of them.
The subscription verification mechanism works synchronously, so you need to make sure the object URL is ready for the Superfeedr callback to hit it (they do provide an async option, which does not seem to work well).
Overall I would recommend superfeedr as a good solution for what you need.

ebay like API interface for a web portal

We have developed a B2B web portal for graphics job work, similar to Camera Ready Art (www.camerareadyart.com). It is targeted at people wanting to convert bitmaps to vector graphics, design logos, and do general image processing like coloring B/W images.
We want to add a facility so that people (our clients) can use a set of APIs that we provide to post their work directly from their own sites, without literally having to visit our site to post it.
I have never done anything like this before, so I have no idea how to implement something like this. I also want to know how we can implement security so that only those who are authorized can post their work.
Can anyone give me ideas as to how we can do something like this?
This question covers a very large area and I doubt any single answer could cover matters in detail. What I can do is offer some starting points based on the mistakes I have made.
Build on top of your own API
Don't bolt API features onto an existing system. Doing so will:
lead to additional testing load (you'll have to test both your app and the API independently)
result in an increase in overall maintenance costs
result in a poorer quality API than what you want to offer
Your overall goal should be to build the API first and then build your app on top of your own API. Doing so has the following benefits:
testing of the API is inherently performed whilst testing your app
you won't 'forget' to add any required API methods
Your app and your application logic (the API) will be logically separated - there will be a clear separation between them in terms of what each side of the equation does and what it is responsible for. This will help guide development. It will also allow you to very easily put the app and the API on different machines as and when needed.
Using your own API is a very important point. The design of your API will initially be sub-optimal and only through using it yourself will you be able to make it offer to people the features that are actually needed in a way that is efficient.
You will end up with a system that roughly looks like this:
------------- -------------
| | | |
| Your APP | <= HTTP communication => | Your API |
| | | |
------------- -------------
This highlights some further benefits: you can replace 'Your APP' with any other app, allowing customers of yours to create apps to deal with things in ways that work with them best. You can also create new versions of your app on top of the existing API - moving to a new version of your public web site can be much easier.
Designing your URLs: mapping to classes and methods
Choosing sensible URLs is as much of a problem as choosing sensible class and method names. Deriving URLs from classes and their methods is a good approach. If there is no sensible correlation between URLs and classes/methods, you will find things harder to maintain in the long run.
I personally prefer to associate URLs to classes and methods in the following ways:
map classes to top-level directories
map methods to sub-directories of the top-level directories
Example:
The URL of your API is https://api.camerareadyart.com.
You have an image object with toColour() and toBlackAndWhite() methods.
This may map to:
https://api.camerareadyart.com/image/toColour/
https://api.camerareadyart.com/image/toBlackAndWhite/
Similarly for bitmap to vector conversion:
https://api.camerareadyart.com/bitmap/toVector/
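A minimal front-controller sketch of that mapping (the whitelist and the Image/Bitmap classes are hypothetical):

    <?php
    // index.php: map /image/toColour/ to Image->toColour(), and so on.
    $path     = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);
    $segments = array_values(array_filter(explode('/', $path)));

    $class  = ucfirst($segments[0] ?? '');   // "image" -> "Image"
    $method = $segments[1] ?? '';            // "toColour"

    // Whitelist, so arbitrary classes/methods can't be invoked.
    $allowed = [
        'Image'  => ['toColour', 'toBlackAndWhite'],
        'Bitmap' => ['toVector'],
    ];

    if (isset($allowed[$class]) && in_array($method, $allowed[$class], true)) {
        $controller = new $class();
        echo json_encode($controller->$method($_REQUEST));
    } else {
        http_response_code(404);
        echo json_encode(['error' => 'unknown resource or method']);
    }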
Designing responses
When someone GETs data from, or POSTs data to, one of your URLs, what happens? How are errors handled, how are exceptions dealt with? What form do responses take?
I can't tell you what to do here. Personally I prefer to map things as closely to HTTP as possible and then only go beyond this when needed.
For example, if an incoming request is accepted and processed but runs into an error internally, I would issue a 500 status response. Likewise, if a given API method requires authentication that has not been provided, I might issue a 401. Taking advantage of existing HTTP features prevents you from having to re-invent certain things.
Use existing aspects of HTTP
As well as using HTTP status codes sensibly, make sure to look around for an HTTP-only method for doing something before rolling your own solution.
Want the user to specify whether the response format should be XML or JSON? Use the HTTP Accept header.
Want to redirect a client to a different URL to grab the result of a request? Use the HTTP Location header.
There are many features to HTTP that already handle many things you might want to do. Use them!
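For instance, honoring the Accept header could look like this sketch (toXml() is a hypothetical serializer you would supply):

    <?php
    // Pick the response format from the HTTP Accept header.
    $accept = $_SERVER['HTTP_ACCEPT'] ?? 'application/json';
    $data   = ['id' => 1, 'name' => 'example'];

    if (strpos($accept, 'application/xml') !== false) {
        header('Content-Type: application/xml');
        echo toXml($data);           // hypothetical XML serializer
    } else {
        header('Content-Type: application/json');
        echo json_encode($data);
    }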
Security
There are two general problems to tackle here: authenticating the user, and determining what actions a given user can perform.
Security: authentication
The user will need to specify in their request who they are.
The first solution to spring to mind is to allow the user to specify a username and password, possibly the same as the username and password they use to access your app. This seems on the surface to be a good idea but it is not ideal.
Users will end up baking their username and password into their own apps. Inevitably one user will forget their password and will change it so that they can happily access your app, breaking their own app in the process.
A better choice would be for the user to supply an authentication token, which is essentially a single value unique to a user much like a username and password rolled into one.
This allows you to logically separate a username and password from access to the API. A user can change their username and/or password for your app as often as they like without breaking their access to the API.
A user can also have multiple API tokens, each with different levels of access, allowing a user to safely give out an API token to a third-party service.
Security: access control
As far as the outside world is concerned, your API is a set of URLs. Each URL is, by definition, unique and performs a unique task. Basing your access control mechanisms around these concepts is a good starting point.
I prefer to keep a list, per token, of the URLs that token is permitted to access. When a given token is used to access a URL it is trivial to tell which URL is being accessed and whether it is in the token's list of allowed URLs.
If you choose a set of URLs wisely, where each URL performs one unique action, this process provides you with about the finest level of access control you're going to get.
To give a finer level of control you may also want to specify, per URL that a token is allowed to access, what query arguments they are allowed to use.
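One way to sketch that per-token allowlist check, with an invented token_permissions table:

    <?php
    // Check whether the supplied token may access the requested URL.
    // $db is your PDO connection.
    function tokenMayAccess(PDO $db, string $token, string $path): bool
    {
        $stmt = $db->prepare('SELECT 1 FROM token_permissions WHERE token = ? AND url_path = ?');
        $stmt->execute([$token, $path]);
        return (bool) $stmt->fetch();
    }

    $token = $_SERVER['HTTP_X_API_TOKEN'] ?? '';   // hypothetical header name
    $path  = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

    if (!tokenMayAccess($db, $token, $path)) {
        http_response_code(403);
        exit(json_encode(['error' => 'token not permitted for this URL']));
    }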
You obviously need to have your backend webservices designed and working. However, all additional features (security, throttling, OAuth key management, subscriber portal, interactive console to try the APIs, etc.) are a fairly standard set of features that you probably should not be developing yourself.
There are commercial API Management solutions on the market. I work for WSO2, which has a 100% open-source (Apache License) WSO2 API Manager, that you can download for free here or use as a cloud hosted version in WSO2 API Cloud.
