Vanity URLs + REST + web crawler - PHP

I have an app that uses data from several third-party APIs (Facebook, Twitter, Instagram, etc.), accessing them through REST endpoints in PHP.
I am building vanity URLs for my app's users, say http://www.myapp.com/username.
If I had a database, I could fetch the user's data from it to display on the user page.
With REST services, every time the URL is hit there is a call to the API that fetches information from the upstream websites.
This is becoming a real problem, since the app is going to receive a lot of traffic from search engine crawlers (and I would rather not lower the crawling rate).
First problem: since the API offers limited access (2,000 queries per hour), is there a way to skip the API call (for example, by using Memcached)?
Second problem: I want vanity URLs, so each time http://www.myapp.com/username is requested I have to call the API to resolve the username to a user id and back. I wonder whether this is the correct approach; most websites do it with URL rewriting, but how do you deal with it when the data lives in an external service rather than your own database?
Thanks for reading; any help is appreciated!

About using Memcached: the big problem you will encounter is validating and expiring the cached data.
Let's say you implement it like this:
function getSomeData() {
    $cache = new Memcached();
    $cache->addServer('localhost', 11211);

    // Return the cached copy if we have one.
    $data = $cache->get('key-for-data');
    if ($cache->getResultCode() === Memcached::RES_SUCCESS) {
        return $data;
    }

    // Cache miss: hit the REST API and store the result for next time.
    $data = RestApi::getData();
    $cache->set('key-for-data', $data);
    return $data;
}
This seems sensible enough, but then what happens if the REST API is accessed through any other means (like another third-party app POSTing data to the same API)? Then the cached data can be invalid and you will not know about it.
From your application's perspective, changes to the underlying data store are completely random, and furthermore totally opaque and unknowable, which makes the data a poor target for caching.
If, on the other hand, you can get some kind of "push" notification from the service whenever data is updated (i.e. a subscription service), you could use it as a trigger to invalidate the relevant cache entries. However, this is additional complexity and would need to be supported at both ends.
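As an illustration only, here is a minimal sketch of such an invalidation endpoint; the webhook.php name and the idea that the service tells you when the data changed are assumptions, since the actual payload depends entirely on the service:

<?php
// webhook.php - hypothetical endpoint the external service would call when
// data changes; the payload handling here is assumed, not a real schema.
$payload = json_decode(file_get_contents('php://input'), true);

$cache = new Memcached();
$cache->addServer('localhost', 11211);

// Drop the stale entry so the next read re-fetches from the REST API.
$cache->delete('key-for-data');
http_response_code(204);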
Sorry this isn't really an answer but it is a partial answer and it was too long for a comment :-)

Try using some kind of framework. It should make the routing much simpler.
The URL rewriting would happen via .htaccess, so the user would never see the rewritten URL.
Two ways that I can think of are:
Rewrite in the .htaccess such that all your defined routes are left untouched, and in all other cases (that is, http://www.example.com/username) the user controller method is injected between the domain and the username.
Define routes so that all your known routes are handled, and have the default route take care of figuring out the user id and doing everything else that is necessary (see the sketch after this list).
For caching, use Memcached/Redis to cache your queries, user objects, and anything else accessed frequently.
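A rough sketch of the second approach; the route names and the API lookup are hypothetical, and it assumes an .htaccess rule that rewrites every request to index.php:

<?php
// index.php - minimal front controller; every request is rewritten here.
$path = trim(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH), '/');
$first = explode('/', $path)[0];

$knownRoutes = array('login', 'about', 'api'); // your defined routes
if (in_array($first, $knownRoutes, true)) {
    // dispatch to the matching controller...
} else {
    // Default route: treat the path as a username and resolve it via the
    // external API (ideally behind a cache, as discussed above).
    $username = $path;
}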

Related

Symfony2 RESTful API + AngularJS

I've been working on an e-commerce project built on Symfony2 (for the backend) and AngularJS (for the frontend). Currently the Symfony part is used only as an API, which has three different user levels (guest, customer and admin). The different actions that can be done within the system (like adding/removing data) are secured by:
Symfony2 firewall with user roles/access control
JMS Security Extra (@PreAuthorize expressions)
For the parts that are secured, everything works as intended and I'm very happy with the way things work.
Problem:
There are parts of the API which are public (like retrieving product information, categories, etc.). I'm retrieving such data in Angular with AJAX calls to my API, which returns the data in JSON format. One example would be:
/api/product/get-all/?page=1&count=10&sorting[id]=asc
The problem is that anyone could look at the requests in the browser, copy the path, and have access to all the data (such as all the products), effectively downloading a JSON dump of all the information. Although this data is "public", I don't want to give others such an easy way of "stealing" it.
Ideas & possible solutions:
I was looking at the JWT (JSON Web Token) standard to try to secure the public calls to my API, implementing it in such a way that I generate a token for "real" users that are on the website, thus limiting direct access to the public API links.
What do you think? Would this be a possible solution?
I was also reading in some other question on Stack Overflow that I could check the HTTP_X_REQUESTED_WITH header of the request, but we all know this can be easily spoofed by an attacker.
Finally, I read about an approach similar to solution 1) here: http://engineering.talis.com/articles/elegant-api-auth-angular-js/ but I'm not entirely sure it fits my purpose.
Additional notes:
I don't want to make this bullet-proof, but I also don't want to give people the option to click two buttons and get all my data. I know that eventually all the information can be "stolen" (e.g. by using a web scraper), but "securing" the system in such a way that people would have to make a bit of an effort is what I have in mind.
I can't really re-model my API too much at this stage, but any ideas would be appreciated.
Thanks for taking the time to read my question; I'm looking forward to any feedback.
You can limit the abuse of your system in a number of ways, including:
Limit the total number of requests the API will serve before requiring a CAPTCHA or some other validation method. The limit can be applied per IP, browser fingerprint, authentication token, etc. (a minimal sketch follows this list).
Make it difficult for an abuser to guess the IDs of products, categories, etc. by using GUIDs or other randomly generated IDs.
Use an API management proxy such as Azure API Management for more enterprise-level management of your APIs (http://justazure.com/azure-api-management-part-one-introduction/).
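To illustrate the first point, here is a minimal per-IP limiter sketch; it assumes a Memcached instance on localhost, and the limit and window values are arbitrary:

<?php
// Hypothetical per-IP rate limiter backed by Memcached.
function isRateLimited($ip, $limit = 1000, $window = 3600) {
    $cache = new Memcached();
    $cache->addServer('localhost', 11211);

    $key = 'rate:' . $ip;
    if ($cache->add($key, 1, $window)) {
        return false; // first request in this window
    }
    return $cache->increment($key) > $limit;
}

if (isRateLimited($_SERVER['REMOTE_ADDR'])) {
    http_response_code(429); // time to show a CAPTCHA instead
    exit;
}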
You could try something like:
To access the site, anonymous users first need to fill in a CAPTCHA to get a temporary token.
Add a referrer check.
Limit the amount of data anonymous users can view, for instance to the first 50 products.
This way, everyone who wants to steal your data first needs to obtain an anonymous temporary token by filling in the CAPTCHA, and to spoof the referrer. A sketch of the token-issuing step follows.
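A minimal sketch of issuing that token; the session flag and the response shape are assumptions:

<?php
// Hypothetical endpoint hit after the CAPTCHA has been solved.
session_start();
if (empty($_SESSION['captcha_passed'])) {
    http_response_code(403);
    exit;
}

// Issue a short-lived anonymous token the Angular app sends on API calls.
$token = bin2hex(openssl_random_pseudo_bytes(16));
$_SESSION['api_token'] = $token; // server-side copy for later verification
echo json_encode(array('token' => $token, 'expires_in' => 3600));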
Try DunglasAngularCsrfBundle.

HTTP overhead in API-centric PHP application

I am reorganizing an existing PHP application to separate data access (private API calls) from the application itself.
The purpose of doing this is to allow for another application on the intranet to access the same data without duplicating the code to run the queries and such. I am also planning to make it easier for developers to write code for the current web application, while only a few team members would be adding features to the API.
Currently the application has a structure like this (this is only one of many pages):
GET /notes.php - gets the page for the user to view notes (main UI page)
GET /notes.php?page=view&id=6 - get the contents of note 6
POST /notes.php?page=create - create a note
POST /notes.php?page=delete - delete a note
POST /notes.php?page=append - append to a note
The reorganized application will have a structure like this:
GET /notes.php
Internal GET /api/notes/6
Internal POST /api/notes
Internal DELETE /api/notes/6
Internal PUT /api/notes (or perhaps PATCH, depending on whether a full representation will be sent)
In the web application I was thinking of making HTTP requests to URLs under https://localhost/api/, but that seems really expensive. Here is some code to elaborate on what I mean:
// GET notes.php
switch ($_GET['page']) {
    case 'view':
        $data = \Requests::get(
            "https://localhost/api/notes/{$_GET['id']}",
            array(),
            array('auth' => ... )
        );
        // do things with $data if necessary and send back to browser
        break;
    case 'create':
        $response = \Requests::post( ... );
        if ($response->status_code === 201) {
            // things
        }
        break;
    // etc...
}
I read this discussion and one of the members posted:
Too much overhead, do not use the network for internal communications. Instead use much more readily available means of communication between different processes or what have you. This depends on the system it's running on, of course... Now you can mimic REST if you like, but do not use HTTP or the network for internal stuff. That's like throwing a whale into a mini toilet.
Can someone explain how I can achieve this? Both the web application and API are on the same server (at least for now).
Or is the HTTP overhead aspect just something of negligible concern?
Making HTTP requests directly from the JavaScript/browser to the API is not an option at the moment due to security restrictions.
I've also looked at the two answers to this question, but it would be nice if someone could elaborate on them.
The HTTP overhead will be significant, as every call has to go through the full request/response cycle. This includes HTTP server overhead, a separate PHP process, the OS networking layer, etc. Whether that is negligible or not really depends on the type of your application, traffic, infrastructure, response time requirements, etc.
In order to suggest a better solution, one would need to see your reasoning for considering this approach in the first place. Factors to consider also include the current application architecture, requirements, frameworks used, etc.
If security is your primary concern, this is not necessarily a good way to go in the first place, as you will now need to store some session-related data in yet another layer.
Also, despite the additional overhead, the final application could potentially perform faster given the right caching mechanisms. It really depends on your final solution.
I am building a similar application framework and had the same problem, so I settled on the following design:
For processes that are located remotely (on a different machine), I use cURL or similar calls to the remote resource. For example, if users are stored on a different server, then to get a user's status I call API->Execute('https://remote.com/user/currentStatus/getid/6') and it returns the status.
For local calls, say when Events requires Alerts (two separate packages with their own data models, but on the same machine), I make a local API-like call, something like this:
API->Execute(array('Alerts', $param1, $param2));
API->Execute then knows that this is a local object: it resolves the object's local physical path, initializes it, passes the data in, and returns the results into the calling context. No remote execution, and no protocol overhead.
For example, if you want to keep an encryption service, with its keys and whatnot, away from the rest of the applications, you can send data securely and get back the encrypted value; that service is then always called over the remote API (https://encryptionservice.com/encrypt/this/value).
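A rough sketch of what such an Execute dispatcher could look like; the class layout and the run() entry point are assumptions, not the poster's actual code:

<?php
// Hypothetical dispatcher: arrays mean local packages, strings mean remote URLs.
class Api {
    public function execute($target) {
        if (is_array($target)) {
            // Local call: first element names the package, the rest are arguments.
            $class = array_shift($target);
            $object = new $class();       // assumes the class is autoloadable
            return $object->run($target); // hypothetical entry point
        }
        // Remote call over HTTP, with all the protocol overhead that implies.
        $ch = curl_init($target);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $result = curl_exec($ch);
        curl_close($ch);
        return $result;
    }
}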

Setting up a RESTful web service

I'm just getting into REST and have started building my first app following this design model. From what I can gather, the idea is to build your service like an API which your website itself is a consumer of.
This makes sense to me since my web app does a lot of AJAX calls; however, it seems a little wasteful to authenticate each request just to avoid using sessions. Is this just something I have to accept as part of the REST design process?
Also, making AJAX calls works fine, but say I need to just show a view of a user's profile: does this now mean I also need to make a cURL call to my own API to pull that data? At this point I know I'm working internally, so is authentication even required?
Some remarks:
While you could set up your whole application to have a REST interface, you should set it up so that it can still be called internally. Calling it over HTTP and getting results back over HTTP is only input processing and output rendering. So, if you separate those concerns, you get a flow: input processing -> method call -> data return -> data rendering. Shaving off the first and last bits, what do you have left? A function call that returns data, which you can just use in your code. Separating the functionality that translates an 'outside' call into an 'internal' one, and that renders 'internal' data into 'external' formats (XML, JSON, HTML, whatever you desire), makes your app efficient and still fully REST capable.
Authentication is needed if you allow outside calls; even if you don't 'tell' other users that data can be retrieved a certain way, it is still easily discoverable. I don't know why you wouldn't want to use sessions for this authentication (which most likely happens in the aforementioned translation from an 'outside' call to an internal one). I would not make 'not using sessions' a requirement, and there is no reason you couldn't allow several methods of authentication (sessions, re-authentication on every request, tokens, etc.).
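A minimal sketch of that separation; the function name and its data are placeholders:

<?php
// Core function: knows nothing about HTTP; it just returns data.
function getUserProfile($userId) {
    // ... fetch from the database or another internal source ...
    return array('id' => $userId, 'name' => 'example');
}

// Internal use: a plain function call, no HTTP round trip.
$profile = getUserProfile(6);

// External (REST) use: a thin wrapper does the input processing and
// the output rendering around the very same function.
header('Content-Type: application/json');
echo json_encode(getUserProfile((int) $_GET['id']));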
Typically I prefer to produce an interface which can be called using standard PHP, and then add an interface on top of it which adds authentication and RESTful access. So you can access, for example:
http://example/api/fetchAllFromUsers?auth-key=XXXXX
Which translates to:
$internalInterface = new Api();
$internalInterface->fetchAllFromUsers();
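A sketch of that outer translation layer with authentication bolted on; the key storage (an environment variable here) and the method whitelist are assumptions:

<?php
// Hypothetical thin REST front for the internal Api class shown above.
$method = basename(parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH));
$allowed = array('fetchAllFromUsers'); // whitelist of exposed methods

$key = isset($_GET['auth-key']) ? $_GET['auth-key'] : '';
if ($key !== getenv('API_AUTH_KEY') || !in_array($method, $allowed, true)) {
    http_response_code(403);
    exit;
}

$internalInterface = new Api();
header('Content-Type: application/json');
echo json_encode($internalInterface->$method());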
Instead of authenticating each time, save a chunk of state (in, e.g., a cookie) that identifies your session, and use that. It then becomes either a parameter to a GET (using the ?name=value syntax) or part of the URI itself, e.g.
http://example.com/application/account/ACCTNO/TOKEN
where ACCTNO and TOKEN identify the account and the authenticated session respectively.
This may seem a little flaky at first, but it means that your application, as it grows larger, never needs complicated load balancing with session state and so on; a simple proxy scheme works fine. This reduces the architectural complexity by a staggering amount.

Do I really need to use nonces?

I'm currently developing an app for iOS devices. This app downloads data from a WordPress blog, but fetches a nonce token first. This has been tested and takes about 2-3 seconds, which is a lot considering it's a mobile device that should have the data ready within a few seconds. On top of this, the data itself has to be downloaded as well, which takes another 4-5 seconds.
The data-fetching method already takes several security measures, for example a secret string that needs to match on both the web server and the device (encrypted, of course), some simple UDID validation, plus some header and user-agent tests. Is this enough, or do I really need the nonces? It's not like there is any sensitive data being passed through, and if there was, I'd of course encrypt it further.
Is it really necessary for me to use nonces?
Thank you.
If you are downloading public data, there's no need for the nonce authentication stuff.
If you are going to be modifying data on the server, or fetching data that is not public or otherwise has some kind of access control around it, then you'll need whatever mechanism Wordpress requires to gain access (which it sounds like is a nonce-based token approach).
If it's taking a few seconds to get that token, how about fetching it on app startup/resume in the background?

Securely serving up data via API to app and the residing site

I'm not quite sure if an API is the way to go with this, so a little background.
I have been building a back end which has a very useful set of data and tools for someone to run a site. The front end also uses the same data to show to customers, as one would expect. A mobile app could probably be added in the near future to enable changes to be made to the site via the app. But the back end can potentially go onto any website like a standard script (i.e. it is not centrally stored, nor does any data go back and forth between the client and us).
So I thought that the best way around this would be to make an API for the site. Naturally, for an app to access the API it would need a key to authenticate with (which the end user can set via their back end). However, I would also like the back and front ends to use the API to access the same data, so nothing needs to be written twice.
I'm sure it is clear that APIs are a new thing to me, which they are. But, I am trying to improve and adapt my coding to be more efficient.
I thought perhaps the API could do some checks on the origin of a query to see whether it is a local request (back/front end) or comes from an app (which uses a key plus user authentication). So how would one go about ensuring that the back and front ends can securely access the API while nobody can get in by spoofing? I imagine the checks could be along the lines of the requesting URL, but I am worried that this, or anything else that could be checked, might be spoofed. What is the best way to allow local access? Is there anything that can't be spoofed?
I know I could write a key into the code, but since the code is distributed I don't want this access key to be public; nor do I want to manually change the key for each site, and nor do I really want the end user to enter some random letters and numbers during setup.
You should use public/private key pairs. Your front/back ends, mobile versions, or even third-party developers then use their keys to authenticate with the API.
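One common way to apply that idea is request signing: each consumer gets a key pair, signs every request with its private key, and the server recomputes the signature to verify it. A minimal sketch, where the field names and the signing scheme are assumptions:

<?php
// Client side: sign the request with the consumer's private key.
function signRequest($publicKey, $privateKey, $payload) {
    $timestamp = time();
    $signature = hash_hmac('sha256', $publicKey . $timestamp . $payload, $privateKey);
    return array('key' => $publicKey, 'ts' => $timestamp, 'sig' => $signature);
}

// Server side: look up the private key for the supplied public key,
// recompute the signature, and compare in constant time.
function verifyRequest(array $auth, $payload, $privateKey) {
    $expected = hash_hmac('sha256', $auth['key'] . $auth['ts'] . $payload, $privateKey);
    return hash_equals($expected, $auth['sig']);
}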
