I have a website where most of the traffic comes from the API (http://untiny.com/api/). I use Google Analytics to collect traffic data; however, the statistics do not include the API traffic, because I can't add the Google Analytics JavaScript code to the API pages: including it would alter the API results (example: http://untiny.com/api/1.0/extract/?url=tinyurl.com/123).
The solution might be to execute the JavaScript with a JavaScript engine. I searched Stack Overflow and found JavaScript engines/interpreters for Java and C, but I couldn't find one for PHP except an old one, "J4P5": http://j4p5.sourceforge.net/index.php
The question: will using a JavaScript engine solve the problem? Or is there another way to include the API traffic in Google Analytics?
A simple problem with this in general is that any data you get could be very misleading.
A lot of the time it is probably other servers making calls to your server. When that is the case, the location of the server in no way represents the location of the people using it, the user agent will be fake, and you can't tell how many different individuals are actually using the service. There are no referrers, and if there are, they're probably fake. Not many stats are useful in this case.
Perhaps make a PHP back end that logs the IP and other header information; that's really all you can do. You'll at least be able to track total calls to the API and where they're made from (again, probably from servers, but you can tell which servers).
I spent ages researching this and finally found an open source project that seems perfect, though totally under the radar.
http://code.google.com/p/serversidegoogleanalytics/
Will report back on results.
You would likely have to emulate all the HTTP calls on the server side with whatever programming language you are using. This will not give you information on who is using it, though, unless untiny is providing client info through some kind of header.
If you want to include it purely for statistical purposes, you could try using curl (if using PHP) to request the gif file when you detect untiny on the server side:
http://code.google.com/apis/analytics/docs/tracking/gaTrackingTroubleshooting.html#gifParameters
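A rough sketch of what such a server-side gif request could look like, written in Python rather than PHP. It only builds the URL and sends nothing. The endpoint host, the `build_gif_url` helper, the version string and the account ID are all assumptions made for illustration. The parameter names (`utmwv`, `utmn`, `utmhn`, `utmp`, `utmac`) follow the legacy `__utm.gif` request format, so check them against the page linked above before relying on them.

```python
import random
from urllib.parse import urlencode

# Assumed legacy tracking-pixel endpoint; verify before use.
GIF_ENDPOINT = "http://www.google-analytics.com/__utm.gif"


def build_gif_url(account, host, page, version="4.4sh"):
    """Build a tracking-gif URL for one API hit (hypothetical helper)."""
    params = {
        "utmwv": version,                        # tracking code version (placeholder value)
        "utmn": random.randint(0, 0x7FFFFFFF),   # random number to defeat caching
        "utmhn": host,                           # host name being tracked
        "utmp": page,                            # page (here: the API path) being requested
        "utmac": account,                        # Analytics account ID, e.g. "UA-XXXXX-X"
    }
    return GIF_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_gif_url("UA-XXXXX-X", "untiny.com", "/api/1.0/extract/"))
```

The API handler would then fetch that URL (with curl in PHP, or `urllib.request.urlopen` in Python) after serving the real response, so the tracking call cannot affect the API output.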
You can't easily do this, as the JavaScript-based Google Analytics script will not be run by the end user (unless, of course, they are including your API output exactly as-is on their display to the end user, which would negate the need for a fully fledged API [you could just offer iframe-able code], pose possible security risks, and possibly fall foul of browser cross-domain JavaScript checks).
Your best solution would be either to use server side analytics (such as Apache or IIS's server logs with Analog, Webalizer or Awstats) or - since the most information you would be getting from an API call would be useragent, request and IP address - just log that information in a database when the API is called.
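As a minimal sketch of the "just log it in a database" option, here is one way it could look in Python with SQLite. The table name, column names and helper functions are made up for illustration; a PHP version would follow the same shape, with the values taken from the request headers.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open (or create) the SQLite database that stores API calls."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS api_calls (
               called_at  TEXT NOT NULL,
               ip         TEXT,
               user_agent TEXT,
               request    TEXT
           )"""
    )
    return conn


def log_call(conn, ip, user_agent, request):
    """Record one API call with a UTC timestamp."""
    conn.execute(
        "INSERT INTO api_calls VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), ip, user_agent, request),
    )
    conn.commit()


def calls_per_ip(conn):
    """Return (ip, call_count) rows, busiest caller first."""
    return conn.execute(
        "SELECT ip, COUNT(*) FROM api_calls "
        "GROUP BY ip ORDER BY COUNT(*) DESC, ip"
    ).fetchall()
```

Calling `log_call(conn, ip, user_agent, request)` at the top of each API handler is enough to get total calls and a per-caller breakdown, which is about as much as API traffic can reliably tell you.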
Related
Assume there are two different apps on appengine- one powered by Go and another by PHP
They each need to be able to make specific requests to each other, purely over the backend network (i.e. these are the only services that need to make these specific requests; other remote requests should be blocked).
What is the best-practice way of doing this? Off the top of my head, here are 4 possible solutions and why I am a bit worried about them:
1) Do not keep them as separate apps, but rather modules
The problem with this is that using modules introduces some other annoyances- such as difficulties with Channel Presence reporting. Also, conceptually, these 2 requests are really the only places they touch and it will be clearer to see what's going on in terms of database usage etc. if they are separated. But the presence issue is more of a show-stopper
2) Append the request with some hardcoded long secret key and only allow response if via SSL
It seems a bit strange to rely on this, since the key would never change... theoretically the only way it could be known is if an administrator on the account or someone with the source revealed it... but I don't know, just seems strange
3) Only allow via certain IP ranges (maybe combined with #2)
This just seems iffy, can the IP ranges be definitively known?
4) Pub/Sub
So it seems AppEngine allows a pub/sub mechanism- but that doesn't really fit my use case since I want to get the response right away - not via a postback once the subscriber processes it
All of them
-- As a side point, assuming it is some sort of https request, is this done using the Socket API for each language?
HTTPS is of course an excellent idea in general (not just for communication between two GAE apps).
But, for the specific use case, I would recommend relying on the X-Appengine-Inbound-Appid request header: App Engine's infrastructure ensures that this cannot be set on requests not coming from GAE apps, and, for requests that do come from GAE apps (via a url-fetch that doesn't follow redirects), the header is set to the app-id.
This is documented for Go at https://cloud.google.com/appengine/docs/go/urlfetch/ , for PHP at https://cloud.google.com/appengine/docs/php/urlfetch/ (and it's just the same for Java and Python, by the way).
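To make the header check concrete, here is a small Python sketch of the receiving side. The `is_trusted_caller` helper and the app ID `my-go-app` are hypothetical. The only thing taken from the answer above is the header name and the fact that App Engine sets it itself.

```python
# Hypothetical allow-list: the app IDs permitted to call this service.
ALLOWED_APP_IDS = {"my-go-app"}


def is_trusted_caller(headers, allowed=ALLOWED_APP_IDS):
    """Return True only if the request carries an allowed inbound app ID.

    `headers` is any mapping of header name to value. Header names are
    compared case-insensitively, as HTTP requires.
    """
    for name, value in headers.items():
        if name.lower() == "x-appengine-inbound-appid":
            return value in allowed
    # Header absent: the request did not come from another App Engine app.
    return False
```

A handler would call this first and answer with a 403 when it returns `False`. The same comparison is a couple of lines in Go or PHP.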
purely over the backend network
Only allow via certain IP ranges
These requirements are difficult to impossible to fulfill with App Engine infrastructure because you're not in control of the physical network routes. From the App Engine FAQ:
App Engine does not currently provide a way to map static IP addresses to an application. In order to optimize the network path between an end user and an App Engine application, end users on different ISPs or geographic locations might use different IP addresses to access the same App Engine application.
Therefore always assume your communication happens over the open network and never assume anything about IPs.
Append the request with some hardcoded long secret key
The hard coded long secret does not provide any added security, only obscurity.
only allow response if via SSL
This is a better idea: encrypt all of your internal traffic with a strong cipher suite, for example one based on ECDHE-RSA or ECDHE-ECDSA if available.
I use a JSON API to get data for a website. I am aware of various methods that I could make it secure, but my situation is different from common methods.
Because of cross-domain issues, I had to create an API folder with various PHP files that make cURL requests to the RESTful API. I then request these local PHP files through AJAX on my site. In the next release it should be JSONP, to avoid this issue.
Many of these JSON requests contain sensitive information so the first thing I did was check for the HTTP Referrer so people don't just grab the URL when inspecting the JavaScript code and try to run it on their browser. This is obviously not safe nor should I rely on it.
Any data I may try to post to the request will be through JavaScript so something like an API key or token would be visible and would defeat the whole purpose.
Is there a way I can prevent these PHP files from being run outside the website? Basically, make them inaccessible to visitors?
This has nothing to do with REST. You have a server-side REST client in which you call the REST service with cURL, and the browser cannot see any of this process. Unless you want to build your own REST service for this AJAX client, this is just a regular web application (from the perspective of the browser and the AJAX client, of course). As Lorenz said in the comment, you should use sessions as you normally would. That's all. If you want to restrict access to certain pages, you can use an access control solution; role-based access control, for example, is very common.
I have looked around and it seems that there is no way whatsoever to load external/remote URLs like http://google.com through the client browser using Javascript without using a proxy be it a PHP file in the server side or YQL which essentially uses the Yahoo API as a proxy. This is due to the same-origin policy.
I am not versed in Flash, and I think it might hold an answer because, even though some people are aggressively phasing it out, it has a lot of power.
My question: is there something I missed when searching? Free hosts have some restrictions on the amount of requests and the load on the server per unit time and I wouldn't like to get kicked out. Also my site scrapes some remote site's data so I wouldn't like to get blocked which I would get if I used a PHP proxy. So is there a simple Flash solution or Javascript solution I did not see?
No, this is not possible due to the Same origin policy: http://en.wikipedia.org/wiki/Same_origin_policy
I have a demo server where I put samples of my apps, I send potential customers links to those apps. Is it possible to use htaccess to track visitors, without adding tracking capability to the apps themselves? The data I'm interested in are:
date and time of page visit
ip of visitor
url of the page visited
referrer
post and get (query string) data if any
That entirely depends on your webserver, what options it provides for htaccess overrides.
For Apache, the access log logs what you are looking for
http://httpd.apache.org/docs/current/logs.html#accesslog
but is not configurable via htaccess.
No, it's impossible to do this with an .htaccess file, because it's merely a configuration file, not an executable one.
However you can use another web-server capability - log files.
Everything you are asking for is already stored in the access log, in almost the same format you listed here.
An important note: unlike Google Analytics or any other third-party or scripting solution, web-server logs are the only reliable and exact source of tracking data, containing every request that has been made to your site.
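For anyone who wants to pull those fields out of the log, here is a short Python sketch that parses one line. It assumes Apache's "combined" log format (`%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"`). The regular expression and the `parse_line` helper are illustrative, and a server configured with a different LogFormat would need a different pattern.

```python
import re
from datetime import datetime

# One line of Apache's "combined" log format (assumed configuration).
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it doesn't match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Timestamps look like 10/Oct/2023:13:55:36 +0000
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry


if __name__ == "__main__":
    sample = (
        '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] '
        '"GET /demo/app1/?id=7 HTTP/1.1" 200 2326 '
        '"http://example.com/" "Mozilla/5.0"'
    )
    print(parse_line(sample))
```

The `request` field carries the URL and its query string, so GET data is covered. POST bodies are not part of the default access log format, so that one item on the list would need extra logging configuration or application-level logging.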
The best way is to use Google Analytics.
You will get all what you need and much much more.
I know this thread has been quiet for a while, but is it not possible to use the prepend?? directive, which prepends a script to all visits, to track site/page visits?
I have not got the code (I tried something similar, though it was not successful), but I used the prepend directive to prepend a script that "switches on" gzip for all site visits. I am sure the same can be implemented for logs (for those of us with cheap shared servers!). Come on coders, do us all a favour and reveal the secret!
I have already heard about the curl library, and I am interested in it.
I have read that there are many uses for it; can you provide me with some?
Are there any security problems with it?
One of the many useful features of curl is interacting with web pages: you can send and receive HTTP requests and manipulate the data. That means you can log in to websites and actually send commands as if you were interacting from your web browser.
I found a very good web page titled "10 awesome things to do with curl". It's at http://www.catswhocode.com/blog/10-awesome-things-to-do-with-curl
One of its big use cases is automating activities such as having an application fetch content from other websites. It can also be used to post data to another website and to download files via FTP or HTTP. In other words, it allows your application or script to act as a user accessing a website, as they would when browsing manually.
There are no inherent security problems with it but it should be used appropriately, e.g. use https where required.
cURL Features
It's for spamming comment forms. ;)
cURL is great for working with APIs, especially when you need to POST data. I've heard that it's quicker to use file_get_contents() for basic GET requests (e.g. grabbing an RSS feed that doesn't require authentication), but I haven't tried myself.
If you're using it in a publicly distributed script, such as a WordPress plugin, be sure to check for it with function_exists('curl_init'), as some hosts don't install it...
In addition to the uses suggested in the other answers, I find it quite useful for testing web-service calls. Especially on *nix servers where I can't install other tools and want to test the connection to a 3rd party webservice (ensuring network connectivity / firewall rules etc.) in advance of installing the actual application that will be communicating with the web-services. That way if there are problems, the usual response of 'something must be wrong with your application' can be avoided and I can focus on diagnosing the network / other issues that are preventing the connection from being made.
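The same kind of pre-deployment check can be sketched in a few lines of Python for boxes where that is what happens to be installed. This is not curl itself: it only tests whether a TCP connection to the service's host and port can be opened, which is usually enough to tell a firewall or routing problem apart from an application problem. The `can_connect` helper is made up for illustration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures alike.
        return False
```

For example, `can_connect("api.example.com", 443)` (a placeholder host) answers the "is the network path open?" question before the real application is installed.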
It certainly can simplify simple programs you need to write that require higher level protocols for communication.
I do recall a contractor, however, attempting to use it with a high load Apache web server module and it was simply too heavy-weight for that particular application.