Disallow Robots in API URL using PHP

I know I can disallow robots using robots.txt, but some search engines do not follow it. I have an API through which my users send transactional info to insert/update/delete records etc., using my API request parameters. But when I look at my logs, huge numbers of hits have been made to my .php page, so I googled how to prevent this in my PHP API page and found nothing.
Hence I landed on SO to get help from the experts: is there any way I can block/disallow search engine robots from accessing my base API URL?

The main approaches that I know of for dealing with bots that are ignoring robots.txt are to either:
Blacklist them via your firewall or server
Only allow whitelisted users to access your API
However, you should ask yourself whether they're having any impact on your website. If they're not spamming you with requests (which would be a DDoS attack) then you can probably safely ignore them and filter them out of your logs if you need to analyse real traffic.
If you're running a service that people use and you don't want it to be wide open to spam, then here are a few more options for limiting usage (a rough PHP sketch follows the list):
Restrict access to your API just to your users by assigning them an API token
Rate limit your API (either via the server and/or via your application)
Read the User Agent (UA) of your visitors: many bots will state outright that they're bots or have obviously fake UAs, while the malicious ones will pretend to be real users
Implement more advanced measures, such as limiting access from a region if a lot of requests suddenly come from there in a short period of time
Use DDoS protection services such as Cloudflare
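As a rough illustration of the first two points, here is a minimal PHP sketch, assuming a hypothetical $validTokens list loaded from your own user store and APCu available for the counter (both are placeholders; a production version would use hashed tokens and a proper rate limiter):

```php
<?php
// Minimal sketch: API token check plus a naive per-IP rate limit.
// $validTokens and the 60-requests-per-minute cap are placeholders.
$validTokens = ['replace-with-real-token-1', 'replace-with-real-token-2'];

$token = $_GET['api_token'] ?? '';
if (!in_array($token, $validTokens, true)) {
    http_response_code(401);
    exit(json_encode(['error' => 'invalid or missing api_token']));
}

// Naive fixed-window rate limit using APCu.
$key  = 'rate:' . $_SERVER['REMOTE_ADDR'];
$hits = apcu_fetch($key) ?: 0;
if ($hits >= 60) {
    http_response_code(429);
    exit(json_encode(['error' => 'rate limit exceeded, try again later']));
}
apcu_store($key, $hits + 1, 60); // window resets 60 seconds after the last hit
```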
There's no perfect solution and each option involves trade-offs. If you're worried about DDoS then you could start by looking into your server's capabilities, for example here's an introduction into how NGINX can control traffic: https://www.nginx.com/blog/rate-limiting-nginx/
In a nutshell, any IP hitting your site can be a bot so you should defend by imposing limits and analysing behaviour, since there's no way to know for sure who is a malicious visitor and who isn't until they start using your service.

Related

DDoS attack on /wp-admin/admin-ajax.php

Was tempted to post this on ServerFault.
Cloudflare is enabled with SSL. We got 250k requests in the past 2 hours; 192k were cached, the others weren't.
The page that's causing the issue for me is "/wp-admin/admin-ajax.php".
The CPU is spiking like crazy here. I've been manually adding IPs to the server's firewall, but it's not good enough since the IPs keep changing, so the site is down because it keeps crashing/timing out.
I'm on a dedicated server; is there anything else I can do besides Cloudflare? I have cPanel installed and can add extensions.
Thanks.
Technically the admin-ajax.php file is supposed to be publicly available, but who puts a file like that in an admin directory? sigh.
In addition to using fail2ban (which is a great suggestion) you could try one or more of the following... in no particular order.
Look at the user agents used in the attack (a quick log-tally sketch follows this list); if there is a commonality, use Cloudflare's User-Agent Blocking.
Use Cloudflare's JavaScript challenge on the firewall to require visitors from the attacking countries to complete a challenge before being allowed through.
Block access to the file (temporarily?) using .htaccess
Use a Cloudflare page rule to redirect traffic to that URI to another (e.g. to www.yoursite.com).
Use Cloudflare Access to protect your entire /wp-admin directory.
Use Cloudflare's zone lockdown to restrict access to that URI to specific IP addresses.
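For the first point, here's a quick PHP sketch for spotting a common user agent in an Apache combined-format access log (the log path is an assumption; any log parser or awk one-liner would do the same job):

```php
<?php
// Tally user agents from an access log to find a commonality worth
// blocking. /var/log/apache2/access.log is a placeholder path.
$counts = [];
foreach (file('/var/log/apache2/access.log') ?: [] as $line) {
    // In the combined log format, the UA is the last quoted field.
    if (preg_match('/"([^"]*)"$/', trim($line), $m)) {
        $counts[$m[1]] = ($counts[$m[1]] ?? 0) + 1;
    }
}
arsort($counts);
// Print the ten most frequent user agents.
foreach (array_slice($counts, 0, 10, true) as $ua => $n) {
    echo "$n\t$ua\n";
}
```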

Secure requests between different apps

Assume there are two different apps on App Engine: one powered by Go and another by PHP.
They each need to be able to make specific requests to each other, purely over the backend network (i.e. these are the only services that need to make these specific requests; other remote requests should be blocked).
What is the best-practices way of doing this? Off the top of my head, here are a few possible solutions and why I am a bit worried about each:
1) Do not keep them as separate apps, but rather modules
The problem with this is that using modules introduces some other annoyances, such as difficulties with Channel Presence reporting. Also, conceptually, these 2 requests are really the only places the apps touch, and it will be clearer to see what's going on in terms of database usage etc. if they are separated. But the presence issue is more of a show-stopper.
2) Append a hardcoded long secret key to the request and only allow a response if the request came via SSL
It seems a bit strange to rely on this, since the key would never change... theoretically the only way it could be known is if an administrator on the account or someone with the source revealed it... but I don't know, just seems strange
3) Only allow via certain IP ranges (maybe combined with #2)
This just seems iffy, can the IP ranges be definitively known?
4) Pub/Sub
It seems App Engine offers a pub/sub mechanism, but that doesn't really fit my use case since I want to get the response right away, not via a postback once the subscriber processes it.
All of them:
As a side point, assuming it is some sort of HTTPS request, is this done using the Socket API for each language?
HTTPS is of course an excellent idea in general (not just for communication between two GAE apps).
But, for the specific use case, I would recommend relying on the X-Appengine-Inbound-Appid request header: App Engine's infrastructure ensures that this cannot be set on requests not coming from GAE apps, and, for requests that do come from GAE apps (via a url-fetch that doesn't follow redirects), the header is set to the app-id.
This is documented for Go at https://cloud.google.com/appengine/docs/go/urlfetch/ , for PHP at https://cloud.google.com/appengine/docs/php/urlfetch/ (and it's just the same for Java and Python, by the way).
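On the PHP side, a minimal check might look like the sketch below, assuming the trusted Go app's ID is my-go-app (a placeholder) and the request arrives via URL Fetch as described:

```php
<?php
// Trust only requests from a specific App Engine app. App Engine
// strips X-Appengine-Inbound-Appid from external requests, so a
// matching value can only have been set by GAE's url-fetch.
$appId = $_SERVER['HTTP_X_APPENGINE_INBOUND_APPID'] ?? '';
if ($appId !== 'my-go-app') { // placeholder app-id
    http_response_code(403);
    exit('Forbidden');
}
// ...handle the trusted internal request here...
```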
purely over the backend network
Only allow via certain IP ranges
These requirements are difficult or impossible to fulfill with App Engine infrastructure, because you're not in control of the physical network routes. From the App Engine FAQ:
App Engine does not currently provide a way to map static IP addresses to an application. In order to optimize the network path between an end user and an App Engine application, end users on different ISPs or geographic locations might use different IP addresses to access the same App Engine application.
Therefore always assume your communication happens over the open network and never assume anything about IPs.
Append the request with some hardcoded long secret key
The hard coded long secret does not provide any added security, only obscurity.
only allow response if via SSL
This is a better idea; encrypt all of your internal traffic with a strong algorithm. For example, ECDHE-RSA or ECDHE-ECDSA if available.

Website Administration Location + PHP CURL

I'm building an online dating website at the moment.
There needs to be an admin backend to the site to approve users/photos etc.
I can add this admin part of the site/login etc to the same domain.
eg: www.domainname.com/admin
Or from my experience with PHP CURL I can put this site on a different domain and CURL the requests through.
Question: is it more secure to put the admin code/site on a completely different domain, or does it really not matter if it sits on the same domain? Hacking/security is the real point of this.
thx
Technically it might be more secure if you ran it from a different server and hosted it on a subdomain using a different IP/vhost, or used a proxy module for your webserver (see Apache mod_proxy) to proxy requests from yourdomain.com/admin to admin.otherdomain.com, enforcing additional IP or access controls on the proxy URL using .htaccess or equivalent.
Of course, if those other domains are web accessible, then they are only as secure as the users and passwords that use them.
For corporate applications, you may want to make the admin interface accessible from a VPN connection, but I don't know if that applies to you.
If there is a vulnerability on your public webserver that allows someone to get shell access, then it may make it slightly more difficult to get administrative access since they don't have the code for the administration portion.
In other words, it can provide additional security depending on the lengths you go to, but is not necessarily a solid solution.
Using something like cURL is a possibility, but you'd have far less troubleshooting to do using a more conventional method like proxy or subdomain on another server.
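For completeness, here is a rough sketch of the cURL pass-through idea from the question, with a placeholder backend URL; a real version would also need to forward headers, cookies and redirects, which is exactly the extra troubleshooting mentioned above:

```php
<?php
// Naive cURL pass-through: forward /admin requests to a separate
// admin host. ADMIN_BACKEND is a placeholder.
const ADMIN_BACKEND = 'https://admin.example-other-domain.com';

$ch = curl_init(ADMIN_BACKEND . $_SERVER['REQUEST_URI']);
curl_setopt_array($ch, [
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_CUSTOMREQUEST  => $_SERVER['REQUEST_METHOD'],
    CURLOPT_POSTFIELDS     => file_get_contents('php://input'),
]);
$body = curl_exec($ch);
// Relay the backend's status code (502 if the request failed).
http_response_code(curl_getinfo($ch, CURLINFO_HTTP_CODE) ?: 502);
curl_close($ch);
echo $body;
```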

Service to check a URL for malware or phishing?

Is there a service that lets me check a URL to see if it may possibly be a dangerous site?
When a user exits our application by clicking on an untrusted link, we send them through an "are you sure you want to leave" redirection screen. It'd be a nice touch to do a quick check to see if we should warn the user as well.
Try the Google Safe Browsing API.
The Google Safe Browsing Lookup API is an experimental API that allows applications to check URLs against Google's constantly-updated lists of suspected phishing and malware pages.
You can use the Google Safe Browsing API to check if a URL is safe according to what they know about it. (API documentation)
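As a hedged sketch, a v4-style lookup from PHP might look like the following; check Google's current documentation, since the endpoint, payload shape and quotas have changed between API versions, and YOUR_API_KEY and the example URL are placeholders:

```php
<?php
// Ask Safe Browsing whether a URL is on the malware/phishing lists.
$payload = json_encode([
    'client' => ['clientId' => 'my-app', 'clientVersion' => '1.0'],
    'threatInfo' => [
        'threatTypes'      => ['MALWARE', 'SOCIAL_ENGINEERING'],
        'platformTypes'    => ['ANY_PLATFORM'],
        'threatEntryTypes' => ['URL'],
        'threatEntries'    => [['url' => 'http://example.com/']],
    ],
]);

$ch = curl_init('https://safebrowsing.googleapis.com/v4/threatMatches:find?key=YOUR_API_KEY');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $payload,
    CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
    CURLOPT_RETURNTRANSFER => true,
]);
$response = curl_exec($ch);
curl_close($ch);

// An empty object ("{}") means the URL was not flagged.
$matches   = json_decode($response, true);
$isFlagged = !empty($matches['matches']);
```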
How about z-protect.com?
Example report: http://www.z-protect.com/report/stackoverflow.com/
This system uses a DNS server to block the pages... maybe interesting?
And this service has an API, but with a limit of 10,000 requests per day:
http://www.z-protect.com/api

How to capture a website API traffic data with Google Analytics?

I have a website where most of the traffic comes from the API (http://untiny.com/api/). I use Google Analytics to collect traffic data; however, the statistics do not include the API traffic, because I couldn't include the Google Analytics JavaScript code in the API pages, and including it would affect the API results (example: http://untiny.com/api/1.0/extract/?url=tinyurl.com/123).
The solution might be executing the JavaScript using a JavaScript engine. I searched Stack Overflow and found JavaScript engines/interpreters for Java and C, but I couldn't find one for PHP except an old one, "J4P5": http://j4p5.sourceforge.net/index.php
The question: will using a JavaScript engine solve the problem, or is there another way to include the API traffic in Google Analytics?
A simple problem with this in general is that any data you get could be very misleading.
A lot of the time it is probably other servers making calls to your server. When this is true, the location of the server in no way represents the location of the people using it, the user agent will be fake, and you can't tell how many different individuals are actually using the service. There are no referrers, and if there are, they're probably fake... etc. Not many stats in this case are useful at all.
Perhaps make a PHP back end that logs the IP and other header information; that's really all you can do. You'll at least be able to track total calls to the API and where they're made from (although again, probably from servers, but you can tell which servers).
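A minimal version of that logging back end might look like this sketch, assuming a hypothetical api_log table and placeholder database credentials:

```php
<?php
// Log each API call's IP, user agent and request so you can at
// least count total calls and see which hosts are making them.
$db = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
$stmt = $db->prepare(
    'INSERT INTO api_log (ip, user_agent, request_uri, created_at)
     VALUES (?, ?, ?, NOW())'
);
$stmt->execute([
    $_SERVER['REMOTE_ADDR'],
    $_SERVER['HTTP_USER_AGENT'] ?? '',
    $_SERVER['REQUEST_URI'],
]);
```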
I spent ages researching this and finally found an open source project that seems perfect, though totally under the radar.
http://code.google.com/p/serversidegoogleanalytics/
Will report back on results.
You would likely have to emulate the HTTP calls on the server side with whatever programming language you are using. This will not give you information on who is using it, though, unless untiny is providing client info through some kind of header.
If you want to include it purely for statistical purposes, you could try using cURL (if using PHP) to request the tracking GIF when you detect an API call on the server side:
http://code.google.com/apis/analytics/docs/tracking/gaTrackingTroubleshooting.html#gifParameters
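In that spirit, here is a hedged sketch that reports a hit server-side using the Google Analytics Measurement Protocol (the documented equivalent of requesting the tracking GIF by hand; UA-XXXXX-Y is a placeholder property ID, and newer GA versions use a different protocol):

```php
<?php
// Record one API pageview server-side via the Measurement Protocol.
$params = http_build_query([
    'v'   => 1,                          // protocol version
    'tid' => 'UA-XXXXX-Y',               // placeholder property ID
    'cid' => bin2hex(random_bytes(16)),  // random per call; persist a
                                         // real client ID to count users
    't'   => 'pageview',
    'dp'  => '/api/1.0/extract/',        // the page to report
    'uip' => $_SERVER['REMOTE_ADDR'] ?? '',
    'ua'  => $_SERVER['HTTP_USER_AGENT'] ?? '',
]);

$ch = curl_init('https://www.google-analytics.com/collect');
curl_setopt_array($ch, [
    CURLOPT_POST           => true,
    CURLOPT_POSTFIELDS     => $params,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_TIMEOUT        => 2, // don't let analytics slow the API down
]);
curl_exec($ch);
curl_close($ch);
```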
You can't easily do this, as the JavaScript-based Google Analytics script will not be run by the end user (unless, of course, they are including your API output exactly as-is in what they display to the end user, which would negate the need for a fully fledged API [you could just offer embeddable iframe code], pose possible security risks, and possibly run afoul of browser cross-domain JavaScript checks).
Your best solution would be either to use server-side analytics (such as Apache or IIS server logs with Analog, Webalizer or AWStats) or, since the most information you would get from an API call is the user agent, request and IP address, just log that information in a database when the API is called.
