This question is not about protecting against SQL injection attacks. That question has been answered many times on Stack Overflow, and I have implemented those techniques. This is about stopping the attempts.
Recently my site has been hit with huge numbers of injection attacks. Right now, I trap them and return a static page.
Here's what my URL looks like:
/products/product.php?id=1
This is what an attack looks like:
/products/product.php?id=-3000%27%20IN%20BOOLEAN%20MODE%29%20UNION%20ALL%20SELECT%2035%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C%27qopjq%27%7C%7C%27ijiJvkyBhO%27%7C%7C%27qhwnq%27%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35%2C35--%20
I know for sure that this isn’t just a bad link or fat-fingered typing so I don't want to send them to an overview page. I also don’t want to use any resources on my site delivering static pages.
I'm considering just letting the page die with die(). Is there anything wrong with this approach? Or is there an HTTP status code that I could return from PHP that would be more appropriate?
Edit:
Based on a couple of comments below, I looked up how to return 'page not found'. This Stack Overflow answer by icktoofay suggests returning a 404 and then calling die(): the bot concludes that the page doesn't exist and might even go away, and no further resources are used to display a 'page not found' message.
header("HTTP/1.0 404 Not Found");
die();
Filtering out likely injection attempts is what mod_security is for.
It can take quite a bit of work to configure it to recognize legitimate requests for your app.
Another common method is to block IP addresses of malicious clients when you detect them.
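For example, here is a minimal application-level sketch of that idea, assuming you maintain a denylist file of offending addresses (the path and the one-IP-per-line format are just assumptions):

<?php
// Hedged sketch: refuse requests from IPs you have already flagged as malicious.
$denylist = file('/var/www/private/blocked_ips.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

if ($denylist !== false && in_array($_SERVER['REMOTE_ADDR'], $denylist, true)) {
    header('HTTP/1.0 403 Forbidden');
    exit;
}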
You can attempt to stop this traffic from reaching your server with hardware. Most devices that do packet inspection can be of use. I use an F5 for this purpose (among others). The F5 has a scripting language of its own called iRules which affords great control and customization.
The post has been unblocked, so I thought I'd share what I've been doing to reduce attacks from the same IP address. I still get half a dozen a day, but they usually only try once or twice from each IP address.
Note: in order to return the 404 error, all of this must come before any HTML is sent. I'm using PHP and redirect all errors to an error log file.
<?php
require_once('mysql_database.inc');

// I'm using a MySQL database, so mysql_real_escape_string works.
// I don't use any special characters in my product IDs, but injection attacks do. This helps trap them.
$productID = htmlspecialchars( isset($_GET['id']) ? mysql_real_escape_string($_GET['id']) : '55' );

// Product IDs are all numeric, so the request is invalid if the value isn't a number.
if ( !is_numeric($productID) ) {
    $url = $_SERVER['REQUEST_URI'];     // Track which page is under attack.
    $ref = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';  // Log the referrer in case I have a bad link on one of my own pages.
    $ip  = $_SERVER['REMOTE_ADDR'];     // See if the requests are coming from the same place each time.

    // Strip whitespace in case the URL was typed by hand with an extra space in it.
    $productID = preg_replace('/\s+/', '', $productID);

    if ( !is_numeric($productID) ) {
        error_log("Still a non-numeric ID in products.php after replacement: URL is $url, IP is $ip, referrer is $ref");
        header("HTTP/1.0 404 Not Found");
        die();
    }
}
I also have lots of pages where I display different content depending on the category that is picked. In those cases I have a series of if statements, like this: if ($cat == 'Speech') { }. There is no database lookup, so there's no chance of SQL injection, but I still want to stop the attacks rather than waste bandwidth displaying a default page to a bot. The category is usually a short word, so I modify the is_numeric conditional above to check the string length instead, e.g. if ( strlen($cat) > 10 ). Since most of the attempts are longer than 10 characters, this works quite well; a sketch of this check follows.
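For illustration, a minimal sketch of that length/whitelist check (the $cat parameter name and the extra category names are just placeholders for your own values):

<?php
// Hedged sketch: reject category values that are too long or not in the known list.
$categories = array('Speech', 'Hearing', 'Vision');   // 'Speech' comes from the example above; the rest are placeholders
$cat = isset($_GET['cat']) ? $_GET['cat'] : '';

if ( strlen($cat) > 10 || !in_array($cat, $categories, true) ) {
    error_log("Bad category from {$_SERVER['REMOTE_ADDR']}: URL is {$_SERVER['REQUEST_URI']}");
    header("HTTP/1.0 404 Not Found");
    die();
}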
A very good question, +1 from me, and the answer is not simple.
PHP does not provide a built-in way to share data across pages and sessions, so you can't limit access by IP address unless you store the access details somewhere.
If you don't want to use a database connection for this, you can of course use the filesystem. I'm sure you already know how to do this, but you can see an example here:
DL's Script Archives
http://www.digi-dl.com/
(click on "HomeGrown PHP Scripts", then on "IP/networking", then
on "View Source" for the "IP Blocker with Time Limit" section)
The best option used to be "mod_throttle". Using that, you could restrict each IP address to one access per five seconds by adding this directive to your Apache config file:
<IfModule mod_throttle.c>
ThrottlePolicy Request 1 5
</IfModule>
But there's some bad news. The author of mod_throttle has abandoned the product:
"Snert's Apache modules currently CLOSED to the public
until further notice. Questions as to why or requests
for archives are ignored."
Another Apache module, mod_limitipconn, is used more often nowadays. It doesn't let you make arbitrary restrictions (such as "no more than ten requests in each fifteen seconds"). All you can do is limit each IP address to a certain number of concurrent connections. Many webmasters advocate that as a good way to fight bot spam, but it does seem less flexible than mod_throttle.
You need different versions of mod_limitipconn depending which version of Apache you're running:
mod_limitipconn.c - for Apache 1.3
http://dominia.org/djao/limitipconn.html
mod_limitipconn.c - Apache 2.0 port
http://dominia.org/djao/limitipconn2.html
Finally, if your Apache server is hosted on a Linux machine, there's a solution you can use which doesn't involve recompiling the kernel. Instead, it uses the "iptables" firewall rules. This method is rather elegant, and is flexible enough to impose constraints such as "no more than three connections from this IP in one minute". Here's how it's done:
Linux Noob forums - SSH Rate Limit per IP
http://www.linux-noob.com/forums/index.php?showtopic=1829
I realize that none of these options will be ideal, but they illustrate what is possible. Perhaps using a local database will end up being best after all? In any case, bear in mind that simply limiting the rate of requests, or limiting the bandwidth, doesn't solve the problem of bots. They may take longer, but they'll eventually drain just as many resources as they would if they were not slowed down. It's necessary to actually reject their HTTP requests, not simply delay them or spread them out.
Good luck in the escalating battle between content and spam!
Related
We've recently found one of our client's PHP websites suffering from attacks on a page that passes URL parameters to select/show a product.
The code is such that they "should not" be able to cause problems, since it simply selects a product ID, but they are sending thousands of queries looking for some vulnerability and consuming a lot of server resources in their attempts. We've blocked the offending IP addresses, but of course they just move to different ones.
Question: Are URL parameters no longer recommended for simple page content display?
I'm thinking of updating the pages to use a $_SESSION value and just reset that each time a new page selection is made.
I've been reading up on "URL Parameter pollution" and see how it is being used more often to exploit websites.
I've developed a site with a basic search function. The site accepts input in GET params and is hosted on a shared hosting server, so there are limits on SQL execution.
My goal is to stop, or at least lower the chance of, processing automated search queries so the site does not hit the SQL limit on bogus searches with no real user behind them. To prevent this I've used a CSRF token on the landing page from which the search is initiated.
What else can I try to make sure the search is performed only for real users and not for automated/bot searches? I've thought of CAPTCHAs, but asking users to complete a CAPTCHA for every search query would make the experience much worse.
Welcome to the eternal dichotomy between usability and security. ;)
Many of the measures used to detect and block bots also impact usability (such as the extra steps required by opaque CAPTCHAs), and none of them solve the bot problem 100% either (see CAPTCHA farms).
The trick is to use a reasonable mix of controls, and to try as hard as possible not to impact the user experience.
A good combination that I have used with success to protect high-cost functions on high-volume sites is:
CSRF: this is a good basic measure in itself to stop blind script submission, but it won't slow down or stop a sophisticated attacker at all;
response caching: search engines tend to get used for the same thing repeatedly, so by caching the answers to common searches you avoid making the SQL request altogether (which avoids the resource consumption and also improves regular usage);
source throttling: track the source IP and restrict the number of operations within a reasonable window (not rejecting anything outside this, just queueing it and so throttling the volume to a reasonable level); a minimal sketch of this follows below; and
transparent CAPTCHA: something like Google's reCAPTCHA v3 in transparent mode will help you drop a lot of the automated requests without impacting the user experience.
You could also look at developing your search function to search an XML file instead of via the database. This would enable you to search as many times as you like without any issues.
Like most programmers, I try to write my applications as safely as possible, but we know that doesn't guarantee 100% security. Therefore I think it is also appropriate to have methods for monitoring whether we may be under attack. So this is my question.
(My websites are made with PHP and MySQL)
In the case of SQL injection I think this can be done in two ways, but if there are other ways I would also like to know them.
Parsing access/error logs. Does anyone have or know of a script that adequately analyzes the (Apache) access logs to detect possible attacks and automatically notifies the administrator with all the details?
Analyzing HTTP params in real time. This would be a script that analyzes in real time the content passed via GET/POST and notifies the website administrator (e.g. via email).
For example, I do not know much about SQLi attacks, but I think it's common for strings like 'SELECT', 'UNION', ... (others?) to appear in query strings and params.
That way we can analyze the attack, see whether it succeeds, and take the appropriate actions.
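To illustrate the second idea, here is a minimal sketch that checks incoming GET/POST parameters against a few common SQLi keywords and notifies the administrator (the pattern list and the admin address are just illustrative assumptions):

<?php
// Hedged sketch: flag request parameters that contain common SQL-injection keywords.
$suspicious = '/\b(union|select|insert|update|delete|drop|information_schema)\b|--|\/\*/i';

foreach (array_merge($_GET, $_POST) as $name => $value) {
    if (is_string($value) && preg_match($suspicious, $value)) {
        $details = sprintf(
            "Possible SQLi attempt\nIP: %s\nURI: %s\nParam: %s\nValue: %s\n",
            $_SERVER['REMOTE_ADDR'],
            $_SERVER['REQUEST_URI'],
            $name,
            substr($value, 0, 500)
        );
        error_log($details);
        mail('admin@example.com', 'Possible SQLi attempt', $details);  // placeholder address
        break;
    }
}

Keyword matching like this will produce false positives (the word 'select' can appear in ordinary input), so it is better treated as a monitoring aid than as a blocking rule.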
Thanks for your attention!
Edited: Simple bash script
I have made a simple system for analyzing the Apache access_log files and communicate results by email. Which is detailed in this question:
Linux bash to iterate over apache access_log files and send mail
In addition, here is another one using AWK, the only resource I've found related to that:
https://www.unix.com/shell-programming-and-scripting/248420-sql-injection-detection.html
(But I have not been able to make it run in my case.)
Oh boy.
Alright, where to start?
For starters, remember that bad hackers are usually financially motivated. You know your website has been injected if you wake up one morning to a red error message from Chrome or Firefox, and you open it anyway to find that your website is now among the more popular places to find free cruises and viagra online.
Sites that score well with SEO are more likely to be hacked than sites that do not. More users means more exposure. Password protected sites don't get hacked as often, but the password protection itself does not necessarily mean any added security. If you're vulnerable, you're vulnerable, and you need to be on top of it.
First and foremost, remember to filter your variables. Never trust anything that comes in from a browser. IT'S ALL SUSPECT. That means filtering anything that comes in through a superglobal: GET, POST, REQUEST, etc. I wouldn't even trust sessions, honestly. Filter it all. More on this can be found here: http://php.net/manual/en/function.filter-var.php
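For instance, here is a minimal sketch of that kind of filtering with filter_input()/filter_var() (the parameter names and the chosen filters are just illustrative):

<?php
// Hedged sketch: validate and sanitize incoming request values instead of using them raw.
$id    = filter_input(INPUT_GET, 'id', FILTER_VALIDATE_INT);        // null if missing, false if not an integer
$email = filter_input(INPUT_POST, 'email', FILTER_VALIDATE_EMAIL);
$name  = filter_input(INPUT_POST, 'name', FILTER_SANITIZE_FULL_SPECIAL_CHARS);

if ($id === false || $id === null) {
    header("HTTP/1.0 404 Not Found");
    exit;
}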
Something else to think about is file uploading. Bad guys love uploading files and taking over your server. The most common method is exploit files disguised as images. You're going to want to resample every image that comes in. GD works, but I like Imagick better personally; more options. More on that here: http://php.net/manual/en/book.imagick.php You're also going to want to make sure that your site can't upload images or any other type of file from pages that you don't explicitly designate as form or upload pages. You would be shocked how often I see sites that can upload from the index; it's insane.
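For example, a minimal sketch of re-encoding an uploaded image with Imagick so that anything hidden in the original file is discarded (the form field name and destination path are just placeholders):

<?php
// Hedged sketch: re-encode an uploaded image so any embedded payload is thrown away.
$src = $_FILES['photo']['tmp_name'];                 // placeholder field name
$dst = '/var/www/uploads/' . uniqid('img_', true) . '.jpg';

$img = new Imagick($src);                            // throws if the file isn't a readable image
$img->setImageFormat('jpeg');                        // force a known output format
$img->stripImage();                                  // drop EXIF and other metadata
$img->writeImage($dst);
$img->destroy();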
Another method you can deploy for this is to use your php.ini to set a global include (auto_prepend_file) and open up any file in the $_FILES array that comes in. Read the first million bytes or so of the file and scan it for PHP reserved words and Unix shell scripting. If you find any, kill the upload and exit or die, whatever you like to do there.
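A minimal sketch of that upload check, assuming single-file form fields (the marker list and the one-megabyte read are just illustrative):

<?php
// Hedged sketch: reject uploads whose leading bytes contain PHP or shell code markers.
$badMarkers = '/<\?php|<\?=|#!\/bin\/(ba)?sh|\beval\s*\(|\bsystem\s*\(/i';

foreach ($_FILES as $file) {
    if ($file['error'] !== UPLOAD_ERR_OK) {
        continue;
    }
    $head = file_get_contents($file['tmp_name'], false, null, 0, 1048576); // first 1 MB
    if ($head !== false && preg_match($badMarkers, $head)) {
        error_log("Rejected upload {$file['name']} from {$_SERVER['REMOTE_ADDR']}");
        header("HTTP/1.0 403 Forbidden");
        exit;
    }
}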
Apache has a setting for forensic logs. Forensic logs will capture all GET and POST data, but the issue with them, and the reason they're not enabled by default, is that the logs get big, and quickly. You can read up on it here: https://httpd.apache.org/docs/2.4/mod/mod_log_forensic.html
Lastly, you're going to want to evaluate your site for injection vulnerabilities and cross-site scripting. Cross-site scripting isn't the issue it once was, given the way browsers are constructed these days. All those little details that make life harder for us as developers actually make us more secure.
But you do want to check for SQL vulnerabilities, especially if you're writing code from scratch. There are a couple reasonably solid plugins for Chrome that make pen testing a little easier.
Hackbar: https://chrome.google.com/webstore/detail/hackbar/ejljggkpbkchhfcplgpaegmbfhenekdc?utm_source=chrome-ntp-icon
HackTab:
https://chrome.google.com/webstore/detail/hack-tab-web-security-tes/nipgnhajbnocidffkedmkbclbihbalag?utm_source=chrome-ntp-icon
For Firefox, there's scrippy
https://addons.mozilla.org/en-US/firefox/addon/scrippy/?src=search
Hope that helps.
Good luck.
Therefore I think it is also appropriate to have methods to monitor if we may be being attacked.
The biggest waste of time ever.
ANY site gets "attacked" 100% of the time. There are freely available scripts that allow any stupid schoolboy to scan the whole internet, probing sites just by chance. You'll grow bored of scouring the logs of your detection system by the very next day.
In your place I would invest in protection instead, and against other vectors than the ones you could think of. For example, all the recent break-ins I have witnessed were performed by stealing FTP passwords stored on the webmaster's PC. And I can assure you that there are many more attack vectors than a blunt SQL injection, which is the simplest thing to protect against, with only two simple rules to follow:
Any variable data literal (i.e. a string or a number) should be substituted with a parameter, with the actual value sent to the query separately, through the bind/execute process.
All other query parts that happen to be added through a variable should be explicitly filtered through a hardcoded list of allowed values.
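For illustration, a minimal sketch of those two rules using PDO (the table, columns, and the $pdo connection are assumed to exist already):

<?php
// Hedged sketch, rule 1: data values go through placeholders, never into the SQL string.
$stmt = $pdo->prepare('SELECT * FROM products WHERE id = ?');
$stmt->execute([$_GET['id']]);
$product = $stmt->fetch();

// Rule 2: query parts that can't be parameterized (like a sort column) come from a whitelist.
$allowed = ['name', 'price', 'date_added'];
$orderBy = in_array($_GET['sort'] ?? '', $allowed, true) ? $_GET['sort'] : 'name';
$stmt = $pdo->query("SELECT * FROM products ORDER BY $orderBy");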
I have a website that was hacked once to have its database stolen. I think it was done by an automated process that simply accessed the visible website using a series of searches, in the style of 'give me all things beginning with AA', then 'with AB', then 'with AC' and so on. The reality is a little more complicated than this, but that illustrates the principle of the attack. I found the thief and am now taking steps against them, but I want to prevent more of this in the future.
I thought there must be some ready made PHP (which I use) scripts out there. Something that for instance recorded the IP address of the last (say) 50 visitors and tracked the frequency of their requests over the last (say) 5 minutes. It would ban them for (say) 24 hours if they exceeded a certain threshold of requests. However to my amazement I can find no such class, library or example of code intended for this purpose, anywhere online.
Am I missing a trick, or is there a solution here - like the one I imagine, or maybe an even simpler and more effective safeguard?
Thanks.
There are no silver bullets. If you are trying to brainstorm some possible workarounds and solutions there are none that are particularly easy but here are some things to consider:
Most screen scrapers will be using curl to do their dirty work. There is some discussion on SO about whether trying to block based on the User-Agent (or lack thereof) is a good way to prevent screen scrapes. Ultimately, if it helps at all it is probably a good idea (Google does it to prevent websites from scraping them), but because User-Agent spoofing is possible, this measure can be overcome fairly easily.
Log user requests. If you notice an outlier that is far beyond your average number of user requests (it's up to you to determine what is unacceptable), you can serve them an HTTP 500 error until they revert to an acceptable range.
Check the number of broken links attempted. If a request for a broken link is served, add it to a log. A few of these should be fine, but it should be fairly easy to spot someone who is fishing for data, e.g. requesting AA, AB, AC and so on. When that occurs, start serving HTTP 500 errors for all of your pages for a set amount of time. You can do this by serving all of your page requests through a front controller, or by creating a custom 404 'file not found' page and redirecting requests there. The 404 page can log them for you (a sketch of such a 404 logger appears after these suggestions).
Set up alerts when there is a sudden change in statistics. This is not to shut anyone down; it is just a prompt for you to investigate. The last thing you want to do is shut someone down by accident, because to them it will just seem like the website is down. If you set up a script that e-mails you when there has been a sudden change in usage patterns, before you shut anyone down, it can help you adjust your decision-making appropriately.
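For illustration, a minimal sketch of the 404 logger mentioned in the broken-links suggestion (the threshold and the temp-file storage are just placeholder choices; in practice you would also expire the counters):

<?php
// Hedged sketch: custom 404 page that counts misses per IP and flags likely scrapers.
$ip   = $_SERVER['REMOTE_ADDR'];
$file = sys_get_temp_dir() . '/404_' . md5($ip) . '.cnt';

$count = file_exists($file) ? (int) file_get_contents($file) : 0;
file_put_contents($file, ++$count, LOCK_EX);
error_log("404 #{$count} for {$ip}: {$_SERVER['REQUEST_URI']}");

if ($count > 20) {                                   // illustrative threshold
    header('HTTP/1.1 500 Internal Server Error');    // start refusing everything for a while
    exit;
}
header('HTTP/1.0 404 Not Found');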
These are all fairly broad concepts, and there are plenty of other solutions or tweaks that can work. To do it successfully you will need to monitor your own traffic patterns in order to determine a safe fix. Crafting such a solution is not a small undertaking (at least not if it is done well).
A Caveat
This is important: security is always going to be counterbalanced by usability. If you do it right you won't be sacrificing too much, and your users will never run into these issues. Extensive testing will be important, and because of the nature of websites and how crucial downtime is, perform extensive testing whenever you introduce a new security measure before bringing it live. Otherwise you will have a group of very unhappy people to deal with and a potential en masse loss of users. And in the end, screen scraping is probably a better thing to deal with than angry users.
Another caveat
This could interfere with SEO for your web page, as search engines like Google employ screen scraping to keep records up to date. Again, the note on balance applies. I am sure there is a fix here that can be figured out but it would stray too far from the original question to look into it.
If you're using Apache, I'd look into mod_evasive:
http://www.zdziarski.com/blog/?page_id=442
mod_evasive is an evasive maneuvers module for Apache to provide
evasive action in the event of an HTTP DoS or DDoS attack or brute
force attack. It is also designed to be a detection and network
management tool, and can be easily configured to talk to ipchains,
firewalls, routers, and etcetera. mod_evasive presently reports abuses
via email and syslog facilities.
...
"Detection is performed by creating an internal dynamic hash table of
IP Addresses and URIs, and denying any single IP address from any of
the following:
Requesting the same page more than a few times per second
Making more than 50 concurrent requests on the same child per second
Making any requests while temporarily blacklisted (on a blocking list)"
This question already has answers here: Top techniques to avoid 'data scraping' from a website database (14 answers). Closed 5 years ago.
I have a LAMP server where I run a website that I want to protect against bulk scraping/downloading. I know that there is no perfect solution for this and that an attacker will always find a way, but I would like to have at least some "protection" that makes stealing the data harder than having nothing at all.
This website has circa 5000 subpages with valuable text data and a couple of pictures on each page. I would like to analyze incoming HTTP requests online and, if there is suspicious activity (e.g. tens of requests in one minute from one IP), automatically blacklist that IP address from further access to the site.
I fully realize that what I am asking for has many flaws, but I am not really looking for a bullet-proof solution, just a way to limit script kiddies from "playing" with easily scraped data.
Thank you for your on-topic answers and possible solution ideas.
Although this is a pretty old post, I think the answer isn't quite complete and I thought it worthwhile to add my two cents. First, I agree with @symcbean: try to avoid using IPs and instead use a session, a cookie, or another method to track individuals. Otherwise you risk lumping together groups of users sharing an IP. The most common method for rate limiting, which is essentially what you are describing ("tens of requests in one minute from one IP"), is the leaky bucket algorithm.
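For illustration, a minimal sketch of a leaky bucket limiter keyed by session (the capacity and leak rate are just placeholders, and the state could equally live in a database):

<?php
// Hedged sketch: leaky bucket rate limiter stored in the session.
session_start();

$capacity = 30;      // bucket size: maximum burst of requests
$leakRate = 0.5;     // requests that "drain" per second
$now      = microtime(true);

$bucket = $_SESSION['bucket'] ?? ['level' => 0.0, 'last' => $now];

// Drain the bucket according to how much time has passed.
$bucket['level'] = max(0.0, $bucket['level'] - ($now - $bucket['last']) * $leakRate);
$bucket['last']  = $now;

// Each request adds one unit; overflow means the client is going too fast.
if ($bucket['level'] + 1 > $capacity) {
    header('HTTP/1.1 429 Too Many Requests');
    exit;
}
$bucket['level'] += 1;
$_SESSION['bucket'] = $bucket;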
Other ways to combat web scrapers are:
Captchas
Make your code hard to interpret, and change it up frequently. This makes scripts harder to maintain.
Download IP lists of known spammers, proxy servers, TOR exit nodes, etc. This is going to be a lengthy list, but it's a great place to start. You may also want to block all Amazon EC2 IPs.
This list, and rate limiting, will stop simple script kiddies, but anyone with even moderate scripting experience will easily be able to get around it. Combating scrapers on your own is a futile effort, but my opinion is biased because I am a cofounder of Distil Networks, which offers anti-scraping protection as a service.
Sorry - but I'm not aware of any anti-leeching code available off-the-shelf which does a good job.
How do you limit access without placing burdens on legitimate users or providing a mechanism for DoSing your site? Like spam prevention, the best solution is to use several approaches and maintain a badness score.
You've already mentioned looking at the rate of requests - but bear in mind that, increasingly, users will be connecting from NAT networks (e.g. IPv6 PoPs). A better approach is to check per session - you don't need to require your users to register and log in (although OpenID makes this a lot simpler), but you could redirect them to a defined starting point whenever they make a request without a current session and log them in with no username/password. Checking the referer (and that the referer really does point to the current content item) is a good idea too. Tracking 404 rates helps. So do road blocks (when the score exceeds a threshold, redirect to a CAPTCHA or require a login). Checking the user agent can be indicative of attacks, but it should be used as part of the scoring mechanism, not as a yes/no criterion for blocking.
Another approach, rather than interrupting the flow, is to start substituting content when the thresholds are triggered, or to do the same when you see repeated external hosts appearing in your referer headers.
Do not tar pit connections unless you've got a lot of resource serverside!
Referrer checking is one very simple technique that works well against automated attacks. You serve content normally if the referrer is your own domain (ie the user has reached the page by clicking a link on your own site), but if the referrer is not set, you can serve alternate content (such as a 404 not found).
Of course you need to set this up to allow search engines to read your content (assuming you want that) and also be aware that if you have any flash content, the referrer is never set, so you can't use this method.
Also it means that any deep links into your site won't work - but maybe you want that anyway?
You could also just enable it for images which makes it a bit harder for them to be scraped from the site.
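For illustration, a minimal sketch of the referrer check described above (the domain is a placeholder; as noted, you would need to exempt search engine crawlers and anything else that legitimately arrives without a referrer):

<?php
// Hedged sketch: serve a 404 when the request did not come from one of our own pages.
$referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '';
$host    = parse_url($referer, PHP_URL_HOST);

if ($host !== 'www.example.com') {          // placeholder for your own domain
    header('HTTP/1.0 404 Not Found');
    exit;
}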
Something that I've employed on some of my websites is to block known User-Agents of downloaders or archivers. You can find a list of them here: http://www.user-agents.org/ (unfortunately, not easy to sort by Type: D). In the host's setup, I enumerate the ones that I don't want with something like this:
SetEnvIf User-Agent ^Wget/[0-9\.]* downloader
Then I can do a Deny from env=downloader in the appropriate place. Of course, changing user-agents isn't difficult, but at least it's a bit of a deterrent if going through my logs is any indication.
If you want to filter by requests per minute or something along those lines, I don't think there's a way to do that in apache. I had a similar problem with ssh and saslauth, so I wrote a script to monitor the log files and if there were a certain number of failed login attempts made within a certain amount of time, it appended an iptables rule that blocked that IP from accessing those ports.
If you don't mind using an API, you can try our https://ip-api.io
It aggregates several databases of known IP addresses of proxies, TOR nodes and spammers.
I would advise one of two things.
The first would be: if you have information that other people want, give it to them in a controlled way, say via an API.
The second would be to try to copy Google: if you scrape Google's results A LOT (and I mean a few hundred times a second), it will notice and force you to a CAPTCHA.
I'd say that if a site is visited 10 times a second, it's probably a bot, so give it a CAPTCHA to be sure.
If a bot crawls your website slower than 10 times a second, I see no reason to try to stop it.
You could use a counter (DB or Session) and redirect the page if the limit is triggered.
<?php
// A runnable version of the idea above: count requests per session (a DB counter keyed by IP works too)
// and redirect once the limit is exceeded.
session_start();
$limit = 100;                                        // illustrative limit
$_SESSION['count'] = ($_SESSION['count'] ?? 0) + 1;
if ($_SESSION['count'] > $limit) {
    header('Location: /blocked.php');                // placeholder landing page
    exit;
}
I think dynamic blocking of IPs using an IP blocker would work better.