What is the best way to stop bots, malicious users, etc. from executing php scripts too fast? Is it ok if I use the usleep() or sleep() functions to simply do "nothing" for a while (just before the desired code executes), or is that plain stupid and there are better ways for this?
Example:
function login() {
//enter login code here
}
function logout() {
//enter logout code here
}
If I just put, say, usleep(3000000) before the login and logout codes, is that ok, or are there better, wiser ways of achieving what I want to achieve?
edit: Based on the suggestions below, does then usleep or sleep only cause the processor to disengage from the current script being executed by the current user, or does it cause it to disengage from the entire service? i.e. If one user+script invokes a sleep/usleep, will all concurrent users+scripts be delayed too?
The way most web servers work (Apache for example) is to maintain a collection of worker threads. When a PHP script is executed, one thread runs the PHP script.
When your script does sleep(100), the script takes 100 seconds to execute.. That means your worker thread is tied up for 100 seconds.
The problem is, you have a very finite number of worker-threads - say you have 10 threads, and 10 people login - now your web-server cannot serve any further responses..
The best way to rate-limit logins (or other actions) is to use some kind of fast in-memory storage thing (memcached is perfect for this), but that requires running separate process and is pretty complicated (you might do this if you run something like Facebook..).
Simpler, you could have a database table that stores user_id or ip_address, first_failed and failure_counter.
Every time you get a failed login, you (in pseudo code) would do:
if (first_failed in last hour) and (failure_counter > threshold):
return error_403("Too many authentication failures, please wait")
elseif first_failed in last hour:
increment failure_counter
else:
reset first_failed to current time
increment failure_counter
Maybe not the most efficient, and there is better ways, but it should stop brute-forcing pretty well. Using memcached is basically the same, but the database is replaced with memcached (which is quicker)
to stop bots, malicious users, etc.
from executing php scripts too fast?
I would first ask what you are really trying to prevent? If it is denial-of-service attacks, then I'd have to say there is nothing you can do if you are limited by what you can add to PHP scripts. The state of the art is so much beyond what we as programmers can protect against. Start looking at sysadmin tools designed for this purpose.
Or are you trying to limit your service so that real people can access it but bots cannot? If so, I'd look at some "captcha" techniques.
Or are you trying to prevent users from polling your site every second looking for new content? If so, I'd investigate providing an RSS feed or some other way of notifying them so they don't eat up your bandwidth.
Or is it something else?
In general, I'd say neither sleep() nor usleep() is a good way.
Your suggested method will force ALL users to wait unnecessarily before logging in.
Most LAMP servers (and most routers/switches, actually) are already configured to prevent Denial of Service attacks. They do this by denying multiple consecutive requests from the same IP address.
You don't want to put a sleep in your php. Doing so will greatly reduce the number of concurrent requests your serve can handle since you'll have connections held open waiting.
Most HTTP servers have features you can enable to avoid DoS attacks, but failing that you should just track IP addresses you've seen too many times too recently and send them a 403 Forbidden with a message asking them to wait a second.
If for some reason you can't count on REMOTE_ADDR being user specific (everyone behind the same firewall, etc.) you could prove a challenge in the login form and make the remote browser do an extended calculation on it (say, factor a number) that you can quickly check on the server side (with speedy multiplication).
Related
I currently have a simple script that connects to the mysql. Each time a clients connects to that script i add +1 to the total max_connections inside the database for that IP.
In this script, I have a limit, for example
if($user['max_cons'] < 5)
{
# ... do some things
}
However if a user floods this web script with many threads at once, he will be able to bypass it and open more than 5 connections. I tried it a python flooding script and it worked.
I guess it's because of the MySQL queries that needs some time to be imported into the database.
What I can do to prevent that?
(btw: I don't want to block the user even if he floods)
Thank you!
MySQL keeps a count of connections for you. Refer this answer to obtain that number.
If you are concerned about flooding or other forms of attacks, you need to act also in the infrastructure and networking layers of your system. Once the attack got to your code, you don't have much room to maneuver, as the application layer would have been already compromised.
Moreover, if you design your defense this way, you would need to repeat or include this code in every other piece of code you program. Acting on the infrastructure and/or networking layers will give you the chance to add security and protection as a cross-cutting concern or an "aspect" of your system, adding it once and intercepting all requests.
Your code checking 'max_conns' for each user seems more like a quota check to me, a feature of your website if you will. You could use that to prevent a user accidentally using more connections than you want to allow, but if you want to defend against actual intended attacks, you need to do some research on infrastructure and networking security, as it's a very broad subject.
Two more notes:
Maybe your hosting provider already provides some sort of defense against this and you could rely on that? Or are you hosting it yourself?
Maybe take this to superuser.com?
You can use
sleep(1);//sleep for one second
just before checking number of connections, but after you've increased number of connections for the ip. Something like
increaseConnectionsCount($user);//but max_cons should be affected n this method
sleep(1);
$user = reloadUser();
if($user['max_cons'] < 5) {
...
I have a php script on my website that is designed to give a nice overview of a domain name the user enters. It does this job quite well, however it is very slow. This might have something to do with the fact it's checking an array of 64 possible domain names, and THEN moving on to checking nameservers for A records/MX records/NS records etc.
What i would like to know, is it possible to run multiple threads/child processes of this? So that it will check multiple ellements of the array at once, and generate the output a lost faster?
I've put an example of my code in a pastebin (so to avoid creating a huge and spammy post on here)
http://pastebin.com/Qq9qKtP9
In perl I can do something like this:
$fork = new Parallel::ForkManager($threads);
foreach(Something here){
$fork->start and next;
$fork->finish;
}
And i could make the loop run in as many processes as needed. Is something similar possible in PHP or any other ways you can think of to speed this up? The main issue is, cloudflare has a timeout, and often it will take long enough CF blocks the response happening.
Thanks
* Never Mind Support !! *
You never want to create threads (or additional processes for that matter) in direct response to a web request.
If your frontend is instructed to create 60 threads every time someone clicks on page.php, and 100 people come along and request page.php at once, you will be asking your hardware to create and execute 6000 threads concurrently, to say nothing of the threads used by operating system services and other software. For obvious reasons, this does not, and will never scale.
Rather you want to separate out those parts of the application that require additional threads or processes and communicate with this part of the application via some kind of sane RPC. This means that the backend of the application can utilize concurrency via pthreads or forking, using a fixed number of threads or processes, and spreading work as evenly as possible across all available resources. This allows for an influx of traffic; it allows your application to scale.
I won't write example code, it seems altogether too trivial.
The first thing you want to do is optimze your code to shorten the execution time as much as possible.
For example, instead of making five dns queries:
$NS = dns_get_record($murl, DNS_NS);
$MX = dns_get_record($murl,DNS_MX);
$SRV = dns_get_record($murl,DNS_SRV);
$A = dns_get_record($murl,DNS_A);
$TXT = dns_get_record($murl,DNS_TXT);
You can only call dns_get_record once:
$DATA = dns_get_record($murl, DNS_NS + DNS_MX + DNS_SRV + DNS_A + DNS_TXT);
and parse out the variables from there.
Instead of outright forking processes to handle several parts concurrently, I'd implement a queue that all of the requests would get pushed into. The query processor would be limited as to how many items it could process at once, avoiding the potential DoS if hundreds or thousands of requests hit your site at the same time. Without some sort of limiting mechanism, you'd end up with so many processes that the server might hang.
As for the processor, in addition to the previously mentioned items, you could try pecl/Gearman as your queue processor. I haven't used it, but it appears to do what you're looking for.
Another method to optimize this would be implementing a caching system, that saved the results for, say, a week (or whatever). This would cut down on someone looking up the same site repeatedly in a day (or running a script on your site).
I doubt that it's a good idea to fork with PHP the apache process. But if you really want there is PCNTL (which is not available in the apache module).
You might have more fun with pthread. Nowadays you can even download a PHP which claims to be threadsafe.
And finally you have the possibility to use classic non blocking IO which I would prefer in the case of PHP.
This question already has answers here:
Top techniques to avoid 'data scraping' from a website database
(14 answers)
Closed 5 years ago.
I have LAMP server where I run a website, which I want to protect against bulk scraping / downloading. I know that there is no perfect solution for this, that the attacker will always find a way. But I would like to have at least some "protection" which hardenes the way of stealing data than just having nothing at all.
This website has cca. 5000 of subpages with valuable text data and couple of pictures on each page. I would like to be able online analyze incoming HTTP requests and if there is suspicious activity (e.g. tens of requests in one minute from one IP) it would automatically blacklist this certain IP address from further access to the site.
I fully realize that what I am asking for has many flaws, but I am not really looking for bullet-proof solution, but just a way how to limit script-kiddies from "playing" with easily scraped data.
Thank you for your on-topic answers and possible solution ideas.
Although this is a pretty old post, I think the answer isnt quite complete and I thought it worthwhile to add in my two cents. First, I agree with #symcbean, try to avoid using IP's but instead using a session, a cookie, or another method to track individuals. Otherwise you risk lumping together groups of users sharing an IP. The most common method for rate limiting, which is essentially what you are describing "tens of requests in one minute from one IP", is using the leaky bucket algorithm.
Other ways to combat web scrapers are:
Captchas
Make your code hard to interpret, and change it up frequently. This makes scripts harder to maintain.
Download IP lists of known spammers, proxy servers, TOR exit nodes, etc. This is going to be a lengthy list but its a great place to start. You may want to also block all amazon EC2 IP's.
This list, and rate limiting, will stop simple script kiddies but anyone with even moderate scripting experience will easily be able to get around you. Combating scrapers on your own is a futile effort but my opinion is biased because I am a cofounder of Distil Networks which offers anti-scraping protection as a service.
Sorry - but I'm not aware of any anti-leeching code available off-the-shelf which does a good job.
How do you limit access without placing burdens on legitimate users / withuot providing a mechanism for DOSing your site? Like spam prevention, the best solution is to use several approaches and maintain scores of badness.
You've already mentioned looking at the rate of requests - but bear in mind that increasingly users will be connecting from NAT networks - e.g. IPV6 pops. A better approach is to check per session - you don't need to require your users to register and login (although openId makes this a lot simpler) but you could redirect them to a defined starting point whenever they make a request without a current session and log them in with no username/password. Checking the referer (and that the referer really does point to the current content item) is a good idea too. Tracking 404 rates. Road blocks (when score exceeds threshold redirect to a capcha or require a login). Checking the user agent can be indicative of attacks - but should be used as part of the scoring mechanism, not as a yes/no criteria for blocking.
Another approach, rather than interrupting the flow, is when the thresholds are triggered start substituting content. Or do the same when you get repeated external hosts appearing in your referer headers.
Do not tar pit connections unless you've got a lot of resource serverside!
Referrer checking is one very simple technique that works well against automated attacks. You serve content normally if the referrer is your own domain (ie the user has reached the page by clicking a link on your own site), but if the referrer is not set, you can serve alternate content (such as a 404 not found).
Of course you need to set this up to allow search engines to read your content (assuming you want that) and also be aware that if you have any flash content, the referrer is never set, so you can't use this method.
Also it means that any deep links into your site won't work - but maybe you want that anyway?
You could also just enable it for images which makes it a bit harder for them to be scraped from the site.
Something that I've employed on some of my websites is to block known User-Agents of downloaders or archivers. You can find a list of them here: http://www.user-agents.org/ (unfortunately, not easy to sort by Type: D). In the host's setup, I enumerate the ones that I don't want with something like this:
SetEnvIf User-Agent ^Wget/[0-9\.]* downloader
Then I can do a Deny from env=downloader in the appropriate place. Of course, changing user-agents isn't difficult, but at least it's a bit of a deterrent if going through my logs is any indication.
If you want to filter by requests per minute or something along those lines, I don't think there's a way to do that in apache. I had a similar problem with ssh and saslauth, so I wrote a script to monitor the log files and if there were a certain number of failed login attempts made within a certain amount of time, it appended an iptables rule that blocked that IP from accessing those ports.
If you don't mind using an API, you can try our https://ip-api.io
It aggregates several databases of known IP addresses of proxies, TOR nodes and spammers.
I would advice one of 2 things,
First one would be, if you have information that other people want, give it to them in a controlled way, say, an API.
Second would be to try and copy google, if you scrape the results of google ALOT (and I mean a few hundred times a second) then it will notice it and force you to a Captcha.
I'd say that if a site is visited 10 times a second, its probably a bot. So give it a Captcha to be sure.
If a bot crawls your website slower then 10 times a second, I see no reason to try and stop it.
You could use a counter (DB or Session) and redirect the page if the limit is triggered.
/**Pseudocode*/
if( ip == currIp and sess = currSess)
Counter++;
if ( Count > Limit )
header->newLocation;
I think dynamic blocking of IPs using IP blocker will help better.
Im implementing a delay system so that any IP i deem abusive will automatically get an incremental delay via Sleep().
My question is, will this result in added CPU usage and thus kill my site anyways if the attacker just keeps opening new instances while being delayed? Or is the sleep() command use minimal CPU/memory and wont be much of a burden on a small script. I dont wish to flat out deny them as i'd rather they not know about the limit in an obvious way, but willing to hear why i should.
[ Please no discussion on why im deeming an IP abusive on a small site, cause heres why: I recently built a script that cURL's a page & returns information to the user and i noticed a few IP's spamming my stupid little script. cURLing too often sometimes renders my results unobtainable from the server im polling and legitimate users get screwed out of their results. ]
The sleep does not use any CPU or Memory which is not already used by the process accepting the call.
The problem you will face with implementing sleep() is that you will eventually run out of file descriptors while the attacker site around waiting for your sleep to time out, and then your site will appear to be down to any other people who tries to connect.
This is a classical DDoS scenario -- the attacker do not actually try to break into your machine (they may also try to do that, but that is a different storry) instead they are trying to harm your site by using up every resource you have, being either bandwidth, file descriptors, thread for processing etc. -- and when one of your resources are used up, then you site appears to be down although your server is not actually down.
The only real defense here is to either not accept the calls, or to have a dynamic firewall configuration which filters out calls -- or a router/firewall box which does the same but off your server.
I think the issue with this would be that you could potentially have a LARGE number of sleeping threads laying around the system. If you detect your abuse, immediately send back an error and be done with it.
My worry with your method is repeat abusers that get their timeout up to several hours. You'll have their threads sticking around for a long time even though they aren't using the CPU. There are other resources to keep in mind besides just CPU.
Sleep() is a function that "blocks" execution for a specific amount of time. It isn't the equivalent of:
while (x<1000000);
As that would cause 100% CPU usage. It simply puts the process into a "Blocked" state in the Operating System and then puts the process back into the "Ready" state after the timer is up.
Keep in mind, though, that PHP has a default of 30-second timeout. I'm not sure if "Sleep()" conforms to that or not (I would doubt it since its a system call instead of script)
Your host may not like you having so many "Blocked" processes, so be careful of that.
EDIT: According to Does sleep time count for execution time limit?, it would appear that "Sleep()" is not affected by "max execution time" (under Linux), as I expected. Apparently it does under Windows.
If you are doing what I also tried, I think you're going to be in the clear.
My authentication script built out something similar to Atwood's hellbanning idea. SessionIDs were captured in RAM and rotated on every page call. If conditions weren't met, I would flag that particular Session with a demerit. After three, I began adding sleep() calls to their executions. The limit was variable, but I settled on 3 seconds as a happy number.
With authentication, the attacker relies on performing a certain number of attempts per second to make it worth their while to attack. If this is their focal point, introducing sleep makes the system look slower than it really is, which in my opinion will make it less desirable to attack.
If you slow them down instead of flat out telling them no, you stand a slightly more reasonable chance of looking less attractive.
That being said, it is security through a "type" of obfuscation, so you can't really rely on it too terribly much. Its just another factor in my overall recipe :)
I've decided the best way to handle authentication for my apps is to write my own session handler from the ground up. Just like in Aliens, its the only way to be sure a thing is done the way you want it to be.
That being said, I've hit a bit of a roadblock when it comes to my fleshing out of the initial design. I was originally going to go with PHP's session handler in a hybrid fashion, but I'm worried about concurrency issues with my database. Here's what I was planning:
The first thing I'm doing is checking IPs (or possibly even sessions) to honeypot unauthorized attempts. I've written up some conditionals that sleep naughtiness. Big problem here is obviously WHERE to store my blacklist for optimal read speed.
session_id generates, hashed, and gets stored in $_SESSION[myid]. A separate piece of the same token gets stored in a second $_SESSION[mytoken]. The corresponding data is then stored in TABLE X which is a location I'm not settled on (which is the root of this question).
Each subsequent request then verifies the [myid] & [mytoken] are what we expect them to be, then reissues new credentials for the next request.
Depending on the status of the session, more obvious ACL functions could then be performed.
So that is a high level overview of my paranoid session handler. Here are the questions I'm really stuck on:
I. What's the optimal way of storing an IP ACL? Should I be writing/reading to hosts.deny? Are there any performance concerns with my methodology?
II. Does my MitM prevention method seem ok, or am I being overly paranoid with comparing multiple indexes? What's the best way to store this information so I don't run into brick walls at 80-100 users?
III. Am I hammering on my servers unnecessarily with constant session regeneration + writebacks? Is there a better way?
I'm writing this for a small application initially, but I'd prefer to keep it a reusable component I could share with the world, so I want to make sure I make it as accessible and safe as possible.
Thanks in advance!
Writing to hosts.deny
While this is a alright idea if you want to completely IP ban a user from your server, it will only work with a single server. Unless you have some kind of safe propagation across multiple servers (oh man, it sounds horrible already) you're going to be stuck on a single server forever.
You'll have to consider these points about using hosts.deny too:
Security: Opening up access to as important a file as hosts.deny to the web server user
Pain in the A: Managing multiple writes from different processes (denyhosts for example)
Pain in the A: Safely making amends to the file if you'd like to grant access to an IP that was previously banned at a later date
I'd suggest you simply ban the IP address on the application level in your application. You could even store the banned IP addresses in a central database so it can be shared by multiple subsystems with it still being enforced at the application level.
I. Optimal way of storing IP ACL would be pushing banned IP's to an SQL database, which does not suffer from concurrency problems like writing to files. Then an external script, on a regular basis or a trigger, may generate IPTABLES rules. You do not need to re-read your database on every access, you write only when you detect mis-behavior.
II. Fixation to IP is not a good thing on public Internet if you offer service to clients behind transparent proxies, or mobile devices - their IP changes. Let users chose in preferences, if they want this feature (depends on your audience, if they know what does the IP mean...). My solution is to generate unique token per (page) request, re-used in that page AJAX requests (not to step into a resource problem - random numbers, session data store, ...). The tokens I generate are stored within session and remembered for several minutes. This let's user open several tabs, go back and submit in an earlier opened tab. I do not bind to IP.
III. It depends... there is not enough data from you to answer. Above may perfectly suit your needs for ~500 user base coming to your site for 5 minutes a day, once. Or it may fit even for 1000 unique concurent users in a hour at a chat site/game - it depends on what your application is doing, and how well you cache data which can be cached.
Design well, test, benchmark. Test if session handling is your resource problem, and not something else. Good algorithms should not throw you into resource problems. DoS defense included, and it should not be an in-application code. Applications may hint to DoS prevention mechanisms what to do, and let the defense on specialized tools (see answer I.).
Anyway, if you get into a resource problems in future, the best way to get out is new hardware. It may sound rude or even incompetent to someone, but calculate price for new server in 6 months, practically 30% better, versus price for your work: pay $600 for new server and have additional 130% of horsepower, or pay yourself $100 monthly for improving by 5% (okay, improve by 40%, but if the week is worth $25 may seriously vary).
If you design from scratch, read https://www.owasp.org/index.php/Session_Management first, then search for session hijacking, session fixation and similar strings on Google.