I am programming in PHP and I am wondering whether there is a built-in way to limit too many requests, or whether this needs to be coded manually. For example, if someone has opened 30 pages in 60 seconds, that is too many requests (and they may be a bot), so they should get a 429 Too Many Requests HTTP status code.
If it has to be done manually, what is the best practice for setting something like this up?
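You could try Apache's mod_ratelimit.
Here is the sample from the Apache documentation. It throttles responses under /downloads to 400 KiB/s. Note that mod_ratelimit limits bandwidth per response, not per IP, and it does not limit the number of requests.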
<Location "/downloads">
SetOutputFilter RATE_LIMIT
SetEnv rate-limit 400
</Location>
To limit the number of requests rather than bandwidth, you can try a module such as mod_evasive, which blocks clients that send too many requests in a short period. A service such as Cloudflare can also help mitigate DDoS attacks.
If you really want to use PHP for this, you can log the number of requests from each IP, and if the requests from that IP exceed a certain threshold, block the IP from accessing your page.
To do this, you can store the IP addresses in a database along with a column recording when they accessed your page, and use SQL to count their accesses within a given period.
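Below is a minimal sketch of that approach. It assumes PDO with the pdo_sqlite extension. The `hits` table, the function names and the 30-requests-per-60-seconds limit (taken from the question) are illustrative, not a standard API. Each request is recorded, and a `COUNT(*)` over the last window then decides whether to answer 429.

```php
<?php
// Sketch: log each hit per IP and count hits in the last N seconds with SQL.
// Table name, function names and limits are illustrative assumptions.

function createHitsTable(PDO $db): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS hits (ip TEXT NOT NULL, hit_at INTEGER NOT NULL)');
    $db->exec('CREATE INDEX IF NOT EXISTS hits_ip_time ON hits (ip, hit_at)');
}

/**
 * Records a hit for $ip at unix time $now. Returns true if the IP has made
 * $maxHits or more requests within the last $windowSeconds (over the limit).
 */
function recordAndCheck(PDO $db, string $ip, int $now, int $maxHits = 30, int $windowSeconds = 60): bool
{
    $db->prepare('INSERT INTO hits (ip, hit_at) VALUES (?, ?)')->execute([$ip, $now]);

    $stmt = $db->prepare('SELECT COUNT(*) FROM hits WHERE ip = ? AND hit_at > ?');
    $stmt->execute([$ip, $now - $windowSeconds]);

    return (int) $stmt->fetchColumn() >= $maxHits;
}

// Web usage: a file-backed SQLite database so counts persist between requests.
if (PHP_SAPI !== 'cli') {
    $db = new PDO('sqlite:' . sys_get_temp_dir() . '/hits.sqlite');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    createHitsTable($db);

    if (recordAndCheck($db, $_SERVER['REMOTE_ADDR'], time())) {
        http_response_code(429);
        header('Retry-After: 60');
        exit('Too Many Requests');
    }
}
```

In practice you would also delete old rows periodically, for example with `DELETE FROM hits WHERE hit_at < ?`. Otherwise the table grows forever.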
For anyone not using Apache, here is the nginx documentation:
https://docs.nginx.com/nginx/admin-guide/security-controls/controlling-access-proxied-http/
Related
I have a project that serves many images. The project also has an API that serves, among other things, the image links.
I would like a way to effectively prevent scraping of my images. I don't mind users downloading individual images, but I would not like someone to scrape all the images at once, because of the high bandwidth usage.
I thought of using .htaccess to deny direct access to the image folders.
I also thought of using a dynamic link in PHP (on the website) to show the image (for example loadimage.php?id=XXXXX), so my users don't know the full image link.
How could I prevent scraping in the API (and even on the website)? I thought of something like a token, where each request generates a new "image id", but either I'm missing something or I can't figure out how to make it work.
I know it will be impossible to have a 100% effective method, but any suggestions on how to make it harder would be appreciated.
Thanks.
You're looking for a rate limit policy. It involves tracking how many times the images are being requested (or the number of bytes being exchanged), and issuing a (typically) 429 Too Many Requests response when a threshold is exceeded.
Nginx has some pretty good built-in tools for rate limiting. You mention .htaccess which implies Apache, for which there is also a rate limiting module.
You could do this with or without PHP. You could identify a URL pattern that you want rate limited, and apply the rate limit policy to that URL pattern (could be a PHP script or just a directory somewhere).
For Apache:
<Location ".../path/to/script.php">
SetOutputFilter RATE_LIMIT
SetEnv rate-limit 400
SetEnv rate-initial-burst 512
</Location>
Or, you could write code in your PHP that writes accesses to a database, and enforces a limit based on how many accesses in a given window period.
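If you do write your own, a database is not strictly required. The sketch below keeps fixed-window counters in a JSON file protected by `flock()`. It needs PHP 8, and the class name, file path and limits are illustrative assumptions. Each call adds a cost, either 1 per request or the number of bytes about to be served, so the same code can meter either quantity. It returns how many seconds the client should wait, which maps directly onto a 429 with a `Retry-After` header.

```php
<?php
// Sketch: fixed-window rate limit kept in a JSON file so counts survive
// between PHP requests without a database. Names and limits are illustrative.

final class FileRateLimiter
{
    public function __construct(
        private string $stateFile,
        private int $limit,          // max cost per window (requests or bytes)
        private int $windowSeconds,
    ) {}

    /** Adds $cost for $client. Returns 0 if allowed, else seconds until the window resets. */
    public function consume(string $client, int $cost, int $now): int
    {
        $fh = fopen($this->stateFile, 'c+');
        if ($fh === false) {
            throw new RuntimeException("Cannot open {$this->stateFile}");
        }
        flock($fh, LOCK_EX); // serialise concurrent requests

        $state = json_decode(stream_get_contents($fh) ?: '{}', true) ?: [];
        $window = intdiv($now, $this->windowSeconds);

        $entry = $state[$client] ?? ['window' => $window, 'used' => 0];
        if ($entry['window'] !== $window) {
            $entry = ['window' => $window, 'used' => 0]; // new window: reset the count
        }

        $retryAfter = 0;
        if ($entry['used'] + $cost > $this->limit) {
            $retryAfter = ($window + 1) * $this->windowSeconds - $now;
        } else {
            $entry['used'] += $cost;
            $state[$client] = $entry;
            ftruncate($fh, 0);
            rewind($fh);
            fwrite($fh, json_encode($state));
            fflush($fh);
        }

        flock($fh, LOCK_UN);
        fclose($fh);
        return $retryAfter;
    }
}

// Web usage: 100 requests per minute per IP.
if (PHP_SAPI !== 'cli') {
    $limiter = new FileRateLimiter(sys_get_temp_dir() . '/ratelimit.json', 100, 60);
    $wait = $limiter->consume($_SERVER['REMOTE_ADDR'], 1, time());
    if ($wait > 0) {
        http_response_code(429);
        header("Retry-After: $wait");
        exit;
    }
}
```

A fixed window is the simplest policy. However, it lets a client burst up to twice the limit across a window boundary, and this sketch never prunes stale entries. Because the state file is local, it also only works on a single server.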
I would not generally recommend writing your own when such good tools are available in the web server itself. One exception would be if you use several web servers in a cluster, which cannot easily synchronize rate limiting thresholds and counts across the servers.
Sometimes we don't have the APIs we would like to, and this is one of these cases.
I want to extract certain information from a certain website, so I was considering sending cURL requests to hundreds of pages on that site programmatically, using a cron job on my server.
I would then cache the responses and make the requests again after one or more days.
Could that be considered some kind of attack by the server, which might see hundreds of requests in a very short period of time from the same IP?
Let's say, 500 requests?
What would you recommend? Perhaps using sleep between curl requests to reduce their frequency?
There are a lot of situations where your scripts could end up getting blocked by the website's firewall. One of the best ways to find out whether this is allowed is to contact the site owner and tell them what you want to do. If that's not possible, read their Terms of Service and see whether it is explicitly prohibited.
If time is not of the essence when making these calls, then yes, you can definitely use sleep to add a delay between requests, and I would recommend it if you find that you need to make fewer requests per second.
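Here is a minimal sketch of that in PHP, assuming the curl extension. URLs are fetched one at a time, with `usleep()` between them. The function names and the delay are illustrative. The fetcher and sleeper are injectable callables only so that the loop can be tested without network access. Caching the responses and rerunning every few days would be up to the cron job that calls this.

```php
<?php
// Sketch: sequential, throttled fetching with PHP's curl extension.
// Function names and the delay are illustrative assumptions.

/** Fetches one URL; returns the body, or null on error or HTTP status >= 400. */
function curlFetch(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    // The handle is freed when $ch goes out of scope (PHP 8+).
    return ($body === false || $status >= 400) ? null : $body;
}

/**
 * Fetches each URL in turn, pausing $delaySeconds between requests.
 * Returns an array of url => body (or null on failure).
 */
function fetchPolitely(array $urls, float $delaySeconds, ?callable $fetch = null, ?callable $sleep = null): array
{
    $fetch ??= 'curlFetch';
    $sleep ??= 'usleep'; // usleep() takes microseconds

    $results = [];
    foreach (array_values($urls) as $i => $url) {
        if ($i > 0) {
            $sleep((int) round($delaySeconds * 1_000_000));
        }
        $results[$url] = $fetch($url);
    }
    return $results;
}

// Cron usage ($listOfUrls is hypothetical): 500 pages at one request every
// 2 seconds means about 1,000 seconds of waiting in total.
// $pages = fetchPolitely($listOfUrls, 2.0);
```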
You could definitely do this. However you should keep a few things in mind:
Most competent sites have a clause in their Terms of Service that prohibits using the site in any way other than through the interface provided.
If the site notices what you are doing and sees a detrimental effect on its network, it will block your IP. (Our organization ran into this often enough that we developed a program that logs IPs and the rate at which they access content; if an IP tries to access more than x pages in y seconds, we ban it for z minutes.) You might be able to avoid this by using sleep, as you mentioned.
If you need information that the page loads dynamically via JavaScript after the markup has been rendered, the response to your curl request will not include it. For cases like these, there are programs such as iMacros that let you write scripts in your browser to carry out actions programmatically, as if you were actually using the browser.
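On the site owner's side, the rule described above (more than x pages in y seconds earns a z-minute ban) can be sketched as a sliding-window check. This is only an illustration, not that organization's program. The class name is made up and it needs PHP 8. The state lives in memory, so a real deployment would have to persist it between requests (file, APCu, database).

```php
<?php
// Sketch of "more than x pages in y seconds => ban for z minutes".
// Class name is illustrative; state is in memory (persist it in real use).

final class BanningLimiter
{
    /** @var array<string, int[]> recent request timestamps per IP */
    private array $hits = [];
    /** @var array<string, int> unix time at which each ban expires */
    private array $bannedUntil = [];

    public function __construct(
        private int $maxPages,      // x
        private int $windowSeconds, // y
        private int $banSeconds,    // z minutes, expressed in seconds
    ) {}

    /** Returns true if the request from $ip at unix time $now may be served. */
    public function allow(string $ip, int $now): bool
    {
        if (($this->bannedUntil[$ip] ?? 0) > $now) {
            return false; // still banned; the request is not counted
        }
        unset($this->bannedUntil[$ip]);

        // Sliding window: keep timestamps newer than $now - y, then add this hit.
        $recent = array_values(array_filter(
            $this->hits[$ip] ?? [],
            fn (int $t): bool => $t > $now - $this->windowSeconds
        ));
        $recent[] = $now;

        if (count($recent) > $this->maxPages) {
            $this->bannedUntil[$ip] = $now + $this->banSeconds;
            unset($this->hits[$ip]);
            return false;
        }
        $this->hits[$ip] = $recent;
        return true;
    }
}
```

For x = 30, y = 60 and z = 10 minutes, you would construct `new BanningLimiter(30, 60, 600)`.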
As mentioned by @RyanCady, the best solution may be to reach out to the owner of the site, explain what you are doing, and see if they can accommodate your requirement.
I have an application that requires logon.
It is only possible to access the site via a single logon page.
I am concerned about DDoS attacks and have (thanks to friends here) been able to write a script that recognises potential DDoS attacks and locks out the offending IP to prevent site access (also a security measure against repeated username/password guessing).
Is there any value in blocking the offending IPs with .htaccess? I can simply modify the file to stop my server from serving the offending IP for a period of time, but will it do any good? Will the incoming requests still bung up the system, even though .htaccess prevents them from being served, or will it reduce the load and let genuine requests in?
It is worth noting that most of my requests will come from a limited range of genuine IPs, so the implementation I intend is along these lines:
If a DDoS attack is suspected, allow access only from IPs that have had a previous good logon, for a set time period. Permanently block all suspect IPs with no good logon, unless a manual request to unblock them has been made.
Your sage advice would be greatly appreciated. If you think this is a waste of time, please let me know!
Implementation is pretty much pure PHP.
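Since the implementation is pure PHP, the policy above can be written as a single decision function. This is a minimal sketch under assumptions the question leaves open. The function name and return values are illustrative, as is the idea that the detection script supplies an `$isSuspect` flag and a lockdown end time. The lists are passed in as plain arrays keyed by IP rather than read from storage.

```php
<?php
// Sketch of the lockdown policy as a pure function. All names are illustrative.
// Returns 'allow', 'deny', or 'block' (deny and add the IP to the permanent list).

function lockdownDecision(
    string $ip,
    int $now,
    int $lockdownUntil,   // end of the "set time period" after an attack is suspected
    bool $isSuspect,      // flagged by your existing detection script
    array $goodLogonIps,  // ip => true for IPs with a previous successful logon
    array $blockedIps,    // ip => true, permanently blocked
    array $unblockedIps   // ip => true, manually unblocked on request
): string {
    $manuallyUnblocked = isset($unblockedIps[$ip]);
    $hasGoodLogon = isset($goodLogonIps[$ip]);

    if (isset($blockedIps[$ip]) && !$manuallyUnblocked) {
        return 'deny';
    }
    if ($isSuspect && !$hasGoodLogon && !$manuallyUnblocked) {
        return 'block';
    }
    if ($now < $lockdownUntil && !$hasGoodLogon) {
        return 'deny'; // during lockdown only previously good IPs get in
    }
    return 'allow';
}
```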
Load caused by a DDoS attack will be lower if the offending IPs are blocked in .htaccess, because the unwanted requests are rejected early and never reach your PHP scripts.
Take, for example, a request for the login script: Apache will call the PHP script, which will (I'm assuming) look up the user in a database of some kind. This is load.
Request <---> Apache <---> PHP <---> MySQL (maybe)
If you block an IP (say 1.2.3.4), your .htaccess will have an extra line like this:
Deny from 1.2.3.4
And the request will go a little like this:
Request <---> Apache <-x-> [Blocked]
No PHP script runs and no database calls happen, so this is less load than in the previous example.
This also has the added bonus of preventing bruteforce attacks on the login form. You'll have to decide when to add IPs to a blocklist, maybe when they give incorrect credentials 20 times in a minute or continuously over half an hour.
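Adding IPs to the blocklist can be automated from PHP. The following is a minimal sketch. It assumes your script keeps the blocked IPs in an array and owns a marked section of .htaccess; the marker comments and function names are made up for this example. Each address is checked with `filter_var()`, so only valid IPs ever reach the Apache config. The file is written to a temporary copy and renamed, so Apache never reads a half-written file.

```php
<?php
// Sketch: regenerate a managed block of "Deny from" lines in .htaccess.
// Marker comments and function names are illustrative assumptions.

const BLOCK_START = '# BEGIN php-blocklist';
const BLOCK_END   = '# END php-blocklist';

/** Renders the managed block, skipping duplicates and anything that is not an IP. */
function renderDenyBlock(array $ips): string
{
    $lines = [BLOCK_START];
    foreach (array_unique($ips) as $ip) {
        if (filter_var($ip, FILTER_VALIDATE_IP) !== false) {
            $lines[] = 'Deny from ' . $ip;
        }
    }
    $lines[] = BLOCK_END;
    return implode("\n", $lines) . "\n";
}

/** Replaces the managed block in $htaccess, or appends it if it is not there yet. */
function updateHtaccess(string $htaccess, array $ips): void
{
    $current = is_file($htaccess) ? (string) file_get_contents($htaccess) : '';
    $pattern = '/' . preg_quote(BLOCK_START, '/') . '.*?' . preg_quote(BLOCK_END, '/') . "\n?/s";
    $block = renderDenyBlock($ips);

    $new = preg_match($pattern, $current)
        ? preg_replace_callback($pattern, fn (array $m): string => $block, $current)
        : rtrim($current, "\n") . ($current === '' ? '' : "\n") . $block;

    // Write a temporary copy, then rename it over the original.
    $tmp = $htaccess . '.tmp';
    file_put_contents($tmp, $new);
    rename($tmp, $htaccess);
}
```

On Apache 2.4, `Deny from` is provided by mod_access_compat. The native equivalent is `Require not ip` inside a `<RequireAll>` block.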
Firewall
It would be better to block the requests with a firewall, though, rather than with .htaccess. That way the request never reaches Apache; dropping a packet based on an IP address rule is a simple action for the server.
The line below is a shell command that (when run as root) will add an iptables rule to drop all packets originating from that IP address:
/sbin/iptables -I INPUT -s 1.2.3.4 -j DROP
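If the detection script is written in PHP, it can issue that same command. The following is a minimal sketch with made-up function names. The address is validated as IPv4 with `filter_var()` and passed through `escapeshellarg()` before reaching `exec()`, so a crafted value cannot inject shell commands. It assumes the PHP process is allowed to run iptables (as root or via a suitable sudo rule, not shown). IPv6 addresses would need ip6tables instead.

```php
<?php
// Sketch: build and run the iptables DROP rule for a flagged IPv4 address.
// Function names are illustrative; running it requires root privileges.

function iptablesDropCommand(string $ip): string
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) === false) {
        throw new InvalidArgumentException("Not an IPv4 address: $ip");
    }
    // escapeshellarg() wraps the value in single quotes on Unix-like systems.
    return '/sbin/iptables -I INPUT -s ' . escapeshellarg($ip) . ' -j DROP';
}

/** Runs the command; returns true if iptables exited successfully. */
function dropIp(string $ip): bool
{
    exec(iptablesDropCommand($ip), $output, $exitCode);
    return $exitCode === 0;
}
```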
This is for DDoS attacks.
Basically, I need to deny access to the site based on the number of connections within a specific time, such as 1 minute or 1 second. If an IP exceeds the maximum number of connections in that time, it should be blacklisted for 1 day.
For example, 1000 connections to the server in one minute is not normal, so that IP should be blacklisted.
What I want is a script to detect this in PHP. But, very importantly: how do I avoid denying service to Googlebot or other search bots, and to normal visitors?
I don't think this sort of thing should go into your app's code. This is something that you can implement at the network level. Your firewall may already provide this sort of thing. If you use IPTables in Linux, you can definitely implement rules of this sort.
One link that may help in the case of IPTables is this.
This link is actually better than above (thanks, Google!)
If you don't use Linux or your Firewall doesn't support this sort of feature, you can easily put a Linux box in front of your DB server and implement this method.
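If you do keep the detection in PHP, as the question asks, the search-bot exception is the tricky part, because anyone can claim to be Googlebot in the User-Agent header. Google's documented way to verify its crawler works in three steps. First, do a reverse DNS lookup on the IP. Next, check that the hostname ends in one of Google's crawler domains (this sketch checks googlebot.com and google.com). Finally, do a forward lookup to confirm that the hostname resolves back to the same IP. The sketch below uses PHP's `gethostbyaddr()` and `gethostbynamel()`, which handle IPv4 only. The function name is illustrative, and the resolvers are injectable only for testing. DNS lookups are slow, so you would run this only for IPs that are about to be blacklisted, and cache the result.

```php
<?php
// Sketch: verify a claimed Googlebot with reverse + forward DNS (IPv4 only).
// Function name is illustrative; resolvers are injectable for testing.

function isVerifiedGooglebot(string $ip, ?callable $reverse = null, ?callable $forward = null): bool
{
    $reverse ??= 'gethostbyaddr';  // returns the hostname, or the IP unchanged on failure
    $forward ??= 'gethostbynamel'; // returns a list of IPv4 addresses, or false

    $host = $reverse($ip);
    if (!is_string($host) || $host === $ip) {
        return false; // no reverse DNS entry
    }
    $host = strtolower(rtrim($host, '.'));
    if (!preg_match('/\.(googlebot|google)\.com$/', $host)) {
        return false; // hostname is not in a Google crawler domain
    }
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true); // must resolve back to the same IP
}
```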
I'm curious to know how to stop Apache from logging every URL I request with cURL.
My PHP script opens a few hundred thousand URLs, scans them, takes a tiny bit of info, closes, and then opens the next.
I discovered after opening the access log that each and every URL opened with CURL is written to the access log.
::1 - - [01/Dec/2010:18:37:37 -0600] "GET /test.php HTTP/1.1" 200 8469 "-"..."
My access log is almost 45 MB. Can anyone help?
This is the purpose of the access log: recording all incoming traffic.
In order to effectively manage a web server, it is necessary to get feedback about the activity and performance of the server as well as any problems that may be occurring. The Apache HTTP Server provides very comprehensive and flexible logging capabilities. This document describes how to configure its logging capabilities, and how to understand what the logs contain.
source: http://httpd.apache.org/docs/trunk/logs.html
Of course, you have the option to disable logging (preferably not).
If all of your curl requests are coming from a single or otherwise manageable group of IPs you can exclude them from your logs with a configuration similar to the following:
# Set your address here, you can do this for multiple addresses
SetEnvIf Remote_Addr "1\.1\.1\.1" mycurlrequest
CustomLog logs/access_log common env=!mycurlrequest
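You can do something similar with the User-Agent field. The curl command-line tool identifies itself as curl by default, but PHP's curl extension sends no User-Agent header unless you set one with CURLOPT_USERAGENT, so give your script a distinctive one to match on.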
You can read more here:
http://httpd.apache.org/docs/2.2/logs.html#accesslog (conditional logging is the last section under this header)
and here
http://httpd.apache.org/docs/2.2/mod/mod_setenvif.html#setenvif
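For the User-Agent variant, the script has to send something recognisable. In the minimal sketch below, `MyLogScanner/1.0` is a made-up name. The matching Apache rule would be along the lines of `SetEnvIf User-Agent "^MyLogScanner/" mycurlrequest`, combined with the `env=!mycurlrequest` condition on `CustomLog` shown above.

```php
<?php
// Sketch: give the scanning script its own User-Agent so Apache's SetEnvIf
// can exclude its requests from the access log. The name is illustrative.

const SCANNER_USER_AGENT = 'MyLogScanner/1.0';

/** curl options shared by every request the scanner makes. */
function scannerCurlOptions(): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_USERAGENT      => SCANNER_USER_AGENT,
    ];
}

/** Fetches one URL with the scanner's User-Agent; returns null on failure. */
function scanUrl(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, scannerCurlOptions());
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
```

Matching on the IP address, as suggested above, is still the more precise option, because any client can send this User-Agent string.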
If you want to exclude requests from logging conditionally, I would do it by the most precise method possible, such as the IP address. If the server is externally accessible, you probably don't want to find yourself NOT logging external requests made with curl.
Using conditional logging, you can also split your logging across multiple files if you want, one of which you could rotate more frequently. The benefit is that you save space while still having log data to help with research and debugging.
See the Apache manual, about Conditional Logs. That may be what you are looking for.