I'm not sure if the title is the correct way to ask this question, but here it goes.
Case:
I'm using CodeIgniter (2.1.3) to make AJAX calls and fetch JSON data. Being aware that you can't really "secure" AJAX, since the JavaScript is always accessible to the user, I was wondering what the possibilities are for making it as hard as possible for people to automate the AJAX calls.
Let's say you have a game where you keep requesting queue data for constructing buildings and training troops. If I wanted to bot that website, I could just find out how the AJAX calls work and write a script that logs itself into the domain and makes the AJAX calls for me.
The goal of doing this: if 10,000 people try to bot the website, the layers of hindrance I build into the AJAX calls might reduce those 10,000 to maybe 100, making it easier for the administrators to track the ones who still managed to cross all the layers.
In that case we can also see what they are doing and try to add more checks/layers to prevent the majority from being able to bot the website.
Confirming an actual session
The first layer I was thinking of is passing a random hash to the page that is loaded and storing that hash in the PHP session too. This way the visitor can "only" get JSON back from AJAX calls that include the hash value that was given to that single page load. So if they try to fetch the hash with a regular expression match in one cURL/wget call, they can't reuse it in a second call to fake the AJAX request.
I still think there is an issue here with multiple page loads, though. I might track whether people are opening a new page under their login name and give them a message that they may not operate the application on multiple pages at once. It might also be problematic with automation tools like Selenium.
In CodeIgniter I do this now:
<?php
public function index()
{
    $this->load->helper(array('security', 'url'));
    $this->load->library('session');

    $data = array();

    // AJAX security: generate a random hash for this page load
    // and remember it in the session
    $data['hash'] = sha1(hash('md5', microtime(true) . mt_rand()));
    $this->session->set_userdata('live_hash', $data['hash']);

    $this->load->view('jqueryjson', $data);
}

public function xhr()
{
    $this->load->library('session');

    // the hash posted along with the AJAX request
    $hash = $this->input->post('hash');

    if ($hash !== FALSE AND $hash === $this->session->userdata('live_hash')) {
        echo 'HASH security layer passed<br />';
        echo json_encode(array('JSON DATA TO BE SENT BACK'));
    } else {
        echo 'Please do not call the page outside a browser.';
    }
}
?>
I know this approach is kind of naive, but I wonder how others do this (client-side) to prevent the majority of botters. Of course I'm also validating all the passed data on the server side, to be sure no data is manipulated outside the bounds it should have.
I think this problem isn't bound only to AJAX calls, but applies to every request made from a webpage.
To prevent access by automatons, most sites use some kind of captcha image.
Idea:
Maybe you could place the graphical elements that trigger the AJAX calls in varying places inside an image, so that only a human would click on them in the right spot. I am thinking of an image showing the element, of which only you know the position. You simply send the position of the click inside the image with the AJAX request and compare whether it was the right spot...
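A rough sketch of the server side of that check, assuming the target position was stored in the session when the image was generated (the field names and session keys are made up for the example):
<?php
// The AJAX call sends the click coordinates; we check they hit the stored target.
session_start();

$targetX = $_SESSION['target_x'];          // set when the image was generated
$targetY = $_SESSION['target_y'];
$radius  = 20;                             // tolerance in pixels

$clickX = isset($_POST['x']) ? (int) $_POST['x'] : -1;
$clickY = isset($_POST['y']) ? (int) $_POST['y'] : -1;

$distance = sqrt(pow($clickX - $targetX, 2) + pow($clickY - $targetY, 2));

if ($distance <= $radius) {
    echo json_encode(array('status' => 'ok'));        // plausible human click
} else {
    echo json_encode(array('status' => 'rejected'));  // wrong spot, likely a bot
}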
2. Idea:
Open a captcha window after 10 AJAX requests made too fast, with an image inside, so that the user has to verify himself as a human. Without the verification, no ongoing game.
Maybe you could build the verification into the game somehow, so that a user doesn't see it as a captcha right away.
Lucian
Related
I've just started learning PHP and have just finished with $_POST/$_GET.
Now I want to know: what are the pros and cons of having PHP process the data from a form inside the same file, versus sending the data to another file (action="anotherfile")?
Logically I would think that sending it to another file would increase the processing time, but is that true?
When I have the PHP script inside the same file, the page doesn't seem to reload when I hit the submit button (but the content changes). Or does it? If it does, wouldn't the only difference be that I would have to type the markup for the menu (let's say you have the same menu on all pages) in both files? Which would lead to more coding and wasted space?
What are the pros and cons of having PHP process the data from a form inside the same file, versus sending the data to another file (action="anotherfile")?
You are conflating files and urls.
By having the logic split between different files (and then included where appropriate) you separate concerns and make your code easier to manage.
By having a single URL be responsible for both displaying the form and processing the form data you don't end up in the awkward situation where the result of processing the form data requires that you redisplay the form with error messages in it. If you used two different URLs there you would need to either display the form on the processing URL (so you have two different URLs which display the form) or perform an HTTP redirect back to the original URL while somehow passing details of the errors to it.
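For example, a minimal sketch of that single-URL pattern (the field name and thanks.php are just placeholders):
<?php
// The same URL displays the form and processes it; on validation errors the
// form is redisplayed with the messages, with no redirect gymnastics needed.
$errors = array();
$name   = '';

if ($_SERVER['REQUEST_METHOD'] === 'POST') {
    $name = isset($_POST['name']) ? trim($_POST['name']) : '';
    if ($name === '') {
        $errors[] = 'Name is required.';
    }
    if (empty($errors)) {
        // process the data (save it, send mail, ...), then redirect away
        header('Location: thanks.php');
        exit;
    }
}
?>
<?php foreach ($errors as $error): ?>
    <p class="error"><?php echo htmlspecialchars($error); ?></p>
<?php endforeach; ?>
<form method="post" action="">
    <input type="text" name="name" value="<?php echo htmlspecialchars($name); ?>">
    <input type="submit" value="Send">
</form>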
Logically I would think that sending it to another file would increase the processing time, but is that true?
No. It makes no difference on the time scales being dealt with.
When I have the PHP script inside the same file, the page doesn't seem to reload when I hit the submit button (but the content changes).
It does reload.
If it does, wouldn't the only difference be that I would have to type the markup for the menu (let's say you have the same menu on all pages) in both files?
That's what includes are for.
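For example (header.php, menu.php and footer.php are hypothetical include files holding the shared markup):
<?php
// Any page that needs the shared menu just includes it; the markup lives in one place.
include 'header.php';
include 'menu.php';
?>
<!-- page-specific content and the form go here -->
<?php include 'footer.php'; ?>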
In any language we always try to write clean code. That's why we follow MVC.
Logically I would think that sending it to another file would increase the processing time, but is that true? I think NO.
When we send data to another page, and at the top of that page we just echo the posted data and exit, you will see that it doesn't take any noticeable time. It takes time when we redirect to or load some HTML page after that.
It does not matter where we send the data (the same page or another page); what matters is what is loaded after that.
There is no difference about speed.
Whenever you post the content of your form with a standard submit, the data is sent to the server and a response (after processing) is downloaded.
The only difference is the organization of your code. The logic that draws the template of the page (the menu or other fixed parts) should be stored in some file that you can include separately or call through a function.
It's also true that when you post your data, you do it for some reason, registering a user for example. It's good practice that the PHP file which handles user registration does just that and outputs the messages, and nothing else.
If your file has some logic switches that make it output either an empty form or a registration message based on the presence of POST or GET variables, you will notice that when you scale to more complex tasks this adds complexity and makes code maintenance harder.
I'll try to make sure I understand your question by restating it.
If you have a form (/form.php), and the "action" of that submit button leads you to a separate php page (/form_action.php), there is absolutely no difference in speed. Each HTTP request (form.php and form_action.php) is independent - "form_action.php" doesn't remember anything about "form.php" unless you pass that information through (as parameters). This is what people mean when they say that HTTP is stateless. It's worth learning about how HTTP works in general alongside the details of PHP.
If you have a PHP script which in turn includes other PHP scripts, there is a tiny performance impact - too small to measure in pretty much any case I've ever come across.
However, using includes allows you to separate your markup (the HTML) from the logic (the PHP). This is a really good thing if you are doing anything other than tinkering. It allows you to re-use functionality, it makes it much easier to change and maintain the code over time, and it helps you think through what you're trying to achieve.
There are many different ways people have solved the "how do I keep my code clean" puzzle; the current orthodoxy is "Model-View-Controller" (as #monty says). There are also PHP frameworks which make this a little easier to implement - once you've got the basics of the language, you might want to look at Zend or TinyMVC (there are several others, each with their benefits and drawbacks).
I have a PHP page with a simple form. One input text field & a button. Input text field accepts user queries & on button click an HTTP GET request is made to the server & the result has to be shown back in the same page containing the form. That's too simple to do. I can do this in two ways. One is AJAX & other one is the good old sodding form-submit method.
My question is simple- Which method should I use? Since both of the roads lead us to the same place, which one should I choose to travel?
First of all, let me talk about form submit method. I can use <?php echo $_SERVER['PHP_SELF'] ; ?> as the action of the form for submitting the values of my form to the same page. Once I store those values into some random variables, I can make a GET request & obtain the result & show it to the world. This method is easy to use. Happy Down Voting to all of you.
Or I can make a GET request using AJAX and jQuery or JavaScript or whatever you wish to use & obtain the same result as in the previous case. Output is same. Only the mode of execution is different.
So which one is better? Which one fetches the result faster? And why should I use that one? Is there any difference? GET, POST, PUT or whatever: it doesn't really matter. AJAX or form submit?
There shouldn't be any significant, genuine speed difference between them.
The Ajax approach will load a smaller amount of data (since you aren't loading an entire HTML document), but by the time you take into account HTTP compression and the fact that (if your system is sensibly configured) your dependencies (images, scripts, stylesheets, etc.) will be cached, it won't be significantly smaller.
By using JavaScript to create a loading indicator and not refreshing the entire window in front of the user, you can create the illusion of a faster load time though. So if feeling faster was the only concern, then Ajax is the way forward.
Using JavaScript, however, is more complicated and slightly more prone to failure. The consequences of failure states are more severe because, unless your code detects them and does something about them, the user won't see anything: it will fail silently. For example, if a normal page load times out because the user is on a train and went through a tunnel, they'll see an error page provided by their browser suggesting that they refresh and try again. With Ajax, you need to write the error handling code yourself. This does give you more flexibility (such as allowing you to simply try again a few times) but the work isn't done for you.
The other consequence of using Ajax is that the address bar will not update automatically. This means that the results won't be bookmarkable or sharable unless you do something explicit to make that possible. The usual way to do that is pushState and friends, but again, it is more work.
You should also make the site work without JavaScript, so that if the JS doesn't run for any reason the site won't break completely. If you use pushState, then you have to do this anyway for the URLs you point the address bar at to be useful.
The short answer: Use a regular form submission, then consider layering JavaScript over the top if you think it will give your visitors a worthwhile benefit.
I would stick to an Ajax request when possible.
This is because you then don't have to load every single item on the page again (like all the images, the menu and so on). You can just send the relevant HTML back and jQuery can place it inside the relevant holder.
But that is just my humble opinion...
If you have to retrieve simple data from the server without reloading the page, my advice is to use jQuery's .get or .post.
jQuery also provides a very large API that allows you to reduce your programming time.
http://api.jquery.com/
Obviously the execution time increases, but in my experience the user can't feel the difference with a simple AJAX request.
So in my opinion, if jQuery allows you to obtain the results, it is the best solution because it halves your work time!
See the edited version; it may help you.
I think that AJAX should be used for display updates, while form submissions should be done via a page reload. Reasoning?
When submitting forms, you are telling the application to do something. Users tend to want to feel that it was done. When a page doesn't reload, users are often left wondering "Did that work?". Then they have to check to make sure what they did was right.
But when you are displaying a table or something, and the user says "display x data... now x1 data" for instance, they aren't "doing" something (creating new entities, sending emails, etc.). So AJAX can provide a nice user interface in this case. Page reloads would be annoying here.
In conclusion, I think form submission should be done via page reloads (let the user see it working), whereas display updates should use AJAX (prevent annoying page reloads).
Of course, this is a preference thing. Some of my company's applications use AJAX all over. But those are the applications that are the most difficult to maintain and debug. ;)
I asked a similar question before, and the answer was simply:
if JavaScript can do it, then any client can do it.
But I still want to find a way to restrict AJAX calls to JavaScript.
The reason is:
I'm building a web application. When a user clicks on an image, tagged like this:
<img src='src.jpg' data-id='42'/>
JavaScript calls a PHP page like this:
$.ajax("action.php?action=click&id=42");
then action.php inserts rows into the database.
But I'm afraid that some users could automate entries that "click" all the IDs and such, by calling the necessary URLs, since they are visible in the source code.
How can I prevent such a thing, and make sure it works only on click, and not by calling the url from a browser tab?
p.s.
I think a possible solution would be using encryption: generate a key on each user visit, and call the action page with that key, or a hash/md5sum/whatever of it. But I think it can be done without turning this into a security problem. Am I right? Moreover, I'm not sure this method is a solution, since I don't know anything about this kind of security or its implementation.
I'm not sure there is a 100% secure answer. A combination of a server generated token that is inserted into a hidden form element and anti-automation techniques like limiting the number of requests over a certain time period is the best thing I can come up with.
[EDIT]
Actually, a good solution would be to use CAPTCHAs.
Your question isn't really "How can I tell AJAX from non-AJAX?" It's "How do I stop someone inflating a score by repeated clicks and ballot stuffing?"
In answer to the question you asked, the answer you quoted was essentially right. There is no reliable way to determine whether a request is being made by AJAX, a particular browser, a CURL session or a guy typing raw HTTP commands into a telnet session. We might see a browser or a script or an app, but all PHP sees is:
GET /resource.html HTTP/1.1
Host: www.example.com
If there's some convenience reason for wanting to know whether a request was AJAX, some JavaScript libraries such as jQuery add an additional HTTP header to AJAX requests that you can look for, or you could manually add a header or include a field in your payload such as AJAX=1. Then you can check for those server side and take whatever action you think should be taken for an AJAX request.
Of course there's nothing stopping me using CURL to make the same request with the right headers set to make the server think it's an AJAX request. You should therefore only use such tricks where whether or not the request was AJAX is of interest, so you can format the response properly (send an HTML page if it's not AJAX, or JSON if it is). The security of your application can't rely on such tricks, and if the design of your application requires the ability to tell AJAX from non-AJAX for security or reliability reasons then you need to rethink the design of your application.
In answer to what you're actually trying to achieve, there are a couple of approaches. None are completely reliable, though. The first approach is to deposit a cookie on the user's machine on first click, and to ignore any subsequent requests from that user agent if the cookie is in any subsequent requests. It's a fairly simple, lightweight approach, but it's also easily defeated by simply deleting the cookie, or refusing to accept it in the first place.
Alternatively, when the user makes the AJAX request, you can record some information about the requesting user agent along with the fact that a click was submitted. You can, for example, store a hash (created with something stronger than MD5!) of the client's IP and user agent string, along with a timestamp for the click. If you see a lot of the same hash, along with closely grouped timestamps, then there's possibly abuse being attempted. Again, this trick isn't 100% reliable because user agents can send any string they want as their user agent string.
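A rough sketch of that idea; the table name, columns and threshold below are invented for the example:
<?php
// Record a fingerprint of the requesting client with each click, then look for
// many identical fingerprints with closely grouped timestamps.
$fingerprint = hash('sha256', $_SERVER['REMOTE_ADDR'] . $_SERVER['HTTP_USER_AGENT']);

$pdo  = new PDO('mysql:host=localhost;dbname=game', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO clicks (fingerprint, clicked_at) VALUES (?, NOW())');
$stmt->execute(array($fingerprint));

// how many clicks did this fingerprint make in the last minute?
$stmt = $pdo->prepare(
    'SELECT COUNT(*) FROM clicks
     WHERE fingerprint = ? AND clicked_at > NOW() - INTERVAL 1 MINUTE'
);
$stmt->execute(array($fingerprint));

if ($stmt->fetchColumn() > 20) {
    // closely grouped timestamps from a single fingerprint: flag it for review
    error_log('Possible click abuse: ' . $fingerprint);
}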
Use the POST method instead of GET. Read the documentation here http://api.jquery.com/jQuery.post/ to learn how to use the post method in jQuery.
You could, for example, implement a check whether the request is really made with AJAX, and not by just calling the URL directly.
if(!empty($_SERVER['HTTP_X_REQUESTED_WITH']) && strtolower($_SERVER['HTTP_X_REQUESTED_WITH']) == 'xmlhttprequest') {
    // Yay, it is ajax!
} else {
    // no AJAX, man..
}
This solution may need more thought, but it might do the trick.
You could use tokens as stated in Slicedpan's answer. When serving your page, you would generate a UUID for each image and store them in the session / database.
Then serve your HTML as
<img src='src.jpg' data-id='42' data-uuid='uuidgenerated'/>
Your ajax request would become
$.ajax("action.php?action=click&uuid=uuidgenerated");
Then on the PHP side, check for the UUID in your memory/database, and allow the transaction or not. (You can also check for custom headers sent with AJAX, as stated in other responses.)
You would also need to purge UUIDs: when a token's lifetime expires, on window unload, when the session expires...
This method won't let you know whether the request comes from an XHR, but you'll be able to limit the number of requests.
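A rough sketch of that token flow (the helper name and session layout are illustrative, and the two halves would live in separate scripts):
<?php
// When rendering the page: generate a token per image and remember it.
session_start();

function generate_uuid_token() {
    return bin2hex(openssl_random_pseudo_bytes(16));
}

$imageId = 42;
$uuid    = generate_uuid_token();
$_SESSION['image_tokens'][$uuid] = $imageId;
echo "<img src='src.jpg' data-id='{$imageId}' data-uuid='{$uuid}'/>";

// In action.php, when the AJAX call arrives: accept only known tokens, once.
$uuid = isset($_GET['uuid']) ? $_GET['uuid'] : '';
if (isset($_SESSION['image_tokens'][$uuid])) {
    $imageId = $_SESSION['image_tokens'][$uuid];
    unset($_SESSION['image_tokens'][$uuid]);   // one use only
    // ... record the click for $imageId in the database ...
} else {
    header('HTTP/1.1 403 Forbidden');
}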
Good day,
I would like to know how to protect my website from AJAX spam. I'm looking to limit AJAX actions per user, let's say 8 AJAX actions per minute.
An example of an action would be: a button to add/remove a blog post "as my favorite".
Unless I'm wrong, I believe the best way would be to use $_SESSION variables, to prevent someone (or a bot) from clearing cookies to bypass my protection. I'm allowing AJAX functions only for logged-in users.
Using the database would make my protection useless, because it's the unwanted database writes I'm trying to avoid.
I have to mention that I use PHP as the server language and jQuery to perform my AJAX calls.
Thank you
Edit:
The sentence
... to protect my website ...
is confusing but it's not about cross-domain ajax.
Edit 2011-04-20:
I added a bounty of 50 to it.
Since you're only allowing AJAX actions to logged in users, this is really simple to solve.
Create a timestamp field for each account. You can do this in the database, or leverage Memcached, or alternatively use a flat file.
Each time the user makes a request through your AJAX interface, add the current timestamp to your records, and:
Check to make sure the last eight timestamps aren't all within the last minute before processing the request.
From there you can add additional magic, like tempbanning accounts that flagrantly violate the speed limit, or comparing the IPs of violators against blacklists of known spammers, et cetera.
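A minimal sketch of that check, using a flat file per account (a database column or Memcached key would work the same way; the path and limits are just examples):
<?php
session_start();

function allow_request($userId, $limit = 8, $window = 60) {
    $file  = sys_get_temp_dir() . '/ajax_times_' . (int) $userId;
    $times = file_exists($file) ? json_decode(file_get_contents($file), true) : array();

    // keep only the timestamps that fall inside the window
    $now   = time();
    $fresh = array();
    foreach ($times as $t) {
        if ($t > $now - $window) {
            $fresh[] = $t;
        }
    }

    if (count($fresh) >= $limit) {
        return false;                          // 8 requests already made this minute
    }

    $fresh[] = $now;
    file_put_contents($file, json_encode($fresh));
    return true;
}

if (!allow_request($_SESSION['user_id'])) {
    header('HTTP/1.1 429 Too Many Requests');
    exit;
}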
Are you talking about AJAX spam specific to your site, or AJAX spam in general?
If the latter, you can use hashes to prevent auto-submitted forms, i.e. write your own one-way hash() function which takes a string and makes an SHA1 checksum of it.
This is how you use it:
// the page is a blog post #357
$id = 357;
$type = 'post';
$hash = hash($_SERVER['REMOTE_ADDR'].$type.$id);
Put that hash in a hidden field which is not inside the comment form, or even in a hidden div somewhere at the bottom of the page, and name it "control_hash" or something. Attach its value to the AJAX request on form submit. When the form is received by the script, make a new hash from the $_REQUEST data (excluding the existing $control_hash) and check whether they match.
If the form was submitted by a bot, it won't have the $control_hash, so it won't pass.
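Roughly, the receiving script could verify it like this (the function is called make_hash() here to avoid clashing with PHP's built-in hash(), the secret salt is made up, and the same function would of course have to be used when the page is generated):
<?php
// Rebuild the hash from the request data and compare it to the submitted control_hash.
function make_hash($string) {
    return sha1('some secret salt' . $string);   // the custom one-way function
}

$id   = isset($_REQUEST['id'])           ? (int) $_REQUEST['id']     : 0;
$type = isset($_REQUEST['type'])         ? $_REQUEST['type']         : '';
$sent = isset($_REQUEST['control_hash']) ? $_REQUEST['control_hash'] : '';

$expected = make_hash($_SERVER['REMOTE_ADDR'] . $type . $id);

if ($sent === $expected) {
    // looks like a form our page generated: process the comment
} else {
    // missing or wrong control_hash: most likely a bot, drop the request
}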
Yes, your idea in principle is good. Some things to consider though:
If you track the limits globally then you may run into the issue of a bot DoSing your server and preventing legitimate users from being able to use your "Favourite" button.
If you track the requests based on their IP then someone could use a bot network (multiple IPs) to get around your blocking. Depending on your site and what your preference is, perhaps limit based on a subnet of the IP.
Install and use Memcache to store and track the requests, particularly if you are going to be tracking based on the IP. This should be faster than using session variables (someone might correct me on this).
If you have access to the source code of the website, you can rewrite some of the JavaScript code that actually performs the AJAX request. I.e. your pages can have a hidden counter field that is incremented every time a user clicks the button. You can also have a hidden time field on the page, in order to rate the frequency of clicks.
The idea is that you don't even have to send anything to the server at all; just check it on the client side inside the script. Of course, that will not help against bots addressing the server directly.
It really depends on the result of such spam. If you just want to avoid writing to your database, all these checks could end up taking more resources than actually writing to the database.
Does the end justify the means?
You also have to judge what's the probability of such a spam. Most bots are not very smart and will miserably fail when there's some logging involved.
Just my 2 cents, the other answers are perfectly valid to avoid spam.
Buy more powerful hosting to be able to serve the requests; don't limit them.
Eight requests per minute is a ridiculously low limit.
Anyway, if the requests are 'legal', you should find ways to serve them, not ways to limit them. And if they are not 'legal', then deny them without any 'time' limitations.
You can use a session variable holding the times of the last AJAX requests. Since you want to allow 8 requests, make it an array of size 8 and check the time differences. If the rate gets too high, it might not always be a bot (important), so give the user a chance with a captcha or something similar (a math problem, maybe?); a rough sketch follows below.
Once the captcha is validated, allow the next few posts, etc.
But do make sure that you are checking for that particular session and user.
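Something along these lines, as a rough sketch (the session keys and the JSON error shape are invented):
<?php
// Keep the last 8 request times in the session; if they all fall inside one
// minute, require a captcha before any further actions are processed.
session_start();

if (!isset($_SESSION['ajax_times'])) {
    $_SESSION['ajax_times'] = array();
}

$_SESSION['ajax_times'][] = time();
$_SESSION['ajax_times']   = array_slice($_SESSION['ajax_times'], -8);   // keep the last 8

if (count($_SESSION['ajax_times']) === 8
    && (time() - $_SESSION['ajax_times'][0]) < 60) {
    $_SESSION['needs_captcha'] = true;           // too fast: demand verification
}

if (!empty($_SESSION['needs_captcha'])) {
    echo json_encode(array('error' => 'captcha_required'));
    exit;
}

// ... handle the actual AJAX action here ...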
Kerin's answer is good, I just wanted to emphasize the captcha part.
Yes, you need to use a check in every function the views can interact with; it should also live in a global library so you can use it anywhere.
if (is_logged_in())
{
    // do your code here
}
where is_logged_in() is defined as follows:
function is_logged_in($activated = TRUE)
{
    return $this->ci->session->userdata('status') === ($activated ? STATUS_ACTIVATED : STATUS_NOT_ACTIVATED);
}
You should set the status session value when the user logs in successfully.
I am hitting a lot of different sites to get a list of information and I want to display this information as I get it. Right now I am using a Smarty Template and what I would like to do is:
Pseudo code:
{foreach $page}
$smarty_var = use AJAX to call a PHP function
Render out a new table row on the fly w/ the newly assigned var
<tr><td>{$smarty_var}</td></tr>
{/foreach}
I don't know much about AJAX; I used it a long time ago, and it was similar to this but not quite, since there was a user action involved. No, I don't have a JS framework in place. Am I way off here on how this should go? Basically I want to display a table row as data becomes available; each table row will be a request to get the data from another site.
Sure, I will tell you about what I am trying to do: http://bookscouter.com/prices.php?isbn=0132184745+&x=19&y=6
If you click on 'Click to view prices from all 43 links' at the bottom of that page you will see what I mean. I am using cURL to get all the pages I want a price from. Then for each page I want to get the price. So each page will fire off a function that runs some fun code like this:
function parseTBDOTpageNew($page, $isbn)
{
    // split the page on <table> tags, then drill down into the cells
    $first_cut  = preg_split('/<table[^>]*>/', $page);
    $second_cut = preg_split('/<td[^>]*>/', $first_cut[2]);

    if (strstr($second_cut[4], "not currently buying this book") !== false)
    {
        return "\$0.00";
    }

    // the price sits inside a <b> tag further down the cell list
    $third_cut = preg_split('/<b[^>]*>/', $second_cut[9]);
    $last_cut  = preg_split('/</', $third_cut[3]);
    return $last_cut[0];
}
This function is called from another function which puts the price returned from the function above, the name of the company, and a link in an array to be added to another bigger array that is sent to smarty. Instead of doing that, I just want to get the first array that is returned with individual information and add the values into a table row on the fly.
I will take your advice on jQuery. What I have started is an onload function that receives the $pages to be parsed, and I was just in the middle of writing: for each page, get the info and spit out some HTML with the info on the page.
Also, the function that calls the price-parsing function is in a PHP file, so I need the request to hit a function within a PHP file and NOT just call file.php?param1=foo; I need it to actually hit the function in the file. I have jQuery in place, and now I'm just trying to figure it out and get it to do what I need, ugh. I am searching; any help would be appreciated.
No I don't have a JS Framework in place
Fix that first. You don't want to juggle XMLHTTPRequests yourself. jQuery is SO's canonical JS library, and is pretty nifty.
Basically I want to display a table row as data comes available, each table row will be a request to get the data from another site.
How many rows will you be dealing with? Do they all have to be loaded asynchronously?
Let's tackle this in a braindead, straightforward way. Create a script that does nothing more than:
Take a site ID and fetch data from the corresponding URL
Render that data to some data transport format, either HTML or JSON.
Then it's a simple matter of making the page that the user gets, which will contain Javascript code that makes the ajax calls to the data fetcher, then either shoves the HTML in the page directly, or transforms the data into HTML and then shoves that into the page.
You'll note that at no point is Smarty really involved. ;)
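As a sketch, such a fetcher script could look something like this (fetch_price.php, the URL table and the parameters are placeholders; parseTBDOTpageNew() is the parsing function from the question):
<?php
// fetch_price.php: look up one site, scrape the price, hand back JSON
// for the page's JavaScript to turn into a table row.
header('Content-Type: application/json');

$siteId = isset($_GET['site']) ? (int) $_GET['site'] : 0;
$isbn   = isset($_GET['isbn']) ? $_GET['isbn'] : '';

$urls = array(
    1 => 'http://buyer-one.example.com/quote?isbn=',
    2 => 'http://buyer-two.example.com/price/',
);

if (!isset($urls[$siteId])) {
    echo json_encode(array('error' => 'unknown site'));
    exit;
}

$ch = curl_init($urls[$siteId] . urlencode($isbn));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$page = curl_exec($ch);
curl_close($ch);

echo json_encode(array(
    'site'  => $siteId,
    'price' => parseTBDOTpageNew($page, $isbn),
));
The page itself would then fire one $.ajax() call per site ID and append a <tr> with whatever comes back.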
This solution is highly impractical for anything more than a trivial number of sites to be polled asynchronously. If you need rows for dozens or hundreds of sites, that means each client is going to need to make dozens or hundreds of requests to your site for every single normal pageview. This is going to slaughter your server if more than one or two people load the page at once.
Can you tell us more about what you're doing, and what you're trying to accomplish? There are lots of ways to mitigate this problem, but they all depend on what you're doing.
Update for your question edit.
First, please consider using an actual HTML parser instead of regular expressions. The DOM is very powerful and you can target specific elements using XPath.
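For example, something like this could replace the preg_split() chain (the XPath expression is a guess at the markup, not something taken from the real page):
<?php
// Same extraction with DOM + XPath instead of regular expressions.
$doc = new DOMDocument();
libxml_use_internal_errors(true);          // scraped HTML is rarely valid
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// e.g. "the <b> inside the tenth cell of the second table"
$nodes = $xpath->query('//table[2]//td[10]//b');

$price = ($nodes->length > 0) ? trim($nodes->item(0)->textContent) : '$0.00';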
Instead of doing that, I just want to get the first array that is returned with individual information and add the values into a table row on the fly.
So, here's the ultimate problem. You want to do something asynchronously. PHP does not have a built-in generalized way to perform asynchronous tasks. There are a few ways to deal with this problem.
The first is as I've described above. Instead of doing any of the curl requests on page load, you farm the work out to the end user, and have the end user's browser make requests to your scraping script one by one, filling in the results.
The second is to use an asynchronous work queue, like Gearman. It has excellent PHP support via a PECL extension. You'd write one or more workers that can accept requests, and keep a pool of them running at all times. The larger the pool, the more things you can do at once. Once all of the data has returned, you can throw the complete set of data at your template engine, and call it good.
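A very rough sketch of what that could look like with the pecl/gearman classes (the function name, payload shape and the $siteIds/$isbn variables are invented; check the extension's documentation for the exact API):
<?php
// worker.php: run several copies of this to get a pool of workers.
$worker = new GearmanWorker();
$worker->addServer();                              // defaults to 127.0.0.1:4730
$worker->addFunction('fetch_price', function ($job) {
    $params = json_decode($job->workload(), true);
    // ... curl the site, parse the price ...
    return json_encode(array('site' => $params['site'], 'price' => '...'));
});
while ($worker->work());

// client side (a separate script): queue one task per site, collect results as they complete.
$client = new GearmanClient();
$client->addServer();
$client->setCompleteCallback(function ($task) {
    $result = json_decode($task->data(), true);
    // stash $result until all tasks are done, then hand the whole set to Smarty
});
foreach ($siteIds as $id) {
    $client->addTask('fetch_price', json_encode(array('site' => $id, 'isbn' => $isbn)));
}
$client->runTasks();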
You can even combine the two, having the user make only one or two or three extra requests via ajax to fetch part of the returned data. Heck, you can even kick off the jobs in the background and return the page immediately, then request the results of the background jobs later via ajax.
Regardless of which way you handle it, you have a giant, huge problem. You're scraping someone's site. You may well be scraping someone's site very often. Not everyone is OK with this. You should seriously consider caching results aggressively, or even checking with each of the vendors to see if they have an API or data export that you can query against instead.