A number of my pages are produced from results pulled from MySQL using $_GET. This means the URLs end like this: /park.php?park_id=1. Is this a security issue, and would it be better to hide the query string from the URL? If so, how do I go about doing it?
Also, I have read somewhere that Google doesn't index URLs with a ?. This would be a problem, as these are the main pages of my site. Any truth in this?
Thanks
It's only a security concern if this is sensitive information. For example, you send a user to this URL:
/park.php?park_id=1
Now the user knows that the park currently being viewed has a system identifier of "1" in the database. What happens if the user then manually requests this?:
/park.php?park_id=2
Have they compromised your security? If they're not allowed to view park ID 2 then this request should fail appropriately. But is it a problem if they happen to know that there's an ID of 1 or 2?
In either case, all the user is doing is making a request. The server-side code is responsible for appropriately handling that request. If the user is not permitted to view that data, deny the request. Don't try to stop the user from making the request, because they can always find a way. (They can just manually type it in. Even without ever having visited your site in the first place.) The security takes place in responding to the request, not in making it.
There is some data they're not allowed to know. But an ID probably isn't that data. (Or at least shouldn't be, because numeric IDs are very easy to guess.)
No, there is absolutely no truth to it.
ANY data that comes from a client is subject to spoofing. No matter if it's in a query string, or a POST form or URL. It's as simple as that...
As far as "Google doesn't index URLs with a ?", whoever told you that has no clue what they are talking about. There are "SEO" best practices, but they have nothing to do with "Google doesn't index". It's MUCH more fine-grained than that. And yes, Google will index you just fine.
@David does show one potential issue with using an identifier in a URL. In fact, this has a very specific name: A4: Insecure Direct Object Reference.
Note that it's not that using the ID is bad. It's that you need to authorize the user for the URL. So doing permissions solely by the links you show the user is BAD. But if you also authorize them when hitting the URL, you should be fine.
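As a rough sketch of "authorize them when hitting the URL": the check below runs on every request, no matter which links the user was shown. The permission table, the function names and the status codes are made up for illustration; a real site would look the permissions up in its database.

```php
<?php
// Hypothetical permission table: user id => park ids that user may view.
// In a real application this would come from the database.
const PARK_PERMISSIONS = [
    10 => [1],
    20 => [1, 2],
];

/** Decide whether $userId may view $parkId. */
function canViewPark(int $userId, int $parkId): bool
{
    return in_array($parkId, PARK_PERMISSIONS[$userId] ?? [], true);
}

/**
 * Return the HTTP status code the page should answer with.
 * $rawParkId is the untrusted value taken from the query string.
 */
function statusForParkRequest(int $userId, $rawParkId): int
{
    $parkId = filter_var($rawParkId, FILTER_VALIDATE_INT);
    if ($parkId === false) {
        return 400; // park_id was not an integer at all
    }
    return canViewPark($userId, $parkId) ? 200 : 403;
}
```

In park.php this could be called as statusForParkRequest($currentUserId, $_GET['park_id'] ?? null), where $currentUserId is whatever your login system provides. User 10 changing the URL to park_id=2 then simply gets a 403.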
So no, in short, you're fine. You can go with "pretty urls", but don't feel that you have to because of anything you posted here...
I don't understand how using a 'challenge token' would add any sort of prevention: what value should be compared with what?
From OWASP:
In general, developers need only
generate this token once for the
current session. After initial
generation of this token, the value is
stored in the session and is utilized
for each subsequent request until the
session expires.
If I understand the process correctly, this is what happens.
I log in at http://example.com and a session/cookie is created containing this random token. Then, every form includes a hidden input also containing this random value from the session which is compared with the session/cookie upon form submission.
But what does that accomplish? Aren't you just taking session data, putting it in the page, and then comparing it with the exact same session data? Seems like circular reasoning. These articles keep talking about following the "same-origin policy" but that makes no sense, because all CSRF attacks ARE of the same origin as the user, just tricking the user into doing actions he/she didn't intend.
Is there any alternative other than appending the token to every single URL as a query string? Seems very ugly and impractical, and makes bookmarking harder for the user.
The attacker has no way to get the token. Therefore the requests won't take any effect.
I recommend this post from Gnucitizen. It has a pretty decent CSRF explanation: http://www.gnucitizen.org/blog/csrf-demystified/
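To make "what is compared with what" concrete, here is a minimal sketch, not a complete implementation. The token lives in the session on the server and is also printed into the form. An attacker's page can make the victim's browser send the session cookie, but it cannot read your page to learn the token, so the value it submits will not match. The function names and the csrf_token field name are my own; the $session array stands in for $_SESSION so the sketch runs without a web server.

```php
<?php
/** Return the session's CSRF token, creating it on first use. */
function csrfToken(array &$session): string
{
    if (empty($session['csrf_token'])) {
        // 32 random bytes, hex encoded: 64 characters, not practical to guess
        $session['csrf_token'] = bin2hex(random_bytes(32));
    }
    return $session['csrf_token'];
}

/** Hidden input to embed in each state-changing form. */
function csrfField(array &$session): string
{
    return '<input type="hidden" name="csrf_token" value="'
        . htmlspecialchars(csrfToken($session), ENT_QUOTES) . '">';
}

/** Compare the token submitted with the form against the one in the session. */
function csrfTokenIsValid(array $session, $submitted): bool
{
    return is_string($submitted)
        && !empty($session['csrf_token'])
        && hash_equals($session['csrf_token'], $submitted); // constant-time compare
}
```

On a real page you would pass $_SESSION (after session_start()) and check csrfTokenIsValid($_SESSION, $_POST['csrf_token'] ?? null) before performing the action.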
CSRF Explained with an analogy - Example:
You open the front door of your house with a key.
Before you go inside, you speak to your neighbour
While you are having this conversation, someone walks in, while the door is still unlocked.
They go inside, pretending to be you!
Nobody inside your house notices anything different - your wife is like, ‘oh crud*, he’s home’.
The impersonator helps himself to all of your money, and perhaps plays some Xbox on the way out....
Summary
CSRF basically relies on the fact that you opened the door to your house and then left it open, allowing someone else to simply walk in and pretend to be you.
What is the way to solve this problem?
When you first open the door to your house, you are given a paper with a long and very random number written on it by your door man:
"ASDFLJWERLI2343234"
Now, if you wanna get into your own house, you have to present that piece of paper to the door man to get in.
So now when the impersonator tries to get into your house, the door man asks:
"What is the random number written on the paper?"
If the impersonator doesn't have the correct number, then he won't get in. Either that or he must guess the random number correctly - which is a very difficult task. What's worse is that the random number is valid for only 20 minutes (for example). So now the impersonator must guess correctly, and not only that, he has only 20 minutes to get the right answer. That's way too much effort! So he gives up.
Granted, the analogy is a little strained, but I hope it is helpful to you.
*crud = (Create, Read, Update, Delete)
You need to keep researching this topic for yourself, but I guess that's why you are posting to SO :). CSRF is a very serious and widespread vulnerability type that all web app developers should be aware of.
First of all, there is more than one same origin policy. But the most important part is that a script being hosted from http://whatever.com cannot READ data from http://victom.com, but it can SEND data via POST and GET. If the request only contains information that is known to the attacker, then the attacker can forge a request on the victim's browser and send it anywhere. Here are 3 XSRF exploits that are building requests that do not contain a random token.
If the site contains a random token then you have to use XSS to bypass the protection that the Same Origin Policy provides. Using XSS you can force javascript to "originate" from another domain, then it can use XmlHttpRequest to read the token and forge the request. Here is an exploit I wrote that does just that.
Is there any alternative other than
appending the token to every single
URL as a query string? Seems very ugly
and impractical, and makes bookmarking
harder for the user.
There is no reason to append the token to every URL on your site, as long as you ensure that all GET requests on your site are read-only. If you are using a GET request to modify data on the server, you'd have to protect it using a CSRF token.
The funny part with CSRF is that while an attacker can make any http request to your site, he cannot read back the response.
If you have GET URLs without a random token, the attacker will be able to make a request, but he won't be able to read back the response. If that URL changed some state on the server, the attacker's job is done. But if it just generated some HTML, the attacker has gained nothing and you have lost nothing.
I have a dynamic page where it should take data from a db. So the approach I thought of was to create the dynamic page with this php code at the top
<?php $pid = $_GET["pid"]; ?>
Then later in the file it connects to the database and shows the correct content according to the page ID ($pid). So on the home page, I want to add the links to display the correct pages. For example, the data for the "Advertise" page is saved in the database in the row where the pid is 100. So I added the link to the "Advertise" page on the homepage like this:
<li><a href="page.php?pid=100">Advertise</a></li>
So my question is, anyone can see the value that's sent in the link and play around by changing the pid. Is there an easy way to mask this value, or a safer method to send the value to page.php?
The general concept you're looking for is Access Control. You have a resource (in this case, a page and its content), and you want to control who can access it (users, groups, etc), and probably how they can access it as well (for example, read-only, read-and-write, write-but-only-on-the-first-Monday-of-the-month, etc).
Defining the problem
The first thing you need to decide is which resources you need access control for, and which you don't. It sounds to me like some of these pages are supposed to be "public access" (thus they are listed on some kind of index page), while others are supposed to be restricted in some way.
Secondly, you need to come up with an access policy - this can be informally described for a small project, but larger projects usually have some structured system for defining this policy. For each resource, your policy should answer questions like:
Do you have some kind of user account system, and you only want account holders (or certain types of account holders) to access it? Or, are you going to send links to email addresses, and want to limit access to just those people who have the link?
What kind of access should each user have? Read-only? Should they be able to change the content as well (if your system supports that)?
Are there any other types of restrictions on a user's access? Group membership? Do they need to pay before they get access? Are they only allowed access at specific times?
Implementing your policy
Once you've answered these questions, you can start to think about implementation. As it stands, I think you are mixing up access control with identification. Your pid identifies a page (page 100, for example), but it doesn't do anything to limit access. If your pages are identified with a predictable numbering scheme, anyone can easily modify the number in the request (this is true for both GET requests, such as when you type a URL into an address bar, and POST requests, such as when you submit a form).
To securely control access there needs to be a key, usually a string that is very difficult to guess, which is required before access is granted. In very simple systems, it is perfectly fine for this key to be directly inserted in the URL, provided you can still keep the key secret from unauthorized users. This is exactly how Google Drive's "get a link to share" feature works. More complex systems will use either a server-side session or an API key to control access - but in the end, it's still a secret, difficult-to-guess string that the client (user or user's browser) sends to the server along with their request for the resource.
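A minimal sketch of the "secret key in the URL" idea for a very simple system. The $pageKeys array is a stand-in for wherever you would store the key (most likely a database column), and the key parameter name is my own choice; treat all of it as an assumption rather than a finished design.

```php
<?php
/** Create a hard-to-guess key: 16 random bytes, hex encoded (32 characters). */
function newShareKey(): string
{
    return bin2hex(random_bytes(16));
}

/**
 * $pageKeys is a hypothetical store mapping page id => secret key.
 * Access is granted only if the supplied key matches the stored one.
 */
function mayAccessPage(array $pageKeys, int $pid, $suppliedKey): bool
{
    return isset($pageKeys[$pid])
        && is_string($suppliedKey)
        && hash_equals($pageKeys[$pid], $suppliedKey); // constant-time compare
}

// Issue a key for page 100 and build the link you would hand to authorized users.
$pageKeys = [100 => newShareKey()];
$link = 'page.php?' . http_build_query(['pid' => 100, 'key' => $pageKeys[100]]);
```

The pid still only identifies the page; it is the key that controls access. Changing the pid in the URL without knowing the matching key gets the visitor nowhere.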
You can think of identification like your street address, which uniquely identifies your house but is not, and is not meant to be, secret. Access control is the key to your house. Only you and the people you've given a key to can actually get inside your house. If your lock is high quality, it will be difficult to pick the lock.
Bringing it together
Writing code is easy, designing software is hard. Before you can determine the solution best for you, you need to think ahead about the ramifications of what you decide. For example, do you anticipate needing to "change the keys" to these pages in the future? If so, you'll have to give your authorized users (the ones that are still supposed to have access) the new key when that happens. A user-account system decouples page access control from page identification, so you can remove one user's access without affecting everyone else.
On the other hand, you also need to think about the nature of your audience. Maybe your users don't want to have to make accounts? This is something that is going to be very specific to your audience.
I get the sense that you're still fairly new to web development, and that you're learning on your own. The hardest part of learning on one's own is "learning what to learn" - Stack Overflow is too specific, and textbooks are too general. So, I'm going to leave you with a short glossary of concepts that seem most relevant to your current problem:
Access control. This is the name of the general problem that you're trying to solve with this question.
Secrecy vs obscurity. When it comes to security, secrecy == good, obscurity == bad.
Web content management system. You've probably heard of Wordpress, but there are tons of others. I'm not sure what your system is supposed to do, but a content management system might solve these problems for you.
Reinventing the wheel. Good in the classroom, bad in the real world.
How does HTTP work. Short but to the point. A lot of questions I see on SO stem from a fundamental misunderstanding of how websites actually work. A website isn't so much a single piece of software, as a conversation between two players - the client (e.g. the user and their browser), and the server. The client can only say something to the server via a request, and the server can only say something to the client via a response. Usually, this conversation consists of the client asking for some resource (an HTML web page, a Javascript file, etc), to which the server responds. The server can either say "here you go, I got it for you", or respond with some kind of error ("I can't find it", "you're not allowed to see that", "I'm too busy right now", "I'm not working properly right now", etc).
PHP The Right Way. Something I wish I had found when I first started learning web development and PHP, not seven years later ;-)
It is always safer to $_POST when you can, but if you have to use something in the query string, it is safer to use a hash or GUID rather than something that is so obviously an auto-incremental value. It makes it harder to guess what the IDs would be. There are other ways values can be passed between pages ($_SESSION, cookies, etc.), but it is really about what you want to achieve.
Sending it to php is not an issue, should be fine.
What php does with it afterwards... that's how you secure.
First thing I'd do is make sure it's an integer.
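// $_GET values always arrive as strings, so is_int() would never be true here; validate with filter_var() instead.
$pid = filter_var($_GET['pid'] ?? null, FILTER_VALIDATE_INT, ['options' => ['default' => 1]]); // 1 is the default pid, change this to whatever you want.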
Now that you know you're dealing with an integer, use $pid after that and you should be good to go.
I have used URL Shortener services, such as goo.gl or bit.ly, to shorten long URLs in my applications using their respective APIs. These APIs are very convenient, unfortunately I have noticed that the long URL gets hit when they shorten it. Let me explain a bit the issue I have. Let's say for instance that I want users to validate something (such as an email address, or a confirmation) and propose to them in my application a link for them to visit in order to validate something. I take this long URL, and use the API to shorten it. The target link (a PHP script for example) is getting hit when I call the shorten API, which makes the validation process useless.
One solution would be to make an intermediate button on the target page which the user has to click to confirm, but that solution makes another step in the validation process, which I would like to simplify.
I would like to know if anyone has already encountered this problem or if anyone has a clue how to solve it.
Thanks for any help.
I can't speak to Google but at Bitly we crawl a portion of the URLs shortened via our service to support various product features (spam checking, title fetching, etc) which is the cause of the behavior you are seeing.
In this type of situation we make two recommendations:
Use robots.txt to mark relevant paths as "disallowed". This is a light form of protection as there's nothing forcing clients to respect robots.txt but well behaved bots like BitlyBot or GoogleBot will respect your robots.txt file.
As mentioned by dwhite.me in a comment and as you acknowledged in your post, it is usually best to not do any state changing actions in response to GET requests. As always there's a judgement call on the risks associated vs the added complexity of a safer approach.
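The second recommendation can be sketched roughly like this: a GET to the validation link only shows a confirmation form, and the actual state change happens on POST, so a crawler fetching the URL changes nothing. The function name and the action labels are invented for the example.

```php
<?php
/**
 * Decide what a confirmation endpoint should do for a given HTTP method.
 * Returns 'show_form' for GET/HEAD (safe to crawl), 'confirm' for POST,
 * and 'reject' for anything else.
 */
function confirmationAction(string $method): string
{
    switch (strtoupper($method)) {
        case 'GET':
        case 'HEAD':
            return 'show_form'; // render a page with a "Confirm" button, change nothing
        case 'POST':
            return 'confirm';   // the user clicked the button: do the state change
        default:
            return 'reject';
    }
}

// In the target script the method comes from the server environment.
$action = confirmationAction($_SERVER['REQUEST_METHOD'] ?? 'GET');
```

This does cost the extra click mentioned in the question; whether that is acceptable is the judgement call described above.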
I am using a URL to query some posts by their ID.
http://domain.com/page-name/?id=123
Visitors click the URL and will open the page and get the right post.
However, if anybody wants to, they can type this URL into the browser and get the post; they can even get a lot of different posts if they know other IDs. How can I reject this kind of query?
By the way, my site provides embed code for posts. So, I need to enable access from other websites.
The easiest way probably would be to check the HTTP Referer via $_SERVER['HTTP_REFERER'] and make sure the visitor clicked the link on one of your pages. This will, however, prevent any kind of bookmarking as well.
Another solution would be to use something else than IDs as URL parameter. Those would be hard to guess. You could use an MD5-Hash of the id + date or something instead of just the ID. (Of course you would have to store the hash in the database!)
On some pages you can see another approach. It is mainly used for search engine optimization, but can work for you as well. Generate a string from the title of the post (something like "news_new_blog_software") and store that in the database. Then use mod_rewrite to redirect all calls of http://domain.tld/post/* to a PHP file and over there check if the string after /post/ is in your database. This might look a little nicer than MD5 hashes, but you would have to ensure URL strings are not used several times.
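A small sketch of generating such a URL string from a title, including the "not used several times" check. The function names are mine, the list of existing slugs would really come from a database query, and this version only handles ASCII letters and digits.

```php
<?php
/** Turn a post title into a URL string such as "news_new_blog_software". */
function slugFromTitle(string $title): string
{
    $slug = strtolower($title);
    // Collapse every run of characters other than a-z and 0-9 into one underscore
    $slug = preg_replace('/[^a-z0-9]+/', '_', $slug);
    return trim($slug, '_');
}

/** Append _2, _3, ... until the slug is not already taken. */
function uniqueSlug(string $title, array $existingSlugs): string
{
    $base = slugFromTitle($title);
    $slug = $base;
    $n = 2;
    while (in_array($slug, $existingSlugs, true)) {
        $slug = $base . '_' . $n++;
    }
    return $slug;
}
```

Keep in mind that a slug built from the title is readable, not secret, so like the MD5 idea it only makes URLs harder to enumerate; it does not restrict access.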
If you want to make it really secure there is basically no other way than using some kind of login to check the access privileges.
However, if anybody wants to, they can type this URL into the browser and get the post; they can even get a lot of different posts if they know other IDs.
Exactly. That is the purpose of the World Wide Web.
And there is absolutely no reason to reject direct queries.
In fact, from the technological point of view, every request to your site is a "direct" one.
You are probably trying to solve some other problem (most likely imaginary one). If you care to tell it to us, you will get the right solution.
You can generate some kind of secret key and append it to the link URL, something like
http://domain.com/page-name/?id=123&key=1234567890
Some specific data required to generate this key is stored in cookie.
You can use md5 hash of random value + timestamp + page id, saving that random value to cookie. Every time you get a request, you check if key is present in request parameters, if user has cookie, then calculate hash and compare it with the one in the request.
You can pass the id in a hidden field and use the POST form method.
You need authorisation, not this. This would stop people clicking through to your site from search engines or other websites.
If you don't want to implement authorisation/login, then why not try implementing the First Click Free: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=74536
I have an HTML menu with links like <a href="input&db=<some database>">. There are multiple menu items and multiple databases, so I am using GET as my form method and using it in my menu item links.
However, that shows up in the browser's address bar (as /input&db=mydatabase) and might lead the user to start guessing at database names.
How can I have a list of links to the same page, in which only the database varies, using $_POST ?
EDIT: oops, my bad Shoulda said server-side only, so no JS
POST values will be just as obvious to anyone who would be savvy enough to do anything with this information. Unless you're building something like phpMyAdmin, you should never pass such internal information to the client side to begin with. And if you are, where's the harm? You do have proper authentication in place, don't you?
I think the only way to send request via post using links is to use JavaScript. But sending it via post is not secure at all; anyone can install FireBug to see the request.
Instead, I'll suggest a change to your design. Databases are usually at the bottom tier in an application hierarchy, and coupling page details with database sounds unnecessary. Maybe you should try to encapsulate pages so that they don't need to know which database they are reading from?
Granted, I have no idea of the scope of your application (you may be doing something like phpMyAdmin). Then it may be unavoidable, and I will just suggest the usual combination of validating and sanitizing all user input and checking users' rights.
Or you can just encrypt your database names. Still I would prefer a change to design.
Use the onclick event of the anchors to submit a hidden POST form, or to perform AJAX POST actions.
No. There are a few narrow and dangerous solutions you can apply:
Use an iframe : everything will work as before, but the actual address will not appear in the browser address bar.
Use AJAX to fetch data.
Replace the link with a form-submitting button or javascript: link.
These all solve the "database name appears in address bar" issue, however:
Anyone with even basic technical skills and appropriate tools (chrome, firebug) can determine the database name anyway by looking at the requests being sent out.
Not using GET can mess up the browser's back and refresh buttons, and prevent deep linking.
My suggestion would be to keep using GET as you currently are, but add a secret token to the URL (such as HMAC(db_name,secret_key)) that cannot be guessed by the user but can be easily checked for validity by the server. This way, unless you give the user a link to the database (with both database name and secret token), all the guessing in the world will not let them access it.
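A minimal sketch of that token, using PHP's hash_hmac() with SHA-256. The secret value, the token parameter name and the input?db=... link shape are placeholders of mine; the secret must stay on the server and should be long and random.

```php
<?php
// Placeholder secret: replace with a long random value kept only on the server.
const SECRET_KEY = 'replace-with-a-long-random-server-side-secret';

/** HMAC(db_name, secret_key) as a 64-character hex string. */
function dbToken(string $dbName): string
{
    return hash_hmac('sha256', $dbName, SECRET_KEY);
}

/** Build a menu link carrying both the database name and its token. */
function dbLink(string $dbName): string
{
    return 'input?' . http_build_query(['db' => $dbName, 'token' => dbToken($dbName)]);
}

/** Server-side check: recompute the token and compare in constant time. */
function dbRequestIsValid($dbName, $token): bool
{
    return is_string($dbName)
        && is_string($token)
        && hash_equals(dbToken($dbName), $token);
}
```

A user who edits db=mydatabase to some other name would also need the matching token, which they cannot compute without the secret.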
Neither GET or POST will hide your database name.
Even if you use POST, view source will reveal the HTML.
In the first place, you should not expose your database name.
Or replace it with some fuzzy mapping
such as
input&db=A
input&db=B
Internally, do string matching and convert A to the actual database name.
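Something like this, as a sketch. The real database names here are invented; the point is that only the aliases ever reach the browser, and anything not in the map is refused.

```php
<?php
// Hypothetical alias => real database name mapping, kept server-side only.
const DB_ALIASES = [
    'A' => 'customers_db',
    'B' => 'orders_db',
];

/** Return the real database name for an alias, or null if the alias is unknown. */
function resolveDatabase($alias): ?string
{
    return is_string($alias) ? (DB_ALIASES[$alias] ?? null) : null;
}
```

Because the map is a whitelist, a guessed value such as db=mydatabase resolves to null and the request can be rejected before any database is touched.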