PHP Security Advice on $_GET (combining clean URLs with query string) - php

I am using "clean" URLs like this:
http://localhost/controller/action/param
I access the parameters with a custom function, like this: my_get(1), my_get(2), etc.
However, there are times when I think I need to combine them with a query string.
For example, if I need parameter values containing paths with several slashes, like:
http://localhost/controller/action/param?mypath=foo/bar/qux.jpg
I do that because it would be a little harder to implement with clean URLs alone.
Now my question is: in combining clean URLs with a query string, I only intend to allow this character class:
[.&=a-z0-9\/_-]
I was wondering whether there would be any security issue with it. Should I disallow certain characters?

Don't worry so much about the string formatting, but please validate the path that is passed... In your example you said that mypath's value will be deleted with unlink(); well, if you don't validate it, then in the worst case an attacker could delete any file on the server's filesystem... ;)
So don't bother validating the string with a regex; validate the content of the string and make it safe for your environment... :)
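For example, a minimal sketch of that kind of validation, assuming the deletable files live under a fixed uploads/ directory (the directory name is an assumption; mypath is the parameter from the question):

$baseDir = realpath(__DIR__ . '/uploads');               // assumed base directory
$mypath  = isset($_GET['mypath']) ? $_GET['mypath'] : '';

// Resolve the requested path and make sure it stays inside $baseDir.
$fullPath = realpath($baseDir . '/' . $mypath);

if ($fullPath !== false && strpos($fullPath, $baseDir . DIRECTORY_SEPARATOR) === 0) {
    unlink($fullPath);   // the resolved path cannot escape the uploads/ directory
} else {
    // reject: the file does not exist or points outside the allowed directory
}

That way the character class becomes a secondary concern; the real protection is that the resolved path can never leave the directory you chose.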

Related

Do I need to sanitize input to file_exists?

I can't seem to find a reference. I am assuming the PHP function file_exists uses system calls on Linux and that these are safe for any string that does not contain a \0 character, but I would like to be sure.
Does anyone have (preferably non-anecdotal) information regarding this? Is it vulnerable to injection if I don't check the strings first?
I guess you need to, because the user may enter something like:
../../../somewhere_else/some_file and access a file that they are not allowed to access.
I suggest that you generate the absolute path of the file independently in your PHP code and only take the file name from the user, via basename(),
or exclude any input containing ../ like:
$escaped_input = str_replace("../", "", $input);
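A minimal sketch of the first suggestion (the uploads path and the file parameter name are just assumptions for illustration):

$dir  = '/var/www/site/uploads/';                             // build the directory yourself
$name = basename(isset($_GET['file']) ? $_GET['file'] : '');  // keep only the bare file name

if ($name !== '' && file_exists($dir . $name)) {
    // safe to work with $dir . $name here
}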
It depends on what you're trying to protect against.
file_exists doesn't do any writing to disk, which means that the worst that can happen is that someone gains some information about your file system or the existence of files that you have.
In practice however, if you're doing something later on with the same file that was previously checked with file_exists, such as including it, you may wish to perform more stringent checks.
I'm assuming that you may be passing arbitrary values, possibly sourced from user input, into this function.
If that is the case, it somewhat depends on why you actually need to use file_exists in the first place. In general, for any filesystem function that the user can pass values directly into, I'd try to filter the string as much as possible. This is really just being pedantic and staying on the safe side, and may be unnecessary in practice.
So, for example, if you only ever need to check the existence of a file in a single directory, you should probably strip out directory delimiters of all sorts.
From personal experience, I've only ever passed user input into a file_exists call when mapping to a controller file, in which case I'd just strip out everything except alphanumeric characters and underscores.
UPDATE: reading your recently added comments: no, there aren't special characters to worry about, as this isn't executed in a shell. Even \0 should be fine, at least on newer PHP versions (I believe older ones would cut the string off at the \0 when it was sent to the underlying filesystem calls).
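As a rough sketch of that controller-mapping case (the controllers/ directory and the c parameter are assumptions, not anything from the question):

// Keep only [A-Za-z0-9_] from the user-supplied controller name.
$raw        = isset($_GET['c']) ? $_GET['c'] : '';
$controller = preg_replace('/[^A-Za-z0-9_]/', '', $raw);
$file       = __DIR__ . '/controllers/' . $controller . '.php';

if ($controller !== '' && file_exists($file)) {
    require $file;
} else {
    // fall back to a default controller or send a 404
}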

Is it safe to use (strip_tags, stripslashes, trim) to clean a variable that holds URLs

It's quite a pleasure to be posting my first question here :-)
I'm running a URL Shortening / Redirecting service, written in PHP.
I aim to store and handle valid URL data as much as possible within my service.
I noticed that sometimes invalid URL data is being handed over to the database, containing invalid characters (like spaces at the beginning or end of the URL).
I decided to make my URL-check mechanism trim, stripslashes and strip_tags the values before storing them.
As far as I can tell, these functions will not remove valid characters that any URL may have.
Kindly just correct me or advise me if I'm going in the wrong direction.
Regards..
If you're already trimming the incoming variable, as well as filtering it with the other built-in PHP functions, and STILL running into issues, try changing the character set/collation of your table to UTF-8 and see if that helps you get rid of the special characters you mention. (Could you paste a few examples to let us know?)
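As a sketch of that cleanup-plus-validation step (the clean_url() helper is made up for illustration, not an existing function):

// Hypothetical helper: tidy up a submitted URL and validate it before storing.
function clean_url($raw) {
    $url = trim(strip_tags(stripslashes($raw)));
    return filter_var($url, FILTER_VALIDATE_URL) !== false ? $url : null;
}

$url = clean_url(isset($_POST['url']) ? $_POST['url'] : '');
if ($url === null) {
    // reject: still not a valid URL after the cleanup
}

Note that trim/stripslashes/strip_tags only tidy the string; filter_var is what actually decides whether the result is a URL worth storing.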

How to escape input in PHP?

I have a PHP page that accepts input from a form post, but instead of directing that input to a database it is being used to retrieve a file from the file system. What is a good method for escaping a string destined for the file system rather than a database? Is mysql_real_escape_string() appropriate?
If you're using user-provided input to specify a filename or directory, you'll have to make sure that the provided filename/path isn't trying to break "out" of your site's playground.
e.g. having something like
readfile($_GET['filepath']);
will send out ANYTHING on your server that the attacker knows the path for. Even something like
readfile('/path/to/your/site/download/' . $_GET['filepath']);
accomplishes the same, if the user specifies enough '../../../' to get to whatever file they want.
mysql_real_escape_string() is NOT appropriate for this, as you're not doing a database operation. Use appropriate tools for appropriate jobs. In a goofy way, m_r_e_s() is a banana, and you need a giraffe. Something like
readfile('/path/to/your/site/download/' . basename($_GET['filepath']));
would be relatively safe, as basename() will extract only the filename portion of the user-provided path, so even if they pass in ../../../../../etc/passwd, basename will return only passwd.
You only ever need to escape characters that would otherwise be interpreted by your target system. For databases you usually make sure to escape quotes, so you use mysql_real_escape_string or similar. If your target is HTML, you usually use htmlspecialchars to make sure you get rid of HTML special characters (namely <, > and &). If your target is CSV, you basically only need to make sure line breaks and the CSV separator are escaped.
So depending on your target you can either reuse an existing escape function, define your own, or even go without one. If all you do is dump the input into a single file, then there is not much you need to take care of, as long as you specify the filename and that file is never used (or interpreted) by anything other than your application.
So think of what kind of special characters your target format requires for it to work, and simply escape those. You can usually ignore the rest.
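For instance, a quick illustration of the escape-for-the-target idea (the comment field and the log.csv file are placeholders):

$input = isset($_POST['comment']) ? $_POST['comment'] : '';   // arbitrary user text

// Target: HTML page -> escape the HTML special characters.
echo htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Target: CSV file -> let fputcsv() take care of separators and line breaks.
$fh = fopen('log.csv', 'a');
fputcsv($fh, array(date('c'), $input));
fclose($fh);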
edit:
If you want to use the input as the file path or file name, you can simply decide for yourself how lenient you want to be and which characters you want to support. A simple method would be to replace everything except Latin characters and numbers (and maybe some special characters like _ and -) with something else. For example:
preg_replace( '/[^A-Za-z0-9_-]/', '_', $text );
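Just to show what that replacement does to a typical name:

echo preg_replace( '/[^A-Za-z0-9_-]/', '_', 'my photo (1).jpg' );   // prints: my_photo__1__jpg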

Detect random strings

I am building a script to detect whether filenames make sense or whether they are completely random, using PHP. I'm using regular expressions.
A valid filename = sample-image-25.jpg
A random filename = 46347sdga467234626.jpg
I want to check whether the filename makes sense or not; if not, I want to alert the user to fix the filename before continuing.
Any help?
I'm not really sure that's possible because I'm not sure it's possible to define "random" in a way the computer will understand sufficiently well.
"umiarkowany" looks random, but it's a perfectly valid word I pulled off the Polish Wikipedia page for South Korea.
My advice is to think more deeply about why this design detail is important, and look for a more feasible solution to the underlying problem.
That would need way too much work. You would have to build a huge array of the most used words (like a dictionary) and check whether most of the words inside the file name (maybe separated by - or _) are in there, and it would still have huge bugs.
Basically you will need:
explode()
implode()
array_search() or in_array()
Take the string and look for a glue character like "_" or "-" with preg_match(); if there are any, explode the string into an array and compare that array against the dictionary array (a rough sketch follows below).
Or, since almost every word alternates vowels and consonants, you could write a huge script that checks whether most of the words inside the file name can be considered "not randomly" generated. But the problem will be the same: why do you need that? Look for a more flexible solution.
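A rough sketch of that dictionary idea (the word list here is obviously a toy stand-in for a real dictionary file):

// Toy word list; a real implementation would load a large dictionary.
$dictionary = array('sample', 'image', 'photo', 'holiday', 'report');

$name  = pathinfo('sample-image-25.jpg', PATHINFO_FILENAME);   // "sample-image-25"
$parts = preg_split('/[-_]+/', strtolower($name));

$known = 0;
foreach ($parts as $part) {
    if (in_array($part, $dictionary) || ctype_digit($part)) {
        $known++;
    }
}

// Call the name "random" if fewer than half of its parts are recognised.
$looksRandom = count($parts) === 0 || $known / count($parts) < 0.5;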
Notice:
Consider that even a simple-and-friendly-file.png could be the result of a string generator.
Good luck with that.

Regex to validate URL - Not checking for HTTP?

I know there are tons of questions on here about validating a web address with something like this:
/^[a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i
The only problem is, not everybody uses the http:// or whatever comes before, so I wanted to find a way to use preg_match() that treats http:// not as a must-have but more as a doesn't-really-matter. I modified it to this, but then it rejects the URL if it does have http:// in it:
/^[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+$/i
I was hoping to validate it on these conditions:
If it has http:// or www then just ignore this
If the .extension is longer than 9 then reject
If it contains no full stops
Anybody got an idea, thanks :)
Can't you just use the built in filter_var function?
filter_var('example.com', FILTER_VALIDATE_URL);
Not sure about the nine chars extension limit, but I guess you could easily check this in an additional step.
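Something along these lines, perhaps (note that FILTER_VALIDATE_URL expects a scheme, so a bare example.com needs http:// prepended first, as the last answer below explains; the 9-character limit comes from the question):

$url = 'http://example.com/page';

$validUrl = filter_var($url, FILTER_VALIDATE_URL) !== false;

// Additional step: the part after the host's last dot must be 9 characters or fewer.
$host = parse_url($url, PHP_URL_HOST);
$dot  = is_string($host) ? strrpos($host, '.') : false;
$ext  = $dot !== false ? substr($host, $dot + 1) : '';

$ok = $validUrl && $ext !== '' && strlen($ext) <= 9;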
Why not have a stage before the regexp to simply remove the http:// if present? The same would apply to the www. That may make your life a bit easier.
/^(http\://|www\.)/
/^.+?\.\S{0,9}\./
/\./
Those should work for your bullet points?
not everybody uses the http://
They should. Without a scheme it simply isn't a URL, and omitting it can cause weird problems. For example:
www.example.com:8080/file.txt
This is a valid URL with the non-existent scheme www.example.com:.
If you are sure that the normal scheme should be http:, you could try automatically prepending http:// to ‘fix up’ any URL that doesn't begin with https?:, before validation. But you shouldn't allow/keep/return schemeless URLs over the longer term.
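A sketch of that fix-up step (just prepending a default scheme before handing the value to whatever validator you use):

$input = isset($_POST['url']) ? trim($_POST['url']) : '';

// If the value does not already start with http:// or https://, assume http://.
if (!preg_match('#^https?://#i', $input)) {
    $input = 'http://' . $input;
}

if (filter_var($input, FILTER_VALIDATE_URL) === false) {
    // reject: still not a valid URL even with a scheme prepended
}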
Incidentally, the current regex you are using is a long way from accurate according to the official URI syntax (see RFC 3986). It will disallow many valid URI characters, not to mention Unicode characters in IRIs. If you want proper validation you should use a real URL parser; if you just want a quick check for obvious problems you should use something much more permissive, for example just checking for the absence of categorically invalid characters like the space and the double quote.
