Query string coming through GET - PHP

Is it safe to use a query string coming through GET, as long as the data itself isn't sensitive?
I filter_input the GET string
I mysqli_real_escape_string the string
Not relevant, but I trim it too
I don't want any security issues.

There's very little difference between GET and POST as regards security. The main differences are:
GET parameters will be visible in the location bar of the browser, unless you send a redirect
The maximum size of GET parameters is limited, commonly around 2,000 characters in practice (a browser and server limit, not a PHP one). POST parameters can be much larger.

As long as you aren't sending any sensitive data through a GET request there is nothing to worry about.
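For reference, a minimal sketch of the steps the question lists (the parameter name q, the connection details, and the items table are assumptions):
$link = mysqli_connect('localhost', 'user', 'pass', 'mydb'); // placeholder credentials
$q = filter_input(INPUT_GET, 'q', FILTER_SANITIZE_FULL_SPECIAL_CHARS); // null if 'q' is absent
$q = trim((string) $q);
$q = mysqli_real_escape_string($link, $q);
$result = mysqli_query($link, "SELECT id, name FROM items WHERE name = '$q'");
A prepared statement (mysqli_prepare or PDO) is generally a safer replacement for the manual escaping step.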

Related

PHP stripping null byte

I have an application where I need to be able to let the users put anything into an input field and am having issues with null bytes. I am passing the data via AJAX to PHP 5.5 and can see it's being passed from the AJAX request correctly, but when I immediately var_dump the $_POST on the PHP side, a string that contained '%00' comes through as ''. As an aside, I'm protecting my database from injection by using query bindings. Also, the user base is exclusively internal to my company. So, I'm not really concerned with the security aspect of it. How can I get PHP to let these null bytes through?
Echoing a null-byte in PHP would indeed result in an empty-looking string, so this makes perfect sense.
echo chr(0); // outputs nothing
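To confirm the byte is actually there rather than stripped, inspect the string's length or a hex dump instead of echoing it; a quick sketch:
$value = "foo" . chr(0) . "bar";
echo $value;               // prints what looks like "foobar"
var_dump(strlen($value));  // int(7) - the null byte is counted
var_dump(bin2hex($value)); // string(14) "666f6f00626172" - the 00 byte shows up here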

byte array over $_GET request feasibility?

I hope this will be a relatively quick question.
If I send a byte array via the URL and retrieve it from a $_GET request in a PHP server-side script, will the URL be capable of transmitting the byte array? Is a URL capable of being long enough for this purpose, or do I need another way to transmit the byte array?
Example of what I'm attempting: http://www.website.com/scrypt.php?image="bytearray"
Better yet, is there a best practice for transmitting this data from, say, an Android app to PHP?
As long as it doesn't exceed the limit for a URL or contain reserved characters that would be interpreted by the CGI...you're all set. Go for it.
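One common way to sidestep the reserved-character problem is to base64-encode the bytes; a sketch, where scrypt.php and the image parameter come from the question's example and the base64 step is my assumption:
// Building the URL: URL-safe base64 keeps reserved characters out of the query string
$bytes   = random_bytes(32); // stand-in for the real payload
$encoded = rtrim(strtr(base64_encode($bytes), '+/', '-_'), '=');
$url     = 'http://www.website.com/scrypt.php?image=' . $encoded;

// In scrypt.php: reverse the transformation to recover the original bytes
$encoded = $_GET['image'] ?? '';
$bytes   = base64_decode(strtr($encoded, '-_', '+/'));
For anything as large as an image, though, a POST request with the bytes in the body avoids URL length limits entirely.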

Is it worth it to check the length of the input (too large) before querying the database or the db takes care of it?

I have a dynamic PHP web app which gets input params in the URL (no surprise here). However, bingbot sometimes requests extremely long URLs from the site, e.g. URLs more than 10,000 characters long. One of the inputs is a UTF name, and bingbot somehow submits sketchy input names thousands of characters long, like this: \xc2\x83\xc3\x86... (goes on for thousands of characters).
Obviously, it gets a 404, because there is no such name in the database (and therefore no such page), but it occurred to me to wonder whether it's worth checking the input length before querying the db (e.g. a name cannot be more than 100 characters long) and returning a 404 instantly if it's too long. Is that standard practice? Or is it not worth the trouble, because the db handles it?
I'm thinking of not putting extra load on the db unnecessarily. Is this long input submitted as-is by the db client interface (two calls: first a prepare for sanitizing the input, then the actual query), or does the PHP db client know the column size and truncate the input string before sending it down the wire?
Not only is what you're asking more than legitimate, I'd say it's something you should be doing as part of input filtering/validation. If you expect your input to always be shorter than 100 characters, everything longer should be filtered out.
Also, it appears you're getting UTF-8 strings: if you're not expecting them, you could simply filter out all characters that are not part of the standard ASCII set (or even a reduced set, filtering all control characters away). For example: $string = filter_var($input, FILTER_SANITIZE_FULL_SPECIAL_CHARS, FILTER_FLAG_STRIP_LOW);
This is not just a matter of DB performance, but also security!
PS: I highly doubt that bot is actually Bing. It seems like a bot trying to hack your website.
Addendum: some suggestions about input validation
As I wrote above in some comments (and as others have written too), you should always validate every input. No matter what it is or where it comes from: if it comes from outside, it has to be validated.
The general idea is to validate your input according to what you're expecting. Below, $input stands for any input variable: anything coming from $_GET, $_POST, $_COOKIE, from external APIs, and from many $_SERVER variables as well, plus anything else that could be altered by a user. Use your judgement, and when in doubt be overly cautious.
If you're requesting an integer or float number, then it's easy: just cast the input to (int) or (float)
$filtered = (int)$input;
$filtered = (float)$input;
If you're requesting a string, then it's more complicated. You should think about what kind of string you are requesting, and filter it accordingly. For example:
If you're expecting a string like a hexadecimal id (like some databases use), then filter all characters outside the 0-9A-Fa-f range: $filtered = preg_replace('/[^0-9A-Fa-f]/', '', $input);
If you're expecting an alphanumeric ID, filter it, removing all characters that are not part of that ASCII range. You can use the code posted above: $string = filter_var($input, FILTER_SANITIZE_FULL_SPECIAL_CHARS, FILTER_FLAG_STRIP_LOW); (this removes all control characters too).
If you're expecting your input to be Unicode UTF-8, validate it (a minimal check follows this list). For example, see this function: https://stackoverflow.com/a/1523574/192024
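A minimal UTF-8 validity check, assuming the mbstring extension is available (the linked answer shows a regex-based alternative):
if (!mb_check_encoding($input, 'UTF-8')) {
    // Invalid byte sequences found: reject the input, or strip them by
    // converting UTF-8 to UTF-8 (invalid sequences get substituted)
    $input = mb_convert_encoding($input, 'UTF-8', 'UTF-8');
}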
In addition to this:
Always encode HTML tags. FILTER_SANITIZE_FULL_SPECIAL_CHARS does that as part of filter_var. If you don't, you risk XSS (Cross-Site Scripting) attacks.
If you want to remove control characters and encode HTML entities, but without removing the newline characters (\n and \r), then you can use: $filtered = preg_replace('/[\x00-\x09\x0B\x0C\x0E-\x1F\x7F]/u', '', htmlspecialchars($input, ENT_COMPAT, 'UTF-8'));
And much more. Use your judgement always.
PS: My approach to input filtering is to prefer sanitization: remove everything "dangerous" and accept the sanitized input as if it were what the user wrote. Others will instead argue that input should only ever be accepted or refused outright.
Personally, I prefer the "sanitize and use" approach for web applications, as your users may still want to see something more than an error page; on desktop/mobile apps I go with the "accept or refuse" method instead.
However, that's just a matter of personal preference, backed only by what my gut tells me about UX. You're free to follow the approach you prefer.
There should be some sort of validation done on any data before it is used in a query. If you have a limit on the length of the name, you can use that limit as part of the validation: if the input is over it, the name can't be in the database, so handle the request accordingly, whether with a 404 or a page that displays an error message (see the sketch below).
The load will go down if you are bypassing queries because a name is too long. Exactly how much depends on how you are querying the database (LIKE or MATCH ... AGAINST) and how your indexes are set up.
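A minimal sketch of that early length check (the 100-character limit comes from the question; the pages table, name column, and $pdo connection are assumptions):
$name = $_GET['name'] ?? '';
if ($name === '' || mb_strlen($name, 'UTF-8') > 100) {
    http_response_code(404); // too long to exist: skip the database entirely
    exit;
}
$stmt = $pdo->prepare('SELECT id FROM pages WHERE name = ?'); // $pdo: an open PDO connection
$stmt->execute([$name]);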

How can I get raw request data when the request is malformed?

I have been investigating a problem with data sent to my PHP web service in a POST request, which was sometimes truncated somewhere in the middle. I have found that it is due to an unescaped ampersand (&), which cuts the data in the middle. For example, if the POST data is:
data=foobar&morethings
then I will only have "data" => "foobar" in my $_POST array and the "morethings" part is lost.
The obvious solution would be to fix the software that sends the POST request to my web service so that it escapes ampersands, but this is not practically possible right now (we cannot make our users update the software so easily). Therefore I have to find a temporary workaround.
Is there a way, from PHP, to retrieve the raw data as it was sent to the web service, before it was parsed by whatever is cutting the POST data into pieces?
file_get_contents('php://input');
http://php.net/manual/en/wrappers.php.php
Yes, there are a couple of options.
Use $xml = file_get_contents('php://input'); to read the raw POST data.
Use $HTTP_RAW_POST_DATA. This is less reliable, though, as that variable isn't always populated, depending on PHP ini settings; it was also deprecated in PHP 5.6 and removed in PHP 7.
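A sketch of the temporary workaround the question is after, assuming the body always starts with data= as in the example (since the client doesn't URL-encode, the value is taken verbatim rather than run through urldecode):
$raw = file_get_contents('php://input'); // e.g. "data=foobar&morethings"
$prefix = 'data=';
if (strncmp($raw, $prefix, strlen($prefix)) === 0) {
    $data = substr($raw, strlen($prefix)); // "foobar&morethings", ampersand intact
}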

Store html entities in database? Or convert when retrieved?

Quick question, is it a better idea to call htmlentities() (or htmlspecialchars()) before or after inserting data into the database?
Before: The new, longer string will cause me to have to change the database to hold longer values in the field (an input capped at maxlength="800" could become an 804-char string once a single & is encoded as &amp;).
After: This will require a lot more server processing, and hundreds of calls to htmlspecialchars() could be made on every page load or AJAX load.
SOOO. Will converting when results are retrieved slow my code significantly? Should I change the DB?
I'd recommend storing the most raw form of the data in the database. That gives you the most flexibility when choosing how and where to output that data.
If you find that performance is a problem, you could cache the HTML-formatted version of this data somehow. Remember that premature optimization is a bad thing.
I have no experience with PHP, but generally I always convert or escape nearest to output. You don't know when your output requirements will change; for example, you may want to emit the data as XML or JSON, and escaping for HTML before storing means you're limited to using the data as HTML alone.
In a php/MySQL web app, data flows in two ways
Database -> scripting language (php) -> HTML output -> browser ->screen
and
Keyboard-> browser-> $_POST -> php -> SQL statement -> database.
Data is defined as everything provided by the user.
ALWAYS ALWAYS ALWAYS....
A) process data through mysql_real_escape_string (or, on modern PHP, mysqli_real_escape_string or a prepared statement) as you move it into an SQL statement, and
B) process data through htmlspecialchars as you move it into the HTML output.
This will protect you from SQL injection attacks, and enable HTML characters and entities to display properly (unless you manage to forget one place, and then you have opened up a security hole).
Did I mention that this has to be done for every single piece of data any user could ever have touched, altered or provided via a script?
p.s. For performance reasons, use UTF-8 encoding everywhere.
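A minimal sketch of both rules, using the mysqli equivalents of the functions named above (connection details and the posts table are placeholders):
// Inbound: keyboard -> browser -> $_POST -> PHP -> SQL statement -> database
$link  = mysqli_connect('localhost', 'user', 'pass', 'mydb');
$title = mysqli_real_escape_string($link, $_POST['title']);
mysqli_query($link, "INSERT INTO posts (title) VALUES ('$title')");

// Outbound: database -> PHP -> HTML output -> browser -> screen
$result = mysqli_query($link, 'SELECT title FROM posts');
while ($row = mysqli_fetch_assoc($result)) {
    echo '<li>' . htmlspecialchars($row['title'], ENT_QUOTES, 'UTF-8') . '</li>';
}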
It's best to store text raw and encode it as needed. To be honest, you always need to HTML-encode your data anyway when you're outputting it to the web page, to prevent XSS attacks.
You shouldn't encode your data before you put it in the database. The main reasons are:
If the data is near the column size limit, say 32 chars, a title like "Steve & Fred blah blah" might go over that limit, because the 1-char & becomes the 5-char &amp;
You are assuming the data will always be displayed in a web page. In the future you never know where you'll be looking at the data, and you might not want it encoded; then you have to decode it, and you might not have access to PHP's decode function
It is the way of the craftsman to "measure twice, optimize once".
If you don't need high performance for your website, store it as raw data and when you output it do what you want.
If you need performance, then consider storing it twice: the raw data to do whatever you want with, and another field with the filtered data (see the sketch below). It could be seen as redundant, but CPU is expensive, while data storage is really cheap.
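A sketch of the store-it-twice idea (the comments table, its columns, and the connection details are assumptions):
$pdo  = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass');
$raw  = $_POST['body'];
$html = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'); // pre-rendered copy for cheap output
$stmt = $pdo->prepare('INSERT INTO comments (body_raw, body_html) VALUES (?, ?)');
$stmt->execute([$raw, $html]);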
The easiest way is store the data "as is" and then convert to htmlentities wherever it is needed.
The safest solution is to filter the data before it goes into the database, as this prevents possible attacks on your server and database, and then convert it however you need when it's needed. Note that if you are using PDO with prepared statements, the SQL-escaping side of this is handled for you automatically (HTML encoding on output is still your job).
http://php.net/PDO
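A minimal prepared-statement sketch (DSN, credentials, and table are placeholders); the parameter travels separately from the SQL text, so no manual SQL escaping is needed:
$pdo  = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$stmt->execute([':body' => $_POST['body']]); // stored raw; HTML-encode it on output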
We had this debate at work recently. We decided to store the escaped values in the database, because before (when we were storing it unescaped) there were corner cases where data was being displayed without being escaped. This can lead to XSS. So we decided to store it escaped to be safe, and if you want it unescaped you have to do the work yourself.
Edit: So to everyone who disagrees, let me add some backstory for my case. Say you're working in a team of 50+ people, and data from the database is not guaranteed to be HTML-encoded on the way out: there's no built-in mechanism for it, so each developer has to write the code to do it. And this data is shown all over the place, so it's not going through one developer's code, it's going through thirty developers' code, most of whom have no clue about this data (or that it could even contain angle brackets, which is rare) and merely want to get it shown on the page, move on, and forget about it.
Do you still think it's better to put the data, raw HTML and all, into the database and rely on random people who are not you to escape it properly? Because frankly, while it certainly may not seem warm-fuzzy-best-practicey, I prefer to fail closed (meaning when the data comes through in a Word doc it looks like Value&lt;Stock rather than Value<Stock) rather than open (the Word doc looks right with no work, but some corner of the platform may well be vulnerable to XSS). You can't have both.
