creating a sanitizing function - php

Reading all the posts about sanitizing has left me so confused. I'm creating a blog type of site and need to sanitize user input which will go into a database (user profile information, blog posts, and comments), plus certain IDs and usernames from GET requests that are used in queries to fetch information for display.
This is what I've pieced together based on what I've read:
function escape($data) {
global $conn;
connect();
$data = $conn->real_escape_string($data);
$conn->close();
$data = str_replace(chr(0), '', $data);
return $data;
}
function sanitize($data) {
$data = trim($data);
$data = strip_tags($data);
$data = stripslashes($data);
$data = escape($data);
$data = htmlspecialchars($data);
return $data;
}
The stripslashes confuses me a bit. I know PHP automatically puts those in GET and POST requests and double slashes can be a problem. Should I put addslashes() in the function after stripslashes to make sure it's okay?
For all insert and update statements the inserted values are bound using prepared statements, but all other statements are not prepared (and doing prepared statements on them would not be efficient at this stage in this project for various reasons).
I'd love to get your feedback. Like I said, this is all very confusing!
UPDATE:
I added the $data = str_replace(chr(0), '', $data); to protect against null byte injections. Is that right?
BTW, the only GET requests that go into queries are either ID numbers (which I have a function that removes everything but numbers on) or usernames. I'm using the escape function above to sanitize the username before going into any queries. Is that good enough?
The sanitize function I use on blog posts and profile info which is provided by the user and inserted into a table via a prepared statement.

function cleanInput($input) {
$search = array(
'#<script[^>]*?>.*?</script>#si', // Strip out javascript
'#<[\/\!]*?[^<>]*?>#si', // Strip out HTML tags
'#<style[^>]*?>.*?</style>#siU', // Strip style tags properly
'#<![\s\S]*?--[ \t\n\r]*>#' // Strip multi-line comments
);
$output = preg_replace($search, '', $input);
return $output;
}
function sanitize($input) {
if (is_array($input)) {
foreach ($input as $var => $val) {
$output[$var] = sanitize($val);
}
} else {
if (get_magic_quotes_gpc()) {
$input = stripslashes($input);
}
$input = cleanInput($input);
$output = mysql_real_escape_string($input);
}
return $output;
}
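For the cases described in the question (numeric IDs from GET, user text that is later shown in a page), one way to think about it is to handle each value for the context it ends up in rather than running everything through one function. The following is only a minimal sketch under that assumption; the helper names validId and e are hypothetical and do not come from any of the posts above.

```php
<?php
// Hypothetical helpers, for illustration only.

// Return the ID as an int, or null if it is not a positive integer.
function validId($raw): ?int
{
    $id = filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 1]]);
    return $id === false ? null : $id;
}

// Escape a value for output inside HTML text or a quoted attribute.
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$id   = validId('42');         // int 42
$bad  = validId('42; DROP');   // null, so the request can be rejected
$html = e('<b>"hi"</b>');      // &lt;b&gt;&quot;hi&quot;&lt;/b&gt;
```

With this split, the text itself is stored unchanged through a bound parameter and is only escaped at the point where it is printed.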

Related

Search Tags in Codeigniter

Hello everyone, my tag search is not working. For example, the tags contain Php, Ajax and HTML 5. When you search for Php or Ajax there is a result, but if you search for a tag of two or more words, such as HTML 5, there is no result.
My model code:
public function getTagsMatch($limit=null, $tags, $offset=null) {
$match = $tags;
$this->db->from('threads');
$this->db->where('status', 1);
$search_query_values = explode(' ', $match);
$counter = 0;
foreach ($search_query_values as $key => $value) {
if ($counter == 0) {
$this->db->like('tags', $value);
}
$counter++;
}
$this->db->order_by('pin_post', 'DESC');
$this->db->order_by('id', 'DESC');
$this->db->limit($limit);
$this->db->offset($offset);
$query = $this->db->get();
return $query->result_array();
}
Well...
$search_query_values = explode(' ', $match);
This line takes in query values and "explodes" the string using spaces. So if you input "HTML 5", it will actually search for tags like "HTML" or "5". Consider using a different character for exploding.
For example:
$search_query_values = explode(',', $match);
And the function call would be something like this:
getTagsMatch(NULL, 'Ajax,HTML 5,PHP', NULL);
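A small sketch of the comma-splitting idea, assuming tags arrive as one comma-separated string. The helper name splitTags is hypothetical; it also trims whitespace and drops empty entries, which the one-line explode above does not do.

```php
<?php
// Hypothetical helper: split a comma-separated tag string into clean tags.
function splitTags(string $tags): array
{
    $parts = array_map('trim', explode(',', $tags));
    // Drop empty entries such as the one produced by a trailing comma.
    return array_values(array_filter($parts, 'strlen'));
}

$list = splitTags('Ajax, HTML 5,PHP,');
// $list is ['Ajax', 'HTML 5', 'PHP']; "HTML 5" stays in one piece.
```

Each element of the result could then be passed to its own like() call in the model.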
A few more notes:
Default parameters should be at the end of an argument list (function foo(bar0, bar1=0, bar2='') { ... }, not randomly placed).
Consider using fewer variables. There is absolutely no reason to have $tags, $match and $search_query_values; one is enough. You might consider using three variables a semantic advantage, but it actually makes your code more difficult to read.
The problem turned out to be in my controller, where I customize the tags. I'm not good at explaining this kind of thing, so I will just post some of the code with some explanation.
My tag search is based on the tag's slug, for example website.com/tags/html-5
so my controller code is:
public function tags($tags) {
$tag = search_title($tags);
$data['result'] = $this->topic_model->getTagsMatch($tag);
}
and my search_title function code:
function search_title($str, $separator = '&nbsp') {
$str = ucwords(strtolower($str));
foreach (array('-', '\'') as $delimiter) {
if (strpos($str, $delimiter)!==false)
{
$str =implode($delimiter, array_map('ucfirst', explode($delimiter, $str)));
}
}
$str = str_replace('-','&nbsp',$str);
$str = str_replace('%20','&nbsp',$str);
$str = str_replace('%26','&',$str);
$str = str_replace('%27','&nbsp',$str);
$str = str_replace('%28','&nbsp',$str);
$str = str_replace('%29','&nbsp',$str);
return trim(stripslashes($str));
}
The problem is on this line:
$data['result'] = $this->topic_model->getTagsMatch($tag);
so I changed $tag to str_replace("-"," ",$tags)
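The original helper replaced '-' with the literal text &nbsp, which is not a space character, so the LIKE pattern presumably never matched tags stored with real spaces. A minimal sketch of the fix described above follows; the function name slugToTag is hypothetical, and the rawurldecode call is an added assumption to cover percent-encoded slugs.

```php
<?php
// Hypothetical helper: convert a URL slug such as "html-5" into the
// search text "html 5" before it is passed to the model.
function slugToTag(string $slug): string
{
    return trim(str_replace('-', ' ', rawurldecode($slug)));
}

$tag = slugToTag('html-5');   // "html 5"
```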

preg_replace & mysql_real_escape_string problem cleaning SQL

Check out the method below. If the value entered in the text box is \, mysql_real_escape_string returns a double backslash, but preg_replace produces SQL with only one backslash. I'm not that good with regular expressions, so please help.
$sql = "INSERT INTO tbl SET val='?'";
$params = array('someval');
public function execute($sql, array $params){
$keys = array();
foreach ($params as $key => $value) {
$keys[] = '/[?]/';
if (get_magic_quotes_gpc()) {
$value = stripslashes($value);
}
$paramsEscaped[$key] = mysql_real_escape_string(trim($value));
}
$sql = preg_replace($keys, $paramsEscaped, $sql, 1, $count);
return $this->query($sql);
}
For me it basically looks like you're re-inventing the wheel and your concept has some serious flaws:
It assumes get_magic_quotes_gpc could be switched on. This feature is broken. You should not code against it. Instead make your application require that it is switched off.
mysql_real_escape_string needs a database link identifier to properly work. You are not providing any. This is a serious issue, you should change your concept.
You're actually not using prepared statements, but you mimic the syntax of those. This is fooling other developers who might think that it is safe to use the code while it is not. This is highly discouraged.
However, let's do it, just don't use preg_replace for the job, for several reasons. The pattern /[?]/ is the same for every parameter, so each replacement simply hits the first ? still present in the string, including one that came from a value inserted earlier. preg_replace also treats a backslash in the replacement string as an escape character, which is why the doubled backslash from mysql_real_escape_string collapses back into a single one. And it is inflexible when it comes to error cases such as too few or too many parameters or placeholders. Instead, the already processed part, including the replacement, needs to be skipped (Demo of such).
For that you need to walk through the SQL string, take it apart and process it:
public function execute($sql, array $params)
{
$params = array_map(array($this, 'filter_value'), $params);
$sql = $this->expand_placeholders($sql, $params);
return $this->query($sql);
}
public function filter_value($value)
{
if (get_magic_quotes_gpc())
{
$value = stripslashes($value);
}
$value = trim($value);
$value = mysql_real_escape_string($value);
return $value;
}
public function expand_placeholders($sql, array $params)
{
$sql = (string) $sql;
$params = array_values($params);
$offset = 0;
foreach($params as $param)
{
$place = strpos($sql, '?', $offset);
if ($place === false)
{
throw new InvalidArgumentException('Parameter / Placeholder count mismatch. Not enough placeholders for all parameters.');
}
$sql = substr_replace($sql, $param, $place, 1);
$offset = $place + strlen($param);
}
$place = strpos($sql, '?', $offset);
if ($place !== false) // a placeholder is still left after all parameters were used
{
throw new InvalidArgumentException('Parameter / Placeholder count mismatch. Too many placeholders.');
}
return $sql;
}
The benefit of the existing prepared statement implementations is that they actually work. You should really consider using those. Playing around like this is nice, but in the end you need to deal with many more cases, and it's far easier to reuse an existing component tested by thousands of other users.
It's better to use prepared statements. For more info, see http://www.php.net/manual/en/pdo.prepared-statements.php
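For comparison, here is roughly what the same insert looks like with a real prepared statement. This is a sketch under assumptions: it uses an in-memory SQLite database through PDO (so the pdo_sqlite extension must be available) as a stand-in for the MySQL connection, and it uses the standard VALUES form because SQLite does not accept the INSERT ... SET syntax from the question.

```php
<?php
// Assumption: pdo_sqlite is available; an in-memory SQLite database
// stands in for the real MySQL connection.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE tbl (val TEXT)');

// The value contains a backslash, a quote and a question mark.
$value = "back\\slash, O'Reilly, really?";

// The placeholder is not quoted; the driver handles the value itself.
$insert = $pdo->prepare('INSERT INTO tbl (val) VALUES (?)');
$insert->execute([$value]);

$stored = $pdo->query('SELECT val FROM tbl')->fetchColumn();
// $stored is identical to $value: nothing was escaped or altered.
```

No stripslashes, escaping or regular expressions are involved, so the backslash problem from the question does not arise.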

Sanitizing array values before mysql inserts

I have the following code blocks in a PHP form-handler:
function filter($data) {
$data = trim(htmlentities(strip_tags($data)));
if (get_magic_quotes_gpc()) {
$data = stripslashes($data);
}
$data = mysql_real_escape_string($data);
return $data;
}
foreach($_POST as $key => $value) {
$data[$key] = filter($value);
}
I am modifying my form to now include checkbox groups:
eg:
<input type="checkbox" name="phone_prefs[]" value="prefer_home">
<input type="checkbox" name="phone_prefs[]" value="prefer_cell">
<input type="checkbox" name="phone_prefs[]" value="prefer_work">
etc.
Because of this code I now have arrays in my _POST variables rather than just strings.
Am I correct in thinking that my filter() function will not actually sanitize arrays properly? What changes do I need to make to my filter() function to make sure the arrays for the checkboxes are sanitized completely and not an easy target for SQL injection attacks?
As for the sql injection, I would switch to PDO using a prepared statement.
You can use a simple is_array() on your values to check for an array and then loop through it. You are correct, as it is, your filter function will not handle arrays correctly.
Edit: If you use PDO and a prepared statement, you don't need mysql_real_escape_string anymore. strip_tags, htmlentities and trim are also not needed to store the information safely in a database. They are needed when you output information to the browser (except trim, of course), although htmlspecialchars would be sufficient for that. It's always better to prepare your information correctly for the medium you are outputting to at that moment.
Your function is pretty good, but if you make it recursive it'll crawl nested arrays for you
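function filter(&$array) {
foreach ($array as $key => &$value) {
if (is_array($value)) {
filter($value); // recurse into nested arrays
} else {
$value = trim(strip_tags($value));
if (get_magic_quotes_gpc()) {
$value = stripslashes($value);
}
// assign back to $value so the change reaches the caller through the reference
$value = mysql_real_escape_string($value);
}
}
unset($value); // break the reference left over from the loop
}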
filter($_POST); # filters $_POST and any nested arrays by reference
Edit: Leave out htmlentities(). If you need it, then use it when outputting the values - not when getting them as input.
array_walk_recursive($array,function(&$item){
$item=mysql_real_escape_string($item);
});
You're using a foreach on the $_POST which only loops once, using the Array and handling it like a string.
Try using:
foreach($_POST['phone_prefs'] as $key => $value)
EDIT:
I believe I misunderstood your question:
foreach($_POST as $key => $value)
if (is_array($value))
foreach($_POST[$key] as $key2 => $value2)
/* Setting stuff */
else /* Setting same stuff */
Instead of sanitizing the input manually, always use prepared statements with placeholders. That will transparently pass the input to the database in such a way that it does not need to be escaped and thus is not vulnerable to SQL injection. This is the current best practice.
See the following for more information: http://php.net/manual/en/pdo.prepared-statements.php
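For the checkbox arrays specifically, a sketch of how this might look in practice: validate the submitted values against the options the form actually offers, then build one placeholder per value. Everything named here (ALLOWED_PREFS, cleanPrefs, placeholders, the contacts table and pref column) is hypothetical and only illustrates the approach.

```php
<?php
// Hypothetical whitelist of the checkbox values the form can legitimately send.
const ALLOWED_PREFS = ['prefer_home', 'prefer_cell', 'prefer_work'];

// Keep only string values that appear in the whitelist.
function cleanPrefs($submitted): array
{
    if (!is_array($submitted)) {
        return [];
    }
    $strings = array_filter($submitted, 'is_string');
    return array_values(array_intersect($strings, ALLOWED_PREFS));
}

// Build "?, ?, ?" with one placeholder per value, for a prepared statement.
function placeholders(array $values): string
{
    return implode(', ', array_fill(0, count($values), '?'));
}

$prefs = cleanPrefs(['prefer_home', 'x\' OR 1=1', ['nested'], 'prefer_work']);
$sql = 'SELECT * FROM contacts WHERE pref IN (' . placeholders($prefs) . ')';
// $prefs would then be passed as the bound parameters when executing $sql.
```

An empty $prefs would produce IN (), which is not valid SQL, so that case needs to be handled before the query is built.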
I use this on various sites I have created:
public function clean($dirty) {
if (!is_array($dirty)) {
$dirty = ereg_replace("[\'\")(;|`,<>]", "", $dirty);
$dirty = mysql_real_escape_string(trim($dirty));
$clean = stripslashes($dirty);
return $clean;
}
$clean = array();
foreach ($dirty as $p => $data) {
$data = ereg_replace("[\'\")(;|`,<>]", "", $data);
$data = mysql_real_escape_string(trim($data));
$data = stripslashes($data);
$clean[$p] = $data;
}
return $clean;
}
Using mysql_real_escape_string means a MySQL connection is required before calling the function, which is not always convenient. I used to use a workaround in that case:
function real_escape_string($aQuery) {
if (!is_string($aQuery)) {
return FALSE;
} else {
return strtr($aQuery, array( "\x00" => '\x00', "\n" => '\n', "\r" => '\r', '\\' => '\\\\', "'" => "\'", '"' => '\"', "\x1a" => '\x1a' ));
}
}
But definitely, the best is to use PDO prepared statement instead of mysql. You will enjoy it.

\r\n \" printing out

I have begun using ADOdb and parameterized queries (e.g. $db->Execute("SELECT * FROM users WHERE user_name=?;", array($get->id));) to prevent SQL injections. I have read this is supposed to protect you on the MySQL injection side of things, but obviously not against XSS. While this may be the case, I'm still a bit skeptical about it.
Nevertheless, I always filter my environment variables using a shotgun approach towards safety at the beginning of my wrapper code (kernel.php). I notice that the combination of ADOdb and the following functions produces browser-visible escape sequences (\r\n \" \'), which is something I don't want (although I do want to store that information!). I also don't want to have to filter my output before display, since I already properly filter my input (aside from BBcode and that sort of thing). Below you will find the functions I'm referring to.
While in general I have isolated this problem to the mysql_real_escape_string portion of the sanitize function, do note that my server is running PHP 5.2+, and this issue does not exist when I use my own simplified db abstraction class. Also, the site runs mostly on my own code and is not built on the scaffold of some preexisting CMS. Thus, considering these factors, my only guess is that there is some double-escaping going on. However, when I looked at the adodb.inc.php file, I noticed $rs->FetchNextObj() doesn't utilize mysql_real_escape_string. It appears the only function that does this is qstr, which encapsulates the entire string. This leads me to worry that relying on parameterized queries may not be enough, but I don't know!
// Sanitize all possible user inputs
if(keyring_access("am")) // XSS and HTML stripping exemption for administrators editing HTML content
{
$_POST = sanitize($_POST,false,false);
$_GET = sanitize($_GET,false,false);
$_COOKIE = sanitize($_COOKIE,false,false);
$_SESSION = sanitize($_SESSION,false,false);
}
else
{
$_POST = sanitize($_POST);
$_GET = sanitize($_GET);
$_COOKIE = sanitize($_COOKIE);
$_SESSION = sanitize($_SESSION);
}
// Setup $form object shortcuts (merely convenience)
if($_POST)
{
foreach($_POST as $key => $value)
{
$form->$key = $value;
}
}
if($_GET)
{
foreach($_GET as $key => $value)
{
$get->$key = $value;
}
}
function sanitize($val, $strip = true, $xss = true, $charset = 'UTF-8')
{
if (is_array($val))
{
$output = array();
foreach ($val as $key => $data)
{
$output[$key] = sanitize($data, $strip, $xss, $charset);
}
return $output;
}
else
{
if ($xss)
{
// code by nicolaspar
$val = preg_replace('/([\x00-\x08][\x0b-\x0c][\x0e-\x20])/', '', $val);
$search = 'abcdefghijklmnopqrstuvwxyz';
$search .= 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$search .= '1234567890!@#$%^&*()';
$search .= '~`";:?+/={}[]-_|\'\\';
for ($i = 0; $i < strlen($search); $i++)
{
$val = preg_replace('/(&#[x|X]0{0,8}'.dechex(ord($search[$i])).';?)/i', $search[$i], $val); // with a ;
$val = preg_replace('/(&#0{0,8}'.ord($search[$i]).';?)/', $search[$i], $val); // with a ;
}
$ra1 = Array('javascript', 'vbscript', 'expression', 'applet', 'meta', 'xml', 'blink', 'link', 'style', 'script', 'embed', 'object', 'iframe', 'frame', 'frameset', 'ilayer', 'layer', 'bgsound', 'title', 'base');
$ra2 = Array('onabort', 'onactivate', 'onafterprint', 'onafterupdate', 'onbeforeactivate', 'onbeforecopy', 'onbeforecut', 'onbeforedeactivate', 'onbeforeeditfocus', 'onbeforepaste', 'onbeforeprint', 'onbeforeunload', 'onbeforeupdate', 'onblur', 'onbounce', 'oncellchange', 'onchange', 'onclick', 'oncontextmenu', 'oncontrolselect', 'oncopy', 'oncut', 'ondataavailable', 'ondatasetchanged', 'ondatasetcomplete', 'ondblclick', 'ondeactivate', 'ondrag', 'ondragend', 'ondragenter', 'ondragleave', 'ondragover', 'ondragstart', 'ondrop', 'onerror', 'onerrorupdate', 'onfilterchange', 'onfinish', 'onfocus', 'onfocusin', 'onfocusout', 'onhelp', 'onkeydown', 'onkeypress', 'onkeyup', 'onlayoutcomplete', 'onload', 'onlosecapture', 'onmousedown', 'onmouseenter', 'onmouseleave', 'onmousemove', 'onmouseout', 'onmouseover', 'onmouseup', 'onmousewheel', 'onmove', 'onmoveend', 'onmovestart', 'onpaste', 'onpropertychange', 'onreadystatechange', 'onreset', 'onresize', 'onresizeend', 'onresizestart', 'onrowenter', 'onrowexit', 'onrowsdelete', 'onrowsinserted', 'onscroll', 'onselect', 'onselectionchange', 'onselectstart', 'onstart', 'onstop', 'onsubmit', 'onunload');
$ra = array_merge($ra1, $ra2);
$found = true;
while ($found == true)
{
$val_before = $val;
for ($i = 0; $i < sizeof($ra); $i++)
{
$pattern = '/';
for ($j = 0; $j < strlen($ra[$i]); $j++)
{
if ($j > 0)
{
$pattern .= '(';
$pattern .= '(&#[x|X]0{0,8}([9][a][b]);?)?';
$pattern .= '|(&#0{0,8}([9][10][13]);?)?';
$pattern .= ')?';
}
$pattern .= $ra[$i][$j];
}
$pattern .= '/i';
$replacement = substr($ra[$i], 0, 2).'<x>'.substr($ra[$i], 2);
$val = preg_replace($pattern, $replacement, $val);
if ($val_before == $val)
{
$found = false;
}
}
}
}
// Strip HTML tags
if ($strip)
{
$val = strip_tags($val);
// Encode special chars
$val = htmlentities($val, ENT_QUOTES, $charset);
}
// Cross your fingers that we don't get a MySQL injection with relying on ADOdb prepared statements alone… ? It works great otherwise by just returning $val... so it appears the code below is the culprit of the \r\n \" etc. escaping
//return $val;
if(function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc()) // strip slashes only when magic quotes is actually on
{
return mysql_real_escape_string(stripslashes($val));
}
else
{
return mysql_real_escape_string($val);
}
}
}
Thank you very much in advance for your help! If you need any further clarifications, please let me know.
Update: the backslash is still showing up in front of " and ', and yes, I removed the extra mysql_real_escape_string. Now I can only think this might be get_magic_quotes_gpc, or ADOdb adding them...
~elix
It turned out to be a side effect of qstr in ADOdb. I didn't reference that particular function of the class myself, but it must be called elsewhere. The problem in my particular case was that magic quotes is enabled, so I set the default argument for the function to $magic_quotes=disabled. As for not needing any escaping with this, I found that ADOdb by itself DOES NOT utilize mysql_real_escape_string through the basic Execute() with binding alone! I recognized this because the characters " ' threw errors (and hence didn't render on my server, where error_reporting is disabled). It appears that the combination of my functions and the fix for that small ADOdb issue has me both well protected and accepting most/all input the way I want it to. In the case of the double quote, the errors prevented any quotes from being entered as content into the database, which meant at the very least no HTML.
Nevertheless, I appreciate your suggestions, but also felt that my follow-up might help others.
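One plausible reading of the visible backslashes, which the follow-up above does not confirm, is plain double handling: the value is escaped once by the global sanitize pass and then stored verbatim by a bound parameter. The sketch below illustrates only that mechanism. addslashes() is used purely as a stand-in for mysql_real_escape_string() so the example runs without a database link; it does not escape newlines the way the real function does.

```php
<?php
// addslashes() is only a stand-in for mysql_real_escape_string() here.
$input = 'She said "hi"';

// Step 1: the global sanitize pass escapes the value.
$escaped = addslashes($input);            // She said \"hi\"

// Step 2: a bound parameter stores the value exactly as given, so the
// backslashes become part of the stored data and show up on output.
$storedWhenEscapedFirst = $escaped;
$storedWhenBoundOnly    = $input;

// Escaping for HTML belongs at output time instead.
$html = htmlspecialchars($storedWhenBoundOnly, ENT_QUOTES, 'UTF-8');
```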

mysql_real_escape_string not being used with given regex

I am using a dataHandler library to handle all of my db inserts / updates, etc.
The library has the following functions:
function prepareValue($value, $connection){
$preparedValue = $value;
if(is_null($value)){
$preparedValue = 'NULL';
}
else{
$preparedValue = '\''.mysql_real_escape_string($value, $connection).'\'';
}
return $preparedValue;
}
function parseParams($params, $type, $connection){
$fields = "";
$values = "";
if ($type == "UPDATE"){
$return = "";
foreach ($params as $key => $value){
if ($return == ""){
if (preg_match("/\)$/", $value)){
$return = $key."=".$value;
}
else{
$return = $key."=".$this->prepareValue($value, $connection);
}
}
else{
if (preg_match("/\)$/", $value)){
$return = $return.", ".$key."=".$value;
}
else{
$return = $return.", ".$key."=".$this->prepareValue($value, $connection);
}
}
}
return $return;
}
/* rest of function contains similar code for "INSERT", etc. */
}
These functions are then used to build queries using sprintf, as in:
$query = sprintf("UPDATE table SET " .
$this->parseParams($params, "UPDATE", $conn) .
" WHERE fieldValue = %s;", $this->prepareValue($thesis_id, $conn));
$params is an associative array: array("db_field_name"=>$value, "db_field_name2"=>$value2, etc.)
I am now running into problems when I want to do an update or insert of a string that ends in ")" because the parseParams function does not put these values in quotes.
My question is this:
Why would this library NOT call prepareValue on strings that end in a closed parenthesis? Would calling mysql_real_escape_string() on this value cause any problems? I could easily modify the library, but I am assuming there is a reason the author handled this particular regex this way. I just can't figure out what that reason is! And I'm hesitant to make any modifications until I understand the reasoning behind what is here.
Thanks for your help!
Please note that inside prepareValue, mysql_real_escape_string is not only applied to the value; the value is also wrapped in single quotes. With this in mind, we could suspect that the author assumed all strings ending with ) to be MySQL function calls, i.e.:
$params = array(
'field1' => "John Doe",
'field2' => "CONCAT('John',' ','Doe')",
'field3' => "NOW()"
);
That's the only reasonable answer that comes to mind.
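To see which values the library's check lets through unquoted, the same regular expression can be tried on a few sample strings. This is only an illustrative sketch; the function name looksLikeFunctionCall and the sample values are hypothetical.

```php
<?php
// Mirrors the library's check: anything ending in ")" is treated as raw SQL.
function looksLikeFunctionCall(string $value): bool
{
    return preg_match('/\)$/', $value) === 1;
}

$values = ['NOW()', "CONCAT('John',' ','Doe')", 'John Doe', 'Acme (UK)'];
$flags  = array_map('looksLikeFunctionCall', $values);
// 'Acme (UK)' is flagged too, so it would be inserted unquoted and unescaped.
```

That last case matches the problem described in the question, and it also means any user-supplied string ending in ) bypasses escaping entirely.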
