Escape only particular special chars - php

I'm using QueryParser::parse() method to get a query from a string search term for my ZendSearch Lucene index. But I've a problem with the following query:
+php +5.7.1)
This throws the QueryParserException with message:
Syntax Error: mismatched parentheses, every opening must have closing.
So I used QueryParser::escape() to escape the string search term before I pass it to QueryParser::parse() but then it escapes everything so this leads to this string:
\\+\\p\\h\\p\\ \\+\\5\\.\\7\\.\\1\\)
Now the QueryParserException is gone, but so is the possibility of using special chars like +, -, etc.
I'm looking for a way to escape only the special chars that would lead to a QueryParserException. In my case the ) should be escaped because there is no opening bracket ( in the query, but my two + should stay untouched.
Is there any possibility to achieve this? Building the query itself without parsing is not an option because the search terms are user inputs.
I tried to use QueryParser::suppressQueryParsingExceptions() which probably would be the thing I'm looking for but it has no effect. The QueryParser still throws a QueryParserException although the default value for this is true.

You could use addcslashes
$escapedParenthesis = addcslashes('+php +5.7.1)','\\)');
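If the goal is to escape only the parentheses that would trip the parser, one option is to scan the term and backslash-escape just the unmatched ones. The following is a minimal sketch, not part of ZendSearch: escapeUnbalancedParens() is a hypothetical helper, and it assumes parentheses are the only cause of the exception (it ignores quoted phrases and other operators).

```php
<?php
// Hypothetical helper: backslash-escape only parentheses that have no partner.
function escapeUnbalancedParens(string $term): string
{
    $chars  = str_split($term); // byte-wise split; parentheses are single bytes in UTF-8
    $open   = [];               // indexes of "(" still waiting for a ")"
    $escape = [];               // indexes that need a leading backslash
    $count  = count($chars);

    for ($i = 0; $i < $count; $i++) {
        if ($chars[$i] === '\\') {
            $i++;               // skip the character that is already escaped
            continue;
        }
        if ($chars[$i] === '(') {
            $open[] = $i;
        } elseif ($chars[$i] === ')') {
            if ($open) {
                array_pop($open);     // matched pair, leave both untouched
            } else {
                $escape[$i] = true;   // ")" without an opening bracket
            }
        }
    }
    foreach ($open as $index) {
        $escape[$index] = true;       // "(" that was never closed
    }

    $out = '';
    foreach ($chars as $i => $char) {
        $out .= isset($escape[$i]) ? '\\' . $char : $char;
    }
    return $out;
}

echo escapeUnbalancedParens('+php +5.7.1)'), PHP_EOL; // +php +5.7.1\)
```

The + operators pass through unchanged, and a balanced group such as (php OR zend) is left alone.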

Related

Differences in backslashing between Notepad++ and PHP

EDIT: I found a solution I didn't expect. See below.
Using regex via PHP's preg_match_all , I want to match a certain url (EDIT: that is already escaped) in a string formatted as json. The search works wonderfully in Notepad++ (using regex-matching, of course) but preg_match_all() just returns an empty array.
Testing on tryphpregex.com I found out that somehow my usual approach to escaping a backslash gives a pattern error, i.e. even the simple pattern https:\\ returns an empty result.
I'm utterly confused and have been trying to debug for too long so I may miss the obvious. Maybe one of you can see the simple error?
The string.
The pattern (that works fine in Notepad++, but not in PHP):
%(https:\\/\\/play.spotify.com\\/track\\/)(.*?)(\")%
You don't need to escape the slash in PHP: %(https://play.spotify.com/track/)(.*?)(\")%
The backslash before the double quote is only needed if the pattern string itself is enclosed in double quotes.
Found a solution to my problem.
According to this site, I need to match every backslash with \\\\. Horrible, but true.
So my pattern becomes:
$pattern = "%(https:\\\\/\\\\/play\.spotify\.com\\\\/track\\\\/)(.*?)(\")%";
Please observe that I tried to find a pattern inside a string that didn't contain clear urls, but urls containing escape characters (it was a json-output from spotify)
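A minimal sketch of two ways to handle this, assuming a made-up JSON fragment (the track ID abc123 is hypothetical). In PHP source, \\\\ is two backslashes, which the regex engine reads as one literal backslash. Decoding the JSON first sidesteps the problem, because json_decode() turns \/ back into /.

```php
<?php
// Hypothetical input: JSON text with escaped slashes.
$json = '{"url":"https:\/\/play.spotify.com\/track\/abc123"}';

// Option 1: match the raw JSON text. Each \\\\ below reaches PCRE as \\,
// which matches one literal backslash in the subject.
$pattern = '%(https:\\\\/\\\\/play\.spotify\.com\\\\/track\\\\/)(.*?)(")%';
preg_match_all($pattern, $json, $matches);
echo $matches[2][0], PHP_EOL; // abc123

// Option 2: decode first, then match the plain URL.
$data = json_decode($json, true);
preg_match('%https://play\.spotify\.com/track/(.*)%', $data['url'], $m);
echo $m[1], PHP_EOL; // abc123
```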

how to escape single quote from ODBC query [duplicate]

This question already has an answer here:
Closed 10 years ago.
Possible Duplicate:
Correct way to escape input data before passing to ODBC
The error I get when running an ODBC query is this:
(pos: 72 '...M = 'Owen O'^Donavon' AND...') - syntax error
and when I try to escape it:
(pos: 73 '... = 'Owen O\'^Donavon' AND...') - syntax error
The ^ marks where it breaks.
I have tried the following:
NAM = '".$var."'
And also this:
NAM = '".mysql_escape_string($var)."'
then I got desperate
NAM = \"".$var."\"
Where $var is any name that contains a ' in it.
If you need the whole query:
UPDATE TABLE SET COLUMN1 = 'ERR' WHERE COLUMN_NAM = '".mysql_escape_string($var)."' AND COLUMN7 = 0");
Does anybody know how I can get the quote properly escaped?
To include a single quote within a MySQL string literal (which is delimited by single quotes), use two single quote characters. e.g.
'I don''t like it'
Effectively, when MySQL parses that, it sees the two single quote characters and interprets them as one single quote within the literal, rather than as the end of the string literal.
But (as you are finding out) when you have only one single quote in there, the MySQL parser has a hissy fit over it. Consider this example:
'I don't like it'
What the MySQL parser sees there is a string literal, five characters in length, containing 'I don'. Then MySQL sees that literal as being followed by some more tokens that need to be parsed: t like it. The parser does NOT see that as part of a string literal. That previous single quote marked the end of the string literal.
So now, the MySQL parser can't make heads or tails of what t like it is supposed to be. It sees the single quote following these tokens as the beginning of another string literal. (So, you could be very clever about what appears there, and manage to get something that MySQL does understand... and that would probably be even worse.)
(NOTE: this issue isn't specific to ODBC; this affects clients that make use of string literals in MySQL query text.)
One way to avoid this type of problem is to use bind variables in your query text instead of string literals. (With MySQL, though, the escaping still happens anyway: what gets sent to the MySQL server behind the scenes, so to speak, is a string literal.)
Sometimes we DO need to include string literals in our query text, and we shouldn't be required to use bind variables as a workaround. So it's good to know how to "escape" a single quote within a string literal which is enclosed in single quotes.
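As a minimal sketch of the doubling rule in PHP: quoteSqlLiteral() is a hypothetical helper, not a library function, and it handles only the single quote. It assumes the target database does not also treat the backslash as an escape character inside literals.

```php
<?php
// Hypothetical helper: wrap a value in single quotes and double any
// single quote inside it, as described above.
function quoteSqlLiteral(string $value): string
{
    return "'" . str_replace("'", "''", $value) . "'";
}

$var = "Owen O'Donavon";
$sql = "UPDATE TABLE SET COLUMN1 = 'ERR' WHERE COLUMN_NAM = "
     . quoteSqlLiteral($var)
     . " AND COLUMN7 = 0";

echo $sql, PHP_EOL;
// UPDATE TABLE SET COLUMN1 = 'ERR' WHERE COLUMN_NAM = 'Owen O''Donavon' AND COLUMN7 = 0
```

Where the ODBC driver supports parameters, odbc_prepare() and odbc_execute() with ? placeholders avoid building the literal by hand at all.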

Which regular expression to use in order to determine which characters to escape for html attributes and javascript?

I am adopting some code from Twig (a php template framework) for escaping html and js output. Now I don't entirely understand the regex they are using.
For the full Twig code:
git clone git://github.com/fabpot/Twig.git
// the code is in Core.php in the function twig_escape_filter
They use:
preg_replace_callback( '#[^a-zA-Z0-9,\._]#Su' , '_twig_escape_js_callback' , $string ); // for javascript
preg_replace_callback( '#[^a-zA-Z0-9,\.\-_]#Su' , '_twig_escape_html_attr_callback' , $string ); // for html attributes
Where the callback functions will replace everything that corresponds to the negative character class.
As far as I can tell, this is equivalent (getting rid of some backslashes):
'#[^a-zA-Z0-9,._]#Su'
'#[^a-zA-Z0-9,._-]#Su'
Now we see that for javascript they allow commas, which I don't understand because a comma is a control character in a javascript context. Take this example of a comma exploit:
// say we have a function call to a javascript function like this
function ajax( timeout, onerror, onsuccess ) {...};
// now assume I get the timeout value from somewhere dodgy (in php)
$timeout = escapeJS( '1000, evilCallback, evilCallback2' );
echo "ajax( $timeout, myErrorHandler, mySuccessHandler );";
Note that javascript will happily ignore the extra parameters...
In the html attribute, the idea is to prevent closing the attribute, hence they don't allow spaces, since it is common to write attributes without quotes and in html4 it is legal as well. However, I see spaces used in attributes for giving multiple classes to an element like: <tr class="tablerow odd">. So disallowing spaces prevents class attributes like this from coming from a database with templates or other sources...
Given that in xhtml it is forbidden to use attributes without quotes and my site generates xhtml strict doctype, can I afford to allow spaces?
Should I forbid the comma for javascript?
You should use htmlspecialchars for escaping HTML and json_encode for escaping Javascript.
$timeout = json_encode('1000, evilCallback, evilCallback2');
echo "ajax( $timeout, myErrorHandler, mySuccessHandler );";
Output:
ajax( "1000, evilCallback, evilCallback2", myErrorHandler, mySuccessHandler );
In your case you should also validate the actual content of the $timeout var, or cast it to int like this:
$timeout = json_encode((int)'1000, evilCallback, evilCallback2');
echo "ajax( $timeout, myErrorHandler, mySuccessHandler );";
Output:
ajax( 1000, myErrorHandler, mySuccessHandler );
The json_encode is not really needed when you cast to int, because PHP integers are also valid JS integers, but it is a good practice to escape all your data for the appropriate context nevertheless.
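When the encoded value lands inside an inline <script> block, json_encode() also accepts flags that hex-escape HTML-significant characters. A minimal sketch; the payload string is made up for illustration:

```php
<?php
// Made-up hostile value that tries to close the surrounding script block.
$payload = '</script><script>alert(1)</script>';

// JSON_HEX_* flags replace < > & ' " with \uXXXX sequences, so the HTML
// parser never sees a closing tag inside the JavaScript string.
$flags = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
$js = json_encode($payload, $flags);

echo "var message = $js;", PHP_EOL;
```

The result is still a valid JavaScript string literal that decodes back to the original value.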
Update: Regarding the Twig code you're trying to adapt, it seems that it does not produce actual Javascript literals, but escapes strings for inclusion into Javascript literals. This is apparent from the actual use of \xHH escape codes, which in JS are valid only inside strings (and regular expressions, but that's beside the point). It should be used like this:
$timeout = escapeJS('1000, evilCallback, evilCallback2');
echo "ajax('$timeout', myErrorHandler, mySuccessHandler);";
Notice extra quotes around $timeout in the echo. This is likely done this way to allow composition of longer JS strings from multiple escaped parts, like 'foo $escaped_part1 bar $escaped_part2 baz'.
What I found on XSS (Cross Site Scripting) Prevention Cheat Sheet:
For HTML attributes:
Properly quoted attributes can only be escaped with the corresponding quote. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and |.
I suppose looking at it like that means that there is no way to both be protected against unquoted attributes and have spaces in your attributes. I suppose the escape function could add the quotes itself, but that would be inconsistent and create situations where values would be quoted twice, basically unquoting them... So, for now I have made two escaping functions, allowing the user to explicitly call the one that allows the space, knowing that they must put quotes.
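For the quoted case, a minimal sketch using htmlspecialchars() with ENT_QUOTES. The class value is a made-up hostile input, and the template is assumed to always wrap the attribute in double quotes:

```php
<?php
// Made-up value that tries to break out of a double-quoted attribute.
$class = 'tablerow odd" onmouseover="evil()';

// ENT_QUOTES encodes both quote characters; spaces are left as they are,
// so multiple class names still work.
$safe = htmlspecialchars($class, ENT_QUOTES, 'UTF-8');

echo '<tr class="' . $safe . '">', PHP_EOL;
// <tr class="tablerow odd&quot; onmouseover=&quot;evil()">
```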
Considering javascript:
Except for alphanumeric characters, escape all characters less than 256 with the \xHH format to prevent switching out of the data value into the script context or into another attribute. DO NOT use any escaping shortcuts like \" because the quote character may be matched by the HTML attribute parser which runs first. These escaping shortcuts are also susceptible to "escape-the-escape" attacks where the attacker sends \" and the vulnerable code turns that into \\" which enables the quote.
If an event handler is properly quoted, breaking out requires the corresponding quote. However, we have intentionally made this rule quite broad because event handler attributes are often left unquoted. Unquoted attributes can be broken out of with many characters including [space] % * + , - / ; < = > ^ and |. Also, a closing tag will close a script block even though it is inside a quoted string because the HTML parser runs before the JavaScript parser.
This seems to indicate that we should escape everything. I have opted to keep underscore, since that can be part of javascript names and dot in order to allow inserting numerical values with a decimal point. I hope that leaves no vulnerabilities.
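A minimal sketch of that choice, not Twig's actual callback: escapeJsString() is a hypothetical name, and the sketch assumes ASCII input (it works byte by byte, so multi-byte UTF-8 characters would need separate handling). The result is meant to be placed inside a quoted JavaScript string.

```php
<?php
// Hypothetical escaper: keep letters, digits, dot and underscore;
// turn every other byte into a \xHH sequence.
function escapeJsString(string $value): string
{
    return preg_replace_callback(
        '#[^a-zA-Z0-9._]#S',
        function (array $match): string {
            return sprintf('\\x%02X', ord($match[0]));
        },
        $value
    );
}

echo escapeJsString('1000, evil'), PHP_EOL; // 1000\x2C\x20evil
```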
I suppose the Twig code has a bug leaving that comma around and I will file a report so they can look into it.

What does it mean to escape a string?

I was reading Does $_SESSION['username'] need to be escaped before getting into an SQL query? and it said "You need to escape every string you pass to the sql query, regardless of its origin". Now I know something like this is really basic. A Google search turned up over 20,000 results. Stack Overflow alone had 20 pages of results, but no one actually explains what escaping a string is or how to do it. It is just assumed. Can you help me? I want to learn because as always I am making a web app in PHP.
I have looked at:
Inserting Escape Characters, What are all the escape characters in Java?,
Cant escape a string with addcslashes(),
Escape character,
what does mysql_real_escape_string() really do?,
How can i escape double quotes from a string in php?,
MySQL_real_escape_string not adding slashes?,
remove escape sequences from string in php
I could go on but I am sure you get the point. This is not laziness.
Escaping a string means to reduce ambiguity in quotes (and other characters) used in that string. For instance, when you're defining a string, you typically surround it in either double quotes or single quotes:
"Hello World."
But what if my string had double quotes within it?
"Hello "World.""
Now I have ambiguity - the interpreter doesn't know where my string ends. If I want to keep my double quotes, I have a couple options. I could use single quotes around my string:
'Hello "World."'
Or I can escape my quotes:
"Hello \"World.\""
Any quote that is preceded by a backslash is escaped, and understood to be part of the value of the string.
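In PHP, the two options above produce exactly the same string value, which a short check makes visible:

```php
<?php
$escaped = "Hello \"World.\"";  // double-quoted, inner quotes escaped
$single  = 'Hello "World."';    // single-quoted, no escaping needed

echo $escaped, PHP_EOL;         // Hello "World."
var_dump($escaped === $single); // bool(true)
```

The backslashes exist only in the source code; they are not part of the stored string.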
When it comes to queries, MySQL has certain keywords it watches for that we cannot use in our queries without causing some confusion. Suppose we had a table of values where a column was named "Select", and we wanted to select that:
SELECT select FROM myTable
We've now introduced some ambiguity into our query. Within our query, we can reduce that ambiguity by using back-ticks:
SELECT `select` FROM myTable
This removes the confusion we've introduced by using poor judgment in selecting field names.
A lot of this can be handled for you by simply passing your values through mysql_real_escape_string(). In the example below you can see that we're passing user-submitted data through this function to ensure it won't cause any problems for our query:
// Query
$query = sprintf("SELECT * FROM users WHERE user='%s' AND password='%s'",
mysql_real_escape_string($user),
mysql_real_escape_string($password));
Other methods exist for escaping strings, such as addslashes, addcslashes, quotemeta, and more, though you'll find that when the goal is to run a safe query, by and large developers prefer mysql_real_escape_string or pg_escape_string (in the context of PostgreSQL).
Some characters have special meaning to the SQL database you are using. When these characters are being used in a query they can cause unexpected and/or unintended behavior including allowing an attacker to compromise your database. To prevent these characters from affecting a query in this way they need to be escaped, or to say it a different way, the database needs to be told to not treat them as special characters in this query.
In the case of mysql_real_escape_string(), it escapes \x00, \n, \r, \, ', " and \x1a, as these, when not escaped, can cause the previously mentioned problems, which include SQL injection against a MySQL database.
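To make that list concrete, here is an illustration of the substitutions, written with strtr(). illustrateMysqlEscape() is a hypothetical name, and this is only a sketch of the idea: it knows nothing about the connection's character set, so it is not a replacement for the driver's own escaping or for bound parameters.

```php
<?php
// Illustration only: prefix each special character with a backslash,
// using the conventional escape sequences for the control characters.
function illustrateMysqlEscape(string $value): string
{
    return strtr($value, [
        "\x00" => '\\0',
        "\n"   => '\\n',
        "\r"   => '\\r',
        '\\'   => '\\\\',
        "'"    => "\\'",
        '"'    => '\\"',
        "\x1a" => '\\Z',
    ]);
}

echo illustrateMysqlEscape("O'Reilly"), PHP_EOL; // O\'Reilly
```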
For simplicity, you could basically imagine the backslash "\" to be a command to the interpreter during runtime.
For example, while interpreting this statement:
$txt = "Hello world!";
during the lexical analysis phase (or when splitting up the statement into individual tokens), these would be the tokens identified:
$, txt, =, ", Hello world!, ", and ;
However, a backslash within the string causes an extra set of tokens and is interpreted as a command to do something with the character that immediately follows it.
For example,
$txt = "this \" is escaped";
results in the following tokens:
$, txt, =, ", this, \, ", is escaped, ", and ;
The interpreter already knows (or has preset routes it can take) what to do based on the character that follows the \ token. So in the case of " it proceeds to treat it as a character and not as the end-of-string command.

Searching MySQL for data that contains backslashes

In a database, I have some text stored in a field called Description. The value of the string saved in my database is Me\You "R'S'" % and that's how it appears when querying the database from the command line.
Now, on a web page I have a function which searches this field like so:
WHERE Description LIKE '%$searchstring%'
So when $searchstring has been cleaned, if I was searching for Me\You, the backslash gets escaped and my query reads:
WHERE Description LIKE '%Me\\You%'
However it doesn't return anything.
The strange part is that when I search for Me\\You or Me\\\You (so two or three backslashes, no fewer and no more) it returns the result I expect, with one backslash.
When querying from the command line, it does not return a result for:
WHERE Description LIKE '%Me\You%'
or when I use two or three backslashes.
However, it will return the result if I use 4 to 7 backslashes, for example:
WHERE Description LIKE '%Me\\\\\\\You%'
will return the string which is Me\You "R'S'" %
Does anyone know why this happens? Thanks
Note
Because MySQL uses C escape syntax in strings (for example, “\n” to represent a newline character), you must double any “\” that you use in LIKE strings. For example, to search for “\n”, specify it as “\\n”. To search for “\”, specify it as “\\\\”; this is because the backslashes are stripped once by the parser and again when the pattern match is made, leaving a single backslash to be matched against.
Source: http://dev.mysql.com/doc/refman/5.1/en/string-comparison-functions.html#operator_like
See Need to select only data that contains backslashes in MySQL for how to use double backslash escaping. You could also run MySQL in NO_BACKSLASH_ESCAPES mode (http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html#sqlmode_no_backslash_escapes).
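The two layers of stripping can be applied in code, one after the other. A minimal sketch, assuming MySQL's default SQL mode (backslash escapes enabled); likePatternLiteral() is a hypothetical helper, and bound parameters would remove the need for the second layer:

```php
<?php
// Hypothetical helper: turn a raw search term into a quoted LIKE pattern.
function likePatternLiteral(string $search): string
{
    // Layer 1, the LIKE matcher: \ % and _ each need a leading backslash.
    $forLike = strtr($search, ['\\' => '\\\\', '%' => '\\%', '_' => '\\_']);

    // Layer 2, the string-literal parser: double backslashes and single quotes.
    $forLiteral = strtr($forLike, ['\\' => '\\\\', "'" => "''"]);

    return "'%" . $forLiteral . "%'";
}

// 'Me\\You' in PHP source is the six-character string Me\You.
echo 'WHERE Description LIKE ', likePatternLiteral('Me\\You'), PHP_EOL;
// WHERE Description LIKE '%Me\\\\You%'
```

One backslash in the search term becomes four in the query text, which matches the documentation note above.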
Although this is an old post: you can bypass this limitation by using the REPLACE function to change the backslash to another character in the WHERE clause. For example:
WHERE REPLACE(Description, '\\', '-') LIKE '%Me-You%'
