Wrapping words in a SQL query string with regex - php

I have a database class written in PHP that should take care of some things I don't want to care about. One of its features is handling the decryption of columns that are encrypted with MySQL's AES functions.
This works perfectly in normal cases (which, in my opinion, means there is no alias such as "AS bla_bla" in the query string). But if someone writes a query whose alias contains the name of a column the script should decrypt, the query dies, because my regex wraps not only the column but also the alias. That is not how it's supposed to be.
This is the regex I've written:
$query = preg_replace("/(((\`|)\w+(\`|)\.|)(encrypted|column|list))/i", "AES_DECRYPT(\${0}, 'the hash')", $query);
The part with the grave accents is there because sometimes the query does contain the table name which is either inside of grave accents or not.
An example input:
SELECT encrypted, something AS 'a_column' FROM a_table;
An example output:
SELECT AES_DECRYPT(encrypted, 'the hash'), something AS 'a_AES_DECRYPT(column, 'the hash')' FROM a_table;
As you can see, this is not going to work, so my idea was to match only words that do not come right after the word 'as', up to the next special character or whitespace. Of course I tried for hours to get it to work, but I can't get the syntax right.
Is it possible to solve this with pure regex, and if so, what would it look like?

This should get you started:
$quoted_name = '(\w+|`\w+`|"\w+"|\'\w+\')';
preg_match("/^SELECT ((, )?$quoted_name( AS $quoted_name)?)* FROM $quoted_name;$/", "SELECT encrypted, something AS 'a_column' FROM a_table;", $m);
var_dump($m);
The replacement parts should be easy to spot and write after you study the var_dump output.
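As a rough sketch of the "don't touch the alias" idea from the question, one option is to let the regex consume quoted strings and `AS alias` parts first and discard them with PCRE's `(*SKIP)(*FAIL)` verbs, then wrap whatever column names remain. The function name `wrapEncryptedColumns` and its parameters are made up for this example, and a regex like this only covers simple queries; it is not a SQL parser.

```php
<?php
// Hypothetical helper: wraps the listed column names in AES_DECRYPT(...),
// leaving single-quoted strings and "AS alias" parts untouched.
function wrapEncryptedColumns(string $sql, array $columns, string $key): string
{
    $names = implode('|', array_map(fn($c) => preg_quote($c, '/'), $columns));

    $pattern = '/'
        . "'[^']*'(*SKIP)(*FAIL)"                       // skip single-quoted strings
        . '|\bAS\s+`?\w+`?(*SKIP)(*FAIL)'               // skip bare or backticked aliases
        . '|(?:`?\w+`?\.)?`?\b(?:' . $names . ')\b`?'   // optional table prefix + column
        . '/i';

    // A callback avoids having to escape "$" or "\" in the replacement text.
    // Assumption: $key contains no single quote; it is pasted into the SQL as-is.
    return preg_replace_callback(
        $pattern,
        fn($m) => "AES_DECRYPT({$m[0]}, '{$key}')",
        $sql
    );
}

$sql = "SELECT encrypted, something AS 'a_column' FROM a_table;";
echo wrapEncryptedColumns($sql, ['encrypted', 'column', 'list'], 'the hash'), "\n";
```

The `\b` word boundaries alone already stop `column` from matching inside `a_column`, because `_` counts as a word character; the two skip branches additionally cover an alias that is spelled exactly like an encrypted column.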


Ignore accents in greek words when using elastic search

I have a search input box that supports autosuggestion logic. The results are fetched from an elastic index whose analyzers (index_analyzer, search_analyzer) use as tokenizer: nGram, and as filter: standard, lowercase and asciifolding for the search_analyzer & lowercase and asciifolding for the index_analyzer respectively.
What I am struggling to achieve, so far without success, is to get results even when the user types a Greek word without the accent (tonos). With the accent, the user gets proper results and the mechanism works as expected.
I have to mention that the given string is matched against a specific field in the document set, and that field contains Greek words with accents. Moreover, the field is of datatype string and is set to be analyzed.
The query is formed (using example string without accent that highlights the problem):
$searchString = mb_strtolower('Προταση', 'UTF-8');
$queryText = new Elastica_Query_Text();
$queryText->setField('name', $searchString);
$query = new Elastica_Query();
$query->setQuery($queryText);
A quick solution, though not an appropriate one because it is rather heavy for this purpose, is to form a fuzzy query with min_similarity set to 0.7. That works, but the cost is significant.
All the work has been done with Elastica, in PHP. Could you please help me solve this problem? It is important to me that a solution be found.
Thank you in advance.

Why not backslash every empty space to prevent mysql injections

I've been wondering about this for a few months now, but I still don't know the answer, other than a possible performance cost. Long story short: instead of having all this PDO code everywhere, why not just put a backslash before every character?
$String = $_POST["attack"]; // SOME THING' OR 1 = 1 --
$String = fFilter( $String ); // \S\O\M\E\ \T\H\I\N\G\'\ \O\R\ \1\ \=\ \1\ \-\-
Now I haven't been into this SQL stuff in a while, so I can't give a perfect example, but basically the SQL string would look like this: SELECT * FROM account WHERE id = '\S\O\M\E\ \T\H\I\N\G\'\ \O\R\ \1\ \=\ \1\ \-\-'. Something like that always seemed pretty safe to me, but I haven't heard of anyone using it, or of a reason not to use it. I always read that things like filtering HTML aren't good enough, but I don't see why you can't just escape every single character, since any attack would look like \a\t\t\a\c\k.
Because placing a backslash before certain characters changes their meaning entirely. For instance, \t is a tab character, not t, so \a\t\t\a\c\k would be transformed to (with <TAB> standing for a literal tab character):
a<TAB><TAB>ack
A full list of such sequences is given at:
http://dev.mysql.com/doc/refman/5.5/en/string-literals.html
As several other people have mentioned, use parameterized queries, not input escaping.
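A minimal sketch of what "parameterized queries" means in practice, using PDO. To keep it self-contained it runs against an in-memory SQLite database (an assumption made for the example; the question is about MySQL, where only the DSN would differ), and the `account` table and the `findAccount` helper are made up for illustration.

```php
<?php
// In-memory SQLite stands in for the real database (assumes pdo_sqlite is installed).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec("CREATE TABLE account (id TEXT, name TEXT)");
$pdo->exec("INSERT INTO account VALUES ('42', 'alice')");

// Hypothetical helper: the value travels separately from the SQL text,
// so it is never parsed as SQL, whatever characters it contains.
function findAccount(PDO $pdo, string $id): array
{
    $stmt = $pdo->prepare('SELECT * FROM account WHERE id = :id');
    $stmt->execute([':id' => $id]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$attack = "SOME THING' OR 1 = 1 --";
$rows   = findAccount($pdo, $attack); // compared as a plain string, matches nothing
$legit  = findAccount($pdo, '42');    // matches the one real row

var_dump(count($rows), count($legit));
```

No escaping function is involved at all: the attack string is simply an id that does not exist.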

MySQL search for value containing spaces returns empty result (no rows)

I have a MySQL database with a column containing part numbers. Some of the part numbers contain spaces:
3864205010  J
When I query the database or search for the part in phpMyAdmin no results are returned.
Yet when I delete the 2 spaces and then type them again, the search returns a result.
This query does not return a result:
SELECT *
FROM `parts`
WHERE `part_no` LIKE '3864205010  K'
This query returns the result:
SELECT *
FROM `parts`
WHERE `part_no` LIKE '3864205010 K'
They look the same but in the second query I have deleted the 2 spaces before "K" and typed the spaces again.
If you can, use a wildcard instead of the spaces:
SELECT *
FROM `parts`
WHERE `part_no` LIKE '3864205010%K'
This is probably not a space but a horizontal tab (ASCII code 9), or even a line feed or carriage return (10 and 13). Copy and paste it into a good text editor and you'll see what it really is.
As for why it doesn't work even though it looks like a space: every character we see is interpreted by whatever renders it (Notepad, phpMyAdmin, Firefox... any software that renders text).
What actually happens is that when the engine finds a character code, it turns it into something visible. CHAR(9), for example, is often rendered as a 'big space', usually as wide as 2 or 4 spaces. But phpMyAdmin might decide not to do it that way.
Another example is the line feed (CHAR(10)). In a text editor it signals that the line ends and (mostly on Unix systems) that a new line has to start. You can still copy a line feed into a database field; you just can't be sure how it will be rendered.
Because the tool wants you to see everything in the cell, it may choose to render it as a space... but that's NOT a space if you look at its ASCII code (and here there's no trick: every tool will report the right code).
It is important to always identify characters by their ASCII codes.
There's an answer above that suggests using a wildcard instead of the spaces. That might match the right row, or it might match too much. Let's say a row holds '3864205010ABCK': it is not the one you're looking for, yet the LIKE '3864205010%K' pattern would still return it. The best approach is probably to use a regular expression, or at least to identify the fixed pattern of these strings.
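To check what the "spaces" really are without relying on how a tool renders them, a small PHP sketch like the following can print the code of every non-printable byte. `describeBytes` is a made-up helper name, and the two sample strings are assumptions standing in for a value fetched from the `part_no` column.

```php
<?php
// Hypothetical helper: keeps visible ASCII characters as they are and
// replaces every other byte (space, tab, line feed, ...) with its code in brackets.
function describeBytes(string $s): string
{
    $parts = [];
    foreach (str_split($s) as $byte) {
        $code = ord($byte);
        $parts[] = ($code > 32 && $code < 127) ? $byte : sprintf('[%d]', $code);
    }
    return implode('', $parts);
}

echo describeBytes("3864205010\tK"), "\n"; // a tab shows up as [9]
echo describeBytes("3864205010  K"), "\n"; // real spaces show up as [32][32]
```

Two strings that look identical on screen give visibly different output here, which tells you which character the search pattern has to contain.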
If, as in the updated question, you want to collapse runs of 3 or 4 spaces into a single space, the query below should be useful:
SELECT REPLACE(REPLACE(part_no, '  ', ' '), '  ', ' ') FROM parts;
Let me know if it works for you.
SELECT *
FROM `parts`
WHERE REPLACE(REPLACE(`part_no`, CHAR(9), ''),' ','') LIKE REPLACE(REPLACE('3864205010 K', CHAR(9), ''),' ','')
This will probably work if part_no and/or search string has tabs and/or spaces.

Blocking Cuss/Vulgar/Obscenity Terms in PHP

I know you might laugh, but actually this is a common need in most apps. Many apps that take in customer/visitor input may need to filter cuss words or vulgar terms.
Sometimes PHP changes and new stuff gets added in. For instance, just the other day I learned about MultiCurl API in PHP5. So, anyway, is there a new native function in PHP that lets me filter most common English-based cuss words in a string, as well as flip a boolean to say, "string had English-based cuss words in it"? It doesn't need to be perfect, obviously, but cut out a good bit of garbage and let me replace it with ### for instance.
If that's not part of PHP yet, then does anyone have a function that I can use which cloaks the cuss word list? For instance, I want it such that I can drop the class in a project and not have to worry about another programmer getting offended. In other words, a decently encoded cuss word list -- not one actually spelled out.
Now, obviously it needs to be flexible and let words like "rebuttal" get through.
tl;dr: Does PHP5 now have a native function that can filter obscene words? And if not, does anyone have a class that encodes a cuss word list so that it doesn't offend other programmers?
I doubt this is something that would be a high priority for the core PHP team since that treads dangerously close to censorship. Censorship in that they would have a 'master' list of 'inappropriate' language which should be filtered.
You can do this fairly simply. Make up an array of patterns for all the words you want filtered out and, when a page that contains user input is displayed, run preg_replace() over the text. (preg_filter() takes the same arguments, but it returns null when nothing matches, so clean text would come back empty.)
$bad_words = array('/bleeping/i', '/blooping/i'); // one regex per word
$replace = '###';
$submitted_text = 'bleh blah....';
echo preg_replace($bad_words, $replace, $submitted_text);
Note: you will have to deal with the edge cases where a bad word might be inside of a good word (e.g. 'shitzu[sic] dog')
EDIT
For the bad-words-inside-good-words issue, you can extend the regular expression to require a space (or, more robustly, a \b word boundary) at the beginning and end of the bad word. If you have lots of submissions, though, it's going to be a constant battle to keep up with the trolls.
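A small sketch that puts these pieces together: word boundaries so that "rebuttal" gets through, a boolean that reports whether anything was replaced, and a word list stored base64-encoded so it is not spelled out in the source, as the question asks. The function name `censor` and the two sample words are assumptions made for the example; base64 only hides the words from a casual reader, it is not a security measure.

```php
<?php
// Hypothetical helper: returns [cleaned text, whether any word was masked].
function censor(string $text, array $badWords, string $mask = '###'): array
{
    $alternatives = implode('|', array_map(fn($w) => preg_quote($w, '/'), $badWords));
    // \b keeps the match on whole words only; i = case-insensitive, u = UTF-8.
    $pattern = '/\b(?:' . $alternatives . ')\b/iu';
    $clean = preg_replace($pattern, $mask, $text, -1, $count);
    return [$clean, $count > 0];
}

// The list is kept encoded; these decode to 'bleeping' and 'butt'.
$badWords = array_map('base64_decode', ['YmxlZXBpbmc=', 'YnV0dA==']);

[$clean, $hadBadWords] = censor('A bleeping rebuttal', $badWords);
var_dump($clean, $hadBadWords);
```

The fifth argument of preg_replace() receives the number of replacements, which is what drives the boolean.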
<?php
$badwords = "fuc";
$replacebad = "****";
$string = $_POST['something'];
$filtered = str_ireplace($badwords, $replacebad, $string);
echo $filtered;
?>
Something like this?
Edit:
Sorry, I didn't notice the PHP5 part.

PHP Regex for human names

I've run into a bit of a problem with a regex I'm using for human names.
$rexName = "/^[a-z' -]+$/i";
Suppose a user with the name Jürgen wishes to register? Or Böb? That's pretty commonplace in Europe. Is there a special notation for this?
EDIT: I just threw the name Jürgen at a regex creator, and it splits the word at the letter ü...
http://www.txt2re.com/index.php3?s=J%FCrgen+Blalock&submit=Show+Matches
EDIT2: All right, since checking for such specific things is hard, why not use a regex that simply checks for illegal characters?
$rexSafety = "/^[^<,\"#/{}()*$%?=>:|;#]*$/i";
(now which ones of these can actually be used in any hacking attempt?)
For instance, this allows ' and - signs, yet you need a ; to make an attack work in SQL, and those will be stopped. Are there any other characters commonly used for HTML injection or SQL attacks that I'm missing?
I would really say : don't try to validate names : one day or another, your code will meet a name that it thinks is "wrong"... And how do you think one would react when an application tells him "your name is not valid" ?
Depending on what you really want to achieve, you might consider using some kind of blacklist / filters, to exclude the "not-names" you thought about : it will maybe let some "bad-names" pass, but, at least, it shouldn't prevent any existing name from accessing your application.
Here are a few examples of rules that come to mind :
no number
no special character, like "~{()}#^$%?;:/*§£ø and probably some others
no more than 3 spaces?
none of "admin", "support", "moderator", "test", and a few other obvious non-names that people tend to use when they don't want to type in their real name...
(But if they don't want to give you their name, they still won't, even if you forbid them from typing some random letters: they could just use a real name... which is not theirs.)
Yes, this is not perfect; and yes, it will let some non-names pass... But it's probably way better for your application than telling someone "your name is wrong" (yes, I insist ^^ )
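The example rules above could be sketched as a single check like the one below. The function name `isProbablyNotAName` is made up, the reserved-word list is only the handful named above, and ø is deliberately left out of the forbidden characters because it occurs in real names (Søren, for instance). The code assumes the input and the source file are UTF-8.

```php
<?php
// Hypothetical blacklist check: true means "this is most likely not a name".
function isProbablyNotAName(string $name): bool
{
    $reserved = ['admin', 'support', 'moderator', 'test'];

    if (preg_match('/\p{N}/u', $name)) {                 // rule: no number
        return true;
    }
    if (preg_match('/["~{()}#^$%?;:\/*§£]/u', $name)) {  // rule: no special character
        return true;
    }
    if (substr_count($name, ' ') > 3) {                  // rule: no more than 3 spaces
        return true;
    }
    // rule: none of the obvious non-names (they are plain ASCII, so strtolower is enough)
    return in_array(strtolower(trim($name)), $reserved, true);
}

var_dump(isProbablyNotAName('Jürgen'));  // a real name passes
var_dump(isProbablyNotAName('admin'));   // an obvious non-name is caught
```

Being a blacklist, it errs on the side of accepting input: anything it does not explicitly recognise as a non-name is let through.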
And, to answer a comment you left under one other answer :
"I could just forbid the most common characters for SQL injection and XSS attacks,"
About SQL injection: you must escape your data before sending it to the database; and if you always escape that data (you should!), you don't have to care about what users may or may not input: as it is always escaped, there is no risk for you.
Same about XSS: as you always escape your data when outputting it (you should!), there is no risk of injection ;-)
EDIT : if you just use that regex like that, it will not work quite well :
The following code :
$rexSafety = "/^[^<,\"#/{}()*$%?=>:|;#]*$/i";
if (preg_match($rexSafety, 'martin')) {
    var_dump('bad name');
} else {
    var_dump('ok');
}
Will get you at least a warning :
Warning: preg_match() [function.preg-match]: Unknown modifier '{'
You must escape at least some of those special chars; I'll let you dig into the PCRE Patterns documentation for more information (there is really a lot to know about PCRE / regex, and I won't be able to explain it all).
If you actually want to check that none of those characters is inside a given piece of data, you might end up with something like that :
$rexSafety = "/[\^<,\"#\/\{\}\(\)\*\$%\?=>:\|;#]+/i";
if (preg_match($rexSafety, 'martin')) {
    var_dump('bad name');
} else {
    var_dump('ok');
}
(This is a quick and dirty proposition, which has to be refined!)
This one says "ok" (well, I definitely hope my own name is ok!)
And the same example with some specials chars, like this :
$rexSafety = "/[\^<,\"#\/\{\}\(\)\*\$%\?=>:\|;#]+/i";
if (preg_match($rexSafety, 'ma{rtin')) {
    var_dump('bad name');
} else {
    var_dump('ok');
}
Will say "bad name"
But please note I have not fully tested this, and it probably needs more work! Do not use this on your site unless you have tested it very carefully!
Also note that a single quote can be helpful when trying to do an SQL injection... But it is probably a character that is legal in some names... So just excluding some characters might not be enough ;-)
PHP’s PCRE implementation supports Unicode character properties that span a larger set of characters. So you could use a combination of \p{L} (letter characters), \p{P} (punctuation characters) and \p{Zs} (space separator characters):
/^[\p{L}\p{P}\p{Zs}]+$/u
(The u modifier makes PCRE treat pattern and subject as UTF-8; without it, multibyte letters such as ü are not matched as letters.)
But there might be characters that are not covered by these character categories, while some that are included might be ones you don't want to allow.
So I advise you against using regular expressions on a datum with such a vague range of values as a real person's name.
Edit: As you edited your question, I now see that you just want to prevent certain code injection attacks. You should escape those characters rather than reject them as a potential attack attempt.
Use prepared statements for SQL queries (or mysqli_real_escape_string; the old mysql_real_escape_string was removed in PHP 7), htmlspecialchars for HTML output, and other appropriate functions for other languages.
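A short sketch of both points, assuming UTF-8 input: the Unicode-property pattern with the u modifier for a loose plausibility check, and htmlspecialchars at output time instead of rejecting characters. `looksLikeName` is a made-up helper name.

```php
<?php
// Hypothetical helper: letters, punctuation and space separators only.
function looksLikeName(string $name): bool
{
    return preg_match('/^[\p{L}\p{P}\p{Zs}]+$/u', $name) === 1;
}

var_dump(looksLikeName('Jürgen'));            // letters outside a-z are accepted
var_dump(looksLikeName("Anne-Marie O'Neil")); // hyphen and apostrophe count as punctuation
var_dump(looksLikeName('R2D2'));              // digits are not in any of the three classes

// Escaping at output time makes markup harmless without forbidding any character.
$escaped = htmlspecialchars('<b>Jürgen</b>', ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";
```

The check and the escaping are independent: even a name that passes the pattern should still be escaped when it is printed into HTML.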
That's a problem with no easy general solution. The thing is that you really can't predict what characters a name could possibly contain. Probably the best solution is to define a negative character mask to exclude some special characters you really don't want to end up in a name.
You can do this using:
$regexp = '/^[^<put unwanted characters here>]+$/';
If you're trying to parse apart a human name in PHP, I recommend Keith Beckman's nameparse.php script.
