To secure an HTML GET form submission so that users cannot use MySQL wildcards to retrieve the whole table, I was looking for a list of PHP/MySQL wildcard characters.
For example, this GET URL takes lowrange and highrange values of 1 and 100 respectively, and returns the matching results within that range: example.com/form.php?lowrange=1&highrange=100
But my table values may range from -10000 to +10000, and a smart aleck might try to get the whole list by changing the URL to example.com/form.php?lowrange=%&highrange=% (or by using other special characters like *, ?, etc.).
The basic purpose is to block anything that could expose the whole table's values in one shot.
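So far, I've found the following characters to block, using preg_match:
// $value is one submitted parameter, e.g. $_GET['lowrange'];
// it is accepted only if it contains none of the listed characters
if (preg_match('/^[^~`!#$%^&*()?\[\]_]+$/', $value)) {
echo "OK";
}
else {
echo "NOT OK";
}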
Are there any other characters I should add to the list to completely block wildcard-based querying?
There are string fields and number fields. String fields use LIKE matching (WHERE field1 LIKE '%GET-FORM-VALUE%'), and number fields use equality and BETWEEN matching (WHERE field2 = $GET-FORM-VALUE, or WHERE field3 BETWEEN $GET-FORM-LOVALUE AND $GET-FORM-HIVALUE) in SQL.
Thank you.
There's no doubt that prepared statements are the best implementation and MUST be the norm.
But sometimes you get into a tricky scenario where it may not be possible to use them. For example, while working on a client project as an external vendor, I had to do a similar implementation without access to the code that made the database connection (I couldn't use execute_query, because the connection was set up differently in another config file). So I was forced to "sanitize" the incoming form values instead.
The only way to do that was to check which data types and values were expected, and which wildcard characters could be used to exploit the submission.
If that is your situation too, an alternative solution for your case (string LIKE matching, and numbers EQUAL TO or BETWEEN two given numbers) is as follows:
As soon as the form is submitted, the first thing to do on the backend is:
Check that the string field contains only letters, and BLOCK the percent sign and underscore.
// ^...$ anchors the whole value, so any non-letter (including % and _) fails
if (preg_match('/^[A-Za-z]+$/', $strfield) && !preg_match('/[%_]/', $strfield))
{
// all good...proceed to execute the query
}
else
{
// error message
}
Similarly, check that number fields contain only a number (the values in the question can be negative), e.g. if (preg_match('/^-?[0-9]+(\.[0-9]+)?$/', $nofield)), which also accepts floats.
Only if the above checks pass should you connect to the database and run the query. Add more checks on each field to block other wildcards as needed.
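The two checks above can be sketched in Python. This is only a sketch: the original is PHP, the helper names are hypothetical, and the -10000..10000 bounds are assumed from the range given in the question. `re.fullmatch` plays the role of the `^...$` anchors.

```python
import re

LETTERS_ONLY = re.compile(r"[A-Za-z]+")
# Optional minus sign, digits, optional decimal part (the question's range includes negatives).
NUMBER = re.compile(r"-?[0-9]+(\.[0-9]+)?")

def is_valid_string_field(value: str) -> bool:
    # fullmatch must cover the whole value, so '%', '_' or any other non-letter is rejected.
    return LETTERS_ONLY.fullmatch(value) is not None

def is_valid_number_field(value: str, low: float = -10000, high: float = 10000) -> bool:
    # Hypothetical bounds taken from the question's stated value range.
    if NUMBER.fullmatch(value) is None:
        return False
    return low <= float(value) <= high
```

For example, `is_valid_string_field("a_b")` and `is_valid_number_field("%")` are both `False`.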
Another option I implemented (it may not fit your case, but I mention it as food for thought): in addition to the above checks, first count the records that match the query. If the count is abnormally high, either throw an error asking the user to narrow the range and resubmit, or display a limited number of records per page, which makes it cumbersome to keep clicking through.
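The count-first idea can be sketched as follows. Python's built-in `sqlite3` stands in for MySQL here. The table `items(value)`, the threshold `MAX_ROWS` and the page size are hypothetical choices, not values from the answer.

```python
import sqlite3

MAX_ROWS = 500  # hypothetical threshold for an "abnormally high" count

def fetch_range(conn, low, high, page=0, page_size=50):
    low, high = int(low), int(high)  # raises ValueError for inputs like '%'
    # Count first, with bound parameters rather than string concatenation.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM items WHERE value BETWEEN ? AND ?", (low, high)
    ).fetchone()
    if count > MAX_ROWS:
        raise ValueError(f"{count} rows match; please narrow the range")
    # Otherwise return a single page of results.
    return conn.execute(
        "SELECT value FROM items WHERE value BETWEEN ? AND ? "
        "ORDER BY value LIMIT ? OFFSET ?",
        (low, high, page_size, page * page_size),
    ).fetchall()
```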
Again, to reiterate: use prepared statements if you can.
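For comparison, here is a minimal prepared-statement sketch that handles the question's LIKE and BETWEEN cases. It uses Python's `sqlite3` in place of MySQL, and the table and column names are hypothetical.

Binding parameters stops SQL injection, but a bound `%` or `_` is still a LIKE wildcard. The sketch therefore also escapes them and declares the escape character with `ESCAPE`. In MySQL, backslash is already the default LIKE escape character, and backslashes inside MySQL string literals need doubling.

```python
import sqlite3

def escape_like(term: str, escape: str = "\\") -> str:
    # Escape the escape character first, then the LIKE wildcards % and _.
    return (
        term.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )

def search(conn, text, low, high):
    # Bounds must be integers; int('%') raises ValueError instead of reaching SQL.
    low, high = int(low), int(high)
    pattern = "%" + escape_like(text) + "%"
    return conn.execute(
        "SELECT field1 FROM records "
        "WHERE field1 LIKE ? ESCAPE '\\' AND field3 BETWEEN ? AND ?",
        (pattern, low, high),
    ).fetchall()
```

With this approach, a user who submits `%` only matches rows that literally contain a percent sign.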
I have a new question, because I didn't find the answer anywhere.
I have a database table with 4 columns. I wrote a bot that inserts an array into one column. Now I have to fill the other columns.
The filled column contains site links. Example: www.dizipub.com/person-of-interest-1-sezon-2-bolum-izle
I need to take the "person-of-interest" part and write it to another column as "Person of Interest", and also turn "1-sezon-2-bolum" into "Sezon 1 - Bölüm 2" (season 1, episode 2).
I couldn't find how to do it in PHP rather than SQL; I need to do it from the bot. Can someone help me with this, please?
There is a column named bolumlink where I put the links. As I said, I need to extract some words from these links. For instance:
the dizi column needs to be filled with "Pretty Little Liars" in the first 9 rows.
It can be done with an SQL UPDATE using LIKE, which lets you select rows by pattern-based search with wildcards:
% matches any number of characters, even zero characters.
_ matches exactly one character.
update your_table set dizi = 'Pretty Little Liars' where bolumlink like '%pretty-little-liars%'
NOTE:
Updating your database using LIKE without a LIMIT, or without conditions on unique columns, can be dangerous. This code might affect the whole table if an empty string is passed, because the pattern then becomes '%%', which matches every row.
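The asker wanted to do the extraction in the bot rather than in SQL. Here is a minimal Python sketch of that parsing; the bot itself would use PHP's preg_match.

The sketch assumes every link has the shape `<show-slug>-<season>-sezon-<episode>-bolum-izle`, inferred from the single example given. The small-word list used for title casing is also a hypothetical choice. Each parsed pair could then be written back with an UPDATE using bound parameters.

```python
import re

# Assumed link shape: .../<show-slug>-<season>-sezon-<episode>-bolum-izle
SLUG = re.compile(r"(?P<show>.+?)-(?P<season>\d+)-sezon-(?P<episode>\d+)-bolum")
SMALL_WORDS = {"a", "an", "and", "in", "of", "on", "the"}  # kept lowercase mid-title

def parse_episode_link(link: str):
    slug = link.rstrip("/").rsplit("/", 1)[-1]
    m = SLUG.match(slug)
    if m is None:
        return None  # link does not have the assumed shape
    words = m.group("show").split("-")
    title = " ".join(
        w if i > 0 and w in SMALL_WORDS else w.capitalize()
        for i, w in enumerate(words)
    )
    episode = f"Sezon {int(m.group('season'))} - Bölüm {int(m.group('episode'))}"
    return title, episode
```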
I have a MySQL database that contains all the words in the standard English dictionary, which I am using to create a simple Scrabble word generator. The database is separated into 26 tables: one for each letter in the alphabet. Each table contains two columns:
"Word" column: this column is the primary key, is of type char(12), and does not accept null values.
"Length" column: this column contains an unsigned tinyint value and does not accept null values.
In my application, the user enters any number of letters into a textbox (representing their tiles) and I query the database using this code:
// this is looped over 26 times, and $char is a letter between 'A' and 'Z'
// check whether the user entered character $char or a blank tile (signified by ? in the app)
// this check saves me from querying useless tables
if (in_array($char, $lettersArray) || $blanks)
{
// if so, select all words whose length makes them possible to create
$query = 'SELECT Word FROM '.$char.'Words WHERE Length <= '.strlen($letters);
$result = $db->query($query);
$num_results = $result->num_rows;
for ($j = 0; $j < $num_results; $j++)
{
$row = $result->fetch_assoc();
// determine if it's possible to create $row['Word'] from the letters input
// if so, perform the appropriate code
}
}
Everything works, but my application takes a long time compared to the competition (theoretical competition, that is; this is more of a learning project for myself and I doubt I'll release it on the internet), even though the application runs on my local computer. I tried using the automatic optimization feature of phpMyAdmin, but it provided no noticeable speed increase.
I don't think the performance problem is really the database. The structure of your data store is going to have the most significant impact on the performance of your algorithm.
One fairly easy-to-understand approach to the problem would be to handle the problem as anagrams. You could alphabetize all of the letters in each of your words, and store that as a column with an index on it.
word      dorw
--------  -------
DALE      ADEL
LEAD      ADEL
LED       DEL
HELLO     EHLLO
HELP      EHLP
Then, given a set of letters, you could query the database for all matching anagrams. Just alphabetize the set of letters passed in, and run a query.
SELECT word FROM dictionary WHERE dorw = 'AERT'
-- returns:
-- RATE
-- TARE
-- TEAR
Then, you could query for subsets of the letters:
SELECT word FROM dictionary WHERE dorw IN ('AER','AET','ART','ERT')
Querying the full set first and then progressively smaller subsets returns the longest words first.
This isn't the most efficient approach, but it's workable.
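Here is a sketch of the anagram-key approach described above. It uses Python's `sqlite3` as a stand-in for MySQL, and the helper names are hypothetical.

- The `dorw` column holds each word's letters in sorted order, and it is indexed.
- Subsets of the rack are generated with `itertools.combinations`, longest first, and passed as bound parameters in an `IN (...)` list.

```python
import sqlite3
from itertools import combinations

def dorw(word: str) -> str:
    # The anagram key: the word's letters in alphabetical order.
    return "".join(sorted(word.upper()))

def build_dictionary(words):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dictionary (word TEXT PRIMARY KEY, dorw TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_dictionary_dorw ON dictionary (dorw)")
    conn.executemany(
        "INSERT INTO dictionary VALUES (?, ?)", [(w.upper(), dorw(w)) for w in words]
    )
    return conn

def playable_words(conn, letters: str, min_len: int = 2):
    found = []
    # Largest subsets first, so the longest words come back first.
    for size in range(len(letters), min_len - 1, -1):
        keys = sorted({dorw("".join(c)) for c in combinations(letters, size)})
        placeholders = ",".join("?" * len(keys))
        rows = conn.execute(
            f"SELECT word FROM dictionary WHERE dorw IN ({placeholders}) ORDER BY word",
            keys,
        ).fetchall()
        found.extend(w for (w,) in rows)
    return found
```

Blank tiles are not handled here. With a seven-tile rack, the largest `IN` list has 35 keys.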
Handling a "blank" tile is going to be more work: you'd need to substitute each possible letter for it, although checking all 26 possibilities can be done in one query.
If they have letters ABCD and the blank tile, for example...
SELECT word FROM dictionary WHERE dorw IN ('AABCD', 'ABBCD', 'ABCCD'
, 'ABCDD', 'ABCDE', 'ABCDF', ..., 'ABCDZ')
That gets more painful when you start dealing with the subsets...
(In crossword and Jumble puzzles, there aren't any blank tiles.)
So this may not be the most appropriate algorithm for Scrabble.
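The blank-tile query above does not have to be typed out by hand. Here is a small Python sketch, with hypothetical helper names, that generates the 26 sorted keys and a matching parameterized `IN` clause.

```python
from string import ascii_uppercase

def blank_keys(letters: str) -> list[str]:
    # Substitute each of A..Z for the blank and build the sorted-letter key.
    return sorted({"".join(sorted(letters.upper() + c)) for c in ascii_uppercase})

def blank_query(letters: str):
    keys = blank_keys(letters)
    placeholders = ",".join("?" * len(keys))
    sql = f"SELECT word FROM dictionary WHERE dorw IN ({placeholders})"
    return sql, keys
```

For `"ABCD"`, the keys come out as `AABCD`, `ABBCD`, `ABCCD`, `ABCDD`, `ABCDE`, ..., `ABCDZ`. Each subset of the rack would need its own such list, which is the painful part mentioned above.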
There are other algorithms that may be more efficient, especially at returning the shorter words first.
One approach is to build a tree.
The root node is a "zero"-letter word. Its children are the nodes for all one-letter words. Each node is marked as to whether it represents a valid word. The children of those nodes are all possible two-letter words, again marked as valid or not, and so on.
That will be a lot of nodes. For words up to 12 letters long, that's a total possible space of 1 + 26 + 26**2 + 26**3 + 26**4 + ... + 26**12, roughly 10**17 nodes.
But you wouldn't need to store every possible node, only the branches that lead to a valid word. You wouldn't have branches below ->Z->Z or ->X->Q.
However, you would have a branch under ->X->Y->L: even though XYL is not a word, it is the beginning of a branch leading to 'XYLOPHONE'.
But that's a tree traversal algorithm, which is fundamentally different.
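Here is a minimal Python sketch of that tree: a prefix tree, or trie, built from nested dicts.

- Only branches on the path to real words are stored.
- `"$"` is a hypothetical end-of-word marker, and `?` is the blank, as in the question's app.
- The search walks the tree with the player's tiles and prunes any branch that no remaining tile can play.

```python
from collections import Counter

def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word.upper():
            node = node.setdefault(ch, {})
        node["$"] = True  # marks that the path so far spells a valid word
    return root

def find_words(trie, tiles: str, blank: str = "?"):
    rack = Counter(tiles.upper())
    found = set()

    def walk(node, prefix):
        if node.get("$"):
            found.add(prefix)
        for ch, child in node.items():
            if ch == "$":
                continue
            if rack[ch] > 0:
                used = ch  # prefer a real tile and keep the blank for later
            elif rack[blank] > 0:
                used = blank
            else:
                continue  # no tile can play this letter: prune the whole branch
            rack[used] -= 1
            walk(child, prefix + ch)
            rack[used] += 1

    walk(trie, "")
    # Longest words first, then alphabetical.
    return sorted(found, key=lambda w: (-len(w), w))
```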
It sounds like you need to learn about indexes. If you created indexes in the database, then even with all the data in one table, it would not be querying "useless letters".
You should provide some more information, though: how long a query takes to return a result when you run it from the mysql console, and how long it takes to move that result from the database to the PHP engine. You might, for example, be bringing back a 100 MB result set with each query you run; if that is the case, limit the results to the first possible result or a small number of them.
To see how much data is being returned, manually run one of your queries in the console and check how many records come back. If the number is high, the data will take longer to pass to PHP, and your code must also iterate through many more results. You might want to consider dropping out of the for loop after you find the first word that can be accepted. If at least one word is possible, don't check again until another letter is placed.
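The indexing suggestion can be sketched with one table instead of 26. Python's `sqlite3` stands in for MySQL, and the `first_letter` column and index name are hypothetical.

A composite index on `(first_letter, length, word)` lets the database skip rows whose first letter is not in the rack. `EXPLAIN QUERY PLAN` is SQLite's counterpart of MySQL's `EXPLAIN` and shows whether the index is used. Blank tiles are not handled in this sketch.

```python
import sqlite3

def make_words_table(conn):
    # One table for all letters; first_letter is a hypothetical helper column.
    conn.execute(
        "CREATE TABLE words (word TEXT PRIMARY KEY, "
        "first_letter TEXT NOT NULL, length INTEGER NOT NULL)"
    )
    # Equality on first_letter, then a range on length; word makes the index covering.
    conn.execute("CREATE INDEX idx_words_first_len ON words (first_letter, length, word)")

def candidate_words(conn, tiles: str):
    letters = sorted(set(tiles.upper()))
    placeholders = ",".join("?" * len(letters))
    sql = f"SELECT word FROM words WHERE first_letter IN ({placeholders}) AND length <= ?"
    params = [*letters, len(tiles)]
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    plan = " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))
    rows = [w for (w,) in conn.execute(sql, params)]
    return rows, plan
```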
I know this question is about optimizing your database, but if I were doing this, I would read the words from the database only once, initialize some data structure, and search that structure instead of continually querying the database.
Sorry if this was completely irrelevant.
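Here is a sketch of the read-once idea in Python. The word list would be loaded a single time, for example at startup. The table and column names in `load_words_once` are hypothetical, and `?` is treated as the blank tile, as in the question.

Every check after loading is then an in-memory letter count rather than a query.

```python
from collections import Counter

def load_words_once(conn):
    # Read the whole dictionary a single time (hypothetical table/column names).
    return [w for (w,) in conn.execute("SELECT word FROM words")]

def can_make(word: str, tiles: str, blank: str = "?") -> bool:
    rack = Counter(tiles.upper())
    blanks = rack.pop(blank, 0)
    # Counter subtraction keeps only the letters the rack cannot cover.
    missing = Counter(word.upper()) - rack
    return sum(missing.values()) <= blanks

def playable(words, tiles):
    return [w for w in words if len(w) <= len(tiles) and can_make(w, tiles)]
```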
I'm working on a suggestions tool: when the user types a character into a text field, it gets sent to a script (using AJAX) and checked against a SQL DB. If any entries in the DB match the user's input, they get returned.
But I'm stuck on actually going through the user's input string:
I need to check all characters one by one against each row in the DB.
So I was thinking:
1.) Get the character length of the string and put it into $string_count.
2.) Define $count=0;
3.) Write a while loop that stops when $count is greater than or equal to $string_count.
4.) Inside that while loop, write another while loop with the condition $det=mysql_fetch_array().
5.) Then I could get the data (name) from row 1 and check whether its first char matches the first char of the user's input (which I don't know how to do).
6.) If it contains the correct char in the correct position, add the ID to a string followed by a comma ($possible.=$det['id'].",";).
7.) Add one to $count ($count++;) in the parent while loop.
8.) That loop will go on over every row, changing the char each time, until there are no more chars to check...
I just wrote all that off the top of my head, and now I'm lost. I think it's completely wrong and there's a much easier way to do it. Can you just tell me if I'm on the right track?
I need to know how to check a certain char in a string. So for example:
$my_str="my name is nav nav";
Now I want to get the fourth char in that string (which is "n").
Any suggestions?
Thanks guys.
$my_str = "my name is nav nav";
$charAt = $my_str[3]; // the older $my_str{3} syntax was removed in PHP 8
echo $charAt;
// outputs n (strings are 0-indexed)
EDIT: Also, wait until the user has typed at least three characters, and then just do a database query using LIKE every time (escape $userinput first or, better, use a bound parameter):
SELECT * FROM example WHERE age LIKE '$userinput%'
http://www.htmlite.com/mysql011.php
(optimizing this is another story)
I don't quite understand your approach, but here's what I would do...
Once the user's input is longer than a certain threshold (3 characters, for example), send the entire input string to the server via AJAX.
On the server side, run SELECT * FROM example WHERE field LIKE CONCAT(?, '%'). The ? represents a bound parameter, which protects you from SQL injection. (See PHP's PDO docs for more on prepared statements.)
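Here is a sketch of the server-side part in Python, with `sqlite3` standing in for MySQL/PDO.

- SQLite concatenates with `||` where MySQL uses `CONCAT(?, '%')`.
- The column name `name` and the `MAX_SUGGESTIONS` cap are hypothetical.
- The user's `%` and `_` are escaped so they are matched literally rather than acting as wildcards.

```python
import sqlite3

MIN_CHARS = 3         # threshold from the answer above
MAX_SUGGESTIONS = 10  # hypothetical cap on returned rows

def suggestions(conn, user_input: str):
    if len(user_input) < MIN_CHARS:
        return []
    # Treat % and _ typed by the user as literal characters, not wildcards.
    escaped = user_input.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT name FROM example WHERE name LIKE ? || '%' ESCAPE '\\' "
        "ORDER BY name LIMIT ?",
        (escaped, MAX_SUGGESTIONS),
    )
    return [name for (name,) in rows]
```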
A group of people have been inconsistently entering data for a while.
Some people will enter this:
101mxeGte - TS 200-10
And other people will enter this
101mxeGte-TS-200-10
The sad thing is, those are supposed to be identical records.
They will also search inconsistently. If a record was entered one way, some people will search the other way.
Now, I know all about how you can fix data entry for the future, but that's NOT what I am asking about. I want to know how it is possible to:
Leave the data alone, but...
Search for the right thing.
Am I asking for the impossible here?
The best thing I've found so far is a suggestion to simply muck about with the existing data, using the REPLACE function in MySQL.
I am uncomfortable with this option, as it will certainly actively piss off half of the users. The unfocused angst of all is less than the active ire of half.
The problem is that it has to go both ways:
Entering spaces in the query has to find both space and not-space entries,
and NOT entering spaces ALSO has to find both space and not-space entries.
Thanks for any help you can offer!
The "ideal" solution is pretty straightforward:
1. Decide what the canonical way of representing a record is
2. When someone saves a record, canonicalize it before saving
3. When someone searches for a record, canonicalize the input before searching for it
You could also write a small program to convert all existing data to the canonical form (you will have the code for it anyway, since "canonicalize" in steps 2 and 3 requires that you write code that does it).
Edit: some specific information on how to canonicalize
With the sample data you give, the algorithm might be:
Replace all spaces with hyphens
Replace all runs of one or more hyphens with a single hyphen (a regex would be easiest for this -- actually, a regex can do both steps in one go)
Is there any practical problem with this approach?
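Here is the one-regex version of those two steps as a Python sketch; the function name is hypothetical. Every run of spaces and/or hyphens becomes a single hyphen, so both sample spellings map to the same canonical string.

```python
import re

def canonicalize(code: str) -> str:
    # Replace each run of whitespace and/or hyphens with a single hyphen.
    return re.sub(r"[\s-]+", "-", code.strip())
```

The same function would be applied when saving, when searching, and once over the existing rows.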
Strip whitespace from BOTH the existing data and the search input. That way the intended record(s) will always be returned. I hope your data set is small, though, because this is going to perform pretty poorly: applying a function to the column on every row prevents MySQL from using an index on it.
Edit: by "existing data" I meant "the query of existing data". My answer was based on assumption that the actual data could not be touched (which might not be correct).
If it were up to me, I'd update the data in the database with REPLACE, and on future searches remove all spaces from the input.
Presumably your users enter the search terms (or record details, when creating a record) in an HTML form, which then goes to a PHP script. It looks like your data can always be written in a way that contains no spaces, so why don't you do this:
Run a query that strips spaces from the existing data
Add code in the PHP script(s) that receives the form(s), so that it strips spaces from submitted data - whether that data is to be used for search or for writing new data.
Edit: I guess you would also need to change some spaces to hyphens. Shouldn't be too hard to write logic to accomplish that.
Something like this.
pseudo code:
// mysql_* was removed in PHP 7; mysqli_real_escape_string() is the modern equivalent
$myinput = mysql_real_escape_string('101mxeGte-TS-200-10');
$query = " SELECT * FROM table1
WHERE REPLACE(REPLACE(f1, ' ', ''),'-','')
= REPLACE(REPLACE('$myinput', ' ', ''),'-','') ";
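Here is the same comparison with a bound parameter instead of an escaped, interpolated string. It is written in Python with `sqlite3`, which also provides `REPLACE(X, Y, Z)`, standing in for MySQL. `table1` and `f1` come from the pseudo code above. Spaces and hyphens are stripped on both sides, so either spelling finds either stored form.

```python
import sqlite3

QUERY = """
SELECT f1 FROM table1
WHERE REPLACE(REPLACE(f1, ' ', ''), '-', '')
    = REPLACE(REPLACE(?, ' ', ''), '-', '')
ORDER BY f1
"""

def find_records(conn, search: str):
    # The search text is bound as a parameter, never concatenated into the SQL.
    return [f1 for (f1,) in conn.execute(QUERY, (search,))]
```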
Alternatively, you might write your own function to trim the data so it can be compared.
DELIMITER $$
CREATE FUNCTION myTrim(AStr VARCHAR(255)) RETURNS VARCHAR(255) DETERMINISTIC
BEGIN
DECLARE Result VARCHAR(255);
SET Result = REPLACE(AStr, ' ', '');
-- SET Result = ......
-- .....
RETURN Result;
END$$
DELIMITER ;
And then use it in your SELECT:
$query = " SELECT * FROM table1
WHERE myTrim(f1) = myTrim('$myinput') ";
Have you ever heard of SQL's LIKE?
http://dev.mysql.com/doc/refman/4.1/en/string-comparison-functions.html
There's also REGEXP:
http://dev.mysql.com/doc/refman/4.1/en/regexp.html#operator_regexp
101mxeGte - TS 200-10
101mxeGte-TS-200-10
How about this? (Replace 'justalnums' with the column or value being tested.)
SELECT 'justalnums' REGEXP '101mxeGte[[:blank:]]*(\-[[:blank:]]*)?TS[[:blank:]-]*200[[:blank:]-]*10'
Digits can be represented by [0-9] and letters by [a-z], [A-Z] or [a-zA-Z].
Append a + to match one or more of them. Parentheses let you group, and even capture, what is inside them and reuse it later in a replace or elsewhere.
RLIKE is the same as REGEXP.
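Instead of hand-writing the pattern, it can be built from whatever the user typed. Here is a Python sketch; the helper names are hypothetical.

- The input is split on spaces and hyphens, and `[\s-]*` is allowed between the pieces.
- SQLite has no built-in REGEXP, so the sketch registers one with `create_function`: SQLite evaluates `X REGEXP Y` as `regexp(Y, X)`.
- MySQL's REGEXP would take the same pattern, as long as `\s` is replaced with `[[:space:]]` on older versions.
- One limitation: input typed with no separators at all, such as `101mxeGteTS20010`, won't match.

```python
import re
import sqlite3

def tolerant_pattern(search: str) -> str:
    # Split on spaces/hyphens; any mix of them (or none) may appear between the pieces.
    pieces = [re.escape(p) for p in re.split(r"[\s-]+", search.strip()) if p]
    return "^" + r"[\s-]*".join(pieces) + "$"

def _regexp(pattern, value):
    return value is not None and re.search(pattern, value) is not None

def connect_with_regexp(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.create_function("regexp", 2, _regexp)  # enables "value REGEXP pattern"
    return conn
```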
I have 3 groups of fields (each group consists of 2 fields) that I have to check against some condition. I don't check each field, but some combination, for example:
group priceEurBus, priceLocalBus
group priceEurAvio, priceLocalAvio
group priceEurSelf, priceLocalSelf
Here is my example (formatted for legibility). How can it be improved?
$rest .="
WHERE
(
((priceEurBus+(priceLocalBus / ".$ObrKursQuery.")) <= 400)
OR
((priceEurAvio+(priceLocalAvio / ".$ObrKursQuery.")) <= 400)
OR
((priceEurSelf+(priceLocalSelf / ".$ObrKursQuery.")) <= 400)
)
";
$ObrKursQuery is the value I use to convert local currency to Euro.
Performance improvement: your query is OR-based, so evaluation of a row can stop as soon as one condition is true. Try to order your conditions so that the first check is the one most likely to be under 400.
Security improvement: use prepared statements, and validate your variables before using them. If $ObrKursQuery comes from user input or another untrusted source, it is an unquoted numeric value, so you are exposed to a wide variety of SQL injection problems (including arithmetic SQL injection: if the value is 0, the division-by-zero behavior can be used as a blind SQL injection condition).
Readability improvement: always be consistent in how you write your code and, if possible, follow an accepted de facto standard, such as starting variable names in lower case: $ObrKursQuery -> $obrKursQuery. Also, for the sake of self-documenting code, choose variable names that say what they are: $ObrKursQuery -> $conversionRatio.
Maintainability/scalability improvement: use a constant instead of the hard-coded 400. When you change that value in the future, you will want to change it in just one place, not all over your code.
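Those suggestions can be combined in one sketch: bound named parameters, an up-front check on the conversion ratio, and the limit held in a single constant.

This uses Python's `sqlite3` in place of PHP/MySQL. The table name `offers` and its `id` column are hypothetical; the price columns come from the question.

```python
import math
import sqlite3

MAX_PRICE_EUR = 400  # the limit lives in exactly one place

QUERY = """
SELECT id FROM offers
WHERE priceEurBus  + priceLocalBus  / :ratio <= :max_price
   OR priceEurAvio + priceLocalAvio / :ratio <= :max_price
   OR priceEurSelf + priceLocalSelf / :ratio <= :max_price
"""

def offers_under_limit(conn, conversion_ratio):
    ratio = float(conversion_ratio)  # non-numeric input raises ValueError here
    if not math.isfinite(ratio) or ratio <= 0:
        raise ValueError("conversion ratio must be a positive number")
    params = {"ratio": ratio, "max_price": MAX_PRICE_EUR}
    return [row[0] for row in conn.execute(QUERY, params)]
```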
Never use concatenation to generate your SQL, you should be using prepared SQL statements with parameters.
The only way to simplify this statement without having greater knowledge of the problem domain is to reduce the number of columns. It looks as if you've got three prices per product entry. You could create a table of product prices instead of columns of product prices and this would make it a single comparison and give you the flexibility to create yet more product prices in the future.
So you'll need to create a one-to-many relationship between products and prices.
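Here is a sketch of that normalized layout, again using Python's `sqlite3` in place of MySQL. All table and column names are hypothetical.

Each price group becomes a row in `product_price`, so the three OR-ed conditions collapse into a single comparison. A new transport type then needs no schema change.

```python
import sqlite3

SCHEMA = """
CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product_price (
    product_id  INTEGER NOT NULL REFERENCES product(id),
    transport   TEXT    NOT NULL,  -- e.g. 'bus', 'avio', 'self'
    price_eur   REAL    NOT NULL DEFAULT 0,
    price_local REAL    NOT NULL DEFAULT 0,
    PRIMARY KEY (product_id, transport)
);
"""

# One comparison, however many price rows a product has.
QUERY = """
SELECT DISTINCT p.id
FROM product p
JOIN product_price pp ON pp.product_id = p.id
WHERE pp.price_eur + pp.price_local / ? <= ?
ORDER BY p.id
"""

def products_under(conn, ratio, max_eur):
    return [pid for (pid,) in conn.execute(QUERY, (ratio, max_eur))]
```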