How to search partial/masked strings? - php

I am storing social security numbers in the database, but instead of storing the whole number I store only a sequence of 5 digits. So if the SSN is 123-12-1234, my database would store it as #23121### or ####21234 or any other variant, as long as it has 5 digits in a row.
Therefore, when a user enters the whole SSN, I want the database to locate all matches.
So, I could do this:
SELECT * FROM user WHERE ssn LIKE '123121234'
But the query above would not work, since some characters in the ssn field are masked (#23121###). Is there a good way of doing this?
Maybe a good way would be to use
SELECT * FROM user WHERE '123121234' LIKE CONCAT('%', REPLACE(ssn, '#', ''), '%')
Although there could be an issue: the query might return irrelevant matches, since the 5 digits that I store in the DB could be anywhere in the sequence.
Any idea how to do a better search?

If the numbers are always in a sequential block, you can build a very efficient query by generating the 5 variations of the SSN that could be stored in the DB and searching for all of them with an exact match. This query can also use an index on ssn to speed things up.
SELECT *
FROM user
WHERE ssn IN ('12312####',
'#23121###',
'##31212##',
'###12123#',
'####21234');
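The five variants do not have to be typed by hand. A minimal sketch of generating them in PHP, assuming a 9-digit SSN, a 5-digit visible window and '#' as the mask character as in the question; maskedVariants() is a hypothetical helper name and the PDO call in the last comment is an assumption about how the query would be run:

```php
<?php
// Build every masked form of an SSN in which one window of digits stays visible.
// maskedVariants() is a hypothetical helper; window size and mask follow the question.
function maskedVariants(string $ssn, int $window = 5, string $mask = '#'): array
{
    $digits = preg_replace('/\D/', '', $ssn); // drop dashes and spaces
    $length = strlen($digits);
    $variants = [];
    for ($start = 0; $start + $window <= $length; $start++) {
        $variants[] = str_repeat($mask, $start)
            . substr($digits, $start, $window)
            . str_repeat($mask, $length - $start - $window);
    }
    return $variants;
}

$variants = maskedVariants('123-12-1234');
// One "?" placeholder per variant, so the values are bound instead of interpolated.
$placeholders = implode(', ', array_fill(0, count($variants), '?'));
$sql = "SELECT * FROM user WHERE ssn IN ($placeholders)";
// With PDO (assumed): $stmt = $pdo->prepare($sql); $stmt->execute($variants);
```

Binding the variants as parameters keeps the exact-match lookup and avoids building SQL from user input.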

I think you can do something like this:
Extract all possible 5-character sequences from the queried SSN.
Make an IN() query on those sequences. I'm not sure, though, how many results you would get from this.
$n = '123121234'; // keep the SSN as a string so leading zeros survive
$sequences = array();
for ($i = 0; $i + 5 <= strlen($n); $i++) {
    $sequences[] = substr($n, $i, 5);
}
var_dump($sequences);
Tell me if you need the hash signs surrounding the strings.

Related

Search in MySQL with permutations

I need help.
I have a table with only two columns, ID and NAME, and this data:
ID | NAME
1 HOME
2 GAME
3 LINK
I want to show, for example, the row with the name HOME if the user searches for HOME, OMEH, EMOH, HMEO, etc., i.e. any permutation of the word HOME.
I can't save all these permutations in MySQL and search that column, because some words will be too long (9-10 chars), which means more than 40 MB for each 9-character word.
One way to solve this problem is to store the sorted characters of each name in an additional column in your database, and then sort the string the user inputs before searching. E.g. the database has:
ID NAME CHARS
1 HOME EHMO
2 GAME AEGM
3 LINK IKLN
Then when searching in PHP you would do this:
$search = 'MEHO'; // user input = MEHO
$chars = str_split($search);
sort($chars);
$search = implode('', $chars); // now contains EHMO
$sql = "SELECT ID, NAME FROM table1 WHERE CHARS = '$search'";
// perform query etc.
Output
ID NAME
1 HOME
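The sorting step can be wrapped in a small function so that the same key is used when filling the CHARS column and when searching. A minimal sketch, where anagramKey() is a hypothetical helper name and the upper-casing is an assumption based on the sample data being upper case:

```php
<?php
// anagramKey() is a hypothetical helper: it returns the characters of a word in
// sorted order, so every permutation of the same letters maps to the same key.
function anagramKey(string $word): string
{
    $chars = str_split(strtoupper($word)); // assumes single-byte (ASCII) names
    sort($chars);
    return implode('', $chars);
}

$key = anagramKey('meho');
// Bind $key as a parameter instead of interpolating user input into the SQL.
$sql = 'SELECT ID, NAME FROM table1 WHERE CHARS = ?';
```

With an index on CHARS this stays an exact-match lookup however long the word is.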
This sounds like a "please do my homework for me" question. It is hard to conceive what real world problem this is applicable to and there is no standard solution. It is OK to ask for help with your homework here, but you should state that this is the case.
"more than 40 MB for each 9 chars words"
Your maths is a bit wonky, but indeed the storage does not scale well. OTOH leaving aside the amount of storage, in terms of the processing workload it does scale well as a solution.
You could simply brute-force a dynamic query:
function mkqry($word)
{
    $qry = "SELECT * FROM yourtable WHERE 1 ";
    $last = strlen($word);
    for ($x = 0; $x < $last; $x++) {
        // one LIKE per character; $word must be escaped before it is used here
        $qry .= " AND word LIKE '%" . substr($word, $x, 1) . "%'";
    }
    return $qry;
}
However this will always result in a full table scan (slow) and won't correctly handle cases where a letter occurs twice in a word.
The solution is to use an indexing function which is independent of the order in which the characters appear - a non-cryptographic hash. An obvious candidate would be to XOR the characters together, although this only results in a one character identifier which is not very selective. So I would suggest simply adding the character codes:
function pos_ind_hash($word)
{
    $sum = 0;
    $last = strlen($word);
    for ($x = 0; $x < $last; $x++) {
        $sum += ord($word[$x]); // add the character code at position $x
    }
    return $sum;
}
function mkqry($word)
{
    $qry = "SELECT * FROM yourtable WHERE 1 ";
    $last = strlen($word);
    for ($x = 0; $x < $last; $x++) {
        $qry .= " AND word LIKE '%" . substr($word, $x, 1) . "%'";
    }
    // the order-independent hash lets an index on yourtable.hash narrow the scan
    $qry .= " AND yourtable.hash=" . pos_ind_hash($word);
    return $qry;
}
Note that the hash mechanism here does not uniquely identify a single word, but is specific enough to reduce the volume to the point where an index (on the hash) would be effective.
Multiplying rather than adding would create fewer collisions but at a greater risk of overflowing (which would create ambiguity between implementations).
But both the hash and the single-character LIKE only reduce the number of potential matches. To get the query to behave definitively, you need to go further. You could add an attribute to the table (and to the index with the hash) containing the string length. This would be more selective (i.e. it would improve the effectiveness of the index) but still not definitive.
For a definitive method you would need to specify in your query that the data does NOT contain characters which are NOT in the word you are looking for.
The wrong way to do that would be to add a loop specifying "AND NOT LIKE....".
A valid way of doing that would be to add a test to the query which removes from the table attribute all the letters that appear in the word you are searching for, and checks that the result is a zero-length string.
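A different way to make the result definitive is to let the hash and length narrow the candidates in the database and then do the final check in PHP. This is a swapped-in technique, not the in-query test described above. It is a minimal sketch that compares per-character counts with count_chars(); isPermutation() and filterPermutations() are hypothetical names, and single-byte strings are assumed:

```php
<?php
// Hypothetical helpers for a final, exact check on rows the database returned.
function isPermutation(string $candidate, string $search): bool
{
    // count_chars($s, 1) returns [byte value => occurrences] for the bytes that occur,
    // so two strings are permutations of each other exactly when these arrays are equal.
    return count_chars($candidate, 1) == count_chars($search, 1);
}

function filterPermutations(array $candidates, string $search): array
{
    return array_values(array_filter($candidates, function ($c) use ($search) {
        return isPermutation($c, $search);
    }));
}

// Example: only HOME survives; HOOM has the right letters but the wrong counts.
$matches = filterPermutations(['HOME', 'HOOM', 'GAME'], 'OMEH');
```

Because the counts are compared, repeated letters are handled correctly, which the per-letter LIKE conditions alone cannot do.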

PHP - Generating random integers within specified range from a key

I have a set of questions with unique IDs in a MySQL database.
Users also have a unique ID and are to answer these questions and their answers are saved in the database.
Now, I want each user to get 5 non-repeating, randomly picked questions from the pool of available ones (let's say 50), based on the user's ID. So when a user with id 10 starts answering his questions, but stops and wants to return later to the same page, he will get the same questions as before. A user with id 11 will get a different random set of questions, but it will always be the same for him and different from all other users.
I found that random.org can generate exactly what I need with their sequence generator that generates a random sequence of numbers based on provided ID:
https://www.random.org/sequences/?min=1&max=50&col=1&format=plain&rnd=id.10
But I would like the generation to be done locally instead of relying on the random.org API.
So, I need to generate 'X' unique random integers, within specified range 'Y' that are generated based on supplied integer 'Z'. I should be able to call a function with 'Z' as parameter and receive back the same 'X' integers every time.
I need to know how to replicate this generation with PHP code or at least a push or hint in a direction of a PHP function, pseudo-code or code snippet that will allow me to do it myself.
Thank you in advance!
Why reinvent the wheel? Seeding mt_rand() with mt_srand() gives the same sequence every time:
mt_srand(44);
for ($i=0; $i < 10; $i++) echo mt_rand(). "\n";
echo "\n\n";
mt_srand(44);
for ($i=0; $i < 10; $i++) echo mt_rand(). "\n";
result
362278652
928876241
1914830862
68235862
1599103261
790008503
1366233414
1758526812
771614145
1520717825
362278652
928876241
1914830862
68235862
1599103261
790008503
1366233414
1758526812
771614145
1520717825
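To turn the seeded sequence into 5 distinct question ids, one option is a partial Fisher-Yates shuffle driven by mt_rand(). This is a minimal sketch, assuming the question ids are simply 1..50; questionsForUser() is a hypothetical name:

```php
<?php
// Hypothetical helper: deterministically pick $count distinct ids from 1..$poolSize
// for a given user id. Assumes $count <= $poolSize.
function questionsForUser(int $userId, int $poolSize = 50, int $count = 5): array
{
    $ids = range(1, $poolSize);
    mt_srand($userId); // same seed, same mt_rand() sequence
    for ($i = 0; $i < $count; $i++) {
        // swap position $i with a random position at or after it (partial Fisher-Yates)
        $j = mt_rand($i, $poolSize - 1);
        $tmp = $ids[$i];
        $ids[$i] = $ids[$j];
        $ids[$j] = $tmp;
    }
    mt_srand(); // reseed randomly so later mt_rand() calls are not tied to the user id
    return array_slice($ids, 0, $count);
}

$picked = questionsForUser(10);
```

The result is stable for a given PHP build, but the mt_rand() sequence for a seed has changed between PHP versions before (7.1 altered it), so storing the picked ids is the safer choice if they must survive an upgrade.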
Generate your random numbers at the beginning and save them in the session. That way the random numbers for that user are always known, and you can find the id of the question to go back to by looking it up in the session.
Cheers
You can pick random values from the $w array. Try this code as an example and adapt it to your logic.
$w = array('0'=>11,'1'=>22,'2'=>44,'3'=>55,'4'=>66,'5'=>88);
$str = '';
for($i=0;$i<5;$i++) {
$str.= $w[rand(0,5)];
}
As this article suggests, you could use a non-repeating pseudo-random number generator. The only problem would be finding a prime number that is at least 2x as big as the upper bound for the IDs and satisfies p ≡ 3 (mod 4). There should be big enough primes matching these conditions freely available on the net, though.
Due to my lack of experience with PHP I can only provide pseudocode, though.
int[] generateUniqueRands(int id, int ct)
    int[] res
    const int prim // the prime number described above
    for int i in [0, ct[
        res[i] = ((id + i) * (id + i)) % prim
    return res
Note that this algorithm basically works like a window:
id = x set = [a , b , c , d]
id = x + 1 set = [b , c , d , e]
...
If you wish to avoid this kind of behavior just generate a unique random-number from the id first (can be achieved in the same way the set of random numbers is generated).
When the user with ID 10 opens the page for the first time, use rand() to generate random numbers then store them into a cell in the users table in database. So the user with id 10 has the rand() numbers stored.
For example the users table has id, rand_questions.
Check if the rand_questions is empty then update with the new random numbers generated, else you get the numbers from the database.

Algorithm to sanitize MySQL data

Let's say I have a MySQL table of 100,000 records with 2 columns: title and description.
There's also a table containing all the bad words that need to be sanitized.
For example, let's say the title column contains the string "Forget this" and the profanity table says that the string "Forget" should be replaced with "F*****".
Currently I implemented it with a brute force method, but this is way too slow. It checks every single substring from the sentence and compares it with every single string that exists in the profanity filter.
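public function sanitizeSiteProfanity($word, $replacement)
{
    // result_array() returns a plain PHP array of rows, so there is no num_rows() to call on it
    $rows = $this->_ci->db->select('title, description')->get('top_sites')->result_array();
    foreach ($rows as &$row)
    {
        // str_replace() returns the new string; it does not modify its argument
        $row['title'] = str_replace($word, $replacement, $row['title']);
        $row['description'] = str_replace($word, $replacement, $row['description']);
    }
    unset($row); // break the reference left over from the loop
    return $rows;
}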
Is there a faster method to sanitize all the substrings?
I don't know if there is a fast way to sanitize the data. It seems that you have to loop through all the words for the replacement, because one title could have multiple offensive words.
If you are looking for complete words, a full-text index and MATCH ... AGAINST should speed things up. Essentially, you would set up a loop over the words and, for each one, run:
update top_sites
set title = replace(title, 'Fuck', 'F***')
where match (title) against ('Fuck' in boolean mode);
You would need to put this in a stored procedure loop. The match() would be quite fast, though, and this would probably speed up the current process significantly.
The best way to optimize this is to delegate the replacement step to the database and let mysql do the heavy lifting. You'll need to use the REPLACE mysql built-in. The (not-so-big) drawback is that you'll need to use explicit sql instead of the code igniter expression builder.
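Delegating to the database could look roughly like one UPDATE per bad word. A minimal sketch that only builds the statement and its bound values; buildSanitizeUpdate() is a hypothetical helper, the table and column names are taken from the question, and executing it with a driver that supports ? placeholders (PDO, for instance) is assumed:

```php
<?php
// Hypothetical helper: returns [sql, params] for replacing one word in both columns.
function buildSanitizeUpdate(string $word, string $replacement): array
{
    $sql = 'UPDATE top_sites'
         . ' SET title = REPLACE(title, ?, ?), description = REPLACE(description, ?, ?)'
         . ' WHERE title LIKE ? OR description LIKE ?';
    // Escape LIKE wildcards so a word containing % or _ is matched literally.
    $pattern = '%' . addcslashes($word, '%_\\') . '%';
    return [$sql, [$word, $replacement, $word, $replacement, $pattern, $pattern]];
}

list($sql, $params) = buildSanitizeUpdate('Forget', 'F*****');
// With PDO (assumed): $pdo->prepare($sql)->execute($params);
```

Note that MySQL's REPLACE() is case-sensitive, so differently cased spellings need their own entries in the profanity table.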

Compare one by one characters from a mysql db with php

I'm trying to compare a row in my DB with a given string, character by character, and get as a result the id that best fits the given data. For example, my DB has the user David with the sequence AAA, and I want to compare it with a sequence I supply, ABA, so I'd like to receive the match percentage (66.6% in this case).
This is what I have so far, but I don't know how to go on:
$uname = $_POST['sequence'];
$query = "SELECT name FROM dna WHERE sequence = '$uname'";
$result = mysql_query($query);
while($row = mysql_fetch_array($result))
{
echo $row['name'];
}
In order to get the similarity in percent, you might use the PHP function similar_text().
The two strings are compared and the similarity percentage is returned, if the third parameter is passed to the function.
$string_1 = 'AAA';
$string_2 = 'ABA';
similar_text($string_1, $string_2, $percent);
echo $percent;
// 66.666666666667
The database part is a bit more work. A very basic implementation could look like this.
Keep in mind that the real problem is comparing one string against, say, 1 million rows.
In general, one wouldn't do that with characters, because the data would be stored as bits, and bits can be compared with simple bit shifts. Anyway...
Here, when working with chars/strings, fetching the rows in chunks with a limited query could help, too.
That would mean asking the db for chunks of, let's say, 500 rows and doing the calculation on each chunk.
It depends on the number of rows and the memory use of the dataset.
// incoming via user input
$string_1 = $_POST['sequence'];
// temporary var to store the highest similarity percentage and its row_id
$bestValue = array('row_id' => 0, 'similarity' => 0);
// iterate over all rows fetched from the database
// ($rows is assumed to map each row id to its row array)
foreach ($rows as $id => $row)
{
    // get a new string_2 from the db row (the sequence column from the question)
    $string_2 = $row['sequence'];
    // calculate similarity
    similar_text($string_1, $string_2, $percent);
    // if the calculated similarity is higher, update the "best" value
    if ($percent > $bestValue['similarity']) {
        $bestValue = array('row_id' => $id, 'similarity' => $percent);
    }
}
var_dump($bestValue);
After all db rows are processed, $bestValue will contain the highest percentage and its row id.
You can do all kinds of things here, for instance:
switch from first-match update (>) to last-match update (>=)
stop iteration on the first match
store the row ids that share the same similarity (multi-row match)
if you don't need multi-row matches, you might drop the array and use two vars for row and percent
add proper error handling, escaping, and mysqli usage
Be warned: this isn't the most efficient approach, especially when working with large datasets. If you need this at a level beyond hobby or homework, then simply pull in a tool that is optimized for this job, like EMBOSS (http://emboss.sourceforge.net/).
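similar_text() scores strings by their common substrings, which is not quite the same as the character-by-character comparison the question asks for. If a strict position-by-position match is what you want, a minimal sketch could look like this; positionalMatchPercent() and bestMatch() are hypothetical names, and $rows is assumed to map each row id to its sequence string:

```php
<?php
// Hypothetical helper: percentage of positions at which both strings hold the same
// character, measured against the longer string.
function positionalMatchPercent(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 100.0; // two empty strings are treated as identical
    }
    $shortest = min(strlen($a), strlen($b));
    $matches = 0;
    for ($i = 0; $i < $shortest; $i++) {
        if ($a[$i] === $b[$i]) {
            $matches++;
        }
    }
    return 100.0 * $matches / $longest;
}

// Hypothetical helper: id of the first row with the highest positional match.
function bestMatch(array $rows, string $query)
{
    $bestId = null;
    $bestPercent = -1.0;
    foreach ($rows as $id => $sequence) {
        $percent = positionalMatchPercent($query, $sequence);
        if ($percent > $bestPercent) {
            $bestId = $id;
            $bestPercent = $percent;
        }
    }
    return $bestId;
}
```

For AAA against ABA two of the three positions agree, which gives the 66.6% from the question.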

While loop for mysql database with php?

I am developing a mysql database.
I "need" a unique id for each user but it must not auto increment! It is vital it is not auto increment.
So I was thinking of inserting a random number something like mt_rand(5000, 1000000) into my mysql table when a user signs up for my web site to be. This is where I am stuck?!
The id is a unique key on my mysql table specific to each user, and I cannot 100% guarantee that inserting mt_rand(5000, 1000000) for the user id will not inadvertently clash with another user's id.
Is there a way in which I can use mt_rand(5000, 1000000) and scan the mysql database, and if the id turns out to be unique, insert it as the user's new ID, but if somebody already has that id, generate a new one until it is unique and then insert it into the mysql database?
I know this is possible, I have seen it many times. I have tried with while loops and all sorts, so this place is my last resort.
Thanks
You're better off using this: http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_uuid
Or using this: http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html
But if you actually want to do what you are saying, you can just do something like:
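do {
    $x = mt_rand(5000, 1000000);
    // $pdo is assumed to be an open PDO connection; `users` stands in for your table name
    $stmt = $pdo->prepare('SELECT COUNT(*) FROM users WHERE id = ?');
    $stmt->execute(array($x));
} while ($stmt->fetchColumn() != 0);
// $x is now a value that's not in the db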
You could use a guid. That's what I've seen done when you can't use an auto number.
http://php.net/manual/en/function.com-create-guid.php
Doesn't this function do what you want (without verification): http://www.php.net/manual/en/function.uniqid.php?
I think you need to approach the problem from a different direction, specifically why a sequence of incrementing numbers is not desired.
If it needs to be an 'opaque' identifier, you can do something like start with a simple incrementing number and then add something around it to make it look like it's not, such as three random digits on the end. You could go further than that and put some generated letters in front (either random or based on some other algorithm, such as the day of the month they first registered, or which server they hit), then apply a simple checksumming algorithm to make another letter for the end. Now someone can't easily guess an ID, and you have a way of rejecting one sort of invalid ID before it hits the database. You will need to store the additional data around the ID somewhere, too.
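As a rough illustration of that idea, here is a minimal sketch that appends three random digits and a check letter to an incrementing number. All function names are hypothetical, and the layout (sequence + 3 random digits + one letter) is just one possible reading of the scheme:

```php
<?php
// Hypothetical helper: position-weighted checksum mapped to a letter A-Z.
function checksumLetter(string $digits): string
{
    $sum = 0;
    foreach (str_split($digits) as $i => $d) {
        // weighting by position means swapping two adjacent digits changes the letter
        $sum += ($i + 1) * (int) $d;
    }
    return chr(65 + $sum % 26);
}

// Hypothetical helper: $sequence would come from an auto-increment or similar counter.
function makeOpaqueId(int $sequence): string
{
    $body = $sequence . sprintf('%03d', random_int(0, 999));
    return $body . checksumLetter($body);
}

// Cheap sanity check that can reject malformed ids before any database lookup.
function isWellFormedOpaqueId(string $id): bool
{
    if (!preg_match('/^(\d{4,})([A-Z])$/', $id, $m)) {
        return false;
    }
    return checksumLetter($m[1]) === $m[2];
}

$id = makeOpaqueId(42);
```

The checksum only filters out typos and casual guessing; it is not a security measure, since anyone who knows the scheme can compute a valid letter.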
If it needs to be a number that is random and unique, then you need to check the database for the generated ID before you give it to the new user. This is where you will run into problems of scale: with too small a number space you will get too many collisions before the check lucks upon an unallocated one. If that is likely, then you will need to divide your ID generation into two parts: the first part is used as a prefix to fetch all existing IDs with that prefix, and then you can generate a new one that doesn't exist in the set you got from the DB.
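The generate-and-check loop can be kept independent of the database code by passing the lookup in as a callback. A minimal sketch, where generateUniqueId() and the $isTaken callback are hypothetical and the attempt limit is an added safeguard against a crowded number space:

```php
<?php
// Hypothetical helper: $isTaken($id) should return true when the id already exists,
// e.g. by running a SELECT against the users table.
function generateUniqueId(callable $isTaken, int $min = 5000, int $max = 1000000, int $maxAttempts = 100): int
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $candidate = random_int($min, $max);
        if (!$isTaken($candidate)) {
            return $candidate;
        }
    }
    // Give up instead of looping forever when most of the range is used.
    throw new RuntimeException("No free id found after $maxAttempts attempts");
}

// Example with an in-memory list standing in for the database lookup.
$existing = [1, 2, 3];
$newId = generateUniqueId(function ($id) use ($existing) {
    return in_array($id, $existing, true);
}, 1, 4);
```

Two sign-ups can still pick the same free id at the same moment, so the unique key on the column remains the final guard; a failed INSERT should simply trigger another attempt.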
Random string generation... letters, numbers, there are 218 340 105 584 896 combinations for 8 chars.
function randr($j = 8){
    $string = "";
    for ($i = 0; $i < $j; $i++) {
        // mt_rand() seeds itself; reseeding on every iteration would only reduce randomness
        switch (mt_rand(0, 2)) {
            case 0: $string .= chr(mt_rand(97, 122)); break; // a-z
            case 1: $string .= chr(mt_rand(65, 90)); break;  // A-Z
            case 2: $string .= chr(mt_rand(48, 57)); break;  // 0-9
        }
    }
    return $string;
}
Loop...
do{
$id = randr();
$sql = mysql_query("SELECT COUNT(0) FROM table WHERE id = '$id'");
$sql = mysql_fetch_array($sql);
$count = $sql[0];
}while($count != 0);
For starters I always prefer to do all the randomization in php.
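function gencode($link){
    // $link is assumed to be an open mysqli connection
    do {
        $tempid = mt_rand(5000, 1000000);
        $res = mysqli_query($link, "SELECT id FROM users WHERE id = $tempid");
    } while ($res && mysqli_num_rows($res) > 0); // retry while the id is taken
    // $tempid is an integer from mt_rand(), so interpolating it is safe here
    $reg = mysqli_query($link, "INSERT INTO users (id) VALUES ($tempid)");
    // of course you can check $reg to see whether the insert was successful
    return $tempid;
}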
