Compare characters one by one from a MySQL DB with PHP

I'm trying to compare a row in my DB with another string, character by character, and get back the id that best fits the given data. For example, my DB has the user David with the sequence AAA, and I want to compare it with a sequence I supply, ABA, and receive a match percentage (66.6% in this case).
This is what I have so far, but I don't know how to go on:
$uname = $_POST['sequence'];
$query = "SELECT name FROM dna WHERE sequence = '$uname'";
$result = mysql_query($query);
while($row = mysql_fetch_array($result))
{
echo $row['name'];
}

To get the similarity as a percentage, you might use the PHP function similar_text().
The two strings are compared and, if a third parameter is passed to the function, the similarity percentage is stored in it.
$string_1 = 'AAA';
$string_2 = 'ABA';
similar_text($string_1, $string_2, $percent);
echo $percent;
// 66.666666666667
The database part is a bit more work. A very basic implementation could look like this.
Keep in mind that the real problem is comparing one string against, say, 1 million rows.
In general, one wouldn't do that with characters: the sequences would be stored as bits and compared with bit operations such as shifts. Anyway...
Here, when working with chars/strings, fetching the rows in rolling batches with a limited query could help, too.
That would mean asking the db for chunks of, let's say, 500 rows at a time and doing the calculation on each chunk.
The right chunk size depends on the number of rows and the memory use of the dataset.
// incoming via user input
$string_1 = $_POST['sequence'];
// temporary var to store the highest similarity percentage and its row_id
$bestValue = array('row_id' => 0, 'similarity' => 0);
// iterate over all rows from the database
// ($rows is assumed to map row_id => row and to have been fetched beforehand)
foreach($rows as $id => $row)
{
    // get a new string_2 from db (the column holding the sequence)
    $string_2 = $row['sequence'];
    // calculate similarity
    similar_text($string_1, $string_2, $percent);
    // if calculated similarity is higher, then update the "best" value
    if($percent > $bestValue['similarity']) {
        $bestValue = array('row_id' => $id, 'similarity' => $percent);
    }
}
var_dump($bestValue);
After all db rows are processed, $bestValue will contain the highest percentage and its row id.
You can do all kinds of things here, for instance:
switch from first match update (>) to last match update (>=)
stop iteration on the first exact match
store the row_ids which have the same similarity (multi row match)
if you don't need multi row match, you might drop the array and use two vars for row and percent
add proper error handling, escaping, and mysqli usage
Be warned: this isn't the most efficient approach, especially when working with large datasets. If you need this for anything beyond a hobby project or homework, then simply pull in a tool which is optimized for this job, like EMBOSS (http://emboss.sourceforge.net/).
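Note that similar_text() is not a strictly positional comparison: it is based on common substrings, so it can also credit characters that appear at different positions. If the goal is a literal character-by-character match as in the question, a minimal sketch could look like the following. The helper names positionalMatchPercent() and bestMatches() are made up for illustration, and counting the extra characters of the longer sequence as mismatches is an assumption. It also keeps every row id that shares the top score (the multi row match mentioned above).

```php
<?php
// Hypothetical helper: percentage of positions at which two sequences agree.
// Assumption: positions beyond the shorter sequence count as mismatches.
function positionalMatchPercent(string $a, string $b): float
{
    $longest = max(strlen($a), strlen($b));
    if ($longest === 0) {
        return 100.0; // two empty sequences are treated as identical
    }
    $shortest = min(strlen($a), strlen($b));
    $matches = 0;
    for ($i = 0; $i < $shortest; $i++) {
        if ($a[$i] === $b[$i]) {
            $matches++;
        }
    }
    return $matches / $longest * 100;
}

// Hypothetical helper: ids of all rows that share the highest score.
// $sequences maps row id => sequence, e.g. as fetched from the dna table.
function bestMatches(array $sequences, string $query): array
{
    $best = -1.0;
    $ids = [];
    foreach ($sequences as $id => $sequence) {
        $percent = positionalMatchPercent($query, $sequence);
        if ($percent > $best) {
            $best = $percent;   // new best score: start a fresh list
            $ids = [$id];
        } elseif ($percent == $best) {
            $ids[] = $id;       // tie: remember this row as well
        }
    }
    return ['ids' => $ids, 'similarity' => $best];
}

$result = bestMatches([1 => 'AAA', 2 => 'ABB', 3 => 'BBB'], 'ABA');
var_dump($result);
```

Rows 1 and 2 both agree with ABA in two of three positions, so both ids are returned together with the shared percentage.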

Related

PHP increment booking number according to the last booking number in database

I'm using PHP 7 with Phalcon PHP and I'm trying to create a method to generate a booking number. Here is my current method :
public function generateNumber($company_code) {
// Build the prefix : COMPANY20190820
$prefix = $company_code . date('Ymd');
// It's like SELECT count(*) FROM bookings WHERE number LIKE 'COMPANY20190820%'
$counter = Bookings::count(array(
"number LIKE :number:",
"bind" => array('number' => $prefix.'%')
));
// Concat prefix with bookings counter with str_pad
// COMPANY20190820 + 005 (if 4 bookings in DB)
$booking_number = $prefix . str_pad($counter + 1, 3, 0, STR_PAD_LEFT);
// Return COMPANY20190820005
return $booking_number;
}
My problem is that I sometimes have to delete one or more bookings, so I can end up with:
COMPANY20190820001
COMPANY20190820002
COMPANY20190820005
COMPANY20190820006
COMPANY20190820007
I need to continue after the last number in my DB (here 007), because counting the rows like that can produce a duplicate booking number.
So how can I take the last booking number of the current day and increment from it?
You need to rethink what you want to do here as it will never work that way.
As I see it you have at least two options:
Use an auto-increment id and use that in combination with the prefix
Use a random fairly unique string (e.g. UUID4)
You should never manually try to get the current maximum id as that may and most likely will at some point result in race conditions and brittle code as a result of that.
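For the second option, a minimal sketch of a version 4 UUID generator, assuming PHP 7 or later (random_bytes() is available from 7.0). The function name uuid4() is only illustrative; a maintained library would be the safer choice in production.

```php
<?php
// Hypothetical helper: build a random (version 4) UUID string.
function uuid4(): string
{
    $bytes = random_bytes(16);                        // 128 random bits
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);  // set version to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);  // set the RFC 4122 variant
    // 32 hex chars split into 8 groups of 4, joined as 8-4-4-4-12
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

$booking_uuid = uuid4();
echo $booking_uuid, PHP_EOL;
```

Such a value can be stored in a column with a unique index, while the human-readable booking number is composed separately for display.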
So I found a solution, maybe there is a better way to do that but my function works now:
public function generateNumber($company_code) {
// Build the prefix : COMPANY20190820
$prefix = $company_code . date('Ymd');
// Get the last booking with the today prefix
// e.g : COMPANY20190820005
$last_booking = Bookings::maximum(array(
"column" => "number",
"number LIKE :number:",
"bind" => array('number' => $prefix.'%')
));
// Get the last number by removing the prefix (e.g 005)
$last_number = str_replace($prefix, "", $last_booking);
// trim left 0 if exist to get only the current number
// cast to in to increment my counter (e.g 5 + 1 = 6)
$counter = intval(ltrim($last_number, "0")) + 1;
// Concat prefix + counter with pad 006
$booking_number = $prefix . str_pad($counter, 3, 0, STR_PAD_LEFT);
// Return COMPANY20190820006
return $booking_number;
}
I reckon that the use case you describe does not justify the hassle of writing a custom sequence generator in PHP. Additionally, in a scenario where booking deletion is expected to happen, reusing IDs feels more like a bug than a feature, so your system would have to store a permanent counter to avoid reuse, which makes it less simple. Don't get me wrong, it can be done and it isn't rocket science, but it's time and energy you don't need to spend.
Your database engine surely has a native tool to generate autoincremented primary keys, with varying names and implementations (SQL Server has identity, Oracle has sequences and identity, MySQL has auto_increment...). Use that instead.
Keep internal data and user display separated. More specifically, don't use the latter to regenerate the former. Your COMPANY20190820007 example is trivial to compose from individual fields, either in PHP:
$booking_number = sprintf('%s%s%03d',
$company_code,
$booking_date->format('Ymd'),
$booking_id
);
... or in SQL:
-- This is MySQL dialect, other engines use their own variations
SELECT CONCAT(company_code, DATE_FORMAT(booking_date, '%Y%m%d'), LPAD(booking_id, 3, '0')) AS booking_number
FROM ...
You can (and probably should) save the resulting booking_number, but you should not use it as the source for further calculations. It's exactly the same case as dates: you don't need to store dates in plain English in order to eventually display them to the end user, and you definitely don't want to parse English dates back into actual dates in order to do anything beyond printing.
You also mention the possibility of generating long pure-digit identifiers, as Bookings.com does. There are many ways to do it and we can't know which one they use, but you may want to consider generating a numeric hash out of your auto-incremented PK via integer obfuscation.
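One simple form of integer obfuscation is a modular multiplication that can be reversed with the multiplicative inverse. The sketch below is only an illustration under assumptions: the constants, the function names and the 10^9 ID space are made up, it needs 64-bit PHP, and it hides the sequence from casual observers without being encryption.

```php
<?php
const ID_SPACE   = 1000000000; // codes stay below 10^9 (at most 9 digits)
const MULTIPLIER = 387420489;  // 3^18, shares no factor with 10^9

// Modular inverse via the extended Euclidean algorithm.
function modInverse(int $a, int $m): int
{
    [$oldR, $r] = [$a, $m];
    [$oldS, $s] = [1, 0];
    while ($r !== 0) {
        $q = intdiv($oldR, $r);
        [$oldR, $r] = [$r, $oldR - $q * $r];
        [$oldS, $s] = [$s, $oldS - $q * $s];
    }
    if ($oldR !== 1) {
        throw new InvalidArgumentException('multiplier and modulus must be coprime');
    }
    return (($oldS % $m) + $m) % $m; // normalise into 0..m-1
}

// Valid for 0 <= $id < ID_SPACE.
function obfuscateId(int $id): int
{
    return ($id * MULTIPLIER) % ID_SPACE;
}

function revealId(int $code): int
{
    return ($code * modInverse(MULTIPLIER, ID_SPACE)) % ID_SPACE;
}

$code = obfuscateId(7);
printf("%09d -> %d\n", $code, revealId($code));
```

The auto-incremented PK stays the internal key; only the obfuscated number is shown to users, and revealId() maps it back when a user submits it.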
You could split your database field into two parts, so that you hold the prefix and the counter separately.
Then you simply select the highest counter for your desired prefix and increment it.
If you can't change the table structure, you could alternatively order by the id in descending order and select the first row, then extract its counter manually. Keep in mind that the numbers must be padded for this to work, or you will get #9 back even if #10 exists.
If padding is not an option, you can have the database strip the prefix. That way you can cast the remaining string to a number and let the database sort by it. This will cost some performance, though, so keep the number of records low.
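To illustrate the padding remark: compared as strings, "9" sorts after "10", so the counter has to be compared as a number. Here is a minimal sketch in plain PHP, assuming the existing numbers for the day have already been loaded into an array. nextBookingNumber() is a hypothetical helper, and a real system would still need a unique index or a transaction around the lookup and insert to avoid race conditions.

```php
<?php
// Hypothetical helper: find the highest counter for a prefix and add one.
// $existing holds the booking numbers already stored (assumed input).
function nextBookingNumber(string $prefix, array $existing): string
{
    $highest = 0;
    foreach ($existing as $number) {
        if (strpos($number, $prefix) !== 0) {
            continue; // belongs to another company or day
        }
        // Cast the remainder to int so that 10 ranks above 9.
        $counter = (int) substr($number, strlen($prefix));
        $highest = max($highest, $counter);
    }
    return $prefix . str_pad((string) ($highest + 1), 3, '0', STR_PAD_LEFT);
}

$existing = ['COMPANY20190820001', 'COMPANY20190820002', 'COMPANY20190820007'];
echo nextBookingNumber('COMPANY20190820', $existing), PHP_EOL;
```

Because the maximum is used instead of the row count, deleted bookings in the middle of the range no longer lead to duplicates.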

How to search partial/masked strings?

I am storing social security numbers in the database, but instead of storing the whole number, I store only a 5-digit sequence. So, if the SSN is 123-12-1234, my database would store it as #23121### or ####21234 or anything else, as long as it has 5 digits in a row.
Therefore, when a user enters a whole SSN, I want the database to locate all matches.
So, I can do this :
SELECT * FROM user WHERE ssn like 123121234
But the query above would not work, since I have some masked characters in the SSN field (#23121###). Is there a good way of doing this?
Maybe a good way would be to use
SELECT * FROM user WHERE REPLACE (ssn, '#', '') like 123121234
Although there could be an issue - the query might return non-relevant matches since 5 numbers that I store in the DB could be anywhere in a sequence.
Any idea how to do a better search?
If the numbers are always in a sequential block, you can generate a very efficient query by just generating the 5 variations of the ssn that could be stored in the DB and search for all of them with an exact match. This query can also use indexes to speed things up.
SELECT *
FROM user
WHERE ssn IN ('12312####',
'#23121###',
'##31212##',
'###12123#',
'####21234');
I think you can do something like this:
Extract all possible 5-char combinations out of the queried SSN.
Make an IN() query on those numbers. I'm not sure though how many results you would get from this.
$n = 123121234;
$sequences = array();
for($i = 0; $i + 5 <= strlen($n); $i++) {
$sequences[] = substr($n, $i, 5);
}
var_dump($sequences);
Tell me if you need the hash signs surrounding the strings.
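If the hash signs are needed, the loop can pad each window with them so that the values line up with what is stored. A minimal sketch, assuming the stored format is exactly the one from the question (nine characters, "#" as the mask); maskedVariants() is a made-up name.

```php
<?php
// Hypothetical helper: every masked form a given SSN could be stored as.
function maskedVariants(string $ssn, int $window = 5, string $mask = '#'): array
{
    $length = strlen($ssn);
    $variants = [];
    for ($i = 0; $i + $window <= $length; $i++) {
        $variants[] = str_repeat($mask, $i)                 // masked prefix
            . substr($ssn, $i, $window)                      // visible digits
            . str_repeat($mask, $length - $i - $window);     // masked suffix
    }
    return $variants;
}

$variants = maskedVariants('123121234');
// One placeholder per variant, for use with a prepared statement.
$placeholders = implode(', ', array_fill(0, count($variants), '?'));
$sql = "SELECT * FROM user WHERE ssn IN ($placeholders)";
var_dump($variants, $sql);
```

The $variants array can then be bound to the placeholders, which keeps the user-supplied SSN out of the SQL string.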

Random dosieid number that is not used

I'm trying to generate a unique "dosieid" number for my web site. The site is a human resources application in which users create a dosie for each worker in their firm. I need a random dosieid so that, when a user creates a dosie, the dosieid field is automatically filled with a number that has not been used before, i.e. one that does not exist in the database. Otherwise I would use auto increment, but at that point the dosie has not been created yet. The form must also let the user change the number if they are not happy with the random one. One more constraint: the numbers must be from 1 to 9999. Can someone help me? I have tried many code samples but have not found anything that meets this spec.
This is what I have done so far. It gets a random number, but I don't know how to compare that random number with the database column "dosieid".
$id_num = mt_rand(1,9999);
$query = "SELECT dosjeid FROM albums";
$result = mysql_query($query) or die(mysql_error());
while($account = mysql_fetch_array($result)){
if ($id_num == $account['id']){
$id_num = mt_rand(1,9999);
}
}
echo"$id_num<br>";
This is extraordinarily convoluted... why is an auto-incrementing number not enough? This code would also never work properly. If for whatever reason you HAVE to use a random number, then you'd do it like this:
while(true) {
$id_rand = mt_rand(1,9999);
$result = mysql_query("SELECT count(*) FROM albums WHERE dosjeid=$id_rand") or die(mysql_error());
$row = mysql_fetch_row($result);
if ($row[0] == 0) {
break; // our random number isn't in the database, so exit the loop
}
}
However, there are some problems with this:
1) You'll get an infinite loop once you reach 9999 dosie records
2) The more records there are in the database, the longer this loop will take to find a "vacant" slot. As you get closer and closer to 9999 records, it will take a LONG time to find that one empty slot
3) If you're trying to "cloak" the ID of any one member so that users can't simply increment an ID parameter somewhere to see other people's records, there are FAR better and easier ways of doing this, such as encrypting the ID value before sending it out to clients.
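Problems 1 and 2 can be sidestepped by choosing from the numbers that are still free instead of guessing until one fits. A minimal sketch, assuming the used ids have already been fetched into an array; pickFreeId() is a hypothetical helper, and a UNIQUE index on the column is still needed because another request may take the number before the insert happens.

```php
<?php
// Hypothetical helper: pick a random id that is not in $usedIds,
// or null when the whole range is taken (no infinite loop).
function pickFreeId(array $usedIds, int $min = 1, int $max = 9999): ?int
{
    // Everything in the range that is not used yet, re-indexed from 0.
    $free = array_values(array_diff(range($min, $max), $usedIds));
    if ($free === []) {
        return null;
    }
    return $free[random_int(0, count($free) - 1)];
}

$suggestion = pickFreeId([1, 2, 3, 500]);
var_dump($suggestion);
```

The running time depends only on the size of the range, not on how full it is, so the suggestion stays fast even when nearly all 9999 numbers are in use.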
Use an auto-increment number as your primary key and an additional display id with the UNIQUE attribute as the ID shown to the user. This way you have a unique ID for your internal processing and a display ID that can be easily changed.
This is a terrible design. You should either:
not let users create the dosieid (create it yourself, give it to them after record created)
Try to create a stub record first with an assigned dosieid, and then update it with information
or use UUIDs, which requires a much bigger range than 1-9999
Even if you check that the number was unique, in between the time when you check it and the time you insert the record someone else may have taken it.
And under no circumstances should you find an empty id by picking numbers at random. This makes your program execution time non-deterministic, and if you eventually get 5000 employees you could be waiting a long time.
Also, this range is way too small for a randomness requirement.
You may also want to read about number only hashes (check upon the algorithm's collision rate) - php: number only hash?
// Assumes $link is an open mysqli connection (the mysql_* functions were removed in PHP 7).
function doesIdExist($link, $id)
{
    $id = (int) $id; // cast so the value is safe to embed in the query
    $query = "SELECT 1 FROM albums WHERE dosjeid = $id LIMIT 1";
    $result = mysqli_query($link, $query) or die(mysqli_error($link));
    return mysqli_num_rows($result) > 0; /* true: the id is taken */
}
$recNotAdded = true;
while($recNotAdded)
{
    $rand = mt_rand(1,1000); //Whatever your numbers
    $doesExist = doesIdExist($link, $rand);
    if(!$doesExist)
    {
        /* Add to DB */
        $recNotAdded = false;
    }
}

While loop for mysql database with php?

I am developing a mysql database.
I "need" a unique id for each user but it must not auto increment! It is vital it is not auto increment.
So I was thinking of inserting a random number, something like mt_rand(5000, 1000000), into my MySQL table when a user signs up for my web site. This is where I am stuck.
The id is a unique key in my MySQL table, specific to each user, and I cannot 100% guarantee that a value from mt_rand(5000, 1000000) will not accidentally clash with another user's id.
Is there a way to generate a number with mt_rand(5000, 1000000), check it against the MySQL database, and insert it as the user's new ID if it is unique, or, if somebody already has that id, generate a new one until it is unique and then insert it?
I know this is possible; I have seen it many times. I have tried while loops and all sorts of things, so this place is my last resort.
Thanks
You're better off using this: http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_uuid
Or using this: http://dev.mysql.com/doc/refman/5.0/en/insert-on-duplicate.html
But if you actually want to do what you are saying, you can just do something like:
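// Assumes $pdo is an open PDO connection and the table is called users
$stmt = $pdo->prepare("SELECT count(*) FROM users WHERE id = ?");
do {
    $x = mt_rand(5000, 1000000);
    $stmt->execute(array($x));
} while ($stmt->fetchColumn() != 0);
// $x is now a value that's not in the db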
You could use a guid. That's what I've seen done when you can't use an auto number.
http://php.net/manual/en/function.com-create-guid.php
Doesn't this function do what you want (without verification): http://www.php.net/manual/en/function.uniqid.php?
I think you need to approach the problem from a different direction, specifically why a sequence of incrementing numbers is not desired.
If it needs to be an 'opaque' identifier, you can start with a simple incrementing number and then add something around it to make it look like it isn't one, such as three random digits on the end. You could go further than that and put some generated letters in front (either random or based on some other algorithm, such as the day of the month they first registered, or which server they hit), then apply a simple checksumming algorithm to produce another letter for the end. Now someone can't easily guess an ID, and you have a way of rejecting malformed IDs before they hit the database. You will need to store the additional data around the ID somewhere, too.
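A rough sketch of that idea, with a sequence number, three random digits and one check letter. The exact layout, the weighted-sum checksum and all function names are assumptions made for illustration, not an established scheme.

```php
<?php
// Hypothetical checksum: position-weighted digit sum mapped to A-Z.
function checkLetter(string $digits): string
{
    $sum = 0;
    foreach (str_split($digits) as $position => $digit) {
        $sum += ($position + 1) * (int) $digit;
    }
    return chr(ord('A') + $sum % 26);
}

// Sequence number + three random digits + check letter.
function makeOpaqueId(int $sequence): string
{
    $digits = $sequence . str_pad((string) random_int(0, 999), 3, '0', STR_PAD_LEFT);
    return $digits . checkLetter($digits);
}

// Cheap validation that can run before any database lookup.
function isWellFormed(string $id): bool
{
    if (!preg_match('/^(\d{4,})([A-Z])$/', $id, $m)) {
        return false;
    }
    return checkLetter($m[1]) === $m[2];
}

// Strip the three random digits and the check letter again.
function sequenceOf(string $id): int
{
    return (int) substr($id, 0, -4);
}

$id = makeOpaqueId(42);
var_dump($id, isWellFormed($id), sequenceOf($id));
```

A mistyped or guessed ID will usually fail isWellFormed() and can be rejected without a query, although a single check letter cannot catch every error.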
If it needs to be a number that is random and unique, then you need to check the generated ID against the database before you give it to the new user. This is where you will run into problems of scale: with too small a number space you will get too many collisions before the check lucks upon an unallocated one. If that is likely, then you will need to divide your ID generation into two parts: the first part is used to fetch all IDs with that prefix, and then you generate a new one that doesn't exist in the set you got from the DB.
Random string generation... letters, numbers, there are 218 340 105 584 896 combinations for 8 chars.
function randr($j = 8){
    $string = "";
    for($i = 0; $i < $j; $i++){
        // pick a character class, then a character within it;
        // mt_rand() seeds itself, so no srand() call is needed
        $x = mt_rand(0,2);
        switch($x){
            case 0: $string .= chr(mt_rand(97,122)); break; // a-z
            case 1: $string .= chr(mt_rand(65,90)); break;  // A-Z
            case 2: $string .= chr(mt_rand(48,57)); break;  // 0-9
        }
    }
    return $string;
}
Loop...
do{
$id = randr();
$sql = mysql_query("SELECT COUNT(0) FROM table WHERE id = '$id'");
$sql = mysql_fetch_array($sql);
$count = $sql[0];
}while($count != 0);
For starters I always prefer to do all the randomization in php.
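// Assumes $link is an open mysqli connection (the mysql_* functions were removed in PHP 7).
function gencode($link){
    do {
        $tempid = mt_rand(5000, 1000000);
        $check = mysqli_fetch_assoc(mysqli_query($link, "SELECT id FROM users WHERE id = $tempid"));
    } while ($check); // id already taken: draw again
    $reg = mysqli_query($link, "INSERT INTO users (id) VALUES ($tempid)");
    // $reg is true if the insert was successful
    return $reg ? $tempid : false;
}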

Calculate distances and sort them

I wrote a function that can calculate the distance between two addresses using the Google Maps API.
The addresses are obtained from the database. What I want to do is calculate the distance using the function I wrote and sort the places according to the distance. Just like "Locate Store Near You" feature in online stores.
I'm going to specify what I want to do with an example:
So, let's say we have 10 addresses in the database, a variable $currentlocation, and a function called calcdist(), so that I can calculate the distances between the 10 addresses and $currentlocation and sort them. Here is how I do it:
$query = mysql_query("SELECT name, address FROM table");
while ($write = mysql_fetch_array($query)) {
$distance = array(calcdist($currentlocation, $write["address"]));
sort($distance);
for ($i=0; $i<1; $i++) {
echo "<tr><td><strong>".$distance[$i]." kms</strong></td><td>".$write['name']."</td></tr>";
}
}
But this doesn't work very well. It doesn't sort the numbers.
Another challenge:
How can I do this in an efficient way? Imagine there are infinite numbers of addresses; how can I sort these addresses and page them?
$query = mysql_query("SELECT name, address FROM table");
$rows = array();
while ($row = mysql_fetch_array($query)) {
    // store the distance as a plain number so it can be compared and printed
    $row['distance'] = calcdist($currentlocation, $row['address']);
    // append instead of keying by name, so equal names don't overwrite each other
    $rows[] = $row;
}
function cmp_distances($a, $b) {
    if($a['distance'] > $b['distance']) return 1;
    elseif($a['distance'] < $b['distance']) return -1;
    else return 0;
}
// sort the rows by distance, nearest first
usort($rows, 'cmp_distances');
// iterate over the sorted list and display the entries
foreach($rows as $row) {
    echo '<tr><td><strong>'.$row['distance'].' km</strong></td><td>'.$row['name'].'</td></tr>';
}
In your example you calculate the distance to one address at a time:
$distance = array(calcdist($currentlocation, $write["address"]));
And when you do this
sort($distance);
you only have one item in your array. Basically you are printing the values exactly in the same order they are coming from the db, before the distance calculation.
You could:
1) Calculate all the addresses and put them into an array
2) Sort the array
3) Print out the results
About the other challenge you mentioned: this is a bit more tricky, and I'm sure there are plenty of options. I would start by thinking about how many addresses you really need to compare with each other. Is it really infinite? :)
Is this inside one country or worldwide? Your db addresses most likely include the postal code. You can use this to narrow the search: take only the nearby postal codes and make the calculations only for those addresses.
But as a rule of thumb, we usually worry about performance too soon, before it's even a problem.
I think you got some nice answers to your first question.
Concerning the second problem, it depends a little on what your database looks like. Do you store just a string with an address? I assume that you use some geocoding service to convert the address to a (lat,lon) position and then calculate the distance, is this right?
If you do something like this, you could start saving the coordinates for each geocoded address in your database. That way you would geocode each address only once (you may want to update this information now and then, but that is another issue).
Once you have "address, lat, lon" in your table, you can use SQL to narrow down your search by imposing conditions on (lat,lon), or you may even make SQL do the whole job for you by defining a new column in the result set, for example distance = sqrt((lat-x)^2 + (lon-y)^2), where (x,y) is the point you start from (the point where the user is), and then returning the first N results sorted by distance. In MySQL the squares would be written with POW(), because ^ is the bitwise XOR operator there.
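The flat sqrt formula above is only an approximation on a sphere. Once the coordinates are stored, the distance can also be computed locally with the haversine formula, without calling a web service per row. This is a swapped-in technique, not what the question's calcdist() does; the function names, the array layout and the 6371 km mean earth radius are assumptions of this sketch.

```php
<?php
// Great-circle distance in kilometres between two (lat, lon) points in degrees.
function haversineKm(float $lat1, float $lon1, float $lat2, float $lon2): float
{
    $earthRadiusKm = 6371.0; // mean radius, an approximation
    $dLat = deg2rad($lat2 - $lat1);
    $dLon = deg2rad($lon2 - $lon1);
    $a = sin($dLat / 2) ** 2
        + cos(deg2rad($lat1)) * cos(deg2rad($lat2)) * sin($dLon / 2) ** 2;
    // min() guards against rounding pushing sqrt($a) slightly above 1
    return 2 * $earthRadiusKm * asin(min(1.0, sqrt($a)));
}

// Hypothetical helper: the $limit places closest to ($lat, $lon).
// Each place is an array with 'name', 'lat' and 'lon' keys.
function nearest(array $places, float $lat, float $lon, int $limit): array
{
    foreach ($places as &$place) {
        $place['distance'] = haversineKm($lat, $lon, $place['lat'], $place['lon']);
    }
    unset($place); // break the reference left over from the loop
    usort($places, function ($a, $b) {
        return $a['distance'] <=> $b['distance'];
    });
    return array_slice($places, 0, $limit);
}

$places = [
    ['name' => 'A', 'lat' => 0.0, 'lon' => 10.0],
    ['name' => 'B', 'lat' => 0.0, 'lon' => 1.0],
    ['name' => 'C', 'lat' => 0.0, 'lon' => 5.0],
];
$closest = nearest($places, 0.0, 0.0, 2);
print_r(array_column($closest, 'name'));
```

For paging, the limit and an offset can be applied to the sorted list, or a bounding box on lat and lon in the SQL query can cut down the rows before any distance is calculated.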
