MySQL - inserting 70000 random unique strings efficiently


I'm working on a project in which I need to generate at least 70000 codes, each containing 8 alphanumeric characters. The codes must be unique. Currently I am using PHP to generate these codes with the following function:
function random_unique_serial($length, PDO $conn) {
    $codeCheck = FALSE;
    while (!$codeCheck) {
        $characters = '0123456789abcdefghijklmnopqrstuvwxyz';
        $charactersLength = strlen($characters);
        $randomCode = '';
        for ($i = 0; $i < $length; $i++) {
            $randomCode .= $characters[rand(0, $charactersLength - 1)];
        }
        $sql = "SELECT * FROM codes WHERE code=:code";
        $st = $conn->prepare($sql);
        $st->bindValue(":code", $randomCode, PDO::PARAM_STR);
        $st->execute();
        $count = $st->rowCount();
        if ($count == 0) {
            $codeCheck = TRUE;
        } else {
            $codeCheck = FALSE;
        }
    }
    return $randomCode;
}
As you can see, this code checks the database for every single generated code to make sure it is not a duplicate. This should work in theory, but it is very slow and causes the request to time out. I tried increasing the execution time, but that didn't help either.
Then I decided to use a database-side approach and tried this solution:
Generating a random & unique 8 character string using MySQL
This is also very slow, and some of the generated codes are fewer than 8 characters long.
Could you please suggest a better solution?

Create your table structure:
CREATE TABLE t (code CHAR(8) CHARACTER SET ascii COLLATE ascii_general_ci NOT NULL UNIQUE);
Define a PHP function to generate a random string:
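function random_string(int $length = 8): string {
    // mcrypt_create_iv() was removed in PHP 7.2; random_bytes() is the replacement there
    return substr(bin2hex(mcrypt_create_iv((int) ceil($length / 2), MCRYPT_DEV_URANDOM)), 0, $length);
}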
Use PHP to build a multi-value INSERT statement, ram that into the database, count how many were inserted, and repeat until the required number are inserted:
function insert_records(\PDO $pdo, int $need = 70000): void {
    $remaining = $need;
    while ($remaining > 0) {
        // generate a multi-value INSERT holding as many codes as are still missing
        $values = [];
        for ($i = 0; $i < $remaining; $i++) {
            $values[] = sprintf("('%s')", random_string());
        }
        $sql = 'INSERT IGNORE INTO t VALUES ' . implode(',', $values);
        // pass to database and ask how many records were inserted;
        // rows skipped by IGNORE as duplicates are not counted
        $count = $pdo->exec($sql);
        if ($count === false) {
            throw new \RuntimeException('INSERT failed');
        }
        // adjust bookkeeping so we know how many we still need
        $remaining -= $count;
    }
}
On my machine (Amazon Linux c2.small), the run time for 70k records is about 2 seconds:
real 0m2.136s
user 0m1.256s
sys 0m0.212s
The relevant tricks in this code, to make it fast, are:
Sending the minimum number of SQL statements necessary to generate the needed number of records. Using a multi-value insert - INSERT INTO ... VALUES (), (), ... (); - really helps this as it minimizes the total amount of statement processing MySQL has to do and it tells us how many records were inserted without having to do another query.
Using INSERT IGNORE to avoid having to check for the existence of every single code we insert, which is really really expensive.
Using the fastest possible string generating function we can for our needs. In my experience, mcrypt_create_iv is a fast generator that is cryptographically secure, so it provides an ideal balance of security and performance.
Using the ASCII character set and fixed width CHAR to remove unnecessary byte overhead and UNIQUE to enforce de-duplication.
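The generator above yields hexadecimal codes (0-9, a-f). If the full 0-9/a-z alphabet from the question is needed, one possible variation is sketched below. It assumes PHP 7.0+ for random_int(); the names random_code() and unique_codes() are made up for this illustration and are not part of the answer. De-duplicating in a PHP array first only reduces retries; the UNIQUE index remains the real guarantee.

```php
<?php
// Hypothetical helper: one random code over the 36-character alphabet.
function random_code(int $length = 8): string {
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyz';
    $max = strlen($alphabet) - 1;
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() draws from a cryptographically secure source
        $code .= $alphabet[random_int(0, $max)];
    }
    return $code;
}

// Hypothetical helper: collect $count distinct codes, using array keys
// as a set so duplicates collapse automatically.
function unique_codes(int $count, int $length = 8): array {
    $seen = [];
    while (count($seen) < $count) {
        $seen[random_code($length)] = true;
    }
    // PHP turns all-digit keys such as "12345678" into integers,
    // so cast every key back to a string
    return array_map('strval', array_keys($seen));
}
```

The resulting array could then be fed into the multi-value INSERT IGNORE shown above in place of random_string().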

I'd do that with MySQL alone; a stored procedure will help, and you can still create and call it from PHP. The stored procedure inserts a substring of an MD5 hash created from rand(). The column receiving the string needs to be unique. Replace the table and column names in this part:
insert ignore into foo (`uniqueString`)
delimiter //
create procedure createRandomString (in num int)
begin
declare i int default 0;
while i < num do
insert ignore into foo (`uniqueString`) values (substr(md5(rand()), 1, 8));
set i = i + 1;
end while;
end //
delimiter ;
call createRandomString (70000);
I did a quick test, I got 69934 random unique strings inserted on a remote db (from the 70000 runs) within 10s 603ms. Running the same procedure with 80000 as parameter
call createRandomString(80000);
runs 12s 434ms for me, inserting 77354 rows - so you have at least 70000 in little time.
If you want to make sure that exactly as many rows are inserted as requested, use this (but remember to reset max_sp_recursion_depth to its previous value after calling the procedure; the default is 0):
delimiter //
create procedure createRandomString2 (in num int)
begin
declare i int default 0;
while i < num do
insert ignore into foo (uniqueString) values (substr(md5(rand()), 1, 8));
set i = i + 1;
end while;
if (select count(id) from foo) < num then
call createRandomString2(num - (select count(id) from foo));
END IF;
end //
delimiter ;
set max_sp_recursion_depth = 100;
call createRandomString2 (70000);
set max_sp_recursion_depth = 0;

Here's one idea...
Here I'm inserting (approx.) 16, unique, 3-character (0-9/a-z) strings...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table (my_string CHAR(3) NOT NULL PRIMARY KEY);
INSERT INTO my_table
SELECT CONCAT(SUBSTR('0123456789abcdefghijklmnopqrstuvwxyz',FLOOR(RAND()*36)+1,1)
,SUBSTR('0123456789abcdefghijklmnopqrstuvwxyz',FLOOR(RAND()*36)+1,1)
,SUBSTR('0123456789abcdefghijklmnopqrstuvwxyz',FLOOR(RAND()*36)+1,1)
) x;
-- Repeat this block as necessary
INSERT IGNORE INTO my_table
SELECT CONCAT(SUBSTR('0123456789abcdefghijklmnopqrstuvwxyz',FLOOR(RAND()*36)+1,1)
,SUBSTR('0123456789abcdefghijklmnopqrstuvwxyz',FLOOR(RAND()*36)+1,1)
,SUBSTR('0123456789abcdefghijklmnopqrstuvwxyz',FLOOR(RAND()*36)+1,1)
) x
FROM my_table;
-- End of block
SELECT * FROM my_table;
+-----------+
| my_string |
+-----------+
| 0he |
| 112 |
| 24c |
| 322 |
| 4b7 |
| 7vq |
| as7 |
| g7n |
| h66 |
| i54 |
| idd |
| m62 |
| mqt |
| obh |
| x75 |
| xz4 |
+-----------+

Eight-digit numbers are guaranteed unique: 00000000, 00000001, 00000002, ... If you don't want the codes to be so obvious, then select eight different sets of ten alphanumeric characters to replace the ten digits in a given position. There will still be a pattern, but it will be less obvious: ql4id78s, ql4id78k, ql4id783, ...
Beyond that, you could encrypt the original numbers, and the encryptions are guaranteed unique. A 32 bit block cypher will produce four byte results, giving eight hex characters.
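A minimal sketch of the per-position substitution idea, assuming a sequential counter below 100,000,000. The function name obfuscate_counter() and the eight character sets are made up for illustration; any eight sets of ten distinct characters would do.

```php
<?php
// Hypothetical illustration: map a sequential number to an 8-character code
// by replacing each decimal digit with a character from that position's set.
function obfuscate_counter(int $n, array $alphabets): string {
    $width = count($alphabets);
    if ($n < 0 || $n >= 10 ** $width) {
        throw new InvalidArgumentException('counter out of range');
    }
    // left-pad with zeros so every position has a digit
    $digits = str_pad((string) $n, $width, '0', STR_PAD_LEFT);
    $code = '';
    foreach (str_split($digits) as $pos => $digit) {
        $code .= $alphabets[$pos][(int) $digit];
    }
    return $code;
}

// Eight made-up sets of ten distinct characters, one per position.
$alphabets = [
    'q7m2xk9bd4',
    'l0ze5u8hrc',
    '4tn1gw6ysa',
    'i3vp9f0jo2',
    'dk58qb1mxe',
    '7ch4ur2zl6',
    '8ya3wn0tgs',
    'sk3a9fj1p5',
];

echo obfuscate_counter(0, $alphabets), "\n";
echo obfuscate_counter(1, $alphabets), "\n";
```

Because each position maps the ten digits to ten distinct characters, two different counters always differ in at least one position, so no uniqueness check against the database is needed.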

Related

Getting the missing IDs from a table by storing the records into an array and compare the set of numbers with a for loop

I'm currently working on a program that finds the missing IDs of a table. My idea is to store the IDs in an array and use a for loop to check whether each number exists in the array, classifying the numbers that are not found as missing IDs. I also use the PHP function in_array() to check whether a number exists in the array.
This is the code that I came up with, but it just ends up displaying the numbers from the for loop.
<?php
include 'dbconnect.inc'; // just the dbconnect for connecting to the database
$numbers = array(1, 2, 4, 6, 7, 9);
$arrlength = count($numbers);
$query = "SELECT id FROM existing";
$result = mysqli_query($conn, $query);
$existing = array();
while ($row = mysqli_fetch_assoc($result)) {
$existing[] = $row;
}
for ($i=0; $i<7358; $i++) {
if (in_array($i, $existing)) {
echo $i . " is a missing ID <br>";
} elseif(!in_array($i, $existing)) {
echo $i . " exists in the table <br>";
}
}
?>
I prefer this solution to using temporary tables in SQL, because those take longer to load the query, which would not be good for a web page.
Hope that you could help me. Thanks!

From this answer:
To get missing ranges:
SELECT a.id+1 AS 'Missing From', MIN(b.id)-1 AS 'Through'
FROM existing AS a
JOIN existing AS b ON a.id < b.id
GROUP BY a.id
HAVING a.id+1 < MIN(b.id)
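If the comparison has to stay in PHP, as the question prefers, the following is one possible sketch. It assumes the IDs have been collected as plain integers rather than whole rows; the helper name missing_ids() is hypothetical.

```php
<?php
// Hypothetical helper: returns the IDs in [$min, $max] that are absent from $existing.
function missing_ids(array $existing, int $min, int $max): array {
    // array_diff() keeps the keys of the first array, so reindex the result
    return array_values(array_diff(range($min, $max), $existing));
}

// With mysqli, collect the scalar id values rather than whole rows, e.g.
//   $existing[] = (int) $row['id'];
$existing = [1, 2, 4, 6, 7, 9];
print_r(missing_ids($existing, 1, 9));
```

This replaces one in_array() scan per number with a single set difference over the whole range.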

User variables are only evaluated when rows are sent, so a HAVING NOT (gap_from=0 AND gap_to=0) clause cannot be used as an optimization (see the user variables manual). Instead, the rows are "sent" to a temporary table, which saves transferring a large amount of data that is about to be discarded.
The temporary table uses its primary key to ensure there is only one (0,0) entry, which occurs when there is no gap. Subsequent (0,0) entries are ignored on insert, resulting in a minimal table of gaps.
The remainder of the table holds the gaps in the sequence:
create table existing (id int unsigned not null);
insert into existing values (3),(5),(6),(7),(8),(19),(20),(21),(30);
set @last=0;
CREATE TEMPORARY TABLE v (gap_from int unsigned, gap_to int unsigned, next int unsigned, PRIMARY KEY(gap_from, gap_to))
IGNORE SELECT IF(@last=id, 0, @last) as gap_from,
IF(@last=id, 0, id-1) as gap_to,
@last:=id+1 as next
FROM existing ORDER BY id;
select gap_from,gap_to from v where NOT (gap_from=0 AND gap_to=0);
gap_from | gap_to
-------: | -----:
0 | 2
4 | 4
9 | 18
22 | 29
If you don't want the first gap, the one between 0 and the first entry in the table:
select gap_from,gap_to from v where gap_from!=0

Sql correct filter for floats values in string column

I have a table invoices with a column 'total' of type varchar(255). It contains values like these: "500.00", "5'199.00", "129.60", "1.00" and others.
I need to select records filtered by the total column, for example, to find records where total is not more than 180.
I tried this:
SELECT total from invoices WHERE invoices.total <= '180'
But the result contains:
125.25
100.50
1593.55 - not correct
4'799.00 - not correct
1.00
-99.00
2406.52 -not correct
How can I fix it and write correct filter for this column? Thanks!

You can use the CAST() function to convert it to a decimal:
SELECT total from invoices WHERE cast(invoices.total as decimal(16,2)) <= 180

Why are you storing numbers as strings? That is a fundamental problem with your data model, and you should fix it.
Sometimes, we are stuck with other people's really, really, really bad decisions. If that is the case, you can attempt to solve this with explicit conversion:
SELECT i.total
FROM invoices i
WHERE CAST(REPLACE(i.total, '''', '') as DECIMAL(20, 4)) <= 180;
Note that this will return an error if you have other unexpected characters in your totals.
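The same normalisation can also be sketched on the application side, for example as part of a one-off migration to a numeric column. This is only an illustration under that assumption; parse_total() is a hypothetical helper, and the sample values are the ones from the question.

```php
<?php
// Hypothetical helper: turns strings such as "5'199.00" into floats,
// or returns null when something other than a plain number remains.
function parse_total(string $total): ?float {
    // strip the apostrophe used as a thousands separator
    $clean = str_replace("'", '', trim($total));
    return is_numeric($clean) ? (float) $clean : null;
}

$totals = ['500.00', "5'199.00", '129.60', '1.00', '-99.00'];

// keep only the totals that parse cleanly and are not more than 180
$small = array_filter($totals, function (string $t): bool {
    $value = parse_total($t);
    return $value !== null && $value <= 180;
});
```

Returning null for unexpected input makes the bad rows visible instead of silently treating them as zero.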

If the string starts with a number, then contains non-numeric characters, you can use the CAST() function or convert it to a numeric implicitly by adding a 0:
SELECT CAST('1234abc' AS UNSIGNED); -- 1234
SELECT '1234abc'+0; -- 1234
To extract numbers out of an arbitrary string you could add a custom function like this:
DELIMITER $$
CREATE FUNCTION `ExtractNumber`(in_string VARCHAR(50))
RETURNS INT
NO SQL
BEGIN
DECLARE ctrNumber VARCHAR(50);
DECLARE finNumber VARCHAR(50) DEFAULT '';
DECLARE sChar VARCHAR(1);
DECLARE inti INTEGER DEFAULT 1;
IF LENGTH(in_string) > 0 THEN
WHILE(inti <= LENGTH(in_string)) DO
SET sChar = SUBSTRING(in_string, inti, 1);
SET ctrNumber = FIND_IN_SET(sChar, '0,1,2,3,4,5,6,7,8,9');
IF ctrNumber > 0 THEN
SET finNumber = CONCAT(finNumber, sChar);
END IF;
SET inti = inti + 1;
END WHILE;
RETURN CAST(finNumber AS UNSIGNED);
ELSE
RETURN 0;
END IF;
END$$
DELIMITER ;
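Once the function is defined, you can use it in your query (note that it keeps only the digits, so the sign and the decimal point are dropped: '129.60' becomes 12960):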
SELECT total from invoices WHERE ExtractNumber(invoices.total) <= 180

Mysql auto increment issue

function randomUnique(){
return $randomString =rand(0, 9999); //generate random key
}
function insert($uid,$name,$email){
$link = mysqli_connect("localhost", "root", "", "dummy");
$query = "insert into `usertbl`(`uid`,`name`,`email`)
values('".$uid."','".$name."','".$email."');";
if(mysqli_query($link, $query)){
return $rval = 1;
}else if(mysqli_errno($link) == 1062){
insert(randomUnique(),$name,$email);
}else if(mysqli_errno($link != 1062)){
return $rval = 2;// unsuccessful query
}
}
$uid = randomUnique();
$name = "sam";
$email = "sam@domain.com";
$msg_code = insert ($uid,$name,$email);
echo $msg_code;
I have 4 columns in the table :
id(PK AI),uid(varchar unique),name(varchar),email(varchar).
When I want to create a new user entry, a random key is generated using the function randomUnique(). The id column is set to AI, so MySQL tries to insert the details, but if the key repeats, error number 1062 is returned by MySQL. Everything works well except for the id column, which is set to AI: its value is skipped once whenever a key is a duplicate.
The code above is a recursive function, so the number of values skipped in the id column is directly proportional to the number of times the function is called.
Example:
id | uid  | name | email
1  | 438  | dan  | dan@domail.com
2  | 3688 | nick | nick@domain.com
4  | 410  | sid  | sid@domain.com
Here we can see that number 3 has been skipped, because the random number function returned either 438 or 3688, which triggered the error; the recursive function repeated once, skipping number 3 and inserting 4 on the next successful execution.
I need to fix the auto increment so that it inserts values in the proper sequence.
I cannot change the structure of the table.

You can check whether an entry already exists with that uid before performing the INSERT operation, e.g.
SELECT COUNT(*) FROM table WHERE uid = '$uid';
This returns the number of records that already have the newly generated uid. You can check this count and perform the INSERT only if it is 0. If not, you can call the function again to generate another random value.

Every call to the function creates a new DB link; PHP provides mysqli_close($link) for this situation.
Either close the connection:
if (mysqli_query($link, $query)) {
    return $rval = 1;
} else if (mysqli_errno($link) == 1062) {
    mysqli_close($link);
    return insert(randomUnique(), $name, $email); // pass the result of the retry back up
} else {
    return $rval = 2; // unsuccessful query
}
OR simply put DB connection out of function
$link = mysqli_connect("localhost", "root", "", "dummy");
function insert($uid,$name,$email){
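Separately from the connection handling, the retry on error 1062 could be written as a bounded loop instead of recursion. The sketch below is an assumption-laden illustration, not the original code: insert_with_retry() is a hypothetical name, and $tryInsert stands in for the real mysqli call, returning 0 on success or the MySQL error number on failure. It does not by itself prevent gaps in the id column.

```php
<?php
// MySQL error number for a duplicate key (the 1062 from the question).
const ER_DUP_ENTRY = 1062;

// Hypothetical sketch: retries with a fresh uid while the insert fails
// with a duplicate-key error, giving up after $maxAttempts tries.
function insert_with_retry(callable $tryInsert, callable $generateUid, int $maxAttempts = 10): int {
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $errno = $tryInsert($generateUid());
        if ($errno === 0) {
            return 1; // inserted
        }
        if ($errno !== ER_DUP_ENTRY) {
            return 2; // unsuccessful query for another reason
        }
        // duplicate uid: loop and try a fresh one
    }
    return 2; // gave up after $maxAttempts duplicates
}
```

The return codes 1 and 2 mirror the ones used in the question, and the attempt limit avoids unbounded recursion when the small 0-9999 key space starts to fill up.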

Use PHP's uniqid function; it generates a proper unique ID.
http://php.net/manual/en/function.uniqid.php
What is this is being used for? You may be able to use the id column which will perform much faster and is already guaranteed to be unique.

create new unique ID in postgres

I have a table
id | entry | field | value
1 | 1 | name | Egon
2 | 1 | sname | Smith
3 | 1 | city | Los Angeles
4 | 2 | name | Stephe
5 | 2 | sname | Mueller
6 | 2 | city | New York
where id is the PK and auto-increments. Lines with the same entry belong together; they form a kind of dataset. I want to add new lines for a new entry (which should have the value 3).
My question is: how can I obtain the next value for entry in a way that guarantees it is unique?
At the moment I'm using something like
SELECT MAX(entry)+1
but when two queries are made at the same time, I'll get entry = "3" for both of them. I'm coding in PHP.

Use the sequence that you already got as a unique identifier at INSERT (don't leave this task to PHP):
INSERT INTO "my_table" ("entry", "field", "value") VALUES (currval(('"my_current_id_sequence"'::text)::regclass), 'any_field_info', 'any_value_info');
This way Postgres will use the right number every time.
Another approach is to create a sequence and use it on INSERT as well: just use nextval(('"my_new_sequence"'::text)::regclass) as the value of entry. You can even use it as a default value.
If you need to know which value was used, just add RETURNING entry to your INSERT:
CREATE SEQUENCE alt_serial START 101;
ALTER TABLE "my_table"
ALTER COLUMN "entry" SET DEFAULT nextval(('"alt_serial"'::text)::regclass);
INSERT INTO "my_table" ("entry", "field", "value") VALUES (DEFAULT, 'any_field_info', 'any_value_info') RETURNING entry;

You can use a loop that builds one row per field. If you want to add more lines with the same entry, pass more arguments to it in one call.
class foo {
    protected $foo = [];
    // accepts one or more field names that share a single entry
    public function set_foo(...$arguments) {
        $this->foo = $arguments;
    }
    // $entry: last entry from the db (number) + 1 - your logic here
    public function insert_rows($entry, $some_value) {
        $query = 'INSERT INTO table (entry, field, value) VALUES ';
        foreach ($this->foo as $i => $field) {
            if ($i > 0) { $query .= ','; } // adds the comma from the second row on
            $query .= "(" . (int) $entry . ",'" . $field . "','" . $some_value . "')";
        }
        return $query;
    } // end of function insert_rows()
} // end of class foo
// now all you need to do is set the foo
$foo = new foo;
$foo->set_foo('lalala'); // sets one argument with new entry
$foo->set_foo('jajaja'); // sets another argument with new entry
$foo->set_foo('lala', 'jaja'); // sets two arguments with the same entry

You need to:
create a sequence
CREATE SEQUENCE table_entry_seq;
Then, when you need to add a row with a new entry, get the next value of the sequence:
SELECT nextval(('"table_entry_seq"'::text)::regclass)

Value in Column

I am developing a small hobby application. Though I've worked with MySQL and PostgreSQL before, I'm more of a n00b here and would appreciate any help.
I have a table in my MySQL database called "TECH". This table has two columns: "ID" (primary key) and "name" (name of the tech - not a key of any sort). Here are a couple of example rows:
+----+--------+
| ID | name   |
+----+--------+
| 1  | Python |
| 2  | ASP    |
| 3  | java   |
+----+--------+
Here is the code that creates TECH:
CREATE TABLE TECH (
id INT(5) ,
name VARCHAR(20),
PRIMARY KEY (id)
);
I have developed an html form for the user to input a new technology into TECH. However, I would like to ensure that duplicate entries do not exist in TECH. For example, the user should not be allowed to enter "Python" to be assigned ID 4. Further, the user should also not be allowed to enter "pYthon" (or any variant of capitalization) at another ID.
Currently, I have the following code that does this (on the PHP side, not the MySQL side):
// I discovered that MySQL is not case sensitive with TECH.name
$rows = 0;
$result = mysql_query("SELECT * FROM tech AS T WHERE T.name='python'");
while ($row = mysql_fetch_array($result)) {
$rows += 1;
}
if ($rows != 0) {
echo "'python' cannot be inserted as it already exists";
} else {
// insertion code
}
Now, I know that the correct way to do this would be to constrain TECH.name to be UNIQUE by doing UNIQUE (name) and catching an "insert error" on the PHP side.
However, I have the following two questions regarding this process:
Does defining the UNIQUE constraint maintain the apparent case-insensitivity addressed above?
How do I go about catching exactly such an insert error on the PHP side?
I'd appreciate any help with this or any better ideas that anyone has.

When you manipulate MySQL from PHP (i.e. by doing an INSERT or UPDATE), you can call mysql_affected_rows, which returns the number of rows affected. If the query failed due to the UNIQUE constraint, no row is reported as inserted (the function returns -1 for a failed query).
http://php.net/manual/en/function.mysql-affected-rows.php
I usually check the number of rows returned from that function. The same check can be applied if you take the INSERT IGNORE approach, where a skipped duplicate gives 0 affected rows.

TRY
INSERT IGNORE INTO mytable
(primaryKey, field1, field2)
VALUES
('abc', 1, 2),
('def', 3, 4),
('ghi', 5, 6);
Duplicate rows will be ignored.

Changing the collation of the field to a _ci or _cs collation determines whether a unique key is case-insensitive or case-sensitive.
As for catching the error, you should try using mysqli or PDO to run db queries: http://www.php.net/manual/en/pdo.exec.php
You can catch a duplicate error entry with PDO like so:
try
{
$dbh->exec($mySqlQuery);
// insert was successful...
} catch (PDOException $e) {
if ($e->errorInfo[1]==1062) {
// a 'duplicate' error occurred...
} else {
// a non 'duplicate error' occurred...
}
}
Edit:
If you're not using PDO, this should work after your mysql_query:
if (mysql_errno() == 1062)
{
// you have a duplicate error...
}
