Random unique MySQL id - PHP

I have a table and want to generate a random unique value for it using one MySQL query.
Table structure like:
id | unique_id
1 | 1
2 | 5
3 | 2
4 | 7
etc
unique_id is integer(10) unsigned
So I want to fill the unique_id field with a unique random value (not 1, 5, 2, 7 in my example) every time.
The algorithm is:
1. Get one random value that does not already exist in the unique_id field
2. Create a new row with unique_id = the result of query 1
I tried
SELECT FLOOR(RAND() * 9) AS random_number
FROM table
HAVING random_number NOT IN (SELECT unique_id FROM table)
LIMIT 1
but it generates non-unique values.
Note: the multiplier 9 is given just as an example; it is easy to reproduce the problem with such a small multiplier

One way to do this is to use the id column, treating it as a random permutation:
select id
from table
order by rand()
limit 1
(Your example only returns one value.)
To return a repeatable random number for each id, you can do something like:
select id,
(select count(*) from table t2 where rand(t2.id) < rand(t.id)) as randomnumber
from table t
What this is doing is producing a stable sort order by seeding the random number generator. This guarantees uniqueness, although this is not particularly efficient.
A more efficient alternative that uses variables is:
SELECT id, @curRow := @curRow + 1 AS random_number
FROM table CROSS JOIN (SELECT @curRow := 0) r
order by rand()
Note: this returns random numbers up to the size of the table, not necessarily from the ids. This may be a good thing.
Finally, you can make the approach you were attempting work with a bit of a trick: calculate an MD5 hash, then cast the first four characters to an integer and check it against the table:
SELECT convert(hex(left(md5(rand()), 4)), unsigned) AS random_number
FROM table
HAVING random_number NOT IN (SELECT unique_id FROM table)
LIMIT 1
You need to insert the value back into the table. And there is no guarantee that you will actually get a value not already in the table, but it should work for up to millions of values.
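A minimal sketch of that insert step, assuming the table is named t and that unique_id carries a UNIQUE index (both assumptions here); the unique index makes a racing duplicate fail loudly instead of slipping in silently:
ALTER TABLE t ADD UNIQUE KEY uq_unique_id (unique_id); -- index name is a placeholder
INSERT INTO t (unique_id)
SELECT CONVERT(HEX(LEFT(MD5(RAND()), 4)), UNSIGNED) AS random_number
FROM t
HAVING random_number NOT IN (SELECT unique_id FROM t)
LIMIT 1;
If the INSERT affects zero rows (the candidate collided and was filtered out by the HAVING clause) or fails with a duplicate-key error, just run it again.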

If you have the option to use an MD5 hash as a unique_id, go for MD5(NOW()). This will almost certainly generate a unique ID every time.
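For instance, a one-line sketch; the column unique_hash is hypothetical, since the question's unique_id is an unsigned integer, and note that two inserts in the same second would get the same NOW() and hence the same hash, so a unique index is still advisable:
INSERT INTO t (unique_hash) VALUES (MD5(NOW())); -- t and unique_hash are placeholder names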
Reference: MySQL Forums


Pagination Offset Issues - MySQL

I have an orders grid holding 1 million records. The page has pagination, sort, and search options. If the sort order is set by customer name with a search key and the page number is 1, it works fine.
SELECT * FROM orders WHERE customer_name like '%Henry%'
ORDER BY customer_name desc limit 10 offset 0
It becomes a problem when the User clicks on the last page.
SELECT * FROM orders WHERE customer_name like '%Henry%'
ORDER BY customer_name desc limit 10 offset 100000
The above query takes forever to load. Indexes are set on the order id, customer name, and date of order columns.
I could use this solution https://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/ if I didn't have non-primary-key sort options, but in my case the sorting is user-selected. It will change between order id, customer name, date of order, etc.
Any help would be appreciated. Thanks.
Problem 1:
LIKE "%..." -- The leading wildcard requires a full scan of the data, or at least until it finds the 100000+10 rows. Even
... WHERE ... LIKE '%qzx%' ... LIMIT 10
is problematic, since there are probably not 10 such names. So: a full scan of your million names.
... WHERE name LIKE 'James%' ...
will at least start in the middle of the table, if there is an index starting with name. But still, the LIMIT and OFFSET might conspire to require reading the rest of the table.
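If substring-style name search is a hard requirement, one hedged alternative is a FULLTEXT index, assuming the search terms are whole words and a MySQL version whose storage engine supports it; this avoids the full scan that a leading wildcard forces:
ALTER TABLE orders ADD FULLTEXT INDEX ft_customer_name (customer_name); -- index name is a placeholder
SELECT * FROM orders
WHERE MATCH(customer_name) AGAINST('Henry' IN NATURAL LANGUAGE MODE)
ORDER BY customer_name DESC LIMIT 10;
Note that MATCH finds whole words ('Henry'), not arbitrary substrings ('%enr%'), so it is not a drop-in replacement for LIKE.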
Problem 2: (before you edited your Question!)
If you leave out the WHERE, do you really expect the user to page through a million names looking for something?
This is a UI problem.
If you have a million rows, and the output is ordered by Customer_name, that makes it easy to see the Aarons and the Zywickis, but not anyone else. How would you get to me (James)? Either you have 100K links and I am somewhere near the middle, or the poor user would have to press [Next] 'forever'.
My point is that no amount of database efficiency will fix this UI problem.
In some other situations, it is meaningful to go to the [Next] (or [Prev]) page. In these situations, "remember where you left off", then use that to efficiently reach into the table. OFFSET is not efficient. More on Pagination
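A sketch of that "remember where you left off" idea, assuming an index on (customer_name, id) and that the application passes back the sort values of the last row it displayed (the literal values below are hypothetical):
SELECT * FROM orders
WHERE customer_name LIKE 'Henry%'
AND (customer_name, id) < ('Henry Smith', 12345) -- last row of the previous page
ORDER BY customer_name DESC, id DESC
LIMIT 10;
Because the WHERE clause can seek to the continuation point through the index instead of counting past OFFSET rows, later pages stay cheap.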
I use a special concept for this. First I have a table called pager. It contains a primary key pager_id and some values to identify the user (user_id, session_id), so that the pager data can't be stolen.
Then I have a second table called pager_filter. It consists of 3 ids:
pager_id int unsigned not NULL # id of table pager
order_id int unsigned not NULL # store the order here
reference_id int unsigned not NULL # reference into the data table
primary key(pager_id,order_id);
As a first operation, I select all records matching the filter rules and insert them into pager_filter:
DELETE FROM pager_filter WHERE pager_id = $PAGER_ID;
INSERT INTO pager_filter (pager_id,order_id,reference_id)
SELECT $PAGER_ID pager_id, ROW_NUMBER() OVER (ORDER BY $ORDERING) order_id, data_id reference_id
FROM data_table
WHERE $CONDITIONS
After filling the filter table you can use an inner join for pagination:
SELECT d.*
FROM pager_filter f
INNER JOIN data_table d ON d.data_id = f.reference_id
WHERE f.pager_id = $PAGER_ID AND f.order_id BETWEEN 100000 AND 100099
ORDER BY f.order_id
or
SELECT d.*
FROM pager_filter f
INNER JOIN data_table d ON d.data_id = f.reference_id
WHERE f.pager_id = $PAGER_ID
ORDER BY f.order_id
LIMIT 100 OFFSET 100000
Hint: all the code above is untested pseudocode

Efficiently get diff of large data set?

I need to be able to diff the results of two queries, showing the rows that are in the "old" set but aren't in the "new"... and then showing the rows that are in the "new" set but not the old.
Right now, I'm pulling the results into an array and then doing an array_diff(). But I'm hitting some resource and timing issues, as the sets are close to 1 million rows each.
The schema is the same in both result sets (barring the setId number and the table's autoincrement number), so I assume there's a good way to do it directly in MySQL... but I'm not finding how.
Example Table Schema:
rowId,setId,userId,name
Example Data:
1,1,user1,John
2,1,user2,Sally
3,1,user3,Tom
4,2,user1,John
5,2,user2,Thomas
6,2,user4,Frank
What I need to do is figure out the adds/deletes between setId 1 and setId 2.
So, the result of the diff should (for the example) show:
Rows that are in both setId1 and setId2
1,1,user1,John
Rows that are in setId 1 but not in setId2
2,1,user2,Sally
3,1,user3,Tom
Rows that are in setId 2 but not in setId1
5,2,user2,Thomas
6,2,user4,Frank
I think that's all the details, and I think I got the example correct. Any help would be appreciated. Solutions in MySQL or PHP are fine by me.
You can use exists or not exists to get rows that are in both sets or in only one. Writing t for the question's table:
Users in set 1 but not set 2 (just swap the set ids for the opposite):
select * from t s1
where s1.setId = 1
and not exists (
    select 1 from t s2
    where s2.setId = 2
    and s2.userId = s1.userId
)
Users that are in both sets:
select * from t s2
where s2.setId = 2
and exists (
    select 1 from t s1
    where s1.setId = 1
    and s1.userId = s2.userId
)
If you only want distinct users in both groups, then group by userId:
select min(rowId), userId from t
where setId in (1,2)
group by userId
having count(distinct setId) = 2
or, for users in one group but not the other:
select min(rowId), userId from t
where setId in (1,2)
group by userId
having count(case when setId <> 1 then 1 end) = 0
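Along the same lines, a sketch that classifies every user in a single pass (again writing t for the question's table):
select userId,
    case
        when count(distinct setId) = 2 then 'both'
        when max(setId) = 1 then 'only set 1'
        else 'only set 2'
    end as diff_status
from t
where setId in (1,2)
group by userId;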
What we ended up doing was adding a checksum column to the tables being diffed. That way, instead of having to compare multiple columns, the diff can be done against a single column (the checksum value).
The checksum value was a simple md5 hash of a serialized array that contained the columns to be diffed. So... it was like this in PHP:
$checksumString = serialize($arrayOfColumnValues);
$checksumValue = md5($checksumString);
That $checksumValue would then be inserted/updated into the tables, and then we could more easily do the joins/unions etc. on a single column to find the differences. It ended up looking something like this:
SELECT i.id, i.checksumvalue
FROM SAMPLE_TABLE_I i
WHERE i.checksumvalue not in(select checksumvalue from SAMPLE_TABLE_II)
UNION ALL
SELECT ii.id, ii.checksumvalue
FROM SAMPLE_TABLE_II ii
WHERE ii.checksumvalue not in(select checksumvalue from SAMPLE_TABLE_I);
This runs fast enough for my purposes, at least for now :-)
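If you are on MySQL 5.7 or later, a hedged alternative to computing the hash in PHP is a stored generated column, which the server keeps current by itself; the table and column names below are placeholders:
ALTER TABLE sample_table_i
    ADD COLUMN checksumvalue CHAR(32)
        AS (MD5(CONCAT_WS('|', userId, name))) STORED, -- recomputed automatically on every write
    ADD INDEX idx_checksum (checksumvalue);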

MYSQL: SELECT * FROM `user_log` WHERE `id` is sequential and `username` == EQUAL in multiple rows

I've searched high and low for an answer to this.
I have a database that collects data whenever a user logs onto our network.
Some users are complaining of disconnections, so I would like to crawl the database, and find any sections where a user is appearing in the database on 3 sequential rows.
Database Structure is:
ID USER
1 MIKE
2 JOHN
3 MIKE
4 MIKE
5 MIKE
6 JOHN
7 JOHN
8 MIKE
I would like the query to return the rows below (the user MIKE logged on with 3 sequential IDs):
ID USER
3 MIKE
4 MIKE
5 MIKE
I'm stumped as to how to even attack this.
I'm thinking something like:
SELECT * FROM `user_log` WHERE `id` IS sequential??? and `username` == ???
Possibly a sub-select ?
What you need to do is establish a grouping identifier for each consecutive sequence of users, and then use that as a temporary table to perform a query that groups on that new grouping identifier. From that, we just grab any group that has three or more rows, and can use the min/max values of the id to show your range. We need to use variables to accomplish this.
select min(id), max(id), user
from (
select if(@prev != user, if(@prev := user, @rc := @rc + 1, @rc := @rc + 1), @rc) g,
id, user
from user_log, (select @prev := -1, @rc := 0) q
order by id desc
) q
group by g
having count(g) >= 3;
This part: (select @prev := -1, @rc := 0) q initialises the variables for us so that we can do it in a single statement.
This alternative doesn't use variables. It creates two temporary tables, a and b, containing names and either the next id number (in table a) or the one after that (in table b), and then checks for each entry in the original table whether there is a corresponding entry in the two temporary tables with matching name.
SELECT user_log.username, user_log.id-2
FROM user_log,
(SELECT username, id, (id+1) as nxt FROM user_log) as a,
(SELECT username, id, (id+2) as nxtnxt FROM user_log) as b
WHERE user_log.id = a.nxt
and user_log.username = a.username
and user_log.id = b.nxtnxt
and user_log.username = b.username;
It returns name and the location (id) of the "event". It doesn't return the sequence as you requested, since that seems redundant to me. id-2 is used in the result because the structure natively returns the last id in the triplet, but the last or middle id might be just as useful depending on how you're going to use the result.
One of the things to watch out for is if you have four entries in a row with the same name, it will give you two results.
Anyone searching for longer sequences is better off using pala_'s variable method, but this method is also useful if you want to find other patterns. For example, if you wanted to find sequences like 'Mike', something, 'Mike', something, 'Mike', you could simply replace id+1 with id+2 and id+2 with id+4 in the subqueries.
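On MySQL 8+, a sketch of the same grouping idea without user variables, assuming id has no gaps: the difference id - ROW_NUMBER() stays constant within a consecutive run of rows for the same user.
SELECT MIN(id) AS first_id, MAX(id) AS last_id, username
FROM (
    SELECT id, username,
        id - ROW_NUMBER() OVER (PARTITION BY username ORDER BY id) AS grp
    FROM user_log
) t
GROUP BY username, grp
HAVING COUNT(*) >= 3;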

PHP select the biggest `id` value from MySQL table

I have a table with more than 300,000 rows and I need to select the highest value of the id column. Usually, I would do it like this:
SELECT id FROM my_table ORDER BY id DESC
... but this causes slow queries and I don't want to use it. Is there a different way to solve this? id is auto increment and the primary key.
Later edit: It seems my code is quite badly written, as I deduce from your comments. Below I posted a sample of the code I'm working with and the tables. Can you suggest a proper way to insert the last id+1 of table_x into two tables (including table_x itself)? I have to mention that the script will run more than once.
TABLE_X            TABLE_Y
------------       -----------
id_x | value       id_y | id_x
------------       -----------
1    | A           1    | 3
2    | B
3    | C
<?php
for ($i = 0; $i < 10; $i++) {
    $result_x = mysql_query('SELECT id_x FROM table_x ORDER BY id_x DESC');
    $row_x = mysql_fetch_array($result_x);
    $next = $row_x['id_x'] + 1;
    mysql_query('INSERT INTO table_x(id_x) VALUES("'.$next.'")');
    mysql_query('INSERT INTO table_y(id_x) VALUES("'.$next.'")');
}
?>
Slightly better:
SELECT id FROM my_table ORDER BY id DESC LIMIT 1
Significantly better:
SELECT MAX(id) FROM my_table
Here is the right code to use:
mysql_query('INSERT INTO table_x(id_x) VALUES(NULL)');
$id = mysql_insert_id();
mysql_query("INSERT INTO table_y(id_x) VALUES($id)");
Depending on the context, either
SELECT id FROM my_table ORDER BY id DESC LIMIT 1
or mysql_insert_id() in PHP or (SELECT LAST_INSERT_ID()) in MySQL.
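For the two-table insert in the question, a sketch of the same flow done purely on the SQL side, assuming id_x is AUTO_INCREMENT:
INSERT INTO table_x (value) VALUES ('D');             -- MySQL assigns id_x; 'D' is a made-up value
INSERT INTO table_y (id_x) VALUES (LAST_INSERT_ID()); -- reuse that id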
As others said, you should use the MAX operator:
SELECT MAX(id) FROM my_table
As a general rule of thumb, always reduce the amount of records returned from the database. The database always is faster than your application program when operating on result sets.
In case of slow queries, please give EXPLAIN a try:
EXPLAIN SELECT id FROM my_table ORDER BY id DESC
vs.
EXPLAIN SELECT MAX(id) FROM my_table
EXPLAIN asks MySQL's query optimizer how it sees the query. Look in the documentation to learn how to read its output.
PS: I really wonder why you need MAX(id). Even if your application gets the value back from the database, it is useless: another process might insert a new record in the very next CPU cycle, and your MAX(id) is no longer valid.
I guess it is slow because you retrieve all 300,000 rows. Add LIMIT 1 to the query:
SELECT id FROM my_table ORDER BY id DESC LIMIT 1
Or use the MAX() operator.

Returning random rows from mysql database without using rand()

I would like to be able to pull back 15 or so records from a database. I've seen that using WHERE id = rand() can cause performance issues as my database gets larger. All solutions I've seen are geared towards selecting a single random record. I would like to get multiples.
Does anyone know of an efficient way to do this for large databases?
Further edit and testing:
I made a fairly simple table in a new database using MyISAM. I gave it 3 fields: autokey (an unsigned auto-increment key), bigdata (a large blob), and somemore (a medium int). I then filled the table with random data and ran a series of queries using Navicat. Here are the results:
Query 1: select * from test order by rand() limit 15
Query 2:
select * from test
join (select round(rand()*(select max(autokey) from test)) as val from test limit 15) as rnd
on rnd.val = test.autokey;
(I tried both select and select distinct and it made no discernible difference)
and:
Query 3 (I only ran this on the second test):
SELECT *
FROM (
    SELECT @cnt := COUNT(*) + 1,
           @lim := 10
    FROM test
) vars
STRAIGHT_JOIN
(
    SELECT r.*,
           @lim := @lim - 1
    FROM test r
    WHERE (@cnt := @cnt - 1)
    AND RAND(20090301) < @lim / @cnt
) i
ROWS:        QUERY 1:   QUERY 2:   QUERY 3:
2,060,922    2.977s     0.002s     N/A
3,043,406    5.334s     0.001s     1.260s
I would like to do more rows so I can see how query 3 scales, but at the moment, it seems as though the clear winner is query 2.
Before I wrap up this testing and declare an answer, and while I have all this data and the test environment set up, can anyone recommend any further testing?
Try:
select * from table order by rand() limit 15
Another (and possibly more efficient) way would be to join against a set of random values. This should work if there's some contiguous integer key in the table. Here is how I would do it in Postgres (my MySQL is a bit rusty):
select * from table join
(select (random()*maxid)::integer as val from generate_series(1,15)) as rnd
on rnd.val=table.id;
where maxid is the highest id in the table. If id has an index, then this means only 15 index lookups, so it's very fast.
UPDATE:
Looks like there is no such thing as generate_series in MySQL. My fault. We don't actually need it:
select *
from
table
join
-- this just returns 15 random numbers.
-- I need `table` here only to produce rows for rand()
(select round(rand()*(select max(id) from table)) as val from table limit 15) as rnd
on
rnd.val=table.id;
P.S. If I don't want duplicates returned, I can use (select distinct [...]) in the random generator expression.
Update: Check out the accepted answer in this question. It's pure MySQL and even deals with even distribution.
The problem with id = rand() or anything comparable in PHP is that you can't be sure whether that particular ID still exists. Therefore, you need to work with LIMIT, and that can become slow for large amounts of data.
As an alternative to that, you could try using a loop in PHP.
What the loop does is
Create a random integer using rand(), in the range between 0 and the number of records in the database
Query the database whether a record with that ID exists
If it exists, add the number to an array
If it doesn't, go back to step 1
End the loop when the array of random numbers contains the desired number of elements
This method could cause a lot of queries on a fragmented table, but they should be pretty fast to execute. It may be faster than LIMIT with rand() in certain situations.
The LIMIT method, as outlined by @Luther, is certainly the simplest code-wise.
You could run a query for all the results (or however many you want), fetch them into an array $a with mysqli_fetch_all, and then:
shuffle($a);
$a = array_slice($a, 0, 15);
For a large dataset doing
select * from table order by rand() limit 15
can be quite time and memory consuming.
If your data records happen to be numbered, you can put an index on the numbering column and do:
set @r = floor(rand() * (select max(no) from table));
select * from table where no >= @r order by no limit 15;
Or even better, do the random number generation in your application and do:
select * from table where no >= $rand and no <= $rand+15
If your data doesn't change too often, it might be worth adding such a numbering column to make the selection efficient.
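A sketch of maintaining such a column, writing t for the data table; the column and index names are placeholders, and the renumbering would be rerun after batches of changes:
alter table t add column no int unsigned, add index idx_no (no);
set @n := 0;
update t set no = (@n := @n + 1) order by id; -- dense renumbering in id order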
Assuming MySQL supports nested queries and that operations on the primary key are fast, I'd try something like
select * from table where id in (select id from table order by rand() limit 15)
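One caveat: MySQL historically rejects LIMIT directly inside an IN (...) subquery (error 1235), so in practice the inner query has to be wrapped in a derived table, keeping the answer's placeholder table name:
select * from table where id in (
    select id from (select id from table order by rand() limit 15) x -- derived table x hides the LIMIT from IN
);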
