Selecting random rows from a table automatically - php

I'm working on a project that requires a back-end service. I am using MySQL and PHP scripts to handle communication with the server side. I would like to add a new feature to the back-end: the ability to automatically populate a table every day with 3 'lucky' members drawn from a table_members table. In other words, I would like MySQL to pick 3 random rows from one table and add these rows to another table (if that is possible). I understand that I could achieve this by manually running a RAND() query on that table, but doing that by hand every day would be painful!
Is there any way to achieve the above?
UPDATE:
Here is my solution on this after comments/suggestions from other users
CREATE EVENT `draw`
ON SCHEDULE EVERY 1 DAY STARTS '2013-02-13 10:00:00'
ON COMPLETION NOT PRESERVE ENABLE
DO
  INSERT INTO tbl_lucky(`field_1`)
  SELECT u_name
  FROM tbl_members
  ORDER BY RAND()
  LIMIT 3
I hope this is helpful to others as well.

You can use INSERT ... SELECT and select the 3 rows with ORDER BY RAND() and LIMIT 3.
For more information, see the MySQL documentation on the INSERT ... SELECT statement.
It's also possible to automate this daily job with MySQL Events (available since MySQL 5.1.6).
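One prerequisite worth noting: events only fire when the MySQL event scheduler is running. A quick check and runtime enable (the SET GLOBAL statement requires sufficient privileges):
SHOW VARIABLES LIKE 'event_scheduler';  -- is the scheduler running?
SET GLOBAL event_scheduler = ON;        -- turn it on at runtime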

Related

Execute a formula in mysql for millions of data

I have a users table which has millions of rows.
users has columns like id, completed_date, thirdparty_id, etc.
thirdparty_id is a new column I have to populate for all users.
The process to find thirdparty_id is:
for each user there is a corresponding orders table from which we have to fetch the latest order;
from that order we get one value, and based on it we can calculate a rate;
we then search for that rate in another table, thirdparty, from which we get the thirdparty_id.
I have done all of this for individual users and it works fine.
Now my question is: how do I execute this for millions of users?
I am using Laravel.
The process is: fetch all users having thirdparty_id null, then call the formula function to find the id and update.
But fetching all means millions of rows in a single query?
So if I am giving a limit, what is the maximum limit I can give?
What other options are there to execute this?
The queries I used:
select id as userid from users where thirdparty_id is null limit {some limit}
Then in a foreach over these:
select amount from orders where user_id = userid order by created desc limit 1
Some weird formula with the amount gives `rate`.
select id from thirdparty where start_rate <= rate and end_rate >= rate
update users set thirdparty_id = id where id = userid
You can use chunk: https://laravel.com/docs/9.x/eloquent#chunking-results
Something like this:
use Illuminate\Database\Eloquent\Collection;

public const CHUNK_SIZE = 5000;
...
// This loop updates the very column the query filters on (thirdparty_id),
// so use chunkById() instead of chunk() to avoid skipping rows as the
// result set shifts between chunk queries.
User::whereNull('thirdparty_id')->chunkById(self::CHUNK_SIZE, function (Collection $users) {
    // $users is a Collection of up to 5000 users; compute the rate
    // and mass-update them here.
});
For this type of problem, I would use a stored procedure to do the update instead of a language like PHP.
That way the migration stored procedure runs inside the MySQL server, and you throw away all the problems like PHP execution time, memory limits, Apache timeouts, etc.
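A rough sketch of such a stored procedure, assuming the rate formula can be expressed in SQL (the BETWEEN condition below is only a placeholder for the real formula; the table and column names are taken from the question):
DELIMITER //
CREATE PROCEDURE backfill_thirdparty_ids()
BEGIN
  -- Fill in thirdparty_id for every user that still lacks one,
  -- based on the amount of the user's most recent order.
  UPDATE users u
  JOIN (
    SELECT user_id, MAX(created) AS latest_created
    FROM orders
    GROUP BY user_id
  ) lastord ON lastord.user_id = u.id
  JOIN orders o
    ON o.user_id = lastord.user_id AND o.created = lastord.latest_created
  JOIN thirdparty t
    ON o.amount BETWEEN t.start_rate AND t.end_rate  -- placeholder formula
  SET u.thirdparty_id = t.id
  WHERE u.thirdparty_id IS NULL;
END //
DELIMITER ;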

Long polling with PHP and jQuery - issue with update and delete

I wrote a small script which uses the concept of long polling.
It works as follows:
jQuery sends a request with some parameters (say lastId) to PHP.
PHP gets the latest id from the database and compares it with lastId.
If lastId is smaller than the newly fetched id, the script ends and
echoes the new records.
In jQuery, I display this output.
I have taken care of all the security checks. The problem is that when a record is deleted or updated, there is no way to know it.
The closest solution I can come up with is to count the number of rows and match it against a saved row-count variable. But then, if I have 1000 records, I would have to echo out all 1000 records, which can be a big performance issue.
The CRUD functionality of this application is completely separate and runs on a different server, so I don't get to know which record was deleted.
I don't need any help coding-wise, but I am looking for suggestions to make this work for updates and deletes.
Please note, WebSockets (my favourite) and Node.js are not an option for me.
Instead of using a certain ID from your table, you could also check when the table itself was last modified.
SQL:
SELECT UPDATE_TIME
FROM information_schema.tables
WHERE TABLE_SCHEMA = 'yourdb'
AND TABLE_NAME = 'yourtable';
If successful, the statement should return something like
UPDATE_TIME
2014-04-02 11:12:15
Then use the resulting timestamp instead of the lastId. I am using a very similar technique to display and auto-refresh logs, and it works like a charm.
You have to adjust the statement to your needs and replace yourdb and yourtable with the values for your application. It also requires you to have access to information_schema.tables, so check if this is available, too. (Be aware that for InnoDB tables, UPDATE_TIME may be NULL or may not survive a server restart, depending on the MySQL version.)
Two alternative solutions:
If the solution described above is too imprecise for your purpose (it might lead to issues when the table is changed multiple times per second), you could combine that timestamp with your current lastId mechanism to cover new inserts.
Another way would be to implement a table in which the current state is logged; this is what your AJAX requests would check. Then create triggers on your data tables which update this state table, as sketched below.
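A minimal sketch of such a state table and one of the triggers, with hypothetical names (matching AFTER INSERT and AFTER DELETE triggers would be needed as well):
CREATE TABLE table_state (
  table_name  VARCHAR(64) PRIMARY KEY,
  last_change TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);
INSERT INTO table_state (table_name) VALUES ('yourtable');

CREATE TRIGGER yourtable_after_update
AFTER UPDATE ON yourtable
FOR EACH ROW
  UPDATE table_state SET last_change = NOW()
  WHERE table_name = 'yourtable';
The AJAX handler then only has to poll SELECT last_change FROM table_state WHERE table_name = 'yourtable'.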
You can get the highest ID by
SELECT id FROM table ORDER BY id DESC LIMIT 1
but in my opinion this is not reliable, because you can have IDs of 1, 2, 3 and 7, and then insert a new row having the ID 5.
Keep in mind: the highest ID is not necessarily the most recent row.
The current auto increment value can be obtained by
SELECT AUTO_INCREMENT FROM information_schema.tables
WHERE TABLE_SCHEMA = 'yourdb'
AND TABLE_NAME = 'yourtable';
Maybe a timestamp + microtime is an option for you?
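For example, if every row carried an indexed updated_at timestamp column (a schema assumption, not something given in the question), the polling script could compare against the most recent change instead of an id:
SELECT MAX(updated_at) AS last_change FROM yourtable;
The client then sends back last_change instead of lastId; a larger value on the server signals an insert or update (deletes would still need one of the other approaches, such as the trigger-based state table above).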

Server-side Pagination: total row count for expensive query?

I have a simple query using server-side pagination. The issue is that the WHERE clause makes a call to an expensive function, and the function's argument is the user input, e.g. what the user is searching for.
SELECT *
FROM (
    SELECT /*+ FIRST_ROWS(numberOfRows) */
        query.*,
        ROWNUM rn
    FROM (
        SELECT myColumns
        FROM myTable
        WHERE expensiveFunction(:userInput) = 1
        ORDER BY id ASC
    ) query
)
WHERE rn >= :startIndex
  AND ROWNUM <= :numberOfRows
This works and is quick, assuming numberOfRows is small. However, I would also like to have the total row count of the query. Depending on the user input and the database size, the count can take up to minutes. My current approach is to cache this value, but that still means the user needs to wait minutes to see the first result.
The results should be displayed in the jQuery DataTables plugin, which greatly helps with things like server-side paging. It does, however, require the server to return a value for the total number of records to display the paging controls correctly.
What would be the best approach? (Note: PHP)
I thought of returning the first page immediately with a fake (better: estimated) row count. After the page is loaded, an AJAX call to a method determines the total row count of the query (what happens if the user pages during that time?) and then updates the faked/estimated total row count.
However, I have no clue how to compute an estimate. I tried count(*) * 1000 with SAMPLE (0.1), but for whatever reason that actually takes longer than the full count query. Also, just returning a fake/random value seems a bit hacky; it would need to be bigger than one page size so that the "Next" button is enabled.
Other ideas?
One way to do it is, as I said in the comments, to use a "countless" approach. Modify the client-side script so that the Next button is always enabled, fetch rows until there are none left, and then disable the Next button. You can always add a notification message saying that there are no more rows, to make it more user friendly.
Considering that you are expecting a significant number of records, I doubt the user will paginate through all the results.
Another way is to schedule a cron job that does the counting of the records in the background and stores the result in a table called totals. The job's run interval should be set based on the frequency of the inserts/deletions.
Then, in the frontend, just use the count previously stored in totals. It should be a decent approximation of the real amount.
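A sketch of what the background job could run, assuming one cached count per search term (the totals table layout below is hypothetical; myTable and expensiveFunction are the names from the question):
CREATE TABLE totals (
  search_term VARCHAR2(200) PRIMARY KEY,
  total_rows  NUMBER NOT NULL,
  computed_at DATE DEFAULT SYSDATE
);

-- run periodically; upserts the cached count for one search term
MERGE INTO totals t
USING (
  SELECT :userInput AS search_term,
         COUNT(*)   AS total_rows
  FROM myTable
  WHERE expensiveFunction(:userInput) = 1
) src
ON (t.search_term = src.search_term)
WHEN MATCHED THEN
  UPDATE SET t.total_rows = src.total_rows, t.computed_at = SYSDATE
WHEN NOT MATCHED THEN
  INSERT (search_term, total_rows, computed_at)
  VALUES (src.search_term, src.total_rows, SYSDATE);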
It depends on your DB engine.
In MySQL, the solution looks like this:
mysql> SELECT SQL_CALC_FOUND_ROWS * FROM tbl_name
-> WHERE id > 100 LIMIT 10;
mysql> SELECT FOUND_ROWS();
Basically, you add another modifier to your SELECT (SQL_CALC_FOUND_ROWS) which tells MySQL to count the rows, while executing the query, as if the LIMIT clause were not present; FOUND_ROWS() then retrieves that number. (Note that SQL_CALC_FOUND_ROWS is deprecated as of MySQL 8.0.17; a separate COUNT(*) query is the recommended replacement.)
For Oracle, see this article:
How can I perform this query in oracle
Other DBMSs might have something similar, but I don't know.

The Matrix Part 4 and MySql dilemma

I want to do the following:
Basically I have the following design for an events table:
event:
id
code
date
When a new event is created, I want to do the following:
Check if there are any codes already available. A code is available if the date has already passed.
$code1 = select code from event where date_add(date, INTERVAL 7 day) < NOW() AND code NOT IN (select code from event where date_start > NOW()) limit 1
If a code is available, get that code and use that for the new event.
insert into event (code, date) VALUES($code1, NOW())
If a code is not available, then generate a new code.
The problem is I am afraid that when 2 events are created at the same time, they both get the same code. How can I prevent that?
The goal is to assign each event a code from 1-100. Because 1-100 is only 100 numbers, I need to recycle codes, which is why I check for old codes to assign to new events. I don't want to assign the same code to 2 different events.
You ought to lock the table while you work:
LOCK TABLE event WRITE;
SELECT MIN(code) FROM event WHERE date_add(date_end, INTERVAL 7 day) < NOW()
AND code NOT IN (SELECT code FROM event WHERE date_start > NOW());
...
INSERT INTO event ...
UNLOCK TABLES;
You might also keep a table with all active codes (in this case all numbers from 1 to 100). In this case you can do the INSERT with a single statement:
INSERT INTO event ( code, <other fields> )
SELECT MIN(codes.code) AS code, <other values> FROM codes
LEFT JOIN event ON ( codes.code = event.code
  AND ( event.date_end > DATE_ADD(NOW(), INTERVAL 7 DAY)
    OR event.date_start >= NOW() ) )
WHERE event.code IS NULL;
This selects all codes that are not used in "active" events, and inserts the smallest of them into event (add other fields as needed).
You could also employ a subSELECT ( SELECT DISTINCT code FROM event ) in place of the codes table, but in that case you would only select codes that have already been used at least once; any "new" codes would be ignored.
A side effect of the above logic is that larger codes get reused less often, i.e., if you have twenty active events, chances are that they're using codes from 1 to 20. If you instead want to recycle codes evenly, you can, for example, use SELECT ... ORDER BY RAND() LIMIT 1, as sketched below.
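A sketch of that variant, reusing the hypothetical codes table from above (the aggregate MIN() is dropped in favour of random ordering):
INSERT INTO event ( code, <other fields> )
SELECT codes.code, <other values> FROM codes
LEFT JOIN event ON ( codes.code = event.code
  AND ( event.date_end > DATE_ADD(NOW(), INTERVAL 7 DAY)
    OR event.date_start >= NOW() ) )
WHERE event.code IS NULL
ORDER BY RAND()
LIMIT 1;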
OK, this is a long shot, and I'm not a database person, so take my suggestion with a grain of salt and do your own research...
I think what you want is serializable transactions. Basically, you first ask MySQL to make transaction isolation serializable using SET TRANSACTION ISOLATION LEVEL SERIALIZABLE. Then, before your three steps, you begin a transaction by running START TRANSACTION, and after your three steps you run COMMIT.
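A minimal sketch of that flow, using the statements from the question (the chosen-code placeholder is hypothetical; the date/date_start columns follow the question's own queries):
-- applies to the next transaction started in this session
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;

-- step 1: look for a reusable code
SELECT code FROM event
WHERE DATE_ADD(date, INTERVAL 7 DAY) < NOW()
  AND code NOT IN (SELECT code FROM event WHERE date_start > NOW())
LIMIT 1;

-- steps 2-3: insert the new event with the code found above,
-- or with a freshly generated code if none was returned
INSERT INTO event (code, date) VALUES (/* chosen code */, NOW());

COMMIT;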
Some further reading:
http://en.wikipedia.org/wiki/Database_transaction
http://en.wikipedia.org/wiki/Serializability
http://dev.mysql.com/doc/refman/5.0/en/sql-syntax-transactions.html
I am not sure about the overhead serializability incurs. Hopefully someone more familiar with databases can chip in.
If you can work around your problem, I'd personally rather do that than use transactions.

Daily/Weekly/Monthly Highscores

I have an online highscores page made with PHP + MySQL, but it currently only shows the all-time highscores. I want to add daily/weekly/monthly views to it, and I was wondering what would be the best way to do that?
My current thought is to add 3 new tables, insert the data into each of them, and then have a cron job run at the appropriate times to delete the data from each of the tables.
Is there any better way I could do this?
Another thing: I want the page to work as highscores.php?t=all, t=daily, etc. How would I make the page change the query depending on that value?
Thanks.
Use one table and add a column with the date of the highscore. Then have three different queries for each timespan, e.g.
SELECT ... FROM highscores WHERE date > '2011-12-05';
If you want to have a generic version without the need to have a fixed date, use this one:
SELECT ...
FROM highscores
WHERE date >= curdate() - INTERVAL DAYOFWEEK(curdate())+6 DAY;
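On the ?t= parameter: the PHP side can simply map each allowed value of t to one of these WHERE clauses (whitelist the values rather than interpolating the raw parameter into the SQL). Sketches for the three timespans, assuming a date column as above and hypothetical player/score columns:
-- daily: today's scores
SELECT player, score FROM highscores
WHERE date >= CURDATE();

-- weekly: since the start of the current week (Monday)
SELECT player, score FROM highscores
WHERE date >= CURDATE() - INTERVAL WEEKDAY(CURDATE()) DAY;

-- monthly: since the first day of the current month
SELECT player, score FROM highscores
WHERE date >= DATE_FORMAT(CURDATE(), '%Y-%m-01');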
