I have table users which has millions of data
users have colums like id, competed_date, thirdparty_id etc etc
thirdparty_id is new column i have to update for all users
proces to find thirdparty_id
for each user there is corresponding order table from which we have to fetch latest order
from which we will get one value based on it we can calculate some rate
that rate we can search in another table thirdpary from there we will get thirdparty_id
i have did everything for individual users its working fine
now my question is how to execute this for millions of users?
i am using laravel
the process is fetch all users having thirdparty_id null
and call formula function to find id and update
but fetch all means millions of data in single query?
so if i am giving limit whats maximum limit i can give?
what are another options to execute?
queries i used
select id as userid from users where thirdparty_id is null limit {some limit}
in foreach of this
select amount from orders wehere user_id =userid order by created desc limit 1
some wiered formula with amount will give `rate`
select id from third party where start_rate > rate and end_rate<rate
update users set thirdparty_id=id where id=userid
You can use chunk: https://laravel.com/docs/9.x/eloquent#chunking-results
something like this:
public const CHUNK_SIZE = 5000;
...
User::whereNull('thirdparty_id')->chunk(self::CHUNK_SIZE, function (Collection $users) {
// $users is collection contains 5000 users, you can do mass update them or something you want
});
For this type of problematic, i will use a store procedure to do the update instead of using language like PHP.
Doing this the migration store proc run inside MySQL Server and you throw away all problems like PHP execution time, memory limits, apache timeout, etc.
Related
I have a database with 1 million entries and 3 programs that process something with the data. The 3 programs get the data over an api call. For example 100 entries for each request. What is the best way to prevent that the programs get the same 100 entries?
I tried to update an id per program to database, but that don't solve the problem. Because if the 3 programs request data and update from one is still running it can be that other program get same data.
I have tried LOCK TABLES but it is the maintable in my databse. So all other processes from php and so also slow down extremly. Because table is totaly locked every few minutes.
How about an app_id column.
App 1 does this...
UPDATE table SET app_id=3 where app_id=NULL LIMIT 100
SELECT * FROM table WHERE app_id=1 LIMIT 100
----process and when done----
UPDATE table SET app_id=NULL where app_id=1 LIMIT 100
App 2 does this...
UPDATE table SET app_id=2 where app_id=NULL LIMIT 100
SELECT * FROM table WHERE app_id=2 LIMIT 100
----process and when done----
UPDATE table SET app_id=NULL where app_id=2 LIMIT 100
You can have unlimited apps and they should only get their own records. This kinda hits your DB harder. Maybe you could use a combo of memcache/stored procedures to limit db load depending on your architecture.
Another option might be able to handle this with the API. You could create a system wide global variable and store what app has what records. Anytime the API is called, it will look at the system wide variable to know if it can obtain data or from what record to start with. That might be easier on the DB.
I'm working on a project that requires back-end service. I am using MySQL and php scripts to achieve communication with server side. I would like to add a new feature on the back-end and that is the ability to generate automatically a table with 3 'lucky' members from a table_members every day. In other words, I would like MySQL to pick 3 random rows from a table and add these rows to another table (if is possible). I understand that, I can achieve this if manually call RAND() function on that table but ... will be painful!
There is any way to achieve the above?
UPDATE:
Here is my solution on this after comments/suggestions from other users
CREATE EVENT `draw` ON SCHEDULE EVERY 1 DAY STARTS '2013-02-13 10:00:00' ON COMPLETION NOT PRESERVE ENABLE DO
INSERT INTO tbl_lucky(`field_1`)
SELECT u_name
FROM tbl_members
ORDER BY RAND()
LIMIT 3
I hope this is helpful and to others.
You can use the INSERT ... SELECT and select 3 rows ORDER BY RAND() with LIMIT 3
For more information about the INSERT ... SELECT statement - see
It's also possible to automate this every day job with MySQL Events(available since 5.1.6)
I have a web page where people are able to post a single number between 0 and 10.
There is like a lotto single number generation once daily. I want my PHP script to check on the the posted numbers of all the users and assign a score of +1 or -1 to the relative winners (or losers).
The problem is that once I query the DB for the list of the winning users, I want to update their "score" field (in "users" table). I was thinking of a loop like this (pseudocode)
foreach winner{
update score +1
}
but this would mean that if there are 100 winners, then there will be 100 queries. Is there a way to do some sort of batch inserting with one single query?
Thanks in advance.
I'll assume you are using a database, with sql, and suggest that would probably want to do something like
UPDATE `table` SET `score`=`score`+1 WHERE `number`=3;
and the corresponding -1 for losers (strange, can't see a reason to -1 them).
Without more details though, I can't be of further help.
You didn't specify how the numbers were stored. If there is a huge number of people posting, a good option is to use a database to store their numbers.
You can have for example a table called lotto with three fields: posted_number, score and email. Create an (non-unique!) index on the posted_number field.
create table lotto (posted_number integer(1) unsigned, score integer, email varchar(255), index(posted_number));
To update their score you can execute two queries:
update lotto set score = score+1 where posted_number = <randomly drawn number here>
update lotto set score = score-1 where posted_number = <randomly drawn number here>
Let's just assume we have a datatable named posts and users.
Obviously, users contain the data of the gambler (with a convenient id field and points for the number of points they have), and posts contain the post_id ID field for the row, user_id, which is the ID of the user and value, the posted number itself.
Now you only need to implement the following SQL queries into your script:
UPDATE users INNER JOIN posts ON users.id = posts.user_id SET users.points = (users.points + 1)
WHERE posts.value = 0;
Where 0 at the end is to be replaced with the randomly drawn number.
What will this query do? With the INNER JOIN construct, it will create a link between the two tables. Automatically, if posts.value matches our number, it will link posts.user_id to users.id, knowing which user has to get his/her points modified. If someone gambled 0, and his ID (posts.user_id) is 8170, the points field will update for the user having user.id = 8170.
If you alter the query to make it (users.points - 1) and WHERE posts.value != 0, you will get the non-winners having one point deducted. It can be tweaked as much as you want.
Just be careful! After each daily draw, the posts table needs to be truncated or archived.
Another option would be storing the timestamp (time() in PHP) of the user betting the number, and when executing, checking against the stored timestamp... whether it is in between the beginning and the end of the current day or not.
Just a tip: you can use graphical database software (like Microsoft Access or LibreOffice Base) to have your JOINs and such simulated on a graphical display. It makes modelling such questions a lot easier for beginners. If you don't want desktop-installed software, trying out an installation of phpMyAdmin is another solution too.
Edit:
Non-relational databases
If you are to use non-relational databases, you will first need to fetch all the winner IDs with:
SELECT user_id FROM posts WHERE value=0;
This will give you a result of multiple rows. Now, you will need to go through this result, one-by-one, and executing the following query:
UPDATE users SET points=(users.points + 1) WHERE id=1;
(0 is the drawn winning number, 1 is the concurrent id of the user to update.)
Without using the relation capabilities of MySQL, but using a MySQL database, the script would look like this:
<?php
$number = 0; // This is the winning number we have drawn
$result = mysql_query("SELECT user_id FROM posts WHERE number=" .$number);
while ( $row = mysql_fetch_assoc($result) )
{
$curpoints_result = mysql_query("SELECT points FROM users WHERE user_id=" .$row['user_id']);
$current_points = mysql_fetch_assoc($curpoints_results);
mysql_query("UPDATE users SET points=" .($current_points['points'] + 1). " WHERE user_id=" .$row['user_id']);
}
?>
The while construct make this loop to run until every row of the result (list of winners) is updated.
Oh and: I know MySQL is a relational database, but it is just what it is: an example.
What I try to do is that I have a table to keep user information (one row for each user), and I run a php script daily to fill in information I get from users. For one column say column A, if I find information I'll fill it in, otherwise I don't touch it so it remains NULL. The reason is to allow them to be updated in the next update when the information might possibly be available.
The problem is that I have too many rows to update, if I blindly SELECT all rows that's with column A as NULL then the result won't fit into memory. If I SELECT 5000 at a time, then in the next SELECT 5000 I could get the same rows that didn't get updated last time, which would be an infinite loop...
Does anyone have any idea of how to do this? I don't have ID columns so I can't just say SELECT WHERE ID > X... Is there a solution (either on the MySQL side or on the php side) without modifying the table?
You'll want to use the LIMIT and OFFSET keywords.
SELECT [stuff] LIMIT 5000 OFFSET 5000;
LIMIT indicates the number of rows to return, and OFFSET indicates how far along the table is read from.
I have a website that has user ranking as a central part, but the user count has grown to over 50,000 and it is putting a strain on the server to loop through all of those to update the rank every 5 minutes. Is there a better method that can be used to easily update the ranks at least every 5 minutes? It doesn't have to be with php, it could be something that is run like a perl script or something if something like that would be able to do the job better (though I'm not sure why that would be, just leaving my options open here).
This is what I currently do to update ranks:
$get_users = mysql_query("SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC");
$i=0;
while ($a = mysql_fetch_array($get_users)) {
$i++;
mysql_query("UPDATE users SET month_rank = '$i' WHERE id = '$a[id]'");
}
UPDATE (solution):
Here is the solution code, which takes less than 1/2 of a second to execute and update all 50,000 rows (make rank the primary key as suggested by Tom Haigh).
mysql_query("TRUNCATE TABLE userRanks");
mysql_query("INSERT INTO userRanks (userid) SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC");
mysql_query("UPDATE users, userRanks SET users.month_rank = userRanks.rank WHERE users.id = userRanks.id");
Make userRanks.rank an autoincrementing primary key. If you then insert userids into userRanks in descending rank order it will increment the rank column on every row. This should be extremely fast.
TRUNCATE TABLE userRanks;
INSERT INTO userRanks (userid) SELECT id FROM users WHERE status = '1' ORDER BY month_score DESC;
UPDATE users, userRanks SET users.month_rank = userRanks.rank WHERE users.id = userRanks.id;
My first question would be: why are you doing this polling-type operation every five minutes?
Surely rank changes will be in response to some event and you can localize the changes to a few rows in the database at the time when that event occurs. I'm pretty certain the entire user base of 50,000 doesn't change rankings every five minutes.
I'm assuming the "status = '1'" indicates that a user's rank has changed so, rather than setting this when the user triggers a rank change, why don't you calculate the rank at that time?
That would seem to be a better solution as the cost of re-ranking would be amortized over all the operations.
Now I may have misunderstood what you meant by ranking in which case feel free to set me straight.
A simple alternative for bulk update might be something like:
set #rnk = 0;
update users
set month_rank = (#rnk := #rnk + 1)
order by month_score DESC
This code uses a local variable (#rnk) that is incremented on each update. Because the update is done over the ordered list of rows, the month_rank column will be set to the incremented value for each row.
Updating the users table row by row will be a time consuming task. It would be better if you could re-organise your query so that row by row updates are not required.
I'm not 100% sure of the syntax (as I've never used MySQL before) but here's a sample of the syntax used in MS SQL Server 2000
DECLARE #tmp TABLE
(
[MonthRank] [INT] NOT NULL,
[UserId] [INT] NOT NULL,
)
INSERT INTO #tmp ([UserId])
SELECT [id]
FROM [users]
WHERE [status] = '1'
ORDER BY [month_score] DESC
UPDATE users
SET month_rank = [tmp].[MonthRank]
FROM #tmp AS [tmp], [users]
WHERE [users].[Id] = [tmp].[UserId]
In MS SQL Server 2005/2008 you would probably use a CTE.
Any time you have a loop of any significant size that executes queries inside, you've got a very likely antipattern. We could look at the schema and processing requirement with more info, and see if we can do the whole job without a loop.
How much time does it spend calculating the scores, compared with assigning the rankings?
Your problem can be handled in a number of ways. Honestly more details from your server may point you in a totally different direction. But doing it that way you are causing 50,000 little locks on a heavily read table. You might get better performance with a staging table and then some sort of transition. Inserts into a table no one is reading from are probably going to be better.
Consider
mysql_query("delete from month_rank_staging;");
while(bla){
mysql_query("insert into month_rank_staging values ('$id', '$i');");
}
mysql_query("update month_rank_staging src, users set users.month_rank=src.month_rank where src.id=users.id;");
That'll cause one (bigger) lock on the table, but might improve your situation. But again, that may be way off base depending on the true source of your performance problem. You should probably look deeper at your logs, mysql config, database connections, etc.
Possibly you could use shards by time or other category. But read this carefully before...
You can split up the rank processing and the updating execution. So, run through all the data and process the query. Add each update statement to a cache. When the processing is complete, run the updates. You should have the WHERE portion of the UPDATE reference a primary key set to auto_increment, as mentioned in other posts. This will prevent the updates from interfering with the performance of the processing. It will also prevent users later in the processing queue from wrongfully taking advantage of the values from the users who were processed before them (if one user's rank affects that of another). It also prevents the database from clearing out its table caches from the SELECTS your processing code does.