How to check/update columns in a database contains about million rows?

How to check/update columns in a database contains about million rows? - php

I have MYSQL Database that contains about one Million (1,000,000) rows , I want to check all the rows and update some according to a condition , So for example I run a SQL statement like this:
select messageid from messages where messageid !=""
Then I fetch all the IDs and store them in a variable:
$existMessages;
Then I generate a 4 characters string 0-9a-z :
function generateRandomString($length = 4) {
return substr(str_shuffle(str_repeat($y='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', ceil($length/strlen($y)) )),1,$length);
}
Then I update the existing IDs with the generated strings , After checking that the generated IDs are unique.
This process becomes slower and takes the whole CPU as the rows are increasing.
Is there is a better way to do that ? Like using SQL statements directly in MYSQL? Or what to do ?

You should first put a unique or primary constraint on the column you want to be unique.
After that you can execute the update command
UPDATE TABLE_NAME SET COLMUN_NAME=generateRandomString() WHERE messageid !="";

You could use SQL directly, and create a simple hash based upon some data in the row, for instance:
UPDATE table_name SET messageid = MD5(messageid) WHERE messageid !="";
You might want to do that in batches, so add a LIMIT to the statement, ie. LIMIT 0,1000 to do bunch at a time.

You can use below update clause. This will pick 4 random letters and numbers for both lower and upper case and 0-9.
update table_name cross join (select
#chars:='1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIKLMNOPQRSTUVWXYZ') tab
set messageid = concat(substring(#chars, floor(rand()*61) + 1, 1),
substring(#chars, floor(rand()*61) + 1, 1),
substring(#chars, floor(rand()*61) + 1, 1),
substring(#chars, floor(rand()*61) + 1, 1)
)
where messageid !="";

I'll eliminate the need for the 'check', and make it faster.
There are not many more than a million such 4-character strings. So, there will be an annoying number of duplicates if you 'randomly' generate them.
Instead, I recommend
Generate all of them (or a million of them)
Shuffle them
Apply them to your table.
CONV(x, 10, 36) will generate a base-36 (0-9A-Z) value from x. But, The following may be better...
Build a table with 36 rows of 0..9,a..z.
CROSS JOIN it to itself 4 times to generate all 36^4 combos.
ORDER BY RAND() will shuffle them without leading to dups.
A multi-table UPDATE will let you copy the 4-char string from one table to another.

Related

Update Current Row in MySQL Loop

I have a MySQL table with over 16 million rows and there is no primary key. Whenever I try to add one, my connection crashes. I have tried adding one as an auto increment in PHPMyAdmin and in shell but the connection is always lost after about 10 minutes.
What I would like to do is loop through the table's rows in PHP so I can limit the number of results and with each returned row add an auto-incremented ID number. Since the number of impacted rows would be reduced by reducing the load on the MySQL query, I won't lose my connection.
I want to do something like
SELECT * FROM MYTABLE LIMIT 1000001, 2000000;
Then, in the loop, update the current row
UPDATE (current row) SET ID='$i++'
How do I do this?
Note: the original data was given to me as a txt file. I don't know if there are duplicates but I cannot eliminate any rows. Also, no rows will be added. This table is going to be used only for querying purposes. When I have added indexes, however, there were no problems.

I suspect you are trying to use phpmyadmin to add the index. As handy as it is, it is a PHP script and is limited to the same resources as any PHP script on your server, typically 30-60 seconds run time, and a limited amount of ram.
Suggest you get the mysql query you need to add the index, then use SSH to shell in, and use command line MySQL to add your indexes.

If you don't have duplicate rows then the following way might shed some light:
Suppose you want to update the auto incremented value for first 10000 rows.
UPDATE
MYTABLE
INNER JOIN
(SELECT
*,
#rn := #rn + 1 AS row_number
FROM MYTABLE,(SELECT #rn := 0) var
ORDER BY SOME_OF_YOUR_FIELD
LIMIT 0,10000 ) t
ON t.field1 = MYTABLE.field1 AND t.field2 = MYTABLE.field2 AND .... t.fieldN = MYTABLE.fieldN
SET MYTABLE.ID = t.row_number;
For next 10000 rows just need to change two things:
(SELECT #rn := 10000) var
LIMIT 10000,10000
Repeat..
Note: ORDER BY SOME_OF_YOUR_FIELD is important otherwise you would get results in random order. Better create a function which might take limit,offset as parameter and do this job. Since you need to repeat the process.
Explanation:
The idea is to create a temporary table(t) having N number of rows and assigning a unique row number to each of the row. Later make an inner join between your main table MYTABLE and this temporary table t ON matching all the fields and then update the ID field of the corresponding row(in MYTABLE) with the incremented value(in this case row_number).
Another IDEA:
You may use multithreading in PHP to do this job.
Create N threads.
Assign each thread a non overlapping region (1 to 10000, 10001 to
20000 etc) like the above query.
Caution: The query will get slower in higher offset.

Repeated Insert copies on ID

We have records with a count field on an unique id.
The columns are:
mainId = unique
mainIdCount = 1320 (this 'views' field gets a + 1 when the page is visited)
How can you insert all these mainIdCount's as seperate records in another table IN ANOTHER DBASE in one query?
Yes, I do mean 1320 times an insert with the same mainId! :-)
We actually have records that go over 10,000 times an id. It just has to be like this.
This is a weird one, but we do need the copies of all these (just) counts like this.

The most straightforward way to this is with a JOIN operation between your table, and another row source that provides a set of integers. We'd match each row from our original table to as many rows from the set of integer as needed to satisfy the desired result.
As a brief example of the pattern:
INSERT INTO newtable (mainId,n)
SELECT t.mainId
, r.n
FROM mytable t
JOIN ( SELECT 1 AS n
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
) r
WHERE r.n <= t.mainIdCount
If mytable contains row mainId=5 mainIdCount=4, we'd get back rows (5,1),(5,2),(5,3),(5,4)
Obviously, the rowsource r needs to be of sufficient size. The inline view I've demonstrated here would return a maximum of five rows. For larger sets, it would be beneficial to use a table rather than an inline view.
This leads to the followup question, "How do I generate a set of integers in MySQL",
e.g. Generating a range of numbers in MySQL
And getting that done is a bit tedious. We're looking forward to an eventual feature in MySQL that will make it much easier to return a bounded set of integer values; until then, having a pre-populated table is the most efficient approach.

mysql order by rand() performance issue and solution

i was using order by rand() to generate random rows from database without any issue but i reaalised that as the database size increase this rand() causes heavy load on server so i was looking for an alternative and i tried by generating one random number using php rand() function and put that as id in mysql query and it was very very fast since mysql was knowing the row id
but the issue is in my table all numbers are not availbale.for example 1,2,5,9,12 like that.
if php rand() generate number 3,4 etc the query will be blank as there is no id with number 3 , 4 etc.
what is the best way to generate random numbers preferable from php but it should generate the available no in that table so it must check that table.please advise.
$id23=rand(1,100000000);
SELECT items FROM tablea where status='0' and id='$id23' LIMIT 1
the above query is fast but generate sometimes no which is not availabel in database.
SELECT items FROM tablea where status=0 order by rand() LIMIT 1
the above query is too slow and causes heavy load on server

First of, all generate a random value from 1 to MAX(id), not 100000000.
Then there are at least a couple of good solutions:
Use > not =
SELECT items FROM tablea where status='0' and id>'$id23' LIMIT 1
Create an index on (status,id,items) to make this an index-only query.
Use =, but just try again with a different random value if you don't find a hit. Sometimes it will take several tries, but often it will take only one try. The = should be faster since it can use the primary key. And if it's faster and gets it in one try 90% of the time, that could make up for the other 10% of the time when it takes more than one try. Depends on how many gaps you have in your id values.

Use your DB to find the max value from the table, generate a random number less than or equal to that value, grab the first row in which the id is greater than or equal to your random number. No PHP necessary.
SELECT items
FROM tablea
WHERE status = '0' and
id >= FLOOR(1 + RAND() * (SELECT MAX(id) FROM tablea))
LIMIT 1

You are correct, ORDER BY RAND() is not good solution if you are dealing with large datasets. Depending how often it needs to be randomized, what you can do is generate a column with a random number and then update that number at some predefined interval.
You would take that column and use it as your sort index. This works well for a heavy read environment and produces predicable random order for a certain period of time.

A possible solution is to use limit:
$id23=rand(1,$numberOfRows);
SELECT items FROM tablea where status='0' LIMIT $id23 1
This wont produce any missed rows (but as hek2mgl mentioned) requires knowing the number of rows in the select.

Getting random results from large tables

I'm trying to get 4 random results from a table that holds approx 7 million records. Additionally, I also want to get 4 random records from the same table that are filtered by category.
Now, as you would imagine doing random sorting on a table this large causes the queries to take a few seconds, which is not ideal.
One other method I thought of for the non-filtered result set would be to just get PHP to select some random numbers between 1 - 7,000,000 or so and then do an IN(...) with the query to only grab those rows - and yes, I know that this method has a caveat in that you may get less than 4 if a record with that id no longer exists.
However, the above method obviously will not work with the category filtering as PHP doesn't know which record numbers belong to which category and hence cannot select the record numbers to select from.
Are there any better ways I can do this? Only way I can think of would be to store the record id's for each category in another table and then select random results from that and then select only those record ID's from the main table in a secondary query; but I'm sure there is a better way!?

You could of course use the RAND() function on a query using a LIMIT and WHERE (for the category). That however as you pointed out, entails a scan of the database which takes time, especially in your case due to the volume of data.
Your other alternative, again as you pointed out, to store id/category_id in another table might prove a bit faster but again there has to be a LIMIT and WHERE on that table which will also contain the same amount of records as the master table.
A different approach (if applicable) would be to have a table per category and store in that the IDs. If your categories are fixed or do not change that often, then you should be able to use that approach. In that case you will effectively remove the WHERE from the clause and getting a RAND() with a LIMIT on each category table would be faster since each category table will contain a subset of records from your main table.
Some other alternatives would be to use a key/value pair database just for that operation. MongoDb or Google AppEngine can help with that and are really fast.
You could also go towards the approach of a Master/Slave in your MySQL. The slave replicates content in real time but when you need to perform the expensive query you query the slave instead of the master, thus passing the load to a different machine.
Finally you could go with Sphinx which is a lot easier to install and maintain. You can then treat each of those category queries as a document search and let Sphinx randomize the results. This way you offset this expensive operation to a different layer and let MySQL continue with other operations.
Just some issues to consider.

Working off your random number approach
Get the max id in the database.
Create a temp table to store your matches.
Loop n times doing the following
Generate a random number between 1 and maxId
Get the first record with a record Id greater than the random number and insert it into your temp table
Your temp table now contains your random results.
Or you could dynamically generate sql with a union to do the query in one step.
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
UNION
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
UNION
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
UNION
SELECT * FROM myTable WHERE ID > RAND() AND Category = zzz LIMIT 1
Note: my sql may not be valid, as I'm not a mySql guy, but the theory should be sound

First you need to get number of rows ... something like this
select count(1) from tbl where category = ?
then select a random number
$offset = rand(1,$rowsNum);
and select a row with offset
select * FROM tbl LIMIT $offset, 1
in this way you avoid missing ids. The only problem is you need to run second query several times. Union may help in this case.

For MySQl you can use
RAND()
SELECT column FROM table
ORDER BY RAND()
LIMIT 4

Best way to have random order of elements in the table

I have got table with 300 000 rows. There is specially dedicated field (called order_number) in this table to story the number, which is later used to present the data from this table ordered by order_number field. What is the best and easy way to assign the random number/hash for each of the records in order to select the records ordered by this numbers? The number of rows in the table is not stable and may grow to 1 000 000, so the rand method should take it into the account.

Look at this tutorial on selecting random rows from a table.

If you don't want to use MySQL's built in RAND() function you could use something like this:
select max(id) from table;
$random_number = ...
select * from table where id > $random_number;
That should be a lot quicker.

UPDATE table SET order_number = sha2(id)
or
UPDATE table SET order_number = RAND(id)
sha2() is more random than RAND().

I know you've got enough answers but I would tell you how we did it in our company.
The first approach we use is with additional column for storing random number generated for each record/row. We have INDEX on this column, allowing us to order records by it.
id, name , ordering
1 , zlovic , 14
2 , silvas , 8
3 , jouzel , 59
SELECT * FROM table ORDER BY ordering ASC/DESC
POS: you have index and ordering is very fast
CONS: you will depend on new records to keep the randomization of the records
Second approach we have used is what Karl Roos gave an his answer. We retrieve the number of records in our database and using the > (greater) and some math we retrieve rows randomized. We are working with binary ids thus we need to keep autoincrement column to avoid random writings in InnoDB, sometimes we perform two or more queries to retrieve all of the data, keeping it randomized enough. (If you need 30 random items from 1,000,000 records you can run 3 simple SELECTs each for 10 items with different offset)
Hope this helps you. :)

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

How to check/update columns in a database contains about million rows? - php

You should first put a unique or primary constraint on the column you want to be unique. After that you can execute the update command UPDATE TABLE_NAME SET COLMUN_NAME=generateRandomString() WHERE messageid !="";

You could use SQL directly, and create a simple hash based upon some data in the row, for instance: UPDATE table_name SET messageid = MD5(messageid) WHERE messageid !=""; You might want to do that in batches, so add a LIMIT to the statement, ie. LIMIT 0,1000 to do bunch at a time.

Related

Update Current Row in MySQL Loop

Repeated Insert copies on ID

mysql order by rand() performance issue and solution

Getting random results from large tables

Best way to have random order of elements in the table

Categories

Resources