I would like to delete data in bulk. The table currently has approximately 11,207,333 rows, and the data to be deleted is approximately 300k rows. I have two methods to do this, but I'm unsure which one performs faster.
My first option:
$start_date = "2011-05-01 00:00:00";
$end_date = "2011-05-31 23:59:59";
$sql = "DELETE FROM table WHERE date>='$start_date' and date <='$end_date'";
$mysqli->query($sql);
printf("Affected rows (DELETE): %d\n", $mysqli->affected_rows);
My second option:
$query = "SELECT count(*) as count FROM table WHERE date>='$start_date' and date <='$end_date'";
$result = $mysqli->query($query);
$row = $result->fetch_array(MYSQLI_ASSOC);
$total = $row['count'];
if ($total > 0) {
$query = "SELECT * FROM table WHERE date>='$start_date' and date <='$end_date' LIMIT 0,$total";
$result = $mysqli->query($query);
while ($row = $result->fetch_array(MYSQLI_ASSOC)) {
$table_id = $row['table_id']; // primary key
$query = "DELETE FROM table where table_id = $table_id LIMIT 0,$total";
$mysqli->query($query);
}
}
This table's data is displayed to clients, so I'm afraid that if the deletion goes wrong it will affect them.
I was wondering whether there is any method better than mine.
If you need more info from me, just let me know.
Thank you.
In my opinion, the first option is faster.
The second option contains a loop, which I think will be slower because it keeps issuing a separate query for each table_id.
As long as you don't provide the wrong start and end date, I think you're safe with either option, but option 1 is faster in my opinion.
As for option 2, the deletion there happens row by row inside the loop, which only adds per-query overhead.
Option one is your best bet.
If you are afraid something will "go wrong", you could protect yourself by backing up the data first, exporting the rows you plan to delete, or implementing a logical delete flag.
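For example, a quick way to keep a copy of the affected rows before deleting them might look like this, using the table and date range from the question (the backup table name is just a placeholder):
CREATE TABLE backup_2011_05 AS
SELECT * FROM `table`
WHERE date >= '2011-05-01 00:00:00' AND date <= '2011-05-31 23:59:59';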
The second method is not only slower, it may also break if another connection deletes one of the rows you intend to delete in your while loop before it has a chance to do it. For it to work, you need to wrap it in a transaction:
mysqli_query("START TRANSACTION;");
# your series of queries...
mysql_query("COMMIT;");
This will allow your queries to be processed correctly, in isolation from the rest of the activity happening in the db.
At any rate, if you want the first query to be faster, you need to tune your table definition by adding an index on the column used for the deletion, namely `date`; a sketch follows after the list below. (However, recall that this new index may hamper other queries in your app if there are already several indexes on that table.)
Without that index, MySQL will basically process the query more or less the same way as in method 2, but without:
PHP interpretation,
network communication and
query analysis overhead.
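A minimal sketch of that index, assuming the table and column names from the question (the index name is just a placeholder):
ALTER TABLE `table` ADD INDEX idx_date (`date`);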
You don't need any SELECTs to delete in a loop. Just use LIMIT in your DELETE query and check whether there are affected rows:
$start_date = "2011-05-01 00:00:00";
$end_date = "2011-05-31 23:59:59";
$deletedRecords = 0;
$sql = "DELETE FROM table WHERE date>='$start_date' and date <='$end_date' LIMIT 100";
do {
    $mysqli->query($sql);
    $deletedRecords += $mysqli->affected_rows;
} while ($mysqli->affected_rows > 0);
printf("Affected rows (DELETE): %d\n", $deletedRecords);
Which method is better depends on the storage engine you are using.
If you are using InnoDB, the chunked approach above is the recommended way. The reason is that the DELETE statement runs in a transaction (even in auto-commit mode every SQL statement runs in a transaction in order to be atomic: if it fails in the middle, the whole delete is rolled back and you won't end up with half-deleted data). A single big DELETE therefore means a long-running transaction with a lot of locked rows, which will block anyone who wants to update that data (it can also block inserts if there are unique indexes involved), and reads will have to go through the rollback log. In other words, for InnoDB, large deletes are faster if performed in chunks.
In MyISAM, however, a DELETE locks the entire table. If you do it in lots of small chunks, you will execute too many LOCK/UNLOCK operations, which will actually slow the process down. I would still do it in a loop for MyISAM, to give other processes a chance to use the table, but in larger chunks than for InnoDB. I would never do it row by row on a MyISAM table because of the LOCK/UNLOCK overhead.
Related
I want to run the update query only if the row exists (and was inserted). I have tried several different things, but this could be a problem with how I am looping. The insert works fine and creates the record; the update should take the existing value and add to it each time (10 exists + 15 added, 25 exists + 15 added, 40 exists, and so on). I tried putting the update in the loop, but it ran for every item in the list and produced a huge number each time. Also, the page runs each time a link is clicked, so the user exits and comes back.
while($store = $SQL->fetch_array($res_sh))
{
$pm_row = $SQL->query("SELECT * FROM `wishlist` WHERE shopping_id='".$store['id']."'");
$myprice = $store['shprice'];
$sql1 = "insert into posted (uid,price) Select '$uid','$myprice'
FROM posted WHERE NOT EXISTS (select * from `posted` WHERE `uid` = '$namearray[id]') LIMIT 1";
$query = mysqli_query($connection,$sql1);
}
$sql2 = "UPDATE posted SET `price` = price + '$myprice', WHERE shopping_id='".$_GET['id']."'";
$query = mysqli_query($connection,$sql2);
Using mysqli_affected_rows on the insert query to verify whether it actually inserted a row, you can create a conditional for the update query.
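A rough sketch of that conditional, assuming $connection, $uid and $myprice are set as in the question and that uid identifies the row (adjust the column names and WHERE clauses to your real schema):
$sql1 = "INSERT INTO posted (uid, price)
         SELECT '$uid', '$myprice' FROM DUAL
         WHERE NOT EXISTS (SELECT 1 FROM posted WHERE uid = '$uid')";
mysqli_query($connection, $sql1);

if (mysqli_affected_rows($connection) === 0) {
    // Nothing was inserted, so the row already existed: add to its price instead.
    $sql2 = "UPDATE posted SET price = price + '$myprice' WHERE uid = '$uid'";
    mysqli_query($connection, $sql2);
}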
However, if you're running an update immediately after an insert, one is led to believe it could be accomplished in one go. In this case, with no further context, you could just multiply $myprice by 2 before inserting, though you may want to look into whether you can avoid doing this.
Additionally, and somewhat more complex, you could use SQL transactions for this and make sure you reference exactly the row you want to update. If the insert failed, your update would not happen.
Granted, if you reference the inserted row precisely for your update, then the update will not happen anyway when the insert fails. For example, with a primary, auto-increment key on these rows, use mysqli_insert_id to get the last inserted ID and update the row with that ID. But this methodology can break down in a high-volume system, or just from a random race condition, which leads us right back to single queries or transactions.
I'm trying to lock a row in a table as being "in use" so that I don't process the data twice when my cron runs every minute. Because of the length of time it takes for my script to run, the cron will cause multiple instances of the script to run at once (usually around 5 or 6 at a time). For some reason, my "in use" method is not always working.
I do not want to LOCK the tables because I need them available for simultaneous processing, that is why I went the route of pseudo-locking individual rows with an 'inuse' field. I don't know of a better way to do this.
Here is an illustration of my dilemma:
<?php
//get the first row from table_1 that is not in use
$result = mysqli_query($connect,"SELECT * FROM `table_1` WHERE inuse='no'");
$rows = mysqli_fetch_array($result, MYSQLI_ASSOC);
$data1 = $rows['field1'];
//"lock" our row by setting inuse='yes'
mysqli_query($connect,"UPDATE `table_1` SET inuse='yes' WHERE field1 = '$data1'");
//insert new row into table_2 with our data if it doesn't already exist
$result2 = mysqli_query($connect,"SELECT * FROM `table_2` WHERE field='$data2'");
$numrows = mysqli_num_rows($result2);
if($numrows >= 1) {
//do nothing
} else {
//run some unrelated script to get data
$data2 = unrelatedFunction();
//insert our data into table_2
mysqli_query($connect,"INSERT INTO `table_2` (field) value ('$data2')");
}
//"unlock" our row in table_1
mysqli_query($connect,"UPDATE `table_1` SET inuse='no' WHERE field1 = '$data1'");
?>
You'll see here that $data2 won't be collected and inserted if a row with $data2 already exists, but that part is just error-checking and doesn't answer my question, as the error still occurs. I'm trying to understand why (even without that error check) my 'inuse' method is sometimes ignored and I'm getting duplicate rows in table_2 with $data2 in them.
There's a lot of time between your first SELECT and the first UPDATE in which another process can do the same operation. You're not using a transaction either, so you're not guaranteeing any order in which the changes become visible to others.
You can either move everything into a transaction with the isolation level you need and use the SELECT ... FOR UPDATE syntax, or you can try claiming rows in a different way: for example, update the N rows you want to process with SET in_use = your_current_pid WHERE in_use IS NULL, then read back the rows you marked for processing. After you finish, reset in_use to NULL.
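A hedged sketch of the SELECT ... FOR UPDATE route, using the same procedural mysqli style and the table/column names from the question (this assumes an InnoDB table; error handling is omitted):
mysqli_begin_transaction($connect);

// Lock one unclaimed row so a concurrent cron run cannot pick the same one.
$result = mysqli_query($connect,
    "SELECT field1 FROM `table_1` WHERE inuse='no' LIMIT 1 FOR UPDATE");

if ($row = mysqli_fetch_assoc($result)) {
    $data1 = mysqli_real_escape_string($connect, $row['field1']);
    mysqli_query($connect, "UPDATE `table_1` SET inuse='yes' WHERE field1 = '$data1'");
}

mysqli_commit($connect); // releases the row lock; the row is now marked in use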
I have a mysql table with a lot of data in it. All of the rows in this table need to have one field modified in a way that is not easily expressed in pure SQL.
I'd like to be able to loop over the table row by row, and update all the entries one by one.
However to do this I would do something like:
$sql = "SELECT id,value FROM objects";
foreach ($dbh->query($sql) as $row)
{
$value = update_value( $row['value'] );
$id = $row['id'];
$update_sql = "UPDATE objects SET value='$value' WHERE id=$d";
$dbh->query( $update_sql );
}
Will this do something bad? (Other than potentially being slow?)
Clarification: In particular, I'm worried about the first SELECT using a cursor rather than retrieving all the data in one hit before the foreach begins, and about there being something I don't know concerning cursor invalidation caused by the update inside the loop. If there is some rule like "don't update the same table while scanning it with another cursor", it will likely only show up on huge tables, so running a small test case on my part is pretty much useless.
If someone can point me to docs that say doing this is OK, rather than a particular problem with working this way, that'd also be great.
The results of a single query are consistent, so updates won't affect it. To keep in mind:
Use prepared statements; they will reduce the traffic between your process and the database, because only the values are transferred each time instead of a whole query (a rough sketch follows after the snippet below).
If you're worried about other processes running at the same time, you should use transactions and proper locking, e.g.
// transaction started
SELECT id,value
FROM objects
LOCK IN SHARE MODE
// your other code
// commit transaction
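As a rough sketch of the prepared-statement version, assuming $dbh is the PDO connection from the question and update_value() is defined elsewhere:
$select = $dbh->query("SELECT id, value FROM objects");
$update = $dbh->prepare("UPDATE objects SET value = :value WHERE id = :id");

foreach ($select as $row) {
    // Only the bound values travel to the server on each iteration.
    $update->execute([
        ':value' => update_value($row['value']),
        ':id'    => $row['id'],
    ]);
}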
Seems like you have two options right out of the gate:
(straightforward): use something like fetchAll to get all the results of the first query before you start looping through them; this will keep you from overlapping cursors.
(more obscure): change this to use a stored function (in place of update_value) so you can turn the two queries into a single 'update objects set value=some_function( id )'; a sketch follows below.
Depending on the size and duration of this you may need to lock everything beforehand.
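A hedged sketch of the stored-function route; the function name and body are placeholders, and it takes the current value as its argument rather than the id, just to keep the sketch self-contained:
-- Placeholder transformation; replace UPPER() with the real logic:
CREATE FUNCTION some_function(p_value VARCHAR(255))
    RETURNS VARCHAR(255) DETERMINISTIC
    RETURN UPPER(p_value);

UPDATE objects SET value = some_function(value);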
I will create 5 tables, namely data1, data2, data3, data4 and data5. Each table can only store 1000 records.
When a new entry arrives and I want to insert the data, I must do a check:
<?php
$data1 = mysql_query("SELECT * FROM data1");
if (mysql_num_rows($data1) > 1000) {
    $data2 = mysql_query("SELECT * FROM data2");
    if (mysql_num_rows($data2) > 1000) {
        // and so on...
    }
}
I think this is not the right way. I mean, if I am user 4500, it would take some time to run all the checks. Is there a better way to solve this problem?
I haven't decided on the numbers; it could be 5000 or 10000 records. The reason is flexibility and portability? Well, one of my SQL gurus suggested I do it this way.
Unless your guru was talking about something like partitioning, I'd seriously doubt his advice. If your database can't handle more than 1000, 5000 or 10000 rows, look for another database. Unless you have a really specific example of how a record limit will help you, it probably won't. With the amount of overhead it adds, it probably only complicates things for no gain.
A properly set up database table can easily handle millions of records. Splitting it into separate tables will most likely increase neither flexibility nor portability. If you accumulate enough records to run into performance problems, congratulate yourself on a job well done and worry about it then.
Read up on how to count rows in mysql.
Depending on the storage engine you are using, COUNT(*) operations on InnoDB tables are quite expensive, and such counts should be maintained by triggers and tracked in an adjacent information table.
The structure you describe is often designed around a mapping table first. One queries the mapping table to find the destination table associated with a primary key.
You can keep a "tracking" table to keep track of the current table between requests.
Also be on the alert for race conditions (use transactions, or ensure only one process is running at a time).
Also, instead of $data1 = mysql_query("SELECT * FROM data1"); with nested ifs, do something like:
$i = 1;
do {
    $result = mysql_query("SELECT COUNT(*) FROM data$i");
    $rowCount = mysql_result($result, 0);
    $i++;
} while ($rowCount >= 1000);
I'd be surprised if MySQL doesn't have some fancy-pants way to manage this automatically (or at least, better than what I'm about to propose), but here's one way to do it.
1. Insert record into 'data'
2. Check the length of 'data'
3. If >= 1000,
- CREATE TABLE 'dataX' LIKE 'data';
(X will be the number of tables you have + 1)
- INSERT INTO 'dataX' SELECT * FROM 'data';
- TRUNCATE 'data';
This means you will always be inserting into the 'data' table, and 'data1', 'data2', 'data3', etc are your archived versions of that table.
You can create a MERGE table like this:
CREATE TABLE all_data ([col_definitions]) ENGINE=MERGE UNION=(data1,data2,data3,data4,data5);
Then you would be able to count the total rows with a query like SELECT COUNT(*) FROM all_data.
If you're using MySQL 5.1 or above, you can let the database handle this (nearly) automatically using partitioning:
Read this article or the official documentation
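For illustration only, a rough sketch of what RANGE partitioning on an auto-increment id might look like (the column names and partition boundaries are assumptions, not taken from the question):
CREATE TABLE data (
    id INT NOT NULL AUTO_INCREMENT,
    payload VARCHAR(255),
    PRIMARY KEY (id)
)
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (1000),
    PARTITION p1 VALUES LESS THAN (2000),
    PARTITION p2 VALUES LESS THAN (3000),
    PARTITION p3 VALUES LESS THAN MAXVALUE
);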
How do I go about looking in a table to check whether a row exists? The background: the table is called enemies. Every row has a unique id, which is set to auto_increment. Each row also has a unique value called monsterid; the monsterid isn't auto_increment.
When a monster dies, the row is deleted and replaced by a new row, so the id is always changing, and the monsterid changes too.
In PHP I am using the $_GET method, and the monsterid is passed through it.
Basically I am trying to do this:
$monsterID = 334322 //this is the id passed through the $_GET
checkMonsterId = "check to see if the monster id exist within the enemies table"
if monsterid exist then
{RUN PHP}
else
{RUN PHP}
If you need any more clarity, please ask. Thanks in advance for the help.
Use COUNT! If it returns > 0, the monster exists; else, it doesn't.
select count(*) from enemies where monsterid = 334322
You would use it in PHP thusly (after connecting to the database):
$monsterID = mysql_real_escape_string($monsterID);
$res = mysql_query('select count(*) from enemies where monsterid = ' . $monsterID) or die();
$row = mysql_fetch_row($res);
if ($row[0] > 0)
{
//Monster exists
}
else
{
//It doesn't
}
Use count, like
select count(*) from enemies where monsterid = 334322
However, be sure to add an index on monsterid to the table. The reason is that if you don't, and this isn't the primary key, the RDBMS will be forced to do a full table scan (read every row) to give you the value back. On small datasets this doesn't matter, as the table will probably sit in memory anyway, but once the number of rows becomes significant and you're hitting the disk to do the scan, the speed difference can easily be two orders of magnitude or more.
If the number of rows is very small then not indexing is rational, since a non-primary-key index adds overhead when inserting data, but this should be a deliberate decision. (I regularly impress clients who've used a programmer who doesn't understand databases by adding indexes to tables that were fine when the coder created them but subsequently slowed to a crawl when loaded with real volumes of data; it's quite amazing how one line of SQL to add an index will buy you guru status in your client's eyes, because you made their system usable again.)
If you're doing more complex queries against the database using subselects, for example finding all locations where there is no monster, then look up the SQL EXISTS clause. It is often overlooked by programmers (the temptation is to return a count of actual values), and using it is generally faster than the alternatives.
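As a hedged illustration of that kind of subselect (the locations table and its location_id column are hypothetical, not part of the question):
SELECT l.id
FROM locations l
WHERE NOT EXISTS (SELECT 1 FROM enemies e WHERE e.location_id = l.id);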
Simpler :
select 1 from enemies where monsterid = 334322
If it returns a row, you have a row, if not, you don't.
The mysql_real_escape_string is important to prevent SQL injection.
$monsterid = mysql_real_escape_string($_GET['monsterid']);
$result = mysql_query("SELECT count(*) FROM enemies WHERE monsterid = '$monsterid'");
$count = intval(mysql_result($result, 0));
if ($count > 0) {
    // monster exists
} else {
    // monster doesn't exist
}