I have a database set up and populated with some dummy entries, and I need the option to move an entry to a recycle bin without actually removing it. I'm just starting out with relational databases and am wondering about the best way to do this.
Should I add a boolean field such as recycle and filter queries on it, or should I actually move those entries into a different table? I'm not sure how the two options compare in terms of performance.
Create a new column named deleted. Set it when a row is deleted, and add it to all your WHERE clauses:
SELECT * FROM table WHERE deleted = false
DON'T archive your rows by moving them to another table. I did that when I was a kid and a novice database designer. It's a major headache, and in this day and age it won't save you anything on query time; you greatly increase the risk of losing data by moving rows around like that. I like to look at Propel's behaviors for how to implement this. Do it this way (read this issue regarding the deprecation warning, and ignore the warning because it is incorrect: https://github.com/propelorm/Propel/issues/810):
http://propelorm.org/Propel/behaviors/soft-delete.html
Not this way:
http://propelorm.org/Propel/behaviors/archivable.html
If you had about 1,000,000 rows in a table, then I would suggest archiving them.
It depends on how often you'll need to read and create those rows, and how many there will be.
If you use the "recycle field" approach you will have to modify all your queries (of course, you can try renaming the original table to rows_real and creating a VIEW named rows that contains only the rows with recycle set to false, though I'm not sure it will work in all foreseeable circumstances):
ALTER TABLE rows ADD COLUMN recycle BOOLEAN NOT NULL DEFAULT false;
ALTER TABLE rows RENAME TO rows_real;
SET @sql = CONCAT('CREATE VIEW rows AS SELECT ',
    (SELECT REPLACE(GROUP_CONCAT(COLUMN_NAME), ',recycle', '')
     FROM INFORMATION_SCHEMA.COLUMNS
     WHERE TABLE_NAME = 'rows_real' AND TABLE_SCHEMA = 'test_db'),
    ' FROM rows_real WHERE recycle = false');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
Now INSERTs, SELECTs and DELETEs by primary key work as before, so those queries need no change.
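With that in place, a soft delete is just an UPDATE against the underlying table; a minimal sketch, assuming id is the primary key:

-- the row immediately disappears from the rows view
UPDATE rows_real SET recycle = true WHERE id = 42;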
Doing it the other way, you will have to worry about locking the tables during the move and wrapping everything in transactions.
To begin with, I'd favor the "one table" approach, since it looks "cleaner" to me. With proper indexing, performance shouldn't be a problem.
You can create a column (boolean type) called e.g. hidden, and filter results like this:
SELECT * FROM table WHERE hidden = FALSE AND [your text]
You could also copy the row to another table and then remove it from the original one, but I prefer the first option.
You've actually already answered this yourself. If you just want a hidden effect, then I too would use a TINYINT (or a boolean, of course, whatever your preference is!) field called hidden (imagine that!) and just set it as and when needed. Then, when pulling data, you'd just make sure you only select the rows where hidden = 0.
:)
Related
I'm doing a food delivery system for my final year project. For the database, I'm required to hide records that are no longer in use instead of deleting them permanently. For example, if the seller doesn't want to sell a particular meal, they can disable the meal, but the record of the meal is still available in the database. I need to achieve this using PHP and SQL. Can someone give me some ideas on how to achieve this? Thanks in advance.
The feature you are referring to is something called soft deletion. In soft deletion, a record is logically removed from the database table, without actually removing the record itself.
One common way to implement soft deletion is to add a column which keeps track of whether a row has been soft deleted. You can use the TINYINT(1) type for this purpose.
Your table creation statement would look something like this:
CREATE TABLE yourTable (`deleted` TINYINT(1) DEFAULT 0, `col1` VARCHAR(255), ...)
To query out records which have not been logically deleted, you could use:
SELECT *
FROM yourTable
WHERE deleted <> 1
And having a soft delete column also makes it easy to remove stale records if the time comes to do that.
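For example, a sketch (the id value is illustrative):

-- hide a meal without losing its record
UPDATE yourTable SET deleted = 1 WHERE id = 123;

-- later, permanently purge everything that was soft deleted
DELETE FROM yourTable WHERE deleted = 1;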
An extra deleted column is a great option in many cases, but you have to be very careful to always check it, and in some cases that can be hard to enforce.
Another good choice is a "shadow table" with the same structure: change your delete process to first copy the row to the shadow table, and then delete it. This means your original table is safe to use, but you cannot easily run queries across all data (although a UNION can help).
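A sketch of that delete process, assuming a meals table and a meals_shadow table of identical structure (both names are made up):

-- copy the row into the shadow table, then remove the original
INSERT INTO meals_shadow SELECT * FROM meals WHERE id = 123;
DELETE FROM meals WHERE id = 123;

-- when you do need to query across all data, combine the two
SELECT * FROM meals UNION ALL SELECT * FROM meals_shadow;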
I'm facing a challenge that has never come up for me before, and I'm having trouble finding an efficient solution (likely because I'm not a trained programmer and don't know all the terminology).
The challenge:
I have a feed of data which I need to use to maintain a MySQL database each day. Doing this requires checking whether a record exists and then updating or inserting accordingly.
This is simple enough by itself, but for thousands of records it seems very inefficient to run a separate query for each record just to check whether it already exists in the database.
Is there a more efficient way than looping through my data feed and running an individual query for each record? Perhaps a way to somehow prepare them into one larger query (assuming that is a more efficient approach)?
I'm not sure a code sample is needed here, but if there is any more information I can provide please just ask! I really appreciate any advice.
Edits:
@Sgt AJ - Each record in the data feed has a number of different columns, but they are indexed by an ID. I would check against that ID in the database to see if a record exists. In this situation I'm only updating one table, albeit a large table (30+ columns, mostly text).
What is the problem?
If the problem is the performance of checking, inserting, and updating, use INSERT ... ON DUPLICATE KEY UPDATE:
insert into your_table
    (email, country, reach_time)
values ('mike@gmail.com', 'Italy', '2016-06-05 00:44:33')
on duplicate key update reach_time = '2016-06-05 00:44:33';
I assume that your key is email.
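Note that ON DUPLICATE KEY UPDATE only fires against a PRIMARY KEY or UNIQUE index, so email must be declared unique. If it isn't already, something like this (the index name is arbitrary):

ALTER TABLE your_table ADD UNIQUE KEY uq_email (email);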
Old style, don't use:
if email exists
    update your_table set
        reach_time = '2016-06-05 00:44:33'
    where email = 'mike@gmail.com';
else
    insert into your_table
        (email, country, reach_time)
    values ('mike@gmail.com', 'Italy', '2016-06-05 00:44:33');
It depends on how many 'feed' rows you have to load. If it's something like 10, then doing them record by record (as shown by mustafayelmer) is probably not too bad. Once you get into the hundreds and above, I would highly suggest using a set-based approach. There is some overhead in creating and loading the staging table, but it is (very) quickly offset by the reduction in the number of queries that need to be executed and the amount of round-trips going on over the network.
In short, what you'd do is:
-- create new, empty staging table
SELECT * INTO stagingTable FROM myTable WHERE 1 = 2
-- adding a PK to make JOIN later on easier
ALTER TABLE stagingTable ADD PRIMARY KEY (key1)
-- load the data either using INSERTS or using some other method
-- [...]
-- update existing records
UPDATE myTable
SET field1 = s.field1,
field2 = s.field2,
field3 = s.field3
FROM stagingTable s
WHERE s.key1 = myTable.key1
-- insert new records
INSERT myTable (key1, field1, field2, field3)
SELECT key1, field1, field2, field3
FROM stagingTable new
WHERE NOT EXISTS ( SELECT *
FROM myTable old
WHERE old.key1 = new.key1 )
-- get rid of staging table again
DROP TABLE stagingTable
to bring your data up to date.
Notes:
you might want to make the name of the stagingTable 'random' to avoid the situation where two 'loads' run in parallel and end up re-using the same table, giving all kinds of weird results (and errors). Since all this code is 'generated' in PHP anyway, you can simply add a timestamp or something to the table name.
on MSSQL I would load all the data into the staging table using a bulk-insert mechanism: bcp or BULK INSERT; .NET even has the SqlBulkCopy class for this. Some quick googling shows me MySQL has mysqlimport, if you don't mind writing to a temp file first and then loading from there, or you can build big multi-row INSERT statements rather than inserting one by one. I'd avoid doing 10k inserts in one go, though; rather do them per 100 or 500 or so. You'll need to test what's most efficient.
PS: you'll need to adapt my syntax a bit here and there; like I said, I'm more familiar with MSSQL's T-SQL dialect. Also, you may be able to use the ON DUPLICATE KEY approach on the staging table directly, combining the UPDATE and INSERT in one command. (MSSQL uses MERGE for this, but it would look completely different, so I won't include it here.)
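For what it's worth, a rough MySQL translation of the staging approach might look like this (a sketch only; the table, key, and field names are carried over from the example above):

-- create a new, empty staging table with the same structure and keys
CREATE TABLE stagingTable LIKE myTable;

-- load the feed in multi-row batches of a few hundred, e.g.:
INSERT INTO stagingTable (key1, field1, field2, field3)
VALUES (1, 'a', 'b', 'c'),
       (2, 'd', 'e', 'f');

-- update existing records and insert new ones in a single statement
INSERT INTO myTable (key1, field1, field2, field3)
SELECT key1, field1, field2, field3 FROM stagingTable
ON DUPLICATE KEY UPDATE
    field1 = VALUES(field1),
    field2 = VALUES(field2),
    field3 = VALUES(field3);

-- get rid of the staging table again
DROP TABLE stagingTable;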
Good luck.
I want to delete a row from one of my SQL tables by a user-defined integer (1 through rowCount()). The code below is pseudocode, but it illustrates what I want, I think.
$i = 1; //example
$quedb = $db->query("
DELETE
FROM table
WHERE ROWNUMBER = '$i'
");
Is there a way to do this within the SQL environment? I don't want to delete a row based on a specific element (a friend of mine suggested querying for an element in the row I define, but I just want to delete the nth row, where n is user defined).
You shouldn't do that.
"nth row" is nonsense in the context of databases.
A database is different from the lists you're familiar with: its tables have no predefined order at all.
A table is an abstract heap of rows that takes on an order only at SELECT time, and that order can be different every time, based on the field(s) chosen to order by.
To identify a row you have to use a unique identifier, which was invented to serve this very purpose.
So, add an id AUTO_INCREMENT PRIMARY KEY field to your table and use it to identify rows.
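For example (the table name is illustrative):

ALTER TABLE your_table ADD COLUMN id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- delete by identifier, not by position
DELETE FROM your_table WHERE id = 42;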
"Other tables" are not only reason to keep consistency. Your own links on the site require consistent addressing too, no matter if you added or deleted some rows.
If you want to enumerate your output, do it at select time, using PHP. That's the only proper way.
Please, before inventing your own wheel, learn the very basics of database design.
Or at least follow good advises from more experienced people.
You can do that using a nested query, like so:
DELETE FROM table WHERE id IN (SELECT id FROM (SELECT id FROM table LIMIT 5, 1) AS t);
But it is really not recommended, as the behavior is not very consistent.
Is it possible to UPDATE and then INSERT where a row exists in MySQL? I have this query:
$q = $dbc -> prepare("UPDATE accounts SET lifeforce = maxLifeforce, inHospital = 0 WHERE hospitalTime <= NOW() AND inHospital = 1");
$q -> execute();
How can I either get the primary key into an associative array to then do an insert for each item in the array, or do an UPDATE AND INSERT?
Or does it involve doing a SELECT to get all rows that match the criteria, then UPDATE, then INSERT using the array from the SELECT? That seems like rather a long way to do it.
Basically I need to INSERT onto another table using the same primary keys that get updated.
Or does it involve doing a SELECT to get all that match criteria, then UPDATE then INSERT using array from the select?
Yes, sorry, that's the main way.
Another approach is to add a column called (say) last_updated, which you set whenever you update the row. You can then use that column in a query that drives your insert. That would have other advantages (I find last_updated columns useful for many things), but it's overkill if this is the only thing you'd ever use it for.
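A sketch of that approach (the account_log table and the accounts.id column are assumptions for illustration):

ALTER TABLE accounts ADD COLUMN last_updated DATETIME;

SET @ts = NOW();

UPDATE accounts
SET lifeforce = maxLifeforce, inHospital = 0, last_updated = @ts
WHERE hospitalTime <= NOW() AND inHospital = 1;

-- the timestamp now identifies exactly the rows that were touched
INSERT INTO account_log (account_id, logged_at)
SELECT id, @ts FROM accounts WHERE last_updated = @ts;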
Edited to add: another option, which just occurred to me, is to add a trigger to your accounts table that performs the insert you need. That's qualitatively different: it makes the insertion a property of accounts, rather than a matter of application logic, but maybe that's what you want? Even the most extreme partisans of the put-all-constraints-in-the-database-so-application-logic-never-introduces-inconsistency camp are usually cautious about triggers; they're really not a good way to implement application logic, because they hide that logic somewhere no one will think to look for it. But if the table you're inserting into is some sort of account_history table that keeps track of all changes to accounts, then it might be the way to go.
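A sketch of such a trigger, assuming a hypothetical account_history table and an id primary key on accounts:

CREATE TRIGGER accounts_after_update
AFTER UPDATE ON accounts
FOR EACH ROW
    INSERT INTO account_history (account_id, lifeforce, changed_at)
    VALUES (NEW.id, NEW.lifeforce, NOW());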
You can use a multiple table update as written in the manual: http://dev.mysql.com/doc/refman/5.0/en/update.html
If the second table needs an insert, you probably would have to do it manually.
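A multi-table UPDATE looks roughly like this (the joined table and its columns are made up; adapt to your schema):

UPDATE accounts a
JOIN account_stats s ON s.account_id = a.id
SET a.lifeforce = a.maxLifeforce,
    s.heal_count = s.heal_count + 1
WHERE a.hospitalTime <= NOW() AND a.inHospital = 1;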
You can use the mysqli_insert_id function:
http://php.net/manual/en/mysqli.insert-id.php
Also, when running consecutive queries like that, I'd recommend using transactions:
http://www.techrepublic.com/article/implement-mysql-based-transactions-with-a-new-set-of-php-extensions/6085922
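In outline, that would be something like:

START TRANSACTION;
-- run the UPDATE and the dependent INSERT here
COMMIT;  -- or ROLLBACK if either statement fails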
I am trying to find the fastest way to insert data into a table (the data comes from a SELECT).
I always clear the table:
TRUNCATE TABLE table;
Then I do this to insert the data:
INSERT INTO table(id,total) (SELECT id, COUNT(id) AS Total FROM table2 GROUP BY id);
Someone told me I shouldn't do this.
He said this would be much faster:
CREATE TABLE IF NOT EXISTS table (PRIMARY KEY (inskey)) SELECT id, count(id) AS total FROM table2 GROUP BY id
Any ideas on this one?
I think my solution is cleaner, because I don't have to check for the table.
This will be run in a cron job a few times a day.
EDIT: I wasn't clear. The TRUNCATE is always run; it's just a matter of the fastest way to insert all the data.
I also think your solution is cleaner, plus the solution by "someone" looks to me to have some problems:
it does not actually delete old data that may be in the table
create table...select will create table columns with types based on what the select returns. That means changes in the table structure of table2 will propagate to table. That may or may not be what you want. It at least introduces an implicit coupling, which I find to be a bad idea.
As for performance, I see no reason why one should be faster than the other. So the usual advice applies: Choose the cleanest, most maintainable solution, test it, only optimize if performance is a problem :-).
Your solution would be my choice; the performance loss (if any, and I'm not sure there is one, since you don't drop/create the table and re-derive column types) is negligible and IMHO outweighed by cleanliness.
CREATE TABLE IF NOT EXISTS table (PRIMARY KEY (inskey))
SELECT id, COUNT(id) AS total
FROM table2
GROUP BY id
This will not delete old values from the table.
If that's what you want, it will be faster indeed.
Perhaps something has been lost in the translation between your Someone and yourself. One possibility s/he might have been referring to is DROP/SELECT INTO vs TRUNCATE/INSERT.
I have heard that the latter is faster as it is minimally logged (but then again, what's the eventual cost of the DROP here?). I have no hard stats to back this up.
I agree with sleske's suggestion that you test it and optimize the solution yourself. DIY!
Every self-respecting DB will give you the opportunity to roll back your transaction.
1. Rolling back your INSERT INTO ... requires the DB to keep track of every row inserted into the table.
2. Rolling back the CREATE TABLE ... is super easy for the DB: simply get rid of the table.
Now, if you were designing & coding the DB, which would be faster? 1 or 2?
"Someone"'s suggestion DOES have merit, especially if you are using Oracle.
Regards,
Shiva
I'm sure that any time difference is indistinguishable, but yours is IMHO preferable because it's one SQL statement rather than two; any change in your INSERT statement doesn't require more work on the other statement; and yours doesn't require the host to validate that your INSERT matches the fields in the table.
From the manual: Beginning with MySQL 5.1.32, TRUNCATE is treated for purposes of binary logging and replication as DROP TABLE followed by CREATE TABLE — that is, as DDL rather than DML. This is due to the fact that, when using InnoDB and other transactional storage engines where the transaction isolation level does not allow for statement-based logging (READ COMMITTED or READ UNCOMMITTED), the statement was not logged and replicated when using STATEMENT or MIXED logging mode.
You can simplify your insert to:
INSERT INTO table
( SELECT id, COUNT(id) FROM table2 GROUP BY id );