I have the following problem:
I have got a dataset inside a text file (not xml or csv encoded or something, just field values separated by \t and \n) which is updated every 2 minutes. I need to put the data from the file into a MariaDB Database, which itself is not very difficult to do.
What I am unsure about however, is how I would go about updating the table if the file's contents change. I thought about truncating the table and then filling it again, but doing that every 2 minutes with about 1000 datasets would mean some nasty problems with the database being incomplete during those updates, which makes it not a usable solution (which it wouldn't have been with fewer datasets either :D)
Another solution I thought about was to append the new data to the existing table, and use a delimter on the unique column (e.g. use cols 1-1000 before update, append data, then use values 1001-2000 after the update and remove 1-1000, after 2 or so updates start at id 1 again).
Updating the changing fields is not an option, because the raw data format would make that really difficult to keep track of the column that has changed (or hasn't)
I am, however unsure about best practices, as I am relatively new to SQL and stuff, and would like to hear your opinion, maybe I am just overlooking something obvious...
Even better...
CREATE TABLE new LIKE real; -- permanent, not TEMPORARY
load `new` from the incoming data
RENAME TABLE real TO old, new TO real;
DROP TABLE old.
Advantages:
The table real is never invisible, nor empty, to the application.
The RENAME is "instantaneous" and "atomic".
As suggested by Alex, I will create a temporary table, insert my data into the temporary table, truncate the production table and then insert from the temporary table. Works like a charm!
Related
I am going to set up a cron job to update some data via an API. I want it to update the database with the new feeds.
i.e. I would have an existing feed of entries, a script would go through the new feed. If the entry is already there, then dont update, if it is not in the db, then add it, and all other entries need to be deleted.
I was wondering what a good way to do this was have a column called "updated". This would be 0 by default. When a new entry is added, or an existing one is checked, the columns value becomes 1. Once the cron job has completed its updating, it would then remove all values that are still 0, and reset the remainder to 0.
Is this the right way to do such a job, if it helps there are over 10 million rows.
First of all there is no right or wrong answer and it always depends.
Now that being said with your approach you'll be updating all 10m+ rows in your main (target) table twice each time you do the sync up, which depending on how busy this table is may or may not be acceptable.
You may consider a different approach that is widely used in ETL:
load your feed data into a staging table first; do batch inserts or if possible use LOAD DATA INFILE - the fastest way of ingesting data in MySQL
optionally build indexes to help with lookups
"massage" your data if necessary (clean up, transform, augment etc)
insert into main table all new rows that present in staging and not in the main table
delete all rows from the main table that don't present in the staging table
truncate staging table
Hello this is my first time i post but hopefully i won't mess up to much.
Basically i'm trying to to copy two tables into a new table, the data in table 2 and 3 are temp data that i update with two csv files. It's just basic data that share the same ID so thats the Primary Key and i want these to be combined into a new table. This is supposed to be done just once a day handling about 2000 lines Below follows a better description of what i'm looking for.
3 tables, Core, temp_data1, temp_data2
temp_data1 has id, name, product
temp_data2 has id, description
id is a unique since it's the product_nr of the product
First copy the data from temp_data1 to Core. Insert new line if the product does not exist, if it do exist it should update the row with the information
Next update Core with the description where id=id and do not insert if id do not exist (it should not exist)
I'm looking for something that can be done in one push of a button, first i upload the csv file into the two different databases (two different files) next i push a button to merge the two tables to the Core one. I know you can do this right away with the two csv files and skip the two tables but i feel like that is so over my head it's not even funny.
I can handle programming php it's all the mysql stuff that's messing with my head.
Hopefully you guys can help me and in return i will help out any other place i can.
Thanks in advance.
I'm not sure I understand it correctly, but this can be done using only sql script, using INSERT INTO...SELECT...ON DUPLICATE KEY UPDATE... - see http://dev.mysql.com/doc/refman/5.6/en/insert-select.html
I've got a PHP script pulling a file from a server and plugging the values in it into a Database every 4 hours.
This file can and most likely change within the 4 hours (or whatever timeframe I finally choose). It's a list of properties and their owners.
Would it be better to check the file and compare it to each DB entry and update any if they need it, or create a temp table and then compare the two using an SQL query?
None.
What I'd personally do is run the INSERT command using ON DUPLICATE KEY UPDATE (assuming your table is properly designed and that you are using at least one piece of information from your file as UNIQUE key which you should based on your comment).
Reasons
Creating temp table is a hassle.
Comparing is a hassle too. You need to select a record, compare a record, if not equal update the record and so on - it's just a giant waste of time to compare a piece of info and there's a better way to do it.
It would be so much easier if you just insert everything you find and if a clash occurs - that means the record exists and most likely needs updating.
That way you took care of everything with 1 query and your data integrity is preserved also so you can just keep filling your table or updating with new records.
I think it would be best to download the file and update the existing table, maybe using REPLACE or REPLACE INTO. "REPLACE works exactly like INSERT, except that if an old row in the table has the same value as a new row for a PRIMARY KEY or a UNIQUE index, the old row is deleted before the new row is inserted." http://dev.mysql.com/doc/refman/5.0/en/replace.html
Presumably you have a list of columns that will have to match in order for you to decide that the two things match.
If you create a UNIQUE index over those columns then you can use either INSERT ... ON DUPLICATE KEY UPDATE(manual) or REPLACE INTO ...(manual)
I have a MySQL DB that receives a lot of data from a source once every week on a certain day of the week at a given time (about 1.2million rows) and stores it in, lets call it, the "live" table.
I want to copy all the data from "live" table into an archive and truncate the live table to make space for the next "current data" that will come in the following week.
Can anyone suggest an efficient way of doing this. I am really trying to avoid -- insert into archive_table select * from live --. I would like the ability to run this archiver using PHP so I cant use Maatkit. Any suggestions?
EDIT: Also, the archived data needs to be readily accessible. Since every insert is timestamped, if I want to look for the data from last month, I can just search for it in the archives
The sneaky way:
Don't copy records over. That takes too long.
Instead, just rename the live table out of the way, and recreate:
RENAME TABLE live_table TO archive_table;
CREATE TABLE live_table (...);
It should be quite fast and painless.
EDIT: The method I described works best if you want an archive table per-rotation period. If you want to maintain a single archive table, might need to get trickier. However, if you're just wanting to do ad-hoc queries on historical data, you can probably just use UNION.
If you only wanted to save a few periods worth of data, you could do the rename thing a few times, in a manner similar to log rotation. You could then define a view that UNIONs the archive tables into one big honkin' table.
EDIT2: If you want to maintain auto-increment stuff, you might hope to try:
RENAME TABLE live TO archive1;
CREATE TABLE live (...);
ALTER TABLE LIVE AUTO_INCREMENT = (SELECT MAX(id) FROM archive1);
but sadly, that won't work. However, if you're driving the process with PHP, that's pretty easy to work around.
Write a script to run as a cron job to:
Dump the archive data from the "live" table (this is probably more efficient using mysqldump from a shell script)
Truncate the live table
Modify the INSERT statements in the dump file so that the table name references the archive table instead of the live table
Append the archive data to the archive table (again, could just import from dump file via shell script, e.g. mysql dbname < dumpfile.sql)
This would depend on what you're doing with the data once you've archived it, but have you considered using MySQL replication?
You could set up another server as a replication slave, and once all the data gets replicated, do your delete or truncate with a SET BIN-LOG 0 before it to avoid that statement also being replicated.
I want to use temporary tables in my PHP code. It is a form that will be mailed. I do use session variables and arrays but some data filled in must be stored in a table format and the user must be able to delete entries in case of typos etc. doing this with arrays could work (not sure) but I'm kinda new at the php and using tables seems so much simpler. My problem is that using mysql_connect creates the table and adds the line of data but when i add my 2nd line it drops table and create it again... Using mysql_pconnect works by not dropping the table but creates more than on instance of the table at times and deleting entry's? what a mess! How can I best use temporary tables and not have them droped when my page refreshes? not using temporary tables may cause other issues if the user closes the page and leaving the table in the database.
Sounds like a mess! I am not sure why you are using a temp table at all, but you could create a random table name and assign it to a session variable. But this is hugely wrong as you would have a table for each user!
If you must use a database, add field to the table called sessionID. When you do your inserting/deleting reference the php sessionid.
Just storing the data in the session would probably be much easier though...
Better to create a permanent table and temporary rows. So, say you've serialized the object holding all your semi-complete form data as $form_data. At the end of the script, the last thing that should happen is that $form_date should be stored to the database and the resulting row id be stored in your $_SESSION. Something like:
$form_data=serialize($form_object); //you can also serialize arrays, etc.
//$form_data may also need to be base64_encoded
$q="INSERT INTO incomplete_form_table(thedata) '$form_data'";
$r=mysqli->query($q);
$id=$mysql->last_insert_id;
$_SESSION['currentform']=$id;
Then, when you get to the next page, you reconstitute your form like this:
$q="SELECT thedata FROM incomplete_form_table WHERE id={$_SESSION['currentform']}";
$r=$mysql->query($q);
$form_data=$r->fetch_assoc();
$form=$form_data['thedata'];//if it was base64_encoded, don't forget to unencode it first
You can (auto-) clean up the incomplete_form_data table periodically if the table has a timestamp field. Just delete everything that you consider expired.
The above code has not been exhaustively checked for syntax errors.