I am using mysqldump to create DB dumps of the live application to be used by developers.
This data contains customer data. I want to anonymize this data, i.e. remove customer names / credit card data.
An option would be:
create copy of database (create dump and import dump)
fire SQL queries that anonymize the data
dump the new database
But this has too much overhead.
A better solution would be to do the anonymization during dump creation.
I guess I would end up parsing all the mysqldump output? Are there any smarter solutions?
You can try Myanon: https://myanon.io
Anonymization is done on the fly during dump:
mysqldump | myanon -f db.conf | gzip > anon.sql.gz
Why are you selecting from your tables if you want to randomize the data?
Do a mysqldump of the tables that are safe to dump (configuration tables, etc) with data, and a mysqldump of your sensitive tables with structure only.
Then, in your application, you can construct the INSERT statements for the sensitive tables based on your randomly created data.
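A minimal sketch of that last step, assuming a hypothetical `customers` table with `id`, `name`, `email` and `card_number` columns (none of these names come from the question). It only builds synthetic rows and prints INSERT statements; the escaping assumes MySQL's default backslash-escape mode.

```python
import random

def sql_quote(value):
    """Render a Python value as a MySQL literal (None, ints and strings only)."""
    if value is None:
        return "NULL"
    if isinstance(value, int):
        return str(value)
    # Escape backslashes first, then single quotes.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"

def fake_customer(customer_id, rng):
    """Build one synthetic row; real names and card numbers are never read."""
    return {
        "id": customer_id,
        "name": "Customer %d" % customer_id,
        "email": "customer%d@example.com" % customer_id,
        # 16 random digits: the right shape, not a valid card number.
        "card_number": "".join(rng.choice("0123456789") for _ in range(16)),
    }

def insert_statement(table, row):
    """Build a single-row INSERT with backtick-quoted identifiers."""
    columns = ", ".join("`%s`" % name for name in row)
    values = ", ".join(sql_quote(v) for v in row.values())
    return "INSERT INTO `%s` (%s) VALUES (%s);" % (table, columns, values)

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs give the same rows
    for customer_id in range(1, 4):
        print(insert_statement("customers", fake_customer(customer_id, rng)))
```

Appending this output to the structure-only dump of the sensitive tables gives developers a loadable file that never contained real customer data.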
I had to develop something similar a few days ago. I couldn't use INTO OUTFILE because the database is on AWS RDS. I ended up with this approach:
Dump data in tabular text form from some table:
mysql -B -e 'SELECT `address`.`id`, "address1" , "address2", "address3", "town", "00000000000" as `contact_number`, "example#example.com" as `email` FROM `address`' some_db > addresses.txt
And then to import it:
mysql --local-infile=1 -e "LOAD DATA LOCAL INFILE 'addresses.txt' INTO TABLE \`address\` FIELDS TERMINATED BY '\t' ENCLOSED BY '\"' IGNORE 1 LINES" some_db
Only the mysql command is required to do this.
The export is pretty quick (a couple of seconds for ~30,000 rows); the import is a bit slower, but still fine. I had to join a few tables along the way and there were some foreign keys, so it will surely be faster if you don't need that. Disabling foreign key checks while importing will also speed things up.
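If you have many tables, writing those SELECT lists by hand gets tedious. Here is a rough sketch that builds the same kind of query from a column list and a map of constants; the function name and the sample columns are made up for illustration, and it assumes the default backslash-escape mode.

```python
def anonymized_select(table, columns, replacements):
    """Return a SELECT that keeps safe columns and substitutes constants.

    columns      -- ordered column names of the table
    replacements -- maps a sensitive column name to the constant to export
    """
    parts = []
    for name in columns:
        if name in replacements:
            # Quote the constant as a string literal and keep the column name.
            constant = replacements[name].replace("\\", "\\\\").replace("'", "\\'")
            parts.append("'%s' AS `%s`" % (constant, name))
        else:
            parts.append("`%s`" % name)
    return "SELECT %s FROM `%s`" % (", ".join(parts), table)

query = anonymized_select(
    "address",
    ["id", "town", "contact_number", "email"],
    {"contact_number": "00000000000", "email": "anon@example.com"},
)
print(query)
```

The resulting string can be passed to `mysql -B -e`. Passing it as a single argument from a script (rather than pasting it into a double-quoted shell string) avoids the shell interpreting the backticks.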
You could do a select of each table (and not a select *) and specify the columns you want to have and omit or blank those you don't want to have, and then use the export option of phpmyadmin for each query.
You can also use the SELECT ... INTO OUTFILE syntax from a SELECT query to make a dump with a column filter.
I found two similar questions, but it looks like there is no easy solution for what you want. You will have to write a custom export yourself.
MySQL dump by query
MySQL: Dump a database from a SQL query
phpMyAdmin provides an option to export the result of an SQL query in SQL format. It might be an option to extract this code from phpMyAdmin (which is probably well tested) and use it in this application.
Refer to the phpMyAdmin export plugin - exportData method for the code.
I want to do an export of a table and we do not have mysqldump installed.
I thought I can do this:
root:~> mysql news media > news.media.7.26.2016.sql
where news is the database name and media is the table name
It doesn't seem to work correctly.
Your command tries to mimic mysqldump, but mysql does not take a table parameter. You could run it like this:
mysql -D news -e "SELECT * FROM media" > news.media.7.26.2016.txt
That will work, but you won't get SQL statements in the output, just a tabular data export.
This means you may (or may not) run into problems when importing the data back. You may be able to use
mysql -D news -e "LOAD DATA INFILE 'news.media.7.26.2016.txt' INTO TABLE media"
but I do not have much experience with that. Your first concern is the secure-file-priv setting, whose default became stricter starting in MySQL 5.7.6. Second, I would be a bit nervous about preserving data types.
I have a MySQL DB that receives a lot of data (about 1.2 million rows) from a source once a week, on a certain day at a given time, and stores it in, let's call it, the "live" table.
I want to copy all the data from "live" table into an archive and truncate the live table to make space for the next "current data" that will come in the following week.
Can anyone suggest an efficient way of doing this? I am really trying to avoid -- insert into archive_table select * from live --. I would like to be able to run this archiver from PHP, so I can't use Maatkit. Any suggestions?
EDIT: Also, the archived data needs to be readily accessible. Since every insert is timestamped, if I want to look for the data from last month, I can just search for it in the archives
The sneaky way:
Don't copy records over. That takes too long.
Instead, just rename the live table out of the way, and recreate:
RENAME TABLE live_table TO archive_table;
CREATE TABLE live_table (...);
It should be quite fast and painless.
EDIT: The method I described works best if you want an archive table per rotation period. If you want to maintain a single archive table, you might need to get trickier. However, if you just want to run ad-hoc queries on historical data, you can probably just use UNION.
If you only wanted to save a few periods worth of data, you could do the rename thing a few times, in a manner similar to log rotation. You could then define a view that UNIONs the archive tables into one big honkin' table.
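A sketch of how the log-rotation idea could be scripted, under a few assumptions that are mine rather than part of the answer above: archives are named with a date suffix, a view called `<prefix>_all` ties them together, and the empty replacement table is built with CREATE TABLE ... LIKE and swapped in with a single multi-table RENAME. I also use UNION ALL rather than plain UNION so duplicates across periods are not collapsed. The function only produces SQL strings; running them is left to your DB layer.

```python
import datetime

def rotation_statements(live, archive_prefix, day, keep, existing_archives):
    """Return the SQL statements for one rotation, keeping `keep` archives."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    new_archive = "%s_%s" % (archive_prefix, day.strftime("%Y%m%d"))
    # YYYYMMDD suffixes sort chronologically as plain strings.
    archives = sorted(list(existing_archives) + [new_archive])
    expired, kept = archives[:-keep], archives[-keep:]
    statements = [
        # Build an empty copy first, then swap both names in one RENAME so
        # the live table name never disappears.
        "CREATE TABLE `%s_new` LIKE `%s`;" % (live, live),
        "RENAME TABLE `%s` TO `%s`, `%s_new` TO `%s`;"
        % (live, new_archive, live, live),
    ]
    statements += ["DROP TABLE `%s`;" % name for name in expired]
    union = " UNION ALL ".join("SELECT * FROM `%s`" % name for name in kept)
    statements.append(
        "CREATE OR REPLACE VIEW `%s_all` AS %s;" % (archive_prefix, union)
    )
    return statements

for statement in rotation_statements(
    "live", "archive", datetime.date(2024, 1, 15), 2,
    ["archive_20240101", "archive_20240108"],
):
    print(statement)
```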
EDIT2: If you want to maintain auto-increment stuff, you might hope to try:
RENAME TABLE live TO archive1;
CREATE TABLE live (...);
ALTER TABLE live AUTO_INCREMENT = (SELECT MAX(id) FROM archive1);
but sadly, that won't work. However, if you're driving the process with PHP, that's pretty easy to work around.
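The workaround is to run SELECT MAX(id) FROM archive1 as its own query, read the result into a variable, and then inline it into the ALTER TABLE as a literal. The answer mentions PHP; the sketch below shows the same idea in Python, with a hypothetical helper name, and uses max + 1 so the first new row does not reuse the highest archived id.

```python
def next_auto_increment_statement(table, max_archived_id):
    """Build the ALTER TABLE with the counter inlined as an integer literal."""
    # An empty archive makes MAX(id) return NULL (None in Python); start at 1.
    next_id = 1 if max_archived_id is None else int(max_archived_id) + 1
    return "ALTER TABLE `%s` AUTO_INCREMENT = %d;" % (table, next_id)

# max_archived_id would come from: SELECT MAX(id) FROM archive1
print(next_auto_increment_statement("live", 1200000))
```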
Write a script to run as a cron job to:
Dump the archive data from the "live" table (this is probably more efficient using mysqldump from a shell script)
Truncate the live table
Modify the INSERT statements in the dump file so that the table name references the archive table instead of the live table
Append the archive data to the archive table (again, could just import from dump file via shell script, e.g. mysql dbname < dumpfile.sql)
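The table-name rewrite in the third step can be a small filter. This sketch assumes the dump was made with mysqldump's --no-create-info and --compact options, so it contains only INSERT statements with backtick-quoted table names and no DROP, CREATE or LOCK TABLES lines for the live table; the function name is mine.

```python
import re

def retarget_inserts(lines, live, archive):
    """Yield dump lines with INSERT statements pointed at the archive table."""
    pattern = re.compile(r"^INSERT INTO `%s`" % re.escape(live))
    replacement = "INSERT INTO `%s`" % archive
    for line in lines:
        # Only the statement prefix at the start of a line is touched, so row
        # data that happens to contain the table name is left alone.
        yield pattern.sub(lambda match: replacement, line, count=1)

dump = [
    "INSERT INTO `live` VALUES (1,'first'),(2,'second');\n",
    "INSERT INTO `live` VALUES (3,'mentions INSERT INTO `live`');\n",
]
for line in retarget_inserts(dump, "live", "archive"):
    print(line, end="")
```

In a cron script you would read the lines from the dump file and write them to a second file (or straight into the mysql client's stdin).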
This would depend on what you're doing with the data once you've archived it, but have you considered using MySQL replication?
You could set up another server as a replication slave and, once all the data has been replicated, run your DELETE or TRUNCATE after a SET sql_log_bin = 0 so that the statement is not replicated as well.
I have an application at Location A (LA-MySQL) that uses a MySQL database, and another application at Location B (LB-PSQL) that uses a PostgreSQL database. (By location I mean physically distant places and different networks, if it matters.)
I need to update one table at LB-PSQL to be synchronized with LA-MySQL but I don't know exactly which are the best practices in this area.
Also, the table I need to update at LB-PSQL does not necessarily have the same structure as the one at LA-MySQL (but I think that isn't a problem, since the fields I need to update on LB-PSQL can accommodate the data from the LA-MySQL fields).
Given this data, which are the best practices, usual methods or references to do this kind of thing?
Thanks in advance for any feedback!
If both servers are in different networks, the only chance I see is to export the data into a flat file from MySQL.
Then transfer the file (e.g. via FTP or something similar) to the PostgreSQL server and import it there using COPY.
I would recommend importing the flat file into a staging table. From there you can use SQL to move the data to the appropriate target table. That gives you the chance to do data conversion or to update existing rows.
If that transformation is more complicated, you might want to think about using an ETL tool (e.g. Kettle) to do the migration on the target server.
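To illustrate the staging-table pattern, here is a small runnable sketch. It uses Python's built-in sqlite3 purely as a stand-in so it runs anywhere; on the real target you would load the file with COPY and run equivalent SQL in PostgreSQL. The table names, columns and the cents-to-price conversion are invented for the example.

```python
import csv
import io
import sqlite3

# Stand-in for the flat file exported from MySQL (tab-separated, with header).
flat_file = io.StringIO("id\tname\tprice_cents\n1\twidget\t250\n2\tgadget\t1000\n")

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
db.execute("INSERT INTO target VALUES (1, 'old widget', 1.0)")
# The staging table mirrors the file, not the target, so loading is trivial.
db.execute("CREATE TABLE staging (id INTEGER, name TEXT, price_cents INTEGER)")

rows = list(csv.reader(flat_file, delimiter="\t"))[1:]  # skip the header line
db.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)

# Update rows that already exist, converting cents to a decimal price.
db.execute("""
    UPDATE target
    SET name  = (SELECT name FROM staging WHERE staging.id = target.id),
        price = (SELECT price_cents / 100.0 FROM staging
                 WHERE staging.id = target.id)
    WHERE id IN (SELECT id FROM staging)
""")
# Insert rows that are new.
db.execute("""
    INSERT INTO target (id, name, price)
    SELECT id, name, price_cents / 100.0 FROM staging
    WHERE id NOT IN (SELECT id FROM target)
""")
db.execute("DELETE FROM staging")  # leave staging empty for the next run
db.commit()

result = db.execute("SELECT id, name, price FROM target ORDER BY id").fetchall()
print(result)
```

Keeping the conversion in SQL on the target side means the export on the MySQL side can stay a plain dump of the source columns.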
Just create a script on LA that will do something like this (bash sample):
set -o pipefail  # make a pg_dump failure fail the whole pipeline
TMPFILE=$(mktemp) || { echo "mktemp failed" 1>&2; exit 1; }
pg_dump --column-inserts --data-only --no-password \
--host="LB_hostname" --username="username" \
--table="tablename" "databasename" \
| awk '/^INSERT/ {i=1} {if(i) print} # ignore everything to first INSERT' \
> "$TMPFILE" \
|| { echo "pg_dump failed" 1>&2; rm -f "$TMPFILE"; exit 1; }
(echo "begin; truncate tablename;"; cat "$TMPFILE"; echo 'commit;' ) \
| mysql "databasename" \
|| { echo "mysql failed" 1>&2; rm -f "$TMPFILE"; exit 1; }
rm "$TMPFILE"
And set it to run for example once a day in cron. You'd need a '.pgpass' for postgresql password and mysql option file for mysql password.
This should be fast enough for fewer than a million rows.
Not a turnkey solution, but this is some code to help with this task using triggers. The following assumes no deletes or updates for brevity. Needs PG >= 9.1.
1) Prepare two new tables, mytable_a and mytable_b, with the same columns as the source table to be replicated:
CREATE TABLE mytable_a AS TABLE mytable WITH NO DATA;
CREATE TABLE mytable_b AS TABLE mytable WITH NO DATA;
-- trigger function which copies data from mytable to mytable_a on each insert
CREATE OR REPLACE FUNCTION data_copy_a() RETURNS trigger AS $data_copy_a$
BEGIN
INSERT INTO mytable_a SELECT NEW.*;
RETURN NEW;
END;
$data_copy_a$ LANGUAGE plpgsql;
-- start trigger
CREATE TRIGGER data_copy_a AFTER INSERT ON mytable FOR EACH ROW EXECUTE PROCEDURE data_copy_a();
Then when you need to export:
-- move data from mytable_a -> mytable_b without stopping trigger
WITH d_rows AS (DELETE FROM mytable_a RETURNING * ) INSERT INTO mytable_b SELECT * FROM d_rows;
-- export data from mytable_b -> file
\copy mytable_b to '/tmp/data.csv' WITH DELIMITER ',' csv;
-- empty table
TRUNCATE mytable_b;
Then you may import data.csv into MySQL.
I have a .csv file containing table data. I want to load the whole CSV file into the database with a single MySQL query, using the column names as keys,
rather than doing it manually with "insert into" queries.
If a program in another language such as PHP or Python can do this, that is also fine.
Can someone suggest how to do this, please?
You can do this fairly easily with Navicat.
more info here
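Use LOAD DATA LOCAL INFILE. For a comma-separated file with a header row:
LOAD DATA LOCAL INFILE '/tmp/your_file.csv' INTO TABLE your_table FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' IGNORE 1 LINES;
Without the FIELDS clause MySQL expects tab-separated values. You need to ensure the MySQL user has enough privileges to perform the load.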
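Since the question allows Python, here is a sketch of the "column name as key" idea: read the header line, build one parameterized INSERT from it, and hand the remaining rows to your driver. The function name is hypothetical, and the `%s` placeholders assume a DB-API driver that uses that parameter style (PyMySQL and mysqlclient do, as far as I know).

```python
import csv
import io

def csv_to_insert(handle, table):
    """Return (sql, rows) built from a CSV whose first line names the columns."""
    reader = csv.reader(handle)
    header = next(reader)
    columns = ", ".join("`%s`" % name.strip() for name in header)
    placeholders = ", ".join(["%s"] * len(header))
    sql = "INSERT INTO `%s` (%s) VALUES (%s)" % (table, columns, placeholders)
    # Blank lines and rows whose width does not match the header are skipped.
    rows = [tuple(row) for row in reader if len(row) == len(header)]
    return sql, rows

sample = io.StringIO('id,title\n1,first\n2,"second, with comma"\n')
sql, rows = csv_to_insert(sample, "your_table")
print(sql)
print(rows)
```

With a real file you would open it with `open(path, newline="")` and then call something like `cursor.executemany(sql, rows)` on your connection's cursor.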
I'm writing a PHP script to generate SQL dumps from my database for version control purposes. It already dumps the data structure by means of running the appropriate SHOW CREATE .... query. Now I want to dump data itself but I'm unsure about the best method. My requirements are:
I need a record per row
Rows must be sorted by primary key
SQL must be valid and exact no matter the data type (integers, strings, binary data...)
Dumps should be identical when data has not changed
I can detect and run mysqldump as an external command, but that adds an extra system requirement, and I need to parse the output in order to remove headers and footers with dump information I don't need (such as server version or dump date). I'd love to keep my script as simple as I can so it can be kept in a standalone file.
What are my alternatives?
I think parsing the mysqldump output is still the easiest. There is very little metadata you have to exclude, so dropping it should take only a few lines of code.
You could also look at the following mysqldump options: --tab, --compact
You could always try the system() function. Hope you are on Linux :)
Run "SELECT * FROM tablename ORDER BY <primary key>" for each table.
You can get the table names from the information_schema database or the "SHOW TABLES" command. You can also get the primary key from "SHOW INDEXES FROM tablename".
Then loop over the rows and build the INSERT statement strings.
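A sketch of that loop, in Python rather than PHP, covering the requirements from the question: one INSERT per row, a stable order, and exact literals for NULL, numbers, strings and binary data. The helper names are mine, the rows are expected to come from the ORDER BY query (the sort here is only a safety net), and the string escaping assumes MySQL's default mode, i.e. NO_BACKSLASH_ESCAPES is not set.

```python
def sql_literal(value):
    """Render one value as a MySQL literal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, (bytes, bytearray)):
        # Hex literals keep binary data exact and printable.
        return "X'%s'" % bytes(value).hex().upper()
    text = str(value)
    # Backslash must be escaped first; newlines are escaped so that every
    # statement stays on a single line.
    for raw, escaped in (("\\", "\\\\"), ("'", "\\'"), ("\n", "\\n"),
                         ("\r", "\\r"), ("\0", "\\0")):
        text = text.replace(raw, escaped)
    return "'%s'" % text

def dump_rows(table, columns, rows, key_index=0):
    """Return one INSERT per row, ordered by the primary key column."""
    column_list = ", ".join("`%s`" % name for name in columns)
    ordered = sorted(rows, key=lambda row: row[key_index])
    return [
        "INSERT INTO `%s` (%s) VALUES (%s);"
        % (table, column_list, ", ".join(sql_literal(v) for v in row))
        for row in ordered
    ]

for statement in dump_rows(
    "t", ["id", "name", "blob"],
    [(2, "it's", b"\x00\xff"), (1, None, b"")],
):
    print(statement)
```

Because nothing in the output depends on the server version or the current date, two runs over unchanged data produce identical files, which is what you want for version control.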