I just took over a pretty terrible database design, which relies heavily on comma-separated values to store data. I know, I know, it is hell.
The database is MySQL, and I currently access it using MySQL Workbench.
I already have an idea of what to remove and which new relation tables are needed.
So, my question is: how should I go about migrating the comma-separated data to the new tables? Are there any tools that specialize in normalizing a database?
Edit:
The server code is in PHP.
Define your new tables and attributes first.
Then use PHP, Python, or your favorite language with MySQL calls to write a one-time converter that loops over the records in the old table(s) and inserts the proper records into the new tables.
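As a rough illustration of such a one-time converter, here is a minimal Python sketch. It uses the standard-library sqlite3 module as a stand-in so that it runs anywhere; against MySQL you would swap in a MySQL driver and its placeholder style. The table and column names (old_item.tags, tag, item_tag) are hypothetical, since the real schema is not shown.

```python
import sqlite3


def split_csv_field(raw):
    """Split a comma-separated field into trimmed, non-empty, de-duplicated values."""
    values = []
    for part in (raw or "").split(","):
        value = part.strip()
        if value and value not in values:
            values.append(value)
    return values


def migrate(conn):
    """Read every old row and insert one relation row per separated value."""
    rows = conn.execute("SELECT id, tags FROM old_item").fetchall()
    with conn:  # one transaction: commits on success, rolls back on error
        for item_id, raw in rows:
            for name in split_csv_field(raw):
                # SQLite syntax; the MySQL counterpart is INSERT IGNORE.
                conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
                (tag_id,) = conn.execute(
                    "SELECT id FROM tag WHERE name = ?", (name,)
                ).fetchone()
                conn.execute(
                    "INSERT OR IGNORE INTO item_tag (item_id, tag_id) VALUES (?, ?)",
                    (item_id, tag_id),
                )


def demo():
    """Build a tiny in-memory example (hypothetical schema) and migrate it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE old_item (id INTEGER PRIMARY KEY, tags TEXT);
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE item_tag (
            item_id INTEGER NOT NULL,
            tag_id INTEGER NOT NULL,
            PRIMARY KEY (item_id, tag_id)
        );
        INSERT INTO old_item VALUES (1, 'red, blue'), (2, 'blue,,green '), (3, NULL);
        """
    )
    migrate(conn)
    return conn.execute(
        "SELECT item_id, name FROM item_tag JOIN tag ON tag.id = tag_id "
        "ORDER BY item_id, name"
    ).fetchall()
```

Trimming whitespace and skipping empty or repeated values is a design choice here; comma-separated columns filled in by hand tend to contain all three.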
It appears you are looking for standard practices. Databases out there are denormalized to varying degrees. The ones I have come across were normalized with custom code and tools.
SQL Server Integration Services (SSIS) can be used in some cases. In your case, I'd build a script for the migration, which involves:
creating the normalized tables
creating a stored procedure or PHP script(s) to read data from the denormalized table, transform it, and load it into the normalized tables
creating a log table or log file
performing the migration in a sandbox, writing logs while doing so
putting the script under version control
correcting the proc/script as needed
creating another sandbox
running the full script on that sandbox
if successful, running the full script on prod (with logging)
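The log-table step in the list above might look roughly like the following Python sketch. It is only an illustration under assumptions: sqlite3 stands in for MySQL, and the names old_order, order_qty and migration_log are hypothetical. Each source row is converted in its own transaction, so a bad row is rolled back and logged without stopping the run.

```python
import sqlite3


def convert_row(conn, row):
    """Split one comma-separated quantity list into one row per value."""
    order_id, raw = row
    for part in (raw or "").split(","):
        if part.strip():
            # int() raises ValueError on bad data; the caller logs it.
            conn.execute(
                "INSERT INTO order_qty (order_id, qty) VALUES (?, ?)",
                (order_id, int(part)),
            )


def run_with_log(conn, rows, convert):
    """Apply convert() to each row; log failures instead of aborting."""
    ok = failed = 0
    for row in rows:
        try:
            with conn:  # per-row transaction: a failing row is rolled back
                convert(conn, row)
            ok += 1
        except Exception as exc:
            with conn:
                conn.execute(
                    "INSERT INTO migration_log (source_id, message) VALUES (?, ?)",
                    (row[0], str(exc)),
                )
            failed += 1
    return ok, failed


def demo():
    """Run the converter on a small hypothetical sandbox data set."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE old_order (id INTEGER PRIMARY KEY, qty_csv TEXT);
        CREATE TABLE order_qty (order_id INTEGER NOT NULL, qty INTEGER NOT NULL);
        CREATE TABLE migration_log (source_id INTEGER, message TEXT);
        INSERT INTO old_order VALUES (1, '3, 4'), (2, '5,x'), (3, '');
        """
    )
    rows = conn.execute("SELECT id, qty_csv FROM old_order ORDER BY id").fetchall()
    counts = run_with_log(conn, rows, convert_row)
    migrated = conn.execute(
        "SELECT order_id, qty FROM order_qty ORDER BY order_id, qty"
    ).fetchall()
    logged = [r[0] for r in conn.execute("SELECT source_id FROM migration_log")]
    return counts, migrated, logged
```

After a sandbox run, the log table shows which source rows need a fix in the script (or in the data) before the next attempt.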
SSIS is used for ETL in many organizations; it's the standard tool in the Microsoft BI stack and can also be used to migrate data between non-Microsoft databases.
An open-source ETL tool called Talend might also help in transforming your data. I personally believe that a PHP script will be the fastest and easiest way to manipulate the data.
Related
This is somewhat of an abstract question, but hopefully a pretty simple one at the same time. I have no idea of the best way to go about this other than an export/import, and I can't do that due to permission issues. So I need some alternatives.
On one server, we'll call it 1.2.3, I have a database with 2 schemas, Rdb and test. These schemas have 27 and 3 tables respectively. This database stores call info from our phone system, but we have reader access only, so we're very limited in what we can do beyond selecting and joining for data records and info.
I then have a production database server, call it 3.2.1, with my main schemas, and I'd like to place the previous 30 tables into one of these production schemas. After the migration is done, I'll need to create a script that will check the data on the first connection and then update the new schema on the production connection, but that's after the bulk migration is done.
I'm wondering if a PHP script would be the way to go about this initial migration, though. I'm using MySQL Workbench and the export wizard fails for the read-only database; if there's another way in the interface, I don't know about it.
It's quite a bit of data, and I'm not necessarily looking for the fastest way but for the easiest and most fail-safe way.
For a one-time data move, the easiest way is to use the command-line tool mysqldump to dump your tables to a file, then load the resulting file with mysql. This assumes that you are either shutting down 1.2.3 or will reconfigure your phone system to point to 3.2.1 (or update DNS appropriately). Also, this is much easier if you can get downtime on the phone system to move the data.
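For reference, the dump-and-load step could be driven from a small script. The sketch below only builds the two command lines and does not run anything; the user names are hypothetical, and the extra flags are my assumption about what a SELECT-only account needs (no table locks, no tablespace information). Note that --no-tablespaces only exists in fairly recent mysqldump versions, so drop it if your client rejects it.

```python
import shlex


def dump_command(host, user, schema):
    """Build a mysqldump command line suited to a read-only account."""
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--password",            # prompt for the password instead of passing it inline
        "--single-transaction",  # consistent InnoDB snapshot without locking tables
        "--skip-lock-tables",    # do not require the LOCK TABLES privilege
        "--no-tablespaces",      # do not require the PROCESS privilege
        schema,
    ]


def load_command(host, user, target_schema):
    """Build the mysql command line that reads the dump from standard input."""
    return ["mysql", f"--host={host}", f"--user={user}", "--password", target_schema]


if __name__ == "__main__":
    # Hosts are the placeholders from the question; user names are made up.
    print(shlex.join(dump_command("1.2.3", "reader", "Rdb")), "> rdb.sql")
    print(shlex.join(load_command("3.2.1", "admin", "production")), "< rdb.sql")
```

Dumping each schema separately (rather than with --databases) leaves out the CREATE DATABASE and USE statements, which makes it possible to load the tables into a differently named production schema.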
we have reader access only so we're very limited in what we can do beyond selecting and joining for data records
This really limits your options.
Master/slave replication requires the REPLICATION SLAVE privilege, and you would probably need a user with the SUPER privilege to create a replication user.
Trigger-based replication solutions like SymmetricDS will require a user with privileges such as TRIGGER and CREATE ROUTINE in order to create the triggers and their supporting routines.
An "Extract, Transform, Load" solution like Clover ETL will work best if the tables have LAST_CHANGED timestamps. If they don't, you would need the ALTER privilege to add them.
Different tools for different goals.
Master/Slave replication is generally used for Disaster Recovery, Availability or Read Scaling
Heterogeneous replication is used to replicate some (or all) tables between different environments (which could be a different RDBMS or different replica sets) in a continuous but asynchronous fashion.
ETL for bulk, hourly/daily/periodic data movements, with the ability to pick a subset of columns, aggregate, convert timestamp formats, merge with multiple sources, and generally fix whatever you need to with the data.
That should help you determine really what your situation is - whether it's a one time load with a temporary data sync, or if it's an on-going replication (real-time, or delayed).
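If it turns out to be a one-time load followed by a periodic sync, and the source tables do carry a last-changed timestamp, the sync script only needs SELECT on the source. A minimal Python sketch of that idea follows; sqlite3 stands in for both MySQL servers, and the calls table with its last_changed column is hypothetical.

```python
import sqlite3


def sync_changes(source, target, since):
    """Copy rows changed after `since`; return the new high-water mark."""
    rows = source.execute(
        "SELECT id, caller, last_changed FROM calls "
        "WHERE last_changed > ? ORDER BY last_changed",
        (since,),
    ).fetchall()
    with target:
        # SQLite upsert; in MySQL this would be REPLACE INTO or
        # INSERT ... ON DUPLICATE KEY UPDATE.
        target.executemany(
            "INSERT OR REPLACE INTO calls (id, caller, last_changed) VALUES (?, ?, ?)",
            rows,
        )
    return rows[-1][2] if rows else since


def demo():
    """Initial bulk copy, then one incremental pass after source changes."""
    schema = "CREATE TABLE calls (id INTEGER PRIMARY KEY, caller TEXT, last_changed TEXT)"
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    source.execute(schema)
    target.execute(schema)
    source.executemany(
        "INSERT INTO calls VALUES (?, ?, ?)",
        [(1, "alice", "2024-01-01 09:00:00"), (2, "bob", "2024-01-01 10:00:00")],
    )
    mark = sync_changes(source, target, "")  # empty mark copies everything
    source.execute(
        "UPDATE calls SET caller = 'bobby', last_changed = '2024-01-02 08:00:00' "
        "WHERE id = 2"
    )
    source.execute("INSERT INTO calls VALUES (3, 'carol', '2024-01-02 08:30:00')")
    mark = sync_changes(source, target, mark)
    return mark, target.execute("SELECT id, caller FROM calls ORDER BY id").fetchall()
```

Two caveats with this approach: rows written with exactly the same timestamp as the stored mark after a sync would be missed, and deletes on the source are never seen. Comparing the tables periodically with a checksum tool covers both.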
Edit:
https://www.percona.com/doc/percona-toolkit/LATEST/index.html
Check out the Percona Toolkit, specifically pt-table-sync and pt-table-checksum. They will help with this.
My problem is that I'm using a HUGE web application (a school system) with no documentation of its internal logic. I need to make a bulk update of a particular value, but I don't know which tables in the MySQL database contain the relevant data. The app itself runs on PHP. Is there an easy way to compare the database before and after I perform an operation, so I can see which tables are affected? I tried running a diff tool on SQL dumps taken before and after, but the database is so huge that this is really impractical. I'm wondering if there is something better, or if I can configure PHP somehow to log every MySQL operation along with the file that triggered it.
You may want to run the performance tools in MySQL Workbench and look at the performance reports / statement analysis. This works if you pick a time when the system is not being used and then run some function in the web app that updates the tables with the values you need to change. Look at the performance tables before and after you run your experiment and look for the SQL statements that show activity. It's not perfect, but it will at least help you home in on the data you're looking for. The big 'gotcha' here is if the value you want to change is dynamically derived during the query process. Then you'll have to understand how the derivation works and which source columns it uses. But, again, this will give you a brute-force starting place.
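A different brute-force angle, in case the performance reports are not available: fingerprint every table before and after the operation and compare the fingerprints instead of diffing full dumps. The sketch below is only an illustration, with sqlite3 standing in for MySQL and made-up table names; on MySQL the per-table fingerprint could come from the built-in CHECKSUM TABLE statement rather than from hashing rows in the client.

```python
import hashlib
import sqlite3


def table_fingerprints(conn):
    """Return {table_name: (row_count, sha256_of_rows)} for every user table."""
    names = [
        r[0]
        for r in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    result = {}
    for name in names:
        digest = hashlib.sha256()
        count = 0
        # A stable row order keeps the hash comparable between snapshots.
        for row in conn.execute(f'SELECT * FROM "{name}" ORDER BY rowid'):
            digest.update(repr(row).encode("utf-8"))
            count += 1
        result[name] = (count, digest.hexdigest())
    return result


def changed_tables(before, after):
    """List tables whose fingerprint differs between two snapshots."""
    return sorted(t for t in set(before) | set(after) if before.get(t) != after.get(t))


def demo():
    """Snapshot, perform one operation, snapshot again, and compare."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE grade (student_id INTEGER, value INTEGER);
        CREATE TABLE fee (student_id INTEGER, amount INTEGER);
        INSERT INTO student VALUES (1, 'Ann');
        INSERT INTO grade VALUES (1, 70);
        INSERT INTO fee VALUES (1, 100);
        """
    )
    before = table_fingerprints(conn)
    conn.execute("UPDATE grade SET value = 80 WHERE student_id = 1")  # "the operation"
    after = table_fingerprints(conn)
    return changed_tables(before, after)
```

As with the performance reports, this only works cleanly when nobody else is using the system between the two snapshots.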
Is it preferred to create tables in MySQL using a third-party application (phpMyAdmin, TOAD, etc.) instead of PHP?
The end result is the same; I was just wondering if one way is considered standard practice.
No, there isn't a 'set-in-stone' program for managing and querying your database.
However, I highly recommend MySQL Workbench.
It allows you to graphically design your database, query your database server, and do all kinds of administration tasks.
I'd say it is far easier to do so within an application created for that purpose. The database itself obviously doesn't care, as it's just DDL to it. Using Toad or phpMyAdmin would help you do the job quicker and allow you to catch syntax errors prior to execution, or to use a wizard so that you're not writing it by hand in the first place.
Usually a software project provides one or more text files containing the DDL statements to create the necessary tables. Which tool you use to execute those statements doesn't really matter. Some PHP projects also provide an installer wizard PHP file which can be executed directly in the browser, so you don't need any additional tools at all.
I'll try to only answer what your question is - "Is it preferred to create tables in mysql using a third party application (phpmyadmin, TOAD, etc...) instead of php?"...
Yes, it is preferred to create, alter, or delete tables, or to do any other DB-related activity that is outside the scope of the interfaces your application provides, using any of the many available MySQL clients. The reason is that these applications are designed to perform DB-related tasks and are best at doing them.
That said, you may as well use PHP for creating tables in some situations, for example if the application uses dynamic tables or needs "temporary" tables for performing complex jobs or storing intermediate results/calculations. The same goes for applications that provide interfaces to manage or control certain aspects. For example, assume an application has various user roles, each with its own column in a table. If the application gives the admin the right to delete or add roles, which means deleting or adding columns, it's best to run such queries from PHP.
So, to put it again, use a MySQL client for any DB work that is not related to or affected by the functionality or interfaces your PHP code provides.
Sidenote: Though I've used phpMyAdmin, TOAD, Workbench and a few others, I think nothing's as efficient and quick as the MySQL client itself, i.e. working directly at the MySQL prompt. If you've always used GUI clients, you might find it unattractive to work at the prompt initially, but it's real fun and helps you keep the syntax at your fingertips :-)
Your question might have been misunderstood by some people.
Charles Sprayberry was saying there's no best practice as far as which third-party MySQL client (i.e. phpMyAdmin, TOAD, etc.) to use to edit your database. It comes down to personal preference.
Abhay was saying (and I really think this was the answer to your question), that typically, your application does not do DDL (although exceptions exist). Rather, your application will usually be performing DML commands only.
DML is Data Manipulation Language. For example:
select
insert
update
delete
DDL is Data Definition Language. For example:
create table
alter table
drop table
Basic SQL statements: DDL and DML
I have a Windows program which generates PHP forms which will be filled in later.
Those PHP forms will populate a database. It looks very much like MySQL, but I can't be certain, so let's call it ODBC.
And, yes, it does have to be a windows program.
There will also be PHP forms which query the database: they examine which tables and fields it contains and then generate forms which can be used to search the database (e.g., one finds a table with fields "employee_name", etc. and generates a form which lets you search by employee name).
Let's call that design time and run time.
At design time, some manager or IT guy or similar gets to define the nature of the database and at runtime 1) a worker fills in the form daily and 2) management can extract reports.
Here's my question: given that the database is defined at "design time" (and populated at run time), where and how is best to do so?
1) I could use an ODBC interface from the Windows program, but I am having difficulty finding something good that works with Delphi. Things like ADO and Firebird tend to expect you to already have a database and let you manipulate it, but I can find no code example of how to create a database and some tables, so ...
2) I could use DOS commands from Delphi in my Windows program. I just tried and got a response to MySql --version, but I am not sure whether MySql etc. are more interactive. That is, can I use a script file or a very long stacked command with semicolons and returns as separators? e.g. 'CREATE DATABASE db; CREATE TABLE t1;'
3) Since the best way to work with databases seems to be PHP, perhaps my Windows program could spit out a PHP page which would, when run in a browser, create the database.
I have tried to make this as uncomplicated as I can, but please feel free to ask questions. It may be that there are several valid ways, but there is probably one 'better' solution in terms of ease of implementation or maintenance.
Better scratch option 3. What if the user later wants to come back and have the windows program change the input form? It needs to update the database too.
Creating a database is usually a database administrator's task. Unless it is a local database, maybe an embedded one, the user would need to know where and how to create the database on the remote server, and she may have no clue about it. Where should the database files be stored? Which disks are available? And there could be many more parameters to set (memory buffer sizes, etc.), users to be created, and so on. You also need highly elevated privileges to create a database, which is not something you give to average users or applications.
Therefore, you usually ask the database administrator to create your database/schema, he gives you the credentials you need to connect, and then your application (or its setup) creates and initializes the needed objects (tables, etc.). Creating a table (or another object) is usually as simple as running "CREATE TABLE ..." statements. Just remember that an SQL call usually takes only one command; if you need to run several commands, you have to send them one after another yourself, although there are Delphi components that can split a script into commands and run them one after another.
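To make the "split the script and send the commands one after another" idea concrete, here is a small sketch. It is written in Python rather than Delphi purely for illustration, and it uses sqlite3 as a stand-in database. The splitting relies on sqlite3.complete_statement, which follows SQLite's quoting rules; a MySQL script that uses backslash escapes inside strings would need a splitter that knows MySQL's rules.

```python
import sqlite3


def split_statements(script):
    """Split an SQL script into single statements at top-level semicolons."""
    statements, buffer = [], ""
    for ch in script:
        buffer += ch
        # A semicolon inside a quoted string does not complete a statement.
        if ch == ";" and sqlite3.complete_statement(buffer):
            statements.append(buffer.strip())
            buffer = ""
    if buffer.strip():
        statements.append(buffer.strip())  # trailing statement without ';'
    return statements


def run_script(conn, script):
    """Execute the statements one after another in a single transaction."""
    statements = split_statements(script)
    with conn:
        for statement in statements:
            conn.execute(statement)
    return len(statements)


def demo():
    """Run a three-statement script, including a ';' inside a string literal."""
    script = (
        "CREATE TABLE t1 (id INTEGER PRIMARY KEY, note TEXT); "
        "INSERT INTO t1 VALUES (1, 'a;b');\n"
        "INSERT INTO t1 VALUES (2, 'c');"
    )
    conn = sqlite3.connect(":memory:")
    executed = run_script(conn, script)
    return executed, conn.execute("SELECT id, note FROM t1 ORDER BY id").fetchall()
```

Re-checking the whole buffer at every semicolon is simple but slow for very large scripts; it is meant to show the principle, not to replace a proper script-runner component.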
I am trying to import various pipe-delimited files into a MySQL database using PHP 5.2. I am importing various formats of piped data, and my end goal is to put the different data into a suitably normalised data structure, but I need to do some post-processing on the data to fit it into my model correctly.
I thought the best way to do this would be to import into a table called buffer, map out the data, and then import into the various tables. I am planning to create a table just called "buffer" with fields that represent each column (there will be up to 80 columns), then apply some data transforms/mapping to get the data into the right tables.
My planned approach is to create a base class that generically reads the pipe data into the buffer table, then extend this class with a function that contains various prepared statements to do the SQL magic. That gives me the flexibility to check that the format is the same by reading the headers on the first row, and to change the handling for a given format.
My questions are:
What's the best way to do step one, reading the data from a locally saved file into the table? I'm not too sure if I should use MySQL's LOAD DATA (as suggested in Best Practice : Import CSV to MYSQL Database using PHP 5.x) or just fopen and then insert the data line by line.
Is this the best approach? How have other people approached this?
Is there anything in the Zend Framework that may help?
Additional: I am planning to do this in a scheduled task.
You don't need any PHP code to do that, IMO. Don't waste time on classes. MySQL's LOAD DATA INFILE statement offers a lot of ways to import data and covers 95% of your needs: whatever delimiters, whatever columns to skip or pick. Read the manual attentively; it's worth knowing what you CAN do with it. If you write the query properly, the data can already be in good shape right after the import. The buffer table can be a temporary one. Then normalize or denormalize the data and drop the initial table. Save the script in a file so you can reproduce the sequence of statements if there's a mistake.
The best way is to write a SQL script, check whether the data ends up in proper shape, look for mistakes, modify the script, and re-run it. If there's a lot of data, test on a smaller set of rows.
[added] Another reason for the SQL-mostly approach is that if you're not fluent in SQL but are going to work with a database, it's better to learn SQL sooner rather than later. You'll find a lot of uses for it and will avoid the common pitfalls of programmers who know it only superficially.
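To show the shape of the buffer-then-normalize flow described above, here is a small runnable sketch. It is an approximation under assumptions: Python's csv module and sqlite3 stand in for LOAD DATA INFILE and MySQL, and the header and table names (employee_name, department, salary, buffer, employee) are made up. The normalization itself is plain INSERT ... SELECT, which is the part that carries over to MySQL almost unchanged.

```python
import csv
import io
import sqlite3

EXPECTED_HEADER = ["employee_name", "department", "salary"]  # hypothetical format


def load_buffer(conn, text):
    """Load pipe-delimited text into the buffer table after checking the header."""
    # MySQL counterpart: LOAD DATA INFILE ... INTO TABLE buffer
    #                    FIELDS TERMINATED BY '|' IGNORE 1 LINES
    reader = csv.reader(io.StringIO(text), delimiter="|")
    header = next(reader)
    if header != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {header}")
    rows = [row for row in reader if row]  # skip blank lines
    with conn:
        conn.executemany("INSERT INTO buffer VALUES (?, ?, ?)", rows)
    return len(rows)


def normalize(conn):
    """Move buffered rows into the normalized tables with INSERT ... SELECT."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO department (name) "
            "SELECT DISTINCT TRIM(department) FROM buffer"
        )
        conn.execute(
            "INSERT INTO employee (name, department_id, salary) "
            "SELECT TRIM(b.employee_name), d.id, CAST(b.salary AS INTEGER) "
            "FROM buffer b JOIN department d ON d.name = TRIM(b.department)"
        )
        conn.execute("DELETE FROM buffer")


def demo():
    """Load a three-line sample file and return the normalized result."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TEMP TABLE buffer (employee_name TEXT, department TEXT, salary TEXT);
        CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE employee (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            department_id INTEGER NOT NULL REFERENCES department (id),
            salary INTEGER
        );
        """
    )
    text = "employee_name|department|salary\nAnn|Sales|52000\nBob| IT |61000\nCy|Sales|48000\n"
    loaded = load_buffer(conn, text)
    normalize(conn)
    result = conn.execute(
        "SELECT e.name, d.name, e.salary FROM employee e "
        "JOIN department d ON d.id = e.department_id ORDER BY e.name"
    ).fetchall()
    return loaded, result
```

Keeping every buffer column as text and converting only in the INSERT ... SELECT step means a malformed value cannot make the raw load fail; it shows up later, where it is easier to inspect.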
I personally use the free ETL software Kettle by Pentaho (this bit of software is commonly referred to as Kettle). While this software is far from perfect, I've found that I can often import data in a fraction of the time I would have to spend writing a script for one specific file. You can select a text file input, specify the delimiters, fixed widths, etc., and then simply export directly into your SQL server (they support MySQL, SQLite, Oracle, and many more).
There are dozens and dozens of ways. If you have local filesystem access to the MySQL instance, use LOAD DATA. Otherwise you can just as easily transform each line into SQL (or a VALUES line) for periodic submission to MySQL via PHP.
In the end I used dataload and modified this http://codingpad.maryspad.com/2007/09/24/converting-csv-to-sql-using-php/ for different situations.