MySQL database normalization (taken from Excel) - php

I have imported a MySQL database from an Excel sheet, so it's a little bit messy. For example, there's a product field with VARCHAR values such as product8. I would like to grep through this data with a regex, capture the id in this example, and then alter the column data types. Right now I would start preg_matching the long, hard PHP way, but I'm curious how database normalization is done properly using SQL commands. Thanks for your support in advance.

You can use MySQL's string functions to pull the IDs:
SELECT RIGHT(product, LENGTH(product) - 7) AS productID FROM `table`
This strips the 7-character "product" prefix and leaves just the numbers; then you can do whatever you need with them.
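On the PHP side, the preg_match route the asker mentions can be sketched like this (the function name and the assumption that every value is the literal prefix "product" followed by digits are mine):

```php
<?php
// Extract the numeric id from values like "product8".
// Assumes every value is the literal prefix "product" followed by digits;
// anything else returns null so it can be flagged for manual cleanup.
function extractProductId(string $value): ?int
{
    if (preg_match('/^product(\d+)$/', $value, $matches)) {
        return (int) $matches[1];
    }
    return null;
}
```

Once every row yields a clean integer, the column can safely be converted with ALTER TABLE ... MODIFY.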

Related

Create a database field vs. generating data by coding

I was wondering where the cut-off point is between creating a field in the database to hold your data and generating the data in your code. For example, I need a certain value that is derived from two other columns in the database: Column1 - Column2 = Column3. So here is the question: would it be better to generate that data in code, or should I create a Column3, fill it while populating the DB, and retrieve it later? In my case the data is a two-digit integer or a single-character string, basically small data.
I am using the latest MySQL, and the programming language is PHP with the mysqli library. Also, this website should not get too much traffic, and the DB will hold 200k rows at most.
This type of attribute (column) is called a derived attribute. You should not store it in the database, as that increases redundancy. Just store Column1 and Column2 and calculate the value while fetching, for example like this:
`Column1` - `Column2` AS `Column3`
If you don't want to write the query that way every time, create a view with the derived attribute added.
Note: if the calculation is CPU-intensive, you should consider using a cache, and then you must decide how and when that cache will be invalidated.
It depends on how resource-hungry the calculation is and how often it's going to be performed. In your case it's pretty simple, so storing the difference in a separate column would be overkill. You can do the calculation in the SQL query like this:
SELECT col1, col2, col1-col2 AS col3...
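If you settle on the view suggested above, a minimal sketch looks like this (table and column names are placeholders):

```sql
-- A view exposing the derived column, so callers can SELECT col3
-- without repeating the subtraction everywhere.
CREATE VIEW my_table_with_diff AS
SELECT col1, col2, col1 - col2 AS col3
FROM my_table;
```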

Transfer raw data into optimized MySQL table: algorithm to detect correct data type?

I am importing a raw txt file into MySQL and need an algorithm to detect the correct MySQL data type after scanning each column of data. Can anyone suggest an existing PHP library to accomplish this task?
The algorithm would have the following features:
scan each record in a column and assign the correct type, for example...
"-30021" -> SMALLINT
"foo bar" -> CHAR(7)
aggregate all the types seen in the column and select the single type that covers every value.
settings to allow for more conservative vs. relaxed type detection. (If 98% of values are integers, and there are a few strings, we'll reject the strings as noise, and use an INT data type.)
After all columns have been detected, automate the MySQL table creation and data import.
Thanks for your tips
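The per-column scan described above could be sketched in PHP like this; it's a minimal sketch that only distinguishes SMALLINT from CHAR(n) and skips the noise-threshold setting:

```php
<?php
// Scan one column's values and pick a covering MySQL type.
// Only SMALLINT and CHAR(n) are considered here; a real detector
// would walk a wider type lattice and apply the noise threshold.
function detectColumnType(array $values): string
{
    $maxLen = 0;
    $allSmallint = true;
    foreach ($values as $v) {
        $maxLen = max($maxLen, strlen($v));
        if (!preg_match('/^-?\d+$/', $v) || (int) $v < -32768 || (int) $v > 32767) {
            $allSmallint = false;
        }
    }
    return $allSmallint ? 'SMALLINT' : "CHAR($maxLen)";
}
```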
MySQL has this functionality built in! It's called PROCEDURE ANALYSE.
I wrote about it in a blog post last September: What makes a good MySQL index? Part 1: Column Size
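For reference, PROCEDURE ANALYSE is invoked by appending it to an ordinary SELECT (the table name here is a placeholder; note the feature was removed in MySQL 8.0, so this applies to 5.x):

```sql
-- Reports a suggested optimal type for every selected column.
SELECT * FROM my_raw_import PROCEDURE ANALYSE();
```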

Comparing data to table in the database

I receive raw data in CSVs, and upload it to a table in a MySQL database (upon which my website functions). I want to compare a newer CSV to the data I uploaded from an older CSV, and I want to see the differences between the two (basically I want to diff the raw data with the table).
I have PHP, MySQL, and my desktop apps (e.g. Excel) at my disposal. What's the best way to go about this? Possible ways I can think of:
Inserting the newer data into a Table_Copy, then somehow diffing the two tables in MySQL.
Somehow querying the database against the raw data, without having to upload it.
Downloading the data from the database into raw CSV format, and then comparing the two raw CSVs using a desktop program.
Why don't you use a WHERE clause to pull only the data that is new? For instance:
SELECT * FROM `table` WHERE dateadded > '2011-01-01 18:18:00'
This depends on your table having a dateadded column, populated with the date and time each row is added.
diff <(mysqldump test old_csv --skip-extended-insert) <(mysqldump test new_csv --skip-extended-insert) --side-by-side --suppress-common-lines --width=690 | more
You can use the following approaches:
1) Database table comparison: create a copy of the table and then compare the data.
You can use proprietary tools to do this easily (e.g. EMS Data Comparer).
You can also write some simple queries to achieve it, e.g.:
SELECT id FROM table_copy WHERE id NOT IN (SELECT id FROM `table`)
2) Use a file comparer like WinMerge.
Take a dump of both tables using the same method, and then compare the dumps.
I use both approaches depending on my data size. For smaller data the 2nd approach is good.
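A minimal sketch of the table-copy approach for full-row differences, assuming both tables share an id primary key (all names are placeholders):

```sql
-- Rows present in the new import (table_copy) but absent from the
-- live table; swap the two table names to find deleted rows.
SELECT n.*
FROM table_copy AS n
LEFT JOIN live_table AS o ON o.id = n.id
WHERE o.id IS NULL;
```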

How can I search all of the databases on my mysql server for a single string of information

I have around 150 different databases, with dozens of tables each, on one of my servers. I am looking to see which database contains a specific person's name. Right now, I'm using phpMyAdmin to search each database individually, but I would really like to be able to search all databases and all tables at once. Is this possible? How would I go about doing this?
A solution would be to use the information_schema database to list all databases, all tables, and all fields, and loop over all of that...
There is this script that could help with at least some part of the work: anywhereindb (quoting):
This code searches all the tables, rows, and columns in a MySQL database. The code is written in PHP. For faster results, it only searches varchar fields.
But, as Harmen noted, this only works with one database, which means you'd have to wrap something around it to loop over each database on your server.
For more information about that, take a look at Chapter 19, INFORMATION_SCHEMA Tables; especially the SCHEMATA table, which contains the names of all databases on the server.
Here's another solution, based on a stored procedure, which means fewer client/server calls and might therefore be faster: http://kedar.nitty-witty.com/miscpages/mysql-search-through-all-database-tables-columns-stored-procedure.php
The right way to go about it would be to NORMALIZE your data in the first place!!!
You say name, but most people have at least two names (a surname and a forename). Are these split up or in the same field? If they are in the same field, what order do they appear in? How are they capitalized?
The most efficient way to try to identify where the data might be would be to write a program in C which sifts the raw data files (while the DBMS is shut down) looking for the data, but that will only tell you which table they appear in.
Failing that, you need to write some PHP which iterates through each database ('SHOW DATABASES' works much like a SELECT statement), then iterates through each table in the database, then generates a SELECT statement filtering on each CHAR or VARCHAR column large enough to hold the name you are looking for (try running 'DESC $table').
Good luck.
C.
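The iteration described above mostly amounts to building one SELECT per table; here is a sketch of just the query-building step (the names and the LIKE pattern are assumptions, and a real version must escape the needle before interpolating it):

```php
<?php
// Build the search query for one table, given its CHAR/VARCHAR columns
// (as obtained from SHOW DATABASES / DESC in the surrounding loop).
// WARNING: $needle is interpolated verbatim here for brevity; real code
// must escape it first (e.g. with mysqli::real_escape_string).
function buildSearchQuery(string $db, string $table, array $charColumns, string $needle): string
{
    $conditions = array_map(
        fn($col) => sprintf("`%s` LIKE '%%%s%%'", $col, $needle),
        $charColumns
    );
    return sprintf(
        'SELECT * FROM `%s`.`%s` WHERE %s',
        $db,
        $table,
        implode(' OR ', $conditions)
    );
}
```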
The best answer probably depends on how often you want to do this. If it's ad-hoc, once-a-week type stuff, then the answers above are good.
If you want to run this kind of search once a second, consider creating a "data warehouse" database that contains just the tables/columns you want to search (heavily indexed, with a reference back to the source database if that is needed), populated by a cron job or by stored procedures driven by changes in the 150 databases...

PHP mySQL UPDATE SET LIKE %value

Is there a logical (and possible) way to do something like this?
UPDATE $table SET LIKE %_checkbox = '' WHERE id='$id'
I have fields like allowed_checkbox and types_checkbox and they are sent to the database script dynamically. Can you use a wildcard when referring to the column name?
You've got a bit of a Frankenstein syntax there. The server needs to know the table and column names before compiling the SQL, so you can't do what you're after directly.
Does your php code have no prior knowledge of the database schema?
The key word you used is dynamically: you could find matching column names with a query against MySQL's INFORMATION_SCHEMA.COLUMNS table. You could do this per update, which would be expensive, or extract the schema for all the tables you need once at application start-up.
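A sketch of that dynamic approach: fetch the matching column names first, then build one UPDATE from them (table/column names here are placeholders, and the column list must come from INFORMATION_SCHEMA, never from user input):

```php
<?php
// Build an UPDATE that blanks every column whose name ends in "_checkbox".
// $columns is assumed to come from a query against
// INFORMATION_SCHEMA.COLUMNS for the target table.
function buildCheckboxReset(string $table, array $columns, int $id): ?string
{
    $targets = array_filter(
        $columns,
        fn($c) => substr($c, -9) === '_checkbox'
    );
    if ($targets === []) {
        return null; // nothing to update
    }
    $sets = implode(', ', array_map(fn($c) => "`$c` = ''", $targets));
    return sprintf("UPDATE `%s` SET %s WHERE id = %d", $table, $sets, $id);
}
```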
No. You would have to generate the SQL string dynamically and then execute it separately. If you're trying to do something like this, you've probably got a bad schema design.
