MySQL design with dynamic number of fields - php

My experience with MySQL is very basic. The simple stuff is easy enough, but I ran into something that is going to require a little more knowledge. I need a table that stores a small list of words. The number of words stored could be anywhere from 1 to 15. Later, I plan on searching through the table by these words. I have thought about a few different methods:
A.) I could create the database with 15 fields, and just fill the fields with null values whenever the data is smaller than 15. I don't really like this. It seems really inefficient.
B.) Another option is to use just a single field, and store the data as a comma separated list. Whenever I come back to search, I would just run a regular expression on the field. Again, this seems really inefficient.
I would hope there is a good alternative to those two options. Any advice would be very appreciated.
-Thanks

C) Use a normal form: use multiple rows with appropriate keys. An example:
mysql> SELECT * FROM blah;
+----+-----+-----------+
| K  | grp | name      |
+----+-----+-----------+
|  1 |   1 | foo       |
|  2 |   1 | bar       |
|  3 |   2 | hydrogen  |
|  4 |   4 | dasher    |
|  5 |   2 | helium    |
|  6 |   2 | lithium   |
|  7 |   4 | dancer    |
|  8 |   3 | winken    |
|  9 |   4 | prancer   |
| 10 |   2 | beryllium |
| 11 |   1 | baz       |
| 12 |   3 | blinken   |
| 13 |   4 | vixen     |
| 14 |   1 | quux      |
| 15 |   4 | comet     |
| 16 |   2 | boron     |
| 17 |   4 | cupid     |
| 18 |   4 | donner    |
| 19 |   4 | blitzen   |
| 20 |   3 | nod       |
| 21 |   4 | rudolph   |
+----+-----+-----------+
21 rows in set (0.00 sec)
This is the table I posted in this other question about group_concat. You'll note that there is a unique key K for every row. There is another key grp which represents each category. The remaining field represents a category member, and there can be variable numbers of these per category.
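A minimal sketch of how such a one-word-per-row table can be searched. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL; the index name `blah_name` and the helper functions `groups_containing` and `members_of` are illustrative assumptions, not part of the original answer, and only a few of the rows above are loaded.

```python
import sqlite3

# Assumption: SQLite (in memory) stands in for MySQL; names follow the
# `blah` example, the index and helper names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blah ("
    " K INTEGER PRIMARY KEY,"   # unique key for every row
    " grp INTEGER NOT NULL,"    # which group/category the word belongs to
    " name TEXT NOT NULL)"      # one word per row
)
# An index on the word column lets lookups by word avoid a full scan.
conn.execute("CREATE INDEX blah_name ON blah (name)")

sample = [(1, "foo"), (1, "bar"), (2, "hydrogen"), (4, "dasher"), (2, "helium")]
conn.executemany("INSERT INTO blah (grp, name) VALUES (?, ?)", sample)


def groups_containing(word):
    """Return the ids of every group that has `word` as a member."""
    cur = conn.execute(
        "SELECT DISTINCT grp FROM blah WHERE name = ? ORDER BY grp", (word,)
    )
    return [row[0] for row in cur]


def members_of(grp):
    """Return all words stored for one group, in insertion order."""
    cur = conn.execute("SELECT name FROM blah WHERE grp = ? ORDER BY K", (grp,))
    return [row[0] for row in cur]


print(groups_containing("helium"))
print(members_of(1))
```

A group can hold 1, 15 or 50 words without any change to the table definition; any upper limit would be enforced by the application.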

What other data is associated with these words?
One typical way to handle this kind of problem is best described by example. Let's assume your table captures certain words found in certain documents. One typical way is to assign each document an identifier. Let's pretend, for the moment, that each document is a web URL, so you'd have a table something like this:
CREATE TABLE WebPage (
ID INTEGER NOT NULL,
URL VARCHAR(...) NOT NULL
)
Your Words table might look something like this:
CREATE TABLE Words (
Word VARCHAR(...) NOT NULL,
DocumentID INTEGER NOT NULL
)
Then, for each word, you create a new row in the table. To find all words in a particular document, select by the document's ID:
SELECT Words.Word FROM Words, WebPage
WHERE Words.DocumentID = WebPage.ID
AND WebPage.URL = 'http://whatever/web/page/'
To find all documents with a particular word, select by word:
SELECT WebPage.URL FROM WebPage, Words
WHERE Words.Word = 'hello' AND Words.DocumentID = WebPage.ID
Or some such.

Hurpe, is the scenario you are describing that you will have a database table with a column that can contain up to 15 keywords, and that you will later use these keywords to search the table, which will presumably have other columns as well?
Then isn't the answer to have a separate table for the keywords? You will also need to have a many-to-many relationship between the keywords and the main table.
So using cars as an example, the WORD table that will store the 15 or so keywords would have the following structure:
ID int
Word varchar(100)
The CAR table would have a structure something like:
ID int
Name varchar(100)
Then finally you need a CAR_WORD table to hold the many-to-many relationships:
ID int
CAR_ID int
WORD_ID int
And sample data to go with this for the WORD table:
ID Word
001 Family
002 Sportscar
003 Sedan
004 Hatchback
005 Station-wagon
006 Two-door
007 Four-door
008 Diesel
009 Petrol
together with sample data for the CAR table
ID Name
001 Audi TT
002 Audi A3
003 Audi A4
then the intersection CAR_WORD table sample data could be:
ID CAR_ID WORD_ID
001 001 002
002 001 006
003 001 009
which give the Audi TT the correct characteristics.
and finally the SQL to search would be something like:
SELECT c.name
FROM CAR c
INNER JOIN CAR_WORD x
ON c.id = x.car_id
INNER JOIN WORD w
ON x.word_id = w.id
WHERE w.word IN('Petrol', 'Two-door')
Phew! I didn't set out to write quite so much. It looks complicated, but it is where I always seem to end up, however hard I try to simplify things.
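The IN (...) search above returns cars that carry any of the listed keywords. As a sketch of the stricter "must have all keywords" variant, the same three tables can be grouped by car and the distinct matches counted. This uses Python's sqlite3 as a stand-in for MySQL; the function name `cars_with_all` and the Audi A3 keyword rows are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE car  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE word (id INTEGER PRIMARY KEY, word TEXT NOT NULL UNIQUE);
CREATE TABLE car_word (
    car_id  INTEGER NOT NULL REFERENCES car (id),
    word_id INTEGER NOT NULL REFERENCES word (id),
    PRIMARY KEY (car_id, word_id)
);
INSERT INTO car  VALUES (1, 'Audi TT'), (2, 'Audi A3'), (3, 'Audi A4');
INSERT INTO word VALUES (2, 'Sportscar'), (4, 'Hatchback'),
                        (6, 'Two-door'), (9, 'Petrol');
-- Audi TT rows follow the sample data; the Audi A3 rows are made up.
INSERT INTO car_word VALUES (1, 2), (1, 6), (1, 9), (2, 4), (2, 9);
""")


def cars_with_all(words):
    """Return names of cars that are tagged with every word in `words`."""
    wanted = sorted(set(words))          # ignore duplicate keywords
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""
        SELECT c.name
        FROM car c
        JOIN car_word x ON x.car_id = c.id
        JOIN word w     ON w.id = x.word_id
        WHERE w.word IN ({placeholders})
        GROUP BY c.id, c.name
        HAVING COUNT(DISTINCT w.id) = ?   -- one match per requested keyword
        ORDER BY c.name
    """
    return [row[0] for row in conn.execute(sql, wanted + [len(wanted)])]


print(cars_with_all(["Petrol", "Two-door"]))
```

Dropping the HAVING line gives back the "any keyword" behaviour of the original query.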

I would create a table with an ID and one field, then store your results as multiple records. This offers many benefits. For example, you can then programmatically enforce your 15-word limit instead of building it into your design, so if you ever change your mind it should be rather easy. Your queries to search the data will also be much faster, since regular expressions take a lot of time to run (comparatively). Plus, using a varchar for the field will allow you to compress your table much better. And indexing the table should be much easier (more efficient) with this design.

Do the extra work and store the 15 words as 15 rows in the table, i.e. normalize the data. It may require you to re-think your strategy a bit, but trust me when the client comes along and says "Can you change that 15 limit to 20...", you'll be glad you did.

Depending on exactly what you want to accomplish:
Use a full-text index on your string table
Three tables: one for the original string, one for unique words (after word-rooting?), and a join table. This would also let you do more complicated searches, like "return all strings containing at least three of the following five words" or "return all strings where 'fox' occurs after 'dog'".
CREATE TABLE string (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
string TEXT NOT NULL
);
CREATE TABLE word (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
word VARCHAR(14) NOT NULL,
UNIQUE INDEX (word ASC)
);
CREATE TABLE word_string (
id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
string_id INT NOT NULL,
word_id INT NOT NULL,
word_order INT NOT NULL,
FOREIGN KEY (string_id) REFERENCES string (id),
FOREIGN KEY (word_id) REFERENCES word (id),
INDEX (word_id ASC)
);
-- Sample data (AUTO_INCREMENT ids start at 1)
INSERT INTO string (string) VALUES
('This is a test string'),
('The quick red fox jumped over the lazy brown dog');
INSERT INTO word (word) VALUES
('this'),
('test'),
('string'),
('quick'),
('red'),
('fox'),
('jump'),
('over'),
('lazy'),
('brown'),
('dog');
INSERT INTO word_string ( string_id, word_id, word_order ) VALUES
( 1, 1, 0 ),
( 1, 2, 3 ),
( 1, 3, 4 ),
( 2, 4, 1 ),
( 2, 5, 2 ),
( 2, 6, 3 ),
( 2, 7, 4 ),
( 2, 8, 5 ),
( 2, 9, 7 ),
( 2, 10, 8 ),
( 2, 11, 9 );
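-- Sample query - find all strings containing 'fox' and 'quick'
-- Each word needs its own join to word_string, because a single
-- word_string row can only ever match one word.
SELECT
DISTINCT string.id, string.string
FROM
string
INNER JOIN word_string AS ws_fox ON ws_fox.string_id=string.id
INNER JOIN word AS fox ON fox.word='fox' AND ws_fox.word_id=fox.id
INNER JOIN word_string AS ws_quick ON ws_quick.string_id=string.id
INNER JOIN word AS quick ON quick.word='quick' AND ws_quick.word_id=quick.id;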

You are correct that A is no good. B is also no good, as it fails to adhere to First Normal Form (each field must be atomic). There's nothing in your example that suggests you would gain by avoiding 1NF.
You want a table for your list of words with each word in its own row.

Related

php mysql update all fields

Sorry for asking a trivial question. I want to translate some of the fields of my database, which has one million rows. What I want to do is read field 1, run the translate function on it and write the result to field 3; likewise, the translation of field 2 needs to be written to field 4.
initial table
field id | field 1 | field 2   | field 3 | field 4 |
1        | apple   | pear      | empty   | empty   |
2        | banana  | pineapple | empty   | empty   |
end result table (translate(apple) = yabloko)
field id | field 1 | field 2   | field 3 | field 4 |
1        | apple   | pear      | yabloko | grusha  |
2        | banana  | pineapple | banan   | ananas  |
I already have the translate function; the question is how to perform this on all one million rows. How do I construct the loop through them correctly? (Some IDs are surely missing, as some of the data was removed.)
Thank you so much in advance!
Rather than "construct a loop" and process row by row, the normative pattern would be to perform the operation in a single statement.
I'd populate a translation table:
CREATE TABLE my_translation
( old_word VARCHAR(100) NOT NULL PRIMARY KEY
, new_word VARCHAR(100)
) Engine=InnoDB;
INSERT INTO my_translation (old_word, new_word) VALUES
('apple' ,'yabloko')
,('pear' ,'grusha')
,('banana' ,'banan')
,('pineapple','ananas');
Then do an update. The tricky part is leaving field_3 and field_4 unmodified if there's no match.
UPDATE my_table t
LEFT JOIN my_translation c3
ON c3.old_word = t.field_1
LEFT JOIN my_translation c4
ON c4.old_word = t.field_2
SET t.field_3 = IF(c3.old_word IS NULL,t.field_3,c3.new_word)
, t.field_4 = IF(c4.old_word IS NULL,t.field_4,c4.new_word);
NOTE: If this is a one-time operation, I might consider doing this as an INSERT into a new table, and then swapping the table names and changing foreign key references, to put the new table in place of the old table.
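The single-statement idea can be tried out on a small scale. The sketch below uses Python's sqlite3 as a stand-in for MySQL and, because MySQL's multi-table UPDATE ... JOIN syntax is not portable, swaps in correlated subqueries wrapped in COALESCE. The row with id 4 and the word 'kiwi' are made-up test data that show a missing id and an untranslated word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE my_table (
    id INTEGER PRIMARY KEY,
    field_1 TEXT, field_2 TEXT, field_3 TEXT, field_4 TEXT
);
CREATE TABLE my_translation (
    old_word TEXT NOT NULL PRIMARY KEY,
    new_word TEXT
);
-- id 3 is deliberately missing; 'kiwi' has no translation (both made up).
INSERT INTO my_table (id, field_1, field_2) VALUES
    (1, 'apple', 'pear'), (2, 'banana', 'pineapple'), (4, 'kiwi', 'apple');
INSERT INTO my_translation VALUES
    ('apple', 'yabloko'), ('pear', 'grusha'),
    ('banana', 'banan'), ('pineapple', 'ananas');
""")

# One statement updates every row. COALESCE keeps the existing value when
# the lookup finds no translation (the subquery then yields NULL).
conn.execute("""
UPDATE my_table SET
    field_3 = COALESCE(
        (SELECT new_word FROM my_translation
          WHERE old_word = my_table.field_1), field_3),
    field_4 = COALESCE(
        (SELECT new_word FROM my_translation
          WHERE old_word = my_table.field_2), field_4)
""")

result = conn.execute(
    "SELECT id, field_3, field_4 FROM my_table ORDER BY id"
).fetchall()
print(result)
```

One difference from the IF(...) form above: if a translation row exists but its new_word is NULL, COALESCE leaves the old value in place, whereas the IF version would write the NULL.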

WHERE vs HAVING in generated queries

I know that this title is overused, but it seems that my kind of question is not answered yet.
So, the problem is like this:
I have a table structure made of four tables (tables, rows, cols, values) that I use to recreate the behavior of the information_schema (in a way).
In php I am generating queries to retrieve the data, and the result would still look like a normal table:
SELECT
(SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
(SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2"
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1')
HAVING (col2 LIKE "%4%")
OR
SELECT * FROM
(SELECT
(SELECT value FROM `values` WHERE `col` = "3" and row = rows.id) as "col1",
(SELECT value FROM `values` WHERE `col` = "4" and row = rows.id) as "col2"
FROM rows WHERE `table` = (SELECT id FROM tables WHERE name = 'table1')) d
WHERE col2 LIKE "%4%"
note that the part where I define the columns of the result is generated by a php script. It is less important why I am doing this, but I want to extend this algorithm that generates the queries for a broader use.
Now to the core problem: I have to decide whether to generate a WHERE or a HAVING part for the query. I know when to use each, but my algorithm doesn't, and I would have to add a few extra checks for this. The two queries above are equivalent: I can always put any query in a sub-query, give it an alias, and use WHERE on the new derived table. But I wonder whether I will have performance problems, or whether this will come back to bite me in an unexpected way.
I know how they both work, and that WHERE is supposed to be faster, which is why I came here to ask. Hopefully I made myself understood; please excuse my English and the long, winding turns of phrase.
EDIT 1
I already know the difference between the two and all that it implies. My only dilemma is this: because I use custom columns from other tables, variable in number and size, and try to get the same result as with a normally created table, I must use HAVING to filter the derived columns. Alternatively, I can wrap the query in a subquery and use WHERE as usual, which will probably create a temporary table that is filtered afterwards. Will this affect performance for a large database? Unfortunately I cannot test this right now, as I cannot afford to fill the database with over 1 billion entries (something like 1 billion in the rows table, 5 billion in the values table as every row has 5 columns, 5 rows in the cols table and 1 row in the tables table = 6,000,000,006 entries in total).
right now my database looks like this:
+----+--------+-----------+------+
| id | name   | title     | dets |
+----+--------+-----------+------+
|  1 | table1 | Table One |      |
+----+--------+-----------+------+
+----+-------+------+
| id | table | name |
+----+-------+------+
|  3 |     1 | col1 |
|  4 |     1 | col2 |
+----+-------+------+
where `table` is a foreign key from table `tables`
+----+-------+-------+
| id | table | extra |
+----+-------+-------+
|  1 |     1 |       |
|  2 |     1 |       |
+----+-------+-------+
where `table` is a foreign key from table `tables`
+----+-----+-----+----------+
| id | row | col | value    |
+----+-----+-----+----------+
|  1 |   1 |   3 | 13       |
|  2 |   1 |   4 | 14       |
|  6 |   2 |   4 | 24       |
|  9 |   2 |   3 | asdfghjk |
+----+-----+-----+----------+
where `row` is a foreign key from table `rows`
where `col` is a foreign key from table `cols`
EDIT 2
The conditions are there just for demonstration purposes!
EDIT 3
With only two rows there already seems to be a difference between the two: the query using HAVING takes 0.0008 and the one using WHERE takes 0.0014-0.0019. I wonder whether this will affect performance for large numbers of rows and columns.
EDIT 4
The result of the two queries is identical, and that is:
+----------+------+
| col1     | col2 |
+----------+------+
| 13       | 14   |
| asdfghjk | 24   |
+----------+------+
HAVING is specifically for GROUP BY, WHERE is to provide conditional parameters. See also WHERE vs HAVING
I believe the having clause would be faster in this case, as you're defining specific values, as opposed to reading through the values and looking for a match.
See: http://database-programmer.blogspot.com/2008/04/group-by-having-sum-avg-and-count.html
Basically, WHERE filters out rows before they are passed to an aggregate function, but HAVING filters the aggregate function's results.
You could write it like this:
WHERE col2 IN (14,24)
Your condition WHERE col2 LIKE "%4%" is a bad idea: a row with col2 = 34 would be selected as well.
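For the generator itself, one option discussed in the question is to always emit the derived-table form, so the filter can be a plain WHERE whatever the column list looks like. The sketch below builds that query against the sample data using Python's sqlite3 as a stand-in for MySQL and PHP. The function names `build_query` and `fetch` and the extra `row_id` column (added only to get a stable order) are assumptions, and the sketch says nothing about how MySQL would perform at a billion rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
CREATE TABLE "tables" (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE "cols"   (id INTEGER PRIMARY KEY, "table" INTEGER, name TEXT);
CREATE TABLE "rows"   (id INTEGER PRIMARY KEY, "table" INTEGER);
CREATE TABLE "values" (id INTEGER PRIMARY KEY, "row" INTEGER,
                       "col" INTEGER, value TEXT);
INSERT INTO "tables" VALUES (1, 'table1');
INSERT INTO "cols"   VALUES (3, 1, 'col1'), (4, 1, 'col2');
INSERT INTO "rows"   VALUES (1, 1), (2, 1);
INSERT INTO "values" VALUES (1, 1, 3, '13'), (2, 1, 4, '14'),
                            (6, 2, 4, '24'), (9, 2, 3, 'asdfghjk');
''')


def build_query(columns, filter_alias):
    """Build the derived-table query; `columns` maps alias -> cols.id.

    Assumption: aliases come from the trusted cols table, so they are
    interpolated directly; user-supplied values go through placeholders.
    """
    if filter_alias not in columns:
        raise ValueError("unknown column alias")
    select_list = ", ".join(
        f'(SELECT value FROM "values" v WHERE v."col" = {int(col_id)} '
        f'AND v."row" = r.id) AS "{alias}"'
        for alias, col_id in columns.items()
    )
    return (
        f'SELECT * FROM (SELECT r.id AS row_id, {select_list} '
        f'FROM "rows" r WHERE r."table" = '
        f'(SELECT id FROM "tables" WHERE name = ?)) d '
        f'WHERE d."{filter_alias}" LIKE ? ORDER BY d.row_id'
    )


def fetch(table_name, columns, filter_alias, pattern):
    """Run the generated query and return all matching rows."""
    sql = build_query(columns, filter_alias)
    return conn.execute(sql, (table_name, pattern)).fetchall()


print(fetch("table1", {"col1": 3, "col2": 4}, "col2", "%4%"))
```

Because the outer WHERE only ever sees the aliases of the derived table, the generator does not need to know whether a filtered column is a real column or a computed one.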

Storing variable number of values of something in a database

I'm developing a QA web app in which each point to be evaluated is assigned to one of the following categories:
Call management
Technical skills
Ticket management
As these aren't likely to change, it's not worth making them dynamic; the problem is that the points are likely to change.
At first I had a 'quality' table with a column for each point, but then the requirements changed and I'm kind of stuck.
I have to store "evaluations" that hold all the points with their values, but in the future those points may change.
I thought that in the quality table I could store some kind of string like this:
1=1|2=1|3=2
where each entry pairs the ID of a point with the punctuation (score) given for it.
Can someone point me to a better method to do that?
As mentioned many times here on SO, NEVER PUT MORE THAN ONE VALUE INTO A DB FIELD IF YOU WANT TO ACCESS THEM SEPARATELY.
So I suggest to have 2 additional tables:
CREATE TABLE categories (id int AUTO_INCREMENT PRIMARY KEY, name VARCHAR(50) NOT NULL);
INSERT INTO categories VALUES (1,'Call management'),(2,'Technical skills'),(3,'Ticket management');
and
CREATE TABLE qualities (id int AUTO_INCREMENT PRIMARY KEY, category int NOT NULL, punctuation int NOT NULL);
then store and query your data accordingly
This table is not normalized. It violates 1st Normal Form (1NF):
Evaluation
----------------------------------------
EvaluationId | List Of point=punctuation
1 | 1=1|2=1|3=2
2 | 1=5|2=6|3=7
You can read more about Database Normalization basics.
The table could be normalized as:
Evaluation
-------------
EvaluationId
1
2
Quality
---------------------------------------
EvaluationId | Point | Punctuation
1 | 1 | 1
1 | 2 | 1
1 | 3 | 2
2 | 1 | 5
2 | 2 | 6
2 | 3 | 7
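As a sketch of moving from the packed string to the normalized layout, the snippet below splits values such as 1=1|2=1|3=2 into one Quality row per point. It uses Python's sqlite3 as a stand-in for MySQL; the function name `parse_packed`, the `legacy` dictionary and the lower-case table and column names are assumptions for illustration.

```python
import sqlite3


def parse_packed(packed):
    """Split a string like '1=1|2=1|3=2' into (point, punctuation) pairs."""
    pairs = []
    for item in packed.split("|"):
        point, punctuation = item.split("=")
        pairs.append((int(point), int(punctuation)))
    return pairs


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE evaluation (evaluation_id INTEGER PRIMARY KEY);
CREATE TABLE quality (
    evaluation_id INTEGER NOT NULL REFERENCES evaluation (evaluation_id),
    point         INTEGER NOT NULL,
    punctuation   INTEGER NOT NULL,
    PRIMARY KEY (evaluation_id, point)   -- one score per point per evaluation
);
""")

# The two packed rows from the non-normalized example above.
legacy = {1: "1=1|2=1|3=2", 2: "1=5|2=6|3=7"}
for evaluation_id, packed in legacy.items():
    conn.execute("INSERT INTO evaluation VALUES (?)", (evaluation_id,))
    conn.executemany(
        "INSERT INTO quality VALUES (?, ?, ?)",
        [(evaluation_id, p, s) for p, s in parse_packed(packed)],
    )

# With one row per point, a single point can be queried directly.
point_two = conn.execute(
    "SELECT evaluation_id, punctuation FROM quality "
    "WHERE point = 2 ORDER BY evaluation_id"
).fetchall()
print(point_two)
```

Adding or retiring a point then means inserting or ignoring rows, not altering the table.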

(My)SQL Query to search for multiple values on multiple tables (some rows, some columns)

I am creating a search using MySQL & PHP on an existing table structure.
Multiple search keywords can be entered and the user can opt to match either ALL or ANY. The ANY form is not too difficult, but I am racking my brain to write an efficient solution for the AND form.
The following is about the AND form, so all the search keywords must be found.
The 2 tables i have to work with (search in) have a structure as follows:
Table1
- item_id (non-unique)
- text
Table2
- item_id (unique)
- text_a
- text_b
- text_c
(The real solution will also have a 3rd table, but that is structured the same way as Table1. Table2 will have around 20 searchable columns.)
Table1 can have multiple rows for each item_id with different text.
Consider having only 2 search keywords (there can be more in real life); then both must exist:
- both in a single row/column
or:
- in 2 different columns of maybe different tables.
or:
- in 2 different rows with the same item_id (in case of both keywords found in different rows of Table1)
All I could come up with are very intensive sub-queries, but those would bring the server down or the response times would be huge.
As I am using PHP, I could use intermediate queries and store the results for use in a later final query.
Does anyone have some good suggestions?
Edit: There were requests for real examples, so here goes.
Consider the following 2 tables with data:
Table 1
+---------+-----------+-----------+-----------+-----------+
| item_id | t1_text_a | t1_text_b | t1_text_c | t1_text_d |
+---------+-----------+-----------+-----------+-----------+
|       1 | aaa bbb   | NULL      | ccc       | ddd       |
|       2 | aaa ccc   | ddd       | fff       | ggg       |
|       3 | bbb       | NULL      | NULL      | NULL      |
+---------+-----------+-----------+-----------+-----------+
Table2
+---------+----------+---------+
| item_id | sequence | t2_text |
+---------+----------+---------+
|       1 |        1 | kkk lll |
|       2 |        1 | kkk     |
|       2 |        2 | lll     |
|       3 |        1 | mmm     |
+---------+----------+---------+
PS In the real database (which i can not change, so full text indexes or changes to table definition are not an option) Table1 has about 20 searchable columns and there are 2 tables like Table2. This should not make a difference to the solution, although it is something to consider from a performance perspective.
Example searches:
Keywords: aaa bbb
Should return:
- item_id=1. Both keywords are found in column t1_text_a.
Keywords: ccc ddd
Should return:
- item_id=1. "ccc" is found in t1_text_c, "ddd" is found in t1_text_d.
- item_id=2. "ccc" is found in t1_text_a, "ddd" is found in t1_text_b.
Keywords: kkk lll
Should return:
- item_id=1. Both keywords found in a single row of Table2 in column t2_text.
- item_id=2. Both keywords found in Table2, but in separate rows with the same item_id.
Keywords: bbb mmm
Should return:
- item_id=3. "bbb" is found in table1.t1_text_a, "mmm" is found in table2.t2_text.
My progress so far
For now I have actually given up on trying to do this mostly in SQL.
What I did instead is create a query for each table that retrieves any row matching at least 1 of the search keywords. If there is only 1 search keyword the query uses a LIKE, otherwise a REGEXP 'keyword1|keyword2'.
These rows are put in a PHP array with the item_id as the index and a concatenation of all the strings (searchable columns) as the value. When all possible rows have been retrieved, I search the array for entries that match all keywords in the concatenated field.
This is most likely not the best solution, and it will not scale very well if the search returns many candidate rows with at least 1 match.
It's hard to give you a definitive answer since you do not give a lot of details about your case.
But maybe this can give you a starting point:
SELECT * FROM table1 AS tbl1
INNER JOIN table2 AS tbl2
ON tbl1.item_id = tbl2.item_id
WHERE
tbl1.text LIKE '%search_word1%'
AND tbl1.text LIKE '%search_word2%'
AND tbl2.text_a LIKE '%search_word1%'
AND tbl2.text_a LIKE '%search_word2%'
AND tbl2.text_b LIKE '%search_word1%'
AND tbl2.text_b LIKE '%search_word2%'
AND tbl2.text_c LIKE '%search_word1%'
AND tbl2.text_c LIKE '%search_word2%'
You can adapt with JOIN, INNER JOIN, LEFT JOIN, RIGHT JOIN and the different LIKE and AND/OR statements to obtain the result you're looking for.
Google some join examples with LIKE statements for more details.
But as Tom H. said, it'd be better if you could post a more precise table structure and a real example of search terms...
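One way to get the AND behaviour from the example data without changing the tables is to stack every searchable column into a single (item_id, txt) list with UNION ALL and then require, per item, at least one hit for each keyword. The sketch below uses Python's sqlite3 as a stand-in for MySQL and PHP; the names `SEARCHABLE` and `search_all` are assumptions, and since every LIKE '%...%' forces a scan, it is a correctness sketch and not a performance claim.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (item_id INTEGER PRIMARY KEY, t1_text_a TEXT,
                     t1_text_b TEXT, t1_text_c TEXT, t1_text_d TEXT);
CREATE TABLE table2 (item_id INTEGER, sequence INTEGER, t2_text TEXT);
INSERT INTO table1 VALUES
    (1, 'aaa bbb', NULL, 'ccc', 'ddd'),
    (2, 'aaa ccc', 'ddd', 'fff', 'ggg'),
    (3, 'bbb', NULL, NULL, NULL);
INSERT INTO table2 VALUES
    (1, 1, 'kkk lll'), (2, 1, 'kkk'), (2, 2, 'lll'), (3, 1, 'mmm');
""")

# Every (table, column) pair that should be searched.
SEARCHABLE = [
    ("table1", "t1_text_a"), ("table1", "t1_text_b"),
    ("table1", "t1_text_c"), ("table1", "t1_text_d"),
    ("table2", "t2_text"),
]


def search_all(keywords):
    """Return item_ids for which every keyword occurs in some column/row."""
    if not keywords:
        return []
    union = " UNION ALL ".join(
        f"SELECT item_id, {col} AS txt FROM {tbl}" for tbl, col in SEARCHABLE
    )
    # One HAVING term per keyword: it must match at least one stacked value.
    having = " AND ".join("SUM(txt LIKE ?) > 0" for _ in keywords)
    sql = (
        f"SELECT item_id FROM ({union}) AS t "
        f"GROUP BY item_id HAVING {having} ORDER BY item_id"
    )
    # Note: % and _ inside a keyword would act as wildcards here.
    params = [f"%{kw}%" for kw in keywords]
    return [row[0] for row in conn.execute(sql, params)]


print(search_all(["bbb", "mmm"]))
```

Changing the AND between the HAVING terms to OR gives the ANY form with the same query shape.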

How do I track changes and store calculated content in Nermalization?

I'm trying to create a table like this:
lives_with_owner_no  from  until  under_the_name
1                    1998  2002   1
3                    2002  NULL   1
2                    1997  NULL   2
3                    1850  NULL   3
3                    1999  NULL   4
2                    2002  2002   4
3                    2002  NULL   5
It's the Nermalization example, which I guess is pretty popular.
Anyway, I think I am just supposed to set up a dependency within MySQL for the from pending a change to the lives_with table or the cat_name table, and then set up a dependency between the until and from column. I figure the owner might want to come and update the cat's info, though, and override the 'from' column, so I have to use PHP? Is there any special way I should do the time stamp on the override (for example, $date = date("Y-m-d H:i:s");)? How do I set up the dependency within MySQL?
I also have a column that can be generated by adding other columns together. I guess using the cat example, it would look like:
combined_family_age family_name
75 Alley
230 Koneko
132 Furrdenand
1,004 Whiskers
Should I add via PHP and then input the values with a query, or should I use MySQL to manage the addition? Should I use a special engine for this, like MemoryAll?
I disagree with the nermalization example on two counts.
There is no cat entity in the end. Instead, there is a relation (cat_name_no, cat_name), which in your example has the immediate consequence that you can't tell how many cats named Lara exist. This is an anomaly that can easily be avoided.
The table crams two relations, lives_with_owner and under_the_name into one table. That's not a good idea, especially if the data is temporal, as it creates all kinds of nasty anomalies. Instead, you should use a table for each.
I would design this database as follows:
create table owner (id integer not null primary key, name varchar(255));
create table cat (id integer not null primary key, current_name varchar(255));
create table cat_lives_with (
cat_id integer references cat(id),
owner_id integer references owner(id),
valid_from date,
valid_to date);
create table cat_has_name (
cat_id integer references cat(id),
name varchar(255),
valid_from date,
valid_to date);
So you would have data like:
id | name
1 | Andrea
2 | Sarah
3 | Louise
id | current_name
1 | Ada
2 | Shelley
cat_id | owner_id | valid_from | valid_to
1 | 1 | 1998-02-15 | 2002-08-11
1 | 3 | 2002-08-12 | 9999-12-31
2 | 2 | 2002-01-08 | 2002-10-23
2 | 3 | 2002-10-24 | 9999-12-31
cat_id | name | valid_from | valid_to
1 | Ada | 1998-02-15 | 9999-12-31
2 | Shelley | 2002-01-08 | 2002-10-23
2 | Callisto | 2002-10-24 | 9999-12-31
I would use a finer-grained date type than just the year (in the nermalization example, having 2002-2002 as a range can really lead to messy query syntax), so that you can ask queries like select cat_id from cat_lives_with where '2000-06-02' between valid_from and valid_to.
As for the question of how to deal with temporal data in the general case: there's an excellent book on the subject, "Developing Time-Oriented Database Applications in SQL" by Richard Snodgrass, which I believe can even be legally downloaded as a free full-text PDF distributed by the author. Google will help you with that.
Your other question: you can compute combined_family_age either with a query when you need it or, if that column is needed often, with a view. You shouldn't manage the content manually, though; let the database calculate it for you.
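The date-range lookup described above can be sketched as follows, using Python's sqlite3 as a stand-in for MySQL. Dates are stored as ISO-8601 text, which sorts in chronological order; the function name `owner_on` is an assumption, and cat 2's first valid_to is taken to be 2002-10-23 so that the two periods do not overlap.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE owner (id INTEGER NOT NULL PRIMARY KEY, name TEXT);
CREATE TABLE cat_lives_with (
    cat_id     INTEGER,
    owner_id   INTEGER REFERENCES owner (id),
    valid_from TEXT,   -- ISO dates compare correctly as strings
    valid_to   TEXT    -- 9999-12-31 marks "still current"
);
INSERT INTO owner VALUES (1, 'Andrea'), (2, 'Sarah'), (3, 'Louise');
INSERT INTO cat_lives_with VALUES
    (1, 1, '1998-02-15', '2002-08-11'),
    (1, 3, '2002-08-12', '9999-12-31'),
    (2, 2, '2002-01-08', '2002-10-23'),   -- assumed end date, see lead-in
    (2, 3, '2002-10-24', '9999-12-31');
""")


def owner_on(cat_id, day):
    """Return the name of the owner a cat lived with on `day`, or None."""
    row = conn.execute(
        "SELECT o.name FROM cat_lives_with l "
        "JOIN owner o ON o.id = l.owner_id "
        "WHERE l.cat_id = ? AND ? BETWEEN l.valid_from AND l.valid_to",
        (cat_id, day),
    ).fetchone()
    return row[0] if row else None


print(owner_on(1, "2000-06-02"))
```

Using 9999-12-31 in place of NULL for open-ended periods keeps the BETWEEN test free of special cases.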
