I'm working on a food database; every food has a list of properties (fats, energy, vitamins, etc.).
These properties are spread across 50 different columns: proteins, fat, carbohydrates, vitamins, elements, etc. (there are a lot of them).
The number of columns could increase in the future, but not by much: 80 in the extreme case.
Each column needs its own reference to one bibliography entry from a list stored in another table (needed to check whether the value is reliable).
Each of these ids should contain a number, NULL, or 0 for one specific exceptional reference (which will point to another table).
I've thought of some solutions, but they are very different from each other, and I'm a rookie with databases, so I have no idea which one is best.
Consider value_1 as proteins, value_2 as carbohydrates, etc.
The best alternatives I came up with (I hope) are:
(1) Create one varchar(255?) column holding all 50 ids, something like this:
column energy (7.00)
column carbohydrates (89.95)
column fats (63.12)
column value_bil_ids (165862,14861,816486) ## as a varchar
etc...
In this case, I can split it on "," into an array and check the ids, but I'm still worried about how practical this is to code. It would save a lot of columns, but I don't know how practical it would be for scalability.
Mainly, I thought this option would be good for query optimization (I hope!).
(2) Simply use an additional id column for every value:
column energy (7.00)
column energy_bibl_id (165862)
column carbohydrates (89.95)
column carbohydrates_bibl_id (14861)
column fats (63.12)
column fats_bibl_id (816486)
etc...
It means a hefty number of columns, but it's much clearer than the first option, especially for the relation between each value column and its ID.
(3) Create a relational table between values and bibliographies:
table values
energy
carbohydrates
fats
value_id --> point to table values_and_bibliographies val_bib_id
table values_and_bibliographies
val_bib_id
energy_id --> point to table bibliographies biblio_id
carbohydrates_id --> point to table bibliographies biblio_id
fats_id --> point to table bibliographies biblio_id
table bibliographies
biblio_id
biblio_name
biblio_year
I don't know if these are the best solutions, and I would be grateful if someone could help shed some light on it!
You need to normalize that table. What you are doing is madness and will cause you to lose hair. They are called relational databases so you can do what you want without adding columns. You want to structure it so you add rows.
Please use real names and we can whip a schema out.
Edit: Good edit. #3 is getting close to a sane design, but you are still very unclear about what a bibliography is doing in a food schema! I think this is what you want: you can have a food and its components linked to a bibliography. I assume a bibliography is like a recipe?
FOODS
id name
1 broccoli
2 chicken
COMPONENTS
id name
1 carbs
2 fat
3 energy
BIBLIOGRAPHIES
id name year
1 chicken soup 1995
FOOD_COMPONENTS links foods to their components
id food_id component_id bib_id value
1 1 1 1 25 grams
2 1 2 1 13 ounces
So to get data you use a join.
SELECT * from FOOD_COMPONENTS fc
INNER JOIN COMPONENTS c on fc.component_id = c.id
INNER JOIN FOODS f on fc.food_id = f.id
INNER JOIN BIBLIOGRAPHIES b on fc.bib_id = b.id
WHERE
b.name = 'chicken soup'
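A minimal runnable sketch of this schema and join, using Python's built-in sqlite3 module as a stand-in for MySQL. Keeping the unit in its own unit column (instead of "25 grams" inside value) is an assumption added here so that value stays numeric; the rest follows the tables above.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FOODS (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE COMPONENTS (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE BIBLIOGRAPHIES (id INTEGER PRIMARY KEY, name TEXT NOT NULL, year INTEGER);
CREATE TABLE FOOD_COMPONENTS (
    id INTEGER PRIMARY KEY,
    food_id INTEGER NOT NULL REFERENCES FOODS(id),
    component_id INTEGER NOT NULL REFERENCES COMPONENTS(id),
    bib_id INTEGER REFERENCES BIBLIOGRAPHIES(id),  -- NULL when no source is known
    value REAL NOT NULL,                           -- numeric only
    unit TEXT NOT NULL                             -- assumption: unit kept in its own column
);
INSERT INTO FOODS VALUES (1, 'broccoli'), (2, 'chicken');
INSERT INTO COMPONENTS VALUES (1, 'carbs'), (2, 'fat'), (3, 'energy');
INSERT INTO BIBLIOGRAPHIES VALUES (1, 'chicken soup', 1995);
INSERT INTO FOOD_COMPONENTS VALUES (1, 1, 1, 1, 25, 'grams'), (2, 1, 2, 1, 13, 'ounces');
""")

# Every value backed by one bibliography, with the food and component names joined in.
rows = conn.execute("""
    SELECT f.name, c.name, fc.value, fc.unit
    FROM FOOD_COMPONENTS fc
    INNER JOIN COMPONENTS c ON fc.component_id = c.id
    INNER JOIN FOODS f ON fc.food_id = f.id
    INNER JOIN BIBLIOGRAPHIES b ON fc.bib_id = b.id
    WHERE b.name = 'chicken soup'
    ORDER BY fc.id
""").fetchall()

# A new nutrient is a new row, not a new column: no ALTER TABLE needed.
conn.execute("INSERT INTO COMPONENTS (name) VALUES ('vitamin C')")
component_count = conn.execute("SELECT COUNT(*) FROM COMPONENTS").fetchone()[0]
```

Going from 50 to 80 nutrients then means 30 more rows in COMPONENTS, with no schema change.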
You seriously need to consider redesigning your database structure; it isn't recommended to keep adding columns to a table when you want to store additional data that relates to it.
In a relational database you can relate tables to one another through the use of foreign keys. Since you want to store a bunch of values that relate to your data, create a new table (called values or whatever), and then use the id from your original table as a foreign key in your new table.
Such a design as you have proposed will make writing queries a major headache, not to mention the abundance of NULL values you will have in your table, assuming you don't need to fill every column.
Here's one approach you could take to allow you to add attributes all day long without changing your schema:
Table: Food - each row is a food you're describing
Id
Name
Description
...
Table: Attribute - each row is a numerical attribute that a food can have
Id
Name
MinValue
MaxValue
Unit (probably a 'repeating group', so should technically be in its own table)
Table: Bibliography - I don't know what this is, but you do
Id
...
Table: FoodAttribute - one record for each instance of a food having an attribute
Food
Attribute
Bibliography
Value
So you might have the following records
Food #1 = Cheeseburger
Attribute #1 = Fat (Unit = Grams)
Bibliography #1 = whatever relates to cheeseburgers and fat
Then, if a cheeseburger has 30 grams of fat, there would be an entry in the FoodAttribute table with 1 in the Food column, 1 in the Attribute column, a 1 in the Bibliography column, and 30 in the Value column.
(Note, you may need some other mechanisms to deal with non-numeric attributes.)
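Because each attribute is now a row, a screen that wants one column per attribute has to pivot the rows back. A minimal sketch using Python's sqlite3 as a stand-in for MySQL and conditional aggregation (MAX(CASE ...)). The 'Energy' attribute, its 'kcal' unit, the Bibliography Title column and the composite key on (Food, Attribute) are hypothetical, added only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Food (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Attribute (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL, Unit TEXT);
CREATE TABLE Bibliography (Id INTEGER PRIMARY KEY, Title TEXT);  -- Title is hypothetical
CREATE TABLE FoodAttribute (
    Food INTEGER NOT NULL REFERENCES Food(Id),
    Attribute INTEGER NOT NULL REFERENCES Attribute(Id),
    Bibliography INTEGER REFERENCES Bibliography(Id),
    Value REAL NOT NULL,
    PRIMARY KEY (Food, Attribute)  -- assumption: one value per food and attribute
);
INSERT INTO Food VALUES (1, 'Cheeseburger');
INSERT INTO Attribute VALUES (1, 'Fat', 'Grams'), (2, 'Energy', 'kcal');
INSERT INTO Bibliography VALUES (1, 'whatever relates to cheeseburgers and fat');
INSERT INTO FoodAttribute VALUES (1, 1, 1, 30);
""")

# Pivot rows back into one column per attribute; a missing value comes back as NULL (None).
wide = conn.execute("""
    SELECT f.Name,
           MAX(CASE WHEN a.Name = 'Fat' THEN fa.Value END) AS Fat,
           MAX(CASE WHEN a.Name = 'Energy' THEN fa.Value END) AS Energy
    FROM Food f
    LEFT JOIN FoodAttribute fa ON fa.Food = f.Id
    LEFT JOIN Attribute a ON a.Id = fa.Attribute
    GROUP BY f.Id, f.Name
""").fetchall()
```

The pivot query names the attributes it wants explicitly, so it changes when you want a new column in the output, but the tables themselves never do.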
Read about Data Modeling and Database Normalization for more info on how to approach these types of problems...
Appending more columns to a table is neither recommended nor popular in the DB world, except with a NoSQL system.
Elaborate your intentions please :)
Why, for the love of $deity, are you doing this by columns? That way lies madness!
Decompose this table into rows: each value that is now a column becomes a row of its own. Without knowing more about what this is for and why it is like it is, it's hard to say more.
I re-read your question a number of times and I believe you are in fact attempting a relational schema and your concern is with the number of columns (you mention possibly 80) associated with a table. I assure you that 80 columns on a table is fine from a computational perspective. Your database can handle it. From a coding perspective, it may be high.
Proposed (1) Will fail when you want to add a column. You're effectively storing all your columns in a comma delimited single column. Bad.
I don't understand (2). It sounds the same as (3)
(3) is correct in spirit, but your example is muddled and unclear. Whittle your problem down to a simple case with five columns or so, and edit your question or post again.
In short, don't worry about number of columns right now. Low on the priority list.
If you have no need to form queries based on arbitrary key/value pairs you'd like to add to every record, you could in a pinch serialize()/unserialize() an associative array and put that into a single field
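PHP's serialize()/unserialize() isn't available outside PHP, so this sketch swaps in Python's json.dumps()/json.loads() to show the same idea: the whole associative array goes into one text field and comes back out in application code. The column name extra and the sample keys are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foods (id INTEGER PRIMARY KEY, name TEXT, extra TEXT)")

# json.dumps plays the role of PHP's serialize(): the whole dict becomes one string.
extra = {"energy": 7.0, "fats": 63.12, "energy_bibl_id": 165862}
conn.execute("INSERT INTO foods (name, extra) VALUES (?, ?)", ("example", json.dumps(extra)))

# Reading it back means decoding in application code, like unserialize().
(stored,) = conn.execute("SELECT extra FROM foods WHERE name = 'example'").fetchone()
restored = json.loads(stored)
```

The database only sees a string here, so a question like "which foods cite bibliography 165862?" means decoding every row in your own code, which is why this is only a fallback.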
Related
I'm making a MySQL database that has one table for each student in a school, and each table holds that student's timetable. I need to be able to run a script that will search every table in the database and every column for 2 values. For example, it needs to search all tables and columns for teacher "x" where day_week = MondayA. Each table has 11 columns in total: one for day_week, then 5 for the lesson in each period (period 1 lesson, period 2 lesson, etc.), then another 5 for the teacher they have in each period.
Any help would be much appreciated.
Thanks.
Fix your schema
First of all, your schema sounds very bad. Every time you add a new student, you have to change it (add a new table), and if this were for a real school, that would be an absolute disaster! Changing the schema is more expensive than simply inserting a row into a table, and if your web application can directly change the database, then any security exploits that might be exposed could potentially lead to people messing with your tables without you realizing it.
On top of that, it makes querying, say, the number of students an absolute pain. Ideally, your data should be laid out in a way that lets you answer any and all questions you might ever have for it. Not just questions you have now, but further down the road.
And if that's not bad enough, it makes querying a nightmare. You have to keep track of the number of tables somehow, and their names, so that every time you query information it's running an entirely different query. Some queries, like 'List students that joined in the last year', grow in size, complexity, and time to run as the list of students (the number of tables) grows. This may be what you're running into already, though it's hard to tell simply from your question.
Normalization
Normalization is, put simply, 'designing the schema well'. It's a bit of a vague topic, but it's broken down into several levels, each of which builds on the previous one.
To be perfectly honest, I don't understand the wording of the different levels, and I'm a little bit of a newb at databases myself, but here is the gist of normalization, from what I've been taught:
Every value means one, small, simple thing
Basically, don't go crazy and put a bunch of stuff in a single column. It's bad design to have a column like, 'Categories', and the value be a long string that reads like, "Programming, Databases, Web Development, MySQL, Cows".
First of all, parsing strings is time consuming, especially the longer they are, and second of all, if those categories are associated with anything else - like, perhaps you have a table of categories for people to choose from - then now you're checking larger strings for the contents of smaller strings. If you want to pull up every item of a certain category, you will be matching that string against the ENTIRE database... Which can be excruciatingly slow.
I'm not sure if this is part of normalization, but what I've learned to do is to make a numeric 'ID' for everything I refer to in more than one table. For example, instead of a database table that has the columns 'Name', 'Address', 'Birthday', I'll have, 'ID', 'Name', 'Address', 'Birthday'. ID would be a unique number for every row, a primary key, and if at any time I wanted to refer to ANY of the people in it, I'd just use that number.
Numbers are much quicker to compare and match, much quicker to look up, and overall much nicer for the database to deal with; they typically let your queries run in a fraction of the time a string-based design would need.
To complete the example, you could have three tables; say, 'Articles', 'Categories', and 'Article_Categories'.
'Articles' would hold all the actual articles and their properties. Something like, 'ID', 'Title', 'Content'.
'Categories' would hold all of the individual categories available, with 'ID' and 'Category' fields.
'Article_Categories' would hold the combinations of articles to categories; a unique combination of 'Article_ID' and 'Category_ID'.
What this might look like:
Articles
1, 'Web Cow Geniuses', 'Cows have been shown to know how to create great databases for websites using MySQL.';
2, 'Why to use MySQL', "It's free, duh!";
Categories
1, Cows;
2, Databases;
3, MySQL;
4, Programming;
5, Web Development;
Article_Categories
1, 1;
1, 2;
1, 3;
1, 4;
1, 5;
2, 2;
2, 3;
Notice that each combination in 'Article_Categories' is unique; you never see, for example, '1, 3' twice. But '1' is in the first column multiple times, and '3' is in the second column multiple times.
This is called a 'many to many' table. You use it when you have a relationship between two data sets, where there are multiple combinations for mixing them. Essentially, where any number of items in one can correspond to any number of items from the other.
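A minimal sketch of the three tables above in Python's sqlite3 (standing in for MySQL), with the same sample rows. The composite primary key on Article_Categories is what stops a pair like 1, 3 from being stored twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Articles (ID INTEGER PRIMARY KEY, Title TEXT, Content TEXT);
CREATE TABLE Categories (ID INTEGER PRIMARY KEY, Category TEXT UNIQUE);
CREATE TABLE Article_Categories (
    Article_ID INTEGER REFERENCES Articles(ID),
    Category_ID INTEGER REFERENCES Categories(ID),
    PRIMARY KEY (Article_ID, Category_ID)  -- each pair may appear only once
);
INSERT INTO Articles VALUES
    (1, 'Web Cow Geniuses', 'Cows have been shown to know how to create great databases for websites using MySQL.'),
    (2, 'Why to use MySQL', 'It''s free, duh!');
INSERT INTO Categories VALUES
    (1, 'Cows'), (2, 'Databases'), (3, 'MySQL'), (4, 'Programming'), (5, 'Web Development');
INSERT INTO Article_Categories VALUES (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3);
""")

def articles_in(category):
    """Titles of all articles in one category: two joins on numeric IDs, no string parsing."""
    rows = conn.execute("""
        SELECT a.Title FROM Articles a
        JOIN Article_Categories ac ON ac.Article_ID = a.ID
        JOIN Categories c ON c.ID = ac.Category_ID
        WHERE c.Category = ?
        ORDER BY a.ID""", (category,))
    return [title for (title,) in rows]
```

For example, articles_in("MySQL") returns both titles, while articles_in("Cows") returns only 'Web Cow Geniuses'.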
Do not mix data and metadata
Basically, data is the content of the tables. The values inside the rows. Metadata is the tables themselves; the table names, the value types, and the relationships between two different sets of data.
Metadata inside data
Here's an example of putting metadata inside data:
A 'People' table that has, as columns, 'isStudent' and 'isTeacher'.
When data is put in 'People', you might have a row where they are both a teacher and a student, so you put something like 'ID', 'Name', 'yes', 'yes'. This doesn't sound bad, and there may well be a teacher who's taking classes at the same school so it is possible.
However, it takes up more space since you have to have a value of some sort in both columns, even if they are only one or the other.
A better way to make this would be to split it out into three separate tables:
A 'People' table that has an ID, name, and other data that every person has.
A 'Students' table that uses only the values of the 'People.ID' as data.
A 'Teachers' table that uses only the values of the 'People.ID' as data.
This way, everybody who is a student gets referenced to in 'Students', and everyone who's a teacher gets referenced in 'Teachers'. As mentioned previously, we use the 'ID' field because it's quicker to match up across tables. Now, there are only as many Teachers referenced as there need to be, and the same goes for Students. This initially takes up more space due to the size overhead of having them as separate tables, but as the database grows, this is more than made up for.
This also allows you to reference teachers directly. Say you have a table of 'Classes', and you only want Teachers capable of being the, well, Teacher. Your 'Classes' table, in the 'Teachers' column, can have a foreign key to 'Teachers.ID'. That way, if a Student hacks the database and tries to put themselves as teaching a class somehow, it's impossible for them to do so.
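A minimal sketch of that guarantee in Python's sqlite3, standing in for MySQL. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON. The people 'Ms Smith' and 'Bob', the class names, and the helper assign() are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE People (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Teachers (People_ID INTEGER PRIMARY KEY REFERENCES People(ID));
CREATE TABLE Students (People_ID INTEGER PRIMARY KEY REFERENCES People(ID));
CREATE TABLE Classes (
    ID INTEGER PRIMARY KEY,
    Name TEXT NOT NULL UNIQUE,
    Teacher_ID INTEGER NOT NULL REFERENCES Teachers(People_ID)
);
INSERT INTO People VALUES (1, 'Ms Smith'), (2, 'Bob');
INSERT INTO Teachers VALUES (1);
INSERT INTO Students VALUES (2);
""")

def assign(class_name, person_id):
    """Hypothetical helper: try to create a class taught by person_id; True if the database accepts it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Classes (Name, Teacher_ID) VALUES (?, ?)",
                         (class_name, person_id))
        return True
    except sqlite3.IntegrityError:  # FOREIGN KEY constraint failed
        return False

ok_teacher = assign("Maths", 1)        # person 1 is in Teachers
ok_student = assign("Hacking 101", 2)  # person 2 is only in Students
```

The rejection comes from the database itself, not from application code, so it holds no matter which program writes to the table.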
Data inside metadata
This is quite similar to what you appear to be having problems with.
Data is, essentially, what it is we are trying to store. Student names, teacher names, schedules for both, etc. However, sometimes we put data - like a student's name - inside of metadata - like the name of a table.
Whenever you see yourself regularly adding onto or changing the schema of a database, it is a HUGE sign that you are putting data inside of metadata. In your case, every student having their own table is essentially putting their name in the metadata.
Now, there are times when you kind of want to do this, when the number of tables will not change THAT often. It can make things simpler. For example, if you have a website selling underwear, you might have both 'Mens_Products' and 'Womens_Products' tables. Obviously the 'neater' solution would be to have a 'Product_Categories' table, in case you want to add transgender products or sell products for both genders, but in this case it doesn't matter that much. It wouldn't be hard to add a 'Trans_Products' table, and it's not like you'd be adding new tables frequently.
Do not duplicate data
At first, this'll sound like I'm contradicting EVERYTHING I've just said. "How am I supposed to copy those IDs everywhere if I'm not supposed to duplicate data?!" But alas, that's not exactly what I mean. In fact, this is another reason for having a separate ID for each item you might refer to!
Essentially, you don't want to have to update more data than you need to. If, for example, you had a 'Birthday' column in your 'Students' and your 'Teachers' tables in the above example, and you had someone who was both a Student and a Teacher, suddenly their birthday is recorded in two different spots! Now, what if the birthday was wrong, and you wanted to change it? You'd have to change it twice!
So instead, you put it in your 'People' table. That way, for each person, it only exists once.
This might seem like an obvious example, but you'd be surprised at how often it can occur by accident. Just be careful, and watch for anything that requires you to update the same value in two different locations.
Queries
So, with all that out of the way, how should you query? What sort of SELECT statement should you use?
Let's say you have the following schema (primary keys marked PK):
People:
ID (PK)
Name (Unique)
Birthday
Teachers:
People_ID (PK; Foreign: People.ID)
Students:
People_ID (PK; Foreign: People.ID)
Classes:
ID (PK)
Name (Unique)
Teacher_ID (Foreign: Teachers.People_ID)
Class_Times:
Class_ID (PK; Foreign: Classes.ID)
Day (PK; Enum: 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday')
Start_Time
Student_Classes:
Student_ID (PK; Foreign: Students.People_ID)
Class_ID (PK; Foreign: Classes.ID)
First, note that 'Student_Classes' has a two-column (composite) primary key. This makes the combination of the two unique, not the individual columns, which makes it a many-to-many table, as discussed earlier. I did the same for 'Class_ID' and 'Day' in 'Class_Times' so that you can't put the same class on the same day twice.
Also, it may be bad that we use an Enum for the days of the week... If we wanted to add Sunday classes, we'd have to change it, which is a change in the schema, which could potentially break things. However, I didn't feel like adding a 'Days' table and all that.
At any rate, if you wanted to find all of the teachers who were teaching on a Monday, you could just do this:
SELECT DISTINCT
People.Name
FROM
People
INNER JOIN
Teachers
ON
People.ID = Teachers.People_ID
INNER JOIN
Classes
ON
Teachers.People_ID = Classes.Teacher_ID
INNER JOIN
Class_Times
ON
Classes.ID = Class_Times.Class_ID
WHERE
Class_Times.Day = 'Monday';
Or, formatted in one big long string (like it'll be when you put it in your other programming language):
SELECT DISTINCT People.Name FROM People INNER JOIN Teachers ON People.ID = Teachers.People_ID INNER JOIN Classes ON Teachers.People_ID = Classes.Teacher_ID INNER JOIN Class_Times ON Classes.ID = Class_Times.Class_ID WHERE Class_Times.Day = 'Monday';
Essentially, here is what we do:
Select the main thing we want, the teacher's name. The name is stored in the 'People' table, so we select from that first. DISTINCT makes sure a teacher with several Monday classes is listed only once.
We then inner join it to the 'Teachers' table, which keeps only the People who are Teachers.
After that, we do the same with 'Classes', narrowing it down to only the Classes that each Teacher actually teaches.
Then we also grab 'Class_Times' (important for the final step), but only for those Classes that the Teacher is teaching.
Finally, we specify that the Day the Class takes place must be a 'Monday'.
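A runnable sketch of this schema and query in Python's sqlite3, standing in for MySQL. SQLite has no ENUM, so a CHECK constraint takes its place here. The query uses INNER JOINs (so only people who are teachers with classes survive) and DISTINCT (so a teacher with two Monday classes appears once). The teachers, classes and times are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE, Birthday TEXT);
CREATE TABLE Teachers (People_ID INTEGER PRIMARY KEY REFERENCES People(ID));
CREATE TABLE Classes (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE,
                      Teacher_ID INTEGER NOT NULL REFERENCES Teachers(People_ID));
-- SQLite has no ENUM, so a CHECK constraint stands in for it.
CREATE TABLE Class_Times (
    Class_ID INTEGER REFERENCES Classes(ID),
    Day TEXT CHECK (Day IN ('Monday','Tuesday','Wednesday','Thursday','Friday','Saturday')),
    Start_Time TEXT,
    PRIMARY KEY (Class_ID, Day)
);
INSERT INTO People (ID, Name) VALUES (1, 'Ms Smith'), (2, 'Mr Jones');
INSERT INTO Teachers VALUES (1), (2);
INSERT INTO Classes VALUES (1, 'Maths', 1), (2, 'Physics', 1), (3, 'Art', 2);
INSERT INTO Class_Times VALUES
    (1, 'Monday', '09:00'), (1, 'Wednesday', '09:00'),
    (2, 'Monday', '11:00'), (3, 'Tuesday', '10:00');
""")

def teachers_on(day):
    """Names of teachers with at least one class on the given day."""
    rows = conn.execute("""
        SELECT DISTINCT People.Name
        FROM People
        INNER JOIN Teachers ON People.ID = Teachers.People_ID
        INNER JOIN Classes ON Teachers.People_ID = Classes.Teacher_ID
        INNER JOIN Class_Times ON Classes.ID = Class_Times.Class_ID
        WHERE Class_Times.Day = ?
        ORDER BY People.Name""", (day,))
    return [name for (name,) in rows]
```

Here teachers_on("Monday") lists Ms Smith once even though she teaches two Monday classes.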
First, it's worth noting this is probably not the best approach. A table per student sounds like a bad idea. You are going to be generating massive numbers of dynamic queries and won't be able to leverage indexing, so performance will suffer. I would highly recommend finding a way to merge the per-student tables into one table, with the time series in a join table. Or look at a NoSQL (non-relational) approach; a document database seems like it might be a fit here.
That said, to answer your question: You need to query the schema (information_schema tables) for lists of tables and columns and then loop through querying the tables.
Start with the MySQL documentation on information_schema.
You need to create one table for students and one for the timetable, with a foreign key to the student in the timetable table. Use best practices: if you have 1000 students, you will end up creating 1000 tables, when the database is there to make life easier. Create one table and add as many entries as you want.
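A sketch of that "read the catalog, then loop" approach. It runs on SQLite, whose catalog is sqlite_master plus PRAGMA table_info rather than MySQL's information_schema.tables and information_schema.columns, so the catalog queries here are SQLite-specific substitutes. The student_alice/student_bob tables (trimmed to two periods) and the helper find_teacher() are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-student tables in the question's layout, trimmed to 2 periods.
for student in ("alice", "bob"):
    conn.execute(f"""CREATE TABLE student_{student} (
        day_week TEXT, period1_lesson TEXT, period2_lesson TEXT,
        period1_teacher TEXT, period2_teacher TEXT)""")
conn.execute("INSERT INTO student_alice VALUES ('MondayA', 'Maths', 'Art', 'x', 'y')")
conn.execute("INSERT INTO student_bob VALUES ('MondayA', 'Art', 'PE', 'y', 'z')")

def find_teacher(teacher, day):
    """Walk the catalog and run one generated query per table; returns matching table names."""
    hits = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Row layout of PRAGMA table_info: (cid, name, type, notnull, dflt_value, pk)
        cols = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
        teacher_cols = [c for c in cols if c.endswith("_teacher")]
        if "day_week" not in cols or not teacher_cols:
            continue
        # Identifiers come from the catalog and are quoted; values are bound as parameters.
        cond = " OR ".join(f'"{c}" = ?' for c in teacher_cols)
        sql = f'SELECT COUNT(*) FROM "{table}" WHERE day_week = ? AND ({cond})'
        (n,) = conn.execute(sql, (day, *([teacher] * len(teacher_cols)))).fetchone()
        if n:
            hits.append(table)
    return hits
```

Every new student adds another query to the loop, which is the performance problem the answer warns about.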
Secondly, ask your question more clearly using this structure so we may be able to help you
Table 1: Student:
id (PK) firstName lastName
Table 2: Schedule:
studentID day period classID
studentID (relates to Student.id)
classID (relates to Classes.id)
Table 3: Classes:
id (PK) className teacherName
(PK) marks the primary key.
This will gather all students that have that teacher:
Select S1.firstName, S1.lastName, C.teacherName from Student as S1 join Schedule as S2 on S1.id = S2.studentID join Classes as C on S2.classID = C.id where C.teacherName = 'XXXX'
This will gather all students that are in a certain class:
Select S1.firstName, S1.lastName from Student as S1 join Schedule as S2 on S1.id = S2.studentID where S2.classID = XXXX
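With this layout, the original question ("teacher x on MondayA") becomes one query instead of a search across every table. A minimal sketch in Python's sqlite3 standing in for MySQL; the primary key on (studentID, day, period), the sample students and the helper students_with() are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
CREATE TABLE Classes (id INTEGER PRIMARY KEY, className TEXT, teacherName TEXT);
CREATE TABLE Schedule (
    studentID INTEGER REFERENCES Student(id),
    day TEXT,
    period INTEGER,
    classID INTEGER REFERENCES Classes(id),
    PRIMARY KEY (studentID, day, period)  -- assumption: one class per student per slot
);
INSERT INTO Student VALUES (1, 'Alice', 'Ng'), (2, 'Bob', 'Lee');
INSERT INTO Classes VALUES (1, 'Maths', 'X'), (2, 'Art', 'Y');
INSERT INTO Schedule VALUES
    (1, 'MondayA', 1, 1), (1, 'MondayA', 2, 2),
    (2, 'MondayA', 1, 2), (2, 'TuesdayA', 3, 1);
""")

def students_with(teacher, day):
    """(firstName, lastName, period) for every slot where the teacher takes the student that day."""
    return conn.execute("""
        SELECT S1.firstName, S1.lastName, S2.period
        FROM Student AS S1
        JOIN Schedule AS S2 ON S1.id = S2.studentID
        JOIN Classes AS C ON S2.classID = C.id
        WHERE C.teacherName = ? AND S2.day = ?
        ORDER BY S1.id, S2.period""", (teacher, day)).fetchall()
```

Adding a student is now an INSERT into Student and Schedule, and this query does not change.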
I had a hard time summing up my question. Basically:
There's a table called "files". Files has a column called "grades". It is used to identify the grade levels a file might be useful for. Because a file can be useful for more than one grade level, I store things like this:
If it's only good for 3rd grade
grades: 3
If it's good for 3rd, 4th and 5th:
grades: 3,4,5
etc etc.
When putting together a SQL query to retrieve these files, I ran into a weird issue. Basically, a user can say "I only want things that are good for 2nd and 3rd grade". So I should look for files that have "2,3" in the grades column. Easy! BUT!
It could also have "1,2,3" or "2,3,4" or "2,4".
I'm getting a headache just thinking about it. It's easy enough to split those entries on the commas to get "1" and "2", but what's the most efficient way to match a SQL record to the query? It seems like a waste to fetch EVERY RECORD in the DB, parse them, and then match them up again.
Is it better to go back to square one and create a DB called "files" with individual tables for each grade? That also seems like a waste: writing multiple records for one file.
What's the solution here? I'm a little flummoxed.
Several options here:
1) Store the grades as an integer where each grade corresponds to a bit: grade 1 = bit 0, grade 2 = bit 1, grade 3 = bit 2, and so on. Then grades 1,2,3 correspond to binary 0111 (7) and grades 2,4 to binary 1010 (10), etc. Querying becomes a simple bitwise AND: if you want all rows where grades 2 and 4 are both selected (and possibly others), use select * from files where (grades & 10) = 10; if either grade is enough, use where (grades & 10) <> 0.
2) If there are only relatively few grades, you could store each as a boolean column.
3) Store the grades in a separate table, and the relationship between grades and files in a 3rd join table (since it is a many-to-many relationship).
To elaborate on what @emh said: the best option IMHO would be a grades table that connects to the files table on the file id (#3). You then store each connection between a grade and a file as a new row (if that connection doesn't already exist).
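Option 1 can be sketched in Python's sqlite3 (standing in for MySQL; both support the & operator on integers). Grade g lives in bit g - 1; "all of these grades" is (grades & mask) = mask and "any of them" is (grades & mask) <> 0. The file names and the helper functions are hypothetical.

```python
import sqlite3

def grade_mask(*grades):
    """Grade g is stored in bit g - 1: grades 1,2,3 -> 0b111 = 7, grades 2,4 -> 0b1010 = 10."""
    mask = 0
    for g in grades:
        mask |= 1 << (g - 1)
    return mask

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT, grades INTEGER)")
conn.executemany("INSERT INTO files (name, grades) VALUES (?, ?)", [
    ("a", grade_mask(3)), ("b", grade_mask(1, 2, 3)),
    ("c", grade_mask(2, 3, 4)), ("d", grade_mask(2, 4)),
])

def files_with_all(*grades):
    """Files flagged for every requested grade (possibly others too)."""
    m = grade_mask(*grades)
    sql = "SELECT name FROM files WHERE (grades & ?) = ? ORDER BY id"
    return [n for (n,) in conn.execute(sql, (m, m))]

def files_with_any(*grades):
    """Files flagged for at least one requested grade."""
    m = grade_mask(*grades)
    sql = "SELECT name FROM files WHERE (grades & ?) <> 0 ORDER BY id"
    return [n for (n,) in conn.execute(sql, (m,))]
```

The trade-off: a condition on grades & mask generally cannot use an ordinary index, so each query checks every row. That is cheap arithmetic compared with string parsing, but still a full scan.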
tbl_file_grades
-----------
file_id
grade
When you're doing the search, you can join the two tables and filter the search by the grade column.
SELECT DISTINCT files.file_info FROM files
INNER JOIN tbl_file_grades ON files.file_id = tbl_file_grades.file_id
WHERE tbl_file_grades.grade IN (1, 2, ...)
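A single tbl_file_grades row holds exactly one grade, so a condition like grade = 1 AND grade = 2 can never match. This sketch, in Python's sqlite3 standing in for MySQL, shows the two usual forms instead: IN for "good for any of these grades", and GROUP BY with HAVING COUNT(DISTINCT grade) for "good for all of them". The file descriptions and helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (file_id INTEGER PRIMARY KEY, file_info TEXT);
CREATE TABLE tbl_file_grades (
    file_id INTEGER REFERENCES files(file_id),
    grade INTEGER,
    PRIMARY KEY (file_id, grade)  -- a file/grade pair is stored once
);
INSERT INTO files VALUES (1, 'only 3rd'), (2, '1st-3rd'), (3, '2nd-4th'), (4, '2nd and 4th');
INSERT INTO tbl_file_grades VALUES
    (1, 3), (2, 1), (2, 2), (2, 3), (3, 2), (3, 3), (3, 4), (4, 2), (4, 4);
""")

def files_for_any(grades):
    """Files linked to at least one of the grades."""
    marks = ",".join("?" * len(grades))
    sql = f"""SELECT DISTINCT files.file_id, files.file_info FROM files
              INNER JOIN tbl_file_grades ON files.file_id = tbl_file_grades.file_id
              WHERE tbl_file_grades.grade IN ({marks})
              ORDER BY files.file_id"""
    return [info for (_, info) in conn.execute(sql, list(grades))]

def files_for_all(grades):
    """Files linked to every one of the grades."""
    marks = ",".join("?" * len(grades))
    sql = f"""SELECT files.file_info FROM files
              INNER JOIN tbl_file_grades ON files.file_id = tbl_file_grades.file_id
              WHERE tbl_file_grades.grade IN ({marks})
              GROUP BY files.file_id, files.file_info
              HAVING COUNT(DISTINCT tbl_file_grades.grade) = ?
              ORDER BY files.file_id"""
    return [info for (info,) in conn.execute(sql, [*grades, len(set(grades))])]
```

Both queries filter on an indexed grade column (part of the primary key), with no string parsing.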
I'm not sure whether the extra table for grades is necessary. That would depend on your needs. It seems like if you're happy without it now, then it isn't all that important to have.
And also, most important, welcome to SO.
I have a table which contains information about a certain month, and one column in each row holds MySQL row ids from another table, used to grab multiple pieces of information.
Is there a more efficient way to get the information than exploding the ids and doing separate SQL queries on each? Here is an example:
Row ID | Name | Other Sources
1 Test 1,2,7
the Other Sources has the id's of the rows from the other table which are like so
Row ID | Name | Information | Link
1 John | No info yet? | http://blah.com
2 Liam | No info yet? | http://blah.com
7 Steve| No info yet? | http://blah.com
and overall the information returned would be like the below:
Hi this page is called test... here is a list of our sources
- John (No info yet?) find it here at http://blah.com
- Liam (No info yet?) find it here at http://blah.com
- Steve (No info yet?) find it here at http://blah.com
Right now I would explode the Other Sources by "," and then do a separate SQL query for each. I'm sure there is a better way?
Looks like a classic many-to-many relationship. You have pages and sources - each page can have many sources and each source could be the source for many pages?
Fortunately this is very much a solved problem in relational database design. You would use a 3rd table to relate the two together:
Pages (PageID, Name)
Sources (SourceID, Name, Information, Link)
PageSources (PageID, SourceID)
The key for the "PageSources" table would be both PageID and SourceID.
Then, to get all the sources for a page, for example, you would use this SQL:
SELECT s.*
FROM Sources s INNER JOIN PageSources ps ON s.SourceID = ps.SourceID
AND ps.PageID = 1;
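A minimal sketch, in Python's sqlite3 standing in for MySQL, that builds the page text from the question with the one join above instead of one query per exploded id. The rows mirror the example data; the helper render_page() is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Pages (PageID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Sources (SourceID INTEGER PRIMARY KEY, Name TEXT, Information TEXT, Link TEXT);
CREATE TABLE PageSources (
    PageID INTEGER REFERENCES Pages(PageID),
    SourceID INTEGER REFERENCES Sources(SourceID),
    PRIMARY KEY (PageID, SourceID)  -- the key is both columns together
);
INSERT INTO Pages VALUES (1, 'Test');
INSERT INTO Sources VALUES
    (1, 'John', 'No info yet?', 'http://blah.com'),
    (2, 'Liam', 'No info yet?', 'http://blah.com'),
    (7, 'Steve', 'No info yet?', 'http://blah.com');
INSERT INTO PageSources VALUES (1, 1), (1, 2), (1, 7);
""")

def render_page(page_id):
    """Build the page text with one join instead of one query per source id."""
    (name,) = conn.execute("SELECT Name FROM Pages WHERE PageID = ?", (page_id,)).fetchone()
    lines = [f"Hi this page is called {name}... here is a list of our sources"]
    rows = conn.execute("""
        SELECT s.Name, s.Information, s.Link
        FROM Sources s INNER JOIN PageSources ps ON s.SourceID = ps.SourceID
        AND ps.PageID = ?
        ORDER BY s.SourceID""", (page_id,))
    lines += [f"- {n} ({info}) find it here at {link}" for n, info, link in rows]
    return "\n".join(lines)
```

render_page(1) returns the header line followed by the John, Liam and Steve lines.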
Not easily with your table structure. If you had another table like:
ID Source
1 1
1 2
1 7
Then join is your friend. With things the way they are, you'll have to do some nasty splitting on comma-separated values in the "Other Sources" field.
Maybe I'm missing something obvious (been known to), but why are you using a single field in your first table with a comma-delimited set of values rather than a simple join table? If you do that, the solution is trivial.
The problem with these tables is that a multi-valued column doesn't work well with SQL. Tables in this format are not normalized, as multi-valued columns are forbidden in First Normal Form and above.
First Normal Form means...
There's no top-to-bottom ordering to the rows.
There's no left-to-right ordering to the columns.
There are no duplicate rows.
Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden timestamps].
—Chris Date, "What First Normal Form Really Means", pp. 127-8[4]
Anyway, the best way to do it is to have a many to many relationship. This is done by putting a third table in the middle, like Dominic Rodger does in his answer.
Here is scenario 1.
I have a table called "items"; the table has 2 columns, i.e. item_id and item_name.
I store my data in this way:
item_id | item_name
Ss001 | Shirt1
Sb002 | Shirt2
Tb001 | TShirt1
Tm002 | TShirt2
... etc. The ids are built this way:
the first letter is the code for the clothing type, i.e. S for shirt, T for T-shirt
the second letter is the size, i.e. s for small, m for medium and b for big
Let's say my items table has 10,000 items. I want fast retrieval; say I want to find a particular shirt. Can I use:
Method1:
SELECT * FROM items WHERE item_id LIKE 'Sb99';
or should I do it like:
Method2:
SELECT * FROM items WHERE item_id LIKE 'S%';
then store the result, execute a second search for the size, then a third search for the id, like the hash table concept.
What I want to achieve is, instead of searching all the data, to narrow the search by the clothing code first, followed by the size code and then the id code. Which one is faster in MySQL, and which one is better in the long run? I want to reduce traffic and not hit the database so often.
Thanks guys for solving my first scenario. But another scenario has come up:
Scenario 2:
I am using PHP and MySQL. Continuing from the previous story, suppose my users table structure is like this:
user_id | username | items_collected
U0001 | Alex | Ss001;Tm002
U0002 | Daniel | Tb001;Sb002
U0003 | Michael | ...
U0004 | Thomas | ...
I store items_collected in id form because each user may one day collect up to hundreds of items; if I stored them as strings, i.e. Shirt1, pants2, ..., it would require a lot of database space (imagine 1000 users and some very long item names).
Would it be easier to maintain if I store in id form?
And let's say I want to display the image, and the image's name is the item's name + ".jpg". How do I do that? Is it something like this:
$result = Select items_collected from users where userid= $userid
Using php explode:
$itemsCollected = explode(";", $result);
After that, match each item against the items table, so it would look like:
shirt1, pants2 etc
Then loop over each value and add ".jpg" to display the image?
The first method will be faster - but IMO it's not the right way of doing it. I'm in agreement with tehvan about that.
I'd recommend keeping the item_id as is, but add two extra fields one for the code and one for the size, then you can do:
select * from items where item_code = 'S' and item_size = 'm'
With indexes the performance will be greatly increased, and you'll be able to easily match a range of sizes, or codes.
select * from items where item_code = 'S' and item_size IN ('m','s')
Migrate the db as follows:
alter table items add column item_code varchar(1) default '';
alter table items add column item_size varchar(1) default '';
update items set item_code = SUBSTRING(item_id, 1, 1);
update items set item_size = SUBSTRING(item_id, 2, 1);
The changes to the code should be equally simple to add. The long term benefit will be worth the effort.
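A sketch of this migration, plus an index on the new columns, in Python's sqlite3 standing in for MySQL. SQLite's substr() plays the role of MySQL's SUBSTRING(). The composite index name and the helper find() are hypothetical, and the sample rows come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id TEXT PRIMARY KEY, item_name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [
    ("Ss001", "Shirt1"), ("Sb002", "Shirt2"), ("Tb001", "TShirt1"), ("Tm002", "TShirt2")])

# The migration from the answer; substr() is SQLite's equivalent of MySQL's SUBSTRING().
conn.executescript("""
ALTER TABLE items ADD COLUMN item_code TEXT DEFAULT '';
ALTER TABLE items ADD COLUMN item_size TEXT DEFAULT '';
UPDATE items SET item_code = substr(item_id, 1, 1);
UPDATE items SET item_size = substr(item_id, 2, 1);
CREATE INDEX ix_items_code_size ON items (item_code, item_size);  -- hypothetical index name
""")

def find(code, sizes):
    """Item ids of one clothing type in any of the given sizes."""
    marks = ",".join("?" * len(sizes))
    sql = f"SELECT item_id FROM items WHERE item_code = ? AND item_size IN ({marks}) ORDER BY item_id"
    return [r[0] for r in conn.execute(sql, [code, *sizes])]
```

For example, find("S", ["m", "s"]) returns only Ss001, since Sb002 is size b.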
For scenario 2, that is not an efficient way of storing and retrieving data from a database. When used this way, the database is only acting as a storage engine: by encoding multiple values into one field, you are preventing the relational part of the database from being useful.
What you should do in that circumstance is to have another table, call it 'items_collected'. The schema would be along the lines of
CREATE TABLE items_collected (
id int(11) NOT NULL AUTO_INCREMENT PRIMARY KEY,
user_id varchar(10) NOT NULL,
item_id varchar(10) NOT NULL,
FOREIGN KEY (`user_id`) REFERENCES `users`(`user_id`),
FOREIGN KEY (`item_id`) REFERENCES `items`(`item_id`)
);
The foreign keys ensure referential integrity, and it's essential to have referential integrity.
Then for the example you give, you would have multiple records:
user_id | item_id
U0001 | Ss001
U0001 | Tm002
U0002 | Sb002
U0002 | Tb001
U0003 | ...
U0004 | ...
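A sketch of the question's image example with an items_collected table, in Python's sqlite3 standing in for MySQL/PHP. One join replaces the explode-and-loop, and the reverse question ("who collected Tb001?") is just as easy. The helper names image_files() and collectors() are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce the REFERENCES clauses
conn.executescript("""
CREATE TABLE users (user_id TEXT PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE items (item_id TEXT PRIMARY KEY, item_name TEXT NOT NULL);
CREATE TABLE items_collected (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id TEXT NOT NULL REFERENCES users(user_id),
    item_id TEXT NOT NULL REFERENCES items(item_id)
);
INSERT INTO users VALUES ('U0001', 'Alex'), ('U0002', 'Daniel');
INSERT INTO items VALUES ('Ss001', 'Shirt1'), ('Sb002', 'Shirt2'), ('Tb001', 'TShirt1'), ('Tm002', 'TShirt2');
INSERT INTO items_collected (user_id, item_id) VALUES
    ('U0001', 'Ss001'), ('U0001', 'Tm002'), ('U0002', 'Tb001'), ('U0002', 'Sb002');
""")

def image_files(user_id):
    """Image file names (item name + '.jpg') for everything one user collected."""
    rows = conn.execute("""
        SELECT i.item_name FROM items_collected ic
        JOIN items i ON i.item_id = ic.item_id
        WHERE ic.user_id = ? ORDER BY ic.id""", (user_id,))
    return [name + ".jpg" for (name,) in rows]

def collectors(item_id):
    """Usernames of everyone who collected one item."""
    rows = conn.execute("""
        SELECT u.username FROM items_collected ic
        JOIN users u ON u.user_id = ic.user_id
        WHERE ic.item_id = ? ORDER BY u.user_id""", (item_id,))
    return [name for (name,) in rows]
```

Each collected item is a short row of two ids, so storage stays small even with hundreds of items per user.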
The first optimization would be splitting the id into three different fields:
one for type, one for size, one for the current id ending (whatever the ending means)
If you really want to keep the current structure, go for the result straight away (option 1).
If you want to speed up for results you should split up the column into multiple columns, one for each property.
Step 2 is to create an index for each column. Remember that MySQL usually uses only one index per table per query (the index merge optimization is an exception). So if you really want speedy queries and your queries vary a lot across these properties, you might want to create indexes on (type,size,ending), (type,ending,size), etc.
For example a query with
select * from items where type = 'S' and size = 's' and ending = '001'
Can benefit from the index (type,size,ending) but:
select * from items where size = 's' and ending = '001'
cannot, because the index can only be used from its leftmost column: it needs type, then size, then ending. This is why you might want multiple indexes if you really want fast searches.
One other note, generally it is not a good idea to use * in queries, but to select only the columns you need.
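The leftmost-prefix rule can be checked with the database's own query plan. This sketch uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 as a stand-in; SQLite behaves the same way for this case, though its plan wording differs between versions and from MySQL's EXPLAIN. The table layout and index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, type TEXT, size TEXT, ending TEXT, name TEXT);
CREATE INDEX ix_type_size_ending ON items (type, size, ending);
""")

def plan(sql):
    """SQLite's query plan as text; the last column of each plan row is the readable detail."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# All three columns, starting with the leftmost one: the composite index can be used.
full_prefix = plan("SELECT name FROM items WHERE type = 'S' AND size = 's' AND ending = '001'")
# Leading column missing: no usable prefix, so the table is scanned.
no_leading = plan("SELECT name FROM items WHERE size = 's' AND ending = '001'")
```

Printing full_prefix shows a SEARCH using ix_type_size_ending, while no_leading shows a plain SCAN of items.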
You need to have three columns for the model, size and id, and index them this way:
CREATE INDEX ix_1 ON items (model, size, id);
CREATE INDEX ix_2 ON items (size, id);
CREATE INDEX ix_3 ON items (id, model);
Then you'll be able to search efficiently on any subset of the parameters:
model-size-id, model-size and model queries will use ix_1;
size-id and size queries will use ix_2;
model-id and id queries will use ix_3
Index on your column as it is now is equivalent to ix_1, and you can use this index to efficiently search on the appropriate conditions (model-size-id, model-size and model).
Actually, there is an access path called INDEX SKIP SCAN that can be used to search on non-leading columns of a composite index, but MySQL did not support it AFAIK (MySQL 8.0.13 and later do provide a skip scan range access method).
If you need to stick to your current design, you need to index the field and use queries like:
WHERE item_id LIKE CONCAT(@model, '%')
WHERE item_id LIKE CONCAT(@model, @size, '%')
WHERE item_id = CONCAT(@model, @size, @id)
All these queries will use the index if any.
There is not need to put in into multiple queries.
I'm comfortable that you've designed your item_id to be searchable with a "Starts with" test. Indexes will solve that quickly for you.
I don't know MySQL, but in MSSQL having an index on a "Size" column that only has choices of S, M, L most probably won't achieve anything. The index won't be used because the values it contains are not sufficiently selective, i.e. it's quicker to just go through all the data rather than "find the first S entry in the index, now retrieve the data page for that row ..."
The exception is where the query is covered by the index - i.e. several parts of the WHERE clause (and indeed, all of them and also the SELECT columns) are included in the index. In this instance, however, the first field in the index (in MSSQL) needs to be selective. So put the column with the most distinct values first in the index.
Having said that, if your application has a picklist for Size, Colour, etc., you should store those attributes in separate columns in the record, with separate tables listing all the available Colours and Sizes. Then you can validate that the Colour / Size given to a Product is actually defined in the Colour / Size tables. That cuts down the garbage-in / garbage-out problem!
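The picklist idea can be sketched with lookup tables and foreign keys, so that the database itself rejects a Size or Colour that is not defined. This minimal sketch uses Python's sqlite3 in place of MySQL; the table names sizes, colours and products are hypothetical. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, whereas MySQL's InnoDB engine enforces them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE sizes   (size   TEXT PRIMARY KEY);
CREATE TABLE colours (colour TEXT PRIMARY KEY);
CREATE TABLE products (
    item_id TEXT PRIMARY KEY,
    size    TEXT NOT NULL REFERENCES sizes(size),
    colour  TEXT NOT NULL REFERENCES colours(colour)
);
INSERT INTO sizes   VALUES ('S'), ('M'), ('L');
INSERT INTO colours VALUES ('Red'), ('Blue');
""")

# A product whose size and colour exist in the lookup tables is accepted.
conn.execute("INSERT INTO products VALUES ('Ss001', 'M', 'Red')")

# 'XXL' is not in the sizes table, so this insert is rejected.
try:
    conn.execute("INSERT INTO products VALUES ('Ss002', 'XXL', 'Red')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```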
Your item_selected data needs to be in a separate table so that it is "normalised". Don't store a delimited list in a single column; store it as individual rows in a separate table.
Thus your USERS table will contain user_id & username.
Your new items_collected table will contain user_id & item_id (and possibly also Date Purchased or Invoice Number).
You can then ask "What did Alex buy?" (your design handles that) and also "Who bought Ss001?" (which, in your design, would require ploughing through all the rows in your USERS table and splitting out the items_collected to find which ones contained Ss001 [1])
[1] Note that using LIKE wouldn't really be safe for that because you might have an item_id of "Ss001XXX" which would match WHERE items_collected LIKE '%Ss001%'
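A minimal sketch of the normalised design, using Python's sqlite3 as a stand-in for MySQL. The users and items_collected columns follow the lists above, while the user "Sam" and the item_ids other than Ss001 are made up for illustration. Because each purchase is its own row, "Who bought Ss001?" becomes an exact match, and an item_id like Ss001XXX can no longer produce a false hit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE items_collected (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    item_id TEXT    NOT NULL,
    PRIMARY KEY (user_id, item_id)
);
INSERT INTO users VALUES (1, 'Alex'), (2, 'Sam');
INSERT INTO items_collected VALUES (1, 'Ss001'), (1, 'Tm002'), (2, 'Ss001XXX');
""")

# "What did Alex buy?"
alex = [r[0] for r in conn.execute(
    "SELECT c.item_id FROM items_collected c JOIN users u ON u.user_id = c.user_id "
    "WHERE u.username = ? ORDER BY c.item_id", ("Alex",))]

# "Who bought Ss001?": an exact match against one row per purchase.
buyers = [r[0] for r in conn.execute(
    "SELECT u.username FROM users u JOIN items_collected c ON c.user_id = u.user_id "
    "WHERE c.item_id = ? ORDER BY u.username", ("Ss001",))]

print(alex)    # ['Ss001', 'Tm002']
print(buyers)  # ['Alex']  (Sam's Ss001XXX is not matched)
```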
I am a new PHP and MySQL programmer. I am handling quite a large amount of data, which will grow slowly in future, so I am using a hash table. I have a couple of questions:
1) Does MySQL have a built-in hash table function? If yes, how do I use it?
2) After a couple of days researching hash tables, I roughly know what a hash table is, but I just can't understand how to start creating one. I saw a lot of hash table code on the internet, and most of it creates a hashtable class as the first step. Does that mean they store the hash table values in a temporary table instead of inserting them into the MySQL database?
For questions 3, 4 & 5, example scenario:
Users can collect items on the website. I would like to use a hash table to insert and retrieve the items that a user has collected.
3) [Important] What would the MySQL database structure look like?
e.g. create items and users tables:
in the items table: item_id, item_name, and item_hash_value
in the users table: user_id, username, item_name, item_hash_value
I am not sure if the users table is correct.
4) [Important] What are the steps for creating a hash table in PHP and MySQL?
(Any sample code would be great :))
5) [Important] How do I insert and retrieve data from the hash table? I am talking about PHP and MySQL, so I hope the answers can be like: "you can use the MySQL query i.e. SELECT * from blabla..."
You don't need to worry about using a hashtable with MySQL. If you intend to hold a large number of items in memory while you operate on them, a hashtable is a good data structure to use, since it can find things much faster than a simple list.
But at the database level, you don't need to worry about the hashtable. Figuring out how to best hold and access records is MySQL's job, so as long as you give it the correct information it will be happy.
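For the in-memory case, the hash table is simply your language's built-in map: PHP's arrays and Python's dict are both implemented as hash tables. This sketch uses Python with sqlite3 standing in for PHP and MySQL, and reuses this answer's Headphones / Computer / Beanie Baby sample items; the variable name items_by_id is hypothetical. It loads the rows once and then looks them up by key without going back to the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_name TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(1, "Headphones"), (2, "Computer"), (3, "Beanie Baby")],
)

# Load once, then do repeated lookups in memory. A dict is a hash table,
# so a lookup by key takes roughly constant time on average.
items_by_id = {
    item_id: name
    for item_id, name in conn.execute("SELECT item_id, item_name FROM items")
}

print(items_by_id[2])       # Computer
print(items_by_id.get(99))  # None: unknown keys are handled without a scan
```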
Database Structure
items table would be: item_id, item_name
Primary key is item_id
users table would be: user_id, username
Primary key is user_id
user_items table would be: user_id, item_id
Primary key is the combination of user_id and item_id
Index on item_id
Each item gets one (and only one) entry in the items table. Each user gets one (and only one) entry in the users table. When a user selects an item, a row goes in the user_items table. Example:
Users:
1 | Bob
2 | Alice
3 | Robert
Items
1 | Headphones
2 | Computer
3 | Beanie Baby
So if Bob has selected the headphones and Robert has selected the computer and beanie baby, the user_items table would look like this:
User_items (user_id, item_id)
1 | 1 (This shows Bob (user 1) selected headphones (item 1))
3 | 2 (This shows Robert (user 3) selected a computer (item 2))
3 | 3 (This shows Robert (user 3) selected a beanie baby (item 3))
Since the user_id and item_id on the users and items tables are primary keys, MySQL will let you access them very fast, just like a hashmap. On the user_items table having both the user_id and item_id in the primary key means you won't have duplicates and you should be able to get fast access (an index on item_id wouldn't hurt).
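The structure above can be written out as DDL. This is a sketch in Python's sqlite3 rather than MySQL (in MySQL you would typically use INT / VARCHAR column types and the InnoDB engine), and the index name ix_user_items_item is hypothetical. The final insert shows the composite primary key rejecting a duplicate selection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username  TEXT NOT NULL);
CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_name TEXT NOT NULL);
CREATE TABLE user_items (
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    item_id INTEGER NOT NULL REFERENCES items(item_id),
    PRIMARY KEY (user_id, item_id)  -- one row per (user, item): no duplicates
);
-- Secondary index for lookups by item ("who selected item X?").
CREATE INDEX ix_user_items_item ON user_items (item_id);

INSERT INTO users VALUES (1, 'Bob'), (2, 'Alice'), (3, 'Robert');
INSERT INTO items VALUES (1, 'Headphones'), (2, 'Computer'), (3, 'Beanie Baby');
INSERT INTO user_items VALUES (1, 1), (3, 2), (3, 3);
""")

# Robert selecting the computer a second time violates the primary key.
try:
    conn.execute("INSERT INTO user_items VALUES (3, 2)")
except sqlite3.IntegrityError as exc:
    print("duplicate rejected:", exc)
```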
Example Queries
With this setup, it's really easy to find out what you want to know. Here are some examples:
Who has selected item 2?
SELECT users.user_id, users.username FROM users, user_items
WHERE users.user_id = user_items.user_id AND user_items.item_id = 2
How many things has Robert selected?
SELECT COUNT(user_items.item_id) FROM user_items, users
WHERE users.user_id = user_items.user_id AND users.username = 'Robert'
I want a list of each user and what they've selected, ordered by user name
SELECT users.username, items.item_name FROM users, items, user_items
WHERE users.user_id = user_items.user_id AND items.item_id = user_items.item_id
ORDER BY users.username, items.item_name
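The three example queries can be checked against the sample data with a self-contained sketch. It uses Python's sqlite3 as a stand-in for MySQL, writes the joins with explicit JOIN ... ON syntax (equivalent to the comma joins above), and passes the item id and user name as bound ? parameters rather than literals; in PHP, PDO and mysqli prepared statements serve the same purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE items (item_id INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE user_items (user_id INTEGER, item_id INTEGER, PRIMARY KEY (user_id, item_id));
INSERT INTO users VALUES (1, 'Bob'), (2, 'Alice'), (3, 'Robert');
INSERT INTO items VALUES (1, 'Headphones'), (2, 'Computer'), (3, 'Beanie Baby');
INSERT INTO user_items VALUES (1, 1), (3, 2), (3, 3);
""")

# Who has selected item 2?
who = conn.execute("""
    SELECT users.user_id, users.username
    FROM users JOIN user_items ON users.user_id = user_items.user_id
    WHERE user_items.item_id = ?""", (2,)).fetchall()

# How many things has Robert selected?
count = conn.execute("""
    SELECT COUNT(user_items.item_id)
    FROM user_items JOIN users ON users.user_id = user_items.user_id
    WHERE users.username = ?""", ("Robert",)).fetchone()[0]

# Each user and what they've selected, ordered by user name then item name.
pairs = conn.execute("""
    SELECT users.username, items.item_name
    FROM users
    JOIN user_items ON users.user_id = user_items.user_id
    JOIN items      ON items.item_id = user_items.item_id
    ORDER BY users.username, items.item_name""").fetchall()

print(who)    # [(3, 'Robert')]
print(count)  # 2
print(pairs)  # [('Bob', 'Headphones'), ('Robert', 'Beanie Baby'), ('Robert', 'Computer')]
```

Alice does not appear in the last result because she has no rows in user_items and the joins are inner joins.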
There are many SQL guides on the internet, such as the W3Schools tutorial.
1) Hashtables do exist in MySQL, but they are used internally to keep track of keys on tables (for example, the MEMORY storage engine supports HASH indexes).
2) Hashtables work by hashing a value (the key) to decide which group, or bucket, it belongs in, which splits the data into many small groups that are easier to search through. To find something, you hash its key to work out which bucket to look in, and then search only that bucket.
For example, say you have 100 items and searching all 100 in order takes 10 seconds. If you know they can be separated by type, you can break them up into 25 t-shirts, 25 clocks, 25 watches and 25 shoes. Then, when you need to find a t-shirt, you only have to search through the 25 t-shirts, which takes 2.5 seconds.
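The bucket idea in the example above is essentially how a hash table works, except that the bucket is chosen by a hash function rather than by item type. Below is a minimal separate-chaining sketch in Python for illustration only: the class name ChainedHashTable and the keys are hypothetical, it never resizes, and in practice you would use the language's built-in map (a PHP array or a Python dict).

```python
from typing import Any, Hashable, List, Optional, Tuple


class ChainedHashTable:
    """Minimal hash table using separate chaining: a fixed list of buckets."""

    def __init__(self, n_buckets: int = 8) -> None:
        self.buckets: List[List[Tuple[Hashable, Any]]] = [[] for _ in range(n_buckets)]

    def _bucket(self, key: Hashable) -> List[Tuple[Hashable, Any]]:
        # The key's hash decides which bucket an entry lives in.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key: Hashable, value: Any) -> None:
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:          # key already stored: overwrite its value
                bucket[i] = (key, value)
                return
        bucket.append((key, value))      # new key: append it to its bucket

    def get(self, key: Hashable, default: Optional[Any] = None) -> Any:
        # Only the single bucket picked by the hash is searched, not every entry.
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        return default


table = ChainedHashTable(n_buckets=4)
table.put("t-shirt-17", "Red T-shirt")
table.put("clock-03", "Wall clock")
print(table.get("t-shirt-17"))  # Red T-shirt
print(table.get("shoe-99"))     # None
```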
3) I'm not sure what your question means; MySQL stores all the rows of your database in its own binary data files on disk.
4) As in #2, you would need to decide what you want your key to be.
5) As in #2, you need to know what your key is.
If you think a hash table is the right way to store your data, you may want to use a key-value style (document) database such as CouchDB instead of MySQL; the CouchDB documentation shows how to get started with PHP.
You wrote: "I am a new php and mysql programmer. I am handling quite large amount of data, and in future it will grow slowly, thus I am using hash table."
Looking at your original purpose, use memcache instead: it is a very scalable solution that needs only minimal changes to your code, and you can add more memcache servers as your data grows larger and larger.