My question:
I have a MySQL database that consists of something like a fact table (although not every field is a lookup) and a variety of other tables. When I want to display data from that "fact" table, is it necessary to run a separate query for each individual lookup, or is there a way to make a temporary table that has already done the "looking up"?
Example:
Table structure -
unique_id(auto increment int),
model(int, lookup to table #2),
type(int, lookup from table #2 to table #3),
employee(int, lookup to table #4),
notes(text)
cost(float)
hours(float)
-
So for instance when I want to make a PHP page to enter this data it seems like a lot more "work" than it needs to be:
unique_id (not shown as a data entry field, increments automatically on submit)
model (drop down box. population requires query to table #2 where status = X)
type (read-only text box shows type of model. Requires query to table #3 based on column from table #2)
employee (drop down box. population requires query to table #4 where employee_status = "Active")
notes (text box, user inputs related notes to submission)
cost (text box, user enters costs related to submission)
hours (text box, user enters hours related to submission)
Just to get a simple form populated with valid data requires what seems to me like A LOT of queries/lookups.
Is this the best way? Is there a better way?
Aside: I have control over the data structure, so if the problem is the database design, then those suggestions would be helpful as well.
Dimension tables typically don't change very often, at least relative to the number of inserts to the fact table. Dimension tables are also individually much smaller than the fact table. This makes dimension tables good candidates for caching.
What some people do to good effect is to render the partial HTML output for the form, with all the data populated as dropdowns, radiobuttons, etc. Then store that partial HTML under a memcached key so you don't have to do any of the database queries or the HTML render for most PHP requests -- you just fetch the pre-populated HTML fragment out of memcached and echo it verbatim. I think of this like the "Ikea" of database-driven output.
Of course if you ever do change data in a dimension table, you'd want to invalidate the cached HTML, or even better re-generate it and store a new version of the HTML in memcached.
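The caching idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: Python and SQLite stand in for PHP and MySQL, a plain dict stands in for memcached, and the employees table, its columns and the cache key are hypothetical names, not taken from the question.

```python
import html
import sqlite3

# A plain dict stands in for memcached in this sketch.
cache = {}
CACHE_KEY = "form:employee_dropdown"  # hypothetical key name


def render_employee_dropdown(conn):
    """Run the lookup query and render the <select> fragment."""
    rows = conn.execute(
        "SELECT id, name FROM employees WHERE status = 'Active' ORDER BY name"
    ).fetchall()
    options = "".join(
        f'<option value="{emp_id}">{html.escape(name)}</option>'
        for emp_id, name in rows
    )
    return f'<select name="employee">{options}</select>'


def get_employee_dropdown(conn):
    """Return the cached fragment, rendering it only on a cache miss."""
    if CACHE_KEY not in cache:
        cache[CACHE_KEY] = render_employee_dropdown(conn)
    return cache[CACHE_KEY]


def add_employee(conn, name):
    """Change the dimension table, then regenerate the cached fragment."""
    conn.execute(
        "INSERT INTO employees (name, status) VALUES (?, 'Active')", (name,)
    )
    cache[CACHE_KEY] = render_employee_dropdown(conn)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, status TEXT)"
)
add_employee(conn, "Alice")
first = get_employee_dropdown(conn)   # served from the cache, no query
add_employee(conn, "Bob")             # dimension changed: fragment rebuilt
second = get_employee_dropdown(conn)
```

Most requests only hit get_employee_dropdown, so the query and the HTML render run only when the dimension table actually changes.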
Regarding doing all the lookups, I'll point out that there's no requirement to use pseudokeys in a fact table. You can use the natural values, and make them reference the primary key of the dimension table, which also can be a natural key instead of a pseudokey. It might take a bit more space in some cases, but it eliminates the lookups. Of course it may make sense to continue using pseudokeys for dimensions that are long varchars.
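As a rough sketch of the natural-key idea, assuming SQLite in place of MySQL and hypothetical table names (employees, work_log): the fact table stores the employee name itself and still references the dimension table, so displaying a row needs no join or extra lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
CREATE TABLE employees (
    name   TEXT PRIMARY KEY,          -- natural key, no pseudokey column
    status TEXT NOT NULL
);
CREATE TABLE work_log (
    unique_id INTEGER PRIMARY KEY,
    employee  TEXT NOT NULL REFERENCES employees(name),
    hours     REAL
);
INSERT INTO employees VALUES ('Alice', 'Active');
INSERT INTO work_log (employee, hours) VALUES ('Alice', 2.5);
""")

# The readable value is already in the fact table: no lookup query needed.
rows = conn.execute("SELECT employee, hours FROM work_log").fetchall()

# The foreign key still guards against values missing from the dimension.
try:
    conn.execute("INSERT INTO work_log (employee, hours) VALUES ('Nobody', 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```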
I'm not quite sure what you mean by "a query to each individual lookup". Do you mean a way to save your entire table in your PHP script? Or do you mean a way to cache on the MySQL server to reduce the load on the database node?
MySQL includes a built-in query cache that eliminates a lot of server cycles for repeated identical queries (note that it was deprecated in MySQL 5.7.20 and removed in MySQL 8.0). You can find more here: MySQL Caching
As far as your database structure, you're going to have to provide a little bit more detail about your schema (What your database is meant to do) if you would like some suggestions. It's hard to know what kind of structure works and is effective without knowing what it's supposed to do. (Are there multiple notes per employee, what are costs? Are they per employee? etc)
Related
I am working on a form for storing information about the themes of the queries we receive in our company.
I am using php/MySQL to store radio button data such as:
Name of employee
Medium of query (e-mail/phone/in person)
With radio button data such as this, I can easily use PHP to query the database and e.g. count the numbers of queries answered by e-mail by each employee.
The trouble I am having is with data regarding the theme of the query. This is checkbox data with a lot of different checkboxes (50+). We would like to be able to add or remove checkboxes from time to time, though not very often.
I used to store this data just as comma-separated values in a single cell in the database and then export to Excel to work with the data, but now I'd like to use another PHP form to generate statistics on the themes.
My research has led me to two ways of doing this that may be possible:
1. Create a separate table in my database for my themes with one column for each possible theme, so that I'd have as many columns in my database as the number of checkboxes in my form.
2. Use the PHP function serialize to store the checkbox data in one cell in my database, and later use unserialize to work with the data in PHP.
I am an absolute beginner, so with both options I'm unsure how I'd actually implement it:
With option 1, I am unsure whether my MySQL columns should just be "theme 1", "theme 2", etc., or whether they should have the names of my checkbox values, e.g. "money", "personal problems", "practical issues", etc. I have not been able to find a good resource on how to store the checkbox data in the right way, when the user may sometimes have clicked just 1 theme, and in other instances may have clicked 10 themes.
With option 2, I am unsure how I could populate a dropdown with unique values, and how I could later count instances of a unique value across the rows in my database.
Any help you can give me on this, including links to tutorials or questions similar to this on stackoverflow, would be much appreciated. I haven't been able to find anything about this that I could understand, other than I am thinking option 1 is probably the right way to go.
EDIT: After having received an idea about how to do this from David, I am updating my post with my attempt to understand how I would go about this.
What you're describing can be thought of as a many-to-many relationship. You have:
A form record, which can relate to many themes
A theme, which can relate to many form records
In a situation like this, the relationship itself is a database record. Consider this table structure:
FormRecords
----------
ID
SomeTitle
UserIDWhoFilledOutForm
etc.
Themes
----------
ID
ThemeName
etc.
FormRecordThemes
----------
FormRecordID
ThemeID
Each "primary entity" has an identifier and information about that entity. Then there's a "linking table" which has information about the relationship between those two entities.
Any time you present a form, you simply select from the Themes to populate the check boxes. You can add new ones as you see fit. You probably shouldn't remove any, though you can "soft delete" them by setting some flag on the record to indicate not to display them on the form.
If you ever want to edit the Themes in any significant way (a way which would somehow invalidate previous uses of that record, such as completely changing its name/title), then keep in mind that you'd be modifying the entire history of its use. I don't know if this is a risk in your domain, but in cases like that it might help to de-normalize a little bit by storing "Theme at that time" values in the relationship table. Like, the name of the Theme at the time that relationship was created. It's best to avoid this scenario entirely if possible, mostly by making key Theme values immutable in the domain.
Don't store delimited lists, don't store serialized data (unless the entire object really is a single data point)... Keep values separated into their own actual values in the database. Relational databases are really good at querying relational data.
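A minimal sketch of the three-table structure described above, assuming SQLite in place of MySQL and Python in place of PHP. The snake_case table names, the active flag used for the "soft delete", and the sample themes are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE form_records (id INTEGER PRIMARY KEY, employee TEXT, medium TEXT);
CREATE TABLE themes (
    id     INTEGER PRIMARY KEY,
    name   TEXT UNIQUE,
    active INTEGER DEFAULT 1          -- set to 0 to "soft delete" a theme
);
CREATE TABLE form_record_themes (     -- one row per checked checkbox
    form_record_id INTEGER REFERENCES form_records(id),
    theme_id       INTEGER REFERENCES themes(id),
    PRIMARY KEY (form_record_id, theme_id)
);
INSERT INTO themes (name)
VALUES ('money'), ('personal problems'), ('practical issues');
""")


def save_form(conn, employee, medium, checked_theme_ids):
    """Insert the form record, then one link row per checked theme."""
    cur = conn.execute(
        "INSERT INTO form_records (employee, medium) VALUES (?, ?)",
        (employee, medium),
    )
    conn.executemany(
        "INSERT INTO form_record_themes VALUES (?, ?)",
        [(cur.lastrowid, theme_id) for theme_id in checked_theme_ids],
    )
    return cur.lastrowid


save_form(conn, "Ann", "e-mail", [1])      # one theme checked
save_form(conn, "Bob", "phone", [1, 3])    # two themes checked

# Statistics become a plain GROUP BY; themes never used show a count of 0.
theme_counts = conn.execute("""
    SELECT t.name, COUNT(frt.form_record_id)
    FROM themes t
    LEFT JOIN form_record_themes frt ON frt.theme_id = t.id
    GROUP BY t.id
    ORDER BY t.id
""").fetchall()
```

Whether a user checks 1 theme or 10, the only thing that changes is the number of rows written to the linking table, and adding a checkbox is just a new row in themes.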
You can join all the checkbox values submitted by the user into one comma-separated string and store it in a single cell. Later on you can split the string to get the values back.
This is going to be a long read, so please bear with me. I'm pretty good at PHP, but my knowledge of database design is poor. I'm working on it, as I've realized that having command over designing a database is the most important thing when working with applications, and how much easier things get once you get the hang of it. This thread also came about because of my weak knowledge of database interaction.
I basically have around 40 forms whose structure is similar to the image given below:
The number of columns and rows vary from one form to another, but they more or less follow a similar structure as shown above.
WHAT I'M PLANNING TO ACHIEVE
Use a single PHP page to process (i.e., insert data into the database) all 40 forms. In other words, I intend to use the same <form action="process.php"> for all the forms.
Avoid hard-coding; keep the code in process.php as dynamic as possible.
ROADBLOCKS
The number of parameters in each form varies, so I need to figure out a way to find out the number of rows present in each form.
Finding it difficult to decide how to name the elements (radio buttons, checkboxes, dropdowns, textareas, textboxes etc) being used in the form.
I'm having trouble trying to figure out how to proceed with the insertion. Inserting one row at a time seems to be appropriate, but how would I achieve this? For example, if I use a loop, in the first iteration data related to Ambiance should be inserted. In the second iteration data related to TV Room should be inserted and so on. The problem is how to code accordingly?
WHAT I'VE IN MIND
The columns of the form (parameter, meets requirement, observation, status, remarks) become the fields of the MySQL table.
Then insert one row at a time in the table using a loop.
Use arrays for naming the elements used in the form. For example, ambiance[requirement], ambiance[observation], ambiance[status], ambiance[remarks] for elements in the first row and room[requirement], room[observation], room[status], room[remarks] for the second row and so on. Then insertion can be done in a single line by using INSERT INTO tablename (implode(',', array_keys($ambiance))) VALUES (implode(',', $ambiance))
WHAT THIS IS NOT ABOUT
Asking to supply / post code.
Give me teh codez is not my way of operating. I'm just seeking instructions on how to go about with the task.
WHAT THIS IS ABOUT
Asking for suggestions.
Determining if I'm going in the right direction.
Asking if there are alternate ways.
I have done something similar but not exactly the same. I used these two tables:
Form
ID
Name (e.g. feedback, comments, survey)
FormFields**
ID
FormID
Caption (e.g. name, company, address)
SubCaption (e.g. enter your full name)
Required
DataType (e.g. integer, number, string, email)
MaxLength
** Look at the schema of a database table for more ideas.
In your case you need another pair of tables:
FormSubmission
ID
Date
FormSubmissionValues
ID
FormSubmissionID
FormFieldID (can be used to deduce FormID or you can add FormID in FormSubmission table)
Value (a field large enough to accommodate any value)
There were several catches. I'll note down a few:
The UI needs to vary depending on datatype (small 10 character input for integer/number, checkbox for boolean, textarea for text)
Not sure how to handle list boxes. Perhaps you need another table for this
You cannot use indexes properly since everything is stored in a generic text field
Client side validation is possible but difficult
The tables need to be very generic. Based on my experience this requires extra effort to ensure that x works with all y.
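One possible reading of the four tables above, sketched with SQLite and Python standing in for MySQL and PHP. The snake_case names, the form_id column on the submission table, and the sample field captions are assumptions for illustration only. A single handler serves any form because the list of fields comes from the form_fields table rather than from the posted keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE form (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE form_fields (
    id INTEGER PRIMARY KEY, form_id INTEGER, caption TEXT, required INTEGER);
CREATE TABLE form_submission (id INTEGER PRIMARY KEY, form_id INTEGER, date TEXT);
CREATE TABLE form_submission_values (
    id INTEGER PRIMARY KEY, form_submission_id INTEGER,
    form_field_id INTEGER, value TEXT);
INSERT INTO form VALUES (1, 'inspection');
INSERT INTO form_fields VALUES
    (1, 1, 'ambiance_status', 1), (2, 1, 'ambiance_remarks', 0);
""")


def process_submission(conn, form_id, posted):
    """Store one submission: one value row per field defined for the form."""
    fields = conn.execute(
        "SELECT id, caption, required FROM form_fields WHERE form_id = ?",
        (form_id,),
    ).fetchall()
    missing = [c for _, c, required in fields if required and not posted.get(c)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    cur = conn.execute(
        "INSERT INTO form_submission (form_id, date) VALUES (?, date('now'))",
        (form_id,),
    )
    # Only captions known to form_fields are stored; other keys are ignored.
    conn.executemany(
        "INSERT INTO form_submission_values"
        " (form_submission_id, form_field_id, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, fid, posted[c]) for fid, c, _ in fields if c in posted],
    )
    return cur.lastrowid


submission_id = process_submission(
    conn, 1,
    {"ambiance_status": "OK", "ambiance_remarks": "clean", "unknown": "ignored"},
)
stored = conn.execute(
    "SELECT form_field_id, value FROM form_submission_values"
    " WHERE form_submission_id = ? ORDER BY form_field_id",
    (submission_id,),
).fetchall()
```

Values are bound as parameters and column names never come from user input, which avoids the injection risk of building the INSERT from posted array keys.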
I'm trying to optimize my PHP and MySQL, but my understanding of SQL databases is shoddy at best. I'm creating a website (mostly for learning purposes) which allows users to make different kinds of posts (image/video/text/link).
Here is the basics of what I'm storing
Auto - int (key index)
User ID - varchar
Post id - varchar
Post Type - varchar (YouTube, vimeo, image, text, link)
File Name - varchar (original image name or link title)
Source - varchar (external link or name of file + ext)
Title - varchar (post title picked by user)
Message - text (user's actual post)
Date - int (unix timestamp)
I have other data stored relevant to the post in other tables which I grab with the post id (like user information) but I'm really doubting if this is the method I should be storing information in. I do use PDO, but I'm afraid this format might just be extremely slow.
Would there be any sense in storing the post information in another format? I don't want excessively large tables, so from a performance standpoint should I store some information as a blob/binary/xml/json?
I can't seem to find any good resources on PHP/MySQL optimization. Most information I come across tends to be 5-10 years old, content you have to pay for, too low-level, or just straight documentation which can't hold my attention for more than half an hour.
Databases are made to store 'data', and are fast to retrieve the data. Do not switch to anything else, stick with a database.
Try not to store pictures and videos in a database. Store them on disk, and keep a reference to them in a database table.
Finally, catch up on database normalization. It will help you get your database into optimal condition.
What you have seems okay, but you have missed the important bit about indexes and keys.
Firstly, I am assuming that your primary key will be field 1. Okay, no problems there, but make sure that you also stick an index on userID, PostID, Date and probably a composite on UserID, Date.
Secondly, are you planning on having search functions on these? In that case you may need to enable full text searches.
Don't muck around trying to store data in JSON or other such things. Store it plain and simple. The last thing you want to be doing is trying to extract a field from the database just to see what is inside. If your database can't work it out, it is bad design.
On that note, there isn't anything wrong with large tables. As long as they are indexed nicely, a small table or large table will make very little difference in terms of accessing it (short of huge badly written SQL joins), so worry about simplicity to be able to get the data back from it.
Edit: A Primary Key is a lovely way to identify a row by a unique column of some sort. So, if you want to delete a row, in your example, you might specify a delete from yourTable where ID=6 and you know that this will only delete one row as only one row can have ID=6.
On the other hand, an index is different to a key, in that it is like a cheat-sheet for the database to know where certain information is inside the table. For example, if you have an index on the UserID column, when you pass a userID in a query, the database won't have to look through the entire table; it looks at the index and knows the location of all the rows for that user.
A composite index takes this one step further again. If you know that you will constantly query data by both UserID and ContentType, you can add a composite index (meaning an index on BOTH fields in one index), which will then allow the database to return only the data you specify in a query using both those columns without having to sift through the entire table, or even through all of a user's posts, to find the right content type.
Now, indexes take up some extra space on the server, so keep that in mind, but if your tables grow to be larger (which is perfectly fine) the improved efficiency is staggering.
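To see the effect of a composite index, one option is to ask the database for its query plan. The sketch below uses SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN; the posts table, its columns and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE posts (
        id        INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL,
        post_type TEXT NOT NULL,
        title     TEXT,
        created   INTEGER
    )
""")
# Composite index: one index covering BOTH columns.
conn.execute("CREATE INDEX idx_posts_user_type ON posts (user_id, post_type)")


def plan_for(sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail


# Filters on both indexed columns: the planner can search the index.
indexed_plan = plan_for(
    "SELECT * FROM posts WHERE user_id = ? AND post_type = ?", (7, "image")
)
# Filters on a column with no index: the planner has to scan the table.
unindexed_plan = plan_for("SELECT * FROM posts WHERE title = ?", ("hello",))
```

The exact wording of the plan text differs between SQLite versions, so it is safer to look for the index name or the word SCAN than to compare whole strings.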
For now, stick with an RDBMS. Once you are comfortable with PHP and MySQL, there will be more to learn later, such as NoSQL stores like MongoDB, but everything has its purpose, and for your current purpose this setup is right and will not slow you down. Your table schema looks right, with a few modifications.
User id and post id should be integers. I assume this is the posts table, so post id should be auto-incremented and serve as the primary key.
The other thing is that you are using two fields, file name and source. File name is the name of the uploaded file, but if by source you mean the complete path of the file, then the DB is not the place to store it. Generate the path with a PHP function each time you need it instead of storing it in the DB. Otherwise, if you ever need to change the path, it will mean a lot of overhead.
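The "generate the path from a function" advice might look something like this. It is a sketch in Python rather than PHP, and the upload root and the per-user directory layout are assumptions, not something the question specifies.

```python
from pathlib import PurePosixPath

# Hypothetical upload root: if storage ever moves, only this line changes.
UPLOAD_ROOT = PurePosixPath("/var/www/uploads")


def upload_path(user_id, file_name):
    """Build the full path from the file name stored in the database."""
    # Keep only the final component so a stored name cannot escape the root.
    safe_name = PurePosixPath(file_name).name
    return UPLOAD_ROOT / str(user_id) / safe_name


path = upload_path(42, "cat.jpg")
```

The database row then holds only the file name, and every caller goes through upload_path to find the file.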
You also asked about blobs. It is better to store files in the file system, not in the DB. Blob fields are for storing files in a DB table, which I don't recommend here.
I am in the process of creating a website where I need to have the activity for a user (similar to your inbox in stackoverflow) stored in sql. Currently, my teammates and I are arguing over the most effective way to do this; so far, we have come up with two alternate ways to do this:
1. Create a new table for each user and have the table name be theirusername_activity. Then when I need to get their activity (posting, being commented on, etc.) I simply get that table and see the rows in it...
In the end I will have a TON of tables
Possibly faster
2. Have one huge table called activity, with an extra field for their username; when I want to get their activity I simply get the rows from that table "...WHERE username=".$loggedInUser
Fewer tables, cleaner
(assuming I index the tables correctly, will this still be slower?)
Any alternate methods would also be appreciated
"Create a new table for each user ... In the end I will have a TON of tables"
That is never a good way to use relational databases.
SQL databases can cope perfectly well with millions of rows (and more), even on commodity hardware. As you have already mentioned, you will obviously need usable indexes to cover all the possible queries that will be performed on this table.
Number 1 is just plain crazy. Can you imagine going to manage it, and seeing all those tables?
Can you imagine the backup! Or the dump! That many create tables... that would be crazy.
Get yourself a good index, and you will have no problem sorting through records.
Here we are talking about MySQL. So why would it be faster to make separate tables?
Query cache efficiency: each insert from one user wouldn't empty the query cache for others
Memory and pagination: tables in use would fit in buffers, and unused data would simply not be loaded there
But as everybody here has said, it seems quite crazy in terms of management. And in terms of performance, having a lot of tables adds another problem in MySQL: you may run out of file descriptors or simply wipe out your table cache.
It may be more important here to choose the right engine, like MyISAM instead of InnoDB, as this is an insert-only table. And as #RC said, a good partitioning policy would fix the memory and pagination problem by avoiding loading rarely used data into active memory buffers. This should be done with an intelligent application design as well, where you avoid loading all the activity history by default. If you reduce it to recent activity and restrict parsing of the complete history table to batch processes and advanced screens, you'll get a nice effect from the partitioning. You can even try a user-based partitioning policy.
For query cache efficiency, you'll get a bigger gain from an application-level cache (like memcache), with history-per-user elements saved there and emptied at each new insert.
You want the second option, and you add the userId (and possibly a separate table for userid, username etc etc).
If you do a lookup on that id on a properly indexed field you'd only need something like log(n) steps to find your rows. This is hardly anything at all. It will be way faster, way clearer and way better than option 1. Option 1 is just silly.
In some cases, the first option is, in spite of not being strictly "the relational way", slightly better, because it makes it simpler to shard your database across multiple servers as you grow. (Doing this is precisely what allows wordpress.com to scale to millions of blogs.)
The key is to only do this with tables that are entirely independent from one user to the next -- i.e. never queried together.
In your case, option 2 makes the most sense: you'll almost certainly want to query the activity across all or some users at some point.
Use option 2, and not only index the username column, but partition (consider a hash partition) on that column as well. Partitioning on username will provide you some of the same benefits as the first option and allow you to keep your sanity. Partitioning and indexing the column this way will provide a very fast and efficient means of accessing data based on the username/user_key. When querying a partitioned table, the SQL Engine can immediately lop off partitions it doesn't need to scan as it can tell based off of the username value queried vs. the ability of that username to reside within a partition. (in this case only one partition could contain records tied to that user) If you have a need to shard the table across multiple servers in the future, partitioning doesn't hinder that ability.
You will also want to normalize the table by separating the username field (and any other elements in the table related to username) into its own table with a user_key. Ensure a primary key on the user_key field in the username table.
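A rough sketch of option 2 with the normalization described above, assuming SQLite and Python in place of MySQL and PHP. SQLite has no table partitioning, so that part is not shown; the table names, columns and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_key INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL
);
CREATE TABLE activity (               -- ONE table for every user's activity
    id       INTEGER PRIMARY KEY,
    user_key INTEGER NOT NULL REFERENCES users(user_key),
    action   TEXT NOT NULL,
    created  INTEGER NOT NULL
);
CREATE INDEX idx_activity_user_created ON activity (user_key, created);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO activity (user_key, action, created) VALUES
    (1, 'posted', 100), (2, 'commented', 110), (1, 'was commented on', 120);
""")


def recent_activity(conn, username, limit=20):
    """Newest activity rows for one user, with the username bound as a parameter."""
    return conn.execute(
        """
        SELECT a.action, a.created
        FROM activity a
        JOIN users u ON u.user_key = a.user_key
        WHERE u.username = ?
        ORDER BY a.created DESC
        LIMIT ?
        """,
        (username, limit),
    ).fetchall()


alice = recent_activity(conn, "alice")

# Queries across users stay trivial, which per-user tables would not allow.
per_user = conn.execute(
    "SELECT user_key, COUNT(*) FROM activity GROUP BY user_key ORDER BY user_key"
).fetchall()
```

Binding the username as a parameter, rather than concatenating it into the SQL string, also avoids SQL injection.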
This mostly depends on where you need to retrieve the values. If it's a page for a single user, use the first approach. If you are showing data for all users, you should use a single table. The multiple-table approach is also clean, but in SQL, if the number of records in a single table is very high, data retrieval is very slow.
On my website, I have two tables which are linked using a pivot table. What I am trying to do is let a user update the relationships between the two tables (inserting and removing records from the pivot table). I have no problem doing this in PHP, but what I am concerned about is the way the form is displayed in the user's web browser.
The way I am doing it now, is to have a table full of checkboxes, with each checkbox corresponding to a relationship between the column header and the row header (which represent the database tables). The user can check the checkbox to tell the PHP that a record should be present for that relationship (an unchecked box means there is no relationship). However this method can get quite ugly (columns stretching outside page bounds) if there are quite a few columns and quite a few rows, and is a bit tedious to use.
What would be a good way to display this form to the user?
Maybe use a data grid? These are quite powerful:
jQuery TableFilter (click "Go")
ExtJS Grid Filter (click a small down arrow ▼ that appears near the column name)
It may be a time consuming task to make it work through Ajax, though.
As this is more about the UI of the application than anything else, I don't think there is going to be a single right answer, as it will come down to a combination of what works (which is difficult without being able to see / play with things) and your personal preferences.
A few progressions I would run through:
Visual feedback
Make your table more interactive by providing visual feedback to the user. At the most basic level, try adding some colour to the cells - a colour for those that are checked. This will allow the user to quickly see which options are "in play". It may be that the reverse works better (highlighting unchecked cells) - but this all depends on what the form is doing / intending to indicate - i.e. if it's more important to make clear that the unchecked state is bad, you may want these to be red.
The next level up is to add some dynamic highlighting. If the table is huge, you may want to highlight the row and column header cells that correspond to the cell under the cursor. You could also consider highlighting the whole row / column (cross-hair style) to allow the user to examine 'companion' cells.
Dynamic table
Slightly more involved would be to add some spice to your table. Instead of showing rows and columns of check-boxes, use graphical icons / images. They are a lot easier on the eye, and will probably allow you to have tighter control over the dimensions of the table. The entire UI could then be done via JavaScript and on-click - which is pretty easy these days if you employ something like jQuery.
Split the interface
This is based on the assumption that all combinations of Table A & Table B aren't set up in the pivot table to begin with - only when a user tries to relate A.item with B.item
Instead of showing all possible combinations, show only those which are active (have an entry in the pivot table). Then provide the user with a second form (probably of two drop-downs) that allows them to relate a record from the first table to the second.
Filter the interface
Provide the user with the ability to filter the interface - to show only the relationships between a single record from one of the tables. This would have the effect of restricting your table to a single column, making it a bit easier to accommodate in the design.
However, I would still allow the user to get to the "big view" of all records, as, depending on what you are doing, such a view can be very useful to quickly cross-reference lots of records.