Using PHP & Mysql-
I have a list of 120,000 employees. Each has a supervisor field with the supervisor employee number.
I am looking to build something that shows the employees in a tree like format. Given that if you click on anyone that you have an option to download all of the employees (with their info) that are under them.
So two questions - should I write my script to handle the query (which I have but is SLOW) or should create some sort of helper table/view? I am looking for best practice behind this.
Also I am sure this has been done a million times. Is there a good class that handles organization hierarchy?
The standard way of doing this is to use one table to store all of the employees, with a primary key field for the employee_id, and a field for supervisor_id which is a 'self join' - meaning that the value in this field points back to the employee id of this employee's supervisor. As far as displaying the employee tree - for relatively small trees, the entire tree structure can be sent to the client's browser when the page is created, and tree nodes can be displayed as the nodes are clicked from the stored data. But, for larger trees, it is better to fetch the data as needed, i.e. when the nodes are clicked. If you have 120,000 employees, then you might want to use the later approach.
Related
Premature optimization is the root of all evil...but...
I am allowing users to input data within categories as in favorite players, favorite teams etc. They can then use these choices to filter results. I let them input lists separated by commas so after exploding the data I have it in an array. So how to store.
Method 1: I could create a table of users, one row per user, with the categories, as in players, teams as fields and save the choices of each users as an array in the respective field. (userid would link to basic users table.)
Method 2. Or I could create separate tables for each thing, players, teams, etc, and have a fixed number of fields say 10, break up the array into each individual value, store and place it in its own field. (Already have this code working.) (Again userid is primary key.)
The advantage of Method 1 is it's a bit simpler, one table, no limit on number of choices.
Method 2 seems a bit more robust. The data is more visible and possibly easier to get and retrieve--although maybe not.
Does anyone have experience with this sort of thing and could recommend one over another?
Thanks for any recommendations, suggestions!
I am making a mysql/php project and in one form, the specs require dynamic fields in a way that you have an initial selectbox with a value from a lookup table that describe job unions. Depending on the selected value, it will spawn (probably via reflection) different fields.
For example, I have 2 associations with ids 1 and 2. If the user selects union #1, then the fields would be first name, last name, phone, address. If the user selects union #2 then the fields would be mobile, email, im name, "enroll now?"(checkbox).
Now, this might occur often, because the total of unions are more than 10 and specs require it to be flexible.
What I thought is this:
form loads up the fields of the first lookup (jobunions)
user selects the job union, and the value of the selectbox is the name of another table, for example LK_TABLE_2
The reflection examines the LK_TABLE_2 fields and retrieves/renders the fields below the selectbox of step 1.
I need your opinion on whether this business logic is acceptable (is there a pattern I could use?), if not anything to suggest and if it's doable how to store the filled data into the user preferences table.
Any insight would do.
Update: here is a schema (pdf) of what I am trying to do
https://docs.google.com/viewer?a=v&pid=explorer&chrome=true&srcid=0B_vptVa0K8J2YjBjMGJmZDgtYzUxZi00ZTE5LTgxZjgtOTNlMjQ5OGM3ZTY1&hl=en_US
I am closing this question, because of the project that was terminated, and further more I don't work in the specific company anymore (over half a year ago).
We jumped to a more nosql solution before this happened and it was working fine... until termination.
I am rebuilding the background system of a site with a lot of traffic.
This is the core of the application and the way I build this part of the database is critical for a big chunk of code and upcoming work. The system described below will have to run millions of times each day. I would appreciate any input on the issue.
The background is that a user can add what he or she has been eating during the day.
Simplified, the process is more or less this:
The user arrives to the site and the site lists his/her choices for the day (if entered before as the steps below describes).
The user can add a meal (consisting of 1 to unlimited different items of food and their quantity). The meal is added through a search field and is organized in different types (like 'Breakfast', 'Lunch').
During the meal building process a list of the most commonly used food items (primarily by this user, but secondly also by all users) will be shown for quick selection.
The meals will be stored in a FoodLog table that consists of something like this: id, user_id, date, type, food_data.
What I currently have is a huge database with food items from which the search will be performed. The food items are stored with information on both the common name (like "pork cutlets") and on producer (like "coca cola"), along with other detailed information needed.
Question summary:
My problem is that I do not know the best way to store the data for it to be easily accessible in the way I need it and without the database going out of hand.
Consider 1 million users adding 1 to 7 meals each day. To store each food item for each meal, each day and each user would potentially create (1*avg_num_meals*avg_num_food_items) million rows each day.
Storing the data in some compressed way (like the food_data is an json_encoded string), would lessen the amount of rows significally, but at the same time making it hard to create the 'most used food items'-list and other statistics on the fly.
Should the table be split into several tables? If this is the case, how would they interact?
The site is currently hosted on a mid-range CDN and is using a LAMP (Linux, Apache, MySQL, PHP) backbone.
Roughly, you want a fully normalized data structure for this. You want to have one table for Users, one table for Meals (one entry per meal, with a reference to User; you probably also want to have a time / date of the meal in this table), and a table for MealItems, which is simply an association table between Meal and the Food Items table.
So when a User comes in and creates an account, you make an entry in the Users table. When a user reports a Meal they've eaten, you create a record in the Meals table, and a record in the MealItems table for every item they reported.
This structure makes it straightforward to have a variable number of items with every meal, without wasting a lot of space. You can determine the representation of items in meals with a relatively simple query, as well as determining just what the total set of items any one user has consumed in any given timespan.
This normalized table structure will support a VERY large number of records and support a large number of queries against the database.
First,
Storing the data in some compressed way (like the food_data is an
json_encoded string)
is not a recommended idea. This will cause you countless headaches in the future as new requirements are added.
You should definitely have a few tables here.
Users
id, etc
Food Items
id, name, description, etc
Meals
id, user_id, category, etc
Meal Items
id, food_item_id, meal_id
The Meal Items would tie the Meals to the Food Items using ids. The Meals would be tied to Users using ids. This makes it simple to use joins in order to get detailed lists of data- totals, averages, etc. If the fields are properly indexed, this should be a great model to support a large number of records.
In addition to what's been said:
be judicious in your use of indexes. Properly applying these to your database could significantly speed up read access to your tables.
Consider using language-specific features to minimize space. You mention that you're using mysql; consider using ENUM when appropriate (food types, meal types) to minimize database size and to simplify management.
I would split up your meal table into two tables, one table stores a single row for each meal, the second table stores one row for each food item used in a meal, with a foreign key reference to the meal it was used in.
After that, just make sure you have indices on any table columns used in joins or WHERE clauses.
I have a site that scrapes all the episodes from tv.com from certain series.
So my database has a User table, a Show table, an Episode table, a Show_User table (to associate shows with users, HABTM), an Episode_Show table (to associate episodes with shows, HABTM), a Episode_User table (to associate episodes with shows, only as a way of marking them as 'watched').
Now I want to add a way of marking an episode as 'downloaded' too.
So at the moment, the Episode_User table has two fields, Episode_Id and User_Id. Should I create a new table entirely for this? Or just add a field to the Episode_User table?
I'm using CakePHP's automagic features, and don't particularly want to break it. But if I have to, I have to...
Thanks for any advice.
I don't see why you would want to create a new table for episodes a user has downloaded. To me it would make the most sense to modify the Episode_User table to have a field for watched and a field for downloaded, since it's all relating back to the same pair of entities, users and episodes.
However, any time I've stored information about a relationship between two tables in that manner, I've found that regardless of the framework I'm using, the ORM inevitably become more complicated, but I don't think there's any way around there.
With CakePHP for handling those kinds of complicated situations, read up about the model behavior, Containable. It's not very well documented in the CakePHP book, but is really quite useful in a situation where you need to use the fields in Episode_User, for example, if you needed to find all of the users that had watched a particular episode, but not downloaded it.
Also, it occurred to me, while reading your post, that you could possibly make your data model more simple by having a hasMany relationship between shows and episodes. An episode will never belong to more than one show, so your episodes table could just have another field, show_id, which related back to the show table, and you wouldn't even need the Episode_Show table.
So, not having come from a database design background, I've been tasked with designing a web app where the end user will be entering products, and specs for their products. Normally I think I would just create rows for each of the types of spec that they would be entering. Instead, they have a variety of products that don't share the same spec types, so my question is, what's the most efficient and future-proof way to organize this data? I was leaning towards pushing a serialized object into a generic "data" row, but then are you able to do full-text searches on this data? Any other avenues to explore?
split products and specifications into two tables like this:
products
id name
specifications
id name value product_id
get all the specifations of a product when you know the product id:
SELECT name,
value
FROM specifications
WHERE product_id = ?;
add a specification to a product when you know the product id, the specification's name and the value of said specification:
INSERT INTO specifications(
name,
value,
product_id
) VALUES(
?,
?,
?
);
so before you can add specifications to a product, this product must exist. also, you can't reuse specifications for several products. that would require a somewhat more complex solution :) namely...
three tables this time:
products
id name
specifications
id name value
products_specifications
product_id specification_id
get all the specifations of a product when you know the product id:
SELECT specifications.name,
specifications.value
FROM specifications
JOIN products_specifications
ON products_specifications.specification_id = specifications.id
WHERE products_specifications.product_id = ?;
now, adding a specification becomes a little bit more tricky, cause you have to check if that specification already exists. so this will be a little heavier than the first way of doing this, since there are more queries on the db, and there's more logic in the application.
first, find the id of the specification:
SELECT id
FROM specifications
WHERE name = ?
AND value = ?;
if no id is returned, this means that said specification doesn't exist, so it must be created:
INSERT INTO specifications(
name,
value
) VALUES(
?,
?
);
next, either use the id from the select query, or get the last insert id to find the id of the newly created specification. use that id together with the id of the product that's getting the new specification, and link the two together:
INSERT INTO products_specifications(
product_id,
specification_id
) VALUES(
?,
?
);
however, this means that you have to create one row for every specific specification. e.g. if you have size for shoes, there would be one row for every known shoe size
specifications
id name value
1 size 7
2 size 7½
3 size 8
and so on. i think this should be enough though.
You could take a look at using an EAV model.
I've never built a products database, but I can point you to a data model for that. It's one of over 200 models available for the taking, at Database Answers. Here is the model
If you don't like this one, you can find 15 different data models for Product oriented databases. Click on "Data Models" to get a list and scroll down to "Products".
You should pick up some good design ideas there.
This is a pretty common problem - and there are different solutions for different scenarios.
If the different types of product and their attributes are fixed and known at development time, you could look at the description in Craig Larman's book (http://www.amazon.com/Applying-UML-Patterns-Introduction-Object-Oriented/dp/0131489062/ref=sr_1_1/002-2801511-2159202?ie=UTF8&s=books&qid=1194351090&sr=1-1) - there's a section on object-relational mapping and how to handle inheritance.
This boils down to "put all the possible columns into one table", "create one table for each sub class" or "put all base class items into a common table, and put sub class data into their own tables".
This is by far the most natural way of working with a relational database - it allows you to create reports, use off-the-shelf tools for object relational mapping if that takes your fancy, and you can use standard concepts such as "not null", indexing etc.
Of course, if you don't know the data attributes at development time, you have to create a flexible database schema.
I've seen 3 general approaches.
The first is the one described by davogotland. I built a solution on similar lines for an ecommerce store; it worked great, and allowed us to be very flexible about the product database. It performed very well, even with half a million products.
Major drawbacks were creating retrieval queries - e.g. "find all products with a price under x, in category y, whose manufacturer is z". It was also tricky bringing in new developers - they had a fairly steep learning curve.
It also forced us to push a lot of relational concepts into the application layer. For instance, it was hard to create foreign keys to other tables (e.g. "manufacturer") and enforce them using standard SQL functionality.
The second approach I've seen is the one you mention - storing the variable data in some kind of serialized format. This is a pain when querying, and suffers from the same drawbacks with the relational model. Overall, I'd only want to use serialization for data you don't have to be able to query or reason about.
The final solution I've seen is to accept that the addition of new product types will always require some level of development effort - you have to build the UI, if nothing else. I've seen applications which use a scaffolding style approach to automatically generate the underlying database structures when a new product type is created.
This is a fairly major undertaking - only really suitable for major projects, though the use of ORM tools often helps.