How should I design the database structure for this problem?

I am rebuilding the background system of a site with a lot of traffic.
This is the core of the application and the way I build this part of the database is critical for a big chunk of code and upcoming work. The system described below will have to run millions of times each day. I would appreciate any input on the issue.
The background is that a user can add what he or she has been eating during the day.
Simplified, the process is more or less this:
The user arrives at the site, and the site lists his/her choices for the day (if they were entered earlier via the steps described below).
The user can add a meal (consisting of 1 to unlimited different items of food and their quantity). The meal is added through a search field and is organized in different types (like 'Breakfast', 'Lunch').
During the meal building process a list of the most commonly used food items (primarily by this user, but secondly also by all users) will be shown for quick selection.
The meals will be stored in a FoodLog table that consists of something like this: id, user_id, date, type, food_data.
What I currently have is a huge database with food items from which the search will be performed. The food items are stored with information on both the common name (like "pork cutlets") and on producer (like "coca cola"), along with other detailed information needed.
Question summary:
My problem is that I do not know the best way to store the data so that it is easily accessible in the way I need it, without the database getting out of hand.
Consider 1 million users adding 1 to 7 meals each day. Storing each food item for each meal, each day and each user would potentially create (1*avg_num_meals*avg_num_food_items) million rows each day.
Storing the data in some compressed way (for example, food_data as a json_encoded string) would reduce the number of rows significantly, but would also make it hard to create the 'most used food items' list and other statistics on the fly.
Should the table be split into several tables? If this is the case, how would they interact?
The site is currently hosted on a mid-range CDN and is using a LAMP (Linux, Apache, MySQL, PHP) backbone.

Roughly, you want a fully normalized data structure for this. You want to have one table for Users, one table for Meals (one entry per meal, with a reference to User; you probably also want to have a time / date of the meal in this table), and a table for MealItems, which is simply an association table between Meal and the Food Items table.
So when a User comes in and creates an account, you make an entry in the Users table. When a user reports a Meal they've eaten, you create a record in the Meals table, and a record in the MealItems table for every item they reported.
This structure makes it straightforward to have a variable number of items with every meal, without wasting a lot of space. You can determine how often each item appears in meals with a relatively simple query, and likewise the total set of items any one user has consumed in any given timespan.
This normalized table structure will support a VERY large number of records and support a large number of queries against the database.
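A minimal sketch of this layout, assuming Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server. The table and column names (users, food_items, meals, meal_items, eaten_on, quantity) and the sample rows are hypothetical, not the asker's actual schema. The query at the end is one way the "most used food items" list for a single user might be built.

```python
import sqlite3

# In-memory SQLite database used only for illustration (the question uses MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE food_items (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE meals (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    eaten_on TEXT NOT NULL,   -- date of the meal
    type TEXT NOT NULL        -- e.g. 'Breakfast', 'Lunch'
);
-- Association table: one row per food item in a meal.
CREATE TABLE meal_items (
    meal_id INTEGER NOT NULL REFERENCES meals(id),
    food_item_id INTEGER NOT NULL REFERENCES food_items(id),
    quantity REAL NOT NULL,
    PRIMARY KEY (meal_id, food_item_id)
);
""")

# Hypothetical sample data: one user, two breakfasts.
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.executemany("INSERT INTO food_items VALUES (?, ?)",
                 [(1, "oatmeal"), (2, "banana"), (3, "coffee")])
conn.executemany("INSERT INTO meals VALUES (?, ?, ?, ?)",
                 [(1, 1, "2024-01-01", "Breakfast"),
                  (2, 1, "2024-01-02", "Breakfast")])
conn.executemany("INSERT INTO meal_items VALUES (?, ?, ?)",
                 [(1, 1, 1.0), (1, 3, 2.0), (2, 3, 1.0), (2, 2, 1.0)])


def most_used_items(conn, user_id, limit=10):
    """Return (food name, number of meals it appeared in) for one user."""
    return conn.execute("""
        SELECT f.name, COUNT(*) AS uses
        FROM meal_items mi
        JOIN meals m ON m.id = mi.meal_id
        JOIN food_items f ON f.id = mi.food_item_id
        WHERE m.user_id = ?
        GROUP BY f.id, f.name
        ORDER BY uses DESC, f.name
        LIMIT ?
    """, (user_id, limit)).fetchall()


top_items = most_used_items(conn, 1)
print(top_items)
```

Dropping the WHERE clause would give the same ranking across all users, which is the secondary list the question mentions.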

First,
"Storing the data in some compressed way (like the food_data is an json_encoded string)"
is not a recommended idea. This will cause you countless headaches in the future as new requirements are added.
You should definitely have a few tables here.
Users: id, etc
Food Items: id, name, description, etc
Meals: id, user_id, category, etc
Meal Items: id, food_item_id, meal_id
The Meal Items would tie the Meals to the Food Items using ids. The Meals would be tied to Users using ids. This makes it simple to use joins in order to get detailed lists of data: totals, averages, etc. If the fields are properly indexed, this should be a great model to support a large number of records.

In addition to what's been said:
Be judicious in your use of indexes. Properly applying these to your database could significantly speed up read access to your tables.
Consider using database-specific features to minimize space. You mention that you're using MySQL; consider using ENUM when appropriate (food types, meal types) to minimize database size and to simplify management.

I would split up your meal table into two tables: one stores a single row for each meal, and the second stores one row for each food item used in a meal, with a foreign key reference to the meal it was used in.
After that, just make sure you have indices on any table columns used in joins or WHERE clauses.
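To illustrate the effect of such an index, here is a small sketch that again assumes Python's built-in sqlite3 module in place of MySQL. The meals table, its columns, and the index name idx_meals_user_date are hypothetical. SQLite's EXPLAIN QUERY PLAN is used to inspect the plan; in MySQL the comparable statement is EXPLAIN, whose output format differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE meals (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        eaten_on TEXT NOT NULL
    )
""")

# Typical lookup: one user's meals for one day.
query = "SELECT id FROM meals WHERE user_id = ? AND eaten_on = ?"


def plan_for(conn, sql, params):
    """Return the human-readable plan text (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[-1] for row in rows)


# Without an index on the WHERE columns the whole table is scanned.
plan_before = plan_for(conn, query, (1, "2024-01-01"))

# Composite index covering both columns used in the WHERE clause.
conn.execute("CREATE INDEX idx_meals_user_date ON meals (user_id, eaten_on)")
plan_after = plan_for(conn, query, (1, "2024-01-01"))

print(plan_before)
print(plan_after)
```

The second plan should name the index instead of reporting a full table scan. Each extra index also adds work to every insert, so it is worth indexing only the columns that queries actually filter or join on.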

Related

scaling php mysql shopping cart database

Although the topic seems quite trivial, I am still confused about what the best design could be. I am trying to build a PHP/MySQL-based website. The idea is to have user registrations, to store the items in the current shopping cart in the database, and also to store a record of purchased items per user. (I plan to serialize items, maybe as a comma-separated string of product IDs, and then deserialize upon retrieval from the DB.)
In the beginning it looks trivial, but let's imagine there are a million users.
Now I wonder how this design should scale.
My initial idea is to have a simple table with EMAIL - ID (autogenerated).
And then use this ID to generate a further table containing Name, Address, Phone, Password.
And another table with current items as well as items purchased in the past against that ID.
So based on this ID, I can query all these tables for quick reference.
Do you think this idea would scale well with, let's say, a million users accounting for 3-4 million tables? Or should I try to squeeze everything into a single table?

What is the best way to handle large recursive queries in mysql?

Using PHP & MySQL:
I have a list of 120,000 employees. Each has a supervisor field with the supervisor employee number.
I am looking to build something that shows the employees in a tree-like format. When you click on anyone, you should have an option to download all of the employees (with their info) that are under them.
So two questions: should I write my script to handle the query (which I have, but it is SLOW), or should I create some sort of helper table/view? I am looking for the best practice behind this.
Also I am sure this has been done a million times. Is there a good class that handles organization hierarchy?

The standard way of doing this is to use one table to store all of the employees, with a primary key field for the employee_id, and a field for supervisor_id which is a 'self join', meaning that the value in this field points back to the employee id of this employee's supervisor. As far as displaying the employee tree goes: for relatively small trees, the entire tree structure can be sent to the client's browser when the page is created, and tree nodes can be displayed from the stored data as the nodes are clicked. For larger trees, it is better to fetch the data as needed, i.e. when the nodes are clicked. If you have 120,000 employees, then you might want to use the latter approach.
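A minimal sketch of the self-referencing table and the fetch-on-click lookup, assuming Python's built-in sqlite3 module instead of MySQL and a handful of made-up employees. The recursive query for the "download everyone under this person" case is not part of the answer above; it is one option, available in SQLite and in MySQL 8.0 and later, and the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    employee_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    -- Self-reference: points at the supervisor's employee_id (NULL at the top).
    supervisor_id INTEGER REFERENCES employees(employee_id)
);
CREATE INDEX idx_employees_supervisor ON employees (supervisor_id);
""")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", [
    (1, "Ann", None), (2, "Bob", 1), (3, "Cy", 1), (4, "Dee", 2), (5, "Eli", 4),
])


def direct_reports(conn, employee_id):
    """Children of one tree node, fetched only when that node is clicked."""
    return conn.execute(
        "SELECT employee_id, name FROM employees "
        "WHERE supervisor_id = ? ORDER BY employee_id",
        (employee_id,),
    ).fetchall()


def all_subordinates(conn, employee_id):
    """Everyone below an employee at any depth, via a recursive CTE."""
    return conn.execute("""
        WITH RECURSIVE subordinates(employee_id, name) AS (
            SELECT employee_id, name FROM employees WHERE supervisor_id = ?
            UNION ALL
            SELECT e.employee_id, e.name
            FROM employees e
            JOIN subordinates s ON e.supervisor_id = s.employee_id
        )
        SELECT employee_id, name FROM subordinates ORDER BY employee_id
    """, (employee_id,)).fetchall()


print(direct_reports(conn, 1))
print(all_subordinates(conn, 2))
```

The index on supervisor_id is what keeps each per-click lookup cheap on a table of 120,000 rows.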

MySQL database structure for infinite items per user

I have a MySQL database with a growing number of users. Each user has a list of items they want and a list of items they have, and each user has a specific ID.
The current database was created some time ago. Each user has one row in a WANT or HAVE table, with 50 columns per row and the user ID as the primary key, and each WANT or HAVE item has a specific ID number.
This currently limits each user to 50 items and greatly complicates searches and other functions with the database.
When redoing the database, would it be viable to instead simply create two-column WANT and HAVE tables, with each row holding the user ID and the item ID? That way there is no 'theoretical' limit to items per user.
Each time a member loads the profile page, a list of their want and have items would then be compiled using a simple SELECT WHERE ID = ##### statement from the HAVE or WANT table.
Furthermore, I would need to make comparisons of user-to-user item lists, find the most common items and the user with the most items, run complete user searches for items that one user wants and another user has, and so on.
The number of users will range from 5,000 to 20,000, and each user averages about 15 to 20 items.
Will this be a viable MySQL structure, or do I have to rethink my strategy?
Thanks a lot for your help!

This will certainly be a viable structure in MySQL. It can handle very large amounts of data. When you build it though, make sure that you put proper indexes on the user/item IDs so that the queries return quickly.
This is called a one to many relationship in database terms.
Table1 holds:
userName | ID
Table2 holds:
userID | ItemID
You simply put as many rows into the second table as you want.
In your case, I would probably structure the tables as this:
users
id | userName | otherFieldsAsNeeded
items
userID | itemID | needWantID
This way, you can either have a simple lookup for needWantID - for example 1 for Need, 2 for Want. But later down the track, you can add 3 for wishlist for example.
Edit: just make sure that you aren't storing your item information in table items; just store the user's relationship to the item there. Keep all the item information in a separate table (itemDetails, for example) which holds your descriptions, prices and whatever else you want.
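A rough sketch of this three-table layout, assuming Python's built-in sqlite3 module in place of MySQL. The names user_items, item_details and list_type, the codes 1 = want and 2 = have, and the sample rows are all assumptions made for illustration. The query shows one of the comparisons the question asks about: finding other users who have something a given user wants.

```python
import sqlite3

WANT, HAVE = 1, 2  # assumed lookup codes for the list type

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT NOT NULL);
CREATE TABLE item_details (id INTEGER PRIMARY KEY, description TEXT NOT NULL);
-- Only the user-to-item relationship lives here, one row per pairing.
CREATE TABLE user_items (
    user_id INTEGER NOT NULL REFERENCES users(id),
    item_id INTEGER NOT NULL REFERENCES item_details(id),
    list_type INTEGER NOT NULL,
    PRIMARY KEY (user_id, item_id, list_type)
);
CREATE INDEX idx_user_items_item ON user_items (item_id, list_type);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO item_details VALUES (?, ?)",
                 [(10, "red stamp"), (11, "blue stamp")])
conn.executemany("INSERT INTO user_items VALUES (?, ?, ?)", [
    (1, 10, WANT), (1, 11, WANT), (2, 10, HAVE), (3, 11, HAVE), (3, 10, WANT),
])


def who_has_what_user_wants(conn, user_id):
    """Return (owner name, item description) for every wanted item someone else has."""
    return conn.execute("""
        SELECT u.user_name, d.description
        FROM user_items w
        JOIN user_items h
          ON h.item_id = w.item_id
         AND h.list_type = ?
         AND h.user_id <> w.user_id
        JOIN users u ON u.id = h.user_id
        JOIN item_details d ON d.id = w.item_id
        WHERE w.user_id = ? AND w.list_type = ?
        ORDER BY u.user_name, d.description
    """, (HAVE, user_id, WANT)).fetchall()


matches = who_has_what_user_wants(conn, 1)
print(matches)
```

Because each want or have is its own row, there is no fixed cap on items per user, and adding a third list type later only needs a new code.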

I would recommend 2 tables, a Wants table and a Have table. Each table would have a user_id and product_id. I think this is the most normalized and gives you "unlimited" items per user.
Or, you could have one table with a user_id, product_id, and type ('WANT' or 'HAVE'). I would probably go with option 1.

As you mentioned in your question, yes, it would make much more sense to have separate tables for WANTs and HAVEs. These tables could have an Id column which would relate the row to the user, and a column that identifies the WANT or HAVE item. This method would allow for much more room to expand.
It should be noted that if you have a lot of these rows, you may need to increase the capacity of your server in order to maintain quick queries. If you have millions of rows, they will put a great deal of strain on the server (depending on your setup).

What you're theorizing is a very legitimate database structure. For a many-to-many relationship (which is what you want), the only way I've seen this done is to, like you say, have a relationships table with user_id and item_id as the columns. You could expand on it, but that's the basic idea.
This design is much more flexible and allows for the infinite items per user that you want.
In order to handle wants and haves, you could create two tables, or you could use just one and add a third column holding a single byte that indicates whether the user/item match is a want or a have. Depending on the specifics of your project, either would be a viable option.
So, what you would end up with is at least the following tables:
Table: users
Cols:
user_id
any other user info
Table: relationships
Cols:
user_id
item_id
type (1 byte/boolean)
Table: items
Cols:
item_id
any other item info
Hope that helps!

PHP/MYSQL store variables in array or separate fields

Premature optimization is the root of all evil...but...
I am allowing users to input data within categories, such as favorite players, favorite teams, etc. They can then use these choices to filter results. I let them input lists separated by commas, so after exploding the data I have it in an array. The question is how to store it.
Method 1: I could create a table of users, one row per user, with the categories, as in players, teams as fields and save the choices of each users as an array in the respective field. (userid would link to basic users table.)
Method 2. Or I could create separate tables for each thing, players, teams, etc, and have a fixed number of fields say 10, break up the array into each individual value, store and place it in its own field. (Already have this code working.) (Again userid is primary key.)
The advantage of Method 1 is it's a bit simpler, one table, no limit on number of choices.
Method 2 seems a bit more robust. The data is more visible and possibly easier to get and retrieve--although maybe not.
Does anyone have experience with this sort of thing and could recommend one over another?
Thanks for any recommendations, suggestions!

Schema design question

I have an industry lookup table: ID, Name.
I have other industry properties such as Industry sector, Industry service, Industry products, etc. These are all required properties for each industry so any industry being entered will have these data. These data are fixed list items like Industry sector = (Primary, Secondary, tertiary). On site, these values will be either auto-suggest or single select drop down list values. Also these will be used as search filters to further filter industries on site. And these will be used for reporting like -> Displaying count of companies belonging to primary sector industries only from people you are friends with.
For schema I see two ways it can be designed:
1) Industry lookup table will have all these additional data as text
2) The additional data will be stored as IDs which FK reference to their lookup tables.
3) Open to other design ideas too.
Issue with #1 is there will be no enforcement of data quality.
Issue with #2 is there are many, many fixed list items, so each having its own lookup table means there will be tons of lookup tables and FKs for the parent tables.
I am not sure in the real world of large scale systems how this is done. Industry is only one entity; I have many entities and each has at least 40-50 fixed list items (columns), so which way is better? For more info, this is a user content website - professional networking website so performance is important.
Suggestions?

Go with Option 2: If you measure a performance impact of doing many joins, first make sure you have the right indexes for the query workload, and then if join performance is still an issue, possibly denormalise.

"These are all required properties for each industry so any industry being entered will have these data"
But will this always be the case? Meaning, may properties be removed or added? This is a likely occurrence in most applications which means you'll be deleting/adding columns to make that happen. This should make you at least consider making these rows, rather than columns.
So I suggest #3:
Rather than having look up tables for each property, have only one. That way you have four tables total:
industries (id, name)
industry_property_names (id, name) // Contains the name of the property, e.g., Industry sector
industry_property_values (id, industry_property_name_id, name) // Primary, Secondary, tertiary
industry_properties ( (pk: industry_id, industry_property_name_id), industry_property_value_id)
Requires some code enforcement during data entry but properties will be dynamic and look ups will be relatively fast.
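The four tables above could be sketched as follows, assuming Python's built-in sqlite3 module rather than MySQL. The sample rows, the second property ('Industry service' / 'Consulting'), and the helper names set_property and industries_with are hypothetical. The helper shows one possible form of the code enforcement mentioned above: it rejects a value that belongs to a different property.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE industries (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE industry_property_names (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE industry_property_values (
    id INTEGER PRIMARY KEY,
    industry_property_name_id INTEGER NOT NULL
        REFERENCES industry_property_names(id),
    name TEXT NOT NULL
);
CREATE TABLE industry_properties (
    industry_id INTEGER NOT NULL REFERENCES industries(id),
    industry_property_name_id INTEGER NOT NULL
        REFERENCES industry_property_names(id),
    industry_property_value_id INTEGER NOT NULL
        REFERENCES industry_property_values(id),
    PRIMARY KEY (industry_id, industry_property_name_id)
);
""")
conn.executemany("INSERT INTO industries VALUES (?, ?)",
                 [(1, "Mining"), (2, "Banking")])
conn.executemany("INSERT INTO industry_property_names VALUES (?, ?)",
                 [(1, "Industry sector"), (2, "Industry service")])
conn.executemany("INSERT INTO industry_property_values VALUES (?, ?, ?)", [
    (1, 1, "Primary"), (2, 1, "Secondary"), (3, 1, "Tertiary"), (4, 2, "Consulting"),
])


def set_property(conn, industry_id, property_name_id, value_id):
    """Assign a value to a property, checking that the value belongs to it."""
    ok = conn.execute(
        "SELECT 1 FROM industry_property_values "
        "WHERE id = ? AND industry_property_name_id = ?",
        (value_id, property_name_id),
    ).fetchone()
    if ok is None:
        raise ValueError("value does not belong to this property")
    # INSERT OR REPLACE is SQLite syntax; MySQL offers REPLACE INTO instead.
    conn.execute("INSERT OR REPLACE INTO industry_properties VALUES (?, ?, ?)",
                 (industry_id, property_name_id, value_id))


def industries_with(conn, property_name, value_name):
    """Names of industries whose given property has the given value."""
    return conn.execute("""
        SELECT i.name
        FROM industry_properties p
        JOIN industries i ON i.id = p.industry_id
        JOIN industry_property_names n ON n.id = p.industry_property_name_id
        JOIN industry_property_values v ON v.id = p.industry_property_value_id
        WHERE n.name = ? AND v.name = ?
        ORDER BY i.name
    """, (property_name, value_name)).fetchall()


set_property(conn, 1, 1, 1)  # Mining  -> Industry sector = Primary
set_property(conn, 2, 1, 3)  # Banking -> Industry sector = Tertiary
print(industries_with(conn, "Industry sector", "Primary"))
```

Adding a new property here means inserting rows into industry_property_names and industry_property_values rather than altering a table.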
