Multiple Trees with Hierarchical Data (Left and Right values) - php

What issues are associated with maintaining multiple trees within a single table?
The motivation for having multiple trees is to avoid excessive updates to all nodes when inserting a node at the start. Each of the trees is a completely separate entity.
Example Table:
tree_id | id | lft | rgt | parent_id | various fields . . .
---------------------------------------------------------------------
1 | 1 | 1 | 4 | NULL | ...
1 | 2 | 2 | 3 | 1 | ...
2 | 3 | 1 | 4 | NULL | ...
2 | 4 | 2 | 3 | 3 | ...

It's very common to store multiple trees in one table; you just have to make sure the values that comprise each tree are stored correctly, otherwise you'll get data integrity issues such as nonsensical tree constructions.
Suppose we have a binary tree (like the one in your example). A full binary tree of depth n has 2^n - 1 nodes, so at depth 5 there would be 2^5 - 1 = 31 rows in the database, which is trivial. Even at depth 10 that is only 1,023 rows, though it would be a rather ginormous tree. With X trees in the table you get X(2^n - 1) rows, which isn't bad: a hundred depth-10 trees come to roughly 100k rows, which is relatively small.
Additionally, suppose every new tree were stored in its own table. The database would very quickly fill up with tables to match the number of trees that exist. Creating extra tables that aren't needed doesn't seem like a good idea, and it adds needless complexity on the code side to access them all.
Looking at your table in detail, it doesn't quite look right in terms of columns, but I'm sure that table example is just something thrown up quickly to show us what you mean.
tree_id, node_id, left_node_id, right_node_id, various_fields...
Um, be sure to index those _id fields.
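To make the point about separate trees concrete, here is a minimal sketch of a nested-set insert that is scoped by tree_id, so only rows of the affected tree are renumbered. It is written in Python with the standard sqlite3 module standing in for MySQL; the table name nodes, the index name, and the helper add_child are assumptions made for illustration, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE nodes (
        id        INTEGER PRIMARY KEY,
        tree_id   INTEGER NOT NULL,
        lft       INTEGER NOT NULL,
        rgt       INTEGER NOT NULL,
        parent_id INTEGER
    )
""")
# Composite index so lookups and shifts stay inside one tree.
conn.execute("CREATE INDEX idx_nodes_tree_lft ON nodes (tree_id, lft)")
conn.executemany(
    "INSERT INTO nodes (id, tree_id, lft, rgt, parent_id) VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 1, 4, None), (2, 1, 2, 3, 1), (3, 2, 1, 4, None), (4, 2, 2, 3, 3)],
)


def add_child(conn, parent_id):
    """Insert a new last child under parent_id (hypothetical helper).

    Only rows with the parent's tree_id are shifted; other trees are untouched.
    """
    tree_id, parent_rgt = conn.execute(
        "SELECT tree_id, rgt FROM nodes WHERE id = ?", (parent_id,)
    ).fetchone()
    with conn:  # commit all three statements together
        conn.execute(
            "UPDATE nodes SET rgt = rgt + 2 WHERE tree_id = ? AND rgt >= ?",
            (tree_id, parent_rgt),
        )
        conn.execute(
            "UPDATE nodes SET lft = lft + 2 WHERE tree_id = ? AND lft > ?",
            (tree_id, parent_rgt),
        )
        cur = conn.execute(
            "INSERT INTO nodes (tree_id, lft, rgt, parent_id) VALUES (?, ?, ?, ?)",
            (tree_id, parent_rgt, parent_rgt + 1, parent_id),
        )
    return cur.lastrowid


new_id = add_child(conn, 1)
rows = conn.execute(
    "SELECT tree_id, id, lft, rgt FROM nodes ORDER BY tree_id, lft"
).fetchall()
```

After the call, tree 1 has been renumbered to make room for the new node while the lft and rgt values of tree 2 are unchanged, which is the benefit of keeping each tree's numbering independent.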

Related

Recursive tree in laravel with totals

I'm thoroughly having an issue coming up with a solution to create a recursive hierarchy in mysql while summing the results as I go. Here's the quick structure to keep it simple.
----------------------
id | name | parent_id
----------------------
1 | A | 0
2 | B | 1
3 | C | 1
4 | D | 2
5 | E | 2
6 | F | 3
7 | G | 3
I can recursively create this menu successfully as a php loop or in mysql:
A
-B
--D
--E
-C
--F
--G
However, these IDs reference another table (contacts), and these are types of contacts. The issue is that only the leaves are assigned to the contacts, but I need to roll up the totals to each level. So I can get to:
A=0
-B=0
--D=100
--E=100
-C=0
--F=200
--G=200
But what I need is to roll up each subsection and sum that into the parent (without a lot of queries). In reality, this tree is several hundred elements in length. This is just a simplified version, but I can't figure out how to walk back up and end up with:
A=600
-B=200
--D=100
--E=100
-C=400
--F=200
--G=200
I'd be happy with a MySQL or PHP implementation. Really just anything to get me headed in the right direction would be much appreciated.
If the number of elements in the tree is small enough, you can use ranged IDs.
For example, you can say the topmost parent's id range is 100000-199999, its first child's range is 100000-109999, its second child's 110000-119999, and so on. So for each node you know its children's ids will fall within a certain range.
When you want the count for a particular node, you just check whether an id falls in that node's range. I hope this helps.
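For comparison, and not as the ranged-ID approach described above, the roll-up can also be done in application code after a single query that fetches every row. The following is a minimal Python sketch under that assumption; the function name rollup and the hard-coded leaf totals are illustrative only and mirror the example numbers from the question.

```python
from collections import defaultdict

# (id, name, parent_id) rows, as in the question's table; 0 marks the root's parent.
rows = [
    (1, "A", 0), (2, "B", 1), (3, "C", 1),
    (4, "D", 2), (5, "E", 2), (6, "F", 3), (7, "G", 3),
]
# Totals attached to leaf categories only (taken from the question's example).
leaf_totals = {4: 100, 5: 100, 6: 200, 7: 200}


def rollup(rows, leaf_totals):
    """Return {node_id: own total + totals of all descendants}."""
    children = defaultdict(list)
    for node_id, _name, parent_id in rows:
        children[parent_id].append(node_id)

    totals = {}

    def visit(node_id):
        # Post-order: a node's total is known once all its children are summed.
        total = leaf_totals.get(node_id, 0)
        total += sum(visit(child) for child in children.get(node_id, []))
        totals[node_id] = total
        return total

    for root_id in children.get(0, []):
        visit(root_id)
    return totals


totals = rollup(rows, leaf_totals)
names = {node_id: name for node_id, name, _parent in rows}
by_name = {names[node_id]: total for node_id, total in totals.items()}
```

Each node is visited once, so a tree of several hundred elements needs one query for the rows, one for the leaf totals, and a single pass in memory.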

mySQL table organisation for multi-dimensional categorisation

Existing System:
I have a mySQL database that stores category related information for approximately 200 different unique users. The information being stored and retrieved for each user is in the hierarchy of
imageCategories
> Parent Category 1
> Child Category 1 : "45,19,3,4,8"
> Child Category 2 : "17,1,99"
> ... etc
> Parent Category 2
> Child Category 1 : "83,6"
> Child Category 2 : "19,74,26"
... etc
> etc
The string value of each child category is a series of comma-separated ids which reference descriptions (on a separate table) stored under that child category. I store all of this as an array in a column for each user by means of a json_encoded string in the form of:
{"Parent Category 1":{"Child Category 1":["45,19,3,4,8"],"Child Category 2":["17,1,99"]},"Parent Category 2":{"Child Category 1":["83,6"],"Child Category 2":["19,74,26"]}}
The system works by retrieving this json string when a user logs in and decoding it to a session array. Whenever any changes are made to it, it's re-encoded to a json string, saved to the database, and the session array is updated to reflect this. This works fine. While my research way back when led me to do this, I was never quite sure whether storing a multi-dimensional array in mySQL is good practice. What I do know is that this keeps organising it quite stress-free and I haven't noticed it causing a lot of overhead, which is not to say that it doesn't.
The conundrum:
What I want to do now is add a string description to each Child Category in the database. Potentially to each Parent Category later but baby steps first.
I was initially going to start a third dimension for the overall array. Instead of:
"Child Category Key" : "id string"
I would change it to:
"Child Category Key" : ["id string", "description string"]
or:
"Child Category Key" : ["id string", id for description on another table]
I don't see an issue with either, but I'm wondering if I'm veering way off best practices. Should I be creating a new table for the entire category structure, rather than storing all of it as a json string in a column with other user settings (it's never going to get too unwieldy in terms of character length)? The current structure is quite easy to get my head around and I wouldn't necessarily jump to a solution that would provide minimal overhead benefits if its structure makes managing the database unnecessarily complicated (keep in mind some of us aren't naturals at this and our brains process this kinda structure a little slower than others).
Design Requirements:
I may miss out on describing specifics needed as I'm unsure what the most pertinent information is from what's relevant. I can elaborate where needed. What seems the most important design requirement is that each user has unique category keys and values. They can only be in the form of parent > child > csv of ids but each user will have custom key titles and a different number of each. The order of each is also essential.
I'm currently running on a server with ssd disk, 1gb of memory and a single 2ghz core from an Intel hexcore. Requests to the database are primarily retrieving the categories on both a front and backend. The majority use little traffic so nothing has been too taxing apart from occasional spikes. I will upgrade when I see a bottleneck approaching. Just trying to use what I have as efficiently as possible at the moment and keep best practices in play.
Database Structure:
Right now my table structure is in the form of (omitting other columns not relevant to the question):
Table usersettings:
+-----+----------------------+-----+
| id | imageCategories | ... |
+-----+----------------------+-----+
| 1 | {"Parent Category... | ... |
| 2 | {"Parent Category... | ... |
| 3 | {"Parent Category... | ... |
| ... | | |
+-----+----------------------+-----+
Table users:
+-----+----------------------+---------+--------+
| id | username | cluster | server |
+-----+----------------------+---------+--------+
| 1 | johndoe | 1 | 1 |
| 2 | katedoe | 1 | 1 |
| 3 | ellendoe | 1 | 1 |
| ... | | | |
+-----+----------------------+---------+--------+
Table descriptions_0001:
+-----+---------+---------------+-----+
| id | title | descriptions | ... |
+-----+---------+---------------+-----+
| 11 | Title 1 | Description 1 | ... |
| 56 | Title 2 | Description 2 | ... |
| 78 | Title 3 | Description 3 | ... |
| ... | | | |
+-----+---------+---------------+-----+
There is a row in users for every usersettings entry, with matching ids, so a user's username etc. can always be referenced from usersettings by its id number. Currently I only have one database, but in an attempt to future-proof it to some degree I store descriptions in a table with an index in its name, and each user has a cluster number value as well as a server number value. Each user has, on average, about 100 description rows, so this comes to 20,000 rows at the moment. When this creates a bottleneck I'll start a descriptions table 0002, and later a second server should it be needed. Perhaps I'm naive in my workflow but it seems like it should help.
Summary:
So in summary, should I adapt my categories array to store a string description for child categories by:
1. Making the child categories key have an array value rather than the current string value that contains the current string value and an additional string description.
2. Like 1 but make the string description an id number that references a string on a new table.
3. Look at not using a json encoded array at all and move the entire category structure into its own table.
4. Create a table for parent categories, one for child categories and one for the csv contents. Include a description column (per the conundrum above) and an order column (essential, per the design requirements above) in each - or is there a better method of storing order than retrieving and updating the order column for each relevant row when the table will contain unique category information for multiple users? It sounds like it may require a lot of overhead.
I ended up going for a solution somewhat similar to (4). I also better appreciate the importance of describing the design requirements now as what led me to this decision was the realisation that it was more efficient in processing (I believe?) and simpler to comprehend working with select levels of a hierarchy at a time.
For example, If I'm dealing with all descriptions under parent category 2, child category 1, I just fetch or insert all descriptions in a description table with a shared identifier, rather than dealing with a multidimensional array that contains all hierarchies. The latter made organising users in the db easier but the categorisation was becoming large enough that I decided it did warrant separate tables for each level of the hierarchy. There's enough situations where I'm working with only an isolated level of the categorisation hierarchy that putting the entire categorisation into a single md array felt like the poorer choice.
In terms of overhead difference, I'm unsure for now. There's less sorting of arrays happening in php to isolate data I need but there's far more calls to the db.
My hesitation in understanding the design requirements (and still not giving a thorough answer on this) is that I'm new to large user databases and am not good at forecasting the needs. I'm designing it in such a way that it feels scalable to me and so, again, the table for each level of the hierarchy feels the least cumbersome (after the cumbersome set-up - I'm currently redoing tonnes of code to make functions work with the new set-up) and more scalable as needs change.
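The one-table-per-level layout described above might look roughly like the sketch below. It uses Python's sqlite3 module in place of MySQL, and every table and column name (parent_categories, child_categories, child_category_items, sort_order) is an assumption chosen for illustration rather than the schema actually adopted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent_categories (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        title      TEXT    NOT NULL,
        sort_order INTEGER NOT NULL
    );
    CREATE TABLE child_categories (
        id          INTEGER PRIMARY KEY,
        parent_id   INTEGER NOT NULL REFERENCES parent_categories(id),
        title       TEXT    NOT NULL,
        description TEXT,              -- the new per-child description
        sort_order  INTEGER NOT NULL
    );
    CREATE TABLE child_category_items (
        child_id       INTEGER NOT NULL REFERENCES child_categories(id),
        description_id INTEGER NOT NULL,  -- replaces one entry of the csv string
        sort_order     INTEGER NOT NULL,
        PRIMARY KEY (child_id, description_id)
    );
""")

# Example data: user 1, "Parent Category 2" > "Child Category 1" : "83,6"
conn.execute("INSERT INTO parent_categories VALUES (1, 1, 'Parent Category 2', 1)")
conn.execute(
    "INSERT INTO child_categories VALUES (1, 1, 'Child Category 1', 'Example description', 1)"
)
conn.executemany(
    "INSERT INTO child_category_items VALUES (?, ?, ?)", [(1, 83, 1), (1, 6, 2)]
)

# Fetch one isolated level of the hierarchy, in its stored order.
item_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT i.description_id
        FROM child_category_items AS i
        JOIN child_categories  AS c ON c.id = i.child_id
        JOIN parent_categories AS p ON p.id = c.parent_id
        WHERE p.user_id = ? AND p.title = ? AND c.title = ?
        ORDER BY i.sort_order
        """,
        (1, "Parent Category 2", "Child Category 1"),
    )
]
```

Because sort_order is only unique within one parent or child, reordering touches the handful of sibling rows for that user rather than the whole table.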

"horizontal" vs. "vertical" table design, SQL

Apologies if this has been covered thoroughly in the past - I've seen some related posts but haven't found anything that satisfies me with regards to this specific scenario.
I've been recently looking over a relatively simple game with around 10k players. In the game you can catch and breed pets that have certain attributes (i.e. wings, horns, manes). There's currently a table in the database that looks something like this:
-------------------------------------------------------------------------------
| pet_id | wings1 | wings1_hex | wings2 | wings2_hex | horns1 | horns1_hex | ...
-------------------------------------------------------------------------------
| 1 | 1 | ffffff | NULL | NULL | 2 | 000000 | ...
| 2 | NULL | NULL | NULL | NULL | NULL | NULL | ...
| 3 | 2 | ff0000 | 1 | ffffff | 3 | 00ff00 | ...
| 4 | NULL | NULL | NULL | NULL | 1 | 0000ff | ...
etc...
The table goes on like that and currently has 100+ columns, but in general a single pet will only have around 1-8 of these attributes. A new attribute is added every 1-2 months which requires table columns to be added. The table is rarely updated and read frequently.
I've been proposing that we move to a more vertical design scheme for better flexibility as we want to start adding larger volumes of attributes in the future, i.e.:
----------------------------------------------------------------
| pet_id | attribute_id | attribute_color | attribute_position |
----------------------------------------------------------------
| 1 | 1 | ffffff | 1 |
| 1 | 3 | 000000 | 2 |
| 3 | 2 | ffffff | 1 |
| 3 | 1 | ff0000 | 2 |
| 3 | 3 | 00ff00 | 3 |
| 4 | 3 | 0000ff | 1 |
etc...
The old developer has raised concerns that this will create performance issues as users very frequently search for pets with specific attributes (i.e. must have these attributes, must have at least one in this colour or position, must have > 30 attributes). Currently the search is quite fast as there are no JOINS required, but introducing a vertical table would presumably mean an additional join for every attribute searched and would also triple the number of rows or so.
The first part of my question is if anyone has any recommendations with regards to this? I'm not particularly experienced with database design or optimisation.
I've run tests for a variety of cases but they've been largely inconclusive - the times vary quite significantly for all of the queries that I ran (i.e. between half a second and 20+ seconds), so I suppose the second part of my question is whether there's a more reliable way of profiling query times than using microtime(true) in PHP.
Thanks.
This is called the Entity-Attribute-Value (EAV) model, and relational database systems are really not suited for it at all.
To quote someone who deems it one of the five errors not to make:
So what are the benefits that are touted for EAV? Well, there are none. Since EAV tables will contain any kind of data, we have to PIVOT the data to a tabular representation, with appropriate columns, in order to make it useful. In many cases, there is middleware or client-side software that does this behind the scenes, thereby providing the illusion to the user that they are dealing with well-designed data.
EAV models have a host of problems.
Firstly, the massive amount of data is, in itself, essentially unmanageable.
Secondly, there is no possible way to define the necessary constraints -- any potential check constraints will have to include extensive hard-coding for appropriate attribute names. Since a single column holds all possible values, the datatype is usually VARCHAR(n).
Thirdly, don't even think about having any useful foreign keys.
Finally, there is the complexity and awkwardness of queries. Some folks consider it a benefit to be able to jam a variety of data into a single table when necessary -- they call it "scalable". In reality, since EAV mixes up data with metadata, it is lot more difficult to manipulate data even for simple requirements.
The solution to the EAV nightmare is simple: Analyze and research the users' needs and identify the data requirements up-front. A relational database maintains the integrity and consistency of data. It is virtually impossible to make a case for designing such a database without well-defined requirements. Period.
The table goes on like that and currently has 100+ columns, but in general a single pet will only have around 1-8 of these attributes.
That looks like a case for normalization: Break the table into multiple, for example one for horns, one for wings, all connected by foreign key to the main entity table. But do make sure that every attribute still maps to one or more columns, so that you can define constraints, data types, indexes, and so on.
Do the join. The database was specifically designed to support joins for your use case. If there is any doubt, then benchmark.
EDIT: A better way to profile the queries is to run them directly in the mysql command-line client, which reports how long each query took to run. Timing with PHP's microtime() also picks up other latencies (Apache, PHP, server resource allocation, the network if connecting to a remote MySQL instance, etc.).
What you are proposing is called 'normalization'. This is exactly what relational databases were made for - if you take care of your indexes, the joins will run almost as fast as if the data were in one table.
Actually, they might even go faster: instead of loading 1 table row with 100 columns, you can just load the columns you need. If a pet only has 8 attributes, you only load those 8.
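On the concern that a vertical table needs one extra join per searched attribute: a "must have all of these attributes" search can usually be written with GROUP BY and HAVING instead. The sketch below uses Python's sqlite3 module in place of MySQL with the rows from the proposed vertical table; the table name pet_attributes and both helper functions are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pet_attributes (
        pet_id             INTEGER NOT NULL,
        attribute_id       INTEGER NOT NULL,
        attribute_color    TEXT    NOT NULL,
        attribute_position INTEGER NOT NULL,
        PRIMARY KEY (pet_id, attribute_id)
    )
""")
# Index to support "which pets have attribute X" lookups.
conn.execute("CREATE INDEX idx_attr_pet ON pet_attributes (attribute_id, pet_id)")
conn.executemany(
    "INSERT INTO pet_attributes VALUES (?, ?, ?, ?)",
    [
        (1, 1, "ffffff", 1), (1, 3, "000000", 2),
        (3, 2, "ffffff", 1), (3, 1, "ff0000", 2), (3, 3, "00ff00", 3),
        (4, 3, "0000ff", 1),
    ],
)


def pets_with_all(conn, attribute_ids):
    """Pets that have every attribute in attribute_ids, without a join per attribute."""
    placeholders = ",".join("?" * len(attribute_ids))
    sql = f"""
        SELECT pet_id
        FROM pet_attributes
        WHERE attribute_id IN ({placeholders})
        GROUP BY pet_id
        HAVING COUNT(DISTINCT attribute_id) = ?
        ORDER BY pet_id
    """
    return [row[0] for row in conn.execute(sql, (*attribute_ids, len(attribute_ids)))]


def pets_with_more_than(conn, minimum):
    """Pets that have more than `minimum` attributes in total."""
    sql = """
        SELECT pet_id FROM pet_attributes
        GROUP BY pet_id HAVING COUNT(*) > ? ORDER BY pet_id
    """
    return [row[0] for row in conn.execute(sql, (minimum,))]


both = pets_with_all(conn, [1, 3])
many = pets_with_more_than(conn, 2)
```

Whether this beats the wide table in practice still depends on the data and indexes, so it is worth benchmarking both against realistic volumes.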
This question is very subjective. If you have the resources to update the middleware to reflect each column that is added then, by all means, go with horizontal; there is nothing safer and easier to learn than a fixed structure. One thing to remember: any time you update a table's structure you have to update each one of its dependencies, unless there is some catch-all like *, which I suggest you stay away from unless you are just dumping data to a screen and the order of columns is irrelevant.
With that said, vertical is the way to go if you don't have all of your requirements in place or don't have the desire to update code in n number of areas. Most of the time you just need storage containers to store data. I would segregate things like numbers, dates, binary, and text in separate columns to preserve some data integrity, but there is nothing wrong with vertical storage, as long as you know how to formulate and structure queries to bring back the data in the appropriate format.
FYI, WordPress uses vertical data storage for the majority of the dynamic content it has to store for the millions of users it has.
The first thing, from a database point of view, is that your data should grow vertically, not horizontally. So adding a new column is not a good design at all. Second, this is a very common scenario in DB design, and the way to solve it is to create three tables: the first for pets, the second for attributes, and the third a mapping table between those two. Here is an example:
Table 1 (Pet)
Pet_ID | Pet_Name
1 | Dog
2 | Cat
Table 2 (Attribute)
Attribute_ID | Attribute_Name
1 | Wings
2 | Eyes
Table 3 (Pet_Attribute)
Pet_ID | Attribute_ID | Attribute_Value
1 | 1 | 0
1 | 2 | 2
About Performance:
Pet_ID and Attribute_ID are the primary keys, which are indexed (http://developer.mimer.com/documentation/html_92/Mimer_SQL_Engine_DocSet/Basic_concepts4.html), so the search is very fast. And this is the right way to solve the problem. I hope it is clear to you now.

comparing values between different rows of database and getting maximum count

I have a table in which each row contains the following data. I need to compare the values across rows and show which value has the highest count. For example, my table has the following fruit names, so I need to compare these fruits across rows and show the fruit with the highest count first.
s.no | field1 |
1 |apple,orange,pineapple |
2 |apple,pineapple,strawberry,grapes|
3 |apple,grapes, |
4 |orange,mango |
i.e. apple comes first, grapes second, pineapple third and so on. The data is entered dynamically, so whatever values are entered need to be compared with each other to get the maximum count.
Great question.
This is a classical bad outcome of not having the data normalized.
I recommend that you read about Database Normalization, normalize your tables, and then see how easy it is to do this with simple SQL queries.
If you need to run queries on the column field1, then why not consider normalization? Otherwise it might keep getting more complex and dirty in the future.
Your current table will look like this (for serial number 1 only); Pk can be an auto-increment primary key.
Pk | s.no |fruitId|
1 | 1 |1 |
2 | 1 |2 |
3 | 1 |3 |
Your New Table of Fruits
PK |fruitName |
1 |Apple |
2 |Orange |
3 |Pineapple |
This also helps you to avoid redundancy.
A quick solution would be to count the number of fruits when you insert/update the row and store it in a fruitCount column. You can then use this column to order by.
Zohaib has the correct solution though - if you have the time and possibility for such changes. And I definitely suggest you read Tudor's link!
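As a rough illustration of the normalized approach, the sketch below splits the comma-separated values into a fruits table and a link table, then ranks fruits with a plain GROUP BY. It uses Python's sqlite3 module in place of MySQL; the table names fruits and row_fruits are assumptions, and INSERT OR IGNORE is SQLite syntax (MySQL spells it INSERT IGNORE).

```python
import sqlite3

# The original comma-separated rows, keyed by s.no.
raw_rows = {
    1: "apple,orange,pineapple",
    2: "apple,pineapple,strawberry,grapes",
    3: "apple,grapes,",
    4: "orange,mango",
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fruits (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE row_fruits (
        s_no     INTEGER NOT NULL,
        fruit_id INTEGER NOT NULL REFERENCES fruits(id),
        PRIMARY KEY (s_no, fruit_id)
    );
""")

# One-off migration: split each csv value into one link row per fruit.
for s_no, csv_value in raw_rows.items():
    for name in (part.strip() for part in csv_value.split(",")):
        if not name:  # skip the empty entry left by a trailing comma
            continue
        conn.execute("INSERT OR IGNORE INTO fruits (name) VALUES (?)", (name,))
        (fruit_id,) = conn.execute(
            "SELECT id FROM fruits WHERE name = ?", (name,)
        ).fetchone()
        conn.execute(
            "INSERT OR IGNORE INTO row_fruits (s_no, fruit_id) VALUES (?, ?)",
            (s_no, fruit_id),
        )

# Most frequent fruit first; ties are broken alphabetically.
counts = conn.execute("""
    SELECT f.name, COUNT(*) AS cnt
    FROM row_fruits AS rf
    JOIN fruits AS f ON f.id = rf.fruit_id
    GROUP BY f.id, f.name
    ORDER BY cnt DESC, f.name
""").fetchall()
```

With the sample data, grapes, orange and pineapple each appear twice, so their relative order depends on the tie-break chosen in ORDER BY.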

MySQL hierarchical storage: searching through all parent/grandparent/etc. nodes given a child node id?

I'm storing categories using a hierarchical model like so:
CATEGORIES
id | parent_id | name
---------------------
1 | 0 | Cars
2 | 0 | Planes
3 | 1 | Hatchbacks
4 | 1 | Convertibles
5 | 2 | Jets
6 | 3 | Peugeot
7 | 3 | BMW
8 | 6 | 206
9 | 6 | 306
I then store actual data with one of these category ids like so:
CARS
vehicle_id | category_id | name
-------------------------------
1 | 8 | Really fast silver Peugeot 206
2 | 9 | Really fast silver Peugeot 306
3 | 5 | Really fast Boeing 747
4 | 3 | Another Peugeot but only in Hatchbacks category
When searching for any of this data, I would like to find all child / grandchild / great grandchild etc. etc. nodes. So if someone wants to see all "Cars", they see everything with a parent_id of "Hatchbacks", and so everything with a parent_id of "Peugeot", and so on, to an arbitrary level.
So if I list a "really fast Peugeot 206" with a category_id of either 1, 3, 6, or 8, my query should be able to "travel up" the tree and find any higher categories which are parents/grandparents of that child category. E.g. a user searching for Peugeots in category "8" should find any Peugeots listed with categories 6, 3, or 1 - all of which are category 8's ancestors.
E.g. using the above data, searching for "Peugeot" in category 3 should actually find vehicles 1, 2 and 4, because vehicles 1 and 2 have a category ancestor trail which leads back up to category 3. See?
Sorry if I haven't explained this well. It's difficult! Thank you, though.
Note: I have read the MySQL dev article on hierarchies.
Normalized models are great, but not when you actually have to query them.
Just store the "path" to your category in the category table, like this: path = /1/3/4. Then query your database with something like "select .... where path like '/1/3/%'". It will be much simpler and faster than multiple hierarchical queries...
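A minimal sketch of the stored-path idea, using Python's sqlite3 module in place of MySQL and the categories and vehicles from the question. The path column, the trailing-slash convention and the helper vehicles_under are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL,
        name      TEXT    NOT NULL,
        path      TEXT    NOT NULL   -- ids from the root down to this row
    );
    CREATE INDEX idx_categories_path ON categories (path);
    CREATE TABLE cars (
        vehicle_id  INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL REFERENCES categories(id),
        name        TEXT    NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO categories VALUES (?, ?, ?, ?)",
    [
        (1, 0, "Cars", "/1/"), (2, 0, "Planes", "/2/"),
        (3, 1, "Hatchbacks", "/1/3/"), (4, 1, "Convertibles", "/1/4/"),
        (5, 2, "Jets", "/2/5/"), (6, 3, "Peugeot", "/1/3/6/"),
        (7, 3, "BMW", "/1/3/7/"), (8, 6, "206", "/1/3/6/8/"),
        (9, 6, "306", "/1/3/6/9/"),
    ],
)
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [
        (1, 8, "Really fast silver Peugeot 206"),
        (2, 9, "Really fast silver Peugeot 306"),
        (3, 5, "Really fast Boeing 747"),
        (4, 3, "Another Peugeot but only in Hatchbacks category"),
    ],
)


def vehicles_under(conn, category_id):
    """Vehicles filed in category_id or in any category below it."""
    (path,) = conn.execute(
        "SELECT path FROM categories WHERE id = ?", (category_id,)
    ).fetchone()
    sql = """
        SELECT v.vehicle_id
        FROM cars AS v
        JOIN categories AS c ON c.id = v.category_id
        WHERE c.path LIKE ?
        ORDER BY v.vehicle_id
    """
    return [row[0] for row in conn.execute(sql, (path + "%",))]


hatchbacks = vehicles_under(conn, 3)
planes = vehicles_under(conn, 2)
```

The trailing slash matters: a pattern built from "/1/3" without it would also match a hypothetical "/1/30/". The cost of this design is that moving a category means rewriting the path of every row beneath it.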
This article can help you http://www.phpro.org/tutorials/Managing-Hierarchical-Data-with-PHP-and-MySQL.html
I like the explanation provided by SitePoint. It gives you code and explains the theory behind it.
http://blogs.sitepoint.com/hierarchical-data-database/
Note: this method is better for reads than for writes. If you're constantly writing to the tree, I'd use a different algorithm. This method is optimized for reads (lookups).
You've represented your data as an Adjacency List model, whose querying in MySQL is best done using session variables. Now, this is not the only way you can represent a hierarchy in a relational database. For your particular problem, I would probably use a materialized path approach instead, where you do away with the actual categories table and instead have a column on your cars table that looks like Cars/Hatchbacks/Peugeot on a per-record basis and use LIKE queries. Unfortunately that would be slow as the number of records grew. Now, if you know the maximum depth of your hierarchy (e.g. four levels) you could break that out into separate columns instead, which would allow you to take advantage of indexing.
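As an alternative to session variables, MySQL 8.0 and later support recursive common table expressions, which can walk an adjacency list directly. The sketch below runs the same kind of query through Python's sqlite3 module, which also supports WITH RECURSIVE; the helper names are assumptions for illustration, and the data is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, parent_id INTEGER NOT NULL, name TEXT NOT NULL);
    CREATE INDEX idx_categories_parent ON categories (parent_id);
    CREATE TABLE cars (vehicle_id INTEGER PRIMARY KEY, category_id INTEGER NOT NULL, name TEXT NOT NULL);
""")
conn.executemany(
    "INSERT INTO categories VALUES (?, ?, ?)",
    [
        (1, 0, "Cars"), (2, 0, "Planes"), (3, 1, "Hatchbacks"),
        (4, 1, "Convertibles"), (5, 2, "Jets"), (6, 3, "Peugeot"),
        (7, 3, "BMW"), (8, 6, "206"), (9, 6, "306"),
    ],
)
conn.executemany(
    "INSERT INTO cars VALUES (?, ?, ?)",
    [
        (1, 8, "Really fast silver Peugeot 206"),
        (2, 9, "Really fast silver Peugeot 306"),
        (3, 5, "Really fast Boeing 747"),
        (4, 3, "Another Peugeot but only in Hatchbacks category"),
    ],
)


def vehicles_in_subtree(conn, category_id):
    """Vehicles in category_id or any of its descendants (walks down)."""
    sql = """
        WITH RECURSIVE subtree(id) AS (
            SELECT id FROM categories WHERE id = ?
            UNION ALL
            SELECT c.id FROM categories AS c JOIN subtree AS s ON c.parent_id = s.id
        )
        SELECT v.vehicle_id
        FROM cars AS v JOIN subtree AS s ON s.id = v.category_id
        ORDER BY v.vehicle_id
    """
    return [row[0] for row in conn.execute(sql, (category_id,))]


def ancestors_of(conn, category_id):
    """category_id plus every parent/grandparent above it (walks up)."""
    sql = """
        WITH RECURSIVE chain(id, parent_id) AS (
            SELECT id, parent_id FROM categories WHERE id = ?
            UNION ALL
            SELECT c.id, c.parent_id FROM categories AS c JOIN chain AS ch ON c.id = ch.parent_id
        )
        SELECT id FROM chain
    """
    return sorted(row[0] for row in conn.execute(sql, (category_id,)))


in_hatchbacks = vehicles_in_subtree(conn, 3)
above_206 = ancestors_of(conn, 8)
```

This keeps the existing parent_id schema unchanged, at the cost of a recursive query on every read, so for read-heavy trees a stored path or nested set may still be the faster option.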
