Building breadcrumbs for unlimited category depth - PHP + SQL

I'd like to be able to build the breadcrumbs for a content page. However, the categories a piece of content is in can have unlimited depth, so I'm not sure how to go about it without getting each category one by one and then getting its parent, and so on. It seems like there could be a simpler way, but I can't figure it out.
I have an articles table
article_id
article_name
article_cat_id
I also have a categories table
cat_id
cat_name
cat_parent
cat_parent is the id of the category of which this category is a child.
Imagine an article which is 5 categories deep. As far as I can tell, I'd have to build the breadcrumbs something like this (example code; obviously inputs should be escaped, etc.):
<?php
$breadcrumbs = array(
    'Category 5',
    'Content Item'
);
$cat_parent = 4;
while ($cat_parent != 0) {
    $query = mysql_query('SELECT * FROM categories WHERE cat_id = '.$cat_parent);
    $result = mysql_fetch_array($query, MYSQL_ASSOC);
    array_unshift($breadcrumbs, $result['cat_name']);
    $cat_parent = $result['cat_parent'];
}
?>
This would then give me
array(
'Category 1',
'Category 2',
'Category 3',
'Category 4',
'Category 5',
'Content Item'
)
Which I can use for my breadcrumbs. However, it's taken me 5 queries to do it, which isn't really preferable.
Can anyone suggest any better solutions?

Here are some easy options in order of simplicity:
1. Stick with the design you have, use the recursive/iterative approach, and enjoy the benefits of having simple code. Really, this will take you pretty far. As a bonus, it is easier to move from here to something more performant than from a more complicated setup.
2. If the number of categories isn't very large, you can select all of them and build the hierarchy in PHP. Because data is read a page at a time, the amount of work required to fetch one row versus a whole bunch of them (say a few hundred) is pretty much the same. This minimizes the number of queries/network trips, but increases the amount of data sent over the wire. Measure!
3. Cache the hierarchy and reload it entirely every X units of time or whenever categories are added/modified/deleted. In its simplest form, the cache could be a PHP file with a nested variable structure containing the entire category hierarchy, along with a simple index for the nodes.
4. Create an additional table in which you have flattened the hierarchy in some way, using nested sets, path enumeration, a closure table, etc. The table will be maintained using triggers on the category table.
I would go for (1) unless you are fairly certain that you will have a sustained load of several users per second in the near future. (1 user per second makes roughly 2.6 million visits a month.)
There is nothing wrong with simple code. Complicating code for a speedup that isn't noticeable is wrong.
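As a rough illustration of the "select all categories and build the hierarchy in PHP" option, here is a minimal sketch. It assumes the rows have already been fetched with a single query over the categories table from the question; the sample rows and the helper name buildBreadcrumbs() are made up for the example.

```php
<?php
// Rows as they might come back from one
// "SELECT cat_id, cat_name, cat_parent FROM categories" query.
// The data below is sample data, not taken from the original post.
$rows = [
    ['cat_id' => 1, 'cat_name' => 'Category 1', 'cat_parent' => 0],
    ['cat_id' => 2, 'cat_name' => 'Category 2', 'cat_parent' => 1],
    ['cat_id' => 3, 'cat_name' => 'Category 3', 'cat_parent' => 2],
    ['cat_id' => 4, 'cat_name' => 'Category 4', 'cat_parent' => 3],
    ['cat_id' => 5, 'cat_name' => 'Category 5', 'cat_parent' => 4],
];

/**
 * Walk from a category up to the root using an in-memory index.
 * buildBreadcrumbs() is a hypothetical helper name.
 */
function buildBreadcrumbs(array $rows, int $catId): array
{
    // Index the rows by cat_id so each parent lookup is a plain array access.
    $byId = array_column($rows, null, 'cat_id');

    $trail = [];
    $seen  = []; // guards against a cycle in the cat_parent values
    while ($catId !== 0 && isset($byId[$catId]) && !isset($seen[$catId])) {
        $seen[$catId] = true;
        array_unshift($trail, $byId[$catId]['cat_name']);
        $catId = (int) $byId[$catId]['cat_parent'];
    }
    return $trail;
}

$breadcrumbs   = buildBreadcrumbs($rows, 5);
$breadcrumbs[] = 'Content Item';
print_r($breadcrumbs);
```

The loop is the same walk as in the question, but each step is an array lookup instead of a database round trip, so the page costs one query no matter how deep the category is.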

There are two commonly used methods of handling hierarchical data in relational databases: the adjacency list model and the nested set model. Your schema here is currently following the adjacency list model. Check out this page for some example queries. See also this question here on SO with a lot of good information.
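To make the nested set idea concrete, here is a small sketch of how a breadcrumb path falls out of it. It assumes each category carries two extra integer columns, here called lft and rgt (assumed names), such that every descendant's interval lies inside its ancestors' intervals. The sample data and the helper name nestedSetPath() are hypothetical.

```php
<?php
// Sample nested-set rows. In SQL the same lookup is a single query:
//   SELECT parent.cat_name
//   FROM categories AS node
//   JOIN categories AS parent ON node.lft BETWEEN parent.lft AND parent.rgt
//   WHERE node.cat_id = ?
//   ORDER BY parent.lft
$categories = [
    ['cat_id' => 1, 'cat_name' => 'Electronics', 'lft' => 1, 'rgt' => 10],
    ['cat_id' => 2, 'cat_name' => 'Computers',   'lft' => 2, 'rgt' => 7],
    ['cat_id' => 3, 'cat_name' => 'Laptops',     'lft' => 3, 'rgt' => 4],
    ['cat_id' => 4, 'cat_name' => 'Desktops',    'lft' => 5, 'rgt' => 6],
    ['cat_id' => 5, 'cat_name' => 'Phones',      'lft' => 8, 'rgt' => 9],
];

/**
 * Return the names from the root down to the given category.
 * nestedSetPath() is a hypothetical helper that mirrors the SQL above.
 */
function nestedSetPath(array $categories, int $catId): array
{
    // Find the node whose ancestors we want.
    $node = null;
    foreach ($categories as $row) {
        if ((int) $row['cat_id'] === $catId) {
            $node = $row;
            break;
        }
    }
    if ($node === null) {
        return [];
    }

    // An ancestor (or the node itself) is any row whose interval
    // contains the node's interval.
    $path = array_filter($categories, function ($row) use ($node) {
        return $row['lft'] <= $node['lft'] && $row['rgt'] >= $node['rgt'];
    });

    // Ordering by lft puts the root first and the node last.
    usort($path, function ($a, $b) {
        return $a['lft'] <=> $b['lft'];
    });

    return array_column($path, 'cat_name');
}

print_r(nestedSetPath($categories, 3));
```

The trade-off is on the write side: inserting or moving a category means renumbering lft/rgt values, which the adjacency list model avoids.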

Related

How can I get a count on each instance of a model's "hasMany" relationship when I am retrieving lots of data in CakePHP?

NB: My title was quite difficult to word - If you think of a way to word it more concisely I would appreciate that.
I'm using CakePHP and I am making a basic forum system. My models are set out as follows:
ForumSection has many ForumCategories
ForumCategory has many ForumPosts, has one ForumSection
ForumPost has many ForumPosts, has one ForumCategory (NB: ForumPost can be a thread or a reply)
On the index page of the forum, I would like to display the number of posts and replies in each forum category. I can do this by attaching the Containable behavior (which I have) to my ForumSection model and then using the following statement:
$sections = $this->ForumSection->find('all', array(
    'contain' => array(
        'ForumCategory' => array(
            'ForumPost' => array(
                'conditions' => array('ForumPost.is_thread' => '1')
            )
        )
    )
));
Then, on my view, I can simply echo the count of ForumPosts.
This is, of course, suboptimal: I could potentially be bringing back thousands upon thousands of rows and loading them into memory, when in reality I could run a direct SQL query along the lines of SELECT COUNT(*) FROM ForumPosts WHERE forum_category_id = x AND is_thread = 1 to get the count and avoid this. I could even use a Cake function to do this for me, but the point stands that it's more efficient than loading the entire table into memory just to count the rows.
This would, though, require making a loop in my controller (or potentially my model, but I'd still need to loop over the sections in the controller) and meddling with the returned data so as to insert the counts into it.
The way I see it, I have two options:
I could, in the view, get the post count as I loop over the categories, e.g.
foreach ($categories as $category):
// get post count
echo post count
I am reluctant to do this since, to my knowledge, this seems to be ill-advised in MVC projects, and I'm certain there'll be a more optimal approach I have not considered.
Intercept the found data and insert counts before passing the data through to the view. This seems like the most true-to-MVC way of doing it, but it still feels wrong.
My question, ultimately, is: does Cake provide any way of including the count of a model's inner relationships without including the entire data set for said relationship? If not, is there an approach that I could take that would be more straightforward or follow conventions more effectively than my suggested two?
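For what the second option (inserting counts before the data reaches the view) could look like, here is a framework-agnostic sketch. It does not use any CakePHP API. It assumes the counts come from one grouped query such as SELECT forum_category_id, COUNT(*) AS thread_count FROM forum_posts WHERE is_thread = 1 GROUP BY forum_category_id, and that the sections array is shaped roughly like the result of the find() call above without the ForumPost rows. The table name, the array shape, the sample data and the helper name attachThreadCounts() are all assumptions.

```php
<?php
// Sections with their categories, but without the posts (sample data).
$sections = [
    [
        'ForumSection'  => ['id' => 1, 'name' => 'General'],
        'ForumCategory' => [
            ['id' => 10, 'name' => 'News'],
            ['id' => 11, 'name' => 'Chat'],
        ],
    ],
];

// Rows from the single grouped COUNT(*) query. Categories without threads
// simply do not appear. Database drivers often return numbers as strings.
$countRows = [
    ['forum_category_id' => 10, 'thread_count' => '42'],
];

/**
 * Copy the per-category counts into the nested sections array.
 * attachThreadCounts() is a hypothetical helper name.
 */
function attachThreadCounts(array $sections, array $countRows): array
{
    // Map category id => count for constant-time lookups.
    $counts = array_column($countRows, 'thread_count', 'forum_category_id');

    foreach ($sections as &$section) {
        foreach ($section['ForumCategory'] as &$category) {
            $category['thread_count'] = (int) ($counts[$category['id']] ?? 0);
        }
        unset($category); // drop the inner reference before the next section
    }
    unset($section);

    return $sections;
}

$sections = attachThreadCounts($sections, $countRows);
echo $sections[0]['ForumCategory'][0]['thread_count'], "\n";
```

This keeps the view free of queries: the controller (or model) runs one extra grouped query for the whole page rather than one count per category.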

Optimal database structure for entries in flexible category/subcategory system?

I want to store reviews in a flexible system of categories and subcategories, and am currently in the process of designing the database structure for that. I have an idea how to do that, but I'm not entirely sure whether it could be done more elegantly and/or efficiently. These are my thoughts; if anybody can comment on if/how this can be improved, I'd be really grateful.
(To keep this post concise, I only list the important fields for the tables.)
1.) The reviews are stored in the table "reviews". It has the following fields:
id: unique ID, auto-incrementing.
title: the title that will show up in <head><title>, etc.
stub: a version of the title without spaces, special chars, etc. so it can be part of the URL/URI
text: the actual content
2.) All categories are in the same table "categories"
id: unique ID, auto-incrementing.
title: the full title/name of the category as it will be output on the website
stub: version of the title that will be shown in the URL/URI.
parent_id: if this is a subcategory, here is the categories.id of the parent category. Else this is 0.
order_number: simple number to order the categories by (for display in the navigation menu)
3.) Now I need an indicator of which reviews are in which categories. They can be in multiple categories. My first idea was to add a "review_list" field to the categories and have it contain all the reviews.id values that should be in this category. However, I think that adding and removing reviews from categories would be a hassle and inelegant. So my current idea is to have a table "review_in_category" with an entry for every review-category relation. The structure is:
id: Unique ID, auto-increment.
review_id: the reviews.id
category_id: the categories.id
So if a review is in 3 different categories it would result in 3 entries in the "review_in_category" table.
The idea is that when a user opens www.mydomain.de/animation/sci-fi/ the wrapper script will break up the URL into its parts. If it finds more than one category with category.stub = "sci-fi", it will check which of those has a parent category with the stub "animation". Once the correct category is identified (most of the time the stubs are unique anyway, so this check can be skipped), I want to SELECT all review_id values from "review_in_category" where the category_id matches the one determined by the wrapper script. All the review_id values are put into an array. A loop will iterate through this array and compose the SELECT statement for listing all review titles (and create links to them using the stub values), starting with "SELECT title, stub FROM reviews WHERE id=review_list[$counter]" and then adding "OR id=review_list[$counter]" until the array has been completely traversed.
So my questions are:
- Is creating a single SELECT statement with a potentially large number of "OR id=" parts an elegant and/or efficient way to handle this situation, or are there better variants?
- Does using a "taxonomy"-style table (review_in_category) make sense or would it be better to store the "membership"/"relation" directly in the reviews or category tables?
- Any other thoughts... I just started to learn this stuff and appreciate any feedback.
Thank you
Your design looks sound.
To retrieve all reviews in a category, you should use a join:
SELECT reviews.title, reviews.stub
FROM reviews
JOIN review_in_category ON reviews.id = review_in_category.review_id
WHERE review_in_category.category_id = $category
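The URL-to-category step described in the question can be sketched as follows. This version walks the path from the top level down instead of looking up the last stub and then checking its parent; both approaches identify the same category. It assumes the categories table has been loaded into an array. The sample rows and the helper name resolveCategory() are made up for illustration.

```php
<?php
// Sample rows from the "categories" table. Note the two "sci-fi" stubs
// under different parents.
$categories = [
    ['id' => 1, 'stub' => 'animation',   'parent_id' => 0],
    ['id' => 2, 'stub' => 'sci-fi',      'parent_id' => 1],
    ['id' => 3, 'stub' => 'live-action', 'parent_id' => 0],
    ['id' => 4, 'stub' => 'sci-fi',      'parent_id' => 3],
];

/**
 * Resolve a path such as "/animation/sci-fi/" to a category id,
 * or null when any segment does not match.
 * resolveCategory() is a hypothetical helper name.
 */
function resolveCategory(array $categories, string $path): ?int
{
    // Split the path into stubs and drop empty segments.
    $stubs = array_values(array_filter(explode('/', trim($path, '/')), 'strlen'));

    $parentId = 0; // top-level categories have parent_id 0
    foreach ($stubs as $stub) {
        $match = null;
        foreach ($categories as $cat) {
            if ($cat['stub'] === $stub && (int) $cat['parent_id'] === $parentId) {
                $match = (int) $cat['id'];
                break;
            }
        }
        if ($match === null) {
            return null; // unknown category path
        }
        $parentId = $match;
    }

    return $stubs ? $parentId : null;
}

var_dump(resolveCategory($categories, '/animation/sci-fi/'));
```

The resulting id can then be bound as a parameter of the join query above (for example through a prepared statement) instead of being interpolated into the SQL string, and no list of "OR id=" clauses is needed.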

Adjacency List Model + Website Navigation

I am using the adjacency list model to find sub categories within my website. I have working PHP code to find all the categories and sub categories, but now I cannot figure out how to use that to create a navigation system. Here is how the site will work, very basic:
URL string
There will be a main category, followed by levels
index.php?category=category-name&level1=sub-category&level2=another-sub-category&level3=content-item
Later I will make SEO friendly links.
URL with no sub categories
Where Level 1 is the content item
www.website.com/category/content-item/
URL with sub categories
Where Level 1, 2, 3, etc are the sub categories and the final level is the content item
www.website.com/category/sub-category/sub-category-2/content-item/
Here is the code I am using to find categories and sub categories. Currently it just outputs a list of all categories and sub categories and numbers the level of each child. Not sure if this helps; it just creates a list.
function display_children($ParentCategoryID, $Level) {
    // retrieve all children of parent
    if ($ParentCategoryID == '') {
        $Result = mysql_query('SELECT * FROM categories WHERE parent_category_id IS NULL');
    }
    else {
        $Result = mysql_query('SELECT * FROM categories WHERE parent_category_id="'.$ParentCategoryID.'";');
    }
    // display each child
    while ($Row = mysql_fetch_array($Result)) {
        echo str_repeat('-',$Level)."[".$Level."]".$Row['category_name']."<br />";
        display_children($Row['category_id'], $Level + 1);
    }
}
See this question first for options on how to represent hierarchical data in a database.
Adjacency list is great for its simplicity and makes changes easy, but in practice it can be awful because it leads to recursive code, such as your function above, which is a performance killer under load. The best approach, absent changing your data model, is to use MySQL session variables to retrieve the entire hierarchy in one query, which brings back all the data you need in one database call. Even this, though, leads to poor performance under load (less so than the recursive function, but still not good), and I write from experience :).
If it were me, I'd use either Nested Sets, Adjacency List in combination with some denormalizations (such as the Bridge Table and Flat Table), or just a Lineage Table. It really depends on how often the data changes and whether you need those changes to be easy to make. All of these options should be much, much faster to work with than relying on just the parent-child ID columns.
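Short of changing the data model, one simple way to avoid a query per node is to fetch every category once and do the recursion in memory. The sketch below is not the session-variable technique mentioned above; it is a plainer alternative. It assumes the rows come from a single SELECT category_id, category_name, parent_category_id FROM categories query and that no category has the id 0. The sample data and the helper names groupByParent() and renderChildren() are hypothetical.

```php
<?php
// Sample rows as one query over the whole table might return them.
$rows = [
    ['category_id' => 1, 'category_name' => 'Books',   'parent_category_id' => null],
    ['category_id' => 2, 'category_name' => 'Fiction', 'parent_category_id' => 1],
    ['category_id' => 3, 'category_name' => 'Sci-Fi',  'parent_category_id' => 2],
    ['category_id' => 4, 'category_name' => 'Music',   'parent_category_id' => null],
];

/**
 * Group rows by parent id. Top-level rows (NULL parent) are filed under 0.
 */
function groupByParent(array $rows): array
{
    $children = [];
    foreach ($rows as $row) {
        $parentId = (int) ($row['parent_category_id'] ?? 0);
        $children[$parentId][] = $row;
    }
    return $children;
}

/**
 * Same output format as display_children(), but it reads from the
 * in-memory index instead of running a query for every node.
 */
function renderChildren(array $children, int $parentId = 0, int $level = 0): string
{
    $out = '';
    foreach ($children[$parentId] ?? [] as $row) {
        $out .= str_repeat('-', $level) . '[' . $level . ']' . $row['category_name'] . "<br />\n";
        $out .= renderChildren($children, (int) $row['category_id'], $level + 1);
    }
    return $out;
}

echo renderChildren(groupByParent($rows));
```

The same grouped array can drive the navigation: the children of the category currently being viewed are just $children[$currentId].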

Mysql parent child tables reducing database calls or merge in PHP

If I had 2 tables, say blog_category and blog, each "blog" can belong to one particular category only, so it is a many-to-one relationship based on a key called "blog_category_id".
Now in my code I would do something like:
// Loop through categories such as
foreach ($categories as $cat):
    // then for each category create an array of all its posts
    $posts = $cat->getPosts(); // This would be another DB call to get all posts for the cat
    // do stuff with posts
endforeach;
Now to me this seems like it could end up quite expensive in terms of DB calls depending on the size of $categories. Would this still be the best solution to do this? Or would I be able to do something in the code and first retrieve all the categories, then retrieve all the blogs and map them to their corresponding category via the id somehow? This would in theory be only 2 calls to the DB, now size wise the result set for call 2 (the blogs) would definitely be larger, but would the actual DB call be as expensive?
I would normally go for the first option, but I'm just wondering if there would be a better way of approaching this or is it more likely that the extra processing in PHP would be more costly in terms of performance? Also specifically from an MVC perspective, if the model returns the categories, but it should also return the corresponding blogs for that category, I'm not sure how best to structure this, from my understanding, shouldn't the model return all the data required for the view?
Or would I be better off selecting all categories and blogs using inner joins in the first query and create the output I need of this? Perhaps by using a multi-dimensional array?
Thanks
You can use a simple SQL query to get all categories and posts like the following:
SELECT *
FROM posts p
JOIN categories c ON c.id = p.blog_category_id
ORDER BY c.category_name ASC,
p.posted_date DESC
Then, when you loop over the returned records, assign the current category id to a variable, which you can compare against the next record's category. If the category is different, print the category title before printing the record. It is important to note that for this to work you need to order the results by category first and then by post, so that all posts in the same category are together.
So for example:
$category_id = null;
foreach ($posts as $post) {
    if ($post['blog_category_id'] != $category_id) {
        $category_id = $post['blog_category_id'];
        echo '<h2>' . $post['category_name'] . '</h2>';
    }
    echo '<h3>' . $post['post_title'] . '</h3>';
    echo $post['blog_content'];
}
Note: as you have not posted up the schema of these two tables I have had to make up column names that are similar to what I would expect to see in code like this. So the code above will not work with your code without some adjustments to account for this.
The best solution depends on what you are going to do with data.
Lazy loading
Load data when you need it. It's a good solution when you have, for instance, 20 categories and you load posts for only 2 of them. However, if you need to load posts for all of them, it won't be efficient at all. This is known as the N+1 queries problem (and it's really bad).
Eager loading
On the other hand, if you need to access almost all of your posts, you should use eager loading.
-- Load all your data in a query
SELECT *
FROM categories c
INNER JOIN posts p ON c.id = p.category_id;
// Basic example in JSON of how to format your result
{
    "cat1": ["post1", "post2"],
    "cat2": ["post5", "post4", "post5"],
    ...
}
What to do?
In your case I would say eager loading, because you load everything in a loop. But if you don't access most of your data, you should redesign your model to perform lazy loading, so that the SQL query to load the posts for a specific category is only performed when a view tries to access them.
What do you think?
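To show how the flat rows of such a join can be turned into the nested structure, here is a small sketch. It assumes the rows were fetched as associative arrays; the column names follow the made-up names used in the first answer, and the sample data and the helper name groupPostsByCategory() are hypothetical.

```php
<?php
// Flat rows as a categories/posts JOIN might return them (sample data).
$rows = [
    ['blog_category_id' => 1, 'category_name' => 'News', 'post_title' => 'Hello'],
    ['blog_category_id' => 1, 'category_name' => 'News', 'post_title' => 'Launch'],
    ['blog_category_id' => 2, 'category_name' => 'Tips', 'post_title' => 'Indexing'],
];

/**
 * Turn flat join rows into category id => ['name' => ..., 'posts' => [...]].
 * groupPostsByCategory() is a hypothetical helper name.
 */
function groupPostsByCategory(array $rows): array
{
    $grouped = [];
    foreach ($rows as $row) {
        $id = $row['blog_category_id'];
        if (!isset($grouped[$id])) {
            // First row seen for this category: create its bucket.
            $grouped[$id] = ['name' => $row['category_name'], 'posts' => []];
        }
        $grouped[$id]['posts'][] = $row['post_title'];
    }
    return $grouped;
}

print_r(groupPostsByCategory($rows));
```

Unlike the compare-with-previous-row loop, this grouping does not depend on the rows being ordered by category, and the model can hand the view one ready-made structure. An INNER JOIN drops categories that have no posts, so a LEFT JOIN from categories would be needed if empty categories should still appear.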

Decreasing queries in MySQL with many one-to-many relationships (ORM)

I'm currently designing an application using PHP and MySQL, built on the Kohana framework. I'm making use of the built in ORM and it has proved to be extremely useful. Everything works fine, but I'm very concerned with the number of queries being run on certain pages.
Setting
For example, there's a page on which you can view a category full of sections, which are in turn full of products. This is listed out in tabular format. Each product has (possibly) many attributes, flags, tier pricing breaks. This must all be represented in the table.
How many queries?
As far as queries are concerned: The category must query all the sections within it, and those sections must query all the products they contain. Not too bad, but each product must then query all its product attributes, tier pricing, and flags. So, adding more products to a category increases the queries many times over (since I'm currently using the ORM primarily). Having a few hundred products in a section will result in a couple hundred queries. Small queries, but that is still not good.
So far...
All the keys are indexed. I can pull all of the information with a single query (see edit below), however, as you could imagine, this will result in a lot of redundant data spread out across multiple rows per each product, per each extra (e.g.) attribute, flag, etc.
I'm not opposed to ditching the ORM for the displaying part of the application and going with query building or even raw SQL.
The solution for this could actually be quite simple and I'm just ignorant of it right now, which would be a relief honestly. Or maybe it's not. I'm not sure. If any of my explanation was not adequate to understand the problem, just ask and I'll try to give a better example. (Edit: Better example given, see below.)
Although, a side note...
One thing that may have some relevance though: while I always want to have the application designed most efficiently, this isn't a site that's going to be hit dozens or hundreds of times a day. It's more of an administrative application, which probably won't be in use by more than a few individuals at once. I can't foresee too much reloading, as most of the editing of data on the page is done through AJAX. So, should I care as much if this page runs a couple hundred queries (fluctuating with how many products are in the currently viewed section) each time it is loaded? Just a side thought; even so, if it is possible to solve the main problem described above, I would prefer that.
Thank you very much!
EDIT
Based on a couple answers, it seems I didn't explain myself adequately. So, let me post an example so you see what's going on.
Before the example though, I should also make two clarifications: (1) there are also a couple many-to-many relationships, (2) and you could possibly liken what I'm looking for to that of a crosstab query.
Let's simplify and say we have 3 main tables:
products (product_id, product_name, product_date_added)
product_attributes (product_attribute_id, product_id, value)
notifications (notification_id, notification_label)
And 1 pivot table:
product_notifications (notification_id, product_id)
We're going to list all the products in a table. It's simple enough in the ORM to call all the products.
So for each row in 'products' we list the product_name and product_date_added. However, we also need to list all of the product's attributes. There are 0 or more of these per product. We also have to show what notifications a product has, of which there are 0 or more as well.
So at the moment, how it works is basically:
foreach ($products->find_all() as $product) // given that $products is an ORM object
{
    echo $product->product_id; // let's just pretend these are surrounded by html
    echo $product->product_name;
    // one extra query per product for its attributes
    foreach ($product->product_attributes->find_all() as $attribute)
    {
        echo $attribute->value;
    }
    // and another query per product for its notifications
    foreach ($product->notifications->find_all() as $notification)
    {
        echo $notification->notification_label;
    }
}
This is oversimplified of course, but this is the principle I'm talking about. This works great already. However, as you can see, for each product it must query all of its attributes to get the appropriate collection of rows.
The find_all() function will return the query results of something along the lines of:
SELECT product_attributes.* FROM product_attributes WHERE product_id = '#', and similarly for the notifications. And it makes these queries for each product.
So, for every product in the database, the number of queries is a few times that amount.
So, although this works well, it does not scale well, as it may potentially result in hundreds of queries.
If I perform a query to grab all the data in one query, along the lines of:
SELECT p.*, pa.*, n.*
FROM products p
LEFT JOIN product_attributes pa ON pa.product_id = p.product_id
LEFT JOIN product_notifications pn ON pn.product_id = p.product_id
LEFT JOIN notifications n ON n.notification_id = pn.notification_id
(Again oversimplified). This gets the data per se, but per each attribute and notification a product has, an extra row with redundant information will be returned.
For example, if I have two products in the database, one with 1 attribute and 1 notification and the other with 2 attributes and 2 notifications, it will return:
product_id, product_name, product_date_added, product_attribute_id, value, notification_id, notification_label
1, My Product, 10/10/10, 1, Color: Red, 1, Add This Product
2, Busy Product, 10/11/10, 2, Color: Blue, 1, Add This Product
2, Busy Product, 10/11/10, 2, Color: Blue, 2, Update This Product
2, Busy Product, 10/11/10, 3, Style: New, 1, Add This Product
2, Busy Product, 10/11/10, 3, Style: New, 2, Update This Product
Needless to say that's a lot of redundant information. The number of rows returned per product would be the number of attributes it has times the number of notifications it has.
The ORM (or just creating the new queries in the loop in general) consolidates all of the information in each row into its own object, allowing the data to be handled more logically. That's the rock. Calling the information in one query eliminates the need for possibly hundreds of queries, but creates lots of redundant data in rows and therefore does not return the (one/many)-to-many relationship data in succinct sets. That's the hard place.
Sorry it's so long, trying to be thorough, haha, thanks!
An interesting alternative is to handle your reads and your writes with completely separate models (Command Query Separation). Sophisticated object models (and ORMs) are great for modeling complex business behavior, but are lousy as interfaces for querying and displaying information to users. You mentioned that you weren't opposed to ditching the ORM for rendering displays. Well, that's exactly what many software architects nowadays suggest. Write a totally different interface (with its own optimized queries) for reading and reporting on data. The "read" model could query the same database that you use with your ORM-backed "write" model, or it could be a separate one that is denormalized and optimized for the reports/screens you need to generate.
Check out these two presentations. It may sound like overkill (and it may be if your performance requirements are very low), but it's amazing how this technique makes so many problems just go away.
Udi Dahan: "Command-Query Responsibility Segregation"
Greg Young: "Unshackle Your Domain"
A good ORM should handle this for you. If you feel you must do it manually, you can do this.
Fetch all the categories you need in a single query and store the primary key IDs in a PHP array.
Run a query similar to this:
mysql_query('SELECT yourListOfFieldsHere FROM Products WHERE Category_id IN ('.implode(',', $categoryIDs).')');
This should give you all the products that you need in a single query. Then use PHP to map these to the correct categories and display accordingly.
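Applied to the products example from the question, the same IN-list idea avoids both the per-product queries and the attributes-times-notifications row blow-up of the big join: one query for the products, one for all their attributes, one for all their notifications, then a merge in PHP. The sketch below assumes those three result sets have already been fetched as arrays. The SQL in the comments, the sample rows and the helper names placeholders() and attachChildren() are illustrative assumptions.

```php
<?php
// Query 1: SELECT product_id, product_name FROM products
$products = [
    ['product_id' => 1, 'product_name' => 'My Product'],
    ['product_id' => 2, 'product_name' => 'Busy Product'],
];
// Query 2: SELECT product_id, value FROM product_attributes
//          WHERE product_id IN (?, ?)
$attributeRows = [
    ['product_id' => 1, 'value' => 'Color: Red'],
    ['product_id' => 2, 'value' => 'Color: Blue'],
    ['product_id' => 2, 'value' => 'Style: New'],
];
// Query 3: SELECT pn.product_id, n.notification_label
//          FROM product_notifications pn
//          JOIN notifications n ON n.notification_id = pn.notification_id
//          WHERE pn.product_id IN (?, ?)
$notificationRows = [
    ['product_id' => 1, 'notification_label' => 'Add This Product'],
    ['product_id' => 2, 'notification_label' => 'Add This Product'],
    ['product_id' => 2, 'notification_label' => 'Update This Product'],
];

/** Build "?, ?, ?" for a prepared statement's IN (...) list. */
function placeholders(int $count): string
{
    return implode(', ', array_fill(0, max(0, $count), '?'));
}

/**
 * Attach each child row to its product by product_id.
 * attachChildren() is a hypothetical helper name.
 */
function attachChildren(array $products, array $attributeRows, array $notificationRows): array
{
    $byId = [];
    foreach ($products as $product) {
        $product['attributes']    = [];
        $product['notifications'] = [];
        $byId[$product['product_id']] = $product;
    }
    foreach ($attributeRows as $row) {
        if (isset($byId[$row['product_id']])) {
            $byId[$row['product_id']]['attributes'][] = $row['value'];
        }
    }
    foreach ($notificationRows as $row) {
        if (isset($byId[$row['product_id']])) {
            $byId[$row['product_id']]['notifications'][] = $row['notification_label'];
        }
    }
    return $byId;
}

print_r(attachChildren($products, $attributeRows, $notificationRows));
```

The number of queries stays at three however many products are listed, and each child row is transferred once instead of once per combination.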
