PHP / MySQL Adjacency List Question - php

I have the following issue: imagine an adjacency list that is walked recursively, coming out of SQL like this:
SELECT * FROM pages as ps ORDER BY COALESCE(child_of,page_id), page_id LIMIT x,y
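public static function tree(&$arr, $id = NULL) {
    // Collect every row whose parent (child_of) is $id; NULL means top level.
    $result = array();
    foreach ($arr as $a) {
        if ($id == $a['child_of']) {
            // Recurse to attach this row's own children.
            $a['children'] = self::tree($arr, $a['page_id']);
            $result[] = $a;
        }
    }
    return $result;
}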
So far, so good: with another "flattener" I get where I need to be. Now here is the catch: this works on paginated results, and what can happen (and it does) is that the parent is in one subset and the child in a different one. With the recursion above, the child obviously won't make it into the tree when its parent is missing.
Any ideas on how I can solve that?
Help is much appreciated.
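To make the failure concrete, here is a rough Python port of tree() (a sketch, not the asker's code: PHP arrays become dicts with page_id and child_of keys, and the three sample rows are made up). Running it on a page that contains a child but not its parent shows the child silently disappearing:

```python
def tree(rows, parent_id=None):
    """Python port of tree(): collect rows whose child_of equals parent_id."""
    result = []
    for row in rows:
        if row["child_of"] == parent_id:
            node = dict(row)  # work on a copy, like PHP's by-value foreach
            node["children"] = tree(rows, row["page_id"])
            result.append(node)
    return result


def count_nodes(nodes):
    """Count every node in a nested tree."""
    return sum(1 + count_nodes(node["children"]) for node in nodes)


# Hypothetical rows: page 3 is a child of page 2.
all_rows = [
    {"page_id": 1, "child_of": None},
    {"page_id": 2, "child_of": None},
    {"page_id": 3, "child_of": 2},
]

# With ORDER BY COALESCE(child_of, page_id), page_id and LIMIT 2,1,
# page 3 lands on its own page, without its parent.
second_page = all_rows[2:3]

print(count_nodes(tree(all_rows)))     # 3
print(count_nodes(tree(second_page)))  # 0: the orphaned child is dropped
```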

Hierarchical data in relational tables, don't we all love it?
With your current database layout, you can only solve your problem by either always fetching all nodes, or doing as many JOINs as you have nesting levels and sorting everything properly (your way of sorting only makes this fundamental problem a little less severe).
Before you ask: no, you should not do this.
The other option is to choose an entirely different model for your hierarchy:
Nested sets
Ancestor/descendant relationships between all nodes.
See slide 48 et seq. here.

A good read to start with is Hierarchical Data In MySQL (which I used to be able to find on the MySQL.com website, arghh)
Read It?
Here's how it could be done with the adjacency list model, but only for a known, fixed amount of nesting (four nesting levels in this example).
I would find out which of my pages are root pages (of the tree), then select only those with a query, and put the LIMIT x,y in that SELECT statement.
After that, run the following statement (or something like it):
$query = "
SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name IN('ELECTRONICS', '<some other name>');
";
Could return something like this:
+-------------+----------------------+--------------+-------+
| lev1        | lev2                 | lev3         | lev4  |
+-------------+----------------------+--------------+-------+
| ELECTRONICS | TELEVISIONS          | TUBE         | NULL  |
| ELECTRONICS | TELEVISIONS          | LCD          | NULL  |
| ELECTRONICS | TELEVISIONS          | PLASMA       | NULL  |
| ELECTRONICS | PORTABLE ELECTRONICS | MP3 PLAYERS  | FLASH |
| ELECTRONICS | PORTABLE ELECTRONICS | CD PLAYERS   | NULL  |
| ELECTRONICS | PORTABLE ELECTRONICS | 2 WAY RADIOS | NULL  |
| etc...      | etc...               | etc...       |       |
+-------------+----------------------+--------------+-------+
The trick is to put only the root names returned by the LIMITed query (or their IDs, if you prefer) into the IN() clause of this query.
This should still perform pretty well (in theory).
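The two-step approach above can be sketched like this. It uses Python's built-in sqlite3 module as a runnable stand-in for PHP/MySQL, the sample category data from the adjacency-list answer further down, plus a hypothetical second root ('HOME' with child 'KITCHEN') so there is something to page over. Only the roots are paginated; their IDs then feed the four-level join, so a parent and its descendants always arrive together:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        category_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        parent INTEGER DEFAULT NULL
    );
    INSERT INTO category VALUES
        (1, 'ELECTRONICS', NULL), (2, 'TELEVISIONS', 1), (3, 'TUBE', 2),
        (4, 'LCD', 2), (5, 'PLASMA', 2), (6, 'PORTABLE ELECTRONICS', 1),
        (7, 'MP3 PLAYERS', 6), (8, 'FLASH', 7), (9, 'CD PLAYERS', 6),
        (10, '2 WAY RADIOS', 6),
        (11, 'HOME', NULL), (12, 'KITCHEN', 11);  -- hypothetical second tree
""")


def page_of_trees(conn, limit, offset):
    """Return up to four levels below every root on the requested page of roots."""
    # Step 1: apply LIMIT/OFFSET to the root nodes only.
    root_ids = [row[0] for row in conn.execute(
        "SELECT category_id FROM category WHERE parent IS NULL "
        "ORDER BY category_id LIMIT ? OFFSET ?", (limit, offset))]
    if not root_ids:
        return []
    # Step 2: one IN() placeholder per root id, then the fixed-depth join.
    placeholders = ",".join("?" * len(root_ids))
    sql = f"""
        SELECT t1.name, t2.name, t3.name, t4.name
        FROM category AS t1
        LEFT JOIN category AS t2 ON t2.parent = t1.category_id
        LEFT JOIN category AS t3 ON t3.parent = t2.category_id
        LEFT JOIN category AS t4 ON t4.parent = t3.category_id
        WHERE t1.category_id IN ({placeholders})
        ORDER BY t1.category_id, t2.category_id, t3.category_id, t4.category_id
    """
    return conn.execute(sql, root_ids).fetchall()


for row in page_of_trees(conn, limit=1, offset=0):
    print(row)
```

The first page yields the six ELECTRONICS rows from the table above; offset 1 yields only the HOME tree. The ORDER BY over every level keeps rows in tree order, which a join does not otherwise guarantee.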
The principle of the above query could also be used to find out how many descendants a root of a tree has (with a little GROUP BY and COUNT() magic ;) ). You could also find out which of your pages are roots with this principle (though I would store that in the table data for performance reasons).
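A sketch of the GROUP BY/COUNT() idea, still limited to four levels (Python sqlite3 stand-in; the second root 'HOME' with child 'KITCHEN' is made up). Because a node repeats once for every row beneath it in the join, the count uses DISTINCT IDs per level, and WHERE t1.parent IS NULL restricts the result to roots:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        category_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        parent INTEGER DEFAULT NULL
    );
    INSERT INTO category VALUES
        (1, 'ELECTRONICS', NULL), (2, 'TELEVISIONS', 1), (3, 'TUBE', 2),
        (4, 'LCD', 2), (5, 'PLASMA', 2), (6, 'PORTABLE ELECTRONICS', 1),
        (7, 'MP3 PLAYERS', 6), (8, 'FLASH', 7), (9, 'CD PLAYERS', 6),
        (10, '2 WAY RADIOS', 6),
        (11, 'HOME', NULL), (12, 'KITCHEN', 11);  -- hypothetical second tree
""")

# Count DISTINCT ids at each level; COUNT() ignores the NULLs produced
# by LEFT JOINs that found no child.
DESCENDANT_COUNTS = """
    SELECT t1.name,
           COUNT(DISTINCT t2.category_id)
         + COUNT(DISTINCT t3.category_id)
         + COUNT(DISTINCT t4.category_id) AS descendants
    FROM category AS t1
    LEFT JOIN category AS t2 ON t2.parent = t1.category_id
    LEFT JOIN category AS t3 ON t3.parent = t2.category_id
    LEFT JOIN category AS t4 ON t4.parent = t3.category_id
    WHERE t1.parent IS NULL
    GROUP BY t1.category_id, t1.name
    ORDER BY t1.category_id
"""
for name, descendants in conn.execute(DESCENDANT_COUNTS):
    print(name, descendants)  # ELECTRONICS 9, then HOME 1
```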
If you want a dynamic amount of nesting (for nearly endless scaling), implementing a nested set would be the way to go.
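As a rough sketch of what the nested set model buys you: each node stores a left/right pair (the column names lft and rgt are an assumption here), numbered by a depth-first walk, so every descendant's lft falls inside its ancestor's range. The example converts the sample category tree into that form in Python, loads it into an in-memory SQLite database, and fetches a whole subtree, at any depth, with one range query. The catch is that inserting or moving a node means renumbering lft/rgt for a large part of the tree.

```python
import sqlite3

# Sample tree as (id, name, parent_id), same data as the category example.
ADJACENCY = [
    (1, "ELECTRONICS", None), (2, "TELEVISIONS", 1), (3, "TUBE", 2),
    (4, "LCD", 2), (5, "PLASMA", 2), (6, "PORTABLE ELECTRONICS", 1),
    (7, "MP3 PLAYERS", 6), (8, "FLASH", 7), (9, "CD PLAYERS", 6),
    (10, "2 WAY RADIOS", 6),
]


def to_nested_set(rows):
    """Assign lft/rgt numbers with a depth-first walk (children in id order)."""
    children = {}
    for node_id, _name, parent in rows:
        children.setdefault(parent, []).append(node_id)
    names = {node_id: name for node_id, name, _parent in rows}
    out, counter = [], 0

    def visit(node_id):
        nonlocal counter
        counter += 1
        lft = counter
        for child in sorted(children.get(node_id, [])):
            visit(child)
        counter += 1
        out.append((node_id, names[node_id], lft, counter))

    for root in sorted(children.get(None, [])):
        visit(root)
    return out


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nested_category "
             "(category_id INTEGER PRIMARY KEY, name TEXT, lft INTEGER, rgt INTEGER)")
conn.executemany("INSERT INTO nested_category VALUES (?, ?, ?, ?)",
                 to_nested_set(ADJACENCY))

# Descendants are the nodes whose lft lies inside the parent's [lft, rgt] range.
SUBTREE = """
    SELECT node.name
    FROM nested_category AS node, nested_category AS parent
    WHERE node.lft BETWEEN parent.lft AND parent.rgt
      AND parent.name = ?
    ORDER BY node.lft
"""
print([name for (name,) in conn.execute(SUBTREE, ("PORTABLE ELECTRONICS",))])
```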

Related

Best way of representing hierarchical tasks/sub-tasks (MySQL/PHP)

I am making a simple to-do app (Laravel 4/MySQL) that needs to support tasks and subtasks (limited to a maximum of 3 levels).
I was checking out Nested Set implementations for Laravel here and here. Is that overkill for my requirements?
I'm guessing nested sets store hierarchy data globally (rather than, say, per user or per project) and are better suited to items like a multilevel menu with a limited number of items.
What is the best implementation for my case, where hundreds of users would have a multitude of projects, each with hundreds of multilevel tasks/sub-tasks? Would there be unnecessary traversals or overhead if I implemented nested sets for my case?
I recommend reading the Managing Hierarchical Data in MySQL article.
Briefly:
CREATE TABLE category(
category_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(20) NOT NULL,
parent INT DEFAULT NULL
);
INSERT INTO category VALUES(1,'ELECTRONICS',NULL),(2,'TELEVISIONS',1),(3,'TUBE',2),
(4,'LCD',2),(5,'PLASMA',2),(6,'PORTABLE ELECTRONICS',1),(7,'MP3 PLAYERS',6),(8,'FLASH',7),
(9,'CD PLAYERS',6),(10,'2 WAY RADIOS',6);
SELECT * FROM category ORDER BY category_id;
+-------------+----------------------+--------+
| category_id | name                 | parent |
+-------------+----------------------+--------+
|           1 | ELECTRONICS          |   NULL |
|           2 | TELEVISIONS          |      1 |
|           3 | TUBE                 |      2 |
|           4 | LCD                  |      2 |
|           5 | PLASMA               |      2 |
|           6 | PORTABLE ELECTRONICS |      1 |
|           7 | MP3 PLAYERS          |      6 |
|           8 | FLASH                |      7 |
|           9 | CD PLAYERS           |      6 |
|          10 | 2 WAY RADIOS         |      6 |
+-------------+----------------------+--------+
The query to retrieve the whole tree:
SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS';
Retrieving only the leaf names:
SELECT t1.name FROM
category AS t1 LEFT JOIN category as t2
ON t1.category_id = t2.parent
WHERE t2.category_id IS NULL;
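To check the leaf query against the sample data, here it is run through Python's sqlite3 module (SQLite spells the key INTEGER PRIMARY KEY where MySQL uses INT AUTO_INCREMENT PRIMARY KEY; the ORDER BY is added only to make the output order predictable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        category_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        parent INTEGER DEFAULT NULL
    );
    INSERT INTO category VALUES
        (1, 'ELECTRONICS', NULL), (2, 'TELEVISIONS', 1), (3, 'TUBE', 2),
        (4, 'LCD', 2), (5, 'PLASMA', 2), (6, 'PORTABLE ELECTRONICS', 1),
        (7, 'MP3 PLAYERS', 6), (8, 'FLASH', 7), (9, 'CD PLAYERS', 6),
        (10, '2 WAY RADIOS', 6);
""")

# A leaf is a row that no other row names as its parent,
# so the LEFT JOIN finds no t2 row and t2.category_id stays NULL.
LEAVES = """
    SELECT t1.name
    FROM category AS t1
    LEFT JOIN category AS t2 ON t1.category_id = t2.parent
    WHERE t2.category_id IS NULL
    ORDER BY t1.category_id
"""
print([name for (name,) in conn.execute(LEAVES)])
# ['TUBE', 'LCD', 'PLASMA', 'FLASH', 'CD PLAYERS', '2 WAY RADIOS']
```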
Retrieving a single path:
SELECT t1.name AS lev1, t2.name as lev2, t3.name as lev3, t4.name as lev4
FROM category AS t1
LEFT JOIN category AS t2 ON t2.parent = t1.category_id
LEFT JOIN category AS t3 ON t3.parent = t2.category_id
LEFT JOIN category AS t4 ON t4.parent = t3.category_id
WHERE t1.name = 'ELECTRONICS' AND t4.name = 'FLASH';
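The path query can be checked the same way. This sketch (Python sqlite3, same sample data, with MP3 PLAYERS under PORTABLE ELECTRONICS as in the table above) binds the root and leaf names as parameters; since FLASH sits four levels down, it has to be matched on t4:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (
        category_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        parent INTEGER DEFAULT NULL
    );
    INSERT INTO category VALUES
        (1, 'ELECTRONICS', NULL), (2, 'TELEVISIONS', 1), (3, 'TUBE', 2),
        (4, 'LCD', 2), (5, 'PLASMA', 2), (6, 'PORTABLE ELECTRONICS', 1),
        (7, 'MP3 PLAYERS', 6), (8, 'FLASH', 7), (9, 'CD PLAYERS', 6),
        (10, '2 WAY RADIOS', 6);
""")

# Walk down four fixed levels, keeping only the row that ends at the leaf.
PATH = """
    SELECT t1.name, t2.name, t3.name, t4.name
    FROM category AS t1
    LEFT JOIN category AS t2 ON t2.parent = t1.category_id
    LEFT JOIN category AS t3 ON t3.parent = t2.category_id
    LEFT JOIN category AS t4 ON t4.parent = t3.category_id
    WHERE t1.name = ? AND t4.name = ?
"""
print(conn.execute(PATH, ("ELECTRONICS", "FLASH")).fetchall())
# [('ELECTRONICS', 'PORTABLE ELECTRONICS', 'MP3 PLAYERS', 'FLASH')]
```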
My answer would be a closure table.
I read about a few ways to solve hierarchies in the book SQL Antipatterns (https://books.google.com/books/about/SQL_Antipatterns.html?id=Ghr4RAAACAAJ). I definitely recommend reading the book.
My favorite way to implement hierarchies is via closure tables. This is a great source that explains them in depth: http://technobytz.com/closure_table_store_hierarchical_data.html.
To summarize: make one table that keeps track of the actual items in the hierarchy (e.g. task_id, task_description, time_opened, etc.) and another table to track the relations. This second table should have columns such as task_id and parent_task_id. The key trick is to store every ancestor-descendant relation, not just the direct parent-child ones. So if Task 1 has a child task, Task 2, and Task 2 has a child task, Task 3, store the relation between Task 1 and Task 2 as well as the one between Task 1 and Task 3 (along with the direct one between Task 2 and Task 3).
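A minimal sketch of the closure table idea, in Python with sqlite3. The table and column names (task, task_tree, ancestor_id, descendant_id, depth) are hypothetical: the description above speaks of task_id and parent_task_id, and the depth column is an optional extra. Each new task copies its parent's ancestor rows one level deeper, plus a row pairing the task with itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task (
        task_id INTEGER PRIMARY KEY,
        description TEXT NOT NULL
    );
    -- Hypothetical closure table: one row per (ancestor, descendant) pair,
    -- including each task paired with itself at depth 0.
    CREATE TABLE task_tree (
        ancestor_id   INTEGER NOT NULL REFERENCES task(task_id),
        descendant_id INTEGER NOT NULL REFERENCES task(task_id),
        depth         INTEGER NOT NULL,
        PRIMARY KEY (ancestor_id, descendant_id)
    );
""")


def add_task(conn, task_id, description, parent_id=None):
    """Insert a task and copy its parent's ancestor rows, one level deeper."""
    conn.execute("INSERT INTO task VALUES (?, ?)", (task_id, description))
    conn.execute("INSERT INTO task_tree VALUES (?, ?, 0)", (task_id, task_id))
    if parent_id is not None:
        conn.execute("""
            INSERT INTO task_tree (ancestor_id, descendant_id, depth)
            SELECT ancestor_id, ?, depth + 1
            FROM task_tree WHERE descendant_id = ?
        """, (task_id, parent_id))


add_task(conn, 1, "Task 1")
add_task(conn, 2, "Task 2", parent_id=1)
add_task(conn, 3, "Task 3", parent_id=2)

# All descendants of Task 1, at any depth, with a single join.
rows = conn.execute("""
    SELECT t.description, tt.depth
    FROM task_tree AS tt JOIN task AS t ON t.task_id = tt.descendant_id
    WHERE tt.ancestor_id = 1 AND tt.depth > 0
    ORDER BY tt.depth
""").fetchall()
print(rows)  # [('Task 2', 1), ('Task 3', 2)]
```

Three tasks already need six rows in task_tree, which is the extra storage the trade-off below refers to.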
The trade-off between closure tables and nested sets is that closure tables consume more storage but need less computation for operations. This is because you store every relation between every pair of tasks (which takes space), and having all of these relationships readily available makes it faster for the RDBMS to answer questions about them.
Hope this helps!

Most efficient JOIN query - MySQL

Below is a gross oversimplification of 2 very large tables I'm working with.
campaign table
| id | uid | name | contact | pin | icon |
| 1 | 7 | bob | ted | y6w | yuy |
| 2 | 7 | ned | joe | y6e | ygy |
| 3 | 6 | sam | jon | y6t | ouy |
records table
| id | uid | cid | fname | lname | address | city | phone |
| 1 | 7 | 1 | lars | jack | 13 main | lkjh | 55555 |
| 2 | 7 | 1 | rars | jock | 10 maun | oyjh | 55595 |
| 2 | 7 | 1 | ssrs | frck | 10 eaun | oyrh | 88595 |
The page loops through the records table and prints the results to an HTML table. The existing code, for some reason, runs a separate query for each record ("select name from campaign where id = $res['cid']"). I'd like to get rid of the second query and use some kind of join, but what is the most efficient way to do it?
I need to
SELECT * FROM records
and also
SELECT name FROM campaigns WHERE campaigns.id = records.cid
in a single query.
How can I do this efficiently?
Simply join the two tables. You already have the required WHERE condition. Select all columns from one but only one column from the other. Like this:
SELECT records.*, campaigns.name
FROM records, campaigns
WHERE campaigns.id = records.cid
Note that a record row without matching campaign will get lost. To avoid that, rephrase your query like this:
SELECT records.*, campaigns.name
FROM records LEFT JOIN campaigns
ON campaigns.id = records.cid
Now you'll get NULL names instead of missing rows.
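A quick way to see the difference is to run both queries against a trimmed-down, hypothetical copy of the tables (only a few columns, plus one made-up record whose cid matches no campaign). This uses Python's sqlite3 module; the join semantics are the same in MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE campaigns (id INTEGER PRIMARY KEY, uid INTEGER, name TEXT);
    CREATE TABLE records (id INTEGER PRIMARY KEY, uid INTEGER, cid INTEGER, fname TEXT);
    INSERT INTO campaigns VALUES (1, 7, 'bob'), (2, 7, 'ned'), (3, 6, 'sam');
    INSERT INTO records VALUES (1, 7, 1, 'lars'), (2, 7, 1, 'rars'),
                               (3, 7, 99, 'orphan');  -- cid 99 has no campaign
""")

# Implicit (comma) join: rows without a matching campaign are dropped.
inner = conn.execute("""
    SELECT records.id, campaigns.name
    FROM records, campaigns
    WHERE campaigns.id = records.cid
    ORDER BY records.id
""").fetchall()

# LEFT JOIN: every record is kept, with NULL (None) where nothing matched.
left = conn.execute("""
    SELECT records.id, campaigns.name
    FROM records LEFT JOIN campaigns
    ON campaigns.id = records.cid
    ORDER BY records.id
""").fetchall()

print(inner)  # [(1, 'bob'), (2, 'bob')]
print(left)   # [(1, 'bob'), (2, 'bob'), (3, None)]
```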
The "most efficient" part is where the answer becomes very tricky. Generally a great way to do this would be to simply write a query with a join on the two tables and happily skip away singing songs about kittens. However, it really depends on a lot more factors. How big are the tables? Are they indexed on the right columns for the query? When the query runs, how many records are generated? Are the results being ordered in the query?
This is where it starts being a bit of an art rather than a science. Have a look at the EXPLAIN plan, understand what is happening, and look for ways to make it more efficient or simpler. Sometimes running two subqueries in the FROM clause, each generating only a subset of the data, is much more efficient than joining the entire tables and selecting the data you need from there.
Answering this in more detail, and accurately for your particular case, would need a LOT more information.
If I had to guess at some of these things in your database, I would suggest using a simple join if your tables have fewer than a few million rows and your database performance is decent. If you are re-running the EXACT same query multiple times, even a slow query can be cached by MySQL VERY nicely (via the query cache, which was removed in MySQL 8.0), so look at that as well. I have an application running on a poorly specced machine, where I wrote a cron job that simply runs a few queries against the new data loaded overnight, and all my users think the queries are instant because I make sure they are cached. Sometimes it is the little tricks that really pay off.
Lastly, if you are just starting out with SQL, or aren't yet as familiar with it as you might eventually become, you might want to read this Q&A that I wrote, which covers a lot of basic to intermediate topics on queries, such as joins, subqueries, aggregate queries and a lot more that is worth knowing.
You can use this query
SELECT records.*, campaigns.name
FROM records, campaigns
WHERE campaigns.id = records.cid
But it's much better to use INNER JOIN (the explicit join syntax introduced in the ANSI SQL-92 standard), because it's more readable and you can easily replace INNER with LEFT or other types of join.
SELECT records.*, campaigns.name
FROM records INNER JOIN campaigns
ON campaigns.id = records.cid
More explanation here:
SQL Inner Join. ON condition vs WHERE clause
INNER JOIN ON vs WHERE clause
SELECT *
FROM records
LEFT JOIN campaigns
on records.cid = campaigns.id;
Using a LEFT JOIN instead of an INNER JOIN guarantees that you will still list every row of records.

Faceted Search (solr) vs Good old filtering via PHP?

I am planning on setting up a filter system (refine your search) in my ecommerce stores. You can see an example here: http://www.bettymills.com/shop/product/find/Air+and+HVAC+Filters
Platforms such as PrestaShop, OpenCart and Magento have what's called a Layered Navigation.
My question is what is the difference between the Layered Navigation in platforms such as Magento or PrestaShop in comparison to using something like Solr or Lucene for faceted navigation.
Can a similar result be accomplished via just php and mysql?
A detailed explanation is much appreciated.
Layered Navigation == Faceted Search.
They are the same thing, but Magento and others use different wording, probably to be catchy. As far as I know, Magento supports both Solr-based and MySQL-based faceted search. The main difference is performance.
Performance is the main trade-off.
To do faceted search in MySQL requires you to join tables, while Solr indexes the document facets automatically for filtering. You can generally achieve fast response times using Solr (<100ms for a multi-facet search query) on average hardware. While MySQL will take longer for the same search, it can be optimized with indexes to achieve similar response times.
The downside to Solr is that it requires you to configure, secure and run yet another service on your server. It can also be pretty CPU and memory intensive depending on your configuration (Tomcat, Jetty, etc.).
Faceted search in PHP/MySQL is possible, and not as hard as you'd think.
You need a specific database schema, but it's feasible. Here's a simple example:
product
+----+------------+
| id | name       |
+----+------------+
|  1 | blue paint |
|  2 | red paint  |
+----+------------+
classification
+----+----------+
| id | name     |
+----+----------+
|  1 | color    |
|  2 | material |
|  3 | dept     |
+----+----------+
product_classification
+------------+-------------------+-------+
| product_id | classification_id | value |
+------------+-------------------+-------+
|          1 |                 1 | blue  |
|          1 |                 2 | latex |
|          1 |                 3 | paint |
|          1 |                 3 | home  |
|          2 |                 1 | red   |
|          2 |                 2 | latex |
|          2 |                 3 | paint |
|          2 |                 3 | home  |
+------------+-------------------+-------+
So, say someone searches for paint; you'd do something like:
SELECT p.* FROM product p WHERE name LIKE '%paint%';
This would return both entries from the product table.
Once your search has executed, you can fetch the associated facets (filters) of your result using a query like this one:
SELECT c.id, c.name, pc.value FROM product p
LEFT JOIN product_classification pc ON pc.product_id = p.id
LEFT JOIN classification c ON c.id = pc.classification_id
WHERE p.name LIKE '%paint%'
GROUP BY c.id, pc.value
ORDER BY c.id;
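Here is the same schema and facet query run through Python's sqlite3 module, as a runnable sketch. Two small deviations from the SQL above, both my own choices: c.name is also listed in the GROUP BY for portability, and pc.value is added to the ORDER BY so the output order is predictable. The search term is bound as a parameter instead of being pasted into the SQL string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE classification (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_classification (
        product_id INTEGER, classification_id INTEGER, value TEXT
    );
    INSERT INTO product VALUES (1, 'blue paint'), (2, 'red paint');
    INSERT INTO classification VALUES (1, 'color'), (2, 'material'), (3, 'dept');
    INSERT INTO product_classification VALUES
        (1, 1, 'blue'), (1, 2, 'latex'), (1, 3, 'paint'), (1, 3, 'home'),
        (2, 1, 'red'),  (2, 2, 'latex'), (2, 3, 'paint'), (2, 3, 'home');
""")

# One row per distinct (facet, value) pair among the matching products.
FACETS = """
    SELECT c.id, c.name, pc.value
    FROM product p
    LEFT JOIN product_classification pc ON pc.product_id = p.id
    LEFT JOIN classification c ON c.id = pc.classification_id
    WHERE p.name LIKE ?
    GROUP BY c.id, c.name, pc.value
    ORDER BY c.id, pc.value
"""
for facet in conn.execute(FACETS, ("%paint%",)):
    print(facet)
```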
This'll give you something like:
+------+----------+-------+
| id   | name     | value |
+------+----------+-------+
|    1 | color    | blue  |
|    1 | color    | red   |
|    2 | material | latex |
|    3 | dept     | home  |
|    3 | dept     | paint |
+------+----------+-------+
So, from your result set, you know that there are products whose color is blue or red, that the only material they are made from is latex, and that they can be found in the departments home and paint.
Once a user selects a facet, just modify the original search query:
SELECT p.* FROM product p
LEFT JOIN product_classification pc ON pc.product_id = p.id
WHERE
p.name LIKE '%paint%' AND (
(pc.classification_id = 1 AND pc.value = 'blue') OR
(pc.classification_id = 3 AND pc.value = 'home')
)
GROUP BY p.id
HAVING COUNT(p.id) = 2;
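So, here the user is searching for the keyword paint and has selected two facets: blue for color and home for department. HAVING COUNT(p.id) = 2 keeps only the products that matched both facets, so the 2 must always equal the number of facets the user has selected. This'll give you: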
+----+------------+
| id | name       |
+----+------------+
|  1 | blue paint |
+----+------------+
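A sketch of how the filter query might be built for any number of selected facets (Python with sqlite3; the helper name search and the (classification_id, value) pair format are hypothetical, and only the two tables this query touches are created). The HAVING value is bound to the number of selected facets, so a product only survives if it matched every one of them. A plain JOIN is used instead of LEFT JOIN; since the WHERE clause filters on pc columns, the two behave the same here. It assumes each (product_id, classification_id, value) row is unique, since duplicates would inflate the count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_classification (
        product_id INTEGER, classification_id INTEGER, value TEXT
    );
    INSERT INTO product VALUES (1, 'blue paint'), (2, 'red paint');
    INSERT INTO product_classification VALUES
        (1, 1, 'blue'), (1, 2, 'latex'), (1, 3, 'paint'), (1, 3, 'home'),
        (2, 1, 'red'),  (2, 2, 'latex'), (2, 3, 'paint'), (2, 3, 'home');
""")


def search(conn, keyword, selected):
    """Hypothetical helper: products matching keyword that have ALL selected facets.

    selected is a list of (classification_id, value) pairs.
    """
    pattern = f"%{keyword}%"
    if not selected:
        return conn.execute(
            "SELECT id, name FROM product WHERE name LIKE ? ORDER BY id",
            (pattern,)).fetchall()
    # One OR-ed condition per selected facet.
    conditions = " OR ".join(
        "(pc.classification_id = ? AND pc.value = ?)" for _ in selected)
    sql = f"""
        SELECT p.id, p.name
        FROM product p
        JOIN product_classification pc ON pc.product_id = p.id
        WHERE p.name LIKE ? AND ({conditions})
        GROUP BY p.id, p.name
        HAVING COUNT(*) = ?
        ORDER BY p.id
    """
    # The HAVING count must equal the number of facets the user picked.
    params = [pattern] + [x for pair in selected for x in pair] + [len(selected)]
    return conn.execute(sql, params).fetchall()


print(search(conn, "paint", [(1, "blue"), (3, "home")]))  # [(1, 'blue paint')]
print(search(conn, "paint", [(3, "home")]))  # [(1, 'blue paint'), (2, 'red paint')]
```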
So, in conclusion. Although it's available out-of-the-box in Solr, it's possible to implement it in SQL fairly easily.
Magento Enterprise Edition has an implementation of Solr with faceted search. You still need to configure Solr to index the correct data: Solr runs on Java on a host with a specific port, and Magento connects to it through a given URL. When Magento sets up the faceted search, it sends a request to Solr and processes the received XML into a form on the frontend.
The difference is one of speed. Requests to Solr are very fast. If you have about 100,000+ products in your shop and want quick responses to search requests, you can use Solr. But if you have a separate server with a lot of memory for the Magento database, you can also just use Magento's built-in MySQL-based faceted search. If you don't have money to spend on Magento EE, you can use this Solr implementation, but I do not have any experience with it.
Out of the box, Solr lets you use calculated facets and range facets, select a facet or exclude one, and declare whether a facet is single-valued or multi-valued, all at a very low CPU/RAM cost.
On the other hand, it takes some time to configure and secure the Solr installation, and it also takes some time to index your data.
You can create faceted search with just PHP and MySQL; Drupal Faceted Search is a good example. But if you already use Solr, you get faceted search included for free.

How to do an IF THEN Statement in MySQL?

MySQL (table1):
+----+--------+-------+--------+
| id | itemid | title | status |
+----+--------+-------+--------+
|  1 |      2 | title |      0 |
+----+--------+-------+--------+
|  2 |      2 | title |      1 |
+----+--------+-------+--------+
|  3 |      3 | title |      1 |
+----+--------+-------+--------+
|  4 |      3 | title |      0 |
+----+--------+-------+--------+
|  5 |      3 | title |      0 |
+----+--------+-------+--------+
MySQL (table2):
+----+---+---+
| id | x | y |
+----+---+---+
|  1 | 1 | 2 |
+----+---+---+
PHP:
(I know the query below makes no sense, but it should just illustrate what I am trying to do here.)
$a = mysql_query("SELECT t1.title FROM table1 AS t1 WHERE t1.title = 'title' ...IF t1.status = 1 THEN (SELECT * FROM table2 AS t2 WHERE t2.x = '1' AND t2.y = t1.itemid) ORDER BY `id`");
while($b = mysql_fetch_assoc($a))
{
echo $b['title'];
}
So, what I want to do is:
1) Get all rows from table1 that match title = 'title'
2) If the status in table1 is 0, do nothing, just display the data
3) However, if the status in table1 is 1, check whether there is a record in table2 where x = 1 and y = itemid (from table1); if there isn't, the row from table1 should be excluded
Example:
In the example above, it should display ids (table1) 1, 2, 4 and 5. Id 3 should be excluded, because its status is 1 and there is no matching record in table2.
I hope this makes sense :/
You should use a JOIN for this:
MySQL JOIN
or have a look at how to use control flow functions in SELECT statements:
Control flow functions
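For this question, the join suggestion could look like the sketch below (Python's sqlite3 module with the sample data from the question, assuming the single table2 row has id 1). The LEFT JOIN looks for a matching table2 row, and the WHERE clause keeps a table1 row if its status is 0 or a match was found:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, itemid INTEGER, title TEXT, status INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER);
    INSERT INTO table1 VALUES (1, 2, 'title', 0), (2, 2, 'title', 1), (3, 3, 'title', 1),
                              (4, 3, 'title', 0), (5, 3, 'title', 0);
    INSERT INTO table2 VALUES (1, 1, 2);  -- id 1 is assumed
""")

# status 0 rows always pass; status 1 rows pass only if the LEFT JOIN found a match.
rows = conn.execute("""
    SELECT t1.id, t1.title
    FROM table1 AS t1
    LEFT JOIN table2 AS t2 ON t2.x = 1 AND t2.y = t1.itemid
    WHERE t1.title = 'title'
      AND (t1.status = 0 OR t2.id IS NOT NULL)
    ORDER BY t1.id
""").fetchall()
print([row[0] for row in rows])  # [1, 2, 4, 5]
```

If table2 can hold several matching rows for the same itemid, this join returns duplicate table1 rows; a WHERE t1.status = 0 OR EXISTS (SELECT 1 FROM table2 ...) subquery is an alternative that avoids that.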
I would just like to point out that if you've got any if-then-else logic, or any further processing of the data stored in your database, you should use a programming language for that. There's nothing stopping you from writing all this logic in PHP, Java, C#, etc. There are far too many questions here about if-then-else logic when retrieving data from a database.
Some database management systems have their own dialect for programmatic SQL, be it PL/SQL or T-SQL for Oracle/SQL Server respectively... You could create your own modules with logic (packages, stored procedures) in your database and then use those, but then... do you really need to? Why could you not just implement all the data-presentation logic in your PHP script?
I'd like to expand on the previous answers a bit with some conceptual background: remember that SQL is declarative, not procedural. What this means is that you basically use SQL to tell the database what you want your result table to look like, and it figures out how to give it to you. When you're thinking about if/then statements and control logic, you're thinking in procedural terms, not declarative terms. This is why the previous answerer is suggesting to do your if/else logic in a 'programming language' like C# or PHP.
That's pretty abstract and so might not be directly applicable for you right this moment, but I find it helpful to compartmentalize things into 'procedural' and 'declarative' when I'm working with SQL and a scripting language (like PHP).
This Stack Overflow question has some pretty good answers on the procedural vs. declarative concept.

Sub categories Hierarchy

I designed a SQL structure to represent categories and their subcategories.
I have 3 tables:
articles
articles_categories
categories
Articles table:
id,title,content
Categories table:
id, title, parent_id
articles_categories:
id,article_id,category_id
No problem with the SQL, but now, let's say I'm on article id 5.
Article 5 has 3 categories: 2 of them have parents, and the top-level one has '0' as its parent.
How do I fetch them all efficiently (let's say for breadcrumbs)?
Thanks!
Unless the depth of the category hierarchy is fixed, you cannot do this in a single MySQL query with your current model (adjacency list), at least before MySQL 8.0, which added recursive common table expressions. You'd have to traverse the hierarchy using several SQL statements in a loop.
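A sketch of that loop for the breadcrumb case (Python with sqlite3, using the categories table and the first/second/third sample rows from the other answer on this question; breadcrumb is a hypothetical helper). It issues one query per level and stops at parent_id 0; the seen set is a guard against accidental cycles in the data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, title TEXT, parent_id INTEGER);
    INSERT INTO categories VALUES (1, 'first', 0), (2, 'second', 1), (3, 'third', 2);
""")


def breadcrumb(conn, category_id):
    """Walk parent_id links upward, one query per level, until parent_id is 0."""
    trail, seen = [], set()
    while category_id and category_id not in seen:  # 0 ends the walk
        seen.add(category_id)
        row = conn.execute("SELECT title, parent_id FROM categories WHERE id = ?",
                           (category_id,)).fetchone()
        if row is None:  # dangling parent_id
            break
        title, category_id = row
        trail.append(title)
    return list(reversed(trail))  # root first


print(" > ".join(breadcrumb(conn, 3)))  # first > second > third
```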
If the category hierarchy is fairly static, you can "precompute" the tree by using:
Path enumeration
Nested sets
Closure table
All of the above trade write performance for read performance.
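To illustrate path enumeration, a minimal sketch (Python with sqlite3). The path column and its 'id/id/' format are assumptions: each row stores its ancestors' IDs plus its own, so the breadcrumb for a category is every row whose path is a prefix of its path. The trailing slash keeps '1/' from matching '12/'. MySQL treats || as logical OR by default, so there you would write CONCAT(anc.path, '%'). The write-side cost is that moving a subtree means rewriting the paths of all its descendants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical path column: ancestor ids plus the row's own id, e.g. '1/2/3/'.
    CREATE TABLE categories (id INTEGER PRIMARY KEY, title TEXT, path TEXT NOT NULL);
    INSERT INTO categories VALUES
        (1, 'first', '1/'), (2, 'second', '1/2/'), (3, 'third', '1/2/3/'),
        (4, 'other', '4/');
""")

# Ancestors (and the node itself) are the rows whose path prefixes the node's path.
BREADCRUMB = """
    SELECT anc.title
    FROM categories AS node
    JOIN categories AS anc ON node.path LIKE anc.path || '%'
    WHERE node.id = ?
    ORDER BY LENGTH(anc.path)
"""
print([title for (title,) in conn.execute(BREADCRUMB, (3,))])  # ['first', 'second', 'third']
```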
Google or search SO for any of the above and you will find examples of how to implement it.
Quite often, I find that storing the data in an adjacency list (because it best matches the data model) and caching a copy of the tree in the application is good enough, but that depends on your requirements of course :)
This should do the job:
select * from articles_categories
left join categories on categories.id = articles_categories.category_id
where article_id=1;
+------+------------+-------------+------+--------+-----------+
| id   | article_id | category_id | id   | title  | parent_id |
+------+------------+-------------+------+--------+-----------+
| NULL |          1 |           1 |    1 | first  |         0 |
| NULL |          1 |           2 |    2 | second |         1 |
| NULL |          1 |           3 |    3 | third  |         2 |
+------+------------+-------------+------+--------+-----------+
Additionally, I would remove the "id" column from the associative table articles_categories.

Categories