I need some help please! Basically I have a system that has an unlimited number of categories, and it works through unique IDs: the system finds the root folder and matches all subfolders based on their parent's UID, in an endless loop...
But now I want to do the opposite of that in a single MySQL statement (if possible).
Basically I want it to do this.. (By the way this isn't my actual code, it's just how I want it to work)
SELECT UID FROM Table
WHERE UID = 'value'
--AND ALSO:
SELECT * FROM SameTable
WHERE UID = The Parent UID just fetched...
And do this until the UID = 'Specified Value'.
I seriously hope that makes sense!
Is it even possible? I could do it using multiple queries in a PHP loop I know, but that just feels like a long way around, and bad practice.
What you have is called "hierarchical data", and it is worth reading up on. In short, there are three main ways to represent it in a relational table:
Adjacency list (what you have). Walking the whole tree is hard to do in a single query.
Materialized path (my favorite). Natural and readable, though not the most efficient.
Nested set. The most complicated, yet the most powerful.
You can choose any system you like or stick with your current one. A single query is not a Holy Grail to pursue at any cost.
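For the adjacency list case specifically, a recursive common table expression can walk from a node up to the root in one statement. MySQL supports WITH RECURSIVE from version 8.0 onwards. The sketch below uses Python's sqlite3 module as a stand-in for MySQL so that it runs anywhere; the table name categories, its columns and the sample rows are assumptions made for illustration, not the asker's schema.

```python
import sqlite3

# Hypothetical adjacency-list table: each row points at its parent's uid.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (uid INTEGER PRIMARY KEY, parent_uid INTEGER, name TEXT);
    INSERT INTO categories VALUES
        (1, NULL, 'root'),
        (2, 1, 'hardware'),
        (3, 2, 'laptops'),
        (4, 3, 'ultrabooks');
""")

def ancestors(conn, start_uid):
    """Return (uid, name) rows from start_uid up to the root, nearest first."""
    return conn.execute(
        """
        WITH RECURSIVE chain(uid, parent_uid, name, depth) AS (
            -- anchor row: the node we start from
            SELECT uid, parent_uid, name, 0 FROM categories WHERE uid = ?
            UNION ALL
            -- recursive step: fetch the parent of the row found last
            SELECT c.uid, c.parent_uid, c.name, chain.depth + 1
            FROM categories AS c
            JOIN chain ON c.uid = chain.parent_uid
        )
        SELECT uid, name FROM chain ORDER BY depth
        """,
        (start_uid,),
    ).fetchall()

print(ancestors(conn, 4))
```

On older MySQL versions without recursive CTEs, the loop in application code remains the straightforward option.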
I wonder if it is possible to query a specific part of a comma separated string, something like the following:
$results = mysql_query("SELECT * FROM table1 WHERE $pid=table1.recordA[2] ",$con);
$pid is a number
and recordA contains data like
34,9008,606,,416,2
where I want to check the third part (606)
Thank you in advance
Storing comma separated lists, or any other delimited data, within a single MySQL field is frowned upon and is to all intents and purposes bad practice.
Rather than querying an element of a delimited list within a MySQL field, consider breaking the field out into its own table and linking it back to table1 in a 1:many relationship between table1 and its associated values.
If you are committed to this route, the simplest method would be to handle it in PHP, as MySQL has very few tools (beyond regex / text searches) for drilling down to the data you want to extract. $results = explode(',',$query); would create an array of your values from the returned field, allowing you to run as many conditional checks against it as needed.
However, consider adding this to your 'need to re-write / re-think' list. A relational table structure would allow you to query the database for $pid's value directly, as it would be contained within its own field and linked.
If the delimited list is of indeterminate length, or the relationships between the values are hierarchical, you'd be better off searching Stack Overflow for information on Directed Acyclic Graphs in MySQL to find a better solution to the problem.
Without knowing the nature or the intended purpose for this script I can't answer in any more detail. I hope this has helped a little.
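To make the "break the field into its own table" suggestion concrete, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL; the child table table1_values, its columns and the second sample row are hypothetical names and data chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, recordA TEXT);
    -- Hypothetical child table: one row per slot of the old list.
    CREATE TABLE table1_values (
        record_id INTEGER NOT NULL REFERENCES table1(id),
        position  INTEGER NOT NULL,   -- 1-based slot in the old list
        value     INTEGER,            -- NULL for an empty slot
        PRIMARY KEY (record_id, position)
    );
    INSERT INTO table1 VALUES (1, '34,9008,606,,416,2'), (2, '606,7,8');
""")

# One-off migration: split each comma separated string into rows.
for record_id, record_a in conn.execute("SELECT id, recordA FROM table1").fetchall():
    for position, part in enumerate(record_a.split(","), start=1):
        value = int(part) if part else None
        conn.execute(
            "INSERT INTO table1_values VALUES (?, ?, ?)",
            (record_id, position, value),
        )

def records_with_value_at(conn, position, pid):
    """Ids of table1 rows whose list holds pid in the given slot."""
    rows = conn.execute(
        "SELECT record_id FROM table1_values "
        "WHERE position = ? AND value = ? ORDER BY record_id",
        (position, pid),
    ).fetchall()
    return [row[0] for row in rows]

print(records_with_value_at(conn, 3, 606))
```

With the values in their own rows, the "third part equals $pid" check becomes an ordinary WHERE clause that an index on (position, value) could serve.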
How about this:
SELECT * FROM table1 WHERE FIND_IN_SET({$pid}, recordA) = 3
Note that an index on recordA will not help this query, because FIND_IN_SET has to be evaluated against every row. I love normalization as much as the next guy, but sometimes breaking it up is just more trouble than it's worth ;)
Meet Jimmy. He has made his new life goal to prove that Chocolate is the best ice cream flavor ever. For this he built a simple form with radio buttons and a text field for the name so he can send the link to his friends.
He is using a very common set up, MySQL and PHP to save the form submissions in a table that looks like this:
selection being the id of the flavor. The flavors are stored in a PHP array because he plans to use the flavor list in future pages:
$flavors = array(
1=>"Chocolate",
2=>"Cherry",
....
);
The form was a success and his friends are starting to ask Jimmy to add new options, so Jimmy has decided to take it to the next level and add country, age, email and other things to the form. This time, though, he is unsure whether it is better to put the flavor names, countries, ages and other static data in arrays or to save each of them in a database table. He knows how to do joins in queries anyway.
The first approach would mean having a PHP file with many arrays and having to access it every time Jimmy needs the flavor name, something like:
$query = mysql_query("SELECT name, flavor FROM votes");
while($row = mysql_fetch_assoc($query)){
echo $row["name"]." - ".$flavors[$row["flavor"]];
}
Second approach would mean having many tables in the database and having to do a join every time he needs a name like this:
// flavors.name is an assumed column holding the flavor's label
$query = mysql_query("SELECT votes.name, flavors.name AS flavor_name FROM votes LEFT JOIN flavors
ON votes.flavor = flavors.flavor");
while($row = mysql_fetch_assoc($query)){
echo $row["name"]." - ".$row["flavor_name"];
}
Although there seems to be little difference this is an important decision for Jimmy as he wants to build many more and bigger forms in the future.
What is the best way for Jimmy to handle static data like flavor names, countries, age groups, etc. that is associated with IDs in the database?
Given environmental details:
The arrays are static and will almost never change
He will be using the data on several pages so hard coding is not convenient
Adding a new array is usually faster
Thanks in advance for helping him out.
The second option is of course more scalable and better in almost every respect. The only argument against it could be performance, given that his data will eventually get really big. But even at that point Jimmy can easily cache the result from the new flavors table, using a technology like Memcached or XCache, or even write code that generates the PHP file with the flavors array dynamically from the flavors table. I am very confused as to why someone with your reputation would ask such a question?!
I think Jimmy should be thinking about how he wants to lay his data out. Why limit his conquest to flavors?
If Jimmy is looking to store a lot of small bits of data, databases are the way to go. If Jimmy wants to store images of the items, those should be stored in files, and he should store their location relative to some root directory in the database.
Maybe one table can contain:
VOTE_ITEMS
ID - PRIMARY KEY
NAME
IMAGE
TAGS - (Maybe an imploded ID array with the IDs pointing to a TAG table)
...
Another table can contain:
USERS
ID - PRIMARY KEY
...
(As much information as you want to collect from your users)
...
On to voting:
POLLS
ID
VOTE_ITEM_IDS
...
USER_VOTES
POLLS_ID
VOTE_ITEM
USER_ID
Since Jimmy seems to know a lot about databases, any time he wants to add something he can just add another column (or table), depending on his needs. Also, if he wraps this in a sweet user system he can reuse it in other projects in the future!
I tend to store values like this in db tables, mainly so they can be modified via CMS.
Then I retrieve them all at once, only once, near the beginning of my PHP code, in a globals array ... e.g. $glob['flavors'], $glob['cities'], etc. Then it's as simple as ...
foreach ($people as $person) { // $people: assumed array of rows fetched earlier
echo 'Their flavor = '. $glob['flavors'][$person['flavor_id']];
}
... but you have to remember to include the global in any functions that will use it.
Benefits of this: Only one db lookup, global access.
Drawbacks of this: Memory hog if array is huge
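The "retrieve them all at once, only once" idea can be sketched as follows. This uses Python's sqlite3 module as a stand-in for PHP and MySQL; the table names, the flavor_id column and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flavors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE votes (name TEXT NOT NULL, flavor_id INTEGER REFERENCES flavors(id));
    INSERT INTO flavors VALUES (1, 'Chocolate'), (2, 'Cherry');
    INSERT INTO votes VALUES ('Ann', 1), ('Bob', 2), ('Cy', 1);
""")

ALLOWED_LOOKUPS = {"flavors"}  # fixed whitelist, never taken from user input

def load_lookup(conn, table):
    """Read a small id -> name table once and return it as a dict."""
    if table not in ALLOWED_LOOKUPS:
        raise ValueError("unknown lookup table: " + table)
    return dict(conn.execute("SELECT id, name FROM " + table))

# Loaded a single time near the start of the request, then reused everywhere.
lookups = {"flavors": load_lookup(conn, "flavors")}

lines = [
    voter + " - " + lookups["flavors"][flavor_id]
    for voter, flavor_id in conn.execute("SELECT name, flavor_id FROM votes ORDER BY name")
]
print(lines)
```

Passing the lookups dict into functions explicitly, rather than relying on a global, sidesteps the "remember to include the global" problem mentioned above.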
I'm using PHP and MySQL. I have records for:
events with various "event types" that are hierarchical (events can have multiple categories and subcategories, but there are a fixed amount of such categories and subcategories) (timestamped)
What is the best way to set up the table? Should I have a bunch of columns (30 or so) with enums for yes or no indicating membership in that category? Or should I use the MySQL SET datatype?
http://dev.mysql.com/tech-resources/articles/mysql-set-datatype.html
Basically I have performance in mind and I want to be able to retrieve all of the ids of the events for a given category. Just looking for some insight on the most efficient way to do this.
It sounds like you're chiefly concerned with performance.
A couple people have suggested splitting into 3 tables (category table plus either simple cross-reference table or a more sophisticated way of modeling the tree hierarchy, like nested set or materialized path), which is the first thing I thought when I read your question.
With indexes, a fully normalized approach like that (which adds two JOINs) will still have "pretty good" read performance. One issue is that an INSERT or UPDATE to an event may now also involve one or more INSERT/UPDATE/DELETEs on the cross-reference table, which on MyISAM means the cross-reference table is locked and on InnoDB means the rows are locked. So if your database is busy with a significant number of writes, you're going to have larger contention problems than if just the event rows were locked.
Personally, I would try out this fully normalized approach before optimizing. But, I'll assume you know what you're doing, that your assumptions are correct (categories never change) and you have a usage pattern (lots of writes) that calls for a less-normalized, flat structure. That's totally fine and is part of what NoSQL is about.
SET vs. "lots of columns"
So, as to your actual question "SET vs. lots of columns", I can say that I've worked with two companies with smart engineers (whose products were CRM web applications ... one was actually events management), and they both used the "lots of columns" approach for this kind of static set data.
My advice would be to think about all of the queries you will be doing on this table (weighted by their frequency) and how the indexes would work.
First, with the "lots of columns" approach you are going to need indexes on each of these columns so that you can do SELECT * FROM events WHERE CategoryX = TRUE. With the indexes, that is a super-fast query.
Versus with SET, you must use bitwise AND (&), LIKE, or FIND_IN_SET() to do this query. That means the query can't use an index and must do a linear search of all rows (you can use EXPLAIN to verify this). Slow query!
That's the main reason SET is a bad idea -- its index is only useful if you're selecting by exact groups of categories. SET works great if you'd be selecting categories by event, but not the other way around.
The primary problem with the less-normalized "lots of columns" approach (versus fully normalized) is that it doesn't scale. If you have 5 categories and they never change, fine, but if you have 500 and are changing them, it's a big problem. In your scenario, with around 30 that never change, the primary issue is that there's an index on every column, so if you're doing frequent writes, those queries become slower because of the number of indexes that have to be updated. If you choose this approach, you might want to check the MySQL slow query log to make sure there aren't outlier slow queries caused by contention at busy times of day.
In your case, if yours is a typical read-heavy web app, I think going with the "lots of columns" approach (as the two CRM products did, for the same reason) is probably sane. It is definitely faster than SET for that SELECT query.
TL;DR Don't use SET because the "select events by category" query will be slow.
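The index argument above can be checked with a small experiment. The sketch below uses Python's sqlite3 module and EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN, with an integer bitmask column standing in for SET; the table layout and column names are assumptions, and MySQL's own EXPLAIN output is formatted differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        cat_music  INTEGER NOT NULL DEFAULT 0,  -- one flag column per category
        categories INTEGER NOT NULL DEFAULT 0   -- bitmask, standing in for SET
    );
    CREATE INDEX idx_events_cat_music ON events (cat_music);
    CREATE INDEX idx_events_categories ON events (categories);
""")

def plan(conn, sql):
    """Return the human readable part of the query plan as one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Equality on a flag column: the planner can seek into the index.
flag_plan = plan(conn, "SELECT id FROM events WHERE cat_music = 1")

# Bitwise test on the mask: every row has to be inspected.
mask_plan = plan(conn, "SELECT id FROM events WHERE (categories & 4) != 0")

print(flag_plan)
print(mask_plan)
```

The first plan reports an index search and the second a scan, which is the same pattern the answer describes for a flag column versus a SET column queried with a bitwise AND.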
It's good that the number of categories is fixed. If it weren't, you couldn't use either approach.
Check the "Why You Shouldn't Use SET" section of the page you linked. I think that should give you a comprehensive guide.
I think the most important point is the one about indexes. Also, modifying a SET is slightly more complex.
The relationship between events and event types/categories is a many to many relationship, as echo says, but a simple xref table will leave you with a problem: If you want to query for all descendants of any given node, then you must make multiple recursive queries. On a deep tree, that will be very inefficient.
So when you say "retrieve all ids for a given category", if you do mean all descendants, then you want to use a Nested Set Model:
http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
The Nested Set model makes writes and updates a bit slower, but makes it very easy to retrieve subtrees:
To get the Televisions subtree, you query for all categories with left >= 2 and right <= 9.
Leaf nodes always have left = right - 1
You can find the count of descendants without pulling those rows: (right - left - 1)/2
Finding inheritance paths and depth is also very easy (single query stuff). See the article for full details.
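The three nested set properties listed above can be tried out on a small tree. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; the sample rows are assumed values modelled on the electronics tree from the linked article, and the columns are named lft and rgt because LEFT and RIGHT are reserved words in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (name TEXT PRIMARY KEY, lft INTEGER NOT NULL, rgt INTEGER NOT NULL);
    INSERT INTO category VALUES
        ('Electronics', 1, 20),
        ('Televisions', 2, 9),
        ('Tube', 3, 4),
        ('LCD', 5, 6),
        ('Plasma', 7, 8),
        ('Portable Electronics', 10, 19),
        ('MP3 Players', 11, 14),
        ('Flash', 12, 13),
        ('CD Players', 15, 16),
        ('2 Way Radios', 17, 18);
""")

def subtree(conn, name):
    """The named node and all of its descendants, in tree order."""
    rows = conn.execute(
        """
        SELECT node.name
        FROM category AS node
        JOIN category AS parent
          ON node.lft >= parent.lft AND node.rgt <= parent.rgt
        WHERE parent.name = ?
        ORDER BY node.lft
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]

def descendant_count(conn, name):
    """Number of descendants, computed from the bounds alone."""
    lft, rgt = conn.execute(
        "SELECT lft, rgt FROM category WHERE name = ?", (name,)
    ).fetchone()
    return (rgt - lft - 1) // 2

def leaves(conn):
    """Leaf nodes are exactly those with rgt = lft + 1."""
    rows = conn.execute("SELECT name FROM category WHERE rgt = lft + 1 ORDER BY lft")
    return [row[0] for row in rows]

print(subtree(conn, "Televisions"))
print(descendant_count(conn, "Televisions"))
print(leaves(conn))
```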
You might try using a cross-reference (Xref) table, to create a many-to-many relationship between your events and their types.
create table event_category_event_xref
(
event_id int,
event_category_id int,
foreign key(event_id) references event(id),
foreign key (event_category_id) references event_category(id)
);
Event / category membership is defined by records in this table. So if you have a record with {event_id = 3, event_category_id = 52}, it means event #3 is in category #52. Similarly you can have records for {event_id = 3, event_category_id = 27}, and so on.
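As a rough illustration of how the cross-reference table answers the "all event ids for a given category" question, here is a sketch using Python's sqlite3 module as a stand-in for MySQL. The event and event_category columns, the composite primary key and the sample rows are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
    CREATE TABLE event (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE event_category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE event_category_event_xref (
        event_id          INTEGER NOT NULL,
        event_category_id INTEGER NOT NULL,
        -- category first, so "events in category X" is an index lookup
        PRIMARY KEY (event_category_id, event_id),
        FOREIGN KEY (event_id) REFERENCES event(id),
        FOREIGN KEY (event_category_id) REFERENCES event_category(id)
    );
    INSERT INTO event VALUES (3, 'Jazz night'), (4, 'Book fair');
    INSERT INTO event_category VALUES (27, 'Music'), (52, 'Evening');
    INSERT INTO event_category_event_xref VALUES (3, 52), (3, 27), (4, 52);
""")

def event_ids_in_category(conn, category_id):
    """All event ids that are members of the given category."""
    rows = conn.execute(
        "SELECT event_id FROM event_category_event_xref "
        "WHERE event_category_id = ? ORDER BY event_id",
        (category_id,),
    ).fetchall()
    return [row[0] for row in rows]

print(event_ids_in_category(conn, 52))
print(event_ids_in_category(conn, 27))
```

Putting event_category_id first in the key is a design choice for this particular lookup; a second index with the columns reversed would serve "categories of event X".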
I'm sketching out a database layout for a website that has the potential to become huge, with hundreds of queries a minute.
I was thinking about doing the following:
user table
id
name
(few more fields)
Pages (this one will become the biggest table)
id
titel
img
text
restaurant (this will be the column that connects the pages to the user table; I was planning on creating an index on this one to increase speed)
So I'm wondering if creating an index for the 'restaurant' column will increase the speed of my queries, or if there is any other way to speed things up?
Thanks in advance!
If you need to do some query like :
select *
from pages
where restaurant = ...
Or like :
select *
from user
inner join pages on pages.restaurant = user.id
where user.name = '...'
Or any other condition on the restaurant column, then you'll probably want to add an index on that column, to avoid scanning all rows of the pages table.
But note that which indexes are useful or necessary will almost always depend on the kind of queries you'll be doing.
Which means that it's not really possible to guess accurately which indexes you'll need: first, you need to know how you will access your data.
Note : you should read the How MySQL Uses Indexes section of MySQL's manual : it contains stuff that's interesting to know ;-)
As a test, you can always run your query in your preferred tool and add EXPLAIN in front. This will show you what indices are being used and/or which temporary tables had to be created etc.
EXPLAIN select *
from pages
where restaurant = ...
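The effect of adding the index can be seen by comparing the plan before and after creating it. This is a sketch using Python's sqlite3 module and EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN; the column types and the index name idx_pages_restaurant are assumptions, and MySQL reports its plans in a different format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pages (
        id INTEGER PRIMARY KEY,
        titel TEXT,
        img TEXT,
        text TEXT,
        restaurant INTEGER REFERENCES user(id)
    );
""")

def plan(conn, sql, params=()):
    """Return the human readable part of the query plan as one string."""
    return " | ".join(
        row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    )

query = "SELECT * FROM pages WHERE restaurant = ?"

before = plan(conn, query, (1,))   # no index yet: full table scan
conn.execute("CREATE INDEX idx_pages_restaurant ON pages (restaurant)")
after = plan(conn, query, (1,))    # index available: lookup by restaurant

print(before)
print(after)
```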
If you're using the InnoDB storage engine, you should not just use 'an index' but make use of a FOREIGN KEY. That way, you will also reduce potential integrity problems.
Suggestion: do not use restaurant as a name. Add some more tables and it will be difficult to keep track what references what. Why not call it user_id? (This is a matter of personal preference, though.)
Indeed.com groups duplicate job postings by title and description. Here is an example of what I am talking about. How would I go about doing something like that? Is it just a simple Group By statement or something else entirely?
It could be done with a simple group by, but that will only group exact matches.
There are several parameters you can test to determine whether to group entries. In their example: company name, location, and keywords.
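One cheap way to loosen "exact matches only" is to group on a normalized key instead of the raw text. The sketch below is an illustration of that idea in Python and makes no claim about how Indeed actually does it; the normalize rules, the field names and the sample postings are all assumptions.

```python
import re
from collections import defaultdict

def normalize(text):
    """Lowercase and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def group_postings(postings):
    """Group posting ids whose normalized title and description coincide."""
    groups = defaultdict(list)
    for posting in postings:
        key = (normalize(posting["title"]), normalize(posting["description"]))
        groups[key].append(posting["id"])
    return list(groups.values())

postings = [
    {"id": 1, "title": "PHP Developer", "description": "Build web apps."},
    {"id": 2, "title": "php developer ", "description": "Build web apps"},
    {"id": 3, "title": "DBA", "description": "Tune MySQL servers."},
]

print(group_postings(postings))
```

In a database, the same effect could be had by storing the normalized key (or a hash of it) in an indexed column and grouping on that column, so the cost is paid once at insert time.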
"Something else entirely" would involve analyzing the fields of one row to determine their similarity to another row. I think this would probably be too processor intensive to integrate on a large-scale.
I'm not exactly sure what you're looking at in the example. But it wouldn't really make sense to do a SQL GROUP BY on something like the description. That would cause a ton of overhead, especially with the amount of data Indeed is keeping track of.
A good way to store data similar to what Indeed stores would be a document index; try looking up Solr or NoSQL stores.