Database Schema for News System


I have a news system I'm designing, and it seemed straightforward at first, but as I've pushed forward with my planned schema I've hit problems... Clearly I haven't thought it through. Can anyone help?
The system requires that the latest 20 news articles be grabbed from the database. It's blog-like in this way. Each article can have sub-articles (usually around 3) that can be accessed from the parent article. The sub-articles are only ever visible when the parent article is visible -- they're not used elsewhere.
The client needs to be able to hide/display news articles (easy), but also change their order, if they desire (harder).
I initially stored the sub-articles in a separate table, but then I realised that the fields were essentially the same: Headline, Copy, Image. So why not just put them all in one big table?
Now I've hit other problems around the ordering. It's Friday evening and my head hurts!
Can anyone offer advice?
Thanks.
Update: People have asked to see my "existing" schema:
Articles table:
articleID *
headline
copy
imageURL
visible
pageOrder
Sub-articles table:
subArticleID *
articleID
headline
copy
imageURL
visible
pageNumber
pageOrder
Will this work? How would I go about letting users change the order? It seemed the wrong way to do it, to me, so I threw this out.

I initially stored the sub-articles in a separate table, but then I realised that the fields were essentially the same: Headline, Copy, Image. So why not just put them all in one big table?
Because referential integrities are not the same.
That is, of course, if you want to restrict the tree to exactly 2 levels. If you want more general data model (even if that means later restricting it at the application level), then go ahead and make a general tree.
This would probably look something like this:
Note how both PARENT_ARTICLE_ID and ORDER are NULL-able (so you can represent a root) and how both comprise the UNIQUE constraint denoted by U1 in the diagram above (so no two articles can be ambiguously ordered under the same parent).
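Since the diagram itself is not reproduced here, the following is only a minimal sketch of what such a general tree could look like, using Python's built-in sqlite3 module for illustration. The table name article, the headline column, and the rename of ORDER to article_order (ORDER is a reserved word) are assumptions, not part of the original diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked to

conn.execute("""
    CREATE TABLE article (
        article_id        INTEGER PRIMARY KEY,
        parent_article_id INTEGER REFERENCES article (article_id),  -- NULL marks a root
        article_order     INTEGER,                                  -- NULL for a root
        headline          TEXT NOT NULL,
        UNIQUE (parent_article_id, article_order)                   -- plays the role of U1
    )
""")

conn.execute("INSERT INTO article VALUES (1, NULL, NULL, 'Parent story')")
conn.execute("INSERT INTO article VALUES (2, 1, 1, 'First sub-article')")
conn.execute("INSERT INTO article VALUES (3, 1, 2, 'Second sub-article')")

# A second child claiming position 2 under the same parent should be refused.
try:
    conn.execute("INSERT INTO article VALUES (4, 1, 2, 'Clashing sub-article')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

children = [row[0] for row in conn.execute(
    "SELECT headline FROM article WHERE parent_article_id = ? ORDER BY article_order",
    (1,),
)]
```

Because the same table holds every level, nothing in it limits the depth to two; that restriction would have to live in the application.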

Based on what you've described, I would use two tables. The first table would hold all the articles and sub-articles. The second would tie the articles to their sub-articles.
The first table (call it articles) might have these columns:
+-----------+----------+------+----------+---------+------------+-----------+
| articleID | headline | copy | imageURL | visible | pageNumber | pageOrder |
+-----------+----------+------+----------+---------+------------+-----------+
The second table (call it articleRelationships) might have these columns:
+-----------------+----------------+
| parentArticleID | childArticleID |
+-----------------+----------------+
Not sure if you already accomplish this with the pageNumber column, but if not, you could add a column for something like articleLevel and give it something like a 1 for main articles, 2 for sub-articles of the main one, 3 for sub-articles of a level 2 article, etc. So that way, when selecting the latest 20 articles to be grabbed, you just select from the table where articleLevel = 1.
I'm thinking it would probably also be useful to store a date/time with each article so that you can order by that. As far as any other ordering goes, you'll have to clarify more on that for me to be more help there.
To display them for the user, I would use AJAX. I would first display the latest 20 main articles on the screen, then when the user chooses to view the sub-articles for a particular article, use AJAX to call the database and do a query like this:
SELECT a.articleID, a.headline
FROM articles a
INNER JOIN articleRelationships ar ON a.articleID = ar.childArticleID
WHERE ar.parentArticleID = ? /* ? is the articleID that the user clicked */
ORDER BY articleID

The client needs to be able to hide/display news articles (easy), but
also change their order, if they desire (harder).
On this particular point, you'll need to store client-specific ordering in a table. Exactly how you do this will depend, in part, on how you choose to deal with articles and subarticles. Something along these lines will work for articles.
client_id  article_id  article_order
--
1          1067        1
1          2340        2
1          87          3
...
You'll probably need to make some adjustments to the table and column names.
create table client_article_order (
  client_id integer not null,
  article_id integer not null,
  article_order integer not null,
  primary key (client_id, article_id),
  foreign key (client_id) references clients (client_id) on delete cascade,
  foreign key (article_id) references articles (article_id) on delete cascade
) engine = innodb;
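Although I made article_order an integer, you can make a good case for using other data types instead. You could use float, double, or even varchar(n), which make it possible to slot a row between two neighbours without renumbering the rest. Whichever type you pick, reordering can be troublesome.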
If you don't need the client id, you can store the article ordering in the article's table.
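To give a feel for why reordering is fiddly with integer positions, here is a rough sketch of moving one article to a new slot when the order lives in the article table itself. It uses Python's sqlite3 module for illustration; the page_order column and the move_article and current_order helpers are hypothetical names, and the sample IDs are the ones from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (article_id INTEGER PRIMARY KEY, page_order INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO articles VALUES (?, ?)", [(1067, 1), (2340, 2), (87, 3)])


def move_article(conn, article_id, new_position):
    """Move one article to new_position, shifting the articles in between by one."""
    row = conn.execute(
        "SELECT page_order FROM articles WHERE article_id = ?", (article_id,)
    ).fetchone()
    old_position = row[0]
    if new_position < old_position:
        # Moving up: everything from the target slot to just above the old slot moves down.
        conn.execute(
            "UPDATE articles SET page_order = page_order + 1 "
            "WHERE page_order >= ? AND page_order < ?",
            (new_position, old_position),
        )
    elif new_position > old_position:
        # Moving down: everything just below the old slot up to the target slot moves up.
        conn.execute(
            "UPDATE articles SET page_order = page_order - 1 "
            "WHERE page_order > ? AND page_order <= ?",
            (old_position, new_position),
        )
    conn.execute(
        "UPDATE articles SET page_order = ? WHERE article_id = ?",
        (new_position, article_id),
    )


def current_order(conn):
    """Return article ids in display order."""
    return [r[0] for r in conn.execute("SELECT article_id FROM articles ORDER BY page_order")]


move_article(conn, 87, 1)
order_after_move = current_order(conn)
```

With per-client ordering the same idea applies, with client_id added to every WHERE clause. In a real system the updates would also want to run inside one transaction.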
But this is sounding more and more like the kind of thing Drupal and Wordpress do right out of the box. Is there a compelling reason to reinvent this wheel?

Create a new field "parent" in the news (article) table, which will contain the news id of the parent article. This new field will serve as the connection between articles and sub-articles.

As SlideID "owns" SubSlideID, I would use a composite primary key for the second table.
PrimaryKey: slideID, subSlideID
Other index: slideID, pageNumber, pageOrder (Or however they get displayed)
One blog post I prefer to point out about this is http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx as it explains why very nicely.
If you're relying on Auto_Increment, that can be handled too (with MyISAM tables): you can still set subSlideID to auto_increment.
If you're likely to go to a third level then merge - follow Branko above. But it does start to get very complicated, so keep separate for 2 layers only.

Related

Maintain many level hierarchy of tables in database

I have to maintain a list of the friends who liked a post in a particular category, and this may go to any depth. For example, if B, who is a friend of A, likes a given post, then I'll maintain records of both A's friends and B's friends. Basically, my requirement is this:
When a user visits my product site, I have to tell them that friends they follow have already visited it and actually recommend it, to build confidence that they are on the right track because their friends are using it too. I also want to tell A that C, who is a friend of B, has been using this product since a given time and has recommended it to many others.
I know this logic is already implemented on good sites.
I am just a beginner, so please suggest a database for the backend and what is required for the frontend.
This question is specifically about maintaining the records in a database. So I am asking which database I should use, not how I should implement it; that would be the next step.
I am planning to use a graph database for it: either bigdata or Neo4j.
Your ideas are most welcome and will be appreciated. Thanks

I hope my logic takes you a few steps forward.
First, we have to maintain the mutual-friends records, for example:
id  mut_id
1   2,3,4
Here 2, 3 and 4 are your friends.
Next, we need to maintain records of who has purchased or visited:
prod_id  buy_id
1        2,1
Now suppose user 3 wants to buy the product or visit the site; then we can show that a friend has already visited or bought it.

Friend relations are a classic many-to-many scheme. You need two tables to implement this:
1. A table with personal data, such as name, email, etc. (it could be more complex, like a person-properties relation).
2. A table with the friends' relationship data. It usually contains the ID pair of the friends that the relation represents, plus some data about the relation itself, such as category (friend/family/classmate, etc.), level of affinity (>0 means a positive relation, <0 a negative one, such as enemies), and so on. Assume the first ID is the person this relation belongs to (and can be maintained by), and the second ID is the person this relation applies to. Such tables are usually constrained so that the pair of IDs is unique, so nobody can add the same person as a friend twice.
Here is a sample:
CREATE TABLE person
(
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(255),
  email VARCHAR(255),
  PRIMARY KEY (id)
);
CREATE TABLE relationship
(
  id_person INT NOT NULL REFERENCES person(id),
  id_person_related INT NOT NULL REFERENCES person(id),
  id_category INT REFERENCES relcategories(id),
  affinity INT,
  PRIMARY KEY (id_person, id_person_related)
);
Note that the affinity and id_category fields are optional, and the latter requires a relcategories table with an INT id field to be created first.
Visits of one friend to another can also be stored in relationship in a separate field
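As a rough illustration of how these tables could answer the original question ("which of my friends, or friends of friends, already use this product?"), here is a small sketch using Python's sqlite3 module. The product_visit table and the sample people are assumptions added for the example; they are not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE relationship (
        id_person         INTEGER NOT NULL REFERENCES person (id),
        id_person_related INTEGER NOT NULL REFERENCES person (id),
        PRIMARY KEY (id_person, id_person_related)
    );
    -- Hypothetical table recording who has visited which product.
    CREATE TABLE product_visit (
        id_person  INTEGER NOT NULL REFERENCES person (id),
        id_product INTEGER NOT NULL,
        PRIMARY KEY (id_person, id_product)
    );
    INSERT INTO person VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO relationship VALUES (1, 2), (2, 3);   -- A follows B, B follows C
    INSERT INTO product_visit VALUES (2, 10), (3, 10);
""")

# Direct friends of a person who visited the product.
direct = [r[0] for r in conn.execute("""
    SELECT p.name
    FROM relationship r
    JOIN product_visit v ON v.id_person = r.id_person_related
    JOIN person p        ON p.id = r.id_person_related
    WHERE r.id_person = ? AND v.id_product = ?
    ORDER BY p.name
""", (1, 10))]

# Friends of friends (one extra hop) who visited the product.
second_level = [r[0] for r in conn.execute("""
    SELECT DISTINCT p.name
    FROM relationship r1
    JOIN relationship r2 ON r2.id_person = r1.id_person_related
    JOIN product_visit v ON v.id_person = r2.id_person_related
    JOIN person p        ON p.id = r2.id_person_related
    WHERE r1.id_person = ? AND v.id_product = ?
      AND r2.id_person_related <> r1.id_person
    ORDER BY p.name
""", (1, 10))]
```

Each extra level of friendship costs one more self-join, which is where a relational design starts to strain if the depth is truly unbounded.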

PHP: Categorizing images into database

I'm building a website in PHP to share my comics. I'd like to implement categorization to allow people to filter which comics they'd like to see.
I've asked this question before, but at that time my site's architecture was not using a database. I've since implemented a database (which is amazing, btw) so I need to change things up.
I'm thinking the way to do this is:
1) Make 2 tables: 1 for categories, 1 for images
2) Insert images into their respective tables based on which filesystem folder they're in and assign that table id
3) Insert all images into All_Images table with their newly assigned category id
4) Take in user input to decide which images to show. So, if user input = cat 1, then show images with category 1 id.
So, basically I need a way to initially assign categories to the images when they come in from the filesystem. Is there an easier way to do this? Do I have to create multiple tables?
Any thoughts on this?
Thanks!!

The normal way would be to have an images table (presumably with filenames rather than the actual images?) and then a many-to-many relationship to categories, via a link table, so that each comic can have more than one:
Table:Image
-----------
rowid: integer identity
displayname: varchar
filename: varchar
Table:Category
--------------
rowid: integer identity
displayname: varchar
Table:ImageCategoryLink
-----------------------
imageid: integer foreign key references Image:rowid
categoryid: integer foreign key references Category:rowid
Clear?
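A minimal sketch of that three-table layout, using Python's sqlite3 module for illustration, might look like the following. The lower-case table names, the id column name, the example file paths, and the idea of taking the initial category from the folder name (as the question describes) are all assumptions made for the example.

```python
import sqlite3
from pathlib import PurePosixPath

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image (id INTEGER PRIMARY KEY, displayname TEXT, filename TEXT UNIQUE);
    CREATE TABLE category (id INTEGER PRIMARY KEY, displayname TEXT UNIQUE);
    CREATE TABLE image_category_link (
        imageid    INTEGER NOT NULL REFERENCES image (id),
        categoryid INTEGER NOT NULL REFERENCES category (id),
        PRIMARY KEY (imageid, categoryid)
    );
""")


def link(conn, filename, category_name):
    """Register the image and category if needed, then link them."""
    conn.execute(
        "INSERT OR IGNORE INTO image (displayname, filename) VALUES (?, ?)",
        (PurePosixPath(filename).stem, filename),
    )
    conn.execute("INSERT OR IGNORE INTO category (displayname) VALUES (?)", (category_name,))
    conn.execute("""
        INSERT OR IGNORE INTO image_category_link (imageid, categoryid)
        SELECT i.id, c.id FROM image i, category c
        WHERE i.filename = ? AND c.displayname = ?
    """, (filename, category_name))


def images_in(conn, category_name):
    """Return the filenames of every image linked to the named category."""
    rows = conn.execute("""
        SELECT i.filename
        FROM image i
        JOIN image_category_link l ON l.imageid = i.id
        JOIN category c            ON c.id = l.categoryid
        WHERE c.displayname = ?
        ORDER BY i.filename
    """, (category_name,))
    return [r[0] for r in rows]


# Initial import: the containing folder name becomes the first category.
for path in ["comics/cats/monday.png", "comics/office/tuesday.png"]:
    link(conn, path, PurePosixPath(path).parent.name)

# Later, the same comic can be filed under a second category.
link(conn, "comics/cats/monday.png", "office")
```

Filtering by user input then comes down to calling images_in with the chosen category.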

One table category with id, name, etc., and one table image with id, src, name, etc.
Two choices after that:
If an image has one and only one category, put a field id_category in table image.
If an image can have several categories, make another table image_category with id_image and id_category.

Since you are a beginner (no offence) and I don't think there will be millions of cartoons in the table, use a SET field (or ENUM if a cartoon can only have one category) for your categories.
There are people who will vote this down, since using it has some negative side effects (mainly when you want to change the categories). But with a relatively small table this will not have any effect.
This will be the easiest solution. If you expect the site to grow big, use a second table for your categories and join the tables.

Standard DB table structure of web page additional content?

I have this page table structure to store all the website page information,
page_id
page_url
page_title
page_subtitle
page_description
page_introduction
page_content_1
page_content_2
page_content_3
page_content_4
...
You can see that I have page_content_1 to page_content_4, instead of just page_content. I do this because I might want to store different types of page content for each page.
But I doubt whether this is good practice. If another developer came along to develop this page table further, would they find this structure redundant?
I am thinking maybe I should create another table to store additional page content, like the one below?
table page_additional_content,
content_id
content_additional_1
content_additional_2
content_additional_3
content_additional_4
page_id
Is this better?
Or is there a better standard approach that I should look into?

Building on Quentin's answer, if you want to be able to reuse content across pages (such as headers or footers), you could create a table structure like:
page_content:
content_id (primary key)
actual_content (actual content of the page)
page_structure:
page_id (foreign key of the page)
content_id (foreign key of the content)
order_in_page (order of the content in the page)
Your content can grow to as many sections as you want without adding additional tables (just add a new row in page_structure and increment the order_in_page counter).
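Here is a small sketch of that structure in action, using Python's sqlite3 module for illustration. The sample content, the page ids, the primary key on (page_id, order_in_page), and the sections_for helper are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page_content (
        content_id     INTEGER PRIMARY KEY,
        actual_content TEXT NOT NULL
    );
    CREATE TABLE page_structure (
        page_id       INTEGER NOT NULL,
        content_id    INTEGER NOT NULL REFERENCES page_content (content_id),
        order_in_page INTEGER NOT NULL,
        PRIMARY KEY (page_id, order_in_page)
    );
    INSERT INTO page_content VALUES
        (1, 'Intro for home'), (2, 'Intro for about'), (3, 'Shared footer');
    -- The footer (content 3) is stored once and reused on both pages.
    INSERT INTO page_structure VALUES (1, 1, 1), (1, 3, 2), (2, 2, 1), (2, 3, 2);
""")


def sections_for(conn, page_id):
    """Return the content sections of one page, in display order."""
    rows = conn.execute("""
        SELECT c.actual_content
        FROM page_structure s
        JOIN page_content c ON c.content_id = s.content_id
        WHERE s.page_id = ?
        ORDER BY s.order_in_page
    """, (page_id,))
    return [r[0] for r in rows]


home_sections = sections_for(conn, 1)
about_sections = sections_for(conn, 2)
```

Editing the footer row once changes it on every page that references it.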

When you start having columns with the same name, but with a numerical suffix then you should usually start thinking "New table + foreign key".
You would probably be better off with something along the lines of a content table structured like:
document_id | page | content
1 | 1 | foo, bar, baz

Which of these 2 database setups should I choose?

I have 3 types of content: blogs, press releases, and reminders. All of them have a body and an entered by field. The blogs and press releases have a title field, which the reminders lack, and the reminders have an hour field, which blogs and press releases lack. This is what it looks like in tabular format so it's easy for you to see...
                   blog    press release    reminder
---------------------------------------------------
entered by field   yes     yes              yes
body field         yes     yes              yes
title field        yes     yes              --
time field         --      --               yes
I'm creating a main table called content that links to the specialized tables blogs, press releases, and reminders. I thought of 2 structures.
First structure... This is how the content management system I use does it, but I don't want to follow in their steps blindly because my needs are not the same. Put ALL shared fields in the main content table. So the content table will not only have type and type id to link to the specialized tables, the content table will also have the common fields like body and entered by. The other 3 tables only have their unique fields.
content table    B=blogs table    PR=press releases table    R=reminders table
------------------------------------------------------------------------------
id               id               id                         id
type=B/PR/R      title            title                      hour
type id
body
entered by
Second structure. The content table only has the type and type id necessary to link to the other 3 tables. This means that the common fields get repeated in the 3 tables.
content table    B=blogs table    PR=press releases table    R=reminders table
------------------------------------------------------------------------------
id               id               id                         id
type=B/PR/R      entered by       entered by                 entered by
type id          body             body                       body
                 title            title                      hour
Which should I go with? I thought the first structure is better because I can search all content whether it's a blog or press release or reminder for a specific word. I still have to look in the other tables if I want to search the title which is available only to blogs and press releases, but...
So which structure is better, and why you think so? I'm also open to other ideas or improvements that are different from these 2.

The first structure is a classic super type-subtype approach, and recommended. I would just suggest naming primary keys with full table-name-id like ContentID to avoid possible confusion.

The first one is the better construct: it allows content to have a specific set of required or common data in the content table and then specialized data in the child tables. This also allows you to add more types in the future with other requirements that still reuse the common elements in content but retain any unique data.
One other key question is whether that data is required: for example, do all reminders require an hour, and do all blogs and press releases require a title? If they are required, then you ensure that those child tables will always be populated. If they are not, then perhaps you should look at flattening the structure (yes, Virginia, you should sometimes denormalize).
So instead your content table simply becomes (nn = not null, n = nullable):
id (nn), type id (nn), type (nn), body (nn), entered by (nn), title (n), hour (n). The main reason I usually find for doing this is that the different data entities you are creating are so similar that over time they may well merge. For example, reminders do not require a title at this time, but in the future they might.

I would sooner go without any sort of "type" field, instead making four tables: content, blogs, pressreleases and reminders. Content has the common fields enteredby, body, and title. For each of blogs, pressreleases and reminders, they have an id that is a primary key and also a foreign key to a content id. This makes a 1:1 "is-a" relationship. reminder can have the additional time field. To determine what type of entry a content row is, do a join select.
This may not be the best in terms of performance but it's better normalized.
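A rough sketch of this "is-a" layout, using Python's sqlite3 module for illustration, could look like this. Note one deviation, which is an assumption on my part: title is kept in the blogs and pressreleases tables, as in the question, rather than in content. The sample rows and the press_releases spelling are also made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (
        id         INTEGER PRIMARY KEY,
        entered_by TEXT NOT NULL,
        body       TEXT NOT NULL
    );
    -- Each subtype's primary key is also a foreign key to content: a 1:1 "is-a" link.
    CREATE TABLE blogs (
        id    INTEGER PRIMARY KEY REFERENCES content (id),
        title TEXT NOT NULL
    );
    CREATE TABLE press_releases (
        id    INTEGER PRIMARY KEY REFERENCES content (id),
        title TEXT NOT NULL
    );
    CREATE TABLE reminders (
        id   INTEGER PRIMARY KEY REFERENCES content (id),
        hour INTEGER NOT NULL
    );
    INSERT INTO content VALUES
        (1, 'ann', 'Blog body'), (2, 'bob', 'PR body'), (3, 'cat', 'Call the printer');
    INSERT INTO blogs VALUES (1, 'Hello');
    INSERT INTO press_releases VALUES (2, 'Launch');
    INSERT INTO reminders VALUES (3, 14);
""")

# With no type column, the kind of each row is inferred from which subtype table matches.
rows = conn.execute("""
    SELECT c.id,
           CASE WHEN b.id IS NOT NULL THEN 'blog'
                WHEN p.id IS NOT NULL THEN 'press release'
                WHEN r.id IS NOT NULL THEN 'reminder'
           END AS kind,
           COALESCE(b.title, p.title) AS title,
           r.hour
    FROM content c
    LEFT JOIN blogs b          ON b.id = c.id
    LEFT JOIN press_releases p ON p.id = c.id
    LEFT JOIN reminders r      ON r.id = c.id
    ORDER BY c.id
""").fetchall()

# Searching the shared body field touches only the content table.
matches = [r[0] for r in conn.execute(
    "SELECT id FROM content WHERE body LIKE ? ORDER BY id", ("%body%",)
)]
```

The three LEFT JOINs are the performance cost mentioned above; a type column in content would avoid them at the price of some redundancy.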

I think you should think about the common fields.
Do they really need to match?
If they need to match, it's easier to just put it in a single table.

Selecting rows from MySQL

I'm trying to create a web index. Every advertiser in my database will be able to appear on a few categories, so I've added a categorys column, and in that column I'll store the categories separated by "," so it will look like:
1,3,5
The problem is that I have no idea how I'm supposed to select all of the advertisers in a certain category, like: mysql_query("SELECT * FROM advertisers WHERE category = ??");

If categories is another database table, you shouldn't use a plain-text field like that. Create a "pivot table" for the purpose, something like advertisers_categories that links the two tables together. With that setup, you could do a query like:
SELECT A.* FROM advertisers AS A
JOIN advertisers_categories AS AC ON AC.advertiser_id = A.id
WHERE AC.category_id = 12;
The schema of advertisers_categories would look something like this:
# advertisers_categories
# --> id INT
# --> advertiser_id INT
# --> category_id INT

You should design your database in another way. Take a look at Atomicity.
Short: You should not store your value in the form of 1,3,5.
I won't give you an answer, because if you start using it this way now, you're going to run into much more severe problems later. No offense :)

With comma-separated values, this is awkward to do strictly in an SQL query (MySQL's FIND_IN_SET() can match a value in such a list, but it cannot use an index). You could instead return every row and have a PHP script go through each one, using explode(',', $row['categorys']) and then in_array('CATEGORY', $exploded_row) to check for the existence of the category.
The more common solution is to restructure your database. You're thinking too two-dimensionally. You're looking for the many-to-many data model:
advertisers
-----------
id
name
etc.
categories
----------
id
name
etc.
ad_cat
------
advertiser_id
category_id
So ad_cat will have at least one (usually more) entry per advertiser and at least one (usually more) entry per category, and every entry in ad_cat will link one advertiser to one category.
The SQL query then involves grabbing every line from ad_cat with the desired category_id(s) and searching for an advertiser whose id is in the resulting query's output.
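As a hedged sketch of that restructuring, using Python's sqlite3 module for illustration, the snippet below splits the old comma-separated values into ad_cat rows and then runs the subquery described above. The advertiser and category names, the legacy_rows list, and the advertisers_in helper are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE advertisers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE ad_cat (
        advertiser_id INTEGER NOT NULL REFERENCES advertisers (id),
        category_id   INTEGER NOT NULL REFERENCES categories (id),
        PRIMARY KEY (advertiser_id, category_id)
    );
    INSERT INTO categories VALUES (1, 'Cars'), (3, 'Food'), (5, 'Travel');
""")

# Rows as they might look in the old design: (id, name, comma-separated categories).
legacy_rows = [(1, "Acme", "1,3,5"), (2, "Bolt", "3")]

for advertiser_id, name, category_list in legacy_rows:
    conn.execute("INSERT INTO advertisers VALUES (?, ?)", (advertiser_id, name))
    # One ad_cat row per (advertiser, category) pair replaces the "1,3,5" string.
    conn.executemany(
        "INSERT INTO ad_cat VALUES (?, ?)",
        [(advertiser_id, int(c)) for c in category_list.split(",")],
    )


def advertisers_in(conn, category_id):
    """Return the names of all advertisers linked to the given category."""
    rows = conn.execute("""
        SELECT name FROM advertisers
        WHERE id IN (SELECT advertiser_id FROM ad_cat WHERE category_id = ?)
        ORDER BY name
    """, (category_id,))
    return [r[0] for r in rows]
```

The same result can be had with a JOIN instead of the IN subquery; which reads better is mostly a matter of taste.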

Your implementation as-is will make it difficult and taxing on your server's resources to do what you want.
I'd recommend creating a table that relates advertisers to categories and then querying on that table given a category id value to obtain the advertisers that are in that category.

That is a very wrong way to define categories, because your array of values cannot be normalized.
Instead, define another table called CATEGORIES, and use a JOIN-table to match CATEGORIES with ADVERTIZERS.
Only then will you be able to select it properly.
Hope this helps!
