I'm working on a real estate site and need to build a notification mailer: when a new property is added to the site, people who subscribed to notifications for that particular country and/or area and/or city and/or type of operation (rental, sale) will receive an email. One person can subscribe to several areas, cities, etc., not just one. Even so, each person should receive only one notification per week, say, if there are new properties for them. I'm wondering how best to design a MySQL table for subscribers so that they are easy to retrieve. A table like:
create table subscribers(
user_email varchar(255),
area_id int(4));
seems like a bad idea, because with, say, 100,000 subscribers (looking to the future), each subscribed to 10 areas, there will be 1,000,000 rows in a table. So I'm looking for an efficient solution for this task.
If you have additional recommendations, I would like to hear them.
Thanks in advance!
You should use a cross-reference (many-to-many) table. This will make data more normalized:
CREATE TABLE `areas` (
`id` int(10) unsigned NOT NULL auto_increment,
`name` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
);
CREATE TABLE `subscribers` (
`id` int(10) unsigned NOT NULL auto_increment,
`email` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
);
-- cross ref table
CREATE TABLE `areas_subscribers` (
`area_id` int(10) unsigned NOT NULL,
`subscriber_id` int(10) unsigned NOT NULL,
UNIQUE KEY (`area_id`,`subscriber_id`)
);
And a million rows is not a problem. Especially with a cross ref table.
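As a rough sketch of how the cross-reference table gets used, the snippet below builds the three tables in an in-memory SQLite database (standing in for MySQL purely so the example runs on its own, with simplified column types) and looks up who should be mailed for a given area. The sample rows and the `emails_for_area` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE areas (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE subscribers (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL
    );
    -- cross ref table: one row per (area, subscriber) pair
    CREATE TABLE areas_subscribers (
        area_id       INTEGER NOT NULL REFERENCES areas(id),
        subscriber_id INTEGER NOT NULL REFERENCES subscribers(id),
        UNIQUE (area_id, subscriber_id)
    );
    -- hypothetical sample data
    INSERT INTO areas VALUES (1, 'Lisbon'), (2, 'Porto');
    INSERT INTO subscribers VALUES (1, 'ann@example.com'), (2, 'bob@example.com');
    INSERT INTO areas_subscribers VALUES (1, 1), (2, 1), (1, 2);
""")


def emails_for_area(area_name):
    """Return the emails of everyone subscribed to the named area."""
    rows = conn.execute(
        """
        SELECT s.email
        FROM areas AS a
        JOIN areas_subscribers AS x ON x.area_id = a.id
        JOIN subscribers AS s ON s.id = x.subscriber_id
        WHERE a.name = ?
        ORDER BY s.email
        """,
        (area_name,),
    ).fetchall()
    return [email for (email,) in rows]


print(emails_for_area("Lisbon"))
print(emails_for_area("Porto"))
```

Each email address is stored once in `subscribers`; the cross-reference rows only hold two integers each, which is why a million of them is cheap.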
"there will be 1,000,000 rows in a table"
So what? MySQL can handle it.
As far as I can see, the way you are doing it is perfectly fine. It's nicely normalized, I can't think of a better method.
Your table looks correct, assuming that user_email is the primary key identifying your users. If so, add to your subscribers table a PRIMARY KEY (user_email, area_id) to indicate that both fields together make up your primary key.
Your concern about duplicating e-mails has little to do with the schema design and more to do with the query you intend to run. That, of course, will depend largely on how your other data are stored, but might look something like:
SELECT DISTINCT user_email FROM subscribers WHERE area_id IN (...)
(For a list of area_id values that have seen listings in the past week.)
That's a simple query that could be optimized and improved given the rest of your schema, but it illustrates how easy it is to avoid generating multiple e-mails despite the same person being listed multiple times.
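To make that concrete, here is a minimal sketch of the two-column table with the composite primary key, run against an in-memory SQLite database rather than MySQL so that it is self-contained. The sample rows and the `recipients` helper are hypothetical; the list of area ids would come from whatever query finds areas with new listings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE subscribers (
        user_email TEXT NOT NULL,
        area_id    INTEGER NOT NULL,
        PRIMARY KEY (user_email, area_id)
    )
""")
conn.executemany(
    "INSERT INTO subscribers (user_email, area_id) VALUES (?, ?)",
    [("ann@example.com", 1), ("ann@example.com", 2), ("bob@example.com", 3)],
)

# The composite primary key rejects a repeated (email, area) pair.
try:
    conn.execute("INSERT INTO subscribers VALUES ('ann@example.com', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True


def recipients(area_ids):
    """Return each subscribed email once, however many listed areas match."""
    placeholders = ",".join("?" * len(area_ids))
    rows = conn.execute(
        "SELECT DISTINCT user_email FROM subscribers "
        f"WHERE area_id IN ({placeholders}) ORDER BY user_email",
        list(area_ids),
    ).fetchall()
    return [email for (email,) in rows]


print(duplicate_rejected)
print(recipients([1, 2]))
```

Here the subscriber who follows both area 1 and area 2 appears once in the result, so they would get a single weekly email.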
You can make an extra table of the email addresses.
That way you store only an ID in the subscriber table instead of the same email address over and over again (although the database may already apply some optimizations of its own).
First of all, I'm from Spain, so I'm sorry if I make some mistakes writing. I have two problems, but it's better if I give some context first. I'm not even a junior developer, I'm still learning to code, and I thought a good project would be a web page where you can add ingredients, foods made with those ingredients, etc. So I decided to start learning PHP and SQL. Now I'm trying to create a database, starting with some ingredients and two kinds of rice dishes. My first problem is that I don't know whether I need a database for this at all. The second, and main, one is that I have no idea how to get this working the way I want.
First of all, I created the table for ingredients:
CREATE TABLE ingredientes(
id int(255) auto_increment not null,
ingrediente varchar(255) not null,
CONSTRAINT pk_ingredientes PRIMARY KEY(id) )ENGINE=InnoDb;
Sorry that it's in Spanish, but it's nothing too hard to understand.
Then I added some ingredients.
Here is the picture showing them.
After that I created two tables and added ingredients to them.
CREATE TABLE arroz_con_pollo(
id int(255) auto_increment not null,
ingrediente int(255) not null,
CONSTRAINT pk_arroz_con_pollo PRIMARY KEY(id),
CONSTRAINT fk_pollo_ingredientes FOREIGN KEY(ingrediente) REFERENCES ingredientes(id) )ENGINE=InnoDb;
CREATE TABLE arroz_cubana(
id int(255) auto_increment not null,
ingrediente int(255) not null,
CONSTRAINT pk_arroz_cubana PRIMARY KEY(id),
CONSTRAINT fk_cubana_ingredientes FOREIGN KEY(ingrediente) REFERENCES ingredientes(id))ENGINE=InnoDb;
Here is the picture showing the IDs.
After spending a lot of time researching, I found out that I can show the names with this query:
SELECT a.id, i.ingrediente
FROM ingredientes i, arroz_cubana a
WHERE i.id = a.ingrediente;
And I get something like this.
At this point, everything is more or less working. My issue came when I wanted to keep all the dish names (arroz con pollo, arroz cubana...) in a single table named 'rices', so that the user can choose a name and automatically get its ingredients without any complication. But I literally have no idea how. I've been coding for hours without any success, and I haven't seen anything similar on the web. If someone can tell me how to fix this, or how to structure a site that stores ingredients and foods, I'll be very grateful.
Your data structure is messed up. SQL is not designed to have a separate table for each dish. Instead, you want two other tables.
The first is for dishes:
CREATE TABLE dishes (
dish_id int auto_increment primary key, -- MySQL requires an auto_increment column to be a key
name varchar(255)
);
You would then insert appropriate rows into this:
INSERT INTO dishes (name)
VALUES ('arroz_con_pollo');
Then you have another table for the ingredients:
CREATE TABLE dishes_ingredients (
dish_ingredient_id int auto_increment primary key,
dish_id int not null,
ingredient_id int not null,
CONSTRAINT fk_dish_ingredientes_dish FOREIGN KEY(dish_id) REFERENCES dishes(dish_id),
-- references the id column of the existing ingredientes table
CONSTRAINT fk_dish_ingredientes_ingrediente FOREIGN KEY(ingredient_id) REFERENCES ingredientes(id)
);
Voila! New dishes are just rows in a table, so you can get the names using a SELECT.
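A minimal sketch of how this plays out, using an in-memory SQLite database in place of MySQL so it can run standalone. The ingredient rows and the two helper functions are made up for illustration; only the table layout follows the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ingredientes (
        id          INTEGER PRIMARY KEY,
        ingrediente TEXT NOT NULL
    );
    CREATE TABLE dishes (
        dish_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE dishes_ingredients (
        dish_ingredient_id INTEGER PRIMARY KEY,
        dish_id       INTEGER NOT NULL REFERENCES dishes(dish_id),
        ingredient_id INTEGER NOT NULL REFERENCES ingredientes(id)
    );
    -- hypothetical sample data
    INSERT INTO ingredientes VALUES
        (1, 'arroz'), (2, 'pollo'), (3, 'huevo'), (4, 'tomate');
    INSERT INTO dishes VALUES (1, 'arroz_con_pollo'), (2, 'arroz_cubana');
    INSERT INTO dishes_ingredients (dish_id, ingredient_id)
    VALUES (1, 1), (1, 2), (2, 1), (2, 3), (2, 4);
""")


def dish_names():
    """All dishes are rows, so listing them is a plain SELECT."""
    rows = conn.execute("SELECT name FROM dishes ORDER BY name").fetchall()
    return [name for (name,) in rows]


def ingredients_of(dish_name):
    """Follow dishes -> dishes_ingredients -> ingredientes for one dish."""
    rows = conn.execute(
        """
        SELECT i.ingrediente
        FROM dishes AS d
        JOIN dishes_ingredients AS di ON di.dish_id = d.dish_id
        JOIN ingredientes AS i ON i.id = di.ingredient_id
        WHERE d.name = ?
        ORDER BY i.ingrediente
        """,
        (dish_name,),
    ).fetchall()
    return [name for (name,) in rows]


print(dish_names())
print(ingredients_of("arroz_cubana"))
```

Adding a third dish then means inserting one row into `dishes` and a few into `dishes_ingredients`, with no new table.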
Notes on structure:
int(255) really makes no sense. Just use int. The number in parentheses is a width for the value when printing it and 255 is a ridiculous width.
I am a fan of naming primary keys with the table name. That way, the primary key and foreign key typically have the same name.
You should not have a table per dish. Create one table "dish", that includes a column "name". Each row represents a dish. Then create a supporting table where you list the (multiple) ingredients for each dish. Look around for a tutorial on databases, this topic is too large to explain in a stackoverflow question (or several).
And so you do not need to be able to list the table names, the way you were considering. (Which is not something SQL supports directly; different databases provide non-standard ways to do it, but as explained you do not actually need such a feature.)
I made a "follow system". The DB design looks like this
CREATE TABLE IF NOT EXISTS `users_followers` (
`id` int(11) NOT NULL PRIMARY KEY AUTO_INCREMENT COMMENT 'auto incrementing USER_FOLLOWER_ID for each row unique index',
`user_id` int(11) NOT NULL COMMENT 'foreign key to UserId column in users table',
`follower_id` int(11) NOT NULL COMMENT 'foreign key to UserId column in users table',
`follower_since_timestamp` bigint(20) DEFAULT NULL COMMENT 'timestamp of the follow'
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1;
Before the follow system, all inputs/posts/entries were visible to everyone, going onto one big feed/wall. Now, with a couple of hundred users, it is hard to navigate.
Should I do it the way it was before, or should I future-proof it and make an activity table? That is: 1) check whether I follow a user and fetch all their information directly, or 2) check whether I follow a user, look at the activity table, and fetch the appropriate data from the post table.
With the activity table it seems I can have a much bigger overview of what's going on. I could for example have a column deleted, but the data is still there in post.
Is an activity table necessary?
Just sharing my two cents: I built a system that enables researchers (the users of the system) to follow each other based on their study interests, and I did it exactly the way you proposed. It made it really easy to pull data out of the database. For example, if User A has a study interest in JavaScript, I can use User A's user_id in the Activity table to pull User A's areas of interest. I hope that makes sense.
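For comparison, option 1 from the question (no activity table) comes down to a single join between the posts and users_followers. The sketch below uses an in-memory SQLite database instead of MySQL so it runs on its own. The `posts` table and its columns are assumptions, since the question does not show them, and `feed_for` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users_followers (
        id          INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL,   -- the user being followed
        follower_id INTEGER NOT NULL,   -- the user who follows
        follower_since_timestamp INTEGER
    );
    -- hypothetical posts table; the question does not show its columns
    CREATE TABLE posts (
        id         INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        body       TEXT NOT NULL,
        created_at INTEGER NOT NULL
    );
    -- user 1 follows users 2 and 3, but not user 4
    INSERT INTO users_followers (user_id, follower_id, follower_since_timestamp)
    VALUES (2, 1, 100), (3, 1, 110);
    INSERT INTO posts (user_id, body, created_at)
    VALUES (2, 'post by 2', 200), (3, 'post by 3', 210), (4, 'post by 4', 220);
""")


def feed_for(follower_id):
    """Newest-first posts written by the users this follower follows."""
    rows = conn.execute(
        """
        SELECT p.body
        FROM posts AS p
        JOIN users_followers AS f ON f.user_id = p.user_id
        WHERE f.follower_id = ?
        ORDER BY p.created_at DESC
        """,
        (follower_id,),
    ).fetchall()
    return [body for (body,) in rows]


print(feed_for(1))
```

An activity table would sit between these two tables and add things like a deleted flag; whether that is worth the extra writes depends on how much more than posts the feed is expected to show.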
I am working on a project where I want to allow the end user to basically add an unlimited amount of resources when creating a hardware device listing.
In this scenario, they can store both the quantity and types of hard-drives. The hard-drive types are already stored in a MySQL Database table with all of the potential options, so they have the options to set quantity, choose the drive type (from dropdown box), and add more entries as needed.
As I don't want to create a DB with "drive1amount", "drive1typeid", "drive2amount", "drive2typeid", and so on, what would be the best way to do this?
I've seen similar questions answered with a many-to-many link table, but can't think of how I could pull this off with that.
Something like this?
CREATE TABLE `hardware` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(256) NOT NULL,
`quantity` int(11) NOT NULL,
`hardware_type_id` int(11) NOT NULL,
PRIMARY KEY (`id`),
KEY `type_id` (`hardware_type_id`),
CONSTRAINT `hardware_ibfk_1` FOREIGN KEY (`hardware_type_id`) REFERENCES `hardware_type` (`id`)
) ENGINE=InnoDB;
hardware_type_id is a foreign key to your existing table.
This way the table doesn't care what kind of hardware it is.
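A small sketch of the one-row-per-entry idea, on an in-memory SQLite database rather than MySQL so it is self-contained. The `device` table and the `device_id` column are assumptions added here to tie each entry to a listing; they are not part of the table above. `add_drives` and `drives_of` are hypothetical helpers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- stand-in for the existing table of drive types
    CREATE TABLE hardware_type (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- hypothetical device listing table
    CREATE TABLE device (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- one row per drive entry the user adds to a device
    CREATE TABLE hardware (
        id               INTEGER PRIMARY KEY,
        device_id        INTEGER NOT NULL REFERENCES device(id),
        hardware_type_id INTEGER NOT NULL REFERENCES hardware_type(id),
        quantity         INTEGER NOT NULL
    );
    INSERT INTO hardware_type VALUES (1, 'SATA 1TB'), (2, 'SSD 256GB');
    INSERT INTO device VALUES (1, 'file server');
""")


def add_drives(device_id, entries):
    """entries is a list of (hardware_type_id, quantity) pairs from the form."""
    with conn:  # commit all rows together
        conn.executemany(
            "INSERT INTO hardware (device_id, hardware_type_id, quantity) "
            "VALUES (?, ?, ?)",
            [(device_id, type_id, qty) for type_id, qty in entries],
        )


def drives_of(device_id):
    """Return (type name, quantity) for every drive entry of a device."""
    return conn.execute(
        """
        SELECT t.name, h.quantity
        FROM hardware AS h
        JOIN hardware_type AS t ON t.id = h.hardware_type_id
        WHERE h.device_id = ?
        ORDER BY t.name
        """,
        (device_id,),
    ).fetchall()


add_drives(1, [(1, 4), (2, 2)])
print(drives_of(1))
```

The form can submit any number of (type, quantity) pairs, and each one simply becomes another row, so no "drive1", "drive2" columns are needed.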
The answer depends a bit on your long-term goals for this project. If you want to possess a data repository that profiles all the different types of hardware devices with their specifications, I suggest you maintain a separate table for each type of hardware. For example, you would have a harddisk table that consists of all the different models and types of hard disks out there. Then you can assign a record from this specific table to the host configuration table. You can build the dataset as you go from user input.
If this is not clear to you, let me know and I will create a diagram and upload it for you.
Weirdly, I have done a lot of development with MySQL and never encountered some of the things I have encountered today.
So, I have a user_items table
ID | name
---------
1 | test
I then have an item_data table
ID | item | added | info
-------------------------
1 | test | 12345 | important info
2 | test | 23456 | more recent important info
I then have an emails table
ID | added | email
-------------------
1 | 12345 | old#b.com
2 | 23456 | a#b.com
3 | 23456 | b#c.com
and an emails_verified table
ID | email
-----------
1 | a#b.com
Now I appreciate the setup of these tables may not be efficient etc, but this cannot be changed, and is a lot more complex than it may seem.
What I want to do is as follows. I want to be able to search through a user's items and display the associated info and any associated emails, along with whether each email has been verified.
user_items.name = item_data.item
item_data.added = emails.added
emails.email = emails_verified.email
So for user item 1, test, I want to be able to return its ID, its name, the most recent information, the most recent emails, and their verification status.
So I would like to return
ID => 1
name => test
information => more recent important info
emails => array('0' => array('email' => 'a#b.com' , 'verified' => 'YES'),'1' => array('email' => 'b#c.com' , 'verified' => 'NO'))
Now I could do this with multiple queries with relative ease. My research, however, suggests that this is significantly more costly in resources and time than using one (albeit very complex) MySQL query with lots of join statements.
Using one query would also be useful (I believe) because I can then add search functionality with relative ease, by adding complex where clauses to the query.
To further complicate matters, I am using CodeIgniter. I cannot be too picky :) so any non-CI answers would still be very useful.
The code I have so far is as follows. It is, however, very much 'I'm not too sure what I'm doing'.
function test_search()
{
$this->load->database();
$this->db->select('user_items.*,item_data.*');
$this->db->select('GROUP_CONCAT( emails.email SEPARATOR "," ) AS emails', FALSE);
// a matching emails_verified row means the address is verified
$this->db->select('GROUP_CONCAT( IF(emails_verified.email IS NULL,"NO","YES") SEPARATOR "," ) AS emailed', FALSE);
$this->db->where('user_items.name','test');
// item_data stores the item name in its `item` column
$this->db->join('item_data','user_items.name = item_data.item','LEFT');
$this->db->join('emails','item_data.added = emails.added','LEFT');
$this->db->join('emails_verified','emails.email = emails_verified.email','LEFT');
$this->db->group_by('user_items.name');
$res = $this->db->get('user_items');
print_r($res->result_array());
}
Any help with this would be very much appreciated.
This is really complex sql - is this really the best way to achieve this functionality?
Thanks
UPDATE
Following on from Cryode's excellent answer.
The only thing wrong with it is that it only returns one email. By using GROUP_CONCAT however I have been able to get all emails and all email_verified statuses into a string which I can then explode with PHP.
To clarify, is the subquery,
SELECT item, MAX(added) AS added
FROM item_data
GROUP BY item
essentially creating a temporary table?
Similar to that outlined here
Surely the subquery is necessary to make sure you only get one row from item_data - the most recent one?
And finally to answer the notes about the poorly designed database.
The database was designed this way as item_data is changed regularly but we want to keep historical records.
The emails are part of the item data, but because there can be any number of emails and we wanted them to be searchable, we opted for a separate table. Otherwise the emails would have to be serialized within the item_data table.
The emails_verified table is separate because an email can be associated with more than one item.
Given that, although it is (clearly) complicated to query, it still seems a suitable setup..?
Thanks
FINAL UPDATE
Cryode's answer is a really useful answer relating to database architecture in general.
Having conceptualised this a little more, if we store the version id in user_items we don't need the subquery.
Because none of the data between versions is necessarily consistent, we will scrap his proposed items table (for this case).
We can then get the correct version from the item_data table.
We can also get the items_version_emails rows based on the version id and from this get the respective emails from our 'emails' table.
I.e., it works perfectly.
The downside of this is that when I add new version data in item_data I have to update the user_items table with the new version that has been inserted.
This is fine, but as a general point, which is quicker?
I assume the reason such a setup has been suggested is that it is quicker: an extra update each time new data is added is worth it to save potentially hundreds of subqueries when lots of rows are being displayed, especially given that we display the data more often than we update it.
Just for knowledge: when designing database architecture in future, does anyone have any links or general guidance on what is quicker and why, so that we can all make better optimized databases?
Thanks again to Cryode !!
Using your database structure, this is what I came up with:
SELECT ui.name, id.added, id.info, emails.email,
CASE WHEN ev.id IS NULL THEN 'NO' ELSE 'YES' END AS email_verified
FROM user_items AS ui
JOIN item_data AS id ON id.item = ui.name
JOIN (
SELECT item, MAX(added) AS added
FROM item_data
GROUP BY item
) AS id_b ON id_b.item = id.item AND id_b.added = id.added
JOIN emails ON emails.added = id.added
LEFT JOIN emails_verified AS ev ON ev.email = emails.email
But as others have pointed out, the database is poorly designed. This query will not perform well on a table with a lot of data, since no aggregate function returns the whole most-recent row directly, so item_data has to be joined a second time. I understand that in certain situations you have little to no control over database design, but if you want to actually create the best situation, you should be emphatic to whoever controls it that it can be improved.
One of the biggest optimizations that could be made is to add the current item_data ID to the user_items table. That way the subquery to pull that wouldn't be necessary (since right now we're essentially joining item_data twice).
Converting this to CI's query builder is kind of a pain in the ass because of the sub query. Assuming you're only working with MySQL DBs, just stick with $this->db->query().
Added from your edit:
This query returns one email per row, it does not group them together. I left the CONCAT stuff out because it's one more thing that slows down your query -- your PHP can put the emails together afterwards much faster.
Yes, the subquery is that part -- a query within a query (pretty self-explanatory name). I wouldn't call it creating a temporary table, because that's something you can actually do. More like retrieving a subset of the information in the table, and using it kind of like a WHERE clause. The subquery is what finds the most recent row in your item_data table, since we have to figure it out ourselves (again, proper database design would eliminate this).
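To see the subquery at work, here is a sketch that loads the sample rows from the question into an in-memory SQLite database (used in place of MySQL only so the example is self-contained), runs essentially the query above with different aliases and the item ID added to the select list, and then groups the one-email-per-row result in application code. Python stands in for the PHP side here; the dictionary layout mirrors the output the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item_data (id INTEGER PRIMARY KEY, item TEXT, added INTEGER, info TEXT);
    CREATE TABLE emails (id INTEGER PRIMARY KEY, added INTEGER, email TEXT);
    CREATE TABLE emails_verified (id INTEGER PRIMARY KEY, email TEXT);
    INSERT INTO user_items VALUES (1, 'test');
    INSERT INTO item_data VALUES
        (1, 'test', 12345, 'important info'),
        (2, 'test', 23456, 'more recent important info');
    INSERT INTO emails VALUES
        (1, 12345, 'old#b.com'), (2, 23456, 'a#b.com'), (3, 23456, 'b#c.com');
    INSERT INTO emails_verified VALUES (1, 'a#b.com');
""")

rows = conn.execute("""
    SELECT ui.id, ui.name, d.info, e.email,
           CASE WHEN ev.id IS NULL THEN 'NO' ELSE 'YES' END AS email_verified
    FROM user_items AS ui
    JOIN item_data AS d ON d.item = ui.name
    JOIN (
        -- one row per item, carrying the newest "added" value
        SELECT item, MAX(added) AS added
        FROM item_data
        GROUP BY item
    ) AS latest ON latest.item = d.item AND latest.added = d.added
    JOIN emails AS e ON e.added = d.added
    LEFT JOIN emails_verified AS ev ON ev.email = e.email
    ORDER BY ui.id, e.email
""").fetchall()

# Group the one-email-per-row result into one record per item.
items = {}
for item_id, name, info, email, verified in rows:
    item = items.setdefault(
        item_id,
        {"ID": item_id, "name": name, "information": info, "emails": []},
    )
    item["emails"].append({"email": email, "verified": verified})

print(items[1])
```

The old email tied to the 12345 row drops out because the join against the subquery keeps only the item_data row whose `added` equals the maximum for that item.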
When we say you can optimize your database design, it doesn't mean you can't have it set up in a similar way. You made it sound like the DB could not be altered at all. You have the right idea as far as the overall scheme, you're just implementing it poorly.
Database Design
Here's how I would lay this out. Note that without knowing the whole extent of your project, this may need modification. May also not be 100% the best optimized on the planet -- I'm open for suggestions for improvement. Your mileage may vary.
User Items
CREATE TABLE `users_items` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`item_id` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Defines the relationship between a base item and a user.
Items
CREATE TABLE `items` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`item_name` varchar(50) NOT NULL DEFAULT '',
`created_on` datetime NOT NULL,
`current_version` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Your items table should have all of your items' base information -- things that will not change on a per-revision basis. Notice the current_version column -- this is where you'll store the ID from the versions table, indicating which is most recent (so we don't have to figure it out ourselves).
Items Versions (history)
CREATE TABLE `items_versions` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`item_id` int(10) unsigned NOT NULL,
`added` datetime NOT NULL,
`info` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Here is where you'd store the history of an item -- each update would create a new row here. Note that the item_id column is what ties this row to a particular base item.
Emails
CREATE TABLE `emails` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(100) NOT NULL DEFAULT '',
`verified` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Since emails can be shared between multiple products, we'll end up using what's called a many-to-many relationship. Emails can be tied to multiple products, and a product can be tied to multiple emails. Here we defined our emails, and include a verified column for whether it has been verified or not.
Item Emails
CREATE TABLE `items_versions_emails` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`version_id` int(11) unsigned NOT NULL,
`email_id` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Assuming the emails are tied to an item version and not the base item, this is the structure you want. Unfortunately, if you have a ton of versions and never change the email(s), this will result in a lot of repeated data. So there's room for optimization here. If you tie emails to the base item, you'll have less repeated data, but you'll lose the history. So there's options for this. But the goal is to show how to set up DB relationships, not be 100% perfect.
That should give you a good start on how to better lay out your DB structure.
Another Update
Regarding speed, inserting a new item version and then updating the related item row with the new version ID will give you much better performance than requiring a subquery to pull the latest update. You'll notice in the solution for your original structure, the item_data table is being joined twice -- once to join the most recent rows, and again to grab the rest of the data from that recent row (because of the way GROUP BY works, we can't get it in a single join). If we have the recent version ID stored already, we don't need the first join at all, which will improve your speed dramatically (along with proper indexing, but that's another lesson).
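A minimal sketch of that insert-then-update pattern, on an in-memory SQLite database instead of MySQL so it runs standalone. The column types are simplified, `current_version` is left nullable here so an item can exist before its first version (an assumption of this sketch, not part of the schema above), and `add_version` and `current_info` are hypothetical helpers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        id              INTEGER PRIMARY KEY,
        item_name       TEXT NOT NULL,
        current_version INTEGER          -- id of the newest items_versions row
    );
    CREATE TABLE items_versions (
        id      INTEGER PRIMARY KEY,
        item_id INTEGER NOT NULL,
        added   INTEGER NOT NULL,
        info    TEXT
    );
    INSERT INTO items (id, item_name) VALUES (1, 'test');
""")


def add_version(item_id, added, info):
    """Insert a new version and repoint the item at it in one transaction."""
    with conn:  # commits both statements together, or rolls both back
        cur = conn.execute(
            "INSERT INTO items_versions (item_id, added, info) VALUES (?, ?, ?)",
            (item_id, added, info),
        )
        conn.execute(
            "UPDATE items SET current_version = ? WHERE id = ?",
            (cur.lastrowid, item_id),
        )
    return cur.lastrowid


def current_info(item_id):
    """Fetch the newest info with a single join and no MAX() subquery."""
    row = conn.execute(
        """
        SELECT v.info
        FROM items AS i
        JOIN items_versions AS v ON v.id = i.current_version
        WHERE i.id = ?
        """,
        (item_id,),
    ).fetchone()
    return row[0] if row else None


add_version(1, 12345, "important info")
add_version(1, 23456, "more recent important info")
print(current_info(1))
```

Wrapping both statements in one transaction matters: if the update failed after the insert had been committed, the item would keep pointing at a stale version.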
I wouldn't recommend ditching the base items table, but that's really up to you and your application's needs. Without a base item, there's no real way to track the history of that particular item. There's nothing in the versions that shows a common ancestor/history, assuming you're removing the item_id column.
I'm pretty new to PHP and MySQL and I'm trying to put together a database that will contain customer details, assign these customers to groups, then assign promotions with unique codes to the customers on a group basis.
I've put together a simple schema (http://i.imgur.com/5s2Kq.jpg). Would anyone be kind enough to give me some feedback? It seems pretty simple, but maybe I'm missing some things that those with more experience may pick up on.
Am I right in thinking those tables containing relationships with others are junction tables and are created this way:
CREATE TABLE customerPromotions (
customer_id int(11) REFERENCES customers (customer_id),
promotion_id int(11) REFERENCES promotions (promotion_id),
customerPromotions_code_code varchar(12) NOT NULL,
PRIMARY KEY (customer_id, group_id)
)
Any advice would be great, thanks.
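CREATE TABLE customerPromotions (
customer_id int(11) NOT NULL,
promotion_id int(11) NOT NULL,
customerPromotions_code_code varchar(12) NOT NULL,
-- the key must use columns that exist in this table (there is no group_id here)
PRIMARY KEY (customer_id, promotion_id),
-- MySQL parses but ignores inline REFERENCES, so declare the foreign keys explicitly
FOREIGN KEY (customer_id) REFERENCES customers (customer_id),
FOREIGN KEY (promotion_id) REFERENCES promotions (promotion_id)
);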