I'm building an AWeber-like list management system (for phone numbers, not emails).
There are campaigns. A phone number is associated with each campaign. Users can text that number to subscribe to the campaign.
I'm building the "Create a New Campaign" page.
My current strategy is to create a separate table for each campaign (campaign_1,campaign_2,...,campaign_n) and store the subscriber data in it.
It's also possible to just create a single table and add a campaign_id column to it.
Each campaign is supposed to have 5k to 25k users.
Which is a better option? #1 or #2?
Option 2 makes more sense and is the more widely used approach.
I suppose it really depends on the number of campaigns you're going to have. Let me give you some pros/cons:
Pros for campaign_n:
Faster queries
You can have each instance run with its own code and own database
Cons for campaign_n:
Database modifications are harder (you need to sync all tables)
You get a lot of tables
Personally I'd go for option 2 (campaign_id field), unless you have a really good reason not to.
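A minimal sketch of option 2, using Python's built-in sqlite3 as a stand-in for whatever database the asker actually runs. The table names, column names, and the `subscribe` / `subscriber_count` helpers are assumptions made up for illustration, not part of the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE campaigns (
    id INTEGER PRIMARY KEY,
    phone_number TEXT NOT NULL UNIQUE
);
CREATE TABLE subscribers (
    id INTEGER PRIMARY KEY,
    campaign_id INTEGER NOT NULL REFERENCES campaigns(id),
    phone_number TEXT NOT NULL,
    -- One row per (campaign, number); the index behind this constraint
    -- also serves lookups that filter on campaign_id.
    UNIQUE (campaign_id, phone_number)
);
""")

def subscribe(campaign_id, phone_number):
    """Add a subscriber; a repeat text from the same number is a no-op."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO subscribers (campaign_id, phone_number) "
            "VALUES (?, ?)",
            (campaign_id, phone_number),
        )

def subscriber_count(campaign_id):
    """Count the subscribers of one campaign."""
    row = conn.execute(
        "SELECT COUNT(*) FROM subscribers WHERE campaign_id = ?",
        (campaign_id,),
    ).fetchone()
    return row[0]

with conn:
    conn.execute("INSERT INTO campaigns (id, phone_number) VALUES (1, '+15550001')")
    conn.execute("INSERT INTO campaigns (id, phone_number) VALUES (2, '+15550002')")

subscribe(1, "+15551234")
subscribe(1, "+15551234")  # same number again, ignored
subscribe(2, "+15551234")  # same number, different campaign
```

With this layout, creating a new campaign is a single INSERT into `campaigns` rather than a schema change.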
Related
I'm trying to create a Like/Unlike system akin to Facebook's for an existing comments section of a website, and I need help in designing the system.
Currently, every product on the website has a comments section, and members can post and like comments. I need to know how many comments each member has posted and how many likes each of those comments has received. I also need to know who liked which comments, partly so that I can prevent a user from liking a comment more than once, and partly for analytical purposes.
The naive way of adding a Like system to the current comments module is to create a new table in the database that has foreign keys to the CommentID and UserID. Then for every "like" given to a comment by a user, I would insert a row into this new table with the target comment ID and user ID.
While this might work, the massive number of comments and users is going to cause this table to grow quickly, and retrieving records from and doing counts on this huge table will become slow and inefficient. I can index either one of the columns, but I don't know how effective it would be. The website has over a million comments.
I'm using PHP and MySQL. For a system like this with a huge database, how should I design a Like system so that it is more optimised and stable?
For scalability, do not include the count column in the same table with other things. This is a rare case where "vertical partitioning" is beneficial. Why? The LIKEs/UNLIKEs will come fast and furious. If the code to do the increment/decrement hits a table used for other things (such as the text of the Comment), there will be an unacceptable amount of contention between the two.
This tip is the first of many steps toward being able to scale to Facebook levels. The other tips will come, not from a free forum, but from the team of smart engineers you will have to hire to get to that level. (Hints: Sharding, Buffering, Showing Estimates, etc.)
Your main concern will be a lot of counts, so the easy thing to do is to keep a separate count in your comments table.
Then you can create a TRIGGER that increments/decrements the count based on a like/unlike.
That way you only use the big table to figure out if a user already voted.
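The trigger idea can be sketched roughly as follows. This uses Python's sqlite3 rather than MySQL, so the trigger syntax differs slightly from what the asker would write, and every table, column, and function name here is an assumption for illustration. The count lives in the comments table as this answer suggests; the earlier answer would move it to its own table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    body TEXT NOT NULL,
    like_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE comment_likes (
    comment_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    -- The composite key rejects a second like from the same user.
    PRIMARY KEY (comment_id, user_id)
);
CREATE TRIGGER likes_after_insert AFTER INSERT ON comment_likes
BEGIN
    UPDATE comments SET like_count = like_count + 1 WHERE id = NEW.comment_id;
END;
CREATE TRIGGER likes_after_delete AFTER DELETE ON comment_likes
BEGIN
    UPDATE comments SET like_count = like_count - 1 WHERE id = OLD.comment_id;
END;
""")

def like(comment_id, user_id):
    """Record a like; return False if this user already liked the comment."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO comment_likes (comment_id, user_id) VALUES (?, ?)",
                (comment_id, user_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def unlike(comment_id, user_id):
    """Remove a like if it exists."""
    with conn:
        conn.execute(
            "DELETE FROM comment_likes WHERE comment_id = ? AND user_id = ?",
            (comment_id, user_id),
        )

def like_count(comment_id):
    """Read the maintained counter instead of counting rows."""
    return conn.execute(
        "SELECT like_count FROM comments WHERE id = ?", (comment_id,)
    ).fetchone()[0]

with conn:
    conn.execute("INSERT INTO comments (id, body) VALUES (1, 'Nice product')")

first = like(1, 10)
second = like(1, 10)  # duplicate like from the same user
like(1, 11)
count_before_unlike = like_count(1)
unlike(1, 10)
count_after_unlike = like_count(1)
```

Reading the count never touches the large likes table; that table is only consulted, through its primary key, to decide whether a given user has already liked a given comment.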
Issue: I am working on a kind of e-commerce platform which has sellers and buyers. In my case a seller can also be a buyer, i.e. every user can both buy and sell.
So I have a single table called users. Now I want to implement a follow vendor/user feature, wherein the user can click follow and he sees all the goods listed by that vendor under his account (until he unfollows).
My traditional approach was to have a table that has a key and two columns to store the follower and the followed, e.g.:
| id | userId | vendorId |
So the table will grow as users go on following others. But if I have a user following many people (say 100), my query may take a lot of time to select 100 records for each user.
Question: How can I implement the follow mechanism? Is there a better approach than this? I am using PHP and MySQL.
Research: I tried going through how Facebook and Pinterest handle it, but that seemed a bit too big for me to learn now, as I don't expect that many users immediately. Do I need to use memcache to enhance the performance and avoid recurring queries? Can I use a document database in any sense in parallel with MySQL?
I would like a simple yet powerful implementation that would scale if my userbase grows gradually to a few thousands.
Any help or insights would be very helpful.
Since, from my understanding of this scenario, a user may follow many vendors, and a vendor may have many followers, this constitutes a many<->many relationship, and thus the only normalised way to achieve this in a database schema should be through using a link table, exactly as you described.
As for the performance considerations, I wouldn't worry too much about it: the table can be indexed on userId and vendorId, so the queries should be fine.
The junction table is probably the best approach but still a lot depends on your clustered index.
A table clustered on the surrogate key id can make adding new records a bit faster.
A table clustered on the key (userId, vendorId) will make the queries where you look for the vendors a certain user follows faster.
A table clustered on the key (vendorId, userId) will make the queries where you look for the users that follow a certain vendor faster.
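One way to serve both lookup directions is to cluster on one column order and add a secondary index for the other. The sketch below uses Python's sqlite3, where a `WITHOUT ROWID` table is stored in primary-key order (in MySQL's InnoDB the primary key plays that role). The table, index, and function names are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE follows (
    user_id INTEGER NOT NULL,
    vendor_id INTEGER NOT NULL,
    -- Rows are stored in (user_id, vendor_id) order, so
    -- "whom does this user follow?" reads one contiguous range.
    PRIMARY KEY (user_id, vendor_id)
) WITHOUT ROWID;
-- Secondary index for the opposite direction:
-- "who follows this vendor?"
CREATE INDEX follows_by_vendor ON follows (vendor_id, user_id);
""")

def follow(user_id, vendor_id):
    """Follow a vendor; following twice is a no-op."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO follows (user_id, vendor_id) VALUES (?, ?)",
            (user_id, vendor_id),
        )

def unfollow(user_id, vendor_id):
    with conn:
        conn.execute(
            "DELETE FROM follows WHERE user_id = ? AND vendor_id = ?",
            (user_id, vendor_id),
        )

def vendors_followed_by(user_id):
    rows = conn.execute(
        "SELECT vendor_id FROM follows WHERE user_id = ? ORDER BY vendor_id",
        (user_id,),
    )
    return [r[0] for r in rows]

def followers_of(vendor_id):
    rows = conn.execute(
        "SELECT user_id FROM follows WHERE vendor_id = ? ORDER BY user_id",
        (vendor_id,),
    )
    return [r[0] for r in rows]

follow(1, 2)
follow(1, 3)
follow(4, 2)
follow(1, 2)  # already following, ignored
```

Dropping the surrogate id column is a design choice of this sketch; the composite key already identifies each row and prevents duplicate follows.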
We're making the plans now, so before I start work I want to make sure I'm handling things in the best way.
We have a products table to which we're adding a new field called 'format', which is going to be the structure of the product (bag, box, etc). There are no set values for this; users can enter whatever they like into that field. However, we want to show a drop-down list of all formats that the user has already entered.
There are two ways I can think of to do that: either a basic SELECT DISTINCT on the products table to get all formats the user already filled in, or a separate table that stores the formats and is linked to by the product.
Instinctively I'd like to use SELECT DISTINCT, since it would make my life easier. However, assuming a table of a billion products, which would be the best way to go?
I think I would opt for the second option (an additional table, plus a foreign key if you want to add a constraint), both because of the volume and because it lets you add management features later, for example merging similar product formats.
If you decide to keep everything in one table, then build an index on the column. This should speed the processing for creating the list in the user application.
I'm somewhat agnostic about which is the best approach. Often, when designing user interfaces, you want to try out different things. Having to make database changes impedes the creative process of building the application.
On the other hand, generally when users pick things from a drop down box in the application, these "things" are excellent examples of "entities" -- and that is what tables are intended to store.
In the end, I would say do what is most convenient while developing the application. As you get closer to finalizing it, consider whether it would be better to store these things in a separate table. One of the big questions is whether you want to know all formats that have ever been used, even if no user currently has them defined.
Since you are letting users enter whatever they want I would go with the 2nd option.
Create a new table and insert in there all the new 'formats' and link to the product table.
When you write the code that adds the format the user typed in, be sure to check whether an equal value already exists in the database, so you won't need to de-duplicate them later.
Also, keep the values consistent, for example by storing each word with only its first letter in uppercase.
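A minimal sketch of the separate-table approach with a normalize-then-reuse step, written against Python's sqlite3. The table names and the `normalize`, `get_or_create_format`, and `format_choices` helpers are assumptions for illustration. For simplicity the format list here is global; a per-user list would need a user column as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE formats (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE products (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    format_id INTEGER REFERENCES formats(id)
);
""")

def normalize(raw):
    """Collapse whitespace and capitalize the first letter of each word."""
    return " ".join(word.capitalize() for word in raw.split())

def get_or_create_format(raw):
    """Return the id of the matching format, inserting it if it is new."""
    name = normalize(raw)
    with conn:
        conn.execute("INSERT OR IGNORE INTO formats (name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT id FROM formats WHERE name = ?", (name,)
    ).fetchone()[0]

def format_choices():
    """Values for the drop-down list; no DISTINCT scan over products."""
    return [r[0] for r in conn.execute("SELECT name FROM formats ORDER BY name")]

bag_id = get_or_create_format("bag")
same_bag_id = get_or_create_format("  BAG ")  # normalizes to the same row
get_or_create_format("gift box")
with conn:
    conn.execute(
        "INSERT INTO products (name, format_id) VALUES (?, ?)",
        ("Coffee beans", bag_id),
    )
```

The drop-down query reads only the small `formats` table, so its cost does not grow with the number of products.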
I'm using MySQL as my main database for a simple "Social Network" I'm spending a few weeks on.
As with all social networks, the user requires a connection with their friends in order to make it social.
My first idea was to add another column to my user table and name it connections. There, I would store a string of user IDs separated by commas, then split them when needed.
My other idea was to create a completely new table, connections, with two columns, "user_1" and "user_2". When searching for friends, the database would then perform a select looking for the user's ID in those columns.
The question is though: What would be the most efficient? If I'm to support large numbers of users, is it risky going with option 2?
Some advice would be greatly appreciated,
Thanks!
A normalized structure (option #2) is highly preferable for structuring the type of data that you describe. It will be far more efficient to query a narrow table with two integer columns than to split through an ever-growing list of IDs.
I would suggest reading about the different normalization forms: http://en.wikipedia.org/wiki/Database_normalization (see "Normal Forms")
The second approach is much better. You're creating relations between the user by using a table 'connections'. This way you can create 'n:m' relations. If you want to add some kind of connection type ('love interest', 'friend') you can easily add it in a table, but not in a string.
There's another benefit: you don't have to think about the number of connections a user has. What would you use for the connections? A varchar? A text? Do you really want to parse this mess each and every time? How do you make sure that you don't add a connection twice?
tl;dr: Use a table to show relations.
Option 1 will not end well. Go with a separate table.
A separate table called connections would, without a doubt, be easier. Having multiple values in one column defeats the purpose of a database. Can you imagine searching for all friends of user1 with option 1?
MySQL can certainly deliver good performance with option 2. It's easier to select friends and to do calculations. There's a lot you can do with caching, multiple servers, load balancing and all that.
And realistically speaking: by the time you reach a large number of users, you'll be rewriting the system anyway to incorporate all the lessons you've learned along the way.
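As a rough sketch of the connections table, here is one possible layout in Python's sqlite3. Storing each pair with the smaller ID first is a convention assumed for this example (it is not from the question); it keeps an undirected friendship to a single row. The function names are likewise hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE connections (
    user_1 INTEGER NOT NULL,
    user_2 INTEGER NOT NULL,
    PRIMARY KEY (user_1, user_2),
    -- Assumed convention: smaller id first, so (1, 2) and (2, 1)
    -- cannot both exist.
    CHECK (user_1 < user_2)
);
-- Lets lookups by user_2 avoid a full scan.
CREATE INDEX connections_by_user_2 ON connections (user_2);
""")

def connect(a, b):
    """Store a friendship once, regardless of argument order."""
    low, high = sorted((a, b))
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO connections (user_1, user_2) VALUES (?, ?)",
            (low, high),
        )

def friends_of(user_id):
    """A user can appear in either column, so query both sides."""
    rows = conn.execute(
        "SELECT user_2 FROM connections WHERE user_1 = ? "
        "UNION "
        "SELECT user_1 FROM connections WHERE user_2 = ? "
        "ORDER BY 1",
        (user_id, user_id),
    )
    return [r[0] for r in rows]

connect(2, 1)
connect(1, 2)  # same friendship, ignored
connect(1, 3)
```

This also answers the "how do you make sure you don't add a connection twice" question raised above: the primary key does it, which a comma-separated string cannot.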
Next to your normal user table "user"(user_id/user_email/user_pwd/etc), what is the best way to go to store profile information?
Would one just add fields to the user table like "user"
(user_id/user_email/user_pwd/user_firstname/user_lastname/user_views/etc)
or create another table called "profiles"
(profile_id/user_id/user_firstname/user_lastname/user_views/etc)
or would one go for a table with property definitions and another table to store those values?
I know the last one is the most flexible, as you can add and remove fields easily.
But for a big site (50k users up) would this be fast?
Things to consider with your approaches
Storing User Profile in Users Table
This is generally going to be the fastest approach in terms of getting at the profile data, although you may have a lot of redundant data in here (columns that may not have any information in them).
Quick (especially if you only pull columns you need from the db)
Wasted Data
More difficult to work with / maintain (arguably with interfaces such as PHPMyAdmin)
Storing User Profile in User_Profile Table 1-1 relationship to users
Should still be quite quick with a join and you may eliminate some data redundancy if user profiles aren't created unless a user fills one in.
Easier to work with
Ever so slightly slower due to join (or 2nd query)
Storing User Profile as properties and values in tables
*i.e. Table to store possible options, table to store user_id, option_id and value*
No redundant data stored, all data is relevant
Most normalised method
Slower to retrieve and update data
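To make the trade-off of the third approach concrete, here is a small sketch in Python's sqlite3. The table names and helper functions are assumptions for illustration. Note that every profile read needs a join and returns one row per property, which is part of why this layout tends to be slower than a plain column-per-field table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE profile_options (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE profile_values (
    user_id INTEGER NOT NULL,
    option_id INTEGER NOT NULL REFERENCES profile_options(id),
    value TEXT NOT NULL,
    PRIMARY KEY (user_id, option_id)
);
""")

def set_property(user_id, name, value):
    """Create the option on first use, then insert or overwrite the value."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO profile_options (name) VALUES (?)", (name,)
        )
        option_id = conn.execute(
            "SELECT id FROM profile_options WHERE name = ?", (name,)
        ).fetchone()[0]
        conn.execute(
            "INSERT OR REPLACE INTO profile_values (user_id, option_id, value) "
            "VALUES (?, ?, ?)",
            (user_id, option_id, value),
        )

def load_profile(user_id):
    """Join values to option names and fold the rows into a dict."""
    rows = conn.execute(
        "SELECT o.name, v.value "
        "FROM profile_values v JOIN profile_options o ON o.id = v.option_id "
        "WHERE v.user_id = ?",
        (user_id,),
    )
    return dict(rows.fetchall())

set_property(1, "firstname", "Ada")
set_property(1, "lastname", "Lovelace")
set_property(1, "firstname", "Augusta")  # overwrites the earlier value
```

Adding a new profile field is just a new row in `profile_options`, which is the flexibility the question refers to.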
My impression is that most websites use the 2nd method and store profile information in a second table. It's also common for larger websites (Twitter, Facebook) to de-normalize the database to achieve greater read performance at the expense of slower write performance.
I would think that keeping the profile information in a second table is likely the way to go when you are looking at 50,000 records. For optimum performance you want to keep data that is written heavily separated from data that is read heavily, so that caching can work effectively.
A table with property definitions isn't a good idea. I suggest using three tables to store the data:
user(id, login, email, pwd, is_banned, expired, ...)
-- rarely changed, keep small, extremely fast search, easy to cache, admin data
profile(id, user_id, firstname, lastname, hobby, description, motto)
-- data often changed by the user, ...
user_stats(id, user_id, last_login, first_login, post_counter, visit_counter, comment_counter)
-- counters are updated very often; DML invalidates the cache
The better way to store authorisation and authentication data is LDAP.
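The three-table split from this answer might look roughly like the following in Python's sqlite3. The columns are trimmed to a few representatives, and using user_id itself as the primary key of the two side tables (to enforce one row per user) is an assumption of this sketch rather than something the answer specifies. The helper names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Rarely changed account data.
CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    login TEXT NOT NULL UNIQUE,
    email TEXT NOT NULL
);
-- Data the user edits; the row is optional until a profile is filled in.
CREATE TABLE profile (
    user_id INTEGER PRIMARY KEY REFERENCES user(id),
    firstname TEXT,
    lastname TEXT
);
-- Frequently updated counters, kept away from the other two tables.
CREATE TABLE user_stats (
    user_id INTEGER PRIMARY KEY REFERENCES user(id),
    visit_counter INTEGER NOT NULL DEFAULT 0
);
""")

def create_user(login, email):
    """Insert the account row and its counters row in one transaction."""
    with conn:
        cur = conn.execute(
            "INSERT INTO user (login, email) VALUES (?, ?)", (login, email)
        )
        user_id = cur.lastrowid
        conn.execute("INSERT INTO user_stats (user_id) VALUES (?)", (user_id,))
    return user_id

def record_visit(user_id):
    """The hot write path touches only user_stats."""
    with conn:
        conn.execute(
            "UPDATE user_stats SET visit_counter = visit_counter + 1 "
            "WHERE user_id = ?",
            (user_id,),
        )

def load_user(user_id):
    """LEFT JOIN so users without a profile row are still returned."""
    return conn.execute(
        "SELECT u.login, p.firstname, s.visit_counter "
        "FROM user u "
        "LEFT JOIN profile p ON p.user_id = u.id "
        "JOIN user_stats s ON s.user_id = u.id "
        "WHERE u.id = ?",
        (user_id,),
    ).fetchone()

uid = create_user("ada", "ada@example.com")
record_visit(uid)
record_visit(uid)
```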
You need way more than 3 tables. How will he store data like multiple emails, multiple addresses, multiple educational histories, multiple "looking for" relationships, etc.? Each needs its own row, assuming many values will be lookups like city, sex preference, school names, etc. So either normalize it fully or go the NoSQL route. There is no point in hanging in the middle; you will lose the best of both worlds.
You can duplicate rows, but it won't be good. Social networks do not live with 50,000 users. Either you will be successful and have millions of users, or you will crash and close it, because to run these you need $$$, which will only come if you have a solid user base. With only 50,000 users for life, investors won't invest, ad revenues won't cover the cost, and you will close it. So design it like you want to be the next Facebook right from day one. Think big!