I'm thinking about this: if I make a website for a specific university's students, should I make the IDs standard MySQL IDs (integer, auto increment), or should I derive the IDs from their email addresses? For example, if a student's email address is e12234#university.edu, then his/her ID on my website would be e12234. Is that OK, and what about performance?
Edit: There are also email addresses like these:
n13345#university.edu (exchange student)
jennajameson#university.edu (this is a professor)
I would strongly recommend a separate, independent value for the ID (integer, auto increment). ID values should never change or be updated. People change their email addresses all the time, and corporations sometimes reissue the same email address to new users.
If an email address is unique and static in your population (and make very sure it is), you may make it a primary key; in fact, full normalization would favor that option. There are, however, some pitfalls to consider:
People change email addresses once in a while. What if a student becomes a professor, or is harassed at his/her email address and so applies for a new address and gets one? The primary key should never change, so there goes your schema.
Sanitizing email addresses takes a bit more effort than sanitizing integers.
Depending on how many foreign keys point to this ID, the required storage space could increase, and joining on CHARs rather than INTs could hurt performance (you should test that, though).
Generally you'd want to map strings to IDs and reference the ID everywhere.
CREATE TABLE `student` (
`id` int unsigned NOT NULL auto_increment,
`email` varchar(150) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `email` (`email`)
);
This will reduce the size of any table that references the student table, as it will use an INT instead of a VARCHAR.
Also, if you used part of their email address as the ID and the user ever changed their email, you'd have to go back through every table and update their ID.
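To make the surrogate-key argument concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The `enrollment` table and its columns are hypothetical, invented only to have something that references the student. When the student's address changes, exactly one row is updated and every reference keeps pointing at the same integer id.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the enrollment table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (
    id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, never changes
    email TEXT NOT NULL UNIQUE                -- natural value, may change
);
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(id),
    course     TEXT NOT NULL
);
""")

cur = conn.execute("INSERT INTO student (email) VALUES (?)", ("e12234@university.edu",))
student_id = cur.lastrowid
conn.executemany(
    "INSERT INTO enrollment (student_id, course) VALUES (?, ?)",
    [(student_id, "Databases"), (student_id, "Networks")],
)

# The student gets a new address: only the student row changes,
# and the enrollment rows still reference the same integer id.
conn.execute(
    "UPDATE student SET email = ? WHERE id = ?",
    ("new.address@university.edu", student_id),
)

rows = conn.execute("""
    SELECT s.email, e.course
    FROM enrollment e JOIN student s ON s.id = e.student_id
    ORDER BY e.course
""").fetchall()
print(rows)
```

Had the email-derived string been the key, the same change would have meant rewriting every `enrollment` row as well.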
I'm new to Cassandra and I'm starting off by designing a simple user table for account registration and login purposes. This is pretty simple:
Row key: time(); Columns: email, name, password.
Regarding this simple structure, I've a question:
Here the row key is an arbitrary value. How can I log in using the email and password with PHP?
Cassandra takes a query-based modeling approach, so you could have the same redundant, denormalized data in separate tables... and that's OK. You'll want to keep that in mind going forward.
Registration and logging in are actually two different things, so you're going to want to split those up. Thinking long-term about queries and access patterns, it probably makes sense to split user account data from credential data, because credentials can change.
CREATE TABLE users (
userid uuid,
firstname text,
lastname text,
email text,
created_date timestamp,
PRIMARY KEY (userid)
);
CREATE TABLE usercredentials (
email text,
password text,
userid uuid,
PRIMARY KEY (email)
);
This way, when a user changes their password, it won't affect the overall users table. Additionally, given how rarely most users change their email addresses, an occasional delete (and the tombstone it generates) shouldn't be a big deal. This design won't allow a SELECT * FROM usercredentials WHERE email=? AND password=? query to work, so you'll have to run SELECT password FROM usercredentials WHERE email=? instead, but that removes the possibility of old passwords hanging around and causing problems. You could argue for partitioning on email and clustering on password, but that doesn't really make sense, since an email would never have more than one password at a time (although you could design an additional table like that to store password history).
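The login flow this implies (look the row up by email, then compare the password in application code) can be sketched as below. It is written in Python rather than PHP, a plain dict stands in for the `usercredentials` table, and it assumes the password column holds a salted hash plus a stored salt (an extra field not in the schema above). The function names are hypothetical.

```python
import hashlib
import hmac
import os
import uuid

# A dict keyed by email stands in for the usercredentials table
# (partition key = email). Assumption: a salted hash is stored, not plaintext.
usercredentials = {}

def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2 is one reasonable choice; the iteration count is illustrative.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def register(email: str, password: str) -> uuid.UUID:
    userid = uuid.uuid4()
    salt = os.urandom(16)
    usercredentials[email] = {
        "salt": salt,
        "password": hash_password(password, salt),
        "userid": userid,
    }
    return userid

def login(email: str, password: str):
    # Equivalent of: SELECT password, userid FROM usercredentials WHERE email=?
    row = usercredentials.get(email)
    if row is None:
        return None
    # The password check happens here, not in the WHERE clause.
    candidate = hash_password(password, row["salt"])
    if hmac.compare_digest(candidate, row["password"]):
        return row["userid"]
    return None

uid = register("jane@example.com", "s3cret")
print(login("jane@example.com", "s3cret") == uid)  # True
print(login("jane@example.com", "wrong"))          # None
```

On success the returned userid is what you would use to read the `users` table.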
To keep track of logins, I'd advise something like this:
CREATE TABLE logins (
time timestamp,
userid uuid,
email text,
PRIMARY KEY (userid, time)
);
This would key your rows by a combination of userid and time. The difference here is that userid is the partition key, so all logins for each user are stored together, and time acts as a clustering key, so you can perform ORDER BY operations on it. email is a payload field here, which makes sense because you can see it while still grouping together the logins of a user who might have changed their email address. That should cover the underlying tables.
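As a rough mental model (not Cassandra itself), `PRIMARY KEY (userid, time)` behaves like a map from partition key to a list kept sorted by the clustering key. The sketch below assumes string user ids for readability where the real table uses uuid, and the helper names are hypothetical.

```python
from bisect import insort
from collections import defaultdict
from datetime import datetime

# One partition per userid; rows inside it stay sorted by time,
# mimicking PRIMARY KEY (userid, time).
logins = defaultdict(list)

def record_login(userid: str, time: datetime, email: str) -> None:
    insort(logins[userid], (time, email))  # email is just payload

def recent_logins(userid: str, limit: int = 10):
    # Like: SELECT time, email FROM logins WHERE userid=? ORDER BY time DESC LIMIT ?
    return list(reversed(logins[userid]))[:limit]

record_login("u1", datetime(2024, 1, 2, 9, 0), "old@example.com")
record_login("u1", datetime(2024, 3, 5, 8, 30), "new@example.com")
record_login("u2", datetime(2024, 2, 1, 12, 0), "other@example.com")

# Newest first; both of u1's addresses stay grouped in u1's partition.
print(recent_logins("u1"))
```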
For the coding side, the CodeIgniter-PHPCassa project might be of some help to you.
A table in a MySQL database has a column for e-mail addresses. Ultimately the e-mail addresses are supposed to be unique and have valid formats. I'm having trouble deciding where the checking should be done and what checking is necessary.
Obviously SQL alone can't entirely validate an e-mail address but I was considering adding the NOT NULL constraint to prevent the submission of blank e-mail addresses. Since each e-mail address must be unique making the e-mail column a unique key seems reasonable, but just because a column is a unique key doesn't make it NOT NULL right? Since I'm probably going to be validating the e-mail address on the server using PHP I could just as well check to see if it's empty there.
A critical piece of information I'm missing: does adding a unique key or a constraint make searches faster or slower?
For a column that holds e-mail addresses where there should be no duplicates and no empty strings/nulls etc. should it be made a unique key and/or given a NOT NULL constraint or something else?
I'm very new to MySQL, so code samples would be helpful. I've got phpMyAdmin if that's easier to work with.
For the unique key I would use ALTER TABLE `user` ADD UNIQUE INDEX (`e-mail`);
For the NOT NULL I would use ALTER TABLE `user` MODIFY `e-mail` varchar(254) NOT NULL;
Another idea I had was to insert a row with a NULL e-mail address and then make the e-mail column unique, so no other NULL e-mail addresses could be inserted.
Adding a unique constraint will actually make searches on this column faster, because it creates an index on it. Based on your description of the problem, I think your ALTER TABLE statements are correct.
Fields with unique indexes can still allow NULLs. NULL is never equal to anything, not even another NULL, so multiple NULLs do not violate the uniqueness constraint. You can, however, disallow NULLs in the field by declaring it NOT NULL.
A unique key is a normal index that simply doesn't allow multiple instances of a particular value. There will be a slight slowdown on INSERT/UPDATE because the index has to be updated, but searches will be faster, because the index can (in some cases) be used to accelerate them.
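A quick way to see this behaviour is the sketch below. It uses SQLite (via Python's sqlite3) as a stand-in; like MySQL's InnoDB, SQLite treats NULLs as distinct in a UNIQUE index. It also shows why the "insert one NULL row first" idea from the question would not block further NULLs. Table names are illustrative.

```python
import sqlite3

# SQLite stands in for MySQL here; both treat NULLs as distinct in UNIQUE indexes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nullable_email (email TEXT UNIQUE)")
conn.execute("CREATE TABLE strict_email (email TEXT NOT NULL UNIQUE)")

# Two NULLs are accepted: UNIQUE alone does not block them.
conn.execute("INSERT INTO nullable_email VALUES (NULL)")
conn.execute("INSERT INTO nullable_email VALUES (NULL)")
null_count = conn.execute(
    "SELECT COUNT(*) FROM nullable_email WHERE email IS NULL"
).fetchone()[0]
print(null_count)  # 2

def try_insert(value):
    try:
        conn.execute("INSERT INTO strict_email VALUES (?)", (value,))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    try_insert("a@example.com"),  # accepted
    try_insert("a@example.com"),  # rejected: duplicate
    try_insert(None),             # rejected: NULL
]
print(results)
```

Note that NOT NULL does not reject an empty string, so that check still belongs in your PHP validation.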
The answers so far are good, and I would recommend using UNIQUE KEY and NOT NULL for your application. Using UNIQUE KEY may slow down INSERT or UPDATE, but it would certainly not slow down searches.
However, one thing you should consider is that just because you use UNIQUE KEY, it does not necessarily enforce unique e-mail addresses. As an example, abc#gmail.com and a.b.c#gmail.com represent the same e-mail. If you don't want to allow this, you should normalize e-mail addresses in PHP before sending them to your database.
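A normalization step could look roughly like this. It is sketched in Python rather than PHP, the function name is hypothetical, and the only provider-specific rule (Gmail ignoring dots and letter case in the local part) is an assumption you would extend per provider. The normalized value is what you would store in the UNIQUE column.

```python
def normalize_email(address: str) -> str:
    """Hypothetical normalizer; the Gmail rule is an assumption about one provider."""
    local, sep, domain = address.strip().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an e-mail address: {address!r}")
    # Domain names are case-insensitive.
    domain = domain.lower()
    if domain == "gmail.com":
        # Gmail ignores dots in the local part and is case-insensitive.
        local = local.replace(".", "").lower()
    return f"{local}@{domain}"

print(normalize_email("A.B.C@Gmail.com"))      # abc@gmail.com
print(normalize_email("John.Doe@Example.ORG"))  # John.Doe@example.org
```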
With MySQL, remember that a unique index compares values using the column's collation (which by default comes from the table); in other databases you can build the index on an upper() expression instead.
See this link:
http://sqlfiddle.com/#!2/37386/1
Now, if you use utf8_general_ci instead of utf8_bin, the index creation fails, because the case-insensitive collation treats values that differ only in letter case as duplicates.
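The same effect can be reproduced without MySQL. In the sketch below, SQLite's NOCASE and BINARY collations play roughly the role of utf8_general_ci and utf8_bin (an analogy only; the exact comparison rules differ). The table and index names are illustrative.

```python
import sqlite3

def build_unique_index(collation: str) -> str:
    conn = sqlite3.connect(":memory:")
    # The column's collation decides what counts as a duplicate.
    conn.execute(f"CREATE TABLE users (email TEXT COLLATE {collation})")
    conn.executemany(
        "INSERT INTO users VALUES (?)",
        [("Foo@example.com",), ("foo@example.com",)],
    )
    try:
        conn.execute("CREATE UNIQUE INDEX ux_email ON users (email)")
        return "index created"
    except sqlite3.IntegrityError:
        return "index creation failed: duplicate under this collation"
    finally:
        conn.close()

print(build_unique_index("NOCASE"))  # fails: the two rows compare equal
print(build_unique_index("BINARY"))  # succeeds: the rows differ byte-wise
```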
I am developing a website using PHP and MySQL where users have options such as:
receive notifications via email
hide/display Facebook/Twitter links
It will also allow users to control the privacy of their profile: which information is viewable to friends/non-friends, and which friends are able to view certain photos/albums.
I was wondering how to design this table to be robust, and what I should take into consideration. Also, just out of curiosity, how does Facebook manage each user's privacy settings and options?
My solution so far is along the lines of this:
id - Primary Key
member_id - Primary Key and Foreign Key (Member tables 'id')
facebook_viewable - int(1) - 0 for No and 1 for Yes
email_notifications - int(1) - 0 for No and 1 for Yes
First off, you probably don't need both id and member_id if they are both going to be the primary key. Really you need member_id, so you can just drop the other id.
To be robust, what you want is to drive settings into rows rather than columns. It is much easier from the database perspective to append rows than to alter the table to add columns.
What you need is a table that contains a list of your privacy rule types and then an intersection table between your member table and your privacy rule type table. The schema would be something like this:
MEMBER
id int not null PRIMARY KEY
, name nvarchar(50)...
, (and so forth)
PRIVACY_RULE
id int not null PRIMARY KEY
, rule_description nvarchar(50)
MEMBER_PRIVACY
member_id int not null
, privacy_rule_id int not null
, rule_value tinyint
, PRIMARY KEY (member_id, privacy_rule_id)
This way, the privacy rule ID becomes a constant in your application code that is used to enforce the actual rule programmatically and the values can be added easily to the intersection table. When you invent a new privacy rule, all you need to do is insert one record into PRIVACY_RULE and then do your application code changes. No database schema changes required.
Note that this same basic structure can be elaborated on to make it much more flexible. For example you could have many different types of values beyond a binary flag and you could control the interpretation of these values with an additional "rule type" attribute on the PRIVACY_RULE table. I don't want to go too far off topic with this so let me know if you want further detail of this aspect.
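Here is a minimal sketch of the rows-not-columns idea, again using SQLite through Python as a stand-in for MySQL. The rule constants, the third rule, and the helper functions are hypothetical. `INSERT OR REPLACE` is SQLite syntax; in MySQL the equivalent upsert would be `INSERT ... ON DUPLICATE KEY UPDATE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE member (
    id   INTEGER NOT NULL PRIMARY KEY,
    name TEXT
);
CREATE TABLE privacy_rule (
    id               INTEGER NOT NULL PRIMARY KEY,
    rule_description TEXT
);
CREATE TABLE member_privacy (
    member_id       INTEGER NOT NULL REFERENCES member(id),
    privacy_rule_id INTEGER NOT NULL REFERENCES privacy_rule(id),
    rule_value      INTEGER,
    PRIMARY KEY (member_id, privacy_rule_id)
);
INSERT INTO member VALUES (1, 'alice');
INSERT INTO privacy_rule VALUES (1, 'facebook_viewable'), (2, 'email_notifications');
""")

# Hypothetical application-side constants mirroring privacy_rule ids.
FACEBOOK_VIEWABLE = 1
EMAIL_NOTIFICATIONS = 2
SHOW_TWITTER_LINK = 3  # a rule invented later

def set_rule(member_id, rule_id, value):
    conn.execute(
        "INSERT OR REPLACE INTO member_privacy "
        "(member_id, privacy_rule_id, rule_value) VALUES (?, ?, ?)",
        (member_id, rule_id, value),
    )

def get_rule(member_id, rule_id, default=0):
    # Members who never set a rule fall back to the default.
    row = conn.execute(
        "SELECT rule_value FROM member_privacy "
        "WHERE member_id = ? AND privacy_rule_id = ?",
        (member_id, rule_id),
    ).fetchone()
    return default if row is None else row[0]

set_rule(1, FACEBOOK_VIEWABLE, 1)

# Adding a new rule is one INSERT, with no ALTER TABLE.
conn.execute(
    "INSERT INTO privacy_rule VALUES (?, ?)", (SHOW_TWITTER_LINK, "twitter_viewable")
)
set_rule(1, SHOW_TWITTER_LINK, 1)
```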
Assuming we have to log all of our users' activities in a community, I guess our database will soon become very large, so my question is:
is having a huge DB table an acceptable compromise in order to offer this kind of service, or can we do this in a more efficient way?
EDIT:
The kind of activity to be logged is a "classic" social-networking activity log, where people can see what others are doing or have done and vice versa, so it will track, for example, when a user edits their profile, posts something, logs in, logs out, etc.
EDIT 2:
My table is already optimized to store only IDs:
log_activity_table(
id int
user int
ip varchar
event varchar #event-name
time varchar
callbacks text #some-info-from-the-triggered-event
)
I'm actually working on a similar system, so I'm interested in the answers you get.
For my project, having a full historical accounting was not important, so we chose to keep the table fairly lean, much like what you're doing. Our tables look something like this:
CREATE TABLE `activity_log_entry` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`event` varchar(50) NOT NULL,
`subject` text,
`publisher_id` bigint(20) NOT NULL,
`created_at` datetime NOT NULL,
`expires_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `event_log_entry_action_idx` (`event`),
KEY `event_log_entry_publisher_id_idx` (`publisher_id`),
CONSTRAINT `event_log_entry_publisher_id_user_id`
FOREIGN KEY (`publisher_id`)
REFERENCES `user` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8
We decided that we don't want to store history forever, so we will have a cron job that deletes history after a certain time period. We have both created_at and expires_at columns simply for convenience. When an event is logged, these columns are set automatically by the model; we use a simple strftime('%F %T', strtotime($expr)), where $expr is a string like '+30 days' that we pull from configuration.
Our subject column is similar to your callbacks one. We also chose not to relate the subject of the activity directly to other tables, because not every event subject will necessarily have a table; additionally, that relationship isn't even important to keep, because the only thing we do with this event log is display activity-feed messages. We store a serialized value object of data pertinent to the event for use in predetermined message templates. We also directly encode what the event pertained to (i.e. profile, comment, status, etc.).
Our events (a.k.a. activities) are simple strings like 'update', 'create', etc. These are used in some queries and, of course, to help determine which message to display to a user.
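The expiry scheme above can be sketched in Python with SQLite as the store. The 30-day retention value and the helper names are assumptions standing in for the configured '+30 days' string and the model/cron code. Timestamps use the same fixed-width layout that '%F %T' produces, so plain string comparison orders them correctly.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical config value; the answer reads a string like '+30 days'.
RETENTION = timedelta(days=30)
FMT = "%Y-%m-%d %H:%M:%S"  # what '%F %T' expands to

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE activity_log_entry (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    event        TEXT NOT NULL,
    publisher_id INTEGER NOT NULL,
    created_at   TEXT NOT NULL,
    expires_at   TEXT NOT NULL)""")

def log_event(event, publisher_id, now):
    # Both timestamps are set once, when the event is logged.
    conn.execute(
        "INSERT INTO activity_log_entry "
        "(event, publisher_id, created_at, expires_at) VALUES (?, ?, ?, ?)",
        (event, publisher_id, now.strftime(FMT), (now + RETENTION).strftime(FMT)),
    )

def purge_expired(now):
    # What the cron job would run; returns the number of rows deleted.
    cur = conn.execute(
        "DELETE FROM activity_log_entry WHERE expires_at < ?", (now.strftime(FMT),)
    )
    return cur.rowcount

log_event("create", 1, datetime(2024, 1, 1))   # expires 2024-01-31
log_event("update", 1, datetime(2024, 2, 20))  # expires 2024-03-21
deleted = purge_expired(datetime(2024, 3, 1))
print(deleted)  # 1
```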
We are still in the early stages so this may change quite a bit (possibly based on comments and answers to this question) but given our requirements it seemed like a good approach.
Case: all user activities have their own tables, e.g. like, comment, post, become a member.
Then each of these tables should have a key associating the entry with a user. Given a user, you can get their recent activities by querying each table by the user key.
Hence, if you don't have a schema yet or are free to change it, go with separate tables for different activities and search across them.
Case: some activities are generic and don't have an individual table.
Then have a table for generic activities and search it along with the other activity tables.
Do you need to store the specific activity of each user, or do you just want to log the kind of activity that is happening over time? If the latter, you might consider something like RRDtool (or a similar approach) and store the amount of activity over different timesteps in a circular buffer whose size stays constant over time. See http://en.wikipedia.org/wiki/RRDtool.
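Searching several per-activity tables for one user can be done with a single UNION ALL query ordered by time. The sketch below uses SQLite through Python, and all table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical per-activity tables, each carrying a user key.
conn.executescript("""
CREATE TABLE likes    (user_id INTEGER, target TEXT, created_at TEXT);
CREATE TABLE comments (user_id INTEGER, body   TEXT, created_at TEXT);
CREATE TABLE posts    (user_id INTEGER, title  TEXT, created_at TEXT);
CREATE TABLE generic_activity (user_id INTEGER, kind TEXT, created_at TEXT);
INSERT INTO likes    VALUES (7, 'photo 12', '2024-05-01 10:00:00');
INSERT INTO comments VALUES (7, 'Nice!',    '2024-05-02 09:30:00');
INSERT INTO posts    VALUES (7, 'Hello',    '2024-04-30 18:15:00');
INSERT INTO posts    VALUES (8, 'Other',    '2024-05-03 08:00:00');
INSERT INTO generic_activity VALUES (7, 'login', '2024-05-02 08:00:00');
""")

# Each branch filters by the same user key; the results are merged by time.
RECENT_ACTIVITY = """
SELECT 'like' AS kind, target AS detail, created_at FROM likes WHERE user_id = :uid
UNION ALL
SELECT 'comment', body, created_at FROM comments WHERE user_id = :uid
UNION ALL
SELECT 'post', title, created_at FROM posts WHERE user_id = :uid
UNION ALL
SELECT kind, NULL, created_at FROM generic_activity WHERE user_id = :uid
ORDER BY created_at DESC
LIMIT :limit
"""

def recent_activity(user_id, limit=10):
    return conn.execute(RECENT_ACTIVITY, {"uid": user_id, "limit": limit}).fetchall()

for row in recent_activity(7):
    print(row)
```

An index on `(user_id, created_at)` in each table would keep every branch of the query cheap.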
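The core of the circular-buffer idea (not RRDtool itself) fits in a few lines. The class below is a hypothetical sketch that keeps a fixed number of per-timestep activity counts, so storage stays constant no matter how long the site runs.

```python
from collections import deque

class ActivityRing:
    """Fixed-size circular buffer of per-timestep activity counts (RRD-style sketch)."""

    def __init__(self, slots: int):
        self.counts = deque([0] * slots, maxlen=slots)

    def record(self, n: int = 1) -> None:
        # Add activity to the current (newest) timestep.
        self.counts[-1] += n

    def tick(self) -> None:
        # Start a new timestep; the oldest slot drops off automatically.
        self.counts.append(0)

ring = ActivityRing(slots=3)
ring.record(5)
ring.tick()
ring.record(2)
ring.tick()
ring.record(1)
ring.tick()
ring.record(4)
print(list(ring.counts))  # [2, 1, 4]: the first timestep (5) has rotated out
```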
I need to create a database schema for storing user information (id, name, p/w, email address ...etc). I have always picked arbitrary amounts when sizing these fields. With this said, I have two questions:
1) What are good sizes for these fields? I am sure there is a maximum email address length for example...etc.
2) I now need to store user mailing addresses for credit card purchases, including international mailing addresses. This is an area I do not want to pick arbitrary sizes.
Does anyone know of a good schema for either? Is there a project for this maybe?
Thanks!
Also consider which DB engine you will use and whether the primary key will be the email, a rowid, or an arbitrary number. I typically save passwords in a second table called "security", using a hash as suggested in another answer. Here's an example.
CREATE TABLE IF NOT EXISTS `users` (
`user_id` varchar(255) NOT NULL,
`active` char(1) default 'Y',
`created_date` INTEGER UNSIGNED default 0,
`email` varchar(255) default NULL,
`first_name` varchar(255) default NULL,
`last_name` varchar(255) default NULL,
`modified_date` INTEGER UNSIGNED default 0,
PRIMARY KEY (`user_id`, `active`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
I'll give you a hand with part 1. In general you shouldn't stress very much about the size of your MySQL DB fields. You don't have to get the number exactly right; just make sure that someone with a reasonable answer doesn't get their data truncated.
CREATE TABLE `user` ( -- table name is illustrative
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`username` varchar(255),
`email` varchar(255),
`password` char(256),
PRIMARY KEY (`id`)
);
Notice that for password I use a CHAR(256) field (256 characters, not bits) instead of a VARCHAR. That's because you should never store plain-text passwords in a database; you should always store a hash of the password, with a unique "salt" for that password, and hashes of a given type have a fixed length. You can find tutorials online, and the length the password field actually needs depends on the type of hashing you use.
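To see how the required width follows from the hashing choice, this Python sketch computes the length of one hypothetical storage layout, "salt_hex$digest_hex", built with PBKDF2 from the standard library. Real password-hashing schemes use their own encodings, so treat the numbers as illustrative of the method, not as a recommended column size.

```python
import hashlib
import os

def stored_form(password: str, salt: bytes, algorithm: str) -> str:
    # Hypothetical storage layout "salt_hex$digest_hex"; real schemes vary.
    digest = hashlib.pbkdf2_hmac(algorithm, password.encode(), salt, 100_000)
    return f"{salt.hex()}${digest.hex()}"

salt = os.urandom(16)  # 16 bytes -> 32 hex characters
for algo in ("sha256", "sha512"):
    # sha256: 32 + 1 + 64 = 97 characters; sha512: 32 + 1 + 128 = 161
    print(algo, len(stored_form("hunter2", salt, algo)))
```

Both fit in CHAR(256); the point is that once you pick a scheme, you can size the column from it instead of guessing.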
This is a pretty tough question to answer, because in my opinion there is a difference between what you "should" allow and what is considered allowable by the IETF.
The maximum allowable email address length is 256 characters, which includes the angle brackets at the beginning and end of the address (therefore only 254 usable characters). You can find detailed information about it on this page by Dominic Sayers.
But will any legitimate user actually have an email address that long?
As for street addresses, I don't believe that is specified anywhere, however according to the world's longest website the longest street name is 72 characters. Therefore if you made the field 100 characters you would have more than enough room for the street address.
You don't need to be too concerned with getting everything 100% correct; you should be more concerned with the quality of the data you decide to accept into the database (make sure it is valid/clean). Also, provide clear rejection messages if someone enters something that is simply too long, and make sure the owner of the website can easily be contacted if that happens.
One thing I'd like to note: NoSQL is all the rage right now, and it uses schema-less database engines, such as MongoDB and CouchDB. It is not the best solution for everything; however, if you are very concerned about having the correct schema, a schema-less database might be a good option.
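A minimal sketch of that advice: check lengths before the data reaches the database and return readable messages instead of letting a value be truncated. The limits come from the figures above (254 usable e-mail characters, a 100-character street field); the function name and message wording are hypothetical.

```python
MAX_EMAIL_LEN = 254   # usable characters in an e-mail address
MAX_STREET_LEN = 100  # suggested street-address width

def validate_signup(email: str, street: str) -> list:
    """Return human-readable rejection messages; an empty list means OK."""
    errors = []
    if not email.strip():
        errors.append("Please enter an e-mail address.")
    elif len(email) > MAX_EMAIL_LEN:
        errors.append(
            f"E-mail addresses can be at most {MAX_EMAIL_LEN} characters "
            f"(yours has {len(email)})."
        )
    if len(street) > MAX_STREET_LEN:
        errors.append(
            f"Street addresses can be at most {MAX_STREET_LEN} characters; "
            "please contact us if yours is longer."
        )
    return errors

print(validate_signup("jane@example.com", "1 Main St"))  # []
```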