I have some location data in a table locations, whose key is the unique location_id.
I have some user data in a table users, whose key is the unique user_id.
Two ways I was thinking of linking these together:
I can put the 'location' in each user's data.
'SELECT user_id FROM users WHERE location = "LOCATIONID";'
//this IS NOT searching with the table's key
//this does not require an explode
//this stores 1 integer per user
I can also put the 'userIDs' as a comma-delimited string of IDs in each location's data.
'SELECT userIDs FROM locations WHERE location_id = "LOCATIONID";'
//this IS searching with the table's key
//this needs an explode() once the comma delimited list is retrieved
//this stores 1 string of user ids per location
So I wonder: which would be most efficient? I'm not really sure how much the size of the stored data could also impact the speed. I want retrievals to be as fast as possible when trying to find out which users are at which location.
This is just an example, and there will be many other tables like locations to compare to the users, so the efficiency, or lack thereof, will be multiplied across the whole system.
Stick with option 1. Keep your database tables normalised as much as possible until you know you have a performance problem.
There's a whole slew of problems with option 2, including not being able to use the user IDs until you pull them into PHP, and then having to fire off many more SQL queries, one per ID. This is extremely inefficient. Do as much inside MySQL as possible; the optimisations the database layer can apply while running the query will easily beat anything you write in PHP.
Regarding your point about not searching on the primary key, you should add an index to the location column. All columns that are in a WHERE clause should be indexed as a general rule. This negates the issue of not searching on the primary key, as the primary key is just another type of index for the purposes of performance.
Use the first one to keep your data normalized. You can then query for all users for a location directly from the database without having to go back to the database for each user.
Be sure to add the correct index on your users table too.
CREATE TABLE locations (
locationId INT PRIMARY KEY AUTO_INCREMENT
) ENGINE=INNODB;
CREATE TABLE users (
userId INT PRIMARY KEY AUTO_INCREMENT,
location INT,
INDEX ix_location (location)
) ENGINE=INNODB;
Or, to add just the index:
ALTER TABLE users ADD INDEX ix_location(location);
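With that index in place, the look-up from the question goes straight to the matching rows:
SELECT userId
FROM users
WHERE location = 123; -- satisfied via ix_location instead of a full table scan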
Have you heard of foreign keys?
You can get details from many tables using a join. You can use a subquery too.
As you said, there are two tables, users and locations.
Keep the location_id as a foreign key in users and fetch based on that.
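As a rough sketch (using the users and locations tables from the question), the foreign key could be declared like this:
ALTER TABLE users
  ADD CONSTRAINT fk_users_location
  FOREIGN KEY (location) REFERENCES locations (location_id);
-- both tables must use the InnoDB engine for the constraint to be enforced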
When you store the user IDs as a comma-separated list in a table, that table is not normalized (in particular, it violates the first normal form, item 4).
It is perfectly valid to denormalize tables for optimization purposes, but only after you have measured that this is actually where the bottleneck is in your specific situation. That, however, can only be determined if you know which queries are executed how often, how long they take, and whether the performance of each query is critical (relative to the other queries).
Stick with option 1 unless you know exactly why you have to denormalize your table.
I am developing a MySQL db for a user list, and I am trying to determine the most efficient way to design it.
My issue comes in that there are 3 types of users: "general", "normal", and "super". General and normal users differ only in the values of certain columns, so the schema to store them is identical. However, super users have at least 4 extra columns of info that need to be stored.
In addition, each user needs a unique user_id for reference from other parts of the site.
So, I can keep all 3 user types in the same table, but then I would have a lot of NULL values stored in the general and normal user rows.
Or, I can split the users into 2 tables: general/normal and super. This would get rid of the abundance of NULLs, but would require a lot more work to keep track of the user_ids and ensure they are unique, as I would have to handle that in my PHP instead of just using a SERIAL column as in the single-table solution above.
Which solution is more efficient in terms of memory usage and performance?
Or is there another, better solution I am not seeing?
Thanks!
If each user needs a unique id, then you have the answer to your question: You want one users table with a UserId column. Often, that column would be an auto-incremented integer primary key column -- a good approach to the implementation.
What to do about the other columns? This depends on a number of different factors, which are not well explained in your question.
You can store all the columns in the same table. In fact, you could then implement views so you can see users of only one type. However, if a lot of the extra columns are fixed-width (such as numbers) then space is still allocated. Whether or not this is an issue is simply a question of the nature of the columns and the relative numbers of different users.
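As a sketch of the view idea (user_type here is an assumed discriminator column, not something from the question):
CREATE VIEW super_users AS
  SELECT *
  FROM users
  WHERE user_type = 'super'; -- user_type is an assumed column marking each row's type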
You can also store the extra columns for each type in its own table. This would have a foreign key relationship to the original table, using the UserId. If both these keys are primary keys, then the joins should be very fast.
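A sketch of that layout (the detail column names are purely illustrative):
CREATE TABLE super_user_details (
  userId INT PRIMARY KEY, -- same value as users.userId
  extra1 VARCHAR(255), -- illustrative extra columns
  extra2 VARCHAR(255),
  FOREIGN KEY (userId) REFERENCES users (userId)
) ENGINE=INNODB;

-- fetching a super user is then a primary-key-to-primary-key join:
SELECT u.*, s.extra1, s.extra2
FROM users u
JOIN super_user_details s ON s.userId = u.userId;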
There are more exotic possibilities as well. If the columns do not need to be indexed, then MySQL 5.7 has support for JSON, so they could all go into one column. Some databases (particularly columnar-oriented ones) allow "vertical partitioning", where different columns in a single table are stored in separate allocation units. MySQL does not (yet) support vertical partitioning.
Why not build an extra table, but only for the extra columns you need for super users? So, 2 tables: one with all the users and one with the super users' extra info.
If you want to have this type of schema, try to create a relation
like:
tb_user > user_id, user_type_id (int)
tb_user_type > user_type_id (int), type_name
This way you will have just 2 tables, and if the type is not set you can assign a default type to the user.
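In DDL form, that sketch might look like this (the default type id of 1 is an assumption):
CREATE TABLE tb_user_type (
  user_type_id INT PRIMARY KEY AUTO_INCREMENT,
  type_name VARCHAR(50) NOT NULL
) ENGINE=INNODB;

CREATE TABLE tb_user (
  user_id INT PRIMARY KEY AUTO_INCREMENT,
  user_type_id INT NOT NULL DEFAULT 1, -- assumed default, e.g. 1 = 'general'
  FOREIGN KEY (user_type_id) REFERENCES tb_user_type (user_type_id)
) ENGINE=INNODB;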
I want to remove an index from a table whose access in PHP never uses the indexed column. The index takes up extra space and I am trying to trim it. It's a table of phone numbers. A phone number is linked to a user profile's id, so the table has 3 columns: id (indexed), number and person. I was wondering if removing the index will affect the queries that use number or person in the WHERE clause. My gut feeling is that it shouldn't, but I am afraid computer science doesn't work on gut feelings. The data is accessed via joins. For example...
SELECT *
FROM people ... LEFT JOIN
phoneNumbers
ON people.id = phoneNumbers.person
Edit: Apparently no one seems to be able to answer the question in the title.
In the case you show, only the person column would benefit from an index.
Indexes help in basically four cases:
Row restriction, that is finding the rows by value instead of examining every row in the table.
Joining is a subset of row restriction: each distinct value in the first table looks up matching rows in the second table. Indexing a column referenced in the ON clause works the same way as indexing a column referenced in the WHERE clause.
Sorting, to retrieve rows in index order instead of having to sort the result set as an additional step.
Distinct and Group By, to scan each distinct value in an index.
Covering index, that is when the query needs only the columns that are found in the index.
In the case of InnoDB, every table is treated as an index-organized table based on its primary key, and we should take advantage of this because primary key lookups are very efficient. So if you can redefine a primary key on your phoneNumbers.person column (in part), that would be best.
I think it is a good idea for all tables to have explicit primary keys, and an index necessarily comes with one. Without one it becomes difficult, for instance, to delete rows from the table if unwanted duplicates were to appear.
In general, indexes are used for where clauses, on clauses, and order by. If you have an id column, then foreign key references to the table should be using that column, and not the other two columns. The index might also be used for a select count(*) from table query, but I'm not 100% sure if MySQL does this.
If removing an index on a column makes that big a difference, then you should be investigating other ways to make your database more efficient. One method would be using partitioning to store different parts of the database in different files.
If the id column is an auto-incrementing integer, you have already given the table the most efficient clustered index possible. If you remove it, InnoDB will fall back to the first UNIQUE NOT NULL index as its clustered key, or, if there is none, generate a hidden six-byte row ID, which makes look-ups less efficient and is invisible to your queries.
Additionally, any secondary index you create in the future will contain two parts: the indexed field in the desired order, followed by the table's clustered key. If you remove the id column and later decide to index the table on person, each index entry has to carry the clustered key as well, so you save less space than you might expect.
Given that you're querying on this relationship, the person column should be indexed, and leaving the id column in place ensures that the person index is as small and as quick as possible.
The column "id" seems useless. If I've understood you correctly, I'd
drop the "id" column,
add a primary key constraint on {person, number}, and
a foreign key reference from "person" to people.id.
I'm assuming each person can have more than one phone number.
Creating a primary key constraint has a side-effect that you might not want: it creates an internal index on the key columns.
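In MySQL, the restructuring described above could look like this (a sketch only; double-check it against your real schema and data first):
ALTER TABLE phoneNumbers
  DROP COLUMN id,
  ADD PRIMARY KEY (person, number),
  ADD FOREIGN KEY (person) REFERENCES people (id);
-- the new composite primary key doubles as the index for joins on person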
I have a MySQL/PHP performance related question.
I need to store an index list associated with each record in a table. Each list contains 1000 indices. I need to be able to quickly access any index value in the list associated to a given record. I am not sure about the best way to go. I've thought of the following ways and would like your input on them:
Store the list in a string as a comma-separated value list or using JSON. Probably terrible performance, since I need to extract the whole list out of the DB into PHP only to retrieve a single value. Parsing the string won't exactly be fast either... I can keep a number of expanded lists in a Least Recently Used cache on the PHP side to reduce load.
Make a list table with 1001 columns to store each list alongside its primary key. I'm not sure how costly this is in terms of storage. It also feels like abusing the system. And then, what if I need to store 100000 indices?
Only store in SQL the name of a binary file containing my indices and perform an fopen(); fseek(); fread(); fclose() cycle for each access? Not sure how the filesystem cache will react to that. If it goes badly then there are many solutions available to address the issues... but that sounds a bit overkill, no?
What do you think of that?
What about a good old one-to-many relationship?
records
-------
id int
record ...
indices
-------
record_id int
index varchar
Then:
SELECT *
FROM records
JOIN indices
ON records.id = indices.record_id
WHERE indices.index = 'foo'
(A plain JOIN is used here: the WHERE condition on indices would filter out the NULL rows a LEFT JOIN produces anyway.)
The standard solution is to create another table, with one row per (record, index) pair, and add a MySQL index to allow fast searches:
CREATE TABLE IF NOT EXISTS `table_list` (
`IDrecord` int(11) NOT NULL,
`item` int(11) NOT NULL,
KEY `IDrecord` (`IDrecord`)
)
Change the item's type according to your needs - I used int in my example.
The most logical solution would be to put each value in its own tuple. Adding a MySQL index on that column will enable the DBMS to find the value quickly, and should improve performance.
The reasons we're not going with your other options are as follows:
Option 1
Storing multiple values in one MySQL cell is a violation of the first normal form, the first stage of database normalisation. You can read up on it here.
Option 3
This relies heavily on files outside the database. You want to keep your data storage as localized as possible, to make it easier to maintain in the future.
What is the purpose of a secondary key? Say I have a table that logs all the check-ins (similar to Foursquare), with columns id, user_id, location_id, post, time. There can be millions of rows, and many people have said to use secondary keys to speed up the process.
Why does this work? And should both user_id and location_id be secondary keys?
I'm using MySQL btw...
Edit: There will be a page that lists/calculates all the check-ins for a particular user, and another page that lists all the users who have checked in to a particular location.
MySQL queries:
Type 1
SELECT location_id FROM checkin WHERE user_id = 1234
SELECT user_id FROM checkin WHERE location_id = 4321
Type 2
SELECT COUNT(location_id) as num_users FROM checkin
SELECT COUNT(user_id) as num_checkins FROM checkin
The key (also called index) is for speeding up queries. If you want to see all check-ins for a given user, you need a key on the user_id field. If you want to see all check-ins for a given location, you need an index on the location_id field. You can read more in the MySQL documentation.
I want to comment on your question and your examples.
Let me just suggest strongly to you that since you are using MySQL, you make sure that your tables use the InnoDB engine type, for many reasons you can research on your own.
One important feature of InnoDB is that you have referential integrity. What does that mean? In your checkin table, you have a foreign key of user_id which is the primary key of the user table. With referential integrity, MySQL will not let you insert a row with a user_id that doesn't exist in the user table. Using MyISAM, you can. That alone should be enough to make you want to use the innodb engine.
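A sketch of what that looks like for the checkin table described in the question (the users and locations tables, keyed on user_id and location_id, are assumed):
CREATE TABLE checkin (
  id INT PRIMARY KEY AUTO_INCREMENT,
  user_id INT NOT NULL,
  location_id INT NOT NULL,
  post TEXT,
  time DATETIME,
  FOREIGN KEY (user_id) REFERENCES users (user_id),
  FOREIGN KEY (location_id) REFERENCES locations (location_id)
) ENGINE=INNODB;
-- an INSERT with a user_id missing from users is now rejected instead of
-- silently creating an orphan row; the foreign keys also give you the
-- secondary indexes on user_id and location_id discussed below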
To your question about keys/indexes, essentially when a table is defined and a key is declared for a column or some combination of columns, mysql will create an index.
Indexes are essential for performance as a table grows with the insert of rows.
All relational databases and document databases depend on an implementation of B-tree indexing. What B-trees are very good at is finding an item (or establishing that it is absent) in a predictable number of lookups. So when people talk about the performance of a relational database, the essential building block is the use of B-tree indexes, which are created via KEY declarations or with ALTER TABLE or CREATE INDEX statements.
To understand why this is, imagine that your user table was simply a text file, with one line per row, perhaps separated by commas. As you add a row, a new line in the text file gets added at the bottom.
Eventually you get to the point that you have 10,000 lines in the file.
Now you want to find out if you entered a line for one particular person with the last name of Smith. How can you find that out?
Without any sort of ordering of the file, or a separate index, you have but one option: start at the first line of the file and scan through every line looking for a match. Even if you found a Smith, that might not be the only 'Smith' in the table, so you have to read the entire file from top to bottom every time you want to do this search.
Obviously as the table grows the performance of searching gets worse and worse.
In relational database parlance, this is known as a "table scan". The database has to start at the first row and scan through reading every row until it gets to the end.
Without indexes, relational databases still work, but they are highly dependent on IO performance.
With a Btree index, the rows you want to find are found in the index first. The indexes have a pointer directly to the data you want, so the table no longer needs to be scanned, but instead the individual data pages required are read. This is how a database can maintain adequate performance even when there are millions or 10's or 100's of millions of rows.
To really start to gain insight into how MySQL works, you need to get familiar with EXPLAIN EXTENDED ... and start looking at the explain plans for your queries. Simple ones like those you've provided will have simple plans that show you how many rows are being examined to get a result, and whether or not they are using one or more indexes.
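For example (assuming the index on user_id exists):
EXPLAIN EXTENDED
SELECT location_id FROM checkin WHERE user_id = 1234;
-- the key column of the output names the index chosen, if any, and the
-- rows column estimates how many rows must be examined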
For your summary queries, indexes are not helpful because you are doing a COUNT(). The table will need to be scanned when you have no other criteria constraining the search.
I did notice what looks like a mistake in your summary queries. Based on your column alias names, I think these are the queries that get what you actually want:
SELECT COUNT(DISTINCT user_id) as num_users FROM checkin
SELECT COUNT(*) as num_checkins FROM checkin
This is yet another reason to use InnoDB. When properly configured, it has a data cache (the InnoDB buffer pool) similar to other RDBMSs like Oracle and SQL Server. MyISAM doesn't cache data at all, so if you repeatedly run the same sorts of queries that require a lot of IO, MySQL will have to do all that data-reading work over and over, whereas with InnoDB that data could very well be sitting in cache memory and the result returned without having to go back and read from storage.
Primary vs Secondary
There really is no such distinction internally. A primary key is special because it allows the database to find one single row. Primary keys must be unique, and to reflect that, the associated B-tree index is unique, which simply means that it will not allow two entries with the same key data to exist in the index.
Whether or not an index is unique is an excellent tool that allows you to maintain the consistency of your database in many other cases. Let's say you have an 'employee' table with the SS_Number column to store social security #. It makes sense to have an index on that column if you want the system to support finding an employee by SS number. Without an index, you will tablescan. But you also want to have that index be unique, so that once an employee with a SS# is inserted, there is no way the database will let you enter a duplicate employee with the same SS#.
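As a sketch, using the example names from the paragraph above:
CREATE UNIQUE INDEX ix_ss_number ON employee (SS_Number);
-- a later INSERT with an SS_Number that already exists fails with
-- MySQL duplicate-key error 1062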
But to demystify this for you: when you declare keys as you define your tables, the indexes are simply created for you, and in most cases used automagically.
It's when you aren't dealing with keys (primary or foreign), as in the example of usernames, first and last names, SS#s etc., that you also need to be aware of how to create an index, because you are searching (using WHERE clause criteria) on one or more columns that aren't keys.
For example, I'm doing the following:
SELECT COUNT(id)
FROM users
WHERE unique_name = 'Wiliam'
-- if Wiliam doesn't exist then...
INSERT INTO users
SET unique_name = 'Wiliam'
The question is: I'm doing this SELECT COUNT(id) check every time I insert a new user, regardless of whether unique_name has a UNIQUE key or not. So, would a UNIQUE key on unique_name be better for performance than a normal key?
What you mean is a UNIQUE CONSTRAINT on the column being inserted. Reads will be faster; inserts will be just a bit slower. It will still be faster than your code checking first and then inserting the value, though. Just let MySQL do its thing and return an error to you if the value is not unique.
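A minimal sketch of that approach, using the names from the question:
ALTER TABLE users ADD UNIQUE KEY ux_unique_name (unique_name);

INSERT INTO users SET unique_name = 'Wiliam';
-- if 'Wiliam' already exists, MySQL rejects the INSERT with
-- duplicate-key error 1062; catch that error in PHP rather than
-- running SELECT COUNT(id) first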
You didn't say what this is for, which would help. If it's part of an authentication system, then why doesn't your query include the user's password as well? If it's not, a unique indexed column used to store names isn't going to work very well in a real-world system unless you are OK with having one and only one Wiliam in your system. (Was that supposed to be William?)
And if that name field is really unique you do not need to use COUNT(ID) in your query. If 'unique_name' is truly unique you either get an id number returned from your query or you get nothing.
You'd want something like this:
SELECT id FROM users WHERE unique_name = 'Wiliam'
No record returned, no Wiliam.
An index (unique or non-unique -- I don't know what you're after here) on unique_name will improve the performance.
Your use of 'unique key' isn't very logical so I suspect you are getting confused about the nomenclature of keys, indexes, their relationships, and the purposes for them.
KEYS in a database are used to create and identify relationships between sets of data. This is what makes the 'relational' possible in a relational database.
Keys come in 2 flavors: Primary and foreign.
PRIMARY KEYS identify each row in a table. The value or values that comprise the key must be unique.
Primary keys can be made from a single column or from several columns (in which case it is called a composite key) that together uniquely identify the row. Again, the important thing here is uniqueness.
I use MySql's auto-increment integer data type for my primary keys.
FOREIGN KEYS identify which rows in a table have a relationship with other rows in other tables. A foreign key of a record in one table is the primary key of the related record in the other table. A foreign key is not unique -- in many-to-many relationships there are by definition multiple records with the same foreign key. They should however be indexed.
INDEXES are used by the database as a sort of short-hand method to quickly look up values, as opposed to scanning the entire table or column for a match. Think of the index in the back of a book. Much easier to find something using a book's index than by flipping through the pages looking for it.
You may also want to index a non-key column for better performance when searching on that column. Which columns do you use frequently in a WHERE clause? You should probably index them.
UNIQUE INDEX is an index where all the values in it must be distinct. A column with a unique index will not let you insert a duplicate value, because it would violate the unique constraint. Primary keys are unique indexes. But unique indexes do not have to be primary keys, or even a key.
Hope that helps.
[edited for brevity]
Having a unique constraint is a good thing because it prevents insertion of duplicate entries in case your program is buggy (are you missing a "for update" clause in your select statement?) or in case someone inserts data without using your application.
You should, however, not depend on it in your application for normal operation. Let's assume unique_name is an input field a user can specify. Your application should check whether the name is unique. If it is, insert it. If it is not, tell the user.
It is a bad idea to just try the insert in all cases and see if it was successful: it will create errors in the database server logs that make it more difficult to find real errors, and it will render your current transaction useless, which may be an issue depending on the situation.