Structure of table keys from multiple other column values - php

I am creating a new database and am thinking about the structure. Previously I have always used an auto-incremented unsigned SMALLINT as the key. There will be plenty of searches performed on some of the tables, and a key structured as above would seldom be known beforehand, so the search would have to be on a non-key column.
In some tables there are other unique values, and there we have no problem simply using that as the key. But in some tables there aren't any, so I am thinking that I could instead construct the key by putting together two or more column values, creating a unique string that I can later search for, to ensure better performance.
I haven't heard of this kind of key construction before, so I would like some input on whether I am thinking correctly here. Thanks!
Here is an example that illustrates this "put-together" key:
CREATE TABLE example (
  `key` VARCHAR(100),   -- `key` is a reserved word in MySQL, so it must be quoted
  name VARCHAR(50),
  category VARCHAR(20),
  country VARCHAR(30)
);
name is not unique per se, but it is unique per category and country (which is ensured at input). 90% of the time the searches will involve all three parameters, so the code doing the search can put together the key and look the row up by that id/key. In the 10% of cases where one of the parameters is unknown, the search can be made on the other columns directly, for example on country if the user wants to see all rows with country = xyz.

Yes, it's completely legal to use two or more columns as a unique identifier; it's called a composite key.
http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx
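Applied to the example table above, the composite key replaces the concatenated VARCHAR entirely. A minimal sketch (column sizes taken from the question):

CREATE TABLE example (
  name VARCHAR(50),
  category VARCHAR(20),
  country VARCHAR(30),
  PRIMARY KEY (name, category, country)  -- the combination must be unique
);

-- the 90% case: all three parts are known, so the lookup uses the primary key
SELECT * FROM example
WHERE name = 'abc' AND category = 'def' AND country = 'xyz';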

Related

Prevent duplicate data in a table with more than 20000 records

Every day I add almost 5000 new records in MySQL, and I want to prevent duplicate rows from being inserted into the table. I think I should check the whole table before any insert operation; is that suitable?
Or is there a better way to do that?
Thanks in advance
It's a good choice to prevent the data model from being corrupted by software, by applying a unique index to the attributes that must not be duplicated.
It's even better to also ask the database for duplicate candidates before inserting data.
Best of all is to combine the two: the constraint on the database model plus the duplicate check in the software layer, because a) error handling is much more expensive than querying, and b) the constraint protects the data from human failure.
MySQL supports unique indexes with the CREATE UNIQUE INDEX statement, e.g.:
CREATE UNIQUE INDEX IDX_FOO ON BAR(X,Y,Z);
creates a unique index on table BAR. This index will also be used when running the query for duplicates, which speeds up that check considerably.
See MySQL Documentation for more details.
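Putting both layers together, a sketch using the BAR table from the example above:

-- software layer: ask for duplicate candidates first (fast, thanks to IDX_FOO)
SELECT COUNT(*) FROM BAR WHERE X = 1 AND Y = 2 AND Z = 3;

-- if the count was 0, insert; the unique index still catches any race
-- between the check and the insert
INSERT INTO BAR (X, Y, Z) VALUES (1, 2, 3);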
When you have a data integrity issue, you want the database to enforce the rules (if possible). In your case, you do this with a unique index or unique constraint, which are two names for the same thing. Here is sample syntax:
create unique index idx_table_col1_col2 on mytable(col1, col2)
You want to do this in the database, for three reasons:
You want the database to know that that column is unique.
You do not want a multi-threaded application to "accidentally" insert duplicate values.
You do not want to put such important checks into the application, where they might "accidentally" be removed.
MySQL then has very useful constructs to deal with duplicates, in particular INSERT ... ON DUPLICATE KEY UPDATE, INSERT IGNORE, and REPLACE.
When you run SQL queries from your application, you should be checking for errors anyway, so catching duplicate key errors should be no additional burden on the application.
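For illustration, here is a sketch of two of those constructs against the hypothetical mytable from the sample syntax above (the hits counter column is made up for the example):

-- skip rows that would violate the unique index, without raising an error
INSERT IGNORE INTO mytable (col1, col2, hits) VALUES (1, 2, 1);

-- or turn a would-be duplicate into an update of the existing row
INSERT INTO mytable (col1, col2, hits) VALUES (1, 2, 1)
ON DUPLICATE KEY UPDATE hits = hits + 1;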
Firstly, for any column that needs to be unique on its own, you can use the UNIQUE constraint:
CREATE TABLE IF NOT EXISTS tableName
(id SERIAL, someUniqueColumnName VARCHAR(255) NOT NULL UNIQUE);
See the MySQL Documentation for adding uniqueness to existing columns.
You need to decide what constitutes a duplicate in your table, because uniqueness is not always restricted to a single column. For instance, in a table that links users to something else by id (say, pictures), it may be the combination of the two ids that has to be unique. For that you can have a PRIMARY KEY which uses two columns:
CREATE TABLE IF NOT EXISTS tableName (
id BIGINT(20) UNSIGNED NOT NULL,
pictureId BIGINT(20) UNSIGNED NOT NULL,
someOtherColumn VARCHAR(12),
PRIMARY KEY(id, pictureId));

Do indexes help in queries that don't have the indexed column in the WHERE clause?

I want to remove an index in a table whose access in PHP never uses the indexed column. The index takes up extra space and I am trying to trim the table down. It's a table of phone numbers. A phone number is linked to a user profile's id, so the table has 3 columns: id (index), number and person. I was wondering if removing the index will affect the queries that use number or person in the WHERE clause. My gut feeling is that it shouldn't, but I am afraid computer science doesn't work on gut feelings. The data is accessed via joins. For example...
SELECT *
FROM people ...
LEFT JOIN phoneNumbers
  ON people.id = phoneNumbers.person
Edit: Apparently no one seems to be able to answer the question in the title.
In the case you show, only the person column would benefit from an index.
Indexes help in basically four cases:
Row restriction, that is finding the rows by value instead of examining every row in the table.
Joining is a subset of row restriction, i.e. each distinct value in the first table looks up matching rows in the second table. Indexing a column that is referenced in the ON clause is done in the same way you would index a column referenced in the WHERE clause.
Sorting, to retrieve rows in index order instead of having to sort the result set as an additional step.
Distinct and Group By, to scan each distinct value in an index.
Covering index, that is when the query needs only the columns that are found in the index.
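As a quick check of which case applies, you can ask MySQL how it would execute the join from the question (a sketch; the exact EXPLAIN output depends on your schema):

EXPLAIN SELECT *
FROM people
LEFT JOIN phoneNumbers ON people.id = phoneNumbers.person;
-- with an index on phoneNumbers.person the join shows type "ref";
-- without one it degrades to a full scan ("ALL") of phoneNumbers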
In the case of InnoDB, every table is treated as an index-organized table based on its primary key, and we should take advantage of this because primary key lookups are very efficient. So if you can redefine the primary key so that it starts with your phoneNumbers.person column, that would be best.
I think it is a good idea for all tables to have explicit primary keys, and an index necessarily comes with one. Without a primary key it becomes difficult, for instance, to delete rows from the table if unwanted duplicates appear.
In general, indexes are used for where clauses, on clauses, and order by. If you have an id column, then foreign key references to the table should be using that column, and not the other two columns. The index might also be used for a select count(*) from table query, but I'm not 100% sure if MySQL does this.
If removing an index on a column makes that big a difference, then you should be investigating other ways to make your database more efficient. One method would be using partitioning to store different parts of the database in different files.
If the id column is an auto-incrementing integer, you have already indexed the table in the most efficient way possible. Remove it, and InnoDB falls back to another clustered key (the first unique NOT NULL index if one exists, otherwise a hidden internal row id), which makes look-ups less efficient.
Additionally, any secondary index you create in the future will contain two parts: the indexed columns in the desired order, plus the columns of the table's primary key. If you remove the id column and the table ends up clustered on (number, person), an index on person would be larger than the table itself: each entry would be | person | (number, person) |.
Given that you're querying on this relationship, the person column should be indexed, and leaving the id column in place will ensure that the person index is as small and as quick as possible.
The column "id" seems useless. If I've understood you correctly, I'd
drop the "id" column,
add a primary key constraint on {person, number}, and
add a foreign key reference from "person" to people.id.
I'm assuming each person can have more than one phone number.
Creating a primary key constraint has a side-effect that you might not want. It creates an internal index on the key columns.
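In DDL, the suggestion above would look roughly like this (the column types are assumptions, since the question doesn't give them):

CREATE TABLE phoneNumbers (
  person INT NOT NULL,           -- matches people.id
  number VARCHAR(20) NOT NULL,   -- assumed type
  PRIMARY KEY (person, number),
  FOREIGN KEY (person) REFERENCES people (id)
) ENGINE=InnoDB;
-- the internal index created by the primary key is exactly the one
-- the join on person needs, so nothing extra is stored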

Merge several MySQL databases with equivalent structure

I would like to write a PHP script that merges several databases, and I would like to be sure of how to go about it before I start anything.
I have 4 databases which have the same structure and almost the same data. I want to merge them without any duplicate entries while preserving (or re-linking) the foreign keys.
For example, there is a db1.products table which is almost the same as db2.products, so I think I would have to use a LIKE comparison on the name and description columns to be sure that I only insert new rows. But then, when merging the orders table, I have to make sure that the productID still points to the right product.
So I thought of 2 solutions:
Either I use, for each table, INSERT INTO db1.x SELECT * FROM db2.x and then rebuild the links and check for duplicates using triggers,
or I delete the duplicate entries, update the new foreign keys (after having dropped the constraints), and then insert the rows into the main database.
I just heard of MySQL Data Compare and Toad for MySQL; could they help me to merge tables?
Could someone point me toward the right solution?
Sorry for my English, and thank you!
The first thing is: how are you determining whether two products are the same? You mentioned a LIKE comparison on name and description. You need to establish a rule that says a product is one and the same in your db1, db2 and so on.
However, let's assume that a product's name and description are the attributes that define it.
ALTER TABLE products ADD UNIQUE (`name`, `description`);
Run this on all of your databases.
After you've done that, select one of the databases you wish to import into and run the following query:
INSERT IGNORE INTO db1.products SELECT * FROM db2.products;
Repeat for the remaining databases.
Naturally, this all fails if you can't determine how you're going to compare the products.
Note: never use reserved words, such as the word "name", for your column names.
Firstly, good luck with this - sounds like a tricky job.
Secondly, I wouldn't do this with PHP - I'd write SQL to do the work, assuming this is a one-off migration task and not a recurring task.
As an approach, I would do the following.
Create a database with the schema you want; it sounds like each of your 4 databases has small variations in the schema. Just create the schema for now, don't worry about the data.
Create a "working" database, with the same schema, but with columns for "old" primary keys. For instance:
CREATE TABLE orders (
  order_id INT PRIMARY KEY AUTO_INCREMENT,
  old_order_id INT NOT NULL
  -- ...other columns...
);

CREATE TABLE order_line (
  order_line_id INT PRIMARY KEY AUTO_INCREMENT,
  old_order_line_id INT NOT NULL,
  order_id INT NOT NULL  -- foreign key back to orders
  -- ...other columns...
);
Table by table, Insert into your working database from your first source database. Let the primary keys auto_increment, but put the original primary key into the "old_" column.
For instance:
insert into workingdb.orders
select null, order_id, ...other columns...
from db1.orders
Where you have a foreign key, populate it by finding the record in the old_ column.
For instance:
insert into workingdb.order_line
select null, ol.order_line_id, o.order_id
from db1.order_line ol,
     workingdb.orders o
where ol.order_id = o.old_order_id
Rinse and repeat for the other databases.
Finally, copy the data from your working database into the "proper" database. This is optional - it may help to retain the old IDs for lookups etc.
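That final copy might look like this per table (a sketch; properdb is a stand-in name and the real column lists still have to be filled in):

insert into properdb.orders (order_id /* ...other columns... */)
select order_id /* ...other columns... */
from workingdb.orders;
-- the old_order_id helper column is deliberately left behind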

Which of these 2 methods is more efficient with PHP/MySQL?

I have some location data, which is in a table locations with the key being the unique location_id
I have some user data, which is in a table users with the key being the unique user_id
Two ways I was thinking of linking these two together:
I can put the 'location' in each user's data.
'SELECT user_id FROM users WHERE location = "LOCATIONID";'
//this IS NOT searching with the table's key
//this does not require an explode
//this stores 1 integer per user
I can also put the 'userIDs' as a comma delimited string of ids into each location's data.
'SELECT userIDs FROM locations WHERE location_id = "LOCATIONID";'
//this IS searching with the tables key
//this needs an explode() once the comma delimited list is retrieved
//this stores 1 string of user ids per location
So I wonder which would be the most efficient. I'm not really sure how much the size of the stored data could also impact the speed. I want retrievals that are as fast as possible when trying to find out which users are at which location.
This is just an example, and there will be many other tables like location to compare to the users, so the efficiency, or lack of, will be multiplied across the whole system.
Stick with option 1. Keep your database tables normalised as much as possible till you know you have a performance problem.
There's a whole slew of problems with option 2, including the inability to use the user IDs until you pull them into PHP, and then having to fire off many more SQL queries, one per ID. This is extremely inefficient. Do as much inside MySQL as possible; the optimisations the database layer can apply while running the query will easily be a lot quicker than anything you write in PHP.
Regarding your point about not searching on the primary key, you should add an index to the location column. All columns that are in a WHERE clause should be indexed as a general rule. This negates the issue of not searching on the primary key, as the primary key is just another type of index for the purposes of performance.
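To make the contrast concrete, here is a sketch of one question ("which location is user 7 at?") asked against both designs; FIND_IN_SET is MySQL's helper for searching a comma-separated list, and it cannot use an index:

-- option 2: string matching inside every row's comma list, a full table scan
SELECT location_id FROM locations WHERE FIND_IN_SET('7', userIDs) > 0;

-- option 1: a single indexed lookup
SELECT location FROM users WHERE user_id = 7;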
Use the first one to keep your data normalized. You can then query for all users for a location directly from the database without having to go back to the database for each user.
Be sure to add the correct index on your users table too.
CREATE TABLE locations (
locationId INT PRIMARY KEY AUTO_INCREMENT
) ENGINE=INNODB;
CREATE TABLE users (
userId INT PRIMARY KEY AUTO_INCREMENT,
location INT,
INDEX ix_location (location)
) ENGINE=INNODB;
Or to only add the index
ALTER TABLE users ADD INDEX ix_location(location);
Have you heard of foreign keys?
You can get details from many tables using a JOIN.
You can use a subquery as well.
As you said, there are two tables, users and locations.
Keep the location id as a foreign key in users and fetch based on that.
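A sketch of that, building on the DDL from the earlier answer (the constraint name is made up):

ALTER TABLE users
  ADD CONSTRAINT fk_users_location
  FOREIGN KEY (location) REFERENCES locations (locationId);

-- all users at a given location, resolved entirely inside MySQL
SELECT u.userId
FROM users u
JOIN locations l ON l.locationId = u.location
WHERE l.locationId = 42;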
When you store the user IDs as a comma-separated list in a table, that table is not normalized (in particular, it violates the first normal form, item 4).
It is perfectly valid to denormalize tables for optimization purposes, but only after you have measured that this is where the bottleneck actually lies in your specific situation. That, however, can only be determined if you know which queries are executed how often, how long they take, and whether their performance is critical in relation to other queries.
Stick with option 1 unless you know exactly why you have to denormalize your table.

In MySQL, does a UNIQUE key have the same effect on speed as a normal key?

For example, I'm doing the following:
SELECT COUNT(id)
FROM users
WHERE unique_name = 'Wiliam'
// if Wiliam doesn't exist then...
INSERT INTO users
SET unique_name = 'Wiliam'
The question is: I'm doing the SELECT COUNT(id) check every time I insert a new user, whether there's a unique key or not. So, if unique_name has a UNIQUE key, will that be better for performance than using a normal key?
What you mean is a UNIQUE CONSTRAINT on the column being inserted into. Reads will be faster; inserts will be just a bit slower. It will still be faster than your code checking first and then inserting the value, though. Just let MySQL do its thing and return an error to you if the value is not unique.
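A sketch of that approach, using the users table from the question:

ALTER TABLE users ADD UNIQUE KEY uq_unique_name (unique_name);

-- no SELECT COUNT(id) probe needed: if 'Wiliam' already exists, the
-- insert fails with duplicate-key error 1062, which the code can catch
INSERT INTO users SET unique_name = 'Wiliam';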
You didn't say what this is for, which would help. If it's part of an authentication system, then why doesn't your query include the user's password as well? If it's not, a unique indexed column used to store names isn't going to work very well in a real-world system unless you are OK with having one and only one Wiliam in your system. (Was that supposed to be William?)
And if that name field is really unique, you do not need to use COUNT(id) in your query. If unique_name is truly unique, you either get an id number returned from your query or you get nothing.
You'd want something like this:
SELECT id FROM users WHERE unique_name = 'Wiliam'
No record returned, no Wiliam.
An index (unique or non-unique -- I don't know what you're after here) on unique_name will improve the performance.
Your use of 'unique key' isn't very logical so I suspect you are getting confused about the nomenclature of keys, indexes, their relationships, and the purposes for them.
KEYS in a database are used to create and identify relationships between sets of data. This is what makes the 'relational' possible in a relational database.
Keys come in 2 flavors: Primary and foreign.
PRIMARY KEYS identify each row in a table. The value or values that comprise the key must be unique.
Primary keys can be made from a single column or from several columns (in which case it is called a composite key) that together uniquely identify the row. Again, the important thing here is uniqueness.
I use MySql's auto-increment integer data type for my primary keys.
FOREIGN KEYS identify which rows in a table have a relationship with other rows in other tables. A foreign key of a record in one table is the primary key of the related record in the other table. A foreign key is not unique -- in many-to-many relationships there are by definition multiple records with the same foreign key. They should however be indexed.
INDEXES are used by the database as a sort of short-hand method to quickly look up values, as opposed to scanning the entire table or column for a match. Think of the index in the back of a book. Much easier to find something using a book's index than by flipping through the pages looking for it.
You may also want to index a non-key column for better performance when searching on that column. What column do you use frequently in a WHERE clause? Probably should index it then.
UNIQUE INDEX is an index where all the values in it must be distinct. A column with a unique index will not let you insert a duplicate value, because it would violate the unique constraint. Primary keys are unique indexes. But unique indexes do not have to be primary keys, or even a key.
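A compact sketch of that last distinction:

CREATE TABLE users (
  id INT AUTO_INCREMENT PRIMARY KEY,   -- the primary key: identifies each row
  unique_name VARCHAR(100) NOT NULL,
  UNIQUE KEY uq_name (unique_name)     -- a unique index: enforces distinctness
                                       -- without being the table's key
);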
Hope that helps.
[edited for brevity]
Having a unique constraint is a good thing because it prevents insertion of duplicate entries in case your program is buggy (are you missing a "for update" clause in your select statement?) or in case someone inserts data without using your application.
You should, however, not depend on it in your application for normal operation. Let's assume unique_name is an input field a user can specify. Your application should check whether the name is unique. If it is, insert it. If it was not, tell the user.
It is a bad idea to just attempt the insert in all cases and see if it was successful: it will create errors in the database server logs, which makes it more difficult to find real errors, and it will render your current transaction useless, which may be an issue depending on the situation.
