MySQL primary keys with two ID fields - php

I have a MySQL table, people, that looks like this:
id | object_id | name | sex | published
----------------------------------------------
1 | 1 | fred | male | [timestamp]
2 | 2 | john | male | [timestamp]
The reason I have two ids is that in my CRUD app the user might edit an existing object, in which case it becomes a draft, so that I have two rows (the draft record and the already-existing record) with the same object_id, something like this:
id | object_id | name | sex | published
----------------------------------------------
2 | 2 | john | male | [timestamp]
3 | 2 | john | female | NULL
This allows me to keep track of records' drafts and publication status. When the row with id of 3 is published, its published field will be stamped and the already published row deleted.
Each person also has a job history, so I have a table history:
id | person_object_id | job
----------------------------------
1 | 2 | dev
2 | 2 | accountant
This is John's job history. I refer to John's object_id in the person_object_id field, because if I referred to his id I'd risk delinking the two tables if I deleted one of the John rows as in my example above.
So my question is: is it not inefficient to refer to a table, as I do above, using a non-primary key (object_id instead of id)? How can I refer to a primary key when I require a non-unique id to keep track of drafts/published rows?

It looks like you want to keep versions of your data and you've come across the age-old problem of how to maintain foreign key pointers to versioned data. The solution is actually easy and it turns out that it is a special case of second normal form.
Take the following employee data:
EmpNo FirstName LastName Birthdate HireDate Payrate DeptNo
Now you are tasked with maintaining versions of the data as it changes. You could then add a date field which shows when the data changed:
EmpNo EffDate FirstName LastName Birthdate HireDate Payrate DeptNo
The Effective Date field shows the date each particular row took effect.
But the problem is that EmpNo, which was a perfect primary key for the table, can no longer serve that purpose. Now there can be many entries for each employee and, unless we want to assign a new employee number every time an employee's data is updated, we have to find another key field or fields.
One obvious solution is to make the combination of EmpNo and the new EffDate field be the primary key.
Ok, that solves the PK problem, but now what about any foreign keys in other tables that refer to specific employees? Can we add the EffDate field to those tables, also?
Well, sure, we can. But that means that the foreign keys, instead of referring to one specific employee, are now referring to one specific version of one specific employee. Not, as they say, nominal.
Many schemes have been implemented to solve this problem (see the Wikipedia entry for "Slowly Changing Dimension" for a list of a few of the more popular).
Here's a simple solution that allows you to version your data and leave foreign key references alone.
First, we realize that not all data is ever going to change and so will never be updated. In our example tuple, this static data is EmpNo, FirstName, Birthdate, HireDate. The data that is liable to change then, is LastName, Payrate, DeptNo.
But this means that the static data, like FirstName is dependent on EmpNo -- the original PK. Changeable or dynamic data, like LastName (which can change due to marriage or adoption) is dependent on EmpNo and EffDate. Our tuple is no longer in second normal form!
So we normalize. We know how to do this, right? With our eyes closed. The point is, when we are finished, we have a main entity table with one and only one row for each entity definition. All the foreign keys can refer to this table to the one specific employee -- the same as when we've normalized for any other reason. But now we also have a version table with all the data that is liable to change from time to time.
Now we have two tuples (at least two -- there could have been other normalization processes performed) to represent our employee entity.
EmpNo(PK) FirstName Birthdate HireDate
===== ========= ========== ==========
1001 Fred 1990-01-01 2010-01-01
EmpNo(PK) EffDate(PK) LastName Payrate DeptNo
===== ======== ======== ======= ======
1001 2010-01-01 Smith 15.00 Shipping
1001 2010-07-01 Smith 16.00 IT
The query to reconstruct the original tuple with all the versioned data is simple:
select e.EmpNo, e.FirstName, v.LastName, e.Birthdate, e.Hiredate, v.Payrate, v.DeptNo
from Employees e
join Emp_Versions v
on v.EmpNo = e.EmpNo;
The query to reconstruct the original tuple with only the most current data is not terribly complicated:
select e.EmpNo, e.FirstName, v.LastName, e.Birthdate, e.Hiredate, v.Payrate, v.DeptNo
from Employees e
join Emp_Versions v
on v.EmpNo = e.EmpNo
and v.EffDate =(
select Max( EffDate )
from Emp_Versions
where EmpNo = v.EmpNo );
Don't let the subquery scare you. A careful examination shows that it locates the desired version row with an index seek instead of the scan that most other methods will generate. Try it -- it's fast (though, of course, mileage may vary across different DBMSs).
But here's where it gets really good. Suppose you wanted to see what the data looked like on a particular date. What would that query look like? Just take the query above and make a small addition:
select e.EmpNo, e.FirstName, v.LastName, e.Birthdate, e.Hiredate, v.Payrate, v.DeptNo
from Employees e
join Emp_Versions v
on v.EmpNo = e.EmpNo
and v.EffDate =(
select Max( EffDate )
from Emp_Versions
where EmpNo = v.EmpNo
and EffDate <= :DateOfInterest ); --> Just this difference
That last line makes it possible to "go back in time" to see what the data looked like at any specific time in the past. And, if DateOfInterest is the current system time, it returns the current data. This means that the query to see current data and the query to see past data are, in fact, the same query.
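To make the two-table layout concrete, here is a minimal sketch of the Employees / Emp_Versions split together with the "as of" query. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it can be run anywhere; the helper name employee_as_of and the column types are my own assumptions, and dates are stored as ISO strings so that plain comparison orders them correctly.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (
    EmpNo     INTEGER PRIMARY KEY,
    FirstName TEXT, Birthdate TEXT, HireDate TEXT
);
CREATE TABLE Emp_Versions (
    EmpNo    INTEGER NOT NULL REFERENCES Employees(EmpNo),
    EffDate  TEXT    NOT NULL,          -- ISO date, so string order = date order
    LastName TEXT, Payrate REAL, DeptNo TEXT,
    PRIMARY KEY (EmpNo, EffDate)
);
INSERT INTO Employees VALUES (1001, 'Fred', '1990-01-01', '2010-01-01');
INSERT INTO Emp_Versions VALUES (1001, '2010-01-01', 'Smith', 15.00, 'Shipping');
INSERT INTO Emp_Versions VALUES (1001, '2010-07-01', 'Smith', 16.00, 'IT');
""")

AS_OF_QUERY = """
SELECT e.EmpNo, e.FirstName, v.LastName, v.Payrate, v.DeptNo
FROM Employees e
JOIN Emp_Versions v
  ON v.EmpNo = e.EmpNo
 AND v.EffDate = (SELECT MAX(EffDate)
                  FROM Emp_Versions
                  WHERE EmpNo = v.EmpNo
                    AND EffDate <= :date_of_interest)
"""

def employee_as_of(date_of_interest):
    """Hypothetical helper: employee rows as they stood on the given ISO date."""
    params = {"date_of_interest": date_of_interest}
    return conn.execute(AS_OF_QUERY, params).fetchall()

print(employee_as_of("2010-03-15"))  # version effective from 2010-01-01
print(employee_as_of("2011-01-01"))  # version effective from 2010-07-01
```

Foreign keys in other tables would point at Employees.EmpNo only, so adding or removing version rows never breaks them.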

It doesn't really matter as long as you have an index on that column (a non-unique index). Then it would be almost as fast.


mysql like query exclude numbers

I have a small problem with a php mysql query, I am looking for help.
I have a family tree table, where I am storing for each person his/her ancestors id separated by a comma. like so
id ancestors
10 1,3,4,5
So the person of id 10 is fathered by id 5 who is fathered by id 4 who is fathered by 3 etc...
Now I wish to select all the people who have id x in their ancestors, so the query will be something like:
select * from people where ancestors like '%x%'
Now this would work fine, except that if id x is, let's say, 2 and a record has an ancestor id 32, this LIKE query will retrieve 32 because 32 contains 2. And if I use '%,x,%' (include commas), the query will ignore the records whose ancestor x is on either edge (left or right) of the column. It will also ignore the records where x is the only ancestor, since no commas are present.
So in short, I need a like query that looks up an expression that either is surrounded by commas or not surrounded by anything. Or a query that gets the regular expression provided that no numbers are around. And I need it as efficient as possible (I suck at writing regular expressions)
Thank you.
Edit: Okay guys, help me come up with a better schema.
You are not storing your data in a proper way. Anyway, if you still want to use this schema you should use FIND_IN_SET instead of LIKE to avoid undesired results.
SELECT *
FROM mytable
WHERE FIND_IN_SET(2, ancestors) <> 0
You should consider redesigning your database structure. Add new table "ancestors" to database with columns:
id id_person ancestor
1 10 1
2 10 3
3 10 4
After that, use a JOIN query with a WHERE ... IN condition to select the right rows.
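A rough sketch of that redesign, again using Python's sqlite3 module in place of MySQL so it runs as-is. The ancestors table mirrors the columns suggested above; the index name, the UNIQUE constraint and the helper descendants_of are my own additions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY);
CREATE TABLE ancestors (
    id        INTEGER PRIMARY KEY,
    id_person INTEGER NOT NULL REFERENCES people(id),
    ancestor  INTEGER NOT NULL REFERENCES people(id),
    UNIQUE (id_person, ancestor)
);
-- Lookups go by ancestor, so index that column (index name is an assumption).
CREATE INDEX idx_ancestors_ancestor ON ancestors (ancestor);
""")
conn.executemany("INSERT INTO people (id) VALUES (?)",
                 [(i,) for i in (1, 2, 3, 4, 5, 10, 11, 32)])
conn.executemany("INSERT INTO ancestors (id_person, ancestor) VALUES (?, ?)",
                 [(10, 1), (10, 3), (10, 4), (10, 5), (11, 32)])

def descendants_of(ancestor_id):
    """Hypothetical helper: ids of everyone who has ancestor_id as an ancestor."""
    rows = conn.execute(
        """SELECT p.id
           FROM people p
           JOIN ancestors a ON a.id_person = p.id
           WHERE a.ancestor = ?
           ORDER BY p.id""",
        (ancestor_id,),
    ).fetchall()
    return [row[0] for row in rows]

print(descendants_of(3))   # person 10 has ancestor 3
print(descendants_of(2))   # nobody: 32 is no longer mistaken for 2
```

Because each ancestor is its own integer row, the match is an exact comparison and the "32 contains 2" problem cannot occur.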
You're having this issue because of a flawed database design. First, relational DBMSs aren't meant for this kind of data; graph databases are a more likely fit for this kind of problem.
If it contains a small amount of data you could use MySQL, but the design is still wrong. If you only care about each person's 'father', then just add a column to the person (or whatever you call it) table: if it's NULL, there is no father or the father is unknown; otherwise it contains the (int) id of the parent.
In case you need more than just the 'father' relationship, you could use a pivot table to hold the relationship between two persons, but that's not a simple task.
There are a few established ways of storing hierarchical data in RDBMS. I've found this slideshow to be very helpful in the past:
Models for Hierarchical Design
Since the data deals with ancestry - and therefore you wouldn't expect it to change that often - a closure table could fit the bill.
Whatever model you choose, be sure to look around and see if someone else has already implemented it.
You could store your values as a JSON Array
id | ancestors
10 | ["1","3","4","5"]
and then query as follows:
$query = 'select * from people where ancestors like \'%"x"%\'';
Better is of course using a mapping table for your many-to-many relation
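You can do this with regexp, anchoring the value to a comma or to either end of the string so that 2 does not match 32:
SELECT * FROM mytable WHERE ancestors REGEXP '(^|,)x(,|$)'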
where x is your searched value
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,ancestors VARCHAR(250) NOT NULL
);
INSERT INTO my_table VALUES(10,',1,3,4,5');
SELECT *
FROM my_table
WHERE CONCAT(ancestors,',') LIKE '%,5,%';
+----+-----------+
| id | ancestors |
+----+-----------+
| 10 | ,1,3,4,5 |
+----+-----------+
SELECT *
FROM my_table
WHERE CONCAT(ancestors,',') LIKE '%,4,%';
+----+-----------+
| id | ancestors |
+----+-----------+
| 10 | ,1,3,4,5 |
+----+-----------+

Creating tables on registration in php

The question is not new in any way but it has a small twist to it.
My webpage is a membership page where users place bets. My idea is to create a new table for each user's bets (with a naming convention like TABLE userBet+$userid). User login information is already handled; my goal is now to save the bets of the user to a new table, which is created when the user registers. This will hopefully make score counting easier. Am I right or wrong? Could this be done in a better way? (Everything is done in PHP and MySQL.)
User registers -> Table for bets get created
"CREATE Table $userID ,id_bet, games, result, points"
And then matching this table against the correct result?
So again my questions: Is this a good way to do it? Is creating a table with the userID a smart thing to do?
EDIT
The bets are always on 40 matches, which makes the tables huge in columns and rows.
Should I make 40 Tables, one for each games instead? and put all users in there?
Am I right or wrong?
You are wrong. Dynamically altering your database schema will only make it harder to work with. There's no advantage you gain from doing so. You can do the same things by storing all bets within the same table, adding a column userid.
Posting as an answer due to author's request : )
Suggested database schema:
table matches:
id | name |
---------------
1 | A vs B |
table user_bets
id | user_id | match_id | points | result |
-------------------------------------------
1 | X | 1 | Y | Z |
Where match_id is related on matches.id
user_id = user.id
user_bets is only one table, containing all the info. No need of separate tables, as it was clear from the comments it's considered bad practice to alter the db schema via user input.
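As a sketch of how score counting works with that single user_bets table, here is a small runnable example (Python's sqlite3 module standing in for MySQL). The column types, the UNIQUE constraint, the sample rows and the helper total_points_per_user are assumptions of mine, not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matches (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE user_bets (
    id       INTEGER PRIMARY KEY,
    user_id  INTEGER NOT NULL,
    match_id INTEGER NOT NULL REFERENCES matches(id),
    points   INTEGER NOT NULL DEFAULT 0,
    result   TEXT,
    UNIQUE (user_id, match_id)        -- one bet per user per match
);
INSERT INTO matches (id, name) VALUES (1, 'A vs B'), (2, 'C vs D');
INSERT INTO user_bets (user_id, match_id, points, result) VALUES
    (1, 1, 3, '2-1'), (1, 2, 0, '0-0'),
    (2, 1, 1, '1-0'), (2, 2, 3, '1-1');
""")

def total_points_per_user():
    """Hypothetical helper: (user_id, total points) for every user, in one query."""
    return conn.execute(
        """SELECT user_id, SUM(points)
           FROM user_bets
           GROUP BY user_id
           ORDER BY user_id"""
    ).fetchall()

print(total_points_per_user())
```

With one table per user, the same report would need a query per table; here a single GROUP BY covers every user, and registering a new user requires no schema change at all.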

Check if an id exists in a table before adding it to another table

I'm doing a small thing like the like feature you see on facebook. So the way I'm doing it is like this. I have a table called products which contains products that people can like.
Like this (stripped down):
id | prodName | status (0=clear, 1=blocked)
----------------------------------------------------------
1 | Philips Food Processor | 0
2 | Le Sharp Knife | 0
3 | Ye Cool Fridge | 0
Then comes the `likes` table like this:
id | prodName | prodId | userId
--------------------------------------------
1 | Philips Food Processor | 1 | 1
2 | Le Sharp Knife | 2 | 1
3 | Ye Cool Fridge | 3 | 1
4 | Ye Cool Fridge | 3 | 2
I need to check, before adding to the likes table, if a product with that id actually exists in the products table and its status = 0. I currently do this with a lot of php code. What would be a good way to do this using sql? Is it possible? Using foreign keys or something like that?
I'm using innodb table type.
You can do a conditional insert. For product 6 and user 7:
insert into Likes
(prodName, prodId, userId)
select prodName
, id
, 7
from Products
where id = 6
and status = 0
If this inserts no rows, you know that the product did not exist with status 0.
If you just want to phrase the insert so it follows the rules, then you can use insert . . . select as follows:
insert into likes(prodId, userId)
select <prodid>, <userid>
from products p
where p.id = <prodid> and status = 0
I don't think MySQL supports "partial" foreign key constraints, where you can also include the requirement on the flag.
And, you shouldn't put the product name in the likes table. You should look it up in the products table.
The key element of trying to add something to the likes table that does not exist in the product table is the feedback to the user that lets them know they're doing it wrong. Any answer you settle on should not ignore the user feedback side of things, which is basically going to require your PHP code.
However, yes - there is a way to do it via foreign keys. You can index the prodid in the second table, and reference it as a foreign key to the first table.id. This means that if you try an insert and you get an error, there's a chance that the problem is that you're trying to add something without a match in the first table.
However, trying to determine precisely what the error is so you can determine the proper logic to respond to that error causes its own mass of php code, and is less easily transparent for future developers to maintain. I'd suggest a simple method in your Product object: isValid( id ) that returns true/false - so your 'check for this' code simply goes if( Product.isValid( prodId ) ){ Like.insert( userId, prodId ); }
But at the same time, I'd REALLY recommend a foreign key constraint along with the php code you're probably already using, just as insurance against your database becoming cluttered with unlinked rows. It's usually best to have multiple barriers against bad data.
Additionally ... is there a reason why you're storing the product names both in the product table AND in the likes table? I don't see why you'd need it in the likes table.
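Here is a minimal sketch combining both ideas from the answers above: a conditional INSERT ... SELECT that only inserts when the product exists with status 0, backed by a foreign key as a second barrier. It uses Python's sqlite3 module rather than MySQL/InnoDB so it can be run directly; the function like_product and the sample data are hypothetical, and note that SQLite needs the foreign_keys pragma switched on whereas InnoDB enforces foreign keys by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific; not needed on InnoDB
conn.executescript("""
CREATE TABLE products (
    id       INTEGER PRIMARY KEY,
    prodName TEXT NOT NULL,
    status   INTEGER NOT NULL DEFAULT 0   -- 0 = clear, 1 = blocked
);
CREATE TABLE likes (
    id     INTEGER PRIMARY KEY,
    prodId INTEGER NOT NULL REFERENCES products(id),
    userId INTEGER NOT NULL
);
INSERT INTO products (id, prodName, status) VALUES
    (1, 'Philips Food Processor', 0),
    (2, 'Le Sharp Knife', 1),
    (3, 'Ye Cool Fridge', 0);
""")

def like_product(prod_id, user_id):
    """Hypothetical helper: insert a like only if the product exists and is clear.

    Returns True when a row was inserted, False otherwise.
    """
    cur = conn.execute(
        """INSERT INTO likes (prodId, userId)
           SELECT id, ? FROM products WHERE id = ? AND status = 0""",
        (user_id, prod_id),
    )
    return cur.rowcount == 1

print(like_product(1, 7))   # product exists and is clear
print(like_product(2, 7))   # product is blocked, nothing inserted

# The foreign key is the safety net if some code path skips the check.
try:
    conn.execute("INSERT INTO likes (prodId, userId) VALUES (99, 7)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)
```

The boolean return value gives the application the user-feedback hook mentioned above without having to parse database error messages.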
--Check to see if the cleared product exists in the products table
Select count(*) from products p where p.status = 0 and p.id = %IDVALUE
--Check if your user previously liked the product
Select count(*) from products p, likes l where p.id = l.prodId and p.id = %IDVALUE and l.userId = %USERID
In your code you can execute the statements (replace %IDVALUE and %USERID with actual values) and check the returned column to get the count and perform your custom logic.
Currently you require the prodId to populate the likes table, hence you need to look up the data regardless of the constraint regarding blocked. Hence:
INSERT INTO likes (prodname, prodId, userId)
SELECT prodname, id, 123456
FROM products
WHERE prodname='Le Sharp Knife'
AND status=0;
(just substitute 123456 and 'Le Sharp Knife' for the parameters you need).
You need to query the database to check for the record.
For example, if your product id is 2, your query would be something like
$query = "SELECT * FROM `your-like-table` WHERE `prodId` = 2";
then
$result = mysql_query($query); // mysql_* was removed in PHP 7; mysqli and PDO are the replacements
if ($result && mysql_num_rows($result) == 0):
// if you end up inside this condition, no such row exists yet, so this is when you insert your like into the database
endif;
Hope it helps.

Rating System in PHP and MySQL

If we look at the Stack Overflow website, we have votes. But the question is what is the best way to store who has voted and who has not. Let's also simplify this even more and say that we can only vote Up, and we can only Remove the Up vote.
I was thinking having the table to be in such form
question - Id(INT) | userId(INT) | title(TEXT) | vote(INT) | ratedBy(TEXT)
The rest is self-explanatory, but ratedBy is a comma-separated list of the user ids.
I was thinking of reading ratedBy and comparing it with the userId of the currently logged-in user. If he doesn't exist in ratedBy he can vote Up; otherwise he can remove his vote, which in turn will remove the value from ratedBy.
I think to make another table "vote" is better. The relationship between users and votes is n to n, therefore a new table should be created. It should be something like this:
question id (int) | user id (int) | permanent (bool) | timestamp (datetime)
Permanent field can be used to make votes stay after a given time, as SO does.
Other fields may be added according to desired features.
As each row will take at least 16B, you can have up to 250M rows in the table before the table uses 4GB (the FAT32 file-size limit, which matters if there is one file per table, as is the case for MyISAM and for InnoDB when innodb_file_per_table is enabled).
Also, as Matthew Scharley points out in a comment, don't load all votes at once into memory (as fetching all the table in a resultset). You can always use LIMIT clause to narrow your query results.
A new table:
Article ID | User ID | Rating
Where Article ID and User ID make up the composite key, and rating would be 1, indicating upvote, -1 for a downvote and 0 for a removed vote (or just remove the row).
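A small sketch of the composite-key idea, restricted to the simplified case from the question (upvote only, and removing the vote deletes the row). Python's sqlite3 module stands in for MySQL; the table name votes and the helpers toggle_upvote and vote_count are names I made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE votes (
    article_id INTEGER NOT NULL,
    user_id    INTEGER NOT NULL,
    PRIMARY KEY (article_id, user_id)   -- at most one vote per user per article
)
""")

def toggle_upvote(article_id, user_id):
    """Hypothetical helper: add the upvote, or remove it if it already exists.

    Returns True if the user has an upvote on the article afterwards.
    """
    cur = conn.execute(
        "DELETE FROM votes WHERE article_id = ? AND user_id = ?",
        (article_id, user_id),
    )
    if cur.rowcount:          # a row was deleted, so the vote was removed
        return False
    conn.execute("INSERT INTO votes (article_id, user_id) VALUES (?, ?)",
                 (article_id, user_id))
    return True

def vote_count(article_id):
    """Hypothetical helper: number of upvotes on an article."""
    row = conn.execute("SELECT COUNT(*) FROM votes WHERE article_id = ?",
                       (article_id,)).fetchone()
    return row[0]

toggle_upvote(1, 42)
toggle_upvote(1, 43)
toggle_upvote(1, 42)          # user 42 takes the vote back
print(vote_count(1))
```

The primary key makes double voting impossible at the database level, and counting votes is a plain COUNT(*) instead of parsing a comma-separated string.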
I believe your design won't be able to scale for large numbers of voters.
The typical thing to do is to create two tables
Table 1: question - Id(INT) | userId(INT) | title(TEXT)
Table 2: question - ID(INT) | vote(INT) | ratedBy(TEXT)
Then you can count the votes with a query like this:
SELECT t1.question_Id, t1.userId, t1.title, SUM(t2.vote)
FROM table1 t1
LEFT JOIN table2 t2 ON t1.question_id = t2.question_id
GROUP BY t1.question_Id, t1.userId, t1.title

Multiple users table VS 1 users table?

I am in a dilemma. I am not sure if it's a good idea to separate the users table. I notice that as the number of rows in my game highscores table grows, loading is getting slower and slower.
My current users table stores all users, currently about 10k. I am thinking of splitting the users table (for the future) like this:
Login Table => store user login details
==========================================
= id | username | password | tableid =
==========================================
= 1 | user1 | user1xx | 1 =
= 2 | user2 | user2xx | 1 =
...
= 20k1 | user20k1 | user20k1 | 2 =
etc
Users Data
==========================================
= id | money | items | preferences =
==========================================
= 1 | xx | xx | xx =
= 2 | xx | xx | xx =
...
= 20k1 | xx | xx | xx =
etc
So, when I try to get users data I just LEFT JOIN query to get the data.
My question is, are there any differences (speed, performances etc) between storing users data in multiple tables and storing users data in single table? (assume indexes and primary key are the same)
My current tables indexes:
Games highscores table => columns: id, gameid, name, score, date
Primary key : id
Indexes: gameid
Login Table => Columns: id, username, password
Primary key: id (userid)
Indexes: username
Users data => Columns: many
Indexes: id
It sounds like the real question you have here is this: why is my app slow? First of all, splitting data between several tables is not going to help performance. If done right (for reasons other than performance) it will not hurt performance, but I doubt it will help.
What's more, in my experience it is a bad idea to optimize based on gut feel. Somehow guesses about what holds your program back are usually wrong. You end up doing a lot of rewriting without any gain in speed.
The first step to speed it up is to find the real bottleneck. You need to add instrumentation and collect some stats to figure out whether it is the database or the app server. It may be a particular sproc, or it might be the bandwidth of your network. Or maybe it is some JavaScript on your pages.
Only after you know what to fix can you try to fix it.
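As one possible shape for that instrumentation, here is a tiny timing sketch in Python. It is not tied to any particular framework: the timed context manager, the timings dictionary and the step labels are all hypothetical, and the sleep calls merely stand in for real work such as running the highscores query or rendering the page.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# label -> list of elapsed times in seconds, one entry per measured call
timings = defaultdict(list)

@contextmanager
def timed(label):
    """Record how long the wrapped block takes under the given label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label].append(time.perf_counter() - start)

def slowest_steps():
    """Return (label, total_seconds) pairs, slowest first."""
    totals = {label: sum(samples) for label, samples in timings.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Placeholder workloads; replace these with the real query and rendering code.
with timed("database query"):
    time.sleep(0.05)
with timed("page rendering"):
    time.sleep(0.001)

for label, seconds in slowest_steps():
    print(f"{label}: {seconds:.4f}s")
```

Once the slowest step is known, the database side can be examined further with the query plan (EXPLAIN in MySQL) before deciding whether any schema change is justified.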
Sounds like splitting the table won't do you any good. It seems like a 1:1 correlation would occur between the tables, and that would simply add a second query whenever you wanted something from that table.
Try using Partitioning on the table to help with performance in that aspect.
Normalizing is only useful if you have redundant data (so, you have the same user in your user table 5 times). Helpful if you want to lower data usage with particular users' high scores for multiple games, but ultimately it probably won't give you a performance increase on the table.
If you're querying for bits of information and you have lots of (edit:)columns, it's actually a really good idea to have them separated and you don't need the tableid field in the users table, all you need is a foreign key in the information table that points to the associated user in the users table.
You can have multiple tables like that and join them as you like, performance will most likely increase.
