I am in a dilemma. I am not sure if it's a good idea to split the users table. I have noticed that my game highscores table is getting slower and slower to load as the row count grows.
My users table currently stores all users, about 10k of them. I am thinking of splitting the users table (for the future) like this:
Login Table => store user login details
==========================================
= id | username | password | tableid =
==========================================
= 1 | user1 | user1xx | 1 =
= 2 | user2 | user2xx | 1 =
...
= 20k1 | user20k1 | user20k1 | 2 =
etc
Users Data
==========================================
= id | money | items | preferences =
==========================================
= 1 | xx | xx | xx =
= 2 | xx | xx | xx =
...
= 20k1 | xx | xx | xx =
etc
So, when I need a user's data I just run a LEFT JOIN query to fetch it.
My question is: are there any differences (speed, performance, etc.) between storing user data in multiple tables and storing it in a single table? (Assume the indexes and primary keys are the same.)
My current tables indexes:
Games highscores table => columns: id, gameid, name, score, date
Primary key : id
Indexes: gameid
Login Table => Columns: id, username, password
Primary key: id (userid)
Indexes: username
Users data => Columns: lots of them
Indexes: id
It sounds like the real question you have here is: why is my app slow? First of all, splitting data between several tables is not going to help performance. If done right (for reasons other than performance) it will not hurt performance, but I doubt it will help.
What's more, in my experience it is a bad idea to optimize based on gut feeling. Guesses about what is holding your program back are usually wrong, and you end up doing a lot of rewriting without any gain in speed.
The first step to speeding things up is to find the real bottleneck. Add instrumentation and collect some stats to figure out whether it is the database or the app server, a particular sproc or the bandwidth of your network, or maybe some JavaScript on your pages.
Only after you know what to fix can you try to fix it.
Sounds like splitting the table won't do you any good. There would be a 1:1 relationship between the two tables, so you would simply add a second query (or a join) whenever you wanted something from the second table.
Try using partitioning on the table to help with performance in that respect.
Normalizing is only useful if you have redundant data (say, the same user appearing in your users table 5 times). It is helpful if you want to lower data usage, e.g. for users with high scores across multiple games, but ultimately it probably won't give you a performance increase on this table.
If you're querying for small bits of information and you have lots of columns, it can actually be a really good idea to separate them. You don't need the tableid field in the users table; all you need is a foreign key in the information table that points to the associated user in the users table.
You can have multiple tables like that and join them as needed; performance will most likely improve.
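To make the idea concrete, here's a minimal sketch of that vertical split with a foreign key instead of a tableid field. It uses SQLite via Python's sqlite3 purely for illustration (the original setup is MySQL/PHP), and the table and column names are hypothetical:

```python
import sqlite3

# In-memory sketch: a narrow users table plus a wider user_info table
# keyed by a foreign key back to users.id. Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);
CREATE TABLE user_info (
    user_id     INTEGER PRIMARY KEY REFERENCES users(id),
    money       INTEGER,
    preferences TEXT
);
""")
conn.execute("INSERT INTO users VALUES (1, 'user1')")
conn.execute("INSERT INTO user_info VALUES (1, 500, 'dark-mode')")

# Fetch login data and profile data in one query via a join on the FK.
row = conn.execute("""
    SELECT u.username, i.money, i.preferences
    FROM users u
    JOIN user_info i ON i.user_id = u.id
    WHERE u.id = 1
""").fetchone()
```

Because user_info shares the user's id as its primary key, the join is an index lookup, not a scan.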
I have a MySQL table, people, that looks like this:
id | object_id | name | sex | published
----------------------------------------------
1 | 1 | fred | male | [timestamp]
2 | 2 | john | male | [timestamp]
The reason I have two ids is that in my CRUD app the user might edit an existing object, in which case it becomes a draft, so that I have two rows (the draft record and the already-existing record) with the same object_id, something like this:
id | object_id | name | sex | published
----------------------------------------------
2 | 2 | john | male | [timestamp]
3 | 2 | john | female | NULL
This allows me to keep track of records' drafts and publication status. When the row with id of 3 is published, its published field will be stamped and the already published row deleted.
Each person also has a job history, so I have a table history:
id | person_object_id | job
----------------------------------
1 | 2 | dev
2 | 2 | accountant
This is John's job history. I refer to John's object_id in the person_object_id field, because if I referred to his id I'd risk delinking the two tables when I delete one of the John rows, as in my example above.
So my question is: is it not inefficient to refer to a table, as I do above, using a non-primary key (object_id instead of id)? How can I refer to a primary key when I require a non-unique id to keep track of drafts/published rows?
It looks like you want to keep versions of your data and you've come across the age-old problem of how to maintain foreign key pointers to versioned data. The solution is actually easy and it turns out that it is a special case of second normal form.
Take the following employee data:
EmpNo FirstName LastName Birthdate HireDate Payrate DeptNo
Now you are tasked with maintaining versions of the data as it changes. You could then add a date field which shows when the data changed:
EmpNo EffDate FirstName LastName Birthdate HireDate Payrate DeptNo
The Effective Date field shows the date each particular row took effect.
But the problem is that EmpNo, which was a perfect primary key for the table, can no longer serve that purpose. Now there can be many entries for each employee and, unless we want to assign a new employee number every time an employee's data is updated, we have to find another key field or fields.
One obvious solution is to make the combination of EmpNo and the new EffDate field be the primary key.
Ok, that solves the PK problem, but now what about any foreign keys in other tables that refer to specific employees? Can we add the EffDate field to those tables, also?
Well, sure, we can. But that means that the foreign keys, instead of referring to one specific employee, are now referring to one specific version of one specific employee. Not, as they say, nominal.
Many schemes have been implemented to solve this problem (see the Wikipedia entry for "Slowly Changing Dimension" for a list of a few of the more popular).
Here's a simple solution that allows you to version your data and leave foreign key references alone.
First, we realize that not all data is ever going to change and so will never be updated. In our example tuple, this static data is EmpNo, FirstName, Birthdate, HireDate. The data that is liable to change then, is LastName, Payrate, DeptNo.
But this means that the static data, like FirstName is dependent on EmpNo -- the original PK. Changeable or dynamic data, like LastName (which can change due to marriage or adoption) is dependent on EmpNo and EffDate. Our tuple is no longer in second normal form!
So we normalize. We know how to do this, right? With our eyes closed. The point is, when we are finished, we have a main entity table with one and only one row for each entity definition. All the foreign keys can refer to this table to the one specific employee -- the same as when we've normalized for any other reason. But now we also have a version table with all the data that is liable to change from time to time.
Now we have two tuples (at least two -- there could have been other normalization processes performed) to represent our employee entity.
EmpNo(PK) FirstName Birthdate  HireDate
========= ========= ========== ==========
1001      Fred      1990-01-01 2010-01-01

EmpNo(PK) EffDate(PK) LastName Payrate DeptNo
========= =========== ======== ======= ========
1001      2010-01-01  Smith    15.00   Shipping
1001      2010-07-01  Smith    16.00   IT
The query to reconstruct the original tuple with all the versioned data is simple:
select e.EmpNo, e.FirstName, v.LastName, e.Birthdate, e.Hiredate, v.Payrate, v.DeptNo
from Employees e
join Emp_Versions v
on v.EmpNo = e.EmpNo;
The query to reconstruct the original tuple with only the most current data is not terribly complicated:
select e.EmpNo, e.FirstName, v.LastName, e.Birthdate, e.Hiredate, v.Payrate, v.DeptNo
from Employees e
join Emp_Versions v
on v.EmpNo = e.EmpNo
and v.EffDate =(
select Max( EffDate )
from Emp_Versions
where EmpNo = v.EmpNo );
Don't let the subquery scare you. A careful examination shows that it locates the desired version row with an index seek instead of the scan that most other methods will generate. Try it -- it's fast (though, of course, mileage may vary across different DBMSs).
But here's where it gets really good. Suppose you wanted to see what the data looked like on a particular date. What would that query look like? Just take the query above and make a small addition:
select e.EmpNo, e.FirstName, v.LastName, e.Birthdate, e.Hiredate, v.Payrate, v.DeptNo
from Employees e
join Emp_Versions v
on v.EmpNo = e.EmpNo
and v.EffDate =(
select Max( EffDate )
from Emp_Versions
where EmpNo = v.EmpNo
and EffDate <= :DateOfInterest ); --> Just this difference
That last line makes it possible to "go back in time" to see what the data looked like at any specific time in the past. And, if DateOfInterest is the current system time, it returns the current data. This means that the query to see current data and the query to see past data are, in fact, the same query.
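Here's a runnable sketch of the scheme above, using Python's sqlite3 for illustration. The tables and data mirror the answer's Employees / Emp_Versions example; the helper function name is my own:

```python
import sqlite3

# Static data lives in Employees; versioned data lives in Emp_Versions,
# keyed by (EmpNo, EffDate), exactly as described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (
    EmpNo INTEGER PRIMARY KEY,
    FirstName TEXT, Birthdate TEXT, HireDate TEXT
);
CREATE TABLE Emp_Versions (
    EmpNo INTEGER REFERENCES Employees(EmpNo),
    EffDate TEXT,
    LastName TEXT, Payrate REAL, DeptNo TEXT,
    PRIMARY KEY (EmpNo, EffDate)
);
""")
conn.execute("INSERT INTO Employees VALUES (1001, 'Fred', '1990-01-01', '2010-01-01')")
conn.executemany("INSERT INTO Emp_Versions VALUES (?,?,?,?,?)", [
    (1001, '2010-01-01', 'Smith', 15.00, 'Shipping'),
    (1001, '2010-07-01', 'Smith', 16.00, 'IT'),
])

def employee_as_of(date_of_interest):
    # "Go back in time": join each employee to the newest version row
    # whose EffDate is on or before the date of interest.
    return conn.execute("""
        SELECT e.EmpNo, e.FirstName, v.LastName, v.Payrate, v.DeptNo
        FROM Employees e
        JOIN Emp_Versions v
          ON v.EmpNo = e.EmpNo
         AND v.EffDate = (SELECT MAX(EffDate) FROM Emp_Versions
                          WHERE EmpNo = v.EmpNo AND EffDate <= ?)
    """, (date_of_interest,)).fetchone()
```

Passing the current date returns current data; passing a past date returns the historical version, with the same query.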
It doesn't really matter as long as you have an index on that column (a non-unique index). Then it will be almost as fast.
The question is not new in any way, but it has a small twist to it.
My web page is a membership site where users place bets. My idea is to create a new table for each user's bets (with a naming convention like TABLE userBet+$userid). User login is already handled; my goal now is to save each user's bets to a new table, which is created when the user registers. This will hopefully make score counting easier. Am I right or wrong? Could this be done in a better way? (Everything is done in PHP/MySQL.)
User registers -> Table for bets get created
"CREATE TABLE userBet{$userID} (id_bet, games, result, points)"
And then matching this table against the correct result?
So again my questions: Is this a good way to do it? Is creating a table with the userID a smart thing to do?
EDIT
A bet slip always covers 40 matches, which makes the tables huge in both columns and rows.
Should I make 40 tables instead, one for each game, and put all users in there?
Am I right or wrong?
You are wrong. Dynamically altering your database schema will only make it harder to work with, and there's no advantage to be gained from doing so. You can do the same thing by storing all bets in a single table with an added userid column.
Posting as an answer due to author's request : )
Suggested database schema:
table matches:
id | name |
---------------
1 | A vs B |
table user_bets
id | user_id | match_id | points | result |
-------------------------------------------
1 | X | 1 | Y | Z |
Where match_id references matches.id and user_id references users.id.
user_bets is a single table containing all the info. There is no need for separate tables; as was clear from the comments, it's considered bad practice to alter the DB schema based on user input.
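A minimal sketch of that single-table design, using Python's sqlite3 for illustration (the original stack is PHP/MySQL; the sample data is made up). Score counting becomes one GROUP BY instead of a query per user table:

```python
import sqlite3

# One user_bets table for all users: each row carries a user_id,
# so no per-user tables are ever created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matches (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_bets (
    id       INTEGER PRIMARY KEY,
    user_id  INTEGER,
    match_id INTEGER REFERENCES matches(id),
    points   INTEGER,
    result   TEXT
);
""")
conn.execute("INSERT INTO matches VALUES (1, 'A vs B')")
conn.executemany(
    "INSERT INTO user_bets (user_id, match_id, points, result) VALUES (?,?,?,?)",
    [(7, 1, 3, '1-0'),
     (8, 1, 1, '2-2')])

# Total points per user across every match, in one query.
scores = conn.execute("""
    SELECT user_id, SUM(points)
    FROM user_bets
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
```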
I have a SQLite DB with about 24k records in one table and 15 in the other. The table with 15 records holds information about forms that need to be completed by users (roughly 1k users). The table with 24k records holds information about which forms have been completed, by whom, and when. When a user logs in, there is a ~3/4 second wait while the queries run to determine what the user has finished so far. Too long for my client. I know I can't be doing my queries in the best way, because they are contained within a loop, but I cannot seem to figure out how to optimize them.
The queries run as follows:
1) Select all of the forms and information
$result = $db->query("SELECT * FROM tbl_forms");
while ($row = $result->fetchArray()) {
    // Run query 2 (below) once per form row
}
2) For each form/row, run a query that figures out what is the most recent completion information about that form for the user.
$complete = $db->querySingle("SELECT * FROM tbl_completion AS forms1
WHERE userid='{$_SESSION['userid']}' AND form_id='{$row['id']}' AND forms1.id IN
(SELECT MAX(id) FROM tbl_completion
GROUP BY tbl_completion.userid, tbl_completion.form_id)", true);
There are 15 forms, so there is a total of 16 queries running. However, with my table structure, I'm unsure as how to get the "most recent" (aka max form id) form information using 1 joined query instead.
My table structure looks like so:
tbl_forms:
id | form_name | deadline | required | type | quicklink
tbl_completion:
id | userid | form_id | form_completion | form_path | timestamp | accept | reject
Edit: Index on tbl_forms (id), index on tbl_forms (id, form_name), index on tbl_completion (id)
I've tried using a query that is like:
SELECT * FROM tbl_completion AS forms1
LEFT OUTER JOIN tbl_forms ON forms1.form_id = tbl_forms.id
WHERE forms1.userid='testuser' AND forms1.id IN
(SELECT MAX(id) FROM tbl_completion GROUP BY tbl_completion.userid, tbl_completion.form_id)
That gives me the most up-to-date information about the completed forms, along with the form information. The only problem is that I need to output all the forms in a table (like: Form 1 - Incomplete, Form 2 - Completed, etc.), and I cannot figure out how to make tbl_forms the left table so I get all form info as well as the "latest" tbl_completion info. I also tried a three-way LEFT OUTER JOIN with the last "table" being a temp table holding the max id, but it was very slow AND didn't give me what I wanted.
Can anybody help?? Is there a better optimized query I can run once, or can I do something else on the DB side to speed this up? Thank you in advance.
You're missing indexes. See:
DOs and DONTs for Indexes
Also, the SELECT MAX(id) FROM tbl_completion GROUP BY tbl_completion.userid, tbl_completion.form_id could presumably discard unneeded rows if you add your userid in a WHERE clause.
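Building on that, here's a sketch of how the 16-query loop can collapse into one LEFT JOIN that keeps every form and attaches only that user's newest completion row. It uses Python's sqlite3 (the question's own DB engine); the columns are a trimmed subset of the question's tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_forms (id INTEGER PRIMARY KEY, form_name TEXT);
CREATE TABLE tbl_completion (
    id INTEGER PRIMARY KEY,
    userid TEXT, form_id INTEGER, accept INTEGER
);
-- Covering index so the MAX(id) lookup is a seek, not a scan.
CREATE INDEX idx_completion_user_form ON tbl_completion (userid, form_id, id);
""")
conn.executemany("INSERT INTO tbl_forms VALUES (?,?)",
                 [(1, 'Form 1'), (2, 'Form 2')])
conn.executemany("INSERT INTO tbl_completion VALUES (?,?,?,?)", [
    (1, 'testuser', 1, 0),   # older attempt at form 1
    (2, 'testuser', 1, 1),   # latest attempt at form 1
])

# tbl_forms is the left table, so unfinished forms still appear
# (with NULLs); the subquery picks the user's newest row per form.
rows = conn.execute("""
    SELECT f.id, f.form_name, c.accept
    FROM tbl_forms f
    LEFT JOIN tbl_completion c
      ON c.form_id = f.id
     AND c.userid  = :user
     AND c.id = (SELECT MAX(id) FROM tbl_completion
                 WHERE userid = :user AND form_id = f.id)
    ORDER BY f.id
""", {"user": "testuser"}).fetchall()
```

Form 2 comes back with a NULL accept column, which the PHP side can render as "Incomplete".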
It sounds like you might be running into the concurrency limitations of SQLite. SQLite does not support concurrent writes, so if you have a lot of users, you end up having a lot of contention. You should consider migrating to another DBMS in order to satisfy your scaling needs.
I'm making an application where users can sign up for courses. One user can sign up for multiple courses, and in the end the data I am interested in looks like this:
array(
0 => array(
0 => 12, // index 0 holds the course id, which I use as a reference
'date_joined' => 1301123384 // when the user joined the course
),
1 => array(
0 => 52, // the same as above
'date_joined' => 1301123384
)
)
I also need the keys of the main array to determine the order in which the user joined.
To store it I serialize it into a string and save it in the database.
Is this a good method?
Could it be done differently? Better?
I don't need a MySQL query; I need to know whether this information could be stored in some way other than an array turned into a string.
No, that's definitely not how you should be storing it. You need to normalize your DB design so you can use queries to get the job done. If you serialize it, you won't be able to query it (in the conventional way, at least). The following is a better schema:
Students : sid | s_name | more | person | data | created |
Courses : cid | course_no | c_name | some | more | info | created
Students_Courses : id | sid | c_id | created
Breaking it down
You have a Students table with student information and a Courses table with course information. Then you have a join table with the sid and the cid in it, making a unique key. This will let you query the courses a student is part of, as well as all the students subscribed to a course. Since you have a created column, you can ORDER BY it to know which courses came first.
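A minimal sketch of that three-table design, using Python's sqlite3 for illustration (the original stack is PHP/MySQL; names follow the schema above, and the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (sid INTEGER PRIMARY KEY, s_name TEXT);
CREATE TABLE Courses  (cid INTEGER PRIMARY KEY, c_name TEXT);
CREATE TABLE Students_Courses (
    sid INTEGER REFERENCES Students(sid),
    cid INTEGER REFERENCES Courses(cid),
    created INTEGER,            -- signup timestamp, used for ordering
    PRIMARY KEY (sid, cid)      -- one signup per student per course
);
""")
conn.execute("INSERT INTO Students VALUES (1, 'Ana')")
conn.executemany("INSERT INTO Courses VALUES (?,?)",
                 [(12, 'Databases'), (52, 'Algebra')])
conn.executemany("INSERT INTO Students_Courses VALUES (?,?,?)", [
    (1, 12, 1301123384),
    (1, 52, 1301123500),
])

# Courses for one student, in the order they were joined.
courses = conn.execute("""
    SELECT c.c_name
    FROM Students_Courses sc
    JOIN Courses c ON c.cid = sc.cid
    WHERE sc.sid = 1
    ORDER BY sc.created
""").fetchall()
```

Unlike the serialized array, this also answers the reverse question (all students in a course) with the same join flipped around.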
Why not use a table that has three columns:
1) user_id
2) subject_id
3) date_joined
and store it there? If you serialize and store the array, you will not be able to query it easily - for example, to get the dates people signed up for a certain subject.
Serializing and storing the data structure as a String is certainly a very bad way of doing this.
You should use relational tables to store this data.
You could, for example, create a course_attendees table that has 3 columns: user_id, course_id and join_date. join_date can be used to track the order in which the courses were subscribed to.
This will help you query the tables more efficiently rather than writing code to serialize and deserialize the string to your data structures.
If we look at the Stack Overflow website, we have votes. The question is: what is the best way to store who has voted and who has not? Let's also simplify this even more and say that we can only vote up, and we can only remove the up vote.
I was thinking having the table to be in such form
question - Id(INT) | userId(INT) | title(TEXT) | vote(INT) | ratedBy(TEXT)
The rest is self-explanatory, but ratedBy is a comma-separated list of user ids.
I was thinking of reading ratedBy and comparing it with the userId of the currently logged-in user. If he doesn't exist in ratedBy he can vote up; otherwise he can remove his vote, which in turn removes his id from ratedBy.
I think making another table, vote, is better. The relationship between users and votes is n-to-n, so a new table should be created. It should look something like this:
question id (int) | user id (int) | permanent (bool) | timestamp (datetime)
The permanent field can be used to make votes stay after a given time, as SO does.
Other fields may be added according to desired features.
As each row will take at least 16B, you can have up to 250M rows before the table reaches 4GB (the FAT32 limit, if there is one file per table, which is the case for MyISAM and InnoDB).
Also, as Matthew Scharley points out in a comment, don't load all votes at once into memory (as fetching all the table in a resultset). You can always use LIMIT clause to narrow your query results.
A new table:
Article ID | User ID | Rating
Where Article ID and User ID make up the composite key, and rating would be 1, indicating upvote, -1 for a downvote and 0 for a removed vote (or just remove the row).
I believe your design won't be able to scale to large numbers of voters.
The typical thing to do is to create two tables:
Table 1: question - Id(INT) | userId(INT) | title(TEXT)
Table 2: vote - question_id(INT) | vote(INT) | ratedBy(INT)
Then you can count the votes with a query like this:
SELECT t1.question_id, t1.userId, t1.title, SUM(t2.vote) AS votes
FROM table1 t1
LEFT JOIN table2 t2 ON t1.question_id = t2.question_id
GROUP BY t1.question_id, t1.userId, t1.title
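For illustration, here is a runnable version of that two-table idea using Python's sqlite3 (the original is MySQL; table and sample data are invented). Note the LEFT JOIN plus COALESCE keeps questions that have no votes yet, and the aggregate needs a GROUP BY over the non-aggregated columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE questions (id INTEGER PRIMARY KEY, userId INTEGER, title TEXT);
CREATE TABLE votes (question_id INTEGER, vote INTEGER, ratedBy INTEGER);
""")
conn.executemany("INSERT INTO questions VALUES (?,?,?)",
                 [(1, 9, 'First question'), (2, 9, 'No votes yet')])
conn.executemany("INSERT INTO votes VALUES (?,?,?)",
                 [(1, 1, 100),   # user 100 upvoted question 1
                  (1, 1, 101)])  # user 101 upvoted question 1

# LEFT JOIN keeps vote-less questions; COALESCE turns their NULL sum into 0.
tally = conn.execute("""
    SELECT q.id, q.title, COALESCE(SUM(v.vote), 0)
    FROM questions q
    LEFT JOIN votes v ON v.question_id = q.id
    GROUP BY q.id, q.title
    ORDER BY q.id
""").fetchall()
```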