I am starting to think about my new project and I've found a couple of speed issues, so I hope you can help me with selecting a good and elegant way to code it.
Each user has records in the database of the "places" he has visited. Each place has "schools": a number of schools in that particular place. Each school has classes. Each class may end its "learning year" at a different time, so its number should increment when the date is >= the end of its learning year.
So we have such a database:
"places" table:
place | user_id |
-----------------
1 | 4 |
2 | 4 |
User no 4 visited place no 1 and 2
"schools" table:
school | place |
----------------
5 | 2 |
6 | 2 |
Place 2 has two schools - with id 5 and 6.
"class" table:
class | school | end_learning | class_number
---------------------------------------------
20 | 5 | 01.01.2013 | 2
21 | 5 | 03.01.2013 | 3
22 | 5 | 05.01.2013 | 4
School 5 has 3 classes with ids 20, 21, 22. If date is greater than 01.01.2013, the class number of class 20 should be incremented to 3 and end learning date changed to 01.01.2014. And so on.
And now we come to the problem: if there are 1000 places, each with 100 schools, each with 10 classes, we get 1,000,000 records. That's a lot. Because what I have presented is just a simple example, I have to consider updating the whole database every time a user refreshes the page, so I'm afraid it might be laggy with that many records.
I could also serialize the classes into one field in the schools table:
school | place | classes
-------------------------------------------------------------------------
5 | 2 | serialized class 20, 21, 22 with end_learning field and class number
6 | 2 | other serialized classes from school 6
In that case I get 10 times fewer records, but each time I have to deserialize the data, check the dates, alter any that are earlier than now, serialize again and save to the database. The second problem is that I have to select all records from the db to manipulate them, not only those that need to be altered.
I am also thinking about having two databases: one with records that might need changing in the further future, and a second with records that might need changing in the next 24 hrs (near future). Every 24 hrs all the classes which end learning in the next 24 hrs are moved to the "near future" db, so every refresh of the page works on thousands of records, not hundreds of thousands or millions. The work on millions of records (further future) is then done only once per day, to build the "near future" table.
What do you think about all those database schemas? Maybe you have a better idea?
I don't quite understand the business logic or data model you outline - but I will assume you have thought this through.
Firstly, RDBMS solutions like MySQL are really, really good at managing large numbers of records, as long as the data you are working with is relational. As far as I can tell, you will be searching across many records, but only updating a few (a user will only be enrolled in a limited number of classes); I don't see this as a huge problem.
Secondly, it's nearly always better to go with the "standard" relational model until you can prove it doesn't meet your performance needs than to go for "exotic" solutions at the outset (I class your serialization and partitioning solutions as "exotic" for the purpose of this answer). A lot of time and energy has gone into optimizing the performance of SQL; if there were a simple alternative, it would be part of the standard solution. There are, of course, points at which the standard relational model doesn't scale (Facebook-size traffic, for instance), or business domains where the relational model doesn't really fit (documents, graphs). However, all the alternatives have benefits and drawbacks, just like "standard" MySQL.
Thirdly, the best way to deal with possible performance issues is, well, to deal with them. In code. Build a test rig, create a schema according to the relational model, populate it with test data (e.g. using DbMonster), throw some load at it (e.g. using JMeter) and tune your schema and queries to prove your situation doesn't fit the standard solution. Only go for something exotic if you really can prove that you can't play nice with standard, relational database stuff.
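To make the "standard relational model" point a bit more concrete, here is a minimal sketch of the class table from the question. It uses Python's built-in sqlite3 as a stand-in for MySQL, and it assumes the dates in the question are dd.mm.yyyy, stored here as ISO strings. The index name idx_class_end and the function advance_due_classes are made up for illustration. The idea is that an index on end_learning lets a single UPDATE touch only the rows that are due, instead of rewriting the whole table on every page refresh.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE class ("
    " class INTEGER PRIMARY KEY,"
    " school INTEGER NOT NULL,"
    " end_learning TEXT NOT NULL,"  # ISO date, assumed from dd.mm.yyyy
    " class_number INTEGER NOT NULL)"
)
# The index lets the database find due rows without scanning the table.
conn.execute("CREATE INDEX idx_class_end ON class (end_learning)")
conn.executemany(
    "INSERT INTO class VALUES (?, ?, ?, ?)",
    [(20, 5, "2013-01-01", 2), (21, 5, "2013-01-03", 3), (22, 5, "2013-01-05", 4)],
)


def advance_due_classes(conn, today):
    """Bump every class whose learning year ended on or before `today`."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE class"
            " SET class_number = class_number + 1,"
            "     end_learning = date(end_learning, '+1 year')"
            " WHERE end_learning <= ?",
            (today,),
        )
    return cur.rowcount  # number of rows actually changed


changed = advance_due_classes(conn, "2013-01-02")
rows = conn.execute(
    "SELECT class, end_learning, class_number FROM class ORDER BY class"
).fetchall()
```

With the sample data only class 20 is due on 2013-01-02, so one row changes and the other two are left alone. A class that is overdue by more than one year would advance only one year per call in this sketch.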
I created a system to input results from a school basketball tournament. The idea is that after the game the operators input the result, and the system saves it in the db in a format like the one below:
Date | Team | Score 1Q | Score 2Q | Score 3Q | Score 4Q | Score OT | Final Score | W | L | Won over Team | Lost to Team | Regular Season? | Finals?
I created a PHP page that calculates many stats from the table above, like Total Wins, Win%, Avg Points, Avg. Points per Quarter, % Turn Around Games when losing at Half Time or 3Q, % Finals games disputed, Times became champions etc., and many more deep stats.
But I was thinking of creating a View with this information calculated in the DB and in real time, instead of having the script handle it.
But how can I turn the selects needed from the first table into a working second table with all calculations done whenever we make the selection?
Thanks
@decio, I think creating a view to calculate those stats is not a bad idea. You might be able to do so with something similar to the following SQL script:
CREATE VIEW result_stats_view AS SELECT SUM(W) as total_wins, SUM(L) as total_losses FROM precalculate_stats_table_name;
This shows the total wins and losses for the season, but you probably get the idea. Check out MySQL aggregate functions (like average, sum, etc.) here:
https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html
Once you have added your calculations to the view, you can simply run a query like this to get your calculated data:
SELECT * from result_stats_view
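As a slightly fuller sketch of the same idea, the view can group by team and compute several aggregates at once. This uses Python's sqlite3 as a stand-in for MySQL; the table name game_results, the trimmed-down column set, the view name team_stats_view and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down results table: one row per team per game.
conn.execute(
    "CREATE TABLE game_results ("
    " game_date TEXT, team TEXT, final_score INTEGER, W INTEGER, L INTEGER)"
)
# Made-up sample rows, only there to exercise the view.
conn.executemany(
    "INSERT INTO game_results VALUES (?, ?, ?, ?, ?)",
    [
        ("2020-03-01", "Hawks", 60, 1, 0),
        ("2020-03-01", "Lions", 50, 0, 1),
        ("2020-03-08", "Hawks", 40, 0, 1),
        ("2020-03-08", "Lions", 70, 1, 0),
        ("2020-03-15", "Hawks", 80, 1, 0),
    ],
)
# The view is just a stored SELECT; it is re-evaluated on every query,
# so the stats always reflect the current rows.
conn.execute(
    "CREATE VIEW team_stats_view AS"
    " SELECT team,"
    "        SUM(W) AS total_wins,"
    "        SUM(L) AS total_losses,"
    "        100.0 * SUM(W) / COUNT(*) AS win_pct,"
    "        AVG(final_score) AS avg_points"
    " FROM game_results"
    " GROUP BY team"
)
stats = conn.execute(
    "SELECT team, total_wins, total_losses, avg_points"
    " FROM team_stats_view ORDER BY team"
).fetchall()
```

Stats that depend on the order of events within a game (turnarounds after half time, for example) would need extra expressions over the per-quarter columns, which this sketch leaves out.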
I'm using tokens for how many messages a user can send (1 message requires 1 token). At the moment I've just got it subtracting the value from an overall value to check if the user has tokens remaining and that's working fine.
I'm trying to change it so that it shows which bundle is active, so when the user doesn't have enough tokens remaining in the active bundle, I need to switch to the upcoming_bundle.
Example:
Stored User Data:
Table Name: Tokens
First Record
id: 1
user_id: 5
bundle_type: small
value: 10
value_remaining: 4
state: active_bundle
Second Record
id: 2
user_id: 5
bundle_type: large
value: 100
value_remaining: 100
state: upcoming_bundle
User sends 10 messages (10 tokens).
Only 4 tokens remain in the first record. Use those 4 tokens, which leaves 6 tokens still to be charged.
Then subtract the 6 tokens from the second record, which is now active, so that will leave 94 remaining tokens.
Should I check the database every time a message is sent and update it to subtract 1 token at a time, then when value_remaining hits 0 change active_bundle to inactive and upcoming_bundle to active?
If this is your data model then I would fetch all active & upcoming bundles and then do the logic in PHP, e.g. subtract remaining tokens, change status, etc., and then update them as a transaction.
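A rough sketch of that fetch, compute, then update-in-one-transaction flow, written in Python with sqlite3 rather than PHP and MySQL so it can run on its own. The table mirrors the example records from the question; the function name spend_tokens and the 'inactive' state value are assumptions based on the question's wording.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tokens ("
    " id INTEGER PRIMARY KEY, user_id INTEGER, bundle_type TEXT,"
    " value INTEGER, value_remaining INTEGER, state TEXT)"
)
conn.executemany(
    "INSERT INTO tokens VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 5, "small", 10, 4, "active_bundle"),
     (2, 5, "large", 100, 100, "upcoming_bundle")],
)


def spend_tokens(conn, user_id, amount):
    """Spend `amount` tokens, draining the active bundle before upcoming ones."""
    with conn:  # one transaction: all updates are applied or none are
        bundles = conn.execute(
            "SELECT id, value_remaining FROM tokens"
            " WHERE user_id = ? AND state IN ('active_bundle', 'upcoming_bundle')"
            " ORDER BY state = 'active_bundle' DESC, id",
            (user_id,),
        ).fetchall()
        if sum(remaining for _, remaining in bundles) < amount:
            raise ValueError("not enough tokens")
        active_assigned = False
        for bundle_id, remaining in bundles:
            used = min(remaining, amount)
            amount -= used
            left = remaining - used
            if left == 0:
                state = "inactive"         # bundle is used up
            elif not active_assigned:
                state = "active_bundle"    # first bundle with tokens left
                active_assigned = True
            else:
                state = "upcoming_bundle"  # still waiting its turn
            conn.execute(
                "UPDATE tokens SET value_remaining = ?, state = ? WHERE id = ?",
                (left, state, bundle_id),
            )


spend_tokens(conn, 5, 10)
result = conn.execute(
    "SELECT id, value_remaining, state FROM tokens ORDER BY id"
).fetchall()
```

With the question's data this drains the small bundle (4 tokens) and takes the other 6 from the large one, leaving 94. In MySQL with InnoDB you would probably also want to lock the selected rows (for example with SELECT ... FOR UPDATE) so two concurrent sends cannot spend the same tokens; the sketch does not cover that.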
If you are flexible on how the data is structured, I would rather have some kind of transaction log, from which I can read each action, i.e. whether a bundle was added or a token was used with a timestamp. For example like this:
id | user | change | comment | timestamp
1 | 1 | 10 | bought small bundle | 2016-09-06 09:30:00
2 | 1 | -1 | sent message | 2016-09-06 10:56:00
3 | 2 | -3 | sent multi-message | 2016-09-06 10:57:00
Where id is the transaction id, user the user id, change is the number of tokens added (by adding a bundle) or used (by sending one or many messages) and comment a message describing the action. When you want to find out how many tokens there are left you can just do a search for that user and check their SUM(change) instead of weird searches for active/upcoming bundles. Obviously this can be more or less elaborate depending on your needs.
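A minimal sketch of that log table and the balance lookup, using Python's sqlite3 as a stand-in for MySQL. The table name transaction_log and the helper name balance are assumptions; the rows are the three example rows above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transaction_log ("
    " id INTEGER PRIMARY KEY, user INTEGER, change INTEGER,"
    " comment TEXT, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO transaction_log VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 10, "bought small bundle", "2016-09-06 09:30:00"),
        (2, 1, -1, "sent message", "2016-09-06 10:56:00"),
        (3, 2, -3, "sent multi-message", "2016-09-06 10:57:00"),
    ],
)


def balance(conn, user):
    """Tokens left for `user`: the sum of every change ever logged."""
    # COALESCE turns the NULL that SUM returns for "no rows" into 0.
    row = conn.execute(
        "SELECT COALESCE(SUM(change), 0) FROM transaction_log WHERE user = ?",
        (user,),
    ).fetchone()
    return row[0]


user_1_balance = balance(conn, 1)
```

Spending a token is then just one more INSERT with a negative change, so there is no bundle state to keep in sync.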
This does not take into account your actual domain! There are more approaches, each with its own drawbacks. For example, my approach might have problems when the transaction_log table gets large because of the number of users and increased activity, although that is very unlikely (I have seen MySQL perform well with a few million records in a similar log table). The important part is: you should figure out what is important to your use case and build a solution around the requirements.
What I would do is subtract it one at a time; not only is this safer, it is also a lot easier.
I searched the internet for a way to select every column that matches a regex pattern. I didn't find one, or maybe I did but didn't understand it, because I'm new to databases. So here's the SQL I was trying to run:
UPDATE `bartosz` SET 'd%%-%%-15'=1
(I know it's bad)
I have columns like:
ID | d1-1-15 | d2-1-15 | d3-1-15 | d4-1-15 ... (for 5 years, every month, and day)
So is there a way to select all columns from 2015?
I know I can loop it in PHP so the SQL would look like:
UPDATE `bartosz` SET `d1-1-15`=1, `d2-1-15`=1, `d3-1-15`=1 [...]
But it would be really long.
Strongly consider changing your approach. It may be technically possible to have a table with 2000 columns, but you are not using MySQL in a way that gets the most out of the available features such as DATE handling. The below table structure will give better flexibility and scaling in most use cases.
Look into tables with key=>value attributes.
id | employee | date       | units
----------------------------------
1  | james    | 2015-01-01 | 2
2  | bob      | 2015-01-01 | 3
3  | james    | 2015-01-02 | 6
4  | bob      | 2015-01-02 | 4
With the above it is possible to write queries without needing to insert hundreds of column names. It will also easily scale beyond 5 years without needing to ALTER the table. Use the DATE column type so you can easily query by date ranges. Also learn how to use INDEXes so you can put a UNIQUE index on the employee and date fields to prevent duplication.
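Here is a small sketch of what that buys you, using Python's sqlite3 as a stand-in for MySQL. The table name work_log, the index name idx_employee_date and the extra 2016 row are made up for illustration; SQLite has no DATE column type, so ISO date strings are used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE work_log ("
    " id INTEGER PRIMARY KEY, employee TEXT NOT NULL,"
    " date TEXT NOT NULL, units INTEGER NOT NULL)"
)
# At most one row per employee per day.
conn.execute("CREATE UNIQUE INDEX idx_employee_date ON work_log (employee, date)")
conn.executemany(
    "INSERT INTO work_log (employee, date, units) VALUES (?, ?, ?)",
    [("james", "2015-01-01", 2), ("bob", "2015-01-01", 3),
     ("james", "2015-01-02", 6), ("bob", "2015-01-02", 4),
     ("james", "2016-01-01", 5)],  # made-up row outside 2015
)

# One date-range condition replaces the hundreds of column names.
with conn:
    updated = conn.execute(
        "UPDATE work_log SET units = 1"
        " WHERE date BETWEEN '2015-01-01' AND '2015-12-31'"
    ).rowcount

# The unique index rejects a second row for the same employee and day.
try:
    conn.execute(
        "INSERT INTO work_log (employee, date, units)"
        " VALUES ('bob', '2015-01-01', 9)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

The "everything from 2015" update from the question becomes a single WHERE clause, and the 2016 row is untouched.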
I have a question I'm really unsure about.
First, feel free to downvote me if you must, but I would really like to hear a more experienced developer's opinion.
I am building a site where I would like to build functionality similar to Google circles.
My logic would be this:
Every user will have circles attached to them after signup.
For example, when a user signs up, the form is filled in and the following rows are inserted into the database:
id | circle_name  | user_id
---------------------------
1  | circle one   | 1
2  | circle two   | 1
3  | circle three | 1
Every circle will have a primary key.
But this is what I'm unsure about: I'm a bit scared that after some time the table will break. What I mean is that once it reaches a certain number of ids, it will stop generating more.
When you specify an int in the database the default length is 11. Yes, I know I can increase it or set it to whatever value I want, but is giving it higher values a good idea?
Or is there any possibility to make an auto-increment primary key unlimited?
Thank you for the opinions and help.
"or is there any possibility to make a primary key auto increment to be unlimited?"
You can use a BIGINT.
Strictly speaking it's not unlimited, but the range is so incredibly huge that you wouldn't be able to use up all the values even if you tried really hard.
Just run the numbers and you'll get the answer yourself. If a column can store billions of values and you don't expect 1 million new registrations every week, then getting to the point where it breaks would be "practically" tough, even if "theoretically" possible.
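The maths can be sketched in a few lines of Python. The maximums are the signed ranges of MySQL's INT and BIGINT types; the rate of 1 million new rows per week is the figure used above, and the function name years_until_exhausted is made up.

```python
# Signed maximums for MySQL integer types.
INT_MAX = 2**31 - 1      # 2,147,483,647
BIGINT_MAX = 2**63 - 1   # 9,223,372,036,854,775,807


def years_until_exhausted(max_id, new_rows_per_week):
    """Rough number of years before an auto-increment id reaches max_id."""
    weeks = max_id // new_rows_per_week
    return weeks / 52


int_years = years_until_exhausted(INT_MAX, 1_000_000)
bigint_years = years_until_exhausted(BIGINT_MAX, 1_000_000)
```

At that rate a signed INT lasts roughly 41 years, while a BIGINT lasts on the order of 10^11 years, which is why BIGINT is "unlimited" for practical purposes.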
Currently I am developing a time machine for an open-source Business Intelligence software from scratch using PHP/MySQL.
My time-machine table is used by all other tables that need date info (such as orders, products, etc.), and they bind to it with time_id. So it's a MySQL table like this:
time_id | timestamp | day | week | month | quarter
1 1303689654 25 17 4 2
2 1303813842 26 17 4 2
...
The order table binds to it like this:
order_id | time_id ...
3123 2
...
edit: it's similar to STAR SCHEMA.
The problem is getting TIME (13:45) information as well. Usually I don't need this, but some tables, such as orders, need this HOUR/MINUTE information.
How can I solve this problem cleverly? I have a couple of solutions, but first I want to see your opinions.
Why don't you simply store timestamps in your other tables?
Or, if you want to keep the dates table, simply add a TIME field to your other tables which need it.
If I understand your timestamp correctly, you can probably just go:
echo date('H:i:s', 1303689654);
edit
What are you trying to do? After re-reading your question, you may be looking for the JOIN keyword in SQL.
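If it helps, here is a rough sketch that combines both suggestions: a JOIN from the order table to the time table, with the hour and minute derived from the stored Unix timestamp. It uses Python's sqlite3 as a stand-in for MySQL, the table names time_machine and orders are assumptions, and times are taken as UTC.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE time_machine ("
    " time_id INTEGER PRIMARY KEY, timestamp INTEGER,"
    " day INTEGER, week INTEGER, month INTEGER, quarter INTEGER)"
)
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, time_id INTEGER)")
conn.executemany(
    "INSERT INTO time_machine VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1303689654, 25, 17, 4, 2), (2, 1303813842, 26, 17, 4, 2)],
)
conn.execute("INSERT INTO orders VALUES (3123, 2)")

# Join each order to its time row and format the timestamp as HH:MM (UTC).
order_times = conn.execute(
    "SELECT o.order_id, strftime('%H:%M', t.timestamp, 'unixepoch') AS hhmm"
    " FROM orders AS o JOIN time_machine AS t ON t.time_id = o.time_id"
).fetchall()
```

In MySQL the formatting part would be done with FROM_UNIXTIME and a format string instead of SQLite's strftime, and the result would follow the session time zone rather than UTC. Either way, no extra TIME column is needed as long as the full timestamp is stored.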