Suppose I have a table, videos:
id | name | views
----------+--------------+-----------
1 | Video1 | 52
2 | Video2 | 150
...
To find the most popular (most viewed) video this week, I could create another table, videoviews:
id | foreign_key | viewed_on
----------+--------------+-----------
1 | 1 | 10/12/2018
2 | 1 | 09/12/2018
...
From this table, I can easily get the data for last week/last month etc. That's not an issue.
Problem:
Suppose I have 1000 videos and each video gets 100 views per day.
My videoviews table will then grow by 100,000 records each day.
I know this is not the best way to implement this functionality. What is?
I found these questions on SO, but...
How to get most visited posts of the week?
Popular Today, This Week, This Month - Design Pattern
Do you need a complete record of each individual view?
You could instead use a counter approach: store one row per video per day, and simply increment its value when a new view comes in. This is granular enough for useful per-day analytics, without having to store a million rows for a million video views.
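Here is a minimal sketch of that counter approach. It uses Python's built-in sqlite3 so it runs anywhere. The table video_daily_views and its columns are hypothetical names, not part of the original schema. Each view runs INSERT OR IGNORE to create the day's row and then increments it. "Most viewed this week" then becomes a SUM over at most seven rows per video.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical schema: one row per video per day instead of one row per view.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE video_daily_views (
        video_id  INTEGER NOT NULL,
        view_date TEXT    NOT NULL,   -- ISO date, e.g. '2018-12-10'
        views     INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (video_id, view_date)
    )""")

def record_view(conn, video_id, day):
    """Increment the day's counter, creating the row on the first view of that day."""
    with conn:  # both statements commit together
        conn.execute(
            "INSERT OR IGNORE INTO video_daily_views (video_id, view_date, views) "
            "VALUES (?, ?, 0)", (video_id, day.isoformat()))
        conn.execute(
            "UPDATE video_daily_views SET views = views + 1 "
            "WHERE video_id = ? AND view_date = ?", (video_id, day.isoformat()))

def most_viewed(conn, today, days=7, limit=10):
    """Top videos by total views over the last `days` days, including today."""
    since = (today - timedelta(days=days - 1)).isoformat()
    return conn.execute(
        "SELECT video_id, SUM(views) AS total FROM video_daily_views "
        "WHERE view_date >= ? AND view_date <= ? "
        "GROUP BY video_id ORDER BY total DESC LIMIT ?",
        (since, today.isoformat(), limit)).fetchall()
```

In MySQL the two writes can be collapsed into a single `INSERT ... ON DUPLICATE KEY UPDATE views = views + 1`.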
Add extra columns called views and start_date to your videoviews table.
When the page with the video is hit, fetch the views for the row whose start_date is the start of the current week, increment it, and update that row.
Only one row is required per video per week. You can also remove old weeks if you like.
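Below is a sketch of the weekly variant. It assumes weeks start on Monday. It also uses a hypothetical reshaped videoviews table, with video_id in place of foreign_key and the original id and viewed_on columns dropped. The key step is mapping any date to the start of its week, so that every view in the same week lands on the same row.

```python
import sqlite3
from datetime import date, timedelta

def week_start(day):
    """Monday of the week containing `day` (assumes Monday-based weeks)."""
    return day - timedelta(days=day.weekday())

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE videoviews (
        video_id   INTEGER NOT NULL,
        start_date TEXT    NOT NULL,   -- first day of the week, ISO format
        views      INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (video_id, start_date)
    )""")

def record_weekly_view(conn, video_id, day):
    start = week_start(day).isoformat()
    with conn:
        conn.execute("INSERT OR IGNORE INTO videoviews VALUES (?, ?, 0)", (video_id, start))
        conn.execute("UPDATE videoviews SET views = views + 1 "
                     "WHERE video_id = ? AND start_date = ?", (video_id, start))

def prune_before(conn, day):
    """Optionally delete every week that started before the week containing `day`."""
    with conn:
        conn.execute("DELETE FROM videoviews WHERE start_date < ?",
                     (week_start(day).isoformat(),))
```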
Related
I have a page that shows restaurant profile data, and one of the items shown is the total number of user check-ins at the restaurant.
I have a MySQL table, user_checkins, which stores users' check-ins at restaurants, like:
id | user_id | res_id | checkin_date |
1 | 102 | 5526 | 2016-04-21 03:20:21 |
2 | 165 | 5574 | 2016-04-21 06:35:21 |
3 | 102 | 4565 | 2016-04-24 02:15:30 |
And another table, res_checkin_count:
id | res_id | total_checkin_count |
1 | 5526 | 1055 |
After a while, many rows will be created in user_checkins, because people check in frequently.
Question: Should I delete the older rows? For example, create a cron job that periodically (say, daily) deletes old rows for each restaurant and updates that restaurant's total_checkin_count in another MySQL table that stores only the total_checkin_count of each restaurant? Will this consume a lot of memory?
or
Should I keep the rows, let them accumulate, and use SELECT COUNT(*) to get each restaurant's total_checkin_count?
EDIT: The user_checkins table stores all user check-ins for all restaurants. Every time someone visits a 'restaurant_profile' web page, a SELECT COUNT(*) query runs on user_checkins for res_id x to get that restaurant's total check-in count. Is that redundant?
When you say "many rows", you need to assess whether "many" is beyond the capabilities of MySQL. In general, MySQL should easily handle on the order of 100 million rows per table. Are you expecting to exceed 100 million rows any time soon? If not, leave your data alone. That avoids the complexity an archiving system would bring.
If, on the other hand, you are expecting hundreds of millions of rows in a table, then yes, running a daily job to delete or archive your data can help keep your database running well.
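If you do archive, the job has to fold the old rows into the counter table and delete them atomically. Otherwise a failure between the two steps loses or double-counts check-ins. Here is a sketch in Python/sqlite3. It assumes res_id is the primary key of res_checkin_count (a simplification of the question's table) and uses a caller-supplied cutoff string. In MySQL, `INSERT IGNORE` plays the role of `INSERT OR IGNORE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_checkins (
        id INTEGER PRIMARY KEY, user_id INTEGER, res_id INTEGER, checkin_date TEXT);
    CREATE TABLE res_checkin_count (
        res_id INTEGER PRIMARY KEY, total_checkin_count INTEGER NOT NULL DEFAULT 0);
""")

def roll_up_old_checkins(conn, cutoff):
    """Add check-ins older than `cutoff` to res_checkin_count, then delete them."""
    with conn:  # all three statements commit together, or none do
        # Make sure every affected restaurant has a counter row.
        conn.execute("""
            INSERT OR IGNORE INTO res_checkin_count (res_id, total_checkin_count)
            SELECT DISTINCT res_id, 0 FROM user_checkins WHERE checkin_date < ?""", (cutoff,))
        # Add each restaurant's old check-ins to its running total.
        conn.execute("""
            UPDATE res_checkin_count
            SET total_checkin_count = total_checkin_count + (
                SELECT COUNT(*) FROM user_checkins c
                WHERE c.res_id = res_checkin_count.res_id AND c.checkin_date < ?)""", (cutoff,))
        conn.execute("DELETE FROM user_checkins WHERE checkin_date < ?", (cutoff,))

def total_checkins(conn, res_id):
    """Archived total plus the rows still present in user_checkins."""
    archived = conn.execute(
        "SELECT COALESCE(SUM(total_checkin_count), 0) FROM res_checkin_count WHERE res_id = ?",
        (res_id,)).fetchone()[0]
    live = conn.execute(
        "SELECT COUNT(*) FROM user_checkins WHERE res_id = ?", (res_id,)).fetchone()[0]
    return archived + live
```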
It seems those tables are in MySQL. I would just get rid of res_checkin_count: it duplicates an aggregate function (COUNT), so you are wasting space. There are really only 2 scenarios:
1. Your user_checkins table has no more than 2 million records. Create a non-clustered (secondary) index on the res_id column and you will be fine.
2. You have a monstrous website storing more than 2 million active records. Create tables per state, or per "brick" (3 to 5 zip codes), so the records are distributed. People from TX will most likely search for and query restaurants in TX, and so on.
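For scenario 1, here is a sketch of the index in SQLite. The index name is hypothetical. SQLite's EXPLAIN QUERY PLAN stands in for MySQL's EXPLAIN to confirm that a per-restaurant COUNT(*) uses the index rather than scanning the whole table. The MySQL statement would be the same `CREATE INDEX ... ON user_checkins (res_id)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_checkins (id INTEGER PRIMARY KEY, user_id INTEGER, "
             "res_id INTEGER, checkin_date TEXT)")
# Secondary (non-clustered) index: COUNT(*) for one res_id reads only matching entries.
conn.execute("CREATE INDEX idx_user_checkins_res_id ON user_checkins (res_id)")
conn.executemany(
    "INSERT INTO user_checkins (user_id, res_id, checkin_date) VALUES (?, ?, ?)",
    [(102, 5526, "2016-04-21 03:20:21"),
     (165, 5574, "2016-04-21 06:35:21"),
     (102, 4565, "2016-04-24 02:15:30")])

def checkin_count(conn, res_id):
    return conn.execute(
        "SELECT COUNT(*) FROM user_checkins WHERE res_id = ?", (res_id,)).fetchone()[0]

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM user_checkins WHERE res_id = ?", (5526,)).fetchall()
for row in plan:
    print(row[-1])  # the detail column should name idx_user_checkins_res_id, not a full SCAN
```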
I am developing a personal finance tracker (for fun!) and I have a table of categories. Each category is an entry in the table and at the end of the month they are all duplicated with their relevant balances reset to the start of the month reading for the new month.
Among others, these categories can be of type 'savings' and so have a running total. If I want to retrieve or update a category I use the category_id field, and this works fine for the current working month, but linking months together is breaking my brain. For the savings categories I want to show how the running_total has increased over the previous six months, but in my current DB design categories don't "know" about their previous months, because they are created anew at the start of each month.
The only way I could currently retrieve the last 6 months of a savings running_total is to search by the category name but this is potentially unreliable.
I have considered adding a field called "previous_month_category_id" to the table, which would link the categories together. It would be expensive, though: it would require six MySQL queries, each time taking the "previous_month_category_id" from the result and re-running the query.
If MySQL can do some kind of recursion then maybe this could work, but I feel like there is a more obvious answer staring me in the face.
I'm using CodeIgniter and MySQL, but I'm not scared of vanilla PHP if required.
Help on how to do this would be great.
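MySQL 8.0 and later, like SQLite, support recursive common table expressions (WITH RECURSIVE). That means the previous_month_category_id chain can be followed in a single query rather than six round trips. Here is a sketch in SQLite using the sample rows below. The previous_month_category_id column is the field proposed above, and the rename to "Bigger TV" is hypothetical, included to show that the name is never used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE categories (
        category_id INTEGER PRIMARY KEY,
        category_name TEXT, type TEXT, budget INTEGER, running_total INTEGER,
        year INTEGER, month INTEGER, users_id INTEGER,
        previous_month_category_id INTEGER REFERENCES categories(category_id)
    )""")
conn.executemany(
    "INSERT INTO categories VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [(24, "Big TV", "sav", 20, 200, 2012, 6, 77, None),
     (32, "Big TV", "sav", 20, 220, 2012, 7, 77, 24),
     (44, "Bigger TV", "sav", 20, 240, 2012, 8, 77, 32)])  # renamed: name is irrelevant

HISTORY_SQL = """
WITH RECURSIVE history(category_id, previous_id, year, month, running_total, depth) AS (
    -- anchor: the current month's row
    SELECT category_id, previous_month_category_id, year, month, running_total, 1
    FROM categories WHERE category_id = ?
    UNION ALL
    -- step: follow the link back one month, up to the requested depth
    SELECT c.category_id, c.previous_month_category_id, c.year, c.month,
           c.running_total, h.depth + 1
    FROM categories c JOIN history h ON c.category_id = h.previous_id
    WHERE h.depth < ?
)
SELECT year, month, running_total FROM history ORDER BY year, month
"""

def savings_history(conn, category_id, months=6):
    """Up to `months` end-of-month states, oldest first, starting from category_id."""
    return conn.execute(HISTORY_SQL, (category_id, months)).fetchall()
```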
UPDATE 1:
Below is a sample of what a savings category might look like, mixed in among other categories. At the end of each month the entry is duplicated with the same category_name, type, budget, year, and users_id. The category_id auto-increments, the month updates to the new month number, and the running_total becomes the previous running_total + the budget. How would I retrieve these in one database query without using the category_name? The name could change if the user decided to call it "Bigger TV" at the end of July.
+-------------+---------------+------+--------+---------------+------+-------+----------+
| category_id | category_name | type | budget | running_total | year | month | users_id |
+-------------+---------------+------+--------+---------------+------+-------+----------+
| 44          | Big TV        | sav  | 20     | 240           | 2012 | 8     | 77       |
| 32          | Big TV        | sav  | 20     | 220           | 2012 | 7     | 77       |
| 24          | Big TV        | sav  | 20     | 200           | 2012 | 6     | 77       |
+-------------+---------------+------+--------+---------------+------+-------+----------+
UPDATE 2:
I'm not sure I'm explaining myself very well, so I'll add more detail about how the app works and see if that helps.
I have tables called "categories", "transactions" and "users". A category can be one of three types, 1: Cash, 2: Regular Payment, 3: Savings. Think of cash and regular payment types as buckets, at the start of each month each bucket is full and the aim is to take money out of it and make sure there is still a bit left at the end of the month (or at least not negative).
This is fine on a month-by-month basis and works very well (for me, anyway; I think I have used this system for 2 years now). The trip-up comes with savings, because they are linked month by month. They are more like a big bucket that is topped up each month (by a set increment called budget) until it overflows and is then drained (as Big TV would be when you buy it). Alternatively, money is taken out a little here and there, the aim being to build up an emergency fund (a "When my car breaks down" type of thing).
When the relevant information is displayed for each category only the current month is shown for cash and regular as that is all that is important, for the savings however the current amount is also shown but it would be nice to show a small history graph of how it had built up (or depleted) over time. To do this I need some way of searching for the previous end of month states of these categories so that the graph can be plotted but currently I can't work out how to link them all by anything other than the category_name.
I have tried to apply some DB normalisation, but this is the first schema I've designed since learning about normalisation. I've probably missed some aspects of it, and I may have avoided some over-normalisation where it didn't feel right.
Below are my tables:
categories
+-------------+--------------+------+--------+---------------+------+-------+----------+
| category_id |category_name | type | budget | running_total | year | month | users_id |
+-------------+--------------+------+--------+---------------+------+-------+----------+
transactions
+----------------+-------------+--------+------+----------+------------------------+
| transaction_id | description | amount | date | users_id | categories_category_id |
+----------------+-------------+--------+------+----------+------------------------+
They are joined on categories_category_id, which is a foreign key.
I have always worked on the premise that each category needs a new entry for each month. From the comments and answers below, though, it seems I would be better off having just one category entry regardless of month and calculating everything on the fly?
However, the budgets can be changed by the user, so I'm not sure this would work for record keeping. Also, the "deposits" never really happen; the category is simply duplicated at the end of the month, so I guess that would need to be dealt with too.
The aim of this app has always been to decouple financial tracking from the physical transactions that occur in a bank account. It provides a layer over someone's finances, so the user can set aside hard-to-explain transactions and just focus on their overall cash position. There is no concept of "income" or of a bank account in this system.
It seems to me like your database design could use some work. I'm still not completely familiar with what you're really trying to do, but my initial thoughts would be to store each transaction as a single row in a table, and then query that table in different ways to generate different types of reports on it. Something like this:
transactions:
+----+---------+--------+---------------+-----------+-------------+
| id | user_id | amount | running_total | datestamp | category_id |
+----+---------+--------+---------------+-----------+-------------+
categories:
+----+------+------+
| id | name | type |
+----+------+------+
Don't increment the categories based on time. Add an entry to the categories table when you actually have a new category. If a transaction could possibly belong to multiple categories, then use a third (relational) table that relates transactions (based on transaction ID) to categories (based on category ID).
When you have a deposit, the amount field will be positive and for withdrawals, it will be negative. You can get your current running total by doing something like:
SELECT running_total FROM transactions
WHERE id = (SELECT MAX(id) FROM transactions WHERE user_id = '$userID');
You can find your total difference for a particular month by doing this:
SELECT SUM(amount) FROM transactions WHERE user_id = '$userID' AND YEAR(datestamp) = '$year' AND MONTH(datestamp) = '$monthNumber';
You can find the total spending for a particular category by doing this:
SELECT SUM(t.amount) FROM transactions t
INNER JOIN categories c ON t.category_id = c.id WHERE c.name = 'Big TV';
There are plenty of other possibilities, but the purpose here is just to demonstrate a possibly better way to store your data.
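To connect this back to the savings graph, here is a sketch that derives one end-of-month balance per month for a single category from the transactions table. Because it filters by category_id, renaming the category has no effect. It assumes each monthly budget top-up is recorded as a positive transaction. It also leaves out the stored running_total column and derives the balance instead. Months with no transactions simply do not appear. In MySQL, DATE_FORMAT(datestamp, '%Y-%m') would replace SQLite's strftime.

```python
import sqlite3
from itertools import accumulate

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER,
        datestamp TEXT, category_id INTEGER REFERENCES categories(id));
""")

def monthly_balances(conn, user_id, category_id):
    """(YYYY-MM, balance at end of that month) for one category, oldest first."""
    rows = conn.execute("""
        SELECT strftime('%Y-%m', datestamp) AS ym, SUM(amount)
        FROM transactions
        WHERE user_id = ? AND category_id = ?
        GROUP BY ym
        ORDER BY ym""", (user_id, category_id)).fetchall()
    months = [ym for ym, _ in rows]
    balances = accumulate(total for _, total in rows)  # running sum of monthly nets
    return list(zip(months, balances))
```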
I'm developing a sports court booking system, and I need to generate a "booking table" whose header columns are the courts and whose rows are the bookable time slots.
E.g.,
___________________________________
| | | |
| Court 1 | Court 2 | Court 3 |
|___________|___________|___________|
| | | |
| 10.00 am | 10.00 am | 10.00 am |
|___________|___________|___________|
| | | |
| 11.00 am | 11.00 am | 11.00 am |
|___________|___________|___________|
Requirements:
A club can have any number of courts
A club can have any time increment for bookings (e.g., 1 hour as shown above, 30 minutes, 40 minutes, etc)
Each cell in the table represents a "booking"
I want to make sure I do this right from the start so I have a few questions:
What entities would you create to achieve this
How would you go about generating this booking table
How would you link a cell in the above table to a booking
Thanks in advance.
Well, I think this is kind of standard?
First, you need a club entity. Each club can have n courts:
Club 1:n Court
Then there is a booking table, which is 1:n to a court:
Court 1:n Booking
I don't know whether your second requirement means that one club has one time increment (in which case this is a single attribute on the Club entity) or whether it can have many (in which case there would be a TimeIncrement entity).
Generating the table can be a bit tricky. Thinking about it for a few minutes, I came up with 5-6 solutions that might work. You could use special objects that you can ask for the booking for a specific court and time, and which search a Collection. Or you could build an array with one key for every time, where the value is null if there is no booking. Have one array for each court, then do 2 nested for loops and read every value from the arrays. You could build queries that rearrange the data so you can use it directly. Or maybe you can ask the Court object itself for the booking at a specific date and time.
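Here is a sketch of the "one slot per key, null when empty" idea. The slots are generated from an opening time, a closing time, and the club's increment. The grid is then filled with two nested loops. The shape of `bookings` (a dict keyed by court and slot start) and the function names are assumptions for illustration.

```python
from datetime import datetime, timedelta

def time_slots(opening, closing, increment_minutes):
    """Slot start times from `opening`, every increment, while the slot fits before `closing`."""
    slots, t = [], opening
    step = timedelta(minutes=increment_minutes)
    while t + step <= closing:
        slots.append(t)
        t += step
    return slots

def booking_grid(courts, slots, bookings):
    """Rows are time slots, columns are courts; each cell is a booking id or None.

    `bookings` maps (court_id, slot_start) -> booking_id (hypothetical shape).
    """
    return [[bookings.get((court, slot)) for court in courts] for slot in slots]
```

With a 40-minute increment between 10:00 and 12:00 this yields slots at 10:00, 10:40 and 11:20, so odd increments need no special handling.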
But I guess that is what the developer is for... Find out what works best for the given requirements and implement it.
What entities would you create to achieve this
Off the top of my head it looks like you'll need 3: Club, Court, Booking
How would you go about generating this booking table
The table should probably consist of id, court_id, start_time, end_time
How would you link a cell in the above table to a booking
As mentioned above, start/end times are columns in the bookings table.
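With start_time and end_time on each booking, a cell maps to a booking by an overlap test rather than an exact match. A booking longer than one increment therefore fills every cell it covers. Here is a sketch with a hypothetical Booking dataclass mirroring the suggested columns. The same condition in SQL would be `WHERE court_id = ? AND start_time < :slot_end AND end_time > :slot_start`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Booking:
    id: int
    court_id: int
    start_time: datetime
    end_time: datetime

def booking_for_cell(bookings, court_id, slot_start, slot_minutes):
    """Return the booking occupying this cell, or None.

    Half-open overlap test (start < slot_end and end > slot_start), so a
    booking ending at 11:00 does not occupy the 11:00 cell.
    """
    slot_end = slot_start + timedelta(minutes=slot_minutes)
    for b in bookings:
        if b.court_id == court_id and b.start_time < slot_end and b.end_time > slot_start:
            return b
    return None
```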
I would just query the data from the database, turn it into JSON, and pass it to the website. The frontend can then build the table with JavaScript.
For that I would create a custom entity, BookingTable, that returns data on request directly as an array, which can then easily be turned into JSON with json_encode.
You can then concentrate on the more detailed pages that show a single booking. For those you will naturally create the entities you need (if you haven't already created them to write the DQL for the custom table entity).
I am trying to create a fantasy football website. I'm trying to work out the table structure and I was looking for advice.
What I have so far:
usertable - > User Info
playertable - > Player Info
userleaguetable - > User League Info
matchtable - > Match Info
clubtable - > Club Info
Then the two tables that will be doing all the work:
scoringtable
Each week a record for each player will be added to the table: how many goals, how long he played, bookings, man of the match, etc.
So that table will get pretty big: num_players * num_weeks
userteamtable
Each week the players in each user's team will be added to the table: which players, and which one was captain.
So that table will (hopefully) get pretty big too: num_users * 11 * num_weeks
I was thinking of going this route because it gives a full week-by-week record of each user's team, each player's points, etc.
So that's basically it. What I'm concerned about is table size: if there were eventually 1000 users, that would be 11,000 rows (1000 * 11) added to the DB each week.
Does anyone have any suggestions for me?
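For scale: 11,000 rows a week over, say, a 38-week season is about 418,000 rows. That is far below the row counts the other threads here consider routine for MySQL. Here is a sketch of the two "tall" tables as described, in SQLite. The column names and the single points column are assumptions, and captain bonus rules are left out. The composite primary keys double as the indexes the weekly join needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scoring (              -- one row per player per week
        player_id INTEGER, week INTEGER, points INTEGER,
        PRIMARY KEY (player_id, week));
    CREATE TABLE userteam (             -- one row per selected player per user per week
        user_id INTEGER, week INTEGER, player_id INTEGER, is_captain INTEGER DEFAULT 0,
        PRIMARY KEY (user_id, week, player_id));
""")

def user_week_points(conn, user_id, week):
    """Sum of the points scored by the players a user picked for one week."""
    return conn.execute("""
        SELECT COALESCE(SUM(s.points), 0)
        FROM userteam ut
        JOIN scoring s ON s.player_id = ut.player_id AND s.week = ut.week
        WHERE ut.user_id = ? AND ut.week = ?""", (user_id, week)).fetchone()[0]
```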
Maybe the scoring table should have a column for each week, plus a unique identifier for the player. That way the scoring table has 1 record per player, with, say, 52 columns representing the 52 weeks of the season. Each week you simply find the scoring record by the player's unique identifier and update it with that week's score.
This way, you're not adding a record every week for every player, you have 1 record per player for 1 season, it may look like this:
Player ID | Season ID | Week 1 | Week 2 | Week 3 | and so on..
For the next season, you add another record but change the season number. In 2 seasons you only have 2 records per player and so on. For example:
Bob the Player (ID 2450), Season 1, Week 1 score = 50, Week 2 score = 100, etc..
Player ID | Season ID | Week 1 | Week 2 | Week 3 | and so on..
----------+-----------+--------+--------+--------+
2450      | 1         | 50     | 100    | 150    |
Hope this helps you see the rest of your database in a more efficient way. Good luck!
I also wanted to mention: if you're storing multiple values per week, just store them like this:
How many goals: 5
How long he played: 15min
Bookings: 500
Man of the match: 1 or 0
| Week 1 |
---------------
|5,15min,500,1|
Man of the match can simply be a 1 or 0 value, give the 1 value to the MVP and 0's to the rest, in the example above, this player was man of the match as the last value is a 1.
When your PHP reads in the Week 1 data, just explode it using the commas to create an array of the values for each week.
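Python's str.split is the counterpart of PHP's explode. Here is a sketch of reading one packed week cell back into named fields. The field order follows the example above, and the function name and dict keys are hypothetical. Every reader has to agree on that order, and the database itself cannot filter or sum on these fields.

```python
def parse_week(cell):
    """Split a packed 'goals,minutes,bookings,motm' cell into named values."""
    goals, minutes, bookings, motm = cell.split(",")
    if minutes.endswith("min"):          # '15min' -> '15'
        minutes = minutes[:-len("min")]
    return {
        "goals": int(goals),
        "minutes": int(minutes),
        "bookings": int(bookings),
        "man_of_the_match": motm == "1",  # 1 = MVP, 0 = everyone else
    }
```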
This is for an upcoming project. I have two tables: the first keeps track of photos, and the second keeps track of each photo's rank.
Photos:
+-------+-----------+------------------+
| id | photo | current_rank |
+-------+-----------+------------------+
| 1 | apple | 5 |
| 2 | orange | 9 |
+-------+-----------+------------------+
The photo rank keeps changing on a regular basis, and this is the table that tracks it:
Ranks:
+-------+-----------+----------+-------------+
| id | photo_id | ranks | timestamp |
+-------+-----------+----------+-------------+
| 1 | 1 | 8 | * |
| 2 | 2 | 2 | * |
| 3 | 1 | 3 | * |
| 4 | 1 | 7 | * |
| 5 | 1 | 5 | * |
| 6 | 2 | 9 | * |
+-------+-----------+----------+-------------+ * = current timestamp
Every rank is tracked for reporting/analysis purpose.
[Edit] Users will have access to the statistics on demand.
I talked to someone who has experience in this field, and he told me that storing ranks like above is the way to go. But I'm not so sure yet.
The problem here is data redundancy. There are going to be tens of thousands of photos. The rank changes on an hourly basis for recent photos (often many times within minutes), but less frequently for older photos. At this rate the table will have millions of records within months. And since I have no experience working with large databases, this makes me a little nervous.
I thought of this:
Ranks:
+-------+-----------+--------------------+
| id | photo_id | ranks |
+-------+-----------+--------------------+
| 1 | 1 | 8:*,3:*,7:*,5:* |
| 2 | 2 | 2:*,9:* |
+-------+-----------+--------------------+ * = current timestamp
That means some extra code in PHP to split the rank/time (and sorting), but that looks OK to me.
Is this a correct way to optimize the table for performance? What would you recommend?
The first one. Period.
Actually, you'll lose much more space with the second design. A timestamp stored in an INT column occupies only 4 bytes,
while the same timestamp stored as a string takes 10 bytes.
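The size difference is easy to check. The sketch below packs a Unix timestamp as a 4-byte unsigned integer (the width of MySQL's INT) and compares it with its decimal text form. The example value is arbitrary. Any Unix timestamp between 2001 and 2286 has ten digits. In the packed-string design each entry also carries the rank and the ':' and ',' separators on top of that.

```python
import struct

ts = 1300000000                      # an arbitrary Unix timestamp from 2011
as_int = struct.pack("<I", ts)       # 4-byte unsigned integer
as_text = str(ts).encode("ascii")    # ten ASCII digits
print(len(as_int), len(as_text))     # 4 10
```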
Your first design is correct for a relational database. The redundancy in the key columns is preferable because it gives you a lot more flexibility in how you validate and query the rankings. You can do sorts, counts, averages, etc. in SQL without having to write any PHP code to split your string six ways from Sunday.
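To illustrate, here are per-photo counts, minimums, maximums and averages computed entirely in SQL over the first design, using the six sample rows from the question. The integer timestamps stand in for the "*" values and are made up. The composite index is a suggestion, not something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ranks (id INTEGER PRIMARY KEY, photo_id INTEGER, "
             "ranks INTEGER, timestamp INTEGER)")
# Composite index so per-photo, per-period queries don't scan the whole log.
conn.execute("CREATE INDEX idx_ranks_photo_time ON ranks (photo_id, timestamp)")
conn.executemany(
    "INSERT INTO ranks (photo_id, ranks, timestamp) VALUES (?, ?, ?)",
    [(1, 8, 100), (2, 2, 110), (1, 3, 120), (1, 7, 130), (1, 5, 140), (2, 9, 150)])

stats = conn.execute("""
    SELECT photo_id, COUNT(*), MIN(ranks), MAX(ranks), AVG(ranks)
    FROM ranks
    GROUP BY photo_id
    ORDER BY photo_id""").fetchall()
print(stats)  # [(1, 4, 3, 8, 5.75), (2, 2, 2, 9, 5.5)]
```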
It sounds like you would like to use a NoSQL database like CouchDB or MongoDB. These would allow you to store a semi-structured list of rankings right in the photo's record, and subsequently query the rankings efficiently. The caveat is that you can't be sure the rankings are in the right format, as you can with SQL.
I would stick with your first approach. With the second, a lot of data accumulates in a single row, because the row gains more ranks as time goes by. That becomes a real problem if a photo gets thousands and thousands of rankings.
The first approach is also more maintainable, for example if you wish to delete a rank.
I think the database 'hit' of over-normalisation (querying the ranks table over and over) is nicely avoided by 'caching' the latest rank in current_rank. It does not really matter that ranks grows tremendously if it is seldom queried (for analysis/reporting, you said), never updated, and only has records appended at the end. Even a very light box would have no problem with millions of rows in that table.
Your alternative would require lots of updates at different locations on the disk, possibly degrading performance.
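Here is a sketch of that "cache the latest rank" pattern in SQLite. Each rank change appends one row to ranks and overwrites photos.current_rank in the same transaction, so the two never disagree. The function name is hypothetical, and timestamps are assumed to be integer Unix times.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE photos (id INTEGER PRIMARY KEY, photo TEXT, current_rank INTEGER);
    CREATE TABLE ranks (id INTEGER PRIMARY KEY, photo_id INTEGER REFERENCES photos(id),
                        ranks INTEGER, timestamp INTEGER);
""")

def record_rank(conn, photo_id, rank, ts=None):
    """Append to the ranks log and refresh the cached current_rank together."""
    ts = int(time.time()) if ts is None else ts
    with conn:  # insert and update commit as one unit
        conn.execute("INSERT INTO ranks (photo_id, ranks, timestamp) VALUES (?, ?, ?)",
                     (photo_id, rank, ts))
        conn.execute("UPDATE photos SET current_rank = ? WHERE id = ?", (rank, photo_id))
```

Pages that only need the current rank read photos alone; the growing ranks log is touched only by reporting queries.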
Of course, if you need all the old data, and always by photo_id, you could schedule a job that runs when a month is over and copies that month into another table, rankings_old (with, say, photo_id, year, month, and rankings including timestamps). Retrieving old data stays easy, and no updates are needed in rankings_old or rankings, only inserts at the end of the table.
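Here is a sketch of that monthly job in SQLite. It copies one finished month into rankings_old as one packed row per photo and never updates either table. The caller supplies the month boundaries as integer timestamps, and the "rank:timestamp" packing format is an assumption. SQLite's group_concat is assumed to follow the subquery's ordering. In MySQL, GROUP_CONCAT(... ORDER BY timestamp) makes the order explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ranks (id INTEGER PRIMARY KEY, photo_id INTEGER, ranks INTEGER, timestamp INTEGER);
    CREATE TABLE rankings_old (photo_id INTEGER, year INTEGER, month INTEGER, rankings TEXT,
                               PRIMARY KEY (photo_id, year, month));
""")

def archive_month(conn, year, month, start_ts, end_ts):
    """Copy ranks with start_ts <= timestamp < end_ts into rankings_old, one row per photo.

    `rankings` holds comma-separated 'rank:timestamp' pairs; ranks is only read.
    """
    with conn:
        conn.execute("""
            INSERT INTO rankings_old (photo_id, year, month, rankings)
            SELECT photo_id, ?, ?, group_concat(ranks || ':' || timestamp, ',')
            FROM (SELECT photo_id, ranks, timestamp FROM ranks
                  WHERE timestamp >= ? AND timestamp < ?
                  ORDER BY photo_id, timestamp)
            GROUP BY photo_id""", (year, month, start_ts, end_ts))
```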
And take it from me: millions of records in a pure logging table should be absolutely no problem.
Normalized data or not normalized data. You will find thousands of articles about that. :)
It really depends on your needs.
If you want to design your database with only performance (speed, RAM consumption, ...) in mind, you should trust only the numbers. To get them, profile your queries with the expected data volume (you can generate the data with a script). To profile your queries, learn how to read the results of the following 2 statements:
EXPLAIN EXTENDED ...
SHOW STATUS
Then learn what to do to improve the figures (mysql settings, data structure, hardware, etc).
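Here is a sketch of the "generate the expected volume with a script" step. It uses SQLite in memory rather than MySQL, so EXPLAIN QUERY PLAN stands in for MySQL's EXPLAIN, and the timings only show relative differences. The row count, photo count, and index name are arbitrary choices.

```python
import random
import sqlite3
import time

def build(n_rows, n_photos=10_000, seed=42):
    """Fill an in-memory ranks table with synthetic rows of the expected shape."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ranks (id INTEGER PRIMARY KEY, photo_id INTEGER, "
                 "ranks INTEGER, timestamp INTEGER)")
    rows = ((rng.randrange(n_photos), rng.randrange(1, 101), 1_300_000_000 + i)
            for i in range(n_rows))
    with conn:
        conn.executemany("INSERT INTO ranks (photo_id, ranks, timestamp) VALUES (?, ?, ?)", rows)
    return conn

def time_query(conn, sql, params=(), repeat=10):
    """Average wall-clock seconds per execution of `sql`."""
    start = time.perf_counter()
    for _ in range(repeat):
        conn.execute(sql, params).fetchall()
    return (time.perf_counter() - start) / repeat

conn = build(100_000)
q = "SELECT AVG(ranks) FROM ranks WHERE photo_id = ?"
before = time_query(conn, q, (123,))
conn.execute("CREATE INDEX idx_ranks_photo ON ranks (photo_id)")
after = time_query(conn, q, (123,))
plan = conn.execute("EXPLAIN QUERY PLAN " + q, (123,)).fetchall()
print(f"without index: {before * 1e3:.2f} ms, with index: {after * 1e3:.2f} ms")
print(plan)  # should name idx_ranks_photo instead of a full table SCAN
```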
As a starting point, I strongly recommend these 2 great articles:
http://www.xaprb.com/blog/2006/10/12/how-to-profile-a-query-in-mysql/
http://ajohnstone.com/archives/mysql-php-performance-optimization-tips/
If you want to build for the academic beauty of normalization, just follow the books and the general recommendations. :)
Out of the two options - like everyone before me said - it has to be option 1.
What you should really be concerned about are the bottlenecks in the application itself. Are users going to refer to the historical data often, or does it only show up for a few select users? If the answer is that everyone gets to see historical data of the ranks, then option 1 is good enough. If you are not going to refer to the historical ranks that often, then you could create a third "archive" table, and before updating the ranks, you can copy the rows of the original rank table to the archive table. This ensures that the number of rows stays minimal on the main table that is being called.
Remember, if you're updating rows and there are tens of thousands of them, it might be more fruitful to compute the results in your code (PHP/Python/etc.), truncate the table, and insert the results. Updating row by row could become a bottleneck.
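Here is a sketch of that "recompute, clear, bulk insert" pattern in SQLite, with a hypothetical current_ranks table. SQLite has no TRUNCATE, so DELETE FROM stands in for it, and both steps run in one transaction. In MySQL, TRUNCATE TABLE commits implicitly, so readers may briefly see an empty table between the truncate and the inserts.

```python
import sqlite3

def rewrite_ranks(conn, new_ranks):
    """Replace every row of current_ranks with `new_ranks` in one transaction.

    `new_ranks` is an iterable of (photo_id, ranks) pairs computed in application code.
    """
    with conn:
        conn.execute("DELETE FROM current_ranks")  # stands in for TRUNCATE TABLE
        conn.executemany("INSERT INTO current_ranks (photo_id, ranks) VALUES (?, ?)",
                         new_ranks)
```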
You may want to look up sharding as well (horizontal partitioning) - http://en.wikipedia.org/wiki/Shard_%28database_architecture%29
And never forget to index well.
Hope that helped.
You stated that the rank is linked only to the image, in which case all you need is table 1, with the rank updated in real time. Table 2 just stores unnecessary data. The disadvantage of this approach is that a user can't change their vote.
You said the second table is for analysis/statistics, so it isn't really something that needs to be stored in the database. My suggestion is to get rid of the second table and use a logging facility to record rank changes.
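As one possible "logging facility", here is a sketch using Python's standard logging module. The logger name and message format are assumptions. Where the lines end up (a file, syslog, a log shipper) is decided by whichever handler the application configures, for example `logging.basicConfig(filename="rank_changes.log", level=logging.INFO)`.

```python
import logging

rank_log = logging.getLogger("photo_ranks")  # hypothetical logger name

def log_rank_change(photo_id, old_rank, new_rank):
    """Emit one line per rank change; old_rank may be None for a photo's first rank."""
    rank_log.info("photo_id=%d old_rank=%s new_rank=%d", photo_id, old_rank, new_rank)
```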
Your second design is very dangerous if a photo gets 1 million votes. Can PHP handle that?
With the first design you can do all math on the database level which will be returning you a small result set.