I am developing a personal finance tracker (for fun!) and I have a table of categories. Each category is an entry in the table and at the end of the month they are all duplicated with their relevant balances reset to the start of the month reading for the new month.
Among others, these categories can be of type 'savings' and so have a running total. If I want to retrieve a category or update it then I used the category_id field and this works fine for the current working month but linking months together is breaking my brain. For the savings categories I want to show how the running_total has increased over the previous six months but in my current DB design, categories don't "know" about their previous months as they are created new at the start of each month.
The only way I could currently retrieve the last 6 months of a savings running_total is to search by the category name but this is potentially unreliable.
I have considered adding a "previous_month_category_id" field to the table, which would link the categories together, but it would be expensive to use: it would take 6 MySQL queries, each one grabbing the "previous_month_category_id" from the result and then re-running the query.
If MySQL can do some kind of recursion then maybe this could work, but I feel like there is a more obvious answer staring me in the face.
I'm using CodeIgniter and MySQL, but I'm not scared of vanilla PHP if required.
Help on how to do this would be great.
UPDATE 1:
Below is a sample of what the savings category might look like, mixed in amongst other categories. At the end of each month the entry is duplicated with the same category_name, type, budget, year, and users_id, but the category_id auto-increments, the month updates to the new month number, and the running_total is the previous running_total + the budget. How would I do one database query to retrieve these without using the category_name? The name could change if the user decided to call it "Bigger TV" at the end of July.
+-------------+---------------+------+--------+---------------+------+-------+----------+
| category_id | category_name | type | budget | running_total | year | month | users_id |
+-------------+---------------+------+--------+---------------+------+-------+----------+
| 44          | Big TV        | sav  | 20     | 240           | 2012 | 8     | 77       |
| 32          | Big TV        | sav  | 20     | 220           | 2012 | 7     | 77       |
| 24          | Big TV        | sav  | 20     | 200           | 2012 | 6     | 77       |
+-------------+---------------+------+--------+---------------+------+-------+----------+
UPDATE 2:
I'm not sure I'm explaining myself very well, so I'll add some more detail about how the app works and see if that helps.
I have tables called "categories", "transactions" and "users". A category can be one of three types, 1: Cash, 2: Regular Payment, 3: Savings. Think of cash and regular payment types as buckets, at the start of each month each bucket is full and the aim is to take money out of it and make sure there is still a bit left at the end of the month (or at least not negative).
This is fine on a month by month basis and works very well (for me, I have used this system for 2 years now I think). The trip up comes with Savings as they are linked month by month and are more like a big bucket that is added to each month (with a set increment called budget) until it overspills and is then drained (like Big TV would be when you buy it), or taken from a little bit here and there and the aim is to build up an emergency fund (like "When my car breaks down" type thing).
When the relevant information is displayed for each category only the current month is shown for cash and regular as that is all that is important, for the savings however the current amount is also shown but it would be nice to show a small history graph of how it had built up (or depleted) over time. To do this I need some way of searching for the previous end of month states of these categories so that the graph can be plotted but currently I can't work out how to link them all by anything other than the category_name.
I have tried to implement a bit of DB normalisation but this is the first schema I've implemented having known about normalisation so I've probably missed some aspects of it and possibly avoided any over normalisation where it didn't feel right.
Below are my tables:
categories
+-------------+---------------+------+--------+---------------+------+-------+----------+
| category_id | category_name | type | budget | running_total | year | month | users_id |
+-------------+---------------+------+--------+---------------+------+-------+----------+
transactions
+----------------+-------------+--------+------+----------+------------------------+
| transaction_id | description | amount | date | users_id | categories_category_id |
+----------------+-------------+--------+------+----------+------------------------+
They are joined on categories_category_id, which is a foreign key.
I have always worked off the premise that each category needs a new entry for each month, but it seems from the comments and answers below that I would be better off having just one category entry regardless of month and then calculating everything on the fly?
However, the budgets can be changed by the user, so for record keeping I'm not sure this would work. Also, the "deposits" never really happen; it is just the category being duplicated at the end of the month, so I guess that would need to be dealt with.
The aim of this app has always been to decouple financial tracking from the physical transactions that occur in a bank account and to provide a layer over someone's finances, allowing the user to avoid hard-to-explain transactions and just focus on their overall cash position. There is no concept of an "income" in this system, or of a bank account.
It seems to me like your database design could use some work. I'm still not completely familiar with what you're really trying to do, but my initial thoughts would be to store each transaction as a single row in a table, and then query that table in different ways to generate different types of reports on it. Something like this:
transactions:
+----+---------+--------+---------------+-----------+-------------+
| id | user_id | amount | running_total | datestamp | category_id |
+----+---------+--------+---------------+-----------+-------------+
categories:
+----+------+------+
| id | name | type |
+----+------+------+
Don't increment the categories based on time. Add an entry to the categories table when you actually have a new category. If a transaction could possibly belong to multiple categories, then use a third (relational) table that relates transactions (based on transaction ID) to categories (based on category ID).
When you have a deposit, the amount field will be positive and for withdrawals, it will be negative. You can get your current running total by doing something like:
SELECT running_total FROM transactions
WHERE id = (SELECT MAX(id) FROM transactions WHERE user_id = '$userID');
You can find your total difference for a particular month by doing this:
-- MONTH() returns 1-12; add a YEAR(datestamp) condition to avoid mixing years
SELECT SUM(amount) FROM transactions WHERE MONTH(datestamp) = '$monthNumber';
You can find the total spending for a particular category by doing this:
SELECT SUM(t.amount) FROM transactions t
INNER JOIN categories c ON t.category_id = c.id WHERE c.name = 'Big TV';
There are plenty of other possibilities, but the purpose here is just to demonstrate a possibly better way to store your data.
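As a rough sketch of how this layout could answer the original six-month history question, the snippet below keeps one category row and derives the month-end running totals from the transactions. It uses Python's built-in sqlite3 module purely so that it runs standalone; in MySQL the month bucket would be DATE_FORMAT(datestamp, '%Y-%m') instead of strftime. Two things here are assumptions and not part of the answer above: the stored running_total column is dropped and computed instead, and the monthly "deposit" is written as a real transaction row. The sample data is invented.

```python
import sqlite3

def monthly_running_totals(conn, category_id, months=6):
    """Return [(YYYY-MM, running_total)] for the last `months` months with activity."""
    rows = conn.execute(
        """
        SELECT strftime('%Y-%m', datestamp) AS ym, SUM(amount) AS delta
        FROM transactions
        WHERE category_id = ?
        GROUP BY ym
        ORDER BY ym
        """,
        (category_id,),
    ).fetchall()
    total = 0
    history = []
    for ym, delta in rows:
        total += delta              # accumulate month by month
        history.append((ym, total))
    return history[-months:]        # keep only the most recent months

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    amount INTEGER,
    datestamp TEXT,
    category_id INTEGER REFERENCES categories(id)
);
INSERT INTO categories VALUES (1, 'Big TV', 'sav');
""")

# Hypothetical data: one deposit of 20 on the first of each month, Jan-Aug 2012.
conn.executemany(
    "INSERT INTO transactions (user_id, amount, datestamp, category_id) "
    "VALUES (77, 20, ?, 1)",
    [("2012-%02d-01" % month,) for month in range(1, 9)],
)

# Renaming the category does not break the history, because rows link by id.
conn.execute("UPDATE categories SET name = 'Bigger TV' WHERE id = 1")

history = monthly_running_totals(conn, 1)
for ym, total in history:
    print(ym, total)
```

Because the history is keyed on the category id, the rename from "Big TV" to "Bigger TV" has no effect on the result, which is the part the per-month duplicate rows could not guarantee.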
Related
Big problem...
I'm implementing an online ticket sale system in PHP and MySQL. I have a table called "block_of_tickets", or something like that...
This table looks like:
+-----------+------------+--------------+--------------+--------------+
| idblock | block_name | total_tickets| block_gender | idblock_pair |
+-----------+------------+--------------+--------------+--------------+
| 1 | Block 1- M | 100 | MALE | 2 |
+-----------+------------+--------------+--------------+--------------+
| 2 | Block 1- F | 100 | FEMALE | 1 |
+-----------+------------+--------------+--------------+--------------+
Where:
idblock: The id (primary key) of the block of tickets.
block_name: The name of the block. In the example I have a "Block 1- M" and "Block 1- F" to represent "Block 1 - Male" and "Block 1 - Female", respectively.
total_tickets: the total number of available tickets
block_gender: the gender of the block of tickets
idblock_pair: the block which is paired with the current block.
Note: There are also other columns, like "price", etc.
Here is the (big) problem:
When there is an "idblock_pair", it means that both blocks of tickets share the same total_tickets (available tickets), so both cells must have exactly the same value in this case. As you can see in the example above, block 1 points to block 2 and vice versa.
Lots of people buy lots of tickets at (almost) the same time, which means that each sold ticket must decrement the "total_tickets" field by 1, for both cells.
Database Normalization can solve this. However, it would lose a lot in performance.
I'm almost sure that I should use "SELECT... FOR UPDATE"... but I don't know how, since it's the same table, and a "deadlock" can occur...
How can I solve this problem? Do I have to use triggers? Procedures? Do I have to handle it in PHP (with transactions)?
In the example below, one ticket was sold, and now I'm decrementing total_tickets by 1:
START TRANSACTION;
SELECT *
FROM block_of_tickets
WHERE idblock in (1,2) FOR UPDATE;
UPDATE block_of_tickets
SET total_tickets = (total_tickets - 1)
WHERE idblock in (1,2);
COMMIT;
Is this a nice solution?
I have a page that shows restaurant profile data, and one of the data shown is the total checkin count of users to the restaurant
I have a mysql table like: user_checkins which stores the checkin of users into restaurants like:
id | user_id | res_id | checkin_date |
1 | 102 | 5526 | 2016-04-21 03:20:21 |
2 | 165 | 5574 | 2016-04-21 06:35:21 |
3 | 102 | 4565 | 2016-04-24 02:15:30 |
and another table res_checkin_count:
id | res_id | total_checkin_count |
1 | 5526 | 1055 |
After a while many rows will be created in user_checkins, because people check in frequently.
Question: Should I delete the older rows? That is, create a cron job that periodically (e.g. daily) deletes old rows for each restaurant and updates the restaurant's total_checkin_count in another MySQL table that stores only the total_checkin_count of each restaurant? Will this consume a lot of memory?
Or
should I keep the rows, let them accumulate, and use SELECT COUNT(*) to get each restaurant's total_checkin_count?
EDIT: The user_checkins table actually stores all user check-ins for the various restaurants. Every time someone visits a 'restaurant_profile' web page, the SELECT COUNT(*) query will run on the user_checkins table for res_id x to get the total check-in count of that restaurant. Is that redundant?
When you say many rows you need to assess if many is beyond the capabilities of MySQL. In general MySQL should easily be able to handle in the order of 100 million rows per table. Are you expecting to exceed over 100 million rows any time soon? If not, then leave your data alone, it reduces the complexity that would come with an archiving system.
If on the other hand you are expecting more than hundreds of millions of rows on tables, then yes, running a daily job to delete or archive your data can be helpful in keeping your database running well.
It seems those tables are in MySQL. I would get rid of res_checkin_count: it duplicates what the COUNT aggregate function already gives you, so you are wasting memory. That leaves two scenarios:
1. Your user_checkins table does not have more than 2 million records: create a nonclustered (secondary) index on the res_id column and you will be fine.
2. You have a monstrous website that stores more than 2 million active records: create tables per state or per brick (3 to 5 zip codes). That way your records are distributed, since people from TX will most likely search and query restaurants in TX, and so on.
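The first scenario (index res_id and count on demand) could look roughly like the sketch below. It uses Python's built-in sqlite3 module only so that it is runnable as-is; the CREATE INDEX and SELECT COUNT(*) statements are plain SQL that MySQL also accepts. The index name idx_user_checkins_res_id and the fourth sample row are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_checkins (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    res_id INTEGER NOT NULL,
    checkin_date TEXT NOT NULL
);
-- Secondary index so counting by restaurant does not scan the whole table.
CREATE INDEX idx_user_checkins_res_id ON user_checkins (res_id);
""")

conn.executemany(
    "INSERT INTO user_checkins (user_id, res_id, checkin_date) VALUES (?, ?, ?)",
    [
        (102, 5526, "2016-04-21 03:20:21"),
        (165, 5574, "2016-04-21 06:35:21"),
        (102, 4565, "2016-04-24 02:15:30"),
        (165, 5526, "2016-04-25 12:00:00"),  # hypothetical extra row
    ],
)

def checkin_count(conn, res_id):
    """Count check-ins for one restaurant straight from user_checkins."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM user_checkins WHERE res_id = ?", (res_id,)
    ).fetchone()
    return count

print(checkin_count(conn, 5526))
```

With this in place there is no separate counter table to keep in sync; whether the on-demand count stays fast enough at a given volume is something to measure rather than assume.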
How would things like customer reviews be stored in a database? I can't imagine there would be rows for each item and columns for each review, as one product may have 2 reviews and another may have 100+. I'd presume they are stored in a separate file for reviews, but then surely not one file per item! I don't know enough about storing data to be able to figure this one out by myself!
A similar situation is something like an online calendar: there is all the information about each appointment (time, duration, location, etc.) and there can be many of these on each day, every day, for all users! A logical way would be to have a table for each user with all their appointments in it, but at the same time that seems illogical, because if you have 1000+ users, that's a lot of tables!
Basically I'd like to know what the common/best practice is for storing this 'big dynamic data'.
Customer reviews can easily be stored by using two tables in one-to-many relationship.
Suppose you have a table containing products/articles/whatever is worth reviewing. Each of them has a unique ID and other attributes.
Table "products"
+-------------------------------------+
| id | name | attribute1 | attribute2 |
+-------------------------------------+
Then you make another table, with its name indicating what it's about. It should contain at least a unique ID and a column for the IDs from the other table. Let's say it will also have the email of the user who submitted the review and (obviously) the review text itself:
Table "products_reviews"
+--------------------------------------------+
| id | product_id | user_email | review_text |
+--------------------------------------------+
So far, so good. Let's assume you're selling apples.
Table "products"
+-------------------------------+
| 1 | 'Apple' | 'green' | '30$' |
+-------------------------------+
Then, two customers come, each one buys one apple worth 30$ and likes it, so they both leave a review.
Table "products_reviews"
+-------------------------------------------------------------------------------+
| 1 | 1 | alice#mail.com | 'I really like these green apples, they are awesome' |
| 2 | 1 | bob#mail.com | 'These apples rock!' |
+-------------------------------------------------------------------------------+
So now all you have to do is to fetch all the reviews for your apples and be happy about how much your customers like them:
SELECT *
FROM products_reviews
INNER JOIN products ON products_reviews.product_id = products.id
WHERE products.name = 'Apple';
You can now display them under the shopping page for apples (just don't mention they cost 30$).
The same principle applies for things like an online calendar. You have one table with users, and many tables with other stuff - appointments, meetings, etc. which relate to that user.
Keep in mind, however, that things like meetings are better modelled with a many-to-many table, since they are (usually) shared by many people. Here's a link that visualizes it very well, and here's a question here on SO with sample code for PHP. Go ahead and test it for yourself.
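A minimal sketch of the many-to-many idea for meetings, assuming three tables named users, meetings and meeting_attendees (all names and sample rows here are hypothetical). It is written with Python's built-in sqlite3 module so it can be run directly; the same table layout and JOIN apply in MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE meetings (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    starts_at TEXT NOT NULL
);
-- Link table: one row per (meeting, attendee) pair.
CREATE TABLE meeting_attendees (
    meeting_id INTEGER NOT NULL REFERENCES meetings(id),
    user_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (meeting_id, user_id)
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO meetings VALUES
    (1, 'Planning', '2024-01-08 10:00'),
    (2, 'Review', '2024-01-09 15:00');
INSERT INTO meeting_attendees VALUES (1, 1), (1, 2), (2, 1);
""")

def meetings_for(conn, user_name):
    """Return the titles of all meetings a user attends, earliest first."""
    rows = conn.execute(
        """
        SELECT m.title
        FROM meetings m
        JOIN meeting_attendees ma ON ma.meeting_id = m.id
        JOIN users u ON u.id = ma.user_id
        WHERE u.name = ?
        ORDER BY m.starts_at
        """,
        (user_name,),
    ).fetchall()
    return [title for (title,) in rows]

print(meetings_for(conn, "alice"))
print(meetings_for(conn, "bob"))
```

The number of tables stays fixed at three no matter how many users or meetings exist; only the row counts grow.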
Cheers :)
I'm developing a sports court booking system and I need to generate a "booking table" that shows the columns in the table header as courts and the rows as time slots for bookings.
E.g.,
___________________________________
| | | |
| Court 1 | Court 2 | Court 3 |
|___________|___________|___________|
| | | |
| 10.00 am | 10.00 am | 10.00 am |
|___________|___________|___________|
| | | |
| 11.00 am | 11.00 am | 11.00 am |
|___________|___________|___________|
Requirements:
A club can have any number of courts
A club can have any time increment for bookings (e.g., 1 hour as shown above, 30 minutes, 40 minutes, etc)
Each cell in the table represents a "booking"
I want to make sure I do this right from the start so I have a few questions:
What entities would you create to achieve this
How would you go about generating this booking table
How would you link a cell in the above table to a booking
Thanks in advance.
Well, I think this is kind of standard?
First, you need a club entity. Each club can have n courts:
Club 1:n Court
Then there is a booking table, which is 1:n to a court:
Court 1:n Booking
I don't know if your second requirement means that one club has one time increment (in which case this is one variable on the club entity) or if it can have many (then there would be a TimeIncrement entity).
Generating the table can be a bit tricky. Thinking about it for a few minutes, I came up with 5-6 solutions which might work. You could use special objects which you can ask for the booking for a specific court and time and which search a collection. Or you could build up an array with one key for every time slot, where the value is null if there is no booking. Have one array for each court, then do 2 nested for loops and read every value from the arrays. You could build queries which rearrange the data so you can use the results directly. Or maybe you can ask the court object itself for the booking on a specific date and time.
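One of the options above (a lookup keyed by court and time slot, walked with two nested loops) might be sketched as follows. It is shown in Python only to keep it short and runnable; the same shape translates to PHP arrays. The function names, the court labels and the booking id 42 are invented for illustration, and the club's opening hours and time increment are assumed to be known.

```python
from datetime import datetime, timedelta

def time_slots(opens, closes, minutes):
    """List the start time of every slot between opening and closing."""
    slots = []
    current = opens
    while current < closes:
        slots.append(current)
        current += timedelta(minutes=minutes)
    return slots

def build_grid(courts, slots, bookings):
    """Rows are time slots, columns are courts; a cell is a booking id or None.

    `bookings` maps (court, slot start) to a booking id, so each cell
    lookup is a single dictionary access.
    """
    grid = []
    for slot in slots:                      # outer loop: one row per time slot
        row = []
        for court in courts:                # inner loop: one cell per court
            row.append(bookings.get((court, slot)))
        grid.append(row)
    return grid

courts = ["Court 1", "Court 2", "Court 3"]
slots = time_slots(datetime(2024, 1, 8, 10, 0), datetime(2024, 1, 8, 12, 0), 60)
bookings = {("Court 2", datetime(2024, 1, 8, 11, 0)): 42}  # hypothetical booking
grid = build_grid(courts, slots, bookings)

for slot, row in zip(slots, grid):
    print(slot.strftime("%H:%M"), row)
```

Because each cell is identified by its court and slot start, the same pair can be put on the rendered cell (for example as data attributes) to link it back to a booking or to create a new one.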
But I guess that is what the developer is for... Find out what works best for the given requirements and implement it.
What entities would you create to achieve this
Off the top of my head it looks like you'll need 3: Club, Court, Booking
How would you go about generating this booking table
The table should probably consist of id, court_id, start_time, end_time
How would you link a cell in the above table to a booking
As mentioned above, start/end times are columns in the bookings table.
I would just query the data from the database and turn it into json and pass it into the website. The frontend then can build the table with javascript.
For that I would create a custom entity BookingTable that returns data on request directly as an array which then can be easily turned into json with json_encode.
You can then concentrate on the more detailed pages that show the single booking for which you will automatically create the entities you need (if you didn't already to formulate the DQL for the custom entity for the table).
This is for an upcoming project. I have two tables: the first one keeps track of photos, and the second one keeps track of each photo's rank.
Photos:
+-------+-----------+------------------+
| id | photo | current_rank |
+-------+-----------+------------------+
| 1 | apple | 5 |
| 2 | orange | 9 |
+-------+-----------+------------------+
The photo rank keeps changing on a regular basis, and this is the table that tracks it:
Ranks:
+-------+-----------+----------+-------------+
| id | photo_id | ranks | timestamp |
+-------+-----------+----------+-------------+
| 1 | 1 | 8 | * |
| 2 | 2 | 2 | * |
| 3 | 1 | 3 | * |
| 4 | 1 | 7 | * |
| 5 | 1 | 5 | * |
| 6 | 2 | 9 | * |
+-------+-----------+----------+-------------+ * = current timestamp
Every rank is tracked for reporting/analysis purpose.
[Edit] Users will have access to the statistics on demand.
I talked to someone who has experience in this field, and he told me that storing ranks like above is the way to go. But I'm not so sure yet.
The problem here is data redundancy. There are going to be tens of thousands of photos. The photo rank changes on an hourly basis (often within minutes) for recent photos, but less frequently for older photos. At this rate the table will have millions of records within months. And since I do not have experience working with large databases, this makes me a little nervous.
I thought of this:
Ranks:
+-------+-----------+--------------------+
| id | photo_id | ranks |
+-------+-----------+--------------------+
| 1 | 1 | 8:*,3:*,7:*,5:* |
| 2 | 2 | 2:*,9:* |
+-------+-----------+--------------------+ * = current timestamp
That means some extra code in PHP to split the rank/time (and sorting), but that looks OK to me.
Is this a correct way to optimize the table for performance? What would you recommend?
The first one. Period.
Actually you'll lose much more. A timestamp stored in the int column will occupy only 4 bytes of space.
While the same timestamp stored in the string format will take 10 bytes.
Your first design is correct for a relational database. The redundancy in the key columns is preferable because it gives you a lot more flexibility in how you validate and query the rankings. You can do sorts, counts, averages, etc. in SQL without having to write any PHP code to split your string six ways from Sunday.
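To illustrate the point about doing counts, averages and sorts in SQL, here is a small sketch against the first (one row per rank change) design. It uses Python's built-in sqlite3 module so it runs on its own; the SQL is ordinary and should work in MySQL as well. The rank values come from the question's sample table, while the integer timestamps and the index name idx_ranks_photo_time are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ranks (
    id INTEGER PRIMARY KEY,
    photo_id INTEGER NOT NULL,
    ranks INTEGER NOT NULL,
    timestamp INTEGER NOT NULL
);
-- Hypothetical index: per-photo lookups ordered by time.
CREATE INDEX idx_ranks_photo_time ON ranks (photo_id, timestamp);
""")

# (photo_id, rank) pairs in the order they appear in the question.
sample = [(1, 8), (2, 2), (1, 3), (1, 7), (1, 5), (2, 9)]
conn.executemany(
    "INSERT INTO ranks (photo_id, ranks, timestamp) VALUES (?, ?, ?)",
    [(photo_id, rank, 1000 + i) for i, (photo_id, rank) in enumerate(sample)],
)

def rank_stats(conn, photo_id):
    """Return (count, lowest, highest, average) rank for one photo."""
    return conn.execute(
        "SELECT COUNT(*), MIN(ranks), MAX(ranks), AVG(ranks) "
        "FROM ranks WHERE photo_id = ?",
        (photo_id,),
    ).fetchone()

def latest_rank(conn, photo_id):
    """Return the most recently recorded rank for one photo."""
    (rank,) = conn.execute(
        "SELECT ranks FROM ranks WHERE photo_id = ? "
        "ORDER BY timestamp DESC LIMIT 1",
        (photo_id,),
    ).fetchone()
    return rank

print(rank_stats(conn, 1))
print(latest_rank(conn, 1))
```

With the packed-string design, each of these would mean fetching the whole string and splitting it in application code before any arithmetic could happen.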
It sounds like you would like to use a non-SQL database like CouchDB or MongoDB. These would allow you to store a semi-structured list of rankings right in the record for the photo, and subsequently query the rankings efficiently. With the caveat that you don't really know that the rankings are in the right format, as you do with SQL.
I would stick with your first approach. In the second you will have a lot of data stored in the row, as time goes by it gets more ranks! That is, if a photo gets thousands and thousands of rankings.
The first approach is also more maintainable, that is, if you wish to delete a rank.
I'd think the database 'hit' of over-normalisation (querying the ranks table over and over) is nicely avoided by 'caching' the last rank in current_rank. It does not really matter that ranks is growing tremendously if it is seldom queried (analysis / reporting, you said) and never updated but just gets records inserted at the end: even a very light box would have no problem having millions of rows in that table.
Your alternative would require lots of updates at different locations on the disk, possibly resulting in degraded performance.
Of course, if you need all the old data, and always by photo_id, you could plan a scheduled run to another table rankings_old, possibly with photo_id, year,month, rankings (including timestamps) when a month is over, so retrieving old data stays easily possible, but there are no updates needed in rankings_old or rankings, only inserts at the end of the table.
And take it from me: millions of records in a pure logging table should be absolutely no problem.
Normalized data or not normalized data. You will find thousands of articles about that. :)
It really depends of your needs.
If you want to build your database only with performance (speed or RAM consumption or...) in mind you should only trust the numbers. To do that you have to profile your queries with the expected data "volume" (You can generate the data with some script you write). To profile your queries, learn how to read the results of the 2 following queries:
EXPLAIN extended...
SHOW STATUS
Then learn what to do to improve the figures (mysql settings, data structure, hardware, etc).
As a starter, I really advise these 2 great articles:
http://www.xaprb.com/blog/2006/10/12/how-to-profile-a-query-in-mysql/
http://ajohnstone.com/archives/mysql-php-performance-optimization-tips/
If you want to build for the academic beauty of normalization: just follow the books and the general recommendations. :)
Out of the two options - like everyone before me said - it has to be option 1.
What you should really be concerned about are the bottlenecks in the application itself. Are users going to refer to the historical data often, or does it only show up for a few select users? If the answer is that everyone gets to see historical data of the ranks, then option 1 is good enough. If you are not going to refer to the historical ranks that often, then you could create a third "archive" table, and before updating the ranks, you can copy the rows of the original rank table to the archive table. This ensures that the number of rows stays minimal on the main table that is being called.
Remember, if you're updating the rows, and there's 10s of thousands, it might be more fruitful to get the results in your code (PHP/Python/etc), truncate the table and insert the results in rather than updating it row by row, as that would be a potential bottleneck.
You may want to look up sharding as well (horizontal partitioning) - http://en.wikipedia.org/wiki/Shard_%28database_architecture%29
And never forget to index well.
Hope that helped.
You stated the rank is only linked to the image, in which case all you need is table 1, and you keep updating the rank in real time. Table 2 just stores unnecessary data. The disadvantage of this approach is that a user can't change their vote.
You said the second table is for analysing /statistics, so it actually isn't something that needs to be stored in db. My suggestion is to get rid of the second table and use a logging facility to record rank changes.
Your second design is very dangerous in case you have 1 million votes for a photo. Can PHP handle that?
With the first design you can do all math on the database level which will be returning you a small result set.