Caching a large SQL query - best way of structuring?

Let me give an example of my issue. Let's say I have a table called users and a table called payments. To calculate a user's total balance, I'd use a query to get all the payments after a certain date and then cache the result for a while.
However, I was wondering, due to the nature of this, would it be a good idea to have a column in the users table called balance and then when the cache expires, I use a different query to gather the payments but from a shorter time and then add this amount on to whatever is in the balance column?

To calculate a user's total balance, you can create an additional table that always contains the user's current balance. Whenever a new payment is added for the user, that balance row needs to be updated too. Wrap both in a transaction so that adding the payment and updating the total balance stay aligned.
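A minimal sketch of that transaction, using PDO with an in-memory SQLite database so it runs on its own. The table names (payments, user_balances), the column names, the addPayment() helper and the choice of integer cents are all assumptions made for illustration, not part of the original question.

```php
<?php
// Sketch only: schema and helper names are assumed for illustration.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE payments (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    amount INTEGER NOT NULL,
    paid_at TEXT NOT NULL
)');
$pdo->exec('CREATE TABLE user_balances (user_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)');
$pdo->exec('INSERT INTO user_balances (user_id, balance) VALUES (1, 0)');

/**
 * Stores a payment and updates the stored balance in one transaction.
 * Amounts are integer cents to avoid floating-point rounding.
 */
function addPayment(PDO $pdo, int $userId, int $amountCents, string $paidAt): void
{
    $pdo->beginTransaction();
    try {
        $insert = $pdo->prepare('INSERT INTO payments (user_id, amount, paid_at) VALUES (?, ?, ?)');
        $insert->execute([$userId, $amountCents, $paidAt]);

        $update = $pdo->prepare('UPDATE user_balances SET balance = balance + ? WHERE user_id = ?');
        $update->execute([$amountCents, $userId]);

        $pdo->commit();
    } catch (Throwable $e) {
        // If either statement fails, neither change is kept.
        $pdo->rollBack();
        throw $e;
    }
}

addPayment($pdo, 1, 1500, '2016-05-10');
addPayment($pdo, 1, 250, '2016-05-11');

$balance = (int) $pdo->query('SELECT balance FROM user_balances WHERE user_id = 1')->fetchColumn();
echo $balance, PHP_EOL;
```

With MySQL the same pattern should apply as long as the tables use a transactional engine such as InnoDB.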
If you need finer granularity, add a date column next to the user reference that identifies the interval you want to calculate over, e.g. the week number or month number, so you can also report on past periods.
If you need more flexibility, you can after some time compress existing payments into a total value and store it in such a balance table, which is related to the user and keeps a date column.
You can then UNION it with the payments table, which holds the "realtime" payments for the dates not yet compressed / condensed, and use an aggregate function to SUM the total balance. This might give you the best of both worlds: recent data stays available in detail, while older data can be moved out of the data store after some time, keeping only the summary values.
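As a rough sketch of that combination: the balance_rollups table, its period column and the sample figures below are assumptions for illustration. The query uses UNION ALL rather than plain UNION, because UNION removes duplicate rows and two payments of the same amount would otherwise be counted once.

```php
<?php
// Sketch only: schema and sample data are assumed for illustration.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE payments (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER, paid_at TEXT)');
$pdo->exec('CREATE TABLE balance_rollups (
    user_id INTEGER,
    period TEXT,
    total INTEGER,
    PRIMARY KEY (user_id, period)
)');

// Older months, already condensed to one row per user and month.
$pdo->exec("INSERT INTO balance_rollups (user_id, period, total) VALUES
    (1, '2016-03', 4000),
    (1, '2016-04', 2500)");

// Payments of the month that has not been condensed yet.
$pdo->exec("INSERT INTO payments (user_id, amount, paid_at) VALUES
    (1, 1000, '2016-05-02'),
    (1, 300,  '2016-05-09'),
    (2, 999,  '2016-05-03')");

// Sum the condensed totals and the recent detail rows in one pass.
$stmt = $pdo->prepare(
    'SELECT COALESCE(SUM(amount), 0) FROM (
        SELECT total AS amount FROM balance_rollups WHERE user_id = ?
        UNION ALL
        SELECT amount FROM payments WHERE user_id = ? AND paid_at >= ?
    ) AS combined'
);
$stmt->execute([1, 1, '2016-05-01']);

$total = (int) $stmt->fetchColumn();
echo $total, PHP_EOL;
```

The cutoff date passed to the second branch has to match the first day that is not covered by a roll-up row, otherwise payments are either missed or counted twice.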

Generally, with these kinds of "pre-calculated" values, I find that the most pain-free way is to store/update them whenever any model that concerns the data is saved.
So in short, update the total balance whenever a new payment is saved. That way your stored balance stays in sync with the payments it is derived from.
The pre-calculation can be done either by a MySQL trigger or by a background task with something like Gearman.
But as your own question suggested, if you want to do some kind of incremental roll-up of the balance, I would advise going by months or some other fixed date range. This works provided that you have no payment backdating or anything like that, where a payment could appear for an old month.
At the start of the new month, run a payment aggregator, and from then on you only have to sum the monthly totals.
It all really depends on how much data you have to deal with. But again I stress, data consistency is a lot more valuable than speed; you can always buy more servers.

Related

Double Entry Accounting pagination issue

There is a really serious issue about Double Entry Accounting systems with pagination, I think it is common but I still didn't find any solution for my problem yet.
You can use this link to read about the simple Double Entry Accounting systems just like the one I made with Laravel and AngularJS.
In this system, the expected result (for example) is something like this:
ID   In       Out      Balance
1    100.00    0.00    100.00
2     10.00    0.00    110.00
3      0.00   70.00     40.00
4      5.00    0.00     45.00
5      0.00   60.00    -15.00
6     20.00    0.00      5.00
It is very easy to track the balance with a cumulative function if you show all the transactions on one page: the balance after the last transaction is your current balance at the end of the day.
For example, for a specific range of dates $fromDate->$toDate, we do something like:
$balanceYesterday = DB::table('journal')->where('date', '<', $fromDate)
->join('transactions','transactions.journal_id','journal.id')->where('transactions.type', "=", 0) /* 0 means the account of the company */
->select(DB::raw('SUM(amount) as total_balance'))
->first()->total_balance;
Now that we have the balance from yesterday, we use it as the starting point and calculate each following balance in a cumulative loop until the end of the process, reaching $toDate:
$currentBalance = $currentBalance + $currentTransaction->amount;
$currentTransactionBalance = $currentBalance;
The real problem starts when you have a large number of transactions and need to paginate them, e.g. $journal = $journal->paginate(100); for 100 transactions per page. The system works as expected for the first page, since we can already calculate $balanceYesterday and build on it to get the new balance after every transaction up to the end of the 100 transactions on the first page.
The next page has the problem that it doesn't know the balance after the last transaction on the previous page, so it starts again from $balanceYesterday, making every balance in the table wrong.
My first fix was to pass the balance after the last transaction (from the front end) to the next page as a parameter and use it as the starting amount. That was the best solution I had while I was using only << PREV and NEXT >> buttons, so it was very easy to fix it like that.
But I recently found out that this workaround does not work with numbered pagination: the user wants to jump between pages to explore the journal, so it is impossible to know the last balance before a specific page, and the system shows wrong calculations.
What I am trying to do is find a way to calculate the balance at a specific transaction, whether it was a credit or a debit. I'm looking for a way to know how much the balance was after a specific transaction on a specific date. I DON'T WANT TO ADD A NEW BALANCE COLUMN AND SAVE THE BALANCE INSIDE IT. THE USER IS DOING A LOT OF MODIFICATIONS AND EDITS TO THE TRANSACTIONS FROM TIME TO TIME, AND THAT WILL BREAK EVERYTHING, AS A SMALL AMOUNT MODIFICATION WILL AFFECT ALL THE BALANCES AFTER IT. I CAN NOT depend on the IDs of transactions in any method, because transactions might have different random dates, so there will be no ordering by ID, but there might be ordering by other fields like date, account owner, type or whatever.
I've been scratching my head on this for about 4 months, I searched online and found no solutions, I hope after this long explanation that my problem is clear, and I hope somebody can help me with a solution, please..
Thank you.

I believe the only thing you really need at this point is to calculate the sum of all transactions from the beginning of the paginated data set (all records, not just the current page's) until one before the first record displayed on the current page.
You can get this by finding the number of transactions that occurred between the start of your entire data set and the current page's transactions, retrieving them via LIMIT, and adding them up.
The first thing you'll want to have is the exact constraints of your pagination query. Since we want to grab another subset of paginated records besides the current page, you want to be sure the results of both queries are in the same order. Reusing the query builder object can help (adjust to match your actual pagination query):
$baseQuery = DB::table('journal')
->join('transactions', 'transactions.journal_id', 'journal.id')
->where('date', '>=', $fromDate) // inclusive: the "yesterday" balance only covers date < $fromDate
->where('date', '<', $toDate)
->where('transactions.type', "=", 0)
->orderBy('date', 'asc');
// Note that we aren't fetching anything here yet.
Then, fetch the paginated result set. This will perform two queries: one for the total count of records, and a second for the specific page's transactions.
$paginatedTransactions = $baseQuery->paginate(100);
From here, we can determine which records we need to retrieve the previous balance for. The pagination object returned is an instance of LengthAwarePaginator, which knows how many records there are in total, the number of pages, which page it's currently on, etc.
Using that information, we just do some math to grab the number of records we need:
total records needed = (current page - 1) * records per page
Assuming the user is on page 5, they will see records 401 - 500, so we need to retrieve the previous 400 records.
// If we're on Page 1, or there are not enough records to
// paginate, we don't need to calculate anything.
if ($paginatedTransactions->onFirstPage() || ! $paginatedTransactions->hasPages()) {
// Don't need to calculate a previous balance. Exit early here!
}
// Use helper methods from the Paginator to calculate
// the number of previous transactions.
$limit = ($paginatedTransactions->currentPage() - 1) * $paginatedTransactions->perPage();
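Now that we have the number of transactions that occurred within our data set but before the current page, we can calculate their sum by again utilizing the base query. Two details matter here: paginate() leaves the current page's offset and limit on the builder, and SQL applies LIMIT after an aggregate, so calling sum() directly on a limited query would still add up every row. Selecting the limited rows in a subquery and summing that avoids both problems (fromSub() requires Laravel 5.6 or later):
$previousRows = (clone $baseQuery)->select('amount')->offset(0)->limit($limit);
$previousBalance = DB::query()->fromSub($previousRows, 'previous_rows')->sum('amount');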
Adding a highlight here to explain that using your database to perform the SUM calculations will be a big performance benefit, rather than doing it in a loop in PHP. Take advantage of the DB as often as you can!
Add this balance to your original "yesterday" balance, and you should have an accurate beginning balance for the paginated transactions.
Note: everything pseudo-coded from theory, may need adjustments. Happy to revise if there are questions or issues.
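The idea in this answer can also be sketched outside Laravel with plain PDO. The example below uses the six rows from the question's table (Out stored as negative amounts, in whole units), with invented dates, an assumed transactions table and a hypothetical openingBalanceForPage() helper. The LIMIT sits inside a subquery, because a LIMIT on the same level as SUM() would only limit the single aggregate row.

```php
<?php
// Sketch only: schema, dates and helper name are assumed for illustration.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE transactions (id INTEGER PRIMARY KEY, date TEXT NOT NULL, amount INTEGER NOT NULL)');
$pdo->exec("INSERT INTO transactions (id, date, amount) VALUES
    (1, '2016-01-01', 100),
    (2, '2016-01-02', 10),
    (3, '2016-01-03', -70),
    (4, '2016-01-04', 5),
    (5, '2016-01-05', -60),
    (6, '2016-01-06', 20)");

/**
 * Sums every row that precedes the given page in the listing order.
 * The ORDER BY must match the one used for the page itself.
 */
function openingBalanceForPage(PDO $pdo, int $page, int $perPage): int
{
    $previousRows = ($page - 1) * $perPage;

    $stmt = $pdo->prepare(
        'SELECT COALESCE(SUM(amount), 0) FROM (
            SELECT amount FROM transactions
            ORDER BY date ASC, id ASC
            LIMIT :lim
        ) AS previous_rows'
    );
    $stmt->bindValue(':lim', $previousRows, PDO::PARAM_INT);
    $stmt->execute();

    return (int) $stmt->fetchColumn();
}

// Two rows per page: page 3 starts after rows 1 to 4.
$opening = openingBalanceForPage($pdo, 3, 2);
echo $opening, PHP_EOL;
```

The extra ORDER BY on id is a tie-breaker: if several transactions share a date, ordering by date alone does not guarantee the same row order between the page query and the sum query.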

You should be able to formulate a truth statement for the balance for each record as long as you can tell what the order is to calculate the sum for the balance at each point within that ordered list.
For sure this comes with a massive overhead, as you need to query the whole table for each record you display, but first of all one must be able to do that at all. As you've shown in the example, you are able to, as long as you do not paginate.
What you could do for pagination is to pre-calculate the balance for each record and store it in relation to the original record. This would de-normalize your data, but with the benefit that creating the pagination is rather straightforward.
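A sketch of the first approach, computing the balance per displayed record from its position in the ordered list, without a stored balance column. The transactions table, the sample rows and the pageWithBalances() helper are assumptions for illustration. IDs are deliberately out of date order to show that the balance follows the listing order and not the ID.

```php
<?php
// Sketch only: schema, data and helper name are assumed for illustration.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE transactions (id INTEGER PRIMARY KEY, date TEXT NOT NULL, amount INTEGER NOT NULL)');
$pdo->exec("INSERT INTO transactions (id, date, amount) VALUES
    (1, '2016-01-03', -70),
    (2, '2016-01-01', 100),
    (3, '2016-01-02', 10),
    (4, '2016-01-04', 5)");

/**
 * Returns one page of rows, each with the balance after that row.
 * The correlated subquery sums every row that sorts at or before
 * the current one in (date, id) order.
 */
function pageWithBalances(PDO $pdo, int $page, int $perPage): array
{
    $stmt = $pdo->prepare(
        'SELECT t.id, t.amount,
                (SELECT SUM(p.amount) FROM transactions p
                  WHERE p.date < t.date
                     OR (p.date = t.date AND p.id <= t.id)) AS balance
           FROM transactions t
          ORDER BY t.date ASC, t.id ASC
          LIMIT :lim OFFSET :off'
    );
    $stmt->bindValue(':lim', $perPage, PDO::PARAM_INT);
    $stmt->bindValue(':off', ($page - 1) * $perPage, PDO::PARAM_INT);
    $stmt->execute();

    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

$rows = pageWithBalances($pdo, 2, 2);
$balances = array_map('intval', array_column($rows, 'balance'));
print_r($balances);
```

This is the overhead described above in concrete form: the subquery runs once per displayed row, so its cost grows with the size of the table. On databases with window functions, SUM(amount) OVER (ORDER BY date, id) expresses the same running total.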

MySql: saving date ranges VS saving single day

I am currently working on a simple booking system and I need to select some ranges and save them to a mysql database.
The problem I am facing is deciding if it's better to save a range, or to save each day separately.
There will be around 500 properties, and each will have from 2 to 5 months booked.
So the client will insert his property and will choose some dates that will be unavailable. The same will happen when someone books a property.
I was thinking of having a separate table for unavailable dates only, so if a property is booked from 10 May to 20 May, instead of having one record (2016-05-10 => 2016-05-20) I will have 10 records, one for each booked day.
I think this is easier to work with when searching between dates, but I am not sure.
Will the performance be noticeably worse?
Should I save the ranges or single days ?
Thank you

I would advise that all "events" go into one table and they all have a start and end datetime. Use of indexes on these fields is of course recommended.
The reasons are that when you are looking for bookings and available events - you are not selecting from two different tables (or joining them). And storing a full range is much better for the code as you can easily perform the checks within a SQL query and all php code to handle events works as standard for both. If you only store one event type differently to another you'll find loads of "if's" in your code and find it harder to write the SQL.
I run many booking systems at present and have made mistakes in this area before so I know this is good advice - and also a good question.
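To illustrate how a stored range can be checked within a single SQL query, here is a small sketch. The events table, its columns, the index and the isAvailable() helper are assumptions for illustration. Ranges are treated as half-open, so a booking may start on the day another one ends.

```php
<?php
// Sketch only: schema and helper name are assumed for illustration.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    property_id INTEGER NOT NULL,
    type TEXT NOT NULL,
    starts_at TEXT NOT NULL,
    ends_at TEXT NOT NULL
)');
$pdo->exec('CREATE INDEX events_property_range ON events (property_id, starts_at, ends_at)');

// Bookings and owner-blocked dates live in the same table.
$pdo->exec("INSERT INTO events (property_id, type, starts_at, ends_at) VALUES
    (7, 'booking', '2016-05-10', '2016-05-20'),
    (7, 'blocked', '2016-06-01', '2016-06-05')");

/**
 * A property is free for [$from, $to) when no stored range overlaps it.
 * Two half-open ranges [a, b) and [c, d) overlap when a < d and c < b.
 */
function isAvailable(PDO $pdo, int $propertyId, string $from, string $to): bool
{
    $stmt = $pdo->prepare(
        'SELECT COUNT(*) FROM events
          WHERE property_id = ? AND starts_at < ? AND ends_at > ?'
    );
    $stmt->execute([$propertyId, $to, $from]);

    return (int) $stmt->fetchColumn() === 0;
}

var_dump(isAvailable($pdo, 7, '2016-05-15', '2016-05-18')); // inside an existing booking
var_dump(isAvailable($pdo, 7, '2016-05-20', '2016-05-25')); // starts the day the booking ends
```

The same two comparisons cover every overlap case (partial, enclosing, enclosed), which is why the range form needs no per-day rows.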

This is too much for a comment, so I will leave it as an answer.
So the table's primary key would be the property_id and the date within a particular month.
I don't recommend it. Think of a scenario where you apply this logic to a system that runs for 5 or 10 years: the performance will get worse. You will get approximately 30 * 12 * 1 = 360 rows for 1 year. Instead, implement logic to calculate the duration of a booking and add it to the table against the user.

Setting up database dates with ORMs

I have been thinking/researching the best way to handle my entity dates in conjunction with an ORM tool. Currently I am using Doctrine2(php 5.3) with a MySQL driver(if anyone needs to know).
So my situation is as follows: I have a system that tracks WorkOrders and their Invoices from collaborating subcontractors. A WorkOrder may have numerous invoices submitted by the same or different subcontractors, which are aggregated for a given pay period. This amount is paid to the subcontractor. My question is: what is the best way to handle fetching invoices that fall into a specific pay period, or any date range for that matter? As an example, I have a table which displays the totals for each subcontractor for each week in a year, but I also display totals for a month etc. In addition, I have a calendar view which displays the same invoices aggregated by day and week.
Currently I pass a date range (fromDate/thruDate) along with a class which is configured to iterate the result set and compose collections based on different criteria, such as the unit of time to aggregate results by and a calculator to handle the totaling of the invoices based on user role and/or invoice type. This way seems to be very flexible so far; however, I am concerned about the performance impact of fetching, say, 10,000 invoices, having Doctrine hydrate the objects, iterating the result set myself, and then iterating again in my view to display. I am thinking I could eliminate my own pass over the result set by looking into a custom hydrator.
I have also been thinking about setting up an entity with each date from the 'date of origin' of the system to a relevant current/future date, with relationships to weeks/months/quarters/years, which would save me the hassle of forming my own collections from the result set. This method seems like it would be nice, especially since when I pass a date range to fetch invoices to display on a calendar, I have to find and pass a fromDate and thruDate which more often than not extend into previous and future months because of how the weeks are totaled. I am beginning to lean more toward this approach, but I have a feeling that when I begin to implement it I will run into problems.
So enough rambling on for now, I'll just ask. Can anyone give me any pointers/tips/lessons learned/ reading material/ etc... on this subject.
Thanks for your time.

One idea may be to hydrate as an array when displaying the data and only hydrate into objects when you need to work with an individual invoice.
Another approach may be to limit the number of entities returned into a paginated list to ensure you have a known maximum number of objects being returned.
Hope that helps

How to calculate percentile rank for point totals over different time spans?

On a PHP & CodeIgniter-based web site, users can earn reputation for various actions, not unlike Stack Overflow. Every time reputation is awarded, a new entry is created in a MySQL table with the user_id, action being rewarded, and value of that bunch of points (e.g. 10 reputation). At the same time, a field in a users table, reputation_total, is updated.
Since all this is sort of meaningless without a frame of reference, I want to show users their percentile rank among all users. For total reputation, that seems easy enough. Let's say my user_id is 1138. Just count the number of users in the users table with a reputation_total less than mine, count the total number of users, and divide to find the percentage of users with a lower reputation than mine. That'll be user 1138's percentile rank, right? Easy!
But I'm also displaying reputation totals over different time spans--e.g., earned in the past seven days, which involves querying the reputation table and summing all my points earned since a given date. I'd also like to show percentile rank for the different time spans--e.g., I may be 11th percentile overall, but 50th percentile this month and 97th percentile today.
It seems I would have to go through and find the reputation totals of all users for the given time span, and then see where I fall within that group, no? Is that not awfully cumbersome? What's the best way to do this?
Many thanks.

I can think of a few options off the top of my head here:
1. As you mentioned, total up the reputation points earned during the time range and calculate the percentile ranks based on that.
2. Track updates to reputation_total on a daily basis - so you have a table with user_id, date, reputation_total.
3. Add some new columns to the user table (reputation_total, reputation_total_today, reputation_total_last30days, etc) for each time range. You could also normalize this into a separate table (reputation_totals) to prevent you from having to add a new column for each time span you want to track.
Option #1 is the easiest, but it's probably going to get slow if you have lots of rows in your reputation transaction table - it won't scale very well, especially if you need to calculate these in real time.
Option #2 is going to require more storage over time (one row per user per day) but would probably be significantly faster than querying the transaction table directly.
Option #3 is less flexible, but would likely be the fastest option.
Both options 2 & 3 would likely require a batch process to calculate the totals on a daily basis, so that's something to consider as well.
I don't think any option is necessarily the best - they all involve different tradeoffs of speed/storage space/complexity/flexibility. What you do will ultimately depend on the requirements for your application of course.
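For reference, a sketch of the first option, computing the percentile rank directly from the transaction table for an arbitrary start date. The reputation and users tables, the created_at column, the sample data and the percentileRank() helper are assumptions for illustration. Users with no reputation in the span count with a total of 0, which is why the query starts from users and LEFT JOINs the reputation rows.

```php
<?php
// Sketch only: schema, data and helper name are assumed for illustration.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY)');
$pdo->exec('CREATE TABLE reputation (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    value INTEGER NOT NULL,
    created_at TEXT NOT NULL
)');
$pdo->exec('INSERT INTO users (id) VALUES (1), (2), (3), (4), (1138)');
$pdo->exec("INSERT INTO reputation (user_id, value, created_at) VALUES
    (1, 10, '2016-05-01'), (1, 5, '2016-05-20'),
    (2, 50, '2016-04-01'), (2, 2, '2016-05-21'),
    (3, 30, '2016-05-22'),
    (1138, 100, '2016-01-01'), (1138, 20, '2016-05-23')");

/**
 * Percentage of users whose total since $since is lower than $userId's.
 */
function percentileRank(PDO $pdo, int $userId, string $since): float
{
    $stmt = $pdo->prepare(
        'SELECT u.id, COALESCE(SUM(r.value), 0) AS total
           FROM users u
           LEFT JOIN reputation r ON r.user_id = u.id AND r.created_at >= ?
          GROUP BY u.id'
    );
    $stmt->execute([$since]);

    $totals = [];
    foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
        $totals[(int) $row['id']] = (int) $row['total'];
    }

    $mine = $totals[$userId];
    $lower = 0;
    foreach ($totals as $total) {
        if ($total < $mine) {
            $lower++;
        }
    }

    return 100.0 * $lower / count($totals);
}

$recent = percentileRank($pdo, 1138, '2016-05-15');
$overall = percentileRank($pdo, 1138, '2000-01-01');
echo $recent, ' ', $overall, PHP_EOL;
```

This reads every qualifying reputation row on each call, so it shows the scaling concern raised for option #1; options #2 and #3 trade that cost for stored totals.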

I don't see why that would be too overly complex. Generally all you would need is to add to your WHERE clause a query that limits results like:
WHERE DatePosted between #StartOfRange and #EndOfRange

Banner Impressions Tracking - Database Design

Looking for some good advice on db design for tracking multiple banner impressions.
IE I have 5 Banners over x Domains
I would like to collect data on how many impressions each banner gets per day, and also be able to do lookups for other date ranges.
Would it be best to store one row per banner per day, or to track each impression as its own row?
Hope you can advise.
And thanks in Advance

I'd recommend to create the most flexible design that would allow you to create new reports as requirements extend in the future. You suggest that the customer wants reports on "impressions per day". What if they come in later and say "what time of the day are impressions shown most at"? How about "when are they clicked on most"?
So the most flexible way to do this is to have 1 record for each impression, where each record is just
banner_id
timestamp
Later on, you can create a stored procedure that aggregates historical data and thus purges HUGE amounts of data that you have accumulated - thus, creating reports on the level of granularity that you care about. I can imagine storing hourly data for a month, and daily data for a year. The stored procs would just write to an archive table:
Banner ID
Time interval identifier (of the month/year for monthly data, or day/month/year for daily data, etc)
Number of impressions
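A sketch of that aggregate-then-purge step. The answer suggests a stored procedure; the version below does the same work from PHP inside a transaction, which is a substitution made so the example runs on its own. The impressions and impressions_daily tables, their columns and the archiveBefore() helper are assumptions for illustration.

```php
<?php
// Sketch only: schema and helper name are assumed for illustration.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE impressions (banner_id INTEGER NOT NULL, shown_at TEXT NOT NULL)');
$pdo->exec('CREATE TABLE impressions_daily (
    banner_id INTEGER NOT NULL,
    day TEXT NOT NULL,
    impressions INTEGER NOT NULL,
    PRIMARY KEY (banner_id, day)
)');
$pdo->exec("INSERT INTO impressions (banner_id, shown_at) VALUES
    (1, '2016-05-01 09:00:00'),
    (1, '2016-05-01 17:30:00'),
    (1, '2016-05-02 08:00:00'),
    (2, '2016-05-01 12:00:00'),
    (1, '2016-05-20 10:00:00')");

/**
 * Rolls raw impressions older than $cutoff up into daily counts,
 * then deletes the raw rows. $cutoff should fall on a day boundary
 * so that no day is archived in two separate runs.
 */
function archiveBefore(PDO $pdo, string $cutoff): void
{
    $pdo->beginTransaction();
    try {
        $rollUp = $pdo->prepare(
            'INSERT INTO impressions_daily (banner_id, day, impressions)
             SELECT banner_id, substr(shown_at, 1, 10), COUNT(*)
               FROM impressions
              WHERE shown_at < ?
              GROUP BY banner_id, substr(shown_at, 1, 10)'
        );
        $rollUp->execute([$cutoff]);

        $purge = $pdo->prepare('DELETE FROM impressions WHERE shown_at < ?');
        $purge->execute([$cutoff]);

        $pdo->commit();
    } catch (Throwable $e) {
        $pdo->rollBack();
        throw $e;
    }
}

archiveBefore($pdo, '2016-05-10 00:00:00');

$archivedRows = (int) $pdo->query('SELECT COUNT(*) FROM impressions_daily')->fetchColumn();
$rawRows = (int) $pdo->query('SELECT COUNT(*) FROM impressions')->fetchColumn();
echo $archivedRows, ' ', $rawRows, PHP_EOL;
```

Running the roll-up and the delete in one transaction keeps a failure from leaving impressions either counted twice or lost.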

Why reinvent the wheel? There are plenty of free ad servers. The most notable one I've heard of is OpenX (used to be phpAdsNew). If nothing else, you can install it and see how they set up their DB.
