Double Entry Accounting pagination issue - php

There is a serious issue with paginating Double Entry Accounting systems. I think it is a common one, but I still haven't found any solution to my problem.
You can use this link to read about simple Double Entry Accounting systems like the one I built with Laravel and AngularJS.
In this system, the expected result (for example) is something like this:
ID In Out Balance
1 100.00 0.00 100.00
2 10.00 0.00 110.00
3 0.00 70.00 40.00
4 5.00 0.00 45.00
5 0.00 60.00 -15.00
6 20.00 0.00 5.00
It is very easy to track the balance with a cumulative function if you are showing all the transactions on one page; the balance on the last transaction is your current balance at the end of the day.
For example, for a specific range of dates $fromDate to $toDate, we do something like:
$balanceYesterday = DB::table('journal')
    ->join('transactions', 'transactions.journal_id', 'journal.id')
    ->where('date', '<', $fromDate)
    ->where('transactions.type', '=', 0) /* 0 means the account of the company */
    ->select(DB::raw('SUM(amount) as total_balance'))
    ->first()->total_balance;
Now that we have yesterday's balance, we depend on it to calculate the balance after every transaction in a cumulative loop until we reach $toDate:
$currentBalance = $currentBalance + $currentTransaction->amount;
$currentTransactionBalance = $currentBalance;
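Put together, the per-page pass looks roughly like this (just a sketch; the $transactions collection and the signed amount column are assumptions about my schema):
$currentBalance = $balanceYesterday; // start from the balance before $fromDate
foreach ($transactions as $currentTransaction) {
    // amount is signed: money in is positive, money out is negative
    $currentBalance = $currentBalance + $currentTransaction->amount;
    $currentTransaction->balance = $currentBalance; // the balance shown next to this row
}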
The real problem starts when you have a large number of transactions and need to paginate them, say 100 per page with $journal = $journal->paginate(100);. The system works as expected for the first page, because we can calculate $balanceYesterday and use it to calculate the new balance after every transaction, up to the end of the 100 transactions on the first page.
The next page, however, doesn't know the balance after the last transaction on the previous page, so it starts again from $balanceYesterday, and the whole table shows wrong calculations.
My first fix was to pass the amount from the last transaction (in the front end) to the next page as a parameter and use it as the starting amount for the next calculation. That was the best solution I had, since I was only using << PREV and NEXT >> buttons, so it was very easy to fix it that way.
But I recently found out that this workaround will not work with numbered pagination: users want to jump between pages to explore the journal, and then it is impossible to know the last balance on an arbitrary page, so the system shows wrong calculations.
What I am trying to do is find a way to calculate the balance at a specific transaction, whether it is a credit or a debit; I'm looking for a way to know what the balance was after a specific transaction was made on a specific date. I DON'T WANT TO ADD A NEW BALANCE COLUMN AND SAVE THE BALANCE IN IT: the user makes a lot of modifications and edits to the transactions from time to time, and that would break everything, because a small change to an amount affects all the balances after it. I also CAN NOT depend on transaction IDs in any method, because transactions might have different random dates, so there will be no ordering by ID, but there might be ordering by other fields like date, account owner, type, or whatever.
I've been scratching my head over this for about four months. I searched online and found no solutions. I hope that after this long explanation my problem is clear, and I hope somebody can help me with a solution, please.
Thank you.

I believe the only thing you really need at this point is to calculate the sum of all transactions from the beginning of the paginated data set (all records, not just the current page's) until one before the first record displayed on the current page.
You can get this by finding the number of transactions that occurred between the start of your entire data set and the current page's transactions, retrieving them via LIMIT, and adding them up.
The first thing you'll want to have is the exact constraints of your pagination query. Since we want to grab another subset of paginated records besides the current page, you want to be sure the results of both queries are in the same order. Reusing the query builder object can help (adjust to match your actual pagination query):
$baseQuery = DB::table('journal')
->join('transactions', 'transactions.journal_id', 'journal.id')
->where('date', '>', $fromDate)
->where('date', '<', $toDate)
->where('transactions.type', "=", 0)
->orderBy('date', 'asc');
// Note that we aren't fetching anything here yet.
Then, fetch the paginated result set. This will perform two queries: one for the total count of records, and a second for the specific page's transactions.
$paginatedTransactions = $baseQuery->paginate(100);
From here, we can determine which records we need to retrieve the previous balance for. The pagination object returned is an instance of LengthAwarePaginator, which knows how many records there are in total, how many pages that makes, which page it's currently on, etc.
Using that information, we just do some math to grab the number of records we need:
total records needed = (current page - 1) * records per page
Assuming the user is on page 5, they will see records 401 - 500, so we need to retrieve the previous 400 records.
// If we're on Page 1, or there are not enough records to
// paginate, we don't need to calculate anything.
if ($paginatedTransactions->onFirstPage() || ! $paginatedTransactions->hasPages()) {
// Don't need to calculate a previous balance. Exit early here!
}
// Use helper methods from the Paginator to calculate
// the number of previous transactions.
$limit = ($paginatedTransactions->currentPage() - 1) * $paginatedTransactions->perPage();
Now that we have the number of transactions that occurred within our data set but before the current page, we can retrieve and calculate the sum by again utilizing the base query:
$previousBalance = $baseQuery->limit($limit)->sum('amount');
One thing worth highlighting: having the database perform the SUM calculation is a big performance win compared to doing it in a PHP loop. Take advantage of the DB as often as you can!
Add this balance to your original "yesterday" balance, and you should have an accurate beginning balance for the paginated transactions.
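Putting the pieces together, the start of any page then looks roughly like this (a sketch reusing the names from the snippets above; the signed amount column is an assumption):
// Balance carried into the current page: everything before $fromDate,
// plus everything inside the date range but before this page.
$runningBalance = $balanceYesterday + $previousBalance;
foreach ($paginatedTransactions as $transaction) {
    $runningBalance += $transaction->amount; // amount assumed signed (in positive, out negative)
    $transaction->balance = $runningBalance; // attach for display on this page
}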
Note: everything pseudo-coded from theory, may need adjustments. Happy to revise if there are questions or issues.

You should be able to state the balance for each record as long as you can tell what the ordering is, because the balance at any point is simply the sum of all amounts up to that point within the ordered list.
Of course this comes with massive overhead, as you need to query the whole table for each record you display, but first of all one must be able to do it at all. As you've shown in your example, you can, as long as you do not paginate.
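As a sketch of what such a per-record query could look like (assuming a signed amount column and an ordering by date with id as a tie-breaker; the table and column names are assumptions, not taken from the question):
SELECT t.id, t.date, t.amount,
       (SELECT SUM(t2.amount)
          FROM transactions t2
         WHERE t2.date < t.date
            OR (t2.date = t.date AND t2.id <= t.id)) AS balance
  FROM transactions t
 ORDER BY t.date, t.id;
On MySQL 8+ the same running sum can be written with a window function, SUM(amount) OVER (ORDER BY date, id), which avoids the correlated subquery.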
What you could do for pagination is to pre-calculate the balance for each record and store it in relation to the original record. This de-normalizes your data, but with the benefit that building the pagination becomes straightforward.

Related

Optimizing a database-driven table display

I'm in the process of revising a PHP page that displays all of our items with various statistics regarding each one. We're looking at a period running from the first of the month one year ago up to yesterday. I've managed to get a basic version of the script working, but it performs poorly. The initial implementation of this script (not my revision) retrieved all sales records and item information at once, then used the resulting records to create objects (not mysql_fetch_objects). These were then stored in an array that used hard-coded values to access the object's attributes. The way this is all set up is fairly confusing and doesn't easily lend itself to reusability. It is, however, significantly faster than my implementation since it only calls to the database once or twice.
My implementation utilizes three calls. The first obtains basic report information needed to create DateTime objects for the report's range (spanning the first of the month twelve months ago up to yesterday's date). This is, however, all it's used for and I really don't think I even need to make this call. The second retrieves all basic information for items included in the report. This comes out to 854 records. The last select statement retrieves all the sales information for these items, and last I checked, this returned over 6000 records.
What I tried to do was select only the records pertinent to the current item in the loop, represented by the following.
foreach ($allItems as $item) {
    // Display properties of item here
    // db_controller is a special class used to handle DB manipulation
    $query = "SELECT * FROM sales_records WHERE ...";
    $result = $db_controller->select($query);
    // Use sales records to calculate report values
}
This is the problem. Calling the database for each and every item is extremely time-consuming and drastically impacts performance. What's returned is simply the sums of quantities sold in each month within the timeframe specified earlier in the script, along with the resulting sales amounts. At most, each item will only have 13 sales records, ranging from 2015/1 to 2016/1 for example.
However, I'm not sure whether performing a single fetch for all these sales records before the loop will help performance, the reason being that I would then have to search through the result array for the first instance of a sales record pertaining to the current item.
What can I do to alleviate this issue? Since this script is important to the company's operations, I want to be sure its performance is just as quick as the old script or, at the very least, only slightly slower. My results are accurate but just painfully slow.
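For what it's worth, the single-fetch variant I'm weighing would look roughly like this (just a sketch; the item_id column, the id property on $item, and the array shape returned by db_controller are assumptions):
// One query for all sales records in the report's range (WHERE clause elided as above).
$query  = "SELECT * FROM sales_records WHERE ...";
$result = $db_controller->select($query);
// Index the rows by item id once, so each item's records become a direct lookup
// instead of a search through the whole result array.
$salesByItem = array();
foreach ($result as $row) {
    $salesByItem[$row['item_id']][] = $row;
}
foreach ($allItems as $item) {
    $itemSales = isset($salesByItem[$item->id]) ? $salesByItem[$item->id] : array();
    // Use $itemSales (at most ~13 rows) to calculate report values, with no further queries.
}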

Caching a large SQL query - best way of structuring?

Let me give an example of my issue. Let's say I have a table called users and a table called payments. To calculate a user's total balance, I'd use a query to get all the payments after a certain date and then cache the result for a while.
However, I was wondering: due to the nature of this, would it be a good idea to have a column in the users table called balance, and then, when the cache expires, use a different query that gathers only the more recent payments and adds that amount onto whatever is in the balance column?
To calculate a user's total balance,
You can create an additional table that always contains the user's current balance. If a new payment is added for the user, that balance needs to be updated, too. Use a transaction so that adding the payment and updating the total balance stay in step.
If you need this more fine-grained, you can, next to the user relation, keep a date column representing the interval you need to be able to do calculations for, e.g. the week or month number, so you can also give a view back into the past.
If you need more flexibility, you can after some time compress existing payments into a total value and store that in such a balance table, related to the user and keeping a date column.
You can then UNION it with the table of the "realtime" payments for the dates not yet compressed/condensed, and use an aggregate function to SUM the total balance. This might give you the best of both worlds: you keep recent data in full detail, and after some time you can move it out of the data store, keeping only the statistical values.
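A sketch of that UNION (using UNION ALL so equal amounts are not collapsed; the table and column names are assumptions):
SELECT SUM(amount) AS total_balance
FROM (
    SELECT amount FROM balance_snapshots
     WHERE user_id = :user_id                  -- condensed totals for past periods
    UNION ALL
    SELECT amount FROM payments
     WHERE user_id = :user_id
       AND paid_at >= :last_snapshot_date      -- realtime payments not yet condensed
) AS combined;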
Generally, with these kinds of "pre-calculated" values, I find the most pain-free way is to store/update them on save of any model that touches the data.
So, in short, update the total balance whenever a new payment is saved. That way you can guarantee that your database and your data will always be in sync.
The pre-calculation can be either a MySQL trigger or a background task with something like Gearman.
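A minimal sketch of the trigger variant (assuming a payments table and a balance column on users; adjust to your schema):
CREATE TRIGGER payments_after_insert
AFTER INSERT ON payments
FOR EACH ROW
    UPDATE users SET balance = balance + NEW.amount WHERE id = NEW.user_id;
A single-statement body like this needs no BEGIN ... END block; updates and deletes of payments would need matching triggers to keep the balance consistent.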
But, as your own question suggested, if you want to do some kind of incremental roll-up of the balance, I would advise going by months or some other fixed date range. This works provided you have no payment backdating or anything like that, where a payment could appear in an old month.
Start of the new month, run a payment aggregator, bam, you now only have to sum the monthly tables.
It all really depends on how much data you have to deal with. But again I stress, data consistency is a lot more valuable than speed, you can always buy more servers.

Too many SQL calls on page load?

I'm constructing a website for a small collection of parents at a private daycare centre. One of the desired functions of the site is a calendar where you can pick which days you can be responsible for cleaning the premises. Now, I have made a working calendar. I found a simple script online that I modified a bit to fit our purpose. Technically, it works well, but I'm starting to wonder if I really should alter the way it extracts information from the database.
The calendar is presented monthly, and drawn as a table using a for-loop. That means that said for-loop is run 28-31 times each time the page is loaded depending on the month. To present who is responsible for cleaning each day, I have added a call to a MySQL database where each member's cleaning day is stored. The pseudo code looks like this, simplified:
Draw table month
for day = start_of_month to day = end_of_month
    type day
    select member from cleaning_schedule where picked_day = day
    type member
This means that each page load makes at least 28 SELECT calls to the database, which seems both inefficient and potentially susceptible to a DDoS attack. Is there a more efficient way of getting the same result? There are much more complex booking calendars out there; how do they handle it?
SELECT picked_day, member FROM cleaning_schedule WHERE picked_day BETWEEN '2012-05-01' AND '2012-05-31' ORDER BY picked_day ASC
You can loop through the results of that query, each row will have a date and a person from the range you picked, in order of ascending dates.
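A sketch of how that could feed the existing calendar loop (assuming PDO and that picked_day is a DATE column; the names are assumptions):
$stmt = $pdo->prepare(
    "SELECT picked_day, member FROM cleaning_schedule
     WHERE picked_day BETWEEN ? AND ? ORDER BY picked_day ASC"
);
$stmt->execute(array('2012-05-01', '2012-05-31'));
// Key the members by day so the day loop only does an array lookup, no queries.
$membersByDay = array();
foreach ($stmt as $row) {
    $membersByDay[$row['picked_day']][] = $row['member'];
}
// Inside the for-loop that draws the month:
// echo implode(', ', isset($membersByDay[$day]) ? $membersByDay[$day] : array());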
The MySQL query cache will save your bacon.
Short version: if you repeat the same SQL query often, it will end up being served without table access, as long as the underlying tables have not changed. So the first call for a month will be roughly 35 SQL queries, which is a lot but not too much. The second load of the same page will give back the results blazingly fast, straight from the cache.
My experience is that this tends to be much faster than creating fancy join queries, even where that would be possible.
Not that 28 calls is a big deal, but I would use a join and pull the entire month's data in one hit. You can then iterate through the MySQL query result as if it were an array.
You can use greater-than and less-than comparisons in SQL. So instead of doing one select per day, you can write one select for the entire month:
SELECT day, member FROM cleaning_schedule
WHERE day >= :first_day_of_month AND day <= :last_day_of_month
ORDER BY day;
Then you need to take care in your program to handle multiple members per day. Although the program logic will be a bit more complex, the program will be faster: the inter-process or even network-based communication is a lot slower than the additional logic.
Depending on the data structure, the following statement might be possible and more convenient:
SELECT day, group_concat(member) FROM cleaning_schedule
WHERE day >= :first_day_of_month AND day <= :last_day_of_month
GROUP BY day
ORDER BY day;
28 queries isn't a massive issue and is pretty common for most commercial websites, but I'd recommend just grabbing the whole month's data in one hit. Then just loop through the records day by day.

Storing and Displaying Live Stats

Say we are a site receiving massive amounts of traffic, Amazon.com size traffic. And say we wanted to display a counter on the home page displaying the total number of sales since December the first and the counter was to refresh via ajax every 10 seconds.
How would we go about doing this?
Would we have a summary database table holding the total sales, where each checkout adds 1 to the counter, and we fetch that number every 10 seconds? Would we COUNT() the entire 'sales' table every 10 seconds? Is there an external API I can push the stats to and then do an Ajax pull from?
Hope you can help, Thanks
If your site is ecomm based, in that you are conducting sales, then you MUST have a sales tracking table somewhere. You could simply make the database count part of the page render when a user visits or refreshes your site.
IMO, there is no need to ajax this count as most visitors won't really care.
Also, I would recommend this query be run against a readonly (slave) database if your traffic is truly at amazon levels.
I would put triggers on the tables to maintain the counter tables. When a new sale is inserted, the sum table gets the new value added to the row for the current day. That also gives you sales per day historically, without actually querying the big table.
It also allows orders to be entered manually for dates other than today, with that day's statistics being updated.
As for the Ajax part, that's just going to be a query against that sum table.
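A sketch of that trigger-plus-sum-table setup (the table and column names are assumptions):
CREATE TABLE daily_sales_totals (
    sale_date  DATE PRIMARY KEY,
    sale_count INT UNSIGNED NOT NULL DEFAULT 0
);
CREATE TRIGGER sales_after_insert
AFTER INSERT ON sales
FOR EACH ROW
    INSERT INTO daily_sales_totals (sale_date, sale_count)
         VALUES (DATE(NEW.created_at), 1)
    ON DUPLICATE KEY UPDATE sale_count = sale_count + 1;
The counter the page polls is then a cheap query like SELECT SUM(sale_count) FROM daily_sales_totals WHERE sale_date >= :counter_start_date, with the placeholder set to the "December the first" cut-off from the question.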
Whatever you do, do not re-COUNT everything every 10 seconds. Why not have a cron job that does the counting every 10 seconds? It could take the current time minus 10 seconds and, on the slave database, add the difference to the current count.
Still, 10 seconds sounds excessive. Every minute, maybe?

Popularity Algorithm

I'd like to populate the homepage of my user-submitted-illustrations site with the "hottest" illustrations uploaded.
Here are the measures I have available:
How many people have favourited that illustration (the votes table includes the date voted)
When the illustration was uploaded (the illustration table has the date created)
Number of comments, though this is not so useful since the maximum is only about 10 comments total at the moment (the comments table has the comment date)
I have searched around, but I don't want user authority to play a part, and most algorithms include that.
I also need to find out whether it's better to do the calculation in the MySQL query that fetches the data, or whether there should be a PHP/cron job every hour or so.
I only need 20 illustrations to populate the home page. I don't need any sort of paging for this data.
How do I weight age against votes? Surely a site with fewer submissions needs less weight on the date added?
Many sites that use some type of popularity ranking do so by using a standard algorithm to determine a score and then decaying eternally over time. What I've found works better for sites with less traffic is a multiplier that gives a bonus to new content/activity - it's essentially the same, but the score stops changing after a period of time of your choosing.
For instance, here's a pseudo-example of something you might want to try. Of course, you'll want to adjust how much weight you're attributing to each category based on your own experience with your site. Comments are rare, but take more effort from the user than a favorite/vote, so they probably should receive more weight.
score = (votes / 10) + comments
age = UNIX_TIMESTAMP() - UNIX_TIMESTAMP(date_created)
if(age < 86400) score = score * 1.5
This type of approach would give a bonus to new content uploaded in the past day. If you wanted to approach this in a similar way only for content that had been favorited or commented on recently, you could just add some WHERE constraints on your query that grabs the score out from the DB.
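As a sketch of pulling such a score straight from the DB, including the one-day bonus (assuming illustrations, votes, and comments tables keyed by illustration_id; every name here is an assumption):
SELECT i.id,
       (COALESCE(v.vote_count, 0) / 10 + COALESCE(c.comment_count, 0))
           * IF(i.date_created >= NOW() - INTERVAL 1 DAY, 1.5, 1) AS score
  FROM illustrations i
  LEFT JOIN (SELECT illustration_id, COUNT(*) AS vote_count
               FROM votes GROUP BY illustration_id) v ON v.illustration_id = i.id
  LEFT JOIN (SELECT illustration_id, COUNT(*) AS comment_count
               FROM comments GROUP BY illustration_id) c ON c.illustration_id = i.id
 ORDER BY score DESC
 LIMIT 20;
Adding WHERE constraints on the vote and comment dates, as described above, would shift the bonus towards recent activity rather than recent uploads.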
There are actually two big reasons NOT to calculate this ranking on the fly.
Requiring your DB to fetch all of that data and do a calculation on every page load just to reorder items results in an expensive query.
Probably a smaller gotcha, but if you have a relatively small amount of activity on the site, small changes in the ranking can cause content to move pretty drastically.
That leaves you with either caching the results periodically or setting up a cron job to update a new database column holding this score you're ranking by.
Obviously there is some subjectivity in this - there's no one "correct" algorithm for determining the proper balance - but I'd start out with something like votes per unit age. MySQL can do basic math so you can ask it to sort by the quotient of votes over time; however, for performance reasons, it might be a good idea to cache the result of the query. Maybe something like
SELECT images.url FROM images ORDER BY (SELECT COUNT(*) FROM votes WHERE votes.image_id = images.id) / TIMESTAMPDIFF(SECOND, images.date, NOW()) DESC LIMIT 20
but my SQL is rusty ;-)
Taking a simple average will, of course, bias in favor of new images showing up on the front page. If you want to remove that bias, you could, say, count only those votes that occurred within a certain time limit after the image was posted. For images more recent than that time limit, you'd have to normalize by multiplying the number of votes by the time limit and then dividing by the age of the image. Alternatively, you could give the votes a continuously varying weight, something like exp(-time(vote) + time(image)). And so on... depending on how particular you are about what this algorithm should do, it could take some experimentation to figure out which formula gives the best results.
I've no useful ideas as far as the actual agorithm is concerned, but in terms of implementation, I'd suggest caching the result somewhere, with a periodic update - if the resulting computation results in an expensive query, you probably don't want to slow your response times.
Something like:
(count favorited + k) / time since last activity
The higher k is, the less weight the number of people who favourited it carries.
You could also change the time to something like the time it first appeared plus the time of the last activity; this would ensure that older illustrations vanish over time.
