Optimizing a database-driven table display - php

I'm revising a PHP page that displays all of our items with various statistics about each one, covering the period from the first of the month one year ago up to yesterday. I have a basic version of the script working, but it performs poorly. The original implementation (not my revision) retrieved all sales records and item information at once, then used the resulting records to build objects (not via mysql_fetch_object) that were stored in an array using hard-coded values to access each object's attributes. That setup is fairly confusing and doesn't lend itself to reuse. It is, however, significantly faster than my implementation, since it only calls the database once or twice.
My implementation makes three calls. The first obtains basic report information used to create DateTime objects for the report's range (the first of the month twelve months ago up to yesterday). That is all it's used for, though, and I don't think I even need this call. The second retrieves the basic information for every item included in the report: 854 records. The last retrieves all the sales information for those items, which last I checked returned over 6,000 records.
What I tried to do was select only the records pertinent to the current item in the loop, as shown below.
foreach ($allItems as $item) {
    // Display properties of the item here.
    // $db_controller is a special class used to handle DB manipulation.
    $query = "SELECT * FROM sales_records WHERE ...";
    $result = $db_controller->select($query);
    // Use the sales records to calculate the report values.
}
This is the problem: querying the database for each and every item is extremely time-consuming and drastically hurts performance. What's returned is simply the sum of quantities sold in each month of the timeframe specified earlier in the script, along with the resulting sales amounts; at most, each item has 13 sales records (ranging from 2015/1 to 2016/1, for example). However, I'm not sure that performing a single fetch of all these sales records before the loop will help performance, because I would then have to search the result array for the first sales record pertaining to the current item. What can I do to alleviate this? Since this script is important to the company's operations, I want it to be just as fast as the old script, or at the very least only slightly slower. My results are accurate, just painfully slow.
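For what it's worth, a common pattern here is to fetch all the monthly sums in one grouped query and index them by item ID before the loop, so each item's lookup becomes a cheap array access instead of a query. A minimal sketch, assuming column names like item_id, sale_date, quantity, and amount, that select() returns associative arrays, and that $rangeStart/$rangeEnd hold the report's date range (none of this is confirmed by the question):

// One grouped query for every item's monthly sums (column names assumed).
$query = "SELECT item_id,
                 DATE_FORMAT(sale_date, '%Y/%c') AS sale_month,
                 SUM(quantity) AS total_qty,
                 SUM(amount)   AS total_amount
          FROM sales_records
          WHERE sale_date BETWEEN '$rangeStart' AND '$rangeEnd'
          GROUP BY item_id, sale_month";
$rows = $db_controller->select($query);

// Index the sums by item ID so the display loop never touches the DB.
$salesByItem = array();
foreach ($rows as $row) {
    $salesByItem[$row['item_id']][$row['sale_month']] = $row;
}

foreach ($allItems as $item) {
    $sums = isset($salesByItem[$item->id]) ? $salesByItem[$item->id] : array();
    // Use $sums (at most 13 entries per item) to calculate the report values.
}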

Related

Double Entry Accounting pagination issue

There is a serious issue with Double Entry Accounting systems and pagination. I think it is common, but I haven't found any solution to my problem yet.
You can use this link to read about the simple Double Entry Accounting systems just like the one I made with Laravel and AngularJS.
In this system, the expected result (for example) is something like this:
ID In Out Balance
1 100.00 0.00 100.00
2 10.00 0.00 110.00
3 0.00 70.00 40.00
4 5.00 0.00 45.00
5 0.00 60.00 -15.00
6 20.00 0.00 5.00
Tracking the balance in a cumulative function is easy if you show all the transactions on one page: the balance after the last transaction is your current balance at the end of the day.
For example, for a specific range of dates from $fromDate to $toDate, we do something like:
$balanceYesterday = DB::table('journal')
    ->join('transactions', 'transactions.journal_id', 'journal.id')
    ->where('date', '<', $fromDate)
    ->where('transactions.type', '=', 0) /* 0 means the account of the company */
    ->select(DB::raw('SUM(amount) as total_balance'))
    ->first()
    ->total_balance;
Now that we have yesterday's balance, we depend on it to calculate the balance after each transaction in a cumulative loop, continuing until we reach $toDate:
// For each transaction, in display order:
$currentBalance = $currentBalance + $currentTransaction->amount;
$currentTransactionBalance = $currentBalance; // balance shown on this row
The real problem starts when you have a large number of transactions and need to paginate them ($journal = $journal->paginate(100);, say 100 transactions per page). The system works as expected for the first page, since we can calculate $balanceYesterday and depend on it to compute the new balance after every transaction through the end of that page's 100 transactions.
The next page, however, doesn't know the balance after the last transaction on the first page, so it starts again from $balanceYesterday, making the whole table show wrong calculations.
My first fix was to transfer the last transaction's balance (in the front end) to the next page as a parameter and use it as the starting amount for the next round of calculations. That was the best solution I had while I was using only << PREV and NEXT >> buttons, so it was easy to fix that way.
But I recently realized this workaround won't survive numbered pagination: the user may jump to an arbitrary page to explore the journal, and then it is impossible to know the last balance at that page, so the system shows wrong calculations.
What I am trying to do is find a way to calculate the balance at a specific transaction, whether it was a credit or a debit: how much the balance was after a specific transaction on a specific date. I do not want to add a balance column and store the balance in it; users make many modifications and edits to transactions over time, and a small change to one amount would invalidate every stored balance after it. Nor can I rely on transaction IDs for ordering, because transactions may carry arbitrary dates, so the journal may be ordered by date, account owner, type, or other fields rather than by ID.
I've been scratching my head over this for about four months, and my searches online turned up no solutions. I hope this long explanation makes the problem clear, and that somebody can help me with a solution.
Thank you.
I believe the only thing you really need at this point is to calculate the sum of all transactions from the beginning of the paginated data set (all records, not just the current page's) until one before the first record displayed on the current page.
You can get this by finding the number of transactions that occurred between the start of your entire data set and the current page's transactions, retrieving them via LIMIT, and adding them up.
The first thing you'll want is the exact set of constraints from your pagination query. Since we will grab a second subset of the paginated records besides the current page, the results of both queries must be in the same order. Reusing the query builder object helps (adjust to match your actual pagination query):
$baseQuery = DB::table('journal')
    ->join('transactions', 'transactions.journal_id', 'journal.id')
    ->where('date', '>', $fromDate)
    ->where('date', '<', $toDate)
    ->where('transactions.type', '=', 0)
    ->orderBy('date', 'asc');
// Note that we aren't fetching anything here yet.
Then, fetch the paginated result set. This will perform two queries: one for the total count of records, and a second for the specific page's transactions.
$paginatedTransactions = $baseQuery->paginate(100);
From here, we can determine which records we need for the previous balance. The pagination object returned is an instance of LengthAwarePaginator, which knows the total number of records, the number of pages, which page it's on, and so on.
Using that information, we just do some math to grab the number of records we need:
total records needed = (current page - 1) * records per page
Assuming the user is on page 5, they will see records 401 - 500, so we need to retrieve the previous 400 records.
// If we're on Page 1, or there are not enough records to
// paginate, we don't need to calculate anything.
if ($paginatedTransactions->onFirstPage() || ! $paginatedTransactions->hasPages()) {
// Don't need to calculate a previous balance. Exit early here!
}
// Use helper methods from the Paginator to calculate
// the number of previous transactions.
$limit = ($paginatedTransactions->currentPage() - 1) * $paginatedTransactions->perPage();
Now that we have the number of transactions that occurred within our data set but before the current page, we can retrieve and calculate the sum by again utilizing the base query:
$previousBalance = $baseQuery->limit($limit)->sum('amount');
One point worth highlighting: having the database perform the SUM calculation is a big performance win over doing it in a PHP loop. Take advantage of the DB as often as you can!
Add this balance to your original "yesterday" balance, and you should have an accurate beginning balance for the paginated transactions.
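Putting the pieces together, the page's starting balance and running totals might look like this (still pseudo-coded from the steps above; $balanceYesterday is the question's pre-range sum, and the balance property is just an ad-hoc field for display):

// Balance before the current page = balance before the date range
// plus the sum of in-range transactions on all earlier pages.
$runningBalance = $balanceYesterday + $previousBalance;

foreach ($paginatedTransactions as $transaction) {
    $runningBalance += $transaction->amount;
    $transaction->balance = $runningBalance; // shown in the Balance column
}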
Note: everything pseudo-coded from theory, may need adjustments. Happy to revise if there are questions or issues.
You should be able to compute a correct balance for each record as long as you can tell what the ordering is, since that lets you calculate the running sum up to each point within the ordered list.
This certainly comes with massive overhead, as you'd need to query the whole table for each record you display, but it must at least be possible. As your example shows, it is, as long as you do not paginate.
What you could do for pagination is pre-calculate the balance for each record and store it in relation to the original record. This de-normalizes your data, but with the benefit that building the pagination becomes straightforward.
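As a sketch of that de-normalized approach (table and column names assumed; note this is the trade-off the question's author explicitly wants to avoid, so treat it as an option, not a recommendation), you would rebuild the stored running balance in display order after any edit:

// Rebuild stored balances in the journal's display order after an edit.
// chunk() pages through the rows; the closure keeps the running total.
$running = 0;
DB::table('transactions')
    ->orderBy('date')->orderBy('id')
    ->chunk(500, function ($rows) use (&$running) {
        foreach ($rows as $row) {
            $running += $row->amount;
            DB::table('transactions')
                ->where('id', $row->id)
                ->update(['balance' => $running]);
        }
    });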

How to get the time difference in seconds between two db column status (or events)

I am trying to get the time difference in seconds between two events in my database. The users table has various columns, but I'm working with the status column: when a user places an order on our website the status column shows "pending", and once it is confirmed by our agents it switches to "success". I'm trying to get the time difference (in seconds) between when it shows pending and when it shows success.
NB: I'll be glad if anyone can explain the time() function in PHP with an example.
You can use MySQL's unix_timestamp() function. Assuming your table has two records for the two events and a column called eventTime, the two queries below will give you the respective number of seconds since the Epoch; subtract the former from the latter to get the time difference.
select unix_timestamp(eventTime) ... where status='pending'
select unix_timestamp(eventTime) ... where status='success'
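A sketch of the PHP side (PDO and the table/column names are assumed), which also illustrates time(): it simply returns the current Unix timestamp as an integer, so all of these values subtract cleanly.

// Assumed: one row per status event, with an eventTime column.
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

$pendingTs = $pdo->query("SELECT unix_timestamp(eventTime)
                          FROM orders WHERE status = 'pending'")->fetchColumn();
$successTs = $pdo->query("SELECT unix_timestamp(eventTime)
                          FROM orders WHERE status = 'success'")->fetchColumn();

echo $successTs - $pendingTs; // seconds between the two events

// time() returns "now" as seconds since 1970-01-01 00:00:00 UTC:
$secondsSincePending = time() - $pendingTs;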
Update
After re-reading your question, I guess your DB design keeps only one row for the whole life cycle of the transaction (from pending to success). In that case, if all three parties involved (the agent who updates the status to pending, the agent who updates it to success, and the agent who needs the time difference between the two events) are the same thread, you can keep the two event times in memory and simply compute the difference.
However, I think it is more likely that the three parties are two or three different threads. In this case, I think you must have some mechanism to pass the knowledge (of the first event time) from one thread to another. This can be done by way of adding a new column called lastUpdateTime, or by adding a new table for the purpose of time tracking.
By the way, if you use the second approach, a MySQL trigger may be useful: whenever the main table is updated, the trigger runs another command that updates the second table, which exists solely to keep track of event times and elapsed time. This approach lets you leave the original table unchanged and just add a new one.
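For illustration, a trigger along those lines might look like this (a sketch; the users table and status column come from the question, while the status_log table is a hypothetical addition):

// Log every status update into a separate tracking table, leaving the
// original users table unchanged. The trigger body is a single statement,
// so no DELIMITER juggling is needed when run through PDO.
$pdo->exec("
    CREATE TRIGGER log_status_change
    AFTER UPDATE ON users
    FOR EACH ROW
    INSERT INTO status_log (user_id, status, changed_at)
    VALUES (NEW.id, NEW.status, NOW())
");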

Too many SQL calls on page load?

I'm building a website for a small group of parents at a private daycare centre. One of the desired functions is a calendar where you can pick which days you can be responsible for cleaning the premises. I have a working calendar based on a simple script I found online and modified a bit to fit our purpose. Technically it works well, but I'm starting to wonder if I should change the way it pulls information from the database.
The calendar is presented monthly, and drawn as a table using a for-loop. That means that said for-loop is run 28-31 times each time the page is loaded depending on the month. To present who is responsible for cleaning each day, I have added a call to a MySQL database where each member's cleaning day is stored. The pseudo code looks like this, simplified:
draw table for month
for day = start_of_month to end_of_month
    print day
    select member from cleaning_schedule where picked_day = day
    print member
This means that each page load makes at least 28 SELECT calls to the database, which seems both inefficient and possibly susceptible to a DDoS attack. Is there a more efficient way of getting the same result? There are far more complex booking calendars out there; how do they handle it?
SELECT picked_day, member FROM cleaning_schedule WHERE picked_day BETWEEN '2012-05-01' AND '2012-05-31' ORDER BY picked_day ASC
You can loop through the results of that query, each row will have a date and a person from the range you picked, in order of ascending dates.
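In PHP that might look like the following (a sketch; mysqli, the $daysOfMonth list, and the exact column names are assumptions): index the rows by date once, and the calendar loop never queries the database.

// One query for the whole month, indexed by date (mysqli assumed).
$result = $mysqli->query(
    "SELECT picked_day, member FROM cleaning_schedule
     WHERE picked_day BETWEEN '2012-05-01' AND '2012-05-31'
     ORDER BY picked_day ASC"
);

$byDay = array();
while ($row = $result->fetch_assoc()) {
    $byDay[$row['picked_day']][] = $row['member'];
}

// Calendar loop: pure array lookups, no SQL.
foreach ($daysOfMonth as $day) { // e.g. '2012-05-07'
    $members = isset($byDay[$day]) ? $byDay[$day] : array();
    echo $day . ': ' . implode(', ', $members) . "<br>";
}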
The MySQL query cache will save your bacon.
Short version: if you repeat the same SQL query often, it will be served without table access as long as the underlying tables have not changed. So the first load of a month's page will run roughly 35 SQL queries, which is a lot but not too many; a second load of the same page will return its results blazingly fast from the cache.
In my experience this tends to be much faster than crafting fancy join queries, even when those are possible.
Not that 28 calls is a big deal, but I would use a join and pull the entire month's data in one hit. You can then iterate through the MySQL query result as if it were an array.
You can use greater-than and less-than comparisons in SQL. So instead of one select per day, you can write one select for the entire month:
SELECT day, member FROM cleaning_schedule
WHERE day >= :first_day_of_month AND day <= :last_day_of_month
ORDER BY day;
Then your program needs to handle multiple members per day. Although that makes the logic a bit more complex, the program will be faster: interprocess or even network-based communication is far slower than the additional logic.
Depending on the data structure, the following statement might be possible and more convenient:
SELECT day, group_concat(member) FROM cleaning_schedule
WHERE day >= :first_day_of_month AND day <= :last_day_of_month
GROUP BY day
ORDER BY day;
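With the GROUP BY variant, each row carries all of a day's members as one comma-separated string, which you might unpack like this (a sketch, assuming the aggregate is aliased AS members in the query):

// One row per day; group_concat packs the members into a single string.
while ($row = $result->fetch_assoc()) {
    $members = explode(',', $row['members']);
    // Render $row['day'] with its list of $members.
}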
28 queries isn't a massive issue and is pretty common for commercial websites, but I'd still recommend grabbing the whole month's data in one hit and then looping through the records day by day.

Setting up database dates with ORMs

I have been thinking about and researching the best way to handle my entity dates in conjunction with an ORM tool. Currently I am using Doctrine2 (PHP 5.3) with a MySQL driver, if anyone needs to know.
My situation is as follows: I have a system that tracks WorkOrders and the Invoices submitted for them by collaborating subcontractors. A WorkOrder may have numerous invoices, submitted by the same or different subcontractors, which are aggregated for a given pay period; that aggregate amount is paid to the subcontractor. My question is: what is the best way to fetch invoices that fall into a specific pay period, or any date range for that matter? As an example, I have a table which displays the totals for each subcontractor for each week in a year, but I also display totals per month, and so on. In addition I have a calendar view which displays the same invoices aggregated by day and week.
Currently I pass a date range (fromDate/thruDate) along with a class configured to iterate the result set and compose collections based on different criteria, such as the unit of time to aggregate by, plus a calculator that totals the invoices based on user role and/or invoice type. This has been very flexible so far, but I'm concerned about the performance impact of fetching, say, 10,000 invoices, having Doctrine hydrate the objects, iterating the result set, and then iterating again in my view to display it. I could probably eliminate one iteration step by looking into a custom hydrator.
I have also been thinking about creating an entity for each date from the system's 'date of origin' up to a relevant current/future date, with relationships to weeks/months/quarters/years, which would save me the hassle of building my own collections from the result set. This seems attractive, especially since when I pass a date range to fetch invoices for the calendar, the fromDate and thruDate often extend into previous and future months because of how the weeks are totaled. I am leaning toward this approach, but I have a feeling I will run into problems when I begin to implement it.
So, enough rambling; I'll just ask: can anyone give me any pointers, tips, lessons learned, or reading material on this subject?
Thanks for your time.
One idea may be to hydrate as an array when displaying the data and only hydrate into objects when you need to work with an individual invoice.
Another approach may be to limit the number of entities returned to a paginated list, ensuring a known maximum number of objects is returned.
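In Doctrine2, that split between display and editing might look like this (a sketch; the entity and field names are assumptions):

// Display path: array hydration skips building thousands of entity objects.
$invoices = $entityManager->createQuery(
        'SELECT i FROM Entities\Invoice i
         WHERE i.invoiceDate BETWEEN :fromDate AND :thruDate'
    )
    ->setParameter('fromDate', $fromDate)
    ->setParameter('thruDate', $thruDate)
    ->getArrayResult();

// Edit path: hydrate a single invoice as a managed object.
$invoice = $entityManager->find('Entities\Invoice', $invoiceId);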
Hope that helps

datewise records sorting/fetching from a large MySQL database

I have a separate table for each day's data, which is basically webstats-type data: keywords, visits, duration, IP, sale, etc. (maybe 100 bytes total per record).
Each table has around a couple of million records.
What I need is a web admin where the user/admin can view reports for different date periods, sorted by certain calculated values. For example, the user may want the results from the 15th of last month to the 12th of this month, sorted by SALE/VISIT in descending order.
The admin/user only needs to view (say) the top 200 records at a time, and will probably not view more than a few hundred in total in any one session.
Because of the arbitrary date period involved, I need to sum up the relevant columns for each record and only then can the selection be done.
My question is whether the reports can be produced in real time, or whether they would be too slow. (The tables are rarely, if ever, updated after the day's data has been inserted.)
Is such a scenario better suited to indexes or table scans?
Also, would one massive table for all dates be better than separate tables for each date? (There are almost no joins.)
thanks in advance!
With a separate table for each day's data, summarizing across a month is going to involve doing the same analysis on each of 30-odd tables. Over a year, you will have to do the analysis on 365 or so tables. That's going to be a nightmare.
It would almost certainly be better to have a soundly indexed single table than that huge number of tables. Some DBMSs support fragmented tables; in MySQL the feature is called partitioning, so partition the single big table by date. I would be inclined to partition by month, especially if the normal queries cover one month or less and do not cross month boundaries. (Even if a query involves two months, with decent partition elimination the query engine won't have to read most of the data, just the two partitions for the two months, and it might even be able to scan those in parallel, depending on the DBMS.)
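A sketch of that single-table layout using MySQL range partitioning (column names are guessed from the question's description, and you'd generate one partition per month):

// One soundly indexed table, partitioned by month. The query engine can
// prune to just the partitions a date range actually touches.
$pdo->exec("
    CREATE TABLE webstats (
        stat_date DATE NOT NULL,
        keyword   VARCHAR(100),
        visits    INT,
        duration  INT,
        ip        VARCHAR(45),
        sale      DECIMAL(10,2),
        KEY idx_date (stat_date)
    )
    PARTITION BY RANGE (TO_DAYS(stat_date)) (
        PARTITION p201201 VALUES LESS THAN (TO_DAYS('2012-02-01')),
        PARTITION p201202 VALUES LESS THAN (TO_DAYS('2012-03-01')),
        PARTITION pmax    VALUES LESS THAN MAXVALUE
    )
");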
Sometimes it is quicker to do a sequential scan of a table than indexed lookups; don't assume that a query plan involving a table scan will automatically perform badly.
You may want to try a different approach. I think Splunk would work for you; it was designed for this (they even run ads on this site), and there is a free version you can try.
