I have been thinking about and researching the best way to handle my entity dates in conjunction with an ORM tool. Currently I am using Doctrine2 (PHP 5.3) with a MySQL driver, if anyone needs to know.
So my situation is as follows: I have a system that tracks WorkOrders and their Invoices from collaborating subcontractors. A WorkOrder may have numerous invoices, submitted by the same or different subcontractors, which are aggregated for a given pay period; that amount is paid to the subcontractor. My question is: what is the best way to fetch invoices that fall into a specific pay period, or any date range for that matter? As an example, I have a table which displays the totals for each subcontractor for each week in a year, but I also display totals for a month, etc. In addition I have a calendar view which displays the same invoices aggregated by day and week.
Currently I pass a date range (fromDate/thruDate) along with a class which is configured to iterate the result set and compose collections based on different criteria, such as the unit of time to aggregate results by and a calculator to handle the totaling of the invoices based on user role and/or invoice type. This approach seems very flexible so far; however, I am concerned about the performance impact of fetching, say, 10,000 invoices, having Doctrine hydrate the objects, iterating the result set myself, and then iterating again in my view to display. I am thinking I could eliminate my own iteration step by looking into a custom hydrator.
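For reference, I believe wiring up a custom hydration mode looks something like this - the class name and the grouping logic below are just my invention, and the exact hook method depends on the Doctrine 2 version:

use Doctrine\ORM\Internal\Hydration\AbstractHydrator;

// Invented example: group raw invoice rows into weekly totals instead of
// hydrating full entities. In Doctrine >= 2.3 the hook is hydrateAllData();
// earlier 2.x releases named it _hydrateAll().
class WeeklyTotalsHydrator extends AbstractHydrator
{
    protected function hydrateAllData()
    {
        $totals = array();
        foreach ($this->_stmt->fetchAll(\PDO::FETCH_ASSOC) as $row) {
            $week = date('o-\WW', strtotime($row['invoice_date']));
            if (!isset($totals[$week])) {
                $totals[$week] = 0;
            }
            $totals[$week] += $row['amount'];
        }
        return $totals;
    }
}

// Registration and use:
$em->getConfiguration()->addCustomHydrationMode('weeklyTotals', 'WeeklyTotalsHydrator');
$totals = $query->getResult('weeklyTotals');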
I have also been thinking about setting up an entity with each date from the 'date of origin' of the system to a relevant current/future date, with relationships to weeks/months/quarters/years, which would save me the hassle of forming my own collections from the result set. This method seems like it would be nice, especially since, when I pass a date range to fetch invoices to display on a calendar, I have to find and pass fromDates and thruDates which more often than not extend into the previous and following months because of how the weeks are totaled. I am beginning to lean toward this approach, but I have a feeling that I will run into problems once I begin to implement it.
So, enough rambling for now; I'll just ask: can anyone give me any pointers, tips, lessons learned, reading material, etc. on this subject?
Thanks for your time.
One idea may be to hydrate as an array when displaying the data and only hydrate into objects when you need to work with an individual invoice.
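In Doctrine 2 that might look roughly like this (the entity and field names are placeholders for illustration):

// Fetch invoices as plain arrays - no entity hydration overhead.
// "Invoice" and "invoiceDate" are placeholder names.
$query = $em->createQuery(
    'SELECT i FROM Invoice i WHERE i.invoiceDate BETWEEN :fromDate AND :thruDate'
);
$query->setParameter('fromDate', $fromDate)
      ->setParameter('thruDate', $thruDate);
$rows = $query->getArrayResult(); // same as getResult(Query::HYDRATE_ARRAY)

// Later, when a single invoice needs real behaviour, load just that entity:
$invoice = $em->find('Invoice', $rows[0]['id']);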
Another approach may be to limit the number of entities returned by using a paginated list, so you have a known maximum number of objects coming back.
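A hedged sketch of that, again with placeholder names (newer Doctrine 2.x releases also ship a Paginator in Doctrine\ORM\Tools\Pagination that handles fetch-joined queries correctly):

// Cap the result set: fetch one "page" of invoices at a time.
$query = $em->createQuery(
    'SELECT i FROM Invoice i WHERE i.invoiceDate BETWEEN :fromDate AND :thruDate
     ORDER BY i.invoiceDate'
);
$query->setParameter('fromDate', $fromDate)
      ->setParameter('thruDate', $thruDate)
      ->setFirstResult($page * $pageSize)  // offset into the full result
      ->setMaxResults($pageSize);          // known maximum per page
$invoices = $query->getResult();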
Hope that helps
I am currently working on a simple booking system and I need to select some ranges and save them to a MySQL database.
The problem I am facing is deciding if it's better to save a range, or to save each day separately.
There will be around 500 properties, and each will have from 2 to 5 months booked.
So the client will insert his property and will chose some dates that will be unavailable. The same will happen when someone books a property.
I was thinking of having a separate table for unavailable dates only, so if a property is booked from 10 May to 20 May, instead of having one record (2016-05-10 => 2016-05-20) I will have 10 records, one for each booked day.
I think this is easier to work with when searching between dates, but I am not sure.
Will the performance be noticeably worse?
Should I save the ranges or single days ?
Thank you
I would advise that all "events" go into one table and they all have a start and end datetime. Use of indexes on these fields is of course recommended.
The reasons: when you are looking for bookings and available events, you are not selecting from two different tables (or joining them). And storing the full range is much better for the code, as you can easily perform the checks within a SQL query, and all the PHP code that handles events works the same way for both. If you store one event type differently from another, you'll find loads of "if"s in your code and find it harder to write the SQL.
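For example, an availability check becomes a single overlap test (table and column names here are made up for illustration):

-- Any existing event that overlaps the requested range blocks the booking.
-- Two ranges overlap exactly when each one starts before the other ends.
SELECT COUNT(*) AS conflicts
FROM events
WHERE property_id    = :property_id
  AND start_datetime < :requested_end
  AND end_datetime   > :requested_start;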
I run many booking systems at present and have made mistakes in this area before so I know this is good advice - and also a good question.
This is too much for a comment, so I will leave it as an answer.
So the table's primary key would be the property_id and the date of a particular month.
I don't recommend it, because if you apply this logic to a system spanning 5 or 10 years, performance will suffer: you will get approximately 30 * 12 * 1 = 360 rows for one property for one year. Instead, implement logic to calculate the duration of a booking and store that in a table against the user.
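A minimal PHP sketch of that duration calculation (the dates are just examples):

// Nights between check-in and check-out, stored once per booking.
$checkIn  = new DateTime('2016-05-10');
$checkOut = new DateTime('2016-05-20');
$nights   = $checkIn->diff($checkOut)->days; // 10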
I'm in the process of revising a PHP page that displays all of our items with various statistics for each one. We're looking at a period running from the first of the month one year ago up to yesterday. I've managed to get a basic version of the script working, but it performs poorly. The initial implementation of this script (not my revision) retrieved all sales records and item information at once, then used the resulting records to create objects (not mysql_fetch_objects). These were then stored in an array that used hard-coded values to access the objects' attributes. The way this is all set up is fairly confusing and doesn't easily lend itself to reuse. It is, however, significantly faster than my implementation, since it only queries the database once or twice.
My implementation utilizes three calls. The first obtains basic report information needed to create DateTime objects for the report's range (spanning the first of the month twelve months ago up to yesterday's date). This is, however, all it's used for and I really don't think I even need to make this call. The second retrieves all basic information for items included in the report. This comes out to 854 records. The last select statement retrieves all the sales information for these items, and last I checked, this returned over 6000 records.
What I tried to do was select only the records pertinent to the current item in the loop, represented by the following:
foreach ($allItems as $item) {
    // Display properties of item here

    // db_controller is a special class used to handle DB manipulation
    $query  = "SELECT * FROM sales_records WHERE ...";
    $result = $db_controller->select($query);

    // Use sales records to calculate report values
}
This is the problem: calling the database for each and every item is extremely time-consuming and drastically impacts performance. What's returned is simply the sums of quantities sold in each month within the timeframe specified earlier in the script, along with the resulting sales amounts. At maximum, each item will only have 13 sales records, ranging from 2015/1 to 2016/1 for example.

However, I'm not sure whether performing a single fetch for all these sales records before the loop will help performance, because I would then have to search through the result array for the first instance of a sales record pertaining to the current item. What can I do to alleviate this issue? Since this script is important to the company's operations, I want to be sure its performance is at least close to that of the old script, or only slightly slower. My results are accurate but just painfully slow.
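What I'm considering is something like the following, where the table and column names are placeholders: fetch everything in one query, index it by item, and make the display loop pure array lookups.

// One query for all items' monthly sums (column names are placeholders).
$query = "SELECT item_id, sale_month, SUM(quantity) AS qty, SUM(amount) AS total
          FROM sales_records
          WHERE sale_date BETWEEN '$fromDate' AND '$thruDate'
          GROUP BY item_id, sale_month";
$result = $db_controller->select($query);

// Index the rows by item id so no searching is needed inside the loop.
$salesByItem = array();
foreach ($result as $row) {
    $salesByItem[$row['item_id']][] = $row;
}

foreach ($allItems as $item) {
    // Assuming each $item exposes its id; adjust to the real accessor.
    $records = isset($salesByItem[$item->id]) ? $salesByItem[$item->id] : array();
    // Use $records to calculate report values - no query per item.
}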
Let me give an example of my issue. Let's say I have a table called users and a table called payments. To calculate a user's total balance, I'd use a query to get all the payments after a certain date and then cache the result for a while.
However, I was wondering: given the nature of this, would it be a good idea to have a column in the users table called balance, and then, when the cache expires, use a different query that gathers only the more recent payments and adds that amount to whatever is in the balance column?
You can create an additional table that always contains the user's current balance. If a new payment is added for the user, that balance needs to be updated too. Wrap both in a transaction so adding the payment and updating the total balance stay consistent.
If you need this more fine-grained, you can keep, next to the user relation, a date column representing the interval you need to do calculations for, e.g. the week or month number, so you can give a view back into the past.
If you need more flexibility, you can, after some time, compress existing payments into a total value and store that in such a balance table that is user-related and keeps a date column.
You can then UNION it with the table of payments that are "realtime", i.e. for the dates not yet compressed/condensed, and use an aggregate function to SUM the total balance. This might give you the best of both worlds: you keep recent data in full detail, and can move it out of the data store after some time, keeping only statistical values.
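A rough sketch of such a UNION, with assumed table and column names (note UNION ALL, so duplicate amounts are not collapsed):

-- Total balance = compressed history + payments not yet compressed.
SELECT SUM(amount) AS balance
FROM (
        SELECT amount
        FROM balance_snapshots
        WHERE user_id = :user_id
    UNION ALL
        SELECT amount
        FROM payments
        WHERE user_id = :user_id
          AND paid_at > :last_snapshot_date
) AS combined;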
Generally, with these kinds of "pre-calculated" values, I find the most pain-free way is to store/update them on save of any model that concerns the data.
So in short: update the total balance whenever a new payment is saved. That way you can guarantee that your database and your data will always be in sync.
The pre-calculation can be done either with a MySQL trigger or in a background task with something like Gearman.
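For the trigger variant, a minimal MySQL sketch might look like this (table and column names assumed):

-- Keep users.balance in sync whenever a payment row is inserted.
DELIMITER //
CREATE TRIGGER payments_after_insert
AFTER INSERT ON payments
FOR EACH ROW
BEGIN
    UPDATE users
    SET balance = balance + NEW.amount
    WHERE id = NEW.user_id;
END //
DELIMITER ;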
But, as your own question suggested, if you want to do some kind of incremental roll-up of the balance, I would advise going by months or some other fixed date range. This works provided that you have no payment backdating or anything like that, where a payment could appear in an old month.
At the start of the new month, run a payment aggregator and, bam, you now only have to sum the monthly tables.
It all really depends on how much data you have to deal with. But again I stress, data consistency is a lot more valuable than speed, you can always buy more servers.
I'm constructing a website for a small collection of parents at a private daycare centre. One of the desired functions of the site is a calendar where you can pick which days you can be responsible for the cleaning of the premises. Now, I have made a working calendar. I found a simple script online that I modified a bit to fit our purpose. Technically, it works well, but I'm starting to wonder if I really should alter the way it extracts information from the database.
The calendar is presented monthly and drawn as a table using a for-loop. That means that said for-loop runs 28-31 times each time the page is loaded, depending on the month. To show who is responsible for cleaning each day, I have added a call to a MySQL database where each member's cleaning day is stored. The pseudo code looks like this, simplified:
draw table month
for day = start_of_month to end_of_month
    type day
    select member from cleaning_schedule where picked_day = day
    type member
This means that each reload of the page performs at least 28 SELECT queries against the database, which to me seems both inefficient and possibly susceptible to a DDoS attack. Is there a more efficient way of getting the same result? There are much more complex booking calendars out there; how do they handle it?
SELECT picked_day, member FROM cleaning_schedule WHERE picked_day BETWEEN '2012-05-01' AND '2012-05-31' ORDER BY picked_day ASC
You can loop through the results of that query, each row will have a date and a person from the range you picked, in order of ascending dates.
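In PHP that loop could look roughly like this ($rows stands for the fetched result of the query above; adapt to whatever DB layer the script uses):

// Index members by date so the calendar loop is a plain array lookup.
$byDay = array();
foreach ($rows as $row) {
    $byDay[$row['picked_day']] = $row['member'];
}

for ($day = 1; $day <= 31; $day++) {
    $date   = sprintf('2012-05-%02d', $day);
    $member = isset($byDay[$date]) ? $byDay[$date] : '';
    // draw the day cell, showing $member if someone picked that day
}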
The MySQL query cache will save your bacon.
Short version: if you repeat the same SQL query often, it will end up being served without table access, as long as the underlying tables have not changed. So the first load of a month will be ca. 35 SQL queries, which is a lot but not too many; the second load of the same page will return the results blazingly fast from the cache.
My experience says that this tends to be much faster than creating fancy join queries, even when that would be possible.
Not that 28 calls is a big deal, but I would use a join and pull in the entire month's data in one hit. You can then iterate through the MySQL query result as if it were an array.
You can use greater-than and less-than comparisons in SQL. So instead of doing one select per day, you can write one select for the entire month:
SELECT day, member FROM cleaning_schedule
WHERE day >= :first_day_of_month AND day <= :last_day_of_month
ORDER BY day;
Then you need to take care in your program to handle multiple members per day. Although the program logic will be a bit more complex, the program will be faster: inter-process or even network-based communication is a lot slower than the additional logic.
Depending on the data structure, the following statement might be possible and more convenient:
SELECT day, GROUP_CONCAT(member) AS members FROM cleaning_schedule
WHERE day >= :first_day_of_month AND day <= :last_day_of_month
GROUP BY day
ORDER BY day;
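Handling the multiple-members case in PHP is then just a split per row (this relies on the AS members alias in the statement above):

// One row per day; 'members' is a comma-separated list for that day.
foreach ($rows as $row) {
    $membersForDay = explode(',', $row['members']);
    // render every member of $membersForDay in that day's cell
}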
28 queries isn't a massive issue and is pretty common for most commercial websites, but I'd recommend just grabbing each month's data in one hit. Then just loop through the records day by day.
I have a separate table for every day's data, which is basically webstats-type data: keywords, visits, duration, IP, sale, etc. (maybe 100 bytes total per record).
Each table will have around a couple of million records.
What I need to do is have a web admin so that the user/admin can view reports for different date periods AND sorted by certain calculated values. For example, the user may want the results from the 15th of last month to the 12th of this month, sorted by SALE/VISIT, in descending order.
The admin/user only needs to view (say) the top 200 records at a time and will probably not view more than a few hundred in total in any one session.
Because of the arbitrary date period involved, I need to sum up the relevant columns for each record and only then can the selection be done.
My question is whether it will be possible to have the reports in real time or whether they would be too slow (the tables are rarely - if ever - updated after the day's data has been inserted).
Is such a scenario better fitted to indexes or tablescans?
And also, whether a massive table for all dates would be better than having separate tables for each date (there are almost no joins)
thanks in advance!
With a separate table for each day's data, summarizing across a month is going to involve doing the same analysis on each of 30-odd tables. Over a year, you will have to do the analysis on 365 or so tables. That's going to be a nightmare.
It would almost certainly be better to have a soundly indexed single table than a huge number of tables. Some DBMSs support partitioned tables - MySQL does (since 5.1) - so partition the single big table by date. I would be inclined to partition by month, especially if the normal queries are for one month or less and do not cross month boundaries. (Even if a query involves two months, with decent partition pruning the query engine won't have to read most of the data, just the two partitions for the two months. It might even be able to do those scans in parallel, again depending on the DBMS.)
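A sketch of what that could look like in MySQL (column names assumed):

-- One big stats table, range-partitioned by month: a query covering one
-- or two months only scans those partitions.
CREATE TABLE webstats (
    stat_date DATE NOT NULL,
    keyword   VARCHAR(100),
    visits    INT,
    sale      DECIMAL(10,2)
)
PARTITION BY RANGE (TO_DAYS(stat_date)) (
    PARTITION p201201 VALUES LESS THAN (TO_DAYS('2012-02-01')),
    PARTITION p201202 VALUES LESS THAN (TO_DAYS('2012-03-01')),
    PARTITION pmax    VALUES LESS THAN MAXVALUE
);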
Sometimes it is quicker to do a sequential scan of a table than to do indexed lookups - don't simply assume that because the query plan involves a table scan, it will automatically perform badly.
You may want to try a different approach. I think Splunk would work for you; it was designed for this (they even run ads on this site), and they have a free version you can try.