Optimizing my data structure. Dealing with lots of units and conversions - php

My website deals with a lot of drug doses and units. Currently it is set up in a way that isn't flexible enough to convert those doses and units into different values depending on the user.
For instance, if a user submits a record saying he took 1 ml of alcohol on Monday, 10 ml on Tuesday, and 2 liters on Wednesday, then the same substance uses two separate units. So what happens if I want to show the user the average of these three days ONLY in mL? What if the user wants to see it only in liters?
Here's what I have so far.
drugs
id | drug
1 | alcohol
2 | cannabis
units
id | unit
1 | ml
2 | mg
3 | g
4 | l
unit_conversion
id | from_unitid | to_unitid | multiplier
1 | 1 | 4 | 0.001
2 | 4 | 1 | 1000
3 | 2 | 3 | 0.001
4 | 3 | 2 | 1000
user_dose_line
id | drug_id | unit_id | dose
1 | 1 | 1 | 100
2 | 1 | 1 | 200
3 | 1 | 4 | 1
Here's how I'd ideally want it to work.
A user submits a record. He fills out that the drug is alcohol, the dose is 100, and the unit is ml. This is stored in the database as id #1 in user_dose_line.
Now let's say there is another table. It is the options table where the user sets what default unit of measurement he wants to use.
user_dose_options
id | user_id | drug_id | unit_id
1 | 22 | 1 | 4
This shows that the user has selected that any entry with alcohol in it, should always be converted to liters.
Here is my problem: where in my logic do I do these conversions?
I use CakePHP, which is a typical MVC framework. Should I be doing the conversion in the model? In the controller? What is the best practice for this? And am I choosing the most optimized route?
I plan to add a lot more functionality for units/doses later, so I need my database set up in the most efficient way to allow me to do lots of cool calculations with the data (display graphics, statistics, etc.).

You should have an input and an output method. Those should convert from the input unit to one standard unit which you use to store the data.
You could do this in the setter of your model.
For example:
class userInTake {
    private $userId;
    private $drugId;
    private $dose;

    public function setDrugUse($userId, $drugId, $dose, $unitId) {
        // Look up the unit you store this drug in (placeholder function)
        $defaultUnitId = yourDefaultUnitFor($drugId);
        // Convert the dose from the input unit to that default unit
        $this->dose = yourConversionMethod($dose, $unitId, $defaultUnitId);
        $this->userId = $userId;
        $this->drugId = $drugId;
    }
    ...
}
At the point where you store the user input into your models, you do the conversion.
So if, for instance, a user inputs that he took 4 l of alcohol on Monday, you convert the litres to ml and then store the ml value.
If a user inputs three occasions (Monday 4 l, Tuesday 40 ml, Friday 2.4 l), you convert each input to ml.
When displaying the result, you'll get a total of 6440 ml, or an average of about 2147 ml per occasion.
Your output function could choose the unit that serves best to display the value (e.g. if a ml value > 10000, display as litres).
This way you don't have to store different units and only need your units for conversion upon input/output. That should make calculations a hell of a lot easier.
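As a rough sketch of the store-in-one-unit idea, the following converts the three example inputs to ml, then computes the total and average. The conversion map and the function names `toMillilitres` and `formatVolume` are made up for illustration; in the real application the factors would presumably come from the unit_conversion table.

```php
<?php
// Hypothetical conversion factors to the storage unit (ml).
const ML_PER_UNIT = ['ml' => 1.0, 'l' => 1000.0];

// Convert a user-entered dose to millilitres before storing it.
function toMillilitres(float $dose, string $unit): float
{
    if (!array_key_exists($unit, ML_PER_UNIT)) {
        throw new InvalidArgumentException("Unknown unit: $unit");
    }
    return $dose * ML_PER_UNIT[$unit];
}

// Pick a display unit: litres above the 10000 ml threshold, ml otherwise.
function formatVolume(float $ml): string
{
    return $ml > 10000 ? round($ml / 1000, 2) . ' l' : round($ml, 2) . ' ml';
}

// Monday 4 l, Tuesday 40 ml, Friday 2.4 l
$inputs = [[4, 'l'], [40, 'ml'], [2.4, 'l']];
$stored = array_map(fn($in) => toMillilitres($in[0], $in[1]), $inputs);

$total = array_sum($stored);
$average = $total / count($stored);

echo 'Total: ' . formatVolume($total) . PHP_EOL;
echo 'Average: ' . formatVolume($average) . PHP_EOL;
```

Because every stored value is already in ml, the sum and average need no per-row unit lookup; units only matter at the input and output edges.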

The ideal setting is that any operation related to data should be in the model. You can override the afterFind() method to do the conversion right after the query, and have the data ready to use.
http://book.cakephp.org/2.0/en/models/callback-methods.html#afterfind
Remember the philosophy: Fat models, skinny controllers.

Related

PHP: Selecting, grouping, and showing only arrays with the same key value?

I have a MySQL database that holds log data from a vehicle's OBD-II reader. The application that interfaces with the reader collects a large amount (usually well over 2,000) of data point sets. Each of these data point sets gets its own row in the table, resulting in something like this:
+---------------+----------------------------------+---------------+----+----+-----+
| session | id | time | kd | kf | ... |
+---------------+----------------------------------+---------------+----+----+-----+
| 1420236074526 | 5cff4a3cc80b22cecb7de85266b25355 | 1420236074534 | 14 | 8 | ... |
| 1420236074526 | 5cff4a3cc80b22cecb7de85266b25355 | 1420236075476 | 17 | 8 | ... |
| 1420236074526 | 5cff4a3cc80b22cecb7de85266b25355 | 1420236075476 | 19 | 8 | ... |
| 1420236074526 | 5cff4a3cc80b22cecb7de85266b25355 | 1420236075476 | 23 | 8 | ... |
| 1420236074526 | 5cff4a3cc80b22cecb7de85266b25355 | 1420236077477 | 25 | 8 | ... |
+---------------+----------------------------------+---------------+----+----+-----+
kd and kf are the vehicle data types (vehicle speed and ambient air temperature, respectively).
There are two indexes in this table (which is called raw_logs):
session maps to columns session and id
id maps to column id
What I'd like to do is grab all of the rows that have the same timestamp (1420236074526, for example), and lump them together into a "session". The goal here is to create a select list where I can view data by session. Ideally, I'd have something like this as the output:
<select>
<option>January 1, 2015 - 7:43AM</option>
<option>January 1, 2015 - 5:15PM</option>
<option>January 2, 2015 - 7:38AM</option>
...
</select>
This is what I have so far (I'm using Medoo to try and simplify the queries):
$session = $database->query("SELECT * FROM raw_logs USE INDEX (session)");
$sessionArray = array();
foreach($session as $session){
$s_time = $session["session"];
$sessionArray[$s_time] = array(
"vehicle_speed" => $session["kd"],
"ambient_air_temp" => $session["kf"]
);
} print_r($sessionArray);
This works... sort of. I get the session time as the array and kd and kf under that with the correct key/value pairs, but it doesn't seem to want to iterate through the whole thing. There are around 25,000 rows in the table at the moment, but it's only returning a few, and there doesn't seem to be any logical order to the listing... it'll return two results from January 8th, one from the 9th, 4 from the 10th, etc.
What would be the best way to select all sessions with the same time stamp, group them, and create a selectable object that will only display the data for the given session?
If you are not ordering the query, there likely won't be any logical order to the listing. Also, you are overwriting the session data wherever duplicate values exist in the session column. You likely want to do something like $sessionArray[$s_time][] = array(... to append each row under the session id. If you are still having trouble, it is best to limit the results from your query to something like 20-100 rows and keep massaging the code until you get the result you want.
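A minimal sketch of the append-instead-of-overwrite fix, using a hard-coded array in place of the Medoo query result. The second session id is invented for illustration, and the date format is only a guess at the desired label.

```php
<?php
// Stand-in for the rows returned by the query. The first two rows use values
// from the table above; the third session id is hypothetical.
$rows = [
    ['session' => '1420236074526', 'kd' => 14, 'kf' => 8],
    ['session' => '1420236074526', 'kd' => 17, 'kf' => 8],
    ['session' => '1420300000000', 'kd' => 30, 'kf' => 9],
];

$sessionArray = [];
foreach ($rows as $row) {
    // [] appends, so earlier rows of the same session are kept
    $sessionArray[$row['session']][] = [
        'vehicle_speed'    => $row['kd'],
        'ambient_air_temp' => $row['kf'],
    ];
}
ksort($sessionArray); // oldest session first

// One <option> per session; the session value is a millisecond timestamp.
$options = [];
foreach (array_keys($sessionArray) as $session) {
    $label = gmdate('F j, Y - g:iA', intdiv((int) $session, 1000));
    $options[] = sprintf('<option value="%s">%s</option>', $session, $label);
}

echo "<select>\n" . implode("\n", $options) . "\n</select>\n";
```

Note the loop variable is named `$row` rather than reusing `$session`, which avoids clobbering the result set while iterating over it.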

Calculate price of rental period

I am working on a project for which I need to calculate prices of holiday homes available in a selected rental period. I need some help with building a SQL query that combines the following tables and converts the data into an output containing the price for the requested period for each house. It should contain the stay costs and the additional cost types, together with the amount the renter should pay for every cost_type.
I have a table costprofiles which enables the house owner to have multiple prices throughout the year:
+----------------+----------+--------------+
| costprofile_id | house_id | profile_name |
+----------------+----------+--------------+
| 1 | 312 | summer |
+----------------+----------+--------------+
| 2 | 312 | winter |
+----------------+----------+--------------+
I have a table called costprofile_items which is linked to a costprofile via the foreign key costprofile_id. This table contains all different amounts a renter should pay to the owner if the price of the selected period uses this cost_type. Each additional amount can be calculated in four different ways:
per night
per stay
per person
per person per night
The way each amount contributes to the total rent price is stored in the calculation_type column. This is what the costprofile_items table looks like:
+---------------------+----------------+--------+-------------+----------------------+
| costprofile_item_id | costprofile_id | amount | cost_type | calculation_type |
+---------------------+----------------+--------+-------------+----------------------+
| 1 | 1 | 20 | usage_cost | per_night |
+---------------------+----------------+--------+-------------+----------------------+
| 2 | 1 | 8.5 | cleaning | per_stay |
+---------------------+----------------+--------+-------------+----------------------+
| 3 | 1 | 0.82 | tourist_tax | per_person_per_night |
+---------------------+----------------+--------+-------------+----------------------+
I also have the table prices in which each row represents a price per night that can be used between the start_date and the end_date (the weekday of the start_date equals the weekday of arrival at the house and the weekday of end_date equals the weekday of departure). The row also contains a column nights that determines how long a sub period needs to be in order to use this price. This is what the table looks like:
+----------+----------+----------------+------------+------------+-----------+--------+
| price_id | house_id | costprofile_id | start_date | end_date | per_night | nights |
+----------+----------+----------------+------------+------------+-----------+--------+
| 1 | 1 | 1 | 2014-08-04 | 2014-12-01 | 60 | 7 |
+----------+----------+----------------+------------+------------+-----------+--------+
| 2 | 1 | 1 | 2014-08-08 | 2014-12-05 | 70 | 3 |
+----------+----------+----------------+------------+------------+-----------+--------+
| 3 | 1 | 2 | 2014-12-01 | 2015-03-02 | 0 | 1 |
+----------+----------+----------------+------------+------------+-----------+--------+
In the table you can see that for the given house you can book the period from 8 till 11 August and this will cost 3*70 = €210 for the stay. If you are with 4 persons the additional costs are 3*20 = €60 for electricity/gas usage, €8.5 for cleaning and 0.82*4*3 = €9.84 for tourist tax. So the total cost of your weekend will be €288.34. It also should be possible to combine this weekend with for example 2 times the weekly price as described in the first row of the table. In this case the price from 8 till 25 August would be 288.34 + 2*582.96 = €1454.26. Note that the calculation types per_stay and per_person only need to be selected from the first sub period, so the cleaning in the last example is only paid once.
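For reference, the arithmetic in this example can be reproduced with a short PHP sketch. The function name `subPeriodPrice` and the `$isFirst` flag are illustrative choices, not part of the existing code; the cost items are copied from the costprofile_items table.

```php
<?php
// Cost items of costprofile 1, copied from the costprofile_items table.
$costItems = [
    ['amount' => 20.0, 'cost_type' => 'usage_cost',  'calculation_type' => 'per_night'],
    ['amount' => 8.5,  'cost_type' => 'cleaning',    'calculation_type' => 'per_stay'],
    ['amount' => 0.82, 'cost_type' => 'tourist_tax', 'calculation_type' => 'per_person_per_night'],
];

// Price of one sub period. per_stay and per_person items are only charged
// when $isFirst is true, i.e. for the first sub period of the booking.
function subPeriodPrice(float $perNight, int $nights, int $persons, array $items, bool $isFirst): float
{
    $total = $perNight * $nights;
    foreach ($items as $item) {
        switch ($item['calculation_type']) {
            case 'per_night':
                $total += $item['amount'] * $nights;
                break;
            case 'per_person_per_night':
                $total += $item['amount'] * $persons * $nights;
                break;
            case 'per_stay':
                if ($isFirst) {
                    $total += $item['amount'];
                }
                break;
            case 'per_person':
                if ($isFirst) {
                    $total += $item['amount'] * $persons;
                }
                break;
        }
    }
    return $total;
}

$weekend = subPeriodPrice(70, 3, 4, $costItems, true);  // 8 till 11 August
$week    = subPeriodPrice(60, 7, 4, $costItems, false); // one weekly block
$total   = round($weekend + 2 * $week, 2);              // 8 till 25 August

printf("weekend: %.2f, week: %.2f, total: %.2f\n", $weekend, $week, $total);
```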
The last table I use for calculating prices is the table prices_per_group. This table is connected to prices via the foreign key price_id. In the prices table above you can see in the last row that the price per night equals 0. In that case the owner has given a price per night for each group size he accepts in his house during the period in which this price is active. This is the way those different prices are stored:
+--------------------+----------+------------+-----------+
| price_per_group_id | price_id | group_size | per_night |
+--------------------+----------+------------+-----------+
| 1 | 3 | 5 | 50 |
+--------------------+----------+------------+-----------+
| 2 | 3 | 4 | 45 |
+--------------------+----------+------------+-----------+
As you can see a week starting at 1 December (or any Monday after that, but before 2 March) will cost €50 per night if you are with 5 persons or €45 if you are with 4.
I hope it is clear now how I am trying to store and compute all different prices.
I have managed to get these calculations working, by first querying all cost types of every available house with the following query:
SELECT * FROM (
SELECT prices.house_id,
prices.price_id,
prices.costprofile_id,
prices.nights,
prices.start_date,
prices.end_date,
MIN(
prices.per_night + COALESCE(prices_per_group.per_night, 0)
) AS per_night /* Add the price per night from prices and prices_per_group (if one has a non-zero value the other is always zero) */
FROM prices
LEFT JOIN prices_per_group ON prices.price_id = prices_per_group.price_id
WHERE prices.house_id IN (
/* Query that returns a set with the ids of all available houses here */
)
AND (
prices_per_group.price_id IS NULL OR /* If true, no row in prices_per_group is pointing to the price_id currently being evaluated */
prices_per_group.group_size >= 4 /* If true, the group_size satisfies the requested number of persons */
)
GROUP BY prices.price_id
) AS possible_prices
INNER JOIN costprofile_items ON costprofile_items.costprofile_id = possible_prices.costprofile_id
ORDER BY price_id ASC
After that I used PHP to loop through all rows containing price information for a certain house. I started at the start_date and made steps using the first usable price row it could find and repeated that until I am at the end_date. The problem with my current method is that it is too slow. For 1000 houses the webserver needs 0.3sec execution time. Maybe some optimization can be done in my PHP code, but I was hoping someone could help me with putting this all together in SQL. This way for example sorting by price is easier to implement and just asking for the large result after quickly executing the above query makes my execution time jump up to 0.12sec.
All help and advice is welcome
In the end I decided to cache all prices instead of live computing them. This results in much better performance and allows for much more complex pricing than can be computed on the fly inside queries. Every night a cronjob runs that fills up 21 tables (a table for each possible rental duration). The duration pricing tables contain key,value pairs of arrival date and corresponding computed price for that duration. Optionally you can add a column for group size, resulting in a price per duration, per group size, per arrival date. It takes quite some database records, but if you create indices this is blazing fast.

How does the concept of repeat mode work

In the project I am working on (in CodeIgniter), a user can create a task and set its repeat mode to Once, Daily, or Weekly, where
Daily - Task will appear for the same time everyday in future
Weekly - Task will appear every Monday (say if task is being added on Monday)
Once - Task will get added only for today
Now every task created by user creates a record in database,
For example, suppose a task is created today(13-01-2014) from 2:00-3:00 with repeat mode as Daily, this will create a record against this (13-01-2014) date but I can't add the same task at that time for all future dates.
The user can also change the mode of a task at any time, after which it should no longer repeat.
Can anyone please explain to me how this repeat mode works? I mean, when should the tasks for future dates actually be created, and how should they be maintained in the database?
"Explain the concept of repeat mode" is a pretty vague request. However, I think I understand what piece is missing.
I assume you have some kind of taskId, which is a unique key for each task. What you need is a batchId as well. Your end result would look something like this:
+----------+----------+----------------------+
|taskId |batchId |description |
|----------|----------|----------------------|
| 1 | | Some meeting |
| 2 | | Another meeting |
| 3 | 1 | Daily meeting |
| 4 | 1 | Daily meeting |
| 5 | 1 | Daily meeting |
| 6 | 2 | Go to the gym! |
| 7 | 2 | Go to the gym! |
| 8 | 2 | Go to the gym! |
| 9 | 2 | Go to the gym! |
| 10 | | Yet another meeting |
+----------+----------+----------------------+
Having a batchId lets you group these events in the case you need to modify all the tasks at once, but still lets you modify each task individually if need be, thanks to the taskId.
The actual implementation of this batchId is up to you. For example, it can be:
a random string generated on-the-fly
a hash of the first taskId to ensure that they're always unique
a foreign key in a separate table that auto-generates a batchId as its key
Use the one that best suits your needs, or make one up yourself.
I just made up taskId and batchId. Replace those with whatever makes sense to you.
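Here is a rough sketch of generating the rows for a repeating task with a shared batchId, using the "random string generated on-the-fly" option. The function name `buildTasks`, the array keys, and the fixed number of occurrences are all assumptions; how far ahead you generate tasks is up to you.

```php
<?php
// Build one task row per occurrence; repeated tasks share a batchId.
// Names and the fixed $occurrences horizon are hypothetical.
function buildTasks(string $description, string $startDate, string $mode, int $occurrences): array
{
    if ($mode === 'Once') {
        return [['batchId' => null, 'date' => $startDate, 'description' => $description]];
    }

    $steps = ['Daily' => 'P1D', 'Weekly' => 'P1W'];
    if (!isset($steps[$mode])) {
        throw new InvalidArgumentException("Unknown repeat mode: $mode");
    }

    $batchId  = bin2hex(random_bytes(8)); // random string generated on the fly
    $date     = new DateTimeImmutable($startDate);
    $interval = new DateInterval($steps[$mode]);

    $tasks = [];
    for ($i = 0; $i < $occurrences; $i++) {
        $tasks[] = [
            'batchId'     => $batchId,
            'date'        => $date->format('Y-m-d'),
            'description' => $description,
        ];
        $date = $date->add($interval);
    }
    return $tasks;
}

$daily  = buildTasks('Daily meeting', '2014-01-13', 'Daily', 3);
$weekly = buildTasks('Go to the gym!', '2014-01-13', 'Weekly', 2);
print_r($daily);
```

If the user later switches a task to Once, you could delete the rows with the same batchId and a later date, which stops the repetition without touching past tasks.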

Statistical method for grading a set of exponential data

I have a PHP application that allows the user to specify a list of countries and a list of products. It tells them which retailer is the closest match. It does this using a formula similar to this:
(
(number of countries matched / number of countries selected) * (importance of country match)
+
(number of products matched / number of products selected) * (importance of product match)
)
*
(significance of both country and solution matching * (coinciding matches / number of possible coinciding matches))
Where [importance of country match] is 30%, [importance of product match] is 10% and [significance of both country and solution matching] is 2.5
So to simplify it: (country match + product match) * multiplier.
Think of it as [do they operate in that country? + do they sell that product?] * [do they sell that product in that country?]
This gives us a match percentage for each retailer which I use to rank the search results.
My data table looks something like this:
id | country | retailer_id | product_id
========================================
1 | FR | 1 | 1
2 | FR | 2 | 1
3 | FR | 3 | 1
4 | FR | 4 | 1
5 | FR | 5 | 1
Until now it's been fairly simple as it has been a binary decision. The retailer either operates in that country or sells that product or they don't.
However, I've now been asked to add some complexity to the system. I've been given the revenue data, showing how much of that product each retailer sells in each country. The data table now looks something like this:
id | country | retailer_id | product_id | revenue
===================================================
1 | FR | 1 | 1 | 1000
2 | FR | 2 | 1 | 5000
3 | FR | 3 | 1 | 10000
4 | FR | 4 | 1 | 400000
5 | FR | 5 | 1 | 9000000
My problem is that I don't want retailer 3 selling ten times as much as retailer 1 to make them ten times better as a search result. Similarly, retailer 5 shouldn't be nine thousand times better as a match than retailer 1. I've looked into using the mean, the mode and the median. I've tried using the deviation from the mean. I'm stumped as to how to make the big jumps less significant. My ignorance of the field of statistics is showing.
Help!
Consider using the log10() function. This reduces the direct scaling of results, like you were describing. If you take the log10() of the revenue, then someone with a revenue 1000 times larger receives a score that is only 3 points higher.
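A quick sketch of what that looks like on the revenue figures from the question. The normalization step at the end is an optional extra, not something the formula requires.

```php
<?php
// Revenue per retailer_id, taken from the table in the question.
$revenue = [1 => 1000, 2 => 5000, 3 => 10000, 4 => 400000, 5 => 9000000];

// Dampen with log10 so that a 10x jump in revenue adds 1 to the score.
$scores = array_map(fn($r) => log10($r), $revenue);

// Optional: scale into 0..1 relative to the best retailer.
$max = max($scores);
$normalized = array_map(fn($s) => $s / $max, $scores);

foreach ($scores as $retailerId => $score) {
    printf("retailer %d: score %.3f, normalized %.3f\n", $retailerId, $score, $normalized[$retailerId]);
}
```

With these numbers, retailer 5 ends up with a score a little over twice that of retailer 1, rather than nine thousand times.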
A classic in "dampening" huge increases in value are the logarithms. If you look at that Wikipedia article, you see that the function value initially grows fairly quickly but then much less so. As mentioned in another answer, a logarithm with base 10 means that each time you multiply the input value by ten, the output value increases by one. Similarly, a logarithm with base two will grow by one each time you multiply the input value by two.
If you want to weaken the effect of the logarithm, you could look into combining it with, say, a linear function, e.g. f(x) = log2 x + 0.0001 x... but that multiplier there would need to be tuned very carefully so that the linear part doesn't quickly overshadow the logarithmic part.
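To see why that multiplier needs careful tuning, here is a small sketch of the combined function. The name `dampenedScore` is made up, and 0.0001 is just the example weight from above.

```php
<?php
// log2(x) plus a small linear term; the default weight is the example value.
function dampenedScore(float $x, float $linearWeight = 0.0001): float
{
    return log($x, 2) + $linearWeight * $x;
}

foreach ([1000, 10000, 400000, 9000000] as $x) {
    printf("x = %d: log part %.2f, linear part %.2f, score %.2f\n",
        $x, log($x, 2), 0.0001 * $x, dampenedScore($x));
}
```

For the largest revenue in the question the linear part is 900 while the logarithmic part is roughly 23, so with this weight the linear term already dominates at the top end.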
Coming up with this kind of weighting is inherently tricky, especially if you don't know exactly what the function is supposed to look like. However, there are programs that do curve fitting, i.e. you can give it pairs of function input/output and a template function, and the program will find good parameters for the template function to approximate the desired curve. So, in theory you could draw your curve and then make a program figure out a good formula. That can be a bit tricky, too, but I thought you might be interested. One such program is the open source tool QtiPlot.

Algorithm or MySQL Query Advice Needed

I am in desperate need of either algorithm or query construction assistance.
We have a user-generated, flexible database that is created using a form builder we have created. The data for these forms is stored in two tables as follows:
The instances table tells us what form the user is viewing, and the instance_records table has all the data for the instance. The field_id column tells us what field on the form the data maps to. The reason we use a single table like this instead of creating a table for each form is that MySQL limits how many columns we can have in a table, given that the data is varchar of a significant length. One possibility would be to use Text fields for the data, but then we would lose the built-in MySQL searching capabilities.
Things work quite well and very fast on basic forms. The problem is that one instance of a form can refer to another instance of the form. For example, we have user created form called Appointments. On this form, it refers to the Patient form, the Technician form, the Doctor form, etc.
So, on the Appointment form with instance id, the value for the patient field is actually an instance id of a patient, the doctor field value is the instance id for the doctor, etc. At the first level of references, things aren’t too bad. But, you can have chains of references. I can have a prescription that refers to an appointment that refers to a patient, etc. So, if I want to get the value of the patient name on the prescription, I have to follow the chain down to get the right instance id and field id for the data.
So, if I want to do a report on Appointments and show the Patient name, the Doctor name, and the Technician name, I have to go through some hoops. What I have tried is creating views and then joining the views to a final view that shows all the data for the query. But, it eats a ton of memory and starts writing the view temporary tables to disk and gets slow as all heck. Using query caching, the second time the report runs, it’s fast as heck. But, that first run can take over a minute once we get above 5000-7000 instances.
Something tickling at the back of my mind is that there might be some sort of a way to store the data in a way that I can take advantage of some faster tree search algorithms.
You should read up on EAV... This article might give you some ideas... It talks about two different approaches for storing values. With either approach you end up having a single query for any given form that would essentially grab all the values for the master entity (in this case the form). Then either on the application side or the db side you aggregate those values together appropriately for the application to consume.
The form itself should be a single atomic unit that has a list of fields. You don't need to store which form a field actually comes from; you just need to store it as a field on the complete form. You should develop logic for merging the fields into a single form on the application side during the creation process.
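A minimal sketch of the application-side aggregation, assuming the rows from instance_records have already been fetched in one query. The `value` key, the sample data, and the function names are assumptions; only instance_id and field_id come from the question.

```php
<?php
// Rows as they might come back from a single query on instance_records.
// The "value" column name and all sample data are hypothetical.
$records = [
    ['instance_id' => 10, 'field_id' => 1, 'value' => 'John Doe'],   // patient name
    ['instance_id' => 20, 'field_id' => 2, 'value' => '2015-01-02'], // appointment date
    ['instance_id' => 20, 'field_id' => 3, 'value' => '10'],         // appointment -> patient
    ['instance_id' => 30, 'field_id' => 4, 'value' => '20'],         // prescription -> appointment
];

// Group the flat rows into one array of field values per instance.
function groupByInstance(array $records): array
{
    $instances = [];
    foreach ($records as $r) {
        $instances[$r['instance_id']][$r['field_id']] = $r['value'];
    }
    return $instances;
}

// Follow a chain of reference fields in memory; the last field id in
// $fieldPath is the value to return, the earlier ones are references.
function resolveValue(array $instances, int $instanceId, array $fieldPath)
{
    $last = array_pop($fieldPath);
    foreach ($fieldPath as $fieldId) {
        $instanceId = (int) $instances[$instanceId][$fieldId];
    }
    return $instances[$instanceId][$last] ?? null;
}

$instances = groupByInstance($records);
// prescription 30 -> appointment (field 4) -> patient (field 3) -> name (field 1)
echo resolveValue($instances, 30, [4, 3, 1]) . PHP_EOL;
```

Resolving the chain in PHP arrays trades memory in the application for fewer joins and views in MySQL; whether that is a net win depends on how many instances a report touches.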
It sounds like you're trying to create a database in a database. There's a dailywtf link I'm looking for somewhere...
Anyway, it sounds like you need an Appointment table, and a Patient table, and a Doctor table, and a Technician table, and then you need to join them properly.
For example, to see the patients, doctors, and techs from the appointments yesterday, you might do
SELECT
Appointment.start_time,
Appointment.end_time,
Patient.name AS patient_name,
Patient.insurance_carrier,
Doctor.name AS doctor_name,
Tech.name AS tech_name,
Tech.home_lab
FROM Appointment
JOIN Patient ON Appointment.patient_id = Patient.patient_id
JOIN Doctor ON Appointment.doctor_id = Doctor.doctor_id
JOIN Tech ON Appointment.tech_id = Tech.tech_id
WHERE Appointment.appointment_date = $YESTERDAY
Edit: Let's give the example of Patient with a variable number of fields
Table Patient - contains data ALL patients will have
| ID | Name | Insurance Carrier | .. other fields
+------+-------------+-------------------+-------
| 0001 | John Doe    | ABC Healthcare    |
| 0002 | Jane Doe    | ABC Healthcare    |
| 0003 | Jon Skeet   | C# Insurance Inc. |
| 0004 | Mark Byers  | Gold Badge Health |
+------+-------------+-------------------+-------
Table Patient-Form
| Form-Name | Form-Field | Required | Default-Value |
+-----------+------------------+----------+---------------|
| Vitals | Blood Pressure | TRUE | null |
| Vitals | Pulse | TRUE | null |
| Vitals | Ear Temperature | FALSE | null |
| Lab Work | Lab Room | TRUE | Lab-001 |
| Lab Work | Technician | TRUE | null |
| Lab Work | Insurance Covers | TRUE | NO |
| Payment | Balance | TRUE | $0.00 |
| Payment | Co-Pay | FALSE | 0.00% |
| Payment | Deductible | FALSE | $0.00 |
| Payment | Payment Terms | FALSE | 30 Days Full |
+-----------+------------------+----------+---------------|
Table Patient-Form-Field - contains data that may or may not be available for a patient
| Patient-ID | Form-Name | Form-Field | Form Value |
+------------+-----------+------------------+------------+
| 0001 | Vitals | Blood Pressure | 130 / 54 |
| 0001 | Vitals | Pulse | 84bpm |
| 0001 | Vitals | Ear Temperature | 98.4F |
| 0002 | Vitals | Blood Pressure | 126 / 74 |
| 0002 | Vitals | Pulse | 87bpm |
| 0002 | Vitals | Ear Temperature | 99.0F |
| 0003 | Lab Work | Lab Room | SO-Meta |
| 0003 | Lab Work | Technician | Rose Smith |
| 0003 | Lab Work | Insurance Covers | TRUE |
| 0003 | Vitals | Blood Pressure | 190 / 100 |
| 0003 | Vitals | Pulse | 213bpm |
+------------+-----------+------------------+------------+
You can now query this like this:
-- Hyphens in the table and column names above are written as underscores here
SELECT
Patient.name,
Patient_Form_Field.form_name,
Patient_Form_Field.form_field,
Patient_Form_Field.form_value
FROM Patient
JOIN Patient_Form_Field ON ( Patient_Form_Field.patient_id = Patient.id
AND Patient_Form_Field.form_name IN ('Vitals', 'Lab Work')
)
WHERE Patient.id IN ('0001', '0002', '0003')
