Algorithm or MySQL Query Advice Needed

I am in desperate need of either algorithm or query construction assistance.
We have a user-generated, flexible database that is created using a form builder we have created. The data for these forms is stored in two tables as follows:
The instances table tells us which form the user is viewing, and the instance_records table holds all the data for that instance. The field_id column tells us which field on the form the data maps to. We use a single table like this, instead of creating a table for each form, because MySQL limits how many columns a table can have when the data is varchar of significant length. One possibility would be to use TEXT fields for the data, but then we would lose the built-in MySQL searching capabilities.
Things work quite well and very fast on basic forms. The problem is that an instance of one form can refer to an instance of another form. For example, we have a user-created form called Appointments. This form refers to the Patient form, the Technician form, the Doctor form, etc.
So, on the Appointment form with a given instance id, the value for the patient field is actually the instance id of a patient, the doctor field value is the instance id of the doctor, etc. At the first level of references, things aren't too bad. But you can have chains of references. I can have a prescription that refers to an appointment that refers to a patient, etc. So, if I want to get the value of the patient name on the prescription, I have to follow the chain down to get the right instance id and field id for the data.
So, if I want to do a report on Appointments and show the Patient name, the Doctor name, and the Technician name, I have to jump through some hoops. What I have tried is creating views and then joining the views into a final view that shows all the data for the query. But it eats a ton of memory, starts writing the views' temporary tables to disk, and gets slow as all heck. With query caching, the second time the report runs, it's fast as heck. But that first run can take over a minute once we get above 5000-7000 instances.
Something tickling at the back of my mind is that there might be some sort of a way to store the data in a way that I can take advantage of some faster tree search algorithms.

You should read up on EAV... This article might give you some ideas... It talks about two different approaches for storing values. With either approach you end up with a single query for any given form that essentially grabs all the values for the master entity (in this case the form). Then, either on the application side or the db side, you aggregate those values together appropriately for the application to consume.
The form itself should be a single atomic unit that has a list of fields. You don't need to store which form a field actually comes from; you just need to store it as a field on the complete form. You should develop logic for merging the fields into a single form on the application side during the creation process.
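As a rough illustration of the "one query, then aggregate on the application side" idea, here is a minimal sketch in Python, with SQLite standing in for MySQL. The question only names the instances and instance_records tables and the field_id column; the other column names (instance_id, form_name, value), the field ids and the sample rows are assumptions made up for this example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema: only the table names and field_id come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instances (instance_id INTEGER PRIMARY KEY, form_name TEXT);
    CREATE TABLE instance_records (instance_id INTEGER, field_id INTEGER, value TEXT);
    INSERT INTO instances VALUES (1, 'Patient'), (2, 'Appointment');
    INSERT INTO instance_records VALUES
        (1, 10, 'John Doe'),      -- field 10: patient name (assumed)
        (2, 20, '1'),             -- field 20: reference to patient instance 1 (assumed)
        (2, 21, '2012-05-02');    -- field 21: appointment date (assumed)
""")


def load_instances(conn, instance_ids):
    """Fetch every field value for the given instances in one query and
    group the rows into {instance_id: {field_id: value}} in the application."""
    placeholders = ",".join("?" * len(instance_ids))
    rows = conn.execute(
        "SELECT instance_id, field_id, value FROM instance_records "
        f"WHERE instance_id IN ({placeholders})",
        list(instance_ids),
    )
    grouped = defaultdict(dict)
    for instance_id, field_id, value in rows:
        grouped[instance_id][field_id] = value
    return dict(grouped)


appointment = load_instances(conn, [2])[2]
# Follow one reference: field 20 holds the instance id of the patient.
patient_id = int(appointment[20])
patient = load_instances(conn, [patient_id])[patient_id]
print(patient[10])
```

Because load_instances accepts a list of ids, all references at one level of a chain can be collected and fetched together, so a chain of depth N needs roughly N queries instead of one query per row.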

It sounds like you're trying to create a database in a database. There's a dailywtf link I'm looking for somewhere...
Anyway, it sounds like you need an Appointment table, and a Patient table, and a Doctor table, and a Technician table, and then you need to join them properly.
For example, to see the patients, doctors, and techs from the appointments yesterday, you might do
-- Hyphenated column names must be quoted with backticks in MySQL
SELECT
    Appointment.`start-time`,
    Appointment.`end-time`,
    Patient.name,
    Patient.`insurance-carrier`,
    Doctor.name,
    Tech.name,
    Tech.`home-lab`
FROM Appointment
JOIN Patient ON Appointment.`patient-id` = Patient.`patient-id`
JOIN Doctor ON Appointment.`doctor-id` = Doctor.`doctor-id`
JOIN Tech ON Appointment.`tech-id` = Tech.`tech-id`
WHERE Appointment.`appointment-date` = $YESTERDAY
Edit: Let's give the example of Patient with a variable number of fields
Table Patient - contains data ALL patients will have
+------+------------+-------------------+-------
| ID   | Name       | Insurance Carrier | .. other fields
+------+------------+-------------------+-------
| 0001 | John Doe   | ABC Healthcare    |
| 0002 | Jane Doe   | ABC Healthcare    |
| 0003 | Jon Skeet  | C# Insurance Inc. |
| 0004 | Mark Byers | Gold Badge Health |
+------+------------+-------------------+-------
Table Patient-Form
+-----------+------------------+----------+---------------+
| Form-Name | Form-Field       | Required | Default-Value |
+-----------+------------------+----------+---------------+
| Vitals    | Blood Pressure   | TRUE     | null          |
| Vitals    | Pulse            | TRUE     | null          |
| Vitals    | Ear Temperature  | FALSE    | null          |
| Lab Work  | Lab Room         | TRUE     | Lab-001       |
| Lab Work  | Technician       | TRUE     | null          |
| Lab Work  | Insurance Covers | TRUE     | NO            |
| Payment   | Balance          | TRUE     | $0.00         |
| Payment   | Co-Pay           | FALSE    | 0.00%         |
| Payment   | Deductible       | FALSE    | $0.00         |
| Payment   | Payment Terms    | FALSE    | 30 Days Full  |
+-----------+------------------+----------+---------------+
Table Patient-Form-Field - contains data that may or may not be available for a patient
+------------+-----------+------------------+------------+
| Patient-ID | Form-Name | Form-Field       | Form-Value |
+------------+-----------+------------------+------------+
| 0001       | Vitals    | Blood Pressure   | 130 / 54   |
| 0001       | Vitals    | Pulse            | 84bpm      |
| 0001       | Vitals    | Ear Temperature  | 98.4F      |
| 0002       | Vitals    | Blood Pressure   | 126 / 74   |
| 0002       | Vitals    | Pulse            | 87bpm      |
| 0002       | Vitals    | Ear Temperature  | 99.0F      |
| 0003       | Lab Work  | Lab Room         | SO-Meta    |
| 0003       | Lab Work  | Technician       | Rose Smith |
| 0003       | Lab Work  | Insurance Covers | TRUE       |
| 0003       | Vitals    | Blood Pressure   | 190 / 100  |
| 0003       | Vitals    | Pulse            | 213bpm     |
+------------+-----------+------------------+------------+
You can now query it like this:
SELECT
    Patient.Name,
    `Patient-Form-Field`.`Form-Name`,
    `Patient-Form-Field`.`Form-Field`,
    `Patient-Form-Field`.`Form-Value`
FROM Patient
JOIN `Patient-Form-Field`
    ON `Patient-Form-Field`.`Patient-ID` = Patient.ID
    AND `Patient-Form-Field`.`Form-Name` IN ('Vitals', 'Lab Work')
WHERE Patient.ID IN ('0001', '0002', '0003')

Related

Best practices linking data in MySQL tables

For an online game, I have a table that contains all the plays, and some information on those plays, like the difficulty setting etc.:
+---------+---------+------------+------------+
| play-id | user-id | difficulty | timestamp |
+---------+---------+------------+------------+
| 1 | abc | easy | 1335939007 |
| 2 | def | medium | 1354833214 |
| 3 | abc | easy | 1354833875 |
| 4 | abc | medium | 1354833937 |
+---------+---------+------------+------------+
In another table, after the game has finished, I store some stats related to that specific game, like the score etc:
+---------+----------------+--------+
| play-id | type | value |
+---------+----------------+--------+
| 1 | score | 201487 |
| 1 | enemies_killed | 17 |
| 1 | gems_found | 4 |
| 2 | score | 110248 |
| 2 | enemies_killed | 12 |
| 2 | gems_found | 7 |
+---------+----------------+--------+
Now, I want to make a distribution graph so users can see which score percentile they are in. So I basically want the boundaries of the percentiles.
If this were based on individual scores, I could rank the scores and start from there, but it needs to be based on each user's high score. So mathematically, I would need to sort the high scores of all users, and then find the percentiles.
I'm not sure what the best approach is here.
On one hand, constructing an array that holds all the highscores seems like a performance heavy thing to do, because it needs to cycle through both tables and match the scores and the users (the first table holds around 10M rows).
On the other hand, making a separate table with the highscore of users would make things easier, but it feels like it's against the rules of avoiding data redundancy.
Another approach that came to mind was doing the performance heavy thing once a week and keep the result in a separate table, or doing the performance heavy stuff on only a (statistically relevant) subset of the data.
Or maybe I'm completely missing the point here and should use a completely different database setup?
What's the best practice here?
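To make the trade-off concrete, here is a minimal sketch in Python (SQLite standing in for MySQL) of the "heavy" computation: one aggregate query for the high score per user, followed by nearest-rank percentile boundaries. The table names plays and play_stats, the ts column name and the chosen percentiles are assumptions; the rows are the sample data from the question. The returned boundaries are what a periodically refreshed summary table would hold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Table and column names are assumed; the question does not name them.
    CREATE TABLE plays (play_id INTEGER PRIMARY KEY, user_id TEXT, difficulty TEXT, ts INTEGER);
    CREATE TABLE play_stats (play_id INTEGER, type TEXT, value INTEGER);
    INSERT INTO plays VALUES
        (1, 'abc', 'easy', 1335939007), (2, 'def', 'medium', 1354833214),
        (3, 'abc', 'easy', 1354833875), (4, 'abc', 'medium', 1354833937);
    INSERT INTO play_stats VALUES
        (1, 'score', 201487), (1, 'enemies_killed', 17), (1, 'gems_found', 4),
        (2, 'score', 110248), (2, 'enemies_killed', 12), (2, 'gems_found', 7);
""")


def high_scores(conn):
    """Return each user's best score, sorted ascending."""
    rows = conn.execute("""
        SELECT p.user_id, MAX(s.value) AS high_score
        FROM plays AS p
        JOIN play_stats AS s ON s.play_id = p.play_id
        WHERE s.type = 'score'
        GROUP BY p.user_id
        ORDER BY high_score
    """)
    return [score for _user, score in rows]


def percentile_boundaries(sorted_scores, percentiles=(25, 50, 75)):
    """Nearest-rank percentiles: the p-th boundary is the value at rank ceil(p * n / 100)."""
    n = len(sorted_scores)
    if n == 0:
        return {}
    boundaries = {}
    for p in percentiles:
        rank = max(1, (p * n + 99) // 100)  # integer ceiling of p * n / 100
        boundaries[p] = sorted_scores[rank - 1]
    return boundaries


print(percentile_boundaries(high_scores(conn)))
```

Only the handful of boundary values needs to be stored, so caching them is a derived summary rather than a second copy of the per-user data.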

Database structure to track item's lifetime events

I would like to be able to track the lifetime events of a certain item and to reconstruct its state at any time in the past for visualization purposes. "State" here means a snapshot of several parameters, e.g. location, temperature and being alive/dead. Raw parameter values are recorded/entered only "on change" and independently of each other.
How should I store the parameter change events to be able to reconstruct the state later?
I can think of two possible solutions:
Solution 1: "Snapshot" table
+----------+-------------+------+------+
| Location | Temperature | Dead | Time |
+----------+-------------+------+------+
| A | + | 0 | 001 |
+----------+-------------+------+------+
| A | - | 0 | 002 |
+----------+-------------+------+------+
| B | + | 0 | 005 |
+----------+-------------+------+------+
On parameter change the state itself is updated and stored. To get a state of an item at a certain point is as simple as fetching one row.
This is exactly what I need, except:
Redundant data, all parameters are recorded even if only one has changed at the time
Table has to be altered if attribute set changes in the future
Knowing when a certain parameter changed is impossible without row comparison
Solution 2: Recording events
The table stores individual parameter changes rather than a complete snapshot.
+----+-----------+------------+------+
| ID | EventType | EventValue | Time |
+----+-----------+------------+------+
| 1 | loc | A | 001 |
+----+-----------+------------+------+
| 2 | temp | + | 001 |
+----+-----------+------------+------+
| 3 | temp | - | 002 |
+----+-----------+------------+------+
| 4 | loc | B | 005 |
+----+-----------+------------+------+
| 5 | temp | + | 005 |
+----+-----------+------------+------+
While this solution is more flexible than the first, reconstructing the snapshot is problematic. For example, how can I efficiently find the temperature, location and viability at time 004 in as few DB queries as possible?
Are there other solutions for this problem?
(P.S. This is for a biology experiment web app using php+Doctrine2+MySQL)

Using your Solution 2 you can very easily get everything you need:
SELECT DISTINCT t1.eventType, t1.eventValue, t2.time
FROM `events` AS t1
JOIN (
    SELECT eventType, MAX(time) AS time
    FROM `events`
    WHERE time <= '004'
    GROUP BY eventType
) AS t2
    ON t1.eventType = t2.eventType
    AND t1.time = t2.time
This query returns every attribute value that was in effect at time 004, and you can see when each attribute was set.
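The "latest change per event type" idea above can be checked against the sample events from the question. This is a minimal sketch in Python with SQLite standing in for MySQL; the function name snapshot_at is made up for the example, and the table is assumed to be called events as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, eventType TEXT, eventValue TEXT, time TEXT);
    INSERT INTO events (id, eventType, eventValue, time) VALUES
        (1, 'loc', 'A', '001'), (2, 'temp', '+', '001'), (3, 'temp', '-', '002'),
        (4, 'loc', 'B', '005'), (5, 'temp', '+', '005');
""")


def snapshot_at(conn, when):
    """Return {eventType: (eventValue, time_set)} as it stood at time `when`."""
    rows = conn.execute("""
        SELECT e.eventType, e.eventValue, e.time
        FROM events AS e
        JOIN (
            SELECT eventType, MAX(time) AS time
            FROM events
            WHERE time <= ?
            GROUP BY eventType
        ) AS latest
          ON latest.eventType = e.eventType AND latest.time = e.time
    """, (when,))
    return {event_type: (value, time_set) for event_type, value, time_set in rows}


print(snapshot_at(conn, "004"))
```

A parameter that has never been recorded (such as dead in the sample data) is simply absent from the result, so the application has to supply its default. The comparison works here because the times are zero-padded strings of equal length; a numeric or datetime column would be the safer choice in a real schema.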

Your second solution is looking pretty solid. There are other ways to organize the data, such as a field-level revision table, which has a touch more structure than you currently have.
Using the second solution you could get a snapshot in one query with a sub-query. I assume this is something that "just needs to be done" and doesn't rely on the most efficient query.
-- Relies on MySQL's non-standard GROUP BY (rejected when ONLY_FULL_GROUP_BY is enabled);
-- which row is kept for each group is not guaranteed.
SELECT * FROM (
    SELECT * FROM event
    WHERE time <= '003'
    ORDER BY time DESC) AS temp
GROUP BY EventType;

Database design with undetermined data

Recently I have been planning a system that allows a user to customize and add to a web interface. The app could be compared to a quiz creating system. The problem I'm having is how to design a schema that will allow for "variable" numbers of additions to be made to the application.
The first option that I looked into was just creating an object for the additions, serializing it, and putting it in its own column. The content wouldn't be edited often, so writes would be minimal; reads, however, would be very frequent (caching could be used to cut them down).
The other option was using something other than MySQL or PostgreSQL, such as Cassandra. I've never used other databases before but would be interested in learning how to use them if they would improve the design of the system.
Any input on the subject would be appreciated.
Thank you.
*edit 29/3/14
Some information on the data being changed. For my idea above of using a serialized object, you could say that in the table I would store the name of the quiz, the number of points the quiz is worth and then a column called quiz data that would store the serialized object containing the information on the questions. So overall the object could look like this:
Questions(Array):{
[1](Object):Question{
Field-type(int):1
Field-title(string):"Whats your gender?"
Options(Array):{"Female", "Male"}
}
[2](Object):Question{
Field-type(int):2
Field-title(string):"Whats your name?"
}
}
The structure could vary, of course, but generally I would be storing an integer to determine the type of each field in the quiz, then a field to hold the label, and the options (if there are any) for that field.

In this scenario I would advise looking at MongoDB.
However if you want to work with MySQL you can think about the entity-attribute-value model in your design. The EAV model allows you to design for entries that contain a variable number of attributes.
edit
Following your update on the datatypes you would like to store, you could map your design as follows:
+-------------------------------------+
| QuizQuestions |
+----+---------+----------------------+
| id | type_id | question_txt |
+----+---------+----------------------+
| 1 | 1 | What's your gender? |
| 2 | 2 | What's your name? |
+----+---------+----------------------+
+-----------------------------------+
| QuestionTypes |
+----+--------------+---------------+
| id | attribute_id | description |
+----+--------------+---------------+
| 1 | 1 | Single select |
| 2 | 2 | Free text |
+----+--------------+---------------+
+----------------------------+
| QuestionValues |
+----+--------------+--------+
| id | question_id | value |
+----+--------------+--------+
| 1 | 1 | Male |
| 2 | 1 | Female |
+----+--------------+--------+
+-------------------------------+
| QuestionResponses |
+----+--------------+-----------+
| id | question_id | response |
+----+--------------+-----------+
| 1 | 1 | 1 |
| 2 | 2 | Fred |
+----+--------------+-----------+
This would then allow you to dynamically add various different questions (QuizQuestions), of different types (QuestionTypes), and then restrict them with different options (QuestionValues) and store those responses (QuestionResponses).
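As a rough check that this layout can be read back into the shape of the serialized object from the question, here is a minimal sketch in Python with SQLite standing in for MySQL. The table and column names follow the diagrams above; the function name load_quiz and the dictionary layout it returns are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE QuestionTypes (id INTEGER PRIMARY KEY, attribute_id INTEGER, description TEXT);
    CREATE TABLE QuizQuestions (id INTEGER PRIMARY KEY, type_id INTEGER, question_txt TEXT);
    CREATE TABLE QuestionValues (id INTEGER PRIMARY KEY, question_id INTEGER, value TEXT);
""")
conn.executemany("INSERT INTO QuestionTypes VALUES (?, ?, ?)",
                 [(1, 1, "Single select"), (2, 2, "Free text")])
conn.executemany("INSERT INTO QuizQuestions VALUES (?, ?, ?)",
                 [(1, 1, "What's your gender?"), (2, 2, "What's your name?")])
conn.executemany("INSERT INTO QuestionValues VALUES (?, ?, ?)",
                 [(1, 1, "Male"), (2, 1, "Female")])


def load_quiz(conn):
    """Read all questions with their type and options in a single query."""
    rows = conn.execute("""
        SELECT q.id, q.question_txt, t.description, v.value
        FROM QuizQuestions AS q
        JOIN QuestionTypes AS t ON t.id = q.type_id
        LEFT JOIN QuestionValues AS v ON v.question_id = q.id
        ORDER BY q.id, v.id
    """)
    quiz = {}
    for question_id, text, question_type, option in rows:
        question = quiz.setdefault(
            question_id, {"text": text, "type": question_type, "options": []})
        if option is not None:  # free-text questions have no option rows
            question["options"].append(option)
    return quiz


print(load_quiz(conn))
```

The LEFT JOIN keeps questions that have no predefined options, which an inner join on QuestionValues would drop.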

Database Schema suggestions please

I have a scenario and I'm confused about how to design the database schema for it.
In my software (PHP) there are companies and applications.
Companies need to have licenses to access applications.
The fields (on the form used when purchasing licenses) are different for each application.
For example:
For application1, the fields are:
no. of users
no. of groups
For application2:
no. of users
For application3:
number of hours of usage
Prices are based on these fields.
Now I need to design a schema for this so that a company can manage licenses for all applications on one page.
How can I make this schema generic?
Please help.
Thanks.

You can go with this type of structure
select * from applicationMaster
| APPID | APPNAME |
------------------------
| 1 | Application1 |
| 2 | Application2 |
applicationMaster holds the main application details that are not repeated, such as name, date, etc.
select * from applicationField
| FIELDID | APPID | FIELDNAME |
---------------------------------
| 1 | 1 | NoOfUsers |
| 2 | 1 | NoOfGroups |
| 3 | 2 | NoHourusage |
applicationField can hold any number of fields for a particular APPID.
So APPID 1 has two fields, NoOfUsers and NoOfGroups. New fields can also be added for a particular app later if you want.
applicationValue holds the values for every license application. COMPID identifies the company that applied, and FIELDID refers to the applicationField table, which tells us which app the values are stored for.
select * from applicationValue
| ID | COMPID | FIELDID | FIELDVALUE |
--------------------------------------
| 1 | 1 | 1 | 50 |
| 2 | 1 | 2 | 150 |
| 3 | 2 | 3 | 350 |
| 4 | 3 | 1 | 450 |
| 5 | 3 | 2 | 50 |
applicationPriceMaster stores the price packages for each application. There can be multiple packages for an application.
select * from applicationPriceMaster
| APPPACKAGE | APPID | TOTALPRICE |
-----------------------------------
| 1 | 1 | 50 |
| 2 | 1 | 100 |
The details of each application package are stored in this table.
select * from applicationPriceDetail
| APPPACKAGE | FIELDID | QUANT |
--------------------------------
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 2 | 1 | 10 |
| 2 | 2 | 1 |
NOTE: Please review this structure, as it is now quite complex, and check what types of queries you would be running on these tables and how they perform.
select apm.APPPACKAGE, TOTALPRICE from
applicationPriceMaster apm
inner join
(select APPPACKAGE from applicationPriceDetail
where FIELDID=1 and QUANT=1)a
on apm.APPPACKAGE = a.APPPACKAGE
inner join
(select APPPACKAGE from applicationPriceDetail
where FIELDID=2 and QUANT=1)b
on
a.APPPACKAGE=b.APPPACKAGE
Result:
| APPPACKAGE | TOTALPRICE |
---------------------------
| 1 | 50 |
For a single filter you have to use this query; in general, the number of inner queries has to grow with the number of filters.
select apm.APPPACKAGE, TOTALPRICE from
applicationPriceMaster apm
inner join
(select APPPACKAGE from applicationPriceDetail
where FIELDID=1 and QUANT=1)a
on apm.APPPACKAGE = a.APPPACKAGE
NOTE: This query is quite complex. It only works if the values match exactly what is stored in the applicationPriceDetail table, and the version with two inner joins only works for two filters; you have to remove one inner join if there is only one filter. So I suggest you reconsider before using this approach.
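One way to avoid adding an inner join per filter is to match all filters in a single pass and count the hits per package with GROUP BY ... HAVING. This is a different technique from the joins above, shown here only as a sketch in Python with SQLite standing in for MySQL. The function name matching_packages is made up, and the sketch assumes each (APPPACKAGE, FIELDID) pair appears at most once in applicationPriceDetail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applicationPriceMaster (APPPACKAGE INTEGER PRIMARY KEY, APPID INTEGER, TOTALPRICE INTEGER);
    CREATE TABLE applicationPriceDetail (APPPACKAGE INTEGER, FIELDID INTEGER, QUANT INTEGER);
    INSERT INTO applicationPriceMaster VALUES (1, 1, 50), (2, 1, 100);
    INSERT INTO applicationPriceDetail VALUES (1, 1, 1), (1, 2, 1), (2, 1, 10), (2, 2, 1);
""")


def matching_packages(conn, filters):
    """Return (APPPACKAGE, TOTALPRICE) for packages matching every filter.

    `filters` maps FIELDID -> QUANT. A package qualifies when the number of
    its detail rows that match a filter equals the number of filters.
    """
    if not filters:
        return []
    conditions = " OR ".join("(d.FIELDID = ? AND d.QUANT = ?)" for _ in filters)
    params = [value for pair in filters.items() for value in pair]
    sql = f"""
        SELECT apm.APPPACKAGE, apm.TOTALPRICE
        FROM applicationPriceMaster AS apm
        JOIN applicationPriceDetail AS d ON d.APPPACKAGE = apm.APPPACKAGE
        WHERE {conditions}
        GROUP BY apm.APPPACKAGE, apm.TOTALPRICE
        HAVING COUNT(*) = ?
        ORDER BY apm.APPPACKAGE
    """
    return conn.execute(sql, params + [len(filters)]).fetchall()


print(matching_packages(conn, {1: 1, 2: 1}))
```

The same function handles one, two or more filters, so the query text no longer has to be restructured when the number of fields changes.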

What you have there could easily be mapped to classes in an OO language (like PHP). You have an abstract License, and then three subclasses (ApplicationByUsersAndGroups, etc.). Mapping that to a relational database is a very common problem; here is a nice article about it: http://www.ibm.com/developerworks/library/ws-mapping-to-rdb/
It describes three options; which one you should use depends on how you want to structure your application. I recommend reading it; it is not that long.

One way is
Table LICENCES:
LICENSE_ID ==> UNIQUE IDENTIFIER
COMPANY_ID ==> references table COMPANIES
APPLICATION_ID ==> references table APPLICATIONS
LICENCE_TYPE ==> either of "BY_GROUPS_AND_USERS", "BY_USERS", "BY_HOURS"
LICENCE_BODY_ID ==> ID of specific body table
[...]
Table LIC_TYPE_BY_GROUPS_AND_USERS:
LICENCE_BODY_ID ==> body identifier
NO_GROUP
NO_USERS
[...]
Table LIC_TYPE_BY_USERS:
LICENCE_BODY_ID ==> body identifier
NO_USERS
[...]
This way, your intention is clear. Even when coming back to it after a long time, you will quickly see how things are organized and which fields are used in which case.

How about a table structured this way:
LicenseId int PK
CompanyId Int PK
AppId Int PK
LicenseType int
NumberOfUsers int
NumberOfGroups int
NumberOfHours int
Price Money
Depending on LicenseType, you will use different columns in your business logic.
You might need to add CompanyID and/or AppID; that depends on how you are going to structure those tables, as well as on the relationships between company, app and license.
Some questions to think about:
Can one company have different License Types for same App?
Can one company have different Apps?

Don't complicate things. If the number of users is unlimited, then set it to 999999 or some other max value.
This keeps the license check logic (which will run every time a user logs in) simple and the same for all applications.
You will need extra logic in the license maintenance application, but this should also be pretty simple:
if cost_per_user = 0 then set no_of_users = 999999
Again, you end up with the same licensing screen and logic for all your applications.
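The sentinel idea can be sketched in a few lines of Python. The function names and the rule that a login is allowed while the active user count is below no_of_users are assumptions for illustration; only the 999999 sentinel and the cost_per_user rule come from the answer above.

```python
UNLIMITED_USERS = 999999  # sentinel value suggested above for "unlimited"


def seats_for_license(cost_per_user, requested_users):
    """Maintenance-side rule: a per-user cost of zero means unlimited users."""
    return UNLIMITED_USERS if cost_per_user == 0 else requested_users


def may_log_in(active_users, no_of_users):
    """Login-side check: the same comparison for limited and unlimited licenses."""
    return active_users < no_of_users


no_of_users = seats_for_license(cost_per_user=0, requested_users=10)
print(may_log_in(active_users=50000, no_of_users=no_of_users))
```

The special case lives only in seats_for_license; the check that runs on every login stays a single comparison.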

How to pass or display mySQL data based on subscription or billing

I want to build a PHP-based site where the user can view data based on the types of data they've paid for.
Allow me to use something simple as an example.
Let's say historical data for basketball was not readily available but could be purchased.
Simple information such as the winner, loser, final score and date is all stored in a MySQL table.
What would be involved so that, when the user logs in, they can only see the historical data they have paid for?
My theories so far about the architecture:
I imagined a MySQL table storing true or false values for all the historical game data they have paid for. Based on this, a 'data chart' object enables the user to view all data within their MySQL row which has a value of 'true'.
Follow-ups:
Assuming I am correct, what methods are popular or practical for this type of service?

If we're going with the sporting paradigm...
Every game needs to have a unique ID. Then you have a table of customers, each also with a unique ID. Then you have a table that describes who paid for what: a table with two columns of IDs, customerID and gameID. This is what is called normalization.
So, in your joining table you might have customer ID 001, who has paid for games 001, 003, and 005. Here is the customer table:
.------------------------------.
| customer_id | customer_name  |
|------------------------------|
| 001         | SPM, Inc.      |
| 002         | Stack Overflow |
'------------------------------'
Here is the game table:
.-----------------------------.
| game_id | description       |
|-----------------------------|
| 001     | Giants v. Red Sox |
| 002     | Blah v. Yada      |
| 003     | Vader v. Kenobi   |
| 004     | Romney v. Obama   |
| 005     | Roth v. Hagar     |
'-----------------------------'
Here is the table that corresponds to who paid for what:
.-----------------------.
| customer_id | game_id |
|-----------------------|
| 001         | 001     |
| 001         | 003     |
| 001         | 005     |
| 002         | 002     |
| 002         | 005     |
'-----------------------'
Notice how the IDs in the last table are not unique.
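With these three tables, "what may this customer see" becomes a single join. Here is a minimal sketch in Python with SQLite standing in for MySQL; the table names customers, games and purchases and the function name games_for are assumptions, while the rows are the sample data above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id TEXT PRIMARY KEY, customer_name TEXT);
    CREATE TABLE games (game_id TEXT PRIMARY KEY, description TEXT);
    -- Each ID may repeat in this table, but a (customer, game) pair appears only once.
    CREATE TABLE purchases (
        customer_id TEXT REFERENCES customers(customer_id),
        game_id TEXT REFERENCES games(game_id),
        PRIMARY KEY (customer_id, game_id)
    );
    INSERT INTO customers VALUES ('001', 'SPM, Inc.'), ('002', 'Stack Overflow');
    INSERT INTO games VALUES
        ('001', 'Giants v. Red Sox'), ('002', 'Blah v. Yada'), ('003', 'Vader v. Kenobi'),
        ('004', 'Romney v. Obama'), ('005', 'Roth v. Hagar');
    INSERT INTO purchases VALUES
        ('001', '001'), ('001', '003'), ('001', '005'), ('002', '002'), ('002', '005');
""")


def games_for(conn, customer_id):
    """Return (game_id, description) for every game the customer has paid for."""
    rows = conn.execute("""
        SELECT g.game_id, g.description
        FROM purchases AS p
        JOIN games AS g ON g.game_id = p.game_id
        WHERE p.customer_id = ?
        ORDER BY g.game_id
    """, (customer_id,))
    return rows.fetchall()


print(games_for(conn, "001"))
```

The logged-in user's customer_id is the only input, so a game that was not purchased never leaves the database for that user.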
