A better logging design or some SQL magic?

I'm knee-deep in modifying some old logging code that I didn't write, and I'm wondering what you think of it. This is an event logger written in PHP with MySQL that logs messages like:
Sarah added a user, slick101
Mike deleted a user, slick101
Bob edited a service, Payment
Broken up like so:
Sarah [user_id] added a user [message], slick101 [reference_id, reference_table_name]
Into a table like this:
log
---
id
user_id
reference_id
reference_table_name
message
Please note that the "Bob" and "Payment" in the above example messages are IDs pointing to other tables, not the actual names. A join is needed to get the names.
It looks like reference_table_name is for finding the proper names in the correct table, since only the reference_id is stored. This would probably be good if I could somehow join on the table name that is stored in reference_table_name, like so:
select * from log l
join {{reference_table_name}} r on r.id = l.reference_id
I think I see where he was going with this table layout: it is much better to have IDs for statistics than to store the entire message in a single column (which would require text parsing). Now I'm wondering...
Is there a better way or is it possible to do the make-believe join somehow?
Cheers

To get the join based on the modelling, you'd be looking at a two stage process:
Get the table name from LOG for a particular message
Use dynamic SQL by constructing the actual query as a string. For example, in PHP:
"SELECT l.* FROM LOG l JOIN " . $tableName . " r ON r.id = l.reference_id"
There's not a lot of value to logged deletions because there's no record to join to in order to see what was deleted.
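A minimal sketch of the second stage, assuming PHP. A table name cannot be passed as a bound parameter, so the value read from reference_table_name is checked against a fixed allow-list before it is concatenated into the query. The function name buildReferenceJoin and the table names users and services are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical allow-list: the tables that reference_table_name may point at.
const ALLOWED_REFERENCE_TABLES = ['users', 'services'];

/**
 * Build the JOIN query for log rows that reference the given table.
 * Throws if the table name is not on the allow-list, so a value read
 * from the log table can never inject arbitrary SQL.
 */
function buildReferenceJoin(string $tableName): string
{
    if (!in_array($tableName, ALLOWED_REFERENCE_TABLES, true)) {
        throw new InvalidArgumentException("Unexpected reference table: $tableName");
    }
    // Backticks quote the identifier for MySQL. The WHERE clause keeps
    // log rows that point at other tables out of this join.
    return "SELECT l.*, r.* FROM log l JOIN `$tableName` r ON r.id = l.reference_id "
         . "WHERE l.reference_table_name = '$tableName'";
}

echo buildReferenceJoin('users'), PHP_EOL;
```

With this shape you would run one query per distinct reference_table_name found in the log and merge the results in PHP.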
How much history does the application need?
Do you need to know who did what to a value months/years in the past? If records are required, they should be archived & removed from the table. If you don't need all the history, consider using the following audit columns on each table:
ENTRY_USERID, NOT NULL
ENTRY_TIMESTAMP, DATE, NOT NULL
UPDATE_USERID, NOT NULL
UPDATE_TIMESTAMP, DATE, NOT NULL
These columns allow you to know who created the record and when, and who last successfully updated it and when. I'd create audit tables on a case-by-case basis; it just depends on what functionality the user needs.

Related

Checking if a user changed any of their data in multiple tables

In my database, I have several tables. One is a checkpoint table that makes note of a user choosing to finalize one of their projects. This table contains a timestamp that is automatically created. Whenever a user finalizes their project a new row is added to the checkpoint table (that way we can also keep a history of previous times the project was finalized).
I have several other tables with timestamps (or tables that I could add timestamp columns to) that automatically update when their tables change.
Is there a simple way to be able to tell if any of the other tables have updated their data since the project was last finalized? I do not need to know which tables have changed data just that there are tables that have changed data.
For example, if a user changes data in one of their tables I want to be able to display a message indicating that their project has unfinalized data.
There are a couple of ways that I have thought about doing this:
Checking every single table to see if any timestamps are newer than the latest timestamp in the checkpoint table.
Add an additional timestamp column (I already have a created and updated timestamp column) to the main project table. Most of the other tables are linked directly or indirectly to this main project table. Add triggers to every other table to update this timestamp when their data changes. I am not quite sure yet how to correctly set up a proper trigger for this.
Creating a new table with just the project_id and a timestamp column. Add a trigger to the other tables as shown in option 2.
As new modules are added, I will be adding more tables to the project so will need something that is easy to scale as well.
Each of these approaches seems like there would be a lot of steps involved.
Would one of these approaches be more efficient or viable than another? Is there another approach that I am not thinking about? If triggers are the best way to do this how would I go about setting up the trigger?
A simplified overview of my tables looks like this:
main_project_table
id
user_id (FK to user_table)
created_timestamp
updated_timestamp
checkpoint_group_table (users can choose which group to finalize their project to)
id
user_id (FK to user_table)
group_name
checkpoint_table (the table that records the finalized data and time of finalization)
id
checkpoint_group_id (FK to checkpoint_group_table)
project_id (FK to main_project_table)
project_finalized_timestamp
parent_table (several of these)
id
project_id (FK to main_project_table)
child_table (0 or more of these for each parent_table)
id
parent_id (FK to parent_table)

You really only have three solutions: Middleware, Triggers, and General Log File.
Middleware solution:
Add a timestamp field to each relevant table, with its default set to CURRENT_TIMESTAMP and with ON UPDATE CURRENT_TIMESTAMP. The ON UPDATE clause is what makes MySQL set the field to the current time on every update; the default alone only applies when a row is inserted.
Assuming that users are going through some API, you can write a JOIN query where it returns the latest time stamp. It would look like this.
SELECT
CASE
WHEN b.timestamp IS NOT NULL THEN 0
WHEN c.timestamp IS NOT NULL THEN 0
WHEN d.timestamp IS NOT NULL THEN 0
WHEN e.timestamp IS NOT NULL THEN 0
ELSE 1
END AS `test`
FROM checkpoint_table a
LEFT JOIN main_project_table b
ON a.project_id = b.id
AND b.timestamp > a.project_finalized_timestamp
LEFT JOIN checkpoint_group_table c
ON b.user_id = c.user_id
AND c.timestamp > a.project_finalized_timestamp
LEFT JOIN parent_table d
ON b.id = d.project_id
AND d.timestamp > a.project_finalized_timestamp
LEFT JOIN child_table e
ON d.id = e.parent_id
AND e.timestamp > a.project_finalized_timestamp
Now, when a request is routed to the tables, you can run this query, and if test == 0, you return the message.
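<?php
class middleware{
public function getMessage(){
// run the query above and load the result rows into $data
// test == 0 means at least one table changed after the project was finalized
if($data[0]['test'] == 0){
return "project has unfinalized data";
}else{
return null;
}
}
}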
Trigger Solution:
-- One trigger like this per table; shown here for parent_table.
CREATE TRIGGER parent_table_after_update
AFTER UPDATE ON parent_table
FOR EACH ROW
UPDATE main_project_table
SET updated_timestamp = NOW()
WHERE id = NEW.project_id;
The advantage of this is that it is perhaps more elegant than the middleware solution. The disadvantage is that triggers are not in plain view, and in my experience, processes that run in the background are eventually forgotten. In the long term, you could be left with a Jenga puzzle, which would make life difficult.
General Log File Solution:
MySQL can log every query on the server. It is possible to read this log file, parse it, and figure out whether any tables were updated after the project was finalized.
Turn on a general log file.
SET GLOBAL general_log = 'ON';
Set the path of the log file.
SET GLOBAL general_log_file = '/var/log/mysql/mysql_general.log';
Confirm by going to the command terminal.
mysql -se "SHOW VARIABLES" | grep -e general_log
You might need to restart MySQL.
sudo service mysql restart
This script can get you started...
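// Read the general log (cat rather than less, since less is an interactive pager).
$v = shell_exec("sudo cat /var/log/mysql/mysql_general.log");
$lines = explode("\n",$v);
$new = array();
$l = null;
foreach($lines as $i => $line){
// A line that does not start with a space begins a new log entry.
if(substr($line,0,1) != " "){
if(isset($l)){
array_push($new,$l);
}
$l = $line;
}else{
// Continuation line: collapse whitespace and append to the current entry.
$l.= preg_replace('/\s+/', ' ', $line);
}
}
// Push the final entry, which the loop itself never flushes.
if(isset($l)){
array_push($new,$l);
}
$lines = $new;
$index = array();
// Split each entry into its tab-separated fields.
foreach($lines as $i => $line){
$e = explode("\t",$line);
$new = array();
foreach($e as $key => $value){
$new[$key] = trim($value);
}
$index[$i] = $new;
}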
This will result in this...
array(3) {
[0]=> string(27) "2017-10-01T08:17:04.659274Z"
[1]=> string(8) "70 Query"
[2]=> string(129) "UPDATE checkpoint_group_table SET group_name = 'Dev Group' Where id=6"
}
From here you can use a library called PHP-SQL-Parser to parse out the query.
The advantage of this approach is that it might scale well, since you will not have to add any columns to your database. The disadvantage is that it involves more code, and that means more complexity. You probably cannot rely on this solution without writing unit tests for it.

If I were in your situation, I would create a table with a project_id (FK) field and a boolean is_finalized field. Every time a project is finalized, I would add an entry to it.
+-----------------+--------------+
| project_id | is_finalized |
|-----------------|--------------|
| 12 | 1 |
+-----------------+--------------+
Before any update or insert, check whether this key exists for the project. If it exists, change it to 0. When loading the file, check whether the value is 0; if it is, show the message: project has unfinalized data.
The message should be shown only if the key exists and the value is 0. If the project has never been finalized, the table won't have a row for it, hence no message.
This approach is quite easy, faster to process than checking each timestamp, and extensible, since it depends only on the update and insert queries, which you can reuse in your upcoming modules.

Comparing timestamps across multiple tables could get messy.
...I do not need to know which tables have changed data just that there are tables that have changed data...
Run a join query to generate one data set, JSON-encode or serialize it, then MD5 it, and keep this hashed string in the database. Next time, compute the hash again and compare: if there is ANY difference, the data set has changed. This is the general idea behind comparing large data sets or files, for example in code repositories.
But in light of...
..more tables to the project..
Then just use MD5 on each data row in the table. Once a row changes, its hashed string will be different.
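A minimal sketch of the hashing idea, assuming the join result has already been fetched into a PHP array of associative rows. The function name datasetFingerprint and the sample rows are hypothetical. The query should use a fixed ORDER BY, because a different row order produces a different hash.

```php
<?php
/**
 * Return an MD5 fingerprint of a result set (an array of associative rows).
 * Keys within each row are sorted so column order does not affect the hash.
 * Row order still matters, so the query should use a stable ORDER BY.
 */
function datasetFingerprint(array $rows): string
{
    foreach ($rows as &$row) {
        ksort($row);
    }
    unset($row); // break the reference left behind by the loop
    return md5(json_encode($rows));
}

// Hypothetical rows standing in for the result of the join query.
$atFinalize = [['id' => 1, 'name' => 'Parent A'], ['id' => 2, 'name' => 'Parent B']];
$now        = [['id' => 1, 'name' => 'Parent A'], ['id' => 2, 'name' => 'Parent B (edited)']];

$stored  = datasetFingerprint($atFinalize);       // saved alongside the checkpoint
$changed = datasetFingerprint($now) !== $stored;  // recomputed on a later request
var_dump($changed);
```

The trade-off is that every check re-reads the whole data set, so this suits small projects better than large ones.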

Plan A: An off-the-wall solution:
Set up Master-Slave. The Slave will contain an 'old' copy of the data.
Establish "delayed" replication. Let's say 1 hour.
Get pt-table-checksum; run it twice an hour.
That will discover changes within an hour. (The timings may need tweaking if data size is quite large or small.)
Plan B
Deny all direct access from actual humans. Instead, build an application that handles all normal accesses through some API. Then I would instrument the API to collect whatever I choose.
Ad Hoc queries (for which there is no API):
Perhaps disallow them
Perhaps have a review board (me) to approve running them.
Perhaps have an API that runs the query, but immediately logs/emails/rings bells/whatever.

Really not sure why these answers are suggesting reliance on IDs or complex data-logging, this is a fairly common problem with some very simple solutions.
Use those parent/child relationships
Note: when documenting a schema, it is important to note not just the FK relationships but also the type of relationship (one-to-one, many-to-one, one-to-many, many-to-many).
You already have fairly well-defined parent/child relationships, which I assume to be:
main_project one<--many parent one<--many child
Use them one of two ways:
Update a date for parent and main_project which stores the most recent date any child was modified.
Use a combination of join/max/modified in a query utilizing main_project, parent, and child.
child_updated date
main_project.child_updated
parent.child_updated
Whenever updating any child, also update the child_updated dates for main_project and parent. Similarly, when updating a parent, update main_project. This can be done with triggers, PHP, or some clever use of joins or views that act as the main_project objects. I would highly advise sticking to doing this with PHP models of those tables.
join/max/modified
Just build a query to get you four values, then check them:
checkpoint_table.main_project_finalized
main_project.modified
MAX(parent.modified)
MAX(child.modified)
These joins can get a bit tricky, so you'll have to play with this a bit.
SELECT m.modified AS modified, MAX(cp.project_finalized_timestamp) AS finalized, MAX(p.modified) AS parent_modified, MAX(c.modified) AS child_modified
FROM main_project_table m
LEFT JOIN checkpoint_table cp
ON m.id = cp.project_id
LEFT JOIN parent_table p
ON m.id = p.project_id
LEFT JOIN child_table c
ON p.id = c.parent_id
GROUP BY m.id
This will give you ONE row of all the dates you care about, allowing you to create some simple logic for it in PHP.
// $result holds the row retrieved by the joined query above
if ($result['finalized'] < max($result['modified'], $result['parent_modified'], $result['child_modified'])) {
// changed
}
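A slightly more defensive version of that check, as a sketch. It assumes the four values come back as 'Y-m-d H:i:s' strings or NULL (NULL when a project has no checkpoint, parents, or children), and it treats a project that was never finalized as having nothing to report. Both of those are assumptions, and the function name hasUnfinalizedChanges is hypothetical.

```php
<?php
/**
 * Decide whether a project has changes newer than its last checkpoint.
 * $row is one row of the join/max query with the keys 'finalized',
 * 'modified', 'parent_modified' and 'child_modified', each holding a
 * 'Y-m-d H:i:s' string or null.
 */
function hasUnfinalizedChanges(array $row): bool
{
    if ($row['finalized'] === null) {
        return false; // assumption: never finalized means no message
    }
    // Drop NULLs (no parent or child rows), then take the latest time.
    $modified = array_filter(
        [$row['modified'], $row['parent_modified'], $row['child_modified']],
        fn($v) => $v !== null
    );
    if ($modified === []) {
        return false;
    }
    // 'Y-m-d H:i:s' strings sort chronologically, so string comparison works.
    return max($modified) > $row['finalized'];
}

// Hypothetical row: a child was modified one hour after finalization.
var_dump(hasUnfinalizedChanges([
    'finalized'       => '2017-10-01 08:00:00',
    'modified'        => '2017-09-30 10:00:00',
    'parent_modified' => null,
    'child_modified'  => '2017-10-01 09:00:00',
]));
```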

There are some good solutions mentioned so far. Another one is to make use of MySQL's information schema. Doing this, you can for example select all tables that have a timestamp field with the name you know, and check their modification times. This is probably the most dynamic and seamless approach but not really the best one. I would typically only do something like this if I was building an interface on top of legacy or third party code and didn't have control of that part of the application.
Architecturally, I think the best approach is to have your application aware of the pertinent tables and fields and audit them. I am assuming that the data is related to the object in question, so although these are foreign tables, they can still be easily checked for modifications.
Another good idea would be to add versioning to all of the tables in question so that during this step in your application you can show what changed.

Data from MYSQL Database straight to Input fields, depending on what is written in first input field

I'm looking for guidance/assistance on PHP, MySQL, and HTML. Previously, with all the code I've been given, I've been very hack-and-slashy: as long as it works that's great, and if problems arise, I deal with them as they appear.
I have a form - http://jsfiddle.net/Yrdit/yQzsh/ - on an internal project. I have 1 database that contains numerous tables, but only 2 should matter for what I need.
In my Form I have already figured out how to populate from 1 table the username and password.
In the two tables, there is 1 column that I want to be linked. - In Table 1 it is called company_name in table 2 it is called - user_company.
Table 1 handles the companies phone number, address etc.
Table 2 handles the username / password / name.
In my Fiddle I want the Phone/Address/City/Zip/Country fields to be filled in depending on what is in the Company one.
Say if in my table my company is called CompanyABC, and has its details completed in the database Table 1. I want those values put into the fields I listed above.
$Table_1 = $db->get_row("SELECT user_login,user_password,user_name,user_email FROM Table_1 WHERE (user_id = $url_user_id) limit 1;");
Above is the part of code used as a request to get the login/ name/ password. Can anyone in a similar format guide me through how I can do what I've spoken about above please?
Apologies in advance if the formatting is wrong, I read the rules but might've missed something about formatting.

I'm not sure I understand your question correctly, but I'm assuming you wish to pull company info from table 1 and user info from table 2. If so, you can use a join, assuming you have some sort of identifier field shared between the two, e.g. company_id. Company names could work, but this would not be a good approach: company names can be misspelled, can differ in upper/lower case, and index less efficiently. For the sake of the example, let's join the two tables on company name.
"SELECT t1.user_login,t1.user_password,t1.user_name,t1.user_email FROM Table_1 t1 LEFT JOIN Table_2 t2 ON t2.user_company = t1.company_name WHERE t1.user_id = $url_user_id LIMIT 1"
Hopefully that is the answer you were looking for.

Insert Array into MYSQL field

For a forum, I want to enable the users to send messages to each other.
In order to do this, I made a table called Contacts. Within this table I have 5 columns: the user_id, a column for storing Friends, one for storing Family, one for storing Business and one for other contacts. These last four should each contain an array, which holds the user_ids of that type of contact. The reason I chose this design is that I don't want to type an awful lot or limit the users on the number of friends, like friend1, friend2 etc.
My question is: Is this the correct way to do it? If not, what should be improved? And what type of MySQL field should Friends, Family, Business and Other be?

What you should do instead of that is have a map table between your Contacts table and any related tables (User, Friends, Family, Business). The purpose would purely be to create a link between your Contact and your User(s) etc, without having to do what you're talking about and use arrays compacted into a varchar etc field.
Structured data approach gives you a much more flexible application.
E.g. UserContacts table purely contains its own primary key (id), a foreign key for Users and a foreign key for Contacts. You do this for each type, allowing you to easily insert, or modify maps between any number of users and contacts whenever you like without potentially damaging other data - and without complicated logic to break up something like this: 1,2,3,4,5 or 1|2|3|4|5:
id, user_id, contact_id
So then when you come to use this structure, you'll do something like this:
SELECT
Contacts.*
-- , Users.* -- if you want the user information
FROM UserContacts
LEFT JOIN Contacts ON (UserContacts.contact_id = Contacts.id)
LEFT JOIN Users ON (Users.id = UserContacts.user_id)

Use the serialize() and unserialize() functions.
See this question on how to store an array in MySQL:
Save PHP array to MySQL?
However, it's not recommended that you do this. I would make a separate table that stores all the 'connections' between two users. For example, if John adds Ali, there would be a record dedicated to Ali and John. To find the friends of a user, simply query the records that have that user in them. But that's my personal way of doing things.
I recommend that you query the user's friends using PHP/MySQL whenever you need them. This could save a considerable amount of space and should not cost much in speed.

serialize the array before storing and unserialize after retrieving.
$friends_for_db = serialize($friends_array);
// store $friends_for_db into db
And for retrieving:
// read $friends_for_db from db
$friends_array = unserialize($friends_for_db);
However, it would be wiser to follow the other answers about setting up an appropriate many-to-many design.
Nevertheless, I have needed this kind of design for minor situations in which a complete solution was not necessary (e.g. easily storing and retrieving a multi-select list value that I'll never query or use other than to display it to the user).

Too relation or not to relation ? A MySQL, PHP database workflow

I'm kind of new to MySQL, and I'm trying to create a fairly complex database and need some help.
My db structure
Tables(columns)
1.patients (Id,name,dob,etc....)
2.visits (Id,doctor,clinic,Patient_id,etc....)
3.prescription (Id,visit_id,drug_name,dose,tdi,etc....)
4.payments (id,doctor_id,clinic_id,patient_id,amount,etc...) etc..
I have about 9 tables; in all of them the primary key is 'id' and it's set to auto-increment.
I don't use relations in my db (because I don't know if it would be better or not, and I never got really deep into MySQL, so I just use PHP to run queries to fetch info from one table and use that to run another query to get more info, store it, etc.).
for example:
if i want to view all drugs i gave to one of my patients, for example his id is :100
1-click patient name (name link generated from (tbl:patients,column:id))
2-search tbl visits WHERE patient_id=='100' ; ---> that return all his visits ($x array)
3-loop prescription tbl searching for drugs with matching visit_id with $x (loop array).
4- return all rows found.
As my database expands more and more (1k+ records in the visits table), 1 patient can have more than 40 visits; that's 40 loops into the prescription table to get all his previous prescriptions.
So I came up with a small tweak where I edited my db so that patient_id and visit_id are columns in nearly all tables, so I can combine steps 2 and 3 into one step (search prescription tbl WHERE patient_id=100), but that left me with so many duplicates in my db, and I feel it's a kind of stupid way to do it!
Should I start considering using a relational database?
If so, can someone explain a bit how this will ease my life?
Can I do this redesign by altering the current tables, or must I recreate all tables?
thank you very much

Yes, you should exploit MySQL's relational database capabilities. They will make your life much easier as this project scales up.
Actually you're already using them well. You've discovered that patients can have zero or more visits, for example. What you need to do now is learn to use JOIN queries to MySQL.
Once you know how to use JOIN, you may want to declare some foreign keys and other database constraints. But your system will work OK without them.
You have already decided to denormalize your database by including both patient_id and visit_id in nearly all tables. Denormalization is the adding of data that's formally redundant to various tables. It's usually done for performance reasons. This may or may not be a wise decision as your system scales up. But I think you can trust your instinct about the need for the denormalization you have chosen. Read up on "database normalization" to get some background.
One little bit of advice: Don't use columns named simply "id". Name columns the same in every table. For example, use patients.patient_id, visits.patient_id, and so forth. This is because there are a bunch of automated software engineering tools that help you understand the relationships in your database. If your ID columns are named consistently these tools work better.
So, here's an example about how to do the steps numbered 2 and 3 in your question with a single JOIN query.
SELECT p.patient_id, p.name, v.visit_id, rx.drug_name, rx.drug_dose
FROM patients AS p
LEFT JOIN visits AS v ON p.patient_id = v.patient_id
LEFT JOIN prescription AS rx ON v.visit_id = rx.visit_id
WHERE p.patient_id = '100'
ORDER BY p.patient_id, v.visit_id, rx.prescription_id
Like all SQL queries, this returns a virtual table of rows and columns. In this case each row of your virtual table has patient, visit, and drug data. I used LEFT JOIN in this example. That means that a patient with no visits will have a row with NULL data in it. If you specify a plain JOIN (an inner join), MySQL will omit those patients from the virtual table.
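On the PHP side, the flat rows of such a JOIN usually need regrouping, since the patient and visit columns repeat on every drug row. A minimal sketch, assuming the rows were fetched as associative arrays with the column names from the query above; the function name groupDrugsByVisit and the sample rows are hypothetical.

```php
<?php
/**
 * Group the flat rows of the JOIN query into one entry per visit.
 * Rows with a NULL visit_id (a patient without visits) are skipped, and
 * a NULL drug_name (a visit without prescriptions) leaves the list empty.
 */
function groupDrugsByVisit(array $rows): array
{
    $visits = [];
    foreach ($rows as $row) {
        if ($row['visit_id'] === null) {
            continue;
        }
        $visits[$row['visit_id']] ??= [];
        if ($row['drug_name'] !== null) {
            $visits[$row['visit_id']][] = $row['drug_name'];
        }
    }
    return $visits;
}

// Hypothetical result rows for patient 100: two drugs on visit 7, none on visit 8.
$rows = [
    ['patient_id' => 100, 'visit_id' => 7, 'drug_name' => 'Amoxicillin'],
    ['patient_id' => 100, 'visit_id' => 7, 'drug_name' => 'Ibuprofen'],
    ['patient_id' => 100, 'visit_id' => 8, 'drug_name' => null],
];
print_r(groupDrugsByVisit($rows));
```

This replaces the loop of one query per visit with a single query plus one pass over its rows.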

Multiple joins in database

This situation is pretty difficult to explain, but I'll do my best.
For school, we have to create a web application (written in PHP) which allows teachers to manage their students' projects and allows the students to do peer evaluations. As there are many students, every project has multiple project groups (and of course you should only peer-evaluate your own group members).
My database structure looks like this at the moment:
Table users: contains all user info (user_id is primary)
Table: projects: Contains a project_id, a name, a description and a start date.
So far this is pretty easy. But now it gets more difficult.
Table groups: Contains a group_id, a groupname and as a group is specific for a project, it also holds a project_id.
Table groupmembers: A group contains multiple users, but users can be in multiple groups (as they can be active in multiple projects). So this table contains a user_id and a group_id to link these.
At last, admins can decide when users need to do their peer-evaluation and how much time they have for it. So there is a last table evaluations containing an evaluation_id, a start and end date and a project_id (the actual evaluations are stored in a sixth table, which is not relevant for now).
I think this is a good design, but it gets harder when I actually have to use this data. I would like to show a list of evaluations you still have to fill in. The only thing you know is your user_id as this is stored in the session.
So this would have to be done:
1) Run a query on groupmembers to see in which groups the user is.
2) With this result, run a query on groups to see to which projects these groups are related.
3) Now that we know what projects the user is in, the evaluations table should be queried to see if there are ongoing evaluations for these projects.
4) We now know which evaluations are available, but now we also need to check the sixth table to see if the user has already completed this evaluation.
All these steps depend on each other's results, so they should all contain their own error handling. Once the user has chosen the evaluation they wish to fill in (an evaluationID will be sent via GET), a lot of new queries will have to be run to check which users this member has in their group and will have to evaluate, and another check to see which other group members have already been evaluated.
As you see, this is quite complex. With all the errorhandling included, my script will be a real mess. Someone told me a "view" might help in this situation, but I don't really understand why this would help me here.
Is there a good way to do this?
Thank you very much!

You are thinking too procedurally.
All your conditions should fit easily into the WHERE clause of a single SQL statement.
You will end up with a single list of the items to be evaluated: only one list, only one set of error handling.

Not sure if this is exactly right, but try this basic approach. I didn't run this against an actual database so the syntax may need to be tweaked.
select p.project_name
from projects p inner join evaluations e on p.project_id = e.project_id
where p.project_id in (
select project_id
from projects p inner join groups g on p.project_id = g.project_id
inner join groupmembers gm on gm.group_id = g.group_id
where gm.user_id = $_SESSION['user_id'])
Also, you'll need to make sure that you properly escape your user_id when making it a part of the query, but that is a whole other topic.
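One way to sidestep manual escaping is a PDO prepared statement with a bound parameter. The sketch below makes several assumptions: it uses PDO, it uses the column name `name` from the question's description of the projects table, and the demo runs against an in-memory SQLite database only so that it is self-contained (the same PDO calls work with a MySQL DSN). The function name projectsWithEvaluations and the sample data are hypothetical. Note that `groups` is quoted with backticks because GROUPS is a reserved word in MySQL 8.0.

```php
<?php
/**
 * List the names of projects that have an evaluation and in which the
 * given user is a group member. The user id is bound as a parameter,
 * so it is never concatenated into the SQL string.
 */
function projectsWithEvaluations(PDO $pdo, int $userId): array
{
    $sql = 'SELECT DISTINCT p.name
            FROM projects p
            INNER JOIN evaluations e ON e.project_id = p.project_id
            INNER JOIN `groups` g ON g.project_id = p.project_id
            INNER JOIN groupmembers gm ON gm.group_id = g.group_id
            WHERE gm.user_id = :user_id';
    $stmt = $pdo->prepare($sql);
    $stmt->execute([':user_id' => $userId]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// Demo setup (assumption: in-memory SQLite stands in for the MySQL database).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE projects (project_id INTEGER PRIMARY KEY, name TEXT)');
$pdo->exec('CREATE TABLE evaluations (evaluation_id INTEGER PRIMARY KEY, project_id INTEGER)');
$pdo->exec('CREATE TABLE `groups` (group_id INTEGER PRIMARY KEY, project_id INTEGER)');
$pdo->exec('CREATE TABLE groupmembers (user_id INTEGER, group_id INTEGER)');
$pdo->exec("INSERT INTO projects VALUES (1, 'Website'), (2, 'Robot')");
$pdo->exec('INSERT INTO evaluations VALUES (1, 1)');
$pdo->exec('INSERT INTO `groups` VALUES (10, 1), (20, 2)');
$pdo->exec('INSERT INTO groupmembers VALUES (5, 10), (5, 20)');

// User 5 is in both projects, but only project 1 has an evaluation.
print_r(projectsWithEvaluations($pdo, 5));
```

In the real application the second argument would be (int) $_SESSION['user_id'].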
