Tracking data changes

I work on a market research database centric website, developed in PHP and MySQL.
It consists of two big parts – one in which users insert and update their own data (let's say one table T with a user_id field) and another in which a website administrator can insert new or update existing records (same table).
Obviously, in some cases end users will have their data overridden by the administrator while in other cases, administrator entered data is updated by end users (it is fine both ways).
The requirement is to highlight the view/edit forms with (let’s say) blue if end user was the last to update a certain field or red if the administrator is to “blame”.
I am looking into an efficient and consistent method to implement this.
So far, I have the following options:
For each field in table T, add a companion column ( char(1) ) in which to write ‘U’ if the end user inserted/updated the field or ‘A’ if the administrator did so. When the view/edit form is rendered, use this information to highlight each field accordingly.
Create a new table H storing an edit history containing something like user_id, field_name, last_update_user_id. Keep table H up-to-date when fields are updated in main table T. When the view/edit form is rendered, use this information to highlight each form field accordingly.
What are the pros/cons of these options; can you suggest others?

I suppose it just depends how forward-looking you want to be.
Your first approach has the advantage of being very simple to implement, is very straightforward to update and utilize, and also will only increase your storage requirements very slightly, but it's also the extreme minimum in terms of the amount of information you're storing.
If you go with the second approach and store a more complete history, if you need to add an "edit history" in the future, you'll already have things set up for that, and a lot of data waiting around. But if you end up never needing this data, it's a bit of a waste.
Or if you want the best of both worlds, you could combine them. Keep a full edit history but also update the single-character flag in the main record. That way you don't have to do any processing of the history to find the most recent edit, just look at the flag. But if you ever do need the full history, it's available.
Personally, I prefer keeping more information than I think I'll need at the time. Storage space is very cheap, and you never know when it's going to come in handy. I'd probably go even further than what you proposed, and also make it so the edit history keeps track of what they changed, and the before/after values. That can be very handy for debugging, and could be useful in the future depending on the project's exact needs.
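For what it's worth, the combined approach (flag in the main record plus a history row with before/after values) could be sketched roughly like this. It assumes PDO, and uses an in-memory SQLite database only so that it runs standalone; on MySQL the DSN and column types would differ. The table names `t` and `t_history`, the columns `phone` and `phone_by`, and the helper `saveField()` are all made up for illustration.

```php
<?php
// Sketch only: in-memory SQLite stands in for MySQL; every name here is hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE t (user_id INTEGER PRIMARY KEY, phone TEXT, phone_by CHAR(1))");
$db->exec("CREATE TABLE t_history (
    user_id INTEGER, field_name TEXT, old_value TEXT, new_value TEXT,
    edited_by CHAR(1), edited_at TEXT)");
$db->exec("INSERT INTO t (user_id, phone, phone_by) VALUES (7, '555-0100', 'U')");

// Update one field, flag who edited it ('U' or 'A'), and log the before/after values.
function saveField(PDO $db, int $userId, string $field, string $value, string $role): void
{
    $allowed = ['phone']; // whitelist, because column names cannot be bound as parameters
    if (!in_array($field, $allowed, true) || !in_array($role, ['U', 'A'], true)) {
        throw new InvalidArgumentException('unknown field or role');
    }
    $db->beginTransaction();
    try {
        $stmt = $db->prepare("SELECT $field FROM t WHERE user_id = ?");
        $stmt->execute([$userId]);
        $old = $stmt->fetchColumn();

        $db->prepare("UPDATE t SET $field = ?, {$field}_by = ? WHERE user_id = ?")
           ->execute([$value, $role, $userId]);
        $db->prepare("INSERT INTO t_history VALUES (?, ?, ?, ?, ?, ?)")
           ->execute([$userId, $field, $old, $value, $role, date('c')]);
        $db->commit();
    } catch (Throwable $e) {
        $db->rollBack(); // flag and history stay in sync: both are written or neither
        throw $e;
    }
}

saveField($db, 7, 'phone', '555-0199', 'A');

// Rendering the form only needs the flag, not the history.
$flag = $db->query("SELECT phone_by FROM t WHERE user_id = 7")->fetchColumn();
$cssClass = $flag === 'A' ? 'edited-by-admin' : 'edited-by-user'; // red vs. blue
```

The form code never touches `t_history`; that table is only read if a full edit history is ever needed.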

Yes, implement an audit table that holds copies of the historical data, by/from whom &c. I work on a system currently that keeps it simple and writes the value changes as simple name-value string pairs along with date and by whom. It requires mandatory master record adjustment, but works well for tracking. You could implement this easily with a trigger.

The best way to audit data changes is through a trigger on the database table. In your case you may want to just update the last person to make the change. Or you may want a full auditing solution where you store the previous values making it easy to restore them if they were made in error. But the key to this is to do this on the database and not through the application. Database changes are often made through sources other than the application and you will want to know if this happened as well. Suppose someone hacked into the database and updated the data, wouldn't you like to be able to find the old data easily or know who did it even if he or she did it through a query window and not through the application? You might also need to know if the data was changed through a data import if you ever have to get large amounts of data at one time.

Related

What do you think of this approach for logging changes in mysql and have some kind of audit trail

I've been reading through several topics now and did some research about logging changes to a mysql table. First let me explain my situation:
I've a ticket system with a table: 'ticket'
As of now I've created triggers which will enter a duplicate entry in my table: 'ticket_history' which has "action" "user" and "timestamp" as additional columns. After some weeks and testing I'm somewhat not happy with that build since every change is creating a full copy of my row in the history table. I do understand that disk space is cheap and I should not worry about it but in order to retrieve some kind of log or nice looking history for the user is painful, at least for me. Also with the trigger I've written I get a new row in the history even if there is no change. But this is just a design flaw of my trigger!
Here my trigger:
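CREATE TRIGGER ticket_before_update  -- placeholder name; the pasted snippet started at the next line
BEFORE UPDATE ON ticket FOR EACH ROW
BEGIN
    -- copies the whole row on every UPDATE, even when no value changed
    INSERT INTO ticket_history
    SET
        idticket = NEW.idticket,
        time_arrival = NEW.time_arrival,
        idticket_status = NEW.idticket_status,
        tmp_user = NEW.tmp_user,
        action = 'update',
        timestamp = NOW();
END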
My new approach in order to avoid having triggers
After spending some time on this topic I came up with an approach I would like to discuss and implement. But first I have some questions about it:
My idea is to create a new table:
id  sql_fwd    sql_bwd    keys    values  user  timestamp
--  ---------  ---------  ------  ------  ----  ---------
1   UPDATE...  UPDATE...  status  5       14    12345678
2   UPDATE...  UPDATE...  status  4       7     12345678
The flow would look like this in my mind:
At first I would select something or more from the DB:
SELECT keys FROM ticket;
Then I display the data in 2 input fields:
<input name="key" value="value" />
<input type="hidden" name="key" value="value" />
Hit submit and give it to my function:
I would start with a SELECT again: SELECT * FROM ticket;
and make sure that the hidden input field == the value from the latest select. If so I can proceed and know that no other user has changed something in the meanwhile. If the hidden field does not match I bring the user back to the form and display a message.
Next I would build the SQL Queries for the action and also the query to undo those changes.
$sql_fwd = "UPDATE ticket
SET idticket_status = 1
WHERE idticket = '".$c_get['id']."';";
$sql_bwd = "UPDATE ticket
SET idticket_status = 0
WHERE idticket = '".$c_get['id']."';";
Having that I run the UPDATE on ticket and insert a new entry in my new table for logging.
With that I can try to catch possible overwrites while two users are editing the same ticket at the same time, and for my history I could simply look up the keys and values and generate some kind of list. Also, having the SQL_BWD, I can simply undo changes.
My questions to that would be:
Would it be noticeable doing an additional select every time I want to update something?
Do I lose some benefits I would have with triggers?
Are there any big disadvantages?
Are there any functions on my mysql server or with php which already do something like that?
Or might there be a much easier way to do something like that?
Would a slight change to the trigger I have now already be enough?
If I understand this right, MySQL only performs an update if the value has changed, but the trigger is executed anyway, right?
If I'm able to change the trigger, can I still somehow prevent the overwriting of data on the mysql server while 2 users try to edit the ticket at the same time, or would I do this with PHP anyway?
Thank you for the help already

Another approach...
When a worker starts to make a change...
Store the time and worker_id in the row.
Proceed to do the tasks.
When the worker finishes, fetch the last worker_id that touched the record; if it is himself, all is well. Clear the time and worker_id.
If, on the other hand, another worker slips in, then some resolution is needed. This gets into your concept that some things can proceed in parallel.
Comments could be added to a different table, hence no conflict.
Changing the priority may not be an issue by itself.
Other things may be messier.
It may be better to have another table for the time & worker_ids (& ticket_id). This would allow for flagging that multiple workers are currently touching a single record.
As for History versus Current, I (usually) like to have 2 tables:
History -- blow-by-blow list of what changes were made, when, and by whom. This table is only INSERTed into.
Current -- the current status of the ticket. This table is mostly UPDATEd.
Also, I prefer to write the History directly from the "database layer" of the app, not via Triggers. This gives me much better control over the details of what goes into each table and when. Plus the 'transactions' are clear. This gives me confidence that I am keeping the two tables in sync:
BEGIN; INSERT INTO History...; UPDATE Current...; COMMIT;

I've answered a similar question before. You'll see some good alternatives in that question.
In your case, I think you're merging several concerns - one is "storing an audit trail", and the other is "managing the case where many clients may want to update a single row".
Firstly, I don't like triggers. They are a side effect of some other action, and for non-trivial cases, they make debugging much harder. A poorly designed trigger or audit table can really slow down your application, and you have to make sure that your trigger logic is coordinated between lots of developers. I realize this is personal preference and bias.
Secondly, in my experience, the requirement is rarely "show the status of this one table over time" - it's nearly always "allow me to see what happened to the system over time", and if that requirement exists at all, it's usually fairly high priority. With a ticketing system, for instance, you probably want the name and email address of the users who created and changed the ticket status; the name of the category/classification, perhaps the name of the project etc. All of those attributes are likely to be foreign keys to other tables. And when something does happen that requires audit, the requirement is likely "let me see immediately", not "get a database developer to spend hours trying to piece together the picture from 8 different history tables". In a ticketing system, it's likely a requirement for the ticket detail screen to show this.
If all that is true, then I don't think history tables populated by triggers are a good idea - you have to build all the business logic into two sets of code, one to show the "regular" application, and one to show the "audit trail".
Instead, you might want to build "time" into your data model (that was the point of my answer to the other question).
Since then, a new style of data architecture has come along, known as CQRS. This requires a very different way of looking at application design, but it is explicitly designed for reactive applications; these offer much nicer ways of dealing with the "what happens if someone edits the record while the current user is completing the form" question. Stack Overflow is an example - we can see, whilst typing our comments or answers, whether the question was updated, or other answers or comments are posted. There's a reactive library for PHP.

I do understand that disk space is cheap and I should not worry about it but in order to retrieve some kind of log or nice looking history for the user is painful, at least for me.
A large history table is not necessarily a problem. Huge tables only use disk space, which is cheap. They slow things down only when making queries on them. Fortunately, the history is not something you'd use all the time, most likely it is only used to solve problems or for auditing.
It is useful to partition the history table, for example by month or week. This allows you to simply drop very old records, and more importantly, since the history of the previous months has already been backed up, your daily backup schedule only needs to back up the current month. This means a huge history table will not slow down your backups.
With that I can try to catch possible overwrites while two users are editing the same ticket in the same time
There is a simple solution:
Add a column "version_number".
When you select with intent to modify, you grab this version_number.
Then, when the user submits new data, you do:
UPDATE ...
SET all modified columns,
version_number=version_number+1
WHERE ticket_id=...
AND version_number = (the value you got)
If someone came in-between and modified it, then they will have incremented the version number, so the WHERE will not find the row. The query will return a row count of 0. Thus you know it was modified. You can then SELECT it, compare the values, and offer conflict resolution options to the user.
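In PHP the check described above could look roughly like the following. This is a sketch under assumptions: PDO is used, an in-memory SQLite database stands in for MySQL so the snippet runs on its own, and the function name `updateStatus()` is made up. The column names follow the ones used in this thread.

```php
<?php
// Sketch only: in-memory SQLite stands in for MySQL; updateStatus() is a hypothetical helper.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE ticket (
    ticket_id INTEGER PRIMARY KEY,
    idticket_status INTEGER,
    version_number INTEGER NOT NULL DEFAULT 1)");
$db->exec("INSERT INTO ticket (ticket_id, idticket_status) VALUES (1, 0)");

// Returns true if the update was applied, false if someone else changed the row first.
function updateStatus(PDO $db, int $ticketId, int $newStatus, int $seenVersion): bool
{
    $stmt = $db->prepare(
        "UPDATE ticket
            SET idticket_status = ?, version_number = version_number + 1
          WHERE ticket_id = ? AND version_number = ?"
    );
    $stmt->execute([$newStatus, $ticketId, $seenVersion]);
    return $stmt->rowCount() === 1; // 0 rows means the version we saw is stale
}

// Two users open the form and both read version 1.
$first  = updateStatus($db, 1, 5, 1); // applied, version becomes 2
$second = updateStatus($db, 1, 4, 1); // rejected, nothing is overwritten
```

When the function returns false, the application would re-SELECT the row and show the user the conflicting values. The version number read at form time can travel in a hidden input field.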
You can also add columns like who modified it last, and when, and present this information to the user.
If you want the user who opens the modification page to lock out other users, it can be done too, but this needs a timeout (in case they leave the window open and go home, for example). So this is more complex.
Now, about history:
You don't want to have, say, one large TEXT column called "comments" where everyone enters stuff, because it will need to be copied into the history every time someone adds even a single letter.
It is much better to view it like a forum: each ticket is like a topic, which can have a string of comments (like posts), stored in another table, with the info about who wrote it, when, etc. You can also historize that.
The drawback of using a trigger is that the trigger does not know about the user who is logged in, only the MySQL user. So if you want to record who did what, you will have to add a column with the user_id as I proposed above. You can also use Rick James' solution. Both would work.
Remember though that MySQL triggers don't fire on foreign key cascade deletes... so if the row is deleted in this way, it won't work. In this case doing it in the application is better.

What is the best way to wait for an administrator to validate something before committing it?

I'm building a web application where several groups have their own page but if they want to modify it, an administrator has to validate it before.
For example, a group can ask to change its logo, post a new photo, change its phone number, its name, its location etc... Basically they can edit a value in the database but only if the administrator accepts it. The administrator has to validate every modification because... our customer asked us to.
That's why we have to create a system that could be called "pending queries" management.
At the beginning I thought that keeping the query in the database and executing it when an administrator validates it was a good idea, but if we choose this option we can't use PDO to build prepared statements since we have to concatenate strings to build our own statement, with obvious security issues.
Then we thought that we should keep PHP code that calls the right methods (that use PDO) in our database and execute it with eval() when the administrator validates it. But again, it seems that using eval() is a very bad idea. As Rasmus Lerdorf's quote says: "If eval() is the answer, you're almost certainly asking the wrong question".
I thought about using eval because I want to call methods that use PDO to deal with the database.
So, what is the best way to solve this problem? It seems that there is no safe way to implement it.

Both your ideas are, to be frank, simply weird.
Add a field in a table to tell an approved content from unapproved one.

Here's one possible approach, with an attempt to keep the things organised to an extent, as the system begins to scale:
Create a table called PendingRequests. This will have to have most of the following fields and maybe quite a few more:
(id, request_type, request_contents, request_made_by, request_made_timestamp,
request_approved_by, request_approved_timestamp, ....)
Request_contents is a broad term and it may not just be confined to one column alone. How you gather the data for this column will depend on the front-end environment you provide to the users (WYSIWYG, etc).
Request_approved_by will be NULL when the data is first inserted in the table (i.e. user has made an initial request). This way, you'll know which requests to present in the administration panel. Once an admin approves it, this column will be updated to reflect the id of the admin that approved it and the approved changes could eventually go live.
So far, we've only talked about managing the requests. Once that process is established, then the next question would be to determine how to finally map the approved requests to users. As such, it'd actually require a bit of study of the currently proposed system and its workflow. Though, in short, there may be two schools of thought:
Method 1:
Create a new table each for everything (logo, phone number, name, etc) that is customisable.
Or
Method 2:
Simply add them as columns in one of your tables (which would essentially be in a 1:1 relationship with the user table, as far as attributes such as logo, name, etc. are concerned).
This brings us to Request_type. This is the field that will hold values / flags for the system to determine which field or table (depending on Method 1 or Method 2) the changes will be incident upon - after an admin has approved the changes.
No matter what requirement or approach it is to go about database management, PHP and PDO are both flexible enough to help write customisable and secure queries.
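To make this a bit more concrete, here is one way the request/approve cycle might look with PDO, using Method 2 (one profile table). Treat it as a sketch: an in-memory SQLite database stands in for MySQL so it runs standalone, and the `groups_profile` table, the `group_id` column, the request type names and both functions are assumptions rather than part of the design above. The point is that `request_type` is mapped to a column through a fixed whitelist, so neither SQL text nor PHP code is ever stored, and every value goes through a prepared statement.

```php
<?php
// Sketch only: in-memory SQLite stands in for MySQL; tables, columns and functions are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE groups_profile (group_id INTEGER PRIMARY KEY, name TEXT, phone TEXT)");
$db->exec("CREATE TABLE PendingRequests (
    id INTEGER PRIMARY KEY, group_id INTEGER, request_type TEXT, request_contents TEXT,
    request_made_by INTEGER, request_made_timestamp TEXT,
    request_approved_by INTEGER, request_approved_timestamp TEXT)");
$db->exec("INSERT INTO groups_profile VALUES (3, 'Chess Club', '555-0100')");

// Fixed mapping from request_type to the column it may change.
const REQUEST_COLUMNS = ['change_name' => 'name', 'change_phone' => 'phone'];

function submitRequest(PDO $db, int $groupId, string $type, string $contents, int $userId): int
{
    if (!array_key_exists($type, REQUEST_COLUMNS)) {
        throw new InvalidArgumentException("unknown request type: $type");
    }
    $db->prepare("INSERT INTO PendingRequests
                  (group_id, request_type, request_contents, request_made_by, request_made_timestamp)
                  VALUES (?, ?, ?, ?, ?)")
       ->execute([$groupId, $type, $contents, $userId, date('c')]);
    return (int) $db->lastInsertId();
}

function approveRequest(PDO $db, int $requestId, int $adminId): void
{
    $stmt = $db->prepare("SELECT * FROM PendingRequests WHERE id = ? AND request_approved_by IS NULL");
    $stmt->execute([$requestId]);
    $req = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($req === false) {
        throw new RuntimeException('request not found or already approved');
    }
    $column = REQUEST_COLUMNS[$req['request_type']];
    $db->beginTransaction();
    try {
        $db->prepare("UPDATE groups_profile SET $column = ? WHERE group_id = ?")
           ->execute([$req['request_contents'], $req['group_id']]);
        $db->prepare("UPDATE PendingRequests
                         SET request_approved_by = ?, request_approved_timestamp = ?
                       WHERE id = ?")
           ->execute([$adminId, date('c'), $requestId]);
        $db->commit();
    } catch (Throwable $e) {
        $db->rollBack();
        throw $e;
    }
}

$id = submitRequest($db, 3, 'change_phone', '555-0199', 42); // live data is untouched here
approveRequest($db, $id, 1);                                  // now the change goes live
```

The administration panel would simply list the rows where request_approved_by is NULL.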
As an aside, it might be a good idea to maintain a table for history of all the changes / updates made. By now, it should probably be apparent that the number of history tables will once again depend on Method 1 or Method 2.
Hope that helps.

PHP / MySQL - approval system for edits

I have a business listings website. I am currently building an interface for users to log in and update their listings.
The functionality I would like to implement is as follows:
user edits listing and submits
the edits are sent to admin for approval
in the meantime the existing listing remains on the site
admin reviews the edits and corrects any mistakes before approving
Now usually something like this can be accomplished by storing a 'duplicate' row in the database, with a flag set to 'pending'.
However if possible I would like to try a different approach, as the listing data is stored across several tables, including one which contains multiple category selections.
So ideally I would prefer not to create additional database records. Is there a better alternative I could use?

There are actually 2 different approaches.
Use the database (either in the same tables, or some "pending changes" tables with almost the same structure as the original ones), although one way or another it creates records.
Don't use the database at all, but some intermediate store (key / value store; message queue; memcache; email; just some files; whatever comes to mind ;))
With option 2 you can easily do as the 2 commenters on your question already said: create a data structure and serialize this structure and store it anywhere until the admin "approves" the data and updates it in the database.
At first glance the second option would of course be quicker to implement, although implementing it in the database could give you the ability to "rollback" changes by using a current or state flag. Secondly, imho, you want to keep all your (related) data close together to keep it maintainable.
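A rough idea of what option 2 might look like, assuming JSON as the serialization format and plain files in the system temp directory as the "intermediate store". Both choices, the function names and the sample listing are only illustrative; any of the stores mentioned above would do.

```php
<?php
// Sketch only: a temp directory stands in for "any intermediate store"; all names are hypothetical.
function storePendingEdit(string $dir, int $listingId, array $edit): string
{
    $path = $dir . "/listing-$listingId.json";
    file_put_contents($path, json_encode($edit, JSON_PRETTY_PRINT));
    return $path;
}

function loadPendingEdit(string $path): array
{
    return json_decode(file_get_contents($path), true);
}

// The user submits an edit: data that lives in several tables travels as one structure.
$path = storePendingEdit(sys_get_temp_dir(), 12, [
    'name'       => 'Acme Bakery',
    'phone'      => '555-0100',
    'categories' => [4, 9, 17],
]);

// Later, on the admin screen: load it, let the admin correct it, then write it to the real tables.
$edit = loadPendingEdit($path);
$edit['name'] = 'ACME Bakery'; // the admin fixes a mistake before approving
unlink($path);                 // the pending copy is discarded once the edit has been applied
```

The existing listing stays live the whole time, because nothing in the real tables changes until the admin approves.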

How would I use an audit trail to display which fields have ever been edited?

For a project I am working on, I have been asked to create an audit trail of all changes that have been made to records. This is the first time I have had to create an audit trail, so I have been doing a lot of research on the subject.
The application will be developed in PHP/MSSQL, and will be low-traffic.
From my reading, I have pretty much decided to have an audit table and use triggers to record the changes in the table.
The two requirements for display in the application are as follows:
Be able to see a log of all changes made to a field (I pretty much know how to do this)
Be able to see, when viewing a record in the application, an indicator next to any field in the record that has ever been changed (and possibly other info like the date of the last change).
Item #2 is the one that is currently giving me grief. Without doing a separate query for each field (or a very long nested query that will take ages to execute), does anyone have suggestions for an optimal way to do this? (I have thought of adding an extra "ModifiedFlag" field for each field in the table, that will act as a boolean indicator of whether the field has ever been edited, but that seems like a lot of overhead.)

I would treat the audit information separately from the actual domain information as much as possible.
Requirement #1:
I think you will create additional audit tables to record the changes.
Eric's suggestion is a good one: create the audit information using triggers in the SQL database. This way your application need not be aware of the audit logic.
If your database does not support triggers, then perhaps you are using some kind of persistence or database layer. This would also be a good place to put this kind of logic, as again you minimize any dependencies between normal application code and the audit code.
Requirement #2:
As for showing the indicators: I would not create boolean fields in the table that stores the actual data. (This would cause all sorts of dependencies to exist between your normal application code and your audit trail code.)
I would try to let the code responsible for displaying the form also be responsible for showing audit data on field level. This will cause query overhead, but that is the cost for displaying this extra layer of information. Perhaps you can minimize the database overhead by adding metadata to the audit information that allows for easy retrieval.
Some big Enterprisy application that I maintain uses roughly the following structure:
- A change header table corresponding to a change of a record in a table.
Fields:
changeId, changeTable, changedPrimaryKey, userName, dateTime
- A change field table corresponding to a field that is changed.
Fields:
changeId, changeField, oldValue, newValue
Sample content:
Change Header:
'1', 'BooksTable', '1852860138', 'AdamsD', '2009-07-01 15:30'
Change Item:
'1', 'Title', 'The Hitchhiker's Guide to the Gaxaly', 'The Hitchhiker's Guide to the Galaxy'
'1', 'Author', 'Duglas Adasm', 'Douglas Adams'
This structure allows both easy viewing of audit trails and easy retrieval for showing the desired indicators. One query (an inner join on the Header and Items tables) would be enough to retrieve all the information to show in a single form. (Or even a table, when you have a list of shown Ids.)
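As an illustration of that single query, here is a small sketch with PDO. It assumes the two tables are called `ChangeHeader` and `ChangeItem` (the names are not given above), uses an in-memory SQLite database so it can run standalone, and the function `changedFields()` is hypothetical.

```php
<?php
// Sketch only: in-memory SQLite stands in for the real database; table and function names are assumed.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE ChangeHeader (
    changeId INTEGER PRIMARY KEY, changeTable TEXT, changedPrimaryKey TEXT,
    userName TEXT, dateTime TEXT)");
$db->exec("CREATE TABLE ChangeItem (changeId INTEGER, changeField TEXT, oldValue TEXT, newValue TEXT)");
$db->exec("INSERT INTO ChangeHeader VALUES (1, 'BooksTable', '1852860138', 'AdamsD', '2009-07-01 15:30')");
$db->exec("INSERT INTO ChangeItem VALUES
    (1, 'Title', 'The Hitchhiker''s Guide to the Gaxaly', 'The Hitchhiker''s Guide to the Galaxy'),
    (1, 'Author', 'Duglas Adasm', 'Douglas Adams')");

// Returns [field name => date of last change] for one record, using a single query.
function changedFields(PDO $db, string $table, string $primaryKey): array
{
    $stmt = $db->prepare(
        "SELECT i.changeField, MAX(h.dateTime) AS lastChanged
           FROM ChangeHeader h
           JOIN ChangeItem i ON i.changeId = h.changeId
          WHERE h.changeTable = ? AND h.changedPrimaryKey = ?
          GROUP BY i.changeField"
    );
    $stmt->execute([$table, $primaryKey]);
    return $stmt->fetchAll(PDO::FETCH_KEY_PAIR);
}

$changed = changedFields($db, 'BooksTable', '1852860138');
// $changed has one entry per field that was ever edited (here 'Title' and 'Author');
// the form code shows an indicator next to a field when its name is a key in $changed.
```

Fields that were never edited simply do not appear in the result, so no per-field flag columns are needed in the domain table.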

As a general requirement, flagging changed fields "smells" slightly odd. If records are long lived and subject to change over time then eventually all fields will tend to get so flagged. Hence I wonder how any user could make sense of a simple set of indicators per field.
That line of thinking makes me suspect that the data you store needs to be, as you've described, a true audit trail with all the changes recorded, and the first real challenge is to decide how the info should be presented to the user.
I think your idea of preparing some kind of aggregateOfTheAuditTrail data is likely to be very useful. The question would be: is a single flag per record enough? If the User's primary access is through a list then maybe it's enough just to highlight the changed records for later drill down. Or a date of last change of the record value, so that only recently changed records are highlighted - it all comes back to what the user's real needs are. I find it hard to imagine that records changed 3 years ago are as interesting as those changed last week.
Then when we come to the drill down to a single record. Again a simple flag per field doesn't sound useful (though your domain, your requirements). If it is, then your summary idea is fine. My guess is that a sequence of changes to a field, and the sequence of overall changes to the record, are much more interesting. Employee had pay rise, employee moved department, employee was promoted = three separate business events or one?
If anything more than a simple flag is needed then I suspect that you just need to return the whole (or recent) audit trail for the record and let the UI figure out how to present that.
So, my initial thought: Some kind of rolling-maintenance of a summary record sounds like a good idea. If necessary, it can be maintained in background threads or batch jobs. We design that to be business-useful without going to the full audit trail each time. Then for detailed analyses we allow some or all of the trail to be retrieved.

Personally, I'd make the tracking simple, and the reporting funky.
Each time a user inserts a record, you make an insert into the audit table for that table
'I', 'Date', 'User', 'Data column1','Data Column2', etc.
That is assuming the structure of the tables won't change over time (re. the amount of datacolumns)
For updates, just insert
'U', 'Date', 'User', 'Data column1', etc
Insert what the user just entered as an update.
Then, after the insert and update, you will have the following
'I','May 3 2009','BLT','person005','John','Smith','Marketing'
'U','May 4 2009','BLT','person005','John','Smith','Accounting'
Then, it's just an easy report to show that the unique person record 'person005' has had an insert and an update, where their department was updated.
Due to the low usage of the system, having a simple insert on each change and then a more complex reporting process isn't going to affect performance. This style will still work with higher-traffic systems, as the extra load on an edit is minimal, whereas the heavier workload of reporting back the changes isn't done as often as an update, so the system won't fall over.
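The "funky" reporting side could be as simple as comparing each audit row with the one before it. A minimal sketch, assuming the audit rows for one record have already been fetched in date order as associative arrays; the column names and the function `reportChanges()` are made up for the example.

```php
<?php
// Sketch only: audit rows are shown as plain arrays; column names are hypothetical.
$auditRows = [
    ['action' => 'I', 'date' => 'May 3 2009', 'user' => 'BLT', 'id' => 'person005',
     'first' => 'John', 'last' => 'Smith', 'dept' => 'Marketing'],
    ['action' => 'U', 'date' => 'May 4 2009', 'user' => 'BLT', 'id' => 'person005',
     'first' => 'John', 'last' => 'Smith', 'dept' => 'Accounting'],
];

// Compare each audit row with the previous one and describe the data columns that differ.
function reportChanges(array $rows): array
{
    $meta = ['action' => true, 'date' => true, 'user' => true]; // audit metadata, not record data
    $report = [];
    for ($i = 1; $i < count($rows); $i++) {
        $before = array_diff_key($rows[$i - 1], $meta);
        $after  = array_diff_key($rows[$i], $meta);
        foreach (array_diff_assoc($after, $before) as $column => $newValue) {
            $report[] = sprintf('%s: %s changed %s from "%s" to "%s"',
                $rows[$i]['date'], $rows[$i]['user'], $column, $before[$column], $newValue);
        }
    }
    return $report;
}

$report = reportChanges($auditRows);
// → ['May 4 2009: BLT changed dept from "Marketing" to "Accounting"']
```

This relies on the assumption stated above that the set of data columns does not change over time.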

What's the best way to implement a "soft" save or modification workflow?

I'm working on a MVC based web app on LAMP that needs some records to be modified only under the approval of a "superior" user. (The normal user can submit changes but they get applied only after this approval)
There is only one table in which this would have to take place, say "events":
EVENTS
- id
- name VARCHAR
- start_date DATETIME
- guest INTEGER
Every time one of the attributes of an event gets modified by a "normal" user, these changes are not made official until there's a revision (and possible approval) from this "super" user.
At first I thought of the following options:
Duplicating each column, except the id, say name_temp for "name", to hold the pending-approval modification.
Creating a separate table with a duplicate structure and hold there all the pending approval modifications.
Have you implemented this before? What do you think is the best/your way to do this? And also: Is there any pattern for this kind of problem?
Thanks!!!
PS: I need to keep the "old" record where it was, until the new one gets approved.

Let me add my vote for the "second table" approach -- as both other respondents said, it's head and shoulders above attempts to shoehorn "proposed changes" into the master table (that would complicate its schema, every query on it, etc, etc).
Writing all intended changes to an auxiliary table and processing them (applying them to the master table if they meet certain conditions) "in batches" later is indeed a common pattern, often useful when you need the master table to change only at certain times while proposed changes can come in any time, and also in other conditions, e.g. when writes to the master table are very costly (lock contention or whatever) but writing N changes is not much costlier than writing one, or validating a proposed change is very time-consuming (you could class your use case as an extreme case of the latter category, since in your case the validation requires a human to look at the proposed change -- VERY slow indeed by computer standards;-).

If it's only about a permission for adding new events, I would go with one additional boolean column, namely is_approved or similar.
There are edits possible, though, so, in my opinion, it's a big no-no to duplicate every column in the same table to store the temporary value. Just imagine what would happen when two users post their changes to the same event.
Option number two, namely a separate table, is a better choice. You can store each attempt there and just update your main table accordingly.
With this approach even some form of rolling back changes is possible. Just be sure you store a timestamp with each and never delete your approved edits.

Why not add 1 more column named IsApproved, of type bool or tinyint(1), and just show approved events?
Edit: Oh, it's approval of every attribute. Then a 2nd table would be my choice, named PendingEventChanges, with the same structure or just id + changeable properties; on approval the "super" user would update the original data and remove the entries from pending.
First choice is very much against database normalization, so adding more properties in future would cause trouble.

If history or versioning is important, combined with the approval, I would go for a single table. You would add a revision number, which could work with your existing primary key and could be incremented on each revision. You could then add a state field to show which is the current version, expired versions and needs approval versions. A nice index on the state and key field will get you snappy results. If this is just one, two or three fields out of many, then a specific table to handle these unapproved edits might be better as suggested above.
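A rough sketch of that single-table variant, assuming PDO with an in-memory SQLite database standing in for MySQL so that it runs on its own. The state names ('current', 'pending', 'expired'), the sample event and the two functions are illustrative assumptions; only the events columns come from the question.

```php
<?php
// Sketch only: in-memory SQLite stands in for MySQL; state names and functions are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE events (
    id INTEGER, revision INTEGER, state TEXT,
    name TEXT, start_date TEXT, guest INTEGER,
    PRIMARY KEY (id, revision))");
$db->exec("CREATE INDEX events_state ON events (state, id)");
$db->exec("INSERT INTO events VALUES (1, 1, 'current', 'Launch party', '2010-05-01 20:00:00', 50)");

// A normal user proposes a change: it is stored as the next revision in state 'pending'.
// (Concurrent proposals would need locking or a retry; that is left out of this sketch.)
function proposeRevision(PDO $db, int $id, string $name, string $startDate, int $guest): int
{
    $stmt = $db->prepare("SELECT MAX(revision) FROM events WHERE id = ?");
    $stmt->execute([$id]);
    $next = (int) $stmt->fetchColumn() + 1;
    $db->prepare("INSERT INTO events VALUES (?, ?, 'pending', ?, ?, ?)")
       ->execute([$id, $next, $name, $startDate, $guest]);
    return $next;
}

// The "super" user approves: the old current row expires and the pending one becomes current.
function approveRevision(PDO $db, int $id, int $revision): void
{
    $db->beginTransaction();
    try {
        $db->prepare("UPDATE events SET state = 'expired' WHERE id = ? AND state = 'current'")
           ->execute([$id]);
        $stmt = $db->prepare(
            "UPDATE events SET state = 'current' WHERE id = ? AND revision = ? AND state = 'pending'"
        );
        $stmt->execute([$id, $revision]);
        if ($stmt->rowCount() !== 1) {
            throw new RuntimeException('no such pending revision');
        }
        $db->commit();
    } catch (Throwable $e) {
        $db->rollBack();
        throw $e;
    }
}

$rev = proposeRevision($db, 1, 'Launch party', '2010-05-08 20:00:00', 80);
// Public pages read WHERE state = 'current', so they still show revision 1 at this point.
approveRevision($db, 1, $rev);
```

Expired rows are never deleted, so they double as the version history.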
