Logging user activities in applications


The problem I'm here to talk about (and, of course, ask about) is not new. I searched the web and Stack Overflow and found ideas for many parts of this problem (pros and cons), but some parts are still missing in my mind. So I thought it would be a good idea to collect them in one place (it will of course be more complete with others' ideas) and ask.
The problem is clear: "We want to log every single action of the user." Presumably, once we solve the big problem, the smaller ones (like logging only one action) will be a piece of cake.
First, here is what I gathered from the web and Stack Overflow:
Use a DB instead of a file: That's good advice, although it always depends on the situation. But because of the many benefits of a DB, in the long term and in general, it's the better solution.
DB layer or application layer: Actually, it depends. For example, if you really want to monitor everything (I mean really every single row that changes in the database), it seems we have one choice: database triggers. However, there are many discussions around MySQL saying that triggers slow down the DB and advising against them. So, depending on the level of detail you need, you can put your logging system in the DB layer or in the application layer (for example, a common function call such as $logClass->logThis()).
Use observers: Clean code is always better. If you are familiar with observers, you can use them to act for you when an action happens, so you don't have to add $logClass->logThis() every time a CRUD operation happens in your application.
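A minimal sketch of the observer idea, using PHP's built-in SplSubject/SplObserver interfaces. The ProductRepository and ActivityLogger classes and the fields they record are hypothetical; a real logger would INSERT into a log table instead of collecting entries in an array.

```php
<?php
// Hypothetical subject: a repository that announces every CRUD action it performs.
class ProductRepository implements SplSubject
{
    private SplObjectStorage $observers;
    public array $lastEvent = [];

    public function __construct()
    {
        $this->observers = new SplObjectStorage();
    }

    public function attach(SplObserver $observer): void
    {
        $this->observers->attach($observer);
    }

    public function detach(SplObserver $observer): void
    {
        $this->observers->detach($observer);
    }

    public function notify(): void
    {
        foreach ($this->observers as $observer) {
            $observer->update($this);
        }
    }

    public function delete(int $productId, int $userId): void
    {
        // ... the real DELETE query would run here ...
        $this->lastEvent = ['user_id' => $userId, 'action' => 'delete', 'object_id' => $productId];
        $this->notify(); // observers do the logging; no logThis() call in this method
    }
}

// Hypothetical observer: collects log entries (a real one would write them to the DB).
class ActivityLogger implements SplObserver
{
    public array $entries = [];

    public function update(SplSubject $subject): void
    {
        if ($subject instanceof ProductRepository) {
            $this->entries[] = $subject->lastEvent + ['timestamp' => time()];
        }
    }
}

$repo = new ProductRepository();
$logger = new ActivityLogger();
$repo->attach($logger);
$repo->delete(20, 7);
```

Attaching the logger once means every repository method that calls notify() is logged without touching the logging code again.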
What to log: The simple, short answer is: based on your needs, but there are some common fields you will need:
user_id (if a unique user ID is available)
timestamp (Unix time, maybe)
ip (not everyone knows how to fake it in the first place, so use it; even a faked one gives you some insight into user behavior)
action_id (should be one of a set of predefined actions, for more uniform queries and reports)
object_id (the unique row ID of the record that was changed)
action (the part my question is about)
and so on...
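The common fields above could be assembled into one log row like this. The function name buildLogRow and the action-id constants are hypothetical; the action payload is stored as JSON, as the question proposes, and in a web request the IP would typically come from $_SERVER['REMOTE_ADDR'].

```php
<?php
// Hypothetical predefined action ids, so queries and reports can group on them.
const ACTION_INSERT = 1;
const ACTION_UPDATE = 2;
const ACTION_DELETE = 3;

/**
 * Build one activity-log row from the common fields.
 * $details is the free-form "action" payload, stored as JSON.
 */
function buildLogRow(?int $userId, string $ip, int $actionId, int $objectId, array $details): array
{
    return [
        'user_id'   => $userId,              // null when no unique user id is available
        'timestamp' => time(),               // Unix time
        'ip'        => $ip,
        'action_id' => $actionId,
        'object_id' => $objectId,
        'action'    => json_encode($details),
    ];
}

$row = buildLogRow(7, '203.0.113.5', ACTION_DELETE, 20, ['product_id' => 2, 'company_id' => 1]);
```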
I would appreciate it if anyone corrected me where I made a mistake, or added other useful information to this post, so it becomes a good reference for other users.
And now my question: how should actions be stored? For better understanding, consider the following scenario.
I have a table named "product" and a table named "companies". The business logic requires assigning products to companies, so we ended up with a table "company_product". Now, when a user inserts a new product and simultaneously assigns its companies, two tables are affected (the same goes for delete and update): "product" and "company_product", and we want to know:
what's inserted?
what's deleted?
what's updated to what?
For performance reasons, and because I don't know enough about triggers, I want to log in the application layer, so I came up with the idea of saving the action field in the database as an array or JSON structure. But as I developed my solution I ran into a problem: how do I make this log understandable for non-technical users? For example, I want to save something like this in the action field of the database when I delete (or insert) the product with id 20:
action : [{id: 20, product_id:2, company_id: 1},{id: 21, product_id:2, company_id: 2}]
And this is not easy for everyone to read and understand. I could make this JSON more readable, something like this:
action : {'Product A Deleted From Company X', 'Product A Deleted From Company Y'}
and save the previous form in a technical_action field for later diagnosis. But that needs additional work and more queries for something that does not always need to be considered (logged).
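One possible way to keep both forms without extra queries at read time: store the technical rows in technical_action and derive the readable sentences when the log entry is written, using name lookups that were likely already loaded while saving the product. The lookup arrays and the function name describeCompanyProductChanges are hypothetical.

```php
<?php
/**
 * Turn technical company_product rows into sentences for non-technical users.
 * $productNames and $companyNames map ids to display names (hypothetical lookups,
 * e.g. loaded once while the product form is being saved).
 */
function describeCompanyProductChanges(string $verb, array $rows, array $productNames, array $companyNames): array
{
    $messages = [];
    foreach ($rows as $row) {
        // Fall back to the raw id if a name is unknown (e.g. already deleted).
        $product = $productNames[$row['product_id']] ?? "Product #{$row['product_id']}";
        $company = $companyNames[$row['company_id']] ?? "Company #{$row['company_id']}";
        $preposition = $verb === 'Deleted' ? 'From' : 'To';
        $messages[] = "$product $verb $preposition $company";
    }
    return $messages;
}

$technical = [
    ['id' => 20, 'product_id' => 2, 'company_id' => 1],
    ['id' => 21, 'product_id' => 2, 'company_id' => 2],
];
$logEntry = [
    'technical_action' => json_encode($technical),
    'action' => json_encode(describeCompanyProductChanges(
        'Deleted', $technical, [2 => 'Product A'], [1 => 'Company X', 2 => 'Company Y']
    )),
];
```

Because the sentences are built once at write time, reading the log for display needs no joins back to product or companies.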
I would appreciate any additional information on this article (I'm sure there are other criteria worth discussing), and an answer to my question.

You are actually going to gather details for analytics-type purposes.
It would be good to go for flat tables rather than relational tables,
because if you want to do more analysis, relational tables will not be a good choice, as they lack performance.
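A minimal sketch of what a "flat" log row could look like for the product/company example: names are copied into the row at write time, so reports read a single table with no joins. The column names and the function flatLogRow are hypothetical.

```php
<?php
/**
 * Denormalize one company_product change into a flat analytics row (hypothetical columns).
 * Product and company names are copied in, so reporting needs no joins,
 * and the row stays readable even if the product is later renamed or deleted.
 */
function flatLogRow(int $userId, string $action, array $product, array $company, int $timestamp): array
{
    return [
        'user_id'      => $userId,
        'action'       => $action,
        'product_id'   => $product['id'],
        'product_name' => $product['name'],
        'company_id'   => $company['id'],
        'company_name' => $company['name'],
        'logged_at'    => date('Y-m-d H:i:s', $timestamp),
    ];
}

$now = time();
$rows = [
    flatLogRow(7, 'delete', ['id' => 2, 'name' => 'Product A'], ['id' => 1, 'name' => 'Company X'], $now),
    flatLogRow(7, 'delete', ['id' => 2, 'name' => 'Product A'], ['id' => 2, 'name' => 'Company Y'], $now),
];

// A report is then a simple scan/group over one table.
$deletesPerCompany = array_count_values(array_column($rows, 'company_name'));
```

The trade-off is the usual one for denormalization: writes store redundant names, in exchange for cheap, join-free reads.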

Related

What is the best way to wait for an administrator to validate something before committing it?

I'm building a web application where several groups have their own page, but if they want to modify it, an administrator has to validate the change first.
For example, a group can change its logo, post a new photo, change its phone number, name, location, etc. Basically they can edit a value in the database, but only if the administrator accepts it. The administrator has to validate every modification because... our customer asked us to.
That's why we have to create a system that could be called "pending queries" management.
At the beginning I thought that storing the query in the database and executing it when an administrator validates it was a good idea, but if we choose this option we can't use PDO prepared statements, since we have to concatenate strings to build our own statement, with obvious security issues.
Then we thought we could store PHP code that calls the right methods (which use PDO) in our database, and execute it with eval() when the administrator validates it. But again, it seems that using eval() is a very bad idea. As Rasmus Lerdorf put it: "If eval() is the answer, you're almost certainly asking the wrong question."
I thought about using eval() because I want to call methods that use PDO to deal with the database.
So, what is the best way to solve this problem? It seems there is no safe way to implement it.

Both your ideas are, to be frank, simply weird.
Add a field to the table to tell approved content from unapproved content.
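A minimal sketch of the approved-flag idea, kept in memory so it runs standalone: every edit is saved as a new, unapproved row, and the public page shows the newest approved value per field. The table layout (group_id, field, value, approved) and the function names are hypothetical; with PDO these would be an ordinary prepared INSERT and a SELECT ... WHERE approved = 1, so no query text ever needs to be stored.

```php
<?php
// Each element is one row of a hypothetical group_page_versions table.
$versions = [];

function submitEdit(array &$versions, int $groupId, string $field, string $value): int
{
    $versions[] = ['group_id' => $groupId, 'field' => $field, 'value' => $value, 'approved' => false];
    return count($versions) - 1; // used as the row id
}

function approve(array &$versions, int $id): void
{
    $versions[$id]['approved'] = true;
}

/** Newest approved value per field: what visitors see. */
function publishedPage(array $versions, int $groupId): array
{
    $page = [];
    foreach ($versions as $row) {
        if ($row['group_id'] === $groupId && $row['approved']) {
            // Rows are in insertion (chronological) order, so later ones overwrite earlier ones.
            $page[$row['field']] = $row['value'];
        }
    }
    return $page;
}

$first = submitEdit($versions, 1, 'phone', '555-0100');
approve($versions, $first);
submitEdit($versions, 1, 'phone', '555-0199'); // pending: not visible yet
```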

Here's one possible approach, with an attempt to keep things organised to an extent as the system begins to scale:
Create a table called PendingRequests. It will need most of the following fields, and maybe quite a few more:
(id, request_type, request_contents, request_made_by, request_made_timestamp, request_approved_by, request_approved_timestamp, ....)
Request_contents is a broad term and may not be confined to a single column. How you gather the data for it will depend on the front-end environment you provide to users (WYSIWYG, etc.).
Request_approved_by will be NULL when the data is first inserted into the table (i.e. when the user has made an initial request). This way, you'll know which requests to present in the administration panel. Once an admin approves a request, this column is updated with the id of the admin who approved it, and the approved changes can eventually go live.
So far, we've only talked about managing the requests. Once that process is established, the next question is how to finally map the approved requests to users. That would require some study of the currently proposed system and its workflow. In short, though, there are two schools of thought:
Method 1:
Create a new table for each customisable item (logo, phone number, name, etc.).
Or
Method 2:
Simply add them as columns in one of your tables (which would essentially be in a 1:1 relationship with the user table, as far as attributes such as logo, name, etc. are concerned).
This brings us to Request_type. This field holds the values/flags the system uses to determine which field or table (depending on Method 1 or Method 2) the changes apply to, once an admin has approved them.
Whatever the requirement or approach to database management, PHP and PDO are both flexible enough to write customisable and secure queries.
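The PendingRequests flow can be sketched without eval() or stored SQL: request_type selects a handler from a fixed whitelist, and request_contents holds only data, which a real handler would bind as prepared-statement parameters. Everything here (the handler map, the in-memory $groups array standing in for PDO calls, the function approveRequest) is a hypothetical illustration of that design.

```php
<?php
// In-memory stand-in for the live table; a real handler would run a PDO prepared UPDATE.
$groups = [1 => ['name' => 'Chess Club', 'phone' => '555-0100']];

// Whitelist: request_type => handler. Unknown types can never execute anything.
$handlers = [
    'change_phone' => function (array &$groups, int $groupId, array $contents): void {
        $groups[$groupId]['phone'] = $contents['phone'];
    },
    'change_name' => function (array &$groups, int $groupId, array $contents): void {
        $groups[$groupId]['name'] = $contents['name'];
    },
];

$pending = [[
    'id' => 1,
    'request_type' => 'change_phone',
    'request_contents' => json_encode(['group_id' => 1, 'phone' => '555-0199']),
    'request_made_by' => 42,
    'request_approved_by' => null, // NULL = still waiting for an admin
]];

/** Hypothetical approval step: dispatch to the whitelisted handler, then record the approver. */
function approveRequest(array &$pending, int $index, int $adminId, array $handlers, array &$groups): void
{
    $request = $pending[$index];
    if (!isset($handlers[$request['request_type']])) {
        throw new InvalidArgumentException('Unknown request type');
    }
    $contents = json_decode($request['request_contents'], true);
    $handlers[$request['request_type']]($groups, $contents['group_id'], $contents);
    $pending[$index]['request_approved_by'] = $adminId;
    $pending[$index]['request_approved_timestamp'] = time();
}

approveRequest($pending, 0, 99, $handlers, $groups);
```

Since the stored request is pure data, a tampered request_type fails the whitelist check instead of running arbitrary code.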
As an aside, it might be a good idea to maintain a table with the history of all changes/updates made. By now it should be apparent that the number of history tables will, once again, depend on Method 1 or Method 2.
Hope that helps.

When to use stored procedures and triggers vs. the application layer

I have a dilemma, which I hope you will have some expert opinions on.
I have a table called CARDS with a column STATUS. If a record's status changes from 'download' to 'publish', I have to insert the record reference into another table called CARD_ASSIGNMENTS. Additionally, the record needs to be added into CARD_ASSIGNMENTS as many times as there are active records in SCANNERS.
In other words, if there are two active scanners, I will end up with two records in CARD_ASSIGNMENTS as below:
ID  CARD_ID  SCANNER_ID  STATUS_ID
1   1        1           4
2   1        2           4
My dilemma is that I'm not sure what the most efficient way to do this would be. I've considered the following options:
From PHP: run one UPDATE query and then the INSERT queries.
Create a stored procedure, which will take care of updating the CARDS record and adding records into the CARD_ASSIGNMENTS. Then, just call that stored procedure from PHP.
Create an ON UPDATE trigger for the CARDS table which will take care of processing INSERTS into the CARD_ASSIGNMENTS table.
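Whichever option is chosen, the rule itself is small. This sketch expresses it in PHP as a pure function that computes the CARD_ASSIGNMENTS rows to insert; the column names follow the question, while the array shapes, the function name cardAssignmentsFor and the fixed STATUS_ID of 4 (taken from the example rows) are assumptions. In the PHP option, these rows could be inserted with a prepared INSERT inside the same transaction as the UPDATE on CARDS.

```php
<?php
/**
 * Hypothetical helper: rows to insert into CARD_ASSIGNMENTS when a card moves
 * from 'download' to 'publish', one per active scanner. Other transitions yield no rows.
 */
function cardAssignmentsFor(int $cardId, string $oldStatus, string $newStatus, array $scanners, int $statusId = 4): array
{
    if ($oldStatus !== 'download' || $newStatus !== 'publish') {
        return [];
    }
    $rows = [];
    foreach ($scanners as $scanner) {
        if ($scanner['active']) {
            $rows[] = ['CARD_ID' => $cardId, 'SCANNER_ID' => $scanner['id'], 'STATUS_ID' => $statusId];
        }
    }
    return $rows;
}

$scanners = [
    ['id' => 1, 'active' => true],
    ['id' => 2, 'active' => true],
    ['id' => 3, 'active' => false],
];
$rows = cardAssignmentsFor(1, 'download', 'publish', $scanners);
```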
PS. A simplified version of my database is available on MySQL Fiddle
Thanks,
Kate

Interesting question.
I'm going to give you some clues about how to approach the problem.
So, you have to start by precisely defining three things:
the expected functionality
the access policy to the functionality
the technical upgrade policy
Here I'll detail these points.
So, the first point is that you have to define your functionality. By doing so, you will be able to tell whether adding a card always implies, in all possible paradigms (sorry for the pedantic word, I can't find a more proper one) of your information system, that this card MUST exist in the other table according to the specifications you provided. This 1-1 functional link must be declared TRUE or FALSE. This is really important.
In other words, if there's even one possibility that one day you won't want to copy that record to the other table, then the trigger is the wrong solution, or at least it should be designed with an emergency mode (for example, a variable that allows it to be skipped under some conditions).
Then comes the second point, the access policy. You have to know whether the permitted client systems will go through your application layer or could develop their own (SaaS style). If they could, your PHP layer will be useless here and the stored procedure is an excellent option, since every single technical and business layer will go through it, no matter what.
The last thing to know is whether you're likely to upgrade your PHP layer one day. In most cases the answer is yes. If so, you might have to modify the part containing the SQL logic you're talking about. Then, keeping everything in a stored procedure rather than hardcoding it in the PHP will definitely save you time and improve stability.
Left brain, right brain, I'm going to give you my personal opinion after all. I really love going with stored procedures, but not with triggers. If the environment allows it, I would go for an underlying batch job calling a set of defined stored procedures, moving the activity outside the online scope.
The advantages are the following:
no or fewer risks of interrupting the online workflow, since you reduce the number of operations
a different schedule to alleviate the database load
a more secure policy, since executing the stored procedure requires only one grant, while running the same SQL from PHP would require INSERT/UPDATE grants
better logging quality: you can have a log per job
better emergency response: when a job fails (if it is well designed), you can restart it, and that's it.
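The batch idea (one log entry per job, restart on failure) can be sketched as a tiny runner. Each job here is a PHP callable standing in for a call such as a MySQL `CALL` of one of the stored procedures; the job names, the in-memory status array and the function runBatch are hypothetical. A re-run skips jobs that already finished, which is what makes restarting safe.

```php
<?php
/**
 * Hypothetical batch runner: run jobs in order, recording one log entry per job.
 * Jobs already marked 'done' in $status are skipped, so a failed batch can simply be restarted.
 */
function runBatch(array $jobs, array &$status, array &$log): bool
{
    foreach ($jobs as $name => $job) {
        if (($status[$name] ?? null) === 'done') {
            continue; // finished in an earlier run
        }
        try {
            $job();
            $status[$name] = 'done';
            $log[] = "$name: ok";
        } catch (Throwable $e) {
            $status[$name] = 'failed';
            $log[] = "$name: failed ({$e->getMessage()})";
            return false; // stop here; fix the cause, then restart the batch
        }
    }
    return true;
}

$attempts = 0;
$jobs = [
    'publish_cards' => function () {},
    'assign_scanners' => function () use (&$attempts) {
        if (++$attempts === 1) {
            throw new RuntimeException('lock timeout'); // simulated first-run failure
        }
    },
];
$status = [];
$log = [];
$firstRun = runBatch($jobs, $status, $log);
$secondRun = runBatch($jobs, $status, $log);
```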
Long post, but that was interesting and I really wanted to share these ideas.
Cheers!

I would use triggers. Some developers say that if you have too many triggers and stored procedures, the database lives a life of its own, meaning you never know what is going to happen on insert, update, etc. But in my opinion, triggers can help you a lot to keep the database consistent: even if someone inserts data directly from some administration tool, integrity is still kept, because all the necessary commands are executed. If you choose stored procedures, you still have to know that you need to call the procedure to insert any new data.

MVC Multi User Authentication/Security

I've been working on a web application for a company that helps them with quoting, managing inventory, and running jobs. We believe the app will be useful to other companies in the industry, but there's no way I want to roll out separate instances of the app, so we're making it multi-user (or multi-company might be a better term, as each company has multiple users).
It's built in CodeIgniter (I wish I'd done it in Rails, too late now though), and I've tried to follow the skinny-controller, fat-model approach. I just want to make sure I do the authorisation side of things properly. When a user logs in, I'd store the companyID along with the userID in the session. I'm thinking that every table the user interacts with should have an additional companyID field (tables accessed indirectly via relationships probably wouldn't need to store the companyID too; tell me if I'm wrong though). Retrieving data seems pretty straightforward: just add a where clause in Active Record to filter by company ID, e.g. $this->db->where('companyID', $companyID). I'm OK with this.
However, what I'd like to know is how to ensure users can only modify data within their own company (in case they send, say, a delete request for a random quoteID using Firebug or a similar tool). One way I thought of is to add the same where clause to every update and delete method in the models as well. This would technically work, but I want to know whether it's the correct way to go about it, or whether anyone has other ideas.
Another option would be to check whether the user's company owns the record before modifying it, but that seems like doubling up on database requests, and I don't really know if there's any benefit to doing it this way.
I'm surprised I couldn't find an answer to this question; I must be searching for the wrong terms :p. But I would appreciate any answers on this topic.
Thanks in advance,
Christian

I'd say you're going about this the correct way. Keeping all the items in the same tables will let you run global statistics as well as per-company statistics, so I think this is the better way to go.
I would also say it's best to add the where clause you mention to every query (whether it's a get, update, or delete). However, I'm not sure you'd want to go in and do that manually for all your queries. I would suggest overriding those methods in your models to add the relevant where clauses. That way, when you call $this->model->get(), the where clause for $companyID (and $userID) is automatically added to the query.
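A minimal sketch of that suggestion: a base model that applies the company condition to every read, update and delete, so individual models cannot forget it. It uses an in-memory array instead of CodeIgniter's database class so it runs on its own; the class CompanyScopedModel and its methods are hypothetical. A useful side effect is that a forged request for another company's quoteID simply affects zero rows, so no separate ownership query is needed; checking the affected-row count (in CodeIgniter, `$this->db->affected_rows()`) is enough.

```php
<?php
/** Hypothetical base model: every query is implicitly scoped to the logged-in company. */
class CompanyScopedModel
{
    private array $rows;     // stands in for a table; keys are record ids
    private int $companyId;  // taken from the session at login

    public function __construct(array $rows, int $companyId)
    {
        $this->rows = $rows;
        $this->companyId = $companyId;
    }

    /** Like SELECT ... WHERE companyID = ? */
    public function get(): array
    {
        return array_filter($this->rows, fn (array $r) => $r['companyID'] === $this->companyId);
    }

    /** Like UPDATE ... WHERE id = ? AND companyID = ?; returns the affected-row count. */
    public function update(int $id, array $changes): int
    {
        if (!$this->matches($id)) {
            return 0;
        }
        unset($changes['companyID']); // clients may not move a record to another company
        $this->rows[$id] = array_merge($this->rows[$id], $changes);
        return 1;
    }

    /** Like DELETE ... WHERE id = ? AND companyID = ?; returns the affected-row count. */
    public function delete(int $id): int
    {
        if (!$this->matches($id)) {
            return 0;
        }
        unset($this->rows[$id]);
        return 1;
    }

    // In SQL this is just the extra condition in the same statement, not a separate query.
    private function matches(int $id): bool
    {
        return isset($this->rows[$id]) && $this->rows[$id]['companyID'] === $this->companyId;
    }
}

$quotes = new CompanyScopedModel([
    10 => ['companyID' => 1, 'total' => 100],
    11 => ['companyID' => 2, 'total' => 250],
], 1);
```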

From the looks of things, this might be more of an API-type system (otherwise this is simply a normal user authentication system).
Simple Authentication
Anyway, the best bet I can see for an API is to have two tables, companies and users.
In the companies table, have a companyID and a password. In the users table, link each user to a company.
Then, when a user makes a request, have them send the companyID and password with every request.
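A sketch of that per-request check using PHP's password_hash() and password_verify(). Storing a hash rather than the password itself is an addition to the answer (standard practice); the $companies array stands in for the companies table, and the function authenticateCompany is hypothetical.

```php
<?php
// Stand-in for the companies table: the password column holds a hash, never the plain text.
$companies = [
    5 => ['name' => 'Acme Joinery', 'password_hash' => password_hash('s3cret-token', PASSWORD_DEFAULT)],
];

/** Hypothetical check of the companyID/password pair sent with each request. */
function authenticateCompany(array $companies, int $companyId, string $password): bool
{
    if (!isset($companies[$companyId])) {
        return false;
    }
    return password_verify($password, $companies[$companyId]['password_hash']);
}

$ok = authenticateCompany($companies, 5, 's3cret-token');
$forged = authenticateCompany($companies, 5, 'guess');
```

Because the credentials travel with every request, this scheme only makes sense over HTTPS.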
OAuth
The next option, slightly harder to implement and requiring the other end to set up OAuth authentication as well, is OAuth.
But, in my opinion, it is much nicer to use overall and a bit more secure.

One way to do it would be with table prefixes. However, if you already have a lot of tables, duplicating them will obviously grow the size of the DB rapidly. If you don't have many tables, this should scale. You can set the prefix based on user credentials. See the prefixes section of this page: http://codeigniter.com/user_guide/database/queries.html for more on working with them.
Another option is to not roll out separate instances of the application, but to use separate databases. Here is a post on the CI forum discussing multiple DBs: http://codeigniter.com/forums/viewthread/145901/ Here again, you can select the proper DB based on user credentials.
The only other option I see is the one you proposed, where you add an identifier to the data designating ownership. This should work, but seems kind of scary.

Tracking data changes

I work on a database-centric market research website, developed in PHP and MySQL.
It consists of two big parts: one in which users insert and update their own data (let's say one table T with a user_id field), and another in which a website administrator can insert new records or update existing ones (same table).
Obviously, in some cases end users will have their data overridden by the administrator, while in other cases administrator-entered data is updated by end users (this is fine both ways).
The requirement is to highlight the view/edit forms with (let's say) blue if the end user was the last to update a certain field, or red if the administrator is to "blame".
I am looking for an efficient and consistent way to implement this.
So far, I have the following options:
For each record in table T, add another field (char(1)) in which to write 'U' if the end user inserted/updated the field or 'A' if the administrator did. When the view/edit form is rendered, use this information to highlight each field accordingly.
Create a new table H storing an edit history with something like user_id, field_name, last_update_user_id. Keep table H up to date when fields in the main table T are updated. When the view/edit form is rendered, use this information to highlight each form field accordingly.
What are the pros and cons of these options? Can you suggest others?

I suppose it just depends on how forward-looking you want to be.
Your first approach has the advantage of being very simple to implement, straightforward to update and use, and it will only increase your storage requirements very slightly; but it's also the bare minimum in terms of the amount of information you store.
If you go with the second approach and store a more complete history, and you later need to add an "edit history" feature, you'll already have things set up for it, with plenty of data waiting. But if you never end up needing this data, it's a bit of a waste.
Or if you want the best of both worlds, you could combine them. Keep a full edit history but also update the single-character flag in the main record. That way you don't have to do any processing of the history to find the most recent edit, just look at the flag. But if you ever do need the full history, it's available.
Personally, I prefer keeping more information than I think I'll need at the time. Storage space is very cheap, and you never know when it's going to come in handy. I'd probably go even further than what you proposed, and also make it so the edit history keeps track of what they changed, and the before/after values. That can be very handy for debugging, and could be useful in the future depending on the project's exact needs.
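The combined approach can be sketched as one function that, on every save, appends before/after rows to the history (table H, extended with old and new values as suggested) and updates a 'U'/'A' flag on the record in T. Keeping one flag per field, since the highlighting is per field, is an assumption; the in-memory arrays, field names and functions saveFields/fieldColour are hypothetical, and in MySQL both writes would belong in one transaction.

```php
<?php
/**
 * Hypothetical save routine: apply changed fields to a record of T, append before/after
 * values to history H, and flag each field 'U' (end user) or 'A' (administrator).
 */
function saveFields(array &$record, array $changes, int $editorId, bool $isAdmin, array &$history): void
{
    foreach ($changes as $field => $newValue) {
        $oldValue = $record[$field] ?? null;
        if ($oldValue === $newValue) {
            continue; // unchanged fields are neither logged nor re-flagged
        }
        $history[] = [
            'record_id' => $record['id'],
            'field_name' => $field,
            'old_value' => $oldValue,
            'new_value' => $newValue,
            'last_update_user_id' => $editorId,
        ];
        $record[$field] = $newValue;
        $record['last_editor'][$field] = $isAdmin ? 'A' : 'U';
    }
}

/** Form colour: blue if the end user edited the field last, red if the administrator did. */
function fieldColour(array $record, string $field): ?string
{
    $flag = $record['last_editor'][$field] ?? null;
    return $flag === null ? null : ($flag === 'A' ? 'red' : 'blue');
}

$history = [];
$record = ['id' => 1, 'user_id' => 7, 'revenue' => '1000', 'employees' => '12'];
saveFields($record, ['revenue' => '1200'], 7, false, $history);
saveFields($record, ['employees' => '15'], 1, true, $history);
```

Rendering the form only reads the flags; the history is there for debugging or a future "edit history" view.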

Yes, implement an audit table that holds copies of the historical data, by/from whom, etc. I currently work on a system that keeps it simple and writes value changes as simple name-value string pairs, along with the date and who made the change. It requires mandatory master record adjustment, but works well for tracking. You could implement this easily with a trigger.

The best way to audit data changes is through a trigger on the database table. In your case you may want to just record the last person to make the change. Or you may want a full auditing solution where you store the previous values, making it easy to restore them if changes were made in error. But the key is to do this in the database and not through the application. Database changes are often made through sources other than the application, and you will want to know when this happens too. Suppose someone hacked into the database and updated the data: wouldn't you like to be able to find the old data easily, or know who did it, even if they did it through a query window and not through the application? You might also need to know whether the data was changed through a data import, if you ever have to load large amounts of data at once.

How would I use an audit trail to display which fields have ever been edited?

For a project I am working on, I have been asked to create an audit trail of all changes that have been made to records. This is the first time I have had to create an audit trail, so I have been doing a lot of research on the subject.
The application will be developed in PHP/MSSQL, and will be low-traffic.
From my reading, I have pretty much decided to have an audit table and use triggers to record the changes in the table.
The two requirements for display in the application are as follows:
Be able to see a log of all changes made to a field (I pretty much know how to do this)
Be able to see, when viewing a record in the application, an indicator next to any field in the record that has ever been changed (and possibly other info like the date of the last change).
Item #2 is the one currently giving me grief. Without running a separate query for each field (or a very long nested query that would take ages to execute), does anyone have suggestions for an optimal way to do this? (I have thought of adding an extra "ModifiedFlag" field for each field in the table, which would act as a boolean indicator of whether the field has ever been edited, but that seems like a lot of overhead.)

I would treat the audit information separately from the actual domain information as much as possible.
Requirement #1:
I think you will create additional audit tables to record the changes.
Eric's suggestion is a good one: create the audit information using triggers in the SQL database. This way your application doesn't need to be aware of the audit logic.
If your database does not support triggers, then perhaps you are using some kind of persistence or database layer. This would also be a good place for this kind of logic, as again you minimize dependencies between normal application code and the audit code.
Requirement #2:
As for showing the indicators: I would not create boolean fields in the table that stores the actual data. (This would cause all sorts of dependencies between your normal application code and your audit trail code.)
I would let the code responsible for displaying the form also be responsible for showing audit data at field level. This will cause query overhead, but that is the cost of displaying this extra layer of information. Perhaps you can minimize the database overhead by adding metadata to the audit information that allows for easy retrieval.
A big enterprise application that I maintain uses roughly the following structure:
A change header table corresponding to a change of a record in a table.
Fields:
changeId, changeTable, changedPrimaryKey, userName, dateTime
A change field table corresponding to a field that was changed.
Fields:
changeId, changeField, oldValue, newValue
Sample content:
Change Header:
'1', 'BooksTable', '1852860138', 'AdamsD', '2009-07-01 15:30'
Change Item:
'1', 'Title', 'The Hitchhiker's Guide to the Gaxaly', 'The Hitchhiker's Guide to the Galaxy'
'1', 'Author', 'Duglas Adasm', 'Douglas Adams'
This structure allows both easy viewing of audit trails and easy retrieval for showing the desired indicators. One query (an inner join of the Header and Items tables) would be enough to retrieve all the information to show in a single form (or even in a table, when you have a list of displayed IDs).
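The sketch below performs that join in PHP over the sample content from this answer and returns, for each field of one record, the date of its latest change, which is what Requirement #2 needs to place an indicator next to a field. The array shapes mirror the two tables; the function changedFields and the SQL in its comment (with hypothetical table names) are illustrative assumptions.

```php
<?php
// Rows mirroring the sample content (change header and change field tables).
$headers = [
    ['changeId' => 1, 'changeTable' => 'BooksTable', 'changedPrimaryKey' => '1852860138',
     'userName' => 'AdamsD', 'dateTime' => '2009-07-01 15:30'],
];
$items = [
    ['changeId' => 1, 'changeField' => 'Title',
     'oldValue' => "The Hitchhiker's Guide to the Gaxaly", 'newValue' => "The Hitchhiker's Guide to the Galaxy"],
    ['changeId' => 1, 'changeField' => 'Author', 'oldValue' => 'Duglas Adasm', 'newValue' => 'Douglas Adams'],
];

/**
 * Hypothetical helper: field => date of its latest change for one record. Roughly:
 * SELECT i.changeField, MAX(h.dateTime) FROM change_header h
 *   JOIN change_item i ON i.changeId = h.changeId
 *   WHERE h.changeTable = ? AND h.changedPrimaryKey = ? GROUP BY i.changeField
 */
function changedFields(array $headers, array $items, string $table, string $key): array
{
    $dates = [];
    foreach ($headers as $h) {
        if ($h['changeTable'] === $table && $h['changedPrimaryKey'] === $key) {
            $dates[$h['changeId']] = $h['dateTime'];
        }
    }
    $result = [];
    foreach ($items as $i) {
        if (!isset($dates[$i['changeId']])) {
            continue;
        }
        $date = $dates[$i['changeId']];
        // 'Y-m-d H:i' strings sort chronologically when compared as strings.
        if (!isset($result[$i['changeField']]) || strcmp($date, $result[$i['changeField']]) > 0) {
            $result[$i['changeField']] = $date;
        }
    }
    return $result;
}

$indicators = changedFields($headers, $items, 'BooksTable', '1852860138');
```

When rendering the form, a field gets an indicator if its name is a key of $indicators, and the value doubles as the "date of last change".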

As a general requirement, flagging changed fields "smells" slightly odd. If records are long-lived and subject to change over time, eventually all fields will tend to get flagged. Hence I wonder how any user could make sense of a simple set of indicators per field.
That line of thinking makes me suspect that the data you store needs to be, as you've described, a true audit trail with all the changes recorded, and the first real challenge is to decide how the info should be presented to the user.
I think your idea of preparing some kind of aggregateOfTheAuditTrail data is likely to be very useful. The question is whether a single flag per record is enough. If the user's primary access is through a list, then maybe it's enough to just highlight the changed records for later drill-down. Or use a date of last change of the record, so that only recently changed records are highlighted; it all comes back to what the user's real needs are. I find it hard to imagine that records changed 3 years ago are as interesting as those changed last week.
Then we come to the drill-down to a single record. Again, a simple flag per field doesn't sound useful (though it's your domain and your requirements). If it is, then your summary idea is fine. My guess is that a sequence of changes to a field, and the sequence of overall changes to the record, are much more interesting. Employee had a pay rise, employee moved department, employee was promoted: three separate business events or one?
If anything more than a simple flag is needed, then I suspect you just need to return the whole (or recent) audit trail for the record and let the UI figure out how to present it.
So, my initial thought: some kind of rolling maintenance of a summary record sounds like a good idea, maintained if necessary in background threads or batch jobs. We design that to be business-useful without going to the full audit trail each time. Then, for detailed analyses, we allow some or all of the trail to be retrieved.

Personally, I'd make the tracking simple, and the reporting funky.
Each time a user inserts a record, you insert a row into the audit table for that table:
'I', 'Date', 'User', 'Data column1','Data Column2', etc.
That assumes the structure of the tables won't change over time (i.e. the number of data columns).
For updates, just insert
'U', 'Date', 'User', 'Data column1', etc
Insert what the user just entered as an update.
Then, after the insert and update, you will have the following
'I','May 3 2009','BLT','person005','John','Smith','Marketing'
'U','May 4 2009','BLT','person005','John','Smith','Accounting'
Then, it's just an easy report to show that the unique person record 'person005' has had an insert and an update, where their department was updated.
Due to the low usage of the system, having a simple insert on change and a more complex reporting process isn't going to affect performance. This style will still work with higher-traffic systems, as the extra load on an edit is minimal, whereas the more intensive reporting workload isn't run as often as updates, so the system won't fall over.
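The "easy report" could be as simple as comparing each audit row with the previous row for the same key. This sketch uses the two example rows; the column labels (key, first_name, last_name, department) and the function changeReport are assumptions, since the answer only shows positional values.

```php
<?php
// Audit rows as in the example: type, date, user, then the data columns.
$columns = ['key', 'first_name', 'last_name', 'department']; // hypothetical labels
$audit = [
    ['I', 'May 3 2009', 'BLT', 'person005', 'John', 'Smith', 'Marketing'],
    ['U', 'May 4 2009', 'BLT', 'person005', 'John', 'Smith', 'Accounting'],
];

/** Hypothetical report: list changes per record by diffing consecutive audit rows. */
function changeReport(array $audit, array $columns): array
{
    $previous = [];
    $report = [];
    foreach ($audit as $row) {
        [$type, $date, $user] = array_slice($row, 0, 3);
        $data = array_combine($columns, array_slice($row, 3));
        $key = $data['key'];
        if ($type === 'I') {
            $report[] = "$date: $user inserted $key";
        } elseif (isset($previous[$key])) {
            foreach ($data as $column => $value) {
                if ($previous[$key][$column] !== $value) {
                    $report[] = "$date: $user changed $column of $key from {$previous[$key][$column]} to $value";
                }
            }
        }
        $previous[$key] = $data; // the latest state is the baseline for the next update
    }
    return $report;
}

$report = changeReport($audit, $columns);
```

All the diffing cost is paid at report time, which matches the answer's point that edits stay cheap while the rarer reporting does the heavier work.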
