Application logs in database or file - php

I want to build a detailed logger for my application. Because it can get very complex and has to save a lot of different things, I wonder where it is best to store the logs: in a database (and if so, which kind of database is better suited to this kind of operation) or in a file (and if so, in what format: text, CSV, JSON, XML). My first thought was of course a file, because I see a lot of problems with a database, but I also want to be able to display those logs, and that is easier with a database.

I am building a log for HIPAA compliance, and here is my rough implementation (not finished yet).
File VS. DB
I use a database table to store the last 3 months of data. Every night a cron will run and push the older data (data past 3 months) off into compressed files. I haven't written this script yet but it should not be difficult. That way the last 3 months can be searched, filtered, etc. But the database won't be overwhelmed with log entries.
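Since that nightly script isn't written yet, here is only a rough sketch of what its core could look like. The function names, the row shape (a 'date' key holding a 'Y-m-d H:i:s' string) and the JSON-lines archive format are all assumptions made for illustration; the real job would read from and delete from the log table instead of working on an in-memory array.

```php
<?php
// Hypothetical helper: splits log rows into those that stay in the database
// and those older than the retention window (3 months by default).
function partitionLogRows(array $rows, DateTimeImmutable $now, int $months = 3): array
{
    $cutoff = $now->modify("-{$months} months");
    $keep = [];
    $archive = [];
    foreach ($rows as $row) {
        $loggedAt = new DateTimeImmutable($row['date']);
        if ($loggedAt < $cutoff) {
            $archive[] = $row;
        } else {
            $keep[] = $row;
        }
    }
    return ['keep' => $keep, 'archive' => $archive];
}

// Hypothetical helper: writes the archived rows as gzip-compressed JSON lines.
// gzencode() needs PHP's zlib extension.
function writeArchive(array $rows, string $path): void
{
    $lines = '';
    foreach ($rows as $row) {
        $lines .= json_encode($row) . "\n";
    }
    file_put_contents($path, gzencode($lines));
}
```

The cron job would call partitionLogRows(), pass the 'archive' part to writeArchive(), and only then delete those rows from the table.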
Database Preference
I am using MSSQL because I don't have a choice. I usually prefer MySQL, though, as it has better paging optimization. If you are doing more than a very minimal amount of searching and filtering, or if you are concerned about performance, you may want to consider putting Apache Solr in the middle. I'm not a DB expert, so I can't give you much more than that.
Table Structure
My table is 5 columns. Date, Operation (create, update, delete), Object (patient, appointment, doctor), ObjectID, and Diff (a serialized array of before and after values, changed values only no empty or unchanged values for the sake of saving space).
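To illustrate the Diff column, here is a minimal sketch of how the "changed values only" array could be computed before it is serialized. The function name, the field names and the sample patient data are made up for the example; only the five column names come from the description above.

```php
<?php
// Hypothetical helper: returns only the fields whose value changed,
// skipping unchanged fields and fields that are blank both before and after.
function changedValues(array $before, array $after): array
{
    $isBlank = function ($value) {
        return $value === null || $value === '';
    };
    $diff = [];
    foreach ($after as $field => $newValue) {
        $oldValue = $before[$field] ?? null;
        if ($newValue === $oldValue || ($isBlank($oldValue) && $isBlank($newValue))) {
            continue;
        }
        $diff[$field] = ['before' => $oldValue, 'after' => $newValue];
    }
    return $diff;
}

// Made-up sample data for one log entry.
$before = ['name' => 'Ann Lee', 'phone' => '555-0100', 'notes' => ''];
$after  = ['name' => 'Ann Lee', 'phone' => '555-0199', 'notes' => ''];

$entry = [
    'Date'      => date('Y-m-d H:i:s'),
    'Operation' => 'update',
    'Object'    => 'patient',
    'ObjectID'  => 42,
    'Diff'      => serialize(changedValues($before, $after)),
];
```

Here only the phone number ends up in Diff, which is what keeps the column small.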
Summary
The most important question to consider is: do you need people to be able to access and filter/search the data regularly? If yes, consider a database for the recent history or the most important data.
If no, a file is probably a better option.
My hybrid solution is also worth considering. I'll be pushing the files off to an Amazon file server so they don't take up space on my web server.

You can build a detailed, complex logger on top of an existing library such as log4php. It is already fully tested, including for performance, which a custom design of your own would not be, and it will also save development time. I have personally used several PHP and .NET libraries for complex logging needs in financial and medical projects.
If you need to do this from PHP, I would suggest using this:
https://logging.apache.org/log4php/

I think the right answer is actually: Neither.
Neither a file nor a DB gives you proper search and filtering, and you need that when looking at logs. I deal with logs all day long (see http://sematext.com/logsene to see why), and I'd tackle this as follows:
log to file
use a lightweight log shipper (e.g. Logagent or Filebeat)
index logs into either your own Elasticsearch cluster (if you don't mind managing and learning) or one of the Cloud log management services (if you don't want to deal with Elasticsearch management, scaling, etc. -- Logsene, Loggly, Logentries...)
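For the "log to file" step, a minimal sketch could look like the following. Writing one JSON object per line is my assumption, not a requirement of any particular shipper, but it is a format that log shippers can generally tail and parse. The function name and the field names are made up.

```php
<?php
// Hypothetical helper: appends one JSON-encoded record per line to a log file.
// LOCK_EX keeps concurrent requests from interleaving their lines.
function logEvent(string $file, string $level, string $message, array $context = []): void
{
    $record = [
        'ts'      => gmdate('Y-m-d\TH:i:s\Z'), // UTC timestamp
        'level'   => $level,
        'message' => $message,
        'context' => $context,
    ];
    file_put_contents($file, json_encode($record) . "\n", FILE_APPEND | LOCK_EX);
}
```

The application only ever appends to the file; shipping and indexing stay entirely outside the PHP code.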

Related

PHP app log storage

Our PHP & MySQL based application creates custom logs of user actions, which are written to a MySQL database. We mainly did this for ease of searching and because the app was already using MySQL for persistent storage, so it just made sense.
Our log now contains 17.6 million rows and is 2GB in size. Not that friendly when moving around the place.
I was wondering what the community might suggest as a better more efficient way to store logs.
You could obviously split this table: keep one week's worth of all logs, then delete the non-critical logs and move the critical ones, such as payments, into a second table for historical records.
In general we're writing to the log through the means of a function such as
playerlog($id,$message,$cash,$page,$ip,$time);
But that's a fairly simplified version, we're also using MySQL's INSERT DELAYED as the logs are not critical for page loads.
If you're interested in doing this with MongoDB (which I assume from the tag), you might want to take a look here: http://docs.mongodb.org/manual/use-cases/storing-log-data/
You should clarify for what the logs are needed. As a second step after inserting you could set up a job that works on the log data, e.g. reads the logs and processes them (which degrades your DBMS to some sort of messaging middleware). That may be storing parts (like payments) to an archive that doesn't get deleted or writing authentication logs to a place where they get deleted after a specified retention time. But this all depends on your use case.
Depending on what you plan to analyze or the way you have to query the data you could even store them outside of MySQL.
Some possibilities:
implement a SIEM system (http://en.wikipedia.org/wiki/Security_information_and_event_management) that is targeted to analyze events, trigger alerts etc.
use a SIEM-like software like Splunk (see splunk.com) that works on raw logs and is directed towards log searching and analyzing
stick with your DBMS solution if it is "fast enough"
simply use syslog and store text log files -- you could skip the whole MySQL thing then
...

Store some records in the application and some in the database?

I have an application where it seems as if it would make sense to store some records hard-coded in the application code rather than an entry in the database, and be able to merge the two for a common result set when viewing the records. Are there any pitfalls to this approach?
Firstly, it would seem to make it easier to enforce that a record is never edited/deleted, other than when the application developer wants to. Second, in some scenarios such as installing a 3rd party module, the records could be read from their configuration rather than performing an insert in the db (with the related maintenance issues).
Some common examples:
                                     In the application      In the database
-----------------------------------  ----------------------  ----------------------
customers                            (none)                  all customers
HTML templates                       default templates       user-defined templates
'control panel' interface languages  default language        additional languages
Online shop payment processors       all payment processors  (none)
So, I think I have three options depending on the scenario:
All records in the database
Some records in the application, some records in the database
All records in the application
And it seems that there are two ways to implement it:
All records in the database:
A column could be flagged as 'editable' or 'locked'
Negative IDs could represent locked values and positive IDs could represent editable
Odd IDs represent locked and even IDs represent editable...
Some records live in the application (as variables, arrays or objects...)
Are there any standard ways to deal with this scenario? Am I missing some really obvious solutions?
I'm using MySQL and php, if that changes your answer!
By "in the application", do you mean these records live in the filesystem, accessible to the application?
It all depends on the app you're building. There are a few things to consider, especially when it comes to code complexity and performance. While I don't have enough info about your project to suggest specifics, here are a few pointers to keep in mind:
Having two possible repositories for everything ramps up the complexity of your code. That means readability will go down and weird errors will start cropping up that are hard to trace. In most cases, it's in your best interest to go with the simplest solution that can possibly work. If you look at big PHP/MySQL software packages you will see that even though there are a lot of default values in the code itself, the data comes almost exclusively from the database. This is probably a reasonable policy when you can't get away with the simplest solution ever (namely storing everything in files).
The big downside of heavy database involvement is performance. You should definitely keep track of all the database calls of any typical codepath in your app. If you rely heavily on lots of queries, you have to employ a lot of caching. Track everything that happens and keep in mind what the computer has to do in order to fulfill the request. It's your job to make the computer's task as easy as possible.
If you store templates in the DB, another big performance penalty will be the lack of opcode re-use and caching. Normal web hosting environments compile a PHP file once and then keep the bytecode version of it around for a while. This saves subsequent recompiles and speeds up execution substantially. But if you fill PHP template code into an eval() statement, this code will have to be recompiled by PHP every single time it's called.
Also, if you're using eval() in this fashion and you allow users to edit templates, you have to make sure those users are trusted - because they'll have access to the entire PHP environment. If you're going the other route and are using a template engine, you'll potentially have a much bigger performance problem (but not a security problem). In any case, consider caching template outputs wherever possible.
Regarding the locking mechanism: it seems you are introducing a big architectural issue here since you now have to make each repository (file and DB) understand what records are off-limits to the other one. I'd suggest you reconsider this approach entirely, but if you must, I'd strongly urge you to flag records using a separate column for it (the ID-based stuff sounds like a nightmare).
The standard way would be to keep classical DB-shaped stuff in the DB (these would be user accounts and other stuff that fits nicely into tables) and keep the configuration, all your code and template things in the filesystem.
I think that keeping some fixed values hard-coded in the application may be a good way to deal with the problem. In most cases, it will even reduce load on the database server, because not all the values have to be retrieved via SQL.
But there are cases when it could lead to performance issues, mainly if you have to join values coming from the database with your hard-coded values. In this case, storing all the values in database may have better performance, because all values could be optimized and processed by the database server, rather than getting all the values from SQL query and joining them manually in the code.
To deal with this case, you can store the values in the database, but inserts and updates must be handled only by your maintenance or upgrade routines. If you have a bigger concern about not letting the data be modified, you can set up a maintenance routine that checks from time to time whether the values in the database are the same as those in the code. In this case, these database tables act much like a "cache" of the hard-coded values. And when you don't need to join the fixed values with the database values, you can still get them from the code, avoiding an unnecessary SQL query (because you're sure the values are the same).
In general, any time you perform a database query and want to include something hard-coded in the workflow, no joining needs to happen. You would simply perform the action on your hard-coded data as well as on the data you pulled from the database. This is especially true if we're talking about information that is formed into an object once it is in the application. For instance, I can see this being useful if you want there to always be a dev user in the application. You could have this user hard-coded in the application, and whenever you would query the database, such as when you're logging in a user, you would check your hard-coded user's values before querying the database.
For instance:
// You would place this on the login page
// ($info stands for the hard-coded dev user's details)
$DevUser = new User($info);
$_SESSION['DevUser'] = $DevUser;
// This would go in the user authentication logic
// (hash_equals() compares the hashes in constant time)
if ($_SESSION['DevUser']->GetValue('Username') === $GivenUName && hash_equals($_SESSION['DevUser']->GetValue('PassHash'), $GivenPassHash))
{
    // log in user
}
else
{
    // query for user that matches given username and password hash
}
This shows how there doesn't need to be any special or tricky database stuff going on. Hard-coding variables to include in your database driven workflow is extremely simple when you don't over think it.
There could be a case where you might have a lot of hard-coded variables/objects and/or you might want to execute a large block of logic on both sets of information. In this case it could be beneficial to have an array that holds the hard-coded information and then you could just add the queried information to that array before you perform any logic on it.
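A minimal sketch of that array approach, using the interface languages example from the question. The constant, the function and the 'locked' flag are all hypothetical names, and the database rows are passed in as a plain array so the sketch runs without a database.

```php
<?php
// Hypothetical hard-coded records; 'locked' marks them as not editable.
const BUILT_IN_LANGUAGES = [
    ['code' => 'en', 'name' => 'English', 'locked' => true],
];

// Hypothetical helper: merges hard-coded records with rows queried from the
// database. Hard-coded records win when both sources define the same code.
function allLanguages(array $rowsFromDb): array
{
    $merged = [];
    foreach (BUILT_IN_LANGUAGES as $record) {
        $merged[$record['code']] = $record;
    }
    foreach ($rowsFromDb as $row) {
        if (!isset($merged[$row['code']])) {
            $merged[$row['code']] = $row + ['locked' => false];
        }
    }
    return array_values($merged);
}
```

Any logic that follows can then loop over the merged array without caring where each record came from.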
In the case of payment processors, I would assume that you're referring to online payments using different services such as PayPal, or a credit card, or something else. This would make the most sense as a Payment class that has a separate function for each payment method. That way you can call whichever method the client chooses. I can't think of any other way you would want to handle this. If you're maybe talking about the payment options available to your customers, that would be something hard-coded on your payment page.
Hopefully this helps. Remember, don't make it more complicated than it needs to be.

Configuration storage setup [file vs. database]

I see programmers putting a lot of information into databases that could otherwise be put in a file that holds arrays. Instead of arrays, they'll use many tables of SQL which, I believe, is slower.
CitrusDB has a table in the database called "holiday". This table consists of just one date column called "holiday_date" that holds dates that are holidays. The idea is to let the user add holidays to the table. Citrus and the programmers I work with at my workplace will prefer to put all this information in tables because it is "standard".
I don't see why this would be true unless you are allowing the user, through a user interface, to add holidays. I have a feeling there's something I'm missing.
Sometimes you want to design in a bit of flexibility to a product. What if your product is released in a different country with different holidays? Just tweak the table and everything will work fine. If it's hard coded into the application, or worse, hard coded in many different places through the application, you could be in a world of pain trying to get it to work in the new locale.
By using tables, there is also a single way of accessing this information, which probably makes the program more consistent, and easier to maintain.
Sometimes efficiency/speed is not the only motivation for a design. Maintainability, flexibility, etc are very important factors.
The main advantage I have found of storing 'configuration' in a database, rather than in a property file, or a file full of arrays, is that the database is usually centrally stored, whereas a server may often be split across a farm of several, or even hundreds of servers.
I have implemented, in a corporate environment, such a solution, and the power of being able to change configuration at a single point of access, knowing that it will immediately be propagated to all servers, without the concern of a deployment process is actually very powerful, and one that we have come to rely on quite heavily.
The actual dates of some holidays change every year. The flexibility to update the holidays with a query or with a script makes putting it in the database the easiest way. One could easily implement a script that updates the holidays each year for their country or region when it is stored in the database.
Theoretically, databases are designed and tuned to provide faster access to data than doing a disk read from a file. In practice, for small to mid-sized applications this difference is minuscule. Best practices, however, are typically oriented at larger scale. By implementing best practices on your small application, you create one that is capable of scaling up.
There is also the consideration of the accessibility of the data in terms of other aspects of the project. Where is most of the data in a web-based application? In the database. Thus, we try to keep ALL the data in the database, or as much as is feasible. That way, in the future, if you decide that now you need to join the holiday dates against a list of events (for example), all the data is in a single place. This segmenting of disparate layers creates tiers within your application. When each tier can be devoted to exclusive handling of the roles within its domain (database handles data, HTML handles presentation, etc), it is again easier to change or scale your application.
Last, when designing an application, one must consider the "hit by a bus principle". So you, Developer 'A', put the holidays in a PHP file. You know they are there, and when you work on the code it doesn't create a problem. Then.... you get hit by a bus. You're out of commission. Developer 'B' comes along, and now your boss wants the holiday dates changed - we don't get President's Day off any more. Um. Johnny Next Guy has no idea about your PHP file, so he has to dig. In this example, it sounds a little trivial, maybe a little silly, but again, we always design with scalability in mind. Even if you KNOW it isn't going to scale up. These standards make it easier for other developers to pick up where you left off, should you ever leave off.
The answer lies in many realms. I used to code my own software to read and write to my own flat-file database format. For small systems, with few fields, it may seem worth it. Once you learn SQL, you'll probably use it for even the smallest things.
File parsing is slow. String readers, comparing characters, looking for character sequences, all take time. SQL Databases do have files, but they are read and then cached, both more efficiently.
Updating & saving arrays require you to read all, rebuild all, write all, save all, then close the file.
Options: SQL has many built-in features to do many powerful things, from putting things in order to only returning x through y results.
Security
Synchronization - say you have the same page accessed twice at the same time. PHP will read from your flat file, process, and write at the same time. The two requests will overwrite each other, resulting in data loss.
The number of features SQL provides, the ease of access, the lack of things you need to code, and plenty of other things contribute to why hard-coded arrays aren't as good.
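To make the synchronization point concrete: if you do stay with a flat file, every read-modify-write has to be wrapped in a lock yourself. Below is a minimal sketch using PHP's flock(); the counter file and the function name are made up for the example. Note that flock() is advisory, so it only helps if every piece of code touching the file takes the lock too.

```php
<?php
// Hypothetical helper: increments a counter stored in a flat file while
// holding an exclusive lock, so two simultaneous requests cannot both read
// the same old value and overwrite each other.
function incrementCounter(string $path): int
{
    $handle = fopen($path, 'c+'); // create if missing, do not truncate
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        if (!flock($handle, LOCK_EX)) {
            throw new RuntimeException("Cannot lock $path");
        }
        $current = (int) stream_get_contents($handle);
        $next = $current + 1;
        ftruncate($handle, 0); // discard the old contents
        rewind($handle);
        fwrite($handle, (string) $next);
        fflush($handle);       // flush before releasing the lock
        flock($handle, LOCK_UN);
        return $next;
    } finally {
        fclose($handle);
    }
}
```

A database does the equivalent of this for you on every write, which is a large part of the argument above.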
The answer is it depends on what kind of lists you are dealing with. It seems that here, your list consists of a small, fixed set of values.
For many valid reasons, database administrators like having value tables for enumerated values. It helps with data integrity and with dealing with ETL, to give two examples of why you want it.
At least in Java, for these kinds of short, fixed lists, I usually use Enums. In PHP, you can use what seems to be a good way of doing enums in PHP.
The benefit of doing this is the value is an in-memory lookup, but you can still get data integrity that DBAs care about.
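One common way to get enum-like behaviour in PHP is a class of constants, sketched below. The class name and the holiday types are invented for illustration. PHP 8.1 and later also have native enums, which would be the more direct choice there.

```php
<?php
// Hypothetical enum-style class: a short, fixed list of allowed values
// that lives in memory instead of in a lookup table.
final class HolidayType
{
    const PUBLIC_HOLIDAY = 'public';
    const COMPANY = 'company';
    const REGIONAL = 'regional';

    // Not meant to be instantiated.
    private function __construct()
    {
    }

    public static function values(): array
    {
        return [self::PUBLIC_HOLIDAY, self::COMPANY, self::REGIONAL];
    }

    public static function isValid(string $value): bool
    {
        return in_array($value, self::values(), true);
    }
}
```

Calling HolidayType::isValid() before an insert gives a simple integrity check in code; a DBA may still want a matching value table or constraint on the database side.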
If you need to find a single piece of information out of 10, reading a file vs. querying a database may not give a serious advantage either way. Reading a single piece of data from hundreds or thousands, etc, has a serious advantage when you read from a database. Rather than load a file of some size and read all the contents, taking time and memory, querying from the database is quick and returns exactly what you query for. It's similar to writing data to a database vs text files - the insert into the database includes only what you are adding. Writing a file means reading the entire contents and writing them all back out again.
If you know you're dealing with very small numbers of values, and you know that requirement will never change, put data into files and read them. If you're not 100% sure about it, don't shoot yourself in the foot. Work with a database and you're probably going to be future proof.
This is a big question. The short answer would be, never store 'data' in a file.
First you have to deal with read/write file permission issues, which introduces security risk.
Second, you should always plan on an application growing. When the 'holiday' array becomes very large, or needs to be expanded to include holiday types, you're going to wish it was in the DB.
I can see other answers rolling in, so I'll leave it at that.
Generally, application data should be stored in some kind of storage (not flat files).
Configuration/settings can be stored in a key-value store (such as Redis) and then accessed via a REST API.

Is PHP serialization a good choice for storing data of a small website modified by a single person

I'm planning a PHP website architecture. It will be a small website with few visitors and small set of data. The data is modified exclusively by a single user (administrator).
To make things easier, I don't want to bother with a real database or XML data. I think about storing all data through PHP serialization into several files. So for example if there are several categories, I will store an array containing Category class instances for each category.
Are there any pitfalls using PHP serialization in those circumstances?
Use databases -- it is not that difficult, and any extra time will be well spent learning database use.
The pitfalls I see are as Yehonatan mentioned:
1. Maintenance and adding functionality.
2. No easy way to query or look at data.
3. Very insecure -- take a look at "hackthissite.org". A lot of the beginning examples have to do with hacking where someone put the data hard coded in files.
4. Serialization will work for one array, meaning one table. If you have to do anything like have parent categories that have to match up to other data, not going to work so well.
The pitfalls come with maintenance and adding functionality.
It is a very good way to learn, but you will appreciate databases more after the lessons.
I tried to implement PHP serialization to store website data. For those who want to do the same thing, here's some feedback from a project started a few months ago and heavily modified since:
Pros:
It was very easy to load and save data. I don't have to write SQL queries, optimize them, etc. The code is shorter (with parametrized SQL queries, it may grow a lot).
The deployment does not require additional effort. We don't care about what is supported on the web server: if there is just PHP with no additional extensions, database servers, etc., the website will still work. Sqlite is a good thing, but it is not possible to install it on some servers, and it also requires a PHP extension.
We don't have to care about updating a database server, nor about the database server to use (thus avoiding the scenario where the customer wants to migrate from Microsoft SQL Server to Oracle, etc.).
We can add more properties to the objects without having to break everything (just like we can add other columns to the database).
Cons:
Like Kerry said in his answer, there is "no easy way to query or look at data". It means that any business intelligence/statistics cases are impossible or require a huge amount of work. Even some basic scenarios become extremely complicated. Let's say we store products and we want to know how many products there are. Instead of just writing select count(1) from Products, in my case it requires creating a PHP file just for that, loading all the data, then counting the number of items, sometimes by adding things up manually.
Some changes required implementing a data migration, which was painful and took more work than just executing an SQL query.
To conclude, I would recommend using PHP serialization for storing data of a small website modified by a single person only if all the following conditions are true:
The deployment context is unknown and there are chances to have a server which supports only basic PHP with no extensions,
Nobody cares about business intelligence or similar usages of the information,
There will be no changes to the requirements with large impact on the data structure.
I would say use a small database like sqlite if you don't want to go through setting up a full db server. However I will also say that serializing an array and storing that in a text file is pretty dang fast. I've had to serialize an array with a few thousand records (a dump from a database) and used that as a temp database when our DB server was being rebuilt for a few days.
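For anyone weighing the serialization route discussed in this thread, here is a minimal sketch of the save/load pair. It assumes the data is plain arrays rather than class instances, because it passes allowed_classes => false to unserialize() (available since PHP 7.0) as a safety measure; storing objects, as in the question, would need that option relaxed. The function names and file layout are made up.

```php
<?php
// Hypothetical helper: serializes an array to a file. It writes to a
// temporary file first and then renames it, so a crash in the middle of a
// write does not leave a half-written data file behind.
function saveData(string $path, array $data): void
{
    $tmp = $path . '.tmp';
    if (file_put_contents($tmp, serialize($data), LOCK_EX) === false) {
        throw new RuntimeException("Cannot write $tmp");
    }
    rename($tmp, $path);
}

// Hypothetical helper: loads the array back, or returns an empty array if
// the file is missing or does not contain a serialized array.
function loadData(string $path): array
{
    if (!is_file($path)) {
        return [];
    }
    $data = unserialize(file_get_contents($path), ['allowed_classes' => false]);
    return is_array($data) ? $data : [];
}
```

Even the "how many products" case then means loading everything first, e.g. count(loadData($path)['products']), which is the querying cost described in the cons above.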

flat-file database php application

I'm creating an app that will rely on a database, and I have every intention of using a flat-file DB. Are there any serious reasons to stay away from this?
I'm using Mimesis (http://mimesis.110mb.com).
It's simpler than using MySQL, which I have to admit I have little experience with.
I'm wondering about the security of the DB, but the files are stored as PHP and it seems to be a solid database solution.
I really like the ease of backing up and transporting the databases, which I have found harder with mySQL. I see that everyone seems to prefer the mySQL way - and it likely is faster when it comes to queries but other than that is there any reason to stay away from flat-file dbs and (finally) properly learn mysql ?
edit
Just to let people know,
I ended up going with mySQL, and am using the CodeIgniter framework. Still like the flat file db, but have now realized that it's way more complex for this project than necessary.
Use SQLite, you get a database with many SQL features and yet it's only a single file.
Greetings, I'm the creator of Mimesis. Relational databases and SQL are important in situations where you have massive amounts of data that need to be handled. Are flat files superior to relational databases? Well, you could ask Google, as their entire archiving system works with flat files, and it's the most popular search engine on Earth. Does Mimesis compare to their system? Likely not.
Mimesis was created to solve a particular niche problem. I only use free websites for my online endeavors. Plenty of free sites offer the ability to use PHP. However, they don't provide free SQL database access. Therefore, I needed to create a database that would store data, implement locking, and work around file permissions. These were the primary design parameters of Mimesis, and it succeeds on all of those.
If you need an idea of Mimesis's speed, if you navigate to the first page it will tell you what country you're viewing the site from. This free database is taken from the site ip2nation.com and ported into a Mimesis ffdb. It has hundreds if not thousands of entries.
Furthermore, the hit counter on the main page has already tracked over 7000 visitors. These are UNIQUE visits, which means that the script has to search the database to see if the IP address that's visiting already exists, and also performs a count of the total IPs.
As you may have noticed, the main page loads pretty quickly even though it has two fairly intensive Mimesis database scripts running on the backend. The way Mimesis stores data is designed to speed up read and write procedures and also translation procedures. Most ffdb example scripts or other ffdb scripts out there use a simple CSV file or some other such structure for storing data. Mimesis actually interprets binary data at some levels to augment its functionality. Mimesis is somewhat of a hybrid between a flat file database and a relational database.
Most other ffdb scripts involve rewriting the COMPLETE file every time an update is made. Mimesis does not do this, it rewrites only the structural file and updates the actual row contents. So that even if an error does occur you only lose new data that's added, not any of the older data. Mimesis also maintains its history. Unless the table is refreshed the data that rows had previously is still contained within.
I could keep going on about all the features, but this isn't intended as a "Mimesis is the greatest database ever" rant. Rather, it's intended to open people's eyes to the fact that SQL isn't the ONLY technology available, and that flat files, when given proper development paradigms, are superior to a relational database, taking into account that they are more specialized.
Long live flat files and the coders who brave the headaches that follow.
The answer is "Fine" if you only NEED a flat-file structure. One test: Would a single simple spreadsheet handle all needs? If not, you need a relational structure, not a flat file.
If you're not sure, perhaps you can start flat-file. SQLite is a great app for getting started.
It's not good to learn you made the wrong choice, if you figure it out too far along in the process. But if you understand the importance of a relational structure, and upsize early on if needed, then you are fine.
"I really like the ease of backing up and transporting the databases, which I have found harder with mySQL."
Use SQLite, as mentioned in another answer; there is only one file to back up. Alternatively, set up periodic dumps of the MySQL databases to SQL files. This is a relatively simple thing to do.
"I see that everyone seems to prefer the mySQL way - and it likely is faster when it comes to queries"
Speed is definitely a consideration. Databases tend to be a lot faster, because the data is organized better.
"other than that is there any reason to stay away from flat-file dbs and (finally) properly learn mysql ?"
There sure are plenty of reasons to use a database solution, but there are arguments to be made for flat files. It is always good to learn things other than what you "usually" use.
Most decisions depend on the application. How many concurrent users are you going to have? Do you need transaction support?
Wanted to inform that Mimesis has moved from the original URL to http://mimesis.site11.com/
Furthermore, I am shifting the focus of Mimesis from an ffdb to a key-value store. It's more sensible given the types of information I'm storing and the methods I use to retrieve it. There was also a grave error present in the coding of Mimesis (which I've since fixed). However, I'm still in the testing phase of the new key-value store type. I've also been side-tracked by other things. Locking has also been changed from the use of file creation to directory creation as the mutex mechanism.
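For readers unfamiliar with directory creation as a mutex, here is a minimal sketch of the general technique. This is not Mimesis's actual code; the function name, retry count and wait time are assumptions. It relies on mkdir() either creating the directory or failing, so only one process can win.

```php
<?php
// Hypothetical helper: runs $criticalSection while holding a lock that is
// represented by the existence of a directory.
function withDirectoryLock(string $lockDir, callable $criticalSection, int $maxAttempts = 50)
{
    $attempts = 0;
    // mkdir() fails if the directory already exists, i.e. someone else
    // currently holds the lock.
    while (!@mkdir($lockDir)) {
        if (++$attempts >= $maxAttempts) {
            throw new RuntimeException("Could not acquire lock $lockDir");
        }
        usleep(100000); // wait 100 ms before retrying
    }
    try {
        return $criticalSection();
    } finally {
        rmdir($lockDir); // release the lock even if the callback throws
    }
}
```

One known weakness of this approach is that a process that dies while holding the lock leaves the directory behind, so a real implementation would need some way to detect and clear stale locks.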
Interoperability. MySQL can be interfaced by basically any language that counts. Mimesis is unlikely to be usable outside PHP.
This becomes significant the moment you try to use profilers, or modify data from the outside.
You might also look at http://lukeplant.me.uk/resources/flatfile/ for the PHP Flatfile Package.
The issue with going flatfile is that in order to adjust the situation for further development you have to alter a significant amount of code in order to improve the foundation of the system. Whereas if it was a pure SQL system it would require little to no modification to proceed in the future.
