Related to my previous question:
PHP and Databases: Views, Functions and Stored Procedures performance
Just to ask a more specific question regarding large SELECT queries.
When would it be more convenient to use a View instead of writing the SELECT query in the code and calling it:
$connector->query($sql)->fetchAll();
What are the factors to take into account when deciding whether it's best to use a view or just leave it as it is? Say, if you join several tables, select a certain amount of data, etc.
I'm asking in the context of a big web app (with PHP & Postgres), with performance and optimization in mind.
One thing to take into account when you are using PHP source code + views (instead of only PHP source code) is that you now have two kinds of sources to modify when you update your application:
you must put the new PHP sources on the server
and you must update the views
And you sometimes must do both at exactly the same time if you don't want your application to crash... Or you have to code assuming that the application must run OK with an outdated / more recent version of the views (for a couple of seconds).
Something else you might have to consider is versioning: versioning PHP scripts is easy: just use SVN and it's all right, as they are text files.
With views, to get the same kind of versioning, you have to work in text files (committed to SVN before you update them on the DB production server), and keep those in sync with the DB server -- seems easy, but it's not when you have to push an emergency patch to production ^^
Personally, I generally use views / stored procedures when it really makes a difference: for instance, if a calculation would require thousands of SQL queries (and, so, thousands of calls from PHP, waiting for the response, and so on) or too many data exchanges between the two servers, using a stored proc can really be great!
(I've never used Postgres, but the idea is the same with other products.)
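As a hedged illustration of that kind of saving with Postgres (function, table and column names are all invented for the example; the point is only the single round trip):

```php
<?php
// Sketch only: push a calculation that would otherwise need thousands of
// per-row queries from PHP into one server-side function call.
$pdo = new PDO('pgsql:host=localhost;dbname=app', 'user', 'pass');

// One-time setup (normally done in a migration, not at request time).
$sql = <<<'SQL'
CREATE OR REPLACE FUNCTION apply_monthly_interest(rate numeric)
RETURNS integer AS $fn$
DECLARE
    updated integer;
BEGIN
    UPDATE accounts SET balance = balance * (1 + rate);
    GET DIAGNOSTICS updated = ROW_COUNT;
    RETURN updated;
END;
$fn$ LANGUAGE plpgsql;
SQL;
$pdo->exec($sql);

// A single call instead of one UPDATE per account issued from PHP.
$stmt = $pdo->prepare('SELECT apply_monthly_interest(:rate)');
$stmt->execute(array(':rate' => 0.01));
echo $stmt->fetchColumn() . " accounts updated\n";
```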
I want to build a detailed logger for my application. Because it can get very complex and has to save a lot of different things, I wonder where it is best to save the logs: in a database (and if a database, which kind of database is better for this kind of operation) or in a file (and if a file, what format: text, CSV, JSON, XML)? My first thought was of course a file, because with a database I see a lot of problems, but I also want to be able to display those logs, and that is easier with a database.
I am building a log for HIPAA compliance, and here is my rough implementation (not finished yet).
File VS. DB
I use a database table to store the last 3 months of data. Every night a cron will run and push the older data (data past 3 months) off into compressed files. I haven't written this script yet but it should not be difficult. That way the last 3 months can be searched, filtered, etc. But the database won't be overwhelmed with log entries.
Database Preference
I am using MSSQL because I don't have a choice. I usually prefer MySQL, though, as it has better paging optimization. If you are doing more than a very minimal amount of searching and filtering, or if you are concerned about performance, you may want to consider an Apache Solr middleman. I'm not a DB expert, so I can't give you much more than that.
Table Structure
My table has 5 columns: Date, Operation (create, update, delete), Object (patient, appointment, doctor), ObjectID, and Diff (a serialized array of before-and-after values; only changed values, no empty or unchanged ones, for the sake of saving space).
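As a rough sketch of how that table and a single log write could look (connection details, exact types and the example diff are illustrative, not my finished code):

```php
<?php
// Illustrative 5-column audit table plus one log entry (MSSQL via PDO here;
// the idea is the same for MySQL).
$pdo = new PDO('sqlsrv:Server=localhost;Database=app', 'user', 'pass');

$pdo->exec("
    CREATE TABLE audit_log (
        LogDate   DATETIME    NOT NULL,
        Operation VARCHAR(10) NOT NULL,   -- create / update / delete
        Object    VARCHAR(50) NOT NULL,   -- patient, appointment, doctor...
        ObjectID  INT         NOT NULL,
        Diff      TEXT        NULL        -- serialized before/after values
    )
");

// Only the fields that actually changed are stored, to save space.
$diff = array('name' => array('old' => 'Jon', 'new' => 'John'));

$stmt = $pdo->prepare(
    'INSERT INTO audit_log (LogDate, Operation, Object, ObjectID, Diff)
     VALUES (GETDATE(), :op, :obj, :id, :diff)'
);
$stmt->execute(array(
    ':op'   => 'update',
    ':obj'  => 'patient',
    ':id'   => 42,
    ':diff' => serialize($diff),
));
```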
Summary
The most important piece to consider is: do you need people to be able to access and filter/search the data regularly? If yes, consider a database for the recent history or the most important data.
If not, a file is probably a better option.
My hybrid solution is also worth considering. I'll be pushing the files off to an Amazon file server so they don't take up my web server's space.
You can build a detailed and complex logger using an existing library like log4php. Such a library is fully tested and performs well compared to designing something custom yourself, and it will also save development time. I have personally used a few libraries from PHP and .NET for complex logging needs in financial and medical domain projects.
If you need to do this from PHP, I would suggest using this:
https://logging.apache.org/log4php/
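A minimal sketch of what using it looks like, assuming log4php is installed and a configuration file exists (paths and the logger name are just examples):

```php
<?php
// Adjust the path to wherever log4php is installed.
require_once '/path/to/log4php/Logger.php';

// The config file defines appenders (file, syslog, database...) and levels.
Logger::configure(__DIR__ . '/log4php-config.xml');

$log = Logger::getLogger('billing');

$log->info('Invoice 1234 created');
$log->warn('Retrying payment gateway call');
$log->error('Payment failed: gateway timeout');
```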
I think the right answer is actually: Neither.
Neither a file nor a DB gives you proper search and filtering, and you need that when looking at logs. I deal with logs all day long (see http://sematext.com/logsene to see why), and I'd tackle this as follows:
log to file
use a lightweight log shipper (e.g. Logagent or Filebeat)
index logs into either your own Elasticsearch cluster (if you don't mind managing and learning) or one of the Cloud log management services (if you don't want to deal with Elasticsearch management, scaling, etc. -- Logsene, Loggly, Logentries...)
I've just finished a basic PHP file that lets indie game / application developers store user data, handle user logins, self-deleting variables, etc. It all revolves around storage.
I've made systems like this before, but always hit the max_user_connections issue - which I personally can't currently change, as I use a friend's hosting - and free hosting providers often limit max_user_connections anyway. This time, I've made the system fully text-file based (each file holding JSON structures).
The system works fine currently, as it's being tested by only me and another 4/5 users per second. The PHP script basically opens a text file (based upon query arguments), uses json_decode to convert the contents into the relevant PHP structures, then alters and writes back to the file. Again, this works fine at the moment, as there are few users using the system - but I believe if two users attempted to alter a single file at the same time, the person who writes to it last will overwrite the data that the previous user wrote to it.
Using SQL databases always seemed to handle queries quite slowly - even basic queries. Should I try to implement some form of server-side caching system, or possibly file write stacking system? Or should I just attempt to bump up the max_user_connections, and make it fully SQL based?
Are there limits to the number of users that can READ text files per second?
I know game / application / web developers must create optimized PHP storage solutions all the time, but what are the best practices in dealing with traffic?
It seems most hosting companies set the max_user_connections to a fairly low number to begin with - is there any way to alter this within the PHP file?
Here's the current PHP file, if you wish to view it:
https://www.dropbox.com/s/rr5ua4175w3rhw0/storage.php
And here's a forum topic showing the queries:
http://gmc.yoyogames.com/index.php?showtopic=623357
I did plan to release the PHP file, so developers could host it on their own site, but I would like to make it work as well as possible, before doing this.
Many thanks for any help provided.
Dan.
I strongly suggest you not re-invent the wheel. There are many options available for persistent storage. If you don't want to use SQL consider trying out any of the popular "NoSQL" options like MongoDB, Redis, CouchDB, etc. Many smart people have spent many hours solving the problems you are mentioning already, and they are hard at work improving and supporting their software.
Scaling a MySQL database service is outside the scope of this answer, but if you want to throttle up what your database service can handle you need to move out of a shared hosting environment in any case.
"but I believe if two users attempted to alter a single file at the same time, the person who writes to it last will overwrite the data that the previous user wrote to it."
- that is for sure. It can even throw an error if the second user tries to save while the first has the file open.
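One way around that, if the data stays in text files, is to take an exclusive flock() for the whole read-modify-write cycle so the second writer waits for the first. A minimal sketch (function and field names are made up):

```php
<?php
// Serialize all writes to one JSON file with an exclusive lock so a second
// request cannot silently overwrite the first request's changes.
function update_json_file($path, $mutate)
{
    $fp = fopen($path, 'c+');      // read/write, create if missing, no truncate
    if ($fp === false) {
        return false;
    }
    flock($fp, LOCK_EX);           // blocks until any other writer is finished

    $raw  = stream_get_contents($fp);
    $data = ($raw === '' || $raw === false) ? array() : json_decode($raw, true);

    $data = call_user_func($mutate, $data);   // apply this request's change

    ftruncate($fp, 0);
    rewind($fp);
    fwrite($fp, json_encode($data));
    fflush($fp);

    flock($fp, LOCK_UN);
    fclose($fp);
    return true;
}

// Two concurrent requests both pass through the lock, so neither update is lost.
update_json_file('users/dan.json', function ($data) {
    $data['coins'] = (isset($data['coins']) ? $data['coins'] : 0) + 10;
    return $data;
});
```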
"Are there limits to the number of users that can READ text files per second?"
- no, but it is pointless to open a file just to read it multiple times. That file should be cached (e.g. in memory or behind a content delivery network).
"I know game / application / web developers must create optimized PHP storage solutions all the time, but what are the best practices in dealing with traffic?"
- usually a database will do a better job than files, starting with the fact that the most frequent SELECTs are cached in RAM, whereas the most frequently read .txt files are not. As @oliakaoil said, read about the DB differences and see what you need.
I'm the webmaster for a major US university. We have a great deal of traffic on our website, which I've built and been in charge of for the last 7 years or so. I've been building ever-more-complex features into our website, and it's always been my practice to put as much of the programming burden as possible on our multi-processor Microsoft SQL server - using stored procedures, views, etc. - and fill in what can't be done there with PHP, ASP, or Perl on the IIS web server. Both servers are very powerful and capable machines. Since I've been doing this alone for so long without anyone else to brainstorm with, I'm curious whether my approach is ideal for the even higher load situations we'll have in the future.
My question is: Is it better practice to place more of the load burden on the SQL server using nested SELECT statements, views, stored procedures and aggregate functions, or should I be pulling multiple simpler queries and processing through them using server-side compile-time scripts like PHP? Keep on keepin' on or come up with a better way?
I've recently become more interested in performance after I did some load traces and learned just how much I've been putting on the shoulders of the SQL server. Both the web and SQL servers are fast and responsive throughout the day, almost regardless of how much I put on them, but I'd like to be ready - to have trained myself and upgraded my existing code with best practices in mind - by the time it becomes important.
Thanks for your advice and input.
You put each layer in your stack to use in the domain it fits best.
There is no use in having your database server send 1000 rows and then using PHP to filter them if a WHERE clause or GROUP BY clause would suffice. It's also not optimal to call the database just to add two integers (SELECT 5+9 works fine, but PHP can do it itself, and you save the round trip).
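A small illustration of that point (connection details, table and column names are invented):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

// Wasteful: drag every row into PHP, then filter and sum there.
$total = 0;
foreach ($pdo->query('SELECT status, amount FROM orders') as $row) {
    if ($row['status'] === 'paid') {
        $total += $row['amount'];
    }
}

// Better: let the database filter and aggregate; only one value crosses the wire.
$stmt = $pdo->prepare("SELECT SUM(amount) FROM orders WHERE status = 'paid'");
$stmt->execute();
$total = (float) $stmt->fetchColumn();
```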
You will probably want to look into scalability: which parts of your application can be divided among multiple processes? If you're still just using two layers (script & DB), there is a lot of room for scaling there. But always start with the bottleneck first.
Some examples: host static content on a CDN, use caching for your pages, read about nginx and memcached, use NoSQL (MongoDB), consider sharding, consider replication.
My opinion is that it's generally (mostly) best to favor letting the web servers do the processing. Two points:
First is scalability. Once your application gets enough usage, you'll need to start worrying about load balancing. And it's a lot easier to drop in a couple of extra web servers pointing to a common database than it is to set up a distributed database cluster. So best to take as much strain away from the Database as you can and keep it on a single machine for as long as possible.
The second point I'd like to make is about optimizing the queries. This will depend a lot on the queries you are using and the database backend. When I first started working with databases, I fell into the trap of making elaborate SQL queries with multiple JOINs that fetched exactly the data I wanted, even if it came from four or five different tables. I reasoned, "That's what the database is there for - let's get it to do the hard work."
I quickly found that these queries took way too long to execute and often ended up blocking the database from other requests. While it may seem inefficient to split your query into multiple requests (for example in a for loop), you'll often find that executing multiple small queries against fast indexes will make your application run far more smoothly than trying to pass all the hard work to the database.
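A sketch of what that "several small, indexed queries" approach can look like in practice (the schema is invented purely for illustration):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

// Instead of one wide JOIN across orders, customers, addresses, products...
// 1) grab the recent orders,
$orders = $pdo->query(
    'SELECT id, customer_id, total FROM orders ORDER BY id DESC LIMIT 50'
)->fetchAll(PDO::FETCH_ASSOC);

// 2) then look up only the customers those orders reference, by primary key
//    (guard against an empty $ids list in real code).
$ids = array();
foreach ($orders as $order) {
    $ids[$order['customer_id']] = true;   // de-duplicate as we go
}
$ids = array_keys($ids);

$placeholders = implode(',', array_fill(0, count($ids), '?'));
$stmt = $pdo->prepare("SELECT id, name FROM customers WHERE id IN ($placeholders)");
$stmt->execute($ids);

$customers = array();
foreach ($stmt as $row) {
    $customers[$row['id']] = $row['name'];
}

// 3) stitch the results together in PHP; each query is a cheap indexed lookup.
foreach ($orders as $order) {
    echo $customers[$order['customer_id']] . ': ' . $order['total'] . "\n";
}
```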
Firstly, you might want to check whether there is any load which can be removed entirely by client-side caching (.js, .css, static HTML and images) and by technologies such as AJAX that do partial updates of screens - this will remove load on both the web and SQL servers.
Secondly, see if there is SQL load which can be reduced by web-server caching - e.g. static or low-refresh data. If you have a lot of 'content' pages on your systems, have a look at common CMS caching techniques, which will scale to allow many more users to view the same data without rebuilding the page or hitting the database.
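As a hedged example of that kind of web-server caching, assuming APCu (or the older APC) is available; the query and cache key are made up:

```php
<?php
// Cache a slow, rarely-changing fragment in web-server memory so repeat
// page views don't touch the database at all.
function get_homepage_news(PDO $pdo)
{
    $news = apcu_fetch('homepage_news', $hit);
    if ($hit) {
        return $news;                      // served straight from the web server
    }

    $news = $pdo->query(
        'SELECT title, published_at FROM news ORDER BY published_at DESC LIMIT 10'
    )->fetchAll(PDO::FETCH_ASSOC);

    apcu_store('homepage_news', $news, 300);   // rebuild at most every 5 minutes
    return $news;
}
```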
I tend to do as much as possible outside the db, viewing db calls as expensive/time-intensive.
For example, when performing a select on a user table with fields name_given and name_family, I could fatten the query to return a column called full_name built by concatenation. But that kind of thing can be easily done in a model on your server-side scripting language (PHP, Ruby, etc).
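For illustration, the same full_name example both ways (column names assumed):

```php
<?php
// Option 1: the database builds the column.
//   SELECT name_given, name_family,
//          CONCAT(name_given, ' ', name_family) AS full_name FROM users;

// Option 2: keep the query thin and build it in the model.
class User
{
    public $name_given;
    public $name_family;

    public function fullName()
    {
        return trim($this->name_given . ' ' . $this->name_family);
    }
}
```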
Of course, there are cases when the db is the more "natural" place to perform an operation. But, in general, I incline more towards putting the load on the web server and optimize there with many of the techniques noted in other answers.
I'm planning a PHP website architecture. It will be a small website with few visitors and small set of data. The data is modified exclusively by a single user (administrator).
To make things easier, I don't want to bother with a real database or XML data. I think about storing all data through PHP serialization into several files. So for example if there are several categories, I will store an array containing Category class instances for each category.
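Roughly what I have in mind (the class, file name and fields are just placeholders):

```php
<?php
class Category
{
    public $id;
    public $name;

    public function __construct($id, $name)
    {
        $this->id   = $id;
        $this->name = $name;
    }
}

// Persist all categories by serializing the whole array into one file.
function save_categories(array $categories)
{
    file_put_contents('data/categories.dat', serialize($categories), LOCK_EX);
}

function load_categories()
{
    if (!file_exists('data/categories.dat')) {
        return array();
    }
    return unserialize(file_get_contents('data/categories.dat'));
}

// The administrator adds a category:
$categories   = load_categories();
$categories[] = new Category(3, 'News');
save_categories($categories);
```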
Are there any pitfalls using PHP serialization in those circumstances?
Use a database -- it is not that difficult, and any extra time spent will be repaid by what you learn from using a database.
The pitfalls I see are as Yehonatan mentioned:
1. Maintenance and adding functionality.
2. No easy way to query or look at data.
3. Very insecure -- take a look at "hackthissite.org". A lot of the beginner examples have to do with hacking sites where someone put the data hard-coded in files.
4. Serialization will work for one array, meaning one table. If you have to do anything like parent categories that have to match up with other data, it's not going to work so well.
The pitfalls come with maintenance and adding functionality.
It is a very good way to learn, but you will appreciate databases more after the lessons.
I tried using PHP serialization to store website data. For those who want to do the same thing, here's some feedback from a project started a few months ago and heavily modified since:
Pros:
It was very easy to load and save data. I don't have to write SQL queries, optimize them, etc. The code is shorter (with parametrized SQL queries, it may grow a lot).
The deployment does not require additional effort. We don't care about what is supported on the web server: if there is just PHP with no additional extensions, database servers, etc., the website will still work. SQLite is a good thing, but it is not possible to install it on some servers, and it also requires a PHP extension.
We don't have to care about updating a database server, nor about the database server to use (thus avoiding the scenario where the customer wants to migrate from Microsoft SQL Server to Oracle, etc.).
We can add more properties to the objects without having to break everything (just like we can add other columns to the database).
Cons:
Like Kerry said in his answer, there is "no easy way to query or look at data". It means that any business intelligence/statistics use cases are impossible or require a huge amount of work. By the way, some basic scenarios become extremely complicated. Let's say we store products and we want to know how many products there are. Instead of just writing select count(1) from Products, in my case it requires creating a PHP file just for that, loading all the data, then counting the number of items, sometimes adding things up manually.
Some changes required implementing data migration, which was painful and required more work than just executing an SQL query.
To conclude, I would recommend using PHP serialization for storing data of a small website modified by a single person only if all the following conditions are true:
The deployment context is unknown and there is a chance of getting a server which supports only basic PHP with no extensions,
Nobody cares about business intelligence or similar usages of the information,
There will be no changes to the requirements with large impact on the data structure.
I would say use a small database like SQLite if you don't want to go through setting up a full DB server. However, I will also say that serializing an array and storing it in a text file is pretty dang fast. I've had to serialize an array with a few thousand records (a dump from a database) and used that as a temp database when our DB server was being rebuilt for a few days.
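For reference, a minimal sketch of the SQLite route through PDO (file path and schema are made up; it needs the pdo_sqlite extension):

```php
<?php
$db = new PDO('sqlite:' . __DIR__ . '/data/site.sqlite');   // a plain file, no server
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE IF NOT EXISTS categories (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL
)');

$stmt = $db->prepare('INSERT INTO categories (name) VALUES (:name)');
$stmt->execute(array(':name' => 'News'));

foreach ($db->query('SELECT id, name FROM categories') as $row) {
    echo $row['id'] . ' ' . $row['name'] . "\n";
}
```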
My company has developed a web application using PHP + MySQL. The system can display a product's original price and discounted price to the user. If you haven't logged in, you get the original price; if you are logged in, you get the discounted price. It is pretty easy to understand.
But my company wants more features in the system: it wants to display different prices depending on the user. For example, user A is a gold partner, so he gets 50% off; user B is a silver partner and only gets 30% off. This logic was not prepared for in the original system, so I need to add some attributes to the database, at least a user type in this example. Is there any recommendation on how to migrate the current database to my new version of the database? Also, all the data should be preserved, and the server should work 24/7 (without stopping the database).
Is it possible to do so? Also, any recommendations or advice for future maintenance? Thank you.
I would recommend writing a tool to run SQL queries against your databases incrementally, much like Rails migrations.
In the system I am currently working on, we have such a tool written in Python. We name our scripts something like 000000_somename.sql, where the zeros are the revision number in our SCM (Subversion), and the tool is run as part of development/testing and finally when deploying to production.
This has the benefit of being able to go back in time in terms of database changes, much like in code (if you use a source code version control tool) too.
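Our tool happens to be Python, but as a rough PHP illustration of the same idea, a minimal runner for such numbered .sql files might look like this (file layout, table and connection details are invented):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->exec('CREATE TABLE IF NOT EXISTS schema_version (revision INT NOT NULL)');

$applied = (int) $pdo->query('SELECT COALESCE(MAX(revision), 0) FROM schema_version')
                     ->fetchColumn();

// Files are named like 000123_add_user_type.sql, so an alphabetical sort
// runs them in revision order.
foreach (glob(__DIR__ . '/migrations/*.sql') as $file) {
    $revision = (int) basename($file);    // leading digits of the file name
    if ($revision <= $applied) {
        continue;                         // already applied to this database
    }
    $pdo->exec(file_get_contents($file));
    $pdo->exec("INSERT INTO schema_version (revision) VALUES ($revision)");
    echo "applied $file\n";
}
```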
http://dev.mysql.com/doc/refman/5.1/en/alter-table.html
Here are more concrete examples of ALTER TABLE.
http://php.about.com/od/learnmysql/p/alter_table.htm
You can add the necessary columns to your table with ALTER TABLE, then set the user type for each user with UPDATE. Then deploy the new version of your app that uses the new column.
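A rough sketch of those steps, using an invented users/user_type schema that matches the question's example:

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');

// 1) Add the new column; existing rows and the old code keep working,
//    because everyone falls back to the default value.
$pdo->exec("ALTER TABLE users
            ADD COLUMN user_type VARCHAR(20) NOT NULL DEFAULT 'regular'");

// 2) Backfill the partner levels for existing users.
$pdo->exec("UPDATE users SET user_type = 'gold'   WHERE id IN (101, 102)");
$pdo->exec("UPDATE users SET user_type = 'silver' WHERE id = 203");

// 3) Only then deploy the application code that reads user_type to choose
//    the 50% / 30% discount.
```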
Did you use an ORM for the data access layer? I know Doctrine comes with a migration API which allows switching versions up and down (in case something goes wrong with the new version).
Outside of any framework or ORM considerations, a fast script will minimize the slowdown (or downtime if the process is too long).
In my opinion, I'd rather have a 30-second website access interruption with an information page than a shorter interruption with visible bugs or no display at all. If interruption time matters, it's best to do this at night or when there is less traffic.
This can all be done in one script (or at least launched by one command line). When we had to do this kind of upgrade, we included the following steps in a shell script (a rough sketch follows the list):
put the application in standby (temporary static page): you can use an .htaccess redirect or whatever is applicable to your app/server environment
svn update (or switch) to upgrade source code and assets
empty caches, clean up temp files, etc.
rebuild generated classes (Symfony-specific)
upgrade the DB structure with ALTER / CREATE TABLE queries
if needed, migrate data from the old structure to the new one: depending on what you changed in the structure, it may require fetching data before altering the DB structure, or using temp tables
if all went well, remove the temporary page. Upgrade done
if something went wrong, display a red message to the operator so they can see what happened, try to fix it, and then remove the waiting page by hand
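As mentioned above, here is a rough PHP CLI sketch of that sequence (every path, command and query is a placeholder; our real version was a shell script):

```php
<?php
$steps = array(
    'put site in standby'  => function () { return copy('maintenance.htaccess', '/var/www/app/.htaccess'); },
    'update source code'   => function () { system('svn update /var/www/app', $rc); return $rc === 0; },
    'clear caches'         => function () { system('rm -rf /var/www/app/cache/*', $rc); return $rc === 0; },
    'upgrade DB structure' => function () {
        $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
        return $pdo->exec("ALTER TABLE orders ADD COLUMN status VARCHAR(20) DEFAULT 'new'") !== false;
    },
    'remove standby page'  => function () { return unlink('/var/www/app/.htaccess'); },
);

foreach ($steps as $label => $run) {
    echo "== $label\n";                    // verbose but concise
    if ($run() === false) {                // stop at the first error and leave
        fwrite(STDERR, "FAILED: $label -- fix by hand, site stays in standby\n");
        exit(1);                           // the standby page in place
    }
}
echo "Upgrade done\n";
```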
The script should do checks at each step and stop at the first error, and it should be verbose (but concise) about what it does at every step, so you can fix the app faster if something goes wrong.
The best would be a recoverable script (error at step 2 - stop process - manual fix - recover at step 3), but I never took the time to implement it this way.
It works pretty well, but this kind of script has to be intensively tested, on an environment as close as possible to the production one.
In general we develop such scripts locally, and test them on the same platform as the production environment (just different paths and DB).
If the waiting page is not an option, you can go without it, but you need to ensure data and user-session integrity. As an example, use LOCK on tables during the upgrade/data transfer, and use exclusive locks on modified files (SVN does this, I think).
There could be other, better solutions, but this is basically what I use and it does the job for us. The major drawback is that this kind of script has to be rewritten for each major release, which pushes me to look for other ways of doing this - but which one? I would be glad if someone here had a better and simpler alternative.