MySQL or SQL Server better for an Enterprise application? - php

I am developing a big application using PHP. Is MySQL or SQL Server the best one to use?

Neither. Use PostgreSQL. :)
Honestly though, PostgreSQL scales much better than MySQL. I don't know what you mean by "enterprise", but I figure scaling is important for a "big" web application, as you put it, and PostgreSQL does that very well. MySQL can't handle too many concurrent connections. (Though if that isn't an issue for you, go with MySQL for ease of use.)

MySQL and PHP work well together. I'd recommend that combination.
I'd much rather choose an open-source solution than rely on MS. That said, you can go with PostgreSQL as well if you need to, or if your requirements steer you toward it. We would need more details to know what you truly require.

While this is a bit subjective, I would suggest going with MySQL.
The reason I say this is that traditionally you see people go with a LAMP setup, LAMP of course being Linux + Apache + MySQL + PHP.
PHP has some great built-in functionality for dealing with MySQL databases, so it may be easier for you. You'll also be able to do some web-based administration with phpMyAdmin, which puts a web interface on top of your database.

Use the one you and your team have the most experience with, in terms of both development and administration.
If you start from scratch, I would go with PostgreSQL.
Between your choices I would go for SQL Server, especially if you are working in Windows environment.

It will depend on your application's needs. I'm not especially well researched on the differences between the various SQL engines, but as far as I know, MySQL is faster for SELECT queries (if you have a predominantly read-only type app). On the other hand, MSSQL and PostgreSQL both have better support for transactions, and perhaps also better performance if you have lots of inserts/updates happening. Also, MSSQL and PostgreSQL are said to scale better, but there are various successful applications that seem to do fine with MySQL (Facebook and Flickr as examples).

MySQL and SQL Server Express are free for production use. In my view the best advice is to try them both and decide for yourself. A lot of folks can live quite happily with a lightweight RDBMS where solutions like MySQL/Express may be appropriate.
From a purely technical point of view all of the major RDBMS vendors (Oracle, Sybase, DB2, SQL Server et al.) are significantly more capable than MySQL is currently or can reasonably be expected to be in the foreseeable future.
This does not mean you should not use MySQL for a particular job. A good analogy is continuing to use a version of Microsoft Office released years ago. For most people the old version does everything they would ever want, even though the newer version is "better" and has more features.

MySQL is certainly better to work with PHP. But MS is putting a huge effort in better supporting PHP on Windows platforms.
SQL Server is DEFINITELY the better choice for large enterprise solutions, since it has better cluster and management support. We use MySQL for cost reasons, but I would really like easier management and cluster support.
On the other hand it's like with computers: many features you need to compare if they suit your needs - and your purse.
If you are doing a one-man-show: Step away from SQL Server. It is only suitable for enterprises. Take MySQL or PostgreSQL.

For most IT directors, a big part of the decision is which product you can get the best support for (in your area, online, or already in-house) and which one gives you the most uptime. Ongoing costs are usually higher than deployment costs, so it's probably not worth worrying about license costs, unless you are into ia64 or better type systems, where the CPU count starts to make SQL Server look eye-wateringly expensive.

It's like deciding what computer to get: by now they are pretty much the same no matter what brand you pick. It's pretty much the same for databases; they all support most of the things that you need for lightweight web applications.
I have used MySQL for all my PHP applications so far and have had no problems whatsoever. I have wanted to test out PostgreSQL several times but never got around to it, though I have heard very good things about it. I never touch MS products, however, so no opinion there (not that I am allergic, I'm just stingy).

Related

MS Access vs MySQL

I run a website that uses a database, but not intensively, on a WAMP configuration. I currently use MS Access: We have a small database, < 4MB max, that can be downloaded for easy backup and emailed to organization members for completing tasks in the MS Access software (like generating reports, etc.). However, it requires MS Office software and isn't exactly standard use with PHP.
On the other hand, our host provides MySQL, which is typical with PHP, generally more powerful, has a greater availability of software and support, but backup can be a little messier.
But, MySQL is not hosted on the local host. So, I copied the information to MySQL, and made a copy of the site using the MySQL database. I proceeded to run some benchmarks, and surprisingly, MS Access was faster, marginally.
I am not sure which is the best direction to take at this point. I'm hoping the community can give some pros and cons that I haven't thought about.
Since Access is way simpler, it's not surprising that rough benchmarking reveals it's faster. The difference comes when you have to deal with concurrent sessions and large data sets. Desktop apps are normally used by a single process at a time but in web applications concurrent queries are the norm.
That said, if you've been using Access for a while and haven't run into issues, I don't think that switching to MySQL is going to make any difference regarding performance. I'd think about other considerations:
Would you like to have Linux hosting as an option?
Are you proficient enough with MySQL as to migrate code in a reasonable timespan and with reasonable quality?
Can you replace those reports with plain HTML listings?
BTW, MySQL backups can be automated with a simple command line script, it should not be messy at all.
One pro that MS Access is already offering you is a client interface. You've mentioned users that are "generating reports, etc.". Unless you already have an alternative in place that will do everything they need, switching to MySQL will likely be a no-win situation.
I'd stick with Access database for such a small scale project! There's no need to move onto a bigger technology for the hell of it - put it this way, if you had 4 kids, and a bus came up for sale, would you buy the bus because you can fit your 4 kids in it?
One big advantage of MySQL IMO is that PHP has built in support for MySQL. You can use ODBC with PHP to connect to MS Access but it's one more thing to set up and one more thing to 'break' at some point.
Could you set up MySQL on the host? Is it likely that your database would grow and become more complex in the near future?
Access is ideal for us: several accountants use it for our accounting work in the same room, not over the internet, and none of us is a programmer. The only thing to think about is the license fee for Access.
MySQL is free, yes, and that is great, but MySQL lacks stored queries, forms and reports, and the quick "on_click, on_doubleclick..." functions that are extremely useful and easy to handle in Access. Are there ways to solve this problem? Thank you.

Php/MySQL to ASP.NET/SQL Server, Suggest if its worth the trouble

We have been using PHP/MySQL for our web application, which has been growing a lot. The database is around 4-5GB and one of the tables sometimes reaches 2GB, so things slow down whenever a query hits that table.
Should we just try to optimize, or are we using MySQL above its limit? Will switching our web app to .NET/SQL Server resolve the issues?
You're going to get a lot of very passionate responses to this.
PHP is, from a code and performance standpoint, very similar to classic ASP. ASP.NET v1 was, according to many, many benchmarks available via your favorite search engine, 3x-5x faster than classic ASP. Draw your own conclusions.
I feel that MSSQL is a superior database solution. If you're stuck with open source, at least look at Postgres. It's less popular but very powerful.
To answer your real question: performance is a function of your toolset and platform choice, but also of developer skill and project structure. I've seen far more projects that could benefit from some healthy refactoring and optimization than I have that are limited by the platform in which they are written. It is rarely worthwhile to rewrite a large application in a completely different language. Instead, I would focus on improving your existing codebase, and looking for ways to incrementally upgrade to a platform like ASP.NET.
Also keep in mind that switching will require you to move to IIS on Windows Server, and there will most likely be more cost involved. There are a lot of considerations here when thinking about a switch like this.
I say if the application calls for it, work it out.
You certainly aren't using MySQL above its limit. But you should consider benchmarking your database queries on MSSQL to see if you notice a huge improvement.
There are many factors involved here: your code base, database optimisations and changes to table structure, server spec... they all contribute independently.
Are there any particularly slow queries, or is it running slow across the application? Can any caching be implemented here? Do you have proper indexes?

Migrating technologies: PHP+MySQL -> ASP.NET MVC+MS SQL 2008

I have a completed web app in PHP 5 + MySQL. I have not yet started its conversion, but it will migrate to ASP.NET MVC + MS SQL 2008. I'm not sure how to progress for the easiest transition:
Edit the PHP DAL for SQL Server. Migrate to the new db immediately
Leave the live code alone. Create ASP.NET MVC with a MySQL DAL to use for now. Migrate to new db later
Leave the live code alone. Write the new version entirely. Transition db and code at the same time
Is there some common wisdom for which path is best to take?
Edit: addressing Dave's question:
How are you accessing the database? If you have really good separation
between your code and database and are using stored procedures it would
probably affect the answers given.
None of the ASP.NET MVC stuff has been written at all. There will have to be some changes to make the current PHP data layer work with MS SQL. I'm currently taking advantage of some PHP+MySQL stuff that doesn't exist with PHP+MSSQL. Nothing major but it will take some amount of retooling. My data layer is sufficiently separate that I hope it won't be too invasive.
Also what's your release plan? Will you be forced to release incrementally
or do you plan on just "flipping the switch" one day?
Flipping the switch -- it's just a hobby site for my family. But I'm not opposed to leaving 1 db and both code sets live for a while until I feel confident that the new one is fine.
Edit 2:
Looks like my options are limited more than I thought. You can only use PHP's native MSSQL functions for SQL 2000 and before. For 2005+ you need to install MS provided drivers. I'm on el cheapo shared hosting so I can't really ask them to install drivers for me. Looks like I unfortunately have my answer :(
The purists will suggest starting with TDD so you can have a gauge of when the migration is fairly complete by having all unit tests pass.
However, I would suggest that you start the app from scratch in ASP.NET MVC, as it's very different from a non-MVC PHP application. I'd map the data layer first and build some models, then work my way up to the controllers and the views. The data models should be fairly easy to migrate if you use the Visual Studio surface designer.
An easy way would be to use an Application generator.
There are many available like:
- Iron Speed Designer (only supports ASP.NET)
- Code Charge Studio (supports many different web scripting languages like PHP, ASP, ASP.NET, Perl, etc.)
I have tried out both, but have not been satisfied with either, as neither documents its MVC/MVP part well enough to make it easy for developers to modify the generated code.
Iron Speed Designer's MVC is better than CCS's, but ISD will prove costly because it supports only one set of technologies, while CCS supports many, and support for a new language can be added with a little help from its developers.

Why are there so many PHP sites that only provide for MySQL as a database?

I've dabbled with MySQL and personally, I find it vastly inferior to better RDBMSs like Postgres; while I admit it's come a long way and improved, even the latest version to my knowledge does not even support CHECK constraints to verify data integrity (it allows the keyword but doesn't do anything with it).
As someone who is looking at switching away from Microsoft technologies and into open source, I am appalled by the sheer number of PHP-backed applications that will only work with MySQL as the underlying database. A number of these apps are really good and would save a lot of work in development, but the fact they haven't been abstracted to be database agnostic is usually a deal-killer for me and my technical associates.
So I am curious - I understand why MySQL is so popular and why it's almost always used with PHP, but why do so many PHP-backed sites refuse to be properly developed to allow for other databases, but instead force MySQL when there are much better and more "database-like" options out there? I'm getting increasingly frustrated by these apps that I want to use, but they only work with MySQL and I can't bring myself to use it because personally I find Postgres a much better database, and because I personally feel that your database should enforce its own constraints instead of doing this only at the code level.
I realize MySQL is popular, and it's not a bad system, but I hate when I find a great application and it'll only work when the database is MySQL because the developers used MySQL-specific modules and/or syntax.
I'm sure it's the same reason there's so much ASP.NET stuff that only supports MSSQL. It's the traditional database paired with the language just by convention. Plus, using and building database-independent solutions is hard, and it's one of those things that "you ain't gonna need" when so many other people follow that convention. When it's needed, it's one of those things that can be "page faulted" in.
If you need to get a php app to use another DB, the php is probably open source, perhaps you can do the work yourself.
Cross-platform support, as long as SQL is concerned, is like a duck.
You know, a duck can walk, fly and swim, and does all of them equally badly.
It's much better to stick to one platform and develop a well-optimized application than to try to satisfy everybody and in fact satisfy nobody.
Most PHP developers develop with PHP because it's free, easy to get going, and powerful. All of the same qualities are shared with MySQL, so it's a natural fit.
That being said, many professional developers create data-abstraction layers that would allow them easy integration with other backends. But most projects don't need those types of things.
It's mostly the logical end result of the fact that almost all PHP-capable shared hosting services offer MySQL and only MySQL. The extra work to abstract the database is often deemed unnecessary when almost nobody using the application needs it.
LAMP is an extremely common development stack. Common enough that even people who don't use PHP know what LAMP stands for.
For those who don't know (all 1 of you), LAMP most commonly stands for Linux, Apache, MySQL, and PHP.
I think the key point is exactly what you said, "it's almost always used with PHP". By developing for MySQL, they're maximizing their target audience. Yes, it'd be ideal if they developed it to be able to work with multiple databases, but that can be a fair amount of extra work. Lots of these projects just grow from someone's personal project, which was probably not initially designed to be compatible with multiple engines. Once they're pretty far in, it starts to turn into a major job to rewrite the code to support multiple database systems, and there's usually other features/fixes that their users would rather have.
I also greatly prefer pgsql, but I think if you're planning to use other peoples' PHP applications (forums, blogs, etc), it's just a reality that you're probably going to have to run MySQL to support them.
Back in the old days there was a huge difference in ease of use. MySQL was easy to use and very fast for simple tasks. Back then it didn't provide full ACID, triggers, subselects, or stored procedures. On the other hand you had PostgreSQL (called Postgres back then), which was much slower and complicated to install and maintain, but provided the full power of a real RDBMS. The thing is that web apps didn't really need the full power of an RDBMS, so MySQL gained huge popularity, while PostgreSQL was used by few.
Ah, one more thing: as of PHP5 SQLite comes embedded. So I expect that pretty soon a lot of new PHP apps that don't really need full blown RDBMS will use SQLite, rather than MySQL.
You're right, PostgreSQL has much better support for SQL and other advanced features, so there's a very good case for why PostgreSQL is superior to MySQL.
However, MySQL is so much easier to install and manage for someone who is just getting started, that it gains a lot of adoption relative to PostgreSQL. Simple tasks like configuring a login and giving it specific privileges are very confusing on a PostgreSQL server, compared to MySQL.
Also, there were a few years early on where MySQL offered native binaries for Windows but PostgreSQL did not. You could get it to work under Cygwin, but that's hardly satisfying for a real Windows developer. By the time PostgreSQL did support Windows natively, MySQL had a substantial lead in market share and name recognition.
BTW: http://www.postgresql.org/support/professional_hosting_northamerica
IMO the big problem with so many MySQL-only sites is that MySQL doesn't support half the features of a "real" database, so if you need data integrity you're pretty much screwed and will have to write your own software instead of taking advantage of existing solutions, or compromise your application and don't have any real integrity checks at the database level. You end up between a rock and a hard place.
We're wanting our cake and eating it too with this question. First, we want database abstraction. Then, we want CHECK constraints in the RDBMS we choose to use behind that abstraction.
Huh? That means we'll neglect to do data checking in the PHP itself, and things will break using databases without the CHECKs. Either that or we WILL implement the checks in PHP to support an abstracted database without CHECKs, doing twice the work.
I think full database abstraction isn't worth the effort, and is mostly a solution in search of a problem.

Tactics for using PHP in a high-load site

Before you answer this I have never developed anything popular enough to attain high server loads. Treat me as (sigh) an alien that has just landed on the planet, albeit one that knows PHP and a few optimisation techniques.
I'm developing a tool in PHP that could attain quite a lot of users, if it works out right. However while I'm fully capable of developing the program I'm pretty much clueless when it comes to making something that can deal with huge traffic. So here's a few questions on it (feel free to turn this question into a resource thread as well).
Databases
At the moment I plan to use the MySQLi features in PHP5. However how should I setup the databases in relation to users and content? Do I actually need multiple databases? At the moment everything's jumbled into one database - although I've been considering spreading user data to one, actual content to another and finally core site content (template masters etc.) to another. My reasoning behind this is that sending queries to different databases will ease up the load on them as one database = 3 load sources. Also would this still be effective if they were all on the same server?
Caching
I have a template system that is used to build the pages and swap out variables. Master templates are stored in the database, and each time a template is called its cached copy (an HTML document) is used. At the moment I have two types of variable in these templates: static vars and dynamic vars. Static vars are usually things like page names or the name of the site, things that don't change often; dynamic vars are things that change on each page load.
My question on this:
Say I have comments on different articles. Which is the better solution: store the simple comment template and render comments (from a DB call) each time the page is loaded, or store a cached copy of the comments page as an HTML page and recache it each time a comment is added/edited/deleted?
Finally
Does anyone have any tips/pointers for running a high-load site on PHP? I'm pretty sure it's a workable language to use (Facebook and Yahoo! both rely on it heavily), but are there any pitfalls I should watch out for?
No two sites are alike. You really need to get a tool like jmeter and benchmark to see where your problem points will be. You can spend a lot of time guessing and improving, but you won't see real results until you measure and compare your changes.
For example, for many years, the MySQL query cache was the solution to all of our performance problems. If your site was slow, MySQL experts suggested turning the query cache on. It turns out that if you have a high write load, the cache is actually crippling. If you turned it on without testing, you'd never know.
And don't forget that you are never done scaling. A site that handles 10req/s will need changes to support 1000req/s. And if you're lucky enough to need to support 10,000req/s, your architecture will probably look completely different as well.
Databases
Don't use MySQLi -- PDO is the 'modern' OO database access layer. The most important feature to use is placeholders in your queries. It's smart enough to use server side prepares and other optimizations for you as well.
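As a rough illustration of what placeholders buy you, here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for a MySQL connection. The `users` table and the `find_user` helper are made up for the example; PDO's `prepare()`/`execute()` follow the same pattern of keeping the SQL text and the values separate.

```python
import sqlite3

def find_user(conn, name):
    # The "?" placeholder keeps the value separate from the SQL text,
    # so the driver handles quoting instead of string concatenation.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

print(find_user(conn, "alice"))
# A classic injection string is treated as plain data and matches nothing.
print(find_user(conn, "x' OR '1'='1"))
```

Because the value never becomes part of the SQL string, the second call simply finds no user with that odd name.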
You probably don't want to break your database up at this point. If you do find that one database isn't cutting it, there are several techniques to scale up, depending on your app. Replicating to additional servers typically works well if you have more reads than writes. Sharding is a technique to split your data over many machines.
Caching
You probably don't want to cache in your database. The database is typically your bottleneck, so adding more IO's to it is typically a bad thing. There are several PHP caches out there that accomplish similar things like APC and Zend.
Measure your system with caching on and off. I bet your cache is heavier than serving the pages straight.
If it takes a long time to build your comments and article data from the db, integrate memcache into your system. You can cache the query results and store them in a memcached instance. It's important to remember that retrieving the data from memcache must be faster than assembling it from the database to see any benefit.
If your articles aren't dynamic, or you have simple dynamic changes after it's generated, consider writing out html or php to the disk. You could have an index.php page that looks on disk for the article, if it's there, it streams it to the client. If it isn't, it generates the article, writes it to the disk and sends it to the client. Deleting files from the disk would cause pages to be re-written. If a comment is added to an article, delete the cached copy -- it would be regenerated.
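The write-to-disk idea above can be sketched roughly as follows. This is Python rather than PHP, and `CACHE_DIR`, `render_article`, `get_article` and `invalidate_article` are hypothetical names invented for the example; `render_article` stands in for the expensive database and template work.

```python
import os
import tempfile

CACHE_DIR = tempfile.mkdtemp()  # hypothetical cache location
stats = {"renders": 0}          # counts how often the slow path runs

def render_article(article_id):
    # Stand-in for the expensive database + template work.
    stats["renders"] += 1
    return f"<html><body>Article {article_id}</body></html>"

def cache_path(article_id):
    # int() keeps arbitrary strings out of the file name.
    return os.path.join(CACHE_DIR, f"article_{int(article_id)}.html")

def get_article(article_id):
    path = cache_path(article_id)
    if os.path.exists(path):
        # Cache hit: stream the stored copy.
        with open(path, encoding="utf-8") as f:
            return f.read()
    # Cache miss: generate, store on disk, then serve.
    html = render_article(article_id)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html

def invalidate_article(article_id):
    # Call this when a comment is added; the next request regenerates the page.
    try:
        os.remove(cache_path(article_id))
    except FileNotFoundError:
        pass
```

Calling `get_article(1)` twice renders only once; after `invalidate_article(1)` the next call renders again. A real deployment would also need to think about concurrent writers, which this sketch ignores.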
I'm a lead developer on a site with over 15M users. We have had very few scaling problems because we planned for it EARLY and scaled thoughtfully. Here are some of the strategies I can suggest from my experience.
SCHEMA
First off, denormalize your schemas. This means that rather than having multiple relational tables, you should instead opt for one big table. In general, joins are a waste of precious DB resources because doing multiple prepares and collation burns disk I/Os. Avoid them when you can.
The trade-off here is that you will be storing/pulling redundant data, but this is acceptable because data and intra-cage bandwidth is very cheap (bigger disks) whereas multiple prepare I/O's are orders of magnitude more expensive (more servers).
INDEXING
Make sure that your queries utilize at least one index. Beware though, that indexes will cost you if you write or update frequently. There are some experimental tricks to avoid this.
You can try adding additional columns that aren't indexed which run parallel to your columns that are indexed. Then you can have an offline process that writes the non-indexed columns over the indexed columns in batches. This way, you can control better when mySQL will need to recompute the index.
Avoid computed queries like the plague. If you must compute a query, try to do this once at write time.
CACHING
I highly recommend Memcached. It has been proven by the biggest players on the PHP stack (Facebook) and is very flexible. There are two ways of doing this: one is caching in your DB layer, the other is caching in your business logic layer.
The DB layer option would require caching the result of queries retrieved from the DB. You can hash your SQL query using md5() and use that as a lookup key before going to database. The upside to this is that it is pretty easy to implement. The downside (depending on implementation) is that you lose flexibility because you're treating all caching the same with regard to cache expiration.
In the shop I work in, we use business layer caching, which means each concrete class in our system controls its own caching schema and cache timeouts. This has worked pretty well for us, but be aware that items retrieved from DB may not be the same as items from cache, so you will have to update cache and DB together.
DATA SHARDING
Replication only gets you so far. Sooner than you expect, your writes will become a bottleneck. To compensate, make sure to support data sharding as early as possible. You will likely want to shoot yourself later if you don't.
It is pretty simple to implement. Basically, you want to separate the key authority from the data storage. Use a global DB to store a mapping between primary keys and cluster ids. You query this mapping to get a cluster, and then query the cluster to get the data. You can cache the hell out of this lookup operation which will make it a negligible operation.
The downside to this is that it may be difficult to piece together data from multiple shards. But, you can engineer your way around that as well.
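The key-authority lookup described above might look roughly like this. It is a Python sketch in which in-memory dicts stand in for the global mapping database and for the shard clusters; the `ShardRouter` class and its least-loaded placement policy are assumptions made for illustration.

```python
class ShardRouter:
    def __init__(self, cluster_ids):
        self.directory = {}  # global mapping: primary key -> cluster id
        self.clusters = {cid: {} for cid in cluster_ids}  # stand-in data stores
        self._lookup_cache = {}  # cached copy of directory lookups

    def _assign(self, key):
        # Hypothetical placement policy: pick the least-loaded cluster.
        cid = min(self.clusters, key=lambda c: len(self.clusters[c]))
        self.directory[key] = cid
        return cid

    def cluster_for(self, key):
        # Step 1: ask the key authority (through the cache) where the key lives.
        if key not in self._lookup_cache:
            self._lookup_cache[key] = self.directory[key]
        return self._lookup_cache[key]

    def put(self, key, row):
        cid = self.directory.get(key)
        if cid is None:
            cid = self._assign(key)
        self.clusters[cid][key] = row

    def get(self, key):
        # Step 2: query only the cluster that owns the key.
        return self.clusters[self.cluster_for(key)][key]

router = ShardRouter(["a", "b"])
router.put(1, {"name": "alice"})
router.put(2, {"name": "bob"})
print(router.cluster_for(1), router.cluster_for(2))
```

Since a key's cluster rarely changes once assigned, the lookup is a good candidate for aggressive caching, which is what `_lookup_cache` hints at.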
OFFLINE PROCESSING
Don't make the user wait for your backend if they don't have to. Build a job queue and move any processing that you can offline, doing it separate from the user's request.
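As a toy illustration of the job-queue idea, here is a single-process Python sketch using a thread and the standard-library queue module. The `handle_request` and `send_welcome_email` functions are hypothetical; in a real PHP deployment the queue would typically live outside the web process and be drained by separate workers.

```python
import queue
import threading

jobs = queue.Queue()
results = []

def send_welcome_email(user_id):
    # Stand-in for slow work the user should not have to wait for.
    return f"emailed user {user_id}"

def worker():
    while True:
        job = jobs.get()
        if job is None:  # sentinel: shut the worker down
            jobs.task_done()
            break
        func, args = job
        results.append(func(*args))
        jobs.task_done()

def handle_request(user_id):
    # Enqueue the slow part and respond right away.
    jobs.put((send_welcome_email, (user_id,)))
    return "202 Accepted"

threading.Thread(target=worker, daemon=True).start()
print(handle_request(42))
jobs.join()  # only for the demo: wait until the background job has run
print(results)
```

The request handler returns before the email work happens; `jobs.join()` is there only so the demo can show the result.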
I've worked on a few sites that get millions of hits per month, backed by PHP & MySQL. Here are some basics:
Cache, cache, cache. Caching is one of the simplest and most effective ways to reduce load on your webserver and database. Cache page content, queries, expensive computation, anything that is I/O bound. Memcache is dead simple and effective.
Use multiple servers once you are maxed out. You can have multiple web servers and multiple database servers (with replication).
Reduce overall # of request to your webservers. This entails caching JS, CSS and images using expires headers. You can also move your static content to a CDN, which will speed up your user's experience.
Measure & benchmark. Run Nagios on your production machines and load test on your dev/qa server. You need to know when your server will catch on fire so you can prevent it.
I'd recommend reading Building Scalable Websites, it was written by one of the Flickr engineers and is a great reference.
Check out my blog post about scalability too, it has a lot of links to presentations about scaling with multiple languages and platforms:
http://www.ryandoherty.net/2008/07/13/unicorns-and-scalability/
Re: PDO / MySQLi / MySQLND
@gary
You cannot just say "don't use MySQLi" as they have different goals. PDO is almost like an abstraction layer (although it is not actually one) and is designed to make it easy to use multiple database products, whereas MySQLi is specific to MySQL connections. It is wrong to say that PDO is the modern access layer in the context of comparing it to MySQLi, because your statement implies that the progression has been mysql -> mysqli -> PDO, which is not the case.
The choice between MySQLi and PDO is simple - if you need to support multiple database products then you use PDO. If you're just using MySQL then you can choose between PDO and MySQLi.
So why would you choose MySQLi over PDO? See below...
@ross
You are correct about MySQLnd which is the newest MySQL core language level library, however it is not a replacement for MySQLi. MySQLi (as with PDO) remains the way you would interact with MySQL through your PHP code. Both of these use libmysql as the C client behind the PHP code. The problem is that libmysql is outside of the core PHP engine and that is where mysqlnd comes in i.e. it is a Native Driver which makes use of the core PHP internals to maximise efficiency, specifically where memory usage is concerned.
MySQLnd is being developed by MySQL themselves and has recently landed onto the PHP 5.3 branch which is in RC testing, ready for a release later this year. You will then be able to use MySQLnd with MySQLi...but not with PDO. This will give MySQLi a performance boost in many areas (not all) and will make it the best choice for MySQL interaction if you do not need the abstraction like capabilities of PDO.
That said, MySQLnd is now available in PHP 5.3 for PDO and so you can get the advantages of the performance enhancements from ND into PDO, however, PDO is still a generic database layer and so will be unlikely to be able to benefit as much from the enhancements in ND as MySQLi can.
Some useful benchmarks can be found here although they are from 2006. You also need to be aware of things like this option.
There are a lot of considerations that need to be taken into account when deciding between MySQLi and PDO. In reality it is not going to matter until you get to ridiculously high request numbers, and in that case it makes more sense to use an extension that has been specifically designed for MySQL rather than one which abstracts things and happens to provide a MySQL driver.
It is not a simple matter of which is best because each has advantages and disadvantages. You need to read the links I've provided and come up with your own decision, then test it and find out. I have used PDO in past projects and it is a good extension but my choice for pure performance would be MySQLi with the new MySQLND option compiled (when PHP 5.3 is released).
General
Do not try to optimize before you start to see real world load. You might guess right, but if you don't, you've wasted your time.
Use jmeter, xdebug or another tool to benchmark the site.
If load starts to be an issue, either object or data caching will likely be involved, so generally read up on caching options (memcached, MySQL caching options)
Code
Profile your code so that you know where the bottleneck is, and whether it's in code or the database
Databases
Use MySQLi if portability to other databases is not vital, PDO otherwise
If benchmarks reveal the database is the issue, check the queries before you start caching. Use EXPLAIN to see where your queries are slowing down.
After the queries are optimized and the database is cached in some way, you may want to use multiple databases. Either replicating to multiple servers or sharding (splitting the data over multiple databases/servers) may be appropriate, depending on the data, the queries, and the kind of read/write behavior.
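To make the "check the queries with EXPLAIN" step concrete, here is a small Python sketch that uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in. MySQL's `EXPLAIN` output is formatted differently, and the `comments` table and index name are invented for the example, but the workflow is the same: look at the plan, add an index, and look again.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments (id INTEGER PRIMARY KEY, article_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO comments (article_id, body) VALUES (?, ?)",
    [(i % 100, "text") for i in range(1000)],
)

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable plan step.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT body FROM comments WHERE article_id = 7"
before = plan(query)  # without an index the engine scans the whole table
conn.execute("CREATE INDEX idx_comments_article ON comments (article_id)")
after = plan(query)   # with the index the plan names idx_comments_article

print(before)
print(after)
```

If the plan still shows a full scan after you add the index you expected to help, the query itself is usually what needs attention before any caching layer goes in.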
Caching
Plenty of writing has been done on caching code, objects, and data. Look up articles on APC, Zend Optimizer, memcached, QuickCache, JPCache. Do some of this before you really need to, and you'll be less concerned about starting off unoptimized.
APC and Zend Optimizer are opcode caches; they speed up PHP code by avoiding reparsing and recompilation of code. They are generally simple to install and worth doing early.
Memcached is a generic cache, that you can use to cache queries, PHP functions or objects, or entire pages. Code must be specifically written to use it, which can be an involved process if there are no central points to handle creation, update and deletion of cached objects.
QuickCache and JPCache are file caches, otherwise similar to Memcached. The basic concept is simple, but also requires code and is easier with central points of creation, update and deletion.
Miscellaneous
Consider alternative web servers for high load. Servers like lighttpd and nginx can handle large amounts of traffic in much less memory than Apache, if you can sacrifice Apache's power and flexibility (or if you just don't need those things, which often, you don't).
Remember that hardware is surprisingly cheap these days, so be sure to cost out the effort to optimize a large block of code versus "let's buy a monster server."
Consider adding the "MySQL" and "scaling" tags to this question
APC is an absolute must. Not only does it make for a great caching system, but the gain from the auto-cached PHP files is a godsend. As for the multiple database idea, I don't think you would get much out of having different databases on the same server. It may give you a bit of a gain in speed during query time, but I doubt the effort it would take to deploy and maintain the code for all three while making sure they are in sync would be worth it.
I also highly recommend running Xdebug to find bottlenecks in your program. It made optimization a breeze for me.
Firstly, as I think Knuth said, "Premature optimization is the root of all evil". If you don't have to deal with these issues right now, then don't; focus on delivering something that works correctly first. That being said, if the optimizations can't wait:
Try profiling your database queries, figure out what's slow and what happens a lot, and come up with an optimization strategy from that.
I would investigate Memcached as it's what a lot of the higher load sites use for efficiently caching content of all types, and the PHP object interface to it is quite nice.
Splitting up databases among servers and using some sort of load balancing technique (e.g. generate a random number between 1 and # redundant databases with necessary data - and use that number to determine which database server to connect to) can also be an excellent way to increase efficiency.
These have all worked out pretty well in the past for some fairly high load sites. Hope this helps to get you started :-)
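The random server selection mentioned above could look roughly like the following. This is only a sketch: the host names are placeholders, `pickReplica()` is a made-up helper name, and `random_int()` assumes PHP 7 or later (older versions would use `mt_rand()`).

```php
<?php
// Hypothetical replica list; the host names are placeholders, not real servers.
$replicas = [
    ['host' => 'db1.example.internal', 'port' => 3306],
    ['host' => 'db2.example.internal', 'port' => 3306],
    ['host' => 'db3.example.internal', 'port' => 3306],
];

/**
 * Pick one replica at random: draw a number between 1 and the number of
 * redundant databases and use it to choose the server to connect to.
 */
function pickReplica(array $replicas): array
{
    if ($replicas === []) {
        throw new InvalidArgumentException('No database replicas configured.');
    }
    $n = random_int(1, count($replicas)); // 1..N inclusive
    return $replicas[$n - 1];
}

$target = pickReplica($replicas);
// The chosen host would then be handed to whatever connect call the app uses.
echo "Connecting to {$target['host']}:{$target['port']}\n";
```

Random selection spreads reads evenly only if every server holds the data the query needs, which is the "with necessary data" condition in the answer.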
Profiling your app with something like Xdebug (like tj9991 recommended) is definitely going to be a must. It doesn't make a whole lot of sense to just go around optimizing things blindly. Xdebug will help you find the real bottlenecks in your code so you can spend your optimization time wisely and fix chunks of code that are actually causing slow downs.
If you're using Apache, another utility that can help in testing is Siege. It will help you anticipate how your server and application will react to high loads by really putting it through its paces.
Any kind of opcode cache for PHP (like APC or one of the many others) will help a lot as well.
I run a website with 7-8 million page views a month. Not terribly much, but enough that our server felt the load. The solution we chose was simple: Memcache at the database level. This solution works well if the database load is your main problem.
We started out using Memcache to cache entire objects and the database results that were most frequently used. It did work, but it also introduced bugs (we might have avoided some of those if we had been more careful).
So we changed our approach. We built a database wrapper (with the exact same methods as our old database, so it was easy to switch), and then we subclassed it to provide memcached database access methods.
Now all you have to do is decide whether a query can use cached (and possibly out of date) results or not. Most of the queries run by the users are now fetched directly from Memcache. The exceptions are updates and inserts, which for the main website only happen because of logging. This rather simple measure reduced our server load by about 80%.
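A rough sketch of that wrapper-plus-subclass idea follows. It is not the poster's code: the class and method names are invented for illustration, the `$runner` callable stands in for the real database driver, and `ArrayCache` is an in-process stand-in so the example runs without a memcached server. A Memcached-backed class implementing the same two methods would take its place in practice.

```php
<?php
// Minimal cache interface (hypothetical). get() returns null on a miss.
interface CacheStore
{
    public function get(string $key);
    public function set(string $key, $value, int $ttl): void;
}

// In-process stand-in for memcached, used only to keep the sketch runnable.
final class ArrayCache implements CacheStore
{
    private $items = [];

    public function get(string $key)
    {
        if (!isset($this->items[$key])) {
            return null;
        }
        [$value, $expiresAt] = $this->items[$key];
        if ($expiresAt < time()) {
            unset($this->items[$key]);
            return null;
        }
        return $value;
    }

    public function set(string $key, $value, int $ttl): void
    {
        $this->items[$key] = [$value, time() + $ttl];
    }
}

// Plain wrapper: $runner is a hypothetical callable that runs SQL and returns rows.
class Database
{
    protected $runner;

    public function __construct(callable $runner)
    {
        $this->runner = $runner;
    }

    public function query(string $sql, array $params = []): array
    {
        return ($this->runner)($sql, $params);
    }
}

// Subclass: same query() method, plus a variant that may return stale rows.
class CachedDatabase extends Database
{
    private $cache;

    public function __construct(callable $runner, CacheStore $cache)
    {
        parent::__construct($runner);
        $this->cache = $cache;
    }

    public function cachedQuery(string $sql, array $params = [], int $ttl = 60): array
    {
        $key = 'q:' . md5($sql . '|' . serialize($params));
        $rows = $this->cache->get($key);
        if ($rows === null) {
            $rows = $this->query($sql, $params);
            $this->cache->set($key, $rows, $ttl);
        }
        return $rows;
    }
}

// Demo with a fake runner that counts how often the "database" is hit.
$hits = 0;
$runner = function (string $sql, array $params) use (&$hits): array {
    $hits++;
    return [['id' => $params[0] ?? 0, 'title' => 'example row']];
};

$db = new CachedDatabase($runner, new ArrayCache());
$first  = $db->cachedQuery('SELECT * FROM posts WHERE id = ?', [7]);
$second = $db->cachedQuery('SELECT * FROM posts WHERE id = ?', [7]); // from cache
$db->query('UPDATE posts SET views = views + 1 WHERE id = ?', [7]);  // never cached
echo "Database hits: $hits\n";
```

The per-call choice between `query()` and `cachedQuery()` is where the "can this result be out of date?" decision lives; the sketch does no invalidation beyond the TTL.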
For what it's worth, caching is DIRT SIMPLE in PHP even without an extension/helper package like memcached.
All you need to do is create an output buffer using ob_start().
Create a global cache function. At the top of the page, look for a cached version of the page. If it exists, serve it and end. Otherwise call ob_start(), passing a callback function.
If the cached version doesn't exist, the script will continue processing. When it reaches the matching ob_end_flush(), PHP calls the callback you specified. At that point you get the contents of the output buffer, write them to a file, and end.
Add in some expiration/garbage collection.
And many people don't realize you can nest ob_start()/ob_end_flush() calls. So if you're already using an output buffer to, say, parse in advertisements or do syntax highlighting or whatever, you can just nest another ob_start()/ob_end_flush() pair.
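Here is a minimal sketch of the output-buffer cache described above, under a few assumptions: `cachedPage()` is a made-up helper, the cache file lives in the system temp directory, and expiry is a simple age check on the file. It relies on the callback passed to `ob_start()` receiving the buffered output when `ob_end_flush()` runs.

```php
<?php
// Serve $render()'s output through a file cache. $cacheFile and $ttl are
// illustrative parameters; a real page would derive the file name from the URL.
function cachedPage(string $cacheFile, int $ttl, callable $render): void
{
    // 1. A fresh cached copy exists on disk: send it and stop.
    if (is_file($cacheFile) && filemtime($cacheFile) + $ttl > time()) {
        readfile($cacheFile);
        return;
    }

    // 2. Otherwise buffer the output. The callback runs when the buffer is
    //    flushed, receives the page, stores it, and returns it unchanged.
    ob_start(function (string $html) use ($cacheFile): string {
        file_put_contents($cacheFile, $html, LOCK_EX);
        return $html;
    });

    $render();       // the normal page code, echoing as usual
    ob_end_flush();  // triggers the callback, then sends the page
}

// Demo: the second call is served from the file, so the page code runs once.
$file = sys_get_temp_dir() . '/page_cache_demo_' . getmypid() . '.html';
@unlink($file);

$renders = 0;
$page = function () use (&$renders): void {
    $renders++;
    echo '<h1>Hello</h1>';
};

cachedPage($file, 300, $page); // renders and writes the cache file
cachedPage($file, 300, $page); // served from the cache file
```

Because buffers nest, this also works inside another `ob_start()` that the page already uses. If the page code flushes the buffer part-way through, the callback would see partial output, so the sketch assumes it does not.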
Thanks for the advice on PHP's caching extensions - could you explain reasons for using one over another? I've heard great things about memcached through IRC but have never heard of APC - what are your opinions on them? I assume using multiple caching systems is pretty counter-effective.
Actually, many do use APC and memcached together...
It looks like I was wrong. MySQLi is still being developed. But according to the article, PDO_MySQL is now being contributed to by the MySQL team. From the article:
The MySQL Improved Extension - mysqli - is the flagship. It supports all features of the MySQL Server including Charsets, Prepared Statements and Stored Procedures. The driver offers a hybrid API: you can use a procedural or object-oriented programming style based on your preference. mysqli comes with PHP 5 and up. Note that the End of life for PHP 4 is 2008-08-08.
The PHP Data Objects (PDO) are a database access abstraction layer. PDO allows you to use the same API calls for various databases. PDO does not offer any degree of SQL abstraction. PDO_MYSQL is a MySQL driver for PDO. PDO_MYSQL comes with PHP 5. As of PHP 5.3 MySQL developers actively contribute to it. The PDO benefit of a unified API comes at the price that MySQL specific features, for example multiple statements, are not fully supported through the unified API.
Please stop using the first MySQL driver for PHP ever published: ext/mysql. Since the introduction of the MySQL Improved Extension - mysqli - in 2004 with PHP 5 there is no reason to still use the oldest driver around. ext/mysql does not support Charsets, Prepared Statements and Stored Procedures. It is limited to the feature set of MySQL 4.0. Note that the Extended Support for MySQL 4.0 ends at 2008-12-31. Don't limit yourself to the feature set of such old software! Upgrade to mysqli, see also Converting_to_MySQLi. mysql is in maintenance only mode from our point of view.
To me, it seems the article is biased towards MySQLi. I suppose I'm biased towards PDO.
I really prefer PDO over MySQLi. It's straightforward to me. The API is a lot closer to other languages I've programmed in. OO database interfaces seem to work better.
I haven't come across any specific MySQL features that weren't available through PDO. I would be surprised if I ever did.
PDO is also very slow and its API is pretty complicated. No one in their right mind should use it if portability is not a concern. And let's face it, in 99% of all webapps it is not. You just stick with MySQL or PostgreSQL, or whatever it is you are working with.
As for the PHP question and what to take into account. I think premature optimization is the root of all evil. ;) Get your application done first, try to keep it clean when it comes to programming, do a little documentation and write unit tests. With all of the above you will have no issues refactoring code when the time comes. But first you want to be done and push it out to see how people react to it.
Sure, PDO is nice, but there has been some controversy about its performance versus mysql and mysqli, although it seems fixed now.
You should use PDO if you envision portability, but if not, mysqli should be the way to go. It has an OO interface, prepared statements, and most of what PDO offers (except, well, portability).
Plus, if performance is really needed, prepare for the native MySQL driver (mysqlnd) in PHP 5.3, which will be much more tightly integrated with PHP, with better performance and improved memory usage (and statistics for performance tuning).
Memcache is nice if you have clustered servers (and YouTube-like load), but I'd try out APC first too.
A lot of good answers were given already, but I would like to point you to an alternate opcode cache called XCache. It is created by a lighty contributor.
Also, if you need to load balance your database servers in the future, MySQL Proxy could very well help you achieve this.
Both of those tools should plug into an existing application quite easily, so this optimization can be done when you need it, without too much hassle.
The first question is how big you really expect it to be, and how much you plan on investing in your infrastructure. Since you feel the need to ask the question here, I'm guessing that you expect to start small on a limited budget.
Performance is irrelevant if the site is not available. And for availability you need horizontal scaling. The minimum you can sensibly get away with is 2 servers, both running Apache, PHP and MySQL. Set up one DBMS as a slave to the other. Do all the writes on the master, and all the reads on the local database (whatever that is), unless for some reason you need to read back the data you've just written (in which case use the master). Make sure you've got the machinery in place to automatically promote the slave and fence the master. Use round-robin DNS for the webserver addresses to give more affinity for the slave node.
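The write-to-master, read-locally rule can be sketched as a small router. Everything here is an assumption made for illustration: the `ConnectionRouter` class, the connection labels (which stand in for real database handles), the regular expression used to spot read statements, and the choice to keep reading from the master for the rest of the request once a write has happened.

```php
<?php
// Routes statements to the master or the local replica. The connection labels
// are placeholders; a real application would hold database handles here.
final class ConnectionRouter
{
    private $master;
    private $replica;
    private $wroteThisRequest = false;

    public function __construct(string $master, string $replica)
    {
        $this->master  = $master;
        $this->replica = $replica;
    }

    public function connectionFor(string $sql): string
    {
        // Crude classification: anything not starting with these keywords
        // is treated as a write.
        $isRead = preg_match('/^\s*(SELECT|SHOW|EXPLAIN)\b/i', $sql) === 1;

        if (!$isRead) {
            $this->wroteThisRequest = true;
            return $this->master;
        }

        // After a write in this request, read from the master so that
        // replication lag cannot hand back stale rows.
        return $this->wroteThisRequest ? $this->master : $this->replica;
    }
}

$router = new ConnectionRouter('master', 'local-replica');
$a = $router->connectionFor('SELECT * FROM users WHERE id = 1');
$b = $router->connectionFor('UPDATE users SET name = "x" WHERE id = 1');
$c = $router->connectionFor('SELECT name FROM users WHERE id = 1');
echo "$a, $b, $c\n";
```

Statements such as `SELECT ... FOR UPDATE` would be misclassified by this pattern, so a real router would need a more careful rule or an explicit flag from the caller.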
Partitioning your data across different database nodes at this stage is a very bad idea - however you might want to consider splitting it across different databases on the same server (which will facilitate partitioning across nodes when you overtake facebook).
Do make sure you've got the monitoring and data analysis tools in place to measure your site's performance and identify bottlenecks. Most performance problems can be fixed by writing better SQL / fixing the database schema.
Keeping your template cache on the database is a dumb idea - the database should be a central common repository for structured data. Keep your template cache on the local filesystem of your webservers - it will be available faster and won't slow down your database access.
Do use an op-code cache.
Spend plenty of time studying your site and its logs to understand why it's going so slow.
Push as much caching as possible onto the client.
Use mod_gzip to compress everything you can.
C.
My first piece of advice is to think about this issue and keep it in mind when designing the site, but don't go overboard. It's often difficult to predict the success of a new site, and your time will be better spent getting it finished early and optimising it later.
In general, Simple is fast.
Templates slow you down. Databases slow you down. Complex libraries slow you down. Layering templates over each other retrieving them from databases and parsing it in a complex library --> the time delays multiply with each other.
Once you have the basic site up and running do tests to show you where to spend your efforts. It's difficult to see where to target. Often to speed things up you will have to unravel the complexity of the code, this makes it larger and harder to maintain, so you only want to do it where necessary.
In my experience establishing the database connection was relatively expensive. If you can get away with it, don't connect to the database for general visitors on the most trafficked pages like the front page of the site. Creating multiple database connections is madness with very little benefit.
#Gary
Don't use MySQLi -- PDO is the 'modern' OO database access layer. The most important feature to use is placeholders in your queries. It's smart enough to use server side prepares and other optimizations for you as well.
I'm looking over PDO at the moment and it looks like you're right - however I know that MySQL are developing the MySQLd extension for PHP - I think to succeed either MySQL or MySQLi - what do you think about that?
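For reference, the placeholder feature mentioned in the quoted advice looks roughly like this in PDO. The sketch uses an in-memory SQLite database purely so it runs without a MySQL server (this assumes the pdo_sqlite driver is installed); the table and column names are made up. With MySQL only the DSN and credentials passed to the constructor would differ.

```php
<?php
// SQLite in memory stands in for MySQL so the sketch runs without a server.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table for the demo.
$pdo->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');

// Named placeholders: values are bound separately from the SQL text,
// so quotes in the data need no manual escaping.
$insert = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
foreach (['alice', "o'brien"] as $name) {
    $insert->execute([':name' => $name]);
}

$select = $pdo->prepare('SELECT id, name FROM users WHERE name = :name');
$select->execute([':name' => "o'brien"]);
$row = $select->fetch(PDO::FETCH_ASSOC);

echo $row['name'], "\n";
```

A prepared statement can be executed repeatedly with different values, as the insert loop shows.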
#Ryan, Eric, tj9991
Thanks for the advice on PHP's caching extensions - could you explain reasons for using one over another? I've heard great things about memcached through IRC but have never heard of APC - what are your opinions on them? I assume using multiple caching systems is pretty counter-effective.
I will definitely be sorting out some profiling testers - thank you very much for your recommendations on those.
I don't see myself switching from MySQL anytime soon - so I guess I don't need the abstraction capabilities of PDO. Thanks for those articles DavidM, they've helped me a lot.
Look into mod_cache, an output cache for the Apache web server, similar to the output caching in ASP.NET.
Yes, I can see that it's still experimental but it will be final someday.
I can't believe no-one has already mentioned this: Modularisation and Abstraction. If you think your site is going to have to grow to lots of machines, you must design it so it can! That means stupid things like don't assume the database is on localhost. It also means things that are going to be a bother at first, like writing a database abstraction layer (like PDO, but much much lighter because it only does what you need it to do).
And it means things like working with a framework. You will need layers to your code so that you can later gain performance by refactoring the data-abstraction layer, for example, by teaching it that some objects are in a different database -- and the code doesn't have to know or care.
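As a small illustration of that idea, the sketch below keeps the "which database holds which object" decision in configuration behind one class. The `DataLayer` class, the config layout, and the host names are all hypothetical, and the method returns a description string instead of opening a connection so the example runs on its own.

```php
<?php
// Hypothetical configuration: which named database holds which kind of object.
// Host names are placeholders; nothing here assumes localhost.
$config = [
    'databases' => [
        'main'    => ['host' => 'db-main.example.internal',    'name' => 'app'],
        'archive' => ['host' => 'db-archive.example.internal', 'name' => 'app_archive'],
    ],
    'objects' => [
        'user'    => 'main',
        'message' => 'archive', // can be moved later without touching callers
    ],
];

final class DataLayer
{
    private $config;

    public function __construct(array $config)
    {
        $this->config = $config;
    }

    // Returns the connection settings for the database that stores $type.
    public function databaseFor(string $type): array
    {
        $name = $this->config['objects'][$type] ?? 'main';
        return $this->config['databases'][$name];
    }

    // Callers ask for an object by type and id; where it lives stays hidden.
    public function describeLoad(string $type, int $id): string
    {
        $db = $this->databaseFor($type);
        return "load {$type}#{$id} from {$db['host']}/{$db['name']}";
    }
}

$data = new DataLayer($config);
echo $data->describeLoad('user', 42), "\n";
echo $data->describeLoad('message', 7), "\n";
```

Moving an object type to another server then means editing the `objects` map, not the code that asks for the object.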
Finally, be careful of memory-intensive operations, for example, unnecessary string copying. If you can keep PHP's memory usage down, then you will get more performance out of your webserver and this is something that will scale when you go to a load-balanced solution.
If you are working with large amounts of data, and caching isn't cutting it, look into Sphinx. We've had great results using SphinxSearch not only for better text searching, but also as a data retrieval replacement for MySQL when dealing with larger tables. With SphinxSE (the MySQL plugin), the gains surpassed what we had achieved from caching several times over, and implementing it in the application is a cinch.
The points made about cache are spot-on; it is the least complicated and most important part of building an efficient application. I'd like to add that while memcached is great, APC is about five times faster if your application lives on a single server.
The "Cache Performance Comparison" post at the MySQL performance blog has some interesting benchmarks on the subject - http://www.mysqlperformanceblog.com/2006/08/09/cache-performance-comparison/.
