Should I use Cassandra for a 100,000 user project? In MySQL 5, I have full-text search and table partitioning. I'm starting a Q&A system like SO with CodeIgniter. It's a move from vBulletin to a new system. In the old vBulletin system I had 100,000 users, with a total post count around 80,000. Over the next 3 or 4 years, I expect both users and posts to keep growing. So, should I use Cassandra instead of MySQL 5?
If I use Cassandra, I need to change from Grid-Service to Dedicated-Virtual hosting at Media Temple. Because Cassandra is not provided as part of a hosting system, I need to use a VPS or DV server solution. If I use MySQL, hosting is not a problem, but then what about performance and search speed?
By the way, what database is Stack Overflow using?
From the information you provided, I would suggest sticking with MySQL.
Just as a side-note, Facebook was using MySQL at first, and eventually moved to Cassandra only after it was storing over 7 Terabytes of inbox data, for over 100 million users.
Source: Lakshman, Malik: Cassandra - A Decentralized Structured Storage System.
Wikipedia also handles hundreds of Gigabytes of text data in MySQL.
You say 100,000 users - but how many concurrent users?
"Cassandra is not built in hosting system"
Using a hosted service on a single server suggests a very small scale operation, and you're obviously limited by your budget. There's certainly no advantage to running Cassandra on a single server node.
"In mysql 5 have full text search"
Which is not a very scalable solution - you should definitely think about using a normalized search (which I believe you'd have to do if you were migrating to Cassandra anyway).
Given that you can comfortably scale the MySQL solution to multiple databases using replication before you even think about a fully clustered solution, and you obviously don't have the budget to do your own hosting, migrating to Cassandra seems like massive overkill.
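One way to read "normalized search" is a separate lookup structure that maps each word to the posts containing it, instead of scanning post bodies at query time. The sketch below is only an illustration of that idea in plain Python; the function names and sample posts are made up, and in a real system the index would live in its own database table.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(posts):
    """Map each word to the set of post ids that contain it."""
    index = defaultdict(set)
    for post_id, body in posts.items():
        for word in tokenize(body):
            index[word].add(post_id)
    return index


def search(index, query):
    """Return the ids of posts that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample data, not taken from the question.
posts = {
    1: "How do I partition a MySQL table?",
    2: "Is Cassandra overkill for a small forum?",
    3: "MySQL full-text search on a forum",
}
index = build_index(posts)
```

With this layout a query only touches the entries for the words it contains, which is why this kind of index tends to hold up better than scanning every post as the post count grows.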
I would NOT recommend using Cassandra in your case, for the following reasons:
Cassandra needs a good understanding of the application you're building. It will be much harder to make changes and to run complex queries against data stored in Cassandra. SQL is more flexible and easier to maintain. Cassandra is good when you need to store huge amounts of data and when you know exactly how the data stored in Cassandra will be accessed and sorted.
MySQL works fine for millions of rows if proper indexes are built.
If you hit some bottlenecks in the future with MySQL, you can look at what exactly your problems are and scale those parts using Cassandra. I mean that you should be able to combine both approaches, SQL and NoSQL, in the same project.
With regard to the MySQL full-text index, I can say that it's useless: it works too poorly to be used in high-load projects. Look at sphinxsearch.com, which is a great implementation of full-text search made for SQL databases.
But if you expect your system to grow fast and serve millions of users, you should consider Cassandra from the beginning.
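To make the point about proper indexes concrete, here is a small sketch that asks the query planner how it would run the same query before and after an index is created. It uses Python's built-in sqlite3 module rather than MySQL so that it runs without a server (MySQL has its own EXPLAIN statement for the same purpose); the table and index names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO posts (user_id, body) VALUES (?, ?)",
    [(i % 1000, f"post {i}") for i in range(10000)],
)


def query_plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in rows)


sql = "SELECT body FROM posts WHERE user_id = 42"

plan_before = query_plan(sql)  # no usable index yet: the whole table is scanned
conn.execute("CREATE INDEX idx_posts_user ON posts (user_id)")
plan_after = query_plan(sql)   # the planner can now look rows up via the index

matching_rows = conn.execute(sql).fetchall()
```

Printing plan_before and plan_after shows the switch from a table scan to an index lookup; the query result itself is the same either way.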
I'm currently in the process of deciding what to use to build a piece of software. The system needs to handle complexity like:
- User Management (ex: Trainer login, Client login)
- Different dashboards (depending on user profile)
- Workout Builder (Trainer must be able to create workout programs, send them by email, and attach them to a client so the client can see the workout program in the system)
- Diet Plans (much like the above)
- Workout Library
- Booking/Calendar (Client should be able to book a trainer)
- Training Logs etc...
As you can see, there would be a lot of relations/bindings and personalization (dashboards) etc... I think you get the idea :) However, I'm a frontend developer. I do have PHP and MySQL experience (from a long time ago, though). So the question is: is it possible to build this system completely with, for example, Angular, Express, Mongo and Node, or would I have to depend on a database system like MySQL and use, for example, PHP for the system?
Thx in advance for any answers :)
In my opinion, if your hands-on experience with PHP and MySQL is good enough, you should go ahead and deploy your application with PHP and MySQL, with MongoDB as an additional database.
I understand that the MEAN stack can power your complete app, but the development time would be longer. What I have found while using MongoDB over petabytes of data is that it is great for storing complex data in a flat architecture at massive scale. But like all databases, MongoDB has certain constraints.
You should go ahead with MySQL for your usual login credentials and minor activities, and use MongoDB for storing Diet Plans and Workout Libraries, because that gives you the flexibility of a varying document structure and high availability. Over time you will find MongoDB easier to work with than MySQL.
Using the MEAN stack is great, but I now prefer a mixed architecture of MySQL, MongoDB, and Postgres. If you are going to use a framework, it will probably have ACL built in or available as an add-on, and that could help you with building user permissions and roles.
Also, if you are using MongoDB, make sure you code according to the storage engine, MMAPv1 or WiredTiger. I had to do a major recoding because of the storage engine changes. Just a heads up!
Yes, it is possible to build it on a pure JavaScript stack like MEAN: MongoDB, Angular, Express, Node.js.
Everything that MySQL does, MongoDB can do also. The only question is proper database design and performance for specific use cases.
OK, I have a small messaging site for my client. Well, it's more like a post-comment system (created in PHP). Now my client wants a system that can comment on another existing comment, plus some features like liking and tagging. Another thing is that the existing system is heavily used by my client in his company, as they use it like a Skype chat (which makes it read-write intensive). My client wants to use open source software wherever possible, so I used MySQL Community Edition.
Enough about my story... So I spent a week researching NoSQL databases and found them right for my requirements, as my client wants to keep adding features (that means adding columns and tables from time to time). These are the NoSQL database systems that caught my eye (if you can suggest other NoSQL database systems, that's OK too):
MongoDB
CouchDB
Redis
Now my question is: which of the three is good for my situation? I also read some bad things about those 3 NoSQL databases:
MongoDB is crappy on its 2.x version
CouchDB is slow (my client doesn't want slow)
Redis is memory-based, so it only writes to disk at certain intervals. If the system crashes in the middle of an interval, the data is lost
I would like some opinions about this, and any advice that can help me cope with my upcoming situation.
MongoDB is a popular solution to this, and my personal preference. The great thing about Mongo (besides being schemaless) is that you can have nested/embedded documents. So for example, you can have a comment which has an array of sub-comments which each have their own arrays of sub-comments. I don't know of any other datastore that has that feature. It's also fast.
CouchDB has some nice features, but Mongo is so similar and much better.
Redis is very different from the other two. It's used mostly as an alternative to memcached, so it's primarily used for temporary data, although it has some nice pubsub features built in. A lot of people use both MongoDB and Redis, but for different things.
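To picture the nested/embedded comments mentioned above, here is a rough sketch of what such a document could look like, using plain Python dictionaries instead of a real MongoDB connection. The field names (author, text, likes, tags, replies) and the helper functions are assumptions made up for the example, not a schema anyone has prescribed.

```python
def new_comment(author, text):
    """Build a comment document; replies are embedded in the same document."""
    return {"author": author, "text": text, "likes": 0, "tags": [], "replies": []}


def add_reply(comment, path, reply):
    """Attach reply under the comment reached by following path.

    path is a list of reply indexes, e.g. [0, 2] means
    "the third reply of the first reply". An empty path means the top comment.
    """
    target = comment
    for i in path:
        target = target["replies"][i]
    target["replies"].append(reply)


def count_comments(comment):
    """Count a comment plus all of its nested replies."""
    return 1 + sum(count_comments(r) for r in comment["replies"])


# Hypothetical thread: one top-level comment with nested replies.
thread = new_comment("alice", "Release is on Friday.")
add_reply(thread, [], new_comment("bob", "Which build?"))
add_reply(thread, [0], new_comment("alice", "Build 12."))
add_reply(thread, [], new_comment("carol", "Noted."))
thread["replies"][0]["likes"] += 1
```

The whole thread is a single document, so reading a conversation is one lookup. The trade-off is that a very deep or very busy thread makes that one document large, which is worth keeping in mind for a write-heavy chat-style system.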
I run a website that uses a database, but not intensively, on a WAMP configuration. I currently use MS Access: We have a small database, < 4MB max, that can be downloaded for easy backup and emailed to organization members for completing tasks in the MS Access software (like generating reports, etc.). However, it requires MS Office software and isn't exactly standard use with PHP.
On the other hand, our host provides MySQL, which is typical with PHP, generally more powerful, has a greater availability of software and support, but backup can be a little messier.
But, MySQL is not hosted on the local host. So, I copied the information to MySQL, and made a copy of the site using the MySQL database. I proceeded to run some benchmarks, and surprisingly, MS Access was faster, marginally.
I am not sure which is the best direction to take at this point. I'm hoping the community can give some pros and cons that I haven't thought about.
Since Access is way simpler, it's not surprising that rough benchmarking reveals it's faster. The difference comes when you have to deal with concurrent sessions and large data sets. Desktop apps are normally used by a single process at a time but in web applications concurrent queries are the norm.
That said, if you've been using Access for a while and haven't run into issues, I don't think that switching to MySQL is going to make any difference regarding performance. I'd think about other considerations:
Would you like to have Linux hosting as an option?
Are you proficient enough with MySQL to migrate the code in a reasonable timespan and with reasonable quality?
Can you replace those reports with plain HTML listings?
BTW, MySQL backups can be automated with a simple command line script, so it should not be messy at all.
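As a rough idea of what such a script could look like, the sketch below wraps the standard mysqldump client in a few lines of Python. The database name, user and output directory are placeholders, and it assumes the password comes from a MySQL option file (for example ~/.my.cnf) rather than the command line.

```python
import subprocess
from datetime import date
from pathlib import Path


def build_dump_command(database, out_dir, user, host="localhost", today=None):
    """Build the mysqldump command line for a dated backup file."""
    today = today or date.today()
    out_file = Path(out_dir) / f"{database}-{today.isoformat()}.sql"
    return [
        "mysqldump",
        f"--host={host}",
        f"--user={user}",
        "--single-transaction",  # consistent snapshot for InnoDB tables
        f"--result-file={out_file}",
        database,
    ]


def run_backup(database, out_dir, user, host="localhost"):
    """Create the output directory and run mysqldump; raises if the dump fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_dump_command(database, out_dir, user, host), check=True)


# Example call (placeholder names; needs the mysqldump client installed):
# run_backup("club", "backups", "backup_user", host="db.example.com")
```

Scheduled with cron or the Windows Task Scheduler, this produces one dated .sql file per run, which can be downloaded or emailed in much the same way as the Access file.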
One pro that MS Access is already offering you is a client interface. You've mentioned users that are "generating reports, etc.". Unless you already have an alternative in place that will do everything they need, switching to MySQL will likely be a no-win situation.
I'd stick with Access database for such a small scale project! There's no need to move onto a bigger technology for the hell of it - put it this way, if you had 4 kids, and a bus came up for sale, would you buy the bus because you can fit your 4 kids in it?
One big advantage of MySQL IMO is that PHP has built-in support for MySQL. You can use ODBC with PHP to connect to MS Access, but it's one more thing to set up and one more thing to 'break' at some point.
Could you set up MySQL on the host? Is it likely that your database would grow and become more complex in the near future?
Access is ideal for us: several accountants use it for our accounting work in the same room, not over the internet, and none of us is a programmer. The only thing to think about is the Access license fee.
MySQL is free, yes, that is great, but MySQL lacks stored queries, forms and reports, and the quick "on_click, on_doubleclick..." functions that are extremely useful and easy to handle in Access. Are there ways to solve this problem? Thank you.
I am developing a big application using PHP. Is MySQL or SQL Server the best one to use?
Neither. Use PostgreSQL. :)
Honestly though, PostgreSQL scales much better than MySQL. I don't know what you mean by "enterprise", but I figure scaling is important for a "big" web application, as you put it, and PostgreSQL does that very well. MySQL can't handle too many concurrent connections. (Though if that isn't an issue for you, go with MySQL for ease of use.)
MySQL and PHP work well together. I'd recommend that combination.
I'd much rather choose an open-source solution than rely on MS. That said, you can go with PostgreSQL as well if you need to, or if your requirements steer you toward it. We would need more details to know what you truly require.
While this is a bit subjective, I would suggest going with MySQL.
The reason I say this is that traditionally you see people go with a LAMP setup, LAMP of course being Linux + Apache + MySQL + PHP.
PHP has some great built-in functionality for dealing with MySQL databases, so it may be easier for you. You'll also have the ability to do some web-based work with phpMyAdmin, which ties a web interface to your database.
Use the one you and your team have the most experience with, in terms of both development and administration.
If you start from scratch, I would go with PostgreSQL.
Between your choices I would go for SQL Server, especially if you are working in a Windows environment.
It will depend on your application's needs. I'm not especially well researched on the differences between the various SQL engines, but as far as I know, MySQL is faster for SELECT queries (if you have a predominantly read-only type app). On the other hand, MSSQL and PostgreSQL both have better support for transactions, and perhaps also better performance if you have lots of inserts/updates happening. Also, MSSQL and PostgreSQL are said to scale better, but there are various successful applications that seem to do fine with MySQL (Facebook and Flickr as examples).
MySQL and SQL Server Express are free for production use. In my view the best advice is to try them both and decide for yourself. A lot of folks can live quite happily with a lightweight RDBMS where solutions like MySQL/Express may be appropriate.
From a purely technical point of view all of the major RDBMS vendors (Oracle, Sybase, DB2, SQL Server et al.) are significantly more capable than MySQL is currently or can reasonably be expected to be in the foreseeable future.
This does not mean you should not use MySQL for a particular job. A good analogy is continuing to use a version of Microsoft Office released years ago. For most people the old version does everything they would ever want, even though the newer version is "better" and has more features.
MySQL is certainly better to work with PHP. But MS is putting a huge effort in better supporting PHP on Windows platforms.
SQL Server is DEFINITELY the better choice for large enterprise solutions, since it has better cluster and management support. We use MySQL for cost reasons, but I would really like some easier management and cluster support.
On the other hand, it's like with computers: there are many features, and you need to check whether they suit your needs and your purse.
If you are doing a one-man-show: Step away from SQL Server. It is only suitable for enterprises. Take MySQL or PostgreSQL.
For most IT directors, a big decision is going to be which one you can get the best support for (in your area, online, or already in-house) and which one you can get the most uptime from. Ongoing costs are usually higher than deployment costs, so it's probably not worth worrying about license costs, unless you are into ia64 or better type systems, where the CPU count starts to make SQL Server look eye-wateringly expensive.
It's like deciding what computer to get: by now they are pretty much the same no matter what brand you pick. It's pretty much the same for databases; they all support most of the things that you need for lightweight web applications.
I have used MySQL for all my PHP applications so far and had no problems whatsoever. I have wanted to test out PostgreSQL several times but never got to it, though I have heard very good things about it. I never touch MS products, however, so no opinion (not that I am allergic, I'm just stingy).
I've dabbled with MySQL and personally, I find it vastly inferior to better RDBMSs like Postgres; while I admit it's come a long way and improved, even the latest version to my knowledge does not even support CHECK constraints to verify data integrity (it allows the keyword but doesn't do anything with it).
As someone who is looking at switching away from Microsoft technologies and into open source, I am appalled by the sheer number of PHP-backed applications that will only work with MySQL as the underlying database. A number of these apps are really good and would save a lot of work in development, but the fact they haven't been abstracted to be database agnostic is usually a deal-killer for me and my technical associates.
So I am curious: I understand why MySQL is so popular and why it's almost always used with PHP, but why do so many PHP-backed sites refuse to be properly developed to allow for other databases, and instead force MySQL when there are much better and more "database-like" options out there? I'm getting increasingly frustrated by these apps that I want to use but that only work with MySQL. I won't bring myself to use it, because personally I find Postgres a much better database, and because I personally feel that your database should enforce its own constraints instead of doing this only at the code level.
I realize MySQL is popular, and it's not a bad system, but I hate when I find a great application and it'll only work when the database is MySQL because the developers used MySQL-specific modules and/or syntax.
I'm sure it's the same reason there's so much ASP.NET stuff that only supports MSSQL. It's the traditional database paired with the language, just by convention. Plus, building database-independent solutions is hard, and it's one of those things that "you ain't gonna need" when so many other people follow that convention. When it's needed, it's one of those things that can be "page faulted" in.
If you need to get a php app to use another DB, the php is probably open source, perhaps you can do the work yourself.
Cross-platform support, as far as SQL is concerned, is like a duck.
You know, a duck can walk, can fly and can swim, and does all of them equally badly.
It's much better to stick to one platform and develop a well-optimized application than to try to satisfy everybody and end up satisfying nobody.
Most PHP developers develop with PHP because it's free, easy to get going, and powerful. All of the same qualities are shared with MySQL, so it's a natural fit.
That being said, many professional developers create data-abstraction layers that would allow them easy integration with other backends. But most projects don't need those types of things.
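For readers who haven't seen one, a data-abstraction layer can be as small as an interface that the application codes against, with one implementation per backend. The sketch below is a generic illustration in Python (the thread is about PHP, where PDO plays a similar role at the driver level); the class and function names are made up, and SQLite plus an in-memory dictionary stand in for "different databases".

```python
import sqlite3
from abc import ABC, abstractmethod


class UserStore(ABC):
    """The only interface the application code is allowed to depend on."""

    @abstractmethod
    def add(self, name, email):
        ...

    @abstractmethod
    def find_email(self, name):
        ...


class SqliteUserStore(UserStore):
    """Backend that keeps users in a SQL table."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS users (name TEXT PRIMARY KEY, email TEXT)"
        )

    def add(self, name, email):
        self.conn.execute(
            "INSERT INTO users (name, email) VALUES (?, ?)", (name, email)
        )

    def find_email(self, name):
        row = self.conn.execute(
            "SELECT email FROM users WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None


class MemoryUserStore(UserStore):
    """Backend that keeps users in a dictionary (stand-in for another database)."""

    def __init__(self):
        self.users = {}

    def add(self, name, email):
        self.users[name] = email

    def find_email(self, name):
        return self.users.get(name)


def register(store, name, email):
    """Application code: works with any UserStore, knows nothing about SQL."""
    store.add(name, email)
    return store.find_email(name)
```

The cost the answers here describe is visible even at this size: every backend needs its own implementation and its own testing, which is work most projects skip when nearly all of their users run one database.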
It's mostly the logical end result of the fact that almost all PHP-capable shared hosting services offer MySQL and only MySQL. The extra work to abstract the database is often deemed unnecessary when almost nobody using the application needs it.
LAMP is an extremely common development stack. Common enough that even people who don't use PHP know what LAMP stands for.
For those who don't know (all 1 of you), LAMP most commonly stands for Linux, Apache, MySQL, and PHP.
I think the key point is exactly what you said, "it's almost always used with PHP". By developing for MySQL, they're maximizing their target audience. Yes, it'd be ideal if they developed it to be able to work with multiple databases, but that can be a fair amount of extra work. Lots of these projects just grow from someone's personal project, which was probably not initially designed to be compatible with multiple engines. Once they're pretty far in, it starts to turn into a major job to rewrite the code to support multiple database systems, and there are usually other features/fixes that their users would rather have.
I also greatly prefer pgsql, but I think if you're planning to use other people's PHP applications (forums, blogs, etc), it's just a reality that you're probably going to have to run MySQL to support them.
Back in the old days there was a huge difference in ease of use. MySQL was easy to use and very fast for simple tasks. Back then it didn't provide full ACID, triggers, subselects, or procedures. On the other hand you had PostgreSQL (called Postgres back then), which was much slower and complicated to install and maintain, but provided the full power of a real RDBMS. The thing is, web apps didn't really need the full power of an RDBMS, so MySQL gained huge popularity, while PostgreSQL was used by few.
Ah, one more thing: as of PHP5, SQLite comes embedded. So I expect that pretty soon a lot of new PHP apps that don't really need a full-blown RDBMS will use SQLite rather than MySQL.
You're right, PostgreSQL has much better support for SQL and other advanced features, so there's a very good case for why PostgreSQL is superior to MySQL.
However, MySQL is so much easier to install and manage for someone who is just getting started, that it gains a lot of adoption relative to PostgreSQL. Simple tasks like configuring a login and giving it specific privileges are very confusing on a PostgreSQL server, compared to MySQL.
Also, there were a few years early on where MySQL offered native binaries for Windows but PostgreSQL did not. You could get it to work under Cygwin, but that's hardly satisfying for a real Windows developer. By the time PostgreSQL did support Windows natively, MySQL had a substantial lead in market share and name recognition.
BTW: http://www.postgresql.org/support/professional_hosting_northamerica
IMO the big problem with so many MySQL-only sites is that MySQL doesn't support half the features of a "real" database, so if you need data integrity you're pretty much screwed: you either have to write your own software instead of taking advantage of existing solutions, or compromise your application and go without any real integrity checks at the database level. You end up between a rock and a hard place.
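For anyone unsure what an integrity check at the database level looks like in practice, here is a minimal sketch of a CHECK constraint rejecting a bad row. It uses Python's built-in sqlite3 module only because it runs without a server; the accounts table and the rule that a balance cannot be negative are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts (balance) VALUES (100)")


def try_insert(balance):
    """Return True if the database accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO accounts (balance) VALUES (?)", (balance,))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert(50)   # satisfies the CHECK
rejected = try_insert(-5)   # violates the CHECK, so the row is refused
row_count = conn.execute("SELECT COUNT(*) FROM accounts").fetchone()[0]
```

The bad row never reaches the table, no matter which application or script tried to write it. That is the guarantee being asked for here, and it is lost when the database accepts the CHECK keyword without enforcing it.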
We're wanting our cake and eating it too with this question. First, we want database abstraction. Then, we want CHECK constraints in the RDBMS we choose to use behind that abstraction.
Huh? That means we'll neglect to do data checking in the PHP itself, and things will break using databases without the CHECKs. Either that or we WILL implement the checks in PHP to support an abstracted database without CHECKs, doing twice the work.
I think full database abstraction isn't worth the effort, and is mostly a solution in search of a problem.