I would like to know what you think about storing chat messages in a database.
I need to be able to bind other stuff to them (like files, or contacts) and using a database is the best way I see for now.
The same question comes up for files: because they can be bound to chat messages, I would have to store them in the database too.
With thousands of messages and files I wonder about performance drops and database size.
What do you think considering I'm using PHP with MySQL/Doctrine?
I think it would be OK to store any textual information in the database (names, message history, etc.) provided that you structure your database properly. I have worked for big websites (thousands of visits a day) and telecom companies that store information about their users (including their traffic statistics) in databases that have grown to hundreds of gigabytes, and the applications worked fine.
But for binary information like images and files, it is better to store them on the file system and keep only their paths in the database, because it is cheaper to read them off the disk than to tie up a database process reading a multi-megabyte file.
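As a rough sketch of that pattern with plain PHP/PDO (the attachments table, its columns, and the upload directory are all invented for illustration; Doctrine would map the same idea to an entity with a path field):

    <?php
    // Sketch: keep the uploaded file on disk; store only its path in MySQL.
    $pdo = new PDO('mysql:host=localhost;dbname=chat', 'user', 'secret');

    $messageId = 123; // id of the chat message this file is bound to (example value)
    $uploadDir = '/var/www/uploads/';
    // Generate a server-side name so user input never decides the path.
    $storedName = uniqid('file_', true);

    if (move_uploaded_file($_FILES['attachment']['tmp_name'], $uploadDir . $storedName)) {
        $stmt = $pdo->prepare(
            'INSERT INTO attachments (message_id, original_name, path)
             VALUES (:message_id, :original_name, :path)'
        );
        $stmt->execute([
            ':message_id'    => $messageId,
            ':original_name' => $_FILES['attachment']['name'],
            ':path'          => $uploadDir . $storedName,
        ]);
    }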
As I said, it is important that you do several things:
Structure your information properly - it is very important to design your database well, dividing it into tables and tables into fields with your performance goals in mind, because this will form the basis for your application and queries. Get that wrong and your queries will be slow.
Make proper decisions on the table engine for every table (see the sketch after this list). This is an important step because it will greatly affect the performance of your queries. For example, MyISAM blocks read access to a table while it is being updated. That will be a problem for a web application like a social network or a news site, because in many situations your users will basically have to wait for an update to complete before they see a generated page.
Create proper indexes - very important for performance, especially for applications with rapidly growing big databases.
Measure the performance of your queries as the data grows and look for ways to improve it - you will always find bottlenecks that have to be removed. This is an ongoing, non-stop process that every popular web application has to go through.
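For example, here is a minimal sketch of the engine and indexing points together (table, column, and index names are invented; this assumes PDO against MySQL):

    <?php
    // Sketch: engine choice and indexing. All names are illustrative.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');

    // InnoDB locks individual rows, so readers are not blocked by writers
    // the way they are under MyISAM's whole-table locks.
    $pdo->exec('
        CREATE TABLE messages (
            id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
            user_id    INT UNSIGNED NOT NULL,
            body       TEXT NOT NULL,
            created_at DATETIME NOT NULL
        ) ENGINE=InnoDB
    ');

    // Index the columns your WHERE and ORDER BY clauses actually use.
    $pdo->exec('CREATE INDEX idx_messages_user_created ON messages (user_id, created_at)');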
I think a NoSQL database like CouchDB or MongoDB is an option. You can also store the files separately and link them via a known filename, but it depends on your system architecture.
I've recently taken over a project linking to a large MySQL DB that was originally designed many years ago and need some help.
Currently the DB has 5 tables per client that store their users' information, transaction history, logs, etc. However, we currently have ~900 clients that have applied to use our services, with an average of 5 new clients applying weekly. So the DB has grown to nearly 5000 tables and is ever increasing. Many of our clients never end up using our services, so their tables are all empty but still in the DB.
The original DB designer says it was created this way so if a table was ever compromised it would not reveal information on any other client.
As I'm redesigning the project in PHP, I'm thinking of redesigning the DB to have overall user, transaction history, log, etc. tables, using each client's unique ID to reference them.
Would this approach be correct or should the DB stay as is?
Could you see any possible security / performance concerns?
Thanks for all your help
You should redesign the system to have just five tables, with a separate column identifying which client each row pertains to. SQL handles large tables well, so you shouldn't have to worry about performance. In fact, having many, many tables can be a hindrance to performance in many cases.
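Concretely, the redesigned schema might look something like this (a sketch only; table and column names are invented):

    <?php
    // Sketch: one shared table with a client_id column instead of 5 tables
    // per client. All names are invented for illustration.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');

    $pdo->exec('
        CREATE TABLE transactions (
            id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
            client_id  INT UNSIGNED NOT NULL,
            amount     DECIMAL(10,2) NOT NULL,
            created_at DATETIME NOT NULL,
            INDEX idx_transactions_client (client_id)
        ) ENGINE=InnoDB
    ');

    // Every query is then scoped to one client by the indexed column:
    $stmt = $pdo->prepare('SELECT * FROM transactions WHERE client_id = :client_id');
    $stmt->execute([':client_id' => 42]);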
This has many advantages. You will be able to optimize the table structures for all clients at once. No more trying to add an index to 300 tables to meet some performance objective. Managing the database, managing the tables, backing things up -- all of these should be easier with a single table.
You may find that the database even gets smaller in size. This is because, on average, each of those thousands of tables has a half-filled page at the end. The database will go from thousands of half-pages to just one.
The one downside is security. It is easier to put security on tables than on rows within tables. If this is a concern, you may need to think through those requirements.
This may just be a matter of taste, but I would find it far more natural - and thus maintainable - to store this information in as few tables as possible. Also most if not all database ORMs will be expecting a structure like this, and there is no reason to reinvent that wheel.
From the perspective of security, it sounds like this project could be described as a web app. Obviously I don't know the realities of the business logic you're dealing with, but it seems like regardless of the table permissions all access to the database would be via the code base, in which case the app itself needs full permissions for all tables - nullifying any advantage of keeping the tables separated.
If there is a compelling reason for the security measures - say, different services that feed data into the DB independently of the web app - I would still explore ways to handle that authentication at the application layer instead of at the database layer. It will be much easier to handle your security rules that way. Instead of having rules set in 5000+ different places, a single security rule of "only let a user view a row of data if their user ID equals the user_id column" is far simpler, easier to understand, and therefore far more maintainable (and possibly more secure).
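A minimal sketch of that one rule living in a single place at the application layer (the function and table names are hypothetical):

    <?php
    // Sketch: the ownership rule enforced once, at the application layer.
    function fetchRowsForUser(PDO $pdo, $currentUserId)
    {
        // The single security rule: a user sees only rows whose user_id is theirs.
        $stmt = $pdo->prepare('SELECT * FROM user_data WHERE user_id = :user_id');
        $stmt->execute([':user_id' => (int) $currentUserId]);
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }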
Different people approach databases in different ways. I am a web developer, so I view databases as the place to store my data and nothing more, as it's always a dedicated and generally single-purpose DB installation, and I handle all other logic at the application level. There are people who view databases as the application itself, who make far more extensive use of built-in security features for their massive, distributed, multi-user systems - but I honestly don't know enough about those scenarios to comment on exactly where that line should be drawn.
I am creating a record system for my site which will track users and how they interact with my site's pages. This system will record button clicks, page view times, and the method used to navigate away from a page (among other things). I am considering one of two options:
create a log file and append a string to it for each action.
create a database table and save entries based on user interaction.
Although I am sure that both methods could easily fill my needs, which would be better in the long run? Other considerations:
General page viewing will never cause this data to be read (only added to it).
Old data should be archived, but still accessible.
Data will be viewed and searched via web app
As with most performance questions, the answer is 'It depends.'
I would expect it depends on the file system, media type, and operating system of your server.
I don't believe I've ever experienced performance differences INSERTing data into a large or a small MySQL database. The performance differences manifest when you retrieve that data. The database will almost always outperform queries to files, especially when you want complex or statistical data.
If you are only concerned with the speed of inserting/appending data, and expect a large amount of traffic, build a mock environment and benchmark each approach. If you want to have any amount of speed retrieving that data in a structured way, go with the database.
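For reference, here is roughly what the database option looks like on both the write and read side (table and column names are invented):

    <?php
    // Sketch: one row per interaction event. All names are invented.
    $pdo = new PDO('mysql:host=localhost;dbname=tracking', 'user', 'secret');

    // The write path -- cheap single-row INSERTs:
    $stmt = $pdo->prepare(
        'INSERT INTO interactions (user_id, page, action, occurred_at)
         VALUES (:user_id, :page, :action, NOW())'
    );
    $stmt->execute([
        ':user_id' => 42,             // example values
        ':page'    => '/products/7',
        ':action'  => 'button_click',
    ]);

    // The read path -- the structured retrieval that flat files make hard:
    $report = $pdo->query(
        "SELECT page, COUNT(*) AS clicks
         FROM interactions
         WHERE action = 'button_click'
         GROUP BY page
         ORDER BY clicks DESC"
    )->fetchAll(PDO::FETCH_ASSOC);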
If you want performance, you should inspect the server's own log instead of trying to build your own logging system...
I'm new to PHP/MySQL, and I'm coding a simple CMS. In this case, though, I will host multiple companies (each company with its own multiple users) that pay a fee to use the system.
So... my question is about how to organize the database. In terms of security, management, and performance, I just want your opinion on which of these approaches is best:
Host all companies in a single DB, where each company gets a company ID to match with its users.
Give each company a separate DB that holds its users (no company ID needed anymore).
I would start development following the first approach... But then I thought: if there is a hacker attack / SQL injection, every client would be harmed. With separate DBs, the damage would hit only one client. So maybe the second approach could be better in terms of security, but I could not say the same about management and performance.
So, based on your experience, any help or tip would be great!
Thanks in advance, and sorry about my poor English.
I would go for separate DBs, but not only because of hacking.
Scalability:
Let's say you have a server that handles 10 websites, but one of those websites is growing fast in requests, content, etc. Your server is having a hard time hosting all of them.
With separate DBs it is a piece of cake to spread them over multiple servers. With a single DB you would have to upgrade your current database server or cluster it, which is sometimes not possible with the hosting company, or very expensive.
Performance:
If they are all in one DB and data from multiple users sits in one table, locks might slow down other users.
Large tables mean large indexes, large lookups, etc. So splitting into different DBs would actually speed that up.
You do have to deal with extra memory and CPU overhead per DB, but it normally does not have an amazingly large impact.
And yes, management for multiple DBs is more work, but having proper update scripts and keeping a good eye on the versions of the DB schema will reduce your management concerns a lot.
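A sketch of such an update script, assuming a hypothetical registry table that lists the client databases:

    <?php
    // Sketch: run one schema change against every client database.
    // The registry table and credentials are hypothetical.
    $registry  = new PDO('mysql:host=localhost;dbname=registry', 'user', 'secret');
    $clientDbs = $registry->query('SELECT db_name FROM client_registry')
                          ->fetchAll(PDO::FETCH_COLUMN);

    foreach ($clientDbs as $dbName) {
        $pdo = new PDO("mysql:host=localhost;dbname=$dbName", 'user', 'secret');
        // The same migration is applied to each client schema in turn.
        $pdo->exec('ALTER TABLE users ADD COLUMN last_login DATETIME NULL');
    }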
Update: also see this article.
http://msdn.microsoft.com/en-us/library/aa479086.aspx
Separate DBs have many advantages, including performance, security, scalability, mobility, etc. There is more risk and less reward in trying to pack everything into one database, especially when you are talking about separate companies' data.
You haven't provided any details, but generally speaking, I would opt for separate databases.
Using an autonomous database for every client allows a finer degree of control, as it would be possible to manage/backup/trash/etc. them individually without affecting the others. It would also require less grooming, as data is easier to distinguish, and one database cannot break the others.
Not to mention it would make the development process easier -- note that separate databases mean that you don't have to always verify the "owner" of the rows.
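For instance, individual backups become a simple loop (a sketch; the database names, credentials, and paths are all invented):

    <?php
    // Sketch: one dump file per client database. Names/paths are invented.
    $clientDbs = ['client_acme', 'client_globex', 'client_initech'];

    foreach ($clientDbs as $dbName) {
        $outFile = escapeshellarg("/backups/{$dbName}_" . date('Ymd') . '.sql');
        $db      = escapeshellarg($dbName);
        // Each client can be dumped -- or restored -- without touching the others.
        exec("mysqldump --user=backup --password=secret $db > $outFile");
    }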
If you plan to have this database hosted in a cloud environment such as Azure, where resources are (relatively) cheap, clients run the same code base, the database schema is the same (obviously), and there is the possibility of sharing some data between the companies, then a multi-tenant database may be the way to go. For anything else, you will probably be creating a lot of extra work by going with a multi-tenant database.
Keep in mind that if you go the separate databases route, trying to migrate to a multi-tenant cloud solution later on is a HUGE task. I only mention this because all I've been hearing for the past few years around the IT water coolers is "Cloud! Cloud! Cloud!".
I've got a gaming-oriented website with 200+ users. The site has a large database tracking user plays, and one of the motivations for continued participation is the extensive statistics and rankings (S&R) with which the site provides the user.
As the list of S&Rs tracked has grown, some of the more intricate calculations have been moved to tables within the database, rather than be generated on-the-fly in order to improve page loading speed.
However, I plan to move from extensive to exhaustive S&Rs by the end of the year, increasing the overall number of datapoints available to the user by a factor of 10. I've already decided to stop doing on-the-fly queries and to move all the calculations to a cron job, but I'm unsure where to store the data.
Given a user base <1000, would it make more sense to place this data within the database or read/write a text file for each user's stats?
These are the main pros and cons in my mind:
Storing S&Rs in the Database
+ cross-user comparisons are easy and fast
+ faster cron jobs because there's no need to write to many, many files
- database table count will jump from ~50 to 200+ (at least)
- one point of failure (database corruption) for all site data
- modifying S&R structure requires modifying database as well
Storing S&Rs in Text Files
+ neatly organized and distributes data corruption risk
+ database is easier to navigate
+ redesigning the S&R structure is done by simply modifying the script and overwriting all text files, rather than adjusting database tables
- cron job will have to read/update XXX files each time
- cross-user comparisons are difficult and time-consuming
But I've never done something of this magnitude before, so I'm not really sure (for example) if a 200+ table MySQL database is even really a problem?
I'd appreciate any suggestions you can provide! :-)
Any popular database software should be able to handle millions of entries, so having 200+ tables is not an issue on that end.
Corruption is unlikely, but on a site of that nature you should be doing backups fairly frequently, and preferably storing a copy outside the server - using individual files distributes and decreases the likelihood of a general failure, but there is still a small chance of problems occurring.
Database software excels at performing tasks on its data; using flat files would probably force you to write your own method to process them, which could easily prove to be a major task, at the extra cost of a loss of speed compared to using a database (I'm just assuming this, I might be very wrong).
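For example, a cross-user comparison that is one query against a database but a hand-rolled file-crawling script with flat files (table and column names are hypothetical):

    <?php
    // Sketch: a cross-user ranking in one query. All names are hypothetical.
    $pdo = new PDO('mysql:host=localhost;dbname=gaming', 'user', 'secret');

    $top10 = $pdo->query(
        'SELECT u.username, AVG(p.score) AS avg_score
         FROM users u
         JOIN plays p ON p.user_id = u.id
         GROUP BY u.id, u.username
         ORDER BY avg_score DESC
         LIMIT 10'
    )->fetchAll(PDO::FETCH_ASSOC);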
I am currently in a debate with a coworker about the best practices concerning the database design of a PHP web application we're creating. The application is designed for businesses, and each company that signs up will have multiple users using the application.
My design methodology is to create a new database for every company that signs up. This way everything is sandboxed, modular, and small. My coworker's philosophy is to put everyone into one database. His argument is that if we have 1000+ companies sign up, we wind up with 1000+ databases to deal with. Not to mention the mess that doing Business Intelligence becomes.
For the sake of example, assume that the application is an order entry system. With separate databases, table size can remain manageable even if each company is doing 100+ orders a day. In a single-bucket application, tables can get very big very quickly.
Is there a best practice for this? I tried hunting around the web, but haven't had much success. Links, whitepapers, and presentations welcome.
Thanks in advance,
The1Rob
I talked to the database architect from wordpress.com, the hosting service for WordPress. He said that they started out with one database, hosting all customers together. The content of a single blog site really isn't that much, after all. It stands to reason that a single database is more manageable.
This worked well for them until they got hundreds of thousands of customers and realized that they needed to scale out, running multiple physical servers and hosting a subset of their customers on each server. When they add a server, it is easy to migrate individual customers to the new server, but harder to separate the data belonging to an individual customer's blog within a single database.
As customers come and go, and some customers' blogs have high-volume activity while others go stale, the rebalancing over multiple servers becomes an even more complex maintenance job. Monitoring size and activity per individual database is easier too.
Likewise, doing a database backup or restore of a single database containing terabytes of data, versus individual backups and restores of a few megabytes each, is an important factor. Consider: a customer calls and says their data got SNAFU'd due to some bad data entry, and could you please restore the data from yesterday's backup? How would you restore one customer's data if all your customers share a single database?
Eventually they decided that splitting into a separate database per customer, though complex to manage, offered them greater flexibility and they re-architected their hosting service to this model.
So, while from a data modeling perspective it seems like the right thing to do to keep everything in a single database, some database administration tasks become easier as you pass a certain breakpoint of data volume.
I would never create a new database for each company. If you want a modular design, you can achieve it using tables properly connected with primary and foreign keys. This is where I learned about database normalization, and I'm sure it will help you out here.
This is the method I would use. SQL Article
I'd have to agree with your co-worker. Relational databases are designed to handle large amounts of data, and the numbers you're talking about (1000+ companies, multiple users per company, 100+ orders/day) are well within the expected bounds. Separate databases mean:
multiple database connections in each script (memory and speed penalty)
maintenance is harder (DB systems generally do not provide tools for acting on databases as a group) so schema changes, backups, and similar tasks will be more difficult
harder to run queries on data from multiple companies
If your site becomes huge, you may eventually need to distribute your data across multiple servers. Deal with that when it happens. To start out that way for performance reasons sounds like premature optimization.
I haven't personally dealt with this situation, but I would think that if you want to do business intelligence, you should aggregate the data into an offline database that you can then run any analysis you want on.
Also, keeping them in separate databases makes it easier to partition across servers (which you will likely have to do if you have 1000+ customers) without resorting to messy replication technologies.
I had a similar question a while back and came to the conclusion that a single database is drastically more manageable. Right now, we have multiple databases (around 10) and it is already becoming a pain to manage especially when we upgrade the code. We have to migrate every single database.
The upside is that the data is segregated cleanly. Due to the sensitivity of our data, this is a good thing, but it does make it quite a bit more difficult to keep up with.
The separate-database methodology has one very big advantage over the other:
+ You can break the system up into smaller groups, so this architecture scales much better.
+ You can set up standalone servers easily.
That depends on how likely your schemas are to change. If they ever have to change, will you be able to safely make those changes to 1000 separate databases? If a scalability problem is found with your design, how are you going to fix it for 1000 databases?
We run a SaaS (Software-as-a-Service) business with a large number of customers and have elected to keep all customers in the same database. Managing thousands of separate databases is an operational nightmare.
You do have to be very diligent creating your data model and the business objects / reporting queries that access them. One approach you may want to consider is to carry the company ID in every table and ensure that every WHERE clause includes the company ID for the currently logged-in user. If you use a data access layer, you can enforce that condition there.
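A minimal sketch of that data access layer idea in PHP (the class, table, and column names are invented; only hard-coded, trusted table names should ever be passed in):

    <?php
    // Sketch: a data access layer that appends the company scope to every
    // query so callers cannot forget the WHERE clause.
    class CompanyScopedDb
    {
        private $pdo;
        private $companyId;

        public function __construct(PDO $pdo, $companyId)
        {
            $this->pdo       = $pdo;
            $this->companyId = (int) $companyId;
        }

        public function select($table, $where = '1=1', array $params = [])
        {
            // company_id is added unconditionally to the WHERE clause.
            $sql  = "SELECT * FROM $table WHERE company_id = :company_id AND ($where)";
            $stmt = $this->pdo->prepare($sql);
            $stmt->execute($params + [':company_id' => $this->companyId]);
            return $stmt->fetchAll(PDO::FETCH_ASSOC);
        }
    }

    // Usage: everything fetched through $db is scoped to company 42.
    $db     = new CompanyScopedDb(new PDO('mysql:host=localhost;dbname=saas', 'user', 'secret'), 42);
    $orders = $db->select('orders', 'created_at > :since', [':since' => '2024-01-01']);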
As you grow large, you can still partition horizontally by placing groups of companies on each physical server, e.g. the first 100 companies on Server A, the next 100 companies on Server B.
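A sketch of how the routing might look once companies are grouped onto servers (the hosts and the group size of 100 are invented for illustration):

    <?php
    // Sketch: map a company to its database server once companies are
    // partitioned in groups of 100 across machines.
    function connectionForCompany($companyId)
    {
        $shards = [
            'mysql:host=db-a.internal;dbname=saas', // companies 1-100, 201-300, ...
            'mysql:host=db-b.internal;dbname=saas', // companies 101-200, 301-400, ...
        ];
        $dsn = $shards[(int) (($companyId - 1) / 100) % count($shards)];
        return new PDO($dsn, 'app', 'secret');
    }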