I am thinking about writing a quick chat application for a client to help solve some of their communication needs. Writing a simple chat is clearly a no-brainer, but the company has serious scaling needs, so it is probably a good idea to build the service on NoSQL storage from the beginning.
Besides the obvious lack of transactions, which isn't one of our concerns, is it a good idea to use NoSQL storage for a chat?
MongoDB should be good enough if you're after scalability and performance. Most SQL engines would be overkill for this. I doubt you need complex data aggregation and other queries for chat data. Even if you do, MongoDB has map-reduce capability to help you along.
NoSQL is used when you have no fixed data model. This applies to document-oriented applications where you have to store objects and documents that may each have a different structure.
I don't think this is the case in your situation, since a chat log has a well-defined, fixed data model (for example: user, time, text). I think a traditional SQL database may be the right fit for you. If it is used on the client side only, SQLite will be the best fit, since there is nothing to install or configure; you simply redistribute the SQLite DLL. Its footprint is also very small.
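To make the "fixed data model" point concrete, here is a minimal sketch of a chat log as a single relational table. It uses Python's bundled sqlite3 module rather than PHP (PHP's PDO with the SQLite driver follows the same prepare-and-bind pattern); the table name, column names and helper functions are illustrative assumptions, not part of any existing chat application.

```python
import sqlite3
from datetime import datetime, timezone

# A chat log has a fixed shape: who said what, and when.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    username TEXT NOT NULL,
    sent_at  TEXT NOT NULL,  -- ISO-8601 timestamp in UTC
    body     TEXT NOT NULL
)
"""

def open_log(path=":memory:"):
    """Open (or create) the chat log; ':memory:' keeps it in RAM for the demo."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def post(conn, username, body):
    """Append one message, stamped with the current UTC time."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO messages (username, sent_at, body) VALUES (?, ?, ?)",
            (username, now, body),
        )

def history(conn, limit=50):
    """Return the most recent messages as (username, body), oldest first."""
    rows = conn.execute(
        "SELECT username, body FROM messages ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return list(reversed(rows))

conn = open_log()
post(conn, "alice", "hi")
post(conn, "bob", "hello")
print(history(conn))  # [('alice', 'hi'), ('bob', 'hello')]
```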
I would say no. SQLite is included in PHP... why not just use that? Or better still, why not use one of the hundreds of chat applications that already exist, and save yourself a whole load of development time.
Related
Is there currently any technology that would separate storage from business logic and allow me to easily switch from MySQL to MongoDB? (I assume I migrate the data myself, or start with an empty database after the switch.)
I would like the change to be as easy as changing the configuration, the driver, and the db connection data.
I understand that in PHP it is possible to switch between different RDBMSs with Doctrine, but I'm interested in switching from any RDBMS to a NoSQL system.
I am focusing on PHP now, but if you know of any solutions for other programming languages, I will be happy to learn about them.
I am assuming a simple database: no transactions, no complex relations.
More background/details
I am writing a simple crawler that will visit websites, read some data and save it to the DB. It is super simple, and I might go with pdo_mysql for PHP only. I am considering an extra layer only to cover the case where I want to switch from MySQL to MongoDB one day, and I am asking whether this is even possible.
Update
I think that Laravel's Eloquent supports MySQL out of the box, and supports MongoDB with an extra plugin: https://github.com/jenssegers/laravel-mongodb. I will check whether this is truly transparent from the programmer's perspective, unless someone has experience and knows right away?
A typical approach here is the DAO or Repository design pattern. In this pattern you provide a library which is responsible for persistence and returns business objects. The objects do not have persistence logic and get stashed in the Repository or retrieved from the repository.
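A minimal sketch of the Repository idea, assuming a crawler that stores pages. It is written in Python for brevity; Page, PageRepository and crawl_and_store are hypothetical names, SQLite stands in for MySQL, and a plain dict stands in for a MongoDB collection. The business function only ever talks to the repository interface, so swapping the backend is a one-line change where the repository is constructed.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional, Protocol

@dataclass
class Page:
    """Business object: knows nothing about how it is stored."""
    url: str
    title: str

class PageRepository(Protocol):
    """The only persistence API the crawler code is allowed to use."""
    def save(self, page: Page) -> None: ...
    def find(self, url: str) -> Optional[Page]: ...

class SqlPageRepository:
    """Relational backend (SQLite here, standing in for MySQL)."""
    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)")

    def save(self, page: Page) -> None:
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO pages (url, title) VALUES (?, ?)",
                (page.url, page.title),
            )

    def find(self, url: str) -> Optional[Page]:
        row = self.conn.execute(
            "SELECT url, title FROM pages WHERE url = ?", (url,)
        ).fetchone()
        return Page(*row) if row else None

class DocumentPageRepository:
    """Document-style backend (a dict of dicts, standing in for a MongoDB collection)."""
    def __init__(self):
        self.docs: dict = {}

    def save(self, page: Page) -> None:
        self.docs[page.url] = {"_id": page.url, "title": page.title}

    def find(self, url: str) -> Optional[Page]:
        doc = self.docs.get(url)
        return Page(doc["_id"], doc["title"]) if doc else None

def crawl_and_store(repo: PageRepository, url: str, title: str) -> Optional[Page]:
    """Business logic: identical whichever backend is configured."""
    repo.save(Page(url, title))
    return repo.find(url)

for repo in (SqlPageRepository(sqlite3.connect(":memory:")), DocumentPageRepository()):
    assert crawl_and_store(repo, "https://example.com", "Example") == Page("https://example.com", "Example")
```

Note what the sketch does not solve: ad-hoc reporting queries still have to be written separately for each backend, which is the caveat raised in the following paragraph.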
That being said, this only works for business objects. The problems will come up when you want to do reporting or the like. Here the differences between an RDBMS and a NoSQL solution will bite you hard.
To be sure, I don't know what your use case is, given that "what do you want to do with your data?" is a massive concern when selecting between these.
I'm starting to build a system in the Yii Framework for working with native languages, tags and similar data.
I have already chosen MongoDB for storing my data, as I think it fits nicely and will give better performance at lower cost (the database will hold huge amounts of data).
My question concerns user authentication, payments, etc. These are sensitive bits of information and areas where I think the data is relational.
So:
1. Would you use two different DB systems? Do I need both, or am I overcomplicating this?
2. If you recommend the two-DB approach, how would I achieve that in Yii?
Thanks for your time!
PS: I do not intend this question to become another endless discussion between the relational and non-relational folks. That said, I think my data fits Mongo, but if you have something to say about that, go ahead ;)
You might be interested in this presentation on OpenSky's infrastructure, where MongoDB is used alongside MySQL. Mongo was utilized mainly for CMS-type data where a flexible schema was useful, and they relied upon MySQL for transactions (e.g. customer orders, payments). If you end up using the Doctrine library, you'll find that the ORM (for SQL databases) and MongoDB ODM share a similar API, which should make the experimentation process easier.
I wouldn't shy away from using MongoDB to store user data, though, as that's often a record that can benefit from embedded document storage (e.g. storing multiple billing/shipping addresses within a single user document). If anything, Mongo should be flexible enough to enable you to develop your application without worrying about schema changes due to evolving product requirements. As those requirements become more clear, you'll be able to make a decision based on the app's performance needs and types of database queries you end up needing.
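The embedded-document argument can be illustrated side by side. In this Python sketch (names and data are made up), the document version keeps a user's billing and shipping addresses inside the user record, so one lookup returns everything, while the relational version needs a second table and a JOIN to produce the same answer. A plain dict stands in for a MongoDB document; no MongoDB driver is used.

```python
import sqlite3

# Document-style: the user and all of their addresses live in one record,
# so a single lookup returns everything.
user_doc = {
    "_id": 1,
    "name": "Ada",
    "addresses": [
        {"kind": "billing", "city": "London"},
        {"kind": "shipping", "city": "Paris"},
    ],
}

def cities_from_document(doc):
    return [a["city"] for a in doc["addresses"]]

# Relational equivalent: addresses need their own table and a JOIN to read back.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE addresses (user_id INTEGER REFERENCES users(id), kind TEXT, city TEXT);
INSERT INTO users VALUES (1, 'Ada');
INSERT INTO addresses VALUES (1, 'billing', 'London'), (1, 'shipping', 'Paris');
""")

def cities_from_tables(conn, user_id):
    rows = conn.execute(
        "SELECT a.city FROM users u JOIN addresses a ON a.user_id = u.id "
        "WHERE u.id = ? ORDER BY a.rowid",
        (user_id,),
    ).fetchall()
    return [city for (city,) in rows]

print(cities_from_document(user_doc))  # ['London', 'Paris']
print(cities_from_tables(conn, 1))     # ['London', 'Paris']
```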
There is no harm in using multiple databases (if you really need them); many big websites use multiple databases, so go ahead and start your project.
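If you do split the data, the idea is simply to route each kind of data to the store that suits it. The following is not Yii code; it is a language-neutral sketch in Python, assuming SQLite as a stand-in for MySQL and a dict as a stand-in for a MongoDB collection, with hypothetical pay and tag_word functions. It shows why payments are usually kept in a transactional store: the two balance updates either both happen or are both rolled back.

```python
import sqlite3

# Transactional data (payments) lives in a relational store; SQLite stands in for MySQL.
sql = sqlite3.connect(":memory:")
sql.execute(
    "CREATE TABLE accounts (owner TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
sql.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("shop", 0)])
sql.commit()

# Flexible, schema-less data (words and tags) lives in a document store;
# a dict of dicts stands in for a MongoDB collection.
documents = {}

def pay(sender, receiver, amount):
    """Move money atomically: both updates are applied, or neither is."""
    try:
        with sql:  # commits on success, rolls back if an exception escapes
            sql.execute("UPDATE accounts SET balance = balance + ? WHERE owner = ?", (amount, receiver))
            sql.execute("UPDATE accounts SET balance = balance - ? WHERE owner = ?", (amount, sender))
        return True
    except sqlite3.IntegrityError:  # CHECK failed: the sender would go negative
        return False

def tag_word(word, *tags):
    """Documents can grow new fields or tags without a schema migration."""
    doc = documents.setdefault(word, {"_id": word, "tags": []})
    doc["tags"].extend(tags)
    return doc

def balances():
    return dict(sql.execute("SELECT owner, balance FROM accounts"))

print(pay("alice", "shop", 30))   # True
print(pay("alice", "shop", 500))  # False (the credit to "shop" is rolled back too)
print(balances())                 # {'alice': 70, 'shop': 30}
tag_word("casa", "noun", "pt")
```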
Consider building a high-traffic PHP website with many concurrent users. Which MySQL abstraction (ORM or OODBMS) is the most efficient for this workload (15-20 database tables holding about 100,000 rows in total, with JOIN queries across no more than 4 tables)?
I've heard that the Doctrine libraries are appropriate, or should I use a framework like Zend? Which of these database solutions are built on top of PDO and don't require much learning (at the moment I'm using plain PHP)?
Regardless of the DB solution, you should look at using a system like Memcached. With the proper caching strategy you will significantly reduce the load your database puts on your server.
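The usual way to put Memcached in front of a database is the cache-aside pattern: look in the cache first, and only on a miss query the database and store the result with an expiry time. The sketch below is in Python and uses a tiny in-process TTLCache class as a stand-in for a real Memcached client (it has get/set methods but is not the Memcached API); the products table and key format are assumptions.

```python
import sqlite3
import time

class TTLCache:
    """In-process stand-in for memcached: key -> (value, expiry time)."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] < time.monotonic():
            return None  # miss or expired
        return entry[0]

    def set(self, key, value, ttl):
        self._data[key] = (value, time.monotonic() + ttl)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO products VALUES (1, 'widget')")
cache = TTLCache()
db_hits = 0  # counts how often we actually reach the database

def get_product_name(product_id, ttl=60):
    """Cache-aside: try the cache first, fall back to the DB and populate the cache."""
    global db_hits
    key = f"product:{product_id}"
    name = cache.get(key)
    if name is None:
        db_hits += 1
        row = db.execute("SELECT name FROM products WHERE id = ?", (product_id,)).fetchone()
        name = row[0] if row else None
        if name is not None:
            cache.set(key, name, ttl)
    return name

for _ in range(100):
    get_product_name(1)
print(db_hits)  # 1
```

With 100 lookups of the same product, only the first reaches the database. Misses for nonexistent rows are deliberately not cached here; a real deployment would also need to invalidate or overwrite the key when the row changes.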
There is a PHP API for memcached here
An ORM or any data-modeling layer will never get you better performance. Their sole purpose is to make development faster and the code easier to maintain. They are notoriously bad at deciding how to use relationships appropriately and end up querying all tables in order to find the correct data. At that level of query complexity you are not going to be able to abstract away these relationships without sacrificing performance.
MySQL is fine for up to a couple of million records at least (I've used it with over 100 million in a single table). For performance's sake you generally want at least a master/slave setup and some method of distributing reads between them. The database will almost always be the limiting factor in performance. You can always add more web servers and put a load balancer in front of them to solve the other side of things, but the database setup is always a little harder to maintain.
You have to think about why you want to use an ORM. If it's for development reasons, that's fine, but be cognizant that your performance will suffer. Otherwise, stick to queries. An ORM adds a third layer of code to deal with and learn. If you know PHP and MySQL, do you need to learn a third language to use them effectively? Most often the answer is no.
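Distributing reads between master and replicas is often done by a small routing layer. The following Python sketch assumes hypothetical host names and decides by the first SQL keyword only: writes go to the master, reads are spread round-robin over the replicas.

```python
import itertools

class ReadWriteRouter:
    """Sends writes to the master and spreads reads over the replicas (round robin)."""
    WRITE_VERBS = frozenset({"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"})

    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def target(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS or self._replicas is None:
            return self.master
        return next(self._replicas)

router = ReadWriteRouter("db-master", ["db-replica-1", "db-replica-2"])
queries = ["SELECT * FROM users", "SELECT * FROM posts", "INSERT INTO posts VALUES (1)", "SELECT 1"]
print([router.target(q) for q in queries])
# ['db-replica-1', 'db-replica-2', 'db-master', 'db-replica-1']
```

Two caveats apply to this simplification: statements such as SELECT ... FOR UPDATE must also go to the master, and because replication is usually asynchronous, a read sent to a replica right after a write may not see that write yet.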
You have many options to choose from but be aware that at some point the framework/ORM you choose will not behave the way you want it to and to get it to behave to your desires you will have to do a lot of searching and digging through code. It's the classic problem - save time up front and pay for it later or spend time up front with no possible payoff later.
ORM solutions will be able to optimize some aspects, if you cache query data and use the object API in a planned and deliberate way.
Column and document databases (NoSQL stores such as HBase and MongoDB) will improve performance if you have lots (millions or more) of records that are still growing.
Memcached will help if you have a lot of spare memory and especially if there are a lot of repetitious queries being run.
I am building a site that requires a lot of MySQL inserts and lookups from different tables in a (hopefully) secure part of the site. I want to use an abstraction layer for the whole process. Should I use a PHP framework (like Zend or CakePHP) for this, or just use a simple library (like Crystal or Doctrine)?
I would also like to make sure that the DB inserts are done in a relatively secure part of the site (though not over SSL). Currently I am using the method outlined here (MD5 hashing with a random challenge string), but maybe some of the frameworks come with similar functionality that would simplify the process?
What I'm trying to implement: a table of form fields filled with DB values. If you change a value or add a new row, pressing "save" updates or inserts the corresponding DB rows. I'm sure this has been done before, so I don't want to reinvent the wheel.
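For reference, a challenge-response check of this kind can be sketched as follows. This is a swapped-in technique, not the method linked above: it uses HMAC-SHA256 from Python's standard library instead of plain MD5, and SHARED_KEY is a hypothetical secret known to both sides. The server issues a random single-use challenge, the client returns an HMAC of it, and the server recomputes and compares in constant time.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared secret known to client and server; it is never sent over the wire.
SHARED_KEY = b"example-shared-secret"

def issue_challenge():
    """Server: a fresh, unpredictable challenge string (32 hex characters)."""
    return secrets.token_hex(16)

def respond(key, challenge):
    """Client: prove knowledge of the key without sending it."""
    return hmac.new(key, challenge.encode(), hashlib.sha256).hexdigest()

def verify(key, challenge, response):
    """Server: recompute the expected response and compare in constant time."""
    expected = respond(key, challenge)
    return hmac.compare_digest(expected, response)

challenge = issue_challenge()
print(verify(SHARED_KEY, challenge, respond(SHARED_KEY, challenge)))  # True
print(verify(SHARED_KEY, challenge, respond(b"wrong-key", challenge)))  # False
```

Without SSL/TLS this only proves that the client knows the key; it does not stop someone on the network from reading or altering the form data itself, and each challenge must be discarded after one use to prevent replay.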
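The save step described here boils down to: rows that already carry an id become UPDATEs, rows without one become INSERTs, and the whole batch runs in one transaction. A minimal Python/SQLite sketch, assuming a hypothetical items table and form data that arrives as strings:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL, qty INTEGER NOT NULL)")
conn.execute("INSERT INTO items (name, qty) VALUES ('apple', 3)")
conn.commit()

def save_rows(conn, rows):
    """Rows with an 'id' are updates; rows without one are inserts. One transaction."""
    with conn:  # commits if every statement succeeds, otherwise rolls back
        for row in rows:
            if row.get("id"):
                conn.execute(
                    "UPDATE items SET name = ?, qty = ? WHERE id = ?",
                    (row["name"], int(row["qty"]), int(row["id"])),
                )
            else:
                conn.execute(
                    "INSERT INTO items (name, qty) VALUES (?, ?)",
                    (row["name"], int(row["qty"])),
                )

# What the submitted form might look like after editing row 1 and adding a new row.
submitted = [
    {"id": "1", "name": "apple", "qty": "5"},
    {"id": "", "name": "pear", "qty": "2"},
]
save_rows(conn, submitted)
print(conn.execute("SELECT id, name, qty FROM items ORDER BY id").fetchall())
# [(1, 'apple', 5), (2, 'pear', 2)]
```

Every value is passed as a bound parameter rather than spliced into the SQL. A real handler would also check that the current user is allowed to edit each submitted id.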
Most PHP backends have secure access to a private database. Normally there's little difficulty in keeping the database secure, mostly by not making it directly reachable. That way the security of access depends on nobody being able to tamper with the PHP code, rather than on any software security scheme.
I would recommend the Symfony framework for this. There is a great online tutorial at Practical Symfony. The framework's Form class handles most of the security for you. It also has a nice login plugin to make the application secure.
Unless by data abstraction you mean an implementation of a data access pattern like Active Record or Table Data Gateway, or something ORM-ish (in either case you should update your question accordingly), you don't need a framework, because PHP has a DB abstraction layer in PDO.
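For contrast with a full ORM, this is roughly what a Table Data Gateway looks like on top of a plain database API. It is a Python sketch over sqlite3 (Python's DB-API plays roughly the role PDO plays in PHP); the users table and the UserGateway class are hypothetical. All SQL for one table lives in one class, and rows come back as plain tuples rather than objects with behavior.

```python
import sqlite3

class UserGateway:
    """Table Data Gateway: one object per table; all SQL for that table lives here."""
    def __init__(self, conn):
        self.conn = conn

    def insert(self, email, name):
        with self.conn:
            cur = self.conn.execute(
                "INSERT INTO users (email, name) VALUES (?, ?)", (email, name)
            )
        return cur.lastrowid

    def find_by_email(self, email):
        return self.conn.execute(
            "SELECT id, email, name FROM users WHERE email = ?", (email,)
        ).fetchone()

    def rename(self, user_id, name):
        with self.conn:
            self.conn.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE, name TEXT)")
users = UserGateway(conn)
uid = users.insert("ann@example.com", "Ann")
users.rename(uid, "Annie")
print(users.find_by_email("ann@example.com"))  # (1, 'ann@example.com', 'Annie')
```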
It sounds like you are really asking two different questions. One being should I use a framework (Zend, Symfony, Cake, etc) for the development of a website? The other being whether or not to use something along the lines of an ORM (Doctrine, Propel, etc)?
The answer to the first one is a resounding "yes". Frameworks are designed to keep you from having to reinvent the wheel for common/basic functionality. The time you spend learning how to (correctly) use a framework will pay off greatly in the long run. You'll eventually be much more productive than if you "roll your own". Not to mention you'll gain a community of people who have likely been through similar situations and overcome issues similar to those you will face (that in and of itself could be the best reason to use a framework). I'm not going to suggest a particular framework, since they all have strengths and weaknesses and that is a topic in and of itself (however, I do use and prefer Zend Framework, but don't let that influence your decision).
Whether or not to use an ORM is a slightly more difficult question. I've recently begun to work with them more, and in general I would recommend them, but it all boils down to using the right tool for the job. They solve some specific problems very well, others not so much. However, since you specifically mention security, I'll quickly address that. I don't think an ORM inherently "increases security", but it can force you into making better decisions. That said, bad code and bad coding practices will result in security issues no matter what technology/framework you are using.
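One concrete way an ORM (or any layer built on prepared statements) steers you toward better decisions is that values are bound as parameters instead of being pasted into the SQL string. This Python/SQLite sketch with a made-up users table shows the difference for a hostile input; PDO prepared statements give the same protection without an ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 0), ("root", 1)])

hostile = "nobody' OR '1'='1"

# Bad: the input becomes part of the SQL text, so the OR clause matches every row.
unsafe = conn.execute(f"SELECT name FROM users WHERE name = '{hostile}'").fetchall()

# Good: the driver sends the input as a bound value, never as SQL.
safe = conn.execute("SELECT name FROM users WHERE name = ?", (hostile,)).fetchall()

print(len(unsafe), len(safe))  # 2 0
```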
Hope that helps!
I know there already are a lot of posts floating on the web regarding this topic.
However, many people tend to focus on different things when talking about it. My main goal is to create a scalable web application that is easy to maintain. Speed of development and maintenance matters far more TO ME than raw performance (otherwise I could have used Java instead).
This is because I have noticed that when a project grows in code size, you must have maintainable code. When I first wrote my application procedurally and without any framework, it became a nightmare after only one month. I was totally lost in a jungle of spaghetti code. I didn't have any structure at all, even though I fought hard to impose one.
Then I realized that I have to have structure and code the right way. I started to use CodeIgniter. That really gave me structure and maintainable code. A lot of users say that frameworks slow things down, but I think they missed the big picture. The code must be maintainable and easy to understand.
Framework + OOP + MVC made my web application so structured that adding features was no longer a problem.
When I create a model, I tend to think of it as representing a data object: maybe a form, or even a table/database. So I thought about an ORM (Doctrine). Maybe this would be yet another great addition to my web application, giving it more structure so I could focus on the features and not on repeating myself.
However, I have never used any ORM before and I have only learned the basics of it, why it's good to use and so on.
So now I'm asking all of you who, like me, strive for maintainable code and know how important it is: is an ORM (Doctrine) a must-have for maintainable code, just like framework + MVC + OOP?
I want advice based on real-life experience rather than "raw SQL is faster" advice, because if I only cared about raw performance, I would have dropped framework + MVC + OOP in the first place and kept living in a coding nightmare.
It feels like it fits so well into an MVC framework where the models are the tables.
Right now I've got about 150 SQL queries in one file doing simple things like getting an entry by id, getting an entry by name, getting an entry by email, getting an entry by X, and so on. I thought an ORM could reduce these lines; otherwise I'm pretty sure this will grow to 1,000 SQL lines in the future. And if I change one column, I have to change all of them! What a nightmare, just thinking about it. Maybe an ORM could also give me nice models that fit the MVC pattern.
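Much of that repetition can collapse into one generic finder. The following Python sketch is a hypothetical EntryModel, not Doctrine code (Doctrine repositories offer comparable finders such as findBy and findOneBy). Column names cannot be bound as parameters, so they are checked against a whitelist before being placed in the SQL; renaming a column then means editing one tuple instead of dozens of queries.

```python
import sqlite3

class EntryModel:
    """One generic finder instead of get_by_id / get_by_name / get_by_email / ..."""
    TABLE = "entries"
    COLUMNS = ("id", "name", "email")  # the only place column names are listed

    def __init__(self, conn):
        self.conn = conn

    def find_by(self, column, value):
        # Identifiers can't be bound like values, so whitelist them instead.
        if column not in self.COLUMNS:
            raise ValueError(f"unknown column: {column!r}")
        sql = f"SELECT {', '.join(self.COLUMNS)} FROM {self.TABLE} WHERE {column} = ?"
        row = self.conn.execute(sql, (value,)).fetchone()
        return dict(zip(self.COLUMNS, row)) if row else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO entries (name, email) VALUES ('ann', 'ann@example.com')")
model = EntryModel(conn)
print(model.find_by("email", "ann@example.com"))  # {'id': 1, 'name': 'ann', 'email': 'ann@example.com'}
print(model.find_by("name", "nobody"))            # None
```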
Is ORM the right way to go for structure and maintainable code?
Ajsie,
My vote is for an ORM. I use NHibernate. It's not perfect and there is a sizable learning curve, but the code is much more maintainable and much more OOP. It's almost impossible to build an OOP application without an ORM unless you like a lot of duplicate code. It will probably eliminate the vast majority of your SQL code.
And here's the other thing. If you are going to build an OOP system, you'll end up writing your own O/R mapper anyway. You'll need to call dynamic SQL or stored procs, get the data as a reader or dataset, convert that to an object, wire up relationships to other objects, turn object modifications into SQL inserts/updates, etc. What you write will be slower and buggier than NHibernate or something else that has been on the market for a long while.
Your only other real choice is to build a very data-centric, procedural application. Yes, it may perform faster in some areas. I agree that performance IS important, but what matters is that it's FAST ENOUGH. If you save a few milliseconds here and there with procedural code, your users will not notice the performance increase. But you'll notice the crappy code.
The biggest performance bottlenecks in an ORM come from choosing correctly when to pre-fetch and when to lazy-load objects. This leads to the N+1 query problem common with ORMs. However, it is easily solved: you just have to performance-tune your object queries, limit the number of calls to the database, tell the ORM when to use joins, etc. NHibernate also supports a rich caching mechanism, so at times you don't hit the database at all.
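To make "you'll end up writing your own O/R mapper" concrete, here is a deliberately bare-bones mapper in Python over SQLite, with a hypothetical Customer class and customers table. It maps rows to objects and turns field changes back into an UPDATE; it has none of the relationship wiring, identity map, lazy loading or caching a mature ORM provides, which is the answer's point about the cost of rolling your own.

```python
import sqlite3
from dataclasses import asdict, dataclass, fields

@dataclass
class Customer:
    id: int
    name: str
    email: str

class CustomerMapper:
    """Bare-bones hand-written O/R mapper: rows -> objects, object changes -> UPDATE."""
    def __init__(self, conn):
        self.conn = conn
        self._loaded = {}  # id -> field values as last loaded/saved, for change detection

    def load(self, customer_id):
        cols = ", ".join(f.name for f in fields(Customer))  # names come from code, not input
        row = self.conn.execute(
            f"SELECT {cols} FROM customers WHERE id = ?", (customer_id,)
        ).fetchone()
        if row is None:
            return None
        customer = Customer(*row)
        self._loaded[customer.id] = asdict(customer)
        return customer

    def save(self, customer):
        """Write only the fields that changed; return how many were written."""
        before = self._loaded.get(customer.id, {})
        changed = {
            k: v for k, v in asdict(customer).items()
            if k != "id" and before.get(k) != v
        }
        if changed:
            assignments = ", ".join(f"{k} = ?" for k in changed)
            with self.conn:
                self.conn.execute(
                    f"UPDATE customers SET {assignments} WHERE id = ?",
                    (*changed.values(), customer.id),
                )
            self._loaded[customer.id] = asdict(customer)
        return len(changed)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ann', 'ann@old.example')")
conn.commit()

mapper = CustomerMapper(conn)
ann = mapper.load(1)
ann.email = "ann@new.example"
print(mapper.save(ann))  # 1 (only the email column is written)
print(mapper.load(1))    # Customer(id=1, name='Ann', email='ann@new.example')
```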
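The N+1 problem and its fix can be measured directly. In this Python sketch (made-up authors/posts tables), sqlite3's set_trace_callback records every SQL statement sent to SQLite: the lazy version runs one query for the authors plus one per author, while the eager version fetches the same data with a single JOIN. This is the same kind of check as logging the SQL an ORM emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER REFERENCES authors(id), title TEXT);
INSERT INTO authors VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
INSERT INTO posts (author_id, title) VALUES (1, 'a1'), (1, 'a2'), (2, 'b1'), (3, 'c1');
""")

statements = []
conn.set_trace_callback(statements.append)  # log every SQL statement SQLite executes

def lazy_titles():
    """N+1: one query for the authors, then one more query per author."""
    result = {}
    for author_id, name in conn.execute("SELECT id, name FROM authors").fetchall():
        rows = conn.execute(
            "SELECT title FROM posts WHERE author_id = ? ORDER BY id", (author_id,)
        )
        result[name] = [title for (title,) in rows]
    return result

def eager_titles():
    """Pre-fetch everything with a single JOIN."""
    result = {}
    for name, title in conn.execute(
        "SELECT a.name, p.title FROM authors a JOIN posts p ON p.author_id = a.id "
        "ORDER BY a.id, p.id"
    ):
        result.setdefault(name, []).append(title)
    return result

statements.clear()
lazy = lazy_titles()
lazy_count = len(statements)

statements.clear()
eager = eager_titles()
eager_count = len(statements)

print(lazy == eager, lazy_count, eager_count)
```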
I also disagree with those that say performance is about users and maintenance is about coders. If your code is not easily maintained, it will be buggy and slow to add features. Your users will care about that.
I won't say every application should have an ORM, but I think most will benefit. Also, don't be afraid to use native SQL or stored procedures with an ORM now and then where necessary. If you have to do batch updates to millions of records or write a very complex report (hopefully against a separate, denormalized reporting database), then straight SQL is the way to go. Use ORMs for the OOP, transactional, business logic and CRUD stuff, and use SQL for the exceptions and edge cases.
I'd recommend reading Jeffrey Palermo's material on NHibernate and Onion Architecture. Also, take his agile boot camp or other classes to learn O/R mapping, NHibernate and OOP. That's what we use: NHibernate, MVC, TDD, Dependency Injection.
"A lot of users say that frameworks are slowing things down, but I think they missed the big picture. The code MUST BE MAINTAINABLE and EASY TO UNDERSTAND."
A well-structured, highly-maintainable system is worthless if its performance is Teh Suck!
Maintainability is something which benefits the coders who build an application. Raw performance benefits the real people who use the app for their work (or whatever). So whose concerns ought to be paramount: those who build the system or those who pay for it?
I know it's not as simple as that, because the customer will eventually pay for a poorly structured system - perhaps more bugs, certainly more time to fix them, more time to implement enhancements to the application. As is usually the case, everything is a trade-off.
I've started developing like you, without orm tools.
Then I worked for companies where software development was more industrialized, and they all used some kind of ORM tool (with more or fewer features). Development is far easier and faster, and it produces more maintainable code, etc.
But I've also seen the drawbacks of these tools: very slow performance. However, that was mostly due to misuse of the tool (Hibernate in that case).
ORM tools are very complex, so it is easy to misuse them, but if you have experience with them, you should be able to get nearly the same performance as with raw SQL. I have three pieces of advice for you:
1. If performance is not critical, use an ORM tool (choose a good one; I don't develop in PHP, so I can't give you a name).
2. For each feature you add, check the SQL that the ORM tool produces and sends to the database (using a logging facility, for example). Ask yourself whether that is how you would have written the queries. Most ORM inefficiencies come from fetching unwanted data from the DB, splitting a single request into multiple ones, etc. Slowness rarely comes from the tool itself.
3. Do not use the tool for everything. Choose wisely when not to use it (you reduce maintainability each time you use raw DB access), but sometimes it just isn't worth trying to make the ORM tool do something it was not designed for.
Edit:
ORM tools are most useful with very complex models: many relationships between entities. These are most often found in the configuration part of an application, or in its complex business logic.
So they are less useful if you have only a few entities, and if those entities are unlikely to change (be refactored).
The line between few entities and many is not clear. I would say that more than 50 different types (SQL tables, not counting join tables) is many, and fewer than 10 is few.
I don't know what was used to build Stack Overflow, but it must have been very carefully performance-tested.
If you want to build a website that will get such a heavy load and you don't have experience with that, try to get someone on your team who has already worked on such sites (performance testing with a realistic data set and a representative number of concurrent users is neither easy nor quick to implement). Having someone with that experience will greatly speed up the process.
It's very important to have high maintainability. I've developed large-scale web applications with low-level, super-high performance. The big disadvantage was maintaining the system, that is, developing new features. If you're too slow at developing, customers will look for other systems/applications. It's a trade-off. Most ORMs have features for when you need to send optimized queries directly to SQL. The ORM itself isn't the bottleneck; I'd say it's more about good DB design.
I think you missed the picture. Performance matters every day to your users; they don't care at all about maintainability. You are being ethnocentric: you are concerned only with your own needs and not those of the people who pay for the system. It isn't all about your convenience.
Perhaps you should sit down with the users and watch them use your system for a day or two. Then you should sit down at a PC with the same power as theirs (not a dev machine) and spend an entire week doing nothing but using your system all day long. Then you might understand their point.