I have two servers:
Server A: MySQL
    Table A
        key-a
        foreign-key-b
Server B: MSSQL
    Table B
        key-b
        foreign-key-a
Presumably I have two objects with methods that handle the relationships:
TableA->getRelatedTableB();
TableB->getRelatedTableA();
This is easy to implement in most ORMs. But what if I want to get a large set of objects with only one query per database server? Ideally the framework would abstract this and do the logical join so that the developer can pretend he doesn't know anything about the database(s). Something like:
FinderObject->getAlotOfTableAObjectsWithTableBAlreadyLoaded()
and it would perform a query on each database and logically join the results in some efficient manner.
Does anyone know of a way to implement this in Doctrine or some other php ORM framework?
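The "one query per server, then join in application code" idea described above can be sketched roughly as follows. This is only an illustration in Python, not Doctrine's or any other ORM's API: two in-memory SQLite databases stand in for the MySQL and MSSQL servers, and the `label` column and the `find_a_with_b` helper are made-up names.

```python
import sqlite3

# Two independent in-memory databases stand in for Server A and Server B.
server_a = sqlite3.connect(":memory:")
server_b = sqlite3.connect(":memory:")

server_a.execute("CREATE TABLE table_a (key_a INTEGER PRIMARY KEY, foreign_key_b INTEGER)")
server_a.executemany("INSERT INTO table_a VALUES (?, ?)", [(1, 10), (2, 20), (3, 10)])

# The "label" column is a hypothetical payload column for illustration.
server_b.execute("CREATE TABLE table_b (key_b INTEGER PRIMARY KEY, label TEXT)")
server_b.executemany("INSERT INTO table_b VALUES (?, ?)", [(10, "ten"), (20, "twenty")])


def find_a_with_b(conn_a, conn_b):
    """Load all Table A rows with their related Table B rows using two queries."""
    # Query 1: everything needed from Server A.
    a_rows = conn_a.execute(
        "SELECT key_a, foreign_key_b FROM table_a ORDER BY key_a"
    ).fetchall()
    if not a_rows:
        return []

    # Query 2: a single IN (...) query on Server B for the distinct keys.
    b_keys = sorted({fk for _, fk in a_rows})
    placeholders = ",".join("?" * len(b_keys))
    b_rows = conn_b.execute(
        f"SELECT key_b, label FROM table_b WHERE key_b IN ({placeholders})", b_keys
    ).fetchall()

    # Logical join: index B rows by key, then attach one to each A row.
    b_by_key = {key_b: {"key_b": key_b, "label": label} for key_b, label in b_rows}
    return [{"key_a": key_a, "table_b": b_by_key.get(fk)} for key_a, fk in a_rows]


joined = find_a_with_b(server_a, server_b)
```

For very large result sets the IN list would probably need to be sent in batches, since database drivers limit the number of bound parameters.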
Doctrine doesn't explicitly support cross-database joins, but there is a way to do it:
http://www.doctrine-project.org/blog/cross-database-joins
One solution is to use federated tables, but I've read that they don't perform well. It all depends on how you need to use them.
I don't know of any that do... BUT maybe you could use Propel, Memcached, and MySQL together.
Set up a distributed memory cache using Memcached, and see if there's a way to store some MySQL data in there. Because Memcached is distributed, both of your MySQL servers could store data there. Then you'd have to find a way to access that memory (via Memory Tables?).
Seems a very tricky situation.
Perhaps the problem is being approached from the wrong direction. Could you tell us what problem you're trying to solve? There might be a completely different (and simpler!) solution just around the corner.
Related
Is there currently any technology that would separate storage from business logic and allow me to easily switch from MySQL to MongoDB? (I assume I migrate the data myself, or start with an empty database after the switch).
I would like the change to be as easy as changing the configuration, the driver, and the db connection data.
I understand it is possible for PHP with Doctrine, switching between different RDBMS, but I'm interested in switching from any RDBMS to a NoSQL system.
I am focusing on PHP now - but if you know any solutions for other programming languages - I will be happy to learn about them.
I am assuming not a complex database, no transactions, no complex relations.
More background/details
I am writing a simple crawler that will visit websites, read some data, and save it to the DB. It is super simple and I might go with pdo_mysql for PHP only. I am considering an extra layer only to cover the case where I want to switch from MySQL to MongoDB one day, and I am asking whether this is even possible.
Update
I think that Laravel with its Eloquent supports MySQL out of the box, and with an extra plugin: https://github.com/jenssegers/laravel-mongodb supports MongoDB - I will check if this is truly transparent from the programmer's perspective. Unless someone has experience and knows right away?
A typical approach here is the DAO or Repository design pattern. In this pattern you provide a library which is responsible for persistence and returns business objects. The objects do not have persistence logic and get stashed in the Repository or retrieved from the repository.
This being said, that only works for business objects. The problems will come up when you want to redo reporting or the like.... Here the differences between an RDBMS and a NoSQL solution will bite you hard.
To be sure, I don't know what your use case is, and "what do you want to do with your data" is a massive concern when choosing between these.
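The Repository pattern mentioned above can be sketched as follows. This is a minimal illustration in Python rather than PHP, under loud assumptions: `Page`, `PageRepository`, and both backends are hypothetical names, SQLite stands in for MySQL, and a plain dict stands in for a MongoDB collection.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Page:
    """Plain business object with no persistence logic."""
    url: str
    title: str


class PageRepository(Protocol):
    """The only interface the rest of the application depends on."""
    def save(self, page: Page) -> None: ...
    def find(self, url: str) -> Optional[Page]: ...


class SqlitePageRepository:
    """Relational backend (SQLite standing in for MySQL)."""
    def __init__(self, conn):
        self.conn = conn
        conn.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)")

    def save(self, page):
        self.conn.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (page.url, page.title))

    def find(self, url):
        row = self.conn.execute(
            "SELECT url, title FROM pages WHERE url = ?", (url,)
        ).fetchone()
        return Page(*row) if row else None


class DictPageRepository:
    """Document-style backend (a dict standing in for a MongoDB collection)."""
    def __init__(self):
        self.docs = {}

    def save(self, page):
        self.docs[page.url] = {"url": page.url, "title": page.title}

    def find(self, url):
        doc = self.docs.get(url)
        return Page(**doc) if doc else None


def crawl_and_store(repo: PageRepository) -> Optional[Page]:
    """Application code: identical no matter which backend is plugged in."""
    repo.save(Page("http://example.com", "Example"))
    return repo.find("http://example.com")
```

Swapping storage then means constructing a different repository, for example `crawl_and_store(DictPageRepository())` instead of `crawl_and_store(SqlitePageRepository(sqlite3.connect(":memory:")))`. Reporting-style queries would still need backend-specific code, as noted above.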
I'm starting to build a system for working with native languages, tags and similar data in the Yii Framework.
I have already chosen MongoDB for storing my data, as I think it fits nicely and will give better performance at lower cost (the database will hold huge amounts of data).
My question concerns user authentication, payments, etc. These are sensitive bits of information and areas where I think the data is relational.
So:
1. Would you use two different db systems? Do I need them, or am I overcomplicating this?
2. If you recommend the two db approach how would I achieve that in Yii?
Thanks for your time!
PS: I do not intend this question to be another endless discussion between the relational vs non-relational folks. Having said that, I think my data fits Mongo, but if you have something to say about that, go ahead ;)
You might be interested in this presentation on OpenSky's infrastructure, where MongoDB is used alongside MySQL. Mongo was utilized mainly for CMS-type data where a flexible schema was useful, and they relied upon MySQL for transactions (e.g. customer orders, payments). If you end up using the Doctrine library, you'll find that the ORM (for SQL databases) and MongoDB ODM share a similar API, which should make the experimentation process easier.
I wouldn't shy away from using MongoDB to store user data, though, as that's often a record that can benefit from embedded document storage (e.g. storing multiple billing/shipping addresses within a single user document). If anything, Mongo should be flexible enough to enable you to develop your application without worrying about schema changes due to evolving product requirements. As those requirements become more clear, you'll be able to make a decision based on the app's performance needs and types of database queries you end up needing.
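As a rough illustration of the embedded-document idea, a user record with several addresses might look like the dictionary below. The field names and the `addresses_of_type` helper are made up for this sketch, and no MongoDB driver is involved; the point is only that the addresses live inside the user document instead of in a separate joined table.

```python
import json

# One user document with its addresses embedded as a list of sub-documents.
# All field names here are illustrative assumptions.
user = {
    "_id": "user-1",
    "email": "alice@example.com",
    "addresses": [
        {"type": "billing", "city": "Lisbon"},
        {"type": "shipping", "city": "Porto"},
    ],
}


def addresses_of_type(user_doc, kind):
    """Return the embedded addresses of one kind, with no join needed."""
    return [addr for addr in user_doc["addresses"] if addr["type"] == kind]


shipping = addresses_of_type(user, "shipping")

# The whole record serializes as a single JSON document.
serialized = json.dumps(user)
```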
There is no harm in using multiple databases if you really need them. Many big websites use multiple databases, so go ahead and start your project.
Consider building a high-traffic PHP website with many parallel users. Which is the best possible MySQL abstraction (ORM or OODBMS) in terms of efficiency (15-20 database tables with a total of about 100000 items, and JOIN queries across no more than 4 tables)?
I heard somewhere that the Doctrine libraries are appropriate. Or should I use a framework like Zend? Which of these database solutions are built on PDO and don't require much learning (at the moment I'm using pure PHP)?
Regardless of the DB solution, you should look at using a system like Memcached. With the proper caching strategy you will significantly reduce the load your databases are putting on your server.
There is a PHP API for memcached here
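One common caching strategy is "cache-aside": check the cache first and only query the database on a miss. The sketch below shows the idea in Python under stated assumptions: `DictCache` is a hypothetical stand-in for a Memcached client (a real one would talk to a cache server and usually set an expiry time), and SQLite stands in for MySQL.

```python
import sqlite3


class DictCache:
    """Hypothetical stand-in for a Memcached client, backed by a dict."""
    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO items VALUES (1, 'widget')")

# Counts how many times the database is actually queried.
stats = {"queries": 0}


def get_item_name(cache, conn, item_id):
    """Cache-aside read: try the cache, fall back to the database on a miss."""
    key = f"item:{item_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit: no database work at all

    stats["queries"] += 1
    row = conn.execute("SELECT name FROM items WHERE id = ?", (item_id,)).fetchone()
    name = row[0] if row else None
    if name is not None:
        cache.set(key, name)  # populate the cache for the next caller
    return name


cache = DictCache()
first = get_item_name(cache, db, 1)   # miss: reads from the database
second = get_item_name(cache, db, 1)  # hit: served from the cache
```

Writes need a matching step that updates or deletes the cached key, otherwise readers keep seeing the old value.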
An ORM or any data modeling layer will never get you better performance. Its sole purpose is to make development faster and the code easier to maintain. ORMs are notoriously bad at deciding how to use relationships appropriately and often end up querying all tables in order to find the correct data. At that level of query complexity you are not going to be able to abstract away these relationships without sacrificing performance.
MySQL is fine for up to a couple of million records at least (I've used it for over 100 million in a single table). For performance's sake you generally want at least a master/slave setup and some method of distributing reads between them. The database will almost always be the limiting factor in performance. You can always add more web servers and put a load balancer in front of them to solve the other side of things, but the database setup is always a little harder to maintain.
You have to think about why you want to use an ORM. If it's for development reasons, that's fine, but be cognizant that your performance will suffer. Otherwise stick to queries. An ORM adds a third layer of code to deal with and learn. If you know PHP and MySQL, do you need to learn a third language to use them effectively? Most often the answer is no.
You have many options to choose from but be aware that at some point the framework/ORM you choose will not behave the way you want it to and to get it to behave to your desires you will have to do a lot of searching and digging through code. It's the classic problem - save time up front and pay for it later or spend time up front with no possible payoff later.
ORM solutions will be able to optimize some aspects, if you cache query data and use the object API in a planned and deliberate way.
Column or document databases (NoSQL: HBase, MongoDB) will improve performance if you have lots (millions+) of records, and are still growing.
Memcached will help if you have a lot of spare memory and especially if there are a lot of repetitious queries being run.
I am thinking about writing a quick chat application for a client to help them solve some of their communication needs. Clearly, writing a simple chat is a no-brainer, but the company has serious scaling needs, so it is probably a good idea to build the service on NoSQL storage from the beginning.
Besides the obvious lack of transactions, which isn't one of our concerns, is it a good idea to use a noSQL storage for a chat?
MongoDB should be good enough if you're after scalability and performance. Most SQL engines would be overkill for this stuff. I doubt if you need complex data aggregation and other queries for chat data. Even with that, MongoDB has map-reduce capability to help you along.
NoSQL is used when you have no fixed data model. This applies to document-oriented applications where you have to store objects and documents that may each have a different structure.
I don't think this is the case in your situation, since a chat log has a well defined fixed data model for example (user, time, text). I think a traditional SQL database may be the right fit for you. If used on client side only, SQLite will be the best fit, since there is no need to install or configure, simply redistribute the SQLite dll. Also the footprint is very small.
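The fixed (user, time, text) model mentioned above maps directly onto a single table. A minimal sketch using Python's built-in `sqlite3` module follows; the table, column, and function names are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE messages (
        id      INTEGER PRIMARY KEY AUTOINCREMENT,
        user    TEXT NOT NULL,
        sent_at TEXT NOT NULL,  -- ISO 8601 timestamp, sorts correctly as text
        text    TEXT NOT NULL
    )
    """
)
# An index on the timestamp keeps "latest messages" queries cheap.
conn.execute("CREATE INDEX idx_messages_sent_at ON messages (sent_at)")


def post_message(conn, user, sent_at, text):
    """Store one chat message."""
    conn.execute(
        "INSERT INTO messages (user, sent_at, text) VALUES (?, ?, ?)",
        (user, sent_at, text),
    )


def recent_messages(conn, limit):
    """Return the newest messages first as (user, sent_at, text) tuples."""
    return conn.execute(
        "SELECT user, sent_at, text FROM messages "
        "ORDER BY sent_at DESC, id DESC LIMIT ?",
        (limit,),
    ).fetchall()


post_message(conn, "alice", "2024-01-01T10:00:00", "hello")
post_message(conn, "bob", "2024-01-01T10:01:00", "hi alice")
post_message(conn, "alice", "2024-01-01T10:02:00", "how are you?")
latest = recent_messages(conn, 2)
```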
I would say no. SQLite is included in PHP... why not just use that? Or better still, why not use one of the hundreds of chat applications that already exist, and save yourself a whole load of development time.
The Problem: Object models built using an ORM often need to perform multiple queries to perform a single action. For example a "get" action may pull information from multiple tables, particularly when you have a nested object structure. On complicated requests these queries can add up and your database will start blocking long before it would if you were manually writing SQL.
The Question: Where do you load balance the ORM to cut down on the number of queries that need to be made, and more importantly why did you choose this approach? Do you have separate models to load data dependent on context, or do you specify which data should load in the controller? Or something else?
ORM is really there for a good reason -- to speed up your development.
If performance becomes an issue for me, I'd rather implement some caching mechanisms instead of taking a step back and hard-coding SQL.
I recommend using the Domain Model pattern to provide an interface to data in an OO-friendly way. As part of the implementation of persistence within your Domain Model classes, it's appropriate to use a mix of ORM and SQL.
For instance, you'll have some simple queries against a single table. Use a convenient ActiveRecord pattern for this. But as you describe, you'll also typically need some complex queries against multiple tables for more complex related data. ActiveRecord is a clumsy solution in this case, so use plain SQL. It's the best tool when you need a complex query with relational operators like JOIN or GROUP BY.
@pestaa mentions caching, which is another good tool. Here's another one you can consider: Identity Map. The point is that you should learn multiple tools, and think about which one is the best in any given situation.
Trying to use only one pattern for every situation is like driving your car everywhere in first gear.
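For readers unfamiliar with the Identity Map mentioned above: it keeps one in-memory object per database row for the duration of a request, so repeated lookups of the same row do not hit the database again. A minimal Python sketch, with hypothetical `User` and `UserMapper` classes and SQLite as the database:

```python
import sqlite3


class User:
    def __init__(self, user_id, name):
        self.id = user_id
        self.name = name


class UserMapper:
    """Loads users and remembers every object it has already loaded."""

    def __init__(self, conn):
        self.conn = conn
        self._identity_map = {}  # primary key -> already loaded User
        self.queries = 0         # counts real database round trips

    def find(self, user_id):
        # Serve the existing object if this row was loaded before.
        if user_id in self._identity_map:
            return self._identity_map[user_id]

        self.queries += 1
        row = self.conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None

        user = User(*row)
        self._identity_map[user_id] = user
        return user


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

mapper = UserMapper(conn)
first = mapper.find(1)
second = mapper.find(1)  # same object, no second query
```

Because both lookups return the same object, a change made through one reference is visible through the other, which also avoids conflicting in-memory copies of a row.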
A lot of it depends on the ORM, its philosophy and features. But assuming you've got a good set of model classes between your ORM and the rest of your application, you can do the following:
Provide methods in your models that provide the right amount of depth for most cases. If your ORM doesn't allow you to specify things efficiently, consider a different ORM (if you have that luxury)
Plan and implement caching. Since we're talking in a data-centric context, this means writing/leveraging data caching in your model.
Have a plan to split reads from writes. Either in your model or ORM configuration. Especially if reads are your bottleneck, using some replication to create a gang of read-only slaves can be immensely useful. However, if you don't plan for it, you can easily design yourself into a position where it's a pain.
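The read/write split from the last point can be sketched as a small router that sends writes to the primary and spreads reads across replicas. This is an illustration only: `FakeConnection` is a stand-in that just records statements, `ReadWriteRouter` is a hypothetical name, and detecting reads by a leading `SELECT` is a deliberate simplification.

```python
import itertools


class FakeConnection:
    """Stand-in for a real database connection; it only records statements."""

    def __init__(self, name):
        self.name = name
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)
        return self.name  # report which server handled the statement


class ReadWriteRouter:
    """Sends SELECTs to replicas in round-robin order, everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def execute(self, sql):
        # Simplification: any statement starting with SELECT counts as a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        target = next(self._replicas) if is_read else self.primary
        return target.execute(sql)


primary = FakeConnection("primary")
replicas = [FakeConnection("replica-1"), FakeConnection("replica-2")]
router = ReadWriteRouter(primary, replicas)

routed = [
    router.execute("INSERT INTO posts (title) VALUES ('hello')"),
    router.execute("SELECT * FROM posts"),
    router.execute("SELECT * FROM users"),
    router.execute("SELECT * FROM tags"),
]
```

One design point to plan for: replicas can lag behind the primary, so a read issued right after a write may need to go to the primary to see the new data.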