I'm trying to develop a website that stores many recipes and retrieves them for clients. I took some courses on XML that introduced the concept of native XML databases. If I remember correctly, we also learned that XQuery is the most suitable language for working with XML. Because a recipe is semi-structured and not very tabular, I guess (please correct me if I'm wrong) that it is best expressed as an XML file, like the one below:
<recipe>
  <ingredients>
    <ingredient name='flour' amount='500g'/>
    <ingredient name='y' amount='200g'/>
  </ingredients>
  <steps>
    <step id='1'> first prepare ..... </step>
  </steps>
</recipe>
I know that relational databases have their advantages over other options; however, in this particular case they would require many join operations across tables. On the other hand, native XML databases don't seem very promising to me with regard to performance and the ability to handle large amounts of data. Besides, programming in PHP is much simpler than XQuery, considering the huge volume of tutorials and help available on the internet.
I really don't know what to do, and that's why I came to you guys.
Here is a simple rule of thumb, without looking at any strict requirements.
First, consider where your data is going to come from.
If your data is generated through user input screens,
if your data is validated and processed by a single application (e.g. the web site),
if your data's properties and features are pretty much frozen, with no new dimensions expected,
and if your data is transactional in nature,
then you can think of a relational database.
If your data comes from different sources such as flat files, XML, screen scraping from the internet, etc.,
if there are comparatively few transactions,
if the data's properties are fluid and can be sliced along various dimensions,
and if you are ready to work with functional languages like XQuery or XML-based languages like XSLT,
then an XML database is the key.
Use a relational DB, because it is much faster once you get a larger number of records, and it is simpler to create.
(For your example that is 3 tables: one with recipes, another with ingredients, and the last one with steps. An alternative is to create a table with all known ingredients and use an association table, e.g. a table with the recipe ID, the ingredient ID, and the amount.)
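To make the association-table alternative concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely so that it runs on its own; the same schema would work from PHP against MySQL. The table names, column names and sample rows are assumptions made up for illustration, not something taken from the question.

```python
import sqlite3

# In-memory database so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recipe (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE ingredient (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Association table: which ingredient goes into which recipe, and how much.
CREATE TABLE recipe_ingredient (
    recipe_id INTEGER NOT NULL REFERENCES recipe(id),
    ingredient_id INTEGER NOT NULL REFERENCES ingredient(id),
    amount TEXT NOT NULL,
    PRIMARY KEY (recipe_id, ingredient_id)
);
CREATE TABLE step (
    recipe_id INTEGER NOT NULL REFERENCES recipe(id),
    position INTEGER NOT NULL,
    instruction TEXT NOT NULL,
    PRIMARY KEY (recipe_id, position)
);
""")

# Hypothetical sample data, loosely following the XML in the question.
conn.execute("INSERT INTO recipe (id, title) VALUES (1, 'Example recipe')")
conn.executemany("INSERT INTO ingredient (id, name) VALUES (?, ?)",
                 [(1, "flour"), (2, "y")])
conn.executemany("INSERT INTO recipe_ingredient VALUES (?, ?, ?)",
                 [(1, 1, "500g"), (1, 2, "200g")])
conn.execute("INSERT INTO step VALUES (1, 1, 'Placeholder text for step one')")


def recipes_using(conn, ingredient_name):
    """Return the titles of all recipes that contain the given ingredient."""
    rows = conn.execute(
        """
        SELECT r.title
        FROM recipe AS r
        JOIN recipe_ingredient AS ri ON ri.recipe_id = r.id
        JOIN ingredient AS i ON i.id = ri.ingredient_id
        WHERE i.name = ?
        ORDER BY r.title
        """,
        (ingredient_name,),
    ).fetchall()
    return [title for (title,) in rows]


print(recipes_using(conn, "flour"))
```

The two joins are the cost the question worries about; with primary keys on the association table they are index lookups, which is why this layout usually scales well.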
It seems that you think you have to choose one or the other here. That isn't the case. XQuery isn't really set up to be a complete web scripting environment; it's a replacement for SQL, not PHP. Therefore you can certainly use PHP for the web-focused parts of the site, such as user logins (which could also live in a relational DB), and then use XQuery just for your recipe querying layer.
Some XML databases, such as MarkLogic, can also handle the web logic side of the equation, but they don't offer the same richness of libraries yet, so I would certainly recommend PHP or something like it for the web tier.
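As a rough illustration of keeping the query layer separate from the web tier, the sketch below queries a recipe document from ordinary application code. It is not XQuery: it uses the limited XPath subset in Python's standard xml.etree.ElementTree module as a stand-in, and the `ingredient` element name and the `ingredient_amount` helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# A well-formed copy of the recipe from the question (element names assumed).
RECIPE_XML = """
<recipe>
  <ingredients>
    <ingredient name="flour" amount="500g"/>
    <ingredient name="y" amount="200g"/>
  </ingredients>
  <steps>
    <step id="1">first prepare ...</step>
  </steps>
</recipe>
"""


def ingredient_amount(xml_text, name):
    """Return the amount of the named ingredient, or None if it is absent."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree supports attribute predicates such as [@name='flour'].
    # The name is interpolated directly, so only pass trusted values here.
    node = root.find(f".//ingredient[@name='{name}']")
    return None if node is None else node.get("amount")


print(ingredient_amount(RECIPE_XML, "flour"))
```

In a real setup the web tier would send an XQuery expression to the XML database instead of parsing the document itself, but the division of labour is the same.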
I am constructing a PHP framework from scratch (unfortunately I don't have any choice in this matter). The framework is required to rely heavily on object-oriented data, and therefore needs to have the ability to store large amounts of object-oriented data efficiently.
I am struggling with the second part.
I've been working on this for a few months. Initially I was introduced to the idea of an ORM. After trying a few pre-built libraries (Doctrine 2, Redbean, etc.) I liked the idea, but none of what I could find functioned the way that was required, so I set out to create my own ORM, which turned out quite well. The only real issue is that it suffers in performance, and after spending some time trying to optimize it, I am now convinced that an ORM is not quite the solution to the problem. Although close, it just doesn't quite cut it.
I have briefly looked into other solutions, but due to my lack of experience in this area I am struggling to pinpoint the solution.
Here are the requirements of the data storage engine:
Ultimately, it needs to be able to store key-value pairs
The "value" part can be a simple data type, but can also be an object, or an array of the same type of object.
The application defines the structure of each object (the SCHEMA), in much the same way that a .wsdl file works, so the engine would need to support strict formats.
Objects can either have their instances re-used or not. If an object is re-used and exists as a child object in multiple locations (across many objects), its values are the same everywhere it appears. Otherwise, a new instance of the object exists for every parent object (not re-used).
There needs to be the ability to query the data efficiently, to make comparisons on any part of an object to find it. For example: find a customer where customer.address.postcode LIKE ('%XXX%')
Any suggestions would be greatly appreciated
EDIT
Thanks to those that have attempted to aid me so far in my somewhat crazy endeavour. To answer some questions that have so far been asked:
What solutions have you tried, and why did they not work?
ORM systems
I had tried a small number of pre-built ORM libraries for PHP, including Doctrine 2 and Redbean. With Doctrine the problem was more to do with how you specify the SCHEMA of a model, in that you are required to do so in docblocks. I found this particularly awkward given my requirements, especially because I knew of a number of ways it could be avoided. I did eventually manage to get Doctrine to work the way I wanted, but only after hacking away at the code. Again, this was fun, but it wasn't right.
Redbean actively required me to change the property names of objects. One of my requirements was basically to be able to plug in any sort of document-oriented object and store it, so having to name properties in a specific way was counter-intuitive. Again, I did play with Redbean for a bit to get it to work, but that wasn't right either.
It was after playing with a few more ORM systems that I felt I had the knowledge to make my own. The ORM system that I made was good, in that it met the requirements precisely. It was massively let down by poor performance, specifically when dealing with large sets of data, and even more so when dealing with very complex models.
Storing objects in XML files
There was a very short time when I considered this, thinking that maybe my requirements meant that performance was always going to be a problem. So I set out to design a way of generating text-based storage, and ultimately ended up creating a whole SCHEMA engine and a bunch of other interesting things. This turned out to be just a fun project in the end; I just couldn't get it to perform at all.
NoSQL
My most recent endeavours have pushed me down the route of systems such as MongoDB, and a few other NoSQL systems, like Cassandra, that I didn't get into much.
MongoDB comes very close to being a tool I could use. However, it would require that I add an additional layer, because I do in fact require a SCHEMA, since my objects always conform to a specific structure. I am slowly coming to terms with MongoDB possibly being the solution, but I want to make sure before I spend more time on this.
What exactly do you mean by efficient?
I'm not talking purely about performance when I mention efficiency, although performance is most certainly an important factor in weighing my options. I understand that by going down this route, rather than something like a relational database, performance is naturally going to be a problem.
I am more talking about using the right tools. I never like to have to hack away at someone's code to get things to work. To me, it feels as if I am pushing things down a road that the system wasn't designed to go down, and at some point in the future it will bite me in the a**.
So really, when I mention I am looking for something "efficient", I'm meaning tools that match the requirements as closely as possible, so that I am only using/extending the functionality, rather than re-writing it.
Here are some routes to look into. Your requirement for storing "objects" (quite a broad term when it comes to databases) makes me think of:
Storing data in databases in a serialised format, e.g. JSON. PostgreSQL these days has ways to reach into such a column to do search operations on it, so it is not as non-searchable as has been previously regarded (though I would expect it to be slower than querying correctly normalised data).
The requirement to store customer.address.postcode makes me think that you could store your data as a hierarchy, in which case there are several algorithms available to you. Look into nested sets. This is designed to work well with relational databases, without resorting to recursive SQL.
It's not an area of my expertise, but graph databases may be worth looking into.
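To give a feel for the nested-set idea mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. Every node stores a left and a right number, and a node's descendants are exactly the nodes whose numbers fall between them, so no recursive SQL is needed. The `node` table, its columns and the sample values are assumptions invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE node (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    value TEXT,              -- only leaf nodes carry a value in this sketch
    lft INTEGER NOT NULL,
    rgt INTEGER NOT NULL
)
""")
# Tree: customer(1,8) > address(2,7) > postcode(3,4), city(5,6)
conn.executemany(
    "INSERT INTO node (id, name, value, lft, rgt) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "customer", None, 1, 8),
        (2, "address", None, 2, 7),
        (3, "postcode", "AB1 2CD", 3, 4),
        (4, "city", "Exampletown", 5, 6),
    ],
)


def descendants(conn, node_id):
    """Names of all nodes below node_id, fetched with one non-recursive query."""
    rows = conn.execute(
        """
        SELECT child.name
        FROM node AS parent
        JOIN node AS child
          ON child.lft > parent.lft AND child.rgt < parent.rgt
        WHERE parent.id = ?
        ORDER BY child.lft
        """,
        (node_id,),
    ).fetchall()
    return [name for (name,) in rows]


def path_to(conn, node_id):
    """Dotted path from the root down to node_id."""
    rows = conn.execute(
        """
        SELECT ancestor.name
        FROM node AS target
        JOIN node AS ancestor
          ON ancestor.lft <= target.lft AND ancestor.rgt >= target.rgt
        WHERE target.id = ?
        ORDER BY ancestor.lft
        """,
        (node_id,),
    ).fetchall()
    return ".".join(name for (name,) in rows)


print(descendants(conn, 1))
print(path_to(conn, 3))
```

Reads are cheap with this layout, but inserting a node means renumbering every node to its right, so it suits data that is read far more often than it is restructured.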
On a side note, Doctrine is a great library from what I hear, but I suspect you need to work out what technology to use first. It is designed broadly to map onto a relational database, so if you can't express your problem cleanly in a raw RDBMS, Doctrine may not help.
(This could be an XY question, it's hard to tell. You've said you need Y, but if you can tell us that you want to achieve X, maybe the feedback you're getting would be more concrete - and take you in a better direction).
I came across an interesting comment on php.net about serializing data in order to save it into the DB.
It says the following:
Please! please! please! DO NOT serialize data and place it into your
database. Serialize can be used that way, but that's missing the point
of a relational database and the datatypes inherent in your database
engine. Doing this makes data in your database non-portable, difficult
to read, and can complicate queries. If you want your application to
be portable to other languages, like let's say you find that you want
to use Java for some portion of your app that it makes sense to use
Java in, serialization will become a pain in the buttocks. You should
always be able to query and modify data in the database without using
a third party intermediary tool to manipulate data to be inserted.
I've encountered this too many times in my career, it makes for
difficult to maintain code, code with portability issues, and data
that is it more difficult to migrate to other RDMS systems, new
schema, etc. It also has the added disadvantage of making it messy to
search your database based on one of the fields that you've
serialized.
That's not to say serialize() is useless. It's not... A good place to
use it may be a cache file that contains the result of a data
intensive operation, for instance. There are tons of others... Just
don't abuse serialize because the next guy who comes along will have a
maintenance or migration nightmare.
I would like to know if this is the standard view on serializing data for DB purposes. That is, whether it is good practice to use it sometimes, or whether it should be avoided.
For example, I was instructed to use serialize myself recently.
In this case the data we had to save into a MySQL table was the following:
Car brand.
Car model.
Car version.
Car info.
Car info was an array representing all the properties of a version, so it held a large, variable number of properties (under 100). This array was the one to be serialized.
The main reason I was given in order to use serialize was the following:
Being a large number of fields, it is better to serialize the data in
order to improve performance instead of creating a field for each property
or multiple tables.
Personally I agree more with the comment on php.net than with this last assertion, but I would like to hear opinions more qualified than mine.
Being a large number of fields, it is better to serialize the data in
order to improve performance instead of creating a field for each
property or multiple tables.
I would consider this highly dependent on the use case. What if there is a class Customer that wants information about all cars that run on diesel, or any other specific car data (fuel seems the easiest example)? You would need to fetch all the cars from the database, unserialize each one, check the property, and keep a list of the cars relevant to the customer.
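The difference can be sketched as follows. This is only an illustration: it uses Python's sqlite3 and json modules as stand-ins for MySQL and PHP's serialize(), and the table names, the `fuel` property and the sample cars are all made up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Variant A: all properties packed into one serialized text column.
CREATE TABLE car_serialized (id INTEGER PRIMARY KEY, brand TEXT, info TEXT);
-- Variant B: the property we filter on is a real column.
CREATE TABLE car_columns (id INTEGER PRIMARY KEY, brand TEXT, fuel TEXT);
""")

# Hypothetical sample rows.
cars = [(1, "BrandA", "diesel"), (2, "BrandB", "petrol"), (3, "BrandC", "diesel")]
for car_id, brand, fuel in cars:
    conn.execute("INSERT INTO car_serialized VALUES (?, ?, ?)",
                 (car_id, brand, json.dumps({"fuel": fuel})))
    conn.execute("INSERT INTO car_columns VALUES (?, ?, ?)",
                 (car_id, brand, fuel))


def diesel_from_serialized(conn):
    """Load every row, deserialize it, and filter in application code."""
    rows = conn.execute(
        "SELECT id, info FROM car_serialized ORDER BY id").fetchall()
    return [car_id for car_id, info in rows
            if json.loads(info).get("fuel") == "diesel"]


def diesel_from_columns(conn):
    """Let the database filter; only matching rows leave the database."""
    rows = conn.execute(
        "SELECT id FROM car_columns WHERE fuel = ? ORDER BY id",
        ("diesel",),
    ).fetchall()
    return [car_id for (car_id,) in rows]


print(diesel_from_serialized(conn))
print(diesel_from_columns(conn))
```

Both functions return the same cars, but the first one transfers and decodes the whole table on every call, while the second can use an index on the fuel column.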
Example: We had to move some person-related data from an old customer CMS to a new one. Instead of having each attribute nicely mapped in the database, the whole information was stored as a single string in the old database. So, lacking a proper database structure, we had to do lots of regex-fu to turn the data into a proper structure again. Of course, this was an expensive task (in both money and workload). In this case the problem was not that huge, since the amount of data was manageable. But imagine the same scenario with millions of rows and more than just a single string...
The comment you posted is only talking about data structures, IMO. And I agree, storing these serialized is neither good nor efficient. It becomes much easier to introduce a typo somewhere, or to add a new property that other parts of the code are not aware of. This WILL lead to problems sooner or later.
On the other hand, storing some configs that need to be easily ported might be an OK case for serializing data. You could argue that external settings files are more ideal for such a case, but this is highly dependent on the case/philosophy/customer/...
TL;DR
In most cases, using a proper schema will sooner or later benefit the whole development, both speed-wise and complexity-wise (I prefer reading many table descriptions to one huge, cryptic string). There might be some use cases where serializing data is acceptable, so giving a definitive answer on whether this is good or bad practice is not that easy and depends heavily on the situation.
Hello, all experienced programmers.
I need advice on the following.
What would be the best practice for this problem?
We have 2-3 APIs of objects (apartments); the protocol (XML, JSON, SOAP) doesn't matter for now.
Each of them has several key points:
a) Geographics - each API has its own geo database with its own names and IDs for the same cities and places.
b) Each API describes object attributes differently, e.g. what a house has (swimming pool, wheelchair friendly, etc.).
So what we need is to import that data, merge it locally, and search it.
What would be the architecturally right way to solve this type of problem?
A very close example is hotel search engines, which search data from 10-20 different systems.
So we need something similar, but for a totally different type of object.
Your notes, comments and answers are really appreciated. Thanks a lot for participating.
This is a very generic question, so sadly the answer will be pretty generic too. I would approach this problem like this:
Create wrappers for each of the various APIs. This will standardize the way in which they are invoked internally, making it easier to interact with them.
Convert all the results into a uniform format (if possible), at least for those fields which will be searched, sorted, or acted upon.
If you persist this information in a database, it is important to structure it in such a way that it is easy to query.
E.g. storing a whole JSON string that defines a house in a field called description is not desirable. Neither is creating a field for each attribute of the house, like swimming pool (BOOLEAN yes/no). Instead, have key-value fields like attribute-name and attribute-value, which might hold records like:
swimming pool: YES
No. of bedrooms: THREE
etc. You get the point. In my experience, the more you can mould the API values into a unified data model, the easier it will be to collate and compare them.
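A minimal sketch of the approach described above might look like this, using Python and its built-in sqlite3 module. Everything specific in it is an assumption for illustration: the two payload shapes, the `from_api_a` / `from_api_b` wrappers, the city alias table and the table names are invented, not taken from any real API.

```python
import sqlite3

# Hypothetical raw payloads from two providers; the field names are invented.
API_A_RESULT = {"id": "a-17", "town": "Wien", "pool": True, "bedrooms": 3}
API_B_RESULT = {"ref": 9001, "city_name": "Vienna",
                "features": ["swimming pool"], "rooms": {"bed": 2}}

# Local mapping from each provider's place names to one canonical city name.
CITY_ALIASES = {"Wien": "Vienna", "Vienna": "Vienna"}


def from_api_a(raw):
    """Wrapper for provider A: convert its payload to the uniform record."""
    return {
        "source": "A",
        "source_id": str(raw["id"]),
        "city": CITY_ALIASES[raw["town"]],
        "attributes": {
            "swimming pool": "YES" if raw["pool"] else "NO",
            "bedrooms": str(raw["bedrooms"]),
        },
    }


def from_api_b(raw):
    """Wrapper for provider B: same output shape, different input shape."""
    return {
        "source": "B",
        "source_id": str(raw["ref"]),
        "city": CITY_ALIASES[raw["city_name"]],
        "attributes": {
            "swimming pool": "YES" if "swimming pool" in raw["features"] else "NO",
            "bedrooms": str(raw["rooms"]["bed"]),
        },
    }


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE apartment (
    id INTEGER PRIMARY KEY,
    source TEXT NOT NULL,
    source_id TEXT NOT NULL,
    city TEXT NOT NULL,
    UNIQUE (source, source_id)
);
-- Key-value rows instead of one column per attribute.
CREATE TABLE apartment_attribute (
    apartment_id INTEGER NOT NULL REFERENCES apartment(id),
    attribute_name TEXT NOT NULL,
    attribute_value TEXT NOT NULL,
    PRIMARY KEY (apartment_id, attribute_name)
);
""")


def store(conn, record):
    """Persist one uniform record and its attributes."""
    cur = conn.execute(
        "INSERT INTO apartment (source, source_id, city) VALUES (?, ?, ?)",
        (record["source"], record["source_id"], record["city"]),
    )
    conn.executemany(
        "INSERT INTO apartment_attribute VALUES (?, ?, ?)",
        [(cur.lastrowid, name, value)
         for name, value in record["attributes"].items()],
    )


def search(conn, city, attribute_name, attribute_value):
    """Find apartments in a city that have the given attribute value."""
    return conn.execute(
        """
        SELECT a.source, a.source_id
        FROM apartment AS a
        JOIN apartment_attribute AS attr ON attr.apartment_id = a.id
        WHERE a.city = ? AND attr.attribute_name = ? AND attr.attribute_value = ?
        ORDER BY a.source, a.source_id
        """,
        (city, attribute_name, attribute_value),
    ).fetchall()


store(conn, from_api_a(API_A_RESULT))
store(conn, from_api_b(API_B_RESULT))
print(search(conn, "Vienna", "swimming pool", "YES"))
```

The point of the design is that adding a third provider only means writing one more wrapper; the storage and search code never sees provider-specific field names.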
I'm starting to build a system for working with native languages, tags and similar data in the Yii Framework.
I have already chosen MongoDB for storing my data, as I think it fits nicely and will give better performance at lower cost (the database will hold huge amounts of data).
My question regards user authentication, payments, etc. These are sensitive bits of information, and areas where I think the data is relational.
So:
1. Would you use two different DB systems? Do I need them, or am I overcomplicating this?
2. If you recommend the two-DB approach, how would I achieve that in Yii?
Thanks for your time!
PS: I do not intend this question to be another endless discussion between the relational and non-relational folks. Having said that, I think my data fits Mongo, but if you have something to say about that, go ahead ;)
You might be interested in this presentation on OpenSky's infrastructure, where MongoDB is used alongside MySQL. Mongo was utilized mainly for CMS-type data where a flexible schema was useful, and they relied upon MySQL for transactions (e.g. customer orders, payments). If you end up using the Doctrine library, you'll find that the ORM (for SQL databases) and MongoDB ODM share a similar API, which should make the experimentation process easier.
I wouldn't shy away from using MongoDB to store user data, though, as that's often a record that can benefit from embedded document storage (e.g. storing multiple billing/shipping addresses within a single user document). If anything, Mongo should be flexible enough to enable you to develop your application without worrying about schema changes due to evolving product requirements. As those requirements become more clear, you'll be able to make a decision based on the app's performance needs and types of database queries you end up needing.
There is no harm in using multiple databases (if you really need them). Many big websites use multiple databases, so go ahead and start your project.
I should start by saying I'm not now, nor do I have any delusions I'll ever be a professional programmer so most of my skills have been learned from experience very much as a hobby.
I learned PHP as it seemed a good simple introduction in certain areas and it allowed me to design simple web applications.
When I learned about objects, classes, etc., the tutor's basic examples covered the idea that, as a rule of thumb, each database table should have its own class. While that worked well for the photo gallery project we wrote, as it had very simple MySQL queries, it's not working so well now that my projects are getting more complex. If I require data from two separate tables, which needs a table join, I've been ignoring the class altogether and handling it on a case-by-case basis, OR, even worse, loading some of the data into the class and the rest as a separate entity with two queries, which seems inefficient to me.
As an example, when viewing content on a forum I wrote, if you view a thread, I retrieve data from the threads table, the posts table and the user table. The queries from the user and posts table are retrieved via a join and not instantiated as an object, whereas the thread data is called using my Threads class.
So how do I get from my current state of affairs to something a little less 'stupid', for want of a better word? Right now I have a DB class that deals with the connection, escaping values, etc., a parent DB query class that deals with the common queries and methods, and all of the other classes (Thread, Upload, Session, Photo, and ones that aren't used yet, such as Post, User, etc.) are children of that.
Do I make a big posts class that has the relevant extra attributes that I retrieve from the users (and potentially threads) table?
Do I have separate classes that populate each of their relevant attributes with a single query? If so how do I do that?
Because of the way my classes are written, based on what I was taught, my DB update-row and insert methods both just take the attributes as an array and write all of them. If I have extra attributes from other DB tables in each class, how do I rewrite those methods? Obviously, updating automatically like that would result in errors.
In short I think my understanding is limited right now and I'd like some pointers when it comes to the fundamentals of how to write more complex classes.
Edit:
Thanks for the answers so far they've given me lots of pointers and thoughts and a lot of reading material. What I would like though is maybe an idea of how different people have decided to handle a simple table join with any amount of classes? Did you add attributes to the classes? Query from outside the class then pass the results into each class? Something else?
Entire books have been written about how to design a set of classes to fit a database schema.
Long story short: there is no one-size-fits-all way to do it, you have to make a lot of design decisions about the trade offs you want to make on an application-by-application basis.
You can find a library or framework to help, keywords: ActiveRecord, ORM (Object Relational Mapper)
P.S. You have no idea the potential for soul-killing analysis paralysis and over designing you can get into. Do the simplest thing that can possibly work for your app.
Code sample for my (below) comment:
$post = new PublishedPost($data);
$edit = $post->setTitle($newTitle);
$edit->save();
This is too broad to be answered without going into epic length.
Basically, there are four prominent Data Source Architectural Patterns from Patterns of Enterprise Application Architecture: Table Data Gateway, Row Data Gateway, Active Record and Data Mapper. These can be found implemented in the common PHP frameworks in some variation. These are easy to grasp and implement.
Where it gets difficult is when you start to tackle the impedance mismatch between the database and the business objects in your application. To do so, there are a number of Object-Relational Behavioral, Structural and Metadata Mapping Patterns, like Identity Maps, Lazy Loading, Query Objects, Repositories, etc. Explaining these is beyond scope. They cover almost 200 pages in PoEAA.
What you can look at is Doctrine or Propel, the two best-known PHP ORMs, which implement most of these patterns and which you could use in your application to replace your current database access handling.
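To make one of those patterns a little more tangible, here is a very small Data Mapper sketch for the forum case in the question: posts joined with their authors in a single query. It is written in Python with the built-in sqlite3 module so it can run on its own; it is not how Doctrine or Propel do it, and the class names, table layout and sample rows are assumptions made up for illustration.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


@dataclass
class Post:
    id: int
    body: str
    author: User  # filled from the joined users row, not from a second query


class PostMapper:
    """Data Mapper sketch: it knows the SQL; Post and User know nothing about the DB."""

    def __init__(self, conn):
        self.conn = conn

    def find_by_thread(self, thread_id):
        """Load all posts of a thread together with their authors in one join."""
        rows = self.conn.execute(
            """
            SELECT p.id, p.body, u.id, u.name
            FROM posts AS p
            JOIN users AS u ON u.id = p.user_id
            WHERE p.thread_id = ?
            ORDER BY p.id
            """,
            (thread_id,),
        ).fetchall()
        return [Post(pid, body, User(uid, name)) for pid, body, uid, name in rows]

    def update(self, post):
        """Write back only the columns that belong to the posts table."""
        self.conn.execute(
            "UPDATE posts SET body = ? WHERE id = ?", (post.body, post.id))


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    thread_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL REFERENCES users(id),
    body TEXT NOT NULL
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO posts VALUES
    (10, 1, 1, 'First post'),
    (11, 1, 2, 'A reply'),
    (12, 2, 1, 'Other thread');
""")

mapper = PostMapper(conn)
posts = mapper.find_by_thread(1)
print([(p.body, p.author.name) for p in posts])
```

Because the mapper decides which columns are written, the extra attributes that came from the users table never leak into an UPDATE on posts, which is the problem the generic "update everything in the array" method runs into.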
Many of your worries can be answered by inspecting the existing solutions found in well-tested frameworks such as CakePHP, symfony and Zend Framework. Examining their approaches and peeking under the hood should shed light on your questions. Who knows? You may even decide to write future projects using them!
They've spent years putting their heads together to tackle these problems. Take advantage!
Check out Doctrine:
Here is an example of a forum application using Doctrine.
http://www.doctrine-project.org/documentation/manual/1_2/en/real-world-examples#forum-application