How to cross-reference data in MongoDB - PHP

I am trying to make my first application using PHP and MongoDB.
What I am trying to make is a listing of fruit trees in a given area, and I also want to have a list of fruit trees that could be grown in an area.
I come from a MySQL background, so in SQL I would have a trees table with information about tree species. Then I would have a table of actual trees, left joined on IDs with the trees table, which would also hold location and other information.
However in Mongo, I am not sure how this is done, or HOW it should be done.
Basically, what I want is a list of trees, and then users can make a reference to their own tree.
Any help or direction on how to do this would be great.
$db->trees->fruittrees
$db->trees->userstrees
Chris

There are quite a few things missing from your question. However:
What I am trying to make is a listing of Fruit trees in a given area, and Also I want to have a list of fruit trees that could be grown in an area.
Smells like a geospatial query: http://docs.mongodb.org/manual/core/geospatial-indexes/ .
So the user would have a location:
{
name: 'sammaye',
location: [107,206]
}
And each growing tree could take advantage of an array of areas:
{
name: 'apple tree',
locations_of_growth: [[74,59],[67,-45]]
}
And you would simply do a $near query on the user comparable to the distance of the Earth ( http://docs.mongodb.org/manual/core/geospatial-indexes/#distance-calculation ) to calculate what trees exist near that user.
The thing about geospatial queries is that they can be constrained with further information as well. Say you want to allow the user to compare the leaf colour of apple trees in the area, to find out how many apple trees in the area have green leaves: you can simply add that as a condition within the tree document and add it to your $near query.
There is one downside to a geospatial query: you can only have one geospatial index per collection ( http://docs.mongodb.org/manual/core/geospatial-indexes/#create-a-geospatial-index ). This means that if you also wish to provide a list of trees that can be planted in that area, you might need two collections: one called other_trees and one called growable_trees. When you want to allow the user to compare their tree to others in the area you query other_trees, and when you wish to show what trees can be grown in the area you query growable_trees.
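As a sketch, the $near query for the documents above might be built like this in PHP (using the legacy 2d coordinate style; the field names follow the example documents, and the $maxDistance value is an assumption):

```php
<?php
// Build a legacy 2d $near query document. Field names
// (locations_of_growth, leaf_colour) are hypothetical.
$userLocation = [107, 206];

$query = [
    'locations_of_growth' => [
        '$near'        => $userLocation, // centre point [x, y]
        '$maxDistance' => 10,            // assumed search radius
    ],
];

// Further conditions combine with $near, e.g. only green-leaved trees:
$query['leaf_colour'] = 'green';

// With the legacy driver this would be passed to find(), e.g.:
// $cursor = $db->trees->find($query);
```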
This is of course just one way of doing this.
Edit
I would not "recommend" using Doctrine; it is up to you and it depends on your scenario. Doctrine is a very heavy and surprisingly slow ORM, and there are a lot of faster and more lightweight ORMs for PHP out there: http://docs.mongodb.org/ecosystem/drivers/php-libraries/ - that is, if you want to use an ORM at all; maybe you have a full stack framework, or you want to go it alone with the driver.
Also, MongoDB does not require "JSON PHP handling" knowledge (whatever that is). All results from MongoDB come into the driver as BSON and are then deserialized to a standard dictionary structure within PHP, namely an associative array.
Also, unlike other answers, be very wary about "store all that information in a single JSON document" - partly because MongoDB is BSON, not JSON, but mainly because embedding should be thought about very carefully and judged on whether it fits your scenario. In fact you will find that, in most cases, embedding anything more than _ids from other rows can cause problems; but as I said, it is scenario dependent.
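For illustration, a fetched document is just a plain associative array in PHP; no JSON handling is involved (the document contents here are hypothetical):

```php
<?php
// What a document returned by the driver looks like in PHP:
// BSON is deserialized into a nested associative array.
$doc = [
    '_id'                 => '51c7a4d1e4b0f0a1b2c3d4e5', // made-up id
    'name'                => 'apple tree',
    'locations_of_growth' => [[74, 59], [67, -45]],
];

// Nested values are ordinary array accesses.
$firstLocation = $doc['locations_of_growth'][0];
```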

MongoDB is a document-oriented database. This means you should design your model completely differently from what you are used to in MySQL or another relational database. For example, if you want to store user information (including the trees that each user has), you should store all that information in a single JSON document. For more reference please read this
I also recommend using Doctrine, which lets you store objects in your MongoDB, so it will be far easier to model your database: you will only need to create an object-oriented model that represents your objects. Please refer to this page
For this you will need knowledge of JSON handling in PHP and object-oriented programming. One thing that always helps at the beginning of the transition from MySQL to a NoSQL database is to model your database and then denormalize the tables; that way you will know all the information you need to put in your objects/documents.
Hope this helps you

I've found that, quite often, how you model data in Mongo depends on how you will use the data in the UI. Mongo isn't good at doing joins (as you know), but you can still do them - in a middle tier or in the UI.
So for your example, I would probably do the following:
A tree in the tree collection:
{
"_id" : 123,
"name" : "Orange",
"price" : 10,
"climate" : "Any"
}
User:
{
"user" : "Bob",
"selected_trees" : [ 123, 124, 156 ]
}
Then in your UI, you load Bob and make one call to the available tree collection to get the details of the trees for him.
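That "one call" can be sketched as a single $in query built from the user's id list (collection and field names here follow the example documents; the driver call is shown as a comment since it needs a live server):

```php
<?php
// After loading the user document, fetch all of their selected
// trees in one query using $in on the stored ids.
$user = [
    'user'           => 'Bob',
    'selected_trees' => [123, 124, 156],
];

$treeQuery = ['_id' => ['$in' => $user['selected_trees']]];

// With the legacy driver:
// $trees = iterator_to_array($db->trees->find($treeQuery));
```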
Also a common practice is to actually include the full tree object with the user. You will recoil in disgust at this idea coming from a relational background, but it's very common with Mongo. You have copies of the same data all over. That is normal and expected.

Related

How to implement a server-side querying solution for mongo db references

I am relatively new to MongoDB, but I am finding that it merges nicely with a project that I am working on. I am currently stuck on a problem, however, that I am really struggling to resolve...
This is specifically related to MongoDB's "manual" references, documented here: http://docs.mongodb.org/manual/reference/database-references/#document-references
The project that I am working on treats every single document as a re-usable object instance, meaning that it can be embedded within another document; and because I am using manual references, along with the client side to resolve them, it works really well. The issue arises when I want to find objects based on the value of one of the child objects.
A likely scenario:
we have an Orders collection which stores generated shop orders. An order object has a property named "products", which when viewed in MongoDB is an array of references to product objects.
we also have a Products collection which stores products that can be used in orders.
say we want to be able to find all orders that contain the product "foo-bar". Bearing in mind that the path order.products is an array of references (not the embedded objects), what would be the most efficient way to do this? The ideal solution would be the ability to simply query order.products.name : 'foo-bar'
A few additional notes:
fetching all order objects from the database and having the client side resolve the objects, to filter out the ones we're looking for, is far too inefficient.
embedding the product documents inside of the order documents is not an option, as it is essential to be able to modify the order and product documents independently of each other.
I am accessing mongo db using a PHP framework (and the official mongo db php extension)
a server-side solution would be ideal
I have briefly looked into the ability to write custom functions on Mongo's server side, but I can't quite tell if that would be a viable way to go?
It seems that you want to use joins (a term from SQL). MongoDB has no support for joins, nor for an equivalent technique.
The simplest thing that can work here is a two-step query (pseudo-code):
product_ids = db.products.find(name: 'foo-bar').only('id')
orders = db.orders.find(product_id: {$in: product_ids})
This way you don't download a bunch of product objects onto the client, only their ids. It works quite well for me in my apps.
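In PHP the same two-step pattern might look like the sketch below (collection and field names follow the question's scenario; the legacy-driver calls are shown as comments since they need a live server, and the ids in step 1 are made up for illustration):

```php
<?php
// Step 1: fetch only the _ids of products matching the name.
// $cursor = $db->products->find(['name' => 'foo-bar'], ['_id' => 1]);
// $productIds = array_map(
//     function ($p) { return $p['_id']; },
//     iterator_to_array($cursor)
// );

// Suppose step 1 returned these ids (hypothetical values):
$productIds = [101, 205];

// Step 2: find orders whose products array contains any of those ids.
$orderQuery = ['products' => ['$in' => $productIds]];
// $orders = $db->orders->find($orderQuery);
```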
But this task is, of course, much better handled by a real relational DB.

Right mysql table design/relations in this scenario

I have this situation where I need suggestions on database table design.
BACKGROUND
I am developing an application in PHP (CakePHP, to be precise) where we upload an XML file; it parses the file and saves the data in the database. The XML can come as files or URL feeds, purchased from various data suppliers. The aim is to collect venue data from the source URLs; venues can be anything like hotels, cinemas, schools, restaurants, etc.
Problem
The initial table structure for these venues is below. The table is designed to store generic information initially.
id
Address
Postcode
Lat
Long
SourceURL
Source
Type
Phone
Email
Website
With more data coming from different sources, I realized that there are many attributes for different types of venues.
For example
a hotel can have some attributes like
price_for_one_day, types_of_accommodation, Number_of_rooms etc
whereas schools will not have them but will have a different set of attributes. Restaurants will have some other attributes.
My first idea is to create two tables called venue_attribute_names and venue_attributes
##table venue_attribute_names
_____________________________
id
name
##table venue_attributes
________________________
id
venue_id
venue_attribute_name_id
value
So if I detect any new attribute, I create it and store its value in the attributes table with a relation. But I doubt this is the correct approach; I believe there could be a better approach for this. Besides, if the table grows huge, there could be performance issues because of the increase in joins and SQL queries.
Is creating the widest possible table, with all possible attributes as columns, the right approach? Please let me know. If there are any links I could refer to, I will follow them. Thanks
This is a surprisingly common problem.
The design you describe is commonly known as "Entity/Attribute/Value" or EAV. It has the benefit of allowing you to store all kinds of data without knowing in advance what the schema for that data is. It has the drawback of being hard to query - imagine finding all hotels in a given location, where the daily room rate is between $100 and $150, whose name starts with "Waldorf". Writing queries against all the attributes and applying boolean logic quickly becomes harder than you'd want it to be. You also can't easily apply database-level consistency checks like "hotel_name must not be null" or "daily_room_rate must be a number".
If neither of those concerns worry you, maybe your design works.
The second option is to store the "common" fields in a traditional relational structure, but to store the variant data in some kind of document - MySQL supports XML, for instance. That allows you to define an XML schema, and query using XPath etc.
This approach gives you better data integrity than EAV, because you can apply schema constraints. It does mean that you have to create a schema for each type of data you're dealing with. That might be okay for you - I'm guessing that the business doesn't add dozens of new venue types every week.
Performance with XML querying can be tricky, and the general tooling and development approach will make this harder to build than "just SQL".
The final option if you want to stick with a relational database is to simply bite the bullet and use "pure" SQL. You can create a "master" table with the common attributes, and a "restaurant" table with the restaurant-specific attributes, a "hotel" table with the hotel attributes. This works as long as you have a manageable number of venue types, and they don't crop up unpredictably.
Finally, you could look at NoSQL options.
If you are sticking with a relational database, that's it. The options listed above are pretty much what it can give you.
For your situation, MongoDB (or another document-oriented NoSQL system) could be a good option. These DB systems are very good if you have a lot of records with different attributes.

How to store & deploy chunks of relational data?

I have a Postgres DB containing some configuration data spread over several tables.
These configurations need to be tested before they get deployed to the production system.
Now I'm looking for a way to
store single configuration objects with their child entities in SVN, and
deploy these objects with their child entities to different target DBs
The point is that the relations between the objects need to be maintained somehow without the actual IDs, which would cause conflicts when copying the data to another DB.
For example, if the database contained data about music artists, albums and tracks, with a simple tree schema like artist -> has albums -> has tracks, then the solution I'm looking for would allow exporting e.g. one selected album with all its tracks (or one artist with all albums and all tracks) into one file, which could be stored in SVN and later be 'deployed' to any DB with the same schema.
I was thinking of implementing something myself, e.g. a config file describing the dependencies, and an export script which replaces IDs with PHP variables and generates some kind of PHP-SQL INSERT or UPDATE script.
But then I thought it would be really silly not to ask first, to double-check whether something like this already exists :o)
This is one of the arguments for Natural Keys. An album has an artist and is made up of tracks. No "id" necessary to link these pieces of information together, just use the names. Perl-esque example of a data file:
"Bob Artist" => {
    "First Album" => ["My Best Song", "A Slow Song"],
    "Comeback Album" => ["One-Hit Wonder", "ORM Blues"],
},
"Noname Singer" => {
    "Parse This Record!" => ["Song Named 'D'"],
}
To add the data, just walk the tree creating INSERT statements based on each level of parent data, and if you must have an id, use "RETURNING id" (a PostgreSQL extension) at the end of each INSERT statement to get the auto-generated ids to pass to the next level down the tree.
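A minimal PHP sketch of that tree walk, generating the INSERT statements (table and column names are assumptions; the :artist_id / :album_id placeholders stand in for the ids a real script would capture from RETURNING id):

```php
<?php
// Walk an artist => albums => tracks tree and emit INSERT statements.
$data = [
    'Bob Artist' => [
        'First Album' => ['My Best Song', 'A Slow Song'],
    ],
];

$sql = [];
foreach ($data as $artist => $albums) {
    // Parent row first; RETURNING id yields the generated key.
    $sql[] = sprintf("INSERT INTO artists (name) VALUES ('%s') RETURNING id;", $artist);
    foreach ($albums as $album => $tracks) {
        $sql[] = sprintf(
            "INSERT INTO albums (artist_id, name) VALUES (:artist_id, '%s') RETURNING id;",
            $album
        );
        foreach ($tracks as $track) {
            $sql[] = sprintf(
                "INSERT INTO tracks (album_id, name) VALUES (:album_id, '%s');",
                $track
            );
        }
    }
}
```

A real script would run each statement, substitute the returned id into the children's placeholders, and escape values properly instead of interpolating them.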
I second Matthew's suggestion. As a refinement of that concept, you may want to create "derived natural keys", for example "bob_artist" for "Bob Artist". The derived natural key would be well suited as a filename when storing the record into svn, for example.
The derived natural key should be generated such that any two different natural keys result in different derived natural keys. That way conflicts can't happen between independent datasets.
The concept of Rails migrations seems relevant, although it aims mainly at performing schema updates: http://guides.rubyonrails.org/migrations.html
The idea has been transferred into PHP under the name Ruckusing, but it seems to support only MySQL at this point: http://code.google.com/p/ruckusing/wiki/BigPictureOverview
Doctrine also provides migrations functionality, but again it seems to focus on schema transformations rather than on migrating or deploying data: http://www.doctrine-project.org/projects/migrations/2.0/docs/en
Possibly Ruckusing or Doctrine could be used (abused?), or if needed modified / extended, to do the job?

Sitewide multi object search - database design / code strategy?

I am lost on how best to approach the site search component. I have a user content site similar to Yelp. People can search for local places, local events, local photos, members, etc. So if I enter "Tom" in the search box, I expect the search to return results from all user objects that match Tom. The word Tom can be anywhere: in a restaurant name, in the description of the restaurant, in a review, in someone's comment, etc.
So if I design this using purely normalized SQL, I will need to join about 15 object tables to scan all the different user objects, plus scan multiple columns in each table to cover all the fields. I don't know if this is how it is normally done, or if there is a better way. I have seen things like Solr/Apache/Elasticsearch, but I am not sure how these fit my use case, and even if I use them, I assume I still need to scan all 15 tables and 30-40 columns, correct? My platform is PHP/MySQL. Also, are there any coding / component architecture / DB design practices to follow for this? A friend said I should combine all objects into one table, but that won't work, as you can't combine photos, videos, comments, pages, profiles, etc. into one table, so I am lost on how to implement this.
Probably your friend meant combining all the searchable fields into one table.
The basic idea would be to create a table that acts as the index. One column is indexed and stores words, while the other column contains a list of references to objects that contain that word in one of their searchable fields (for example, an object may be a picture, and its searchable fields might be title and comments).
The list of references can be stored in many ways. For example, you could have a string of variable length, say a BLOB, and in it store a JSON-encoded array of the ids & types of the objects, so that you can easily find them afterwards by looking up each id in the table corresponding to that object's type.
Of course, on any addition / removal / modification of indexable data, you should update your index accordingly (you can use lazy update techniques that update the index in the background - most people expect indexes to be accurate to within a few minutes of the current state of the data). One implementation of such an index is Apache Cassandra, but I wouldn't use it for small-scale projects where you don't need distributed databases.
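As a toy sketch of building such an index in PHP (the object shapes, field names, and tokenization are all assumptions for illustration):

```php
<?php
// Build a word => object-references inverted index from a set of
// heterogeneous objects, each with an id, a type, and a searchable field.
$objects = [
    ['id' => 1, 'type' => 'picture',    'title' => 'Tom at the beach'],
    ['id' => 7, 'type' => 'restaurant', 'title' => "Tom's Diner"],
];

$index = []; // word => list of ['id' => ..., 'type' => ...]
foreach ($objects as $obj) {
    // Naive tokenizer: lowercase, split on non-word characters.
    $words = preg_split('/\W+/', strtolower($obj['title']), -1, PREG_SPLIT_NO_EMPTY);
    foreach (array_unique($words) as $word) {
        $index[$word][] = ['id' => $obj['id'], 'type' => $obj['type']];
    }
}

// Each row could then be stored as (word, json_encode($refs)) in MySQL.
$refsForTom = $index['tom'];
```

A search for "tom" then becomes a single indexed lookup on the word column, followed by fetches from the per-type tables, instead of a 15-table join.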

PHP/MySQL database design for various/variable content - modular system

I'm trying to build (right now just thinking/planning/drawing relations :]) a little modular system for building basic websites (mostly to simplify common tasks we as web designers do routinely).
I got a little stuck with the database design / the whole idea of storing content.
1. What is most painful on most websites (from my experience) are pages with a quasi-identical layout/skeleton but different information - e.g. title, picture, and a set of information. Making special templates / special modules in the CMS tends to cost more energy than editing it as text - but there we lose some operational potential: we can't get "only titles", because the CMS/system understands the whole content as one text field.
So, I would like to use two tables - one to hold information about what structure the content has (e.g. a variable number of photos <1;500> :], title & text & photo (large) & gallery) - the HOW - and another table with all contents, modules and parts of "collections" (my working name for variously structured information) - the WHAT
table module_descriptors (HOW)
id int
structure - *???*
table modules (WHAT)
id int
module_type - #link to module_descriptors id
content - *???*
2. What I like about this is that I don't need many tables - I don't like databases with 6810 tables, one for each module, for its description, for misc. number-to-text relations, ... and I also don't like tables with 60 columns, like content_us, content_it, category_id, parent_id.
I'm thinking I could hold the structure description and the content itself (noted as ??? above) as either XML or CSV, but maybe I'm trying to reinvent the wheel and the answer is hidden in some design pattern I haven't looked into.
Hope I make any sense at all and get some replies - give me your opinion, pros, cons... or send me to hell. Thank you
EDIT: My question is also this: Does this approach make sense? Is it edit-friendly? Isn't there something better? Is it moral? Don't kittens die when I do this? Isn't it too much for the server if I want to read & compare 30 XMLs pulled from the DB (e.g. when I want to compare something)? The technical part - how to do it - is just one part of the question :)
The design pattern you're hinting at is called Serialized LOB. You can store the attributes that are the same for every entry in the conventional way, as columns. For attributes that are variable, format them as XML or Markdown or whatever you want, and store them in a TEXT BLOB.
Of course you lose the ability to use SQL expressions to query individual elements within the BLOB. Anything you need to use in searching or sorting should be in conventional columns.
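A minimal sketch of the Serialized LOB pattern in PHP (column and field names are assumptions; JSON is used as the serialization format here, though the answer notes XML or anything else works too):

```php
<?php
// Fixed attributes go in conventional columns...
$commonColumns = [
    'id'    => 42,
    'title' => 'Seaside photo page',
    'type'  => 'photo_gallery',
];

// ...while variable attributes are serialized into one TEXT blob.
$variableAttributes = [
    'photos'  => ['a.jpg', 'b.jpg'],
    'caption' => 'Summer 2012',
];

// This string would be stored in a TEXT/BLOB column; it cannot be
// queried with plain SQL expressions, only read back and decoded.
$blob = json_encode($variableAttributes);

// On read, decode it back into an associative array:
$decoded = json_decode($blob, true);
```

Anything you need to search or sort by (here, title and type) stays in the conventional columns; the blob is opaque to SQL.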
Re comment: If your text blob is in XML format, you could search it with XML functions supported by MySQL 5.1 and later. But this cannot benefit from an index, so it's going to result in very slow searches.
The same is true if you try to use LIKE or RLIKE with wildcards. Without using an index, searches will result in full table-scans.
You could also try to use a MySQL FULLTEXT index, but this isn't a good solution for searching XML data, because it won't be able to tell the difference between text content and XML tag names and XML attributes.
So just use conventional columns for any fields you want to search or sort by. You'll be happier that way.
Re question: If your documents really require variable structure, you have a few choices. When used properly, SQL assumes that every row has the same structure (that is, the same columns). Your alternatives are:
Single Table Inheritance or Concrete Table Inheritance or Class Table Inheritance
Serialized LOB
Non-relational databases
Some people resort to an antipattern called Entity-Attribute-Value (EAV) to store variable attributes, but honestly, don't go there. For a story about how bad this can go wrong, read this article: Bad CaRMa.
