I am building an application, and while designing my database layer I ran into this question: should we create a single column where I can store multiple items as JSON, or create a separate column for each item I want to store?
When should we store JSON in the database and when should we not?
What are the advantages and disadvantages of storing JSON in the database?
As an example: for all the columns that don't need to be searched or indexed, would it be more efficient to store their values in a JSON array, or should I stick to creating normalized data tables where I have multiple columns?
On a relational database you should always try to normalize. Seen from that perspective, it is simply wrong to store a JSON string.
As long as you don't want to manipulate or query the content of the JSON, I see no performance impact. This means you only set and get by primary key.
If you generally want to store whole objects, and this is not just a special case, I would also suggest taking a look at a NoSQL database like MongoDB, which is built for storing and retrieving objects. Maybe it is something for you, maybe not.
That said, my rule of thumb is:
If you end up with a table with only one row, use XML/JSON (ConfigID|XML/JSON)
If you end up with multiple rows, do normalization
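The "set and get by primary key" case mentioned above looks roughly like the sketch below; the table and column names (app_config, config_id, payload) are made up for illustration, and $pdo is assumed to be an existing PDO connection:

    // Store the whole settings structure as one JSON blob, keyed by its primary key.
    $settings = ['theme' => 'dark', 'items_per_page' => 25];
    $stmt = $pdo->prepare(
        'INSERT INTO app_config (config_id, payload) VALUES (:id, :payload)
         ON DUPLICATE KEY UPDATE payload = VALUES(payload)'
    );
    $stmt->execute([':id' => 1, ':payload' => json_encode($settings)]);

    // Read it back; the database never looks inside the JSON.
    $stmt = $pdo->prepare('SELECT payload FROM app_config WHERE config_id = :id');
    $stmt->execute([':id' => 1]);
    $settings = json_decode($stmt->fetchColumn(), true);

As long as every access follows this pattern, the JSON column behaves like an opaque value and normalization buys you little.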
With your example: for all the columns that don't need to be searched or indexed, you could absolutely use JSON data in MySQL. Packing those values into a single JSON column generally takes less storage than spreading them across fully normalized data tables, so JSON storage in MySQL works well in this case.
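For instance, something like the following sketch would do it, assuming MySQL 5.7 or later (which has a native JSON column type); the product table and its columns are invented for the example, and $pdo is again a PDO connection:

    // Searched/indexed attributes get real columns; everything else goes into one JSON blob.
    $pdo->exec(
        'CREATE TABLE product (
             id    INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
             name  VARCHAR(255) NOT NULL,
             extra JSON
         )'
    );

    $stmt = $pdo->prepare('INSERT INTO product (name, extra) VALUES (:name, :extra)');
    $stmt->execute([
        ':name'  => 'Kettle',
        ':extra' => json_encode(['color' => 'red', 'weight_g' => 900]),
    ]);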
In addition, here is a bit more detail on the two possible storage formats: JSON or serialized arrays. Personally, I usually use serialized arrays rather than JSON storage. Unserializing back into a PHP array or object is a little faster, but the format is usable only from PHP (via its serialize() and unserialize() functions). So the choice comes down to this:
Are you using your data only with PHP? If yes: arrays; if no: JSON.
You can read more in the discussion JSON vs. Serialized Array in database. Good luck!
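To make the difference concrete, here is a small sketch comparing the two formats (the Point class is just an illustrative example):

    $data = ['ids' => [11148, 11149], 'nested' => ['a' => 1]];

    $asSerialized = serialize($data);    // PHP-only format
    $asJson       = json_encode($data);  // language-independent format

    var_dump(unserialize($asSerialized) === $data);   // bool(true)
    var_dump(json_decode($asJson, true) === $data);   // bool(true)

    // An object survives a serialize()/unserialize() round trip as its original class,
    // whereas json_decode() would hand you back a stdClass or a plain array instead.
    class Point { public $x = 1; public $y = 2; }
    $p = unserialize(serialize(new Point()));         // still an instance of Point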
Related
I am currently developing an application which requires the storage of a multidimensional array in a database. At the moment I take the array, which is three tiers deep and could be any size since it is populated with user-generated information, run it through json_encode() to convert it into a string, and insert that string into the database.
The issue with this is that the single field is extremely large and difficult to read, and to update the data I have to retrieve it, decode it, update it, encode it, and re-upload it. If the site became heavily used, I am not sure how this would scale.
I think the main alternative would be to create another table, with each row accessed by a unique id stored in the main table, but again I am unsure how well this would scale.
Which is better? Any help much appreciated:)
If you are not sure how many tiers deep your array could be, it's really hard to design a DB schema for it. So the best way to store it is in a text field, which you are doing already.
As far as scaling is concerned, with proper indexing and partitioning you can scale your application; a single field with a large text value is not in itself what limits scaling.
Creating multiple database tables and relationships between these tables to group your data together will probably be easier to maintain and will give you the ability to query/filter results based on columns rather than having to retrieve your data, decode it and iterate over it in PHP.
I think that if you are not sure how deep your array can be, then building a well-constructed database schema for it will be difficult, so just do as you have planned. What I would suggest is to make use of functions as much as you can:
write one function for encoding your JSON data (of any number of tiers) and one for decoding it, and use these functions to carry out your updates.
Using separate functions will keep your project tidy and your code won't get messy.
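A rough sketch of that helper-function idea, with invented table and column names (user_data, id, payload) and an assumed PDO connection in $pdo:

    // Fetch the stored JSON and decode it back into a PHP array.
    function loadTree(PDO $pdo, $id) {
        $stmt = $pdo->prepare('SELECT payload FROM user_data WHERE id = :id');
        $stmt->execute([':id' => $id]);
        $json = $stmt->fetchColumn();
        return $json === false ? [] : json_decode($json, true);
    }

    // Encode the array and write it back, all in one place.
    function saveTree(PDO $pdo, $id, array $tree) {
        $stmt = $pdo->prepare('UPDATE user_data SET payload = :payload WHERE id = :id');
        $stmt->execute([':id' => $id, ':payload' => json_encode($tree)]);
    }

    // Every update then follows the same retrieve -> decode -> modify -> encode -> save cycle.
    $tree = loadTree($pdo, 42);
    $tree['level1']['level2']['level3'] = 'new value';
    saveTree($pdo, 42, $tree);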
I used to store data in MySQL as a plain comma-separated string, like "data1,data2,data3,data4".
Then I decided to store it serialized using PHP.
But the main problem I face now is that I cannot search for an item in those fields.
I was using FIND_IN_SET, but I know that it doesn't work on serialized data.
My question is: should I go back to storing the data with commas?
If you need it to be searchable, you need it expressed as a series of rows that can be queried. If you need it to be in a comma separated format for performance reasons, that's another consideration. Sometimes you have to do both of these things and make sure your application keeps the two different forms in sync.
When using serialized data it is very hard to match the query performance of a properly indexed table; on the other hand, retrieving one serialized field can be faster than joining in the related rows. It's a trade-off.
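A sketch of the "keep the two forms in sync" idea, with hypothetical table and column names (post.tags_csv for cheap retrieval, post_tag rows for searching) and $pdo as a PDO connection:

    function setTags(PDO $pdo, $postId, array $tagIds) {
        $pdo->beginTransaction();

        // Searchable form: one row per tag, which can be indexed and joined against.
        $pdo->prepare('DELETE FROM post_tag WHERE post_id = :id')
            ->execute([':id' => $postId]);
        $insert = $pdo->prepare('INSERT INTO post_tag (post_id, tag_id) VALUES (:id, :tag)');
        foreach ($tagIds as $tagId) {
            $insert->execute([':id' => $postId, ':tag' => $tagId]);
        }

        // Denormalized form: a comma-separated copy, updated in the same transaction.
        $pdo->prepare('UPDATE post SET tags_csv = :csv WHERE id = :id')
            ->execute([':csv' => implode(',', $tagIds), ':id' => $postId]);

        $pdo->commit();
    }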
I am currently unserializing the data retrieved from the database, updating the values as required, then serializing them again and running an update query.
Similarly, to implement search on the serialized columns, I have to unserialize and compare the values; my search is a free-text search, a bit like Google's. So, as you can imagine, it is unserializing on every request.
I want to optimize this. What methods could improve my current approach?
Don't use a serialized DB column for storing values you need to search; use a proper search engine instead. In my case, I always use Lucene, via Zend_Search_Lucene.
You can build a Lucene index very easily, just by defining documents and the fields of those documents you wish to store; searching will then be very efficient. If you want some docs, here and here are a couple of tutorials on how to use it, although I find the official docs, linked first, quite clear.
Hope this helps!
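As a rough, untested sketch of the Zend Framework 1 API (the index path, field names, and $rows variable are examples; adapt them to your own schema):

    require_once 'Zend/Search/Lucene.php';

    // Build (or rebuild) the index from the rows you currently keep serialized.
    $index = Zend_Search_Lucene::create('/tmp/my_index');
    foreach ($rows as $row) {   // $rows: whatever you fetch from your table
        $doc = new Zend_Search_Lucene_Document();
        $doc->addField(Zend_Search_Lucene_Field::UnIndexed('db_id', $row['id']));
        $doc->addField(Zend_Search_Lucene_Field::Text('body', $row['body']));
        $index->addDocument($doc);
    }
    $index->commit();

    // Search later without touching the serialized columns at all.
    $index = Zend_Search_Lucene::open('/tmp/my_index');
    foreach ($index->find('some search terms') as $hit) {
        echo $hit->db_id, ' scored ', $hit->score, PHP_EOL;
    }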
Your approach sounds really inefficient. Have a look at Solr/Lucene for a real document search system.
If the data is not serialised in your database then you should be able to query specific values without the overhead of having to serialise and unserialise.
The best method for updating serialized columns in a database is not to have serialized columns at all. Period.
Map your serialized data into related tables and update whatever value you need with a plain SQL query.
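For instance, once the data lives in a related table (the user_preference table with user_id / pref_key / pref_value columns here is hypothetical), an update is a single statement instead of a fetch/unserialize/modify/serialize round trip:

    $stmt = $pdo->prepare(
        'UPDATE user_preference
            SET pref_value = :value
          WHERE user_id = :user AND pref_key = :key'
    );
    $stmt->execute([':value' => 'dark', ':user' => 42, ':key' => 'theme']);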
I see that people store arrays like this:
a:6:{i:0;s:5:"11148";i:1;s:5:"11149";i:2;s:5:"11150";i:3;s:5:"11153";i:4;s:5:"11152";i:5;s:5:"11160";}
Why can't they just be:
11148,11149,11150,11153...
and have the SQL column "Type" be "Array"?
That way it's shorter, and you could change the values directly in the database without having to adjust the "s:" or "i:" markers.
One thing you can't do with CSV notation (1,2,3,4) is represent multi-dimensional arrays.
Neither way is really appropriate though. The data should be normalized into separate related tables. If there's a real need to store serialized data in a database that can't be or doesn't need to be normalized, it should be stored as JSON, which is language independent and smaller.
There is no MySQL array type. The reason for serializing is so that you can recreate the array easily: if your data were as simple as what you showed it wouldn't be hard, but what about a multidimensional array with non-numeric keys? Of course this is NOT good DB practice in the first place, since it breaks normalisation.
Not all DB servers have an array type (MySQL being one that doesn't).
If you have a CSV list, you have to explode it, or loop over it, to recreate the array.
Multiple dimensions aren't possible with a CSV list.
With serialize() and unserialize() you are using native PHP functions that run faster than PHP-level loop constructs. They are the way to go if you absolutely, positively have to store an array in the DB (as with sessions). Otherwise you may want to reconsider your application design so it doesn't store PHP arrays in the database; it may cause problems down the road if you try to access the data from a language other than PHP.
It's because PHP has the serialize() function that takes a PHP array and turns it into a string like the one you quoted above, and then has another function, unserialize(), that takes the array string and converts it back into a PHP array.
This makes it very easy to turn an array into a string when it needs saving in a database and then turn it back into a proper PHP array after you select it from the database later.
See the PHP manual here:
http://php.net/serialize
and:
http://php.net/unserialize
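For example, the string quoted in the question is exactly what serialize() produces for an array of six numeric strings, and unserialize() reverses it:

    $ids = ['11148', '11149', '11150', '11153', '11152', '11160'];

    $stored = serialize($ids);
    // a:6:{i:0;s:5:"11148";i:1;s:5:"11149";i:2;s:5:"11150";i:3;s:5:"11153";i:4;s:5:"11152";i:5;s:5:"11160";}

    $restored = unserialize($stored);   // back to the original PHP array
    var_dump($restored === $ids);       // bool(true)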
I've not seen this a whole lot, but it's clearly done for ease of implementation. Serializing allows you to store quasi-binary data.
Your second example is a CSV scheme. This is workable for storing small, flat string lists. While it's easier to query or even modify within the database, it takes more effort to marshal to and from the database API, and there is really only limited list support in SQL anyway: http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_find-in-set
However, it's true that the serialized form is unneeded in your example. It's only a requirement if you need to store complex or nested array structures, and in such cases the data blob is seldom accessed or queried within the database.
The Facebook FQL pages show the FQL table structure; there was a screenshot below showing some of it (screenshot now gone).
You will notice that some items are arrays, such as meeting_sex, meeting_for, and current_location. I am just curious: do you think they are storing these as arrays in MySQL, or just returning them as arrays? From this data it really looks to me as if they are stored as arrays. If you think they are, or if you have done something similar, what is a good way to store these items as an array in one table field and then retrieve them as an array on a PHP page?
The correct way to store an array in a database is by storing it as a table, where each element of the array is a row in the table.
Everything else is a hack, and will eventually make you regret your decision to try to avoid an extra table.
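A sketch of "each element of the array is a row"; the table and column names (user_meeting_for with user_id, position, value) and the sample values are invented, and $pdo is a PDO connection:

    $meetingFor = ['Friendship', 'Networking', 'Dating'];

    // One row per array element, with its position preserved.
    $stmt = $pdo->prepare(
        'INSERT INTO user_meeting_for (user_id, position, value) VALUES (:user, :pos, :value)'
    );
    foreach ($meetingFor as $pos => $value) {
        $stmt->execute([':user' => 42, ':pos' => $pos, ':value' => $value]);
    }

    // Reading it back rebuilds the array in order.
    $stmt = $pdo->prepare(
        'SELECT value FROM user_meeting_for WHERE user_id = :user ORDER BY position'
    );
    $stmt->execute([':user' => 42]);
    $meetingFor = $stmt->fetchAll(PDO::FETCH_COLUMN);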
There are two options for storing this as an array:
The first, which you mentioned, is to make one, or several, tables, and enumerate each possible key you intend to store. This is the best for searching and having data that makes sense.
However, for what you want to do, use serialize(). Note: DO NOT EVER EVER EVER try to search against this data in its native string form. It is much faster (and saner) to just reload it, call unserialize(), and then search for your criteria than to develop some crazy search pattern to do your bidding.
EDIT: If it were me, and this were something I was seriously developing for others to use (or even just for myself, to be completely honest), I would probably create a second lookup table to store all the keys as columns. Heck, if you did that, mysql_fetch_assoc() could give you the array you wanted with just a quick second query (or you could extract it via a JOINed query). However, if this is just quick-and-dirty to get whatever job done, then a serialized array may be for you. Unless you really, really don't care about ever searching that data, the proper column-to-key relationship is, I think most would agree, superior.
I guarantee you that Facebook is not storing that data in arrays inside their database.
The thing you have to realize about FQL is that you are not querying Facebook's main data servers directly. FQL is a shell, designed to provide you access to basic social data without letting you run crazy queries on real servers that have performance requirements. Arbitrary user-created queries on the main database would be functional suicide.
FQL provides a well-designed data return structure that is convenient for the type of data that you are querying, so as such, any piece of data that can have multiple associations (such as "meeting_for") gets packaged into an array before it gets returned as an API result.
As other posters have mentioned, the only way to store a programming-language structure (such as an array or an object) inside a database that has no concept of these things is to serialize it. Serializing is expensive, and as soon as you serialize something, you effectively make it unusable for indexing and searching. Being a social network, Facebook needs to index and search almost everything, so this data would never exist in array form inside their main schemas.
Usually the only time you ever want to store serialized data inside a database is if it's temporary, such as session data, or where you have a valid performance requirement to do so. Otherwise, your data quickly becomes useless.
Split it out into other tables. You can serialize it but that will guarantee that you will want to query against that data later. Save yourself the frustration later and just split it out now.
You can serialize the array, insert it, and then unserialize it when you retrieve it.
They might be using multiple tables with many-to-many relationships, but using joins and MySQL's GROUP_CONCAT function to return the values as an array for those columns in one query.
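A sketch of that join + GROUP_CONCAT approach; the table names (user, user_meeting_for) are hypothetical and $pdo is a PDO connection:

    $sql = '
        SELECT u.id,
               u.name,
               GROUP_CONCAT(m.value ORDER BY m.position) AS meeting_for
          FROM user u
     LEFT JOIN user_meeting_for m ON m.user_id = u.id
      GROUP BY u.id, u.name
    ';
    foreach ($pdo->query($sql) as $row) {
        // Turn the concatenated column back into a PHP array on the way out.
        $row['meeting_for'] = $row['meeting_for'] === null
            ? []
            : explode(',', $row['meeting_for']);
        // ... use $row ...
    }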