I used to store data in MySQL as a comma-separated string, like "data1,data2,data3,data4".
Then I decided to store it serialized with PHP.
But the main problem I now face is that I cannot search for an item in those fields.
I was using FIND_IN_SET, but I know that it doesn't work on serialized data.
My question is: should I go back to storing the data with commas?
If you need it to be searchable, you need it expressed as a series of rows that can be queried. If you need it to be in a comma separated format for performance reasons, that's another consideration. Sometimes you have to do both of these things and make sure your application keeps the two different forms in sync.
It is very hard for serialized data to match the performance of a properly indexed table for matching queries; conversely, retrieving a single serialized field can beat having to join in the other results. It's a trade-off.
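As a sketch of the row-per-value approach (the table and column names here are invented for illustration):

```sql
-- Hypothetical child table: one row per value instead of "data1,data2,..."
CREATE TABLE user_tags (
    user_id INT NOT NULL,
    tag     VARCHAR(64) NOT NULL,
    PRIMARY KEY (user_id, tag),
    KEY idx_tag (tag)          -- makes the search below an index lookup
);

-- A searchable replacement for FIND_IN_SET('data2', csv_column):
SELECT user_id FROM user_tags WHERE tag = 'data2';
```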
Related
I am building an application, and while designing my database layer I have this question: should I create a single column where I can store multiple items as JSON, or create a column for each item I want to store?
When should we store JSON in the database and when should we not?
What are the advantages and disadvantages of storing JSON in the database?
So, as an example: for all the columns that don't need to be searched or indexed, would it be more optimal to store their values in a JSON array, or should I stick to creating normalized data tables with multiple columns?
In a relational database you should always try to normalize; seen from that perspective, it is simply wrong to store a JSON string.
As long as you don't want to manipulate or query the content of the JSON, I see no performance impact. This means you only set and get it by primary key.
If you generally want to store objects, and this is not just a special case, I would also suggest taking a look at a NoSQL database like MongoDB, which is built for storing and retrieving objects. Maybe it is something for you, maybe not.
That said, my rule of thumb is:
If you end up with a table with only one row, use XML/JSON (ConfigID|XML/JSON)
If you end up with multiple rows, do normalization
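The set-and-get-by-primary-key pattern mentioned above can be sketched like this, assuming MySQL 5.7+ and an invented user_profiles table:

```sql
-- Hypothetical table: the JSON is only ever written and fetched whole.
CREATE TABLE user_profiles (
    user_id INT PRIMARY KEY,
    profile JSON               -- MySQL 5.7+ validates the JSON on insert
);

INSERT INTO user_profiles VALUES (1, '{"theme": "dark", "lang": "en"}');

-- Whole-document get by primary key: no querying inside the JSON.
SELECT profile FROM user_profiles WHERE user_id = 1;
```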
With your example: for all the columns that don't need to be searched or indexed, you could absolutely use JSON data in MySQL. It can also save storage compared to normalized data tables, so JSON storage in MySQL works well in this case.
In addition, here is a bit more detail on the possible storage formats: JSON or serialized arrays. Personally, I usually use serialized arrays rather than JSON. Unserializing an array or object is more performant, but the format is usable only from PHP (via PHP's serialize() and unserialize() functions). So the choice for you:
Are you using your data only with PHP? If yes: serialized arrays; if no: JSON.
You can read more in the following discussion: JSON vs. Serialized Array in database. Good luck!
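A small sketch comparing the two formats (the settings array is made up; both round-trip losslessly here, but only JSON is readable outside PHP):

```php
<?php
// Hypothetical settings array; neither encoded form below is searchable
// with plain SQL string functions.
$data = ['color' => 'blue', 'sizes' => [1, 2, 3]];

// PHP-native format: compact for PHP, but only PHP can decode it.
$phpBlob = serialize($data);
$fromPhp = unserialize($phpBlob);

// JSON: readable from any language; pass true to decode to an assoc array.
$jsonBlob = json_encode($data);
$fromJson = json_decode($jsonBlob, true);

var_dump($fromPhp === $data);  // bool(true): lossless round trip
var_dump($fromJson === $data); // bool(true) here, though JSON cannot carry
                               // PHP-only types such as custom objects
```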
I am currently unserializing the data retrieved from the database, updating the values as required, then serializing them again and running an UPDATE query.
Similarly, to implement search on the serialized columns, I have to unserialize and compare the values; my search is like a Google search, so you can imagine it is unserializing on every request.
I want to optimize this. What methods could improve my current approach?
Don't use a serialized column in the DB for storing those values; use a proper search engine instead. In my case, I always use Lucene, via Zend_Search_Lucene.
You can build a Lucene index very easily: just define documents and the fields of those documents you wish to store, and searching will then be very efficient. If you want some docs, here and here are some tutorials on how to use it, although I find the official docs, linked first, quite clear.
Hope I can help!
Your approach sounds really inefficient. Have a look at Solr/Lucene for a real document search system.
If the data were not serialised in your database, you would be able to query specific values without the overhead of having to serialise and unserialise.
The best method for updating serialized columns in database is not to have serialized columns at all. Period.
Map your serialized data into related tables and update whatever value using SQL query
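For example, with a hypothetical mapped table settings(user_id, name, value), updating one value becomes a plain SQL statement instead of a read-modify-write of a whole blob:

```sql
-- Instead of unserialize / edit / serialize / UPDATE on a blob column,
-- change exactly one value in place. Table and column names are invented.
UPDATE settings
SET    value = 'weekly'
WHERE  user_id = 42 AND name = 'newsletter';
```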
Okay, let's pretend I've got fifty pieces of information that I want to store in each record of a table. When I pull the data out, I'm going to be doing basic maths on some of them. On any given page request, I'm going to pull out a hundred records and do the calculations.
What are the performance impacts of:
A - storing the data as a serialized array in a single field and doing the crunching in PHP
vs
B - storing the data as fifty numeric fields and having MySQL do some SUMs and AVGs instead?
Please assume that normalization is not an issue for those fifty fields.
Please also assume that I don't need to sort by any of these fields.
Thanks in advance!
First, I would never store data serialized; it's just not portable enough. Perhaps in a JSON-encoded field, but not serialized.
Second, if you're doing anything with the data (searching, aggregating, etc), make them columns in the table. And I do mean anything (sorting, etc).
The only time it's even acceptable to store formatted data (serialized, json, etc) in a column is if it's read only. Meaning that you're not sorting on it, you're not using it in a where clause, you're not aggregating the data, etc.
Database servers are very efficient at doing set-based operations. So if you're doing any kind of aggregation (summing, etc), do it in MySQL. It'll be significantly more efficient than you could make PHP be...
MySQL will almost certainly do these calculations faster than PHP.
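A sketch of option B, with invented column names standing in for the fifty numeric fields:

```sql
-- Per-row maths done in MySQL rather than PHP (metric1..metric50 are
-- placeholders for the fifty numeric columns).
SELECT id,
       metric1 + metric2       AS subtotal,
       (metric1 + metric2) / 2 AS pair_avg
FROM   records
LIMIT  100;

-- Set-based aggregation across rows, where MySQL is at its strongest.
SELECT SUM(metric1) AS total, AVG(metric2) AS average
FROM   records;
```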
While I would almost always recommend option B, I'm running into a unique situation myself where storing serialized into a text field might make more sense.
I have a client who has an application form on their website. There are around 50 fields on the form, and all the data will only ever be read only.
Moreover, this application form may change over time: fields may be added, fields may be removed. By using serialized data, I can save all the questions and answers in a serialized format. If the form changes, the old data stays intact, along with the original questions.
I go with Jonathan! If you have a table where the number of fields varies depending on the options or content the user provides, and those fields are neither aggregated nor calculated, I would serialize (and base64_encode) or json_encode the values too.
Joomla and WordPress do this too. TYPO3 has some tables with lots and lots of columns, and that is kind of ugly :-)
The Facebook FQL pages show the FQL table structure; there was a screenshot below showing some of it (screenshot gone).
You will notice that some items are arrays, such as meeting_sex, meeting_for, and current_location. I am just curious: do you think they are storing these as arrays in MySQL, or just returning them as arrays? From this data it really makes me think they are stored as arrays. If you think so, or if you have done something similar, what is a good way to store these items as an array in one table field and then retrieve them as an array on a PHP page?
The correct way to store an array in a database is by storing it as a table, where each element of the array is a row in the table.
Everything else is a hack, and will eventually make you regret your decision to try to avoid an extra table.
There are two options for storing as an array:
The first, which you mentioned, is to make one, or several, tables, and enumerate each possible key you intend to store. This is the best for searching and having data that makes sense.
However, for what you want to do, use serialize(). Note: DO NOT EVER EVER EVER try to search against this data in its native string form. It is much faster (and saner) to just reload it, call unserialize(), and then search for your criteria than to develop some crazy search pattern to do your bidding.
EDIT: If it were me, and this were something I was seriously developing for others to use (or even for myself, to be completely honest), I would probably create a second lookup table storing all the keys as columns. If you did that, mysql_fetch_assoc() could give you the array you wanted with a quick second query (or you could extract it via a JOINed query). However, if this is just quick-and-dirty to get a job done, then a serialized array may be for you. Unless you really, really don't care about ever searching that data, a proper column-to-key relationship is, I think most would agree, superior.
I guarantee you that Facebook is not storing that data in arrays inside their database.
The thing you have to realize about FQL is that you are not querying Facebook's main data servers directly. FQL is a shell, designed to provide you access to basic social data without letting you run crazy queries on real servers that have performance requirements. Arbitrary user-created queries on the main database would be functional suicide.
FQL provides a well-designed data return structure that is convenient for the type of data that you are querying, so as such, any piece of data that can have multiple associations (such as "meeting_for") gets packaged into an array before it gets returned as an API result.
As other posters have mentioned, the only way to store a programming language structure (such as an array or an object) inside a database (which has no concept of these things), is to serialize it. Serializing is expensive, and as soon as you serialize something, you effectively make it unusable for indexing and searching. Being a social network, Facebook needs to index and search almost everything, so this data would never exist in array form inside their main schemas.
Usually the only time you ever want to store serialized data inside a database is if it's temporary, such as session data, or where you have a valid performance requirement to do so. Otherwise, your data quickly becomes useless.
Split it out into other tables. You can serialize it, but that all but guarantees you will want to query against that data later. Save yourself the frustration and just split it out now.
You can serialize the array, insert it, and then unserialize it when you retrieve it.
They might be using multiple tables with many-to-many relationships, but use joins and MySql's GROUP_CONCAT function to return the values as an array for those columns in one query.
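A sketch of that pattern, assuming hypothetical users, interests, and user_interests tables:

```sql
-- Many-to-many layout collapsed into one "array-like" column per user.
SELECT u.id,
       GROUP_CONCAT(i.name ORDER BY i.name SEPARATOR ',') AS meeting_for
FROM   users u
JOIN   user_interests ui ON ui.user_id     = u.id
JOIN   interests i       ON i.id           = ui.interest_id
GROUP BY u.id;
-- On the PHP side, explode(',', $row['meeting_for']) rebuilds the array.
```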
Let's say I wanted to build an app that pulls URL links from a database and shows 50 on a page. I am just using links as an example.
What if I had to store multiple values, an array, in one MySQL field for each link/record posted?
If there are 5 items for every link, I could have an array of 5 items on the page, a list of 5 items separated by commas, or I could use JSON encode/decode. What would perform best for saving a link to the DB and showing it on the page: implode/explode with a comma-separated list, json_encode/json_decode, or serialize/unserialize of the array?
Serializing is probably best if you don't want to use multiple tables, because it deals with special characters. If you use a comma-separated list, you'll need to worry about values that already contain commas; serialize/unserialize handles this for you. The catch is that serializing is not terribly fast, although your arrays sound quite simple.
The best option is still multiple tables, as that lets you search and/or manipulate the data much more easily later. It also isn't hard in PHP to write a loop that generates SQL to add multiple records to the second table (relating them back to the parent row in the main table).
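A sketch of such a loop; the link_items child table, its columns, and the helper function are all invented for illustration, and the function only builds the parameterized SQL (a minimal sketch, not a full data layer):

```php
<?php
// Build one multi-row INSERT for a hypothetical link_items child table,
// relating each item back to its parent row in links via link_id.
function buildChildInsert(int $linkId, array $items): array {
    $placeholders = [];
    $params = [];
    foreach ($items as $item) {
        $placeholders[] = '(?, ?)';   // one (link_id, item) pair per row
        $params[] = $linkId;
        $params[] = $item;
    }
    $sql = 'INSERT INTO link_items (link_id, item) VALUES '
         . implode(', ', $placeholders);
    return [$sql, $params];           // ready for a prepared statement
}

[$sql, $params] = buildChildInsert(7, ['a', 'b', 'c']);
echo $sql, "\n";
// INSERT INTO link_items (link_id, item) VALUES (?, ?), (?, ?), (?, ?)
```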
BTW, there are many other questions similar to this on SO: Optimal Way to Store/Retrieve Array in Table
The best way, since you ask for it, is to create a table that describes the data you're going to save, and then insert a row for each element. In your example, that means you need two tables, pages and links, where the foreign key links.page_id references pages.id.
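A sketch of those two tables and the per-element rows (the column details beyond pages.id and links.page_id are assumptions):

```sql
CREATE TABLE pages (
    id    INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(255) NOT NULL
);

CREATE TABLE links (
    id      INT AUTO_INCREMENT PRIMARY KEY,
    page_id INT NOT NULL,
    url     VARCHAR(255) NOT NULL,
    FOREIGN KEY (page_id) REFERENCES pages(id)
);

-- One row per array element: the 50 links shown on a page are 50 rows here.
SELECT url FROM links WHERE page_id = 1 LIMIT 50;
```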