On the Facebook FQL pages it shows the FQL table structure; here is a screenshot below showing some of it (screenshot gone).
You will notice that some items are arrays, such as meeting_sex, meeting_for, and current_location. I am just curious: do you think they are storing these as arrays in MySQL, or just returning them as arrays? From this data it really looks like they are stored as arrays. If you think they are, or if you have done something similar, what is a good way to store these items as an array in a single table field and then retrieve them as an array on a PHP page?
The correct way to store an array in a database is by storing it as a table, where each element of the array is a row in the table.
Everything else is a hack, and will eventually make you regret your decision to try to avoid an extra table.
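As a rough illustration of that one-row-per-element layout (the table and column names here are invented, and PDO is just one way to read it back):

```php
<?php
// Hypothetical normalized schema: one row per array element, keyed to the user.
// CREATE TABLE user_meeting_for (
//     user_id     INT NOT NULL,
//     meeting_for VARCHAR(32) NOT NULL,
//     PRIMARY KEY (user_id, meeting_for)
// );

$pdo = new PDO('mysql:host=localhost;dbname=app', 'dbuser', 'dbpass');

// Reading the "array" back: each row becomes one element.
$stmt = $pdo->prepare('SELECT meeting_for FROM user_meeting_for WHERE user_id = ?');
$stmt->execute(array(42));
$meetingFor = $stmt->fetchAll(PDO::FETCH_COLUMN); // e.g. array('Friendship', 'Networking')
```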
There are two options for storing this data:
The first, which you mentioned, is to make one, or several, tables, and enumerate each possible key you intend to store. This is the best for searching and having data that makes sense.
However, for what you want to do, use serialize(). Note: DO NOT EVER EVER EVER try to search against this data in its native string form. It is much faster (and saner) to just reload it, call unserialize(), and then search for your criteria than to develop some crazy search pattern to do your bidding.
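A minimal sketch of that round trip, assuming a users table with a TEXT column called meeting_for (both names are placeholders):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'dbuser', 'dbpass');

// Write: flatten the PHP array into one string and store it in a TEXT column.
$meetingFor = array('Friendship', 'Networking');
$stmt = $pdo->prepare('UPDATE users SET meeting_for = ? WHERE id = ?');
$stmt->execute(array(serialize($meetingFor), 42));

// Read: pull the string back out and turn it into an array again.
$stmt = $pdo->prepare('SELECT meeting_for FROM users WHERE id = ?');
$stmt->execute(array(42));
$meetingFor = unserialize($stmt->fetchColumn());
```

json_encode()/json_decode() works the same way and has the advantage that other languages can read the stored string.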
EDIT: If it were me, and this were something I was seriously developing for others to use (or even for myself, to be completely honest), I would probably create a second lookup table to store all the keys as columns. If you did that, mysql_fetch_assoc() could give you the array you wanted just by running a quick second query (or you could extract it via a JOINed query). However, if this is just quick-and-dirty to get a job done, then a serialized array may be for you. Unless you really, really don't care about ever searching that data, the proper column-to-key relationship is, I think most would agree, superior.
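A sketch of that column-per-key lookup table, using the old mysql_* API the edit mentions (the table, its columns, and the connection details are made up for illustration):

```php
<?php
// Hypothetical lookup table with one column per key:
// CREATE TABLE user_profile_extra (
//     user_id             INT PRIMARY KEY,
//     relationship_status VARCHAR(32),
//     political_view      VARCHAR(32),
//     religion            VARCHAR(64)
// );

mysql_connect('localhost', 'dbuser', 'dbpass');
mysql_select_db('app');

$result = mysql_query(
    'SELECT relationship_status, political_view, religion
       FROM user_profile_extra
      WHERE user_id = 42'
);

// mysql_fetch_assoc() hands back exactly the key => value array you were after.
$extra = mysql_fetch_assoc($result);
// e.g. array('relationship_status' => 'Single', 'political_view' => 'Moderate', ...)
```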
I guarantee you that Facebook is not storing that data in arrays inside their database.
The thing you have to realize about FQL is that you are not querying Facebook's main data servers directly. FQL is a shell, designed to provide you access to basic social data without letting you run crazy queries on real servers that have performance requirements. Arbitrary user-created queries on the main database would be functional suicide.
FQL provides a well-designed data return structure that is convenient for the type of data that you are querying, so as such, any piece of data that can have multiple associations (such as "meeting_for") gets packaged into an array before it gets returned as an API result.
As other posters have mentioned, the only way to store a programming language structure (such as an array or an object) inside a database (which has no concept of these things), is to serialize it. Serializing is expensive, and as soon as you serialize something, you effectively make it unusable for indexing and searching. Being a social network, Facebook needs to index and search almost everything, so this data would never exist in array form inside their main schemas.
Usually the only time you ever want to store serialized data inside a database is if it's temporary, such as session data, or where you have a valid performance requirement to do so. Otherwise, your data quickly becomes useless.
Split it out into other tables. You can serialize it but that will guarantee that you will want to query against that data later. Save yourself the frustration later and just split it out now.
You can serialize the array, insert it, and then unserialize it when you retrieve it.
They might be using multiple tables with many-to-many relationships, and then using joins and MySQL's GROUP_CONCAT function to return the values as an array for those columns in one query.
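A sketch of that idea, with an invented many-to-many schema: GROUP_CONCAT collapses the joined rows into one comma-separated column, and PHP explodes it back into an array.

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'dbuser', 'dbpass');

// One row per user; the joined labels come back as a single comma-separated string.
$sql = 'SELECT u.id, u.name, GROUP_CONCAT(o.label) AS meeting_for
          FROM users u
     LEFT JOIN user_meeting_for um ON um.user_id = u.id
     LEFT JOIN meeting_for_options o ON o.id = um.option_id
      GROUP BY u.id, u.name';

foreach ($pdo->query($sql) as $row) {
    // Turn the concatenated column back into a PHP array.
    $row['meeting_for'] = $row['meeting_for'] === null
        ? array()
        : explode(',', $row['meeting_for']);
    // ... use $row ...
}
```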
Related
I am currently developing an application which requires the storage of a multidimensional array in a database. Currently I take the array, which is three tiers deep and could be any size since it is populated with user-generated information, convert it to a string with json_encode(), and insert it into the database.
The issue with this is that the single field is extremely large and difficult to read, and to update the data I have to retrieve it, decode it, update it, encode it, and write it back. If the site did become used by many people, I am not sure how this would scale.
I think the main alternative would be to create another table, with each row accessed by a unique id stored in the main table, but again I am unsure how well this would scale.
Which is better? Any help much appreciated :)
If you are not sure how many tiers deep your array could be, it's really hard to design a DB schema for it, so the best way to store it is in a text field, which is what you are doing already.
As far as scaling is concerned, with proper indexing and partitioning you can scale your application; one field with a large text value has little to do with scaling.
Creating multiple database tables and relationships between these tables to group your data together will probably be easier to maintain and will give you the ability to query/filter results based on columns rather than having to retrieve your data, decode it and iterate over it in PHP.
I think if you are not sure how deep your array can be, then building a well-constructed database schema for it will be a bit difficult, so just do as you have planned. I would suggest making use of functions as much as you can.
Write one function for encoding and one for decoding the JSON data (of any depth), and use those functions to carry out your updates.
Using separate functions will keep your project tidy and your code won't get messy.
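A sketch of what those wrapper functions might look like, assuming a single user_data table with a unique user_id and a TEXT column called data (all names are placeholders):

```php
<?php
// Central helpers so every read and update goes through one encode/decode path.
function loadUserData(PDO $pdo, $userId) {
    $stmt = $pdo->prepare('SELECT data FROM user_data WHERE user_id = ?');
    $stmt->execute(array($userId));
    $json = $stmt->fetchColumn();
    return $json === false ? array() : json_decode($json, true);
}

function saveUserData(PDO $pdo, $userId, array $data) {
    // user_id is assumed to be a unique key, so this inserts or updates.
    $stmt = $pdo->prepare(
        'INSERT INTO user_data (user_id, data) VALUES (?, ?)
         ON DUPLICATE KEY UPDATE data = VALUES(data)'
    );
    $stmt->execute(array($userId, json_encode($data)));
}

// Usage: load, change one branch of the nested array, save.
$pdo  = new PDO('mysql:host=localhost;dbname=app', 'dbuser', 'dbpass');
$data = loadUserData($pdo, 42);
$data['answers']['section1']['q3'] = 'updated value';
saveUserData($pdo, 42, $data);
```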
I'm working on some reusable code, basically. My idea is that I'd like to create an array based on a row in a database, where each column is the array's keys. The program then modifies the array, adding new keys if they weren't already in the database, and at the end of the program, the new array data is put back into the database, adding any new columns if they didn't exist first. Thus when making a new program with this reusable code, you don't have to mess with creating all the database columns.
I'm just looking for it to be an array, not some complex object. Kinda like the same way you would use $_SESSION or such. The database wouldn't change frequently, I'm only suggesting that the tables are created when the new program first runs, then don't change (so long as the programmer knows what he's doing). The array would be used securely; you wouldn't put user input into a $_SESSION key, would you?
So, a few questions.
Firstly, is this even a good idea?
Second, are there any similar stand-alone solutions already available which I can use or reference?
Finally, is there anything I should know about how to go about doing it if I need to from scratch?
Thank you a lot for any opinions or knowledge on this technique.
Well, if the programmer knows what columns he is going to use ahead of time, then he should just create the table. If the programmer doesn't know what the fields are called (they're determined by external forces like users, web service calls, etc), then you are opening yourself up for a major world of hurt as you have basically just passed all validation of data integrity to an outside source.
Outside sources are completely beyond your control and can do such lovely things as send bad data, especially if they happen to be users, or things operated by users, or things built by humans, or... well... anything else.
The rest of what you're talking about (select from a DB, modify returned value, save result) can be accomplished with things called Object-Relational-Maps. I can think of two good, standalone ORM systems in PHP: Doctrine and Propel.
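As a rough sketch of the select-modify-save cycle an ORM gives you, here is roughly what it looks like with Doctrine 2; the User entity class (with a setEmail() accessor) is assumed to exist and be mapped, and the connection details are placeholders:

```php
<?php
require 'vendor/autoload.php';

use Doctrine\ORM\Tools\Setup;
use Doctrine\ORM\EntityManager;

// Bootstrap (simplified): point Doctrine at the directory holding the mapped entities.
$config = Setup::createAnnotationMetadataConfiguration(array(__DIR__ . '/src'), true);
$entityManager = EntityManager::create(
    array('driver' => 'pdo_mysql', 'host' => 'localhost',
          'dbname' => 'app', 'user' => 'dbuser', 'password' => 'dbpass'),
    $config
);

$user = $entityManager->find('User', 42);   // SELECT: the row comes back as an object
$user->setEmail('new@example.com');          // modify the returned value in plain PHP
$entityManager->flush();                     // the ORM writes the UPDATE for you
```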
Database structures shouldn't change frequently, which is what it sounds like your solution is intended to do. Usually creating any given table is just a single query once, with the occasional 'alter' as business needs change over time. Allowing for random mutability at the drop of a hat sounds like it'd be a nightmare to support.
Even if you did make it easy to add/alter/remove tables like this, there's still all the associated overhead of actually USING the new fields, removing deleted fields from existing code, yada yada yada.
I agree with others that traditional database tables shouldn't change like that. I'd suggest you take a look at document databases like MongoDB: you can save the array to the database as-is, and you don't need to worry about the structure changing.
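A sketch using the old Mongo PECL extension that was current at the time (the newer mongodb library uses different calls); the database, collection, and field names are invented:

```php
<?php
// The whole nested array is stored as one document, so new keys need no schema change.
$mongo      = new Mongo();              // defaults to localhost:27017
$collection = $mongo->app->user_data;   // database "app", collection "user_data"

// Insert: the PHP array goes in as-is, however deep it is.
$collection->insert(array(
    'user_id' => 42,
    'answers' => array('section1' => array('q1' => 'yes', 'q2' => array('a', 'b'))),
));

// Read it back, modify a nested key, and save the document again.
$doc = $collection->findOne(array('user_id' => 42));
$doc['answers']['section1']['q3'] = 'new value';
$collection->save($doc);
```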
I am working on a project that is being built with a standard LAMP stack. Currently, I merely output the results of the query onto the page - the results are not being stored in objects at all.
In the future, I would like to edit the results of the query. I imagine that this would be much easier if the results were stored in PHP objects.
Would it be more beneficial to store the objects themselves in a DB (via serialization/deserialization), or to create the objects when need be (after executing the query) and then destroying them when they are no longer needed?
You'd be better off storing a copy of the results directly in your object, rather than a serialized result handle. Serializing the result handle will NOT preserve locks, server-side variables, table state, transactions, or the data in the result set. MySQL has no provision for storing a connection handle in this fashion, so it'd be seen as a regular disconnect, resulting in outstanding queries being cleaned up, variables destroyed, transactions rolled back (or committed), etc.
As well, the data retrieved by the query is not actually fetched across the connection until you do a fetch_row()-type call, so you'd not even have that in your serialized handle.
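A minimal sketch of "store a copy of the results in your object" with mysqli (the class name and query are invented): the rows are copied into plain arrays, so the object no longer depends on a live connection or result handle.

```php
<?php
class QueryResult
{
    public $rows = array();

    public function __construct(mysqli_result $result)
    {
        // Copy every row out of the result set into plain PHP arrays.
        while ($row = $result->fetch_assoc()) {
            $this->rows[] = $row;
        }
        $result->free();
    }
}

$mysqli = new mysqli('localhost', 'dbuser', 'dbpass', 'app');
$posts  = new QueryResult($mysqli->query('SELECT id, title FROM posts'));

// $posts->rows is ordinary data now: it can be edited, cached, or serialized
// without keeping the MySQL result handle around.
```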
Always create the objects in PHP and destroy them later. To store serialized objects you would need a LONGTEXT or similar field, which is slow and cannot be indexed. If you are always doing a select-all, then go ahead, but if you ever use conditions or advanced queries, you should keep the data in separate columns.
It depends on many factors. If you are running the exact same queries again and again, then yes, store the results in your database. But why serialize them? If you tried object-relational mapping, you could have a much easier-to-maintain query object that you could store in a well-organised relational database.
If you are not running the same queries very often, I would recommend caching the output in another way.
Would it be more beneficial to store the objects themselves in a DB (via serialization/deserialization), or to create the objects when need be (after executing the query) and then destroying them when they are no longer needed?
No. Somebody somewhere has done this for you. What would be beneficial is for you to use an existing ORM. It doesn't matter which one, just pick one and use it. You'll be lightyears ahead and get your project out the door in a fraction of the time.
You should use a PHP framework while you're at it, many of which come coupled to an ORM.
I just saw the first comment to this question, Inserting into a serialized array in PHP, and it made me wonder: why? Especially since, when you use database-managed sessions (database-based session handling), that is exactly what happens; the session handler inserts a serialized array into a database field.
There's nothing wrong with this in certain contexts. Session management is definitely one of those instances where this would be deemed acceptable. The thing to remember is that if you ever find yourself trying to relate data between the serialized data and any fields in your database you've made a huge design flaw and unfortunately this is something that I have seen people try to do.
Take any "never do x" with a grain of salt as almost any technique can be the correct one in certain circumstances. The advice is usually directed towards noobies who are very apt to misunderstand proper usage and code themselves into a very nasty corner.
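For reference, this is roughly what that acceptable case looks like: a database-backed session handler that drops PHP's already-serialized session payload into a single column (the sessions table is invented):

```php
<?php
// CREATE TABLE sessions (id VARCHAR(128) PRIMARY KEY, data TEXT, updated_at INT);
$pdo = new PDO('mysql:host=localhost;dbname=app', 'dbuser', 'dbpass');

session_set_save_handler(
    function () { return true; },                    // open
    function () { return true; },                    // close
    function ($id) use ($pdo) {                      // read
        $stmt = $pdo->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute(array($id));
        return (string) $stmt->fetchColumn();
    },
    function ($id, $data) use ($pdo) {               // write: $data is the serialized blob
        $stmt = $pdo->prepare(
            'REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, ?)'
        );
        return $stmt->execute(array($id, $data, time()));
    },
    function ($id) use ($pdo) {                      // destroy
        return $pdo->prepare('DELETE FROM sessions WHERE id = ?')
                   ->execute(array($id));
    },
    function ($maxlifetime) use ($pdo) {             // garbage collection
        return $pdo->prepare('DELETE FROM sessions WHERE updated_at < ?')
                   ->execute(array(time() - $maxlifetime));
    }
);
session_start();
```

Note that nothing here ever tries to relate the contents of the data column to other fields, which is exactly the line the answer above says not to cross.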
How certain are you that you'll never want to get at that data from any platform other than PHP?
I don't know about PHP's form of serialization, but the default binary serialization format from every platform I do know about is not interoperable with other platforms... typically it's not a good idea to put data encoded for just a single frontend into a database.
Even if you don't end up using any other languages, it means the database itself isn't going to know anything about the information - so you won't be able to query on it etc. Maybe that's not a problem in your case - but it's definitely something to bear in mind.
The main argument against serialized data is that it is hard to search through, and impossible to search efficiently, i.e., without retrieving the records in the first place.
Depends on the data. By storing a language-specific data structure in a field you're tied to that language, and you're also giving up everything the DB can give you. You won't have indexes on specific fields, can't run simple updates, can't extract partial data, can't have data checks, referential integrity, and so on.
I am currently unserializing the data retrieved from the database, updating the values as my requirements dictate, then serializing the values and running an update query.
Similarly, to implement search on the serialized columns, I have to unserialize and compare the values; my search works like a Google-style search. So you can imagine it is serializing and unserializing on every request.
I want to optimize this. What methods could improve my current approach?
Don't use a serialized DB column for storing those values; use a proper search engine instead. In my case I always use Lucene, via Zend_Search_Lucene.
You can build a Lucene index very easily, just by defining documents and the fields you wish to store in those documents; searching will then be very efficient. If you want some docs, here and here are tutorials on how to use it, although I find the official docs, linked first, quite clear.
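A sketch of the indexing and searching flow with Zend_Search_Lucene; the $rows variable, the field names (db_id, title, body), and the index path are all assumptions for illustration:

```php
<?php
require_once 'Zend/Search/Lucene.php';

// Build the index once: unserialize each row at index time, not at search time.
$index = Zend_Search_Lucene::create('/path/to/index');

foreach ($rows as $row) {                 // $rows fetched from the database earlier
    $values = unserialize($row['data']);

    $doc = new Zend_Search_Lucene_Document();
    $doc->addField(Zend_Search_Lucene_Field::Keyword('db_id', $row['id']));
    $doc->addField(Zend_Search_Lucene_Field::Text('title', $values['title']));
    $doc->addField(Zend_Search_Lucene_Field::UnStored('body', $values['body']));
    $index->addDocument($doc);
}
$index->commit();

// Search: Lucene returns hits carrying the stored db_id, so only the matching
// database rows ever need to be fetched and unserialized.
$hits = Zend_Search_Lucene::open('/path/to/index')->find('title:widget');
foreach ($hits as $hit) {
    echo $hit->db_id . ' (score ' . $hit->score . ")\n";
}
```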
Hope I can help!
Your approach sounds really inefficient. Have a look at Solr/Lucene for a real document search system.
If the data is not serialised in your database then you should be able to query specific values without the overhead of having to serialise and unserialise.
The best method for updating serialized columns in a database is not to have serialized columns at all. Period.
Map your serialized data into related tables and update whatever value you need with a plain SQL query.
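A sketch of the difference once the data is mapped out (the user_prefs table is invented): the read-unserialize-modify-serialize-write cycle collapses into single indexed statements.

```php
<?php
// Hypothetical mapped table:
// CREATE TABLE user_prefs (
//     user_id INT NOT NULL,
//     name    VARCHAR(64) NOT NULL,
//     value   VARCHAR(255),
//     PRIMARY KEY (user_id, name)
// );
$pdo = new PDO('mysql:host=localhost;dbname=app', 'dbuser', 'dbpass');

// Updating one value is a single UPDATE instead of fetch/unserialize/serialize/write.
$stmt = $pdo->prepare('UPDATE user_prefs SET value = ? WHERE user_id = ? AND name = ?');
$stmt->execute(array('1', 42, 'newsletter'));

// Searching becomes a plain indexed query instead of unserialize-and-compare per row.
$stmt = $pdo->prepare('SELECT user_id FROM user_prefs WHERE name = ? AND value = ?');
$stmt->execute(array('newsletter', '1'));
$userIds = $stmt->fetchAll(PDO::FETCH_COLUMN);
```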