I just saw the first comment on this question, Inserting into a serialized array in PHP, and it made me wonder: why not? Especially since, when you use database-managed sessions (database-based session handling), that is exactly what happens: the session handler inserts a serialized array into a database field.
There's nothing wrong with this in certain contexts, and session management is definitely one of those instances where it's acceptable. The thing to remember is that if you ever find yourself trying to relate the serialized data to fields in your database, you've made a huge design flaw, and unfortunately that is something I have seen people try to do.
Take any "never do X" with a grain of salt, as almost any technique can be the correct one in certain circumstances. The advice is usually directed at newbies, who are very apt to misunderstand proper usage and code themselves into a very nasty corner.
How certain are you that you'll never want to get at that data from any platform other than PHP?
I don't know about PHP's form of serialization, but the default binary serialization format of every platform I do know about is not interoperable with other platforms... it's typically not a good idea to put data encoded for just a single frontend into a database.
Even if you don't end up using any other languages, it means the database itself isn't going to know anything about the information - so you won't be able to query on it etc. Maybe that's not a problem in your case - but it's definitely something to bear in mind.
The main argument against serialized data is that it is hard to search through, and impossible to search efficiently, i.e., without retrieving and deserializing every record first.
Depends on the data. By storing a language-specific data structure in a field you're tied to that language, and you're also giving up everything the DB can give you: you won't have indexes on specific fields, can't run simple updates, can't extract partial data, can't have data checks, referential integrity, and so on.
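To make the trade-off concrete, here is a hedged sketch (the users table and city column are invented) of what searching looks like in each case:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// With a real column, the database can use an index:
$stmt = $pdo->prepare('SELECT id FROM users WHERE city = ?');
$stmt->execute(array('Berlin'));

// With a serialized blob, the best you can do is a fragile substring match
// that scans every row and depends on PHP's serialization format:
$stmt = $pdo->prepare('SELECT id FROM users WHERE data LIKE ?');
$stmt->execute(array('%s:4:"city";s:6:"Berlin";%'));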
Related
I have a MySQL table with about 9.5K rows; these won't change much, but I may slowly add to them.
I have a process where, if someone scans a barcode, I have to check whether that barcode matches a value in this table. What would be the fastest way to accomplish this? I should mention there is no pattern to these values.
Here Are Some Thoughts
An AJAX call to a PHP file that queries the MySQL table (my thought is this would be the slowest).
Load this MySQL table into an array on login. Then, when scanning, make an AJAX call to a PHP file to check the array.
Load this table into an array on login. When viewing the scanning page, somehow load that array into a JavaScript array and check it with JavaScript. (This seems to me to be the fastest because it eliminates both the AJAX call and the MySQL query. Would it be efficient to split it into smaller arrays so I don't lag the server and browser?)
Honestly, I'd never load the entire table for anything. All I'd do is make an AJAX request back to a PHP gateway that then queries the database, and returns the result (or nothing). It can be very fast (as it only depends on the latency) and you can cache that result heavily (via memcached, or something like it).
There's really no reason to ever load the entire array for "validation"...
It's much faster to use a well-indexed MySQL table than to look through an array for something.
But in the end it all depends on what you really want to do with the data.
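A minimal sketch of such a gateway, assuming a hypothetical barcodes table with an indexed barcode column:

<?php
// barcode-check.php - the AJAX endpoint; table and column names are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$stmt = $pdo->prepare('SELECT 1 FROM barcodes WHERE barcode = ? LIMIT 1');
$stmt->execute(array($_GET['code']));

// Return a tiny JSON payload for the scanning page to act on.
header('Content-Type: application/json');
echo json_encode(array('valid' => (bool) $stmt->fetchColumn()));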
As you mention, your table contains around 9.5K rows. There is no point in loading that data on the login or scanning page.
It's better to index your table and make an AJAX call whenever required.
Best of luck!
While 9.5K rows are not that much, that amount of data would still take some time to transfer.
Therefore - and in general - I'd propose running the validation of values on the server side. AJAX is the right technology to do this quite easily.
Loading all 9.5K rows only to find one specific row is definitely a waste of resources. Run a SELECT query for the single value.
Exposing PHP functionality on the client side / AJAX
Have a look at the xajax project, which allows you to expose whole PHP classes or single methods as AJAX methods on the client side. Moreover, xajax helps with the exchange of parameters between client and server.
Indexing the searched attributes
Please ensure that the column which holds the barcode value is indexed. If the verification process tends to be slow, look out for MySQL table scans.
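For example, assuming a hypothetical barcodes table, the index can be created once like this:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
// One-off DDL: index the barcode column so lookups don't scan the whole table.
$pdo->exec('ALTER TABLE barcodes ADD INDEX idx_barcode (barcode)');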
Avoiding table scans
To avoid table scans and keep your queries fast, use fixed-size fields. VARCHAR(), among other types, makes queries slower, since rows no longer have a fixed size, and tables without fixed-size rows effectively prevent the database from easily predicting the location of the next row in the result set. Therefore, use e.g. CHAR(20) instead of VARCHAR().
Finally: Security!
Don't forget that any data transferred to the client side may expose sensitive information. While your 9.5K rows may not get rendered by the client's browser, the rows do exist in the generated HTML page. Using "View source", any user would be able to figure out all valid numbers.
Exposing valid barcode values may or may not be a security problem in your project context.
PS: While not related to your question, I'd propose using PHPExcel for reading or writing spreadsheet data. Unlike other solutions, e.g. a PEAR-based framework, PHPExcel has no external dependencies.
I'm working on some reusable code, basically. My idea is that I'd like to create an array based on a row in a database, where each column is the array's keys. The program then modifies the array, adding new keys if they weren't already in the database, and at the end of the program, the new array data is put back into the database, adding any new columns if they didn't exist first. Thus when making a new program with this reusable code, you don't have to mess with creating all the database columns.
I'm just looking for it to be an array, not some complex object. Kinda like the same way you would use $_SESSION or such. The database wouldn't change frequently, I'm only suggesting that the tables are created when the new program first runs, then don't change (so long as the programmer knows what he's doing). The array would be used securely; you wouldn't put user input into a $_SESSION key, would you?
So, a few questions.
Firstly, is this even a good idea?
Second, are there any similar stand-alone solutions already available which I can use or reference?
Finally, is there anything I should know about how to go about doing it if I need to from scratch?
Thank you a lot for any opinions or knowledge on this technique.
Well, if the programmer knows what columns he is going to use ahead of time, then he should just create the table. If the programmer doesn't know what the fields are called (they're determined by external forces like users, web service calls, etc), then you are opening yourself up for a major world of hurt as you have basically just passed all validation of data integrity to an outside source.
Outside sources are completely beyond your control and can do such lovely things as send bad data, especially if they happen to be users, or things operated by users, or things built by humans, or... well... anything else..
The rest of what you're talking about (select from a DB, modify the returned value, save the result) can be accomplished with Object-Relational Mappers. I can think of two good, standalone ORM systems in PHP: Doctrine and Propel.
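If a full ORM is overkill, the underlying select-modify-save cycle is easy to sketch with PDO (the settings table and its columns are invented for illustration):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Load one row as an associative array: column names become array keys.
$stmt = $pdo->prepare('SELECT * FROM settings WHERE id = ?');
$stmt->execute(array(1));
$row = $stmt->fetch(PDO::FETCH_ASSOC);

// Modify existing keys only - no schema changes at runtime.
$row['theme'] = 'dark';

// Save the row back.
$stmt = $pdo->prepare('UPDATE settings SET theme = ? WHERE id = ?');
$stmt->execute(array($row['theme'], $row['id']));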
Database structures shouldn't change frequently, which is what it sounds like your solution is intended to do. Usually creating any given table is just a single query once, with the occasional 'alter' as business needs change over time. Allowing for random mutability at the drop of a hat sounds like it'd be a nightmare to support.
Even if you did make it easy to add/alter/remove tables like this, there's still all the associated overhead of actually USING the new fields, removing deleted fields from existing code, yada yada yada.
I agree with others that traditional database tables shouldn't change like that. I'd suggest you take a look at document databases like MongoDB: you can save the array to the database as it is, and you don't need to worry about the changing structure.
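For instance, with the mongodb/mongodb Composer library (the database and collection names here are assumptions), the array is stored as-is:

<?php
require 'vendor/autoload.php'; // composer require mongodb/mongodb

$client = new MongoDB\Client('mongodb://localhost:27017');
$collection = $client->app->documents;

// The whole PHP array becomes one document; no fixed schema required.
$collection->insertOne(array(
    'name' => 'example',
    'settings' => array('theme' => 'dark', 'new_key' => 42),
));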
Okay, let's pretend I've got fifty pieces of information that I want to store in each record of a table. When I pull the data out, I'm going to be doing basic maths on some of them. On any given page request, I'm going to pull out a hundred records and do the calculations.
What are the performance impacts of:
A - storing the data as a serialized array in a single field and doing the crunching in PHP
vs.
B - storing the data as fifty numeric fields and having MySQL do some sums and averages instead?
Please assume that normalization is not an issue in those fifty fields.
Please also assume that I don't need to sort by any of these fields.
Thanks in advance!
First, I would never store data serialized; it's just not portable enough. Perhaps in a JSON-encoded field, but not serialized.
Second, if you're doing anything with the data (searching, aggregating, etc), make them columns in the table. And I do mean anything (sorting, etc).
The only time it's even acceptable to store formatted data (serialized, json, etc) in a column is if it's read only. Meaning that you're not sorting on it, you're not using it in a where clause, you're not aggregating the data, etc.
Database servers are very efficient at set-based operations. So if you're doing any kind of aggregation (summing, etc.), do it in MySQL. It'll be significantly more efficient than anything you could do in PHP...
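For example (a sketch with invented table and column names), option B lets MySQL ship back just two numbers instead of fifty fields times a hundred rows:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// The database does the set-based work and returns only the aggregates.
$stmt = $pdo->query('SELECT SUM(field1) AS total, AVG(field2) AS average FROM stats');
$row = $stmt->fetch(PDO::FETCH_ASSOC);

echo $row['total'], ' ', $row['average'];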
MySQL will almost certainly be doing these calculations faster than PHP.
While I would almost always recommend option B, I'm running into a unique situation myself where storing serialized data in a text field might make more sense.
I have a client who has an application form on their website. There are around 50 fields on the form, and all the data will only ever be read only.
Moreover, this application may change over time: fields may be added, fields may be removed. By using serialized data, I can save all the questions and answers in a serialized format. If the form changes, the old data stays intact, along with the original questions.
I go with Jonathan! If you have a table where the number of fields varies depending on the options or content the user provides, and those fields are neither aggregated nor calculated, I would serialize() (and base64_encode()) or json_encode() the values too.
Joomla and WordPress do this too. TYPO3 has some tables with lots and lots of columns, and that is kind of ugly :-)
On the Facebook FQL pages it shows the FQL table structure; a screenshot below showed some of it, but it is no longer available.
You will notice that some items are arrays, such as meeting_sex, meeting_for, and current_location. I am just curious: do you think they are storing these as arrays in MySQL, or just returning them as arrays? From this data it really makes me think they are stored as arrays. If you think so, or if you have done something similar, what is a good way to store these items as an array in one table field and then retrieve it as an array on a PHP page?
The correct way to store an array in a database is by storing it as a table, where each element of the array is a row in the table.
Everything else is a hack, and will eventually make you regret your decision to try to avoid an extra table.
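A hedged sketch of that approach (all names invented): each array element becomes one row keyed back to its owner.

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Assumed schema: user_meeting_for(user_id INT, value VARCHAR(50))
$meetingFor = array('Friendship', 'Networking');

$stmt = $pdo->prepare('INSERT INTO user_meeting_for (user_id, value) VALUES (?, ?)');
foreach ($meetingFor as $value) {
    $stmt->execute(array(42, $value)); // one row per array element
}

// Reading it back yields the array again:
$stmt = $pdo->prepare('SELECT value FROM user_meeting_for WHERE user_id = ?');
$stmt->execute(array(42));
$meetingFor = $stmt->fetchAll(PDO::FETCH_COLUMN);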
There are two options for storing as an array:
The first, which you mentioned, is to make one, or several, tables, and enumerate each possible key you intend to store. This is the best for searching and having data that makes sense.
However, for what you want to do, use serialize(). Note: DO NOT EVER EVER EVER try to search against this data in its native string form. It is much faster (and saner) to just reload it, call unserialize(), and then search for your criteria than to develop some crazy search pattern to do your bidding.
EDIT: If it were me, and this were something I was seriously developing for others to use (or even for myself to use, to be completely honest), I would probably create a second lookup table to store all the keys as columns; Heck, if you did that, mysql_fetch_assoc() could give you the array you wanted just by running a quick second query (or you could extract them out via a JOINed query). However, if this is just quick-and-dirty to get whatever job done, then a serialized array may be for you. Unless you really, really don't care about ever searching that data, the proper column-to-key relationship is, I think most would agree, superior.
I guarantee you that Facebook is not storing that data in arrays inside their database.
The thing you have to realize about FQL is that you are not querying Facebook's main data servers directly. FQL is a shell, designed to provide you access to basic social data without letting you run crazy queries on real servers that have performance requirements. Arbitrary user-created queries on the main database would be functional suicide.
FQL provides a well-designed data return structure that is convenient for the type of data that you are querying, so as such, any piece of data that can have multiple associations (such as "meeting_for") gets packaged into an array before it gets returned as an API result.
As other posters have mentioned, the only way to store a programming language structure (such as an array or an object) inside a database (which has no concept of these things), is to serialize it. Serializing is expensive, and as soon as you serialize something, you effectively make it unusable for indexing and searching. Being a social network, Facebook needs to index and search almost everything, so this data would never exist in array form inside their main schemas.
Usually the only time you ever want to store serialized data inside a database is if it's temporary, such as session data, or where you have a valid performance requirement to do so. Otherwise, your data quickly becomes useless.
Split it out into other tables. You can serialize it but that will guarantee that you will want to query against that data later. Save yourself the frustration later and just split it out now.
You can serialize the array, insert it, and then unserialize it when you retrieve it.
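A minimal round trip along those lines (the profiles table is an assumption):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$data = array('meeting_for' => array('Friendship', 'Networking'));

// Store the serialized string in a TEXT column...
$stmt = $pdo->prepare('UPDATE profiles SET data = ? WHERE id = ?');
$stmt->execute(array(serialize($data), 42));

// ...and turn it back into a PHP array on the way out.
$stmt = $pdo->prepare('SELECT data FROM profiles WHERE id = ?');
$stmt->execute(array(42));
$data = unserialize($stmt->fetchColumn());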
They might be using multiple tables with many-to-many relationships, using joins and MySQL's GROUP_CONCAT function to return the values as an array for those columns in one query.
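Roughly like this (the schema is invented, and the sketch is simplified to a single join); GROUP_CONCAT's comma-separated string only needs an explode() on the PHP side:

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Collapse the related rows into one delimited string per user.
$sql = 'SELECT u.id, GROUP_CONCAT(m.value) AS meeting_for
        FROM users u
        JOIN user_meeting_for m ON m.user_id = u.id
        GROUP BY u.id';

foreach ($pdo->query($sql) as $row) {
    $meetingFor = explode(',', $row['meeting_for']); // back to a PHP array
}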
I have a sort of vague question that you guys are perfect for answering. Many times I've had a form for my users to fill out that consisted of many different pages. So far, I've been saving the data in a session, but I'm a little worried about that practice, since a session could expire, and it seems a rather volatile way of doing it.
I could see, for example, having a table for temporary forms in SQL that you save to at the end of each page. I could see posting all the data taken so far to the next page. Things along those lines. How do you guys do it? What's good practice for these situations?
Yes, you can definitely save the intermediate data in the database, and then flip some bit to indicate that the record is finished when the user submits the final result. Depending on how you are splitting up the data collection, each page may be creating a row in a different table (with some key tying them together).
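As a rough sketch of that finished-flag idea (the form_progress schema and variables here are invented for illustration):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$workItemId  = 7;     // hypothetical id of the user's work item
$isFinalPage = false; // set to true on the last page of the form

// Save this page's answers against the work item; assumed schema:
// form_progress(id INT, data TEXT, is_complete TINYINT DEFAULT 0)
$stmt = $pdo->prepare('UPDATE form_progress SET data = ? WHERE id = ?');
$stmt->execute(array(json_encode($_POST), $workItemId));

// Only the final submit flips the "finished" bit.
if ($isFinalPage) {
    $pdo->prepare('UPDATE form_progress SET is_complete = 1 WHERE id = ?')
        ->execute(array($workItemId));
}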
You may also want to consider saving the data in a more free-form manner, such as XML in a single column. This will allow you to maintain complex data structures in a simple schema, but it will make querying the data difficult (unless your database supports XML column types, which most modern enterprisey databases do).
Another advantage to storing the interim data in the database is that the user can return to it later if he wishes. Just send the user an email when he starts, with a link to his work item. Of course, you may need to add whatever security layers on top of that to make sure someone else doesn't return to his work item.
Storing the interim data in the DB also allows the user to skip around from one page to another, and revisit past pages.
Hidden fields are also a good approach, but they will not allow the user to return later.
I would avoid storing large data structures in session, since if the user doesn't invalidate the session explicitly, and if you don't have a good mechanism for cleaning up old sessions, these expired sessions may stick around for a long time.
In the end, it really depends on your specific business needs, but hopefully this gives you something to think about.
I would stick with keeping the data in the session, as it is more or less temporary at this stage. What would you do if a user did not complete the forms? You would have to check the SQL table for incomplete data regularly, making your whole application more complex.
By the way, there is a reason for sessions expiring, namely security. And you can define yourself when the session expires.
Why not just pass things along in hidden parameters?
Ahh, good question.
I've found a great way to handle this (if it's linear). The following works especially well if you are including different content (pages) into one PHP page (MVC, for example). However, if you need to go from URL to URL, it can be difficult, because you cannot POST across a redirect (well, you can, but no browsers support it).
You can fill in the details.
<?php
// Restore data from the previous page, or start fresh on the first page.
$data = isset($_POST['data'])
    ? unserialize(base64_decode($_POST['data']))
    : array();

// ... add this page's form values to $data here ...

// Serialize and encode the array so it survives the round trip.
$encoded = base64_encode(serialize($data));
?>
<input type="hidden" name="data" value="<?= htmlspecialchars($encoded, ENT_QUOTES); ?>" />