crunching serialized data vs adding more fields - php - mysql

Okay, let's pretend I've got fifty pieces of information that I want to store in each record of a table. When I pull the data out, I'm going to be doing basic maths on some of them. On any given page request, I'm going to pull out a hundred records and do the calculations.
what are the performance impacts of:
A - storing the data as a serialized array in a single field and doing the crunching in PHP
vs
B - storing the data as fifty numeric fields and having MySQL do some sums and avgs instead
Please assume that normalization is not an issue in those fifty fields.
Please also assume that I don't need to sort by any of these fields.
Thanks in advance!

First, I would never store data serialized; it's just not portable enough. Perhaps in a JSON-encoded field, but not serialized.
Second, if you're doing anything with the data (searching, aggregating, etc.), make them columns in the table. And I do mean anything (sorting, etc.).
The only time it's even acceptable to store formatted data (serialized, JSON, etc.) in a column is if it's read-only, meaning that you're not sorting on it, not using it in a WHERE clause, and not aggregating the data.
Database servers are very efficient at set-based operations, so if you're doing any kind of aggregation (summing, etc.), do it in MySQL. It'll be significantly more efficient than anything you could write in PHP...
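For illustration, here's what option B might look like; the table name, column names, and the WHERE key below are all hypothetical:

    -- Option B: let MySQL aggregate the hundred records in one pass.
    -- Table and column names are hypothetical.
    SELECT SUM(field1)          AS total_field1,
           AVG(field2)          AS avg_field2,
           SUM(field3 + field4) AS combined
    FROM   records
    WHERE  page_id = 42;  -- whatever selects the hundred records

With option A you would instead SELECT the serialized blob for each of the hundred rows, unserialize() them in PHP, and loop over the arrays to compute the same sums and averages.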

MySQL will almost certainly be doing these calculations faster than PHP.

While I would almost always recommend option B, I'm running into a unique situation myself where storing serialized data in a text field might make more sense.
I have a client who has an application form on their website. There are around 50 fields on the form, and all the data will only ever be read-only.
Moreover, this application may change over time. Fields may be added, fields may be removed. By using serialized data, I can save all the questions and answers in a serialized format. If the form changes, the old data stays intact, along with the original questions.
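As a minimal sketch of that approach (the table, column, and variable names are hypothetical, and JSON is used instead of serialize() for the portability reasons given above):

    <?php
    // Store questions and answers together, so old submissions keep
    // their original questions even if the form changes later.
    // $pdo is an existing PDO connection; $formFields maps input
    // names to question labels (both hypothetical).
    $submission = [];
    foreach ($formFields as $name => $label) {
        $submission[] = [
            'question' => $label,
            'answer'   => $_POST[$name] ?? null,
        ];
    }

    $stmt = $pdo->prepare(
        'INSERT INTO applications (submitted_at, payload) VALUES (NOW(), ?)'
    );
    $stmt->execute([json_encode($submission)]);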

I go with Jonathan! If you have a table where the number of fields varies depending on the choices the user makes, and those fields are neither aggregated nor calculated, I would serialize() (and base64_encode()) or json_encode() the values too.
Joomla and WordPress do this too. TYPO3 has some tables with lots and lots of columns, and that is kind of ugly :-)

Related

MySQL + PHP: ~500kb JSON file - Loading data as tables and fields VS as a single serialized variable

I am making a website that interacts with an offline project through JSON files sent from the offline project to the site.
The site will need to load these files and manipulate the data.
Is it feasible with modern computing power to simply load these files into the database as a single serialized field, which can then be loaded and decoded for every use?
Or would it save significant overhead to properly store the JSON as tables and fields and refer to those for every use?
Without knowing more about the project, a table with multiple fields is probably the better solution.
There will be more options for the data in the long run: for example, indexing fields, searching through fields, and many other MySQL features that would not be possible if it were all stored in a single variable.
Consider future versions of the project too: adding another field to a table is easy, but adding another field to a block of JSON would be more difficult (see the sketch after this answer).
Consider project growth as well: if you experience 100x or 1000x growth, will the table handle the extra load?
500kb is a relatively small block of data, so there shouldn't be any issue with computing power regardless of which method is used, although more information would be handy here: for example, is that 500kb per user or per upload, how many are stored a day, and how often is it accessed?
Debugging will also be easier.
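To make the "adding a field" point above concrete, a hedged sketch (the table and column names are hypothetical):

    -- Adding a field to a real table is a single statement:
    ALTER TABLE project_data ADD COLUMN new_metric INT NULL;

    -- Adding a field to stored JSON blobs instead means rewriting every
    -- existing blob in application code, or handling its absence on
    -- every read.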
The new MySQL Shell has a bulk JSON loader that is not only very quick but gives you a lot of control over how the data is handled. See https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-utilities-json.html
Load it as JSON.
Think about what queries you need to perform.
Copy selected fields into MySQL columns so that those queries can use MySQL's WHERE, GROUP BY, and ORDER BY instead of having to do the processing in the client.
A database table contains a bunch of similarly structured rows. Each row has a constant set of columns. (NULLs can be used to indicate missing columns for a given row.) JSON complicates things by providing a complex column. My advice above is a compromise between the open-ended flexibility of JSON and the need to use the database server to process lots of data. Further discussion here.
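One concrete way to strike that compromise in MySQL 5.7+ is a generated column extracted from the JSON document, which can then be indexed like any other column. A sketch; the table name, JSON column, and path are hypothetical:

    -- Keep the full JSON document, but expose a queried field as a
    -- real, indexable column (names and path are hypothetical).
    ALTER TABLE docs
        ADD COLUMN user_name VARCHAR(100)
            GENERATED ALWAYS AS (doc->>'$.user.name') STORED,
        ADD INDEX idx_user_name (user_name);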

serialize/json_encode to one field or keep raw data in many fields of db?

For five days I have been thinking about how to store data in my new project. I've read a lot of articles about the advantages and disadvantages of serializing or JSON-encoding, and also about searching a database with thousands of records. Here's the problem.
Consider that I am making a game - I have thousands of locations, and every location may have some objects in it. The number of objects is limited, but I guess there may be 10-20 objects at each location. Each object has some properties (like perks) which sometimes need to be checked, updated and so on, so I have to store them in the DB.
I see two options to do it:
The simple database way, following first normal form - store each object as a row in the database and each property in a column. I can easily retrieve the data and connect it to a specific location with just one id. The problem is that there may be (and there will be) thousands of rows in the DB - all objects * the number of locations - and searching it might be very expensive in time. If you multiply that by the number of players searching the database simultaneously, it could kill the DB.
The second way is to serialize or JSON-encode (or even implode somehow) all the objects in the current location with all their properties. I guess each object may have 100 properties, and with 20 objects that serialized array might not be so small: 2000 (assoc key + int value) elements serialized and saved in one field for each location. There is a lot less database searching - just set the id as the primary key and look it up - but later I have to deserialize all the data, which also might be expensive.
I know I don't put any code in here (there isn't any yet), so it is quite a hypothetical question, but I wonder if you have ever checked which is the better solution - to store a large amount of data in one field but have it serialized, or to have it scattered across multiple rows in the DB.
Hope you can share your experience :)
Kalreg.
Relational database searching is insanely fast. It's also incredibly flexible if you set it up right, so the encoding process will be the most costly factor. The benefit of JSON is server-client data transfer. Personally I'd use the tried-and-true Option 1, but look to cache as much data client-side as possible, e.g. HTML5 storage.
I also note you're using PHP. An AJAX approach with minimal page reloads is what you want, although I may be over-reading your tag.
You will want the first method.
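A sketch of what the first method could look like: one row per object and one row per property, so everything at a location comes back with one indexed query (all names are hypothetical):

    -- One row per object, one row per property (hypothetical names).
    CREATE TABLE objects (
        id          INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        location_id INT UNSIGNED NOT NULL,
        INDEX (location_id)
    );

    CREATE TABLE object_properties (
        object_id INT UNSIGNED NOT NULL,
        name      VARCHAR(50)  NOT NULL,
        value     INT          NOT NULL,
        PRIMARY KEY (object_id, name)
    );

    -- Everything at one location, via the location_id index:
    SELECT o.id, p.name, p.value
    FROM   objects o
    JOIN   object_properties p ON p.object_id = o.id
    WHERE  o.location_id = 42;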

Searching for an item in a serialized data field in MySQL

I used to store data in MySQL just like this: "data1,data2,data3,data4".
Then I decided to store it serialized using PHP.
But the main problem I face now is that I cannot search for an item in those fields.
I was using FIND_IN_SET, but I know that it doesn't work for serialized data.
My question is: should I go back to storing the data with commas?
If you need it to be searchable, you need it expressed as a series of rows that can be queried. If you need it to be in a comma-separated format for performance reasons, that's another consideration. Sometimes you have to do both of these things and make sure your application keeps the two different forms in sync.
It is very hard to match the performance of a properly indexed table for matching queries when using serialized data, and likewise, the speed of retrieval for serialized data versus having to join in the other results. It's a trade-off.
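A sketch of the row-per-item form that replaces both the comma list and FIND_IN_SET (table and column names are hypothetical):

    -- One row per item instead of "data1,data2,data3,data4".
    CREATE TABLE record_items (
        record_id INT UNSIGNED NOT NULL,
        item      VARCHAR(100) NOT NULL,
        PRIMARY KEY (record_id, item),
        INDEX (item)
    );

    -- The search becomes an indexed lookup instead of FIND_IN_SET:
    SELECT record_id FROM record_items WHERE item = 'data3';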

Fastest way to store table of data and retrieve

I am storing some history information on my website for future retrieval by my users. So when they visit certain pages, it will record the page they visited and the time, and then store it under their user id for future additions/retrievals.
So my original plan was to store all of the data in an array, and then serialize/unserialize it on each retrieval and then store it back in a TEXT field in the database. The problem is: I don't know how efficient or inefficient this will get with large arrays of data if the user builds up a history of (e.g.) 10k pages.
EDIT: So I want to know the most efficient way to do this. I was also considering just inserting a new row in the database for each history entry, but then this would make a large database to select things from.
The question is: which is faster/more efficient, a massive number of rows in the database or a massive serialized array? Any other, better solutions are obviously welcome. I will eventually be switching to Python, but for now this has to be done in PHP.
There is no benefit to storing the data as serialized arrays. Retrieving a big blob of data, de-serializing, modifying it, and re-serializing it to update is slow - and worse, it will get slower the larger the piece of data grows (exactly what you're worried about).
Databases are specifically designed to handle large numbers of rows, so use them. You have no extra cost per insert as the data grows, unlike your proposed method, and you're still storing the same amount of data, so let the database do what it does best, and keep your code simple.
Storing the data as an array also makes any sort of querying and aggregation near impossible. If the purpose of the system is to (for example) see how many visits a particular page got, you would have to de-serialize every record, find all the matching pages, etc. If you have the data as a series of rows with user and page, it's a trivial SQL count query.
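For example, with one row per visit (a hypothetical history table with user_id, page, and visited_at columns), that count is a single query:

    -- Visits per page for one user, straight from the row-per-visit table.
    SELECT page, COUNT(*) AS visits
    FROM   history
    WHERE  user_id = 123
    GROUP  BY page
    ORDER  BY visits DESC;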
If, one day, you find that you have so many rows (10,000 is not a lot of rows) that you're starting to see performance issues, find ways to optimize it, perhaps through aggregation and de-normalization.
You can check for a session variable, collect all the data for one session, and dump it into the database together.
You can add indexing at the DB level to save time.
Last, and most effective: do the operations/manipulation on the data and store the results in a separate table, and always select from that table. You can achieve this with a cron job or some other scheduler (see the sketch below).
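A hedged sketch of that last suggestion, assuming the same hypothetical history table plus a summary table refreshed by a scheduled job:

    -- Run from a cron job or event scheduler: rebuild the summary table
    -- so reads never touch the raw history rows (names hypothetical;
    -- page_visit_totals has PRIMARY KEY (page)).
    REPLACE INTO page_visit_totals (page, visits)
    SELECT page, COUNT(*)
    FROM   history
    GROUP  BY page;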

Why should I not insert serialized arrays into my database field?

I just saw the first comment on this question, Inserting into a serialized array in PHP, and it made me wonder: why? Especially seeing that when you use database-managed sessions (database-based session handling), that is exactly what happens: the session handler inserts a serialized array into a database field.
There's nothing wrong with this in certain contexts. Session management is definitely one of those instances where it would be deemed acceptable. The thing to remember is that if you ever find yourself trying to relate the serialized data to any fields in your database, you've made a huge design flaw, and unfortunately this is something that I have seen people try to do.
Take any "never do X" with a grain of salt, as almost any technique can be the correct one in certain circumstances. The advice is usually directed towards newbies, who are very apt to misunderstand proper usage and code themselves into a very nasty corner.
How certain are you that you'll never want to get at that data from any platform other than PHP?
I don't know about PHP's form of serialization, but the default binary serialization format from every platform I do know about is not interoperable with other platforms... typically it's not a good idea to put data encoded for just a single frontend into a database.
Even if you don't end up using any other languages, it means the database itself isn't going to know anything about the information - so you won't be able to query on it etc. Maybe that's not a problem in your case - but it's definitely something to bear in mind.
The main argument against serialized data is that it is hard to search through, and impossible to search efficiently, i.e., without retrieving the records in the first place.
It depends on the data. By storing a language-specific data structure in a field, you're tied to that language, and you're also giving up everything the DB can give you. You won't have indexes on specific fields, can't run simple updates, can't extract partial data, can't have data checks, referential integrity, and so on.
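To make that trade-off concrete, compare a partial update against real columns with the read-modify-write cycle a serialized blob forces (all names are hypothetical):

    -- With real columns: update one value in place, atomically.
    UPDATE players SET score = score + 10 WHERE id = 7;

versus, in PHP, for the blob:

    <?php
    // With a serialized blob: fetch, decode, modify, re-encode, write back.
    // $pdo is an existing PDO connection; names are hypothetical.
    $blob = $pdo->query('SELECT data FROM players WHERE id = 7')->fetchColumn();
    $data = unserialize($blob);
    $data['score'] += 10;
    $pdo->prepare('UPDATE players SET data = ? WHERE id = 7')
        ->execute([serialize($data)]);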
