Let's say that I have an array like the one posted below and that I need to store it in my MySQL database:
Array(
"Weight" => "10",
"Height" => "17",
"Usage" => "35"
);
Preamble:
I will never update these values
I will never perform a query based on these values
Long story short, I only need to store and display this array as it is; in fact, I need these values to generate graphs. As I see it, there are 2 possible options.
Option 1: even though I will never use a WHERE, ORDER BY, HAVING (...) clause on these values, I store each value separately in a dedicated column (weight, height, usage).
Option 2: I create a single column (stats) where I store a serialized version of the array; then, in order to generate my graphs, I unserialize each row before using it.
The question is: what's the best approach to store this array in terms of effectiveness and performance?
In my opinion the second approach is the best, but let's say that there are many rows and elements involved in the process. I can't tell whether it's faster and lighter to unserialize an array of 20 elements for 100 rows in PHP, or to read plain values stored in 20 columns, considering that I need to save a lot of them very frequently and simultaneously.
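For what it's worth, the unserialize() half of that comparison is easy to measure in isolation. Here is a minimal sketch with made-up stat names (absolute timings will vary by machine and PHP version):
<?php
// Build a 20-element array like the ones described above and serialize it.
$row = array();
for ($i = 0; $i < 20; $i++) {
    $row["stat$i"] = (string) rand(0, 100);
}
$serialized = serialize($row);

// Time unserializing it once per row, for 100 rows.
$start = microtime(true);
for ($i = 0; $i < 100; $i++) {
    $data = unserialize($serialized);
}
printf("unserialize x100: %.6f sec\n", microtime(true) - $start);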
I will never update these values
I will never perform a query based on these values
The second you finalise your code having stored them as serialised values, you'll be asked to perform a query to update anything with a weight above ten.
Just store them in their own columns - not only will this future-proof the code, but it is easier to work with and will take up less drive space in the long run.
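For illustration, a minimal sketch of the separate-columns approach; the table name, column names, and PDO connection details are made up here:
<?php
// Hypothetical schema, one column per value:
// CREATE TABLE stats (
//     id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
//     weight  INT NOT NULL,
//     height  INT NOT NULL,
//     `usage` INT NOT NULL  -- back-ticked because USAGE is a reserved word in MySQL
// );
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

$stats = array("Weight" => "10", "Height" => "17", "Usage" => "35");

$stmt = $pdo->prepare('INSERT INTO stats (weight, height, `usage`) VALUES (?, ?, ?)');
$stmt->execute(array($stats['Weight'], $stats['Height'], $stats['Usage']));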
I have multiple datasets, as below, and I want to handle them using PHP.
Dataset #1 (75 cols * 27,000 rows)
col #1 col #2 ...
record #1
record #2
...
Dataset #2 (32 cols * 7,500 rows)
....
Dataset #3 (44 cols * 17,500 rows)
....
Here, the number of records and columns differs between datasets, so it is hard to use a single database structure.
Also note that each 'cell' of a dataset consists of either a real number or N/A... and the datasets are perfectly fixed, i.e., there will be no changes at all.
So what I've done so far is turn them into file-based tables and record the starting offset of each record in the file.
This way I achieved quite a nice access speedup, but it is still not satisfactory, because accessing each record requires parsing it into a PHP data structure.
What I ultimately want to achieve is to eliminate the parsing step. But serialization was not a good choice because it loads the entire dataset. Of course, it is possible to serialize each record individually and keep their offsets as I've done with the plain-text version, but that doesn't seem very elegant to me.
So here's the question: is there any method to load part of a dataset without a parsing step that is better than the partial serialization I suggested?
Many thanks in advance.
More information
Maybe I confused readers a little bit.
Each dataset is separate; they exist as independent files.
The usual data access pattern is row-wise. Each row has a unique string ID, and an ID in one dataset may or may not exist in another dataset. Above all, what concerns me is the access speed when I have a query to fetch specific row(s) from a dataset. For example, suppose there is a dataset like the one below.
Dataset #1 (plain-text file)
obs1 obs2 obs3 ...
my1 3.72 5.28 10.22 ...
xu1 3.44 5.82 15.33 ...
...
qq7 8.24 10.22 47.54 ...
And there is a corresponding index file, serialized using PHP. Each key represents a unique ID in the dataset, and its value represents that record's offset in the file.
Index #1 (shown here as a PHP array; the actual file is the serialized form)
Array (
"my1" => 0,
"xu1" => 337,
...
"qq7" => 271104
)
So it is possible to know that record "xu1" starts 337 bytes from the beginning of the dataset file.
In order to access and fetch rows by their unique IDs, I:
1) Load the serialized index file
2) Find the IDs matching the query
3) Seek to those positions, fetch the rows, and parse them into PHP arrays
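In code, that lookup is roughly the following (a minimal sketch; the file names and the whitespace-delimited row format are assumptions based on the example above):
<?php
// 1) Load the serialized index: ID => byte offset of that row.
$index = unserialize(file_get_contents('dataset1.idx'));

// 2) Exact-match lookup of the requested IDs.
$ids = array('xu1', 'qq7');

// 3) Seek to each offset and parse one line into a PHP array.
$fp = fopen('dataset1.txt', 'r');
$rows = array();
foreach ($ids as $id) {
    if (!isset($index[$id])) {
        continue; // unknown ID
    }
    fseek($fp, $index[$id]);
    $line = fgets($fp);
    // Split on whitespace: the first token is the ID, the rest are observations.
    $fields = preg_split('/\s+/', trim($line));
    $rows[$id] = array_slice($fields, 1);
}
fclose($fp);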
The problems I have are:
1) Since I use exact matching, it is impossible to fetch multiple rows that partially match a query (for example, fetching the "xu1" row for the query "xu")
2) Even though I indexed the dataset, the fetch speed is not satisfactory (0.05 sec. for a single query)
3) When I tried to solve the above problem by serializing the entire dataset, the loading speed (perhaps unsurprisingly) became substantially slower.
The easiest way to solve the above problems would be to turn them into a database, and I would do so, but I hope to find a better way that keeps them as plain text or some text-like format (for example, serialized or JSON-encoded).
Many thanks for your interest in my problem!
I think I understand your question to some extent. You've got 3 sets of data, which may or may not be related, with different numbers of columns and rows.
This may not be the cleanest-looking solution, but I think it could serve the purpose. You can use MySQL to store the data and avoid parsing the files again and again. You can store the data in three tables, or put them in one table with all the columns (rows that have no use for a given column can simply hold NULL there).
You can also use SQL unions, in case you want to run queries on all three datasets collectively, with tricks like the following (the NULL placeholders keep the column counts equal, which UNION requires):
select null as col1, col2, col3 from table1 where col2 = 'something'
union all
select col1, null as col2, null as col3 from table2 where col1 = 'something else'
order by col1
So, there's a field in the db in which I store serialized arrays.
$array = array('count1' => 10, 'count2' => 20, 'count3' => 4);
serialized:
a:3:{s:6:"count1";i:10;s:6:"count2";i:20;s:6:"count3";i:4;}
Would it be possible to pull count1+count2+count3 using a MySQL query? I guess I'm looking for something like PHP's explode. Pretty sure this can't be done, but I thought I'd ask.
I need to pull the highest count1+count2+count3 rows and return the total count. Looping through each row and unserializing wouldn't work since there are TONS of rows.
If you need to access parts of your serialized data via SQL, you need to store them in separate columns.
While it might be possible to use techniques such as regular expressions to access those three values in this string, it would be extremely slow when used in a WHERE criterion as indexes would be useless - not to mention that it would be a huge mess, way worse than using goto in a programming language.
So the solution is to create a new column and then iterate over all rows, unserialize them, and store the sum in the new column. That might take a while, but you'll only need to do it once.
Depending on your application it might be better to create three columns and store each value separately.
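A minimal sketch of that one-time migration, assuming PDO and made-up names (a counters table with the serialized blob in data and a freshly added total column):
<?php
// One-time backfill: unserialize each row and store the sum in a real column,
// so future queries (ORDER BY total DESC, etc.) can use an index.
// Assumes: ALTER TABLE counters ADD COLUMN total INT NOT NULL DEFAULT 0;
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

$select = $pdo->query('SELECT id, data FROM counters');
$update = $pdo->prepare('UPDATE counters SET total = ? WHERE id = ?');

while ($row = $select->fetch(PDO::FETCH_ASSOC)) {
    $counts = unserialize($row['data']);
    $total  = $counts['count1'] + $counts['count2'] + $counts['count3'];
    $update->execute(array($total, $row['id']));
}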
So I've got this form with an array of checkboxes to search for an event. When you create an event, you choose one or more of the checkboxes and then the event gets created with these "attributes". What is the best way to store it in a MySQL database if I want to filter results when searching for these events? Would creating several columns with boolean values be the best way? Or possibly a new table with the checkbox values only?
I'm pretty sure serializing is out of the question because I wouldn't be able to query the serialized string for whether the checkbox was ticked or not, right?
Thanks
You can use the set datatype or a separate table that you join. Either will work.
I would not do a bunch of columns though.
You can search the set easily using FIND_IN_SET(), but it's not indexed, so it depends on how many rows you expect (up to a few thousand is probably OK - it's a very fast search).
The normal solution is a separate table with one column being the ID of the event, and the second column being the attribute using the enum datatype (don't use text, it's slower).
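A minimal sketch of both options, with made-up table and attribute names (a SET column on the events table itself, versus a separate join table with an enum):
<?php
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// Option A: a SET column right on the events table.
// CREATE TABLE events (
//     id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
//     name  VARCHAR(100) NOT NULL,
//     attrs SET('outdoor','family','free','accessible') NOT NULL DEFAULT ''
// );
// FIND_IN_SET() can't use an index, so this scans the table.
$stmt = $pdo->prepare(
    "SELECT id, name FROM events WHERE FIND_IN_SET('outdoor', attrs)"
);
$stmt->execute();

// Option B: a separate, indexable join table.
// CREATE TABLE event_attributes (
//     event_id  INT UNSIGNED NOT NULL,
//     attribute ENUM('outdoor','family','free','accessible') NOT NULL,
//     PRIMARY KEY (event_id, attribute)
// );
$stmt = $pdo->prepare(
    'SELECT e.id, e.name
       FROM events e
       JOIN event_attributes a ON a.event_id = e.id
      WHERE a.attribute = ?'
);
$stmt->execute(array('outdoor'));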
Create separate columns, or you can store them all in one column using a bit mask.
One way would be to create a new table with a column for each checkbox, as already described by others. I'll not add to that.
However, another way is to use a bitmask. You have just one column myCheckboxes and store the values as an int. Then in the code you have constants or another appropriate way to store the correlation between each checkbox and its bit. I.e.:
CHECKBOX_ONE 1
CHECKBOX_TWO 2
CHECKBOX_THREE 4
CHECKBOX_FOUR 8
...
CHECKBOX_NINE 256
Remember to always use the next power of two for new values, otherwise you'll get values that overlap.
So, if the first two checkboxes have been checked, you should have 3 as the value of myCheckboxes for that row. If you have ONE and FOUR checked, you'd have 9 as the value of myCheckboxes, etc. When you want to see which rows have, say, checkboxes ONE, THREE and NINE checked, your query would be like:
SELECT * FROM myTable where myCheckboxes & 1 AND myCheckboxes & 4 AND myCheckboxes & 256;
This query will return only rows having all of these checkboxes marked as checked.
You should also use bitwise operations when storing and reading the data.
This is a very efficient way when it comes to speed. You have just a single column, probably just a smallint, and your searches are pretty fast. This can make a big difference if you have several different collections of checkboxes that you want to store and search through. However, this makes the values harder to understand. If you see the value 261 in the DB, it'll not be easy for a human to immediately see that this means checkboxes ONE, THREE and NINE have been checked, whereas it is much easier for a human to read separate columns for each checkbox. This normally is not an issue, because humans don't need to manually poke the database, but it's something worth mentioning.
From the coding perspective it's not much of a difference, but you'll have to be careful not to corrupt the values: messing up a single int is magnitudes easier than corrupting data spread across separate columns. So test carefully when adding new stuff. All that said, the speed and low memory benefits can be very big if you have a ton of different collections.
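To make the storing and reading side concrete, here is a minimal PHP sketch of those bitwise operations (constant names as above):
<?php
// Each checkbox owns one bit; powers of two keep them from overlapping.
const CHECKBOX_ONE   = 1;    // 1 << 0
const CHECKBOX_THREE = 4;    // 1 << 2
const CHECKBOX_NINE  = 256;  // 1 << 8

// Storing: OR together the bits of the checked boxes.
$myCheckboxes = CHECKBOX_ONE | CHECKBOX_THREE | CHECKBOX_NINE; // 261

// Reading: AND with a bit to test whether that box was checked.
if ($myCheckboxes & CHECKBOX_THREE) {
    echo "Checkbox three was checked\n";
}

// Clearing: AND with the complement unchecks a single box.
$myCheckboxes &= ~CHECKBOX_NINE; // now 5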
Just want to ask for an opinion regarding MySQL.
Which one is the better solution?
case1:
store in 1 row:-
product_id:1
attribute_id:1,2,3
When I retrieve the data, I split the string by ','.
I saw some databases that store the data in this way: the record is a product, and the column stores the product's attributes:
a:3:{s:4:"spec";a:2:{i:1;s:6:"black";i:3;s:2:"37";}s:21:"spec_private_value_id";a:2:{i:1;s:11:"12367591683";i:3;s:11:"12367591764";}s:13:"spec_value_id";a:2:{i:1;s:1:"5";i:3;s:2:"29";}}
or
case2:
store in 3 row:-
product_id:1
attribute_id:1
product_id:1
attribute_id:2
product_id:1
attribute_id:3
This is what I normally do: store 3 attribute rows for one record.
In terms of performance and space, can anyone tell me which one is better? From what I see, case 1 saves space but requires processing the data in PHP (or another server-side scripting language).
Case 2 is more straightforward, but uses more space.
Save space? Seriously? You're talking about saving bytes when a one terabyte disk goes for 70 dollars?
And maybe you're not even saving bytes. If you store attributes as "12234,23342,243234", that string is around 20 bytes for 3 attributes. If you stored them as SMALLINT (2 bytes each), they'd take up 6 bytes.
Depends on whether the attributes are important for searching later, for example.
It may be fine to keep attributes as a serialized array in just one field if you actually don't care about them, and if, for example, you will never need to run a query showing all products that have a given attribute.
However, finding all products that have a given attribute is at best "lousy" if you store the attributes comma-separated (you need to use LIKE), and if you store them as serialized arrays they are completely unusable for any kind of sorting or grouping via SQL queries.
Using a separate table for the many-to-many relation between products and attributes is far better if the attributes are of any importance for selecting/grouping/sorting other data.
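A minimal sketch of that separate-table layout (case 2), with made-up names; the index on attribute_id is what makes the lookup cheap:
<?php
// Hypothetical mapping table:
// CREATE TABLE product_attribute (
//     product_id   INT UNSIGNED NOT NULL,
//     attribute_id INT UNSIGNED NOT NULL,
//     PRIMARY KEY (product_id, attribute_id),
//     KEY idx_attr (attribute_id)  -- makes "who has attribute X?" an indexed lookup
// );
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// All products having attribute 3: no LIKE, no unserializing in PHP.
$stmt = $pdo->prepare(
    'SELECT product_id FROM product_attribute WHERE attribute_id = ?'
);
$stmt->execute(array(3));
$productIds = $stmt->fetchAll(PDO::FETCH_COLUMN);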
In case 1, although you save space, there's time spent on splitting the string.
You also must take care with the size of your field: if you have 50 products with 2 attributes each and one product with 100 attributes, you must size the field for the worst case, ~ varchar(200)... You will not save space at all.
I think case 2 is the best and recommended solution.
You need to consider the SELECT statements that would be using these values. If you wish to search for records that have certain attributes, it is much more efficient to store them in separate columns and index them. Otherwise, you are doing "LIKE" statements which take much longer to process.
What is the way to get the greatest value out of serialized data? For example, I have this in my column 'rating':
a:3:{s:12:"total_rating";i:18;s:6:"rating";i:3;s:13:"total_ratings";i:6;}
How can I select the 3 greatest 'rating' values with a query?
thanks a lot
You're probably looking at a pile of SUBSTRING_INDEX(field,':',#offset) calls if you want to do it in SQL. It would be very grisly. Storing a serialized version of an object in the db is a convenience for persistence, but it should not be considered a permanent storage method. If you insist on using the serialized string for queries, you've lost all the power of a relational db and you might as well store the strings in a text file.
The best option is to use the serialized string only for persistence purposes (like remembering what the user was doing last time they visited), and store the data you need for calculations in properly normalized fields and tables. Then you can easily query what you need to know.
The other option is to select all the 'rating' strings from rows whose fields meet certain other criteria (e.g. the date_added field is within the last week), reinstantiate all the objects in your application layer, and compare them there.
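A minimal sketch of that application-layer fallback, assuming a made-up ratings table holding the serialized blob in a rating column:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// Narrow the candidate set in SQL first (here: last week's rows only),
// then unserialize and rank in PHP.
$stmt = $pdo->query(
    'SELECT id, rating FROM ratings
      WHERE date_added >= NOW() - INTERVAL 7 DAY'
);

$rows = array();
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    $data = unserialize($row['rating']);
    $rows[] = array('id' => $row['id'], 'rating' => $data['rating']);
}

// Sort by the unserialized 'rating' value, highest first, and keep the top 3.
usort($rows, function ($a, $b) {
    return $b['rating'] - $a['rating'];
});
$top3 = array_slice($rows, 0, 3);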