Get the greatest value from serialized data in a MySQL column with PHP - php

How can I get the greatest value out of serialized data? For example, I have this in my column 'rating':
a:3:{s:12:"total_rating";i:18;s:6:"rating";i:3;s:13:"total_ratings";i:6;}
How can I select the rows with the 3 greatest 'rating' values with a query?
thanks a lot

You're probably looking at a pile of SUBSTRING_INDEX(field,':',#offset) calls if you want to do it in SQL. It would be very grisly. Storing a serialized version of an object in the db is a convenience for persistence, but it should not be considered a permanent storage method. If you insist on using the serialized string for queries, you've lost all the power of a relational db and you might as well store the strings in a text file.
The best option is to use the serialized string only for persistence purposes (like remembering what the user was doing last time they visited), and store the data you need for calculations in properly normalized fields and tables. Then you can easily query what you need to know.
The other option is to select all the 'rating' strings from rows whose fields meet certain other criteria (e.g. the date_added field is within the last week), reinstantiate all the objects in your application layer and compare them there.
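A minimal sketch of that second option, assuming the rows have already been fetched into an array that maps a row id to the serialized 'rating' string. The function name topRatings and the shape of $rows are made up for illustration; they are not from the question.

```php
<?php
// Hypothetical helper: $rows maps a row id to the serialized 'rating' column value.
function topRatings(array $rows, int $limit = 3): array
{
    $ratings = [];
    foreach ($rows as $id => $serialized) {
        // allowed_classes => false: never instantiate objects from stored data.
        $data = unserialize($serialized, ['allowed_classes' => false]);
        if (is_array($data) && isset($data['rating'])) {
            $ratings[$id] = (int) $data['rating'];
        }
    }
    arsort($ratings); // highest rating first, keys (row ids) preserved
    return array_slice($ratings, 0, $limit, true);
}
```

This still pulls every candidate row into PHP, so it is only reasonable when the other criteria narrow the result set first.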

Related

Speed of SELECT Distinct vs array unique

I am using WordPress with some custom post types (just to give a description of my DB structure - it's WP's).
Each post has custom meta, which is stored in a separate table (postmeta table). In my case, I am storing city and state.
I've added some actions to WP's save_post/trash_post hooks so that the city and state are also stored in a separate table (cities) like so:
ID postID city state
auto int varchar varchar
I did this because I assumed that this table would be faster than querying the rather large postmeta table for a list of available cities and states.
My logic also forced me to add/update cities and states for every post, even though this will cause duplicates (in the city/state fields). This must be so because I must keep track of which states/cities exist (actually have a post associated with them). When a post is added or deleted, it takes its record to or from the cities table with it.
This brings me to my question(s).
Does this logic make sense or do I suck at DB design?
If it does make sense, my real question is this: **would it be faster to use MySQL's "SELECT DISTINCT" or just "SELECT *" and then use PHP's array_unique on the results?**
Edits for comments/answers thus far:
The structure of the table is exactly how I typed it out above. There is an index on ID, but the point of this table isn't to retrieve an indexed list, but to retrieve ALL results (that are unique) for a list of ALL available city/state combos.
I think I may go with (I don't know why I didn't think of this before) just adding a serialized list of city/state combos in ONE record in the wp_options table. Then I can just get that record, and filter out the unique records I need.
Can I get some feedback on this? I would imagine that retrieving and filtering a serialized array would be faster than storing the data in a separate table for retrieval.
To answer your question about using SELECT DISTINCT vs. array_unique: I would almost always prefer to limit the result set in the database, assuming of course that you have an appropriate index on the field for which you are trying to get distinct values. This saves you the time spent transmitting extra data from the DB to the application, and the time the application spends reading that data into memory before it can work with it.
As far as your separate table design goes, it is hard to say whether this is a good approach or not. It would largely depend on how you are actually performing your query (i.e. are you doing two separate queries - one for post info and one for city/state info - or querying across a join?).
There is really only one definitive way to determine which approach is fastest: test both ways in your environment.
1) A fully normalized table (one holding only integer values, with the other tables holding just one int plus a varchar) has an advantage when you do not often join whole tables and do a lot of searching on the normalized fields. The downside is that it requires large join/sort buffers and results in more complex queries, which MySQL is much less likely to optimize automatically, so you have to optimize your queries yourself.
2) SELECT DISTINCT will be faster in almost all cases. The only case where it will be slower is when you have a small sort buffer configured in /etc/my.cnf and a much larger memory limit for PHP.
A DISTINCT select can use indexes, while your code can't.
Also, sending a large amount of data to your app costs a lot of MySQL CPU time and wall-clock time.
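For comparison, here is a rough sketch of what the PHP side would have to do if you fetched every row. The database-side equivalent would be something like SELECT DISTINCT city, state FROM cities, using the table from the question. The function name uniqueCityStates is invented for this example. Note that, as far as I know, array_unique compares elements as strings by default, so it does not de-duplicate an array of row arrays unless you pass SORT_REGULAR; keying on both columns avoids that question.

```php
<?php
// $rows stands in for the full result of "SELECT city, state FROM cities".
function uniqueCityStates(array $rows): array
{
    $seen = [];
    foreach ($rows as $row) {
        // Key on both columns so the same city name in two states stays distinct.
        $key = $row['city'] . '|' . $row['state'];
        $seen[$key] = ['city' => $row['city'], 'state' => $row['state']];
    }
    return array_values($seen);
}
```

Every duplicate row still has to travel from MySQL to PHP before it is thrown away, which is the cost the answers above are pointing at.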

pulling serialized data from mysql

So, there's a field in the db in which I store serialized arrays.
$array = array('count1' => 10, 'count2' => 20, 'count3' => 4);
serialized:
a:3:{s:6:"count1";i:10;s:6:"count2";i:20;s:6:"count3";i:4;}
Would it be possible to pull count1+count2+count3 using a mysql query? I guess I'm looking for something like php's explode. Pretty sure this can't be done, but I thought I'd ask.
I need to pull the highest count1+count2+count3 rows and return the total count. Looping through each row and unserializing wouldn't work since there are TONS of rows.
If you need to access parts of your serialized data via SQL, you need to store them in separate columns.
While it might be possible to use techniques such as regular expressions to access those three values in this string, it would be extremely slow when used in a WHERE criterion as indexes would be useless - not to mention that it would be a huge mess, way worse than using goto in a programming language.
So the solution is to create a new column and then iterate over all rows, unserialize them, and store the sum in the new column. That might take a while, but you'll only need to do it once.
Depending on your application it might be better to create three columns and store each value separately.
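The per-row step of that one-time migration could look roughly like this. It is a sketch only: sumCounts is a made-up name, and the new column and the UPDATE that writes to it are assumptions left as a comment.

```php
<?php
// Hypothetical helper for the one-time backfill of a new "total" column.
function sumCounts(string $serialized): int
{
    $data = unserialize($serialized, ['allowed_classes' => false]);
    if (!is_array($data)) {
        return 0; // unreadable value: treated as zero here, which is an assumption
    }
    // Only add up the three keys from the question, ignoring anything else stored.
    $wanted = array_flip(['count1', 'count2', 'count3']);
    return (int) array_sum(array_intersect_key($data, $wanted));
}

// In the migration loop you would then run something like
// "UPDATE your_table SET total = ? WHERE id = ?" for each row (names are placeholders).
```

Once the column is filled (and kept up to date on writes), the "highest totals" query becomes a plain ORDER BY on an indexable integer column.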

How to store searchable arrays in MySQL

So I've got this form with an array of checkboxes to search for an event. When you create an event, you choose one or more of the checkboxes and then the event gets created with these "attributes". What is the best way to store it in a MySQL database if I want to filter results when searching for these events? Would creating several columns with boolean values be the best way? Or possibly a new table with the checkbox values only?
I'm pretty sure serializing is out of the question because I wouldn't be able to query the serialized string for whether the checkbox was ticked or not, right?
Thanks
You can use the set datatype or a separate table that you join. Either will work.
I would not do a bunch of columns though.
You can search the set easily using FIND_IN_SET(), but it's not indexed, so it depends on how many rows you expect (up to a few thousand is probably OK - it's a very fast search).
The normal solution is a separate table with one column being the ID of the event, and the second column being the attribute using the enum datatype (don't use text, it's slower).
Create separate columns, or store them all in one column using a bit mask.
One way would be to create a new table with a column for each checkbox, as already described by others. I'll not add to that.
However, another way is to use a bitmask. You have just one column myCheckboxes and store the values as an int. Then in the code you have constants or another appropriate way to store the correlation between each checkbox and its bit. For example:
CHECKBOX_ONE 1
CHECKBOX_TWO 2
CHECKBOX_THREE 4
CHECKBOX_FOUR 8
...
CHECKBOX_NINE 256
Remember to always use the next power of two for new values, otherwise you'll get values that overlap.
So, if the first two checkboxes have been checked you should have 3 as the value of myCheckboxes for that row. If you have ONE and FOUR checked you'd have 9 as the value of myCheckboxes, etc. When you want to see which rows have, say, checkboxes ONE, THREE and NINE checked, your query would be like:
SELECT * FROM myTable where myCheckboxes & 1 AND myCheckboxes & 4 AND myCheckboxes & 256;
This query will return only rows that have all of these checkboxes marked as checked.
You should also use bitwise operations when storing and reading the data.
This is a very efficient way when it comes to speed. You have just a single column, probably just a smallint, and your searches are pretty fast. This can make a big difference if you have several different collections of checkboxes that you want to store and search through. However, this makes the values harder to understand. If you see the value 261 in the DB it'll not be easy for a human to immediately see that this means checkboxes ONE, THREE and NINE have been checked, whereas it is much easier for a human seeing separate columns for each checkbox. This normally is not an issue, because humans don't need to manually poke the database, but it's something worth mentioning.
From the coding perspective it's not much of a difference, but you'll have to be careful not to corrupt the values, because it's not that hard to mess up a single int. It's far easier to corrupt the data this way than when it's stored in different columns. So test carefully when adding new stuff. All that said, the speed and low memory benefits can be very big if you have a ton of different collections.
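On the PHP side, the bitwise reads and writes could be wrapped up something like this. The constant names follow the list above; the helper names setFlag, clearFlag and hasAll are just for illustration.

```php
<?php
// One bit per checkbox, matching the values listed above.
const CHECKBOX_ONE   = 1;
const CHECKBOX_THREE = 4;
const CHECKBOX_NINE  = 256;

// Turn a checkbox on without touching the other bits.
function setFlag(int $mask, int $flag): int
{
    return $mask | $flag;
}

// Turn a checkbox off without touching the other bits.
function clearFlag(int $mask, int $flag): int
{
    return $mask & ~$flag;
}

// True only if every bit in $flags is set in $mask.
function hasAll(int $mask, int $flags): bool
{
    return ($mask & $flags) === $flags;
}
```

Going through helpers like these, rather than assigning raw integers, is one way to reduce the risk of corrupting the stored value.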

mysql insert multiple data into a single column or multiple row

Just want to ask for an opinion regarding MySQL.
Which one is the better solution?
case1:
store in 1 row:-
product_id:1
attribute_id:1,2,3
When I retrieve the data, I split the string by ','.
I have seen some databases that store the data this way: each record is a product, and one column stores the product attributes:
a:3:{s:4:"spec";a:2:{i:1;s:6:"black";i:3;s:2:"37";}s:21:"spec_private_value_id";a:2:{i:1;s:11:"12367591683";i:3;s:11:"12367591764";}s:13:"spec_value_id";a:2:{i:1;s:1:"5";i:3;s:2:"29";}}
or
case2:
store in 3 row:-
product_id:1
attribute_id:1
product_id:1
attribute_id:2
product_id:1
attribute_id:3
This is what I normally do: store 3 rows for the 3 attributes of a record.
In terms of performance and space, can anyone tell me which one is better? From what I see, case1 saves space, but the data needs to be processed in PHP (or another server-side scripting language).
case2 is more straightforward, but uses more space.
Save space? Seriously? You're talking about saving bytes when a one terabyte disk goes for 70 dollars?
And maybe you're not even saving bytes. If you store attributes as "12234,23342,243234", that's 18 bytes of text for 3 attributes. If you stored ids small enough to fit a smallint as smallint, 3 of them would take up 6 bytes.
Depends on whether the attributes are important for searching later, for example.
Keeping attributes as a serialized array in just one field may be fine if you don't actually care about them and, for example, will never need to run a query that shows all products having a given attribute.
However, finding all products that have one attribute would be at least "lousy" if you store attributes comma-separated (you need to use LIKE), and if you store attributes as serialized arrays they are completely unusable for any kind of sorting or grouping using SQL queries.
Using a separate table for the multiple relations between products and attributes is far better if they are of any importance for selecting/grouping/sorting other data.
In case 1, although you save space, there's time spent on splitting the string.
You also must take care of the size of your field: If you have 50 products with 2 attributes and one with 100 attributes, you must make the field ~ varchar(200)... You will not save space at all.
I think case 2 is the best and recommended solution.
You need to consider the SELECT statements that would be using these values. If you wish to search for records that have certain attributes, it is much more efficient to store them in separate columns and index them. Otherwise, you are doing "LIKE" statements which take much longer to process.
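If you already have data stored as in case 1, turning it into case 2 rows is a small loop. A sketch, assuming the comma-separated format shown in the question; the function name attributeRows is made up.

```php
<?php
// Split a case-1 value such as "1,2,3" into one case-2 row per attribute.
function attributeRows(int $productId, string $csv): array
{
    $rows = [];
    // trim + filter so stray spaces or a trailing comma don't produce empty ids
    $ids = array_filter(array_map('trim', explode(',', $csv)), 'strlen');
    foreach ($ids as $id) {
        $rows[] = ['product_id' => $productId, 'attribute_id' => (int) $id];
    }
    return $rows;
}
```

With the rows stored separately, "all products with attribute 2" becomes an equality test on an indexed attribute_id column instead of a LIKE over a string.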

Returning multiple rows per row (in Zend Framework)

I have a MySQL database containing these tables:
sessions
--------
sessionid (INT)
[courseid (INT)]
[locationid (INT)]
[comment (TEXT)]
dates
-----
dateid (INT)
sessionid (INT)
date (DATE)
courses
-------
...
locations
---------
...
Each session has a unique sessionid, and each date has a unique dateid. But dates don't necessarily have a unique sessionid, as a session can span over a variable number of dates (not necessarily consecutive).
Selecting each full row is simply a matter of joining the tables on the sessionid. However, I'm looking for a way to return a rowset for a particular courseid, where each row in that rowset represents a location and contains another rowset, each row of which is a single session, which in turn contains another rowset holding all of the dates for that session:
course
    location
        session
            date
            date
        session
            date
            date
            date
    location
        ...
This is because I'm querying this database from PHP using Zend Framework, which has a great interface for manipulating rows and rowsets in an object-oriented manner.
Ultimately, I'm trying to output a 'schedule' to the view, organized first by course, then location, then date. Ideally, I'd be able to iterate over each row as a location, and then for each location, iterate over each session, and then for each session, iterate over each date.
I'm thinking of doing this by querying for all the locations, sessions, and dates separately. Then, I'd convert each rowset into an array, and add each sessions array as a member of a locations array, and add each dates array as a member of a sessions array.
This, however, feels very kludgy, and doesn't provide me with the ability to handle the rows in an object-oriented manner.
I was wondering if there was either:
a) a better table schema for representing this data;
b) an SQL query which I'm not aware of;
c) a method in Zend_Db that allows me to assign a rowset to a rowset
Please let me know if I haven't been clear anywhere, and thanks in advance.
(Crossing my fingers that this doesn't end up on the daily wtf...)
I've run into lots of issues with Zend Framework's database abstraction classes when I have to deal with data from multiple tables. The number of queries that run and the overhead of all the objects generated have brought my hosting server to its knees. I've since reverted to writing queries to gather all of my data and then walking the data to build my display. It's not as pretty or OO as using the abstraction layers, but it's also not making my PHP scripts page to disk just to display a table full of data.
As Steve mentions, benchmark whatever solution you end up with. I'd also profile your memory usage.
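A rough sketch of "walking the data": run one joined query and fold the flat rows into nested arrays. The column names come from the tables in the question; the function name groupSchedule and the exact query are assumptions.

```php
<?php
// $rows stands in for the flat result of one join, e.g.
//   SELECT s.locationid, s.sessionid, d.date
//   FROM sessions s JOIN dates d ON d.sessionid = s.sessionid
//   WHERE s.courseid = ? ORDER BY s.locationid, s.sessionid, d.date
function groupSchedule(array $rows): array
{
    $schedule = [];
    foreach ($rows as $row) {
        // location => session => list of dates
        $schedule[$row['locationid']][$row['sessionid']][] = $row['date'];
    }
    return $schedule;
}
```

The view can then loop over locations, then sessions, then dates, with a single query behind it.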
You could handle this scenario using the relationship features of Zend_Db_Table. You'd need to create table wrapper classes for sessions, dates, courses, etc. if you're currently using Zend_Db_Adapter for your queries.
http://framework.zend.com/manual/en/zend.db.table.relationships.html
It's not too different from the approach you described of querying for each dataset separately, but it gives you a straightforward OO interface for retrieving the appropriate related data for a given record.
You'll want to do some benchmarking if you go this route, as it could potentially execute a lot of queries.
