Storing cartesian product results in SQL?

Thanks for reading, I hope this makes sense! I am trying to modify a bespoke CMS which offers different product options and lets you set various attributes based on combinations of those options. These are pulled from a MySQL database. For example, a t-shirt might have:
Colour  Size
------  ------
Red     Small
Red     Medium
Blue    Small
Blue    Medium
Each colour and size would have a unique ID so red = 1, blue = 2 and small = 3, medium = 4. Colour and Size would each have a parent id, so colour = 1, size = 2.
At the moment I am storing a string in a database table that identifies each combination (e.g. Red + Small would be &1=1&2=3). This string then associates the combination of options with various attributes: price, stock code, etc.
However, the problem comes when we want to add a new option group to the mix, say sleeve length. At the moment this would mean going through the strings and changing them to include the new option. This seems like a very inefficient and long-winded way of doing it!
So (eventually!) my question is: is there a better way to do this? I need to ensure that there is no real limit on the number of option groups that can be added, and that new option groups can be added at any time.
Thanks in advance for your help.
Stuart

I would counter by asking: why would you want to store that in the first place?
Cartesian products are a controller issue, not a domain model one. Your domain model (your database) should only care about keeping the data normalized (or as close to it as is feasible). Let the database system do the joins as needed (INNER JOIN in SQL); that is what it is optimized to do in the first place.
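To illustrate the point about building combinations in the controller rather than storing them, here is a minimal sketch in PHP. It assumes the option groups have already been read from normalized tables into an array; the array shape and the function name cartesianProduct are made up for this example, not part of the CMS in the question.

```php
<?php
// Hypothetical option groups as they might be read from normalized tables:
// group name => list of option names. The data is illustrative only.
$optionGroups = [
    'Colour' => ['Red', 'Blue'],
    'Size'   => ['Small', 'Medium'],
];

/**
 * Build every combination that picks exactly one option per group.
 */
function cartesianProduct(array $groups): array
{
    $combinations = [[]]; // start with a single empty combination
    foreach ($groups as $groupName => $options) {
        $next = [];
        foreach ($combinations as $combination) {
            foreach ($options as $option) {
                // Extend each existing combination with one option of this group.
                $next[] = $combination + [$groupName => $option];
            }
        }
        $combinations = $next;
    }
    return $combinations;
}

$combinations = cartesianProduct($optionGroups);

// A new group is just one more entry; nothing stored has to be rewritten.
$withSleeves = cartesianProduct($optionGroups + ['Sleeve' => ['Short', 'Long']]);

foreach ($combinations as $combination) {
    echo implode(' / ', $combination), "\n";
}
```

With the two groups above this lists the four colour/size pairs; adding the hypothetical Sleeve group doubles that to eight without touching any existing data.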

Related

PHP: MYSQL Row into Array

I have a little problem, hope you can help me... In my MySQL table, in a column named size, I have:
"XS"=>"11", "S"=>"22", "M"=>"33", "L"=>"44"
I would like to get that out in an array, so I can post it to my select.
$x = mysqli_query($mysql_link, "SELECT * FROM dagenshug_produkt WHERE varenummer = '$produktid'");
while($row = mysqli_fetch_assoc($x)) {
$sizearray = array($row['size']);
print_r($sizearray);
}
This returns:
Array ( [0] => "XS"=>"11", "S"=>"22", "M"=>"33", "L"=>"44" )
But I would like
Array ("XS"=>"11", "S"=>"22", "M"=>"33", "L"=>"44")
Anyone who can help me a little?
The new db design

I am posting this as an answer, as it would be too long for a comment.
I will delete it later, as it is not a real answer to your question.
You need to redesign your database. Please read about relational database design.
One (easy) possibility is to have one table field per size.
The downside of this design is that if, for example, one product introduces a new size called XMS, you're stuck and need to rewrite a lot of code.
This is where relational tables come into play. If you have an extra table that stores all the sizes per product, you're fine!
Such a sizes table could look like this:
product_sizes
----------------
id // unique id of this entry (int, auto_increment)
label // 'XS', ...
size // '22',...
product_id // the id of the product it refers to
But all these thoughts depend on your requirements, which we don't know.

You asked the wrong question. As raised in the comments, you need to redesign your database.
You don't necessarily have to develop a universal product-variations system (colors, sizes, materials, etc.).
Your products are in a table, and they all have a unique ID. Good. In another table, store your sizes:
id: integer auto increment (PK)
size_name: 2XS, XS, S, M, L, XL, XXL, 3XL
size_description (for information purposes in the back office)
product_type (or product category). Remember that XL is a shirt size, whereas 10 is a shoe size.
The pivot table, which materializes the relation between your products and the sizes:
product_id
size_id
This table may contain relation data, like an additional cost (sometimes large sizes are more expensive), etc.
To make it even better, you may store the size<->product type association in another table. Your back office will need it to restrict sizes to the appropriate products.
Anyway. Designing a database does not happen in phpMyAdmin. Never. There are specific tools for that, like MySQL Workbench, which will help you keep a full overview of your database.
You also need to know the difference between a table which represents an entity and a table which exists for technical reasons (sorry, my vocabulary is a little limited in this case). A pivot table is not an entity, even if it can carry data (for example: the date a user was added to a group).
It is very important to know these basics; they will help you build a strong, secure, efficient and fast database.
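The products / sizes / pivot layout described above could be sketched roughly as follows. This is only an illustration under a few assumptions: it uses an in-memory SQLite database through PDO so it runs without a server (it needs the pdo_sqlite extension; MySQL DDL would differ slightly, e.g. AUTO_INCREMENT), and the products.name and extra_cost columns are invented for the example.

```php
<?php
// Sketch only: in-memory SQLite via PDO, so no MySQL server is needed.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$db->exec('CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL)');
$db->exec('CREATE TABLE sizes (id INTEGER PRIMARY KEY, size_name TEXT NOT NULL)');
// Pivot table: one row per (product, size) pair, plus relation data.
$db->exec('CREATE TABLE product_sizes (
    product_id INTEGER NOT NULL REFERENCES products(id),
    size_id    INTEGER NOT NULL REFERENCES sizes(id),
    extra_cost INTEGER NOT NULL DEFAULT 0, -- hypothetical relation data
    PRIMARY KEY (product_id, size_id)
)');

$db->exec("INSERT INTO products (id, name) VALUES (1, 'T-shirt')");
$db->exec("INSERT INTO sizes (id, size_name) VALUES (1, 'XS'), (2, 'S'), (3, 'XL')");
$db->exec('INSERT INTO product_sizes (product_id, size_id, extra_cost) VALUES (1, 1, 0), (1, 3, 2)');

// All sizes of one product as size_name => extra_cost, ready for a <select>.
$stmt = $db->prepare(
    'SELECT s.size_name, ps.extra_cost
       FROM product_sizes ps
       JOIN sizes s ON s.id = ps.size_id
      WHERE ps.product_id = ?
      ORDER BY s.id'
);
$stmt->execute([1]);
$sizesForProduct = $stmt->fetchAll(PDO::FETCH_KEY_PAIR);

print_r($sizesForProduct);
```

A new size is then just a new row in sizes plus one row in product_sizes per product that offers it; no string has to be parsed and no column has to be added.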

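I hope this helps...
$x = mysqli_query($mysql_link, "SELECT * FROM dagenshug_produkt WHERE varenummer = '$produktid'");
$row = mysqli_fetch_assoc($x);
$a = array();
// The column holds text like "XS"=>"11", "S"=>"22", so split on commas first.
foreach (explode(",", $row['size']) as $n) {
    // Split each pair on "=>" and strip the surrounding spaces and quotes.
    $m = explode("=>", $n);
    $a[trim($m[0], " \"")] = trim($m[1], " \"");
}
print_r($a);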

PHP, MySQL, Efficient tag-driven search algorithm

I'm currently building a webshop. This shop allows users to filter products by category, and by a couple of optional, additional filters such as brand, color, etc.
At the moment, various properties are stored in different places, but I'd like to switch to a tag-based system. Ideally, my database should store tags with the following data:
product_id
tag_url_alias (unique)
tag_type (unique) (category, product_brand, product_color, etc.)
tag_value (not unique)
First objective
I would like to search for product_id's that are associated with anywhere between 1 and 5 particular tags. The tags are extracted from an SEO-friendly URL. So I will be retrieving a unique string (the tag_url_alias) for each tag, but I won't know the tag_type.
The search will be an intersection, so my search should return the product_id's that match all of the provided tags.
Second objective
Besides displaying the products that match the current filter, I would also like to display the product-count for other categories and filters which the user might supply.
For instance, my current search is for products that match the tags:
Shoe + Black + Adidas
Now, a visitor of the shop might be looking at the resulting products and wonder which black shoes other brands have to offer. So they might go to the "brand" filter and choose any of the other listed brands. Let's say they have 2 different options (in practice there will probably be many more), resulting in the following searches:
Shoe + Black + Nike > 103 results
Shoe + Black + K-swiss > 0 results
In this case, if they see the brand "K-swiss" listed as an available choice in their filter, their search will return 0 results.
This is obviously rather disappointing to the user... I'd much rather know in advance that switching the "brand" from "adidas" to "k-swiss" will return 0 results, and simply remove the entire option from the filter.
Same thing goes for categories, colors, etc.
In practice this would mean a single page view would not only return the filtered product list described in my primary objective, but potentially hundreds of similar yet different lists. One for each filter value that could replace another filter value, or be added to the existing filter values.
Capacity
I suspect my database will eventually contain:
between 250 and 1,000 unique tags
And it will contain:
between 10,000 and 100,000 unique products
Current Ideas
I did some Google searches and found the following article: http://www.pui.ch/phred/archives/2005/06/tagsystems-performance-tests.html
Judging by that article, running hundreds of queries to achieve the second objective is going to be a painfully slow route. The "toxy" example might work for my needs and might be acceptable for my first objective, but it would be unacceptably slow for the second objective.
I was thinking I might run individual queries that match 1 tag to its associated product_id's, cache those queries, and then calculate intersections on the results. But do I calculate these intersections in MySQL or in PHP? If I use MySQL, is there a particular way I should cache these individual queries, or is supplying the right indexes all I need?
I would imagine it's also quite possible to cache the intersections between two of these tag/product_id sets. The number of intersections would be limited by the fact that a tag_type can have only one particular value, but I'm not sure how to efficiently manage this type of caching. Again, I don't know if I should do this in MySQL or in PHP. And if I do this in MySQL, what would be the best way to store and combine this type of cached result?
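For reference, the PHP-side variant of this idea could look something like the sketch below. It assumes the per-tag product_id lists have already been fetched and cached as plain arrays keyed by tag_url_alias; the data and the function name productsMatchingAll are invented for illustration, and nothing here says whether this scales to 100,000 products.

```php
<?php
// Hypothetical cache: tag_url_alias => list of product_ids carrying that tag.
$productsByTag = [
    'shoe'    => [1, 2, 3, 4, 5],
    'black'   => [2, 3, 5, 8],
    'adidas'  => [3, 9],
    'nike'    => [2, 5, 7],
    'k-swiss' => [4, 6],
];

/** Products that carry every one of the given tags (set intersection). */
function productsMatchingAll(array $productsByTag, array $tags): array
{
    $result = null;
    foreach ($tags as $tag) {
        $ids = $productsByTag[$tag] ?? [];
        $result = $result === null
            ? $ids
            : array_values(array_intersect($result, $ids));
    }
    return $result ?? [];
}

// First objective: the product list for the current filter.
$current = productsMatchingAll($productsByTag, ['shoe', 'black', 'adidas']);

// Second objective: counts for alternative brands, dropping those with 0 results.
$base = productsMatchingAll($productsByTag, ['shoe', 'black']);
$brandCounts = [];
foreach (['adidas', 'nike', 'k-swiss'] as $brand) {
    $count = count(array_intersect($base, $productsByTag[$brand]));
    if ($count > 0) {
        $brandCounts[$brand] = $count;
    }
}

print_r($current);
print_r($brandCounts);
```

The intersection without the brand tag is computed once and reused for every brand, so the number of set operations grows with the number of filter values rather than with the number of full searches.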

Using the Sphinx search engine can do this magic for you. It is VERY fast, and it can even handle word forms, which can be useful with SEO requests.
In Sphinx terms: make a "product" document, index it by tags, choose a proper ranker for the query (e.g. MATCH_ALL_WORDS), and run a batch request with different tag combinations to get the best results.
Don't forget to use a cache such as memcached or similar.

I have not tested this yet, but it should be possible to satisfy your second objective with one query rather than triggering several hundred queries...
The query below illustrates how this should work in general.
The idea is to combine the different requests in one query, group by the value you want to vary (here the brand), and keep only the values that have any results.
SELECT t3.tag_value, count(*) FROM tagtable t1, tagtable t2, tagtable t3 WHERE
t1.product_id = t2.product_id AND
t2.product_id = t3.product_id AND
t1.tag_type='yourcategoryforShoe' AND t1.tag_value='Shoe' AND
t2.tag_type='product_color' AND t2.tag_value='Black' AND
t3.tag_type='product_brand'
GROUP BY t3.tag_value
HAVING count(*) > 0

What's the most elegant MySQL schema for products, options and categories?

I've worked with a dozen or so template systems (Zen Cart, Cube Cart, etc.). Each of these has its own bizarre way of structuring products, options and categories. All the add-on features result in a MacGyver'd stack of cards situation that makes working with the code a total drag.
So six years ago I built my own webstore engine, which has evolved over the years and become its own stack of cards. Now I'm doing a total overhaul on the engine. While no one engine will suit all webstore needs, I was wondering if the following model has any drawbacks, or if there's a better way to create a flexible, normalized, non-obnoxious database for commerce:
Notes:
option_types = colors, sizes, materials
options = red, white, blue, S, M, L, cotton, spandex, leather
Other than basic stuff omitted on purpose (position, active, etc.), anyone see a way to improve this?

Here are my notes/opinions on this. You're missing cardinalities, but I'll do my best to guess them.
Categories is ok.
Remove id from item_categories as you're not using it. Create a composite primary key on category_id and item_id.
"Giving each record a unique id is smarter in many ways: faster to look up on one field than on two, safer to delete, etc."
What lookup would you do on that id? The queries you'll run are "get all categories for an item" and "get all items for a category". I don't understand why it would be safer to delete. If anything, adding an id might make inserts less safe, as you could end up with different ids but the same category_id and item_id pair. You would have to add a constraint to make sure the pairs are unique (and isn't that what PKs are for?).
items is ok... (see comments below)
Remove id from item_options (same case as above and see comments below)
option_types is ok
Now, I think the way items and options are related will require more thought. It seems to be a many-to-many relationship. As an item such as a T-Shirt can have many sizes, it makes sense for each item/option pair to represent a different size. But what happens when, apart from the size, you also have different materials, such as cotton and leather? You will need information on the pairs cotton-S, cotton-M, cotton-L, leather-S, leather-M and leather-L. This makes sense, as I'm pretty sure all of them will have a different price and weight. But now let's add 2 colors to our T-Shirts. You'll have to add a price and weight for each of the 12 combinations we now have.
Not to mention that a user who wants to see the price of an item will have to choose all the options before reaching a price. I'm not sure how this should be done, as I'm not aware of the requirements. I'm just throwing out an idea: you could apply price and weight variations on top of a base price and weight that would be part of the item.
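A rough sketch of that base-plus-variations idea, with invented numbers and array shapes (prices in cents, weights in grams; none of this comes from the schema in the question):

```php
<?php
// Hypothetical data: base values live on the item, each option stores only a delta.
$item = ['name' => 'T-Shirt', 'base_price' => 1000, 'base_weight' => 200];
$optionDeltas = [
    'S'       => ['price' => 0,    'weight' => 0],
    'L'       => ['price' => 100,  'weight' => 20],
    'cotton'  => ['price' => 0,    'weight' => 0],
    'leather' => ['price' => 2500, 'weight' => 150],
];

/** Price and weight of an item for a chosen set of options. */
function priceAndWeight(array $item, array $optionDeltas, array $chosen): array
{
    $price = $item['base_price'];
    $weight = $item['base_weight'];
    foreach ($chosen as $option) {
        $price += $optionDeltas[$option]['price'];
        $weight += $optionDeltas[$option]['weight'];
    }
    return ['price' => $price, 'weight' => $weight];
}

$leatherL = priceAndWeight($item, $optionDeltas, ['leather', 'L']);
print_r($leatherL);
```

With this layout the number of stored rows grows with the number of options rather than with the number of combinations, at the cost of not being able to price one specific combination differently without an extra override mechanism.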
Just some unprocessed thoughts before going to sleep:
option_types could be some kind of hierarchy
Carefully think of how you would handle stock given that design. You'll have 10 items for a Black T-Shirt... but how many items will you have for a Black Leather T-Shirt? How is that number related to the 10 original ones?

In the options table I would add a value under the name, i.e.
Black L
Black M
Black S
Blue L
Blue M
Blue S
etc.
as a spin off to Mosty's idea.

php mysql - should i add the field "category-name" to a table or not?

What do you think would be the better way, performance-wise, to get the category names in a news system:
1. Add an extra field for the cat-names to a table that already contains a field for the cat-ids.
2. No extra field for the cat-names, only the cat-ids: read the cat-names (a comma-separated string: "cat1,cat2,cat3,cat4") into the PHP file from an existing config file, then build the cat-names from the db field "cat-ids" with an array and a for-loop?
Thanks in advance,
Jayden
edit: cant seem to add a "hi" or "hallo" on top of the post, the editor just deletes it...

If you are measuring milliseconds and the disk IO of your system is not extremely slow, then option 2 would yield better performance. But we are talking about a negligible gain in execution time. Since you will already be querying the DB to get the news item, it would be efficient to just get the category name at the same time. I would add a mapping table of category-name-id to category-names, and then join on that when getting news items.
From a flexibility standpoint, and from the standpoint of eliminating as many sources of error as possible, I would also go with the idea above, since it adds flexibility to your system and keeps all your data in one spot. Changing the name of a category would require editing one column in the database instead of editing a PHP config file or, if option 1 was used, updating each and every news record.
So my best advice: add a table with category-name-id to category-names mappings, and then have the news items contain the id of the category they belong to.
For performance you could then cache the data you retrieve about existing categories, and other data, so you don't have to poll the DB for that information all the time.
For instance, instead of joining at all, you could get all the categories from the category table I described above, cache them in the application, and only fetch them again once the cache is invalidated, i.e. when a timeout occurs or the data in the DB is changed.
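As a minimal sketch of that caching idea: the class below keeps the category list in memory until a timeout passes or invalidate() is called. The class name, the loader callback and the sample categories are all hypothetical; a real loader would run the SELECT against the category table.

```php
<?php
// Hypothetical cache with a timeout; $loader stands in for the DB query.
final class CategoryCache
{
    private $loader;
    private $ttlSeconds;
    private $categories = null;
    private $loadedAt = 0;

    public function __construct(callable $loader, int $ttlSeconds = 300)
    {
        $this->loader = $loader;
        $this->ttlSeconds = $ttlSeconds;
    }

    /** Return id => name, reloading only when empty or timed out. */
    public function all(): array
    {
        $expired = (time() - $this->loadedAt) >= $this->ttlSeconds;
        if ($this->categories === null || $expired) {
            $this->categories = ($this->loader)();
            $this->loadedAt = time();
        }
        return $this->categories;
    }

    /** Call this whenever category data in the DB is changed. */
    public function invalidate(): void
    {
        $this->categories = null;
    }
}

$queries = 0;
$cache = new CategoryCache(function () use (&$queries) {
    $queries++; // a real loader would query the category table here
    return [1 => 'Weather', 2 => 'Local', 3 => 'Sports'];
});

$first = $cache->all();
$second = $cache->all();  // served from memory, loader not called again
$cache->invalidate();     // e.g. after a category was renamed
$third = $cache->all();   // loader runs again

echo $queries, "\n";
```

Note that in a typical PHP setup each request starts with fresh memory, so a cache that should survive across requests would have to live in something like APCu or memcached; the invalidation logic stays the same.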

I can think of two possible ways.
Have a category table, an articles table and a relationship table, and have a many-to-many relationship between categories and articles (as described in the relationship table).
If you feel smart today, give each category a distinct power of two (1, 2, 4, 8, 16, etc.) and store their sum in a field on the articles table. If an article has a category value of 11, it has categories 1+2+8.
I like the first solution better, quite frankly.
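For completeness, the second (bit flag) variant works roughly like this; the constant names are invented for the example:

```php
<?php
// Hypothetical category flags; each one must be a distinct power of two.
define('CAT_WEATHER', 1);
define('CAT_LOCAL', 2);
define('CAT_SPORTS', 4);
define('CAT_POLITICS', 8);

// Value stored in the article row: 1 + 2 + 8 = 11.
$articleCategories = CAT_WEATHER | CAT_LOCAL | CAT_POLITICS;

/** True if the given flag is set in the stored value. */
function hasCategory(int $mask, int $flag): bool
{
    return ($mask & $flag) === $flag;
}

var_dump($articleCategories);                          // int(11)
var_dump(hasCategory($articleCategories, CAT_LOCAL));  // bool(true)
var_dump(hasCategory($articleCategories, CAT_SPORTS)); // bool(false)
```

The number of categories is limited by the width of the integer column, and a query for "all articles in category X" needs a bitwise test on every row, which is part of why the relationship table tends to be the more comfortable choice.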

I would create a categories table like this:
Categories
-----------
category_id  name
-------------------------
1            Weather
2            Local
3            Sports
Then create a junction table, so each article can have 0 or more categories:
Article_Categories
-------------------
article_id  category_id
-----------------------------
1           2
1           3
2           1
To get the articles with their categories (comma delimited) from the MySQL server, you can use GROUP_CONCAT():
SELECT a.*, GROUP_CONCAT(c.name) AS cats
FROM Articles a
LEFT JOIN Article_Categories ac
ON ac.article_id = a.article_id
LEFT JOIN Categories c
ON c.category_id = ac.category_id
GROUP BY a.article_id

Add an additional table; that will save you lots of issues in the future. It is simply the recommended way.
By the way, don't go down the route of storing multiple ids in one field. It leads to a lot of code and issues that are totally unnecessary. If you really do run into performance issues, you can always take a step further and denormalize or cache some of the data. There are lots of caching options available.

I think your first option is the suitable one, because it makes sense given the relationships in your data. And when you want to display the category name with your news, you can simply get everything with a single SELECT query with a join.
So I recommend option 1 that you mentioned.
Performance can also be measured in two ways: execution performance and development performance. I feel both are in good shape with your option 1. You don't need to do much, just one query. If you go for option 2, you have to load from the config file, explode it on the comma, then search using array elements, which is time-consuming.

I may be wrong, but since you already query the database, it's probably faster if you add a name field there.
Please also take into account that having the name in the same table as the ID provides consistency: with a config file you have to add a new category both there and in the table.
Also think of possible errors that may put wrong data into your config file; if that were the case, your category names might get messed up.

Storing table references in text field (for Agile Toolkit)

I have been experimenting with RDB design lately, and I was wondering about storing items in a field that can have more than one value:
CARS            Color_avail
1 corvette      1, 2, 3   <<<<<<<
2 ferrari       2
3 civic         1
COLORS
1 red
2 White
3 black
So in CRUD I would like to add more than one item via a drop-down, checkboxes, or something else that can hold multiple values.

I can see the benefit of displaying the output like this in a form, but do you really want to store it like this in the database?
For example, with a data model that holds a comma-separated list as in your example, what SQL would you use to identify all the cars available in white?
The traditional way to hold a many-to-many relationship like this is to use an additional table, e.g. a separate table CAR_COLOUR with the following contents:
CAR  COLOUR
1    1
1    2
1    3
2    2
3    1
So now you can easily query things like a list of all cars and colours:
SELECT CA.CAR, COL.COLOUR
FROM CARS CA,
COLOUR COL,
CAR_COLOUR CACOL
WHERE CA.CAR=CACOL.CAR
AND CACOL.COLOUR=COL.COLOUR
Or, if you just want the white cars, add the following to the WHERE clause:
AND COL.COLOUR='White'
An index on the id fields and on both fields in CAR_COLOUR means you get great performance even with thousands of rows. Putting the ids in a comma-separated list in a single field means you have to use SUBSTR or LIKE, which prevents the use of indexes, so performance degrades rapidly as the amount of data grows.

Storing relations in a comma-separated list makes sense in some cases. You don't need commas though. There are two existing controls which can help you with that.
Displaying a list of values with checkboxes in a form:
$form->addField('CheckboxList','corvette')->setValueList($array);
(You can populate the array through $model->getRows(), although I think it needs to be associative. You can probably build it with a foreach loop.)
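Turning a list of rows into an associative id => name array is plain PHP and could look like the sketch below. The shape of $rows is an assumption made for this example (a list of arrays with id and name keys); check what getRows() actually returns for your model before relying on it.

```php
<?php
// Hypothetical rows; the real shape returned by the model may differ.
$rows = [
    ['id' => 1, 'name' => 'red'],
    ['id' => 2, 'name' => 'White'],
    ['id' => 3, 'name' => 'black'],
];

// id => name map built with a built-in function.
$valueList = array_column($rows, 'name', 'id');

// Equivalent foreach version.
$sameList = [];
foreach ($rows as $row) {
    $sameList[$row['id']] = $row['name'];
}

print_r($valueList);
```

Either array can then be handed to a value list such as the setValueList($array) call shown above.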
Your other option is to use a hidden field with a selectable grid.
$field = $form->addField('line','selection');
$grid = $form->add('MVCGrid');
$grid->setModel('Colors',array('name'));
$grid->addSelectable($field);
$form->addSubmit();
To hide the actual field, you can either use "hidden" instead of "line" or use JavaScript to hide it:
$field->js(true)->hide();
or
$field->js(true)->closest('dl')->hide();
if you need to hide markup around the field too.
