peppy — 2013-12-06T17:56:09-05:00 — #1
I have an ecommerce site where a member can add a product for sale. I would like them to have the option of adding product variants such as item sizes, colors, etc. Each variant combination will have it's own item quantity and price.
In addition, I would like this member to have the ability to create their own custom variants (custom product types like materials, finish type, pattern, etc.).
I have a main products table called Products (id,title,quantity,price). My plan is to make a second table called Variants, where I can join it to Products based on the item id.
How should I set up the columns for the Variants table to make the site most efficient? What is the best way to design a database for keeping track of Variants?
r937 — 2013-12-07T16:30:41-05:00 — #2
okay, here's a suggestion, just to stimulate your thinking process (not necessarily the best solution)
you have a product table, with many columns that describe/define all of a product's attributes
what if each variant was a separate product?
i'm serious... think it through and evaluate the pros and cons
one advantage is simplicity, yes?
well, for one thing, how would you know which ones are variants?
peppy — 2013-12-07T17:41:08-05:00 — #3
Thanks for the response. I've been thinking about something sort of like this.
I was planning on producing one product in the products table with a unique product id. That way I can create just one page for all of the variant products.
Then in the Variants table, use the same product for a number of different variants. (Also, I plan on limiting the number of variations to a maximum of 5 types). All of these would be displayed on the 1 product page.
The variants table could look like this for a T-shirt (with Product ID 323 in the Products table):
ID | productID | name1 | value1 | name2 | value2 | name3 | value3 | name4 | value4 | name5 | value5 | quantity | price |
1 | 323 | Color | Black | Size | XL | | | | | | | 3 | 10.99 |
2 | 323 | Color | Black | Size | XXL | | | | | | | 2 | 11.99 |
3 | 323 | Color | Red | Size | XL | | | | | | | 4 | 10.99 |
4 | 323 | Color | Red | Size | XXL | | | | | | | 1 | 11.99 |
Let's say there could be 50 rows for all different combinations of colors and sizes. This would be sort of like a secondary products table, which I could join on my main products table.
What do you think of this table and technique? I would like to plan on making this technique scalable so that it will still be efficient after millions of products.
lemon_juice — 2013-12-07T17:57:40-05:00 — #4
One disadvantage could be duplication of data. If there are many product variants and long descriptions, which are the same for all variants, then the amount of data stored in the product table gets multiplied. Probably it will never become so huge as to be unmanageable but for certain scenarios it might matter - for example, a simple search engine that uses SQL to find a phrase in product descriptions - since indexes can't really be used (apart from the awkward FULLTEXT) then size matters for performance. Of course, this can be worked around by caches, etc. but we are after simplicity here...
I'm not saying this is a bad idea. I have made a few online shops and I've always used a separate variant table and linked it to the main product table by the product ID. So each variant would actually have an ID composed of the product ID and variant ID - if there was ever a need for an ID like exporting to external systems, etc. I thought it was the best solution in terms of data normalization and storage optimization, and it also seemed more logical than a single table. I've never used one table for both so I can't really compare how it would work out in practice.
Another solution could be a hybrid approach using one table where each record can be either a product or a variant of a product and have parent ID column that would be set for variants effectively building a hierarchical structure of products, similar to what we usually do with tables for storing nested categories or menus. Then the common fields like description could be set to NULL for the variants to prevent data duplication. I've never done this in practice but I might actually go for it if I were to build a product database again
r937 — 2013-12-07T18:12:04-05:00 — #5
this is actually where i was headed, since in my experience all these products will have a different SKU
however, i would do it without the "parent" concept -- for example, if you have a shirt in three colours, which one is the parent?
i've seen this done with a brand code -- all products in a brand (even if there's only one) have the same brand code, and each has a different SKU or id
personally, i would be hesitant to use NULL in some descriptions and not others -- you mentioned the idea of a search, well, you would want the search to return all variant descriptions, not just the non-NULL one, since the descriptions will likely ~not~ all be the same, but rather, reflect the difference in variants e.g. description: men's red polo shirt, colour: red, etc.
r937 — 2013-12-07T18:17:45-05:00 — #6
no no no
this is called EAV (entity-attribute-value) and it's a real dog's breakfast
do a search here on sitepoint or in your favourite web search engine to see why
r937 — 2013-12-07T18:21:04-05:00 — #7
one final thought...
you mentioned this was going to be used on a web site where users can enter their own products
please, please do not underestimate the likelihood that users will become frustrated or confused by your attempts to explain and/or program the difference between a product that has variants and a product that doesn't
just give them one form to fill out, and when they've added a product, leave the form fields filled in, so that, for example, they can simply change the colour field on the shirt and hit "add product" again
peppy — 2013-12-07T18:57:12-05:00 — #8
Thanks for the responses.
Instead of the EAV model, which would be best to use when the database grows to millions of products and variants (10+ Gigabytes)?
Also, I got a great plan for managing the front end design for the users and for allowing users to build a variants chart. Also, I will probably just store one Brand name in the products table to cover all variants, and the multiple SKU numbers for each variant in the variants table.
lemon_juice — 2013-12-08T09:31:27-05:00 — #9
If there are three shirts in different colours then none of them is the parent. There is a separate row for the parent with a generic name like "Indian cotton shirt" and the three rows for the colour variants will have names specific to that colour version. Then for consistency sake you might have the parent and child rows even for products with one variant. If the parent ID to the same table looks a bit messy we could split the variants to another table and then the db structure looks pretty simple - if you want to list products without their variants you use the product table, if you want to list products with all their variants you just get records from the variant table (and then if necessary, getting some related general info from the product table is a matter of a simple JOIN).
Okay, so I see now that you were thinking about a slightly different scenario where each variant would need a different description - in such cases I would do the structure like you suggested. However, in my particular experience this was not the case - the site owners wanted to have a common description (and optionally a common set of specification) for all variants of a product. These are usually long pieces of texts and they wouldn't even want to update them separately for all variants. So each variant would have a short name (either full variant name or just a partial name appended to the product name for display), price, SKU and maybe photos, while the rest of the properties are inherited from the product row. For such scenarios having one flat table for all products and variants would result in wasting a lot of space due to many duplicated descriptions. So I think it all comes down to requirements.
r937 — 2013-12-08T12:46:12-05:00 — #10
so you've basically doubled the number of rows
i don't like it
lemon_juice — 2013-12-08T16:13:58-05:00 — #11
Why not? The parent row and child row contain different kind of data so what is wrong in spitting them into separate rows? I prefer to double the number of rows than to make my database 2x, 3x, 4x, 5x, and so on (depending the average number of variants) larger because the contents of the description (MEDIUM)TEXT columns have to be duplicated across variants. In my experience product descriptions take an order of magnitude more space than all other product data combined (excluding images).