Noob question, possible to have variable number of fields in a record?

PrimeLens · August 14, 2012, 5:20pm

Hi, I’m a mysql noob, but I’m in a position for a project where I’m hiring devs and making strategic decisions on storing data so I would love any help with this question.

The data I need to store will have a varying number of key fields per record.
Some example key / value pairs, I’m assigning 100 for each value as the problem lies with the keys

IDdata 1 : dataA = 100 , dataB = 100 , dataC = 100
IDdata 2 : dataA = 100 , dataB = 100 , dataC = 100 , dataC1 = 100 , dataC2 = 100 , dataD = 100
IDdata 3 : dataA = 100 , dataA1 = 100 , dataA1a = 100 , dataA1b = 100 , dataA2 = 100 , dataB1 = 100
IDdata 4 : dataA = 100 , dataB = 100 , dataC = 100 , dataD = 100 , dataD1 = 100 , dataD1a = 100

as you can see just to store the above I would need a table really wide where the column headers cover every single possible variation of key.

I’m guessing that mysql tables must specify the number of columns when they are created right?

The solution I am considering is to have a table with just two columns
IDdata and data string

Then I could store the key value pairs concatenated in a string with fields separated by colons and semi colons or commas or something.

When retrieving a record I would instead retrieve the string and parse it. That way if a record went four levels deep on its A group of data and another record only went one level deep but went up to F number of groups it would be cope.

Please tell me if I am going at this right or wrong with regards to mysql

r937 · August 14, 2012, 7:00pm

every relational database, not just mysql

key/value pairs are best stored as key/value pairs (i.e. one pair per row) instead of as a single string mishmash that has to be parsed to be searched

but key/value pair schemas are ~notoriously~ difficult to pull meaningful data from – do a search for EAV (entity-attribute-value)

if at all feasible, you should anticipate all possible keys and declare the table with one column per key, using NULL for the values that do not pertain to a given row

PrimeLens · August 14, 2012, 7:10pm

thanks for answering

I would like to do that but the nature of this data is that number of fields per record can vary.
There would be roughly 100+ fields per record spread out over a range of approx 1000 (best guess) possible keys
No matter what column width I choose there is always the possibility that a record can come along with one field too many.

What do you think of my solution to have just two fields and parse the string?
Are there other solutions that you can think of?

r937 · August 14, 2012, 8:29pm

okay, a predefined table is infeasible

i believe i already commented on it

it won’t scale because every search requires a complete table scan

EAV

PrimeLens · August 14, 2012, 8:45pm

Thanks !

kduv · August 14, 2012, 11:48pm

You may want to consider a non-relational DB like MongoDB or similar. Many of them are schema-less and can have a variable amount of “columns” per record.

oddz · August 15, 2012, 6:47am

In regards to relational databases EAV would be the *best option given the circumstance. Depending on what needs to be done with the values it *might be best to break out the value column into multiple columns for different data types. So when the value represents a date it can properly be stored as a date at least, same with integers. Storing everything as a string can lead to big problems down the line. This is especially true when it comes to supporting virtual foreign keys, where a value references another table by an surrogate primary key. This really sucks but it sucks even more when you need to cast strings to integers when making a comparison against two tables in a join. How I would set it up would be entirely dependent on the data types which need to be supported. If it is only integer values you are dealing with than I would probably go the route of the value column being an integer type. However, if the data type could be best stored as either a string, integer, date, etc depending on context I would go the route of multiple value columns with proper data types defined.

wwb_99 · August 15, 2012, 11:44am

EAV is never a good option. Seriously.

r937 · August 15, 2012, 1:51pm

so your suggestion for this scenario is then… ?

wwb_99 · August 15, 2012, 2:08pm

Hard to understand what the problem really is - IDdata.Col1 isn’t exactly descriptive. Would help to know how this thing is getting queried, etc.

That said, a few surface options would be:

Actually fit it into a relational model – again, no idea what the actual problem here is. But that might be workable.
MongoDb or better yet CouchDb. Let documents be documents.
Doing some sort of hybrid approach with a set of standard metadata columns and then a serialized XML / Json payload column.

Jeff_Mott · August 15, 2012, 2:17pm

This sounds like a textbook many-to-one relationship. The way you handle this is by creating two tables. The first is the IDdata table:

[FONT=Courier New]IDdata

| id | some_other_data |
------------------------[/FONT]

The second table could be called IDdata_attributes, or IDdata_properties, or whatever works for your application’s nomenclature.

[FONT=Courier New]IDdata_attributes

| IDdata_id | attribute | value |

^ -------- PK --------- ^[1]

The IDdata_id field relates an attribute-value back to a specific IDdata record, which lets you have multiple attribute-values related to the same IDdata.

EDIT: To further illustrate for the OP, here’s how your data would look.

IDdata
------------------------
| id | some_other_data |
------------------------
| 1  | IDdata 1        |
| 2  | IDdata 2        |
| 3  | IDdata 3        |
| 4  | IDdata 4        |
------------------------

IDdata_attributes
---------------------------------
| IDdata_id | attribute | value |
---------------------------------
| 1         | dataA     | 100   |
| 1         | dataB     | 100   |
| 1         | dataC     | 100   |
| 2         | dataA     | 100   |
| 2         | dataB     | 100   |
| 2         | dataC     | 100   |
| 2         | dataC1    | 100   |
| 2         | dataC2    | 100   |
| 2         | dataD     | 100   |
| 3         | dataA     | 100   |
| 3         | dataA1    | 100   |
| 3         | dataA1a   | 100   |
| 3         | dataA1b   | 100   |
| 3         | dataA2    | 100   |
| 3         | dataB1    | 100   |
| 4         | dataA     | 100   |
| 4         | dataB     | 100   |
| 4         | dataC     | 100   |
| 4         | dataD     | 100   |
| 4         | dataD1    | 100   |
| 4         | dataD1a   | 100   |
---------------------------------

/FONT ↩︎

r937 · August 15, 2012, 4:33pm

jeff, great illustration of the EAV scheme, thanks

Jeff_Mott · August 15, 2012, 5:10pm

Sure thing. More than anything else, it just seemed like the OP needed a quick into to many-to-one.

PrimeLens · August 25, 2012, 4:10am

Just revisiting this thread and saw the most recent posts - Jeff, thats a great solution.