Need some advice on reorganizing this database

cydewaze · November 4, 2013, 6:51pm

My office just received a database from outside the office that I’m going to have to build a web tool around. The tool will be fairly simple - you use a web form to specify criteria (using checkboxes) and when you submit the form you get back a list of matching resources.

There are four categories of criteria, and each criteria contains a narrowing set of values (ex areas would be: state then country then city then block, etc)

The problem is that the people who set up this database made a separate table for each criteria, and as such the resources are duplicated. What’s worse is that one of the criteria is just a column in all tables, so you can even have resources repeated in the same table.

Example rows from the areas table.

[table=“width: 500”]
[tr]
[td]resource[/td]
[td]topic[/td]
[td]state[/td]
[td]county[/td]
[td]city[/td]
[td]block[/td]
[/tr]
[tr]
[td]resource 1[/td]
[td]topic 1[/td]
[td]Y[/td]
[td]N[/td]
[td]N[/td]
[td]N[/td]
[/tr]
[tr]
[td]resource 1[/td]
[td]topic 2[/td]
[td]Y[/td]
[td]Y[/td]
[td]N[/td]
[td]N[/td]
[/tr]
[/table]

etc.

Normally I would put the resources in a table with resource name and ID (pk), then a table for topics, a table for criteria, etc, then make relational tables to do a many to many relationship, but I’m having trouble getting my head around things. For one, resource 1 can have different area values depending on what topic is selected. Same for the other criteria. Just looking for the best way to organize it.

DarthGuido · November 5, 2013, 8:11am

The problem is that the people who set up this database made a separate table for each criteria

I don’t quite understand this part. In the data you posted it looks like it’s all in 1 table? Can you give an example?

cydewaze · November 5, 2013, 1:41pm

Yep sure can (I’m always horrible at explaining these things).

The tool works like this:

First you select an area of interest (known as “topics” in the database). Examples are: housing, land use, public health, safety.

Once you do that it returns a list of resources in the chosen topic(s). You can then refine your search with any one of three other set of checkboxes.

Area (state, county, city, block, etc)
Setting (rural, suburban, urban, etc)
Mode (car, bus, rail, walking, etc)

The current database has separate tables for each one of those options (area, setting, mode) and the resources (most but not all) are duplicated in each of those tables. My example table in the OP is from the Areas table but the other two look the same. And the first set of checkboxes (the topics) are just a column that appears in all tables. It’s like someone just imported 3 excel spreadsheets and decided to make a database out of them.

Here’s how the three tables are set up:

Area
resource_name (text)
topic (text)
state (y,n)
county (y,n)
city (y,n)
town (y,n)
block (y,n)

Mode
resource_name (text)
topic (text)
automobile (y,n)
bus (y,n)
rail (y,n)
bicycle (y,n)
pedestrian (y,n)

Setting
resource_name (text)
topic (text)
rural (y,n)
suburban (y,n)
urban (y,n)
downtown (y,n)

I’ve moved the topic names to their own table to give them their own ID numbers, and I made a resources table to do the same, and so that each resource is only listed once (rather than once in each table). I’ve also created a relational table to tie each resource to their appropriate topics (many-to-many, as each resource can appear in more than one topic). I started to do the same for the areas, settings, and modes, but the problem is that, for example, Resource 5 when in Topic 1 will have a Y for “city”, but Resource 5 in Topic 2 will have a N for “city”. It’s quite confusing I know, which is why I’m here.

DarthGuido · November 5, 2013, 3:33pm

If you get your data from an external provider, and only have to show it, and not maintain (edit, delete, add) anything, I think these tables should do the job:

topics table (contains only topic name)
resources table (contains only resource name)
ASM table (contains only area/settings/mode) (area-settings-mode, couldn’t come up with a good entity name )
ASM-options table (contains area/settings/mode and the option name)
and finally the table that binds them all, with topic name, resource name, area/settings/mode, option name and option value (Y/N)

The first four tables are used to populate your dropdowns/checkboxes (you could do that with a select distinct on the last table as well though).
The last to get the resources that respect the selected criteria.

No need for id numbers, since you won’t be changing any resource, topic or other names anyway.

You could simply create three tables just like the data you received, and do all select queries on those 3 tables, but that would mean you’d have to change your queries and application every time they decide to add/remove/change an Y/N option column.

cydewaze · November 5, 2013, 3:53pm

I’m not really clear on this part. You mean one table for all the area, settings, and mode data? Do you mean like a parent/child table where area, settings, and mode are parents, and the sub-settings have a parent id that links them to their parents?

So like resource id, topic id, and the area/setting/mode id in the same row?

Well one of the reasons I’m changing this db is to make future changes easier to implement without touching the web app.

DarthGuido · November 5, 2013, 4:33pm

ASM table (contains only area/settings/mode)
A table (like the topics and resources tables) that only contains the names of whatever-you-call-this-entity. So right now it would contain only three rows:

[B]ASM[/B]
Area
Settings
Mode

ASM-options table
I didn’t add an Options table, because the options differ for each ASM, so if you choose Area, you wouldn’t be able to choose the Settings and Mode options, or at least that wouldn’t make much sense. So this table would contain a row for each ASM/option pair:

[B]ASM    Option[/B]
Area   State
Area   County
etc...

And you would retrieve the options for the chosen ASM.

So like resource id, topic id, and the area/setting/mode id in the same row?

Yes. But not numeric id’s, the plain names:

[B]Topic    Resource     ASM    Option   Optionvalue[/B]
Topic 1  Resource 1   Area   State    Y
Topic 2  Resource 1   Area   State    N
etc...

cydewaze · November 5, 2013, 5:04pm

Oh my! Hmmmmmm. I’ll have to get my head around THAT now!

That final table makes sense but I’m trying to figure out why I need the ASM table if I have ASM-options. Just to populate dropdowns? Seems like I could have a column in ASM-options to flag the “top level” entries just as easily.

DarthGuido · November 5, 2013, 6:25pm

An extra column? You could do a SELECT DISTINCT ASM FROM ASM
But then, like I said before, you could do that on the final table as well, for topics, ASM, options and resources, getting rid of all the other tables.

cydewaze · November 5, 2013, 8:26pm

Thanks guido, you’ve given me a lot to think about.

cydewaze · November 6, 2013, 12:53pm

I spent some time thinking about this, and I think I have my head a bit farther around it.

Let’s take Resource 1 for an example, and let’s also say we’re limiting out search to Topic 3. We’d then have this:


[B]Topic    Resource     ASM    Option   Optionvalue[/B]
Topic 3  Resource 1   Area     State     Y
Topic 3  Resource 1   Mode     Car       Y
Topic 3  Resource 1   Setting  Rural     Y

One row for each possible “Y” value, correct?

Then in the query I’d have:


SELECT......

WHERE ( asm-options.asm = 'Area'
AND asm-options.option = 'State'
AND asm-options.optionvalue = 'Y' )

I foresee a collosal WHERE clause down the road in this (although I suppose I’d just use IN with a list of values)

r937 · November 6, 2013, 3:07pm

collosal, complex, and inefficient

google EAV (entity-attribute-value)

cydewaze · November 6, 2013, 4:40pm

Ok, just watched a video about, and it makes sense as far as storing the data (because the values that are columns in the current database would be rows in EAV) but I’m not sure how I’d retrieve records using that.setup. Plus, what if you have one option in the options column that is numeric and one that is text? You’d have to use a varchar data type for everything.

Another disturbing thing I noticed (about the existing database, not the EAV) is that not every resource exists in all three of the old tables. It’s such a wreck.

cydewaze · November 6, 2013, 6:14pm

After thinking about this some more, I came up with an idea, and thus the following table structure:

Resources (id, name)
Topics (id, name)
Criteria (id, parent, child)
RTC (id, resource id, topic id, criteria id)

The Resources and Topics tables are (should be) self-explanatory.

The Criteria table is something like this:


[B]id  parent      child[/B]
1   mode        car
2   mode        rail
3   mode        bus
4   setting     rural
5   setting     urban
6   setting     suburban
etc...

Then the RTC table would tie the resource id, topic id, and criteria id together. I can build my initial set of checkboxes from the topics table, and the subsequent ones out of the criteria table.

Does anyone think this’ll work?

Edit: Rudy will probably scold me for not making a proper parent/child category table out of criteria, which if course I would do in the finished product

cydewaze · November 7, 2013, 12:57am

It turns out that my above solution won’t work at all, of course. In the existing web app (that’s going away and which I have to recreate) when you choose additional options you actually get fewer results. For any given resource, if any of the chosen options is N in the database, that resources is hidden, so there are countless possible options.

I clearly need to think of something else.

cydewaze · November 8, 2013, 12:05am

Alright, got this to work with the table structure two posts up.

Resources (id, name)
Topics (id, name)
Criteria (id, parent, child)
RTC (id, resource id, topic id, criteria id)

The reason it threw me off was because of the way the existing tool (the one mine is replacing) works. In the existing tool, when you select more than one criteria, rather that show you all resources related to either of those criteria, it shows you only resources that belong to both. It’s like an AND instead of an OR.

I’m finding it a little tough to replicate this behavior in a query, so I’ve done it in my code. But I’m not going to spend too much time on it until I know whether or not the owners of this application even want it to work this way.

But for now, I have a database that is organized much better than the original one.