Critique database structure (Has a way to go)

So I’ve done a lot with databases but haven’t ever really worked on a website where efficiency has to be managed well due to a large amount of traffic. The idea is pretty simple when you get to reading the structure, and I’m willing to answer any questions about any tables.

In my previous database designs I found myself incorporating a lot of things that really should’ve had their own tables into other tables. For example I had an articles table that had a tags field with a comma-seperated string for their tags, and no definition of tags in their own table. Which I came to pay for in the long run when I wanted to make the tags part of the site more in-depth I couldn’t keep putting massive strain on the database. So this design I tried to keep data seperate from one another but included what I thought was necessary, there are comments at the end of each table to indicate their primary function.

My biggest concerns are indexes, foreign keys, and MyISAM vs InnoDB, I’ve been doing a lot of reading on how one is better for a lot more select queries, and another is better for more write transactions. On the premise of the site I would have to say the amount of viewing (select queries) to writing(update/inserts) would be 2:1 perhaps, so twice as many selects to writes.

However I believe only InnoDB supports foreign keys, so if anyone seems that they are required then that will settle that component of it.

I appreciate you guys taking the time to look at this lengthy post and table structure and giving feedback.


CREATE TABLE `theaters` (
`theaterID` INT( 11 ) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`companyID` INT( 11 ) NULL ,
`name` VARCHAR( 100 ) NOT NULL ,
`description` VARCHAR( 255 ) NULL,
`mainImage` VARCHAR( 150) NULL,
`address` VARCHAR( 100 ) NOT NULL ,
`city` VARCHAR( 50 ) NOT NULL ,
`state` VARCHAR( 2 ) NOT NULL ,
`phone` INT( 11 ) NULL COMMENT  'Raw digits of a phone number, dashes can be added through PHP.'
) ENGINE = INNODB 
COMMENT =  'Sole purpose is to hold the main attributes of a movie theater.';

CREATE TABLE `companies` (
`companyID` INT( 11 ) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`name` VARCHAR( 50 ) NOT NULL ,
`description` TEXT NULL
) ENGINE = INNODB
COMMENT =  'Used to store each company''s name and ID for reference of theaters.';

CREATE TABLE `theater_company` (
`theaterID` INT( 11 ) NOT NULL ,
`companyID` INT( 11 ) NOT NULL
) ENGINE = INNODB 
COMMENT =  'Stores the relationship of theaters to companies. (1 to Many)';

CREATE TABLE `theater_pricing` (
`theaterID` INT( 11 ) NOT NULL ,
`adult` VARCHAR( 5 ) NULL ,
`senior` VARCHAR( 5 ) NULL ,
`child` VARCHAR( 5 ) NULL ,
`3d` VARCHAR( 5 ) NULL ,
`matinee` VARCHAR( 5 ) NULL
) ENGINE = INNODB 
COMMENT =  'Prices will be stored in a 10.75 format ($10.75).';

CREATE TABLE `theater_reviews` (
`reviewID` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`theaterID` INT( 11 ) NOT NULL ,
`overall` FLOAT NOT NULL COMMENT  'Averages the other ratings',
`reviewContent` VARCHAR( 255 ) NOT NULL COMMENT  'Since it''s mostly rating based, this is a small input field for additional comments.' ,
`picQuality` TINYINT( 1 ) NULL ,
`audioQuality` TINYINT( 1 ) NULL ,
`priceQuality` TINYINT( 1 ) NULL ,
`staffQuality` TINYINT( 1 ) NULL ,
`restroomQuality` TINYINT( 1 ) NULL ,
`stickyFloor` TINYINT( 1 ) NULL ,
`seatingQuality` TINYINT( 1 ) NULL ,
`concessionSelection` TINYINT( 1 ) NULL ,
`concessionPrices` TINYINT( 1 ) NULL ,
`visitAgain` TINYINT( 1 ) NULL COMMENT  'Yes/No - No=0, Yes=5'
) ENGINE = INNODB 
COMMENT =  'Used to store individual reviews, all attributes are on a 1-5 integer scale.';

CREATE TABLE `theater_comments` (
`commentID` INT( 11 ) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`theaterID` INT( 11 ) NOT NULL ,
`userID` INT( 11 ) NOT NULL COMMENT  'Who posted this comment?',
`timePosted` INT( 11 ) NOT NULL COMMENT  'PHP Generated timestamp.',
`title` VARCHAR( 100 ) NOT NULL ,
`comment` TEXT NOT NULL
) ENGINE = INNODB
COMMENT =  'Used to store comments specifically about a theater.';

CREATE TABLE `theater_likes` (
`theaterID` INT( 11 ) NOT NULL ,
`userID` INT( 11 ) NOT NULL ,
`likes` TINYINT( 1 ) NOT NULL COMMENT  '0 for dislike, 1 for like.'
) ENGINE = INNODB
COMMENT =  'Stores all users votes whether it be like or dislike.';

CREATE TABLE `theater_images` (
`theaterID` INT( 11 ) NOT NULL ,
`userID` INT( 11 ) NOT NULL COMMENT  'User that uploaded the image.',
`imageLocation` VARCHAR( 150 ) NOT NULL COMMENT  'URL to the specific image.'
) ENGINE = INNODB 
COMMENT =  'Used to store the locations of various images of the specific theater ID.';

CREATE TABLE `theater_visits` (
`theaterID` INT( 11 ) NOT NULL ,
`userID` INT( 11 ) NOT NULL COMMENT  'What user visited theaterID'
) ENGINE = INNODB 
COMMENT =  'Stores what user has visited what theater.';

Either you should remove the companyID field from the theaters table, or you should remove the theater_company table altogether. Which depends on the relationship:

[list][]If a theater can only ever be connected to one company, then having the companyID field in the theaters is probably the best solution.
[
]If a theater can be connected to more than one company, then you should definitely keep the theater_company table, and remove the companyID from the theaters table.[/list]

Thank you very much for the quick feedback, a theater:company is a 1:0 or 1 relationship, so I should remove the theater_company table entirely, and keep the companyID field in theaters as you suggest?

what is a company, anyway (i think i have the theatre part figured out)

why does the comment say that relationship of theaters to companies is one-to-many?

Yeah r937, I actually editted that out of my own personal version because I realized that’s wrong, I re-explained the relationship in my most recent post.

Companies own theaters, like your local theater may be owned by AMC, Regal, etc, and if the companyID is null, it will be referred to as unknown or independent.

okay, so theater_company table is not required

tip: declare your prices as DECIMAL(4,2) so that you can, someday, do AVG(price) and similar calculations

Thanks r937, gonna be honest you’ve helped me in the past and was hoping you would have some input. :slight_smile:

Just a quick note: If you remove the theater_company table, and stick with the companyID field in the theaters table, how will you handle a case where a theater is owned as a joint-venture between two companies?

There might be ways to work around this, but work-arounds are seldom pretty, especially when you have loads of data in the tables.

Thanks for the information C. Ankerstjerne, I considered that and referred to my client who stated a theater will only be owned by one company, though I can see how that is a possible change in the future. I’m unsure what to do as I want it to be as minimal as possible (less confusing for myself) at the time being, while being flexible for the future as you suggest.

However, to all the others browsing this thread to help me, here is an updated table structure:


CREATE TABLE `theaters` (
`theaterID` INT( 11 ) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`companyID` INT( 11 ) NULL ,
`name` VARCHAR( 100 ) NOT NULL ,
`description` VARCHAR( 255 ) NULL,
`mainImage` VARCHAR( 150) NULL,
`address` VARCHAR( 100 ) NOT NULL ,
`city` VARCHAR( 50 ) NOT NULL ,
`state` VARCHAR( 2 ) NOT NULL ,
`phone` INT( 11 ) NULL COMMENT  'Raw digits of a phone number, dashes can be added through PHP.'
) ENGINE = INNODB 
COMMENT =  'Sole purpose is to hold the main attributes of a movie theater.';

CREATE TABLE `companies` (
`companyID` INT( 11 ) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`name` VARCHAR( 50 ) NOT NULL ,
`description` TEXT NULL
) ENGINE = INNODB
COMMENT =  'Used to store each company''s name and ID for reference of theaters.';

CREATE TABLE `theater_pricing` (
`theaterID` INT( 11 ) NOT NULL ,
`adult` DECIMAL(4,2) NULL ,
`senior` DECIMAL(4,2) NULL ,
`child` DECIMAL(4,2) NULL ,
`3d` DECIMAL(4,2) NULL ,
`matinee` DECIMAL(4,2) NULL 
) ENGINE = INNODB 
COMMENT =  'Prices will be stored in a 10.75 format ($10.75).';

CREATE TABLE `theater_reviews` (
`reviewID` INT NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`theaterID` INT( 11 ) NOT NULL ,
`overall` FLOAT NOT NULL COMMENT  'Averages the other ratings',
`reviewContent` VARCHAR( 255 ) NOT NULL COMMENT  'Since it''s mostly rating based, this is a small input field for additional comments.' ,
`picQuality` TINYINT( 1 ) NULL ,
`audioQuality` TINYINT( 1 ) NULL ,
`priceQuality` TINYINT( 1 ) NULL ,
`staffQuality` TINYINT( 1 ) NULL ,
`restroomQuality` TINYINT( 1 ) NULL ,
`stickyFloor` TINYINT( 1 ) NULL ,
`seatingQuality` TINYINT( 1 ) NULL ,
`concessionSelection` TINYINT( 1 ) NULL ,
`concessionPrices` TINYINT( 1 ) NULL ,
`visitAgain` TINYINT( 1 ) NULL COMMENT  'Yes/No - No=0, Yes=5'
) ENGINE = INNODB 
COMMENT =  'Used to store individual reviews, all attributes are on a 1-5 integer scale.';

CREATE TABLE `theater_comments` (
`commentID` INT( 11 ) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`theaterID` INT( 11 ) NOT NULL ,
`userID` INT( 11 ) NOT NULL COMMENT  'Who posted this comment?',
`timePosted` INT( 11 ) NOT NULL COMMENT  'PHP Generated timestamp.',
`title` VARCHAR( 100 ) NOT NULL ,
`comment` TEXT NOT NULL
) ENGINE = INNODB
COMMENT =  'Used to store comments specifically about a theater.';

CREATE TABLE `theater_likes` (
`theaterID` INT( 11 ) NOT NULL ,
`userID` INT( 11 ) NOT NULL ,
`likes` TINYINT( 1 ) NOT NULL COMMENT  '0 for dislike, 1 for like.'
) ENGINE = INNODB
COMMENT =  'Stores all users votes whether it be like or dislike.';

CREATE TABLE `theater_images` (
`theaterID` INT( 11 ) NOT NULL ,
`userID` INT( 11 ) NOT NULL COMMENT  'User that uploaded the image.',
`imageLocation` VARCHAR( 150 ) NOT NULL COMMENT  'URL to the specific image.'
) ENGINE = INNODB 
COMMENT =  'Used to store the locations of various images of the specific theater ID.';

CREATE TABLE `theater_visits` (
`theaterID` INT( 11 ) NOT NULL ,
`userID` INT( 11 ) NOT NULL COMMENT  'What user visited theaterID'
) ENGINE = INNODB 
COMMENT =  'Stores what user has visited what theater.';

I’ll just comment a few things that I’ve noticed, from technical point of view - some might find it as nitpicking.

InnoDB vs MyISAM - InnoDB.
It scales better. It doesn’t lock down entire table when writing new data to it, so if your website is such that certain tables might experience a lot of traffic - that’s why you should avoid MyISAM.
The only advantage MyISAM has is fulltext index support, but I really haven’t seen an application of larger scale that decided to use MyISAM due to that feature since there are engines that do searches way better and faster than you could do with MyISAM + fulltext indexes + your application logic (for example, Sphinx is one of those fulltext searching engines).
You probably know about transactions already, so there’s no point bringing that up :slight_smile:

Second thing is, and this IS nitpicking - how come your integer columns are signed? Especially auto_increment ones - I mean, are you really going to use negative integer values?
If not, you just wasted yourself a good half of numbers at your disposal - but then again, it would be a few centuries before you ran out of them even when they’re signed integers.

The other thing - if you won’t have 2 to the power of 32 of company IDs, you can reduce its data storage type to a smaller integer, which takes up less space, which speeds up the whole deal with reading and displaying everything.

As I said, it is a bit of nitpicking as other, more experienced people, have covered the bits when it comes to actual db design.
But if you are worried about performance of the site, have you considered caching mechanisms of any sort so you alleviate database load?

Second thing is, and this IS nitpicking - how come your integer columns are signed? Especially auto_increment ones - I mean, are you really going to use negative integer values?
If not, you just wasted yourself a good half of numbers at your disposal - but then again, it would be a few centuries before you ran out of them even when they’re signed integers.

Sorry I don’t understand what you mean by signed integer values, I’m assuming you mean the fact that I specified them to 11 characters. I never thought about why or why not to do it, it’s just something I’ve always done.

The other thing - if you won’t have 2 to the power of 32 of company IDs, you can reduce its data storage type to a smaller integer, which takes up less space, which speeds up the whole deal with reading and displaying everything.

So would you suggest using TINYINT or SMALLINT, and if so how many characters, I can’t imagine more than a thousand companies at the absolute most, so maybe TINY or SMALL int would be the way to go.

As for your remark about caching mechanisms, I’ll be blunt in stating I’m not working on a website that’s expecting a large amount of users, simply trying to prepare for the future in database structure.

the number in parentheses does not define how many digits the column can hold

all integers can hold the same range of numbers, whether you say INT(11) or INT(3) or INT(937)

his comment about signed integers related to this…

  • INTEGER holds values from -2147483648 to 2147483647, but auto_increments start at 1, so a signed integer auto_increment holds just over 2 billion values
  • UNSIGNED INTEGER holds values from 0 to 4294967295, so since auto_increments start at 1, an unsigned integer auto_increment holds just over 4 billion values, i.e. twice as many

TINYINT UNSIGNED only goes to 255

all these ranges are clearly spelled out in da manual

:slight_smile:

r937:
Your explanation on unsigned integers as opposed to signed was perfect, I always wondered what that column was for. (Learned MySQL via phpmyadmin, not properly learning it :-/).

After looking at the manual with integer types, TINYINT maximum of 255 seems to be spot on for the ratings since they max at 5, and I believe SMALLINT should suit the companyIDs as a possible range is 1-65535.

Now, as I attempt to milk this community for all it’s knowledge. As I’ve been wondering about indices, I know they’re already assigned to tables with primary keys, but as for those that don’t (like: theater_images,theater_visits), I should assign indices to the theaterID field in these tables since that’s most likely what it will be called upon in queries in the code?

Also, as far as foreign keys go, I hear that they’re necessary, however after doing some research I realize the constraint is between two tables only, so honestly I can’t see any two specific tables where they would be beneficial, except maybe theaters:theater_comments. Any input on that?

you should have foreign keys in almost all your tables

every table besides the theaters table which has a theaterID should have a foreign key to the theaters table

for example –

CREATE TABLE theater_likes 
( theaterID INTEGER NOT NULL 
[COLOR="Blue"], CONSTRAINT likes_theater_fk
    FOREIGN KEY ( theaterID )
      REFERENCES theaters ( theaterID )[/COLOR]
, userID    INTEGER NOT NULL 
[COLOR="blue"], CONSTRAINT likes_user_fk
    FOREIGN KEY ( userID )
      REFERENCES users ( userID )[/COLOR]
[COLOR="Red"], PRIMARY KEY ( theaterID , userID )[/COLOR]
, likes TINYINT NOT NULL COMMENT  '0 for dislike, 1 for like.'
) ENGINE = INNODB
COMMENT =  'Stores all users votes whether it be like or dislike.';
 

also, this –

CREATE TABLE theaters 
( theaterID INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY 
, companyID INTEGER NULL 
[COLOR="Blue"], CONSTRAINT theater_company_fk
    FOREIGN KEY ( companyID )
      REFERENCES companies ( companyID )[/COLOR]
, name VARCHAR( 100 ) NOT NULL 
, ...

I’ve got several questions that are in relation to actual performance and not database design itself since it’s against the theory and some might not like it.

I’ve noticed that certain tables don’t have surrogate primary key (auto_increment) but a natural one (composite of userID and theaterID) and so on.
Also, those tables are defined as InnoDB.

When using natural key rather than surrogate to uniquely identify the row in the table, InnoDB will create overhead and you lose the inherent ability of InnoDB to quickly find rows based on primary key lookup.
When primary key is nonsequential, which is the case of natural keys, InnoDB uses 8 bit signed integer for its internal uses, instead of 4 byte one that is usually the case. You can also refer to http://dev.mysql.com/doc/refman/4.1/en/innodb-index-types.html for some further insight on inner workings.

could you please provide a reference for this claim

the index created for the primary key does ~not~ become unusable just because it isn’t declared on an auto_increment

i suspect you may have misinterpreted something that you read somewhere…

this is nonsense, sorry

First of all, hello Rudy and I don’t mean to argue here about things but what I wrote are all facts and of course, I’ll provide the sources and I’ll try to explain more clearly what I meant since I think I wasn’t clear enough.

Natural key - here being userID + theaterID - aren’t sequential in lookup table theater_likes, or to put it more simply - what guarantee is there that user with ID 1 will be the 1st entry in the table and not user with ID 2 and so on so I don’t know why it’s nonsense in this particular case that the primary key is not sequential but who am I to judge.

could you please provide a reference for this claim

http://blog.johnjosephbachir.org/2006/10/22/everything-you-need-to-know-about-designing-mysql-innodb-primary-keys/

I also stand corrected, InnoDB creates overhead if no primary key is specified so I’m sorry for posting false information. However, the blog linked up clears up certain things about how InnoDB works internally and why it’s good that each table has its own primary key and foreign keys for join purposes.

hello furicane :slight_smile:

you brought up an interesting point of discussion, so i will respond with some thoughts

the blog you linked to says innodb primary keys should be “sequential”

however, it also explains why, and the reason is, the pk index is used as a clustering index – so what the recommendation actually should have said is that inserts should be sequential, i.e. that new rows should have a pk value that sequentially comes in at the top end of the existing range of values, rather than somewhere in the middle, which can cause index reorganizations

you can not say that a primary key is not sequential, because of course ~every~ primary key is sequential, since any set of values, whether they be numbers, strings, dates, can be sorted into a sequence

it is, though, important that inserts not cause frequent index reorganizations, and i’m not going to argue with that

but wait…

suppose we take this example –

CREATE TABLE theater_likes 
( theaterID INTEGER NOT NULL 
, userID    INTEGER NOT NULL 
, PRIMARY KEY ( theaterID , userID )
, likes TINYINT NOT NULL 
);

the primary key becomes the clustering index, and there is one index

however, if we introduce a surrogate key like this –


CREATE TABLE theater_likes 
( theater_likes_id INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT
, theaterID INTEGER NOT NULL 
, userID    INTEGER NOT NULL 
, UNIQUE KEY ( theaterID , userID )
, likes TINYINT NOT NULL 
);

notice the surrogate key forces us to declare an additional key, the UNIQUE key on the pair of columns, which is necessary to ensure that each user can like any theater only once – data integrity, dontcha know :wink:

so now there is a good primary key index (which guarantees that new inserts come at the end of the clustering index), but there is also a second index!

furthermore, the pk index will never be utilized in searches!

i would need to see real world benchmark data before i concede that having two indexes to update instead of one is better

:slight_smile:

Thanks so much for explains that r937. So I’ve been reading up on the MySQL documentation and came across this:

If ON DELETE or ON UPDATE are not specified, the default action is RESTRICT.

Later restrict is defined as rejecting the action on the parent table:

Specifying RESTRICT (or NO ACTION) is the same as omitting the ON DELETE or ON UPDATE clause.

So clearly this is not what I want correct, since if a theater gets deleted, I would want all comments for that theater deleted as well, same with likes, images, comments, etc. So after reading through, I would want

CASCADE: Delete or update the row from the parent table, and automatically delete or update the matching rows in the child table.
I believe however, that I would only want on delete cascade, and not on update?

First off, I totally worded wrong what I meant since English is my 3rd language so phrasing my thoughts is kinda harder.

you can not say that a primary key is not sequential, because of course ~every~ primary key is sequential, since any set of values, whether they be numbers, strings, dates, can be sorted into a sequence

This is what I had in mind - when you use a composite key userID, theaterID you get values such as [1,1], [2,1], [123,1], [3,1] and so on - so yes, they are inserted sequentially but their values do not correspond to the insert order.
If you have a key such as auto_increment, every next value is sequential - incremented by 1 or whatever is set as the offset in the .cnf file.

Now on to your (valid) point - yes, surrogate key would require yet another index and no, I’m not suggesting that having 3 indexes beats one composite.