What I don't understand - DB Efficiency/Optimization

Alright well, I’m a college student majoring in CS, and database administration has always triggered an interest, I love how in depth it can get just storing data and setting up the best possible ways to access it, archiving databases, securing them from your own end, etc.

Well I’ve been working on a fairly large project in PHP/MySQL - Using codeigniter, and well I’ve been dealing with databases for a while and I understand the concept of foreign keys, except I’ve yet to be able to implement them, for example, if I have a table users, and a table posts, I’ll simply put author_id field in the posts table. So when I need to select a users posts, I can do a join query, or whatever. Now after discussing the concept of it with one of my professors, she said as a general rule of thumb, before 10,000 records, I really shouldn’t have to worry about optimizing a database to the fullest, however I should still plan out tables well.

So I have a series of tables in my latest projects, and using the profiling tools in codeigniter, the most complicated of queries is showing in the .0010’s of seconds (ten thousandths I believe :-/). So should I really be worried about optimizing my database, and if so, how would the concept of foreign keys help speed/efficiency? She also told me to seriously look into indexing, however I read some things that said it causes general overhead in the database server.

I see the forum has a lot of great minds, so any input would definitely be beneficial, any at all. Thank you.

First you need to determine whether the majority of tables created support foreign keys. Some engines such as; MYISAM don’t support actual foreign key constraints. You also need to balance foreign key support with the ability to do fill text searching. In most cases innodb is used, which does support foreign key constraints, but you loss full text searching capabilities. Otherwise, indexes can be placed on the foreign keys, known to be used in a majority of joins, to optimize query efficiency. In my opinion its always good define indexes in the beginning. That is less work that needs to o done at later date, perhaps by someone not as familiar with software as the creator. When I work with innodb I always define foreign keys.

I set them all up initially as InnoDB for the possibility of expanding into foreign key setup.

Than foreign keys should have been defined from the beginning. If not for optimization definitely for data integrity.

foreign keys do nothing for performance

if there is any performance improvement, it is not the foreign keys themselves, but the result of the indexes being created to support the foreign keys (the indexes are created automatically after version 4.1.2, before that you had to create them yourself)

that said, foreign keys do enforce data integrity, which, if you think about it, is pretty much the most important feature you’d want from a database

Thank you for the responses, so setting indexes is pointless unless you are setting up foreign keys since it’s automatic? Also, by data integrity you mean in the way that I could make it so if an authorID is deleted, it deletes all posts associated with that authorID?

Its possible to have a post assigned to an author that doesn’t exist without a true foreign key constraint.

no no no :slight_smile:

setting up indexes is vitally important for performance

if you declare foreign keys you get indexes as a byproduct

but if you don’t declare foreign keys, you still need indexes for performance

that’s just one aspect of it

more important is that bad data never gets into the database

Ok, well since I don’t have foreign keys set up and I need to setup indexes, what would be the most optimal field to set them up on? I’m assuming postID since every single post is retrieved by it’s ID in the row.

any column referenced in a WHERE condition that isn’t already indexed as a primary key

data integrity also means that if you’re using a database type that doesn’t support FK’s then you would have to check programatically if the FK exists before inserting a row into a table

example

tblAuthors

fldAuthorId
fldFname
fldLname

tblTitles

fldTitleId
fldTitle
fldAuthorId

before inserting a row of data in tblTitles you should run a query to ensure the value of fldAuthorId you are about to enter in tblTitles already exists in tblAuthors.

the same applies when deleting records in db’s that do not support FK’s. when deleting records from a table, you will have to programatically delete all associated records in other tables.

A couple of things to consider with regard to MyISAM vs Innodb

  • Locking Levels
  • Transaction Support (InnoDB supports transactions, AFAIK MyISAM doesn’t)

yep it doesn’t :frowning:

but historically, I’m not sure if the situation still applies, many web hosting providers only supported MyISAM because it performed better due to the lack of overheads associated with supporting FK’s, locking, transactions etc.

Something to consider with foreign keys - ideally they should be unique across the database. For example, if you have a post table, any foreign key pointing back to that table should have the same name - probably postid. No other table should be pointed to using a foreign key named postid.

Do whatever maintenance programmers who follow you a huge favor and follow this rule. If you need to join two different tables that both have a foreign key to a third table you can always use aliases (left join mytable AS myalias) to disambiguate the keys for that query.

A major problem with Innodb is that that MySQLHotCopy doesn’t work with it (this is a perl script that locks the tables, copies the table files, and then unlocks the tables). This means that the only way to copy innodb tables that I know of is mysqldump. For a database in the gigabytes this can take hours to do, where a hotcopy can be done in a few minutes.

Disclaimer - this is based on experience some years ago with MySQL 4.0. This problem may have been addressed by now.

Not a hard and fast a rule and it can really be a judgment call in many cases. For example, naming a column authors_id rather than users_id provides a contextual purpose whereas the later does not. Another example is owners_id over the generic users_id. Using something generic may seem like a good idea, but in my opinion the meaning of columns are easier to grasp when column names are derived from an actual context based on the business goals at hand rather than a generic reference.

For backs ups I’ve never found mysqldump to be a problem. I use it all the time to create database snapshots.

I start with a snapshot, use replication to keep all the slaves up to date and pretty much have zero down time.

If your file system has snapshot support, then that’s one way to deal with the problem.