Database normalization: city and country tables etc

Hi everyone,

I’m busy normalizing my database and have a few questions.

My city table has a city id, city name and country id
The country table has a country id, country name and continent id
The continent table has a continent id and a continent name

I’ll maybe need to add a region table as well.

Now the city id, country id and continent id columns are used as foreign keys in a “restaurant” table. Is it necessary to include the country id column since it’s already present (as a foreign key) in the city table? If it isn’t necessary then how would I use it to access the country name in the country table? Similarly the continent id might not be necessary since it is present in the country table. Or doesn’t it work that way?

I’m just asking because my restaurant table already has a number of foreign key columns and the more there are, the more confusing things become. Scanning across a row just shows fields containing numbers. Is this just the ways things are?

Thanks for your time!

Yes, you need to store the country id, or you will not get back its name. But that assumes you need a country id to begin with…

SO, why give the country an id??? I bet it’s a number.
How many countries are there that have the same name??? (Hint an integer less than one but not a negative number)
how many Scotlands can there be ? Well to quote Highlander “There can be only one.”

So your country table should just be country and continent - do you know any continents with the same name? Didn’t think so, so continents do not need an id either, just a name. I think Africa uniquely identifies which continent that is. Ditto Asia, Europe, etc.

So now your country table is reduced to just two columns - country and continent, country being the primary key. (and no continent table is needed either, unless you wish to use it as a look-up to populate a drop-down list for a select field in html.)
Now you do not need a join to get at the country name because it will be in the restaurant table already!

You are falling into the beginners trap of thinking that every table should have a numerical id, even when the actual item of interest has a uniquely identifying name.

“Simplify and add more lightness”

best analogy for primary keys ever

That’s all correct. Your restaurant table only needs a city id, because each city is already identified with a particular country and continent. When you select, you would join the city table to restaurant on the city id, the country table to city on country id, and the continent table to country on continent id.

This seems to be the old surrogate key vs natural key discussion. There’s no consensus about which is the best approach – even among experts. Here’s an article that seems to have a good pro/con list. Personally, I’m in the surrogate key camp (that is, always use an ID), mostly for the “keep business logic out of the keys” reason. For example, if one day a country changes its name, then it’s better if that country name isn’t littered throughout your database as foreign keys.

Why are the countries and continents being stored in separate tables? Any given single restaurant of any given restaurant chain (McDonalds,KFC, Burger King, etc probably being the best example) can only ever be in one city, which only ever in one country which can only ever be in one continent. I would not rely on the name of a city as being the primary key (PK) for the city table as you might have multiple cities in the same country with the same name, using St Ives in England, according to Wikipedia there are three in England:

  • St Ives in Cambridgeshire
  • St Ives in Cornwall
  • St Ives in Dorset

If you was to do any search (with city name as the PK for the city table) for restaurants of say McDonalds in St Ives you would get any listed that are in either of the above towns called St Ives. The person searching might have intended the St Ives in Cambridgeshire and that search might show it as having a branch of McDonalds there when in fact it might be the St Ives in Cornwall that has the branch. Also any given town might have multiple branches of a restaurant, New York for example will probably have a half dozen branches of each restaurant chain in the Manhattan area alone.

I would have a table recording each branch of a restaurant, with it’s address, opening hours, etc as a separate record. I can’t see any reason for seperate city and continent tables unless it’s forming part of a larger app that needs them. Also I would use a surrogate key (SK) for the restaurants as a restaurant may change it’s name or one chain might sell x number of branches to another chain which would then rename them. If you use the restaurant name alone as a PK, it ten branches change their name, you then got to change their name in the restaurants table as well as any other tables that have it listed as a foreign key(FK). If you use an numerical ID field as the PK for the restaurants table then you’d only need to change the name of the relevant restaurants in the restaurants table.

Hi everyone,

thank you for all of your input!

It seems that no one will normalize a database in exactly the same way. I’ve tried my best to structure the database according to the normal forms. My primary keys are all integers because they result in better MySQL performance (so I’ve been told). Looking at my tables I’m happy enough to have the continent name in the country table.

Restaurant table:
restaurant_id, restaurant_name, city_id, other columns

City table:
city_id, city_name, region_id

Region table:
region_id, region_name, country_id

Country table:
country_id, country_name, continent_name

@Jeff. Thanks for the info on the joins. It will hopefully be enough to have only the city_id in the restaurant table.

I have seen e-commerce “customer” tables that include columns for first name, last name, city, state, zip etc all in one table. Then why can’t I do the same? Have all the location columns in the restaurant table. What would be the harm?

not only can you do the same, you should

how many customer tables have you ever seen where the first name is stored as an integer foreign key to a first names table?

why people want to do this with city names but not first names is illogical

“normailization” is the same in both cases

Hi again Rudy,

just to be 100% sure, are you saying that it would be perfectly OK to include the city, region, country and continent columns directly in the restaurant table? With this format I would no longer need separate city, region tables etc…

So a row would start with a restaurant_id of say, 32, then KFC, Paris, France etc…

Separate tables would be necessary only if I include city/region/country/continent-specific information, such as climate, demographics, latitude/longitude etc.

Thank you for your help.

yes, i am

mind you, it is then possible for someone to enter a restaurant in Paris, Farnce, and this restaurant would never show up in a search for France

but you gotta decide how much complexity (and user interface constrictions) you want to impose, in order to prevent this

Ok, thank you Rudy.

You of course know a lot more about databases and SQL than I do. If you have a moment could you please elaborate on your example. I don’t quite get why one table vs multiple tables would make a difference when someone conducts such a search.

Thanks!

The number of countries and their names will not change very often so RedBishop could have them stored in the server-side script as an array and just edit the array in the script when needed.

It’s possible that Google might have an api that could be used to validate (at least partially) any address and maybe the the maps and streetview could be used so that the user could see where they are on a map. RedBishop would need to look at the licensing, fees/charges, etc for that.

if the countries are set up as a table, and your application ensures that all restaurants are related to existing countries via the foreign key, then incorrect spelling of any country name will not be possible

if you simply allow country name to be entered into a text box, there is no foreign key check, and consequently incorrect spellings will be possible