Database, best practice for region information

Hello,
I looking to put together a database with country, state, city, longitude, and latitude information.

What is the best practice, should I create 3 different tables to hold the country, state, city(longitude/ latitude) or is it best to keep the region information within one table.

what are the pros and cons having it in a multiple tables vs one table?

r

“Normalizing” the data uses less space, for one thing.

Imagine one table:

COUNTRY     -          STATE     -          CITY      -         LAT    -    LONG
USA                         ILLINOIS             CHICAGO          xxx          yyy
USA                         ILLINOIS             MCHENRY         xxx          yyy
USA                         ILLINOIS             SPRINGFIELD    xxx          yyy
....
USA                         MISSOURI          ST LOUIS           xxx         yyy
USA                         MISSOURI          ST CHARLES     xxx         yyy
USA                         MISSOURI          WENTZVILLE     xxx          yyy
...
USA                         TEXAS               AUSTIN              xxx          yyy
USA                         TEXAS               DALLAS             xxx          yyy
...
CANADA                  MANITOBA         WINNIPEG         xxx         yyy
CANADA                  MANITOBA         CARMAN           xxx         yyy
...
CANADA                  QUEBEC            MONTREAL       xxx         yyy

Entering the country every single time, the city every single time, etc.

Split it into three tables to reduce the bloat and speed up searches.

Table “COUNTRY” with a numeric ID and a varchar name.
Table “STATE” with a numeric ID and a varchar name.
Table “CITY” with a numeric ID, varchar name, varchar LAT, varchar LONG, numeric STATE, numeric COUNTRY.

CITY:

ID      -       Name       -           LAT  -   LONG   -       STATE           -    COUNTRY
1               CHICAGO              xxx      yyy                1                         1
2               MCHENRY            xxx       yyy               1                         1
3               SPRINGFIELD       xxx      yyy                1                          1
...
45             ST LOUIS              xxx     yyy                 2                          1
46             ST CHARLES        xxx     yyy                 2                          1
47             WENTZVILLE        xxx      yyy                2                          1
...
99            AUSTIN                 xxx      yyy                3                           1
100          DALLAS                xxx      yyy                3                          1
... 
131         WINNIPEG             xxx       yyy               11                         2
132         CARMAN               xxx      yyy                11                         2

… etc.

HTH,

:slight_smile:

1 Like

Well, it depends on what you want to do and the number of records that you will have and which database you’re going to use. If you’re only to have a list of cities and this list is limited, the benefits of using more than one table would be negible. It can still be good practice thinking in the future growth of the database and future information that you may include but as per today, the difference will not be much.

Now, if you’re going to have the information for every city in the world, well, then you’ll probably notice the difference in the performance. How big can this difference be? Depends on the database, the amount of RAM, the cache, the processor in the server… but it shouldn’t be that much.

The difference could add up if you start adding tables and other data into the database, though. That’s why it is important to do the things right and, if possible, planning for the future.

time for my favourite analogy…

in an employee database, you will notice that invariably the designer has a table with firstname, lastname, hiredate, department, etc.

how many times have you seen the “normalization” of firstname into its own firstnames table?

same reason not to do it for addresses, cities, counties, regions, provinces, states, et cetera

by the way, i have “normalization” in quotes because splitting out county and city and state tables is ~not~ normalization

2 Likes

I know you’re the expert but I don’t fully agree with you. Depending on the number of records, it may be good to separate states into its own table. Same with countries. With counties, I don’t think it’s worth it although it looked to me that every state in US had a country named Helena… lol But it would too complicated for little benefit

imagine you had to do a search for all rows in idaho

you would do a WHERE condition on the states table, to pull the 1 row for idaho, and get its PK (presumably, but not necessarily, an integer)

then you would have to join to your main table, and do a search in the states FK, also an integer

bottom line, you’re still dependent on an index on the FK

you tell me at what point this is better than just skipping the states table and doing a search in the main table, still dependent on an index on the states column

you’re going to say that an integer index is faster to search than a state code index

and my response is, at what point is the join “better”

Depends on the situation. You’re thinking that I would use codes for states in the US. The truth is that the thought didn’t even occur to me because I’m not American and I don’t use US codes. Sure, with a bit of investigation, I may have decided that using codes is better but my first thought would have been to use the complete name.

We rarely use codes where I live, we use full names.

The second thing is that using codes may be not very friendly if my database is international.

And, if I have to show the information on the interface for someone to choose a particular state, using the code may not be the best. People from other countries may not know the meaning of each code so I would have to show the full name. Of course, I can keep the code in the database and list the full name in the interface but for a software or site for international users, it would be more pratical keep it on the database.

So everything is relative

Off-Topic
:hint: @r937
Maybe a book or an article
http://www.sitepoint.com/write-for-us/
on Normalisation is in order?

But is it Relational? :wink:

I think he’s probably already covered it (it’s been a while since I read it): :stuck_out_tongue_winking_eye:

http://www.amazon.com/Simply-SQL-Rudy-Limeback/dp/0980455251

(ironic that you can’t get this particular Sitepoint published book from Sitepoint anymore)

normalization has been covered to death elsewhere

there are two ways to teach it – the mathematics of set theory, and the design of database tables

neither is going to address the problem of the mis-appropriation of the word by well-meaning DBAs

i repeat, the creation of a states table, in the type of scenario as in this thread, is ~not~ normalization

The only thing I’m saying is that depending on the case, it may be better that states had their own table.

If that table will have only a few thousands of cities in the US, then creating a table for states and another table for the country may be silly.

But if you’re going to have hundreds of thousands, then it may be better.

@Mittineague Of course it is… you have your primary key and foreign key for that :wink:

1 Like

@ WolfShade, you couldn’t have said it any better. :smile:

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.