Can someone please confirm, and correct if necessary, my understanding of how indexing works in terms of databases and other storage types (like file systems)?
From what I understand, indexing isn’t black magic, but is essentially the ordering/sorting of fields. The idea being, if the system using the index, knows that the index is ordered in a particular manner, it can use that knowledge to make a series of best-guesses to find the data it’s looking for in much fewer read operations than would be required otherwise. For example, if our table was as follows…
---------------------------------
| Name | Location |
---------------------------------
| John Deakins | Canada |
| Jeremy Black | Australia |
| George Fosters | USA |
| Tom Penn | Ireland |
| Bill Flask | New Zealand |
---------------------------------
If the system was asked to find the location for Tom Penn, without an index it would need to go through every block until it finds Tom Penn. If the Name column isn’t unique, then it has to scan the entire table. If however there was index on the “Name” field, the name field would be sorted in alphabetical order in that index, as follows…
------------------
| Name |
------------------
| Bill Flask |
| George Fosters |
| Jeremy Black |
| John Deakins |
| Tom Penn |
------------------
So the system may first access the middle record of the index, which in this case would return “Jeremy Black”. Knowing the index is in alphabetical order, the system knows the record it is after is beyond halfway, so it may now look at the record between half-way and the end of the table. In the example above, doing so would return “John Deakins”. So now the system knows “Tom Penn” is within the last 25% of the table, which in this example, only leaves one record, with is of course “Tom Penn”.
So because all indexing is, is the ordering/sorting of data, a database table with a unique, auto-incrementing “ID” field, should already be sorted by “ID”, so the use of an index in such a circumstance isn’t of benefit, and may actually be an overhead.
So, to re-iterate my question, am I on the right track here, have I essentially described how indexing works, or am I quite a way off? I constructed the above text based on just one rather hard to understand article, so I could be way off the money.
Confirmation and/or clarification on how indexing works would be much appreciated.