How Does Indexing Work?

Can someone please confirm, and correct if necessary, my understanding of how indexing works in terms of databases and other storage types (like file systems)?

From what I understand, indexing isn’t black magic, but is essentially the ordering/sorting of fields. The idea being, if the system using the index, knows that the index is ordered in a particular manner, it can use that knowledge to make a series of best-guesses to find the data it’s looking for in much fewer read operations than would be required otherwise. For example, if our table was as follows…


---------------------------------
|      Name      |   Location   |
---------------------------------
|  John Deakins  |    Canada    |
|  Jeremy Black  |  Australia   |
| George Fosters |     USA      |
|    Tom Penn    |   Ireland    |
|   Bill Flask   | New Zealand  |
---------------------------------

If the system was asked to find the location for Tom Penn, without an index it would need to go through every block until it finds Tom Penn. If the Name column isn’t unique, then it has to scan the entire table. If however there was index on the “Name” field, the name field would be sorted in alphabetical order in that index, as follows…


------------------
|      Name      |
------------------
|   Bill Flask   |
| George Fosters |
|  Jeremy Black  |
|  John Deakins  |
|    Tom Penn    |
------------------

So the system may first access the middle record of the index, which in this case would return “Jeremy Black”. Knowing the index is in alphabetical order, the system knows the record it is after is beyond halfway, so it may now look at the record between half-way and the end of the table. In the example above, doing so would return “John Deakins”. So now the system knows “Tom Penn” is within the last 25% of the table, which in this example, only leaves one record, with is of course “Tom Penn”.

So because all indexing is, is the ordering/sorting of data, a database table with a unique, auto-incrementing “ID” field, should already be sorted by “ID”, so the use of an index in such a circumstance isn’t of benefit, and may actually be an overhead.

So, to re-iterate my question, am I on the right track here, have I essentially described how indexing works, or am I quite a way off? I constructed the above text based on just one rather hard to understand article, so I could be way off the money.

Confirmation and/or clarification on how indexing works would be much appreciated.

you were doing just fine until you got to the auto_increment

tables are ~not~ sorted by any column

also, in order to be an auto_increment, i believe the column must be a KEY, if not the PRIMARY KEY

i.e. an auto_increment column ~has~ an index

Yeah I wasn’t sure about that last bit. It was an out-right guess. It seemed plausible.

Thanks for the confirmation r937.

to put my two cents in (disclaimer: no-expert)

Even if a table has an auto-incrementing id, it might be a good idea to add an additional index (or even several).

Let say you do many queries where you want to know all the people in a certain location. In that case you want to add location as an index as that will speed up those queries. The downside will be that adding a record will be slower as the additional index needs to be updated.

it’s not a big downside

you would find it almost impossible to actually measure the difference