Introduction to database indexes

If your database queries are taking an age and a half to execute, you could do worse than investigate adding some indexes to the columns involved.

You’ll no doubt be familiar with indexes at the back of books, which list the name of some concept with a list of page numbers beside it. Indexes for databases are used in much the same way as indexes in books are. They both essentially provide a quick way to look up the location of information you want to find. Each index is a form of categorisation of the overall data. A book usually has only one index as that’s all it needs, and the categorisation of the index is usually concepts related to the book sorted alphabetically.

Advantages of indexes are increased search performance

The graphic above is a cropped screenshot of the index at the back of Succeeding with Agile: Software Development Using Scrum. Imagine for example you’re interested in finding mentions of Kent Beck (or concepts related to him) in that book. By looking in the index you can quickly see that he is mentioned on pages 58, 288 and 289. If there was no index, or the information you were looking for wasn’t in the index, you would have to go through every page to look for it. This book has over 450 pages so it’s not going to be a quick task. What’s worse is that the information may not even be in the book at all.

The same idea applies in databases. Imagine you’re Eason’s, Barnes and Noble or Amazon and have an authors database table with 100,000 records in it. This table has a load of columns but the one we care about is called lastName. If you queried the authors table with something like select * from authors where lastName = ‘beck’ and there was an index on the lastName column, the database engine would use a logical categorisation (each record categorised by lastName) of all the data in the authors table to quickly find where that record is physically located on the disk drive.

The specifics are for another day, but it most likely does this by using special structures known as B-trees, where the possible areas on the disk in which the record(s) in question can be physically located get smaller and smaller as the engine goes further into the tree. By continuously going in the general direction of the physical records via a B-tree the DB engine can find the records very quickly.
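To get a feel for how quickly that narrowing down happens, here is a minimal sketch in Python. It is not a real B-tree, just a plain binary search over a sorted list standing in for one, and the names in it (find_with_steps and the made-up last_names list) are my own for illustration. The point it shows is the same though: each step throws away roughly half of the remaining candidates.

```python
def find_with_steps(sorted_keys, target):
    """Binary search that also reports how many keys were examined."""
    lo, hi = 0, len(sorted_keys) - 1
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] == target:
            return mid, steps
        if sorted_keys[mid] < target:
            lo = mid + 1  # target can only be in the upper half
        else:
            hi = mid - 1  # target can only be in the lower half
    return None, steps


# 100,000 made-up, already sorted keys: "name000000" .. "name099999"
last_names = [f"name{i:06d}" for i in range(100_000)]

position, steps = find_with_steps(last_names, "name073421")
print(f"found at position {position} after examining {steps} keys")
```

With 100,000 sorted keys a search like this never needs to examine more than 17 of them, compared with up to 100,000 when every record has to be checked.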

If you had the same search scenario without an index on lastName the database engine would have to look at every record to see if it matched ..where lastName = ‘beck’. That’s 100,000 records to look at. That’s a lot of IO and CPU time.

In database terms, looking through the whole table when no relevant index is present is called a table scan, whereas using an index is referred to as an index seek. There is also such a thing as an index scan, which generally speaking is where an index does exist but the database engine deems it quicker to just look at the records one by one anyway. It might do this because the cost of traversing the index structure (e.g. the B-tree, which is data stored on disk and requires IO and CPU time just like table records) would be higher than simply going through the data pages (in which the records are stored) one by one. This might happen in cases where there is an index but the table is small and/or the requested records make up a large percentage of the total records in the table.
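You can watch a query planner make this choice for yourself. The sketch below uses Python’s built-in sqlite3 module purely because it is easy to run anywhere; SQLite is an assumption on my part, and its wording differs from the terms above (it reports SCAN for a scan and SEARCH ... USING INDEX for what I have been calling a seek). The id and firstName columns, the generated rows and the index name idx_authors_lastName are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE authors (id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT)"
)
# Fill the table with made-up authors plus the one row we will look for.
conn.executemany(
    "INSERT INTO authors (firstName, lastName) VALUES (?, ?)",
    ((f"first{i}", f"last{i}") for i in range(100_000)),
)
conn.execute("INSERT INTO authors (firstName, lastName) VALUES ('Kent', 'beck')")

QUERY = "SELECT * FROM authors WHERE lastName = 'beck'"


def plan(connection, sql):
    """Return the planner's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the detail text


plan_without_index = plan(conn, QUERY)

conn.execute("CREATE INDEX idx_authors_lastName ON authors (lastName)")
plan_with_index = plan(conn, QUERY)

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

The exact text printed varies a little between SQLite versions, and other engines have their own ways of showing the plan, so treat this as a way of poking at the idea rather than a reference.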

Generally speaking seeking is good and scanning is bad. Adding indexes will cause an index seek to be used if that is less expensive than an index scan. In most instances it is, so indexes can really improve the performance of your select queries. The improvement can sometimes be dramatic, cutting perhaps 99% off the execution time of a particular query on a large table. Of course “there’s no such thing as a free lunch” and indexes do have their disadvantages.

Disadvantages of indexes are slower updates and more disk space required


Yes, adding indexes to your database tables does come at a cost. The cost is very likely to be something you’re willing to pay though. Let’s refer back to the image of the book index I first used above to explain the cost of having indexes.

We see information about Kent Beck is located on pages 58, 288 and 289. Great.

What if the author of this book, as part of the update for the next edition, inserted new text (about a whole page) into page 28, which caused almost all information to be moved to the next page? The author has done an ‘insert’ and thus the index is now out of date. The information about Kent Beck is no longer on pages 58, 288 and 289 but rather is now on pages 59, 289 and 290. Of course all references in the index that refer to page 28 and above are out of date, not just the Kent Beck related ones. The index therefore needs to be updated to reflect the change. This may not be a problem for a book as new editions don’t come out that often. For a database however, which might have millions of daily inserts, deletes or updates directly affecting indexed columns, the IO and CPU cost of constantly reshuffling the indexes on disk so they stay correct can be enormous. Thankfully in the vast majority of cases systems read from databases much more than they write to them.
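The book scenario is small enough to sketch directly. This is just the analogy in Python, not how a database engine maintains its indexes, and the insert_page function and the second index entry are made up for illustration.

```python
def insert_page(index, new_page):
    """Return a copy of a book index after a whole page is inserted at new_page.

    Every reference to new_page or later moves on by one page.
    """
    return {
        term: [page + 1 if page >= new_page else page for page in pages]
        for term, pages in index.items()
    }


book_index = {
    "Beck, Kent": [58, 288, 289],
    "hypothetical early entry": [12, 27],  # made-up entry before page 28
}

updated_index = insert_page(book_index, 28)
entries_rewritten = sum(
    1 for term in book_index if book_index[term] != updated_index[term]
)

print(updated_index["Beck, Kent"])
print(f"{entries_rewritten} of {len(book_index)} entries had to be rewritten")
```

One insert near the front of the book touches nearly every entry in the index, which is the kind of extra work an indexed column takes on with each write.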

Another disadvantage of adding indexes is the disk space requirements. The image above shows only a tiny excerpt of an index that is actually 11 pages long. All those extra characters and pages no doubt cost the publisher extra to print the book. Database indexes require extra space too. The specific amount will depend on the size of the table and the number of columns in the index.
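If you want to see the space cost rather than take my word for it, the following sketch measures a small SQLite database file before and after an index is added. SQLite, the 50,000 made-up rows and the index name are all assumptions for the sake of a runnable example; the actual overhead on your own engine and tables will differ.

```python
import os
import sqlite3
import tempfile


def size_in_pages(connection):
    """Number of fixed-size pages the database currently occupies."""
    return connection.execute("PRAGMA page_count").fetchone()[0]


with tempfile.TemporaryDirectory() as folder:
    conn = sqlite3.connect(os.path.join(folder, "authors.db"))
    conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, lastName TEXT)")
    conn.executemany(
        "INSERT INTO authors (lastName) VALUES (?)",
        ((f"last{i}",) for i in range(50_000)),
    )
    conn.commit()
    pages_before = size_in_pages(conn)

    # The index stores its own sorted copy of lastName plus a row reference.
    conn.execute("CREATE INDEX idx_authors_lastName ON authors (lastName)")
    conn.commit()
    pages_after = size_in_pages(conn)
    conn.close()

print(f"pages before index: {pages_before}, after index: {pages_after}")
```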


Indexes are perhaps the single best way to increase the performance of your database queries. I used the analogy of an index in a book to aid in explaining what they are and how unfortunately they don’t come without a price (a fair price I think), but if you’ve any questions let me know in the comments. I find the topic of indexes fascinating (so expect more posts about them) and a big part of me wishes I could transport myself back into my college data algorithms class to actually understand what the lecturer was going on about when he was talking about things like indexing algorithms and B-trees. I really did think I would never use ‘that stuff’ again :-).

Using a connectionStrings section connection with log4net now supported

Just upgraded to Version 1.2.11 of log4net due to the fact that one of the improvements in this version is the support for the use of a connectionStringName which references a connection from the connectionStrings configuration section.


Previously this was not part of the core release so you either had to take extra steps (it didn’t just work) or use an explicit connection string in your log4net setting. It’s nice for maintenance if your 3rd party tools can just reference a connectionString rather than explicitly define one themselves.

Sharing connectionStrings and appSettings between multiple projects

My last post talked about how to do this for appSettings via the file attribute. However, as I mentioned in that post, the connectionStrings section unfortunately does not have a file attribute. It, along with most other config sections, only supports configSource, which can only point to a config file in the current project. I did a bit more research and it seems configSource may be up to the job after all, but only in conjunction with either Visual Studio file linking or build events to explicitly copy the master settings files.

Create your ‘master’ *.config files

OK, first create a physical folder in your solution root called ‘Config’ for instance and lash your settings files in there. These are the master files you will be editing. Also replicate the physical structure with a Visual Studio solution folder, as this is recommended so you can then easily manage the files via Solution Explorer. Additionally you should not have to explicitly tell TFS (if that’s what you’re using) about your new files if you replicate the physical structure with a solution folder.

Your structure might look something like the following:


Use the .config extension for security reasons, as files with that extension won’t be served via an HTTP request.

Sharing config files by linking to them from consuming projects

For each project that needs to consume these master settings add a link to each of them. You can do this to each project directly or if like me you’d rather not clutter your root up you can create a dummy ‘Config’ folder (unfortunately virtual folders like solution folders do not exist on the project level) and add them as links to that folder. How links are added is very similar to the way regular existing items are added, however instead of just clicking ‘Add’ one must expand the dropdown and click ‘Add as Link’ instead.


After you have added the files as links your dummy physical folder will look very similar to folders with regular files but the files/links within it will have a slightly different icon beside them.


Now update your web/app.config and point your appSettings or connectionStrings section configSource to “config\localhost.appSettings.config” for example…


Now try two things.

Publish your project and notice how the linked files are ‘pulled in’ from the actual location for the purposes of deployment. In conjunction with a simple web transform to go from config\localhost.appSettings to config\live.appSettings etc. this means happy days, all works fine there.

Run your project in the IDE. It will fail at run time. That is because configSource is not able to follow links. It must point to an actual file and, unlike during deployment, Visual Studio does not ‘pull in’ the actual files (even temporarily) to the location that they are being linked to from.

If on each of the linked files we set the Copy to Output Directory property to “Copy if newer” or “Copy always”, VS will put them into the bin/config folder. We can then prepend ‘bin’ to all our configSource references and VS will find the config files fine. I’d rather not have to point my configSource to anything in bin but this approach essentially solves what I’m trying to do.

When you publish again you will of course have a ‘Config’ folder in the package root itself but also in the package bin folder. You can remove the ‘Config’ folder from the root of the package by selecting “None” for the Build Action property on each of the links if you like.

Sharing config files by copying them with a pre build event

If you add the below XCOPY command into the pre build events box for project XYZ, Visual Studio or more precisely MSBuild will copy all files from the master ‘Config’ folder you created above to a folder called ‘Config’ in the XYZ directory.


If you point your configSource back to “config\localhost.appSettings” etc. Visual Studio will run fine because the physical files exist there.
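The copy step itself is simple enough. As a rough illustration only, here is the idea sketched in Python. It is a stand-in for the idea and not the XCOPY command, and the copy_master_configs function plus the throwaway folders and file names in the demo are hypothetical. It copies every master *.config file into a project’s ‘Config’ folder, overwriting whatever is there, which is what you would want to happen on each build.

```python
import shutil
import tempfile
from pathlib import Path


def copy_master_configs(master_dir, project_dir):
    """Copy every *.config file from master_dir into project_dir/Config."""
    target = Path(project_dir) / "Config"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(Path(master_dir).glob("*.config")):
        shutil.copy2(source, target / source.name)  # overwrites older copies
        copied.append(source.name)
    return copied


# Demo with throwaway folders standing in for the solution layout.
with tempfile.TemporaryDirectory() as solution_root:
    master = Path(solution_root) / "Config"
    master.mkdir()
    (master / "localhost.appSettings.config").write_text("<appSettings />")
    (master / "live.appSettings.config").write_text("<appSettings />")

    copied_files = copy_master_configs(master, Path(solution_root) / "XYZ")
    project_has_copy = (
        Path(solution_root) / "XYZ" / "Config" / "localhost.appSettings.config"
    ).exists()

print(copied_files, project_has_copy)
```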

Now however the problem is with the deployment. When using the publish tool, the ‘Items to deploy..‘ default of ‘Only files needed to run this application’ will not recognize the XYZ config folder with all the *.config files in it as ‘needed’, as the XYZ project knows nothing about it, and thus the package won’t have the required settings files to run correctly.

You can get around this easily by changing the ‘Items to deploy..‘ drop down box to “All files in this project folder“. This will work, but personally I don’t like it as it clutters the package and delivers the source code as is, not compiled into .dlls.


You could alternatively actually add a physical XYZ/Config folder and add copies of all the master config files to that folder and hence the project to ‘trick’ Visual Studio into recognizing the files in that folder as required. Of course the files in the XYZ/Config folder would always be overridden with files from the master config folder via the pre build event.

Which approach to choose. Link and Copy to bin or XCOPY to Config

I’m sure there are other ways to do this, and I have seen people wrap abstracted settings in class files but in terms of the above two well which method to choose really depends on your situation.

Both at the end of the day require copying of the master config files to somewhere where the configSource attribute can read them. It seems that if you go the linking route both deployment and running via Visual Studio will work fine if you’re OK with having your configSource’s pointing to bin. There’s also the small price to pay of needing a dummy physical folder as a hook to add the links to (I suggest adding a readme to this folder so other devs know what’s going on). Additionally ‘links’ don’t physically exist in your regular solution so you won’t have any problems with source control.

Choosing to copy settings files to arbitrary locations in the projects that need them has the potential to complicate deployment and force you to change the way you package up your files. Of course if you don’t deploy via Visual Studio, or simply deploy everything in a project folder regardless of what Visual Studio has included in your project, then everything is fine. With the build event approach the XCOPY copies all master settings each time you build, so you don’t have to remember to explicitly link new settings files (localhost.nhibernate.config for example) from each project that needs them. Again source control will not be a problem, as even though XCOPY does create actual files they will not be looked at by TFS or most other source control systems unless those files are explicitly added.

Common appSettings file for all projects in a solution

Many solution structures look a bit like below. This means that if some code in for instance MYSOFTWARE.APPLICATIONSERVICES requires access to an appSetting called XYZ all the launching projects (in bold) that use that code need to have XYZ specified in their web/app config files.


Depending on the number of launching projects and the number of ‘shared’ appSettings a solution contains, maintenance of these settings can be problematic. It is therefore often useful to abstract out all solution wide appSettings into a single file which all projects reference. This is done via the file attribute on the appSettings section:

<appSettings file="../config/localhost.appSettings.config"></appSettings>

There are actually two ‘external’ related attributes on the appSettings section. These are file, as mentioned above, and also configSource. Both will allow you to store settings outside of web.config/app.config. Only file however will allow you to load an external appSettings file which is in an arbitrary location relative to the parent web/app config. The configSource on the other hand will throw an error if you try to ‘pull in’ your localhost.appSettings.config or similarly named file from a location other than the current project. That of course puts a spanner in the works in terms of being able to share a single appSettings file across all projects in a solution.

The other difference between the two is that with file, settings specified in the external file are merged with settings defined in the section itself, with the settings in the external file taking precedence where the same key appears in both. With configSource however the external file is the only source of appSettings, so no settings should be defined in the appSettings section itself.

If you take this approach you can then use web config transforms to swap out ../config/localhost.appSettings.config for ../config/dev.appSettings.config etc. when building your deployment packages. This means you transform one value rather than multiple values for X number of appSettings.

Unfortunately the .NET connectionStrings section only supports configSource and not file, so it’s not as easy to abstract out connection strings and share them between projects.

Register Castle Windsor IOC components via convention

Favouring convention over configuration means fewer change points (and fewer things to remember) for developers to deal with when new code needs to be added to software. My team tries to follow this guideline as much as possible, but given that we are new users of Castle Windsor IOC we were not aware that we could by convention have an IXXXService implemented by the first found XXXService.

We had code like the following in our IOC bootstrapper file:


which meant of course we had to explicitly update the bootstrapper file if we created a new service. A new colleague started the other day and refactored to this:


Now we only have to explicitly register the exceptions to the rule that interface IXYZ is implemented by class XYZ. Pretty neat I thought.
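The idea behind the convention is easy to sketch outside of Windsor. The following Python is purely illustrative: it is not Castle Windsor’s API and not the code from our bootstrapper, and every class and function name in it is made up. It pairs each interface named IXxx with a class named Xxx, and only the exceptions to that rule are registered by hand.

```python
from abc import ABC


class IOrderService(ABC):
    """Hypothetical interface that follows the naming convention."""


class IEmailService(ABC):
    """Hypothetical interface whose implementation breaks the convention."""


class OrderService(IOrderService):
    pass


class SmtpEmailService(IEmailService):
    pass


def register_by_convention(interfaces, implementations, exceptions=None):
    """Map each interface IXxx to the class named Xxx, unless overridden."""
    by_name = {cls.__name__: cls for cls in implementations}
    registry = dict(exceptions or {})  # explicit registrations win
    for interface in interfaces:
        if interface in registry:
            continue
        candidate = by_name.get(interface.__name__[1:])  # drop the leading "I"
        if candidate is not None and issubclass(candidate, interface):
            registry[interface] = candidate
    return registry


registry = register_by_convention(
    interfaces=[IOrderService, IEmailService],
    implementations=[OrderService, SmtpEmailService],
    exceptions={IEmailService: SmtpEmailService},  # the only manual entry
)

for interface, implementation in registry.items():
    print(f"{interface.__name__} -> {implementation.__name__}")
```

Adding a new IXxxService / XxxService pair then needs no change to the registration code at all, which is the whole attraction.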

We still use Castle Windsor 2, for Castle Windsor 3 I believe you need to use