Monday, September 26, 2005

The Economics of Indexing

The economics of conventional database indexing is based on the assumption that each piece of data is written once and is subsequently queried many times - hence traditional indexes have concentrated on achieving optimal query performance and relied on repeated queries to amortize the large cost of having created the index initially. These normative economics fail in situations where the likelihood of querying any specific item of data is minimal – you incur the expense of indexing it, but you are unlikely to recover that cost later on. In what situations is that likely? I would suggest granular drill through from analysis and discovery; compliance retention and data surveillance as the most obvious examples.

For instance, a credit card fraud surveillance application is unlikely to drill into the detailed transaction history for any particular card until the overall spending profile is deemed suspicious. Thankfully, most credit card usage is legitimate and the vast majority of detailed transaction history will remain untouched – forcing the conventional indexing economic model into debt. Moreover, as new transactions continuously pour in, each and every one needs to be indexed quickly with minimal latency as any interrogation for potential fraud needs complete and accurate information – all within the challenging environment of a large retained data population.

The familiar B-tree, hash and inverted-list (including bit map) indexes cannot service these demands easily and require heavy hardware investment to alleviate their problems. Frankly, B-trees and fast updates go together like a horse and carnage [sic]. Hence the emergence of data appliances, that supplant indexes with extreme hardware parallelism to avoid the impedance mismatch between conventional indexing and this acquisition dominant workload. But more hardware means less MTBF, both theoretically and empirically, so wouldn’t it be better to change the economics of indexing rather than to mask the problem with extra hardware?


Post a Comment

<< Home