Monday, September 19, 2005


How do you measure the virtue of a software solution? That partly depends on your priorities, but you will probably agree that the absolute virtues are (a) completeness – it does what you need it to do; (b) performance – it does what it does quickly; and (c) reliability – it can be trusted to do what it purports to do. You would probably also agree that simplicity promotes performance and reliability, because there is less to do and less to go wrong. But then simplicity conflicts with completeness, doesn’t it? Not if you can lift the solution into a more generic and uniform problem space with fewer dependencies and fewer restrictions. Admittedly, that’s not an easy task and often requires a leap of inspiration, but it would ideally be a key objective for software developers (time allowing...).

It’s akin to seeking the theory of everything in physics: an elegant solution sought to replace the current cocktail of awkward theories that fail at arbitrary boundaries created by localised assumptions. An example in the RDBMS world is cost-based optimisers over their rule-based ancestors. The problem has been lifted to the more generic solution of comparing the true cost of the possible access paths, whereas the original rule-based solution was a glorified bunch of ever-expanding, incomplete and specific heuristics; the cost-based approach is less complex, yet more complete, accurate and predictable. No doubt, as I write, somebody out there is groaning about the dire performance of a cost-based optimiser in one of their databases – but that probably has more to do with the inaccuracy of the underlying data distribution statistics than any flawed reasoning by the optimiser. At least, when there is an anomaly with the cost-based approach, you can readily discover the root cause and fix it – the rule-based approach was always a bit of a black art.
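To make the idea concrete, here is a toy sketch of what “comparing the true cost of the possible access paths” means. The cost formulas and the `choose_access_path` helper are my own simplified illustrations (counting page reads only), not any real product’s cost model:

```python
import math

def full_scan_cost(pages):
    # A full table scan reads every page once.
    return pages

def index_scan_cost(rows, selectivity, index_height=3):
    # Descend the index, then (worst case) fetch one table page
    # per matching row.
    matching_rows = math.ceil(rows * selectivity)
    return index_height + matching_rows

def choose_access_path(rows, pages, selectivity):
    # The optimiser simply picks the cheapest candidate -- no
    # hand-written rules about when an index "should" be used.
    candidates = {
        "full scan": full_scan_cost(pages),
        "index scan": index_scan_cost(rows, selectivity),
    }
    return min(candidates, key=candidates.get)

# A selective predicate makes the index win...
print(choose_access_path(rows=1_000_000, pages=10_000, selectivity=0.0001))
# ...while an unselective one makes the full scan win.
print(choose_access_path(rows=1_000_000, pages=10_000, selectivity=0.5))
```

Note that the decision falls out of the arithmetic: there is no boundary where a rule has to be patched, which is exactly the “lifting” the paragraph above describes – and also why a wrong selectivity estimate (stale statistics) produces a wrong plan.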

On a related but tangential note, isn’t it about time we got away from the tyranny of collecting data distribution statistics as a distinct process – and instead had relational databases that updated them on the fly as and when the data changes - so that they are always accurate? That solution sounds more elegant too... (hint)
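A minimal sketch of what “updated on the fly” could look like for a single column, assuming the engine is willing to pay a small cost per insert/delete (a real engine would need to amortise this; the class and method names here are hypothetical):

```python
from collections import Counter

class LiveColumnStats:
    """Column statistics maintained incrementally, so they are
    never stale -- no separate gather-statistics job required."""

    def __init__(self):
        self.freq = Counter()  # value -> row count
        self.count = 0         # total rows

    def insert(self, value):
        self.freq[value] += 1
        self.count += 1

    def delete(self, value):
        if self.freq[value] > 0:
            self.freq[value] -= 1
            self.count -= 1

    def selectivity(self, value):
        # Fraction of rows expected to match "column = value".
        return self.freq[value] / self.count if self.count else 0.0

stats = LiveColumnStats()
for city in ["London", "London", "Paris", "Oslo"]:
    stats.insert(city)
print(stats.selectivity("London"))  # accurate immediately after the inserts
```

The trade-off, of course, is write overhead and concurrency on the shared counters – which is presumably why vendors have preferred periodic sampling.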

BTW, this whole meander was triggered by rational trigonometry and some undocumented connections in my synapses.

