Tuesday, September 15, 2009

The world is multidimensional? Really?

I cannot believe that people still consider top-down, uni-dimensional, taxonomic hierarchies useful.

This Stack Overflow question (REST: How to Create a Resource That Depends on Three or More Resources of Different Types?) repeats a common assumption. Essentially, the confusion comes from assuming that "URI's map directly to a hierarchy".

I think it's over-exposure to the Windows file system where hard links are a rarity.

Perhaps it's also from over-exposure to hierarchical site-maps that simply repeat the menu structure without adding information.

Someone who is reading Everything is Miscellaneous suggested I read up on "faceted classification" as if that were something new or different.

What's interesting in Weinberger’s book is (1) recognizing this and (2) taking some concrete action.

What To Do?

Perhaps the most important thing is this:

Stop Forcing Things Into Hierarchies

I sat in a multi-hour meeting where we debated the file-system structure for artifacts created during a development project. Each artifact has several dimensions.
  • Phase of the project (Inception, Elaboration, Construction, Deployment)
  • Deliverable type (DB Design, Application Programming, Web Site, etc.)
  • Status (Work in Progress, Waiting UAT, Completed, Rework, etc.)
  • Calendar (Year, Quarter, Month the work started, as well as ended)
  • Team (DBAs, Batch/Backend, Web/Frontend, ETL, etc.)
Sigh.

Since the data is multidimensional, no single taxonomic hierarchy can ever "work". Each alternative (and there are 5! = 120 ways to permute five dimensions) appears equally useful.

If you want, you can enumerate all 5! permutations to see which is more "logical" or "works better for the team". What you'll find is that they all make sense. They all make sense because the dimensions are all peers -- equally meaningful.
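
If you really do want to see them all, here is a minimal sketch in Python that enumerates the orderings. The dimension names come from the list above; the function name and the path-like rendering are just illustrative assumptions.

```python
from itertools import permutations
from math import factorial

# The five dimensions from the meeting described above.
DIMENSIONS = ["Phase", "Deliverable type", "Status", "Calendar", "Team"]


def hierarchy_orderings(dimensions):
    """Return every top-down ordering of the dimensions as a path-like string."""
    return ["/".join(order) for order in permutations(dimensions)]


orderings = hierarchy_orderings(DIMENSIONS)

# One candidate directory hierarchy per permutation: n! of them.
print(len(orderings), "candidate hierarchies")
print(orderings[0])
```

Every one of those strings is a defensible directory layout, which is exactly the problem.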

Alternatives

One alternative is to do this.

1. Create a relatively flat structure. Define all your things in this flat structure. In a Relational Database context, this means assigning surrogate keys to everything; "natural" keys are more problem than solution. In a content management context, just throw documents anywhere.

2. Create "alternative" indices via hard links to the flat structure. Do not limit yourself to a few alternative orderings of the dimensions. There are n! permutations of your dimensions. Expect to create many of these for different user constituencies.
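
The two steps above can be sketched roughly as follows. This is only an illustration, assuming a file system that supports hard links and a hypothetical in-memory mapping from each document to its tags; the file names, tag values, and the `build_index` and `demo` helpers are all made up for the example.

```python
import os
import tempfile


def build_index(flat_dir, index_root, documents, order):
    """Hard-link each document under index_root in the given dimension order."""
    for name, tags in documents.items():
        # One directory level per dimension, in the order this index uses.
        target_dir = os.path.join(index_root, *(tags[dim] for dim in order))
        os.makedirs(target_dir, exist_ok=True)
        link_path = os.path.join(target_dir, name)
        if not os.path.exists(link_path):
            # A hard link: a second name for the same file, not a copy.
            os.link(os.path.join(flat_dir, name), link_path)


def demo():
    """Build a flat store plus two alternative indices in a temp directory."""
    root = tempfile.mkdtemp()
    flat = os.path.join(root, "flat")
    os.makedirs(flat)

    # Hypothetical documents and tags, for illustration only.
    documents = {
        "schema.sql": {"phase": "Elaboration", "team": "DBA", "status": "Completed"},
        "home.html": {"phase": "Construction", "team": "Web", "status": "Rework"},
    }

    # Step 1: everything lives in one flat directory.
    for name in documents:
        with open(os.path.join(flat, name), "w") as f:
            f.write(name)

    # Step 2: as many alternative orderings as the users need.
    for order in [("phase", "team", "status"), ("team", "status", "phase")]:
        index_root = os.path.join(root, "by_" + "_".join(order))
        build_index(flat, index_root, documents, order)
    return root


if __name__ == "__main__":
    print(demo())
```

Adding another view for another constituency is one more tuple in that list; nothing in the flat structure has to move.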

Remember, Search Exists

Recognize that highly structured metadata fields in a database are usually a waste of time and money. Search exists. Much data is unstructured or semi-structured and search functions exist that handle this nicely.

If you stop force-fitting hierarchies, you find that you now have several dimensions. Each dimension has a set of reasonably well-defined tags. Each document or database fact row is a point in multi-dimensional space.

A single SQL-style query among these multiple dimensions is a pain in the neck. Search, however, where the dimensions are implied instead of stated, is much, much nicer.
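
To make "implied instead of stated" concrete, here is a toy sketch, not a real search engine. It assumes each document carries a hypothetical dictionary of tags, and it indexes the tag words without recording which dimension they came from. The document names and helper functions are invented for the example.

```python
from collections import defaultdict

# Hypothetical documents; each is a point in (phase, team, status) space.
DOCUMENTS = {
    "schema.sql": {"phase": "Elaboration", "team": "DBA", "status": "Completed"},
    "home.html": {"phase": "Construction", "team": "Web Frontend", "status": "Rework"},
    "load_job.py": {"phase": "Construction", "team": "ETL", "status": "Completed"},
}


def build_tag_index(documents):
    """Map each lower-cased tag word to the documents carrying it."""
    index = defaultdict(set)
    for name, tags in documents.items():
        # The dimension name is deliberately thrown away: only values are indexed.
        for value in tags.values():
            for word in value.lower().split():
                index[word].add(name)
    return index


def search(index, query):
    """Return the documents that match every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    matches = [index.get(word, set()) for word in words]
    return set.intersection(*matches)


index = build_tag_index(DOCUMENTS)
print(search(index, "completed construction"))
```

The user types "completed construction" and never has to say that one word is a status and the other is a phase. The SQL equivalent would need a named column for each.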