Friday, April 07, 2006

On Coordination Costs and Human Factors

I reproduce here what I thought would be a short email that I fired off this afternoon discussing the web style. I hope it might gain from wider scrutiny...

On Coordination Costs

In an atypically verbose exposition, Matchmaker Roy Fielding propounded thusly:
Don't get carried away. You won't find a constraint about "nouns" anywhere in my dissertation. It talks about resources, as in resources, because that is what we want from a distributed hypermedia system (the ability to reuse those information sources through the provision of links). Services that are merely end-points are allowed within that model, but they aren't very interesting because they only amount to one resource. The really interesting services provide many resources.

Note that this is just another way of restating Reed's law in relation to Metcalfe's law.
Similarly, Layer Stripper Bill de hÓra recently riffed in the same vein:
Some people, when faced with a distributed programming problem, think 'I know, I'll expose a new service interface.'

Now they have n-squared problems.
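De hÓra's quip is about integration arithmetic. A minimal sketch of the counting argument (my own illustration, not from either post): with n services each exposing a bespoke interface, every pair that needs to talk requires its own adapter; with a single uniform interface, each service implements the shared contract once.

```python
# Counting integration work under two regimes (illustrative only).
# Bespoke interfaces: every communicating pair needs its own adapter,
# i.e. n*(n-1)/2 point-to-point integrations -- the "n-squared problem".
# Uniform interface: each service implements one shared contract,
# i.e. n adapters total.

def bespoke_adapters(n: int) -> int:
    """Pairwise point-to-point integrations among n services."""
    return n * (n - 1) // 2

def uniform_adapters(n: int) -> int:
    """One adapter per service against the shared uniform interface."""
    return n

for n in (10, 100, 1000):
    print(n, bespoke_adapters(n), uniform_adapters(n))
```

At a thousand services the bespoke count is nearly half a million integrations against a thousand, which is the economic frame of the argument.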
View Sourcerer Joe Gregorio, almost in passing, shone a light on some cobwebs (on caching and authentication on the web):
"In the process of implementing httplib2 I also discovered some rough spots in HTTP implementations."
Putting these together and handwaving as usual...

It strikes me that the first two statements are all about changing the frame and making economic arguments for REST (Representational State Transfer), namely constraining system design to resources, identifiers and uniform interfaces in order to lower coordination costs... ergo the web style as an enabler of Reed's Law.

I hadn't seen such explicit harkening to Metcalfe and Reed in the past even though there has always been this notion of REST as incorporating end-to-end principles. I too have argued in this vein about a complexity and integration argument for REST.

In a similar thread Benjamin Carlyle put it thusly:
Uniformity is key. Simplicity at the network level is key. Managing state transparently as resources is important for its own reasons.
Intuitively, the argument about applying architectural constraints to get payoffs in terms of leverage has a lot of appeal to engineers. It seems however that we need some economists to weigh in here with some options pricing theory or other to give more heft to these arguments. Often, decisions about software systems are not made by engineers but rather by financiers and it helps to speak their language.

Instead of Reed's notion of group forming, it seems the argument by analogy for REST is about application integration and the barriers that obtain in that sphere.

Now I like the large numbers that we can throw out in these reformulations. The thing is that Andrew Odlyzko says (pdf) that both Metcalfe's and Reed's laws are bunk despite their evident appeal and ability to dazzle venture capitalists.

In that paper he sets out to get quantitative measurements and hard data, and puts the value of communication networks at n log(n), which is nothing to sneeze at: better than Sarnoff's linearity but far less than Metcalfe's and Reed's estimates. Now Odlyzko was measuring the utility of the internet, which encompasses more than the web, and you can argue, following Sam Ruby, that the email and peer-to-peer styles, whether expressed as BitTorrent, Skype or Usenet, are the true winners in his numbers... You could also argue with his methodology, and I have my own quibbles, but it's a brave man who takes on Odlyzko... Thus I'll take his numbers in my handwaving, acknowledging after all that the web was the key enabler and popularizer of the internet.
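To make the competing valuations concrete, here is a small sketch (my own, with constants dropped since only the growth shapes matter) of the four laws mentioned above for a network of n participants:

```python
import math

# Rough shapes of the competing network-value laws (constants omitted):
#   Sarnoff   ~ n         broadcast audience
#   Odlyzko   ~ n log n   his empirical estimate
#   Metcalfe  ~ n**2      pairwise connections
#   Reed      ~ 2**n      potential subgroups (group-forming networks)

def sarnoff(n):  return n
def odlyzko(n):  return n * math.log(n)
def metcalfe(n): return n ** 2
def reed(n):     return 2 ** n

for n in (10, 50):
    print(n, sarnoff(n), round(odlyzko(n)), metcalfe(n), reed(n))
```

Even at n = 50 the gap between Odlyzko's n log n and Reed's 2^n is astronomical, which is why the choice of law matters so much to the valuation rhetoric.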

The Human Factor

This gets me to the third quote from Joe Gregorio, namely the perennial rough spots and implementation quirks that are our daily bread as engineers trying to design and produce systems for the web.

We see daily abuse of HTTP, and there are annoying gaps between the libraries and implementations that exist and what is deployed in the real world. Perhaps this is because REST is the web style rather than a programming model and consequently enforces very few prescriptions. We are only now seeing good frameworks geared toward REST; historically, HTTP client and server libraries have been minimalist.

Now it seems to me that this is about the place where theory meets practice and we get into the realm of pragmatism and leverage. In the wild we see
  • the difficulties of interoperation
  • differing interpretations of specifications if indeed specs are read
  • backward compatibility constraints e.g. for leverage and adoption, HTTP 1.1 had to accommodate some of the pitfalls in HTTP 1.0
  • difficulties in authoring structured data
  • configuration and deployment issues (mime types, content negotiation etc.)
  • competition with other styles, the web style exists in a marketplace of architectures
  • vested interests and economic models e.g. limits imposed by shared hosting providers, asymmetry of some broadband networks etc.

With this in mind, I wonder if I can take a stab at Koranteng's postulates on coordination costs:
  1. There is a natural dampening factor in the utility of distributed computing

    We can use Odlyzko's numbers as the lower bound in practice of network effects and Reed's law as the theoretical limit (with Metcalfe being a great popularizer).

    I happen to be reading Graham Greene's The Human Factor and, looking through some of the issues that hinder adoption, many of them could be summarized as comprehension or human variability; hence I'll characterize the issue as the human factor. All that is left is to augment with some Black-Scholes options thinking and financial derivatives to package for CEOs.
  2. The human factor in technology adoption is sizable and its effect can be measured. Moreover I would argue that it should be recognized as an explicit architectural constraint in the design of software systems.
  3. In the realm of distributed computing, this human factor is bounded by Odlyzko's limit and Reed's law.
    Mathematicians can derive the correct coefficient for me... 1/(n log n)?
The rest as they say is advocacy and implementation details...

We are operating with imperfect specifications, imperfect frameworks and imperfect implementations. REST as laissez-faire distributed computing doesn't acknowledge these costs as architectural constraints but rather seems to go about it by encouraging best practices and hoping that, by existence proof, people will come to it... One can look at the high-level requirements that have been articulated
  • Simple protocols for authoring and data transfer
  • Textual formats for protocol and some exchanged hypermedia
  • Sensitive to user-perceived latency
  • Mark Baker's talk about "principled sloppiness" (i.e. "must-ignore style extensibility")
We don't tend to enforce many of these things in the deployed protocols. I wonder what other best practices can lower coordination costs and whether they can be encoded in protocol to remove the human factor...
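Of the practices above, "must-ignore" extensibility is one that can be encoded directly in consumers. A minimal sketch (the element names are my own invention, not from any real spec): read the elements you understand and silently skip unknown ones, so publishers can extend a format without breaking deployed clients.

```python
# "Must-ignore" extensibility, sketched: the consumer extracts only the
# elements it knows about and ignores everything else, so the format
# can grow without breaking old clients. Element names are illustrative.

import xml.etree.ElementTree as ET

KNOWN = {"title", "updated"}

def read_entry(xml_text):
    """Return the fields we understand, ignoring unknown extensions."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text for child in root if child.tag in KNOWN}

doc = """<entry>
  <title>Coordination Costs</title>
  <updated>2006-04-07</updated>
  <shiny-new-extension>clients must not choke on this</shiny-new-extension>
</entry>"""

print(read_entry(doc))
```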

Anyway food for thought...

the human factor

The Gospel of REST

If anything this enables me to add Reed's insight to my nascent taxonomy (or is it theology?) of the web style which some may have come across... namely:

There's a tag: REST

There's a slogan: the web style

There's a Holy Book: Architectural Styles and the Design of Network-based Software Architectures

There's a Reverend: HTTP

There's a choir: the HUHXtable quartet (HTTP, URI, HTML, XML)

There are Four Horsemen: GET, POST, PUT, DELETE

There are prophets: (you know who you are)

There are pillars: Resource Modeling, Idempotency etc.

There are priests and tax collectors: the caching and other intermediaries. Ergo "Render unto Caesar that which is Caesar's" recast as the notion of "giving visibility to intermediaries".

There are angels and demons: a band of Apaches and various HTTP libraries which are alternately sources of delight and exasperation.

There's a Messiah: the browser (which comes with various pretenders: Firefoxes, Great Explorers, Viking Operas and Fruity Safaris).

There are red herrings: url opacity etc.

There are false gods: WS-*, crusty old architectures of appropriation etc.

There's the wilderness and prodigal children: WebDAV?

There's Mary Magdalene and the disciples: HTML and Forms.

There's immaculate conception: the virtuous XML.

There are worldly travellers: the three mobile kings JavaScript, Java Applet and ActiveX (some discredited) and a Flashy pretender.

There are scrappy offspring: Atom, RSS and Atompub.

There are gruesome Philistines: implementation details such as Structured Data, Character Encoding and Security.

There are elevator pitches, CliffsNotes and ballads: Sir Tim's lullaby of Web Architecture 101 is quite reasonable as a Song of Solomon.

There's myrrh and frankincense: the web as conversation engine

And now there's the Promised Land ©: Reed's Law as the proverbial milk and honey

The parts I'm missing are the Apocrypha and Gnostic gospels (with Judas in the news this week)... but those should be forthcoming... As will the eventual accommodation by Rome as the official religion, but then Bill de hÓra has noted that we are almost there.

[Update] Ernie Prabakar suggests that "Lo-Rest" works as apocrypha and that "SOAP is really gnostic - it focuses on the divinity of XML, to the denial of its incarnation in HTML." One wonders who Rome is in the technology world, perhaps Microsoft as he suggests... I'd note in passing that I've heard in corridors that the IBM Software Group Architecture Board is "looking at REST" anew. That's got to qualify as progress... Pretty soon I'll be able to publish an official gospel from my muddy trenches. Looking further down the line, there will likely be a split between the Catholic and Orthodox churches as the empire suffers from navel gazing and an East/West axis of discontent, and eventually there'll be Martin Luther and the Reformation... I wonder whether I'll live to see the Protestants of REST and if I'll recognize them.

A Data Digression

I noticed last year that Roy Fielding produced a white paper about JSR 170 (Java Content Repository) for his day job (pun intended) applying principled constraints to the modeling of content repositories.

I've been curious about the surprising inertia behind that specification having played with IBM implementations in the past few years. Perhaps however, the immaturity of implementations in that space is a tribute to some of these arguments about coordination costs applied to the marketplace of data (relational, object relational, XML, SQL, XPath, XQuery, ActiveRecord, ODMA, Spring, Hibernate, SDO etc).

I wonder if the Atom store dream is the way to go: rather than apply the constraint of an API and a language (Java) in a world in which we have a Tower of Babel of languages and persistence frameworks, it makes more sense to focus on a wire protocol (the Atom Publishing Protocol) and a wire format (Atom). In other words, the greater payoff would be not in establishing a programming model (the JCR) but rather in moving to Atompub, which is agnostic about the underlying programming model and lowers coordination costs by stripping a layer of comprehension from the mix. All this of course is modulo the quirks of compound documents, media collections etc...
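To make "protocol over API" concrete: creating a member in an Atompub collection is nothing more than an HTTP POST of an Atom entry to the collection URI, which any language can speak. The collection URI and entry content below are made up for illustration; only the shape of the exchange follows the protocol.

```python
# Sketch of the Atompub "create" interaction as raw wire bytes.
# The URI and entry are invented; the point is that the contract is
# an HTTP message, not a language-specific API.

entry = """<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>On Coordination Costs</title>
  <content type="text">Protocol, not API.</content>
</entry>"""

body = entry.encode("utf-8")
request = (
    "POST /collections/essays HTTP/1.1\r\n"
    "Host: example.org\r\n"
    "Content-Type: application/atom+xml;type=entry\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
).encode("ascii") + body
print(request.decode("utf-8"))
```

A Ruby, Python or Java client produces the identical bytes, which is the sense in which the wire format strips a layer of comprehension from the mix.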

The web worked because it was an overlay system that acknowledged existing systems and encoded much of its benefits in protocol rather than API. At the current stage of development in the software industry, it appears that the combination of protocol and data formats rather than API is the more effective approach to lowering coordination costs and dampening the human factor.

