Showing posts with label corporations. Show all posts

Monday, May 23, 2005

On Recommendation Systems

Pete Lyons wonders about the voodoo behind Amazon's recommendations, which of late seem to have nailed things for him as they successfully mine the Long Tail of music. He asks:

Do they have more data now or better algorithms?
Sad fellow that I am, I actually happen to have read the research paper behind their approach, Amazon.com Recommendations: Item-to-Item Collaborative Filtering (PDF), which describes the techniques they used. The question then is how do these glue layer, machine-learning heuristic, and algorithm-eating artificial intelligence folks do recommendation systems and what should one make of them?

Now I'm no algorithm wonk, but my mathematics and electrical engineering hasn't atrophied enough to prevent me from recognizing the elegance of Amazon's technique. What's also interesting is that like Google Suggest, they make good use of offline processing so that almost everything can be pre-calculated and cached for reasons of scalability and speed.
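For the curious, here is a rough sketch of the offline/online split described above. It is my own simplification in Python, not Amazon's code: the function names, the `purchases` dictionary and the `top_n` cutoff are all made up for illustration, and the similarity measure is a plain cosine over binary "who bought what" vectors.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_similar_items(purchases, top_n=3):
    """Offline step: precompute the most similar items for every item.

    purchases maps a (hypothetical) customer id to the set of item ids bought.
    Similarity is the cosine between items seen as binary customer vectors.
    """
    buyers = defaultdict(int)    # item -> number of customers who bought it
    together = defaultdict(int)  # (item_a, item_b) -> customers who bought both
    for items in purchases.values():
        for item in items:
            buyers[item] += 1
        for a, b in combinations(sorted(items), 2):
            together[(a, b)] += 1

    table = defaultdict(list)
    for (a, b), both in together.items():
        score = both / sqrt(buyers[a] * buyers[b])
        table[a].append((score, b))
        table[b].append((score, a))

    # Keep only the best few neighbours per item; this is the part one caches.
    return {
        item: sorted(neighbours, key=lambda p: (-p[0], p[1]))[:top_n]
        for item, neighbours in table.items()
    }


def recommend(table, owned, limit=5):
    """Online step: cheap lookups in the cached table, no similarity maths."""
    scores = defaultdict(float)
    for item in owned:
        for score, other in table.get(item, []):
            if other not in owned:
                scores[other] += score
    ranked = sorted(scores.items(), key=lambda p: (-p[1], p[0]))
    return [item for item, _ in ranked[:limit]]
```

The expensive pairwise counting happens once in `build_similar_items`, which could be run as a nightly batch job; `recommend` only sums a handful of cached scores, which is why this style of system can answer quickly however large the catalogue gets.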

As with almost all recommendation or machine learning systems, the more data they have the better things get. Of course the computation costs also increase but, if you can parallelize things a little, you can leverage the experience we now have of 10 years of Mr Moore in the Datacenter of the web and throw server farms at the problem. It's not just the data however, if you look at the mathematics, you'll see that the algorithm used plays the most crucial role much like PageRank or Kleinberg's earlier Clever (PDF, PS) did for search systems on the web as discussed here.

Music or book purchases on the whole are episodic unless you're someone like me who is obsessed and constantly looking for the perfect beat. Amazon as a retailer doesn't have to be worried about real-time timeliness; I'd guess their average user buys or browses through the site monthly, weekly or at a stretch perhaps once a day (e.g. outliers like myself who are always tweaking their wishlists). Amazon also gives users easy ways of providing feedback so that they can train the system at their leisure as they interact with the site. Like Netflix, they just have to throw in a little unobtrusive DOM scripting, hidden iframe or XMLHTTPRequest thingimijig and the user can click on those star rating systems without even seemingly reloading the page.

The benefits for the user are frictionless immediacy and the notion that one is teaching the autistic machine. The endpoint of machine learning is to perfectly anticipate human desires. Per contra I actually believe human psyches require a little imperfection. We need a little interaction and especially some conversation as we mediate our ever-changing world. In Apple's Knowledge Navigator 1988 concept, the interaction was more with a friend or butler. And as we know Jeeves was opinionated and mostly correct but he also made mistakes and Wooster could sometimes be smug. Should the people working on recommendation systems ever get to their nirvana, I would speculate that it would be a little unsatisfactory for the users, hence I'd suggest that they make sure to throw in enough errors (maybe a 5 percent threshold of randomness) to keep things exciting. I actually cherish the mistakes in my music collection and even the things Best Left Unread.
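To make the "5 percent of randomness" suggestion concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `with_surprises`, the `catalogue` list and the `error_rate` parameter are names I have invented, and no real recommender is claimed to work this way.

```python
import random


def with_surprises(ranked, catalogue, error_rate=0.05, rng=None):
    """Return the ranked picks, occasionally swapping one for a random item.

    Each pick is replaced, with probability error_rate, by a catalogue item
    that was not recommended and has not already been used as a surprise.
    """
    if rng is None:
        rng = random.Random()
    result = []
    for pick in ranked:
        if rng.random() < error_rate:
            candidates = [
                c for c in catalogue if c not in ranked and c not in result
            ]
            if candidates:  # keep the original pick if nothing is left
                pick = rng.choice(candidates)
        result.append(pick)
    return result
```

Passing a seeded `random.Random` makes the surprises repeatable for testing, while leaving `rng` unset gives a different set of happy accidents on every visit.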

Enough people have been noting that Audioscrobbler's music recommendation system has improved enough that, late adopter that I am, I went ahead yesterday and installed their plugin on several of my machines. On the older machines, I use Winamp which has lower memory requirements than iTunes. My digital music collection (10,300 songs, 70.6 GB) is larger than the largest iPod so I am waiting for another Moore's Law inspired doubling of storage before I adopt that platform, not to mention that my ear canal is too small for those white earbuds.

Audioscrobbler's approach has the virtue of actually keeping track of your actual usage of the content in question so that if it notices that you keep playing James Carter's The Intimacy Of My Woman's Beautiful Eyes as I did last week, then perhaps it will be smart enough to figure out that you'll dig Nicholas Payton's Captain Crunch (Meets The Cereal Killer). At least that's the theory. Some fear that this smacks of Big Brotherism and indeed we need to be vigilant that the benefits we gain in giving up some of our privacy outweigh the possible pitfalls.

On the other hand, humans are social beasts and don't mind a certain amount of looking over one's shoulders or, like the teenagers on the bus, we don't mind others plugging in to our headphone jacks. Or like some of our neighbours, we want everyone in the neighbourhood to hear our sounds whatever they may be. There was some mp3 player or other I saw on Engadget a while back that had 2 headphone jacks to facilitate intimate sharing of tunes and that's the right idea. There's even controversy among iPod people over the new vocabulary: does podjacking mean "plugging your cord into the jack of another person's iPod (and vice versa, of course) to hear what that person is listening to" or is it "using an FM transmitter attachment to take over neighboring radios"? Amazon's Listmania feature, our Netflix queues, the sharing of iTunes playlists, things like Webjay, Shoutcast, last.fm and All Consuming express the pent-up desire for the two-way web of publishing of content and interests.

If you love books or music as much as I do, and write about them with incisiveness and opinion, others will always be asking you what the latest thing is in order that they can seek out new things (or know what to avoid) or simply share their opinions or anecdotes. These are cultural markers that help establish shared context. That is the role of DJs or the personal recommenders we all have, the people who are arbiters of the cool. And if the people who market these items seek out these tastemakers with the moral equivalent of payola, it's just smart advertising. Samuel Jackson will never buy a hat in his life again unless Kangol loses their corporate mind. In my own little way, long after I was active at WHRB, record companies were still sending me 12 inch singles even though my Technics turntables had been stolen and I'd asked repeatedly to be removed from their mailing lists.

Anyway here's Koranteng's Musical Toli at Audioscrobbler.

It's a little scary in that you could figure out that as I was writing these words Chuck Brown & The Soul Searchers - Bustin' Loose was making my early morning brighter and you might even picture me getting up to take a few shuffle steps around my living room in my pajamas.

The Audioscrobbler recommendation system kicks in after 100 or so tracks played and I'll give them a couple of weeks or so to see if it's hype and report back with the results. I too am looking for overlooked breakbeats.

Dan Bricklin's Listgarden is software in that mode. My current Forms work is all about lists, indeed until the lawyers heard about Microsoft List Builder, that was the preferred terminology for some. I actually would rather call a form a form, and a view a view but I'm a cantankerous sort as you might know.

In the web site link category where timeliness is of importance, the sheer scale and real-time nature of the blogosphere makes the design of efficient recommendation systems more problematic. I assume all the big boys are burning the midnight oil trying to crack that question. We want both search and recommendations to help navigate the web. Google has been very quiet of late although their Web Accelerator is pointing to the kind of thinking required. Just when you think that Blogger has been sadly neglected (no categories? no trackback? no easy lists or blogrolls etc) all those PhDs will surprise you. At least I hope they will - the contortions I've been going through with Blogger deserve a separate post. I assume the same thing is true at Yahoo, MSN and the like. In the meantime, it has been Technorati, del.icio.us, PubSub, Blogdigger, Furl and BlogPulse for me (not to mention Blogdex, Popdex, Hot Links, Memeorandum and that latecomer Ice Rocket). Mark Fletcher has been intimating that Bloglines will surface something in the summer. If they do, they will have quite an advantage since so many are increasingly living in Bloglines and other newsreaders that Tim Bray is now pondering the usefulness of his Browser Market Share numbers.

Another Glue Layer Person, Leonard Richardson has one of the most interesting papers (and software that one can play with complete with Python source). He explicates the problem space quite clearly:

The Ultra Gleeper: A Recommendation Engine for Web Pages
Recommendation engines were built and run into troubles. Seemingly insurmountable problems emerged and the flame of hype moved elsewhere. Recommendation engines for web pages were not built or successfully launched. To even attempt one would require development of a web crawler and the associated resources. Today, recommendation engines have something of the reputation of a well-meaning relative who gives you gifts you often already have or don't quite want. Most useful recommendations come from knowledgeable friends or trusted web sites.

But over the years, as people built these web sites, they came up with models and tools for solving the basic problem of finding and tracking useful web sites. The wide adoption of these strategies has not only brought down the cost of building a web page recommendation engine, it's removed some of the insurmountable problems that still plague recommendation engines for other domains. It's now possible for someone with a dedicated server to run a recommendation system for themselves and their friends. I've done it and I'll show you how to do it.
An officious-looking "Legal Education" document came in through my Big Blue inbox in the past few days, so presumably I should consult that before peeking at the code if indeed that is allowed. The tension between the open source imperative and the inhibition of so-called "Intellectual Property" is a minefield that like all such battlegrounds causes stalls (and amputations); we still haven't learnt how to navigate easily these things.

David Hyatt recently wrote about Implementing CSS and gave pointers to the major lessons learned while implementing CSS support in Safari and Mozilla along with a couple of optimizations he came up with (very clever algorithms by the way). The subsequent public discussion about the design and performance tradeoffs made in complex things like web browsers stimulated my engineering juices. Lots of things are browser-like these days and the techniques he has written about have wide applicability. A high performance dynamic style rule matching component could be used in many products that don't have to do with browsing. Instead I have to consult a lawyer or read yet another stack of incomprehensible PowerPoint slides. Why exactly did my parents forbid me a law career? It seems that lawyers are the only ones who have guaranteed job security (trigger-happy disclaiming is the rule).

And while digressing about job security, I should mention that the news of 10,000 to 13,000 jobs being eliminated at IBM hasn't made for a comfortable work environment for the past few weeks. Those are mighty nice round numbers... Rumours are rampant in the workplace that, as the Beeb suggests, "it's too expensive to lay off in Europe, they have all those unions and are vaguely socialist, therefore they're going to cut around here in the US where there is much less difficulty hiring or firing." Otto von Bismarck knew a thing or two about paternalism and the unions in Europe even survived Maggie Thatcher's defenestration of Arthur Scargill. I know my family are all wondering what might happen and if I'll be affected, and not just quietly; it's an ongoing concern in conversations. And who knows? We all worry about our own little patch of the woods but also about those colleagues who we may not be talking to in coming months, sometimes to the point of work being affected. Despite our undoubted professionalism, we're only human and don't understand abstract and capricious things like gravity or economics; we'd rather discuss tangible things like books, music and recommendation systems.

The two topics that never get spoken about in the US corporate world are one's actual salary and the impending job cuts. Everybody just keeps their heads down; the fear being that if you make too much noise, you might get the cut because admittedly we've all heard that such things have happened somewhere, somehow, at some time in some corporation. Corporations being the expression of unsentimental capitalism, I assume the worst part of being a manager is having to lay someone off and see their face. I'm too sensitive for that kind of thing and hence I'll never aspire to a managerial position. The one thing that you'll hear as the received wisdom-du-jour gets explained is that the euphemistic "resource action is not the fault of those affected" but rather "simply reactions to market conditions" or philosophically that ultimately it was a failure of planning or of something or other. But maybe I shouldn't stick my neck out any further...

In the spirit of the recommendation systems I've discussed I'll endeavour to share a few playlists with toli readers this week before London calls this weekend.

Soundtracks to this tale:
See also: On The Long Tail of Music, Metrics and Recommendations


Monday, May 16, 2005

On Blogging at IBM

Fellow traveler, James Snell, points out IBM's newly published blogging guidelines and policies:

You can read them at length there, they seem fairly reasonable, even if couched in the obligatory corporate PR self-congratulatory bromides about "innovation-based companies". I wonder, are there any companies that claim to be anti-innovation?

Blogging@IBM

IBM blogging policy and guidelines

Responsible Engagement in Innovation and Dialogue
  1. Know and follow IBM's Business Conduct Guidelines.
  2. Blogs, wikis and other forms of online discourse are individual interactions, not corporate communications. IBMers are personally responsible for their posts. Be mindful that what you write will be public for a long time -- protect your privacy.
    [snip]

Use your best judgment... Ultimately, however, you have sole responsibility for what you choose to post to your blog.

Don't forget your day job. You should make sure that blogging does not interfere with your job or commitments to customers.


The day job injunction is one about focus. When it comes to Freedom of Expression, companies know that they can't control what someone does on their own time and indeed that it can make the workplace a happier one if employees can pursue their muses. My own management chain have worried periodically about my focus. It hasn't been much use telling them that the Technology toli is actually my attempt to gain ideas that feed back into the day job or indeed that I've been blogging about Forms Glue of late. Or even that my education has been all about learning to handle balance and coping with daily insanity of which there is much in large bureaucracies. Some just look at the blog and get scared by the veritable outpourings in this land. "How can he possibly write all this?" they must be asking. Well I do have weekends, mornings and nights, right? At least I hope I do... Of late the 5am to 7am shift while drinking tea, reading the news and enjoying the early morning sun has been very productive and prolific. Thus at best they can only give a gentle reminder; day job doesn't even get a number in the guidelines.

The good news is that I have only pressed against the spirit of a couple of these guidelines. The one about "Clients, partners or suppliers should not be cited or obviously referenced without their approval" in particular.

And for these I would invoke the "Use your best judgment" plank as a justification.

I like to link. Like the hyphen, the hyperlink is promiscuous, sociable and an assertion of interest. Hyperlinking is the singular power of the web style; a link shares the googlejuice around and often shows that a human has made a judgment. The judgment is value neutral and doesn't imply anything other than interest (or sometimes dissent). The controversies over linking, deep-linking will continue to be fought until this is more widely understood. Links also get spammed but that's another story. A shout out to The Power of the Schwartz or to Sun & Sun (a frequent victim of The Ampersand Curse) is just that: a shout out. I certainly am not going to seek approval to link to these fine folks.

And as far as picking fights goes, it often isn't the wisest thing but sometimes it serves to clear the air (see On The Importance of Biting Satire for example). I've noted:
Sometimes you have to resort to the down and dirty column.

I like my satire savage. It should be vicious, biting and deeply heartfelt. The targets should feel a sharp wound.
Less said on that however.

I would say a similar thing about the "Use a disclaimer" item. This is a weasely concession by overly freaked-out folks to keep lawyers employed. I do recognize that the things I cross-post at the official Inside Lotus blog should have a different tenor, coming as they do from company hosted facilities and presumably, in that respect, I am acting as the public face of Lotus. Thus I take a greater care with my words in the toli that surfaces on that forum.

On the other hand, I think it is obvious that an individual doesn't speak for a company.

In legal terms, and as the son of a lawyer, I can confidently say that a disclaimer adds no value or protection whatsoever. If someone objects to your blog post, website or email, and if they have deep pockets (say the Scientologists for example), they can, and will, sue willy-nilly and tie you up in court, protestations of disclaimer notwithstanding. The wonder of the lawyer lobby is that it manages to keep risk aversion and litigation at such a high pitch in the cultural zeitgeist. It is true that oftentimes, the market will tar you with the brush of guilt by association; in economic terms therefore it is wise for companies to worry about such things. But a certain humanity is often lost by blandly avoiding controversy. There is many a company with Strange Bedfellows all over the world (whether it is in the pursuit of oil, gold or blood diamonds), paying bribes to people while later tarring said countries with the brush of corruption. It takes two to do the corruption tango.

combating corruption


If you really did believe (as for example many executives did in the apartheid era) that it was imperative to share in the fruits of the sweat and tears of others - sanctions be damned, as Reagan and Thatcher maintained - then one should indeed expect swift retribution from the marketplace if appropriately sensitized. I remember Barclays Bank paying a heavy price in the 1980s for such an attitude (and it is only 19 years later that they are emboldened to return to South Africa). I can think of many such examples and perhaps you could point me to your favourites e.g. watching a nice liberal mother explain to her 4 year old son why the Del Monte can of peaches from South Africa had to be put back on the shelf and the Waitrose brand peaches (without the colourful logo) substituted, circa 1988 in Brent Cross shopping centre in London.

Now employee blogging is much the same as employee use of any technology, be it phone, email or the web. Oftentimes, the use of said technology can be very productive and useful (in moderation) and indeed it can sometimes save lots of time and keep the employee focused on corporate business. If I'm able to arrange renewal of my license over the phone or the web during my lunch break, I presumably wouldn't have to take an afternoon off work to head to the DMV. I joked in passing about how I recently had to respond to an anonymous email from some department or other to justify maintaining my office phone since it had seen relatively little activity in this era of instant messaging and email. It is incidents like that that lead people to talk all too often about "faceless corporations". That legal fiction of personhood is frequently invoked by companies but often conveniently forgotten when the lights go out.

American society is deeply litigious and gets stuck on the notion of explicit adherence to the letter of the law as opposed to the European notion of staying within the spirit of the law and letting an experienced judiciary adjudicate when the boundaries are overstepped. This means that there is a vast industry of tax and accountancy lawyers who specialize in weaseling out of the letter of the law with new tax shelter products every year, engaged in an arms race with the IRS.

In this vein I would suggest that if Lotus was Old Europe, then IBM is heartland America, a New World of slightly puritanical rectitude. Coming from a culture that is often reacting to the fights between these two elephants, I would say that each approach has its merits and that perhaps the grass should have some say in these things.

Sometimes of course, this excessive concern for litigation has benefits for society, for the greater good as it were. Cambridge sidewalks tend to get cleared fairly quickly when it snows since people who twist their ankles and fall in front of your house will get their 50 cents and more in legal revenge. In comparison, English and French sidewalks were treacherous in the winter time - it often felt like a tightrope or walking the plank (in my tradition of metaphorical excess). There is also huge innovation in the kinds of cups that are used for coffee to prevent litigation-induced scalding. I don't drink coffee but I am amazed at what I see people holding when they walk out of Dunkin Donuts or Starbucks. It's Nuclear Star Wars leading to good old Teflon all over again.

The 401K account is just a case in point about this phenomenon. It is about the only thing, other than the plain providential, and literal, lottery, that Americans will have for retirement if Dubya and Cheney have their way with Social Security, what with their continued focused and highly selective war-mongering, and deficit spending like proverbial Palm Wine Drinkards. A lawyer took a look at the tax code, found a loophole and now every dinner table conversation is about the 401K. Following up on the same idea, it is plain fact that the Roth IRA is the most popular political and economic innovation of the past decade. Bless you Senator Roth, wherever you are, you citizen you.

Palm Wine Drinkards


On the other hand this is the same tendency that leads to much inhibition. The US has half of the world's supply of lawyers and the world's largest insurance industry and for good reason. I shouldn't even mention the reinsurance industry and the whole stack of derivative products founded on this litigious risk mitigation tendency.

Playground swings are no longer as fun since manufacturers have shortened the rope to prevent high velocity and now parents will strap you in like a pilot. Where is the thrill of youthful daredevil inventiveness going, I ask? My cousin famously broke his arm as a child on our playground swing and he is much the better for it. He became a far more sensitive soul once he had to be confined to a cast and realized his limitations and the wisdom of the repeated warnings of his parents and entire family. Actually it was the traditional healers of his father's village of Taviefe in the Volta region of Ghana who set his arm in place, armed with their inimitable herbs and centuries-old experience. We turned to tradition as opposed to modernity. A great respect for tradition and confidence in his roots was fostered in this experience. Certainly in family lore we all know better where we come from.

roots


I can't imagine my Auntie Grace filing a lawsuit against the swing manufacturer, or her sister in whose backyard the great swing was to be found, or perhaps even her nephew, me, who was in attendance at the fateful fall and who didn't intervene. That however is the degenerate kind of thing that would happen, and does happen fairly frequently in the US where the ties of family and societal culture are sometimes loosened into anomie.

There are already far too many emails emanating from corporate accounts with noxious disclaimers, clogging up mailing lists everywhere and causing comprehension problems. They are a public nuisance and there is no reason to add further disclaimers to the mix.

As you might have guessed, I dissent on that front; my Blogger profile simply says "Oh, and I work at Lotus/IBM". The Girlfriend Fiancée says that that tag line is "a little unprofessional" but it wasn't chosen without care. This joint is an individual one, this is someone's voice you are hearing, engaging and thinking aloud in public conversation.

I think that suffices. What do you think?

Update: My friend Justin adds some Mediocre Indian Cuisine to the advertising mix. Join me in welcoming another jaundiced Lotus/IBMer to the blogosphere. He started the blog before these newfangled stamp of approval thingimijigs were published and we are all the better for it.

There is another post lurking about where and how people at IBM blog, but that's another conversation for another early morning, right Tessa?

Soundtrack for this tale: Brooklyn Zoo by ODB

