Tuesday, April 19, 2011

Meet Butterflyzer Part I: Is this any way to develop a product?

It's been a while since I posted and I've got quite a backlog of things I'd like to write about: EclipseCon 2011, Agent Modeling Platform improvements, my mixed feelings about Big Data and AI, and so on. But what I've really been excited about recently is the progress being made on a completely different piece of software, and I'd like to share it with you. It's called "Butterflyzer".



As a bit of background, while I haven't blogged much about this, I've done a good deal of work in two different fields outside of my general focus on creating artificial models of real world complex systems. Along the way, I've invested a lot of time in learning how to leverage various Eclipse-based technologies to do the things that I really want to do. (For those who aren't involved in software development, Eclipse is the leading Open Source development environment and a growing platform for general purpose application development.) All of this has led me to a set of somewhat contrarian and untested views. So I thought... hey, if not many other people share these ideas, they're either a) not very good ideas, or b) an opportunity. As usual, I've saved time by simply guessing b) and then spent an obsessive amount of time and energy in creating an application that attempts to support that guess. Here is my experience, for what it's worth.

1) I spent a few years doing fascinating research on fine-grained agents for reasoning. I'd been aware of the theoretical limitations of AI, but that work exposed me to the real-world limitations of knowledge representation and machine reasoning. That experience has left me deeply skeptical about the sound and fury surrounding efforts like IBM's Watson, Wolfram Alpha and Hadoop. I'm not questioning that what they offer is valuable and interesting, but I do wonder about how they're being presented and understood, especially by the broader public. There are important -- even profound -- issues that are simply being glossed over. As I say, that's a subject for another post. But what that work did convince me of was the value of light-weight, "human-in-the-loop" approaches to using machines to support reasoning. I came away from it with the feeling that we should let computers do what computers do well -- sift through lots of data discovering patterns that fit existing templates, and let people do what they do best -- invent new templates. My guiding passion is to develop tools that harness the power of the computer to enable people to think more deeply and openly about our world, to share that information, and to make wiser choices. While the evergreen challenge for AI seems to be creating software that comes up with good answers to hard questions, I'm more challenged by the idea of creating software that comes up with good questions that have no answers.



2) More recently, I've somewhat accidentally spent a lot of time on Social Media research. Again, that work has led me to the conclusion that most of the efforts at Internet research have been too focussed on simple answers to frankly pretty lame questions. "How can I increase my click-through rate?" (Gee, I dunno, say something actually useful or at least interesting?) "What's the best day to post on Twitter to get maximum RTs?" (Hmm... I guess "interesting" isn't really an important metric, judging from Twitter trending topics.) That might sound hypercritical or even arrogant. Don't get me wrong, there are a lot of really cool efforts out there. But most of them are self-funded or academic exercises. I'm talking about where the money is, and as usual, that's in telling the cleanest story you can to investors. For that you need a metric and -- because that metric usually can't be money, since there often isn't any -- it has to be some other single dimension, like, yep, click-through rates and retweet counts. Now, I don't know about you, but I can't think of anything more trite and less worthwhile than trying to optimize that kind of stuff. So at base my motivation isn't arrogance, it's frustration -- we have all of these amazing technologies, and we can do so much better.




3) I've spent a lot of time developing desktop applications and working with other people who do the same thing. Since we have a good hammer (tools for building "rich" clients, aka applications) I'm certain that it's the right tool to use. But pretty much the rest of the world is convinced that screwdrivers ("thin" clients, aka web pages and services) are the way to go. Everyone is moving to web-based, software-as-a-service applications. OK, they've long since moved there, and are now moving on to the cloud. Really, rich clients weren't even cool in 1998. Granted, some applications still demand the kind of high performance, deep UI experience, and low latency that you can only get from a desktop app. But just about the last thing that it makes sense to use a desktop application for is web analysis. Who'd be dumb enough to do that?





So, to cut to the chase, there are a lot of tools out there that:

1) Analyze massive biga-bytes of data and come up with answers to questions that no one ever thought of asking before. As Mike Olson, the CEO of Cloudera, one of the big players in the field, said "The light goes on.. Ah! I'm going to use a 1000 servers, because I can". Which means that people don't end up asking this critical meta-question: "is this question worth asking?" Still, given the number of questions being asked, it's inevitable that some small subset of questions are worthwhile. Undoubtedly, some of the information that Big Data provides us with is really useful.

2) Take that information and present it to end-users in a succinct, "actionable" way. I've spent a lot of time analyzing the sort of data that everyone else seems to be analyzing, and what I discovered was that, um... it was surprisingly hard to say anything interesting about any of it. That's my fault. As anyone who has bought breakfast cereal or red wine can tell you, packaging beats content every day, and I'm lousy at packaging. For example, I studied the events in the Middle East using the same kind of data that Sysomos reported on in this blog. Cool, huh? Now, this is how I reported the same data to my colleagues and clients: a) We don't really know where any of the Tweets are coming from, b) There were more tweets after something happened than before anything happened, and c) People used terms like "people", "time", and "RT"*, all of which implies that d) we've been wasting money and time on this and we should change our approach. That just doesn't come off as well. I should have made more charts and graphs. I don't mean that in a totally cynical way. To me, interesting means "surprising" and "novel", but I'm happy with a definition that includes "pretty" and "compelling". Visualization allows us to really get our minds around information, but that information needs to be meaningful in the first place.

3) Provide all of that data packaged as SaaS (web-based) analysis tools. These are described with words designed to make CIOs feel savvy when dropping wads of cash on said tools. "Business Intelligence" is a must. (Because oddly enough, no one wants "Business Stupidity" tools.) "Integrated" and "Solutions" are always good filler. "Continuous", "Cloud", etc. are still pretty hip. "Big Data" is totally hip. Hey, what we really need is a Business Intelligence System for Automated Monitoring and Assimilation of Massive Industry Buzzwords. We could integrate that with our CMS and then attach that to our CRM for real-time customer sales feedback, and then... But I'm sure someone is already in this "space". OK, I'm getting off into critique for the sheer joy of it, and that's not very helpful. But let me just point out that the one thing that you don't see on any of these websites is how much the product actually costs. Hey, if you have to ask, right? And look, these aren't products, they're services, ok, so why don't you just let us send you a bill in a couple of months? As I write this, that kind of approach is sounding pretty damn smart...

Snotty criticism aside, you can't argue with success. And people have done some really ground-breaking and impressive work along the way. It's certainly not as if the limitations of these technologies aren't well known to the people actually working on them, and if other people are willing to spend money on technology they don't understand, then that's fine. Full employment for software developers, computer scientists and mathematicians is a good thing, probably. Keeps us off the streets.

But the thing is, I don't want to create tools for CIO budgets, I want to create tools that people will actually use, be delighted with and do interesting, unexpected things with. And (as demonstrated in the screenshots above) easily share their discoveries with others without having to pay a tax on the results. This requires software that 1) doesn't need unlimited infrastructure and data sets, 2) takes an open-ended, exploratory approach, 3) has a responsive, engaging UI, and that you can download, install and use on your own computer for whatever you want and for however long you want. And it probably needs to cost less than a hundred bucks. I think it's worth trying. The attempt is called Butterflyzer.

You can download the free Butterflyzer Alpha and see more screenshots and screencasts at http://butterflyzer.com. (Psst... the first 50 people who follow us will get 50% off the retail price when and if we release a commercial product.) I'd love to hear what you think!

To find out what Butterflyzer actually does, check out Part II.

*Actually, there are some interesting things you can learn from such analysis, and I'm not really doing the Sysomos graph tools justice. Here's Butterflyzer's take on similar but more recent data. Here, we've integrated actual sample Tweets. Things are a little noisy graphically but the tools allow you to easily narrow and expand the information shown.

