As a bit of background, while I haven't blogged much about this, I've done a good deal of work in two different fields outside of my general focus on creating artificial models of real-world complex systems. Along the way, I've invested a lot of time in learning how to leverage various Eclipse-based technologies to do the things that I really want to do. (For those who aren't involved in software development, Eclipse is the leading open-source development environment and a growing platform for general-purpose application development.) All of this has led me to a set of somewhat contrarian and untested views. So I thought... hey, if not many other people share these ideas, they're either a) not very good ideas, or b) an opportunity. As usual, I've saved time by simply guessing b) and then spent an obsessive amount of time and energy creating an application that attempts to support that guess. Here is my experience, for what it's worth.
1) I spent a few years doing fascinating research on fine-grained agents for reasoning. I'd been aware of the theoretical limitations of AI, but that work exposed me to the real-world limitations of knowledge representation and machine reasoning. That experience has left me deeply skeptical about the sound and fury surrounding efforts like IBM's Watson, Wolfram Alpha, and Hadoop. I'm not questioning that what they offer is valuable and interesting, but I do wonder about how they're being presented and understood, especially by the broader public. There are important -- even profound -- issues that are simply being glossed over. But that's a subject for another post. What that work did convince me of was the value of lightweight, "human-in-the-loop" approaches to using machines to support reasoning. I came away from it with the feeling that we should let computers do what computers do well -- sift through lots of data, discovering patterns that fit existing templates -- and let people do what they do best -- invent new templates. My guiding passion is to develop tools that harness the power of the computer to enable people to think more deeply and openly about our world, to share that information, and to make wiser choices. While the evergreen challenge for AI seems to be creating software that comes up with good answers to hard questions, I'm more challenged by the idea of creating software that comes up with good questions that have no answers.
2) More recently, I've somewhat accidentally spent a lot of time on social media research. Again, that work has led me to the conclusion that most of the efforts at Internet research have been too focused on simple answers to frankly pretty lame questions. "How can I increase my click-through rate?" (Gee, I dunno, say something actually useful or at least interesting?) "What's the best day to post on Twitter to get maximum RTs?" (Hmm... I guess "interesting" isn't really an important metric, judging from Twitter trending topics.) That might sound hyper-critical or even arrogant. Don't get me wrong, there are a lot of really cool efforts out there. But most of them are self-funded or academic exercises. I'm talking about where the money is, and as usual, that's in telling the cleanest story you can to investors. For that you need a metric, and -- since that metric usually can't be money, since there often isn't any -- it has to be some other single dimension, like, yep, click-through rates and retweet counts. Now, I don't know about you, but I can't think of anything more trite and less worthwhile than trying to optimize that kind of stuff. So at base my motivation isn't arrogance, it's frustration -- we have all of these amazing technologies, and we can do so much better.
3) I've spent a lot of time developing desktop applications and working with other people who do the same thing. Since we have a good hammer (tools for building "rich" clients, aka applications), I'm certain that it's the right tool to use. But pretty much the rest of the world is convinced that screwdrivers ("thin" clients, aka web pages and services) are the way to go. Everyone is moving to web-based, software-as-a-service applications. OK, they've long since moved there, and are now moving on to the cloud. Really, rich clients weren't even cool in 1998. Granted, some applications still demand the kind of high performance, deep UI experience, and low latency that you can only get from a desktop app. But just about the last thing that it makes sense to use a desktop application for is web analysis. Who'd be dumb enough to do that?
So, to cut to the chase, there are a lot of tools out there that:
1) Analyze massive biga-bytes of data and come up with answers to questions that no one ever thought of asking before. As Mike Olson, the CEO of Cloudera, one of the big players in the field, said: "The light goes on... Ah! I'm going to use 1,000 servers, because I can." Which means that people don't end up asking this critical meta-question: "Is this question worth asking?" Still, given the number of questions being asked, it's inevitable that some small subset of them is worthwhile. Undoubtedly, some of the information that Big Data provides us with is really useful.
2) Take that information and present it to end-users in a succinct, "actionable" way. I've spent a lot of time analyzing the sort of data that everyone else seems to be analyzing, and what I discovered was that, um... it was surprisingly hard to say anything interesting about any of it. That's my fault. As anyone who has bought breakfast cereal or red wine can tell you, packaging beats content every day, and I'm lousy at packaging. For example, I studied the events in the Middle East using the same kind of data that Sysomos reported on in this blog. Cool, huh? Now, this is how I reported the same data to my colleagues and clients: a) We don't really know where any of the tweets are coming from, b) there were more tweets after something happened than before anything happened, and c) people used terms like "people", "time", and "RT"*, all of which implies that d) we've been wasting money and time on this and we should change our approach. That just doesn't come off as well. I should have made more charts and graphs. I don't mean that in a totally cynical way. To me, interesting means "surprising" and "novel", but I'm happy with a definition that includes "pretty" and "compelling". Visualization allows us to really get our minds around information, but that information needs to be meaningful in the first place.
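For what it's worth, the kind of surface-level term counting I'm poking fun at is trivial to reproduce. Here's a minimal sketch in Python (the function name, stopword list, and sample tweets are all mine, made up purely for illustration):

```python
import re
from collections import Counter

def top_terms(tweets, n=5, stopwords=frozenset({"the", "a", "to", "of", "and", "in"})):
    """Naive term-frequency count over a batch of tweets.

    Lowercases each tweet, splits on word boundaries (keeping #hashtags
    and @mentions), drops a few stopwords, and returns the n most
    common remaining terms.
    """
    counts = Counter()
    for tweet in tweets:
        for term in re.findall(r"[#@]?\w+", tweet.lower()):
            if term not in stopwords:
                counts[term] += 1
    return counts.most_common(n)

# Hypothetical sample data, standing in for a real tweet stream.
sample = [
    "RT @someone: people are gathering in the square",
    "people say it's time for change",
    "RT: time and again, people show up",
]
print(top_terms(sample, n=3))
# e.g. [('people', 3), ('rt', 2), ('time', 2)]
```

And that's about the depth of insight you get: "people", "time", and "RT" bubble to the top, which tells you almost nothing without a human looking at the actual tweets behind the counts.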
Snotty criticism aside, you can't argue with success. And people have done some really ground-breaking and impressive work along the way. It's certainly not as if the limitations of these technologies aren't well known to the people actually working on them, and if other people are willing to spend money on technology they don't understand, then that's fine. Full employment for software developers, computer scientists and mathematicians is a good thing, probably. Keeps us off the streets.
You can download the free Butterflyzer Alpha and see more screenshots and screencasts at http://butterflyzer.com. (Psst... the first 50 people who follow us will get 50% off the retail price when and if we release a commercial product.) I'd love to hear what you think!
To find out what Butterflyzer actually does, check out Part II.
*Actually, there are some interesting things you can learn from such analysis, and I'm not really doing the Sysomos graph tools justice. Here's Butterflyzer's take on similar but more recent data, with actual sample tweets integrated. Things are a little noisy graphically, but the tools allow you to easily narrow and expand the information shown.