I thought that effort was pretty cool, but the limitations were really obvious. Not coincidentally -- clouds are in the air after all -- Jacob Harris, the guy responsible for many of the New York Times' brilliant infographics (hate the word, but it is descriptive) posted a scathing but quite legitimate critique of word clouds, ending with the line "Every time I see a word cloud presented as insight, I die a little inside." I like this guy's style! Though I think Jacob doth over-harsh a bit. Word clouds have their place. My take is that they are great for depicting the madness of crowds but lousy at deciphering it. As I said in my original post:
"One of the downsides of this approach is that unlike in our graph visualizations, you lose concept proximity. So for example, looking at the cloud you can instantly see that Tunisia and Lebanon are frequently mentioned, but what you can't see is how frequently they are used together. That's what our graph visualization tools do now, and I think there are some neat possibilities for marrying the two approaches."
After I posted this, a couple of my Eclipse compatriots -- Ed MacKerrow and Fabian Steeg -- shared that they'd been thinking along similar lines. I've been playing with this over the last few weeks (probably too much, given that no one has given any indication that it's a feature they're willing to pay for) and I've come up with an initial implementation in Butterflyzer. As always, there are trade-offs and no visualization is perfect, but I'm pretty happy with the results.
A SOPA Opera
Let's use the latest attempt (editorial alert) by the greedy and small-minded to destroy the Internet as our subject. A search on Tweets containing "SOPA" returns a sample of 10,000 or so tweets from the last week. This is what a Butterflyzer-produced "Spring Cloud" looks like:


Right away we can see how things are associated with each other. Sometimes the associations are obvious; for example when we see "vote" and "house" (bottom middle, in white) together. That's useful, and sometimes you can even pick the content up just by looking at the tags, as in "unconstitutional", "stanford" and "review". But many of the relationships are much more interesting. For example, notice that NDAA comes up quite a bit. When it does come up, it seems to be associated with "@barrackobama", "obama" and also, interestingly, "#OWS". That gives real insight into the meaning and relevance of the tweet terms, not simply their existence.
We can infer more subtle information than that from the diagram; for example, that the relationship between Obama, NDAA and SOPA in the Twitter discourse is organic -- that is, probably not reflecting an orchestrated Tweet campaign, a few viral Tweets or, say, Fox News. (If you're looking for those guys, you'll like the video below.) Instead, the use of multiple terms for Obama might indicate that people are associating the president and these two legislative (editorial alert) attacks on the Bill of Rights without any help from, say, the mainstream media. And in fact, if we look at the individual tweets, that's exactly what we do see. In the following list we've eliminated the RTs, but the ratio of RTs to original content is also quite low. If we Focus on those Terms, this is what we see:
The Details
So, how are we creating these Spring Clouds? No secrets or bogus patent claims here. (Perhaps you or someone you know has discovered this solution already; if so let me know and I'd love to add a reference to your work here.) I'll outline the basic approach below; it's really pretty simple. If you're a visualization geek, try to figure out how this was done before reading further. (The name is already a pretty big hint.) If you're not a visualization geek, then good news -- you don't have to figure out any of this. Butterflyzer handles all of the details, and you can set it all up with a few mouse clicks to customize the graph view. See the section below for how.
Preparing the data:
- All of the Tweets are collected using Butterflyzer's automated search tools.
- Butterflyzer identifies all Terms (words) that appear in any Tweet and indexes the Tweets against those.
- We then drop: some common English terms; terms with occurrences below some threshold (say 250); and the original search words and any other really common connections.
- Create nodes representing a) Terms and b) Tweet Groups. Tweets are grouped together whenever they have exactly the same set of edges. Edges connect the terms to any Tweets containing those terms. Butterflyzer does all of this automatically based on your specifications.
- Lay out the Graph using a Spring Layout Algorithm. We use the Eclipse Zest project's Spring Layout Algorithm written by Ian Bull and Casey Best, heavily hacked to provide things like continuous updating.
- The Trick: We then hide the Tweet Groups and their connecting edges.
Each circle represents a group of tweets, and the area of the circle is proportional to the number of Tweets in the group, just as you'd expect. As you can see, we're using a lot of data and a lot of computation for what looks like a pretty simple word cloud.
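For the geeks, here is a minimal Python sketch of the preparation steps above. It is not Butterflyzer's code (which runs on Eclipse); the function names, the tiny stop list and the whitespace tokenizer are all assumptions made for illustration. It indexes terms, drops the noise, groups tweets that share exactly the same set of surviving terms, and sizes a circle so its area tracks the tweet count.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop list; the post only says "some common English terms".
STOP_WORDS = {"the", "a", "an", "to", "of", "and", "is", "in", "on", "for"}


def build_spring_cloud_graph(tweets, search_terms, min_count=250):
    """Return (term_counts, groups, edges) for a term / tweet-group graph."""
    # Index the terms in each tweet (a term counts once per tweet).
    # Naive whitespace tokenizing is an assumption, not Butterflyzer's method.
    tweet_terms = [set(text.lower().split()) for text in tweets]
    counts = Counter(term for terms in tweet_terms for term in terms)

    # Drop common words, the original search words, and rare terms.
    excluded = STOP_WORDS | {s.lower() for s in search_terms}
    kept = {t for t, n in counts.items() if n >= min_count and t not in excluded}

    # Group tweets that connect to exactly the same set of kept terms.
    groups = defaultdict(int)
    for terms in tweet_terms:
        key = frozenset(terms & kept)
        if key:  # a tweet with no surviving terms has no edges
            groups[key] += 1

    # One edge per (tweet group, term) pair.
    edges = [(key, term) for key in groups for term in sorted(key)]
    term_counts = {t: counts[t] for t in kept}
    return term_counts, dict(groups), edges


def circle_radius(tweet_count, scale=1.0):
    """Radius of a circle whose area is proportional to tweet_count."""
    return math.sqrt(scale * tweet_count / math.pi)
```

The threshold defaults to the "say 250" from the list above; for a handful of test tweets you would pass something much smaller, such as min_count=2.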
So again, this thing might actually be useful, right? Suppose you're a Legislative Assistant for one of the senators who will be deciding whether to vote this (editorial alert) heinous bill out of committee. You might want to dig into what folks who are interested in a particular aspect of the legislation -- such as copyright -- are actually talking about (and perhaps whether your boss or any of his or her campaign donors come up in any of those tweets). You can quickly scan through the cloud and associated terms, and even identify interest clusters.
Limitations
As I said above, all visualizations have limitations. They're on a tradeoff landscape like everything else. Here, the tradeoff vs. Clouds is information density. Because a Cloud diagram doesn't care where words go it can pack things wherever they fit best. We can improve the Spring Cloud layout from where it is now, but there is a natural limit. Also, the Spring Layout algorithm we're using isn't very good at handling cases of overlapping nodes. Sometimes that's cool, but sometimes it's just messy. And since SWT doesn't support alpha for text (but does a great job for shapes as you can see above), the text can be difficult to read and look crowded.
Another issue is that the underlying spring algorithm is O(n^2). That's geek speak for scary and ugly. As the number of nodes increases -- and as you can see from the examples above, it definitely will increase -- the cost of each layout pass grows with the square of the number of nodes, so doubling the nodes roughly quadruples the work. There are techniques to mitigate that, but computational costs are a limiting factor, especially for a tool like Butterflyzer that needs to do things like respond to user input in a timely way. We've spent a lot of time trying to tune the UI to be responsive while still putting enough cycles toward the layout, but there is definite room for improvement there.
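To see where the square comes from, here is a rough Python sketch of one iteration of a generic force-directed (spring) layout. It is not the Zest algorithm and not Butterflyzer's hacked version; the function name and all of the constants are made up for illustration. The point is the nested loop: every node repels every other node, so n nodes mean n*(n-1)/2 pairs per pass.

```python
import math


def spring_step(pos, edges, repulsion=1.0, stiffness=0.1,
                rest_length=1.0, step=0.05):
    """Advance a force-directed layout by one iteration.

    pos: dict of node -> (x, y); edges: list of (node_a, node_b).
    Returns (new_pos, pair_count), where pair_count is the number of
    repulsion pairs evaluated in this pass.
    """
    nodes = list(pos)
    force = {n: [0.0, 0.0] for n in nodes}
    pair_count = 0

    # Repulsion: every node pushes on every other node (the O(n^2) part).
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = math.hypot(dx, dy) or 1e-9  # avoid dividing by zero
            push = repulsion / (dist * dist)
            fx, fy = push * dx / dist, push * dy / dist
            force[a][0] += fx
            force[a][1] += fy
            force[b][0] -= fx
            force[b][1] -= fy
            pair_count += 1

    # Attraction: each edge is a spring pulling toward rest_length.
    for a, b in edges:
        dx = pos[b][0] - pos[a][0]
        dy = pos[b][1] - pos[a][1]
        dist = math.hypot(dx, dy) or 1e-9
        pull = stiffness * (dist - rest_length)
        fx, fy = pull * dx / dist, pull * dy / dist
        force[a][0] += fx
        force[a][1] += fy
        force[b][0] -= fx
        force[b][1] -= fy

    new_pos = {n: (pos[n][0] + step * force[n][0],
                   pos[n][1] + step * force[n][1]) for n in nodes}
    return new_pos, pair_count
```

A real layout repeats this step many times until things settle, which is why the pair count matters so much for responsiveness.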
Finally, compared to Word Clouds, Spring Clouds just aren't that sexy looking. Because we can't just put words wherever we want, things are unbalanced and lack symmetry. We can improve that, but again, only within limits. If truth and beauty have a love-hate relationship, sometimes you're just going to have to pick sides.
Try It!
- Download and install the Butterflyzer Beta.
- To do it yourself:
  - Do a search on SOPA (or whatever you like) by typing it into the search field.
  - Click the "Tweet Terms" node under "Collections".
  - You might want to do some clean up as described above and set up your views to show only the graphs and outline. See the extensive Butterflyzer docs for more on that.
  - From the Filter Menu, select "Cliques".
  - From the "Draw Options" menu, Unselect (by clicking) "Show Items", "Show Items Text", "Show Icon", "Scale Count" and "Scale Topsy". Select "Scale Related".
- Or just download SOPA Example and open it from within Butterflyzer.
Sex, Lies and Spring Cloud Video
There aren't docs on how to do this part yet, but if you want to experiment with this new feature, just select "Select Time Span" from the "Filter" Toolbar Menu or the "View" Menu.






