Word Clouds; Tag Clouds. Which is the best software?

Terry Freedman produces ‘Computers in Classrooms’, an invaluable newsletter for all concerned with ICT. In the April 2010 issue he evaluates a number of programs that generate word clouds.

You can sign up for a free subscription to ‘Computers in Classrooms’ at http://www.ictineducation.org.

Word Cloud Shoot-Out

Terry Freedman

What’s a word cloud? It’s a visual representation of a piece of text, based on word frequency: the more often a particular word occurs, the larger it appears. A word cloud generator is the application that creates it.
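
For readers curious about the mechanics, the basic idea can be sketched in a few lines of Python: count how often each word occurs, then scale its font size with that count. This is a minimal sketch, not how Wordle or any of the other tools below actually work; the 10 to 48 point size range, the linear scaling and the function name word_sizes are all assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed font-size range in points (illustrative only; real generators choose their own).
MIN_SIZE, MAX_SIZE = 10, 48

def word_sizes(text, top_n=10):
    """Map each of the top_n most frequent words to a font size that grows with its count."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top_n)  # most frequent first
    if not counts:
        return {}
    hi, lo = counts[0][1], counts[-1][1]
    span = hi - lo or 1  # avoid dividing by zero when all counts are equal
    # Linear scaling: the least frequent word gets MIN_SIZE, the most frequent MAX_SIZE.
    return {
        word: round(MIN_SIZE + (count - lo) / span * (MAX_SIZE - MIN_SIZE))
        for word, count in counts
    }

sample = "money money money saving saving budget"
print(word_sizes(sample))  # {'money': 48, 'saving': 29, 'budget': 10}
```

Real generators refine this in various ways, for example by leaving out very common words such as ‘the’ and ‘and’, which would otherwise dominate most clouds.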


Word cloud generators have two immediately obvious uses. Firstly, they can be used to get across the gist of a piece of text very quickly. I used one in my summary of the ICT Level Descriptors to convey at a glance what was expected from students at each Level. You can see from the illustration below that this is a very appealing approach, especially if you’re a visual learner. It gives you an immediate way into the data. It’s not a substitute for a more in-depth study, but it’s a good way of helping you to organise your thoughts before you undertake one.



The second use for word clouds is to check whether what you’ve written reflects what you intended to write about. It’s all too easy to get sidetracked into a different train of thought entirely. Students are good at doing this when they’re writing essays. Throwing the text into a word cloud generator is a quick and easy way to see whether that has happened and, if it has, to demonstrate it to the student.

There is a third use for word clouds, which is to help you tag an article before publishing it on the internet. Presumably you already know what you’ve written about, but the word cloud may yield a couple of sub-themes which you hadn’t consciously been aware of.

For example, in my article on geotagging, I included a few references to photos and video. I had meant to, of course, but the interesting thing is that until I generated the word cloud it didn’t occur to me that tagging the article with the terms ‘photography’, ‘digital photography’ and ‘video’ would be quite appropriate.

Yet another use is to illustrate the article from which the word cloud was generated. This is a little self-referential perhaps, but sometimes it is difficult to find a suitable photo or drawing with which to embellish a piece of text. In such circumstances, a word cloud does the job very nicely.

Finally, another use, which I will be expanding on below, is to enable you to explore the data in more detail. For example, you will be able to find out how many times a particular word occurs, which may prove useful to anyone hoping to improve their use of English – or to anyone who helps others to do so.

Until recently, my word cloud generator of choice was Wordle, which is what I used for the screenshot above. However, a few weeks ago I received an email from Hardy K.S. Leung from Tagxedo, telling me about the new service and inviting me to try it out. Here’s how he described the features:
  • Highly interactive (no server round-trip)
  • Fast cloud generation time
  • No registration required
  • Custom shapes
  • Use images as shapes
  • Words as shapes
  • Powerful layout engine (very nice shape hugging)
  • Lots of fonts
  • Accepting user-uploaded fonts
  • Save to image files (PNG or JPEG)
  • History view (see all “versions” and pick the one you like)
Now, there is obviously a lot there to recommend it. Wordle, the one I normally use, is quite basic by comparison. So, I thought I’d do a comparative review, bringing in two other such services too: TagCrowd and Many Eyes. Here are the results, based on my article about digital financial literacy.

Wordle

http://wordle.net
This is Wordle’s interpretation of the article on financial literacy. I generated it by copying and pasting the text. I mention this because when I submitted the article’s URL instead, Wordle processed my most recent blog entry rather than the article at that URL. Tagxedo, on the other hand, interpreted the URL correctly.


Tagxedo

http://www.tagxedo.com/
Here is Tagxedo’s treatment of the same article.



I would say that although Tagxedo may have the edge over Wordle in terms of how far you can customise the results, it has not done a good job of reflecting the gist of the article. When you look at the Wordle word cloud, you can see straight away that the article was about financial literacy. That cannot be said for the Tagxedo word cloud.

Therefore, I’d suggest that Tagxedo would be a good tool to use if you’re looking for a way to jazz up a piece of writing, or the cover of a coursework assignment. I would not recommend it as a means of summarising what a piece of writing is about: not yet, anyway. Tagxedo is currently in beta, so I think it would be worth revisiting the site every couple of weeks to see whether the algorithm it uses has improved.

TagCrowd

http://tagcrowd.com/
Here is the word cloud generated by TagCrowd.



I think you’ll agree that it’s rather minimalist. However, it does have some attractive features behind the scenes. For example, you can set the maximum number of words to show, and the minimum number of times a word must occur to be included. You can also tell it to ignore the words in a customised list. All this makes it, I think, a more serious tool than Wordle.
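
To make those three settings concrete, here is a hypothetical sketch of how an ignore list, a minimum frequency and a maximum number of words might combine to decide which words reach the cloud. It is not TagCrowd’s code; the function filtered_counts and its parameter names are invented for illustration.

```python
import re
from collections import Counter

def filtered_counts(text, max_words=50, min_frequency=1, ignore=()):
    """Return (word, count) pairs, most frequent first, after applying the three filters."""
    ignored = {w.lower() for w in ignore}
    # 1. Drop every word on the ignore list.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in ignored]
    # 2. Keep only words that occur at least min_frequency times.
    counts = [(w, c) for w, c in Counter(words).most_common() if c >= min_frequency]
    # 3. Show at most max_words of them.
    return counts[:max_words]

text = "The bank and the budget: the bank pays interest, the budget sets limits."
print(filtered_counts(text, max_words=2, min_frequency=2, ignore=["the", "and"]))
```

Without the ignore list, ‘the’ would top this particular list, which is why a customisable list of words to leave out matters so much.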

Many Eyes

http://manyeyes.alphaworks.ibm.com/manyeyes/
Here’s the word cloud that Many Eyes generated:



This is not as good, in my opinion, as the Wordle word cloud – but bear with me. Many Eyes has a number of things up its sleeve, including what it calls an enhanced tag cloud generator. Here is the result I obtained when I used that instead, and then held the mouse pointer over the word ‘financial’:



As you can see, rather than merely returning the number of times the word occurs, it shows the context in which each instance occurs. This makes the tag cloud a much more revealing, and therefore useful, tool.
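
What the enhanced tag cloud is doing here resembles what linguists call a concordance, or keyword-in-context view. As a rough sketch (not Many Eyes’ own implementation), the hypothetical function contexts below returns every sentence in which a word occurs, assuming that breaking the text after full stops, question marks and exclamation marks is good enough to find sentence boundaries.

```python
import re

def contexts(text, keyword):
    """Return every sentence containing keyword as a whole word, ignoring case."""
    # Naive sentence splitting: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    return [s for s in sentences if pattern.search(s)]

article = ("Financial literacy matters. Schools teach it unevenly. "
           "Good financial habits start early.")
for sentence in contexts(article, "financial"):
    print(sentence)
```

The number of sentences returned gives the simple count, and the sentences themselves supply the context.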

I mentioned that Many Eyes has other features. These may be summarised as follows:

Be sociable

First of all, Many Eyes facilitates collaboration or, at the very least, the use of other people’s data. This is because you have to register in order to use any of the data-handling features, and if you want to use your own data you have to upload it, which makes it available for everyone else to use. The result is that there are lots of data sets to choose from for your data-handling work: no more looking for ingenious ways of generating large new data sets.

There is a 90-day restriction on using the data sets, so I’d suggest contacting the owner of any data you want to use to obtain permission to keep it for longer if necessary.

Not just word clouds…

I’ve already mentioned that there is an ‘enhanced’ version of the word cloud. There are several other ways to explore blocks of text, and these are potentially very powerful indeed.

Word tree

In this, you enter a word or phrase, and then the tool creates a sort of mind map in which you can see which phrases lead on from, or lead to, the word or phrase you’ve entered. For example, look at how the text is treated using the phrase ‘financial literacy’:



Now, it’s a bit hard to see the detail in this screenshot, but I think you can get the general idea. We’ve been given a list of phrases and the phrases that branch out from them. Moreover, the display is interactive: for example, you can make any of the words in a branch the source of a new branch. Also, deciding on the word or phrase to use is quite a skill in itself, and lends itself to discussion work, followed by trial and error and further discussion.
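
Assuming a word tree simply gathers the words that follow each occurrence of the chosen word or phrase, the idea can be sketched as a nested dictionary in which each word points to the words that come after it. The function word_tree and its depth parameter are hypothetical, and unlike a finished tool this sketch ignores punctuation, so a branch can run on into the next sentence.

```python
import re

def word_tree(text, root, depth=3):
    """Build a nested dict of the words that follow `root`, up to `depth` words per branch."""
    words = re.findall(r"[a-z']+", text.lower())  # punctuation is discarded
    root_words = root.lower().split()
    n = len(root_words)
    tree = {}
    for i in range(len(words) - n + 1):
        if words[i:i + n] == root_words:
            # Walk down the tree, adding each following word as a child of the previous one.
            node = tree
            for w in words[i + n:i + n + depth]:
                node = node.setdefault(w, {})
    return tree

text = ("Financial literacy is vital. Financial literacy is taught in schools. "
        "Financial literacy helps families.")
print(word_tree(text, "financial literacy", depth=2))
# {'is': {'vital': {}, 'taught': {}}, 'helps': {'families': {}}}
```

Shared continuations such as ‘is’ merge into a single branch, which is what gives a word tree its branching shape.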

Phrase net

This analyses the block of text in terms of the relationships between words, using connectors such as ‘and’, or wildcards. For example, in this phrase net generated from the England and Wales ICT Programme of Study, the word ‘organise’ is connected to several other words.



This immediately suggests, I think, what the architects of this curriculum had in mind, and what kinds of things should be considered under the heading ‘organisation’. Furthermore, holding the mouse pointer over a connecting line will, again, reveal the number of times the words are connected and the sentences in which those connections occur.
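
The ‘and’ connector can be approximated by finding every ‘X and Y’ pair in the text and counting how often each occurs; the words linked to ‘organise’ are then simply those that appear on either side of it. This is an illustrative sketch only: phrase_net and neighbours are hypothetical helpers, just one literal connector is supported (no wildcards), and the sample sentences are invented rather than taken from the Programme of Study.

```python
import re
from collections import Counter

def phrase_net(text, connector="and"):
    """Count (left, right) word pairs joined by connector, as in 'X and Y'."""
    # The lookahead captures the right-hand word without consuming it, so chains
    # such as 'find and organise and present' yield both pairs.
    pattern = re.compile(
        r"\b([a-z']+)\s+" + re.escape(connector) + r"\s+(?=([a-z']+)\b)"
    )
    return Counter(pattern.findall(text.lower()))

def neighbours(pairs, word):
    """List the words linked to `word` on either side of the connector."""
    linked = {right for left, right in pairs if left == word}
    linked |= {left for left, right in pairs if right == word}
    return sorted(linked)

sample = ("Pupils organise and present information. They find and organise data, "
          "and present and organise ideas.")
net = phrase_net(sample)
print(net)
print(neighbours(net, "organise"))  # ['find', 'present']
```

The count stored for each pair corresponds to the ‘number of times the words are connected’ that appears when you hover over a line.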

… And not just words

Wordle and the others cannot handle tabulated data especially well, but Many Eyes can. You simply copy and paste the data from a spreadsheet, and then you can start working with it.

Again, you have several options available to you, including bar charts, tree maps, bubble charts and even a world map. I would certainly recommend that you introduce Many Eyes to your data-handling colleagues – history, mathematics, geography and science immediately spring to mind – and invite them to play around with it and perhaps come up with ways of using its charting facilities with their students.
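
When you copy cells from a spreadsheet, they usually arrive on the clipboard as tab-separated text, which is what makes copy and paste such a convenient way in. As a rough sketch of the next step, the hypothetical function bar_chart below reads two columns of tab-separated data with a header row and draws a simple text bar chart, scaling the longest bar to a fixed width. The figures are invented purely for illustration.

```python
import csv
import io

def bar_chart(pasted, width=20):
    """Draw a text bar chart from two-column tab-separated data with a header row."""
    rows = list(csv.reader(io.StringIO(pasted.strip()), delimiter="\t"))
    header = rows[0]
    data = [(label, float(value)) for label, value in rows[1:]]
    biggest = max(value for _, value in data)
    lines = [f"{header[0]} / {header[1]}"]
    for label, value in data:
        # Scale each bar so that the largest value fills `width` characters.
        bar = "#" * round(value / biggest * width)
        lines.append(f"{label:<10} {bar} {value:g}")
    return "\n".join(lines)

pasted = "Subject\tStudents\nHistory\t40\nMaths\t80\nScience\t60"
print(bar_chart(pasted))
```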

So what?

All this could, of course, be a sledgehammer to crack a peanut if all you want to do is generate a pretty picture or get a quick idea of what a piece of text is all about. However, I think these tools are potentially much more useful than that.

I mentioned in passing that I had used the text of the ICT Programme of Study in the Phrase Net tool. I also used it in the Word Tree tool, using the word ‘information’ as a filter. Here’s what I got:



As before, it’s not easy to see the detail, but what this has done is create a complete list of how the word ‘information’ is used in the entire Programme of Study. Moreover, because I used the default setting of arranging the phrases in the order in which they occur, I can see at a glance how the concept of information is used in each Level Descriptor, and how it is refined as we move up the Levels.

Thus at Level 1, children are expected to know that information can come in various forms. At the higher levels, however, they must use this knowledge to tailor their work for specific audiences.

Conclusions

Clearly there is a lot more to word clouds than the usual Wordle-type representation that many of us are familiar with. It’s difficult to recommend one particular application because each has its strengths and weaknesses. So here are my recommendations:

Use Wordle if you want an accurate, attractive representation of a block of text without too much hassle.

Use Tagxedo if the ability to customise, and the option of using different shapes, are more important than accuracy – but remember that it is currently in beta and so may be worth checking out again every so often.

Use TagCrowd if you want more control over which words are included, or the conditions under which they’re included.

Use Many Eyes if you want to explore both verbal and statistical data in myriad ways.

And a price comparison? All of these are free to use.

If you want to try Many Eyes out for yourself, go to http://manyeyes.alphaworks.ibm.com/manyeyes/datasets/financial-literacy/versions/1, where you will find the text of my article on financial literacy. Click on the Visualisations button near the bottom of the screen to see what you can do. You will have to register (free) if you wish to keep the results of your efforts.