Ngrams

Thanks to Erica at Reading 1900-1950 for spreading the word about Ngrams.

The Google Ngram Viewer is ‘a phrase-usage graphing tool.’ Based on scans of over 5.2 million books, it charts the yearly appearance of any n-grams (sequences of one or more words) that you care to enter.
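For anyone curious about what is being counted, here is a rough sketch in Python of the general idea: split each text into runs of n words, then work out, year by year, what share of those runs match the phrase. This is only my illustration on a made-up toy corpus, not Google's actual method; the function names and the `toy_corpus` data are invented for the example, and it ignores punctuation entirely.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def yearly_frequency(corpus, phrase):
    """Map each year to the share of its n-grams that equal the phrase.

    corpus is a dict of {year: [text, text, ...]}; n is taken from the
    number of words in the phrase.
    """
    n = len(phrase.split())
    target = phrase.lower()
    result = {}
    for year, texts in sorted(corpus.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text, n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result


# Hypothetical toy corpus, invented purely for illustration.
toy_corpus = {
    1920: ["the great war changed everything", "after the great war"],
    1950: ["the first world war and the second"],
}

frequencies = yearly_frequency(toy_corpus, "Great War")
for year, share in frequencies.items():
    print(year, round(share, 3))
```

Dividing by the total number of n-grams in each year is what makes years comparable: far more books were printed in some years than in others, so raw counts alone would mislead.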

I’ve plotted incidences of the phrase ‘Great War’ against appearances of the phrase ‘First World War’ over the twentieth century. The result is much what I expected – though it’s interesting to see the dramatic rise of ‘First World War’ after 1935, as fears of a second increased.

[Graph: ‘Great War’ against ‘First World War’ over the twentieth century]
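If you would like to reproduce a comparison like this one, a short Python sketch can build the address of the graph. It assumes the Viewer still accepts the `content`, `year_start` and `year_end` query parameters that appear in its address bar; the function name `ngram_viewer_url` is my own invention, and nothing here fetches any data.

```python
from urllib.parse import urlencode


def ngram_viewer_url(phrases, year_start=1900, year_end=2000):
    """Build a Google Ngram Viewer address comparing the given phrases.

    Assumes the query parameter names seen in the Viewer's address bar
    (content, year_start, year_end); these are not guaranteed to stay stable.
    """
    query = urlencode({
        "content": ",".join(phrases),  # phrases are separated by commas
        "year_start": year_start,
        "year_end": year_end,
    })
    return "https://books.google.com/ngrams/graph?" + query


url = ngram_viewer_url(["Great War", "First World War"])
print(url)
```

Paste the printed address into a browser to see the graph; swapping in other phrases or years is just a matter of changing the arguments.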

Since this website possibly has a family audience, I shall not post the graphs I made to track the usage of very rude words over the century. But try it for yourself. It’s fun.

Update, half an hour later:

It’s just occurred to me that one can possibly use this tool to track the reputation of writers. This graph suggests quite a bit about how the popularity of ‘Edith Sitwell’ and ‘Siegfried Sassoon’ fluctuated over the twentieth century:

[Graph: ‘Edith Sitwell’ against ‘Siegfried Sassoon’ over the twentieth century]

This one tracks the relative fortunes of Siegfried Sassoon and Wilfred Owen:

[Graph: Siegfried Sassoon against Wilfred Owen]

And this shows the inexorable triumph of Virginia Woolf over Arnold Bennett:

[Graph: Virginia Woolf against Arnold Bennett]

But now I really ought to stop doing this, and get on with something more useful.

4 Comments

  1. Tom Deveson
    Posted May 28, 2013 at 6:20 pm

    It *is* useful – thank you!

  2. Posted May 29, 2013 at 2:47 pm

    Interesting. A search for ‘atomic bomb’ from 1900 to 1944 reveals a few actual uses of the term before Hiroshima, but also a lot of problems with the dataset – documents written after 1945 but dated earlier by Google.

    • Posted May 29, 2013 at 7:37 pm

      That’s the problem with it – Google Books looks at a lot of texts, but without any discrimination. The results can be suggestive, but are not really evidence.

  3. Posted December 7, 2013 at 3:05 pm

    There’s a good article at:
    http://searchengineland.com/when-ocr-goes-bad-googles-ngram-viewer-the-f-word-59181
    which points out some problems with Ngrams, especially with early texts. The long ‘s’ confuses most OCR software, it seems, and a ‘suck’ is likely to be read as a ‘fuck’, which is actually rather different.

