Prior to this DH Lab, we had finished the mid-term class assessment and had made our way up through The Scarlet Letter. The lab class is just 50 minutes, so to introduce the concept of ngrams, I used the now-classic TED talk “What We Learned from 5 Million Books”; it provided a quick introduction to the Google Ngram viewer and what it might afford students for text mining.
On our Moodle LMS, I provide links to posts on etymology and poetic diction from Ted Underwood’s blog, The Stone and the Shell, so that really engaged students can read Professor Underwood’s lucid and sophisticated ideas about what text mining is beyond Google. I also link to the Atlantic Monthly’s article on the new Google Ngram viewer. I try to adhere to my own “DH at the CC” principle of “do it during class or they won’t get to do it at all.”
Here are the instructions I gave to students in a smart classroom, after some demoing on my part:
For this week’s DH Lab, use your readings so far and the examples from The Atlantic Monthly and the TEDx talk to dig around the 19th century for trends in the way people were writing about themes from the decades that we’ve been reading in.
(Start with 1800-1900 in American English but you can play around with those years if it serves your research purpose….)
You will probably have to try a few of them before you find any that make some real sense. Once you find an ngram that seems interesting or telling to you, post the link to it and offer a comment on what you think it might mean. In other words, try your hand at “culturomics” and text mining.
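For readers curious about what sits behind the graphs, here is a minimal sketch in Python of the basic idea: for each year, count how often a phrase appears and divide by the total number of phrases of the same length in that year. This is my own simplification, not Google's code. The function names and the tiny corpus are invented for illustration, and the sketch lowercases everything and splits on whitespace, which the real viewer does not necessarily do.

```python
from collections import Counter


def ngrams(text, n):
    """Return the n-word sequences in a text, lowercased and split on whitespace."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def relative_frequency(corpus_by_year, phrase):
    """Map each year to the share of that year's n-grams that match the phrase."""
    n = len(phrase.split())
    target = phrase.lower()
    result = {}
    for year, texts in sorted(corpus_by_year.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text, n))
        total = sum(counts.values())
        # Guard against years with no n-grams of this length.
        result[year] = counts[target] / total if total else 0.0
    return result


# Invented toy corpus, for illustration only (not real data).
toy_corpus = {
    1850: ["the scarlet letter was red", "a letter came today"],
    1851: ["the white whale swam on"],
}

freq = relative_frequency(toy_corpus, "scarlet letter")
for year, share in freq.items():
    print(year, round(share, 3))
```

A line on the graph is just these yearly shares plotted in order, which is why a phrase that is equally common in every year comes out flat.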
And here are some interesting ngrams that students found. What I like about these is how their tacit and emerging ideas about American literature and history are present in their choice of word pairs and clusters.
My strongest impression of using this assignment in class is that students were thoroughly engaged in it, looking for words that would bring an interesting data pattern. One or two were satisfied with utterly flat lines, such as what emerged for “who, what, where, when, how, why,” but overall students posted intriguing ngrams that led them to more questions than answers–which is just what I had wanted them to see about this kind of text mining.
What continues to be a challenge in doing DH at this level is fully integrating these new tools into class in a way that supports students’ ongoing 10-week investigation of the subject matter–here, “American Literature.” Last term, in Women Writers, I focused too much on the tools and left the integration behind. This term, I’m asking them to use fewer tools and leading them through a more archival approach, through primary sources, material culture, etc. Since we only have one day in the lab, and students thoroughly enjoy and engage in face-to-face discussion the other two days, it will be a matter of designing each of these assignments so that each lab gives a double aha moment: about how new tools can offer new ways to think about old ideas, and about how they can even lead to new questions altogether….