Data-mining the novel
Interesting report at the Chronicle about how Google Books is beginning to allow literature scholars to use data-mining techniques on novels:
New insights can be gleaned by shining a spotlight into the “cellars of culture” beneath the small portion of works that are typically studied, [Franco Moretti, a Stanford professor of English and comparative literature] believes. He has pointed out that the 19th-century British heyday of Dickens and Austen, for example, saw the publication of perhaps 20,000 or 30,000 novels—the huge majority of which are never studied. The problem with this “great unread” is that no human can sift through it all. “It just puts out of work most of the tools that we have developed in, what, 150 years of literary theory and criticism,” Mr. Moretti says. “We have to replace them with something else.” Something else, to him, means methods from linguistics and statistical analysis. His Stanford team takes the Hardys and the Austens, the Thackerays and the Trollopes, and tosses their masterpieces into a database that contains hundreds of lesser novels. Then they cast giant digital nets into that megapot of words, trawling around like intelligence agents hunting for patterns in the chatter of terrorists.
Unsurprisingly, this has sparked a methodological debate in the field:
Novels are deeply specific, [Katie Trumpener, a professor of comparative literature and English at Yale University] argues, and the field has traditionally valued brilliant interpreters who create complex arguments about how that specificity works. When you treat novels as statistics, she says, the results can be misleading, because the reality of what you might include as a novel or what constitutes a genre is more slippery than a crude numerical picture can portray.
Worth a read.





