Are books considered data?

The answer is a clear “yes” to digital humanities scholars like Matthew Wilkens, who applies quantitative and computational methods to literature and cultural history.

“I care about the way in which books and films are both shaped by and shape the cultures that produce them,” Wilkens said. “Though humanists don’t describe them this way, books are huge data sets that we have access to. Trillions of words and books that have been recorded throughout history. That’s a fantastic resource.”

This fall, Wilkens joins the Information Science faculty as an associate professor, expanding the department’s expertise in the digital humanities. He’ll teach INFO 3350, “Text Mining for the Humanities.”

“My goal is to get students up to speed on treating books as data, how to go from a bunch of sentences to something you can put in a feature matrix and feed into machine learning algorithms,” he said.
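As a rough illustration of what going from sentences to a feature matrix can look like, here is a minimal bag-of-words sketch in Python. It is not taken from Wilkens' course or research; the function names (`tokenize`, `build_feature_matrix`) and the two toy sentences are hypothetical, and real text-mining work would typically use a dedicated library and far larger corpora.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_feature_matrix(documents):
    """Return (vocabulary, matrix), where matrix[i][j] is the number of
    times vocabulary[j] occurs in documents[i]."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical toy "corpus": two made-up sentences, not real literary data.
documents = [
    "The river ran through the park.",
    "The city had no park and no river.",
]

vocabulary, matrix = build_feature_matrix(documents)

# Print each word alongside its count in every document.
for word, column in zip(vocabulary, zip(*matrix)):
    print(word, column)
```

Each row of `matrix` describes one document and each column one word, which is the shape most machine learning algorithms expect as input.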

Wilkens’ work focuses in particular on literary text mining, geolocation extraction, genre detection, and the cross-pollination of critical and social-scientific methods. He directs the Textual Geographies project, is a co-investigator of the Text Mining the Novel project, and is a founding editorial board member of the Journal of Cultural Analytics. His book “Revolution: The Event in Postwar Fiction” examines postwar literature as a model of literary and cultural change. A native of Rochester, NY, he received a PhD in literature from Duke University and previously taught at the University of Notre Dame and Wayne State University. 

Growing up, Wilkens wanted to be a chemist, going so far as to earn bachelor’s and master’s degrees in chemistry. But he learned that life in the lab wasn’t for him, so Wilkens pursued studies in his other loves: English and literature. 

For Wilkens, these eclectic and seemingly incompatible academic disciplines – computational and statistical training as a chemist; qualitative work in literary studies, history, and foreign language as a literature scholar – found a middle way. 

“That middle way,” he said, “is information science and the humanities.” 

By applying computational methods to literature stored in vast, online libraries, Wilkens – like Info Sci colleague and fellow digital humanities scholar David Mimno – is a kind of digital archeologist, resurrecting human history by mining through an era’s collection of written works.

“What we’re looking for are the things that books allow us to diagnose about the culture that produced them,” he said. “Through text mining, we can infer much about class, gender, and race.”

For example, Wilkens analyzed British fiction from 1880 through 1940 and found a curious trend among foreign-born authors of color: their stories about London were much more likely to note the city’s parks, rivers, and other green spaces than stories penned by foreign-born white authors. 

“Later in the 20th century, working-class immigrants to Britain gravitated to these places as a third space,” he said. “Scholars hadn’t previously realized that this was true in the early part of the century, too, when migrants were typically wealthier and more highly educated. But we see the same pattern in both cases. This tells us that aspects of racial exclusion were established much earlier than previously known.”

In joining Cornell Information Science, Wilkens said he’s excited to be involved with such a vibrant program, to work alongside “amazing colleagues,” and to teach some of the best students in the world. He hopes to continue to strengthen the ties among Cornell’s humanities disciplines, Information Science, and its home unit, Computing and Information Science (CIS).

“In a period of entrenchment in a lot of humanities fields, you feel a different energy and enthusiasm in Information Science, where colleagues from all across the University are interested in exploring what the humanities bring to the field,” he said. “The chance to work with Cornell computer scientists and social scientists, I can’t tell you how exciting it is to have this kind of community.”

Louis DiPietro is the communications coordinator for Information Science.