Friday, December 3, 2010
Digitizing the humanities
Research group uses digital technology to study literary texts
The team analyzes 3,600 digitized 19th-century books using tools such as Java and The R Project for Statistical Computing. They call it "digitizing the humanities," and lecturer Matthew Jockers, joined by Professor Franco Moretti and about 20 students from the English Department, believes it is the future of literary analysis.
With this methodology, researchers can identify vocabulary patterns and characterize literary genres by calculating, for example, the number of occurrences of each word in a text.
See the full article below:
Digitizing the Humanities
Stanford research group uses digital technology to study literary texts
By Maggie Beidelman - Palo Alto Patch
Much to the dismay of tweedy traditionalists of English literature, a research team at The Stanford Literary Lab is using computer science to explain textual linguistics.
The team analyzes 3,600 digitized novels of the 19th century using programs like Java and The R Project for Statistical Computing. They call it "digitizing the humanities," and lecturer Matthew Jockers, joined by Professor Franco Moretti and nearly 20 students of the English Department, believes it to be the future of literary analysis.
With this methodology, researchers can determine patterns of vocabulary and characterizations of literary genres by calculating, for example, the number of occurrences of every single word in a text.
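To make the word-counting idea concrete, here is a minimal Python sketch. It is not the lab's code (the article mentions Java and R); the function name `word_counts`, the simple tokenizing pattern, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter

def word_counts(text):
    """Count how often each word occurs in a text (case-insensitive).

    Hypothetical helper: the tokenizer is a simple regular expression
    that keeps letters and an optional internal apostrophe (e.g. "don't").
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words)

# Invented sample sentence, not taken from the corpus.
sample = "The whale and the sea. The captain watched the whale."
counts = word_counts(sample)

# Show the most frequent words first.
for word, n in counts.most_common():
    print(word, n)
```

Run over thousands of novels instead of one sentence, tallies like these are the kind of raw material from which vocabulary patterns across a genre could be compared.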
"Suddenly we no longer have to look at a single text but can look at this entire sort of literary ecosystem," said Jockers.
Instead of doing a traditional close reading of a text to see how sentences move together to form a tale or story, researchers are now able to investigate common linguistics across a literary genre, he said.
It's a clash between traditionalists and those open to the new research, as computer science in the literature lab is sometimes regarded as a forbidden fruit.
"It's seen as an abomination, the fact that a machine could replace the human work of the critic," said Federica Perazzini, a visiting student on a Fulbright scholarship to conduct the research. "But actually, there's a huge human work behind the digital humanities. To query a database, you need a human critic to ask the questions, and a human interpreter to analyze the data."
Perazzini is a student in Jockers' class, "Literary Studies in the Digital Library," which will present a paper on the research at the 2011 meeting of the Alliance of Digital Humanities Organizations (ADHO) at Stanford.
"What the skeptics miss is indeed the big picture: the possibility that the study of 4,000 19th century novels, instead of the usual 40 or 50, may give us not just a bigger but a different, literary history," said Moretti of the digital methodology referred to as "text mining."
In October, the team was awarded $790,310 for a two-year project using the "mining" (or search) software of the Software Environment for the Advancement of Scholarly Research (SEASR).
With SEASR's support, Stanford is at the forefront of this research, though the methodology is being pursued across the nation. Stanford has also joined forces with the HathiTrust, a digital library whose goal is to preserve texts by digitizing them.
The project's current corpus consists of 3,600 British, Irish and American novels from the 19th century.
One of Jockers' favorite analyses has been of the relative frequency of the word "the" in British versus American novels from 1800 to 1900.
Jockers plotted the frequency of "the" over time on a graph to visualize the results.
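A relative-frequency comparison of this kind can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Jockers' actual method: the function names, the grouping by nationality and decade, and the two toy "novels" are invented placeholders, and the real corpus is not reproduced here.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split a text into lowercase words (simple, assumed tokenizer)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def pooled_frequency(novels, target="the"):
    """Relative frequency of `target` per (nationality, decade) group.

    `novels` is an iterable of (nationality, decade, text) tuples.
    Counts are pooled within each group, then divided by the group's
    total word count.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for nationality, decade, text in novels:
        words = tokenize(text)
        hits[(nationality, decade)] += words.count(target)
        totals[(nationality, decade)] += len(words)
    return {key: hits[key] / totals[key] for key in totals if totals[key]}

# Invented toy texts standing in for digitized novels.
novels = [
    ("American", 1850, "The ship left the harbor at the dawn"),
    ("British", 1850, "She went to university in the autumn"),
]
freqs = pooled_frequency(novels)
for (nationality, decade), freq in sorted(freqs.items()):
    print(nationality, decade, round(freq, 3))
```

Plotting one line per nationality, with the decade on the horizontal axis and the frequency on the vertical axis, would give a graph of the kind described above.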
"When you plot that as a line, what you see is that the American usage of the word is consistently higher than the British usage," said Jockers, who attributed this to the simple fact that the British frequently drop the word "the" in conversation, such as in, "I'm going to university."
But it was a different factor that Jockers found fascinating. "When you look at the two lines, they tend to mirror each other over time," he said.
"As the American frequency of the word is increasing, so is the British. That's hard to explain, because this is the pre-Internet era, and these two countries are separated by 2,000 miles of water, mirroring each other. I've spent five years thinking about that one."
With the new technology, there's a huge shift in terms of the critical paradigm, said Perazzini. "We are able to query and be open to 'the great unread,' all the other novels that were written that nobody actually knows about. You have to rethink the old ways to do literary criticism."
Moretti, who with Jockers is leading the team toward a presentation of its research analysis at the end of the grant period in 2012, addressed the question of the study's clout.
"Working on a digital archive is like entering a Greek myth—hybrids, metamorphoses, distortions everywhere," said Moretti. "It's fascinating; but it's also difficult to extract a rational picture."
"That, however, is our project, and its pathos lies in keeping the rational aim in sight—by using statistics, network theory, computational linguistics, evolution—through a maze of crazy contradictory details and endless false starts or disappointments."