The Defense Advanced Research Projects Agency (DARPA) isn’t known for thinking small, and it has now turned its attention (and budget) to a massive task: developing a set of software engines that can transcribe, translate, and summarize both text and speech without training or human intervention. The program, called Global Autonomous Language Exploitation (GALE), attempts to address the shortage of qualified linguists and analysts who know important languages like Mandarin and Arabic.
When bid solicitations went out last year, they told interested parties that DARPA wanted three separate modules built. The first handles the transcription of spoken languages into text. The second is a translation module that can convert foreign-language text into English, and the third is a “distillation” engine that can answer questions and summarize information provided by the other two modules. While this technology would certainly be put to use by military personnel in the field, it is really designed for deployment in the US, where analysts are easily overwhelmed by the volume of electronic information gathered by the intelligence community.
Most of this information simply goes untranslated, but if GALE is a success, the US government would have access to transcriptions of foreign broadcast news, talk shows, newspaper articles, blogs, e-mails, and telephone conversations. Even with the translation work done, though, this information would be overwhelming, which is why the distillation engine is such an important component of the product.
The project, now more than a year old, has three contractors competing with one another to develop the best software: IBM, SRI International, and BBN Technologies. They are supported by the Linguistic Data Consortium at the University of Pennsylvania. To remain in the program and continue to receive funding, each group must hit performance milestones; DARPA says that at the first milestone the transcription engine must be at least 65 percent accurate and the translation engine must be 75 percent accurate. The final milestone in the program is 95 percent accuracy in both modules.
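To make the milestone rule concrete, here is a minimal sketch of how such a pass/fail gate could be expressed. The thresholds come from the figures above; everything else is an assumption for illustration, including the function names, the treatment of the 75 percent translation target as a minimum, and the idea that accuracy is a single number per module. DARPA's actual scoring method is not described here.

```python
from typing import Dict

# Accuracy targets quoted in the article, expressed as fractions.
FIRST_MILESTONE = {"transcription": 0.65, "translation": 0.75}
FINAL_MILESTONE = {"transcription": 0.95, "translation": 0.95}


def meets_milestone(scores: Dict[str, float], targets: Dict[str, float]) -> bool:
    """Return True only if every module reaches its target accuracy.

    Hypothetical helper: a module missing from `scores` counts as 0.0.
    """
    return all(scores.get(module, 0.0) >= target
               for module, target in targets.items())


def shortfalls(scores: Dict[str, float], targets: Dict[str, float]) -> Dict[str, float]:
    """Map each module that misses its target to the gap in percentage points."""
    return {
        module: round((target - scores.get(module, 0.0)) * 100, 1)
        for module, target in targets.items()
        if scores.get(module, 0.0) < target
    }


if __name__ == "__main__":
    # Invented scores for illustration only; not results from any GALE team.
    example = {"transcription": 0.70, "translation": 0.72}
    print(meets_milestone(example, FIRST_MILESTONE))
    print(shortfalls(example, FIRST_MILESTONE))
    print(shortfalls(example, FINAL_MILESTONE))
```

With the invented scores above, the team would clear the transcription bar but miss the translation bar, so it would fail the gate as a whole. Requiring every module to pass is itself a design assumption; a program could just as plausibly weigh the modules differently.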
The Associated Press recently took a look at the BBN team, which has 24 people working on the project. DARPA has already made clear that it will cut any team that fails to meet its performance targets, which would automatically translate into job cuts at a company like BBN. “I cannot entertain that idea right now,” team leader John Makhoul told the AP. “It’s just so drastic that we just don’t think about it.”
As it happened, though, no one was cut after the first performance trials of the system, so work on GALE continues at all three firms. Though all of them approached or exceeded the initial targets, reaching 95 percent accuracy, even for casual discussion in noisy environments, remains a huge challenge. DARPA has a stated goal of “eliminating the need for linguists and analysts,” but that day may still be years away.