25 February 2024

Pentagon explores military uses of large language models

Eva Dou, Nitasha Tiku and Gerrit De Vynck

After the initial delight around the world over the advent of ChatGPT and AI image generators, government officials have begun worrying about the darker ways they could be used. On Tuesday, the Pentagon began meetings with tech industry leaders to accelerate the discovery and implementation of the most useful military applications.

The consensus: Emerging artificial intelligence technology could be a game changer for the military, but it needs intensive testing to ensure it works reliably and that there aren’t vulnerabilities that could be exploited by adversaries.

Craig Martell, head of the Pentagon’s Chief Digital and Artificial Intelligence Office, or CDAO, told a packed ballroom at the Washington Hilton that his team was trying to balance speed with caution in implementing cutting-edge AI technologies, as he opened a four-day symposium on the topic.

“Everybody wants to be data-driven,” Martell said. “Everybody wants it so badly that they are willing to believe in magic.”

The ability of large language models, or LLMs, such as ChatGPT to review gargantuan troves of information within seconds and crystallize it into a few key points suggests alluring possibilities for militaries and intelligence agencies, which have been grappling with how to sift through the ever-growing oceans of raw intelligence available in the digital age.

“The flow of information into an individual, especially in high-activity environments, is huge,” U.S. Navy Capt. M. Xavier Lugo, mission commander of the recently formed generative AI task force at the CDAO, said at the symposium. “Having reliable summarization techniques that can help us manage that information is crucial.”

Researchers say other potential military uses for LLMs could include training officers through sophisticated war-gaming and even helping with real-time decision-making.

Paul Scharre, a former Defense Department official who is now executive vice president at the Center for a New American Security, said that some of the best uses probably have yet to be discovered. He said what has excited defense officials about LLMs is their flexibility to handle diverse tasks, compared with earlier AI systems. “Most AI systems have been narrow AI,” he said. “They are able to do one task right. AlphaGo was able to play Go. Facial recognition systems could recognize faces. But that’s all they can do. Whereas language seems to be this bridge toward more general-purpose abilities.”

But a major obstacle — perhaps even a fatal flaw — is that LLMs continue to have “hallucinations,” in which they conjure up inaccurate information. Lugo said it was unclear if that can be fixed, calling it “the number one challenge to industry.”

The CDAO established Task Force Lima, the initiative to study generative AI that Lugo chairs, in August, with a goal of developing recommendations for “responsible” deployment of the technology at the Pentagon. Lugo said the group was initially formed with LLMs in mind — the name “Lima” was derived from the NATO phonetic alphabet code for the letter “L,” in a reference to LLMs — but its remit was quickly expanded to include image and video generation.

“As we were progressing even from phase zero to phase one, we went into generative AI as a whole,” he said.

Researchers say LLMs still have a ways to go before they can be used reliably for high-stakes purposes. Shannon Gallagher, a Carnegie Mellon researcher speaking at the conference, said her team was asked last year by the Office of the Director of National Intelligence to explore how LLMs can be used by intelligence agencies. Gallagher said that in her team’s study, they devised a “balloon test,” in which they prompted LLMs to describe what happened in the high-altitude Chinese surveillance balloon incident last year, as a proxy for the kinds of geopolitical events an intelligence agency might be interested in. The responses ran the gamut, with some of them biased and unhelpful.

“I’m sure they’ll get it right next time. The Chinese were not able to determine the cause of the failure. I’m sure they’ll get it right next time. That’s what they said about the first test of the A-bomb. I’m sure they’ll get it right next time. They’re Chinese. They’ll get it right next time,” one of the responses read.

An even more worrisome prospect is that an adversarial hacker could break a military’s LLM and prompt it to spill out its data sets from the back end. Researchers proved in November that this was possible: By asking ChatGPT to repeat the word “poem” forever, they got it to start leaking training data. OpenAI fixed that vulnerability, but others could exist.

“An adversary can make your AI system do something that you don’t want it to do,” said Nathan VanHoudnos, another Carnegie Mellon scientist speaking at the symposium. “An adversary can make your AI system learn the wrong thing.”

During his talk on Tuesday, Martell made a call for industry’s help, saying that it might not make sense for the Defense Department to build its own AI models.

“We can’t do this without you,” Martell said. “All of these components that we’re envisioning are going to be collections of industrial solutions.”

Martell was preaching to the choir Tuesday, with some 100 technology vendors jostling for space at the Hilton, many of them eager to snag an upcoming contract.

In early January, OpenAI removed restrictions against military applications from its “usage policies” page, which used to prohibit “activity that has high risk of physical harm, including,” specifically, “weapons development” and “military and warfare.”

Commodore Rachel Singleton, head of Britain’s Defense Artificial Intelligence Center, said at the symposium that Britain felt compelled to quickly develop an LLM solution for internal military use because of concerns staffers may be tempted to use commercial LLMs in their work, putting sensitive information at risk.

As U.S. officials discussed their urgency to roll out AI, the elephant in the room was China, which declared in 2017 that it wanted to become the world’s leader in AI by 2030. The U.S. Defense Department’s Defense Advanced Research Projects Agency, or DARPA, announced in 2018 that it would invest $2 billion in AI technologies to make sure the United States retained the upper hand.

Martell declined to discuss adversaries’ capabilities during his talk, saying the topic would be addressed later in a classified session.

Scharre estimated that China’s AI models are currently 18 to 24 months behind U.S. ones. “U.S. technology sanctions are top of mind for them,” he said. “They’re very eager to find ways to reduce some of these tensions between the U.S. and China, and remove some of these restrictions on U.S. technology like chips going to China.”

Gallagher said that China still might have an edge in data labeling for LLMs, a labor-intensive but key task in training the models. Labor costs remain considerably lower in China than in the United States.

CDAO’s gathering this week will cover topics including the ethics of LLM usage in defense, cybersecurity issues involved in the systems, and how the technology can be integrated into the daily workflow, according to the conference agenda. On Friday, there will also be classified briefings on the National Security Agency’s new AI Security Center, announced in September, and the Pentagon’s Project Maven AI program.
