Jaim Coddington
Eight years ago, Pedro Domingos envisioned an artificial intelligence breakthrough he called “the master algorithm” – a single, universal learning algorithm that would be able to derive all past, present, and future knowledge from data. By this definition, the master algorithm would generalize to almost any task that humans can do, revolutionizing the global economy and automating our daily lives in countless ways.
Today, the spectacular rise of large language models (LLMs) such as ChatGPT has a lot of observers speculating that the master algorithm, or at least its primordial form, has been found. ChatGPT and its siblings show great promise in generalizing to a vast range of use cases – any domain where data exists and where knowledge can be represented by tokenized language is fair game.
The power and apparent simplicity of this vision is seductive, especially in the public sector. Leading tech companies in the federal space are already probing the potential for LLMs to transform government bureaucracies, processes, and data management. The funding lavished on OpenAI and other LLM research and development teams is based on business cases for digital assistants, customer service chatbots, internet and database search, content generation, content monitoring, and more. The possibilities are vast.
Some of these are common to the public and private sectors, and LLMs will be exquisitely useful in addressing them. But not every problem or solution can or should be dual use. Governments face wicked problems that the private sector does not. Right now, LLMs are only half of the answer for the most difficult problems in government – disaster response, counter-drug trafficking, and military operations, to name a few. These are tasks where data is scarce, dirty, and intermittent in ways that are hard to fathom by commercial standards. They demand decisions, actions, and human judgment in high-stakes scenarios where a false positive or false negative from a hallucinating LLM could be deadly or catastrophic. In a sensor-saturated future, where every relevant aspect of our existence can be captured as data, sophisticated LLMs may finally, truly, completely eat the world. That future, and the emergence of a genuine master algorithm, is still a long way off.
What we need now is a salt to the LLM’s pepper – a complement that relies less on mountains of human-generated or human-curated data, is still highly generalizable, and can give humans predictive insight that is immediately relevant to the problem sets faced by the public sector in the physical world. To find it, we can look back at the previous machine learning hype cycle, which began around 2015, when AlphaGo – and later AlphaStar and OpenAI Five – helped create a fresh wave of excitement about the potential of reinforcement learning (RL).
In each of these cases, an RL algorithm mastered a complex strategy game and defeated human world champions in that game. OpenAI Five’s achievements in the globally popular video game Dota 2 are particularly interesting because its research team used an approach called “self-play” to train the model. Unlike AlphaGo and AlphaStar, which benefited from training on historical gameplay data from humans, OpenAI Five learned entirely by playing against itself. By scaling and parallelizing self-play instances, the OpenAI Five team was able to train the model on 45,000 years of Dota gameplay over the course of 10 real-time months.
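To make “self-play” concrete, here is a minimal, hypothetical sketch in Python – emphatically not OpenAI Five’s training code. A softmax policy plays a toy zero-sum game (rock-paper-scissors) against a periodically refreshed frozen copy of itself and updates with a REINFORCE-style rule. The game, learning rate, and refresh interval are all invented for illustration.

```python
# Toy self-play sketch: a policy learns by playing a frozen copy of itself.
# Illustrative only; not how OpenAI Five was actually trained.
import numpy as np

N_ACTIONS = 3                       # rock / paper / scissors
PAYOFF = np.array([[ 0, -1,  1],    # PAYOFF[a, b] = learner's reward when
                   [ 1,  0, -1],    # learner plays a and opponent plays b
                   [-1,  1,  0]])

def sample(logits, rng):
    """Draw an action from a softmax over the logits."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return rng.choice(N_ACTIONS, p=p), p

rng = np.random.default_rng(0)
learner = np.zeros(N_ACTIONS)       # logits of the learning policy
frozen = learner.copy()             # the "past self" it plays against
lr = 0.05

for step in range(20_000):
    a, p = sample(learner, rng)
    b, _ = sample(frozen, rng)
    reward = PAYOFF[a, b]
    grad = -p                       # REINFORCE: grad of log pi(a) is
    grad[a] += 1.0                  # one-hot(a) - p for a softmax policy
    learner += lr * reward * grad
    if step % 500 == 499:           # periodically refresh the opponent
        frozen = learner.copy()     # with the learner's current weights

_, p = sample(learner, rng)
print("learned policy:", np.round(p, 3))
```

With no human data at all, the policy should hover near the uniform mixed equilibrium, although plain self-play with this update rule is known to cycle around it rather than settle exactly. Production systems like OpenAI Five add massive parallelism and far more sophisticated machinery on top of this basic loop.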
In later projects involving multi-agent reinforcement learning (MARL) in a virtual environment, OpenAI researchers found that individual reinforcement learning “agents” playing against each other in teams were able to cooperate to achieve objectives and discover novel and unforeseen actions entirely through self-play, with no outside direction from humans. In other words, these MARL agents quickly mastered a complex game, and then learned new ways of interacting with their environment to win the game – alien tactics that humans did not or could not discover by themselves.
This is inductive reasoning on steroids. With MARL, it becomes possible to rapidly simulate thousands of alternate versions of a given scenario, and then analyze and learn from those scenario iterations. By identifying patterns in these iterations and understanding how variables like agent decision making and environmental features change outcomes, MARL can help us plan and understand future actions. Doctor Strange in the movie Avengers: Infinity War provides an analogy: he “goes forward in time” to examine over 14 million possible futures of the war between the Avengers and Thanos. Ultimately, he finds just one in which the Avengers are able to defeat their nemesis, and this foresight helps the good guys win in the end.
What if we created a relevant abstraction of the real world – a virtual environment with representative physics and a focus on the behaviors, interactions, and decisions of intelligent agents? If we get the balance between physics and intelligence right, this MARL environment would allow us to peer into the future and optimize our decision making in new and powerful ways.
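As a thought experiment, here is what the skeleton of such an abstraction might look like in Python. The interface loosely mirrors the parallel reset/step convention used by MARL libraries such as PettingZoo, but the environment, agent names, and “physics” below are invented placeholders, not a validated model of any real operation.

```python
# A toy multi-agent "ocean" environment sketch. All dynamics are
# hypothetical placeholders chosen only to illustrate the structure.
import random

class ToyOceanEnv:
    """Grid-world ocean: patrol agents try to end a turn within
    detection range of the smuggler agent before it exits the grid."""

    def __init__(self, size=20, detect_range=2, seed=None):
        self.size, self.detect_range = size, detect_range
        self.rng = random.Random(seed)
        self.agents = ["patrol_0", "patrol_1", "smuggler"]

    def reset(self):
        self.pos = {
            "patrol_0": (self.size // 2, self.size // 2),
            "patrol_1": (self.size // 4, self.size // 2),
            "smuggler": (self.rng.randrange(self.size), 0),
        }
        return {a: self._obs(a) for a in self.agents}

    def step(self, actions):
        """actions: agent name -> (dx, dy), each component in -1..1."""
        for agent, (dx, dy) in actions.items():
            x, y = self.pos[agent]
            self.pos[agent] = (min(max(x + dx, 0), self.size - 1),
                               min(max(y + dy, 0), self.size - 1))
        sx, sy = self.pos["smuggler"]
        detected = any(                       # Manhattan-distance sensor
            abs(self.pos[p][0] - sx) + abs(self.pos[p][1] - sy)
            <= self.detect_range
            for p in ("patrol_0", "patrol_1"))
        escaped = sy >= self.size - 1         # smuggler reached far edge
        done = detected or escaped
        rewards = {p: (1.0 if detected else 0.0)
                   for p in ("patrol_0", "patrol_1")}
        rewards["smuggler"] = 1.0 if escaped else 0.0
        return {a: self._obs(a) for a in self.agents}, rewards, done

    def _obs(self, agent):
        return dict(self.pos)                 # fully observed toy state
```

A real environment would replace the grid and Manhattan-distance sensor with representative geography, vessel kinematics, and sensor models, but the core contract – agents act, the world updates, rewards flow back – stays the same.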
Consider the specific problem of counter-drug trafficking operations. Every year, thousands of drug-carrying vessels transit the Caribbean Sea and eastern Pacific Ocean, delivering vast quantities of illegal drugs to ports in the north. The United States’ Joint Interagency Task Force South (JIATF-S) uses every resource at its disposal – Navy and Coast Guard ships, maritime patrol aircraft, and more – to detect and interdict as many of these shipments as possible. This is an exceptionally difficult wide-area search problem, and JIATF-S simply cannot cover the entire ocean all of the time. On average, JIATF-S detects only about 10% of estimated maritime smuggling events. Even when these vessels are detected, approximately one in five gets away.
MARL could help address this problem by simulating JIATF-S operations thousands of times over, using agent behavior and decisions to reveal the optimal placement and employment of scarce patrol ships, aircraft, and other resources to detect and interdict more illicit vessels. MARL could also help JIATF-S planners experiment with tactics and long-term strategies, simulating scenarios where new technologies or methods are used to help with search, or new overseas bases become available for operations. These simulations could also be used to understand how changes in the environment or in drug cartels’ trafficking operations affect the JIATF-S mission.
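As a hypothetical illustration of that workflow, the sketch below reuses the ToyOceanEnv from earlier to sweep two candidate patrol placements across thousands of simulated episodes and compare detection rates. Real planning would substitute trained MARL policies and validated scenario data for the random-walk behavior and made-up coordinates used here.

```python
# Sweep candidate patrol placements over many simulated episodes and
# compare detection rates. Builds on the ToyOceanEnv sketch above;
# policies, placements, and episode counts are illustrative only.
import random

def run_episode(env, start_positions, rng):
    env.reset()
    env.pos.update(start_positions)            # place patrols for this trial
    for _ in range(env.size * 3):              # step until done or timeout
        actions = {
            "patrol_0": (rng.choice([-1, 0, 1]), rng.choice([-1, 0, 1])),
            "patrol_1": (rng.choice([-1, 0, 1]), rng.choice([-1, 0, 1])),
            "smuggler": (rng.choice([-1, 0, 1]), 1),   # drifts northward
        }
        _, rewards, done = env.step(actions)
        if done:
            return rewards["patrol_0"] > 0     # True if detected
    return False

rng = random.Random(1)
env = ToyOceanEnv(seed=1)
candidates = {                                 # hypothetical placements
    "center line": {"patrol_0": (10, 10), "patrol_1": (5, 10)},
    "south picket": {"patrol_0": (7, 4), "patrol_1": (13, 4)},
}
for name, placement in candidates.items():
    detections = sum(run_episode(env, placement, rng) for _ in range(2000))
    print(f"{name}: detection rate = {detections / 2000:.2%}")
```

Even in this toy form, the pattern is the point: vary a decision variable, run the scenario thousands of times, and let the outcome statistics tell you which choices hold up.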
This type of experimentation with MARL could greatly benefit other national security use cases such as military wargaming, systems engineering, mission planning, and command and control. A shared simulation platform could also become a common tool and a common thread between the acquisition and procurement community and the warfighting community. For example, if a MARL simulation platform helps a wargamer quickly create an experiment to test a novel idea or a hypothetical capability, the same tool could just as easily be used by an operational commander’s staff to compare and contrast differing friendly courses of action. MARL could also help mission planners develop and refine adversary courses of action and enhance red cell efforts.
If we evolve this idea from Charmander to Charizard, we can envision a capability that approaches clairvoyance. In the future, the MARL tool could automatically run simulations based on critical real-world data injects: a new adversary troop movement is detected, a new weapon system is deployed, critical infrastructure is suddenly damaged, or a new weather pattern emerges in the operational environment. It is not unrealistic to think that MARL could rapidly provide decision makers with a window into the future for these types of events, in conjunction with the automation and alerting delivered by other machine learning capabilities like LLMs and computer vision. If MARL could become a crystal ball, perhaps the time has come for a deeper look.