4 July 2023

THE THREE WICKED PROBLEMS INHIBITING DATA-DRIVEN DECISION-MAKING IN THE ARMY

Zachary Szewczyk

Military leaders have sought hard data to drive their decisions for decades, perhaps most famously beginning with Secretary of Defense Robert McNamara’s so-called Whiz Kids in the 1960s. As retrospective analysis of McNamara’s data-driven approach to strategy made clear, however, data alone does not good decisions make. Errors in collection, transmission, and presentation decimated the efficacy of this initiative. The Vietnam War is a cautionary tale in data-driven decision-making gone wrong, an important reminder that modernity’s insatiable need for ever more data may be no more a silver bullet today than it proved to be sixty years ago.

Reasons for data’s failure to enable perfected decision-making are legion. Careful attention must be paid to the assumptions underlying the trend toward centralized decision-making, a concept that relies on complex, brittle systems of systems, demolishes agility and adaptability, and runs counter to the decentralized mandates of mission command. Here, too, we must not only ask whether or not we can enable certain functions with data, but also whether or not we should. Important questions regarding the efficacy of data-driven decision-making must also be addressed, especially in a world where data overload is not a danger but rather a given. Finally, well-studied biases threaten to make efforts to achieve data-driven decision-making little more than a quest for decision-driven data. Here, however, I deal only with the substantial technical challenges facing the joint force in making data-driven decisions. These challenges fall into three categories: collection, transport, and presentation. They represent the most significant barriers to meeting the threefold requirements for analysis—(1) correct and complete data, (2) in a suitable platform, with (3) the requisite analytics to answer decision-makers’ information requirements. And each challenge is a wicked problem, one for which no perfect solution exists, only progressively better ones.

Collection

In the context of data-driven decision-making, collection refers to the acquisition of data. Notably, this diverges from the definition of “collection” in Army Doctrine Publication (ADP) 2-0, Intelligence, which defines it as not only “the acquisition of information” but also the “provisioning” of this information to “processing elements.” In a data context, that secondary step raises its own challenges that are typically less present with other types of intelligence collection, so data transport should be considered independently of collection.

In data collection, sensors capture an abstraction of reality, at a given point in time, and turn it into data; this data may then be transported and later presented elsewhere. Collection, as the first challenge for data-driven decision-making, aligns directly to the first technical requirement for analysis, the need for correct and complete data. The most insidious challenges to data-driven decision-making manifest at this stage:

How well does the sensor’s data represent reality at the time of its capture? In short, does the sensor capture the right data? This is a key question in the field of signal detection theory.

Is the sensor’s data sufficiently abstract so as to not overwhelm the collection mechanism, but not too abstract that it inhibits informed decision-making? In short, does the sensor capture data in the right way?

When considering the challenge of making informed decisions based on the disposition of an adversary, which may seek to distort a friendly decision-maker’s perception of reality, collection is already a well-understood challenge to data-driven decision-making. The intelligence community’s prized and closely guarded sources and methods are a reflection of this; these are the exquisite means through which decision-makers accurately understand their opponents.

Collection of friendly information challenges informed decision-making as well. Proprietary vendor equipment, for example, often lacks organic sensors to collect any data whatsoever; when those organic sensors do exist, they often provide little more than diagnostic data appropriate for technicians to troubleshoot problems but not for military leaders to make decisions regarding the employment of their forces. As a result, while a mechanic might troubleshoot a fueler using inputs from the vehicle itself, a commander seeking to make informed decisions based on the disposition of his or her fleet must do so through a painstakingly slow and error-prone manual process.

Even information systems suffer from collection challenges. In the cyber domain, for instance, despite extensive built-in sensor mechanisms, lack of the right data captured in the right way continues to inhibit defensive cyberspace operations. Even though the domain’s very nature—a virtual space where collection as code enables the tuning of sensors with the push of a button—should intuitively make collection easier, sufficient collection remains a key challenge confounding operations. This should serve as a cautionary tale when dealing with collection in the land, sea, air, and space domains, where sensors are governed by physics, materiel, and acquisitions—and code, too.

Transport

While collection certainly faces challenges, it is firmly grounded in the present. Transport, on the other hand, is reliant on technology, systems, and programs dating back years or decades. Transport, in the context of data-driven decision-making, covers what ADP 2-0 describes as processing—activities including “data conversion and correlation, document and media translation, and signal decryption”—as well as the transmission of that data between systems. Transport aligns directly to the second technical requirement for analysis, correct and complete data in a suitable platform.

Transport is the most significant yet, paradoxically, the least studied challenge facing data-driven decision-making. This challenge is primarily the result of stasis in transmission mediums, particularly within the Department of Defense. While data volume has grown exponentially in the internet age, and will continue to do so at an ever-increasing pace going forward, transmission mediums have progressed at a comparatively glacial pace, again particularly within the Department of Defense. This affects tactical units the most: their options for data transmission have failed to keep pace not only with the increase in data but also with industry. While technologies like 5G and systems like Starlink present a nearly incomprehensible increase in capability to units accustomed to tactical connections measured in mere kilobits per second under the best of circumstances, they face a long and fraught road to general availability. As capabilities for data collection continue to increase, and researchers develop new and interesting ways to present that data, transport will remain the most impactful factor enabling, or inhibiting, data-driven decision-making.
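To make the scale of the transport gap concrete, the following is a rough back-of-the-envelope sketch in Python. Every figure in it (the payload size and both link rates) is a hypothetical value chosen only for illustration and does not describe any fielded system, and the calculation is idealized: it ignores protocol overhead, latency, contention, and loss, all of which would make the real picture worse.

```python
def transfer_time_seconds(payload_bytes: int, link_bits_per_second: float) -> float:
    """Idealized time to move a payload across a link.

    Ignores protocol overhead, latency, contention, and packet loss.
    """
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    return payload_bytes * 8 / link_bits_per_second


# Hypothetical figures, assumed purely for illustration.
PAYLOAD_BYTES = 1_000_000_000       # 1 GB of collected data
TACTICAL_LINK_BPS = 512_000         # a link measured in kilobits per second
BROADBAND_LINK_BPS = 100_000_000    # a 100 Mbit/s link

tactical_hours = transfer_time_seconds(PAYLOAD_BYTES, TACTICAL_LINK_BPS) / 3600
broadband_seconds = transfer_time_seconds(PAYLOAD_BYTES, BROADBAND_LINK_BPS)

print(f"Tactical link:  {tactical_hours:.1f} hours")
print(f"Broadband link: {broadband_seconds:.0f} seconds")
```

Under these assumed numbers, the same payload that crosses the faster link in about a minute and a half occupies the slower link for more than four hours, during which the link carries nothing else.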

Presentation

Presentation, in the context of data-driven decision-making, refers not just to the literal visual presentation of data but also to the system through which data is made available to its users. Presentation aligns directly to the third and final requirement for analysis, correct and complete data, in a suitable platform, with the requisite analytics to answer the decision-makers’ information requirements.

The importance of presentation cannot be overstated, but any discussion of presentation must be accompanied by an acknowledgement that persistent and pernicious problems with collection and transport keep presentation from becoming the limiting factor in most scenarios. Nevertheless, in those scenarios where adequate collection and transport exist, presentation plays a significant role in the overall efficacy of the system. Unfortunately, the means—presentation of data in order to enable informed decision-making—have in many cases instead become the end itself: presentation of data in order to present data. Decision-makers, program managers, and others who influence tool development, procurement, and adoption must keep in mind the idea that “our ability to know is a function of our tools for knowing,” closely related to Jason Weiss and Dan Patt’s idea that software defines tactics. Data is how analysts see, and the tools through which they interact with data are like the window through which they peer. Presentation must not be a tertiary concern, even if it does appear third on this list. Tools matter. The impact of adequate presentation tools is particularly pronounced in defensive cyber operations, where those that only enable basic pattern matching, counting, and stacking will invariably do a poorer job enabling decision-making than those that capture the advanced analytics necessary to uncover subtle evidence of compromise.

Dependencies, Not a Checklist

These three challenges—collection, transport, and presentation—cannot be considered in isolation. To enable true data-driven decision-making, all three must be addressed together.

Without collection and transport, even the most intuitive presentation system will fail to enable data-driven decision-making. It would have no data to present. Similarly, without collection, even perfect transport and presentation will serve no real purpose. Those systems would have no data to transport or present. This dependency relationship works both ways. Without appropriate presentation, even exquisite collection and transport would fail. Of course, exquisite collection and perfect presentation mean naught without transport to connect them.

This should not be construed to justify boutique, end-to-end solutions. Incompatible systems already permeate the military. Rather, this should highlight the importance of understanding the ability of new systems to integrate with preexisting systems for collection, transport, and presentation as well as the suitability of those preexisting systems to support the new one. Each new solution need not add a new row to the table below; instead, designers must ensure that their new solution C can integrate with a sufficient transport medium 1 to feed a suitable presentation tool Y—all in service of enabling the ultimate goal, data-driven decision-making.

At the other end of the spectrum, however, concepts like joint all-domain command and control, or JADC2, attempt to address these three challenges together but do so in far too general a manner to succeed. As retired Air Force Lieutenant General David Deptula recently explained, “While [the Department of Defense’s JADC2] definition captures what JADC2 aims to achieve, it says little about how to achieve it. As a result, joint all domain command and control has partially stalled due to a cloudy department-wide vision that every service views slightly differently. To make this concept a reality, the Pentagon needs a straightforward, clear, and understandable description of what its vision entails.” Opposite cumbersome specificity lies unhelpful generality, a danger leaders must take care to avoid in addressing data-driven decision-making.

The present approach to enabling data-driven decision-making continues to err on the side of specificity by addressing the three columns in the figure above individually. In defensive cyberspace operations, for example, efforts to address collection gaps across the enterprise frequently clash with legacy wide-area network uplinks incapable of handling both increased collection and regular business usage. Fortunately, fresh improvements to the Army’s Big Data Platform, called Gabriel Nimbus, have recently given analysts an excellent presentation tool through which to interact with data as more networks begin to send their data to the platform. The imperative demonstrated by this specific example—that collection, transport, and presentation must be considered together—is present across the broader context of data-driven decision-making. As exciting as Gabriel Nimbus’s new data science tools are, they cannot be applied to data not resident in the platform.

Safeguards

Any discussion of data-driven decision-making must include a proviso that a human must make the eventual decision itself. A particular course of action may make sense based on the data, but it may be incompatible with rules of engagement or the law of armed conflict. Humans must ultimately retain responsibility for making the best decision possible given sound judgment and the information at hand. This is a crucial moral and ethical consideration that should remain at the front of any discussion of data-driven decision-making, especially when most (if not all) of that process becomes opaque, as is the case with systems based on machine learning or artificial intelligence.

Humans should not only serve as a final check in an otherwise automated decision-making process, but also implement safeguards along the way. Each step in any multistep process introduces another opportunity for error; when each step relies on the integrity of the previous one, those errors may cascade. At the collection, transport, and presentation stages, operator error, malicious actors, and simple entropy threaten the entire system. While several techniques for mitigating errors at each stage exist, all involve trade-offs.
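The cascading effect described above can be illustrated with a simple sketch. It assumes, purely for the sake of argument, that each stage preserves the integrity of the data with some fixed probability and that the stages fail independently of one another; real systems are unlikely to be that tidy. The stage values below are hypothetical and are not measurements of any system.

```python
import math


def end_to_end_reliability(stage_reliabilities):
    """Probability that data survives every stage intact.

    Assumes independent stages, so the probabilities multiply.
    """
    for p in stage_reliabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each reliability must be between 0 and 1")
    return math.prod(stage_reliabilities)


# Hypothetical per-stage values, assumed only for illustration.
stages = {
    "collection": 0.95,
    "transport": 0.90,
    "presentation": 0.98,
}

overall = end_to_end_reliability(stages.values())
print(f"End-to-end reliability: {overall:.4f}")
```

With these assumed values the overall figure is roughly 0.84, lower than any single stage, which is the sense in which errors compound: the chain as a whole can be no more trustworthy than its weakest link and is usually less so.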

Total agreement across multiple sensors in a distributed, austere environment, for example, is unlikely, especially given the likelihood of malign interference. Relying on agreement among a simple majority of sensors may also prove problematic if an adversary manages to interfere with 51 percent of them, thereby causing the system to present a tainted representation of reality. For obvious reasons, decision-makers should not regularly rely on that other 49 percent, for while the minority may represent a more accurate picture of reality in this specific situation, it may not in others. Conceptually, it is important to remember that different perspectives lead to different conclusions, but all perspectives are equally valid perceptions of reality. Consider, for example, a well-camouflaged enemy squad in a dense forest: an optical sensor would capture what appear to be trees and foliage, while a thermal sensor would capture the enemy forces’ heat signatures. Both would be valid perceptions of reality, but clearly only one is accurate and meaningful and enables the decision-maker to discern the true nature of the situation. Sensors capture an abstraction of reality, at a given point in time, and turn it into data; whether or not that abstraction is an accurate representation of reality, however, remains a question for decision-makers to answer. This question alone will require earnest analysis and careful deliberation.
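The simple-majority concern raised above can be sketched in a few lines of Python. This is a toy model rather than a description of any real sensor-fusion system: the function name, the report labels, and the count of one hundred sensors are all hypothetical, and each sensor is assumed to cast one equally weighted vote.

```python
from collections import Counter


def majority_report(readings):
    """Return the value reported by a strict majority of sensors.

    Returns None when no value exceeds half of the readings.
    """
    if not readings:
        return None
    value, count = Counter(readings).most_common(1)[0]
    return value if count > len(readings) / 2 else None


def tampered(readings, number_altered, false_value):
    """Model an adversary overwriting the first `number_altered` readings."""
    return [false_value] * number_altered + list(readings[number_altered:])


# Hypothetical scenario: 100 sensors that all observe the same ground truth.
truth = ["enemy present"] * 100

print(majority_report(tampered(truth, 49, "clear")))  # enemy present
print(majority_report(tampered(truth, 50, "clear")))  # None
print(majority_report(tampered(truth, 51, "clear")))  # clear
```

In this sketch the system's picture flips from accurate to tainted when the adversary moves from altering 49 sensors to altering 51, and nothing in the vote itself signals that the flip occurred. That is the trade-off: a majority rule tolerates a minority of bad sensors but fails silently once that threshold is crossed.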

Any discussion of data-driven decision-making risks getting mired in generalizations, and it is important to keep it grounded in reality by considering realistic applications. The example of the challenges facing defensive cyberspace operations highlighted earlier in this article is one such case. Correct and complete collection remains elusive across the enterprise. Transport for high volumes of data remains infeasible under many circumstances. And precious few systems have succeeded in presenting that data in a coherent manner. But this framework applies equally to tactical battlefield scenarios. There is the challenge of improving the sensor-to-shooter loop, for instance—shrinking the delay between a sensor capturing data and a shooter prosecuting a target. On a fundamental level, improving the connection between sensor and shooter is a data-driven decision-making problem. Further cases abound across the joint force and the defense enterprise.

Unfortunately, this is a far simpler concept to explain than to solve. Again, these are wicked problems for which no perfect solution exists, only progressively better ones. And the joint force does not just face three wicked problems in collection, transport, and presentation on its road to data-driven decision-making, but rather three wicked and interrelated ones. This article’s examination of those problems only provides a framework for conceptualizing the specific technical challenges facing the joint force. Of course, developing solutions is another matter entirely. Yet as Charles Kettering believed, a problem well stated is a problem half solved.

Captain Zachary Szewczyk commissioned into the Cyber Corps in 2018 after graduating from Youngstown State University with an undergraduate degree in computer science and information systems. He has supported or led defensive cyberspace operations from the tactical to the strategic level, including several high-level incident responses. He currently serves in the 3rd Multi-Domain Task Force.

The views expressed in this work are those of the author and do not reflect the official policy or position of the United States Military Academy, the Department of the Army, or the Department of Defense.

Thanks to C. R. G. and J. T. L. for providing feedback during the writing of this article. Their input was considered, but this article is not necessarily an accurate reflection of their opinions.
