
24 February 2021

Death by Data: Drones, Kill Lists and Algorithms

Jennifer Gibson

In 2018 Google employees made headlines when they openly protested the company’s involvement in Project Maven – a controversial US programme aimed at integrating artificial intelligence into military operations. Google argued it was simply helping automate analysis of drone footage. Employees signed an open letter to CEO Sundar Pichai arguing Google ‘should not be in the business of war’ (BBC 2018). Yet for many communities in places like Pakistan and Yemen, computers are already making life-and-death decisions. Massive amounts of signals intelligence are run through algorithms that decide who is ‘suspicious’ and who is not. For populations with a drone flying overhead, those decisions can be deadly. Nobody knows the damage America’s covert drone war can wreak better than Faisal bin Ali Jaber. Faisal’s brother-in-law, Salem, was killed by a drone in 2012, just days after he preached against al-Qaeda (Jaber 2016). The strike was likely a ‘signature’ strike, one taken based on a suspicious ‘pattern of behaviour.’ This chapter will examine the case of Faisal bin Ali Jaber and address some of the troubling questions that arise as big data and remote warfare converge. Can targeting based on metadata ever be compliant with international humanitarian law (IHL) and its principle of ‘distinction’? And just what ‘feasible’ precautions must the US take to ensure that it is?

America’s Drone Wars: the Case of Faisal bin Ali Jaber

While the first known drone strike took place in Afghanistan in 2001, it was not until President Obama came into office that drones became the weapon of choice in the United States’ ‘War on Terror’. Dubbed by some ‘The Drone Presidency’, President Obama used drones to carry out at least 563 strikes during his time in office, ten times more than his predecessor George W. Bush (Cole 2016). Controversially, the strikes were taken outside of traditional battlefields in places like Yemen, Pakistan and Somalia, and killed potentially as many as 4,936 people (Purkiss and Serle 2017).

One of the biggest criticisms levelled at the programme – both under President Obama and now under President Trump – is the degree of secrecy with which it is carried out and the lack of accountability for mistakes that have been made. President Obama failed to even acknowledge the programme’s existence until early in his second term (Obama 2013), and it took another three years before his administration would release the first accounting of civilian harm (Shane 2016). That accounting estimated that almost eight years of strikes had killed between 64 and 117 ‘non-combatants’, a range significantly lower than independent estimates, which ranged from 207 (excluding Somalia) to 801 (Shane 2016). The New York Times’ Scott Shane wrote that ‘[i]t showed that even inside the government, there is no certainty about whom it has killed’ (Shane 2016).

The data, when it was released, also failed to include very basic details, such as when and where those civilian casualties occurred (Zenko 2016). This made it impossible for human rights groups and independent monitors to compare their own numbers to the government’s figures or to assess why there were such wide discrepancies. It also left the families of those who lost loved ones asking: ‘Has my family been counted?’ (Jaber 2016).

Faisal bin Ali Jaber was one of those people asking. An engineer from Yemen, Faisal lost his brother-in-law, Salem, and his nephew, Waleed, to a US drone strike on 29 August 2012. Salem was an imam known for speaking out against al-Qaeda in his sermons, and Waleed was one of only two policemen in their local village of Khashamir. The Friday before he was killed, Salem gave a sermon at the mosque denouncing al-Qaeda’s ideology. The sermon was so strongly worded that, as Faisal would later recount, members of the family worried Salem might be in danger of reprisals from the group. When Faisal spoke to Salem about the family’s concerns, Salem responded: ‘If I don’t use my position to make it clear to my congregation that this ideology is wrong, who will? I will die anyway, and I would rather die saying what I believe than die silent’ (Jaber 2013).

Shortly after the sermon, three young men arrived in the village, demanding to speak with Salem. Worried about security and concerned they might be al-Qaeda, Salem eventually agreed to meet them, but took Waleed with him for protection. They agreed to meet outside the mosque in the open, where Salem and Waleed thought it would be safest. Within minutes of stepping out of the mosque to meet the three young men, a drone hovering overhead fired, killing all five people (Ibid.).

Faisal was present on the day Salem was killed, as the entire family had gathered to attend his eldest son’s wedding. Instead of celebrating, they spent the day collecting body parts. When the Yemeni security services arrived an hour after the strike, Faisal asked them why they had waited to strike until Salem and Waleed were present. There was a checkpoint less than 1km from the village, which the men must have passed through to reach it, and a military base 3km away. They had no answers (Ibid.).

Faisal went looking for answers and in November 2013, he travelled to the US to speak to Congress and meet with officials from the National Security Agency. The headline on the front page of the New York Times summed up the trip: ‘Questions on Drone Strike Find Only Silence’ (Shane 2013). Eight months later, one of Faisal’s relatives was offered a bag containing $100,000 in sequentially marked US dollar bills at a meeting with the Yemeni National Security Bureau (NSB). The NSB official told a family representative that the money was from the US and that he had been asked to pass it along. When the family asked for official written notification of who it was from, the security agents refused (Isikoff 2014).

In 2015, Faisal filed a civil claim against the US Government seeking an apology and a declaration that the strike which killed his relatives was unlawful. He did not seek compensation, instead asking only for the acknowledgment that did not come with the cash secretly offered to his family in 2014. In the suit, Faisal also questioned why, in the apparent absence of any immediate threat, the three unidentified targets could not have been detained safely by Yemeni forces at checkpoints or, failing that, why the missiles could not have been fired sooner when the targets were isolated (Ahmed Salem Bin Ali Jaber v United States 2015).

Despite new information showing that US officials knew shortly after the strike that Salem and Waleed were civilians (Currier, Devereaux and Scahill 2015), in June 2017 the US Court of Appeals for the District of Columbia Circuit rejected Faisal’s case. The Court ruled unanimously that it could not rule on the matter, citing precedent preventing the judicial branch from adjudicating ‘political questions.’ However, in a rare concurring opinion, Judge Janice Rogers Brown issued an unprecedented rebuke of the drone programme. She pronounced American democracy ‘broken’ and congressional oversight a ‘joke’ in failing to check the US drone killing programme. The judge, an appointee of President Obama’s predecessor, George W. Bush, seemed troubled that current legal precedent prevented her court from acting as a check on potential executive war crimes. Calling drone strikes ‘outsized power’, she questioned who would be left to keep them in ‘check’, stating that ‘it is up to others to take it from here’ (Ahmed Salem Bin Ali Jaber v United States 2017).

The Role of Algorithms in US Targeted Killings

Faisal has never received answers as to why, or how, his family was targeted. The strike was most likely, however, a ‘signature’ strike gone wrong. The precision language that surrounds drones and the US targeted killing programme suggests that the US always knows the identity of the individuals it targets. The reality is much different.

There are two main types of strikes: ‘personality’ strikes taken against known, named high-value targets, and ‘signature’ strikes, which are taken based upon ‘suspicious’ patterns of behaviour (Becker and Shane 2012). President Obama authorised both types of strikes in Yemen and Pakistan, with the criteria for taking such strikes widely regarded as too lax. One intelligence agent, speaking anonymously to the New York Times, said the ‘joke’ was that when the CIA ‘sees three guys doing jumping jacks’, the agency suspects a terrorist training camp (Ibid.).

The practice elicited widespread criticism, with a variety of actors raising concerns about the legality of such strikes, the civilian harm they engendered, and the potential counter-productivity of killing individuals you could not even identify. In response, President Obama signalled in May 2013 that the US would take steps to phase out this controversial tactic (New York Times 2013). His successor, President Trump, reportedly reinstated them within months of coming into office (Dilanian, Nichols and Kube 2017).

What has become clear through leaks to the media is that the ‘signature’ upon which both administrations relied is far less visual and far more data-driven. Lethal drone strikes are the culmination of a complex process that involves the collection of data and intelligence through mass surveillance programmes that hoover up millions of calls, emails and other electronic communications. Surveillance drones gather countless images and videos, which are analysed and fed into the identification and location of suspects. In April 2014, General Michael Hayden, former Director of the CIA, told a Johns Hopkins University symposium that the United States ‘kills people based on metadata’ (Cole 2014).

This is especially true in places like Yemen, where the US has a limited footprint. Without human sources on the ground, it is overly reliant on signals intelligence from computers and cell phones, and the quality of those intercepts is limited (Currier and Maass 2015). Moreover, according to a leaked US military document in 2013, signals intelligence is often supplied by foreign governments with their own agendas (Ibid.). Such questionable signals intelligence makes up more than half the intelligence collected on targets (Ibid.).

The remaining signals intelligence is collected through mass surveillance programmes run by the United States and its European allies, including the UK. Through classified programmes, such as OVERHEAD, GHOST HUNTER and APPARITION, the US and its allies have been hoovering up intelligence from satellites, radio and cell phone towers in countries like Yemen and Pakistan for the express purpose of identifying and locating targets (Gallagher 2016). The aim of such programmes, according to one leaked document about a programme called GHOSTWOLF, is to ‘support efforts to capture or eliminate key nodes in terrorist networks’ (GCHQ 2011).

According to one drone operator, the United States often locates drone targets by analysing the activity of a SIM card, rather than the actual content of the calls. He said the problem with this is that they frequently have no idea who is holding the cell phone they target:

‘People get hung up that there’s a targeted list of people. It’s really like we’re targeting a cell phone. We’re not going after people – we’re going after their phones, in the hopes that the person on the other end of that missile is the bad guy.’

‘Once the bomb lands…you know the phone is there. But we don’t know who’s behind it, who’s holding it. It’s of course assumed that the phone belongs to a human being who is nefarious and considered an “unlawful enemy combatant”. This is where it gets very shady’ (quoted in Scahill and Greenwald 2014).

A leaked document from Edward Snowden shows just how this data is then fed into algorithms that help the US identify targets. According to a document titled ‘SKYNET: Applying Advanced Cloud-based Behavior Analytics’, the US developed a programme called SKYNET that it used to identify suspected terrorists based on their metadata – the electronic patterns of their communications, writings, social media postings and travel. According to one slide, SKYNET ‘applies complex combinations of geospatial, geotemporal, pattern-of-life and travel analytics to bulk DNR [Dialed Number Recognition] data to identify patterns of suspect activity.’ Put more plainly, the programme used ‘behaviour-based analytics’ to run data such as travel patterns, ‘frequent handset swapping or powering down’, low-use phone activity, or frequent disconnections from the phone network through an algorithm which then identified those who fit the ‘pattern’ of a terrorist (National Security Agency 2015).
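To make the mechanics concrete, the short sketch below illustrates, in deliberately simplified form, how ‘pattern-of-life’ features of the kind named in the slides (handset swapping, power-downs, travel) might be fed to a statistical classifier that scores phones against a handful of labelled profiles. The feature names, numbers and model choice here are hypothetical assumptions for exposition only, not a reconstruction of SKYNET itself.

    # Illustrative sketch only: hypothetical features and data, not the NSA's
    # actual pipeline. It shows how metadata-derived 'pattern-of-life' features
    # could be scored by a classifier trained on a few labelled profiles.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Each row is one phone/SIM; columns are hypothetical features such as
    # handset swaps per month, power-downs per week, distinct cell towers seen,
    # and overnight trips between cities.
    X_train = np.array([
        [0, 1, 12, 0],   # ordinary user
        [1, 2, 15, 1],   # ordinary user
        [6, 9, 40, 5],   # profile labelled 'suspect'
        [7, 8, 35, 6],   # profile labelled 'suspect'
    ])
    y_train = np.array([0, 0, 1, 1])  # 1 = matches the 'suspect' pattern

    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)

    # A journalist who travels constantly and swaps handsets to protect sources
    # generates metadata that looks much like the 'suspect' profile.
    journalist = np.array([[5, 7, 38, 6]])
    print(model.predict_proba(journalist))  # high 'suspect' score from metadata alone

The point of the sketch is not the particular model but the logic: the classifier only ever sees behavioural correlates, so anyone whose metadata resembles the training profiles, whether a courier, a fixer or a journalist, scores the same way.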

The essential flaw in the use of automated algorithms to select targets is aptly demonstrated by the individual SKYNET itself identified – Ahmed Zaidan. Ahmed Zaidan, the former Bureau Chief of Al Jazeera in Pakistan, is a ground-breaking journalist who managed to interview Osama bin Laden twice before September 2001. As part of his job in Pakistan, he regularly interviewed those associated with al-Qaeda and other militant groups. Yet SKYNET still classified him as the ‘highest scoring’ target, identifying him as a courier for al-Qaeda in part because of his travel patterns (National Security Agency 2015).

The Use of Metadata in Targeting – IHL Compliant?

Zaidan’s case aptly demonstrates the problematic nature of targeting based on algorithms and raises questions about just how certain metadata can be. One of the foundational principles of international humanitarian law[1] is the protection of civilians in conflict. In order to ensure such protection, Article 51(2) of the First Additional Protocol to the Geneva Conventions requires parties to a conflict to ‘distinguish’ between combatants and non-combatants when targeting individuals for lethal force (Henckaerts and Doswald-Beck 2009, 3). This principle applies to both international and non-international armed conflicts and has been described by the International Court of Justice (ICJ) as the ‘cardinal rule’ of IHL (Corn 2011–12, 441).

Because the principle of distinction is so central to IHL, Article 57(2) of the First Additional Protocol requires those who plan or decide upon an attack to ‘[d]o everything feasible to verify that the objectives to be attacked are neither civilians nor civilian objects’ (Henckaerts and Doswald-Beck 2009). In situations where there is still ‘doubt’ that an individual is a legitimate target after taking all ‘feasible’ precautions, Article 50(1) states that ‘that person shall be considered to be a civilian’ (Henckaerts and Doswald-Beck 2009).

The question that therefore arises in the context of the use of metadata in targeting is: how good is metadata at distinguishing between civilians and those ‘directly participating in hostilities’ or carrying out a ‘continuous combat function’? And is it sufficient to meet the ‘feasibility’ standard set out by the International Committee of the Red Cross, which requires a party to a conflict to take ‘those precautions which are practically possible taking into account all the other circumstances’ (Rogers 2016)?

The evidence to date would suggest it is not. Take, for instance, the SKYNET programme leaked by Edward Snowden. While we do not know whether the United States ever used the programme to carry out lethal action, we do know from Michael Hayden’s comments that the US likely used some form of algorithm to ‘kill people based on metadata.’ SKYNET therefore provides a window into the type of programmes the US is developing and the types of problems that might arise.

Patrick Ball, a data scientist and the Director of Research at the Human Rights Data Analysis Group, has previously given expert testimony before war crimes tribunals. After reviewing the SKYNET slides, he identified several flaws in the way the algorithm worked, which made the results scientifically unsound. One of the key flaws was that there are very few ‘known terrorists’ for the National Security Agency (NSA) to use to train and test the model. A typical approach to testing a model is to give it records it has never seen; according to Ball, if the NSA is testing on the same profiles the model has already seen, then ‘their classification fit assessment [will be] ridiculously optimistic.’ He goes on to point out that a false positive rate of 0.008 per cent would be remarkably low if this were being used in a business context, but when applied to the general Pakistani population, it means roughly 15,000 people would be misclassified as ‘terrorists’ (Grothoff and Porup 2016).
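The arithmetic behind Ball’s point about false positives can be sketched in a few lines. Taking Pakistan’s population as roughly 190 million (an approximation introduced here purely for illustration) and the 0.008 per cent false positive rate reported in the leaked slides:

    # Back-of-the-envelope illustration of Ball's point about false positives.
    # The population figure is an approximation used for illustration only.
    population = 190_000_000           # approximate population of Pakistan
    false_positive_rate = 0.008 / 100  # 0.008 per cent, as reported in the slides

    misclassified = population * false_positive_rate
    print(f"Wrongly flagged as 'terrorists': {misclassified:,.0f}")
    # roughly 15,000 people, before any question of what happens to them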

Security expert Bruce Schneier agrees with Patrick Ball:

Government uses of big data are inherently different from corporate uses. The accuracy requirements mean that the same technology doesn’t work. If Google makes a mistake, people see an ad for a car they don’t want to buy. If the government makes a mistake, they kill innocents. (Grothoff and Porup 2016)

Looking beyond the programme itself, there is also the evidence emerging from on-the-ground investigations by independent organisations. Take, for example, independent monitoring by the Bureau of Investigative Journalism, which suggests that as many as a quarter of those killed under the Obama administration may have been civilians (Purkiss and Serle 2017). Experts believe that ‘signature’ strikes, such as the one that killed Faisal’s family members, likely accounted for the vast majority of these (Dworkin 2013).

In March 2019, a German court questioned the lawfulness of US strikes in Yemen, in part because of their impact on civilians. The case had been brought by Faisal in 2014, arguing that the continued use of Ramstein Air Base by the US for strikes in Yemen threatened his and his family’s right to life under the German constitution. The court agreed. It found that ‘at least part of the US armed drone strikes […] in Yemen are not compatible with international law’, and that Germany must do more to ensure its territory is not used to carry out unlawful strikes (Bin Ali Jaber v Germany 2019). In its decision, the court acknowledged that Faisal and his family ‘are justified in fearing risks to life and limb from US drone strikes that use Ramstein Air base in violation of International Law’ (Bin Ali Jaber v Germany 2019).

A factor in the court’s decision was the significant increase in drone strikes since President Donald Trump took office, and his administration’s reported rollback of safeguards intended to protect civilians, including the renewed use of signature strikes. The court went on to state that there were ‘weighty indicators to suggest that at least part of the US armed drone strikes…in Yemen are not compatible with international law and that plaintiffs’ right to life is therefore unlawfully compromised’ (Bin Ali Jaber v Germany 2019). The German Government’s declarations to the contrary, according to the court, were based on ‘insufficient fact-finding and ultimately not legally sustainable’ (Bin Ali Jaber v Germany 2019). The court also noted that the fact that Faisal and his family were denied judicial review of their relatives’ deaths by the American courts ‘runs counter to the idea that there were any [independent investigations by US authorities]’ (Bin Ali Jaber v Germany 2019).

Conclusion: ‘Feasible’ Precautions and the Death of Faisal’s Family

The German court’s decision highlights a key facet of the legal question surrounding the use of metadata in targeting: adequate, independent post-strike investigations are the bare minimum of what ‘feasible’ precautions should include. Algorithms, at their best, merely tell us about relationships. They cannot tell us whether Faisal’s brother-in-law is meeting with three young men because he is planning an attack with them, or because he wants to explain to them why he believes al-Qaeda’s ideology is wrong. Metadata cannot tell us whether Ahmed Zaidan is meeting with known fighters because he too plans to fight, or because he is a journalist doing his job. Moreover, even the best algorithms only work after constant testing and refinement, the type of refinement that requires one to identify and correct for errors. Without post-strike investigations, there is no way the US can tell whether its strikes have hit lawful targets. And without this information, there is no way it can take ‘feasible’ precautions to ensure that mistakes like those that killed Faisal’s family do not happen again.

Notes

[1] There is significant controversy over whether US drone strikes in places like Pakistan and Yemen are indeed part of an armed conflict. There is not scope in this chapter to go into depth on this debate, and so for the purposes of argument it assumes that they are. If they are not, international human rights law, not international humanitarian law, would apply. The former imposes a much stricter legal standard on the use of lethal force, and if a strike, or method, does not meet the bare requirements of international humanitarian law, it will in all cases fail to meet those set out by international human rights law.
