31 January 2016

Analysis of NSA’s Former Bulk Telephone Metadata Program

Peter Koop
January 22, 2016

Section 215 bulk telephone records and the MAINWAY database

One of the most controversial NSA programs was the bulk collection of domestic telephone records (metadata) under authority of Section 215 of the USA PATRIOT Act.

The Snowden revelations provided hardly any information about this program, but many details became available from documents that were declassified by the US Director of National Intelligence (DNI).

Because in these declassified documents all codenames are redacted, it was a mystery which NSA systems were used to store and analyse these metadata.

By combining many separate pieces from both the Snowden documents and those declassified by the government, it has now become clear that NSA put the domestic phone records in its central contact chaining database MAINWAY, which also contains all sorts of metadata collected overseas.


Reconstruction of the MAINWAY dataflow

MAINWAY versus MARINA

Initially it was thought that MAINWAY was a repository just for telephone metadata. This goes back to a report by USA Today from May 10, 2006, which revealed that the NSA created a database containing “tens of millions” of domestic telephone call records obtained from AT&T, Verizon and BellSouth (the latter merged with AT&T as of 2007).

As such, MAINWAY was seen as the equivalent of MARINA, which is NSA’s storage for internet metadata. But meanwhile, various documents from the Snowden revelations have made clear that the actual repositories for telephone metadata are ASSOCIATION (for metadata from mobile calls) and BANYAN (for metadata from landline calls).

New documents have also shown that MAINWAY contains metadata from internet communications too. For example, in the following diagram about the FAIRVIEW collection program, we see that internet metadata from the Upstream collection first flow into MAINWAY before ending up in MARINA:




Dataflow for internet metadata collected under the
FAIRVIEW program under Transit Authority

It’s not clear what exactly the differences are between the contact chaining database MAINWAY and the metadata repositories like MARINA, ASSOCIATION and BANYAN. It seems likely that in MAINWAY metadata are stored more or less temporarily for the purpose of analysing them. Metadata that NSA wants to keep for a longer period of time, or even indefinitely, are then stored in the other repositories.

While the domestic metadata collected in bulk have to be destroyed after 5 years, the calling records that are the result of a query can be stored by the analyst. They may then be “subjected to other analytic methods or techniques besides querying, or integrated with records obtained by the NSA under other authorities”, as well as shared with others inside and outside NSA.*


MAINWAY, SIGINT Navigator (SIGNAV), ASSOCIATION and BANYAN
mentioned in a presentation about DEMONSPIT, under which call
records were obtained from major Pakistan telecom providers(!)

Metadata sources

Based upon Snowden documents, The New York Times reported on September 28, 2013, that MAINWAY is used for chaining both phone numbers and e-mail addresses and that it is fed with data from tapping “fiber-optic cables, corporate partners and foreign computer networks that have been hacked”.

The report also says that as of August 2011, MAINWAY was fed with “1.1 billion cellular records a day in addition to the 700M records delivered currently”. However, The New York Times erroneously attributed these numbers to collection under authority of section 702 FAA and was therefore not able to identify that MAINWAY was also fed with the bulk phone records of Americans (which happens under section 215 Patriot Act).

The latter only became clear after The New York Times and ProPublica published some NSA documents about the FAIRVIEW program on August 15, 2015. One of these documents confirms that it was AT&T that provided the aforementioned number of records, and also that this happened under BR FISA (= Section 215) authority.

So as of 2011, at least 1.8 billion domestic phone records a day were coming in, which makes 54 billion a month and about 650 billion a year. Before the records were handed over to NSA, AT&T stripped off the location data in order to comply with the FISA Court orders, which don’t allow those data to be collected.
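The totals above can be checked with a short calculation. This is only a sketch: it assumes a 30-day month and a 365-day year, treats the two quoted figures as one combined daily total, and uses variable names that are purely illustrative.

```python
# Daily figures as quoted from the NSA document (August 2011).
cellular_per_day = 1_100_000_000   # "1.1 billion cellular records a day"
other_per_day = 700_000_000        # "the 700M records delivered currently"

daily_total = cellular_per_day + other_per_day
monthly_total = daily_total * 30    # assumption: 30-day month
yearly_total = daily_total * 365    # assumption: 365-day year

print(f"per day:   {daily_total / 1e9:.1f} billion")
print(f"per month: {monthly_total / 1e9:.0f} billion")
print(f"per year:  {yearly_total / 1e9:.0f} billion")
```

With a 365-day year the annual figure comes out slightly above the rounded "about 650 billion"; twelve 30-day months would give 648 billion.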

Apparently Verizon Wireless and T-Mobile US saw no obligation to remove these location data, so their cell phone records couldn’t be collected by NSA, which therefore only got less than 30% of the domestic telephone metadata.


Under the President’s Surveillance Program (2001 - 2004/2006)

NSA started collecting telephone and internet metadata from US telecommunication providers shortly after the attacks of September 11, 2001. This was part of the President’s Surveillance Program (PSP, protected under the STELLARWIND classification compartment), which was based upon what in the end would be 43 successive secret authorizations by President George W. Bush.

The goals of collecting these metadata were to identify unknown terrorist operatives through their contacts with known suspects, to discover links between known suspects, and to monitor the pattern of communications among suspects.

At first, only metadata were collected from communications in which at least one party was outside the US. AT&T (identified as Company A or FAIRVIEW) started to provide both phone and internet metadata from international channels as early as November 2001, and for Verizon (Company B or STORMBREW) the automated transfer of such data started in February 2002.

Allegedly, raw metadata were transferred in real-time through a high speed data link between the main computer centers of the telecoms and an NSA facility.* Then, parsers were used to strip unwanted information (like credit card numbers) from the metadata, and the records were put in a standard format compatible with NSA databases.

For example, in September 2003, AT&T “captured” several trillion internet metadata records, of which some 400 billion (apparently those with a high probability of containing terrorist communications) were selected for processing. These flowed into the MAINWAY contact chaining database, which also contains metadata from collection abroad. The 2009 report about the STELLARWIND program says:

“NSA’s primary tool for conducting metadata analysis, for PSP and traditional SIGINT collection, was MAINWAY. MAINWAY was used for storage, contact chaining, and for analyzing large volumes of global communications metadata.” 
(Interestingly, in some documents MAIN WAY seems to be written as two separate words, which makes it resemble MAIN CORE, which is a central database containing essential intelligence information on Americans produced by the FBI and other US intelligence agencies.)


Under FISA Court orders (2004/2006 - 2011/2015)

In July 2004, the collection of domestic internet metadata was moved from the President’s Surveillance Program to the FISA Court (FISC), which authorized this effort based upon section 402 FISA, or as it is called by NSA: PR/TT (short for Pen Register/Trap and Trace).

In May 2006, the same happened with the bulk telephone records, for which the FISC allowed continuation under authority of section 215 USA PATRIOT Act, or as NSA calls it: BR FISA (short for Business Records FISA).

Under the FISA Court orders, bulk telephone collection eventually came to include “all call detail records or ‘telephony metadata’ created […] for communications between the United States and abroad” or “wholly within the United States, including local telephone calls”. Only metadata of fully foreign communications were excluded, as were, for technical reasons, those of most mobile phone calls.

Because NSA, right from the beginning, stored these domestic phone and internet metadata in the same database (MAINWAY) that contains metadata from traditional collection efforts abroad, queries could result in contact chains made up of identifiers from both foreign and domestic sources. The query tool simply didn’t identify the difference.

It was also possible for analysts to start a query with selectors that were not BR FISA-approved, and in some cases this also provided results from both the foreign and the domestic collection. This was not in accordance with the FISA Court orders, and after NSA informed the court about this, the agency had to stop accessing the telephone metadata in 2009 until these issues had been solved.*

An internal NSA training module from 2011 shows that, at least by then, NSA had tagged the metadata records with XML tags to identify not only the legal authority under which the metadata were collected, but also the SIGAD of the intercept facility where the collection had taken place.
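How such tagging lets a query tool tell sources apart can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the element names (`record`, `authority`, `sigad`), the authority labels and the placeholder SIGAD values are invented here and are not taken from any NSA document.

```python
import xml.etree.ElementTree as ET

def tag_record(caller, callee, authority, sigad):
    """Wrap one metadata record in XML, with hypothetical source tags."""
    rec = ET.Element("record")
    ET.SubElement(rec, "from").text = caller
    ET.SubElement(rec, "to").text = callee
    ET.SubElement(rec, "authority").text = authority  # legal authority label
    ET.SubElement(rec, "sigad").text = sigad          # collection site label
    return rec

def filter_by_authority(records, allowed):
    """Keep only records collected under one of the allowed authorities."""
    return [r for r in records if r.findtext("authority") in allowed]

# Placeholder values, purely illustrative.
records = [
    tag_record("A", "B", "BR_FISA", "SIGAD-1"),
    tag_record("A", "C", "EO12333", "SIGAD-2"),
]

# An analyst without BR FISA access would only see the second record.
visible = filter_by_authority(records, {"EO12333"})
print([ET.tostring(r, encoding="unicode") for r in visible])
```

The point of the sketch is only that once each record carries its authority, results from foreign and domestic collection no longer have to be mixed in one undifferentiated contact chain.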


A rare diagram about the BR FISA metadata collection:
the decision process as it was from 2006 - 2009

Other databases for domestic call records

The domestic call records were not only stored in MAINWAY, but also in another database, one that was apparently dedicated to US phone metadata. An NSA training presentation (.pdf) from 2007 confirms that BR FISA data were stored in two NSA repositories, although both names had been redacted.

An NSA review from June 2009 describes this second database as a “repository for individual BR FISA metadata call records for access by authorized Homeland Security Analysis Center (HSAC) and data integrity analysts to view detailed information about specific telephony calling events”. 

This seems to refer to the complete calling records, and also the PCLOB-report (.pdf) about the BR FISA program says there’s analysis software that “provides the associated information about the telephone calls involved, such as their date, time of day, and duration”.

So probably the second database gave access to these additional details, whereas MAINWAY only contains or provides “summaries of one-hop chains”, i.e. selector #1 was in contact with selector #2 and the number of times this happened within a specific timeframe.
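The difference between full calling records and one-hop summaries can be sketched as follows. This is a minimal illustration under stated assumptions: the record layout, the selector names and the function `one_hop_summaries` are hypothetical, and treating a contact as undirected (A calling B counts the same as B calling A) is a simplification that the source does not confirm.

```python
from collections import Counter
from datetime import datetime

# Hypothetical full call records: (caller, callee, start time, duration in seconds).
call_records = [
    ("selector1", "selector2", datetime(2011, 8, 1, 9, 0), 120),
    ("selector2", "selector1", datetime(2011, 8, 3, 14, 30), 45),
    ("selector1", "selector3", datetime(2011, 8, 5, 18, 15), 300),
    ("selector1", "selector2", datetime(2011, 9, 20, 11, 0), 60),
]

def one_hop_summaries(records, start, end):
    """Count contacts per selector pair within [start, end).

    Details such as time of day and duration are dropped; only the
    pair and the number of contacts survive in the summary.
    """
    counts = Counter()
    for caller, callee, when, _duration in records:
        if start <= when < end:
            pair = tuple(sorted((caller, callee)))  # assumption: undirected
            counts[pair] += 1
    return dict(counts)

summary = one_hop_summaries(
    call_records, datetime(2011, 8, 1), datetime(2011, 9, 1)
)
print(summary)
```

In this picture, the summary is what a contact chaining database would hold, while the original tuples with their dates, times and durations correspond to the detailed records in the second repository.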

The PCLOB-report suggests that when, either manually by an analyst, or in an automated process, a contact chain was created, the full records related to the phone numbers of this contact chain were transferred to the second database, which in the report is called the “corporate store”.

In the glossary of the 2009 NSA Review, the second repository is listed with a remarkably long name, which, according to its position, has to start with an M, N or O:




This exceptionally long name of the second database could indicate that it was some kind of provisional repository, because on page 23 of the 2009 BR FISA review it is said:

“NSA is preparing to incorporate the [second database] into the NSA corporate architecture. This transition to the corporate engineering framework will maximize use of the latest technologies and proven configuration management to minimize any security and compliance risks”

And indeed, in appendix B of a report (.pdf) by the NSA’s Inspector General from August 1, 2012, we see that the second database now has a shorter name, and that it had replaced a “Transaction Database” with a much longer name in January 2011:


Transaction is another term that NSA uses for metadata, so “transaction database” probably just means that it contains the (full) metadata records. This 2012 Inspector General report lists three additional storage systems for BR FISA data, making a total of five being involved here:

1. Contact chaining database that accepts metadata from multiple sources (= MAINWAY)

2. Database repository that stores detailed metadata information, which supports the contact chaining summaries in (MAINWAY). Replaced an earlier database in January 2011.

3. Contingency database for the time the aforementioned database was being rebuilt

4. System backup that stores an exact copy of the raw metadata from the providers

5. Backup tapes on which the raw metadata were periodically saved off-line

So when NSA needs large data centers, that’s also because the same data are stored at least three times over.

Bulk internet metadata (PR/TT)

As mentioned before, MAINWAY was not only fed with telephone metadata, but also with metadata from domestic internet communications. These metadata include the “to”, “from”, and “cc” lines of an e-mail, as well as the e-mail’s time and date. It seems that for contact chaining, no metadata from other kinds of internet communications, like messengers, were used.

On August 11, 2014, an internal NSA Review (.pdf) about this PR/TT program was declassified, which shows similar storage systems as for the phone records: full copies of the internet metadata were also stored in the MAINWAY contact chaining database, as well as in a dedicated second repository:



The PR/TT bulk internet metadata program was shut down in December 2011 for “operational and resource reasons” and all data were deleted. Based upon declassified NSA reports, The New York Times reported on November 19, 2015, that this “internet dragnet” was ended because, among other reasons, similar results could be achieved under other authorities:

- Section 702 FAA, which allows access to internet communications between foreigners and Americans from the “PRISM-providers” and “Upstream collection”.

- The SPCMA regulation, which allows using US person identifiers for querying metadata that have been collected abroad.

With collection of internet metadata both overseas (under EO 12333 authority) and at the borders of the US (under 702 FAA), NSA probably no longer needed the purely domestic metadata in order to capture those that are of interest.

Also, querying the metadata collected overseas appeared more attractive, because abroad, NSA is allowed to collect many more types of metadata than inside the US, where collection was heavily restricted by the FISA Court.

In a declaration for the FISA Court from February 13, 2009, then NSA director Alexander explained that multi-tiered chaining of phone calls is more efficient and useful, “because unlike e-mail, which involves the heavy use of spam, a telephonic device does not lend itself to simultaneous contact with large numbers of individuals”.

Replacement?

According to the secret Budget Request to Congress for 2013, NSA wanted to create (or maybe expand MAINWAY into) a metadata repository capable of taking in 20 billion metadata records a day and making these available to analysts within 60 minutes.
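To get a feeling for that target, it can be converted into a sustained rate and compared with the 1.8 billion domestic phone records a day mentioned earlier. The sketch below assumes the intake is spread evenly over 24 hours, which is a simplification; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

target_per_day = 20_000_000_000     # goal from the 2013 budget request
observed_per_day = 1_800_000_000    # domestic phone records per day, 2011

# Assumption: records arrive at a constant rate throughout the day.
records_per_second = target_per_day / SECONDS_PER_DAY
scale_factor = target_per_day / observed_per_day

print(f"sustained intake: {records_per_second:,.0f} records per second")
print(f"relative to the 2011 domestic phone feed: {scale_factor:.1f}x")
```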

But after Snowden disclosed the Verizon bulk phone records order in June 2013, the American public became aware of the actual scope of this program and it became the most controversial part of NSA’s activities. 

In January 2014, the Privacy and Civil Liberties Oversight Board (PCLOB) judged that Section 215 collection was actually of “minimal value in safeguarding the nation from terrorism” and that there was “no instance in which the program directly contributed to the discovery of a previously unknown terrorist plot or the disruption of a terrorist attack”.

According to PCLOB, the bulk phone records did provide some value “by offering additional leads regarding the contacts of terrorism suspects already known to investigators, and by demonstrating that foreign terrorist plots do not have a U.S. nexus”. This however, was not seen as a sufficient justification for the large-scale collection of domestic phone records.

In the course of 2015, US Congress eventually enacted the USA FREEDOM Act, which prohibits NSA from collecting and storing domestic call records in bulk as of November 29, 2015. Instead, the agency now has to apply for a warrant from the FISA Court approving specific selectors, which are then provided to telecommunication providers, who use them to query their own databases; only the results are handed over to NSA.

How this new regime will work out is explained in the USA FREEDOM Act Business Records FISA Implementation Transparency Report (.pdf), which was published just a few days ago.

> Next time: a closer look at the contact chaining process

