4 April 2018

Why (almost) everything reported about the Cambridge Analytica Facebook ‘hacking’ controversy is wrong


If you follow the Guardian or the New York Times, or any major news network, you are likely to have noticed that a company called Cambridge Analytica have been in the headlines a lot. The basic story as reported is as follows: a shady UK data analytics company, with the help of a 24-year-old tech genius, developed an innovative technique to ‘hack’ Facebook and steal 50 million user profiles. Then they used this data to help the Trump and Brexit campaigns psychologically manipulate voters through targeted ads. The result was that Vote Leave ‘won’ the UK’s Brexit referendum and Trump was elected president in the US.

Unfortunately, almost everything in the above summary is false or misleading.


First, there was no hack.

The data collected was scraped from Facebook user profiles, after users granted permission for a third party app to access their data. You know those little confirmation windows that pop up when someone wants to play Candy Crush or use Facebook to log in, rather than make a new password, for a random site? Yeah those.


A Cambridge academic called Aleksandr Kogan — NOT Cambridge Analytica and NOT the whistleblower Christopher Wylie — made a ‘Test Your Personality’ app, helped to promote it by paying people $2–4 to install it on Amazon’s Mechanical Turk crowdsourcing site, and used the permissions granted to harvest profile data. 270,000 users installed the app, so you might expect that 270,000 profiles were collected but the app actually collected data from 50 million profiles.

50 million?!?

Yes. You see, back in the heady days of 2014, Facebook had a feature called ‘friends permission’ that allowed developers to access the profiles of not only the person who installed their app but all their friends too. The only way to prevent this from happening was to have toggled a privacy setting, which few Facebook users even knew existed (here is a blog from 2012 explaining how to do so). The friends permission feature is how Kogan multiplied 270,000 permissions into 50 million profiles’ worth of data.
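As a back-of-the-envelope check on that multiplication, the two figures quoted above imply roughly how many profiles each install must have yielded. This is only a rough sketch: it ignores overlapping friend lists and friends who had opted out, and the function name is made up for illustration.

```python
def profiles_per_install(total_profiles: int, installs: int) -> float:
    """Average number of profiles collected per app install.

    Rough estimate only: it ignores overlap between friend lists and
    users who had switched the friends permission off.
    """
    return total_profiles / installs


# Figures as reported in the coverage discussed above.
INSTALLS = 270_000
PROFILES = 50_000_000

print(f"about {profiles_per_install(PROFILES, INSTALLS):.0f} profiles per install")
```

In other words, each install only needed to expose a couple of hundred friends for the totals to add up, which is an ordinary-sized friend list rather than anything exotic.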

That Facebook users were having their data shared by their friends without their knowledge or permission was a serious concern that many privacy advocates noted at the time. So in 2015, facing growing criticism and pressure, Facebook removed the feature, citing a desire to give their users “more control”. This decision caused consternation amongst developers as the ability to access friends’ profiles was extremely popular (see the comments under this 2014 post from Facebook announcing the changes). Sandy Parakilas, an ex-Facebook manager, reported to Bloomberg that “tens or maybe even hundreds of thousands of developers” were making use of the feature before it was discontinued.


To review, there are two key points to remember at this point:
None of what I just described involves ‘hacking’ Facebook or exploiting a bug. Instead, it all revolves around the use of a feature that Facebook provided to all developers and that (at least) tens of thousands took advantage of.
The data collected was not internal Facebook data. It was data that developers (s̵c̵r̵a̵p̵e̵d̵ ) accessed* from the profiles of people who downloaded their apps (and their friends). Facebook has a lot more data on users than is publicly available and it has it for everyone who uses their platform. No-one but Facebook has access to that data. This is a point that almost all the journalists involved seem unable to grasp; instead they repeatedly equate ‘Facebook’s internal data’ to ‘data (s̵c̵r̵a̵p̵e̵d̵ )accessed from Facebook profiles using a third party app’. But these are VERY different things.
(*changed term as per convincing suggestion provided in the responses.)

The importance of this second point becomes apparent when you read exchanges like this one:
Simon Milner, Facebook’s UK policy director, when asked if Cambridge Analytica had Facebook data, told MPs: “No. They may have lots of data, but it will not be Facebook user data. It may be data about people who are on Facebook that they have gathered themselves, but it is not data that we have provided.”

This exchange is being reported as evidence that Facebook lied to politicians about its relationship with Cambridge Analytica. But when you understand the difference between Facebook’s internal data and data collected on Facebook by outside developers it is clear that what Facebook’s policy director is saying is very likely true.

So where does Cambridge Analytica come into the story?

Well, they paid Kogan to collect those 50 million profiles. Whose idea that was originally is currently a matter of ‘he said, she said’. Kogan says Cambridge Analytica approached him and Cambridge Analytica says Kogan came to them. Whatever the case may be, this is the part of the story where there was an actual breach; not of Facebook’s internal data but of Facebook’s data sharing policies. Developers were permitted to collect all the user data they wanted from their apps, but what they were not allowed to do — even back in 2014 — was take that data and sell it to a third party.

And yet, regardless of Facebook’s official policies, it seems that they did not expend much effort to police their developers or track how the data they collected was being used. This is likely why, when Facebook first uncovered that Kogan had sold some data to Cambridge Analytica in 2015, they were content to receive written confirmation from both that the data had been deleted.

The fact that there were (at minimum) tens of thousands of developers with access to such information meant that it was inevitable that data harvested on Facebook was being sold, or otherwise provided, to a wide array of third parties. Again, the disgruntled ex-Facebook manager confirmed as much:
Asked what kind of control Facebook had over the data given to outside developers, he replied: “Zero. Absolutely none. Once the data left Facebook servers there was not any control, and there was no insight into what was going on.” Parakilas said he “always assumed there was something of a black market” for Facebook data that had been passed to external developers.

So given how prevalent Facebook data harvesting was and that there are many developers with more than 270,000 users to harvest from, why is Cambridge Analytica receiving so much media attention?

The answer to this seems to primarily be how journalists, particularly Carole Cadwalladr at the Observer, have framed the story. The majority of coverage has pushed two angles. First, that a whistleblower from Cambridge Analytica revealed ‘a major breach’ of Facebook’s data, an issue covered above, and second, that this ‘breach’ was linked to the success of Trump’s presidential campaign.

Chris Wylie the mastermind who ‘hacked’ Facebook…

This second angle is as dubious as the first and relies heavily on bombastic claims made by Chris Wylie, the pink-haired ex-Cambridge Analytica employee pictured above. Carole Cadwalladr, who spent years on the story, has explained in various interviews that she approached the story not as an investigative journalist but as a features writer. This meant that she focused on delving into ‘the human side of the story’ or, put another way, Chris Wylie. There are pros and cons to such an approach, but the biggest drawback is how invested in, and reliant on, Wylie’s narrative it made her and the subsequent coverage. That narrative just so happens to paint him as a young mastermind at the center of global political conspiracies.

Cadwalladr fully endorses Wylie’s presentation and fawningly describes him as: “clever, funny, bitchy, profound, intellectually ravenous, compelling” … “impossibly young” … “His career trajectory has been, like most aspects of his life so far, extraordinary, preposterous, implausible” … “Wylie lives for ideas. He speaks 19 to the dozen for hours at a time” … “when Wylie turns the full force of his attention to something — his strategic brain, his attention to detail, his ability to plan 12 moves ahead — it is sometimes slightly terrifying to behold” … “his suite of extraordinary talents include the kind of high-level political skills that makes House of Cards look like The Great British Bake Off.”

Wow… what a guy.

Cadwalladr’s person-focused approach might make for more accessible articles but it also helps to obscure the relevant technical details in favour of providing sensationalist quotes and personal anecdotes from Wylie and his friends and coworkers. Presenting these kinds of details could be insightful if they were subjected to sufficient critical examination, but this rarely occurs. Cadwalladr, instead, seems to have entirely bought into Wylie’s narrative: “by the time I met him in person, I’d already been talking to him on a daily basis for hours at a time.”

So let’s address the oversight and take a bit more of a critical look at what Wylie’s narrative claims:
That Steve Bannon wanted to weaponize big data… No difficulty believing.
That Cambridge Analytica claims to be able to provide effective tools for psychological targeting and manipulation… Certainly true.
That Chris Wylie, himself, was involved with some shady business and views himself as partly responsible… Sure.
That the self-promotional claims of Cambridge Analytica actually equate to how effective the services they provide are… Hmmmm.

This last point is the most important and yet it is also the one lacking almost any supporting evidence.

The temptation might be to point to Trump’s surprising victory but there are a lot of confounding factors there. Trump won, yes. But he won against the most unpopular Democratic candidate in modern history, who was vying for a third Democratic term (something which had not been achieved since the 1940s). Furthermore, he won by a very slim margin and actually lost the popular vote.

Alexander Nix, CEO of Cambridge Analytica standing in front of lots of impressive graphs!

Could all that just be evidence of how precise Cambridge Analytica’s psychological targeting was? Maybe, but we start to run into the perils of dealing with an unfalsifiable hypothesis. A better approach would be to look at Cambridge Analytica’s relative record of success and failure. Unfortunately, we do not have access to their full client list but we do know that when they first rose to prominence they were working for the Ted Cruz presidential campaign. That’s right, Ted Cruz: the Republican senator who was crushed by Trump in the Republican primaries, despite having the power of Cambridge Analytica at his command. I am not the first to notice this apparent contradiction; Martin Robbins made the same point on Little Atoms last year:
So the story of the Republican primaries is actually that Cambridge Analytica’s flashy data science team got beaten by a dude with a thousand-dollar website. To turn that into this breathtaking story of an unbeatable voodoo-science outfit, powering Trump inexorably to victory, is quite a stretch. Who else have they even worked for? Without a list of clients it’s very easy to cherry-pick the winners.

The techniques that Cambridge Analytica purport to use involve using social network data to build algorithms that can accurately predict what kind of messages will be effective given an individual’s personality and psychology. This is what the stories mean when they talk about using psychographics to micro-target voters. But a lot of the claims being made about the effectiveness of such techniques are wildly exaggerated. Kogan, the Cambridge academic at the heart of the controversy, has made similar arguments. He claims that he is being scapegoated and argues that the personality profiles he gathered turned out to not be particularly useful for making the predictions needed for micro-targeting:
“In fact, from our subsequent research on the topic,” he wrote, “we found out that the predictions we gave SCL were 6 times more likely to get all 5 of a person’s personality traits wrong as it was to get them all correct. In short, even if the data was used by a campaign for micro-targeting, it could realistically only hurt their efforts.”

Kogan is hardly an impartial source but his claim accords with various studies that have shown less than stellar results for nefarious social media manipulation. Take, for instance, the controversial Facebook ‘mind control’ study, which I’ve heard several journalists reference in recent days. What always seems to be missing from reporting on this study is just how underwhelming it was.

Facebook ran an experiment on almost 689,000 users in which it tweaked the algorithm running their news feed to display slightly more or slightly fewer status updates from friends that contained positive or negative words. As any researcher knows, with such a large sample you are all but guaranteed to find statistically significant differences between groups. A more important criterion with such massive groups is how large the observed effect was. In the Facebook study this equated to a truly terrifying difference: those who saw fewer negative updates used around 0.05 more positive words out of every 100 words in their status updates, whereas those who saw fewer positive updates used around 1 less positive word per 100 in their status updates. That’s right: Facebook might have been able to manipulate people to use around 1 less positive word for every 100 words in their updates. It would be wrong to paint this as Facebook being powerless (bigger interventions would have bigger effects), but it is important to keep things in perspective.
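To make the gap between ‘statistically significant’ and ‘large’ concrete, here is a minimal sketch using a textbook two-sample z-test. The 0.05 difference is the figure quoted above; the standard deviation and the group sizes are purely illustrative assumptions of mine, not numbers taken from the study.

```python
import math


def two_sample_z(diff, sd, n_per_group):
    """Return (z, two-sided p, Cohen's d) for a difference in means
    between two equal-sized groups that share the same standard deviation."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = diff / standard_error
    # Two-sided p-value from the normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Cohen's d: the difference expressed in standard-deviation units.
    cohens_d = diff / sd
    return z, p_value, cohens_d


DIFF = 0.05        # extra positive words per 100 words (figure quoted above)
ASSUMED_SD = 5.0   # hypothetical spread between users, for illustration only

# Group sizes are illustrative assumptions: a small study versus a Facebook-sized one.
for n in (100, 170_000):
    z, p, d = two_sample_z(DIFF, ASSUMED_SD, n)
    print(f"n per group = {n:>7,}: z = {z:.2f}, p = {p:.4f}, Cohen's d = {d:.2f}")
```

Under these assumptions the same difference is nowhere near significant with 100 people per group and comfortably significant with 170,000, while the effect size stays exactly as tiny in both cases. The p-value tells you about the sample size as much as about the effect.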

Note the starting point of the y-axis. There is a reason it isn’t 0.

The real story then is not that Kogan, Wylie, and Cambridge Analytica developed some incredibly high tech ‘hack’ of Facebook. It is that, aside from Kogan’s data selling, they used methods that were commonplace and permitted by Facebook prior to 2015. Cambridge Analytica has, since the story broke, been outed as a rather obnoxious, unethical company, at least in how it promotes itself to potential clients. But the majority of what is being reported in the media about its manipulative power is just an uncritical regurgitation of Cambridge Analytica’s (and Chris Wylie’s) self-promotional claims. The problem is that there is little evidence that the company can do what it claims and plenty of evidence that it is not as effective as it likes to pretend; see the fact that Ted Cruz is not currently president.

No one is totally immune to marketing or political messaging but there is little evidence that Cambridge Analytica is better than other similar PR or political canvassing companies at targeting voters. Political targeting and disinformation campaigns, including those promoted by Russia, certainly had an impact on recent elections but were they the critical factor? Did they have a bigger impact than Comey announcing he was ‘reopening’ the Hillary email investigation the week before the US election? Or Brexiteers claiming that £350 million was being stolen from the NHS by the EU every week? Colour me skeptical.

To be crystal clear, I’m not arguing that Cambridge Analytica and Kogan were innocent. At the very least, it is clear they were doing things that were contrary to Facebook’s data sharing policies. And similarly Facebook seems to have been altogether too cavalier with permitting developers to access its users’ private data.

What I am arguing is that Cambridge Analytica are not the puppet masters they are being widely portrayed as. If anything they are much more akin to Donald Trump: making wildly exaggerated claims about their abilities and getting lots of attention as a result.
