25 October 2018

No, A.I. Won’t Solve the Fake News Problem

By Gary Marcus and Ernest Davis

In his testimony before Congress this year, Mark Zuckerberg, the chief executive of Facebook, addressed concerns about the strategically disseminated misinformation known as fake news that may have affected the outcome of the 2016 presidential election. Have no fear, he assured Congress, a solution was on its way — if not next year, then at least “over a five- to 10-year period.” The solution? Artificial intelligence.

Mr. Zuckerberg’s vision, which the committee members seemed to accept, was that soon enough, Facebook’s A.I. programs would be able to detect fake news, distinguishing it from more reliable information on the platform. With midterms approaching, along with the worrisome prospect that fake news could once again influence our elections, we wish we could say we share Mr. Zuckerberg’s optimism. But in the near term we don’t find his vision plausible.

Decades from now, it may be possible to automate the detection of fake news. But doing so would require a number of major advances in A.I., taking us far beyond what has so far been invented.

As Mr. Zuckerberg has acknowledged, today’s A.I. operates at the “keyword” level, flagging word patterns and looking for statistical correlations among them and their sources. This can be somewhat useful: Statistically speaking, certain patterns of language may indeed be associated with dubious stories. For instance, for a long period, most articles that included the words “Brad,” “Angelina” and “divorce” turned out to be unreliable tabloid fare. Likewise, certain sources may be associated with greater or lesser degrees of factual veracity. The same account deserves more credence if it appears in The Wall Street Journal than in The National Enquirer.
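To make this concrete, here is a minimal sketch, in Python, of the kind of keyword-and-source scoring we have in mind. The keyword list and the source reliability figures are invented for illustration; a real system would estimate such weights statistically from large volumes of labeled stories.

    # A minimal sketch of keyword-level scoring of the sort described above.
    # The keyword set and source priors are invented for illustration; a
    # deployed system would learn them from data.
    SOURCE_PRIOR = {
        "wsj.com": 0.9,              # historically reliable
        "nationalenquirer.com": 0.2, # historically unreliable
    }

    SUSPECT_KEYWORDS = {"brad", "angelina", "divorce", "shocking", "exposed"}

    def suspicion_score(text: str, source: str) -> float:
        """Crude score in [0, 1]: higher means more likely dubious."""
        words = set(text.lower().split())
        keyword_hits = len(words & SUSPECT_KEYWORDS)
        keyword_signal = min(keyword_hits / 3, 1.0)  # saturate after 3 hits
        source_signal = 1.0 - SOURCE_PRIOR.get(source, 0.5)
        return 0.5 * keyword_signal + 0.5 * source_signal

    print(suspicion_score("Brad and Angelina head for divorce",
                          "nationalenquirer.com"))  # 0.9: flagged as dubious

Note the design choice, because it is the whole problem: every signal here is a surface statistic, with no representation of what the sentence actually asserts.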

But none of these kinds of correlations reliably sort the true from the false. In the end, Brad Pitt and Angelina Jolie did get divorced. Keyword associations that might help you one day can fool you the next.

To get a handle on what automated fake-news detection would require, consider an article posted in May on the far-right website WorldNetDaily, or WND. The article reported that a decision to admit girls, gays and lesbians to the Boy Scouts had led to a requirement that condoms be available at its “global gathering.” A key passage consists of the following four sentences:

The Boy Scouts have decided to accept people who identify as gay and lesbian among their ranks. And girls are welcome now, too, into the iconic organization, which has renamed itself Scouts BSA. So what’s next? A mandate that condoms be made available to ‘all participants’ of its global gathering.

Was this account true or false? Investigators at the fact-checking site Snopes determined that the report was “mostly false.” But determining how it went astray is a subtle business beyond the dreams of even the best current A.I.

First of all, there is no telltale set of phrases. “Boy Scouts” and “gay and lesbian,” for example, have appeared together in many true reports before. Then there is the source: WND, though notorious for promoting conspiracy theories, publishes and aggregates legitimate news as well. Finally, taken sentence by sentence, much of the passage is factually accurate: Condoms have indeed been available at the global gathering that scouts attend, and the Boy Scouts organization has indeed come to accept girls as well as gays and lesbians into its ranks.

What makes the article “mostly false” is that it implies a causal connection that doesn’t exist. It strongly suggests that the inclusion of gays and lesbians and girls led to the condom policy (“So what’s next?”). But in truth, the condom policy originated in 1992 (or even earlier) and so had nothing to do with the inclusion of gays, lesbians or girls, which happened over just the past few years.

Causal relationships are where contemporary machine learning techniques start to stumble. In order to flag the WND article as deceptive, an A.I. program would have to understand the causal implication of “what’s next?,” recognize that the account implies that the condom policy was changed recently and know to search for information that is not supplied about when the various policies were introduced.

Understanding the significance of the passage would also require understanding multiple viewpoints. From the perspective of the international organization for scouts, making condoms available at a global gathering of 30,000 to 40,000 hormone-laden adolescents is a prudent public health measure. From the point of view of WND, the availability of condoms, like the admission of girls, gays and lesbians to the Boy Scouts, is a sign that a hallowed institution has been corrupted.

We are not aware of any A.I. system or prototype that can sort among the various facts involved in those four sentences, let alone discern the relevant implicit attitudes.

Most current A.I. systems that process language are oriented around a different set of problems. Translation programs, for example, are primarily interested in a problem of correspondence — which French phrase, say, is the best parallel of a given English phrase? But determining that someone is implying, by a kind of moral logic, that the Boy Scouts’ policy of inclusion led to condoms being supplied to scouts isn’t a simple matter of checking a claim against a database of facts.
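A toy example shows why. Suppose, hypothetically, that a fact checker had a database covering the relevant ground. Every atomic claim in the WND passage would check out, while the misleading part never surfaces as a claim to look up at all. (The entries below are invented for illustration.)

    from typing import Optional

    # Hypothetical fact store: each atomic claim in the WND passage is
    # individually true.
    FACT_DB = {
        "boy scouts accept gay and lesbian members": True,
        "boy scouts accept girls": True,
        "condoms available at the global scout gathering": True,
    }

    def check_claim(claim: str) -> Optional[bool]:
        """Return the stored truth value, or None if the claim is unknown."""
        return FACT_DB.get(claim.lower())

    explicit_claims = [
        "Boy Scouts accept gay and lesbian members",
        "Boy Scouts accept girls",
        "Condoms available at the global scout gathering",
    ]
    print(all(check_claim(c) for c in explicit_claims))  # True: it all checks out

    # The deceptive part, that inclusion led to the condom policy, is an
    # implication rather than a stated sentence, so it never reaches the
    # database at all.
    print(check_claim("inclusion caused the condom policy"))  # None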

Existing A.I. systems that have been built to comprehend news accounts are extremely limited. Such a system might be able to look at the passage from the WND article and answer a question whose answer is given directly and explicitly in the story (e.g., “Does the Boy Scouts organization accept people who identify as gay and lesbian?”). But such systems rarely go much further, lacking a robust mechanism for drawing inferences or a way of connecting to a body of broader knowledge. As Eduardo Ariño de la Rubia, a data scientist at Facebook, told us, for now “A.I. cannot fundamentally tell what’s true or false — this is a skill much better suited to humans.”
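For illustration, here is a self-contained sketch of that sort of extractive question answering, reduced to its essence: return whichever passage sentence shares the most words with the question. It handles questions whose answers are stated verbatim, and it has no mechanism at all for reasoning about timing or causation.

    # Extractive "reading comprehension" at its crudest: answer a question
    # by returning the passage sentence with the greatest word overlap.
    PASSAGE = [
        "The Boy Scouts have decided to accept people who identify as gay "
        "and lesbian among their ranks.",
        "And girls are welcome now, too, into the iconic organization, "
        "which has renamed itself Scouts BSA.",
        "A mandate that condoms be made available to all participants of "
        "its global gathering.",
    ]

    def answer(question: str) -> str:
        q_words = set(question.lower().split())
        return max(PASSAGE, key=lambda s: len(q_words & set(s.lower().split())))

    # Succeeds when the answer is stated explicitly:
    print(answer("Does the organization accept people who identify as gay and lesbian?"))

    # Returns an irrelevant sentence for the question that actually matters,
    # since nothing in the passage states when the condom policy began:
    print(answer("Did admitting girls cause the condom policy?"))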

To get to where Mr. Zuckerberg wants to go will require the development of a fundamentally new A.I. paradigm, one in which the goal is not to detect statistical trends but to uncover ideas and the relations between them. Only then will such promises about A.I. become reality, rather than science fiction.
