
23 March 2024

How to better study—and then improve—today’s corrupted information environment

Sean Norton, Jacob N. Shapiro

Social media has been a connector of people near and far, but it has also fueled political conflict, threatened democratic processes, contributed to the spread of public health misinformation, and likely damaged the mental health of some teenagers. Given what’s come to light about these platforms over the last several years, it is increasingly clear that current guardrails—both government regulations and the companies’ internal policies—aren’t sufficient to address the issues plaguing the information environment. But for democracies and their citizens to thrive, a healthy virtual ecosystem is necessary.

To get there, experts need an international effort to link policymakers to research by gathering, summarizing, and distilling relevant research streams. Two such initiatives, the International Panel on the Information Environment and the proposed International Observatory on Information and Democracy, have begun working towards that goal. Both are inspired by the Intergovernmental Panel on Climate Change (IPCC), a multinational organization that elects a scientific bureau to conduct evaluations of climate research and create policy recommendations. Since its founding in 1988, the IPCC has firmly established the anthropogenic origin of climate change and provided policy recommendations that formed the basis of two major international agreements, the Kyoto Protocol of 1997 and the Paris Agreement of 2015. Policymakers and researchers have called for similarly structured efforts to create research-informed, globally coordinated policies on the information environment.

For such efforts to work, though, they have to be able to draw on a well-developed research base. The IPCC’s first report, written from 1988 to 1990, capitalized on decades of standardized measurements and research infrastructure, including atmospheric carbon dioxide monitoring, sophisticated measurements from weather balloons and meteorological satellites, and 16 years of satellite imagery of the Earth’s surface.

Experts simply don’t have that kind of depth of evidence on the information environment. There is little consilience in theoretical arguments about how this ecosystem works, and standardized measurements and research tooling are nearly non-existent. For an IPCC-style body on the information environment to reach its full potential, governments and other entities need to make substantial investments in data access, standardized measurements, and research tooling.

To examine what those investments should look like, it’s helpful to outline the current state of research on the information environment and the challenges of performing high-quality science under the current status quo.

The state of information environment research. Our team surveyed work published from 2017 to 2021 in 10 leading communications, economics, political science, computer science, and sociology journals, plus six major general-interest science publications. We found that research is concentrated on two major social media platforms: Facebook and Twitter (now X).

Researchers could provide more reliable policy guidance if they were able to characterize the entire information ecosystem. In our sample of relevant academic work, however, 49 percent of the papers used Twitter data exclusively, and 59 percent used it in some form, despite the platform’s relatively small base of 436 million monthly active users in 2021. Major platforms with large global user bases—including YouTube, WeChat, and Telegram—remain critically understudied. While there were approximately 22.4 and 1.4 published papers per 100 million active users for Twitter and Facebook respectively, YouTube, WeChat, and Telegram attracted far less research despite their substantial user bases: some 2.3 billion people actively use YouTube each month, while WeChat and Telegram have 1.3 billion and 550 million monthly active users respectively.

Beyond limited platform coverage, existing research is also geographically and linguistically narrow. Sixty-five percent of papers analyze only Western democracies (the United States, EU countries, the United Kingdom, Australia, or New Zealand), and more than half of those exclusively study the United States. Additionally, 60 percent of papers analyze only English-language data. This means the most populated regions of the world are the least studied, indicating a severe need to enable research in and on Latin America, Asia, and Africa. In numerical terms, our sample contained 27.22 papers per 100 million population focused exclusively on the United States, the European Union, and Oceania (mostly Australia and New Zealand), while the entire rest of the world was represented by only 1.39 papers per 100 million inhabitants.
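Both the per-user and per-population figures come from the same simple normalization. The sketch below reproduces it in Python; the user and population bases are the figures cited above, while the raw paper counts are back-solved from the reported rates and are therefore rough approximations, not numbers taken from the underlying survey.

```python
# A minimal sketch of the coverage normalization used above: papers per
# 100 million users or inhabitants. The MAU and population bases are the
# figures cited in the text; the raw paper counts are back-solved from
# the reported rates, so they are approximations, not survey numbers.

def per_100_million(n_papers: float, base: float) -> float:
    """Papers per 100 million users (or inhabitants)."""
    return n_papers / (base / 100e6)

# Platform coverage, using 2021 monthly active users.
mau = {"Twitter": 436e6, "YouTube": 2.3e9, "WeChat": 1.3e9, "Telegram": 550e6}
# ~22.4 papers per 100M users for Twitter implies roughly 98 papers:
print(per_100_million(98, mau["Twitter"]))   # ~22.5
# Even 20 hypothetical YouTube papers would yield a rate ~25x lower:
print(per_100_million(20, mau["YouTube"]))   # ~0.87

# Geographic coverage: US + EU + Oceania (~0.8B people) vs. the rest of
# the world (~7.1B), with paper counts back-solved from the reported rates.
print(per_100_million(218, 0.8e9))           # ~27.2
print(per_100_million(99, 7.1e9))            # ~1.39
```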

Making global-scale policy recommendations requires some degree of normal science: settled, foundational results that can shape the focused inquiry and experiments necessary to develop prescriptive remedies to online harms. As an example, researchers have argued that “pre-bunking” interventions—which range from warnings that a post or link may contain misinformation to games designed to teach players how to detect misinformation in the wild—are effective at reducing belief in misinformation. More recent research argues that this effect is driven not by an increased ability to identify misinformation, but rather by increased distrust of all information, including facts. In a paper published last year, the authors argue that such conflicting results reflect a larger issue in the information environment literature: a lack of scholarly consensus on how to design, test, and measure the impact of interventions. This lack of consensus on research design and evaluation is pervasive throughout information environment scholarship, making it difficult to draw the sort of general conclusions from the literature necessary to create high-quality policy recommendations.

Combined, the lack of geographic coverage, limited research on most platforms, and absence of foundational results mean this field is far from being able to provide the kinds of highly reliable, global-scale evidence the IPCC relied on for its first assessment report.

Current research roadblocks. As part of a joint initiative between Princeton University and the Carnegie Endowment for International Peace, our team interviewed 48 academic and civil society researchers from more than 20 countries to determine what factors slow or reduce the scope of information environment research. (We have since engaged with almost 200 more researchers.) The researchers’ top complaint was a lack of sufficient data access. When they did have access to data, they reported that a lack of technically skilled personnel or big-data-capable infrastructure—tools for data collection, secure storage, and especially analysis—made it difficult to use these data effectively. The conclusion is clear: To maximize the policy benefit of IPCC-like institutions, researchers need to go beyond enabling data access and make major investments in research tooling that helps scientists make use of data and advance the field.

Research runs on data, which has become less available even as the information environment field has advanced. Last June, X (formerly Twitter) started charging researchers $42,000 per month for access to its data stream, despite providing less data than a previous free version of the interface. Meta made vast quantities of previously available public data unavailable after the Cambridge Analytica scandal of 2018 and has said it will stop updating CrowdTangle, a Meta-provided tool that gives researchers access to most public activity on Facebook and a substantial share of public activity on Instagram.

Of course, data access is about more than just being able to download the information; it’s also about making it functionally accessible to a broader range of researchers. Right now, large quantities of data are useful only to a small group of well-funded, highly technically skilled researchers. Our interviews revealed that scholars faced a range of issues that limited their ability to actually analyze data, including the inability to hire data scientists and engineers, the need to lean heavily on early-career students to solve technical problems (to the detriment of those students’ own research agendas), and difficulty implementing the video, image, and network analysis methods required to move beyond text to a fuller picture of social media data.

These barriers are reflected in the research base: Most papers in our sample are small-to-medium-scale, text-only analyses that rarely utilize cutting-edge machine learning or statistical methods. And researchers reported that they frequently reduced the scope of, or entirely abandoned, projects due to lack of data access or the inability to create the needed research tools. Of 169 information environment papers from 2017 to 2021 that analyzed user-generated content, 71 percent utilized text data, 13 percent images, and 8 percent video, with the remainder using URLs or audio. Fifty-three percent of all papers used straightforward regression analysis, and 42 percent used only basic descriptive statistics. Only 23 percent of all papers used machine learning methods, and more strikingly, only 19 percent used network analysis, despite the key role networks play in behavior on social media.
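As a back-of-the-envelope check, those shares translate into rough paper counts as follows. This is an illustrative conversion that assumes all shares apply to the same 169-paper subset; the method categories are not mutually exclusive, so they need not sum to 100 percent.

```python
# Illustrative conversion of the reported shares into approximate paper
# counts, assuming every share applies to the 169-paper subset (the
# text's "all papers" may refer to a slightly different base). Method
# categories can overlap, so those shares need not sum to 100 percent.
N = 169
shares = {
    "text data": 0.71, "images": 0.13, "video": 0.08, "URLs or audio": 0.08,
    "regression analysis": 0.53, "descriptive statistics only": 0.42,
    "machine learning": 0.23, "network analysis": 0.19,
}
for category, share in shares.items():
    print(f"{category:>27}: ~{round(share * N)} papers")
```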

These barriers also significantly influence who produces research on the information environment. Researchers Darren L. Linvill and Patrick L. Warren found that a handful of well-funded, large non-profit and academic research centers produced the overwhelming majority of public-facing reports on disinformation and influence campaigns through mid-2022. In the peer-reviewed research world, large and well-supported research centers such as NYU’s Center for the Study of Social Media and Politics (CSMaP) and Indiana University’s Observatory on Social Media (OSoMe) pursue multiple projects simultaneously, many of which rely on common resources (such as CSMaP’s long-standing panel of thousands of American Twitter users). The outsized productivity of research groups that have achieved sufficient scale to hire technically skilled support staff demonstrates the value of investing in shared infrastructure and tooling. But such groups are still few, struggle to raise funds to provide broader public goods, and cannot alone produce research at the rapid pace needed to improve the information environment research base.

Recent developments in data access are promising, most notably the European Union’s Digital Services Act, which will require large social media companies to provide vetted researchers with access to data for studies of systemic risks to the European Union. As things stand, however, data access alone is unlikely to advance research at the scale and speed necessary to maximize the potential of IPCC-like bodies. Most research groups are at capacity with the data they can already access. Accelerating the maturation of the information environment research base requires finding creative ways to help researchers more efficiently turn data into knowledge. Researchers should follow the example of climate science and create common scientific resources and processes, such as baseline datasets and measurements; peer-reviewed access to high-powered data processing facilities (following the time-allocation model of large telescopes and particle accelerators); and infrastructure grants for the development and maintenance of critical computational tools.

Efforts such as the International Panel on the Information Environment and the International Observatory on Information and Democracy should be lauded for their forward-looking vision and their goal of translating research to support policies that secure the global information environment. For them to reach their full potential, research capacity must grow, in part by building larger research institutions that realize economies of scale in studying the information environment. Managing the information commons is inherently difficult, because so many competing values are in play.

Beyond obvious tensions between safety and free speech, there are deep questions about how to construct commons that strike the right balance: allowing democracies to set standards without giving autocracies an excuse to suppress dissent, and encouraging creative activity without helping malign actors profit from spreading clear misinformation. All of this is made more complicated by the furious pace of development in AI-enabled content generation tools. Unless researchers fix how they study this environment, the scientific community will be increasingly left behind as civil society and governments struggle to realize the potential, and minimize the perils, of online spaces.
