22 April 2023

Large language models: fast proliferation and budding international competition


The capabilities of large language models have improved significantly in recent years and have come to broader public attention with the release of OpenAI’s ChatGPT in November 2022. The sudden general interest in these models includes attention from potentially malicious actors, who may seek to misuse them, and attention from policymakers, who are now increasingly invested in national competition over language-model development and securing access to the massive amount of computing power needed to support innovation in this domain.


On 30 November 2022, the American technology company OpenAI released ChatGPT, an artificial-intelligence (AI) chatbot built on a large language model that allows the public to converse with it in natural language. The service, which mimics human language, attracted over 100 million active users by January 2023, becoming the fastest software application ever to reach that milestone. Many users, however, noted that the model – like others such as Microsoft's Bing chatbot, built on an updated version of OpenAI's software – sometimes produced outputs that were offensive, shocking or politically charged. These outputs underscored a fact that researchers have long emphasised: the language-modelling subfield of AI is susceptible to use and abuse by propagandists and other malicious actors.

OpenAI researchers trained ChatGPT's underlying model – initially a refined version of Generative Pre-trained Transformer 3 (GPT-3), updated to GPT-4 on 14 March 2023 – using massive amounts of internet text, with the goal of probabilistically predicting the next word in any sequence of text. The fact that these models are essentially next-word predictors means that they mimic content acquired from training data even when that content is biased, untrue or harmful. And yet, despite being trained on this simple task, many language models have demonstrated novel capabilities such as writing computer code, translating between languages and even distinguishing between legal and illegal moves in chess. Meanwhile, the capabilities that as-yet-unreleased models might acquire – and the social and political ramifications thereof – remain unclear. The sudden general interest in these models includes attention from potentially malicious actors, who may seek to misuse them, and attention from policymakers, who are now increasingly invested in national competition over language-model development and securing access to the massive amount of computing power needed to support innovation in this domain.
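
To make the mechanics concrete, the sketch below uses the small, openly released GPT-2 model and the Hugging Face transformers library to show next-word prediction in action: given a prompt, the model assigns a probability to every token in its vocabulary as the candidate continuation. This illustrates the general principle only; the prompt is arbitrary, and the sketch is not a description of how ChatGPT or GPT-4 is actually implemented or served.

```python
# Minimal sketch of next-word prediction with the openly available GPT-2 model.
# Illustrative only: the prompt is arbitrary and this is not how ChatGPT itself
# is built or served.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # one score per vocabulary token, per position

# Probability distribution over the token that follows the prompt.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Print the five most probable continuations.
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {prob.item():.3f}")
```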

Model proliferation

The capabilities of large language models have improved significantly in recent years, with OpenAI's creation of GPT-3 in 2020 serving as an important milestone. These improvements have been due to the creation of larger, more versatile model architectures, as well as to increases in dataset sizes and in the amounts technology firms spend on the computational power used to train the models. Empirical studies suggest that there are strong relationships between the dataset size, computing budget and parameter count of a given model and that, in practice, computing budgets are the strongest constraint on model improvements.
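
A rough sense of why computing budgets bind so tightly can be had from the widely cited rule of thumb that training compute is approximately six floating-point operations per parameter per training token. The sketch below applies that approximation with GPT-3-scale figures drawn from public reporting; the parameter count, token count and assumed sustained throughput are illustrative, not any laboratory's actual numbers.

```python
# Back-of-the-envelope sketch of training compute, using the common
# approximation: training FLOPs ~= 6 x parameters x training tokens.
# All figures below are illustrative assumptions at GPT-3 scale.

def training_flops(parameters: float, tokens: float) -> float:
    return 6 * parameters * tokens

params = 175e9   # 175 billion parameters
tokens = 300e9   # roughly the reported GPT-3 training-token count

flops = training_flops(params, tokens)
print(f"Estimated training compute: {flops:.2e} FLOPs")  # ~3.2e23 FLOPs

# At an assumed sustained throughput of 1e15 FLOP/s (one petaFLOP per second),
# that is roughly ten years of compute on a single device, which is why
# training is spread across large clusters of accelerators.
seconds = flops / 1e15
print(f"~{seconds / (365 * 86400):.1f} device-years at 1 petaFLOP/s sustained")
```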

In addition, more actors are developing large language models, with proliferation occurring along three dimensions. Firstly, as Figure 1 shows, research on large language models is primarily occurring in the United States, but researchers in other countries – notably China, in addition to specific research institutes elsewhere, such as DeepMind in the United Kingdom – have invested significant resources into building their own models. Secondly, the types of institutions developing language models have expanded to include both large technology firms, such as Google and Microsoft, and decentralised researcher collectives. And thirdly, developers have explored an increasingly wide range of release strategies, including withholding models from release (Google's strategy until the limited release of Bard on 21 March 2023), restricting access to outputs behind an application programming interface (API), as OpenAI has done with ChatGPT, releasing models under non-commercial licences (as Meta has done with its models, called Open Pre-trained Transformers and LLaMA), and publishing complete, downloadable models on the internet (as the research collectives EleutherAI and BigScience have done). The proliferation of these models has two near-term implications for domestic and national security: these models will probably enable the production of higher-quality and greater quantities of content for influence operations, and competition over language-model development could escalate geopolitical tensions.
Language models as influence tools

Current iterations of large language models excel at producing newly written text based on user input, but they are also limited by their tendency to 'hallucinate' (stating inventions as facts), to stray in focus over long passages of text and to produce biased, harmful or shocking outputs (though the latter outcome is often the result of user prompting). These failures might be advantageous, however, for the production of disinformation. Because modern influence operations often rely on short snippets of text that are intended to provoke strong – typically negative or outraged – reactions and that need not be factually correct, many researchers suspect that propagandists will soon begin using language models to automate content production at scale.

Researchers have shown that text produced by language models such as GPT-3 can meaningfully shift readers' beliefs, including on sensitive political topics and at magnitudes comparable to human-authored disinformation. Language models can effectively mimic fringe beliefs, and they tend not to mistranslate idioms or resort to crude tactics that human propagandists often use to save time, such as repeatedly copying and pasting the same content across multiple accounts. These features strongly incentivise propagandists to augment their operations with language models, probably by using human–machine teams that could greatly increase the quality of the texts produced while moderately lowering marginal costs.

Domestic or international propagandists have two broad options for accessing large language models: using less sophisticated, but publicly available, models or relying on AI companies that provide structured input/output access to their more capable models via APIs. The first option gives propagandists the freedom to fine-tune – continuing a model's training on a small additional dataset to change its behaviour and exert more control over its outputs. For instance, the model could be tailored with a politically biased dataset to ensure that it defaults to ideologically desired positions. This benefit could compensate for the loss in quality that comes with relying on public models, none of which are currently as large or as capable as the best privately held models.

Public models must be run locally in order to produce outputs, but many language models are so large that their weights cannot fit in the memory of even the most capable single processors available today, as the rough calculation below illustrates. Most national governments could surmount this obstacle by building the necessary computing infrastructure within a relevant security agency, but, costs aside, technical issues might also serve as a barrier. Despite the fact that deep-fake video technology has been available for several years, it has not become a commonplace tool of propagandists, probably in part because it still requires technical proficiency to use. This too may change, however, as newer text-to-image models such as Midjourney or Stable Diffusion make it easier to generate realistic-looking photos from only a text prompt, without the need to code.
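
The memory constraint can be made concrete with a back-of-the-envelope calculation. The sketch below is purely illustrative – the parameter count is a GPT-3-scale figure and the 80-gigabyte ceiling refers to the largest widely available single GPUs as of early 2023 – but it shows why running the biggest public models requires splitting them across several accelerators.

```python
# Illustrative sketch: memory needed just to hold a large model's weights.
# The parameter count and bytes-per-parameter figures are assumptions chosen
# to be GPT-3-scale, not a description of any specific deployed system.

def weight_memory_gb(parameters: float, bytes_per_param: float) -> float:
    """Memory required for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return parameters * bytes_per_param / 1e9

params = 175e9  # a 175-billion-parameter model

print(weight_memory_gb(params, 2))  # ~350 GB at 16-bit precision
print(weight_memory_gb(params, 1))  # ~175 GB even at 8-bit precision

# Both figures far exceed the roughly 80 GB of memory on the largest widely
# available single GPUs (as of early 2023), before counting the additional
# runtime memory needed for activations and cached attention states.
```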

Private models, by contrast, generally cannot be fine-tuned by their users – though accessing them through an API is more straightforward. But private developers have attempted to scrub sensitive or inflammatory content from the outputs of their models, and they might also block access from certain countries or ban individual users who appear to be using the models maliciously. Despite these risks, accessing models via an API imposes no immediate costs on propagandists, even if they are caught, which means it may remain the preferred method of access.

Identifying propagandists attempting to use models to generate malicious content would probably be difficult. Some services claim to be able to detect AI-generated text, but they can be fooled and are unlikely to improve faster than the models themselves. Individual models may be trained in ways that permit easy re-identification of their output text, but unless all AI developers agree to use such training regimes, these solutions will not prevent malicious users from posting large amounts of text generated by alternative, unmarked models.
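
One frequently discussed proposal for re-identification, applied at generation time rather than during training, is a statistical 'green-list' watermark: the generating model is nudged to favour a pseudo-randomly chosen subset of its vocabulary at each step, and a verifier later checks how often tokens fall in that subset. The sketch below shows only the detection side; the vocabulary size, hashing scheme and green fraction are illustrative assumptions, not any developer's actual implementation.

```python
# Toy sketch of the detection side of a statistical 'green-list' watermark.
# The vocabulary size, hash-based partition and 0.5 green fraction are all
# illustrative assumptions, not a real deployed scheme.
import hashlib

VOCAB_SIZE = 50_000   # assumed vocabulary size
GREEN_FRACTION = 0.5  # assumed share of tokens favoured at each step


def is_green(prev_token_id: int, token_id: int) -> bool:
    """A token is 'green' if a partition seeded by the previous token
    places it in the favoured portion of the vocabulary."""
    digest = hashlib.sha256(str(prev_token_id).encode()).digest()
    seed = int.from_bytes(digest[:8], "big")
    return (token_id * 2654435761 + seed) % VOCAB_SIZE < VOCAB_SIZE * GREEN_FRACTION


def green_rate(token_ids: list[int]) -> float:
    """Share of tokens falling on the green list. Watermarked text should
    score well above GREEN_FRACTION; unmarked text should score close to it."""
    hits = sum(is_green(prev, cur) for prev, cur in zip(token_ids, token_ids[1:]))
    return hits / max(len(token_ids) - 1, 1)
```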

International competition

In October 2022, the administration of US President Joe Biden announced new export controls on advanced semiconductor chips flowing to China, in part because these chips are critical to AI development. While the stated intention of the export controls is to constrain the development of AI systems used for surveillance or military applications, language models are also highly dependent on these advanced semiconductors. The imposition of export controls was probably motivated in part by an attempt to preserve US advantages in language modelling, whether as part of a broader strategy for AI-technology competition or because the administration wants to inhibit China's development of language models in particular (for instance, because of fears that such models would be used to augment existing Chinese influence operations).

It is not clear that such measures will be effective, however. While language models have improved due to increases in computing power, they cannot continue to do so at the current rate. This is partly because of the financial constraints of continuing to scale up models and partly because training significantly larger models would require interconnect speeds between advanced semiconductors that are currently infeasible. Researchers are therefore actively seeking more computationally efficient methods of training similar models. Attempts by one country to restrict the computing power of another as a means of competing over AI development may incentivise the target country to develop competitive edges in these more computationally efficient approaches. In addition, there is only so much 'high-quality' text on the internet (for example, books and academic journal articles), which may soon become a more pressing constraint on the development of language models than the availability of computing power. And finally, while it is computationally expensive to train language models from scratch, it is not nearly as costly to fine-tune existing models, which in many cases can be just as effective at performing specific tasks such as generating disinformation.

Countries may also increasingly view the development of language models as a point of national pride, a perception that can exacerbate attempts to compete over their development. A high-level expert panel convened by the UK government released a report on 6 March 2023 stating that the country ‘is falling behind’ as a place for computational innovation – ranking tenth in the world in 2022 in terms of national compute power, down from third in 2005 – and called for the acquisition of at least 3,000 advanced hardware accelerators ‘in the immediate term’ for use in AI research.

But while companies outside the US are eager to produce models to rival ChatGPT, they may face regulatory barriers not present in the US. In Europe, for instance, regulators are currently attempting to articulate how ‘general-purpose AI’ – a category that would include large language models – will fit into the overall risk-based structure of the draft European Union AI Act. Language models are not trained with a specific task in mind and can be used for both benign purposes and purposes that the AI Act would likely treat as ‘high risk’. Because of this, at least some regulators want to subject developers of general-purpose AI systems to the same legal requirements as developers of high-risk AI systems. The EU is also considering more moderate approaches to regulation, but even these would impose far more liability on language-model developers than exists in the US and would be likely to slow European technological development.

In China, similar regulatory ambiguities exist. Compared to the EU, more Chinese companies have explicitly signalled a desire to develop ChatGPT competitors, with Baidu having unveiled a competing product on 16 March (known as Ernie Bot) and Tencent announcing plans to do so. But new Chinese regulations enacted in early 2023 prohibit so-called ‘deep synthesis service providers’, including providers of language models, from ‘produc[ing], reproduc[ing], publish[ing], or transmit[ting] fake news information’. Although Chinese tech companies have a relatively clear field of competition due to crackdowns on access to ChatGPT and other Western language models, the difficulty of training models that clearly comply with these regulations may dissuade them from rapidly deploying language models of their own.
Outlook

Countries interested in using AI for domestic- or international-propaganda purposes are unlikely to train models from scratch; most national governments do not currently have the technical expertise to train cutting-edge language models and, even if they did, training one could cost millions of US dollars and would likely only provide a marginal quality improvement over a fine-tuned public model. Even in selecting public models to use, propaganda departments may prefer mid-sized, easier-to-use models over larger, more-capable alternatives. National elections that will occur in 2024, including in Taiwan, the US and possibly the UK, may serve as early indicators of how large language models will affect political campaigns, both as tools to be used by campaigns themselves and by domestic and international actors attempting to sway outcomes.


Some commentators have worried that existing language models represent a specific set of values, a concern that has led OpenAI to signal plans to allow users to customise the behaviour of ChatGPT based on their own values. An analogous concern among security analysts is that authoritarian governments may produce language models deliberately intended to sway public opinion in favour of the government. To date, however, while some governments have experimented with high-level restrictions on algorithmic outputs, they have demonstrated neither the technical capability nor the intent to train their own alternative models or to seriously involve themselves in training industry models. Language-model deployers may – and in China, currently do – face requirements to censor certain types of outputs, but they do not currently face requirements to systematically embed pro-regime biases in their models.

The Biden administration’s October 2022 export controls will, at least in the short term, constrain China’s access to the most advanced chips used in computationally intensive subfields of AI. Chinese research attention may therefore turn towards subfields that are less computationally demanding, with researchers developing new competitive advantages. The export controls are also predicated on a belief that access to advanced semiconductors is a primary constraint on language-model development, but relying on larger and larger computing expenditures cannot continue to improve language models at current rates of growth indefinitely, even if it is unclear when exactly these constraints will kick in. And despite the Biden administration’s apparent desire to compete over language-model development, regulatory uncertainties in both the EU and China already meaningfully constrain the pace of technological change. Nonetheless, as language models are placed nearer to the centre of national technological competition, governments may respond by more aggressively cutting off citizens’ access to language models developed by rival countries, thereby further balkanising the internet.
