
26 February 2023

Early thoughts on regulating generative AI like ChatGPT

Alex Engler


With OpenAI’s ChatGPT now a constant presence both on social media and in the news, generative artificial intelligence (AI) models have taken hold of the public’s imagination. Policymakers have taken note too, with statements from Members of Congress addressing its risks and AI-generated text read on the floor of the House of Representatives. While they are still emerging technologies, generative AI models have been around long enough to consider what we know now and what regulatory interventions might best tackle both legitimate commercial use and malicious use.

WHAT ARE GENERATIVE AI MODELS?

ChatGPT is just one of a new generation of generative models—its fame is a result of how accessible it is to the public, not necessarily its extraordinary function. Other examples include text generation models like DeepMind’s Sparrow and the collaborative open-science model Bloom; image generation models such as StabilityAI’s Stable Diffusion and OpenAI’s DALL-E 2; as well as audio-generating models like Microsoft’s VALL-E and Google’s MusicLM.

While any algorithm can generate output, generative AI systems are typically thought of as those which focus on aesthetically pleasing imagery, compelling text, or coherent audio outputs. These are different goals from those of more traditional AI systems, which often try to estimate a specific number or choose between a set of options, such as identifying which advertisement gives the highest chance that an individual will click on it. Generative AI is different—it is instead doing its best to match aesthetic patterns in its underlying data to create convincing content.

In all forms (e.g., text, imagery, and audio), generative AI is attempting to match the style and appearance of its underlying data. Modern approaches have advanced incredibly fast in this capacity—leading to compelling text in many languages, cohesive imagery in many artistic styles, and synthetic audio that can impersonate individual voices or produce pleasant music.

Yet, this impressive mimicry is not the same as comprehension. A study of DALL-E 2 found that it could generate images that correctly matched prompts using the word “on” just over one quarter of the time. Other basic spatial connections (such as “under” and “in”) led to even worse results. ChatGPT shows similar problems. As it is merely designed to string words together in a likely order, it still cannot reliably pass basic tests of comprehension. As is well documented by Professor Gary Marcus, ChatGPT may often fail to “count to four… do one-digit arithmetic in the context of simple word problem… figure out the order of events in a story… [and] it couldn’t reason about the physical world.”

Further, text generation models constantly make things up—OpenAI CEO Sam Altman has said as much, noting “it’s a mistake to be relying on [ChatGPT] for anything important right now.” The lesson is that writing convincing, authoritative-sounding text based on everything written on the internet has turned out to be an easier problem to solve than teaching AI to know much about the world. However, this significant shortcoming did not stop Microsoft from rolling out a version of OpenAI’s technology for some users of its search engine.

Still, this sense of authenticity will make generative AI appealing for malicious use where the truth is less important than the message it advances, such as disinformation campaigns and online harassment. It is also why an early commercial application of generative AI is to create marketing content, where the strict accuracy of the writing simply isn’t very important. However, when the media website CNET started using generative models for writing financial articles, where the truth is quite important, the articles were discovered to have many errors.

These two examples offer a glimpse into two separate sources of risk from generative AI—commercial applications and malicious use—which warrant separate consideration and, likely, distinct policy interventions.[1]

HANDLING THE COMMERCIAL RISKS OF GENERATIVE AI

The first category of risks comes from the commercial application of generative AI. Many companies want to use generative AI for business applications that go well beyond simply generating content. For the most part, generative AI models tend to be especially large and relatively powerful, so while they may be particularly good at generating text or images, they can be adapted for a wide variety of tasks.[2]

The most prominent example may be Copilot, an adaptation of OpenAI’s GPT-3. Developed by GitHub, Copilot integrates GPT-3 into a more specific tool for generating code, aiming to ease certain programming tasks. Other examples include the expansion of image-generating AI in helping to design video game environments and the company Alpha Cephei, which takes open-source AI models for speech analysis and further develops them into enterprise voice recognition products.
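To make the adaptation pattern concrete, below is a minimal sketch of how a downstream developer might fine-tune a general-purpose generative model on its own domain data. It is not GitHub’s or OpenAI’s actual pipeline; it uses the openly available Hugging Face Transformers tooling, a small stand-in model (GPT-2), and illustrative placeholder data.

```python
# Minimal sketch: a downstream developer adapts a pretrained generative model
# by fine-tuning it on domain-specific examples. Model name and data are
# illustrative placeholders, not any company's actual setup.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base_model = "gpt2"  # stand-in for a much larger generative model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Hypothetical domain data supplied by the downstream developer.
examples = Dataset.from_dict({
    "text": [
        "def add(a, b):\n    return a + b",
        "def greet(name):\n    return f'Hello, {name}!'",
    ]
})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = examples.map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="adapted-model", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()  # the adapted model now reflects choices made by both parties
```

The point of the sketch is structural: the original developer chose the training data and architecture of the base model, while the downstream developer chooses the adaptation data and the surrounding software, and neither sees the whole picture.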

The key concern with collaborative deployment using generative AI is that neither company may sufficiently understand the function of the final AI system.[3]

The original developer built the generative AI model on its own but cannot see the full extent of how the model is used once it is adapted for another purpose. A “downstream developer,” which did not participate in the original model development, may then adapt the model and integrate its outputs into a broader software system. Neither entity has complete control of, or a comprehensive view into, the whole system. This may increase the likelihood of errors and unexpected behavior, especially since many downstream developers may overestimate the capacity of the generative AI model. This joint development process may be fine for applications where errors are not especially important (e.g., clothing recommendations) or where a human reviews the result (e.g., a writing assistant).

However, if these trends extend to generative AI systems used for impactful socioeconomic decisions, such as those affecting educational access, hiring, financial services access, or healthcare, they should be carefully scrutinized by policymakers. The stakes for people affected by these decisions can be very high, and policymakers should take note that AI systems developed or deployed by multiple entities may pose a higher degree of risk. Already, applications such as KeeperTax, which fine-tunes OpenAI models to evaluate tax statements and find tax-deductible expenses, are raising the stakes. This high-stakes category also includes DoNotPay, a company dubiously claiming to offer automated legal advice based on OpenAI models.

Further, if generative AI developers are uncertain whether their models should be used for such impactful applications, they should clearly say so and restrict those questionable uses in their terms of service. In the future, if these applications are allowed, generative AI companies should work proactively to share information with downstream developers, such as operational and testing results, so that the models can be used more appropriately. The best-case scenario may be that the developer shares the model itself, enabling the downstream developer to test it without restrictions. A middle-ground approach would be for generative AI developers to expand the available functionality for, and reduce or remove the cost of, thorough AI testing and evaluation.
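As a rough illustration of what such information sharing could look like, the sketch below shows one hypothetical structure for a disclosure a developer might hand to downstream developers. The fields and values are invented for illustration; they are not any company’s actual disclosure format.

```python
# Hypothetical sketch of structured information a generative AI developer
# might share with downstream developers. All names and values are
# illustrative placeholders.
from dataclasses import dataclass, field

@dataclass
class SharedModelReport:
    model_name: str
    model_version: str
    intended_uses: list[str]
    restricted_uses: list[str]            # e.g., uses barred by terms of service
    evaluation_results: dict[str, float]  # benchmark name -> score
    known_failure_modes: list[str] = field(default_factory=list)

report = SharedModelReport(
    model_name="example-generative-model",
    model_version="2023-02",
    intended_uses=["drafting marketing copy", "code completion assistance"],
    restricted_uses=["legal, medical, or financial advice without human review"],
    evaluation_results={"toxicity_rate": 0.02, "factual_qa_accuracy": 0.61},
    known_failure_modes=["fabricates citations", "weak at arithmetic"],
)
```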

Information sharing may mitigate the risks of multi-organizational AI development, but it would only be part of the solution. This approach to help downstream developers responsibly leverage generative AI tools only really works if the final system is itself regulated, as will be the case in the EU under the AI Act, and as is advocated for in the U.S.’s Blueprint for an AI Bill of Rights.

MITIGATING MALICIOUS USE OF GENERATIVE AI

The second category of harm arises from the malicious use of generative AI. Generative models can create non-consensual pornography and aid in the process of automating hate speech, targeted harassment, or disinformation. These models have also already started to enable more convincing scams, in one instance helping fraudsters mimic a CEO’s voice in order to obtain a $240,000 wire transfer. Most of these challenges are not new in digital ecosystems, but the proliferation of generative AI is likely to worsen them all.

Since these harms result from malicious use by scammers, anonymous harassers, foreign non-state actors, or hostile governments, they may also be much more challenging to prevent than commercial harms. However, it might be reasonable to require a certain degree of risk management, especially from commercial operations that deploy and profit from these cutting-edge models.

This might include tech companies that provide these models over an API (e.g., OpenAI, Stability AI), through cloud services (e.g., the Amazon, Google, and Microsoft clouds), or possibly even through Software-as-a-Service providers (e.g., Adobe Photoshop). These businesses control several levers that might partially prevent malicious use of their AI models, including interventions in the input data, the model architecture, review of model outputs, monitoring of users during deployment, and post-hoc detection of generated content.

Manipulating the input data before model development is an impactful way to influence the resulting generative AI, because these models greatly reflect that underlying data. For example, OpenAI uses human reviewers to detect and remove “images depicting graphic violence and sexual content” from the training data for DALL-E 2. The work of these human reviewers was then used to build a smaller AI model that detects images OpenAI did not want to include in its training data, thus multiplying the impact of the human reviewers. The same type of model can also be used at other stages to further prevent malicious use, by checking whether images submitted by users, or images generated by the AI, contain graphic violence or sexual content. Generally, the practice of using a combination of human reviewers and AI tools for removing harmful content may be an effective, if not sufficient, intervention.[4]
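The pattern is simple enough to sketch. The toy example below is not OpenAI’s system; it uses scikit-learn as a stand-in to show how a small filter model trained on human reviewers’ labels can then be reused to screen training data, user prompts, and model outputs. The labels and text are placeholders.

```python
# Simplified sketch of the human-review-plus-filter-model pattern described
# above. All text and labels are illustrative placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Step 1: human reviewers flag a sample of content (1 = disallowed).
human_labeled_text = [
    "a calm landscape photo with mountains and a lake",
    "a graphic depiction of violence against a person",
]
human_labels = [0, 1]

# Step 2: fit a small filter model that generalizes the reviewers' judgments.
filter_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
filter_model.fit(human_labeled_text, human_labels)

def is_disallowed(text: str, threshold: float = 0.5) -> bool:
    """Reusable check for training data, user prompts, and model outputs."""
    return filter_model.predict_proba([text])[0, 1] >= threshold

# Step 3a: screen candidate training data before model development.
training_corpus = ["a scraped image caption", "another scraped image caption"]
clean_corpus = [doc for doc in training_corpus if not is_disallowed(doc)]

# Step 3b: the same check can gate user prompts and generated outputs later.
```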

The development of generative models may also provide an opportunity for intervention, although this research is just emerging. For example, by getting iterative feedback from humans, generative language models can become moderately more truthful, as suggested by new research from DeepMind.[5]
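The core idea behind this kind of human feedback can be illustrated with a toy example: humans compare pairs of model responses, and a reward model is fit so that preferred responses score higher, which can then be used to steer or re-rank generations. The sketch below is a simplified Bradley-Terry-style illustration with made-up feature vectors, not DeepMind’s actual method.

```python
# Toy illustration of learning a reward model from human preference
# comparisons. Feature vectors and preferences are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(0)

# Each response is represented by a small feature vector (a stand-in for an
# embedding); each row pairs a human-preferred response with a rejected one.
preferred = rng.normal(loc=0.5, size=(100, 8))
rejected = rng.normal(loc=-0.5, size=(100, 8))

w = np.zeros(8)   # parameters of a linear reward model r(x) = w @ x
lr = 0.1
for _ in range(200):
    # Minimize -log sigmoid(r(preferred) - r(rejected)) by gradient descent.
    margin = (preferred - rejected) @ w
    grad = ((1 / (1 + np.exp(-margin)) - 1)[:, None]
            * (preferred - rejected)).mean(axis=0)
    w -= lr * grad

def reward(features: np.ndarray) -> float:
    """Score a candidate response; higher means more human-preferred."""
    return float(features @ w)
```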

User monitoring is another tactic that may bear fruit. First, a generative AI company can set transparent limits on user behavior through its terms of service. For instance, OpenAI says its tools may not be used to infringe or misappropriate any person’s rights, and it further limits some categories of images and text that users are allowed to generate. OpenAI appears to have some system for implementing these terms of service, such as denying obvious requests for harassing comments or statements about famous conspiracy theories. However, one analysis found that ChatGPT responded with misleading claims 80% of the time when presented with a catalog of misinformation narratives. Going further, generative AI companies could monitor users, using algorithmic tools to flag requests that suggest malicious or banned use, and then suspend repeat offenders.
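A minimal sketch of such a monitoring loop appears below. The flagging function is a placeholder keyword check standing in for a real moderation model, and the suspension threshold is an invented policy choice; no provider’s actual enforcement system is being described.

```python
# Minimal sketch of a flag-and-suspend monitoring loop. The flagging logic
# and threshold are illustrative placeholders.
from collections import defaultdict

SUSPENSION_THRESHOLD = 3  # illustrative policy choice
flag_counts: dict[str, int] = defaultdict(int)
suspended: set[str] = set()

def looks_malicious(prompt: str) -> bool:
    """Placeholder for an algorithmic flagging model or rule set."""
    banned_phrases = ("write harassment about", "disinformation campaign")
    return any(phrase in prompt.lower() for phrase in banned_phrases)

def handle_request(user_id: str, prompt: str) -> str:
    if user_id in suspended:
        return "account suspended"
    if looks_malicious(prompt):
        flag_counts[user_id] += 1
        if flag_counts[user_id] >= SUSPENSION_THRESHOLD:
            suspended.add(user_id)
        return "request refused"
    return "request forwarded to model"
```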

In a more nascent approach, researchers have proposed embedding statistical patterns in generated text so that it can later be identified as having come from a generative model, an idea often called watermarking. However, it is too early to determine how such detection might work once there are many language models, available in different versions, that individual users are allowed to update and adapt. The approach may simply not hold up as these models become more common.
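To give a flavor of how a text watermark can work, the toy sketch below biases generation toward a pseudorandom “green” subset of a tiny vocabulary and then detects the watermark by checking whether a suspect text uses green words more often than chance. It is a deliberately simplified illustration of the general idea, not any published scheme or production system.

```python
# Toy watermark: bias word choice toward a pseudorandom "green" half of the
# vocabulary (seeded by the previous word), then detect by counting how often
# that happens. Vocabulary and generation are placeholders, not a real model.
import hashlib
import random

VOCAB = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"]

def green_set(previous_word: str) -> set[str]:
    """Pseudorandomly pick half the vocabulary, seeded by the previous word."""
    seed = int(hashlib.sha256(previous_word.encode()).hexdigest(), 16)
    return set(random.Random(seed).sample(VOCAB, k=len(VOCAB) // 2))

def generate_watermarked(length: int = 20) -> list[str]:
    words = ["alpha"]
    for _ in range(length):
        # A real model would only bias its sampling distribution; for clarity
        # this toy always picks from the green set.
        words.append(random.choice(sorted(green_set(words[-1]))))
    return words

def green_fraction(words: list[str]) -> float:
    hits = sum(1 for prev, word in zip(words, words[1:]) if word in green_set(prev))
    return hits / max(len(words) - 1, 1)

print(green_fraction(generate_watermarked()))       # close to 1.0 (watermarked)
print(green_fraction(random.choices(VOCAB, k=21)))  # close to 0.5 (unwatermarked)
```

The fragility noted above is visible even in this toy: anyone who retrains, fine-tunes, or paraphrases the output can wash the pattern away, and detection requires knowing which model and watermark scheme to test against.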

Collectively, these interventions and others might add up to a moderately effective risk management system. However, it is highly unlikely it would be anywhere near perfect, and motivated malicious actors will find ways to circumvent these defenses. In general, the efficacy of these efforts should be considered more like content moderation, where even the best systems only prevent some proportion of banned content.

IT IS STILL THE EARLY DAYS OF GENERATIVE AI POLICY

The challenges posed by generative AI, both through malicious use and commercial use, are in some ways relatively recent, and the best policies are not obvious. It is not even clear that “generative AI” is the right category to focus on, rather than focusing individually on language, imagery, and audio models. Generative AI developers could contribute to the policy discussion by disclosing more specific details on how they develop generative AI, such as through model cards, and by explaining how they are currently approaching risk management.

It also warrants mention that, while these harms are not trivial, there are more pressing areas in which the U.S. needs AI governance, such as protections from algorithms used in key socioeconomic decisions, developing meaningful online platform policy, and even passing data privacy legislation.

Even if it is not a top priority, it is worth considering regulations for commercial developers of the largest AI models, such as generative AI.[6] As discussed, this might include information-sharing obligations to reduce commercialization risks, as well as requirements for risk management systems to mitigate malicious use. Neither intervention is a panacea, but both are reasonable requirements for these companies that might improve their net social impact.

This combination might represent one path forward for the EU, which was recently considering how to regulate generative models (under the distinct, but related term, “general-purpose AI”) in its proposed AI Act.[7] This would raise many key questions, such as how to enforce these rules and what to do about their considerable international impact. In any case, if the EU or other governments do take this approach, it is worth keeping policies flexible into the future, as there is still much to be learned about how to mitigate risks of generative AI.
