Artificial intelligence is permeating every aspect of our lives, promising to make them more efficient, smarter, and easier. But are we truly prepared to entrust so much of our world to these complex, opaque systems?
Hidden biases, deepfakes, unseen vulnerabilities, and malicious uses are just some of the shadows looming over the bright AI-enhanced future. As Silicon Valley giants race to develop increasingly powerful and astonishing models, the need to make these systems more transparent, reliable, and secure becomes ever more pressing.
This is where the concept of red teaming comes into play, offering a glimmer of hope for many in the industry. Originally a wargaming technique later adopted in the world of cybersecurity, the idea behind red teaming is simple: put yourself in your adversary’s shoes to uncover your own system’s weaknesses. Now, companies like OpenAI, Google, and Microsoft are beginning to employ this method to test their most advanced AI models.
But can playing cops and robbers really be enough to tame a technology that has the potential to redefine what it means to be human? Adapting red teaming to AI is no easy task; we are dealing with systems that are constantly evolving, learning, and transforming at a rapid pace.
There is also the question of trustworthiness: thus far, as we will explore later, it is the very companies producing the large AI models that have established red teams to put their models to the test, while external researchers must grapple with often overly restrictive terms of service that make little distinction between independent research and hacking. Can we truly rest easy knowing that the same companies developing these AI systems are also the ones overseeing their security? Perhaps, as some experts suggest, it is time to consider a broader, independent approach that involves regulators, academia, and civil society.
With AI, we cannot afford to be reactive. We must anticipate threats, prevent risks, and imagine the unimaginable. Red teaming is a starting point, but the path to secure and reliable AI is still long, and reaching the goal remains an uncertain prospect.
Applying Red Teaming to AI
Red teaming is not a new concept. It originated as a military training technique and later found its way into the world of cybersecurity in the 1990s when companies and organizations began hiring “ethical hackers” to test their systems’ defences. The core principle remains the same: think like the adversary to discover your own weaknesses before someone else does.
Today, this methodology is finding a new frontier in artificial intelligence. Silicon Valley giants, from Microsoft to Google, NVIDIA to OpenAI, are subjecting their most advanced AI systems to the scrutiny of red teams, each with their own unique approach to ensuring the security and ethical use of these technologies.
Microsoft has a dedicated AI Red Team that focuses on emulating real adversaries to identify risks, uncover blind spots, and improve the overall security posture of AI systems. This team investigates vulnerabilities and other potential dangers, such as the generation of potentially malicious content, and has recently released the PyRIT (Python Risk Identification Tool for generative AI) framework. Microsoft emphasizes a defence-in-depth approach and, as early as 2021, released tools like Counterfit for AI security risk assessment, collaborating with organizations such as MITRE to develop frameworks to address AI risks.
Google has created an AI Red Team that works in conjunction with their Secure AI Framework (SAIF) to address risks to AI systems and responsibly drive security standards. Their red teaming approach involves simulating various adversarial scenarios to prepare for attacks on AI systems as part of a broader initiative to ensure the ethical implementation of artificial intelligence technologies.
NVIDIA addresses AI red teaming by considering the entire lifecycle of machine learning systems, from conception to implementation and end of life. Their methodology covers a wide range of issues, including technical and model vulnerabilities, as well as abuse and damage scenarios. NVIDIA’s red team brings together expertise in offensive security, responsible AI, and machine learning research to address these issues comprehensively.
OpenAI has formed the OpenAI Red Teaming Network, a community of experts collaborating on red teaming exercises to make models more secure. Currently, it is perhaps the only major player that has broadened the horizons of this concept to apply it to the new challenges of AI systems. This network extends beyond internal adversarial testing to include external experts who help develop domain-specific risk taxonomies and assess potentially malicious capabilities in new systems. OpenAI also emphasizes the importance of diverse expertise and perspectives in evaluating AI systems.
While these examples illustrate the relevance of taking a proactive and comprehensive approach to security, transplanting red teaming from cybersecurity to the world of AI is not without its pitfalls. AI models are, in some ways, complex and changeable “beasts,” evolving and adapting at a speed that no red team can hope to match. There is also the question of expertise: finding bugs in code is one thing; assessing an AI’s social and ethical impact is another.
Yet, despite the challenges, red teaming remains a key tool for making AI safer and more reliable. With the right amount of creativity and rigour, red teams can uncover hidden vulnerabilities and guide the development of more robust models.
Adapting to AI: External and Multi-Stakeholder Teams
How can we maximize the benefits of red teaming in AI? The two keywords are independence and multidisciplinarity. When it comes to general-purpose AI systems such as LLMs, which will be used by a variety of people for a variety of purposes, external red teams can offer unique perspectives that might elude an internal team. While an internal red team might focus primarily on bugs and vulnerabilities in the code, an external team can consider the broader implications of AI misuse, from ethical consequences to social impacts. This is why AI red teams will need to include not only cybersecurity experts but also psychologists, sociologists, lawyers… the more diverse the perspectives, the more comprehensive the stress-testing result will be.
But independence and multiplicity of views are not enough. We also need shared standards and guidelines to ensure that AI red teaming is conducted ethically and systematically. This is where international and industry organizations come in: they should be tasked with developing frameworks to guide companies in this process. In the United States, NIST, with its AI Safety Institute, is already working on this.
Let us not forget, however, that red teaming is only one piece of the puzzle. To make AI truly safe, we must combine it with other techniques, such as automated tests and periodic code reviews, and push as hard as possible for the world of artificial intelligence to adopt the paradigm of security by design, which today is more a slogan than a real objective. Finally—and this is already one of the most pressing issues of the moment—we must promote collaboration between industry, regulators, and academia to create a regulatory framework that encourages responsible innovation. In other words, the efforts and resources devoted to making AI safer must be commensurate with the scope and impact of this technology on society.
Looking ahead, red teaming will become increasingly crucial as AI becomes more powerful and pervasive. We may even see the emergence of AI designed specifically for red teaming. And as awareness increases, AI startups and SMEs will also begin to embrace this practice.
Conclusions
Artificial intelligence is redefining our world at a dizzying pace. But as we race towards a future shaped by AI, we cannot afford to neglect the issue of security and reliability. Red teaming, with its proactive and multidisciplinary approach, emerges as an indispensable tool in this journey.
It will not be an easy journey, as the challenges posed by AI are complex and constantly evolving. Red teaming, then, is not a magic wand, but simply a step in the right direction—a way to anticipate potential dangers and act preemptively, to help build trust in a world that will increasingly rely on ever-faster, more intricate, and opaque machine learning models.