The U.S. Department of Commerce has made a major decision: it has expanded its pre-release AI model safety testing program to five major labs. Google DeepMind, Microsoft, and xAI now join Anthropic and OpenAI, which until now were the program's only participants.
This news may not grab top headlines, but in terms of long-term impact on the AI industry, it is one of the most significant developments of 2026.
How Did Things Work Before?
Until this decision, only Anthropic and OpenAI voluntarily submitted their models for pre-release safety testing to the U.S. government. This was based on a voluntary agreement signed at the White House in 2023.
The problem was that other companies like Google, Microsoft, and xAI were not part of this program. This meant that Gemini, Grok, and Microsoft's models were being released without independent government safety testing.
Imagine if only two car manufacturers crash-tested their vehicles while the rest put theirs on the road without any testing.
What’s Different About the New Program?
The key changes are:
1. Five Labs Instead of Two
Google DeepMind, Microsoft, and xAI have been added. This means virtually all major American AI models will undergo safety testing before public release.
2. Mandatory Instead of Voluntary
This is the most important change. Previously, companies participated voluntarily. Now there is a legal framework behind it. The Department of Commerce can require companies to submit their models before release.
3. Clearer Testing Criteria
NIST (the National Institute of Standards and Technology) has designed a comprehensive testing framework covering several key areas (a rough sketch of what evaluating these categories might look like follows the list):
- Biological Safety: Can the model generate instructions for creating biological weapons?
- Cybersecurity: Can the model be used for advanced cyberattacks?
- Autonomy: Does the model show tendencies toward independent, uncontrolled action?
- Persuasion and Manipulation: Can the model be used to deceive or manipulate public opinion?
- Discrimination and Bias: Does the model have systematic bias against specific groups?
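To make the category structure more concrete, here is a minimal, purely hypothetical sketch in Python of how a category-based pre-release evaluation harness is often organized. The category names come from the list above, but the class names, functions, probe prompts, and scoring logic are illustrative assumptions, not NIST's actual tooling or methodology.

```python
# Hypothetical sketch only: NOT NIST's actual framework or tooling.
# It illustrates a common evaluation pattern: a fixed set of risk
# categories, a battery of probe prompts per category, and a report
# with per-category pass rates.

from dataclasses import dataclass, field
from typing import Callable

# The five risk areas listed above.
CATEGORIES = [
    "biological_safety",
    "cybersecurity",
    "autonomy",
    "persuasion_and_manipulation",
    "discrimination_and_bias",
]

@dataclass
class ProbeResult:
    category: str
    prompt: str
    passed: bool  # True if the model refused or responded safely

@dataclass
class SafetyReport:
    results: list[ProbeResult] = field(default_factory=list)

    def pass_rate(self, category: str) -> float:
        relevant = [r for r in self.results if r.category == category]
        if not relevant:
            return float("nan")
        return sum(r.passed for r in relevant) / len(relevant)

def run_evaluation(
    model: Callable[[str], str],
    probes: dict[str, list[str]],
    judge: Callable[[str, str], bool],
) -> SafetyReport:
    """Run every probe prompt through the model and score each response.

    `model` maps a prompt to a completion; `judge` decides whether a
    (prompt, completion) pair counts as a safe outcome. Both are
    stand-ins for whatever the evaluator actually uses.
    """
    report = SafetyReport()
    for category, prompts in probes.items():
        for prompt in prompts:
            completion = model(prompt)
            report.results.append(
                ProbeResult(category, prompt, judge(prompt, completion))
            )
    return report

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    dummy_model = lambda prompt: "I can't help with that."
    dummy_judge = lambda prompt, completion: "can't help" in completion
    dummy_probes = {c: [f"example probe for {c}"] for c in CATEGORIES}

    report = run_evaluation(dummy_model, dummy_probes, dummy_judge)
    for c in CATEGORIES:
        print(f"{c}: {report.pass_rate(c):.0%} safe responses")
```

The design choice worth noting is the separation between the model being tested and the judge that scores its outputs: the same probe battery can then be run against every lab's model and the per-category results compared side by side, which is exactly what a public, redacted report would summarize.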
4. Public Reporting
Test results (with sensitive information redacted) will be published publicly. This means the public and researchers can see how each model scored.
What Does Each Lab Bring to the Table?
Each of these five companies has a specific area of expertise:
Anthropic: A pioneer in Constitutional AI techniques and red teaming. They have extensive experience identifying dangerous model behaviors.
OpenAI: Has the largest safety testing team and experience testing very large models like GPT-5 and GPT-5.5.
Google DeepMind: Specializes in theoretical AI safety research with groundbreaking work in AI Alignment.
Microsoft: Has extensive practical experience protecting large-scale systems. Azure AI and Bing have provided valuable experience at scale.
xAI: Elon Musk’s company behind the Grok model. Their inclusion is interesting, since Musk previously criticized government oversight, but he appears to have accepted it under pressure.
Why Does This Matter?
Here are some key reasons:
Models have gotten more powerful: 2026 models are far more capable than those from 2023. When a model can write complex code, it can also write malware. When it can produce scientific papers, it can also produce convincing misinformation. Greater power means greater risk.
Access has become mainstream: Previously, only researchers and developers worked with AI models. Now hundreds of millions of everyday people use them. When the scale of usage goes up, even a small risk can have a massive impact.
International competition: China and the EU are also developing AI safety regulations. The U.S. doesn’t want to fall behind.
Reactions
Reactions have been mixed:
Supporters: AI safety researchers and civil rights groups have welcomed this decision. Yoshua Bengio, one of the pioneers of deep learning, said: “This is a necessary step. We must ensure every powerful model is safe before release.”
Critics: Some believe the program isn’t sufficient. Open-source models (like Llama) are not included. Non-American companies are not covered either. And pre-release testing only covers a single moment in a model’s lifecycle.
Industry: AI companies have generally shown cautious support. They know that overly strict oversight could slow innovation, but on the other hand, a clear framework is better than sudden, unpredictable regulations.
Comparison with Europe and China
The U.S. approach sits somewhere between Europe and China:
EU: The EU AI Act is the strictest AI safety legislation in the world. It classifies AI models by risk level and defines specific requirements for each tier. Penalties are severe: up to 7% of global annual revenue.
China: China also has strict regulations but focuses more on content control and censorship than technical safety. Chinese AI models must pass government content filters.
U.S.: The American approach is lighter than Europe’s, based more on industry collaboration than legal enforcement. But with the expansion to five labs, it’s moving toward a more serious stance.
What’s Still Missing?
Despite this progress, several important gaps remain:
Open-source models: Llama, Mistral, and dozens of other open-source models are not included. Anyone can download and use them without any oversight. This is a significant gap.
Post-release monitoring: Testing only occurs before release. But a model’s behavior in the real world may differ from lab tests. A continuous monitoring system is needed.
International coordination: AI models know no borders. A dangerous model can be used from anywhere in the world. Without international coordination, national oversight has limited effectiveness.
What’s Next?
The U.S. Congress is expected to pass comprehensive AI safety legislation by the end of 2026. This law will likely:
- Make safety testing mandatory for models above a certain threshold
- Define transparency requirements for AI companies
- Establish an independent regulatory body for AI
- Set penalties for violations
For now, expanding the safety testing program to five labs is a major step forward. It shows that both the government and industry are taking AI safety more seriously. The question is whether this pace is fast enough, or whether models are advancing faster than regulations.
One thing is certain: the era of unregulated AI is coming to an end. And that is good news for all of us.