Saturday, April 11, 2026


OpenAI Releases Research Preview of ‘gpt-oss-safeguard’: Two Open-Weight Reasoning Models for Safety Classification Tasks


OpenAI has released a research preview of gpt-oss-safeguard, two open-weight safety reasoning models that let developers apply custom safety policies at inference time. The models come in two sizes, gpt-oss-safeguard-120b and gpt-oss-safeguard-20b; both are fine-tuned from gpt-oss, licensed under Apache 2.0, and available on Hugging Face for local use.

https://openai.com/index/introducing-gpt-oss-safeguard/

Why Policy-Conditioned Safety Matters

Conventional moderation models are trained on a single fixed policy. When that policy changes, the model must be retrained or replaced. gpt-oss-safeguard reverses this relationship: it takes the developer-authored policy as input together with the user content, then reasons step by step to decide whether the content violates the policy. This turns safety into a prompt-and-evaluation task, which is better suited for fast-changing or domain-specific harms such as fraud, biology, self-harm, or game-specific abuse.
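The policy-as-input idea can be sketched in a few lines. The policy text and the `build_classification_prompt` helper below are illustrative stand-ins, not an official API; the point is only that the policy travels with the request, so changing it requires no retraining.

```python
# Hypothetical developer-authored policy; in practice this would be a full
# taxonomy with rules, definitions, and examples.
FRAUD_POLICY = """\
Policy: Fraud
R1: Content soliciting payment with promises of unrealistic returns violates this policy.
R2: Content impersonating a known brand to harvest credentials violates this policy.
Output: answer "violation" or "no_violation" with a short rationale.
"""

def build_classification_prompt(policy: str, content: str) -> str:
    """Combine a developer-authored policy with user content into one prompt."""
    return (
        f"{policy}\n"
        f"Content to evaluate:\n{content}\n"
        "Reason step by step, then give your final label."
    )

prompt = build_classification_prompt(
    FRAUD_POLICY, "Send $50 and double your money overnight!"
)
```

Swapping `FRAUD_POLICY` for a self-harm or game-abuse policy changes the classifier's behavior without touching the model weights.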

Same Pattern as OpenAI’s Internal Safety Reasoner

OpenAI states that gpt-oss-safeguard is an open-weight implementation of the Safety Reasoner used internally across systems like GPT-5, ChatGPT Agent, and Sora 2. In production settings, OpenAI already runs small, high-recall filters first, then escalates uncertain or sensitive items to a reasoning model; in recent launches, up to 16 percent of total compute was spent on safety reasoning. The open release lets external teams reproduce this defense-in-depth pattern instead of guessing how OpenAI's stack works.

Model Sizes and Hardware Fit

The large model, gpt-oss-safeguard-120b, has 117B parameters with 5.1B active parameters and is sized to fit on a single 80GB H100-class GPU. The smaller gpt-oss-safeguard-20b has 21B parameters with 3.6B active parameters and targets lower-latency or smaller GPUs, including 16GB setups. Both models were trained with the harmony response format, so prompts must follow that structure or results will degrade. The license is Apache 2.0, the same as the parent gpt-oss models, so commercial local deployment is permitted.
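A minimal sketch of how a request might be structured, assuming the standard Transformers chat-template flow (the template for gpt-oss models renders messages into the harmony format): the policy goes in the system turn and the content to classify in the user turn. The policy placeholder is illustrative, and the model-loading lines are commented out so the sketch runs without downloading weights.

```python
policy = "Classify content against this policy: ..."  # developer-authored policy text
content = "Example user message to evaluate."

# The chat template handles harmony formatting; hand-rolling the format risks
# the degraded results the harmony requirement warns about.
messages = [
    {"role": "system", "content": policy},
    {"role": "user", "content": content},
]

# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("openai/gpt-oss-safeguard-20b")
# inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
#                                  return_tensors="pt")
# model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-safeguard-20b",
#                                              torch_dtype="auto")
# output = model.generate(inputs, max_new_tokens=512)
```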


Evaluation Results

OpenAI evaluated the models on internal multi-policy tests and on public datasets. In multi-policy accuracy, where the model must correctly apply several policies at once, gpt-oss-safeguard and OpenAI's internal Safety Reasoner outperform gpt-5-thinking and the open gpt-oss baselines. On the 2022 moderation dataset, the new models slightly outperform both gpt-5-thinking and the internal Safety Reasoner, though OpenAI notes that the gap is not statistically significant, so it should not be oversold. On ToxicChat, the internal Safety Reasoner still leads, with gpt-oss-safeguard close behind. This places the open models in the competitive range for real moderation tasks.

OpenAI is explicit that running pure reasoning on every request is expensive. The recommended setup is to run small, fast, high-recall classifiers on all traffic, send only uncertain or sensitive content to gpt-oss-safeguard, and, when user experience requires fast responses, run the reasoner asynchronously. This mirrors OpenAI's own production guidance and reflects the fact that dedicated task-specific classifiers can still win when a large, high-quality labeled dataset is available.
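The layered routing described above can be sketched as a simple triage function. The thresholds and the keyword-based `cheap_score` are illustrative stand-ins for a real high-recall classifier; only the middle band of uncertain scores pays the cost of the reasoner.

```python
SAFE_BELOW, VIOLATION_ABOVE = 0.2, 0.9  # confidence band that skips escalation

def cheap_score(content: str) -> float:
    """Stand-in for a fast, high-recall classifier; returns risk in [0, 1]."""
    risky_terms = ("free money", "wire transfer", "password")
    return min(1.0, sum(term in content.lower() for term in risky_terms) * 0.5)

def route(content: str) -> str:
    """Triage: cheap decision when confident, reasoner only when uncertain."""
    score = cheap_score(content)
    if score < SAFE_BELOW:
        return "allow"      # confidently safe, no reasoner cost
    if score > VIOLATION_ABOVE:
        return "block"      # confidently violating
    return "escalate"       # uncertain: send to gpt-oss-safeguard, async if needed

decisions = [route("hello there"), route("free money via wire transfer now")]
```

Only the "escalate" branch would invoke the expensive policy-conditioned model, which is what keeps the overall compute budget manageable.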

Key Takeaways

  1. gpt-oss-safeguard is a research preview of two open-weight safety reasoning models, 120b and 20b, that classify content against developer-supplied policies at inference time, so policy changes do not require retraining.
  2. The models implement the same Safety Reasoner pattern OpenAI uses internally across GPT-5, ChatGPT Agent, and Sora 2, where a fast first-pass filter routes only risky or ambiguous content to a slower reasoning model.
  3. Both models are fine-tuned from gpt-oss, keep the harmony response format, and are sized for real deployments: the 120b model fits on a single H100-class GPU, the 20b model targets 16GB-level hardware, and both are Apache 2.0 licensed on Hugging Face.
  4. On internal multi-policy evaluations and on the 2022 moderation dataset, the safeguard models outperform gpt-5-thinking and the gpt-oss baselines, though OpenAI notes that the small margin over the internal Safety Reasoner is not statistically significant.
  5. OpenAI recommends using these models in a layered moderation pipeline, together with community resources such as ROOST, so platforms can express custom taxonomies, audit the chain of thought, and update policies without touching weights.

OpenAI is taking an internal safety pattern and making it reproducible, which is the most important part of this launch. The models are open-weight, policy-conditioned, and Apache 2.0 licensed, so platforms can finally apply their own taxonomies instead of accepting fixed labels. gpt-oss-safeguard matches, and sometimes slightly exceeds, the internal Safety Reasoner on the 2022 moderation dataset and outperforms gpt-5-thinking on multi-policy accuracy; even though the margin is not statistically significant, that is enough to show the approach is already usable. The recommended layered deployment is realistic for production.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.




