This page contains press release content distributed by XPR Media. Members of the editorial and news staff of the USA TODAY Network were not involved in the creation of this content.

Quesma Releases OTelBench: Independent Benchmark Reveals Frontier LLMs Struggle with Real-World SRE Tasks

New benchmark shows top LLMs achieve only 29% pass rate on OpenTelemetry instrumentation, exposing the gap between coding ability and real-world SRE work.

“OTelBench shows that while LLMs are impressive at generating code snippets, they’re not yet capable of the cross-cutting reasoning required for production engineering.”

— Jacek Migdał, founder of Quesma

WARSAW, POLAND, January 20, 2026 /EINPresswire.com/ — Quesma, Inc. announced the release of OTelBench, the first comprehensive benchmark for evaluating LLMs on OpenTelemetry instrumentation tasks. The open-source dataset tests 14 state-of-the-art models across 23 real-world tasks in 11 programming languages, revealing significant gaps in AI’s ability to handle production-grade Site Reliability Engineering (SRE) work.

While frontier LLMs have demonstrated impressive coding capabilities, the benchmark reveals a stark reality: the best-performing model, Claude Opus 4.5, achieved only a 29% pass rate on OpenTelemetry instrumentation tasks, compared with an 80.9% pass rate on SWE-Bench. This gap highlights a critical distinction between writing code and performing the complex, cross-cutting engineering work required for production systems.

The $1.4 Million Per Hour Problem
Enterprise outages cost an average of $1.4 million per hour, making production visibility mission-critical. Distributed tracing, the gold standard for debugging complex microservices, allows teams to link user actions to every underlying service call. However, implementing this visibility remains difficult, with 39% of organizations citing complexity as their top observability obstacle. OpenTelemetry has emerged as the industry standard with backing from 1,100+ organizations, yet configuring it correctly remains a major source of toil for SRE teams.

Fundamental Limitations Exposed
The benchmark tested models on agentic coding tasks where they were given source code from realistic applications, an interactive Linux terminal, and clear instrumentation objectives. The results revealed several critical failure modes:

Context propagation, passing trace context between services to maintain parent-child span relationships, proved to be an insurmountable barrier for most models. This is particularly concerning because context propagation is fundamental to distributed tracing.
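To illustrate what context propagation involves, the following is a minimal sketch using only the Python standard library. It is not the OpenTelemetry SDK and not code from OTelBench; the names SpanContext, start_span, inject, and extract are hypothetical helpers chosen for this illustration. The only real-world detail assumed is the W3C Trace Context "traceparent" header layout (version, trace ID, parent span ID, flags), which OpenTelemetry propagators commonly use.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# W3C Trace Context header: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical, simplified span identity (not the OpenTelemetry class)."""
    trace_id: str                         # 32 hex chars, shared by all spans in a trace
    span_id: str                          # 16 hex chars, unique per span
    parent_span_id: Optional[str] = None  # None for a root span


def start_span(parent: Optional[SpanContext] = None) -> SpanContext:
    """Start a root span, or a child span that inherits the parent's trace_id."""
    trace_id = parent.trace_id if parent else secrets.token_hex(16)
    parent_id = parent.span_id if parent else None
    return SpanContext(trace_id, secrets.token_hex(8), parent_id)


def inject(ctx: SpanContext, headers: dict) -> None:
    """Caller side: write the current span's identity into outgoing headers."""
    headers["traceparent"] = f"00-{ctx.trace_id}-{ctx.span_id}-01"


def extract(headers: dict) -> Optional[SpanContext]:
    """Callee side: recover the caller's span identity if a valid header exists."""
    match = TRACEPARENT_RE.match(headers.get("traceparent", ""))
    if match is None:
        return None  # no usable context: the callee will start an unrelated trace
    trace_id, span_id, _flags = match.groups()
    return SpanContext(trace_id, span_id)


# Service A handles a user request and calls service B over HTTP.
span_a = start_span()
outgoing_headers = {}
inject(span_a, outgoing_headers)

# Service B receives the request and continues the same trace.
span_b = start_span(parent=extract(outgoing_headers))

# If the header is never injected or extracted, service B's span is orphaned.
orphan = start_span(parent=extract({}))
```

In this sketch, span_b shares span_a's trace ID and records span_a as its parent, while the orphan span ends up in a separate trace. That broken parent-child link is the kind of failure described above, and in a real application both the inject and the extract steps must be wired into every service boundary, which is what makes the change cross-cutting.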

“The backbone of the software industry consists of complex, high-scale production systems with mission-critical reliability, and seasoned engineers are architecting, evolving, and troubleshooting them,” said Jacek Migdał, founder of Quesma. “OTelBench shows that while LLMs are impressive at generating code snippets, they’re not yet capable of the cross-cutting reasoning and sustained problem-solving required for production engineering. This gap matters because many vendors are marketing AI SRE solutions with bold claims but no independent verification. We need benchmarks like this to separate reality from hype.”

Language Ecosystems Matter
Success rates varied dramatically across programming languages, revealing that AI generalization is far weaker than that of human engineers. Models had moderate success with Go and, quite surprisingly, C++. A few tasks were completed for JavaScript, PHP, .NET, and Python. Just a single model solved a single task in Rust. None of the models solved a single task in Swift, Ruby, or, to our biggest surprise, Java (due to a build issue).

Why This Matters for AI Development
OTelBench reveals several reasons why OpenTelemetry instrumentation challenges current LLMs:
– Reliability-critical applications reside in private repositories at companies like Apple, Airbnb, and Netflix, limiting training data.
– Instrumentation requires cross-cutting changes across codebases, rather than sequential additions.
– Some tasks required 50+ commands over 10+ minutes. Models consistently performed worse as tasks lengthened.

Migdał added, “AI SRE in 2026 is what DevOps Anomaly Detection was in 2016—lots of marketing, huge budgets, but lacking independent benchmarks. Just as SWE-Bench became the standard for coding evaluation, we need SRE-style benchmarks to determine what actually works. That’s why we’re releasing OTelBench as open-source: to create a North Star for navigating the AI hype and to enable the community to track real progress.”

A Path Forward
Despite the challenges, the benchmark reveals promising signals. Claude Opus 4.5, GPT-5.2, and Gemini 3 models show capability on specific tasks, with the go-otel-microservices-traces task reaching a 52% pass rate. With more environments for Reinforcement Learning with Verifiable Rewards, OpenTelemetry instrumentation appears to be a solvable problem for future AI systems.

Until then, organizations requiring distributed tracing across services should expect to write that code themselves—or work alongside AI assistants that understand their limitations.

OTelBench is available today as an open-source project at https://quesma.com/benchmarks/otel/, enabling researchers and practitioners to reproduce results and contribute additional test cases.

Lucie Šimečková
Quesma
press@quesma.com

Legal Disclaimer:

EIN Presswire provides this news content “as is” without warranty of any kind. We do not accept any responsibility or liability
for the accuracy, content, images, videos, licenses, completeness, legality, or reliability of the information contained in this
article. If you have any complaints or copyright issues related to this article, kindly contact the author above.

Information contained on this page is provided by an independent third-party content provider. XPRMedia and this Site make no warranties or representations in connection therewith. If you are affiliated with this page and would like it removed please contact pressreleases@xpr.media
