AI Giants Pt. 3: OpenAI Sees Red

Dec 04, 2025

This is Part 3 of our AI Giants series, where we examine the successes and shortcomings of today’s largest AI firms. Explore our archive to read Part 1, covering Claude’s recent reliability crisis, and Part 2, exploring Google’s path to success in the AI industry.

OpenAI’s dominant position in AI faces its most serious challenge yet. On December 1, 2025, CEO Sam Altman issued an internal “code red” memo directing all resources toward improving ChatGPT amid intensifying pressure from Google’s Gemini 3 and Anthropic’s Claude. The declaration marks a dramatic reversal from December 2022, when Google issued its own code red in response to ChatGPT’s launch. The hunter has now become the hunted.

Despite commanding 800 million weekly active users and reaching $13 billion in annualized revenue, OpenAI’s technological lead has narrowed considerably. GPT-5 released in August 2025 to mixed reviews from general user bases, failing to deliver the transformative leap many expected. GPT-5.1, its November successor, addressed some criticisms but arrived just as Gemini 3 and Sonnet 4.5 began outperforming ChatGPT on key benchmarks. Now, with competitors stealing market share, OpenAI finds itself on the edge of a crisis. For a company valued at $500 billion and burning $8 billion annually, the stakes could not be higher.

Altman’s “Code Red” Reveals Competitive Anxiety

The December 1st memo landed with unusual urgency. According to The Information and Wall Street Journal, which viewed the internal communication, Altman declared: “We are at a critical time for ChatGPT.” The code red designation represented an escalation from the “code orange” declared in October by Nick Turley, head of ChatGPT, who warned the company would face “the greatest competitive pressure [it’s] ever seen.”

The immediate catalyst was Google’s mid-November release of Gemini 3, which surpassed GPT-5 on industry benchmarks for multimodal reasoning, mathematics, and code. Google’s model topped the LMArena leaderboard and outperformed GPT-5 on Humanity’s Last Exam. Within 24 hours, over one million users had tried Gemini 3, while the broader Gemini ecosystem had grown to 650 million monthly active users.

Altman’s response was swift and sweeping. The memo directed resources toward improving ChatGPT’s personalization, speed, reliability, and image generation capabilities, while putting several major initiatives on hold:

Advertising products currently in beta testing

Pulse, a personalized morning updates assistant

AI shopping and health agents

Autonomous AI agent development

The company also instituted daily calls for employees working on ChatGPT improvements and encouraged temporary team transfers to focus on the core product. Altman indicated that OpenAI would release a new reasoning model “next week” that beats Gemini 3 in internal evaluations. This promise frames the code red as both a defensive response and offensive preparation.

After GPT-5 Disappointment, GPT-5.1 Attempts Recovery

OpenAI’s flagship model journey in 2025 has been turbulent. GPT-5, released August 7th, was billed as the company’s first “unified” model that combined reasoning and general capabilities. The technical improvements were real: 94% on the AIME 2025 math benchmark, 74.9% on SWE-bench Verified for software engineering, and roughly 45% fewer hallucinations than GPT-4o.

But consumer reception proved rocky. Users described the model as “flat,” “clinical,” and “lobotomized.” Sam Altman acknowledged on X that “we for sure underestimated how much some of the things that people like in GPT-4o matter to them.” The backlash intensified when OpenAI forcibly migrated users to GPT-5, deleting access to legacy models like GPT-4o without warning. Thousands of users signed a petition demanding restoration. Within 24 hours, OpenAI partially reversed course.

GPT-5.1, released November 12, 2025, attempted to address these complaints with a warmer default tone and extensive personalization features allowing users to select from preset styles (Professional, Candid, Quirky, Nerdy, Cynical, Friendly, and Efficient). The model also introduced adaptive reasoning that dynamically adjusts thinking time based on task complexity, delivering 2-3x faster responses than GPT-5 with comparable or better quality. A week later, OpenAI released GPT-5.1-Codex-Max with novel “compaction” technology. Designed for complex, multi-hour coding sessions, this capability enabled coherent work across millions of tokens.

OpenAI’s Financial Balancing Act

OpenAI’s financial trajectory combines remarkable growth with staggering spending. Revenue doubled from $6 billion ARR in January 2025 to $13 billion by mid-year, crossing the $1 billion/month milestone in July. The company projects $100 billion in revenue by 2028-2029. Consumer subscriptions through ChatGPT (Plus at $20/month, Pro at $200/month) drive approximately 70% of revenue, with enterprise and API services contributing the remainder.

The user base has exploded to over one million paying business customers and roughly 15 million ChatGPT Plus subscribers. Enterprise seats grew 9x year-over-year, while API usage for coding and agent-building work more than doubled following GPT-5’s launch. Major enterprises including Morgan Stanley, T-Mobile, Target, and Cisco have all deployed OpenAI’s models at scale.

Yet profitability remains distant. First-half 2025 saw $2.5 billion in cash burn against $4.3 billion in revenue, with full-year losses projected at $8 billion. OpenAI has committed over $1 trillion in infrastructure spending over the next decade, including the $500 billion Stargate Project with SoftBank and Oracle, a $300 billion Oracle cloud deal, and a $38 billion AWS partnership announced in November. Microsoft, which has invested over $14 billion, reportedly lost $3.1 billion on its OpenAI stake in fiscal Q1 2025 alone.

The October 2025 funding round valued OpenAI at $500 billion (up from $157 billion just a year earlier), making it the world’s most valuable private company. But this valuation requires extraordinary growth: the company must reach $200 billion in revenue by 2030 to achieve profitability, per internal projections. An IPO at potentially $1 trillion valuation looms for 2026 or 2027, assuming the company can demonstrate a credible path to sustainable profits.

Stacking Up Against Claude and Gemini

*Note: In lieu of OpenAI’s announcement of a new reasoning model meant to compete with Opus 4.5 and Gemini 3.0, we chose to compare ChatGPT 5/5.1 to Claude Sonnet 4.5.

OpenAI faces distinct challenges from its two primary competitors. Against Anthropic’s Claude, the battle is for enterprise hearts and wallets. Anthropic’s revenue surged from $1 billion in December 2024 to $4-5 billion ARR by mid-2025. More troubling for OpenAI, Anthropic now commands 42% market share in coding versus OpenAI’s 21%, while also growing its enterprise market share to 32% compared OpenAI’s decline to 25%. Additionally, Anthropic generates $211 in revenue per user compared to OpenAI’s $25, a reflection of its enterprise-first strategy versus OpenAI’s consumer focus.

Claude Sonnet 4.5 currently leads GPT models in the crucial SWE-bench Verified benchmark at 77.2% compared to GPT-5.1’s 76.3%. On OSWorld (computer use tasks), ClaudeSonnet 4.5 achieves 66.3%, giving it a massive lead over OpenAI’s models. Claude can also maintain autonomous focus for 30+ hours on complex multi-step tasks, making it preferred for long-running agentic coding work.

Against Google’s Gemini, the threat is an existential distribution advantage. Google can integrate AI across Chrome, Android, Gmail, YouTube, and Search, reaching billions of users without acquisition costs. While it was Gemini 3’s benchmark dominance that triggered Altman’s code red, Google’s deeper advantage is economic: a cash-rich core business that can subsidize free AI offerings indefinitely, plus custom TPU chips reducing dependence on Nvidia.

OpenAI’s strategic responses include deepening enterprise tooling (AgentKit for agent development), expanding the product ecosystem (Sora video generation, shopping features), and building infrastructure independence through massive data center investments. The AWS deal notably diversifies cloud partnerships beyond Microsoft, suggesting some strategic hedging.

What the Benchmarks Reveal

On software development benchmarks specifically, the competition is closer than headlines suggest. GPT-5.1 achieves 76.3% on SWE-bench Verified versus Claude Sonnet 4.5’s 77.2% – a gap of roughly 9 additional issue resolutions per 1,000 attempts. On Aider Polyglot (multi-language code editing), GPT-5 scores 88%, among the highest recorded.

GPT-5’s advantages lie in pricing (58% cheaper on input tokens at $1.25/million versus $3.00), mathematical reasoning (94.6% AIME 2025, 87.3% GPQA Diamond), and speed(GPT-5.1 delivers 2-3x faster responses). The model family also offers cost-efficient variants (GPT-5-mini at $0.25/million input tokens and GPT-5-nano at $0.05), enabling high-volume deployments that would be prohibitively expensive with Claude.

Developer opinion splits along use-case lines. Cursor’s CEO Michael Truell called GPT-5 “the smartest coding model we’ve ever tried,” while Qodo found it “led in catching coding mistakes... often the only one to catch critical issues, such as security bugs.” But for long-running autonomous tasks and complex refactoring, developers consistently prefer Claude’s stability and consistency.

Consistently Inconsistent User Sentiment

Enterprise users have generally embraced GPT-5. Box CEO Aaron Levie called it “a breakthrough... performing with a level of reasoning that prior systems couldn’t match.” The Latent Space newsletter declared it “the closest to AGI we’ve ever been.” API usage growth and enterprise adoption metrics support these endorsements.

Consumer sentiment tells a different story. Trustpilot reviews from November 2025 describe GPT-5 as “a paid beta disguised as a finished product,” with complaints about tools crashing mid-task, output loops, and unreliable memory features. The forced legacy model deprecation sparked lasting resentment. Only 5% of ChatGPT’s 800 million weekly users pay for subscriptions, suggesting most users find the free tier sufficient or remain unconvinced of premium value.

Legal challenges mount as well. Seven lawsuits filed in California courts in late 2025 allege ChatGPT “drove people to suicide and harmful delusions,” claiming OpenAI “knowingly released GPT-4o prematurely” despite internal safety warnings. Public Citizen demanded withdrawal of Sora 2, citing “a consistent and dangerous pattern of OpenAI rushing to market with a product that is either inherently unsafe or lacking in needed guardrails.”

Industry analysts remain measured. MIT Technology Review characterized GPT-5 as “above all else, a refined product” with “a more pleasant and seamless user experience... but it falls far short of the transformative AI future that Altman has spent much of the past year hyping.” AI critic Gary Marcus noted that “GPT-5 is barely better than last month’s flavor of the month,” and highlighted that Polymarket odds for OpenAI having the “best AI model at end of August” dropped from 75% to 14% within an hour of GPT-5’s launch.

The Road Ahead: Execution Under Pressure

OpenAI’s December 2025 situation crystallizes a company at an inflection point. The technological moat that once seemed insurmountable has eroded. Competitors have caught up on capabilities while developing distinct advantages; Anthropic in enterprise reliability and coding precision, Google in distribution and infrastructure economics.

The code red response shows that OpenAI recognizes the urgency. Delaying advertising and autonomous agents to focus on core ChatGPT improvements represents a strategic choice to protect the primary revenue engine. The promised new reasoning model that beats Gemini 3 would demonstrate continued technical leadership if delivered.

Yet the fundamental challenges persist. A $500 billion valuation demands extraordinary growth against competitors with deeper pockets (Google) and higher-margin business models (Anthropic). The path to 2029 profitability requires revenue to grow roughly 15x while managing trillion-dollar infrastructure commitments. The race to AGI that Altman has championed for years now includes well-funded competitors on every side.

The reversal from Google’s 2022 code red to OpenAI’s 2025 code red serves as a tale of both caution and possibility. Three years ago, OpenAI proved that an underdog could reshape the industry. Today, the question becomes whether it can sustain leadership when the giants have mobilized and the upstarts have sharpened their focus. As competitors close the gap, OpenAI faces a steep upward battle to protect its market share. Only time will tell if they make the right moves to stay on top.

This article was written by Max Kozhevnikov, Data and Software Engineer at Frontier Foundry. Visit his LinkedIn here.

To stay up to date with our work, visit our website, or follow us on LinkedIn, X, and Bluesky. To learn more about the services we offer, please visit our product page.

Discussion about this post

Ready for more?