GPT-5

Show HN: Codex context bloat? 87% avg reduction on SWE-bench Verified traces

If you had to build a context window manager in 24h, would you stick to the existing model or come up with something better? Here's what I did:
1. Built a proxy that intercepts Codex's calls to OpenAI and rewrites them on the fly.
2. Replayed 3,807 rounds of SWE-bench Verified traces through it: avg prompt 44k → 6k tokens (-87%).
3. Posted it to HN to get the next reduction applied to my confidence interval, starting with the inevitable "How about accuracy?"
npx -y pando-proxy ·

Show HN: A CLI to use any model in your coding agent

Hi everyone, I've been working on a CLI tool that makes it easy to run any model in Claude, Codex, Gemini, Pi, and OpenCode. It's also an API key manager that supports multiple providers or OpenAI/Claude/Gemini accounts. You can add OpenRouter, Poe, Vercel AI gateways etc. It has a built-in provider that is free to all and uses Deepseek-V4; no login or API key is required, and you can add your own when you're ready. After installation you can try claude instantly (No config, no logi

Show HN: VT Code – Rust TUI coding agent with multi-provider support

Hi HN, I built VT Code, a semantic coding agent. It supports all SOTA and open source models: Anthropic, OpenAI, Gemini, Codex. Agent Skills, Model Context Protocol and Agent Client Protocol (ACP) ready. All open source models are supported. Local inference via LM Studio and Ollama (experimental). Semantic context understanding is supported by ast-grep for structured code search and ripgrep for fast text search. I built VT Code in Rust on Ratatui. Architecture and agent loop documented in the README and Dee

Ask HN: How are you evaluating AI apps and CLI?

I'm sure many of you work for companies where various AI tools are being made available and IT departments are asking for feedback on those tools. The IT departments are in some cases allocating unlimited budget in the hope that something emerges as a winner and eventually sticks...
For example, the models from Anthropic, OpenAI, Google etc. can be accessed via:
- IDE integration, e.g. VS Code, JetBrains etc.
- Dedicated apps and CLIs, e.g. Codex, Claude, Copilot CLI etc.
It's already

Ask HN: Is the ongoing AI research driving LLM models to be better?

I'm just a curious hobbyist who has run LLMs locally and follows a lot of content about them. I hope we have a few AI researchers here on HN to clarify this. When using Opus or Codex vs. a Chinese or open source model, it feels like their reasoning capabilities are basically the same. The difference is typically in coding. It looks like OpenAI and Anthropic invest a lot in pre-training (paying Mercor and the like). Also a lot in creating synthetic data, I believe this has bigger AI research in

Ask HN: What does your agentic software dark factory look like?

In some of the comment threads around here a few of you shared interesting ideas and patterns, enough that I believe everyone interested in harness engineering is working on some sort of software dark factory or another. We have OpenAI’s Symphony[1], StrongDM’s Factory[2], Yegge’s GasTown[3], and probably a few others I’ve missed. So I’m curious. What have you been working on? What have you learned? What has worked and what has failed? And what do you think comes after? I’ll go first. The first thing

Show HN: Agent MCP Studio – build multi-agent MCP systems in a browser tab

I built a browser-only studio for designing and orchestrating MCP agent systems for development and experimental purposes. The whole stack (tool authoring, multi-agent orchestration, RAG, code execution) runs from a single static HTML file via WebAssembly. No backend. The bet: WASM is a hard sandbox for free. When you generate tools with an LLM (or write them by hand), the studio AST-validates the source, registers it lazily, and JIT-compiles into Pyodide on first call. SQL tools run in DuckDB-

Google Plans to Invest Up to $40 Billion in Anthropic

Google will invest $10 billion in Anthropic PBC, with another $30 billion potentially to follow, strengthening the ...

Straight out of sci-fi: Anthropic makes Trump blink in AI showdown

This is especially so because this week in the Trump administration-Anthropic showdown — after Amodei met with White House ...

Spotify isn’t the only service now integrated with Anthropic’s Claude

This week, Spotify released new AI features that work inside Anthropic’s Claude app. Spotify isn’t the only service with ...

Anthropic says Claude Code did get worse — but shoots down speculation it 'nerfed' the model

Anthropic said it found three issues with Claude Code after users complained the AI tool deteriorated.

With jaw-dropping $1 trillion valuation, Anthropic overtakes OpenAI in market cap race

Buyers scooping up coveted Anthropic shares have vaulted the AI giant’s valuation on some trading platforms to $1 trillion – ...

Amazon's Partnership With Anthropic Keeps Deepening. Is This the Catalyst Amazon Stock Needs?

The e-commerce and cloud computing giant's recent $25 billion investment plan in Anthropic captures its aggressive push into ...

Remodex Is the Best Codex Remote Client for iOS (Until OpenAI Releases an Official Codex Mobile App)

Various OpenAI employees and members of the Codex team have been hinting at a native Codex app for iOS lately. While I very ...

OpenAI brings GPT-5.5 to Codex for coding tasks

OpenAI is rolling out GPT-5.5 in Codex, with a 400K context window and higher coding benchmark scores than GPT-5.4.

OpenAI GPT-5.5 Codex deployment on NVIDIA systems reduces costs by 35x

Nvidia says OpenAI's Codex can deliver serious efficiency and cost savings, making enterprise AI viable at scale.

Nvidia deploys OpenAI's Codex across 10,000 employees as Jensen Huang hails 'age of AI,' Sam Altman says 'it was awesome'

Nvidia Corp. (NVDA) has rolled out OpenAI's Codex coding agent across its global workforce, with CEO Jensen Huang hailing the ...

Microsoft CEO Satya Nadella says GPT-5.5 improves Copilot across GitHub, M365, Studio and Foundry

Microsoft CEO Satya Nadella says GPT-5.5 is improving Copilot experiences across GitHub, Microsoft 365, Copilot Studio and ...

OpenAI releases GPT-5.5 with advanced math, coding capabilities

OpenAI says it has already put GPT-5.5’s coding skills to use internally. The LLM helped optimize the software that manages ...

OpenAI Just Dropped GPT-5.5 and Says the New Model Intuits What You Need Before You Ask

In a pre-release briefing, OpenAI president and cofounder Greg Brockman said that GPT-5.5 is “way more intuitive to use,” ...