GPT-5

OpenAI releases GPT-5.5, outperforming GPT-5.4 for paid ChatGPT users

OpenAI has launched GPT-5.5, outperforming GPT-5.4 and available to paid ChatGPT users. The release locks in a 100% YES ...

New AGI benchmark reveals shocking gaps: Why leading AI models like GPT-4, Claude & Gemini struggled

Discover the latest breakthrough in Artificial General Intelligence testing as we explore a newly released AGI benchmark that ...

Show HN: Minimal Linux sandboxes to manage AI-Generated Code with ease

Minimal Linux sandboxes for running untrusted code. Built for AI agents, build systems, and any scenario where you need to execute code you didn't write.

Show HN: Need Human Lawyer – when AI for legal work isn't enough

This idea came from real life. I was doing fine being my own lawyer but I reached that point where I really needed to cc someone to make my point. There's also the case of when you just get too deep into legal issues and really do need a human with a law degree helping. The idea is anyone can send an email or cc an existing thread to: help@needhumanlawyer.com. This will open a request file and reply-all to everyone on the email. Not quite the same as cc perry.mason@famous-firm.com but still, i

Show HN: Lightport – open-source AI gateway

Hey HN! I am the founder of Glama. We are making Lightport open-source – it's the AI gateway that's been powering Glama. GitHub: https://github.com/glama-ai/lightport Live: https://glama.ai/ai/gateway Why? We're going all-in on the MCP ecosystem – it's what we're best at. Open-sourcing the gateway is both a thank-you to the community that helped us grow and a way to keep us focused. The short backstory: Lightport began as a fork of Portkey

Show HN: Modern alternative to Google Dictionary, AI-powered and context-aware

I kept losing my reading flow every time I hit an unfamiliar word. The usual fix: open a new tab, search, scroll past ads, come back. Costs about 30 seconds of focus each time. Multiply that by 10 lookups in one article and it adds up fast. Google Dictionary extension solved the tab-switching problem but never went further than static definitions. I wanted something smarter. So I built QuickDef, a Chrome extension that sends the surrounding sentence to GPT-4o-mini alongside the word, so the defini

Show HN: Codex context bloat? 87% avg reduction on SWE-bench Verified traces

If you had to build a context window manager in 24h, would you stick to the existing model or come up with something better? Here's what I did: 1. Built a proxy that intercepts Codex's calls to OpenAI and rewrites them on the fly. 2. Replayed 3,807 rounds of SWE-bench Verified traces through it: avg prompt 44k → 6k tokens (-87%). 3. Posted it to HN to get the next reduction applied to my confidence interval, starting with the inevitable "How about accuracy?" npx -y pando-proxy ·

Show HN: A CLI to use any model in your coding agent

Hi everyone, I've been working on a CLI tool that can help to easily run any model in claude, Codex, Gemini, Pi, and OpenCode. It's also an API keys manager, supports multiple providers or OpenAI/Claude/Gemini accounts. You can add openrouter, poe, Vercel AI gateways etc. It has a built-in provider that is free to all, which is using Deepseek-V4, no login or API key required, add your own when you're ready. After installation you can try claude instantly (No config, no logi

Show HN: VT Code – Rust TUI coding agent with multi-provider support

Hi HN, I built VT Code, a semantic coding agent. Supports all SOTA and open-source models. Anthropic, OpenAI, Gemini, Codex. Agent Skills, Model Context Protocol and Agent Client Protocol (ACP) ready. All open-source models are supported. Local inference via LM Studio and Ollama (experimental). Semantic context understanding is supported by ast-grep for structured code search and ripgrep for text search. I built VT Code in Rust on Ratatui. Architecture and agent loop documented in the README and Dee

Ask HN: How are you evaluating AI apps and CLI?

I'm sure many of you work for companies where various AI tools are being made available and IT departments are asking for feedback on those tools. The IT departments are allocating in some cases unlimited budget in the hopes that something comes out as a winner and sticks eventually... For example, the models from Anthropic, OpenAI, Google etc. can be accessed via: - IDE integration, e.g. VS Code, JetBrains etc. - Dedicated apps and CLIs, e.g. Codex, Claude, Copilot CLI etc. It's already

Ask HN: Is the ongoing AI research driving LLM models to be better?

I'm just a curious hobbyist who has run LLM models locally and follows a lot of content about them. I hope we have a few AI researchers here on HN to clarify this. When using Opus or Codex vs. a Chinese or open-source model, it feels like their reasoning capabilities are basically the same. The difference is typically in coding. It looks like OpenAI and Anthropic invest a lot in pre-training (paying Mercor and the like). Also a lot in creating synthetic data, I believe this has bigger AI research in

Ask HN: What does your agentic software dark factory look like?

In some of the comment threads around here a few of you shared interesting ideas and patterns, enough that I believe everyone interested in harness engineering is working on some sort of software dark factory or another. We have OpenAI’s Symphony[1], StrongDM’s Factory[2], Yegge’s GasTown[3], and probably a few others I’ve missed. So I’m curious. What have you been working on? What have you learned? What has worked and what has failed? And what do you think comes after? I’ll go first. The first thing

Show HN: Agent MCP Studio – build multi-agent MCP systems in a browser tab

I built a browser-only studio for designing and orchestrating MCP agent systems for development and experimental purposes. The whole stack (tool authoring, multi-agent orchestration, RAG, code execution) runs from a single static HTML file via WebAssembly. No backend. The bet: WASM is a hard sandbox for free. When you generate tools with an LLM (or write them by hand), the studio AST-validates the source, registers it lazily, and JIT-compiles into Pyodide on first call. SQL tools run in DuckDB-

Straight out of sci-fi: Anthropic makes Trump blink in AI showdown

This is especially so because this week in the Trump administration-Anthropic showdown — after Amodei met with White House ...

Google Plans to Invest Up to $40 Billion in Anthropic

Google will invest $10 billion in Anthropic PBC, with another $30 billion potentially to follow, strengthening the ...

Spotify isn’t the only service now integrated with Anthropic’s Claude

This week, Spotify released new AI features that work inside Anthropic’s Claude app. Spotify isn’t the only service with ...

Anthropic says Claude Code did get worse — but shoots down speculation it 'nerfed' the model

Anthropic said it found three issues with Claude Code after users complained the AI tool deteriorated.

Amazon's Partnership With Anthropic Keeps Deepening. Is This the Catalyst Amazon Stock Needs?

The e-commerce and cloud computing giant's recent $25 billion investment plan in Anthropic captures its aggressive push into ...

With jaw-dropping $1 trillion valuation, Anthropic overtakes OpenAI in market cap race

Buyers scooping up coveted Anthropic shares have vaulted the AI giant’s valuation on some trading platforms to $1 trillion – ...

Remodex Is the Best Codex Remote Client for iOS (Until OpenAI Releases an Official Codex Mobile App)

Various OpenAI employees and members of the Codex team have been hinting at a native Codex app for iOS lately. While I very ...