GPT-5

Claude AI integrates with major creative software via new connectors

Anthropic has launched nine new connectors allowing its Claude AI assistant to work directly inside popular creative tools like Photoshop, Blender, and Autodesk Fusion. The integrations, powered by ...

Claude AI Goes Down for Thousands of Users Tuesday, Downdetector Shows

Learn about the recent Claude AI outage affecting users. Read more on the status and issues reported with Claude services.

OpenAI’s GPT-5.3-Codex Wants to be More than a Coding Copilot

OpenAI is pitching GPT-5.3-Codex as a long-running “agent,” not just a code helper: The company says the model combines GPT-5.2-Codex coding strength with GPT-5.2 reasoning and professional knowledge, ...

A GPT-5.4 bug led to OpenAI banning goblins and raccoons

Someone found this in OpenAI Codex’s system prompt: "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously relevant to the user’s query." Goblins, gremlins, trolls, ogres - the fantasy quartet, fine. I get it. Perhaps someone asked Codex to write something and got back a Runescape goblin lore deep-dive. But pigeons? Raccoons? These are deployment-environment animals. If not for raccoons, who’d do the ga

Help a fellow dev on AI-localization?

We built an AI-based localization pipeline for our software product (HR domain) and would love feedback/suggestions from others working in production MT/localization, so that we can learn and improve. Current methodology: GPT-5-nano forward translation + back-translation; text-embedding-3-small cosine similarity on source vs. back-translated text. Threshold: ≥0.92 = auto-approved. On a recent ~970-string Spanish localization run: ~75% of strings passed automatically. We then had two human tran
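
The approval step described in this post (cosine similarity between the source and back-translated embeddings, auto-approved at ≥0.92) could be sketched roughly as follows. This is a minimal illustration, not the poster's code: the function names are hypothetical, and the embedding vectors are assumed to have been computed elsewhere and passed in as plain lists of floats.

```python
import math
from typing import Sequence

# Threshold quoted in the post; everything else here is an illustrative assumption.
AUTO_APPROVE_THRESHOLD = 0.92


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as "not similar".
        return 0.0
    return dot / (norm_a * norm_b)


def is_auto_approved(
    source_vec: Sequence[float],
    back_translated_vec: Sequence[float],
    threshold: float = AUTO_APPROVE_THRESHOLD,
) -> bool:
    """Hypothetical gate: approve when similarity meets or exceeds the threshold."""
    return cosine_similarity(source_vec, back_translated_vec) >= threshold
```

Strings that fail the gate would go to human review, which matches the split between automatic passes and human checks that the post describes.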

Show HN: A new benchmark for testing LLMs for deterministic outputs

When building workflows that rely on LLMs, we commonly use structured output for programmatic use cases like converting an invoice into rows or meeting transcripts into tickets or even complex PDFs into database entries. The model may return the schema you want, but with hallucinated values like `invoice_date` being off by 2 months or the transcript array ordered wrongly. The JSON is valid, but the values are not. Structured output today is a big part of using LLMs, especially when building determ
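
The failure mode described here (schema-valid JSON carrying wrong values) is only caught by comparing values, not structure. A minimal sketch of such a check, assuming a known-good reference record is available; the function name and sample data are hypothetical and are not taken from the benchmark itself:

```python
from typing import Any

# Sentinel marking a key that is absent from one of the two records.
_MISSING = object()


def field_mismatches(
    expected: dict[str, Any], actual: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Map each differing key to its (expected, actual) pair.

    Lists are compared with ==, so a reordered array counts as a mismatch.
    """
    mismatches: dict[str, tuple[Any, Any]] = {}
    for key in sorted(expected.keys() | actual.keys()):
        exp = expected.get(key, _MISSING)
        act = actual.get(key, _MISSING)
        if exp != act:
            mismatches[key] = (exp, act)
    return mismatches


# Illustrative data: both records have the same shape, but the values differ.
reference = {"invoice_date": "2024-03-01", "speakers": ["ana", "ben"]}
model_output = {"invoice_date": "2024-05-01", "speakers": ["ben", "ana"]}

for field, (want, got) in field_mismatches(reference, model_output).items():
    print(f"{field}: expected {want!r}, got {got!r}")
```

Exact equality is the strictest possible comparison; a real evaluation might allow tolerances for numeric or free-text fields.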

Why Codex works better than Claude Code for my production monolith

Over the last year I mostly used Codex, but during the last month I tried Claude Code with Opus 4.6 and 4.7. These are my notes. This is not a benchmark. It is just my experience from daily use on one production codebase. For some medium-complexity tasks, I also ran both tools with the same prompts, but I did not try to make this a controlled evaluation. TL;DR: for my production Python monolith, I still prefer Codex. The codebase is a many-years-old Python backend. It has several architectural laye

OpenAI releases GPT-5.5, outperforming GPT-5.4 for paid ChatGPT users

OpenAI has launched GPT-5.5, outperforming GPT-5.4 and available to paid ChatGPT users. The release locks in a 100% YES ...

New AGI benchmark reveals shocking gaps: Why leading AI models like GPT-4, Claude & Gemini struggled

Discover the latest breakthrough in Artificial General Intelligence testing as we explore a newly released AGI benchmark that ...

Show HN: Minimal Linux sandboxes to manage AI-Generated Code with ease

Minimal Linux sandboxes for running untrusted code. Built for AI agents, build systems, and any scenario where you need to execute code you didn't write.

Show HN: Need Human Lawyer – when AI for legal work isn't enough

This idea came from real life. I was doing fine being my own lawyer but I reached that point where I really needed to cc someone to make my point. There's also the case of when you just get too deep into legal issues and really do need a human with a law degree helping. The idea is anyone can send an email or cc an existing thread to: help@needhumanlawyer.com. This will open a request file and reply-all to everyone on the email. Not quite the same as cc perry.mason@famous-firm.com but still, i

Show HN: Lightport – open-source AI gateway

Hey HN! I am the founder of Glama. We are making Lightport open-source; it's the AI gateway that's been powering Glama. GitHub: https://github.com/glama-ai/lightport Live: https://glama.ai/ai/gateway Why? We're going all-in on the MCP ecosystem; it's what we're best at. Open-sourcing the gateway is both a thank-you to the community that helped us grow and a way to keep us focused. The short backstory: Lightport began as a fork of Portkey

Show HN: Modern alternative to Google Dictionary, AI-powered and context-aware

I kept losing my reading flow every time I hit an unfamiliar word. The usual fix: open a new tab, search, scroll past ads, come back. Costs about 30 seconds of focus each time. Multiply that by 10 lookups in one article and it adds up fast. Google Dictionary extension solved the tab-switching problem but never went further than static definitions. I wanted something smarter. So I built QuickDef, a Chrome extension that sends the surrounding sentence to GPT-4o-mini alongside the word, so the defini

Show HN: Codex context bloat? 87% avg reduction on SWE-bench Verified traces

If you had to build a context window manager in 24h, would you stick to the existing model or come up with something better? Here's what I did: 1. Built a proxy that intercepts Codex's calls to OpenAI and rewrites them on the fly. 2. Replayed 3,807 rounds of SWE-bench Verified traces through it: avg prompt 44k → 6k tokens (-87%). 3. Posted it to HN to get the next reduction applied to my confidence interval, starting with the inevitable "How about accuracy?" npx -y pando-proxy ·

Show HN: A CLI to use any model in your coding agent

Hi everyone, I've been working on a CLI tool that can help to easily run any model in claude, Codex, Gemini, Pi, and OpenCode. It's also an API keys manager, supports multiple providers or OpenAI/Claude/Gemini accounts. You can add openrouter, poe, Vercel AI gateways etc. It has a built-in provider that is free to all, which is using Deepseek-V4, no login or API key required, add your own when you're ready. After installation you can try claude instantly (No config, no logi

Show HN: VT Code – Rust TUI coding agent with multi-provider support

Hi HN, I built VT Code, a semantic coding agent. Supports all SOTA and open-source models. Anthropic, OpenAI, Gemini, Codex. Agent Skills, Model Context Protocol and Agent Client Protocol (ACP) ready. All open-source models are supported. Local inference via LM Studio and Ollama (experimental). Semantic context understanding is supported by ast-grep for structured code search and ripgrep for powered grep. I built VT Code in Rust on Ratatui. Architecture and agent loop documented in the README and Dee

Ask HN: How are you evaluating AI apps and CLI?

I'm sure many of you work for companies where various AI tools are being made available and IT departments are asking for feedback on those tools. The IT departments are allocating in some cases unlimited budget in the hopes that something comes out as a winner and sticks eventually... For example the models from Anthropic, OpenAI, Google etc. can be accessed via: - IDE integration, e.g. VS Code, JetBrains etc. - Dedicated apps and CLIs, e.g. Codex, Claude, Copilot CLI etc. It's already

Ask HN: Is the ongoing AI research driving LLM models to be better?

I'm just a curious hobbyist who has run LLM models locally and follows a lot of content about it. Hope we have a few AI researchers here on HN to clarify this. When using Opus or Codex vs. a Chinese or open-source model, it feels like its reasoning capabilities are basically the same. The difference is typically in coding. It looks like OpenAI and Anthropic invest a lot in pre-training (paying Mercor and the like). Also a lot in creating synthetic data, I believe this has bigger AI research in

Ask HN: What does your agentic software dark factory look like?

In some of the comment threads around here a few of you shared interesting ideas and patterns, enough that I believe everyone interested in harness engineering is working on some sort of software dark factory or another. We have OpenAI’s Symphony[1], StrongDM’s Factory[2], Yegge’s GasTown[3], and probably a few others I’ve missed. So I’m curious. What have you been working on? What have you learned? What has worked and what has failed? And what do you think comes after? I’ll go first. The first thing

Show HN: Agent MCP Studio – build multi-agent MCP systems in a browser tab

I built a browser-only studio for designing and orchestrating MCP agent systems for development and experimental purposes. The whole stack (tool authoring, multi-agent orchestration, RAG, code execution) runs from a single static HTML file via WebAssembly. No backend. The bet: WASM is a hard sandbox for free. When you generate tools with an LLM (or write them by hand), the studio AST-validates the source, registers it lazily, and JIT-compiles into Pyodide on first call. SQL tools run in DuckDB-