GPT-5

OpenAI announces GPT-5.3 Instant with fewer refusals, improved accuracy and smoother conversations

OpenAI has introduced GPT-5.3 Instant, an update to its most-used ChatGPT model that aims to deliver smoother conversations, fewer unnecessary refusals, improved factual accuracy and better responses when using web information.

OpenAI launches GPT-5.3 Instant for ChatGPT with fewer refusals and improved accuracy

OpenAI has launched GPT-5.3 Instant for ChatGPT, promising fewer unnecessary refusals, improved contextual understanding and reduced hallucinations, as the company faces scrutiny over its Pentagon deal while rival Anthropic remains locked in a dispute with the US defence department.

GPT-5.3 Instant: 5 new upgrades that are important for you

OpenAI dropped a significant update to ChatGPT amid the backlash it has been receiving over its DoW deal. GPT-5.3 Instant, the new version of its most widely used everyday model, went live on March 3, and it's one of the more user-focused releases the company has made in a while. Forget benchmark scores:

OpenAI launches GPT-5.3 Instant to improve ChatGPT’s most-used model

OpenAI’s GPT-5.3 Instant improves ChatGPT’s most-used model with fewer hallucinations, better web synthesis, and smoother conversations.

OpenAI Upgrades Prism With Codex CLI and GPT-5.3

OpenAI upgrades Prism with Codex CLI and GPT-5.3, letting researchers run code, compile LaTeX, and refine papers inside one ...

OpenAI releases GPT-5.3 Instant update to make ChatGPT less ‘cringe’

OpenAI has released an update to ChatGPT that it says should make its most commonly used model less “cringe” and more natural. Users should see fewer overly dramatic, jarring responses as a result.

With GPT-5.4, OpenAI promises fewer errors, preps for autonomous agents

This week's second new model from OpenAI is built for more complex tasks than GPT-5.3 Instant.

OpenAI’s new GPT-5.4 thinking lets you take the wheel mid-response

The post OpenAI’s New GPT-5.4 Thinking Lets You Take the Wheel Mid-Response appeared first on Android Headlines.

Validation pipeline that blocks AI-generated files with schema errors

Every time I used an LLM to generate structured knowledge files, the output would drift: wrong enum values, missing fields, dates in the wrong format, tags as strings instead of arrays. The files looked fine until something downstream broke: a Dataview query returning nothing, a CI check failing, a search index getting corrupted. The standard fix is post-hoc validation: check after writing, fix manually. That doesn't scale past a few dozen files. So I built a pipeline where the commit gate is the
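The post is cut off before it describes the pipeline itself, so the following is only a minimal Python sketch of a commit gate, assuming each knowledge file's frontmatter has already been parsed into a dict. The schema fields (`title`, `status`, `created`, `tags`) and the helper names `validate` and `gate` are hypothetical; they simply encode the four failure modes listed above: wrong enum values, missing fields, dates in the wrong format, and tags as strings instead of arrays.

```python
import re
from datetime import date

# Hypothetical schema for one knowledge file's frontmatter. Field names,
# enum values, and the date format are illustrative assumptions.
SCHEMA = {
    "title": {"type": str, "required": True},
    "status": {"type": str, "required": True, "enum": {"draft", "review", "published"}},
    "created": {"type": str, "required": True, "format": "date"},
    "tags": {"type": list, "required": True, "item_type": str},
}

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def validate(record: dict) -> list[str]:
    """Return a list of schema errors; an empty list means the record passes."""
    errors = []
    for field, rule in SCHEMA.items():
        if field not in record:
            if rule.get("required"):
                errors.append(f"{field}: missing required field")
            continue
        value = record[field]
        # Type check first: catches e.g. tags given as "a,b" instead of a list.
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}, got {type(value).__name__}")
            continue
        if "enum" in rule and value not in rule["enum"]:
            errors.append(f"{field}: {value!r} not in {sorted(rule['enum'])}")
        if rule.get("format") == "date":
            if not DATE_RE.match(value):
                errors.append(f"{field}: {value!r} is not YYYY-MM-DD")
            else:
                # The regex accepts 2024-02-30; fromisoformat rejects it.
                try:
                    date.fromisoformat(value)
                except ValueError:
                    errors.append(f"{field}: {value!r} is not a real calendar date")
        if "item_type" in rule and not all(isinstance(x, rule["item_type"]) for x in value):
            errors.append(f"{field}: every item must be {rule['item_type'].__name__}")
    return errors


def gate(records: dict[str, dict]) -> int:
    """Commit-gate entry point: print errors per file, return 1 if any file fails."""
    failed = 0
    for path, record in records.items():
        for err in validate(record):
            print(f"{path}: {err}")
            failed += 1
    return 1 if failed else 0
```

A pre-commit hook could call `gate` on the staged files and abort the commit when it returns 1, so malformed files are blocked before they land rather than fixed after the fact.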

Show HN: Meshcraft – Text-to-3D and image-to-3D with selectable AI engines

Hey HN, I built Meshcraft, a web-based tool that generates 3D models (GLB) from text prompts or images.
What's new since the first Show HN (Feb): Back then it was a basic TripoSR wrapper. A commenter here (thanks vunderba) pointed me to Trellis 2, which was vastly better. Since then I've rebuilt the whole thing:
- Two 3D engines: Standard (Trellis 2 via HuggingFace ZeroGPU) and Premium (Hunyuan v3.1 Pro via fal.ai). Standard is free, Premium costs 50 credits and produces ~1.4M face mode

Show HN: Claude-consensus – Multi-model code review plugin for Claude Code

It's a Claude Code plugin that runs multiple AI models (GPT, Gemini, Grok, Kimi, Qwen, etc.) in parallel for code review and planning, then converges them on consensus through structured rounds. Each model reviews independently with no visibility into what the others found. Then they synthesize, surface conflicts, and run convergence (approve / changes needed, max 2 rounds). Technically it's markdown command files orchestrating Claude Code's team system: no custom runtime, jus
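The round structure described above can be modeled in a few lines of Python. This is a hypothetical sketch, not the plugin's code (which, per the post, is markdown command files driving Claude Code's team system). Each reviewer is assumed to be a callable returning a verdict plus findings; the first pass is independent, and up to two convergence rounds share the pooled findings before a result is declared.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Review:
    model: str
    verdict: str  # "approve" or "changes_needed"
    findings: list[str] = field(default_factory=list)


# Hypothetical reviewer interface: takes the code plus the pooled findings
# (None on the independent first pass) and returns a Review.
Reviewer = Callable[[str, Optional[list[str]]], Review]

MAX_ROUNDS = 2  # "max 2 rounds" of convergence, per the post


def run_consensus(code: str, reviewers: list[Reviewer]) -> tuple[str, list[Review]]:
    # Independent pass: no reviewer sees anyone else's findings.
    reviews = [r(code, None) for r in reviewers]
    for _ in range(MAX_ROUNDS):
        verdicts = {rv.verdict for rv in reviews}
        if len(verdicts) == 1:
            return verdicts.pop(), reviews
        # Synthesize: pool every finding so conflicts are visible to all models.
        pooled = sorted({f for rv in reviews for f in rv.findings})
        reviews = [r(code, pooled) for r in reviewers]
    verdicts = {rv.verdict for rv in reviews}
    return (verdicts.pop() if len(verdicts) == 1 else "no_consensus"), reviews
```

Returning `"no_consensus"` after the round cap, instead of forcing a majority vote, surfaces the unresolved conflict to the human; that choice is an assumption, since the post does not say what happens when the rounds run out.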

Show HN: MultiPowerAI – Trust and accountability infrastructure for AI agents

Been shipping agent systems for a while and kept running into the same wall: once an agent's deployed, you're basically flying blind. No way to prove what it did, no automatic kill switch if it goes sideways, nothing. Built MultiPowerAI to fix that. The core stuff: cryptographic identity per agent, behavioral circuit breakers that auto-suspend if something looks off, human approval queues before high-stakes actions, and a full audit trail so every action is signed and timestamped. Also t
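Two of the listed features, the signed and timestamped audit trail and the auto-suspending circuit breaker, can be sketched together in Python. This is an illustrative design, not MultiPowerAI's implementation. It assumes a shared-secret HMAC-SHA256 key per agent (a real per-agent cryptographic identity would more likely use asymmetric signatures), chains each entry to the previous entry's signature so edits are detectable, and reduces "looks off" to a simple count of flagged actions. The class name `AgentAuditLog` and its methods are hypothetical.

```python
import hashlib
import hmac
import json
import time


class AgentAuditLog:
    """Append-only log: each entry is timestamped, HMAC-signed, and chained
    to the previous entry's signature. Hypothetical design for illustration."""

    def __init__(self, agent_id: str, key: bytes, max_flags: int = 3):
        self.agent_id = agent_id
        self._key = key
        self.entries: list[dict] = []
        self.max_flags = max_flags
        self.flags = 0
        self.suspended = False

    def _sign(self, body: dict) -> str:
        # Canonical JSON (sorted keys) so signing and verifying see identical bytes.
        payload = json.dumps(body, sort_keys=True).encode()
        return hmac.new(self._key, payload, hashlib.sha256).hexdigest()

    def record(self, action: str, flagged: bool = False) -> dict:
        if self.suspended:
            raise PermissionError(f"agent {self.agent_id} is suspended")
        body = {
            "agent": self.agent_id,
            "action": action,
            "ts": time.time(),
            "prev": self.entries[-1]["sig"] if self.entries else "",
        }
        entry = {**body, "sig": self._sign(body)}
        self.entries.append(entry)
        # Circuit breaker: auto-suspend once too many actions have been flagged.
        if flagged:
            self.flags += 1
            if self.flags >= self.max_flags:
                self.suspended = True
        return entry

    def verify(self) -> bool:
        """Recompute every signature and check the chain links."""
        prev = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "sig"}
            if body["prev"] != prev or not hmac.compare_digest(entry["sig"], self._sign(body)):
                return False
            prev = entry["sig"]
        return True
```

Human approval queues are not modeled here; one way to add them would be to park high-stakes actions in a pending list and call `record` only after approval.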

Ask HN: Any AI browser that I can control with Claude Code?

The only missing piece when using Claude Code is controlling the browser for sites that need a login, like LinkedIn, Twitter, etc. Current browser-based solutions might still be risky. Is there any service that feels like Perplexity Comet or the ChatGPT Atlas browser, but controlled by Claude Code?

Codex App now on Windows

Show HN: GPT-5.4 is interesting for one boring reason: fewer retries

Most model posts focus on benchmarks, but the thing I care about is simpler: does it actually cut retries on real work? Curious whether others are seeing that with GPT-5.4, especially on coding and longer tool-heavy tasks...

Show HN: LiberClaw, deploy AI agents that run 24/7 on their own VMs

LiberClaw is an open-source platform for deploying AI agents that run around the clock on dedicated virtual machines. You define what an agent does with a markdown skills file, deploy it, and it keeps working whether you're at your desk or not. The agent system code is on GitHub: https://github.com/Libertai/liberclaw-agent
There are 61 agents running on the platform right now across 578 conversations, with 99.7% uptime. Each agent gets its own VM with its own filesystem, da

Show HN: DocMCP – Index any docs site locally, search it from Claude via MCP

Kept running into the same problem while coding with Claude: library docs are either outdated in its training data or I'm copy-pasting pages into the chat. Built DocMCP – an MCP server that crawls documentation sites, chunks and indexes them locally in SQLite, and exposes a search tool Claude can call directly. The interesting bit is the search: pure keyword search misses semantic queries ("how to make elements wrap" → flex-wrap), and pure vector search misses exact API names. So
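The post is truncated right where it explains how DocMCP combines the two searches, so the following shows one common way to do it, not necessarily DocMCP's: reciprocal rank fusion (RRF), where each chunk's score is the sum of 1/(k + rank) over the keyword and vector rankings. Chunks live in an in-memory SQLite table; the embeddings are hand-written toy vectors standing in for a real embedding model, and keyword matching is done in Python rather than with SQLite FTS5 to keep the sketch portable. `k = 60` is a commonly used RRF default, assumed here, and all function names are hypothetical.

```python
import json
import math
import re
import sqlite3


def tokenize(text: str) -> set[str]:
    # Keep hyphenated API names such as "flex-wrap" as single tokens.
    return set(re.findall(r"[a-z0-9][a-z0-9_-]*", text.lower()))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_index(chunks: list[tuple[str, list[float]]]) -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, emb TEXT)")
    db.executemany("INSERT INTO chunks (text, emb) VALUES (?, ?)",
                   [(t, json.dumps(e)) for t, e in chunks])
    return db


def hybrid_search(db: sqlite3.Connection, query: str, query_emb: list[float],
                  k: int = 60, top: int = 3) -> list[str]:
    rows = [(i, t, json.loads(e)) for i, t, e in db.execute("SELECT id, text, emb FROM chunks")]
    q = tokenize(query)
    # Keyword ranking: chunks sharing exact tokens with the query, most overlap first.
    kw = sorted((r for r in rows if q & tokenize(r[1])),
                key=lambda r: -len(q & tokenize(r[1])))
    # Vector ranking: every chunk, ordered by cosine similarity to the query embedding.
    vec = sorted(rows, key=lambda r: -cosine(query_emb, r[2]))
    # Reciprocal rank fusion: sum 1 / (k + rank) across both rankings.
    scores: dict[int, float] = {}
    for ranking in (kw, vec):
        for rank, (cid, _, _) in enumerate(ranking, start=1):
            scores[cid] = scores.get(cid, 0.0) + 1.0 / (k + rank)
    text_by_id = {cid: t for cid, t, _ in rows}
    best = sorted(scores, key=lambda cid: -scores[cid])[:top]
    return [text_by_id[cid] for cid in best]
```

With a chunk such as "Set flex-wrap so flex items move onto new lines." indexed, the query `"flex-wrap"` ranks it first through the keyword list even when the query embedding points at a different chunk, while `"how to make elements wrap"` shares no tokens with it and is recovered only by the vector list.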

Show HN: SafeAppeals – Cursor for Documents

SafeAppeals is an AI-powered document workspace designed for legal professionals, researchers, injured workers, and anyone handling document-heavy workflows. Built with Electron and TypeScript, it combines a powerful code editor interface with native support for DOCX, PDF, Excel, and Markdown files.
Key features include:
- Integrated AI agents powered by Claude, OpenAI, and Google APIs for document analysis and generation
- Native document editing without conversion to plaintext
- DocuSign integr

Show HN: Stoneforge – Open-source orchestration for parallel AI coding agents

I built this because I was running 3-5 Claude Code instances on the same repo and burning out from constantly context-switching between terminal windows, carefully making sure agents' work didn't overlap, preventing context windows from degrading, manually enforcing documentation/memory policies, and re-explaining decisions across sessions. Stoneforge is the coordination layer I wanted. A Director agent breaks goals into tasks. A dispatch daemon assigns them to workers when availab
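The Director/dispatch split can be sketched as a task queue plus a polling step. Everything here is hypothetical and not Stoneforge's code: `plan` stands in for the Director agent (a real one would ask a model to decompose the goal), and `dispatch_tick` is one iteration of what a daemon would run in a loop, handing the next queued task to any idle worker. The post is cut off before describing how overlapping edits are prevented, so that part is not modeled.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Worker:
    name: str
    current: Optional[str] = None  # task the worker is running, if any


def plan(goal: str, steps: list[str]) -> deque:
    """Stand-in for the Director: turn a goal into an ordered task queue."""
    return deque(f"{goal}: {s}" for s in steps)


def dispatch_tick(queue: deque, workers: list[Worker]) -> list[tuple[str, str]]:
    """One pass of the dispatch loop: give queued tasks to idle workers, in order."""
    assigned = []
    for w in workers:
        if w.current is None and queue:
            w.current = queue.popleft()
            assigned.append((w.name, w.current))
    return assigned


def complete(worker: Worker) -> None:
    """Mark a worker idle so the next tick can hand it more work."""
    worker.current = None
```

For example, planning four steps for three workers assigns three tasks on the first tick, nothing on the next tick while all workers are busy, and the fourth task to whichever worker finishes first.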

Show HN: Evalcraft – cassette-based testing for AI agents (pytest, $0/run)

Testing AI agents is painful. Every test run calls the LLM API, costs real money, takes minutes, and gives different results each time. CI? Forget about it. Evalcraft fixes this with cassette-based capture and replay: think VCR for HTTP, but for LLM calls and tool use.
How it works:
1. Run your agent once with real API calls. Evalcraft records every LLM request, tool call, and response into a JSON cassette file.
2. In tests, replay from the cassette. Zero API calls, zero cost, deterministic output.
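The two steps above reduce to a wrapper that either calls the real model and stores the response, or looks it up in the cassette. This is a minimal Python sketch under stated assumptions, not Evalcraft's API: the `Cassette` class is hypothetical, it keys recordings by a SHA-256 hash of the prompt (so identical prompts replay the same response), and it covers only plain text-in/text-out calls, not tool calls.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


class Cassette:
    """Record LLM calls to a JSON file once, then replay them deterministically.
    Hypothetical sketch; not Evalcraft's actual interface."""

    def __init__(self, path: str, mode: str = "replay"):
        if mode not in ("record", "replay"):
            raise ValueError("mode must be 'record' or 'replay'")
        self.path = Path(path)
        self.mode = mode
        # In replay mode, load previously recorded responses from disk.
        self.calls: dict[str, str] = (
            json.loads(self.path.read_text()) if mode == "replay" else {}
        )

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode()).hexdigest()

    def wrap(self, llm: Callable[[str], str]) -> Callable[[str], str]:
        def call(prompt: str) -> str:
            key = self._key(prompt)
            if self.mode == "replay":
                # Zero API calls: unknown prompts fail loudly instead of hitting the network.
                if key not in self.calls:
                    raise KeyError(f"no recorded response for prompt: {prompt[:40]!r}")
                return self.calls[key]
            response = llm(prompt)  # real (paid) call, only in record mode
            self.calls[key] = response
            return response
        return call

    def save(self) -> None:
        self.path.write_text(json.dumps(self.calls, indent=2))
```

Keying by prompt hash makes replay independent of call order; the trade-off is that a test whose prompt changes raises `KeyError` until the cassette is re-recorded, which keeps stale cassettes from silently passing in CI.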