GPT-5

OpenAI releases GPT-5.4 Thinking and Pro in ChatGPT: How is it different

OpenAI has rolled out GPT-5.4 (Thinking) and GPT-5.4 Pro across ChatGPT, its API and Codex, introducing improvements in ...

OpenAI releases GPT-5.3 Instant in ChatGPT: Here's how it's different

OpenAI says GPT-5.3 Instant improves conversational flow, trims excessive disclaimers and delivers more reliable answers ...

Show HN: Τ³-Bench is out – can agents handle complex docs and live calls?

τ-Bench is an open benchmark for evaluating AI agents on grounded, multi-turn customer service tasks with verifiable outcomes. It's been great to see the community adopt it since launch; this is now the third iteration. With τ³-Bench, we're extending it to two new settings: knowledge-intensive retrieval and full-duplex voice. τ-Knowledge: agents must navigate ~700 interconnected policy documents to complete multi-step tasks. Best frontier model (GPT-5.2, high reasoning) hits ~25%. The

Show HN: Robust LLM extractor for websites in TypeScript

We've been building data pipelines that scrape websites and extract structured data for a while now. If you've done this, you know the drill: you write CSS selectors, the site changes its layout, everything breaks at 2am, and you spend your morning rewriting parsers. LLMs seemed like the obvious fix: just throw the HTML at GPT and ask for JSON. Except in practice, it's more painful than that: - Raw HTML is full of nav bars, footers, and tracking junk that eats your token budget. A
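The token-budget point can be illustrated with a minimal sketch that drops nav bars, footers, and scripts before any text reaches a model. This is not the project's code (which is TypeScript); it is a Python stand-in using only the standard library, and the names `SKIP_TAGS`, `TextExtractor`, and `strip_boilerplate` are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: these tags rarely carry the content worth extracting.
SKIP_TAGS = {"script", "style", "noscript", "nav", "header", "footer"}


class TextExtractor(HTMLParser):
    """Collects visible text while ignoring everything inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self.skip_depth == 0 and text:
            self.chunks.append(text)


def strip_boilerplate(html: str) -> str:
    """Return the page's main text, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = (
    "<html><nav>Home | About</nav>"
    "<main><h1>Widget</h1><p>Price: $5</p></main>"
    "<footer>Contact us</footer><script>track()</script></html>"
)
cleaned = strip_boilerplate(page)
print(cleaned)
```

A tag-based filter like this is crude: it will miss boilerplate wrapped in plain `div` elements, so a real pipeline would likely need site-aware rules on top.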

Show HN: //Beforeyouship is a pre-build tool to estimate the LLM cost

For one of my projects, I needed to choose an LLM but got lost in numbers and tokenization. So I searched for a solution which could help me do the math, but only found tools that helped with cost management and optimization at the production stage. I did some research and found that this is an existing problem, especially if you are a vibe-coder or solo developer starting an AI-powered app from scratch. So I built an MVP to test with you guys; if any of you relate to the problem, please tell me

Show HN: Reading Tree, a weighted outline for articles instead of a summary

I built this for close reading, especially philosophy chapters, long essays, and dense nonfiction. AI summaries are useful in many cases, but sometimes the source is good enough that I want to read it properly, not just get the gist. Those are exactly the cases where a summary can leave out the parts I would care about most. Reading Tree keeps the original words in place. Every node links to the passage it covers, and every paragraph links back to the node that explains its role. Nodes are weight

Show HN: The Economics of Builder Saturation in Digital Markets

I formalised something most of us already feel but rarely say out loud: making things easier to build doesn't make things easier to succeed with. Personal version: I've vibe-coded maybe 15 projects since the beginning of this year. Two are still alive. At work, our teams built hundreds of custom GPTs and dashboards. A handful survived. The failure mode was never "couldn't build it" - it was "nobody had the bandwidth to care." So I wrote a paper about it. It combine

Show HN: Hollow – serverless web perception for AI agents

I wanted any AI Agent to access the web and perform tasks without costing me anything, which would normally require a browser running somewhere, but I didn’t want to run a headless browser or keep my laptop open. So I built an interface for agents that exists purely as a serverless function. The agent gets two primitives: ‘perceive’ and ‘act’. You POST a URL, get back a structured map, act on an element by ID. That's the whole interface. And it costs almost nothing, at approximately $0.00003

I tested GPT-5.4 Thinking, and it gave me great answers (until I dove deeper)

OpenAI unveils GPT-5.4 mini and nano versions

These flavors of GPT-5.4 are designed to be fast and efficient for high-volume workloads, according to OpenAI, News.Az reports, citing foreign media. ...

Intercom's new post-trained Fin Apex 1.0 beats GPT-5.4 and Claude Sonnet 4.6 at customer service resolutions

Intercom plans to expand Fin beyond customer service into sales and marketing—positioning it as a direct competitor to ...

OpenAI introduces GPT‑5.4 mini and nano, faster and smarter small AI models

The GPT‑5.4 mini comes with improved performance across coding, reasoning, multimodal tasks, and tool use.

Show HN: Illustrative – AI pipeline that turns books into graphic novels

I like the idea of taking one thing and turning it into another, very much inspired by NotebookLM, and wondered what it might take to generate full graphic novels, with consistent characters, narrative flow, story arc, etc. Developed a 7-pass scripting enrichment system (beat analysis, adaptation filtering, character deep dives) before generating any images. Dual backend: Google Gemini for scripting (2M context window) and either Gemini or OpenAI for image generation with 3-tier model fallback (comp

Show HN: Datetime-bench: which datetime formats LLMs get right (and wrong)

tl;dr
* If you need an LLM to parse OR emit a timestamp, use RFC 3339 (e.g. 2024-03-26 10:30:00-05:00)
* Python date format also works well
* Do NOT use Unix epoch or JavaScript date formats
* Smaller models and non-reasoning models still make a LOT of mistakes in time parsing / formatting
---
There are lots of temporal reasoning benchmarks (like TimeBench, TRAM, etc.) but they test whether models understand time concepts. Nothing on which datetime output format models get right most ofte
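For readers who want to apply the RFC 3339 recommendation, here is a minimal Python sketch of emitting and parsing that format with the standard library. It is not taken from the benchmark; the helper names `to_rfc3339` and `from_rfc3339` are hypothetical, and the space separator follows the example in the post (RFC 3339 also permits `T`).

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(dt: datetime) -> str:
    """Format an aware datetime with an explicit numeric UTC offset."""
    if dt.tzinfo is None:
        # A naive datetime has no offset, so the output would be ambiguous.
        raise ValueError("naive datetime: attach a timezone first")
    return dt.isoformat(sep=" ", timespec="seconds")


def from_rfc3339(text: str) -> datetime:
    """Parse a timestamp such as the one a model returns in this format."""
    return datetime.fromisoformat(text)


# The example timestamp from the post: 10:30 at UTC-05:00.
offset = timezone(timedelta(hours=-5))
stamp = to_rfc3339(datetime(2024, 3, 26, 10, 30, tzinfo=offset))
parsed = from_rfc3339(stamp)
print(stamp)
```

Round-tripping through an aware `datetime` keeps the offset attached, which is the information a bare Unix epoch value leaves implicit.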

Show HN: Convene – Marketplace and management software for event organizers

We're software developers and farmers market/pop-up event organizers. In the past year we've run close to 200 events in NJ on Convene.markets - a booking engine, event management platform, and vendor marketplace for organizers to discover and connect with local makers, food purveyors, & farmers. We've spent the past few months opening up our platform for other organizers to now run their events and we'd love to connect with anyone running a farmers market, street fes

Anthropic's Record $1.5b Settlement Payment to Authors Nears Final Approval

The post Anthropic's Record $1.5b Settlement Payment to Authors Nears Final Approval appeared first on Android Headlines.

Feds say Hegseth tweet about Anthropic was not a final agency action

The federal government designated Anthropic a risk in February after the AI developer maintained its "Claude" model can't be ...

The Math Behind Anthropic’s Mad Revenue Growth

OpenAI and Anthropic’s remarkable revenue growth has invited scrutiny of how the AI startups are tallying the headline-making ...

‘Attempted corporate murder’: Judge calls on Anthropic and Department of War to explain dispute over supply chain risk

District Judge Rita Lin will issue a ruling on Anthropic’s legal challenge “in the next few days,” she said on Tuesday.

Judge says it looks like Pentagon was out to 'punish' Anthropic, not protect national security

Anthropic has sued to stop Defense Secretary Pete Hegseth from labeling the company a risk after it set limits on the use of ...