GPT-5

Show HN: SoMatic – Vision-based OS automation framework for AI agents

Hi HN, I'm Smyan and I enjoy building agents. Modern multimodal LLMs are great at vision and perception but are quite poor at localization. This naturally creates a massive problem when we try to take our RPA frameworks and give them to agents to perform computer use tasks. For browsers, we have been able to solve this by using the DOM tree to supply the LLM with structural hints, and more recently modern browser use frameworks use Set-Of-Marks prompting which take the structural informat

Show HN: Agent-estimate, how long a coding task takes, at agent speed

I have used Codex & Claude Code for coding for a while, but how long will a coding task actually take? When I ask Claude Code to estimate, the result often comes from training data, which is based on human speed. That’s why I built this tool: to estimate effort at AI agent speed. I run it every morning before I dispatch coding tasks to my agents.
What's in it:
- task sizing: auto-classifies XS to XL from the description, then runs PERT on that tier
- human-equivalent comparison: a per-task-type
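As a rough illustration of the "size tier, then PERT" idea mentioned above, here is a minimal Python sketch. The tier table and its minute values are hypothetical placeholders and are not taken from Agent-estimate; only the three-point formula, expected = (O + 4M + P) / 6 with standard deviation (P - O) / 6, is the standard PERT calculation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreePoint:
    """Optimistic, most likely, and pessimistic durations in minutes."""
    optimistic: float
    most_likely: float
    pessimistic: float


# Hypothetical per-tier ranges in agent minutes (illustrative values only,
# not the tool's real calibration).
TIERS = {
    "XS": ThreePoint(2, 5, 15),
    "M": ThreePoint(10, 20, 60),
    "XL": ThreePoint(60, 120, 360),
}


def pert(tp: ThreePoint) -> tuple[float, float]:
    """Return (expected duration, standard deviation) using three-point PERT."""
    expected = (tp.optimistic + 4 * tp.most_likely + tp.pessimistic) / 6
    std_dev = (tp.pessimistic - tp.optimistic) / 6
    return expected, std_dev


def estimate(tier: str) -> tuple[float, float]:
    """Look up a size tier and run PERT on its three-point range."""
    return pert(TIERS[tier])


if __name__ == "__main__":
    expected, std_dev = estimate("M")
    print(f"M: {expected:.1f} min (+/- {std_dev:.1f})")
```

Weighting the most likely value four times keeps a single pessimistic outlier from dominating the estimate, which is presumably why a per-tier PERT range is attractive for sizing agent tasks.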

Why does it look like LLMs consistently overestimate implementation time?

I have a suspicion: they estimate how long people would have taken to implement some feature, because they were trained on such data. I consistently see estimates of 2 weeks, 3 weeks, 5 days, etc. But then implementation takes a day or two at most using agents within Claude/GPT. Unless I am missing something? Has anybody else noticed this?

Show HN: OpenRig – a control plane for multi-agent coding topologies

Hi HN, I’m Mike, the founder of OpenRig. I built this because my Claude Code + Codex setup kept forming little "topologies" of long-lived agents that worked well together, but the terminal sprawl was intense. So I built a primitive the agents could intuitively reach for to save and recreate these setups on the fly. This then led to more agent-first primitives like coordination, declarative workflow patterns, workspaces, etc. Several months in and these "rigs" I manage with open

Tell HN: Google slightly changed its wordmark logo

Google is doing some A/B testing in which its main site[0] shows a slightly modified version of the 2015 logo, as shown in the following Wikipedia article.[1]
You can try this for yourself by opening a private/incognito window, then visiting the main Google search home page.[0] Close and reopen it until the logo changes.
[0]: https://www.google.com/
[1]: https://en.wikipedia.org/wiki/Google_logo

Show HN: CoreMem – Portable context for AI agents

CoreMem lets you build collections of context, called a mem, and share it with any AI agent via URL, a Chrome extension, MCP, Cursor/VS Code plugins, a skill, and more. Instead of re-explaining your project or goal when you switch agents or start new sessions, CoreMem keeps your context centrally organized so that any AI tool can read it. This originally started as a CLI I built that kept pieces of context (Project A/B/C details, my writing style, preferred tech stacks, coding styl

Ask HN: Do people lie about why they hate AI writing on social media?

I think the following might be the case:
* people do not distinguish between AI writing based on the human's ideas vs AI writing based on the AI's ideas
* they do this intentionally as a way to denigrate all AI writing, even when the content is good/interesting
* and the reason they do all this is to delay their unemployment as a result of AI
So in the end, they use social media as a way to make people think all AI writing is bad because this is how they are trying to delay their une

OpenAI built a plugin for their biggest competitor's coding tool Claude Code

OpenAI built a plugin for their biggest competitor's coding tool 👀 The Codex plugin drops OpenAI's agent directly inside Claude Code: Claude writes, Codex reviews, and an adversarial mode literally tries to break your logic. Two AI agents checking each other in real time. Setup is 4 commands 👇 #ClaudeCode #Codex #AICoding #VibeCoding #AIEngineering

Show HN: Free One-shot cloud agents with OpenCode and Daytona and Cloudflare

Hi HN! Outside of the Hacker News bubble we often find engineers who are barely using AI (aka using Microsoft Copilot), and we needed an easy way to show the latest capabilities in a non-confusing UI. So we dumbed down our product to a simple text box UI where you one-shot your feature and you get an email with a link to a PR on GitHub. The backend is hosted on Cloudflare, spinning up sandboxes in Daytona that run the OpenCode harness. Feel free to give it a try or share it with people who are skeptica

Codex got better; Codex might be built with Claude Opus

It is very suspicious that OpenAI Codex is getting better. I wonder if the Codex team uses Claude Opus to build Codex. Can any engineer from OpenAI confirm…

Anthropic Just Bought a Developer Tool Used by OpenAI, Google

Anthropic acquired SDK startup Stainless, signaling a deeper push into developer tooling as AI labs compete beyond model ...

Anthropic Buys Stainless To Cut Off OpenAI And Google SDK Access

Anthropic acquired Stainless, the SDK toolmaker behind OpenAI and Google, then shut the hosted products down for rivals. Inside the agentic AI infrastructure play.

Anthropic enhances Claude Managed Agents with two new privacy and security features

Anthropic is introducing two new features for Claude Managed Agents that give users more control over the security and ...

OpenAI Co-Founder Andrej Karpathy Joins Anthropic in AI Research Shakeup

Andrej Karpathy, OpenAI co-founder and former AI director with Tesla, recently joined Anthropic to build a team focused on ...

Anthropic still won’t hand over its ‘Mythos’ cyber model — even as OpenAI agrees to give the EU a locked-down version of GPT-5.5

An AI model built by Anthropic just did something no previous system had managed in UK government testing: it autonomously hacked its way through a simulated corporate network in 32 sequential steps, ...

OpenAI Cofounder Andrej Karpathy Joins Anthropic as Sam Altman’s Fortunes Turn

Anthropic was founded by OpenAI exiles. It’s adding one more. On Tuesday, former OpenAI co-founder Andrej Karpathy announced ...

Perplexity AI Blames Promo Code Abuse for Tighter Usage Restrictions

Perplexity AI users say advanced AI limits are ending faster after the company tightened restrictions linked to promotional subscription misuse.

Reconnecting. – – 5/5 why don't they fix codex

It's been a month of tagging sama, OpenAI, and tibo on X about this issue,
and no one seems to reply.
Everyone is flattering Codex, but I'm sure I'm not the only one facing this.
I switched to Codex from Claude since it was better and consumed fewer credits than Claude, and after a while I started seeing this.
I'm sure I will switch to Claude the minute they release Opus 4.8.

Show HN: Forge – Guardrails take an 8B model from 53% to 99% on agentic tasks

Hi HN, I'm Antoine Zambelli, AI Director at Texas Instruments.
I built Forge, an open-source reliability layer for self-hosted LLM tool-calling.
What it does:
- Adds domain-and-tool-agnostic guardrails (retry nudges, step enforcement, error recovery, VRAM-aware context management) to local models running on consumer hardware
- Takes an 8B model from ~53% to ~99% on multi-step agentic workflows without changing the model - just the system around it
- Ships with an eval harness and interactive das

Sieve – scans Cursor/Claude chat history for leaked API keys

Background: I was using Cursor to set up an OpenAI integration. The agent read my .env file, added the key to the config, and everything worked. What I didn't think about: that key was now sitting in a plaintext SQLite database at ~/Library/ApplicationSupport/Cursor/User/workspaceStorage/..
AI coding tools (Cursor, Claude Code, Copilot, Cline) routinely read .env files as part of normal operation. Every secret they touch gets embedded in their local transcript
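To make the problem concrete, here is a minimal Python sketch of scanning transcript text for strings that look like API keys. It is not Sieve's implementation, which is not described in this excerpt. The pattern name, the `sk-` prefix heuristic, and the helper names `find_secrets` and `redact` are all assumptions for illustration, and how the text would be pulled out of the chat history database is left out.

```python
import re

# Hypothetical pattern: a string starting with "sk-" followed by 20 or more
# key-like characters. Real scanners typically use many provider patterns.
SECRET_PATTERNS = {
    "openai_style_key": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
}


def redact(value: str, keep: int = 6) -> str:
    """Keep the first `keep` characters and mask the rest."""
    return value[:keep] + "*" * (len(value) - keep)


def find_secrets(text: str) -> list[tuple[str, str]]:
    """Return (pattern name, redacted match) for every suspected secret."""
    hits = []
    for name, pattern in SECRET_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((name, redact(match.group(0))))
    return hits


if __name__ == "__main__":
    sample = "agent wrote config: sk-" + "A1b2" * 6
    for name, masked in find_secrets(sample):
        print(name, masked)
```

Reporting only a redacted prefix avoids copying the secret into yet another plaintext location while still letting the user identify which key to rotate.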