GPT-5.3 Instant Improves Query Intent Detection in Search
OpenAI's ChatGPT 5.3 Instant web search now avoids abrupt tone shifts; in a biking weather example it includes snowpack details, improving planning clarity.
GPT-4o and other older models in ChatGPT, shifts focus to GPT-5.2, and launches GPT-5.3-Codex-Spark for real-time coding ...
This week's second new model from OpenAI is built for more complex tasks than GPT-5.3 Instant.
Regarding: https://arxiv.org/abs/2602.05192

Introduction

The First Proof paper (Abouzaid et al., 2026) aims to evaluate AI capabilities through a set of research-level mathematical problems. While the mathematical content of the questions is not in dispute, the experimental design suffers from significant methodological gaps that undermine the authors' primary conclusions. Specifically, the paper conflates binary outcomes with processual states, lacks independent verificat
I work in competitive intelligence. Needed to track competitor releases, publications, regulatory changes.

Started with Make.com. Built ~15 scenarios: pull sources, filter, summarize with GPT, email results. It worked. Until it became my second job. Scenarios broke silently. Only I could fix them. Every new tracking need meant another afternoon building another fragile workflow.

Then during a major industry event, all hands were on deck and the automations were sitting broken. Our CEO walked into
Long-time lurker, many accounts, one at a time, no abuse. Hi. Yesterday's account of layer duplication and adjustment in popular open-weight models on Hugging Face led to this submission. Since GPT ~3.5 it has been apparent that computers can simulate a human, as far as a computer is concerned. The dead-internet theory actually originated circa 2012, but I've had difficulty finding verification, including searching archive.org. All this turmoil makes offline, on-prem so important
I've been deep-diving into diffusion language models this week and I think this is the most underrated direction in AI right now.

The core issue with autoregressive LLMs: every major model today (GPT, Claude, Gemini) generates one token at a time, left to right. Each token depends on the previous one. This single architectural constraint has shaped the entire AI industry:

- Models can't revise what they already wrote → we build chain-of-thought, reflection, and multi-pass reasoning to for
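The structural difference is easy to see in a toy sketch. The "models" below are random choice, not real networks; the point is purely the control flow: the autoregressive loop freezes each position as it goes, while the diffusion-style loop revisits every position on every denoising step.

```python
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "mat"]

def autoregressive(n):
    # Left to right: each token is fixed once sampled, never revised.
    out = []
    for _ in range(n):
        out.append(random.choice(VOCAB))
    return out

def diffusion_style(n, steps=3):
    # Parallel denoising: start fully masked, refine all positions each step.
    seq = ["[MASK]"] * n
    for _ in range(steps):
        # Every position is revisited at once; earlier choices can be overwritten.
        seq = [random.choice(VOCAB) for _ in seq]
    return seq

print(len(autoregressive(5)), len(diffusion_style(5)))  # → 5 5
```

A real diffusion LM would condition each refinement step on the whole current sequence, which is exactly what gives it the ability to revise that autoregressive decoding lacks.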
I built agent-triage - a CLI that automates diagnosing AI agent failures in production.

I was spending way too much time staring at traces, logs and dashboards trying to figure out why my multi-agent setups kept failing.

You just point it at your traces (LangSmith, Langfuse, OpenTelemetry, or a JSON file). It pulls the system prompts directly from the logs, extracts the behavioral rules, and uses an LLM-as-a-judge to replay each conversation step-by-step.

It flags exactly which turn broke things, w
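The replay-and-flag loop can be sketched like this. This is not agent-triage's actual code; `judge` here is a stand-in for the LLM-as-a-judge call, and the rule name is made up for illustration.

```python
def triage(turns, judge):
    # Replay the conversation turn by turn; `judge` stands in for the
    # LLM-as-a-judge and returns the name of the violated rule, or None.
    for i, turn in enumerate(turns):
        rule = judge(turn)
        if rule:
            return {"turn": i, "rule": rule, "content": turn}
    return None

# Toy judge: flag any turn that leaks an internal ID.
def judge(turn):
    return "no-internal-ids" if "internal_id" in turn else None

turns = ["Hi, how can I help?", "Your record is internal_id=1234."]
print(triage(turns, judge)["turn"])  # → 1
```

Stopping at the first violating turn is the useful part: instead of reading the whole trace, you jump straight to the step where behavior diverged from the extracted rules.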
Hey HN,

I run a recruiting AI startup, and the thing that keeps blowing my mind is how much money companies dump into sourcing tools, ATS platforms, employer branding — then turn around and publish a job description that reads like it was written by a committee in 2014.

We kept seeing the same patterns. "Competitive salary" (translation: we don't want to tell you). "Fast-paced environment" repeated four times (translation: we're disorganized). Forty-seven bullet point
I've been building AI agent tooling and kept running into the same problem: agents browse the web, take actions, fill out forms, scrape data -- and there's zero proof of what actually happened. Screenshots can be faked. Logs can be edited. If something goes wrong, you're left pointing fingers at a black box.

So I built Conduit. It's a headless browser (Playwright under the hood) that records every action into a SHA-256 hash chain and signs the result with Ed25519. Each action
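The hash-chain part is straightforward to illustrate with the standard library. This is a minimal sketch of the general technique, not Conduit's actual format; the Ed25519 signature over the final head hash is omitted since it needs a crypto library.

```python
import hashlib
import json

def record_action(chain, action):
    # Link each entry to the previous hash, so tampering with any earlier
    # entry invalidates every hash after it.
    prev = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps({"prev": prev, "action": action}, sort_keys=True)
    chain.append({"action": action, "prev": prev,
                  "hash": hashlib.sha256(payload.encode()).hexdigest()})

def verify(chain):
    prev = "0" * 64
    for entry in chain:
        payload = json.dumps({"prev": prev, "action": entry["action"]}, sort_keys=True)
        if entry["prev"] != prev or entry["hash"] != hashlib.sha256(payload.encode()).hexdigest():
            return False
        prev = entry["hash"]
    return True

chain = []
record_action(chain, {"type": "goto", "url": "https://example.com"})
record_action(chain, {"type": "click", "selector": "#submit"})
print(verify(chain))  # → True
chain[0]["action"]["url"] = "https://evil.example"
print(verify(chain))  # → False
```

Signing only the head hash is enough: since each hash commits to everything before it, a valid signature over the head attests to the whole recorded session.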
Hi HN,

I've been building Slate for the past few months and just open-sourced it. It's a native macOS app that puts AI chat and web browsing in the same window.

The idea came from how I actually use AI day to day. I'd ask Claude or GPT something, get a bunch of links or recommendations, then cmd-tab to a browser, open tabs, lose context, and go back and forth. It felt broken. I wanted the browser inside the AI conversation, not the other way around. So Slate is an AI workspace first
I built this because Claude Code loads every rule and skill into context on every prompt. With 50+ rules and skills installed, you're burning tokens on Docker best practices while writing a commit message.

ai-nexus runs a hook before Claude starts — it picks 2-3 relevant rules and skills via keyword matching (free) or GPT-4o-mini (~$0.50/mo), and physically hides the rest. Claude doesn't even know they exist.

An ETH Zurich study (https://arxiv.org/pdf/2602.11988)
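The free keyword-matching mode can be sketched in a few lines. The rule names and keyword lists below are invented for illustration; the real tool presumably derives keywords from the installed rules themselves.

```python
def select_rules(prompt, rules, top_k=3):
    # Score each rule by keyword overlap with the prompt; everything
    # below the top-k cut gets hidden from the model entirely.
    words = set(prompt.lower().split())
    scored = []
    for name, keywords in rules.items():
        hits = len(words & set(keywords))
        if hits:
            scored.append((hits, name))
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]

rules = {
    "docker-best-practices": ["docker", "container", "image", "dockerfile"],
    "commit-style": ["commit", "message", "git"],
    "python-testing": ["pytest", "test", "fixture"],
}
print(select_rules("write a git commit message for this change", rules))
# → ['commit-style']
```

Keyword overlap is crude but free and deterministic; the GPT-4o-mini path would trade a tiny per-prompt cost for semantic matching when keywords don't line up.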
Hi, Ted here, creator of Mog.

- Mog is a statically typed, compiled, embedded language (think statically typed Lua) designed to be written by LLMs -- the full spec fits in 3,200 tokens.
- An AI agent writes a Mog program, compiles it, and dynamically loads it as a plugin, script, or hook.
- The host controls exactly which functions a Mog program can call (capability-based permissions), so permissions propagate from agent to agent-written code.
- Compiled to native code for low-latency plugin exec
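The capability idea is language-independent and easy to sketch. This is not Mog's host API, just an illustration of the pattern: the host builds the plugin's view of the world from an explicit grant, so anything not granted simply doesn't exist for the plugin.

```python
def make_host_api(granted):
    # Host-side capability table: the plugin only ever sees the
    # functions the agent's grant explicitly lists (names are made up).
    all_capabilities = {
        "read_file": lambda path: f"<contents of {path}>",
        "http_get": lambda url: f"<response from {url}>",
        "delete_file": lambda path: f"deleted {path}",
    }
    return {name: fn for name, fn in all_capabilities.items() if name in granted}

api = make_host_api({"read_file"})
print(sorted(api))            # → ['read_file']
print("delete_file" in api)   # → False
```

Because the grant flows from the agent that wrote the program, a compromised or confused agent can't give its generated code more authority than the agent itself holds.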
Hey HN,

We just open-sourced Styx — an AI gateway that sits between your app and AI providers (OpenAI, Anthropic, Google, Mistral). One endpoint, any model, self-hosted.

What makes it different from LiteLLM or OpenRouter:

- styx:auto — send "model": "styx:auto" and the gateway picks the right model based on prompt complexity. Simple questions go to cheap models ($0.15/1M tokens), complex code goes to frontier models. 9-signal classifier, zero config.
- MCP-native — first gate
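The routing idea can be sketched with a couple of toy signals. Styx's actual 9-signal classifier is not public in this post, so the signals and thresholds below are invented stand-ins to show the shape of the decision.

```python
def classify(prompt):
    # Toy stand-in for a complexity classifier: count cheap heuristic
    # signals and escalate when enough of them fire.
    signals = 0
    if len(prompt) > 400:                                   # long prompts
        signals += 1
    if "```" in prompt or "def " in prompt:                 # contains code
        signals += 1
    if any(w in prompt.lower() for w in ("refactor", "prove", "debug")):
        signals += 1
    return "frontier" if signals >= 2 else "cheap"

def route(prompt):
    # Map the complexity class to a model tier (names are placeholders).
    return {"cheap": "small-model", "frontier": "big-model"}[classify(prompt)]

print(route("What is the capital of France?"))       # → small-model
print(route("Refactor this:\ndef f(x):\n    ..."))   # → big-model
```

The appeal of doing this in the gateway is that the app keeps sending one model string ("styx:auto") while the cost/quality tradeoff is tuned in one place.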
Hey, Alessio here. I built Polpo because AI agents are great at coding — and terrible at finishing real work on their own.

The problem: you open Claude Code, give it a task, it does 80%. You fix the other 20%, open another chat for the next piece, copy context, retry when it drifts. Before you know it you're a full-time AI babysitter — 4 monitors, 12 terminals, zero confidence anything actually ships.

Polpo fixes this. You build an AI company: hire agents, give them roles, skills, and credent
I've been experimenting with OpenClaw agents that call hardware tools.

The initial goal was getting a local agent to solve a small maze using some benchtop hardware. The agent observes the maze through a webcam, decides its next move, and calls a hardware tool to move.

When something goes wrong, it's hard to understand why. You usually end up staring at a huge JSON log of prompts, tool calls, and responses.

So I started building a trace harness and an OpenClaw-specific shim to capture str
Simple CLI tool, one Python file, no setup. Point it at a repo and it finds leaked API keys (OpenAI, Anthropic, AWS, GitHub, Stripe, etc.) and gives you the direct link to revoke each one.

Built it because I kept generating code with AI assistants and worrying about keys ending up in the wrong place. It's an off-brand TruffleHog.
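The core of such a scanner is a handful of provider-specific regexes. The patterns below are a rough sketch based on well-known key prefixes (AKIA for AWS access keys, ghp_ for GitHub personal tokens, sk- for OpenAI, sk_live_ for Stripe); they are illustrative, not the tool's actual rules, and real formats vary.

```python
import re

# Illustrative key patterns keyed on well-known prefixes (not exhaustive).
PATTERNS = {
    "openai": re.compile(r"sk-[A-Za-z0-9_-]{20,}"),
    "aws": re.compile(r"AKIA[0-9A-Z]{16}"),
    "github": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "stripe": re.compile(r"sk_live_[A-Za-z0-9]{24,}"),
}

def scan(text):
    # Return (provider, matched_string) for every candidate leak found.
    hits = []
    for provider, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((provider, match.group()))
    return hits

sample = 'aws_key = "AKIAABCDEFGHIJKLMNOP"'
print(scan(sample))  # → [('aws', 'AKIAABCDEFGHIJKLMNOP')]
```

A production scanner layers entropy checks and allowlists on top of this to cut false positives, but prefix regexes catch the bulk of real leaks.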
LLM-driven stacks from Anthropic and OpenAI are aiming for a monopoly on labor replacement by driving Claude Code and Codex development at rates never seen before; there will likely be a reordering of what SWEs do in the near future (1-3 years).

What's your futuristic version of how this would turn out? Try justifying your answer, e.g. by citing previous reorganizations of labor during such upheavals, or by applying economic/market theory or precedent.

My favorite one (right now) is: As traditional S
Hey HN,

I built UnifyRoute because I kept running into the same problem: rate limits, quota exhaustion, and provider outages were breaking my LLM-powered apps at the worst times.

UnifyRoute is a self-hosted gateway that sits in front of your LLM providers (OpenAI, Anthropic, etc.) and handles routing, failover, and quota management automatically — with a fully OpenAI-compatible API, so you don't change a single line of your existing code.

What it does:
- Drop-in OpenAI-compatible API
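The failover logic such a gateway needs can be sketched without any provider SDKs. The provider functions below are toy stand-ins; a real gateway would match specifically on 429/5xx responses rather than catching every exception.

```python
def call_with_failover(providers, prompt):
    # Try each (name, call) pair in priority order; on a rate limit or
    # outage, fall through to the next provider in the list.
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # real code: catch 429/5xx specifically
            errors.append((name, str(exc)))
    raise RuntimeError(f"all providers failed: {errors}")

def flaky_primary(prompt):
    raise TimeoutError("rate limited")

def backup(prompt):
    return f"answer to: {prompt}"

name, answer = call_with_failover(
    [("openai", flaky_primary), ("anthropic", backup)], "hello")
print(name, answer)  # → anthropic answer to: hello
```

Because the gateway speaks the OpenAI wire format on both sides, clients only ever see a successful response; which upstream actually answered is an internal detail.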
I built VectorLens because I was tired of "log file archaeology" every time my RAG pipeline hallucinated. Usually, when an LLM gives a wrong answer, you're stuck guessing which retrieved chunk misled it—or why the right chunk was ignored.

Existing observability tools either require a cloud signup, an enterprise contract, or heavy manual instrumentation of your code. I wanted something that stayed local and just worked.

The solution: three lines of code (Python)

import vectorlens
vector