GPT-5
Show HN: Open-Source GTM Skills for Claude Code, Codex, and Cursor
We built an open-source library of 125 GTM (go-to-market) skills that plug into AI coding agents like Claude Code, Codex, and Cursor.
With these skills an AI agent can automatically:
- Find ICP leads from conference speakers, LinkedIn activity, or job boards
- Generate personalized cold email sequences
- Monitor competitor blogs, pricing pages, and hiring signals
- Generate programmatic SEO pages from keyword lists
- Track where your brand appears in ChatGPT, Perplexity, and Claude answers
---
How skill
ClawMemory – Git for AI agent memory (forkable memory for AI agents)
Hi HN, I built ClawMemory because my AI agent kept waking up with amnesia.
The problem isn't the model. GPT-4, Claude, and other LLMs are stateless by design, and that's fine. The real problem is the agent layer. Every time an agent framework starts a new session (like in OpenClaw), the agent forgets everything that happened before. The architectural decisions we made, the experiments that failed, the context that took hours to build: all gone. The model is smart, but the agent behaves li
Should Sam Altman fear token compression?
AI bills are exploding despite costs dropping 10x every 18 months because every saving gets reinvested into more usage.
BUT:
- Today's prices are subsidized (OpenAI losing $14B in 2026, Anthropic burning $12B in one quarter, Cursor alleges that a $200/month Claude Code subscription could cost up to $5,000 in compute.)
- When subsidies end, prices reverse into a 10x larger consumption base
- That's the double squeeze nobody is modeling for
Companies building token efficiency now wo
Show HN: LogClaw – Open-source AI SRE that auto-creates tickets from logs
Hi HN, I'm Robel. I built LogClaw because I was tired of paying for Datadog and still waking up to pages that said "something is wrong" with no context.
LogClaw is an open-source log intelligence platform that runs on Kubernetes. It ingests logs via OpenTelemetry and detects anomalies using signal-based composite scoring, not simple threshold alerting. The system extracts 8 failure-type signals (OOM, crashes, resource exhaustion, dependency failures, DB deadlocks, timeouts, connec
Show HN: Ava – AI Voice Agent for Traditional Phone Systems (Python+Asterisk/ARI)
Hi HN, I'm the creator of AVA - AI Voice Agent for Asterisk
My repo was shared here once before by someone else so I wanted to follow up with the progress since then.
https://news.ycombinator.com/item?id=46380399
I've been working with Asterisk/FreePBX systems for years. I wanted to add AI voice capabilities to legacy phone systems without paying per-minute SaaS fees or ripping out the entire telephony stack.
So I built AVA, a self-hosted AI voice agent that can integra
Launch HN: IonRouter (YC W26) – High-throughput, low-cost inference
Hey HN, I’m Veer and my cofounder is Suryaa. We're building Cumulus Labs (YC W26), and we're releasing our latest product IonRouter (https://ionrouter.io/), an inference API for open-source and fine-tuned models. You swap in our base URL, keep your existing OpenAI client code, and get access to any model (open source or fine-tuned for you) running on our own inference engine.
The problem we kept running into: every inference provider is either fast-but-expensive (Together,
Show HN: Codex Symphony – bootstrap OpenAI Symphony and Linear in any repo
I wanted a simpler way to use OpenAI Symphony locally.
The recurring friction points for me were:
- setting up Linear correctly
- creating a reusable workflow file
- bootstrapping repo scripts
- restarting Symphony cleanly after reopening Codex
- keeping the setup portable across machines
So I made a small public bootstrap package called Codex Symphony.
It installs:
- WORKFLOW.symphony.md
- scripts/symphony/start-local.sh
- scripts/symphony/start-background.sh
CostRouter – Cut AI API costs 60% by routing to the cheapest capable model
Hey HN, I built CostRouter because I noticed 70-80% of our AI API calls didn't need GPT-4o/5. Simple text extraction, basic Q&A, formatting: all going to the most expensive model.
CostRouter is an API gateway that scores each request's complexity (0-100) and routes it to the cheapest model that can handle it:
- Simple queries → Llama 4 Scout ($0.0001/1K tokens)
- Medium → Gemini 3 Flash ($0.0005/1K tokens)
- Complex reasoning → stays on GPT-5.2 or Claude Opus
Integrat
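The routing idea described in the CostRouter post (score each request 0-100, then pick the cheapest tier that can handle it) could be sketched roughly as below. This is not CostRouter's code: the model identifiers, the cut-off scores, and the keyword heuristic in `score_complexity` are all assumptions made up for illustration. Only the tier order comes from the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str       # hypothetical model identifier
    max_score: int  # highest complexity score (0-100) this tier accepts


# Tier order follows the post (cheapest first); the cut-offs are invented.
TIERS = [
    Tier("llama-4-scout", 30),
    Tier("gemini-3-flash", 70),
    Tier("gpt-5.2", 100),
]

# Hypothetical keywords that hint at multi-step reasoning.
REASONING_HINTS = ("prove", "step by step", "analyze", "design", "debug")


def score_complexity(prompt: str) -> int:
    """Toy heuristic: longer prompts and reasoning keywords raise the score."""
    length_points = min(len(prompt.split()), 50)  # at most 50 points
    hint_points = 25 * sum(hint in prompt.lower() for hint in REASONING_HINTS)
    return min(length_points + hint_points, 100)


def route(prompt: str) -> str:
    """Return the first (cheapest) tier whose cut-off covers the score."""
    score = score_complexity(prompt)
    for tier in TIERS:
        if score <= tier.max_score:
            return tier.name
    return TIERS[-1].name
```

A real gateway would presumably use a learned or far richer scoring function; the point of the sketch is only the "cheapest tier that clears the score" lookup.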
GPT-5.3 Instant Improves Query Intent Detection in Search
OpenAI's ChatGPT 5.3 Instant web search now avoids abrupt tone shifts; in a biking weather example it includes snowpack details, improving planning clarity.
OpenAI to discontinue GPT-5, GPT-4o and other models today: What changes
OpenAI retires GPT-4o and other older models in ChatGPT, shifts focus to GPT-5.2, and launches GPT-5.3-Codex-Spark for real-time coding ...
With GPT-5.4, OpenAI promises fewer errors, preps for autonomous agents
This week's second new model from OpenAI is built for more complex tasks than GPT-5.3 Instant.
A Methodological Critique of "First Proof" (Abouzaid et al., 2026)
Regarding: https://arxiv.org/abs/2602.05192
Introduction
The First Proof paper (Abouzaid et al., 2026) aims to evaluate AI capabilities through a set of research-level mathematical problems. While the mathematical content of the questions is not in dispute, the experimental design suffers from significant methodological gaps that undermine the authors' primary conclusions. Specifically, the paper conflates binary outcomes with processual states, lacks independent verificat
Hacker Infrastructure
Long-time lurker, many accounts, one at a time, no abuse. Hi. Yesterday's recount about layer duplication and adjustment for popular open-weight models on Hugging Face led to this submission.
Since GPT ~3.5 it has been apparent that computers can simulate a human, as far as a computer is concerned. The dead-internet theory actually originated circa 2012, but I've had difficulty finding verification, including searching archive.org.
All this turmoil makes offline on-prem so important.
Summry – I replaced my mess of Make.com automations with this
I work in competitive intelligence. Needed to track competitor releases, publications, regulatory changes.
Started with Make.com. Built ~15 scenarios: pull sources, filter, summarize with GPT, email results. It worked. Until it became my second job. Scenarios broke silently. Only I could fix them. Every new tracking need meant another afternoon building another fragile workflow.
Then during a major industry event, all hands were on deck and the automations were sitting broken. Our CEO walked into
Diffusion LLM may make most of the AI engineering stack obsolete
I've been deep-diving into diffusion language models this week and I think this is the most underrated direction in AI right now.
The core issue with autoregressive LLMs:
Every major model today (GPT, Claude, Gemini) generates one token at a time, left to right. Each token depends on the previous one. This single architectural constraint has shaped the entire AI industry:
- Models can't revise what they already wrote → we build chain-of-thought, reflection, and multi-pass reasoning to for
Show HN: Agent-triage – diagnosis of agent failures from production traces
I built agent-triage - a CLI that automates diagnosing AI agent failures in production.
I was spending way too much time staring at traces, logs and dashboards trying to figure out why my multi-agent setups kept failing.
You just point it at your traces (LangSmith, Langfuse, OpenTelemetry, or a JSON file). It pulls the system prompts directly from the logs, extracts the behavioral rules, and uses an LLM-as-a-judge to replay each conversation step-by-step.
It flags exactly which turn broke things, w
Show HN: JD Roast – Paste a job description, get it brutally roasted
Hey HN, I run a recruiting AI startup, and the thing that keeps blowing my mind is how
much money companies dump into sourcing tools, ATS platforms, employer branding —
then turn around and publish a job description that reads like it was written by
a committee in 2014.
We kept seeing the same patterns. "Competitive salary" (translation: we don't
want to tell you). "Fast-paced environment" repeated four times (translation:
we're disorganized). Forty-seven bullet point
Show HN: Conduit – Headless browser with SHA-256 hash chain - Ed25519 audit trails
I've been building AI agent tooling and kept running into the same problem: agents browse the web, take actions, fill out forms, scrape data -- and there's zero proof of what actually happened. Screenshots can be faked. Logs can be edited. If something goes wrong, you're left pointing fingers at a black box.
So I built Conduit. It's a headless browser (Playwright under the hood) that records every action into a SHA-256 hash chain and signs the result with Ed25519. Each action
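For readers unfamiliar with hash chains, here is a minimal sketch of the general technique the Conduit post names, using only Python's standard `hashlib` and `json`. It is not Conduit's implementation: the entry layout, the `GENESIS` value, and the action fields are assumptions, and the Ed25519 signing step is left out because it needs a third-party library.

```python
import hashlib
import json

# Assumed starting value for the first entry's "previous hash".
GENESIS = "0" * 64


def entry_hash(prev_hash: str, action: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the action."""
    payload = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: list, action: dict) -> None:
    """Link a new action to the hash of the entry before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"action": action, "prev": prev, "hash": entry_hash(prev, action)})


def verify(chain: list) -> bool:
    """Recompute every link; any edited action or reordered entry fails."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["action"]):
            return False
        prev = entry["hash"]
    return True
```

Because each hash covers the previous one, changing any recorded action invalidates every later link. Signing only the final hash would then be enough to vouch for the whole log, which is presumably why the two primitives are paired.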
Show HN: Slate – Open-source AI workspace with a built-in browser
Hi HN, I've been building Slate for the past few months and just open-sourced it. It's a native macOS app that puts AI chat and web browsing in the same window.
The idea came from how I actually use AI day to day. I'd ask Claude or GPT something, get a bunch of links or recommendations, then cmd-tab to a browser, open tabs, lose context, and go back and forth. It felt broken. I wanted the browser inside the AI conversation, not the other way around.
So Slate is an AI workspace first
Show HN: AI-nexus – Only 2-3 rules and skills load per prompt in Claude Code
I built this because Claude Code loads every rule and skill into context on every prompt. With 50+ rules and skills installed, you're burning tokens on Docker best practices while writing a commit message.
ai-nexus runs a hook before Claude starts: it picks 2-3 relevant rules and skills via keyword matching (free) or GPT-4o-mini (~$0.50/mo), and physically hides the rest. Claude doesn't even know they exist.
An ETH Zurich study (https://arxiv.org/pdf/2602.11988)
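The keyword-matching path mentioned in the ai-nexus post could look something like the sketch below. The rule names, keyword sets, and scoring (count of overlapping words) are assumptions for illustration. How ai-nexus actually matches rules and hides the rest from Claude Code is not described in the excerpt.

```python
import re

# Hypothetical rule names mapped to hypothetical trigger keywords.
EXAMPLE_RULES = {
    "docker": {"docker", "dockerfile", "container", "image"},
    "git-commit": {"git", "commit", "message"},
    "testing": {"test", "tests", "pytest"},
}


def pick_rules(prompt: str, rules: dict, limit: int = 3) -> list:
    """Return up to `limit` rule names, ranked by keyword overlap with the prompt."""
    words = set(re.findall(r"[a-z0-9]+", prompt.lower()))
    scored = [(len(words & keywords), name) for name, keywords in rules.items()]
    matched = [(score, name) for score, name in scored if score > 0]
    # Highest overlap first; ties broken alphabetically for stable output.
    matched.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in matched[:limit]]
```

With these example rules, a prompt about writing a commit message selects only `git-commit`, so the Docker rule would never reach the context window.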