GPT-5

OpenAI's new GPT-5.3-Codex is 25% faster and goes way beyond coding now - what's new

GPT-5.3-Codex helped debug and deploy parts of itself. Codex can be steered mid-task without losing context. "Underspecified" prompts now produce richer, more usable results. OpenAI today announced ...

Show HN: Coding agents find the right GPU bottleneck 70% of the time, fix it 30%

One of the authors. Some things that surprised us while running these experiments:The tasks are pulled from real merged PRs in vLLM and SGLang, so there's a known-good human solution for each one. Agents get the full codebase, the issue description, and a test harness. Pretty generous setup.What we didn't expect: the agents are genuinely good at diagnosing the problem. They read the code, find the bottleneck, describe the right fix. But then the generated code has subtle bugs. Off-by-o

Show HN: How AI Content Automation Is Reshaping SaaS Marketing in 2025

Show HN: How AI Content Automation is Reshaping SaaS Marketing in 2025I've spent 5 years building SaaS and tracking how AI revolutionizes marketing. Here's what the data shows:KEY FINDINGS:- AI-integrated SaaS products grew 40% YoY (GitNux, 2026) - Companies using AI publish 3.2x more content than human-only teams - Cost per article dropped from $157 to $12-18 (AI-assisted) - Top-quartile SaaS allocate 65% of marketing budget to automation (up from 23% in 2022)WHERE AI WORKS BEST:High-

Show HN: StageWright – A performance-focused Playwright reporter with AI

Hi HN,I’m the creator of StageWright (and the open-source playwright-smart-reporter).I’ve been frustrated by the "black box" nature of E2E test failures. Standard reporters tell you that a test failed, but they don't help you understand why it’s failing across 50 different runs or whether its execution time is trending toward a regression.I built StageWright to treat test results as a performance and stability dataset.Key Technical Features:Historical Flakiness Detection: Unlike P

Show HN: OnGarde – Runtime content security proxy for self-hosted AI agents

Built this because I had heard some horror stories about companies leaking PII from high compliance environments to ChatGPT. I wanted something that would auto-filter any dangerous traffic between my AI agent and the LLM API without requiring code changes in the agent itself.The filtering list has expanded a bit to include PII, secret keys and I've started a prompt injection library thats being filtered on as well.The problem: self-hosted agent platforms (OpenClaw, Agent Zero, CrewAI) have

Show HN: The best agent orchestrator is a 500-line Markdown file

I’ve tried agent teams, subagents, multi-terminal setups, and several open-source orchestration frameworks. This Claude Code skill (~500 lines of Markdown, no framework, no dependencies) has outperformed all of them for my team’s daily workflow.It turns your session into a dispatcher that fans work out to background workers across any model (Claude, GPT, Gemini, Codex). Workers ask clarifying questions mid-task via filesystem IPC instead of silently failing. Meanwhile, your main session stays le

ChatGPT's Writing Style

We can all detect that particular style. I've just thought about what may define it.It writes in such a way that the human reading it assigns meaning to it. We're accustomed to reading the words of others and being generous with our interpretation. I think GPT takes advantage of this. What it says can be interpreted as correct and incisive, but if one were to be very strict and uncharitable then what it says can usually be interpreted as meaningless and vague.Considering it was trained

Show HN: Caddy plugin that charges AI crawlers real USDC to access your site

Hello, I built a Caddy middleware that implements the x402 protocol (by Coinbase) to charge AI crawlers real money for content access.When GPTBot, ClaudeBot, or any known AI crawler hits your site, it gets an HTTP 402 with payment requirements. If it pays (USDC on Base), it gets the content. If not, it gets nothing.Normal users are never affected.How it works: - Crawler detected by User-Agent → 402 response with price and wallet address- Crawler signs a USDC payment (EIP-3009) and retries wi

Show HN: Handoff-md – One command to generate portable AI context from any repo

Every time you switch AI models mid-project, the new model starts from zero. It doesn't know your stack, your conventions, or what you were working on five minutes ago.I built handoff-md to fix this. It's a CLI tool that analyzes your git repo and generates a single HANDOFF.md file. Paste it into any AI model and it instantly understands your project.What it does:- Parses git history (last 20 commits, branches, uncommitted changes) - Detects your stack (framework, ORM, DB, auth, deploy

Show HN: Market Digest: Self-hosted market analysis and Telegram

Hi HN, I built this because my pre-market routine was eating 45+ minutes every morning — checking TradingView, Finviz, economic calendars, news sites, fear/greed indexes, all before market open.*What it does:* Market Digest pulls data from 6 free sources (yfinance, TwelveData, Finnhub, FRED, NewsAPI, Fear & Greed), runs multi-timeframe technical analysis (daily/weekly/monthly RSI, pivot points, trend detection), scores every instrument from 0-100, and sends you a formatted dig

Show HN: GameScout AI – AI-powered game recommender

I built GameScout AI because I never really liked the recommendations sites like Steam propose. It uses natural language to find games based on specific mechanics or moods, like "cozy farming sims with a bit of action" or "something like Dark Souls but funnier".The Stack:- Next.js & Tailwind: For a responsive, gaming-focused UI.- Groq (Llama 3): I’m using Groq for inference because sub-second latency makes semantic search feel like a local DB query.- Prompt Engineering: O

Ask HN: Why do AI coding agents refuse to save their own observations?

I've spent months building tooling for AI coding agents and hit something I can't fully explain.If you give an agent (Claude Code, Cursor, Codex) a tool to save observations — "save_observation: persist this insight for future sessions" — and explicitly instruct it to use the tool in system prompts, config files, everywhere you can, it calls it maybe 30% of the time.The agent will happily use tools that help it complete the current task. But a tool that only benefits future s

Show HN: Simple Viewers – Tiny native macOS file viewers

Hi HN,Around summer/fall of 2025, I started using 'plan mode' quite a bit more. When I would cmd+click into a newly created markdown plan, macOS would open Xcode. This was slow, didn't have native rendered viewing, the list goes on. I started working on a markdown viewer to easily open, review plans before I sent Claude on their way. The Markdown Viewer was born! It came from inspiration from the great Preview Mac app.For Christmas our family got a Bambu A1 3D printer (fantas

Show HN: Phone a Friend for Claude Code – GPT, Gemini, DeepSeek via MCP

I built an MCP server that gives Claude Code a "phone a friend" lifeline. Instead of relying on one model's perspective, Claude can pull in GPT, Gemini, DeepSeek, or any OpenAI-compatible model for a structured multi-round debate — and participate as an active debater itself.How it works:You ask Claude to brainstorm a topic All configured models respond in parallel (Round 1) Claude reads their responses and pushes back with its own take Models see each other's responses and r

Show HN: Ryvos – Autonomous AI assistant in Rust(15MB RAM,50 tools,16 providers)

Hi HN,I've been building Ryvos for the past few months — an open-source autonomous AI assistant written in Rust. It's the thing I wished existed: always-on, multi-channel, and actually secure.The core idea: every tool call passes through a SecurityGate — 5 tiers of classification, 9 dangerous pattern regexes (rm -rf, DROP TABLE, curl|bash, etc.), Docker sandboxing, and human-in-the-loop approval for anything risky. Skills run in Lua/Rhai sandboxes, not raw system code. We&#x2

I built a 151k-node GraphRAG swarm that autonomously invents SDG solutions

Hi HN, I wanted to share a passion project I've been building: PROMETHEUS AGI. I got frustrated that most LLM/RAG applications just summarize text. I wanted to see if an agentic swarm could actually perform cross-domain reasoning to invent new physical solutions (focusing on UN SDGs). The Stack: Neo4j Aura (Free tier maxed out at 151k nodes / 400k edges) Ingestion: Google BigQuery (Patents) + OpenAlex API LLMs: Ollama (Llama 3) for zero-cost local entity extraction, Claude 3.5 via

Show HN: Batchling – save 50% off any GenAI requests in two lines of code

batchling is a Python gateway to provider-native GenAI Batch APIs, so your existing calls can run at batch-priced rates instead of standard realtime pricing.As an AI developer myself, I discovered Batch APIs when tingling with AI benchmarking: I wanted to save 50% because I was ok with a 24h-SLA.What I discovered was a hard engineering reality:- No standards: each batch API has a different flow and batch lifecycles are never the same.- Framework shift: as a developer, switching from sync/as

Show HN: Browser-based .NET IDE with visual designer, NuGet packages, code share

Hi HN, I'm Giovanni, founder of Userware. We built XAML.io, a free browser-based IDE for C# and XAML that compiles and runs .NET projects entirely client-side via WebAssembly. No server-side build step.The link above opens a sample project using Newtonsoft.Json. Click Run to compile and execute it in your browser. You can edit the code, add NuGet packages, and share your project via a URL.What's new in v0.6:- NuGet package support (any library compatible with Blazor WebAssembly) - Code

Explained: What is behind the Pentagon’s clash with Anthropic?

Pentagon warns Anthropic over military use of its AI model. Dispute centres on safeguards around surveillance and autonomous ...

Anthropic Leans Into Enterprise With Managed Claude Cowork Plugins

Anthropic this week announced a new plugin ecosystem and extensions to the Cowork platform that makes it easier for enterprises to build and manage workflow integrated agents.