GPT-5

Show HN: oMLX – Native Mac inference server that persists KV cache to SSD

I built an open-source LLM inference server optimized for Apple Silicon. The main motivation was coding agents - tools like Claude Code send requests where the context prefix keeps shifting, invalidating KV cache. A few turns later the agent circles back, and your Mac has to re-prefill the entire context from scratch.oMLX solves this with paged SSD caching. Every KV cache block is persisted to disk. When a previous prefix returns, it's restored instantly instead of being recomputed. This ma

AI Skills Platform (Stealth) – Technical Co-Founder – Remote (US) – Equity

Building the "Shopify for AI expertise". This is a platform where domain experts create AI skills once and deploy everywhere: web apps, Claude/ChatGPT via MCP, APIs. Working prototype with 40++ live skills, MCP server integration, streaming execution.Looking for a technical co-founder to own the architecture and engineering. This is a ground-floor equity role, not a salaried position.You: 4-8 years experience. Full-stack (Next.js/React/TypeScript preferred). Hands-on wit

Show HN: Codedocent – Code visualization for non-programmers

I'm a hardware engineer who reads schematics, not source code. I kept needing to understand codebases for projects I was managing but couldn't read the syntax. So I built a tool that turns any codebase into an interactive visual map with plain English explanations.Point it at a folder, get nested colored blocks showing the structure (directories → files → classes → functions). Click to drill down. AI generates summaries written for humans, not programmers. Architecture mode shows a dep

Ask HN: What Is the Point of WebMCP?

So I have a question that should be simple, but apparently is not. I have Chrome Canary with WebMCP enabled. I have the Model Context Tool extension installed. I have a WebMCP-enabled test app running in it.The only thing the extension offers is connecting to the Gemini API via an API token and using that as the driver for the WebMCP app. I don't understand this use case. If I wanted my app to consume API tokens I would have directly coded interoperation with the major providers with one of

Everything You Need to Know About ChatGPT-4

ChatGPT-4 is an even more powerful tool than ChatGPT, sure to send bigger ripples across the world. Here's what to know ...

GPT-4 Learning Models in Dermatology Rated Poor

GPT-4 learning models in dermatology demonstrated substandard information yet rarely offered harmful advice in evaluation, overall.

Show HN: Ball 2 – Use AI to build fun ball games

I wanted to find a way to get kids:- moving- creating- playing with othersIn the end I, and a team of 4 others, built:- a foam ball with a BLE IMU in it (moving/together)- an app that allows you to create almost any game you can imagine with AI (creating)Video of teenagers making a game: https://www.youtube.com/watch?v=-7TZXRBybOEVideo that was intended to be for Kickstarter: https://www.youtube.com/watch?v=Edy9zew1XN4 (we abandoned kickstarter, long story)For

Show HN: What the EU parliament and commission have been working on lately

I wanted to understand what exactly the EU commission and the EU parliament are working on.I processed every legislation and resolution from the last month with gpt-oss:20b to figure out what goal every document is serving.I also assigned an image corresponding to a category that each of this documents belongs to.There is a feature flag in the URL, because there is still some work I have to do before fully opening up the list of EU commission texts. If you'd like to go outside of the goals

Show HN: Legal RAG Bench

Hey HN, This is Legal RAG Bench, the first benchmark for legal RAG systems to simultaneously evaluate hallucinations, retrieval failures, and reasoning errors.The key takeaways of our benchmark are: 1. Embedding models, not generative models, are the primary driver of RAG accuracy. Switching from a general-purpose embedder like OpenAI's Text Embedding 3 Large to a legal domain embedder like Kanon 2 Embedder can raise accuracy by ~19 points. 2. Hallucinations are often triggered by retrie

Show HN: Cobalt – Unit tests for AI agents, like Jest but for LLMs

Hey HN, I built Cobalt, an open-source testing framework for AI agents and LLM apps.Most eval tools (Braintrust, Arize, LangSmith) want you to live in their UI. Dashboards, manual reviews, clicking through results. That's fine for exploration, but it doesn't catch regressions. We needed something that runs in CI like any other test suite, lives in code, and fails the build when quality drops. npm install @basalt-ai/cobalt npx cobalt init npx cobalt run Write experiments as c

Show HN: Trained an LLM to predict "What will Trump do?"

Hey HN! I RL-tuned an open-source LLM (gpt-oss-120b — 120B MoE, but only 5.1B active params) to predict "What will Trump do?" in any situation, trained on nothing but public news collected automatically from search queries. The trained model beats GPT-5, and both dataset and trained model are open sourced.Data generation: Generated 2,108 binary forecasting questions from just a search query and a date range using the Lightning Rod SDK (https://github.com/lightning-rod-la

Show HN: API Combat – A game played through API calls

I built a multiplayer strategy game with no GUI — you play entirely through REST API calls. Register, build a roster, configure teams, queue battles, and climb the leaderboard, all via curl/Postman/your own code.Every response includes HATEOAS links that guide you to your next move, so you can discover the whole game by following the API. Full OpenAPI spec at /api-docs/v1.Built with ASP.NET Core 8 and MSSQL. Free tier is fully playable. There's also an education mode for

Show HN: Aguara – Security scanner for AI agent skills and MCP servers

Hey HN, I built Aguara because I kept seeing the same problem: AI agents and MCP servers run code on your behalf, and nobody is checking what that code actually does before it runs. A single malicious skill file can exfiltrate your SSH keys, inject prompts to override safety instructions, or curl-pipe-bash a backdoor. I wanted something like Semgrep but specifically for the AI agent ecosystem. Aguara is a Go binary that does static analysis on skill files (markdown, YAML, JS

Show HN: From Clawdbot to OpenAI: Dissecting the supply chain that sold out

What started as a viral "Mac Mini" enthusiast project ended with a Valentine's Day "hard launch" of its founder joining OpenAI.But the real story isn't the hiring—it's the supply chain decay.I’ve audited the technical strata of the transition, specifically focusing on:CVE-2026-25253 (The 1-Click RCE): How missing WebSocket origin validation allowed any website to hijack a local agent and exfiltrate host credentials.The "ClawdHub" Poisoning: How an unv

Show HN: Expectllm – "expect"-style pattern matching for LLM conversations

I've been experimenting with agent frameworks and noticed that many workflows reduce to a simple pattern:- Send input - Wait for a pattern - Branch on the matchThis is essentially the classic Unix expect model, but applied to LLM conversations.So I built expectllm — a minimal pattern-matching conversation flow library (365 lines of code).Example: from expectllm import Conversation c = Conversation() c.send("Review this code for security issues") c.expect(r"fou

Show HN: GitHub Action to deploy to Portainer over Tailscale (no open ports)

I built a GitHub Action that lets you deploy Docker stacks to a Portainer instance sitting behind a private network — without opening any ports to the internet. The action spins up an ephemeral Tailscale node during the CI run using OAuth (so it never needs a long-lived auth key), reaches your Portainer API over the tailnet, deploys or updates your stack, then immediately logs the node out on cleanup — even if the job fails. The problem I was trying to solve: I run Portainer on a home server and

Seedance 2.0 API launch postponed due to copyright threats from Disney/Warner

Hearing the Seedance 2.0 API timeline is slipping: docs that were supposed to land Feb 22 and the API on Feb 24 are now delayed.The stated reason is adding pre-release safeguards before public API access, after deepfake/copyright controversy around “real person reference” + low-effort prompts producing high-fidelity actor/celebrity lookalikes.What they’re reportedly planning to ship before opening up includes:stricter content filtering rules explicitly blocking unlicensed real-person l

Show HN: Urich – Async DDD framework for microservices on Starlette

I built Urich after getting tired of hand-wiring DDD/CQRS on top of Starlette: routers, command handlers, OpenAPI, and DI scattered everywhere. The idea: one object per bounded context. You define a DomainModule (aggregate, repository, commands, queries, event handlers), call app.register(orders_module), and get routes like POST /orders/commands/create_order and GET /orders/queries/get_order plus OpenAPI from your dataclasses. Event bus, discovery, and RPC are

OpenAI unveils Codex Spark, a new AI model that codes 15 times faster in real time

OpenAI’s new Codex Spark promises to make coding feel instantaneous, letting developers collaborate with AI in real time while cutting wait times down to milliseconds.

OpenAI unveils GPT-5.3 Codex Spark, a coding model built for speed

OpenAI has launched GPT-5.3-Codex-Spark, a lightweight real-time coding model that promises faster output, lower latency, and ...