GPT-5
Show HN: LiberClaw, deploy AI agents that run 24/7 on their own VMs
LiberClaw is an open-source platform for deploying AI agents that run around the clock on dedicated virtual machines. You define what an agent does with a markdown skills file, deploy it, and it keeps working whether you're at your desk or not. The agent system code is on GitHub: https://github.com/Libertai/liberclaw-agent
There are 61 agents running on the platform right now across 578 conversations, with 99.7% uptime. Each agent gets its own VM with its own filesystem, da
Show HN: DocMCP – Index any docs site locally, search it from Claude via MCP
Kept running into the same problem while coding with Claude: library docs are either outdated in its training data or I'm copy-pasting pages into the chat.
Built DocMCP – an MCP server that crawls documentation sites, chunks and indexes them locally in SQLite, and exposes a search tool Claude can call directly.
The interesting bit is the search: pure keyword search misses semantic queries ("how to make elements wrap" → flex-wrap), and pure vector search misses exact API names. So
Show HN: SafeAppeals – Cursor for Documents
SafeAppeals is an AI-powered document workspace designed for legal professionals, researchers, injured workers, and anyone handling document-heavy workflows. Built with Electron and TypeScript, it combines a powerful code editor interface with native support for DOCX, PDF, Excel, and Markdown files.
Key features include:
- Integrated AI agents powered by Claude, OpenAI, and Google APIs for document analysis and generation
- Native document editing without conversion to plaintext
- DocuSign integr
Show HN: Stoneforge – Open-source orchestration for parallel AI coding agents
I built this because I was running 3-5 Claude Code instances on the same repo and burning out from constantly context switching between terminal windows. That meant carefully making sure agents' work didn't overlap, preventing context windows from degrading, manually enforcing documentation/memory policies, and re-explaining decisions across sessions.
Stoneforge is the coordination layer I wanted. A Director agent breaks goals into tasks. A dispatch daemon assigns them to workers when availab
Show HN: Evalcraft – cassette-based testing for AI agents (pytest, $0/run)
Testing AI agents is painful. Every test run calls the LLM API, costs real money, takes minutes, and gives different results each time. CI? Forget about it.
Evalcraft fixes this with cassette-based capture and replay: think VCR for HTTP, but for LLM calls and tool use.
How it works:
1. Run your agent once with real API calls. Evalcraft records every LLM request, tool call, and response into a JSON cassette file.
2. In tests, replay from the cassette. Zero API calls, zero cost, deterministic output.
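The record-then-replay flow in the two steps above can be sketched in a few lines of Python. This is only an illustration of the general cassette technique, not Evalcraft's actual API: the `Cassette` class, its `call` and `save` methods, and the choice to key entries by a hash of the request are all assumptions made up for this example.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict


class Cassette:
    """Hypothetical cassette: records responses once, replays them afterwards."""

    def __init__(self, path: str, record: bool) -> None:
        self.path = Path(path)
        self.record = record
        self.entries: Dict[str, str] = {}
        if not record:
            # Replay mode: load previously recorded responses from disk.
            self.entries = json.loads(self.path.read_text())

    @staticmethod
    def _key(request: dict) -> str:
        # Canonical JSON so the same request always maps to the same key.
        canonical = json.dumps(request, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def call(self, request: dict, live_fn: Callable[[dict], str]) -> str:
        key = self._key(request)
        if self.record:
            # Record mode: make the real call and remember the response.
            response = live_fn(request)
            self.entries[key] = response
            return response
        # Replay mode: never call live_fn; fail loudly on unknown requests.
        if key not in self.entries:
            raise KeyError(f"no recorded response for request {request!r}")
        return self.entries[key]

    def save(self) -> None:
        # Persist the recorded entries as a JSON cassette file.
        self.path.write_text(json.dumps(self.entries, indent=2))
```

In this sketch, a first run would construct `Cassette(path, record=True)`, route each LLM call through `call`, and finish with `save()`. Tests would then construct `Cassette(path, record=False)` and pass the same requests, so the live function is never invoked and the output is the same on every run.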
Show HN: Triplecheck – Review your code free with local LLMs
Hey HN, I built triplecheck because I wanted deep AI code review without paying $24/mo per seat.
The idea: instead of one LLM pass that drops comments (like CodeRabbit/Sourcery), triplecheck runs a full loop:
1. Reviewer finds bugs → structured findings with file, line, severity
2. Coder writes actual patches (search/replace diffs, not suggestions)
3. Tests run automatically to catch regressions
4. Loop until no new findings or max rounds
5. Judge scores the final result 0–10
The key
Show HN: Markdown-to-Book – Convert Markdown to KDP Ready PDFs and EPUBs
Author here. I'm a software engineer who started writing hard science fiction on the side. I built this tool because I wanted to write in plain Markdown and go straight to Amazon KDP without touching Word, InDesign, or Vellum.
The workflow: I write stories in .md files, one heading per chapter, --- for scene breaks. When I'm ready to publish, I run one command and get a paperback PDF, hardcover PDF, and Kindle EPUB with correct margins, typography, and scene breaks. The tool wraps Pando
Show HN: Agent-vfs – Virtual filesystem for AI agent memory
I think filesystems are the right abstraction for agent memory. Not vector databases, not key-value stores, not custom memory APIs.
Why? Because agents already know how to use files. Claude Code writes notes to ~/.claude/. Cursor stores context in project files. Every coding agent that works well has converged on the same pattern: just use files. The model doesn't need to learn a new API, and you don't need a retrieval pipeline.
The problem is that real filesystems don't w
Ask HN: How AI teams source and license training data?
I'm deep in research on how AI teams actually source and license training data (text, audio, video, synthetic). Not the theory, but the real, messy, day-to-day process.
I'm NOT pitching or selling anything. I'm having short 15-minute conversations with people who work on this daily, and the insights have been genuinely eye-opening. Happy to share what I'm learning in return.
If you know someone who fits any of these, I'd massively appreciate an intro or a tag in the comments.
Show HN: RapidFire AI – parallel RAG experimentation with live run intervention
We built RapidFire AI because iterating on RAG pipelines is painfully sequential:
run a config, wait, inspect results, tweak one knob, repeat. When you have 15
things to tune (chunk size, retrieval k, reranker, prompt template, context
window strategy...) that cycle compounds fast.
RapidFire uses shard-based interleaved scheduling to run many configurations
concurrently on a single machine — even a CPU-only box if you're using a
closed API like OpenAI. Instead of config A finishing befo
Pentagon dispute bolsters Anthropic reputation but raises questions about AI readiness in military
Anthropic’s moral stand on U.S. military use of artificial intelligence is reshaping the competition between leading AI companies but also exposing a growing awareness that maybe chatbots just aren’t capable enough for acts of war.
Anthropic’s Claude is suddenly the most popular iPhone app following Pentagon feud
The AI company Anthropic was likely not in the public lexicon just a month ago. But it is now after a whirlwind sequence of events in February thrust the company into the public eye more than ever.
After Banning Anthropic From Military Use, Pentagon Still Relying Heavily on It in Iran War
So much for banning it "effective immediately."
Anthropic launches AI exposure index to assess which white-collar jobs face automation risk
Anthropic's new AI Exposure Index ranks computer programmers as the most vulnerable to LLM automation, with 75% of tasks automatable and early-career hiring slowing.
Anthropic Nears $20 Billion Revenue Run Rate Amid Pentagon Feud
Anthropic PBC is on track to generate annual revenue of almost $20 billion, a projection based on current performance, more ...
Anthropic CEO: We're trying to "deescalate" Pentagon AI standoff to reach agreement
Anthropic CEO Dario Amodei said his company and the Department of Defense "have much more in common than we have differences."
Pentagon notifies Anthropic it’s deemed firm a supply-chain risk
The Pentagon said it has formally notified Anthropic PBC that it’s determined the company and its products pose a risk to the U.S. supply chain, according to a senior defense official, escalating a dispute over artificial intelligence safeguards. “DOW officially informed Anthropic leadership the company and its products are deemed a supply chain risk, effective immediately,” the official told ...
Pentagon labels Anthropic supply-chain risk days after CEO lashed out at Trump, OpenAI
The Pentagon has reportedly told Anthropic it will be officially labeled a supply-chain risk – just days after the AI firm’s ...
OpenAI Releases Dedicated Codex Coding App For Microsoft Windows
OpenAI launches Codex for Windows, letting developers run multiple AI coding agents, automate testing tasks, and sync projects seamlessly across Mac and Windows.
OpenAI’s new flagship GPT model can control your PC
When paired with an AI agent system, GPT-5.4 can click a mouse, type keyboard commands, browse the web, and control computer apps.