AI nerves are fraying. Anthropic keeps doubling down
Just weeks after its AI tools shook software stocks, Anthropic is pushing even deeper into the workplace. The company is ...
"Let's say you want to generate a job description. Tell the AI 'I want you to ask me questions, one at a time, until you've ...
Claude, Anthropic's AI chatbot, has sparked major selling of US tech stocks as its capabilities induce fear among investors ...
The FDA’s oversight was built for devices that rarely change. Clinical AI evolves over time, raising new questions about who ...
A worst-case AI scenario rattled markets, drawing sharp pushback from economists and business leaders.
This past weekend, I decided to test out a CLI tool I've been building to help me do source code reviews _faster_. I figured the best environment for such a tool would be a weekend CTF event. I like web challenges since you get a nice dump of source code, as well as a Dockerfile or docker compose setup for how to run everything locally. Usually, I can complete 2-3 web challenges before I get stuck. To help get unstuck I found myself increasingly turning to LLMs as a pairing partner. I'm
We built AIR Blackbox — open-source compliance infrastructure for AI agents targeting the EU AI Act enforcement deadline on August 2, 2026. If you're deploying LLM-based agents (LangChain, CrewAI, AutoGen, OpenAI Agents SDK) into production, the EU AI Act requires tamper-evident audit trails, human oversight mechanisms, data governance controls, and injection defense — for any system classified as high-risk. Most teams we've talked to either don't know about the deadline or assume
I thought it would be interesting to look at the terms of service of the frontier labs, and there was more deviation than I expected when it comes to the issue of building competing offerings. Note that I am not a lawyer and none of this is legal advice. You should refer to the specific versions of the agreements that apply to you and consult with a lawyer. It is very common for technology companies (particularly when providing data through an API) to include a term that more-or-less says their c
I run multi-agent teams in high-consequence scenarios. Read: fuckups at 3 AM = I'm awake. I kept hitting the same issue. I couldn't get a rules-based system to enforce behavior, and I had no real way to prove that agents really did what they said they did. I can log and monitor them, set up (a million) Slack alerts, but none of these things are PROOF. Logs are mutable. And that matters more every day as agents get more powerful (take THAT, @meta). So I went down the rabbit hole. The obvious
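One common way to make an agent log tamper-evident (sketched here generically, not as the poster's actual system) is to hash-chain the entries: each record embeds the hash of the previous one, so any retroactive edit breaks verification from that point on. A minimal illustration:

```python
import hashlib
import json
import time

class HashChainLog:
    """Append-only log where each entry embeds the hash of the
    previous entry, so after-the-fact edits break the chain."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []          # list of (record, digest) pairs
        self._last_hash = self.GENESIS

    def append(self, event: dict) -> str:
        record = {
            "ts": time.time(),
            "event": event,
            "prev": self._last_hash,   # link to the previous entry
        }
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append((record, digest))
        self._last_hash = digest
        return digest

    def verify(self) -> bool:
        """Recompute every hash; any mutation anywhere fails."""
        prev = self.GENESIS
        for record, digest in self.entries:
            if record["prev"] != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(record, sort_keys=True).encode()
            ).hexdigest()
            if recomputed != digest:
                return False
            prev = digest
        return True
```

This gives tamper-evidence, not tamper-proofness: an attacker who controls the whole store can rewrite the chain, which is why real systems anchor the head hash somewhere external.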
I’ve been building database-backed tools for ~20 years (previously co-founded restdb.io). One recurring pattern: teams repeatedly rebuild the same admin dashboards and CRUD APIs. This is an open-source React admin template that runs on Codehooks and generates a full CRUD backend and UI from a schema instantly. The flow is: Prompt → Schema → API → Admin UI. You can generate a schema using the "Copy Prompt" button in the UI, or define collections manually in a visual editor. After deploy, you
OpenAI has spent the past year systematically reducing its dependence on Nvidia. The company signed a massive multi-year deal ...
The company says its latest model’s agentic skills also apply to a broader set of knowledge work such as presentations and spreadsheets. On Thursday, OpenAI released GPT-5.3-Codex, a new model that ...
OpenAI introduces Harness Engineering, an AI-driven methodology where Codex agents generate, test, and deploy a million-line production system. The platform integrates observability, architectural con
OpenAI announced yesterday Codex Desktop, a new native macOS app that treats AI coding agents like teammates you can direct, ...
Hey HN — I'm Parth, I built thisorthis.ai because I was tired of copy-pasting the same prompt across ChatGPT, Claude, and Gemini tabs to figure out which model actually gave the best answer. What it does: you type one prompt, pick 2–6 models (we support 47 text models and several image models across OpenAI, Anthropic, Google, xAI, Meta, Amazon, Mistral, Cohere, AI21), and see every response side-by-side. There's also a feature called SmartPick that uses an LLM evaluator to score each re
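The fan-out pattern behind a tool like this is easy to sketch: one prompt, several models queried concurrently, responses collected side-by-side keyed by model. Below, `ask` is a hypothetical stand-in; real provider SDK calls would go where the stub is:

```python
import concurrent.futures

def ask(model: str, prompt: str) -> str:
    # Hypothetical stub; a real version would call the provider's SDK.
    return f"[{model}] answer to: {prompt}"

def fan_out(prompt: str, models: list[str]) -> dict[str, str]:
    """Send one prompt to several models concurrently and
    return the responses keyed by model name."""
    with concurrent.futures.ThreadPoolExecutor() as ex:
        futures = {m: ex.submit(ask, m, prompt) for m in models}
        return {m: f.result() for m, f in futures.items()}
```

Threads suit this because the work is I/O-bound (waiting on provider APIs); an evaluator pass like SmartPick would simply be one more model call over the collected dict.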
Hey HN, I built AgentBudget after an AI agent loop cost me $187 in 10 minutes: GPT-4o retrying a failed analysis over and over. Existing tools (LangSmith, Langfuse) track costs after execution but don't prevent overspend. AgentBudget is a Python SDK that gives each agent session a hard dollar budget with real-time enforcement. Integration is two lines: import agentbudget; agentbudget.init("$5.00"). It monkey-patches the OpenAI and Anthropic SDKs (same pattern as Sentry/D
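The enforcement pattern described, wrapping an SDK's call method so every request is metered against a hard cap, can be sketched generically. Everything below (the `Budget` class, `FakeClient`, the per-call pricing) is illustrative, not the AgentBudget API:

```python
class BudgetExceeded(RuntimeError):
    """Raised when a session would exceed its hard dollar cap."""

class Budget:
    def __init__(self, limit_usd: float):
        self.limit = limit_usd
        self.spent = 0.0

    def charge(self, cost_usd: float) -> None:
        # Refuse any charge that would push the session over its cap.
        if self.spent + cost_usd > self.limit:
            raise BudgetExceeded(f"hard budget ${self.limit:.2f} exhausted")
        self.spent += cost_usd

def patch_client(client, budget: Budget, cost_fn):
    """Monkey-patch client.create so every completed call is metered.
    A production version would also estimate cost *before* dispatching."""
    original = client.create

    def metered(*args, **kwargs):
        response = original(*args, **kwargs)
        budget.charge(cost_fn(response))  # raises once the cap is hit
        return response

    client.create = metered
    return client

class FakeClient:
    """Stand-in for a real provider SDK; reports fixed token usage."""
    def create(self, prompt):
        return {"usage": {"total_tokens": 1000}}
```

The raise is what breaks a runaway retry loop: the agent's next call fails fast instead of silently accruing spend.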
I built a system that reads research papers across unrelated domains and tries to surface hypotheses that neither field would have generated on its own. The Discover page is where it publishes findings: https://aegismind.app/discoveries It's very early — only three discoveries so far — but the core idea is what I want feedback on. The problem it's trying to solve: Science is siloed. A breakthrough in mycology might have direct implications for network routing. A discover
Hey HN. I've spent 10 years doing IT work, mostly infrastructure and scripting, the stuff nobody writes blog posts about. No CS degree. This is my first startup and I have no idea if I'm doing it right. Here's the problem that bugged me: every AI agent setup I looked at just blasts everything through GPT-4 or whatever the biggest model is. That's insane for 80% of tasks. You don't need a $0.03/1k token model to parse a CSV. So I built two things. Rhelm is a web app
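Routing cheap tasks away from the biggest model can start as a crude heuristic before graduating to a learned classifier. A hedged sketch (model names, prices, and keywords below are all made up, not Rhelm's actual logic):

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cost_per_1k_tokens: float  # hypothetical pricing

CHEAP = Model("small-model", 0.0002)
FRONTIER = Model("frontier-model", 0.03)

def route(task: str, est_tokens: int) -> Model:
    """Naive complexity heuristic: short, mechanical tasks go to the
    cheap model; everything open-ended goes to the frontier model."""
    mechanical = ("parse", "extract", "convert", "reformat", "summarize")
    if any(kw in task.lower() for kw in mechanical) and est_tokens < 4000:
        return CHEAP
    return FRONTIER
```

Even this crude router captures the post's point: if 80% of calls match the mechanical bucket, the blended cost drops by roughly the price ratio of the two models.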
Hey HN, we built a different kind of AI benchmark for UI generation. Instead of static leaderboards or curated screenshots, you can watch multiple models generate the same design live, side-by-side, and decide which output is actually better. Under the hood, we call AI models from Anthropic (Opus), OpenAI (GPT), Google (Gemini), and Moonshot AI (Kimi). Each model generates a real, editable project using Tailwind CSS (not screenshots or canvas exports). You can export it for Next.js, Laravel (Blade),
I ported Karpathy's microgpt [1] to TypeScript. It implements a complete GPT-2 architecture (autograd, tokenizer, multi-head attention, RMSNorm, and Adam optimizer), all in ~500 lines with zero runtime dependencies. It runs natively in the browser, no Python runtime or backend needed. You can try it in the playground (https://microgpt-ts.vercel.app/playground). There are preset datasets (baby names, Pokemon, company names, movie titles, etc.) or you can paste your own text. The p
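The autograd piece of such a port is compact in any language. For illustration, here is the classic micrograd-style scalar reverse-mode autograd in Python; a TypeScript port follows the same shape (this is the general technique, not the microgpt-ts source):

```python
class Value:
    """Minimal reverse-mode autograd scalar, micrograd-style."""

    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))
        def _backward():
            # d(a+b)/da = 1, d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then propagate grads in reverse.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()
```

Everything else in a GPT (attention, layer norm, Adam) is built from compositions of ops like these, which is why the whole thing fits in a few hundred lines.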