GPT-5

AI nerves are fraying. Anthropic keeps doubling down

Just weeks after its AI tools shook software stocks, Anthropic is pushing even deeper into the workplace. The company is ...

Say please? The best way to talk to an AI

"Let's say you want to generate a job description. Tell the AI 'I want you to ask me questions, one at a time, until you've ...

Claude the conqueror: The AI chatbot keeps wiping out billions from exposed tech stocks

Claude, Anthropic's AI chatbot, has sparked major selling of US tech stocks as its capabilities induce fear among investors ...

Medical AI Is Already In Hospitals. Who Is Watching Its Safety?

The FDA’s oversight was built for devices that rarely change. Clinical AI evolves over time, raising new questions about who ...

What smart people in economics and business are saying about a viral report warning of an AI-driven recession and stock crash

A worst-case AI scenario rattled markets, drawing sharp pushback from economists and business leaders.

I spent $100 benchmarking LLM providers on a weekend CTF

This past weekend, I decided to test out a CLI tool I've been building to help me do source code reviews _faster_. I figured the best environment for such a tool would be a weekend CTF event. I like web challenges since you get a nice dump of source code, as well as a Dockerfile or Docker Compose setup for running everything locally. Usually, I can complete 2-3 web challenges before I get stuck. To get unstuck, I found myself increasingly turning to LLMs as a pairing partner. I'm

Show HN: Open-source EU AI Act compliance layer for AI agents (8/2026 deadline)

We built AIR Blackbox — open-source compliance infrastructure for AI agents targeting the EU AI Act enforcement deadline on August 2, 2026. If you're deploying LLM-based agents (LangChain, CrewAI, AutoGen, OpenAI Agents SDK) into production, the EU AI Act requires tamper-evident audit trails, human oversight mechanisms, data governance controls, and injection defense — for any system classified as high-risk. Most teams we've talked to either don't know about the deadline or assume
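One common way to make an audit trail "tamper-evident" is a hash chain, where each entry commits to the hash of the entry before it. The sketch below is a minimal illustration of that general technique, not AIR Blackbox's implementation; the `AuditLog` class, the genesis value, and the event fields are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # hypothetical starting value for the first entry's "prev"


def _digest(prev: str, event: dict) -> str:
    # Canonical JSON (sorted keys, no spaces) so identical events hash identically.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Append-only log; each entry stores the previous entry's hash."""
    entries: list = field(default_factory=list)

    def append(self, event: dict) -> str:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        h = _digest(prev, event)
        self.entries.append({"event": event, "prev": prev, "hash": h})
        return h

    def verify(self) -> bool:
        # Recompute the whole chain; any edited event breaks every later link.
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True


log = AuditLog()
log.append({"agent": "planner", "action": "tool_call", "tool": "search"})
log.append({"agent": "planner", "action": "human_approval", "approved": True})
ok_before = log.verify()
log.entries[0]["event"]["tool"] = "delete_records"  # rewrite history
ok_after = log.verify()
print(ok_before, ok_after)
```

A chain like this only detects edits if the latest hash is also stored somewhere the writer cannot rewrite (an external store, a signed checkpoint); otherwise an attacker can simply recompute every hash.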

Terms of use: What types of competition do model providers ban?

I thought it would be interesting to look at the terms of service of the frontier labs, and there was more variation than I expected on the issue of building competing offerings. Note that I am not a lawyer and none of this is legal advice. You should refer to the specific versions of the agreements that apply to you and consult with a lawyer. It is very common for technology companies (particularly when providing data through an API) to include a term that more or less says their c

Show HN: I applied Markowitz portfolio theory to agent teams / proved it in a zkVM

I run multi-agent teams in high-consequence scenarios. Read: fuckups at 3 AM = I'm awake. I kept hitting the same issue. I couldn't get a rules-based system to enforce behavior, and I had no real way to prove that agents really did what they said they did. I can log and monitor them and set up (a million) Slack alerts, but none of these things are PROOF. Logs are mutable. And that matters more every day as agents get more powerful (take THAT, @meta). So I went down the rabbit hole. The obvious
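The excerpt stops before explaining how the mapping works, so as a reference point only: Markowitz mean-variance theory scores a portfolio by its expected return and its variance. Reading each agent as an "asset", w_i as the share of work routed to agent i, μ as the agents' expected success rates, and Σ as the covariance of their errors is our assumption, not necessarily the author's formulation.

```latex
\begin{aligned}
\mu_p &= \mathbf{w}^\top \boldsymbol{\mu}, \qquad
\sigma_p^2 = \mathbf{w}^\top \Sigma\, \mathbf{w} = \sum_{i=1}^{n}\sum_{j=1}^{n} w_i w_j \Sigma_{ij}, \qquad
\mathbf{1}^\top \mathbf{w} = 1 \\
\mathbf{w}^\ast &= \arg\min_{\mathbf{w}} \; \mathbf{w}^\top \Sigma\, \mathbf{w}
\quad \text{s.t.} \quad \mathbf{w}^\top \boldsymbol{\mu} = \mu^\ast,\; \mathbf{1}^\top \mathbf{w} = 1
\end{aligned}
```

For two agents the variance reduces to σ_p² = w_1²σ_1² + w_2²σ_2² + 2w_1w_2ρσ_1σ_2, so under this reading, pairing agents whose failures are highly correlated (ρ near 1) buys little reduction in risk.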

Show HN: Prompt → Schema → CRUD API and Admin UI (New Codehooks Template)

I’ve been building database-backed tools for ~20 years (previously co-founded restdb.io). One recurring pattern: teams repeatedly rebuild the same admin dashboards and CRUD APIs. This is an open-source React admin template that runs on Codehooks and instantly generates a full CRUD backend and UI from a schema. The flow is:

Prompt → Schema → API → Admin UI

You can generate a schema using the "Copy Prompt" button in the UI, or define collections manually in a visual editor. After deploy, you
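To make the Schema → API step concrete, here is a minimal sketch of deriving REST routes and record validation from a schema. It is not the Codehooks API or this template's schema format; the dict-of-types schema, the `/:id` route convention, and the function names are hypothetical.

```python
# Hypothetical schema format: collection name -> {field name: Python type}.
SCHEMA = {
    "customers": {"name": str, "email": str, "active": bool},
    "orders": {"customer_id": str, "total": float},
}


def crud_routes(schema: dict) -> list:
    """Derive conventional REST routes for every collection in the schema."""
    routes = []
    for coll in schema:
        routes += [
            ("GET", f"/{coll}"),         # list records
            ("POST", f"/{coll}"),        # create a record
            ("GET", f"/{coll}/:id"),     # read one record
            ("PUT", f"/{coll}/:id"),     # replace a record
            ("DELETE", f"/{coll}/:id"),  # delete a record
        ]
    return routes


def validate(schema: dict, coll: str, record: dict) -> list:
    """Return validation errors for a record; an empty list means valid."""
    fields = schema[coll]
    errors = [f"unknown field: {k}" for k in record if k not in fields]
    for name, typ in fields.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], typ):
            errors.append(f"{name}: expected {typ.__name__}")
    return errors


routes = crud_routes(SCHEMA)
good = validate(SCHEMA, "customers", {"name": "Ada", "email": "ada@example.com", "active": True})
bad = validate(SCHEMA, "orders", {"customer_id": "c1", "total": "12"})
print(len(routes), good, bad)
```

The same schema drives both the routes and the validation, which is the property that lets a generator produce the backend and the admin forms from one definition.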

OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips

OpenAI has spent the past year systematically reducing its dependence on Nvidia. The company signed a massive multi-year deal ...

OpenAI’s GPT-5.3-Codex thinks deeper and wider about coding work

The company says its latest model’s agentic skills also apply to a broader set of knowledge work such as presentations and spreadsheets. On Thursday, OpenAI released GPT-5.3-Codex, a new model that ...

OpenAI Introduces Harness Engineering: Codex Agents Power Large‑Scale Software Development

OpenAI introduces Harness Engineering, an AI-driven methodology where Codex agents generate, test, and deploy a million-line production system. The platform integrates observability, architectural con

OpenAI Launches Codex Desktop App As Agentic AI Rivalry Intensifies

Yesterday, OpenAI announced Codex Desktop, a new native macOS app that treats AI coding agents like teammates you can direct, ...

Show HN: Thisorthis.ai – Compare responses from 50 AI models side-by-side

Hey HN, I'm Parth. I built thisorthis.ai because I was tired of copy-pasting the same prompt across ChatGPT, Claude, and Gemini tabs to figure out which model actually gave the best answer. What it does: You type one prompt, pick 2–6 models (we support 47 text models and several image models across OpenAI, Anthropic, Google, xAI, Meta, Amazon, Mistral, Cohere, AI21), and see every response side-by-side. There's also a feature called SmartPick that uses an LLM evaluator to score each re

Show HN: AgentBudget – Real-time dollar budgets for AI agents

Hey HN, I built AgentBudget after an AI agent loop cost me $187 in 10 minutes: GPT-4o retrying a failed analysis over and over. Existing tools (LangSmith, Langfuse) track costs after execution but don't prevent overspend.

AgentBudget is a Python SDK that gives each agent session a hard dollar budget with real-time enforcement. Integration is two lines:

    import agentbudget
    agentbudget.init("$5.00")

It monkey-patches the OpenAI and Anthropic SDKs (same pattern as Sentry/D
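The general idea of a hard budget with real-time enforcement can be sketched as a pre-flight check against the worst-case cost of each call, followed by billing the actual cost. This is not AgentBudget's API: `SessionBudget`, `metered`, `BudgetExceeded`, the per-1K-token prices, and the `fake_completion` stand-in are all hypothetical, and a decorator stands in for the SDK monkey-patching the post describes.

```python
import functools
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a call could push session spend past the hard limit."""


@dataclass
class SessionBudget:
    limit_usd: float
    spent_usd: float = 0.0

    def check(self, worst_case_usd: float) -> None:
        # Refuse the call up front if its worst-case cost could cross the limit.
        if self.spent_usd + worst_case_usd > self.limit_usd:
            raise BudgetExceeded(
                f"worst case ${worst_case_usd:.4f} would exceed the "
                f"${self.limit_usd:.2f} budget (spent ${self.spent_usd:.4f})"
            )

    def record(self, actual_usd: float) -> None:
        self.spent_usd += actual_usd


def metered(budget: SessionBudget, usd_per_1k_in: float, usd_per_1k_out: float):
    """Wrap a model-call function so every call is checked, then billed."""
    def cost(tokens_in: int, tokens_out: int) -> float:
        return (tokens_in * usd_per_1k_in + tokens_out * usd_per_1k_out) / 1000

    def decorate(call):
        @functools.wraps(call)
        def wrapper(tokens_in: int, max_tokens_out: int) -> int:
            budget.check(cost(tokens_in, max_tokens_out))  # pre-flight, worst case
            tokens_out = call(tokens_in, max_tokens_out)
            budget.record(cost(tokens_in, tokens_out))     # bill actual usage
            return tokens_out
        return wrapper
    return decorate


budget = SessionBudget(limit_usd=5.00)


@metered(budget, usd_per_1k_in=0.0025, usd_per_1k_out=0.01)  # illustrative prices
def fake_completion(tokens_in: int, max_tokens_out: int) -> int:
    # Hypothetical stand-in for an SDK call; always uses its full output allowance.
    return max_tokens_out


calls = 0
try:
    while True:  # simulate a runaway retry loop
        fake_completion(10_000, 2_000)
        calls += 1
except BudgetExceeded as err:
    print(calls, "calls before stop:", err)
```

Checking against the worst case before the call, rather than only tallying afterwards, is what turns cost tracking into enforcement: the loop stops before the request that would cross the limit is ever sent.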

Show HN: AegisMind Discover – cross-domain hypothesis generation from papers

I built a system that reads research papers across unrelated domains and tries to surface hypotheses that neither field would have generated on its own. The Discover page is where it publishes findings: https://aegismind.app/discoveries

It's very early (only three discoveries so far), but the core idea is what I want feedback on. The problem it's trying to solve: science is siloed. A breakthrough in mycology might have direct implications for network routing. A discover

Show HN: Built an AI tool that routes tasks to agents and humans. Am I crazy?

Hey HN. I've spent 10 years doing IT work, mostly infrastructure and scripting, the stuff nobody writes blog posts about. No CS degree. This is my first startup and I have no idea if I'm doing it right. Here's the problem that bugged me: every AI agent setup I looked at just blasts everything through GPT-4 or whatever the biggest model is. That's insane for 80% of tasks. You don't need a $0.03/1k token model to parse a CSV. So I built two things. Rhelm is a web app
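The core of "don't send everything to the biggest model" can be sketched as cheapest-capable routing: pick the lowest-priced option whose capability meets the task, and escalate to a human when nothing qualifies. Rhelm's actual logic isn't described in the excerpt; the tier scale, model names, and prices below are hypothetical (only the $0.03/1K figure comes from the post), and the required tier is assumed to come from some upstream classifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    name: str
    tier: int          # hypothetical capability scale: higher = more capable
    usd_per_1k: float  # hypothetical blended price per 1K tokens


# Hypothetical catalogue of model routes.
ROUTES = [
    Route("small-model", tier=1, usd_per_1k=0.0005),
    Route("mid-model", tier=2, usd_per_1k=0.003),
    Route("large-model", tier=3, usd_per_1k=0.03),
]


def route(required_tier: int, tokens: int) -> tuple:
    """Return (destination, estimated cost) for the cheapest capable route."""
    eligible = [r for r in ROUTES if r.tier >= required_tier]
    if not eligible:
        return "human", 0.0  # nothing automated is capable enough: escalate
    best = min(eligible, key=lambda r: r.usd_per_1k)
    return best.name, tokens / 1000 * best.usd_per_1k


csv_parse = route(1, 2_000)    # simple task
hard_task = route(3, 2_000)    # needs the most capable model
too_hard = route(4, 100)       # beyond any model tier
print(csv_parse, hard_task, too_hard)
```

With these assumed prices, a 2,000-token CSV parse costs $0.001 on the small model versus $0.06 if it were sent to the $0.03/1K model, a 60x difference for the same task.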

Show HN: Real-Time AI Design Benchmark

Hey HN, we built a different kind of AI benchmark for UI generation. Instead of static leaderboards or curated screenshots, you can watch multiple models generate the same design live, side-by-side, and decide which output is actually better. Under the hood, we call AI models from Anthropic (Opus), OpenAI (GPT), Google (Gemini), and Moonshot AI (Kimi). Each model generates a real, editable project using Tailwind CSS (not screenshots or canvas exports). You can export it for Next.js, Laravel (Blade),

Show HN: Microgpt-ts – Full GPT in 500 lines of TypeScript, zero dependencies

I ported Karpathy's microgpt [1] to TypeScript. It implements a complete GPT-2 architecture (autograd, tokenizer, multi-head attention, RMSNorm, and Adam optimizer), all in ~500 lines with zero runtime dependencies. It runs natively in the browser; no Python runtime or backend is needed. You can try it in the playground: https://microgpt-ts.vercel.app/playground

There are preset datasets (baby names, Pokemon, company names, movie titles, etc.), or you can paste your own text. The p
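RMSNorm is one of the listed components and is small enough to show whole. Below is a plain-Python sketch of the standard formula y_i = g_i · x_i / sqrt(mean(x²) + ε); it is not code from microgpt-ts, and ε = 1e-5 and the optional `gain` argument are assumptions.

```python
import math
from typing import Optional, Sequence


def rmsnorm(x: Sequence[float], gain: Optional[Sequence[float]] = None,
            eps: float = 1e-5) -> list:
    """Rescale x so its root-mean-square is ~1, then apply a per-feature gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)  # eps avoids division by zero
    g = gain if gain is not None else [1.0] * len(x)
    return [v * scale * gi for v, gi in zip(x, g)]


y = rmsnorm([3.0, 4.0])
print([round(v, 4) for v in y])
```

Unlike LayerNorm (used in the original GPT-2), RMSNorm does not subtract the mean; it only rescales by the root-mean-square, which saves a reduction and has no bias term.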