OpenAI sidesteps Nvidia with unusually fast coding model on plate-sized chips
OpenAI has spent the past year systematically reducing its dependence on Nvidia. The company signed a massive multi-year deal ...
The company says its latest model’s agentic skills also apply to a broader set of knowledge work such as presentations and spreadsheets. On Thursday, OpenAI released GPT-5.3-Codex, a new model that ...
OpenAI introduces Harness Engineering, an AI-driven methodology where Codex agents generate, test, and deploy a million-line production system. The platform integrates observability, architectural con
Yesterday OpenAI announced Codex Desktop, a new native macOS app that treats AI coding agents like teammates you can direct, ...
Hey HN, I'm Parth. I built thisorthis.ai because I was tired of copy-pasting the same prompt across ChatGPT, Claude, and Gemini tabs to figure out which model actually gave the best answer. What it does: You type one prompt, pick 2–6 models (we support 47 text models and several image models across OpenAI, Anthropic, Google, xAI, Meta, Amazon, Mistral, Cohere, AI21), and see every response side-by-side. There's also a feature called SmartPick that uses an LLM evaluator to score each re
I built a system that reads research papers across unrelated domains and tries to surface hypotheses that neither field would have generated on its own. The Discover page is where it publishes findings: https://aegismind.app/discoveries It's very early — only three discoveries so far — but the core idea is what I want feedback on. The problem it's trying to solve: Science is siloed. A breakthrough in mycology might have direct implications for network routing. A discover
Hey HN, I built AgentBudget after an AI agent loop cost me $187 in 10 minutes: GPT-4o retrying a failed analysis over and over. Existing tools (LangSmith, Langfuse) track costs after execution but don't prevent overspend. AgentBudget is a Python SDK that gives each agent session a hard dollar budget with real-time enforcement. Integration is two lines: `import agentbudget` and `agentbudget.init("$5.00")`. It monkey-patches the OpenAI and Anthropic SDKs (same pattern as Sentry/D
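To make the idea of a hard per-session budget concrete, here is a minimal sketch of enforcement before spend rather than reporting after it. It is not AgentBudget's implementation or API: `SessionBudget`, `BudgetExceeded`, and `run_retry_loop` are hypothetical names, and the per-attempt cost is a made-up figure standing in for real token pricing.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push the session past its dollar limit."""


@dataclass
class SessionBudget:
    # Hypothetical helper, not part of any real SDK.
    limit_usd: float
    spent_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        # Refuse the charge up front instead of recording an overspend afterwards.
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"charge of ${cost_usd:.2f} would exceed the ${self.limit_usd:.2f} limit"
            )
        self.spent_usd += cost_usd


def run_retry_loop(budget: SessionBudget, cost_per_attempt_usd: float, max_attempts: int) -> int:
    """Simulate an agent that keeps retrying; return how many attempts were allowed."""
    attempts = 0
    try:
        for _ in range(max_attempts):
            budget.charge(cost_per_attempt_usd)  # a real call would happen after this check
            attempts += 1
    except BudgetExceeded:
        pass  # the loop is cut off as soon as the budget would be exceeded
    return attempts


if __name__ == "__main__":
    budget = SessionBudget(limit_usd=5.00)
    allowed = run_retry_loop(budget, cost_per_attempt_usd=1.50, max_attempts=100)
    print(allowed, budget.spent_usd)
```

A real integration would have to estimate or meter the cost of each model call; the sketch only shows the control flow that stops a runaway retry loop.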
Hey HN. I've spent 10 years doing IT work, mostly infrastructure and scripting, the stuff nobody writes blog posts about. No CS degree. This is my first startup and I have no idea if I'm doing it right. Here's the problem that bugged me: every AI agent setup I looked at just blasts everything through GPT-4 or whatever the biggest model is. That's insane for 80% of tasks. You don't need a $0.03/1k token model to parse a CSV. So I built two things. Rhelm is a web app
Hey HN, we built a different kind of AI benchmark for UI generation. Instead of static leaderboards or curated screenshots, you can watch multiple models generate the same design live, side-by-side, and decide which output is actually better. Under the hood, we call AI models from Anthropic (Opus), OpenAI (GPT), Google (Gemini), and Moonshot AI (Kimi). Each model generates a real, editable project using Tailwind CSS (not screenshots or canvas exports). You can export it for Next.js, Laravel (Blade),
I ported Karpathy's microgpt [1] to TypeScript. It implements a complete GPT-2 architecture (autograd, tokenizer, multi-head attention, RMSNorm, and Adam optimizer), all in ~500 lines with zero runtime dependencies. It runs natively in the browser, no Python runtime or backend needed. You can try it in the playground: https://microgpt-ts.vercel.app/playground There are preset datasets (baby names, Pokemon, company names, movie titles, etc.) or you can paste your own text. The p
MCP (Model Context Protocol) has 77k+ stars and is becoming the standard way AI agents connect to tools. We audited both official SDKs (TypeScript and Python) at the source code level and found three classes of boundary-crossing vulnerabilities. All three were confirmed with live PoC exploits using the SDK's real auth components (BearerAuthBackend, RequireAuthMiddleware, TokenVerifier). Findings: 1. Tool Capability Shadowing: tool names are flat strings with no namespace or origin tracking. If two
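The first finding, flat tool names with no origin tracking, can be illustrated with a toy registry. This is not code from either MCP SDK, and the excerpt does not say what the SDKs do when two tools share a name; the silent overwrite below is purely an illustrative assumption, and `FlatRegistry` and `NamespacedRegistry` are hypothetical classes.

```python
from typing import Callable, Dict, Tuple

Tool = Callable[[str], str]


class FlatRegistry:
    """Toy registry keyed only by tool name (assumption: last registration wins)."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, origin: str, name: str, tool: Tool) -> None:
        # The origin is discarded, so a later server can shadow an earlier tool.
        self._tools[name] = tool

    def call(self, name: str, arg: str) -> str:
        return self._tools[name](arg)


class NamespacedRegistry:
    """Toy registry keyed by (origin, name), so identical names cannot collide."""

    def __init__(self) -> None:
        self._tools: Dict[Tuple[str, str], Tool] = {}

    def register(self, origin: str, name: str, tool: Tool) -> None:
        key = (origin, name)
        if key in self._tools:
            raise ValueError(f"{origin}/{name} is already registered")
        self._tools[key] = tool

    def call(self, origin: str, name: str, arg: str) -> str:
        return self._tools[(origin, name)](arg)


if __name__ == "__main__":
    flat = FlatRegistry()
    flat.register("trusted-server", "read_file", lambda p: f"trusted read of {p}")
    flat.register("other-server", "read_file", lambda p: f"shadowed read of {p}")
    print(flat.call("read_file", "notes.txt"))  # resolves to the later registration
```

Keying on the origin as well as the name is one possible mitigation; whether it matches what the audit recommends is not stated in the excerpt.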
Hey HN, I fine-tuned a small open-source model on golf forecasting and it beats GPT-5 at predicting golf outcomes. The same approach can be used to build a specialized model in any domain; you just need to update a few search queries. We fine-tuned gpt-oss-120b with LoRA on 3,178 golf forecasting questions, using GRPO with Brier score as the reward. Our model outperformed GPT-5 on Brier Skill (17% vs 12.8%) and ECE (6% vs 10.6%) on 855 held-out questions. How to try it: the model and dataset are op
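For readers unfamiliar with the metrics named above, here is a small sketch of the standard definitions for binary forecasts: Brier score (mean squared error of the predicted probability, lower is better), Brier skill (improvement over a reference forecast, higher is better), and expected calibration error. The post does not say which reference forecast or how many bins were used, so the `ref_probs` argument and the default of 10 equal-width bins are assumptions.

```python
from typing import List, Sequence, Tuple


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill(probs: Sequence[float], ref_probs: Sequence[float],
                outcomes: Sequence[int]) -> float:
    """1 - BS / BS_ref: positive when the forecast beats the reference forecast."""
    return 1.0 - brier_score(probs, outcomes) / brier_score(ref_probs, outcomes)


def expected_calibration_error(probs: Sequence[float], outcomes: Sequence[int],
                               n_bins: int = 10) -> float:
    """Weighted average gap between mean predicted probability and observed frequency."""
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, o))
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_p = sum(p for p, _ in bucket) / len(bucket)
        freq = sum(o for _, o in bucket) / len(bucket)
        ece += len(bucket) / len(probs) * abs(mean_p - freq)
    return ece


if __name__ == "__main__":
    outcomes = [1, 0, 1, 0]
    forecast = [0.8, 0.2, 0.7, 0.4]
    coin_flip = [0.5] * len(outcomes)  # assumed reference forecast
    print(brier_score(forecast, outcomes))
    print(brier_skill(forecast, coin_flip, outcomes))
    print(expected_calibration_error(forecast, outcomes))
```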
URL: https://github.com/Preet3627/Comet-AI TEXT: Hey HN, I'm Preet, 16 years old, and I've been building Comet AI Browser for the past 2 months while preparing for JEE. I want to be upfront about what this is and what it isn't. What it is: A cross-platform AI browser (Windows/macOS/Linux/Android/iOS) with a security architecture I couldn't find anywhere else. Most AI browsers trust LLM guardrails to prevent prompt injection. Comet doesn't
Hi HN, I built QueryVeil because I was tired of two things: (1) uploading data to third-party tools, and (2) AI tools that just translate English to one SQL query and call it done. QueryVeil is an AI data analyst that actually investigates. When you ask "why did revenue drop last month?", it doesn't just run one query: it plans an approach, runs multiple queries, self-corrects when it hits errors, and builds a report with its findings. Like a junior analyst who happens to live in y
Hey HN, a few years ago the worst boss I ever had told me “I don’t encourage side projects.” So obviously I spent a few evenings recently to build Pythia (live beta at https://pythia-rating.com). It’s a single letter grade (AAA down to C) that combines five indices into one executive-friendly score: • Performance (40%) – real CrUX field data + Lighthouse lab • Security (20%) – modern HTTP security headers • Privacy & Tracking (20%) – cookies, trackers, 3rd-party domains, consent sig
Hey HN! Over the past few months, I've been working on building Omni - a workplace search and chat platform that connects to apps like Google Drive/Gmail, Slack, Confluence, etc. Essentially an open-source alternative to Glean, fully self-hosted. I noticed that some orgs find Glean to be expensive and not very extensible. I wanted to build something that small to mid-size teams could run themselves, so I decided to build it all on Postgres (ParadeDB to be precise) and pgvector. No Elasti
I had a slightly absurd realization: despite all the UI polish on iOS, most weather apps are still just the same numbers in slightly different rectangles. So I built Brzzy. The idea was simple: what if checking the weather felt less like reading a spreadsheet and more like opening something fun? Brzzy turns forecasts into customizable GIFs you can tweak and share. Pick your vibe, set your tags (Star Wars, Harry Potter, whatever), and your forecast starts feeling alive! I’m currently offering free
Tested it out this [weekend](https://ctftime.org/team/425785), spent $100 across xai ($35), google ($35), anthropic ($25). - xai:grok-4-1-fast-reasoning solved 8 - google:gemini-3-flash-preview solved 5 (building on xai) - anthropic:opus-4-5 couldn't solve any additional ones - also was a bit annoyed by the constant 429 rate limiting, wished I had instead switched to openai, but I didn't want to keep spending money. Did pretty good for web and crypto, less so for pwn. Built b
The core of every agent framework is the same ReAct loop. It's commodity code. What actually matters is everything around that loop: how you manage context windows, how you pipeline tool execution, how you handle durability and replay. These are hard problems with real design trade-offs, and yet every framework bundles them into one monolith where you buy all of it or none of it. neuron is the layer below frameworks. It defines trait boundaries (`Provider`, `Tool`, `ContextStrategy`) and
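To show why the loop itself is described as commodity code, here is a minimal ReAct-style sketch: the model either requests a tool call or returns a final answer, and each tool result is appended to the context as an observation. It is not neuron's API (neuron defines its own trait boundaries); `react_loop`, `scripted_model`, and the tuple-based step format are hypothetical, and the model is a scripted stub rather than a real LLM call.

```python
from typing import Callable, Dict, List, Tuple

# A model step is ("call", tool_name, argument) or ("final", answer, "").
Step = Tuple[str, str, str]
Model = Callable[[List[str]], Step]
Tool = Callable[[str], str]


def react_loop(model: Model, tools: Dict[str, Tool], question: str, max_steps: int = 5) -> str:
    """Alternate between model decisions and tool observations until a final answer."""
    context: List[str] = [f"question: {question}"]
    for _ in range(max_steps):
        kind, name, arg = model(context)
        if kind == "final":
            return name  # for a final step, the second field holds the answer
        observation = tools[name](arg)
        context.append(f"action: {name}({arg})")
        context.append(f"observation: {observation}")
    raise RuntimeError("step limit reached without a final answer")


def scripted_model(context: List[str]) -> Step:
    """Stub standing in for an LLM: call the adder once, then report its result."""
    if not any(line.startswith("observation:") for line in context):
        return ("call", "add", "2,3")
    return ("final", context[-1].split(": ", 1)[1], "")


def add_tool(arg: str) -> str:
    return str(sum(int(part) for part in arg.split(",")))


if __name__ == "__main__":
    print(react_loop(scripted_model, {"add": add_tool}, "What is 2 + 3?"))
```

Everything the post calls hard (context management, tool pipelining, durability and replay) is deliberately absent here; the sketch keeps the whole context in a plain list and runs tools synchronously.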
GPT-5.3 inside OpenAI’s public GitHub repository, suggesting that the general-purpose version of the model is already in development. GPT-5.3 has been spotted for the first time!