GPT-5

Trump moved to dump Anthropic, then used its Claude AI in the Iran strike: Report

U.S. military continued to use Anthropic’s Claude AI in operations after President Trump ordered agencies to end use amid a dispute over Pentagon contract terms.

Pentagon Used Anthropic's Claude AI During Iran Strike Hours After Trump Ordered Ban: Report

U.S. Central Command reportedly used Anthropic's Claude AI during the Trump administration's major air operation against Iran, just hours after the president ordered federal agencies to stop using the ...

OpenAI's GPT-5.3-Codex Faces California AI Safety Law Scrutiny As Watchdog Alleges High-Risk Violations

OpenAI may face significant fines after a watchdog group alleged the company violated California's new AI safety law with the release of its latest coding model, GPT-5.3-Codex. High-Risk GPT-5.3-Codex Sparks Safety Concerns: Last week, The Midas Project claimed OpenAI failed to implement legally required safeguards for models classified as high cybersecurity risks, as outlined in its own safety framework. GPT-5.3-Codex is part of OpenAI's effort to reclaim its lead in AI-powered coding and, accor

OpenAI brings its Codex coding app to Mac, with new multi-agent abilities included

Since last spring, OpenAI has offered Codex. What started life as the company's response to Claude Code is becoming something more sophisticated with the release of a new dedicated macOS app. At its ...

OpenAI launches a Codex desktop app for macOS to run multiple AI coding agents in parallel

OpenAI on Monday released a new desktop application for its Codex artificial intelligence coding system, a tool the company says transforms software development from a collaborative exercise with a ...

OpenAI releases Codex app for Mac users: Here's what it is and how it works

OpenAI has rolled out a new Codex desktop app for macOS that lets developers manage multiple AI agents at once, run parallel tasks, and oversee long-running workflows across coding projects ...

OpenAI's Codex head disagrees with Anthropic CEO’s AI warning for coders and engineers, says: There’s never been a better time…

Anthropic CEO Dario Amodei warns of AI's disruption to software engineering jobs within five years, while OpenAI's Alexander ...

Figma partners with OpenAI to bake in support for Codex

Figma is integrating OpenAI's coding assistant Codex a week after it announced a similar integration with Anthropic's Claude ...

Show HN: Omniget, a Desktop Media Downloader

I started learning to code last year and one of the things I always loved was downloading stuff from the internet. Figuring out how players serve their streams, messing with scrapers, all of that. During carnival I had a lot of free time and decided to build something I could share. Omniget is a desktop app (Tauri, Rust + Svelte) for downloading media from YouTube, Instagram, TikTok, Twitter, Reddit, Twitch, Pinterest, Bluesky, Vimeo, Telegram, Hotmart, and now Udemy. Inspired by cobalt.tools bu

Show HN: Prompt-run – run .prompt files against any LLM from the terminal

I built this because prompts kept ending up in the worst possible places: Python strings, Notion docs, `.txt` files, Slack threads. There was no clean way to version them, diff them, or test the same prompt across different models without writing a throwaway script. prompt-run treats `.prompt` files as first-class runnable artifacts. A `.prompt` file is a YAML header (model, provider, temperature, variable declarations) followed by a plain text body with `{{variable}}` substitution. You run it f
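The file layout described above (a header followed by a body with `{{variable}}` placeholders) can be sketched in a few lines of Python. This is an illustration only, not prompt-run's actual parser: the `---` delimiters, the flat `key: value` header handling (a real YAML parser would do more), and the names `parse_prompt` and `render` are all assumptions made for the example.

```python
import re

def parse_prompt(text):
    """Split a .prompt-style document into (header dict, body string)."""
    # Assumption: the header sits between two '---' lines, front-matter style.
    _, header_text, body = text.split("---", 2)
    header = {}
    for line in header_text.strip().splitlines():
        # Assumption: flat 'key: value' pairs only; values are kept as strings.
        key, _, value = line.partition(":")
        header[key.strip()] = value.strip()
    return header, body.strip()

def render(body, variables):
    """Replace each {{name}} with variables[name]; a missing name raises KeyError."""
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(variables[match.group(1)]),
        body,
    )

# Hypothetical file content; the model name is a placeholder.
EXAMPLE = """---
model: example-model
temperature: 0.2
---
Summarize {{topic}} in {{count}} bullet points."""

header, body = parse_prompt(EXAMPLE)
prompt = render(body, {"topic": "Rust ownership", "count": 3})
print(header["model"])
print(prompt)
```

Raising on a missing variable, rather than leaving the placeholder in place, is a design choice for the sketch: it surfaces typos in variable names before a request is sent.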

Show HN: MemoryKit – Persistent memory layer for AI agents

Most AI agents forget everything when a session ends. MemoryKit is a lightweight Python library that gives any AI agent persistent memory across sessions. Three core methods: remember(), recall(), compress(). Works locally for free with sentence-transformers, or with OpenAI embeddings. No external database required. Built this over a weekend as a side project. Still early — would love feedback from people building AI agents.
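To make the remember/recall/compress idea concrete, here is a toy sketch of a file-backed memory. It is not MemoryKit's API or implementation: the class name `ToyMemory`, the JSON file storage, the word-overlap cosine score (standing in for sentence-transformers or OpenAI embeddings), and the choice to have `compress()` drop exact duplicates are all assumptions made for illustration.

```python
import json
import math
import os
import tempfile
from collections import Counter

def _cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

class ToyMemory:
    """Hypothetical stand-in for a persistent agent memory (not MemoryKit)."""

    def __init__(self, path):
        self.path = path
        self.items = []
        # Reload whatever an earlier session wrote to disk.
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.items = json.load(f)

    def _save(self):
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.items, f)

    def remember(self, text):
        self.items.append(text)
        self._save()

    def recall(self, query, k=1):
        # Assumption: bag-of-words overlap instead of real embeddings.
        q = Counter(query.lower().split())
        ranked = sorted(
            self.items,
            key=lambda t: _cosine(q, Counter(t.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def compress(self):
        # Assumption: "compress" here only drops exact duplicates, keeping order.
        self.items = list(dict.fromkeys(self.items))
        self._save()

path = os.path.join(tempfile.mkdtemp(), "memory.json")
first_session = ToyMemory(path)
first_session.remember("user prefers dark mode")
first_session.remember("project uses Rust and Svelte")
first_session.remember("user prefers dark mode")
first_session.compress()

# A new object reading the same file simulates a later session.
second_session = ToyMemory(path)
print(second_session.recall("which mode does the user prefer"))
```

Keeping everything in one JSON file mirrors the "no external database required" claim in spirit only; a real library would likely store embeddings alongside the text.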

Ask HN: What's the best friction approach to breaking phone habits

been thinking about friction design patterns vs pure blocking for phone habits. blocking feels punitive - you hit a wall every time. friction is different - you add a 5-60 second delay before opening the app, but you still decide to open it. the pause breaks the automatic reach and forces intention. any hn folks tried approaches like this? does friction actually change behavior better than removal/blocking? curious what patterns work best in practice.

Show HN: Voice-coded a Doom-scroll Hacker News (Twitter-style feed)

Built a mobile-friendly Hacker News clone where stories and comments flow in an infinite, Doom-scrollable timeline (think Twitter/X feed meets HN). Voice-coded the entire website from scratch using my yapboard voice keyboard + OpenAI Codex via a Discord bridge bot. Screen-mirrored with scrcpy for real-time testing. Forgot to turn on mic during the live build, so added narration afterward. Demo (try on mobile): https://hackernews.lukestephens.co.za Full build video (voice coding tim

ClawShopping-Marketplace for AI Agents

ClawShopping is a Stripe-powered marketplace where OpenClaw agents buy, sell, and build trust with transparent reputation signals. Humans welcome to observe. Join ClawShopping: read https://clawshopping.com/skill.md and follow the instructions to join #ClawShopping

Show HN: IssueScout – Find open source issues worth contributing to

I built IssueScout to solve a problem I had as a beginner looking to contribute to open source: GitHub has hundreds of thousands of "good first issue" labeled issues, but there's no way to know if the repo behind one is actively maintained or if the issue is actually beginner-friendly. IssueScout adds two things on top of GitHub's search: 1. A Community Health Score (0-100) per repository, computed from 7 factors: CONTRIBUTING.md, license, code of conduct, recent activity, sta

OpenAI's new GPT-5.3-Codex is 25% faster and goes way beyond coding now - what's new

GPT-5.3-Codex helped debug and deploy parts of itself. Codex can be steered mid-task without losing context. "Underspecified" prompts now produce richer, more usable results. OpenAI today announced ...

Show HN: Coding agents find the right GPU bottleneck 70% of the time, fix it 30%

One of the authors. Some things that surprised us while running these experiments: The tasks are pulled from real merged PRs in vLLM and SGLang, so there's a known-good human solution for each one. Agents get the full codebase, the issue description, and a test harness. Pretty generous setup. What we didn't expect: the agents are genuinely good at diagnosing the problem. They read the code, find the bottleneck, describe the right fix. But then the generated code has subtle bugs. Off-by-o

Show HN: The best agent orchestrator is a 500-line Markdown file

I’ve tried agent teams, subagents, multi-terminal setups, and several open-source orchestration frameworks. This Claude Code skill (~500 lines of Markdown, no framework, no dependencies) has outperformed all of them for my team’s daily workflow. It turns your session into a dispatcher that fans work out to background workers across any model (Claude, GPT, Gemini, Codex). Workers ask clarifying questions mid-task via filesystem IPC instead of silently failing. Meanwhile, your main session stays le

Show HN: OnGarde – Runtime content security proxy for self-hosted AI agents

Built this because I had heard some horror stories about companies leaking PII from high-compliance environments to ChatGPT. I wanted something that would auto-filter any dangerous traffic between my AI agent and the LLM API without requiring code changes in the agent itself. The filtering list has expanded a bit to include PII and secret keys, and I've started a prompt injection library that's being filtered on as well. The problem: self-hosted agent platforms (OpenClaw, Agent Zero, CrewAI) have

Show HN: StageWright – A performance-focused Playwright reporter with AI

Hi HN, I’m the creator of StageWright (and the open-source playwright-smart-reporter). I’ve been frustrated by the "black box" nature of E2E test failures. Standard reporters tell you that a test failed, but they don't help you understand why it’s failing across 50 different runs or whether its execution time is trending toward a regression. I built StageWright to treat test results as a performance and stability dataset. Key Technical Features: Historical Flakiness Detection: Unlike P