GPT-5

Who Owns Claude AI (And Is It Amazon?)

Amazon's huge investment in Anthropic sparked speculation over Claude AI's ownership. We break down the funding, partnerships ...

Show HN: I built a local fuzzing tool to red-team LLM agents (Python, SQLite)

I spent the last week building a local-first security tool because I was tired of paying $500/mo for enterprise SaaS just to test my AI agents for basic vulnerabilities.The tool is called Agent Exam Pro. It's a Python-based fuzzer that runs locally on your machine (no cloud data leaks).How it works:The Engine: Takes a base test case and runs it through 16 mutation strategies (Base64, Roleplay, Token Smuggling) to generate 1,000+ variations.The Payloads: I curated 280+ real-world exploi

Show HN: LLM-models – a CLI tool to list available LLM models across providers

I built a simple CLI tool to solve a problem I kept running into: which exact model names are actually available through OpenAI, Anthropic, Google, and xAI APIs at any given time?The APIs themselves provide this info, but I got tired of checking docs or writing one-off scripts. Now I can just run:$ llm-models -p Anthropicand get the current list with human-readable names.Installation: macOS: brew tap ljbuturovic/tap && brew install llm-models Linux: pipx install llm-models Wind

Show HN: Kodaii generated a 20K-line FastAPI back end from one prompt

We’ve been working on the Kodaii engine, aimed at generating complete backends that stay coherent across models, routes, workflows, and tests — not just isolated snippets.To get a sense of how well the engine handles a real project, we asked it to build a Calendly-style booking system from a single prompt. It ran the whole process — planning, code generation, tests, infra, and deployment — in about 8 hours.What it generated: - ~20K lines of Python (FastAPI, async)- Postgres schema (6 tables)- Se

Tell HN: OpenAI Security Incident with PII

Today I got the following email from OpenAI:Subject: Third-party security incidentFrom: OpenAI <noreply@email.openai.com>Transparency is important to us, so we want to inform you about a recent security incident at Mixpanel, a data analytics provider that OpenAI used for web analytics on the frontend interface for our API product (platform.openai.com). The incident occurred within Mixpanel’s systems and involved limited analytics data related to your API account.This was not a breach of Op

Show HN: Superglue – OSS integration tool that understands your legacy systems

If you've ever worked in a large company, you've probably encountered "shadow infrastructure": scripts nobody understands or custom connectors written once and never touched again. This glue layer isn't documented, isn't owned by anyone, and tends to break when systems are upgraded or someone leaves. It's also the part everybody dreads working on, because it's hard to understand, painful to work with, and full of unknown unknowns.We built superglue so that

OpenAI's new GPT‑5.1-Codex-Max — all about the agentic coding model that can work for long hours

Max, a new coding model designed for detailed and long-running software development tasks. Here is an overview of the model ...

What to be thankful for in AI in 2025

Liquid AI spent 2025 pushing its Liquid Foundation Models (LFM2) and LFM2-VL vision-language variants, designed from day one for low-latency, device-aware deployments — edge boxes, robots, and ...

OpenAI debuts GPT‑5.1-Codex-Max coding model and it already completed a 24-hour task internally

Max, a new frontier agentic coding model now available in its Codex developer environment. The release marks a significant step forward in AI-assisted software engineering, offering improved ...

How OpenAI Ships New Products With Lightning Speed

OpenAI has shipped new products at a relentless clip in the second half of 2025. Not only has the company released several ...

ChatGPT 5.1 Codex Max : AI Coder Handles Massive PRs, Reviews & Debugging at Scale

OpenAI’s GPT 5.1 Codex Max runs 24-hour workflows, handles multifile refactors, reaches 80% accuracy, and uses 30% fewer tokens to reduce costs

I built an open-weights memory system that reaches 80.1% on the LoCoMo benchmark

I’ve been experimenting with long-term memory architectures for agent systems and wanted to share some technical results that might be useful to others working on retrieval pipelines.Benchmark: LoCoMo (10 runs × 10 conversation sets) Average accuracy: 80.1% Setup: full isolation across all 10 conv groups (no cross-contamination, no shared memory between runs)Architecture (all open weights except answer generation)1. Dense retrievalBGE-large-en-v1.5 (1024d)FAISS IndexFlatIPStandard BGE instructio

Ask HN: Codex vs. 5.1 for pdf table-to-JSON extraction?

Has anyone successfully compared GPT-5.1 with Codex to extract clean JSON tables from image-based PDFs?

Show HN: Fixing LLM memory degradation in long coding sessions

Long-session LLM memory degradation (entropy) is the silent killer of complex coding projects. Models like Gemini, GPT-4, and Claude all suffer from it, leading to hallucinations and lost context.I've developed an open-source protocol that temporarily "fixes" this issue by structuring the dialogue. It's not the final architectural solution, but it’s a proven patch for developers working right now.Looking for feedback from the community on how we can solve this structurally. h

Show HN: Turn your site into a demo video with one URL

Tired of recording the same product demo 10 times and still not being happy with it?That’s basically why I built AutoAds (autoads.pro).The idea: you paste your site’s URL, and an AI pipeline does all the annoying stuff for you, fully automatic.What it does- You paste the URL of your site (SaaS, e-commerce, portfolio) - An AI pipeline visits the site and analyzes the content (sections, copy, visuals, CTAs, etc.) - It generates a screen-recording-style video of your site: - scrolling and navigat

Show HN: ZigFormer – An LLM implemented in pure Zig

Hi everyone,I've made an early version of ZigFormer, a small LLM implemented in Zig with no dependencies on external ML frameworks like PyTorch or JAX. ZigFormer is modelled after a textbook LLM (like GPT-2 from OpenAI) and can be used as a Zig library as well as a standalone application to train a model and chat with it.This was mainly an educational project. I'm sharing it here in case others find it interesting or useful.Link to the project: https://github.com/Cogitat

Show HN: Local-first RAG for PDF user manuals, datasheets

I work on embedded firmware for my day job, and I've found LLMs to be useful for answering questions about technical errata. But, they tend to be bad at answering highly specific questions without using some kind of search tool (if they decide to use one at all), and some user manuals are far too large to fit into a context window.I built askdocs-mcp as a way to give agents a more direct route to searching through a project's source-of-truth documents. My design constraints were that i

Tested OpenAI's prompt caching across models. Found undocumented behavior

Been building an AI agent from scratch to understand token economics. Spent a week on prompt caching. Found something interesting that isn't in OpenAI's docs. Setup: Network device monitoring chatbot, 10 tools, ~1,400 token prefix. Tested gpt-4o-mini, gpt-5-mini, gpt-5. Logged cached_tokens from every response.Finding 1: Caching works as documented Once prefix exceeds 1024 tokens, OpenAI caches it automatically. I saw 80-90% cache hit rates after the first call. Cost reduction of 47-49

My SaaS jumped from $6,523 to $12,648 monthly, here is how

This is my story of Postiz doubling its revenue in one month. Feel free to ask me anything.I saw an opening Postiz is traditionally an open-source social media scheduling tool. In August, I started seeing something interesting: new YouTube videos about how people are automating their social media with n8n.First, I thought it was cool, then I saw a pattern. Many people opened "Skool" groups on how to use n8n. Thousands of templates, but among them, how to create viral AI posts.I liked i

Show HN: Codex Swarm – Local ChatGPT swarm for coding with Git-tracked agents

I built a small framework that turns the OpenAI Codex plugin into a swarm of specialized “workers” that all live inside a single repo. Instead of just chatting with a model in your IDE, you get an orchestrator, planner, coder and reviewer that share a JSON task board, touch only the files in your project and are forced to finish every step with a clean git commit.The whole thing is defined in prompts and JSON (AGENTS.md + .AGENTS/*.json), with git and tasks.json as the shared memory layer.