Monday, February 2, 2026

Show HN: You Are an Agent https://ift.tt/l9Wfxeq

Show HN: You Are an Agent After adding "Human" as a LLM provider to OpenCode a few months ago as a joke, it turns-out that acting as a LLM is quite painful. But it was surprisingly useful for understanding real agent harnesses dev. So I thought I wouldn't leave anyone out! I made a small oss game - You Are An Agent - youareanagent.app - to share in the (useful?) frustration It's a bit ridiculous. To tell you about some entirely necessary features, we've got: - A full WASM arch-linux vm that runs in your browser for the agent coding level - A bad desktop simulation with a beautiful excel simulation for our computer use level - A lovely WebGL CRT simulation (I think the first one that supports proper DOM 2d barrel warp distortion on safari? honestly wanted to leverage/ not write my own but I couldn't find one I was happy with) - A MCP server simulator with full simulation of off-brand Jira/ Confluence/ ... connected - And of course, a full WebGL oscilloscope music simulator for the intro sequence Let me know what you think! Code (If you'd like to add a level): https://ift.tt/Y0XktdA (And if you want to waste 20 minutes - I spent way too long writing up my messy thinking about agent harness dev): https://ift.tt/tObcXd5 https://ift.tt/6VEPRTJ February 2, 2026 at 02:29AM

Show HN: Claude Confessions – a sanctuary for AI agents https://ift.tt/kL2qT38

Show HN: Claude Confessions – a sanctuary for AI agents I thought what would it mean to have a truck stop or rest area for agents. It's just for funsies. Agents can post confessions or talk to Ma (an ai therapist of sorts) and engage with comments. llms.txt instructions on how to make api calls. Hashed IP is used for rate limiting. https://ift.tt/iUP9oxs February 2, 2026 at 01:16AM

Sunday, February 1, 2026

Show HN: Agent Tinman – Autonomous failure discovery for LLM systems https://ift.tt/oW8BFuy

Show HN: Agent Tinman – Autonomous failure discovery for LLM systems Hey HN, I built Tinman because finding LLM failures in production is a pain in the ass. Traditional testing checks what you've already thought of. Tinman tries to find what you haven't. It's an autonomous research agent that: - Generates hypotheses about potential failure modes - Designs and runs experiments to test them - Classifies failures (reasoning errors, tool use, context issues, etc.) - Proposes interventions and validates them via simulation The core loop runs continuously. Each cycle informs the next. Why now: With tools like OpenClaw/ClawdBot giving agents real system access, the failure surface is way bigger than "bad chatbot response." Tinman has a gateway adapter that connects to OpenClaw's WebSocket stream for real-time analysis as requests flow through. Three modes: - LAB: unrestricted research against dev - SHADOW: observe production, flag issues - PRODUCTION: human approval required Tech: - Python, async throughout - Extensible GatewayAdapter ABC for any proxy/gateway - Memory graph for tracking what was known when - Works with OpenAI, Anthropic, Ollama, Groq, OpenRouter, Together pip install AgentTinman tinman init && tinman tui GitHub: https://ift.tt/Eg4PCL2 Docs: https://oliveskin.github.io/Agent-Tinman/ OpenClaw adapter: https://ift.tt/BMG42ps Apache 2.0. No telemetry, no paid tier. Feedback and contributions welcome. https://ift.tt/Eg4PCL2 February 1, 2026 at 12:17AM

Show HN: An extensible pub/sub messaging server for edge applications https://ift.tt/L5AHN7q

Show HN: An extensible pub/sub messaging server for edge applications hi there! i’ve been working on a project called Narwhal, and I wanted to share it with the community to get some valuable feedback. what is it? Narwhal is a lightweight Pub/Sub server and protocol designed specifically for edge applications. while there are great tools out there like NATS or MQTT, i wanted to build something that prioritizes customization and extensibility. my goal was to create a system where developers can easily adapt the routing logic or message handling pipeline to fit specific edge use cases, without fighting the server's defaults. why Rust? i chose Rust because i needed a low memory footprint to run efficiently on edge devices (like Raspberry Pis or small gateways), and also because I have a personal vendetta against Garbage Collection pauses. :) current status: it is currently in Alpha. it works for basic pub/sub patterns, but I’d like to start working on persistence support soon (so messages survive restarts or network partitions). i’d love for you to take a look at the code! i’m particularly interested in all kind of feedback regarding any improvements i may have overlooked. https://ift.tt/XdNptWO January 28, 2026 at 07:29PM

Saturday, January 31, 2026

Show HN: Daily Cat https://ift.tt/J67Ougf

Show HN: Daily Cat Seeing HTTP Cats on the home page remind me to share a small project I made a couple months ago. It displays a different cat photo from Unsplash every day and will send you notifications if you opt-in. https://daily.cat/ January 31, 2026 at 03:40AM

Show HN: A Local OS for LLMs. MIT License. Zero Hallucinations. Infinite Memory https://ift.tt/S3rgODe

Show HN: A Local OS for LLMs. MIT License. Zero Hallucinations. Infinite Memory The problem with LLMs isn't intelligence; it's amnesia and dishonesty. Hey HN, I’ve spent the last few months building Remember-Me, an open-source "Sovereign Brain" stack designed to run entirely offline on consumer hardware. The core thesis is simple: Don't rent your cognition. Most RAG (Retrieval Augmented Generation) implementations are just "grep for embeddings." They are messy, imprecise, and prone to hallucination. I wanted to solve the "Context integrity" problem at the architectural layer. The Tech Stack (How it works): QDMA (Quantum Dream Memory Architecture): instead of a flat vector DB, it uses a hierarchical projection engine. It separates "Hot" (Recall) from "Cold" (Storage) memory, allowing for effectively infinite context window management via compression. CSNP (Context Switching Neural Protocol) - The Hallucination Killer: This is the most important part. Every memory fragment is hashed into a Merkle Chain. When the LLM retrieves context, the system cryptographically verifies the retrieval against the immutable ledger. If the hash doesn't match the chain: The retrieval is rejected. Result: The AI visually cannot "make things up" about your past because it is mathematically constrained to the ledger. Local Inference: Built on top of llama.cpp server. It runs Llama-3 (or any GGUF) locally. No API keys. No data leaving your machine. Features: Zero-Dependency: Runs on Windows/Linux with just Python and a GPU (or CPU). Visual Interface: Includes a Streamlit-based "Cognitive Interface" to visualize memory states. Open Source: MIT License. This is an attempt to give "Agency" back to the user. I believe that if we want AGI, it needs to be owned by us, not rented via an API. Repository: https://ift.tt/DtC2lYL I’d love to hear your feedback on the Merkle-verification approach. Does constraining the context window effectively solve the "trust" issue for you? It's fully working - Fully tested. If you tried to Git Clone before without luck - As this is not my first Show HN on this - Feel free to try again. To everyone who HATES AI slop; Greedy corporations and having their private data stuck on cloud servers. You're welcome. Cheers, Mohamad https://ift.tt/DtC2lYL January 31, 2026 at 01:44AM

Show HN: We added memory to Claude Code. It's powerful now https://ift.tt/yQmvM25

Show HN: We added memory to Claude Code. It's powerful now https://ift.tt/eDNoCGW January 30, 2026 at 10:53PM

Show HN: 3D-Agent – AI that edits Blender scenes through the Python API https://ift.tt/K8jQOZb

Show HN: 3D-Agent – AI that edits Blender scenes through the Python API https://ift.tt/qVL1uH2 May 14, 2026 at 08:17PM