Wednesday, May 13, 2026

Show HN: Statewright – Visual state machines that make AI agents reliable https://ift.tt/dxfnmvp

Agentic problem solving in its current state is very brittle. I fell in love with it, but it creates as many problems as it solves.
I'm Ben Cochran. I spent 20+ years in the trenches of full-stack engineering, DevOps, high-performance computing and ML, with stints at NVIDIA, AMD and various other organizations, most recently as a Distinguished Engineer.
For agents to work reliably, you need either massive parameter counts or massive context windows to keep the solution space workable. Most people are brute-forcing reliability with bigger models and longer prompts. What if I made the problem smaller instead of making the model bigger?
I took a different approach: I set smaller models, in the 13-20B parameter range, to solving real SWE-bench problems, and constrained the tool and solution spaces using formal state machines. Each state in the machine defines which tools the model can access, how many iterations it gets and which transitions are valid. A planning state gets read-only tools. An implementation state gets edit tools (scoped to prevent mega edits) and write-friendly bash tools. The testing state gets bash, but only for testing commands. The model cannot physically skip steps or use the wrong tool at the wrong time. This is enforced via protocol, not via prompts.
The results were more promising than I would have expected. The improvements held across multiple model families irrespective of age (qwen-coder, gpt-oss, gemma4) and were consistent above the 13B parameter inflection point. Below that, models can navigate the state machine but can't retain enough context to produce accurate edits. More on the research bit: https://ift.tt/4kc1Y2H
Surprisingly, this yielded improvements in frontier models as well. Haiku and Sonnet start to punch above their weight, and Opus solves more reliably with fewer tokens and fewer death spirals.
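As a rough illustration of the per-state constraints described above, here is a minimal Python sketch. It is not Statewright's engine or its definition format; the state names, tool names, iteration budgets and the `Workflow` class are all hypothetical, chosen only to show how an allowlist, a budget and a transition table can be enforced in plain code rather than in a prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    """One workflow phase: tool allowlist, iteration budget, valid exits."""
    name: str
    allowed_tools: frozenset
    max_iterations: int
    transitions: frozenset


class WorkflowError(Exception):
    """Raised when a request is not permitted in the current state."""


class Workflow:
    def __init__(self, states, start):
        self.states = {s.name: s for s in states}
        self.current = self.states[start]
        self.iterations = 0

    def visible_tools(self):
        # Only the current state's tools are ever shown to the model.
        return sorted(self.current.allowed_tools)

    def call_tool(self, tool):
        if tool not in self.current.allowed_tools:
            raise WorkflowError(
                f"tool {tool!r} is out of scope in state {self.current.name!r}")
        if self.iterations >= self.current.max_iterations:
            raise WorkflowError(
                f"iteration budget exhausted in state {self.current.name!r}")
        self.iterations += 1

    def transition(self, target):
        if target not in self.current.transitions:
            raise WorkflowError(
                f"cannot go from {self.current.name!r} to {target!r}")
        self.current = self.states[target]
        self.iterations = 0  # each state gets a fresh budget


def bugfix_workflow():
    # Hypothetical bugfix flow: plan -> implement -> test, with a retry loop.
    states = [
        State("plan", frozenset({"read_file", "grep"}), 5,
              frozenset({"implement"})),
        State("implement", frozenset({"read_file", "edit_file"}), 10,
              frozenset({"test"})),
        State("test", frozenset({"run_tests"}), 3,
              frozenset({"implement", "done"})),
        State("done", frozenset(), 0, frozenset()),
    ]
    return Workflow(states, start="plan")
```

In this sketch the `test` state can transition back to `implement`, so the graph has a retry loop rather than being a DAG. A rejected call raises an error whose message could be passed back to the model as feedback.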
Fine-tuning did not yield these kinds of functional improvements for me. The takeaway, it seems, is that context window utilization matters more than raw context size: a tightly scoped working context at each step outperforms a model given carte blanche over everything. Constraining LLMs, which are non-idempotent, with deterministic code is a pattern that nobody is currently talking about.
So, I built Statewright. Its core is a Rust engine that evaluates state machine definitions: states, transitions, guards and tool restrictions. The orchestration doesn't use an LLM; it just enforces the state machine. On top of that is a plugin layer that integrates with Claude Code (and soon Codex, Cursor and others) via MCP. When you activate a workflow, hooks enforce the guardrails per state automatically. The model sees 5 tools available instead of dozens, gets clear instructions for the current phase and transitions when conditions are met. Importantly, it tells the model when it's attempting something that is out of scope or incorrect, or when it needs to try something else after getting stuck.
You can use your agent via MCP to build a state machine that solves a problem in your current context. The visual editor at statewright.ai lets you tweak these workflows in a graph view, where you can clearly see the failure paths, the retry loops and the approval gates. State machines aren't DAGs; they loop and retry, which is what agentic work actually needs.
Statewright is currently live with a free tier. Try it out in Claude Code by running the following:
/plugin marketplace add statewright/statewright
/plugin install statewright
/reload-plugins
Then say "start the bugfix workflow" or run /statewright start bugfix. You'll need to paste your API key when prompted. The latest versions of Claude may complain; if so, paste the API key again and say you really mean it. Claude is just being cautious here.
Feedback is welcome on the workflow editor and the plugin experience, and tell me which workflows you'd want to build first. Agents are suggestions, states are laws.
https://ift.tt/NZf7wQm May 12, 2026 at 07:54PM

Tuesday, May 12, 2026

New Parking Payment Options: More Flexibility and Helpful Reminders

By Pamela Johnson

Learn how our new parking payment options offer more flexibility so you can get where you need to go.
Paying for metered parking in San Francisco has never been easier or more flexible thanks to two new mobile payment options we’re offering. With HotSpot and ParkMobile, you can pay for metered parking directly from your phone. By offering more choices, we are making it more convenient than ever to pay for parking. With both HotSpot and ParkMobile, you can:
- Pay for parking at any SFMTA parking meter using your phone
- Get reminders before your session expires, helping you avoid citations
- Extend...



Published 2026-05-11T00:00:00Z
https://ift.tt/qH9K0wc

Show HN: Mimik – open-source local-first alternative to Scribe and Tango https://ift.tt/WNTRrpS

https://ift.tt/Z7lqKiy May 11, 2026 at 11:18PM

Show HN: SyncBank – Self-hosted bank sync for EU banks https://ift.tt/QBrmnDl

https://syncbank.app/ May 11, 2026 at 11:32PM

Monday, May 11, 2026

Show HN: adamsreview – better multi-agent PR reviews for Claude Code https://ift.tt/0MTlWQu

I built adamsreview, a Claude Code plugin that runs deeper, multi-stage PR reviews using parallel sub-agents, validation passes, persistent JSON state, and optional ensemble review via Codex CLI and PR bot comments. On my own PRs, it has been catching dramatically more real bugs than Claude’s built-in /review, /ultrareview, CodeRabbit, Greptile, and Codex’s built-in review, while producing fewer false positives.
adamsreview is six Claude Code slash commands packaged as a plugin: review, codex-review, add, promote, walkthrough, and fix. I modeled it after the built-in /review command and extended it meaningfully. You can clear context between review stages because state is stored in JSON artifacts on disk, with built-in scripts for keeping them updated. The walkthrough command uses Claude’s AskUserQuestion feature to walk you, one by one, through uncertain findings or items needing human review. Then the fix command dispatches per-fix-group agents and re-reviews the work with Opus, reverting any regressions before committing the survivors.
It runs against your regular Claude Code subscription (Max plan recommended), unlike /ultrareview, which charges against your Extra Usage pool.
I would love feedback from Claude Code users, pro devs, and anyone with strong opinions about AI code reviews.
Repo: https://ift.tt/Up1k3NZ
Install: /plugin marketplace add adamjgmiller/adamsreview, /plugin install adamsreview@adamsreview
https://ift.tt/Up1k3NZ May 11, 2026 at 07:36AM
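To make the "state in JSON artifacts on disk" idea concrete, here is a small Python sketch of persisting review state between stages so that nothing depends on the conversation context. The file layout, the field names (`stage`, `findings`) and both helper functions are hypothetical; the post does not describe adamsreview's actual schema or scripts.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write the state atomically: temp file first, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, path)  # atomic replacement on the same filesystem


def load_state(path):
    """Load the saved state, or start a fresh one if no file exists yet."""
    if not os.path.exists(path):
        return {"stage": "review", "findings": []}  # hypothetical default shape
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def add_finding(path, finding):
    """Append one finding and persist immediately."""
    state = load_state(path)
    state["findings"].append(finding)
    save_state(path, state)
    return state
```

Because every stage reloads the file, a later stage can start from a cleared context and still see everything an earlier stage recorded.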

Show HN: I trained a chess engine to play like humans https://ift.tt/i5eF9rL

I built 1e4.ai, a chess web app where you play against neural networks trained to mimic human Lichess players at specific Elo ranges. There's a separate model for each 100-point rating bucket from ~800 to 2200+, and the bots not only choose human-like moves but also burn clock time, play worse under time pressure, and blunder in human-like ways.
Live demo: https://1e4.ai
Code: https://ift.tt/sKw15A6
A few things that might be interesting:
- Trained on almost a full year of Lichess blitz games, around 1B games in total.
- The architecture is a small (~9M parameters) transformer-based network that takes the board, recent move history, the player's rating, and remaining clock time as input. There are three separate models per rating bucket: move, clock usage, and win probability. The clock model is what makes the bots feel human under time pressure rather than instant. Because the move model takes the clock as an input, it also learns to blunder under time pressure like a human might.
- Because the network is so tiny, no GPU is needed for inference; it runs easily on a local CPU.
- The downside of the tiny network is that it's a bit weak once you turn the rating up past around 1700. It can spot short tactics but not long multi-move combinations.
- Initial training ran on a rented 8xH100 cluster, followed by fine-tunes on my local GPU for different rating ranges.
- Inspired by Maia-2 and DeepMind's "Grandmaster-Level Chess Without Search". On a held-out Lichess blitz benchmark, it beats Maia-2 blitz on top-1 move prediction (56.7% vs 52.7%) and pretty substantially on win-probability calibration (Brier 0.176 vs 0.272). Numbers and code in https://ift.tt/Xh3oAMp...
- The data pipeline is C++ via nanobind, then training with PyTorch. Getting this right was actually the thing I spent the most time on. Pre-shuffling the dataset and then reading the shuffled dataset sequentially at training time kept GPU utilization high. Without this, training spent a huge percentage of time on I/O while the GPU sat idle.
Happy to answer questions about the rating conditioning, the clock model, or the data pipeline.
May 11, 2026 at 04:01AM
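The pre-shuffling idea from the data pipeline bullet above can be sketched in a few lines of Python. This is not the author's C++/nanobind pipeline; the record layout and both function names are hypothetical, and the in-memory shuffle only works for small datasets (a corpus of around 1B games would presumably need an external, chunked shuffle). The point it illustrates is that the randomization cost is paid once offline, so the training loop only ever does cheap sequential reads.

```python
import os
import random
import struct
import tempfile

# Hypothetical fixed-size record: (position_id, move_id) as two uint32 values.
RECORD = struct.Struct("<II")


def write_preshuffled(records, path, seed=0):
    """Offline pass: shuffle once, then write records back to back."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    with open(path, "wb") as f:
        for rec in shuffled:
            f.write(RECORD.pack(*rec))


def read_batches(path, batch_size):
    """Training-time pass: sequential reads only, no random seeks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size * batch_size)
            if not chunk:
                break
            yield [RECORD.unpack_from(chunk, offset)
                   for offset in range(0, len(chunk), RECORD.size)]
```

Each batch comes back already in random order, so the loader never has to jump around the file to build a mixed batch.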

Show HN: Hustler Bingo – a tiny bingo game about startup Twitter clichés https://ift.tt/u8YUyEn

I built this after my brother started complaining that I had gotten too deep into brainrot culture. It's just for fun, nothing serious, but it let me test Vercel, TanStack Start and Convex without high stakes. Have fun! This is a game where a lower score is good for your mental health.
https://ift.tt/YXIwy7J May 11, 2026 at 02:06AM

Show HN: Mosaic – arrange iOS icons by color using an evolutionary algorithm https://ift.tt/DbGE483

It started out as a way for me to freshen up my C++ skills during COVID. But life got in the way and it was put on ice. Luckily, coding LLMs came to the rescue and allowed me to bring it to a point where I feel comfortable sharing it.
https://ift.tt/6hS7J83 May 10, 2026 at 11:59PM

Sunday, May 10, 2026

Show HN: Free OSS transcription app I made and found it's faster than wispr flow https://ift.tt/jXQh9Tk

The title doesn't leave room for nuance: of course it's not the app itself that's faster, but the way you can use it, with Groq inference for example.
https://mumbli.app/ May 10, 2026 at 03:07AM

Show HN: Create flashcards with Space CLI https://ift.tt/TYvVr93

Hey, seven years ago I created a flashcard app with a main focus on UX. In the last few months I added an offline-first mode and a CLI that allows Claude Code or Codex to create high-quality flashcards for you. I use it to learn about pharma rules, technology, dancing, taxes and smart homes. I never really did marketing; it's not my specialty. I would love to know what you think.
https://ift.tt/5MtZKkF May 9, 2026 at 08:08PM

Saturday, May 9, 2026

Show HN: The independent guide to agent orchestrators https://ift.tt/a6OnejT

Hey HN! I built AgentMGMT.dev today to keep track of all those agent orchestration tools that keep popping up. I've tried a few and landed on Superset, which I'm extremely happy (and productive!) with. I think this category of tools will be extremely important and interesting in the next couple of years, so it's worth keeping an eye on all the available tools and how they evolve. I will keep the site up to date; please help me by submitting new tools that are not yet on the list, or by adding any details that might help folks who are out shopping for their first or next agent orchestrator!
https://agentmgmt.dev/ May 9, 2026 at 02:47AM

Show HN: GETadb.com – every GET request creates a DB https://ift.tt/n95C4oG

Hey HN! We made GETadb.com to make it easier to get agents to build you full-stack apps. You don't need to give them any credentials. Just by loading a GET request, they get access to a database, a sync engine, and abstractions for auth, presence, and streams. To see what the agent sees, you can load https://getadb.com/new
There are two fun things about how it's implemented:
1. If you curl the home page, it returns the agent content rather than the human content. We do this by detecting the 'Sec-Fetch-Mode' header. It's not perfect, but it gets the job done for Claude Code et al.
2. For an agent to spin up an app, it makes _two_ fetches: (1) getadb.com/guide tells it to generate a uuid and then fetch (2) getadb.com/provision/. We did this because just about half of the popular web-based app builders cache URLs globally, even if you return no-store headers. To get around this, we just instruct the agent to generate unique URLs.
You may wonder: why GET requests rather than POST requests? Because then you can build in surprising places. For example, we get meta.ai to build an app inside the artifact preview: https://ift.tt/jYRHziO
Under the hood, this is possible because the whole infra is multi-tenant from the ground up. We already announced how that works on HN, but if you're curious, here's the essay for it: https://ift.tt/Gx57wXI
https://www.getadb.com/ May 8, 2026 at 09:47PM
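The two implementation tricks above can be sketched in Python. The post only says the service detects the 'Sec-Fetch-Mode' header, so the exact rule below (treat a request as a browser page load only when the header equals "navigate") is an assumption, as are the function names and the example.com base URL. It is a sketch of the idea, not GETadb's server code.

```python
import uuid


def wants_agent_content(headers):
    """Guess whether the caller is a CLI or agent rather than a browser page load.

    Assumption: browsers send 'Sec-Fetch-Mode: navigate' on top-level
    navigations, while curl and many agent fetchers omit the header.
    """
    normalized = {key.lower(): value for key, value in headers.items()}
    mode = normalized.get("sec-fetch-mode")
    return mode is None or mode.lower() != "navigate"


def unique_url(base):
    """Append a fresh UUID path segment so a shared URL cache cannot hit."""
    return f"{base.rstrip('/')}/{uuid.uuid4()}"
```

For example, `unique_url("https://example.com/provision")` produces a different URL on every call, which sidesteps caches that ignore no-store headers because no two agents ever request the same address.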

Show HN: We built a tool that generates 3D objects with editable, separate parts https://ift.tt/WCeIl2y

https://nova3d.xyz/ May 8, 2026 at 10:41PM

Friday, May 8, 2026

Show HN: Kstack – Skill pack for monitoring/troubleshooting K8s in Claude Code https://ift.tt/6Kuziwp

Hi All,
Recently I've been using Claude Code a lot for debugging cluster issues, and I realized I was performing similar tasks repeatedly, so I decided to package them up into skills that I could call up more easily (e.g. `/investigate`, `/audit-security`, `/audit-outdated`). I'm calling the skill pack "kstack", and the goal is to be able to monitor and troubleshoot K8s from within Claude Code. If you have time, I'd love to get some feedback on the project!
Andres
Source: https://ift.tt/FYeacq5
Docs: https://kstack.sh/
https://ift.tt/FYeacq5 May 7, 2026 at 10:54AM

Show HN: Bilig – a headless spreadsheet engine for Node services and agents https://ift.tt/s2NlRrA

https://ift.tt/8G63mki May 7, 2026 at 11:46PM

A Community-Powered Success: Bayview Shuttle Extended Through 2027

By Javaun Garcia

Riders enjoy a community tour of San Francisco’s African American Arts & Cultural District.
The California Air Resources Board (CARB) has extended the Bayview Shuttle’s grant. The service is fully funded by this grant, and the extension runs through November 2027. This allows us to continue connecting people in Bayview-Hunters Point to Muni, BART and other important resources. It helps them get around more easily, and it strengthens the public transportation network in a community that was historically disconnected from the rest of the city. The extension is a direct result of strong...



Published 2026-05-07T00:00:00Z
https://ift.tt/D9Iz8Mx

Thursday, May 7, 2026

Show HN: PHP-fts – Full-text search engine in pure PHP, no extensions https://ift.tt/K2Zsiqw

https://ift.tt/g586Tik May 7, 2026 at 01:58AM

Show HN: Mac Juice Monitor – Bluetooth battery levels in the macOS menu bar https://ift.tt/nMCQWqz

https://ift.tt/yC91MGY May 7, 2026 at 12:58AM

Show HN: DoodleMate: Animate Your Child's Hand Drawings Without Generative AI https://ift.tt/6TLIcdG

Hi HN! I made an app that takes a photo of a paper drawing and, in a handful of seconds, creates a fully rigged character that can be used in an animation or little story. It doesn’t use any image-to-video generative AI models. Instead, I built it using the years of insights I’ve picked up studying children’s drawings and character animation.
Today we’re releasing a community beta. I respect this community and would value any feedback you offer. It’s easy to try: you don’t need to create an account to check it out. We’ve got several free stories to drop your character into, and a Mother’s Day eCard.
I’m also working on a tool, DoodleMate Studio, that lets people easily author their own stories instead of using premade templates. But what form that takes will depend heavily on the feedback we get from the community during this beta.
How this came to be: I’ve worked in this space for a while. Here’s an old HN post related to a popular tech demo I did (https://ift.tt/uxzL76y) and another one from when I open sourced the data and code (https://ift.tt/y0L8Udz). I also wrote a SIGGRAPH paper about the methodology (https://ift.tt/Ihcj6nW). I’d moved on to other things, but had always felt there was such potential in this space. Last year I decided I was over big tech and, with a lot of encouragement from my family, finally decided to pursue this seriously. Since then, my wife and I have been building this together. We’re bootstrapping at the moment, trying to give ourselves time and space to make sure DoodleMate turns into something wonderful and wholesome.
Thanks,
Jesse
https://doodlemate.com/ May 6, 2026 at 10:36PM

Wednesday, May 6, 2026

Show HN: AI-DLC-UML (AI-Driven Development Life Cycle with UML Modeling) https://ift.tt/RHOPGSd

AI-DLC-UML modifies AI-DLC to enable AI agents to drive the software development workflow with UML modeling. It is intended for those who want to use UML modeling collaboratively in their design practices, even in AI-driven software development.
https://ift.tt/oA5pfRV May 5, 2026 at 11:48PM

Show HN: Statewright – Visual state machines that make AI agents reliable https://ift.tt/dxfnmvp

Show HN: Statewright – Visual state machines that make AI agents reliable Agentic problem solving in its current state is very brittle. I f...