Saturday, November 2, 2024

Show HN: Autotab Instruct – Claude Computer Use with Guardrails for Reliability https://ift.tt/x18f9nr

Hi HN,

We’ve built a desktop app to create highly reliable AI agents that use a computer with mouse and keyboard. Until last week, we had tried many different approaches to open-ended agentic features, but none of them had met our reliability bar. With Anthropic’s Computer Use this finally changed, and we just shipped a feature we’re calling Instruct. Instruct lets users create agentic blocks as part of a larger Autotab skill, which provides the structured logical flow that keeps the automation on track.

If you haven’t had a chance to try Computer Use yet, it is an impressive leap from the last generation of vision models (e.g. GPT-4o struggles with relative positions, let alone coordinates). At the same time, it is still not good enough to be given a prompt and let loose.

One of the big surprises to us early on was just how much intent specification is required for most real-world workflows to run reliably. What looks at first like a simple form-filling task usually turns out to have dozens of edge cases and very specific, hidden rules. Even human employees need to be shown how to perform these tasks, and then improve by asking questions and getting feedback over time.

We wanted to build a tool for specifying intent and iterating with the model to make it reliable enough for real work.

- Automations run on top of an action scaffold, which works kind of like a very fuzzy programming language with strict types. This gives the model a high-level plan that guides execution, and makes it easy to break out discrete steps to get the reliability you need. (Interestingly, this has proven useful not just for the agent, but also for the human trying to create, verify and edit the automation.)
- When the model is unsure, it asks for clarification. For example, if you are in editing mode and the model thinks that an element looks meaningfully different than before, it will ask you to verify that it is the same element.
- The agent has access to a memory system that lets it recall information from past runs as well as instructions and feedback from the user.

Here’s a short video of Autotab Instruct in action: https://ift.tt/hImULow?... .

More demos: https://twitter.com/autotabai/status/1852393973165199425

We’d love to hear what you think!

November 1, 2024 at 10:26PM
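
To make the three ideas in the post concrete (a scaffold of typed steps, asking for clarification when unsure, and a memory of user feedback), here is a minimal Python sketch. It is not Autotab's implementation or API: every name in it (`Step`, `Memory`, `ModelResult`, `run_skill`, the `threshold` parameter, and the idea of a numeric confidence score) is a hypothetical stand-in chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Step:
    """One discrete step of the scaffold (hypothetical structure)."""
    name: str
    instruction: str      # fuzzy, natural-language intent
    output_type: type     # strict type the step's result must satisfy


@dataclass
class ModelResult:
    value: Any
    confidence: float     # assumed 0..1 score; real agents may signal doubt differently


@dataclass
class Memory:
    """Keeps user feedback per step so later runs can reuse it."""
    notes: Dict[str, str] = field(default_factory=dict)

    def recall(self, step_name: str) -> Optional[str]:
        return self.notes.get(step_name)

    def remember(self, step_name: str, note: str) -> None:
        self.notes[step_name] = note


def run_skill(
    steps: List[Step],
    model: Callable[[Step, Optional[str]], ModelResult],
    ask_user: Callable[[str], str],
    memory: Memory,
    threshold: float = 0.8,
) -> Dict[str, Any]:
    """Run steps in order; ask the user when the model is unsure."""
    results: Dict[str, Any] = {}
    for step in steps:
        hint = memory.recall(step.name)
        result = model(step, hint)
        if result.confidence < threshold:
            # Unsure: ask for clarification, store the answer, then retry once.
            answer = ask_user(
                f"Unsure about step '{step.name}' (got {result.value!r}). "
                "What should happen here?"
            )
            memory.remember(step.name, answer)
            result = model(step, answer)
        # Enforce the step's declared type before moving on.
        if not isinstance(result.value, step.output_type):
            raise TypeError(
                f"Step '{step.name}' expected {step.output_type.__name__}, "
                f"got {type(result.value).__name__}"
            )
        results[step.name] = result.value
    return results
```

In this sketch the `model` and `ask_user` callables are injected, so the same loop can be driven by a real agent and a UI prompt, or by stubs when testing. Because the user's answer is stored in `Memory`, a second run of the same skill can reuse it instead of asking again.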


Show HN: Shadcn/UI theme editor – Design and share Shadcn themes https://ift.tt/q4YZ3uV

Hey, I built https://ift.tt/yZxliP5 - a web app for creating and sharing th...