Wednesday, January 28, 2026

Show HN: Lightbox – Flight recorder for AI agents (record, replay, verify) https://ift.tt/IfJmLPB

Show HN: Lightbox – Flight recorder for AI agents (record, replay, verify)

I built Lightbox because I kept running into the same problem: an agent would fail in production, and I had no way to know what actually happened. Logs were scattered, the LLM's "I called the tool" wasn't trustworthy, and re-running wasn't deterministic.

This week, tons of Clawdbot incidents have driven the point home. Agents with full system access can expose API keys and chat histories. Prompt injection is now a major security concern. When agents can touch your filesystem, execute code, and browse the web, you probably need a tamper-proof record of exactly what actions they took, especially when a malicious prompt or compromised webpage could hijack the agent mid-session.

Lightbox is a small Python library that records every tool call an agent makes (inputs, outputs, timing) into an append-only log with cryptographic hashes. You can replay runs with mocked responses, diff executions across versions, and verify the integrity of logs after the fact (a toy sketch of the hash-chain idea appears after this post). Think airplane black box, but for your hackbox.

*What it does:*

- Records tool calls locally (no cloud, your infra)
- Tamper-evident logs (hash chain, verifiable)
- Replay failures exactly with recorded responses
- CLI to inspect, replay, diff, and verify sessions
- Framework-agnostic (works with LangChain, Claude, OpenAI, etc.)

*What it doesn't do:*

- Doesn't replay the LLM itself (just tool calls)
- Not a dashboard or analytics platform
- Not trying to replace LangSmith/Langfuse (different problem)

*Use cases I care about:*

- Security forensics: the agent behaved strangely; was it prompt injection? Check the trace.
- Compliance: "prove what your agent did last Tuesday"
- Debugging: reproduce a failure without re-running expensive API calls
- Regression testing: diff tool call patterns across agent versions

As agents get more capable and more autonomous (Clawdbot/Molt, Claude computer use, Manus, Devin), I think we'll need black boxes the same way aviation does. This is my attempt at that primitive. It's early (v0.1), intentionally minimal, and MIT licensed.

Site: https://uselightbox.app
Install: `pip install lightbox-rec`
GitHub: https://github.com/mainnebula/Lightbox-Project

Would love feedback, especially from anyone thinking about agent security or running autonomous agents in production.

https://ift.tt/X8eAOgE January 27, 2026 at 10:53PM
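To make the hash-chain idea from the post above concrete, here is a minimal sketch of an append-only, tamper-evident log in Python. This is not Lightbox's actual record format or API; the field names and hashing scheme are illustrative assumptions.

```python
import hashlib
import json

def append_record(log: list, tool: str, inputs: dict, output) -> None:
    """Append a tool-call record whose hash also covers the previous
    record's hash, so editing any earlier entry breaks every later hash."""
    prev = log[-1]["hash"] if log else "0" * 64
    body = {"tool": tool, "inputs": inputs, "output": output, "prev": prev}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append({**body, "hash": digest})

def verify(log: list) -> bool:
    """Recompute the whole chain; tampering with any entry fails the check."""
    prev = "0" * 64
    for rec in log:
        body = {k: rec[k] for k in ("tool", "inputs", "output", "prev")}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if rec["prev"] != prev or rec["hash"] != digest:
            return False
        prev = digest
    return True

log = []
append_record(log, "read_file", {"path": "/etc/hosts"}, "127.0.0.1 localhost")
append_record(log, "http_get", {"url": "https://example.com"}, "<html>...</html>")
assert verify(log)
log[0]["output"] = "forged"   # tamper with an earlier record...
assert not verify(log)        # ...and verification fails
```

Replay then amounts to reading the verified records back and returning the stored outputs instead of re-executing the tools.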

Show HN: LemonSlice – Upgrade your voice agents to real-time video https://ift.tt/FVHekoZ

Show HN: LemonSlice – Upgrade your voice agents to real-time video

Hey HN, we're the co-founders of LemonSlice ( https://lemonslice.com ). We train interactive avatar video models. Our API lets you upload a photo and immediately jump into a FaceTime-style call with that character. Here's a demo: https://ift.tt/IwUk6Qg

Chatbots are everywhere. Voice AI has recently taken off. But we believe video avatars will be the most common form factor for conversational AI. Most people would rather watch something than read it. The problem is that generating video in real time is hard, and overcoming the uncanny valley is even harder. We haven't broken the uncanny valley yet. Nobody has. But we're getting close, and our photorealistic avatars are currently best-in-class (judge for yourself: https://ift.tt/DQKBEs4 ). Plus, we're the only avatar model that can do animals and heavily stylized cartoons. Try it: https://ift.tt/GaA8Cyw . Warning! Talking to this little guy may improve your mood.

Today we're releasing our new model*, Lemon Slice 2, a 20B-parameter diffusion transformer that generates infinite-length video at 20fps on a single GPU, and opening up our API.

How did we get a video diffusion model to run in real time? There was no single trick, just a lot of them stacked together. The first big change was making our model causal. Standard video diffusion models are bidirectional (they look at frames both before and after the current one), which means you can't stream. From there it was about fitting everything on one GPU. We switched from full to sliding-window attention, which killed our memory bottleneck. We distilled from 40 denoising steps down to just a few; quality degraded less than we feared, especially after using GAN-based distillation (though tuning that adversarial loss to avoid mode collapse was its own adventure). And the rest was inference work: modifying RoPE from complex to real (this one was cool!), precision tuning, fusing kernels, a special rolling KV cache, lots of other caching, and more. We kept shaving off milliseconds wherever we could and eventually got to real time. (A toy sketch of the sliding-window mask and rolling cache appears after this post.)

We set up a guest playground for HN so you can create and talk to characters without logging in: https://ift.tt/KxWo8ZD . For those who want to build with our API (we have a new LiveKit integration that we're pumped about!), grab a coupon code in the HN playground for your first Pro month free ($100 value). See the docs: https://ift.tt/z37P5uY . Pricing is usage-based at $0.12-0.20/min for video generation.

Looking forward to your feedback! And we'd love to see any cool characters you make; please share their links in the comments.

*We did a Show HN last year for our V1 model: https://ift.tt/FTIxjWf . It was technically impressive at the time, but it looks so bad compared to what we have today.

January 27, 2026 at 11:25PM
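Two of the ingredients the post above mentions, causal sliding-window attention and a rolling KV cache, can be illustrated with a toy NumPy sketch. This is a reading of the general techniques, not LemonSlice's code; all names and shapes here are made up.

```python
from collections import deque

import numpy as np

def sliding_window_causal_mask(n_frames: int, window: int) -> np.ndarray:
    """True where frame i may attend to frame j: only past or current
    frames (causal, so generation can stream) and only the most recent
    `window` of them (bounded memory instead of full attention)."""
    i = np.arange(n_frames)[:, None]
    j = np.arange(n_frames)[None, :]
    return (j <= i) & (j > i - window)

class RollingKVCache:
    """Keep keys/values for only the last `window` frames, so cache size
    stays constant however long the generated video runs."""
    def __init__(self, window: int):
        self.keys: deque = deque(maxlen=window)
        self.values: deque = deque(maxlen=window)

    def push(self, k_frame: np.ndarray, v_frame: np.ndarray) -> None:
        # deque(maxlen=...) silently drops the oldest frame when full.
        self.keys.append(k_frame)
        self.values.append(v_frame)

print(sliding_window_causal_mask(5, 3).astype(int))
# [[1 0 0 0 0]
#  [1 1 0 0 0]
#  [1 1 1 0 0]
#  [0 1 1 1 0]
#  [0 0 1 1 1]]
```

The mask shows why this enables streaming: no frame ever waits on a future frame, and the attention span (hence memory) is fixed regardless of video length.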

Tuesday, January 27, 2026

Show HN: Ourguide – OS wide task guidance system that shows you where to click https://ift.tt/eyaIfx0

Show HN: Ourguide – OS wide task guidance system that shows you where to click

Hey! I'm eshaan and I'm building Ourguide, an on-screen task guidance system that shows you where to click, step by step, when you need help.

I started building this because whenever I didn't know how to do something on my computer, I found myself constantly tabbing between chatbots and the app, pasting screenshots, and asking "what do I do next?"

Ourguide solves this with two modes. In Guide mode, the app overlays your screen and highlights the specific element to click next, eliminating the need to leave your current window. There is also Ask mode, a vision-integrated chat that captures your screen context (which you can toggle on and off anytime), so you can ask "How do I fix this error?" without having to explain what "this" is. It's an Electron app that works OS-wide, is vision-based, and isn't restricted to the browser.

Figuring out how to show the user where to click was the hardest part of the process. I originally trained a computer vision model on 2,300 screenshots to identify and segment all UI elements on a screen, and used a VLM to find the correct icon to highlight. While this worked extremely well (better than SOTA grounding models like UI Tars), the latency was just too high. I'll be making that CV+VLM pipeline OSS soon, but for now I've switched to a simpler implementation that achieves <1s latency. (A toy sketch of the detect-then-select idea appears after this post.)

You may ask: if I can show you where to click, why can't I just click too? While trying to build computer-use agents during my job in Palo Alto, I hit the core limitation of today's computer-use models: benchmarks hover in the mid-50% range (OSWorld). VLMs often know what to do but not what it looks like; without reliable visual grounding, agents misclick and stall. So I built computer use without the "use". It provides the visual grounding of an agent but keeps the human in the loop for the actual execution to prevent misclicks.

I personally use it for the AWS Console's "treasure hunt" UI, like creating a public S3 bucket with specific CORS rules. It's also been surprisingly helpful for non-technical tasks, like navigating obscure settings in Gradescope or Spotify. Ourguide works for any task where you're stuck or don't know what to do.

You can download and test Ourguide here: https://ourguide.ai/downloads

The project is still very early, and I'd love your feedback on where it fails, where you think it worked well, and which specific niches you think Ourguide would be most helpful for.

https://ourguide.ai January 26, 2026 at 11:49PM
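To illustrate the detect-then-select pipeline described above (a CV model proposes UI elements, a VLM picks which one to highlight), here is a toy Python sketch. The keyword scorer merely stands in for the VLM call, and the element labels and boxes are invented; this is not Ourguide's code.

```python
from dataclasses import dataclass

@dataclass
class UIElement:
    label: str    # text/icon label produced by the detector
    box: tuple    # (x, y, w, h) in screen pixels, where to draw the highlight

def pick_target(elements: list, instruction: str) -> UIElement:
    """Stand-in for the VLM step: given detected elements and the current
    instruction, choose the element the overlay should highlight. A real
    system would send the screenshot and element crops to a VLM; here we
    just score by keyword overlap."""
    words = instruction.lower().split()
    def score(e: UIElement) -> int:
        return sum(w in e.label.lower() for w in words)
    return max(elements, key=score)

# Toy run: two detected buttons, one matching the instruction.
elements = [
    UIElement("Create bucket", (40, 120, 160, 32)),
    UIElement("Delete bucket", (220, 120, 160, 32)),
]
target = pick_target(elements, "click Create bucket")
print(target.label, target.box)   # -> Create bucket (40, 120, 160, 32)
```

The returned box is all the overlay needs: draw a highlight at those coordinates and let the human do the clicking.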

Show HN: TetrisBench – Gemini Flash reaches 66% win rate on Tetris against Opus https://ift.tt/XJAoaQz

Show HN: TetrisBench – Gemini Flash reaches 66% win rate on Tetris against Opus https://ift.tt/GfqWQwd January 27, 2026 at 12:12AM

Show HN: Postgres and ClickHouse as a unified data stack https://ift.tt/gRPHK5V

Show HN: Postgres and ClickHouse as a unified data stack

Hello HN, this is Sai and Kaushik from ClickHouse. Today we are launching a Postgres managed service that is natively integrated with ClickHouse. It is built together with Ubicloud (YC W24).

TL;DR: NVMe-backed Postgres + built-in CDC into ClickHouse + pg_clickhouse, so you can keep your app Postgres-first while running analytics in ClickHouse.

Try it (private preview): https://ift.tt/IWcagbC
Blog w/ live demo: https://ift.tt/oziR7jV

Problem

Across many fast-growing companies using Postgres, performance and scalability commonly emerge as challenges, for both transactional and analytical workloads. On the OLTP side, common issues include slow ingestion (especially updates and upserts), slow vacuums, and long-running transactions incurring WAL spikes, among others. In most cases, these problems stem from limited disk IOPS and suboptimal disk latency. Without the need to provision or cap IOPS, Postgres could do far more than it does today.

On the analytics side, many limitations stem from the fact that Postgres was designed primarily for OLTP and lacks several features that analytical databases have developed over time, for example vectorized execution and support for a wide variety of ingest formats. We're increasingly seeing a common pattern where companies like GitLab, Ramp, and Cloudflare complement Postgres with ClickHouse to offload analytics. This architecture lets teams adopt two purpose-built open-source databases. That said, if you're running a Postgres-based application, adopting ClickHouse isn't straightforward: you typically end up building a CDC pipeline, handling backfills, dealing with schema changes, and updating your application code to be aware of a second database for analytics.

Solution

On the OLTP side, we believe NVMe-based Postgres is the right fit and can drastically improve performance. NVMe storage is physically colocated with compute, enabling significantly lower disk latency and higher IOPS than network-attached storage, which requires a network round trip for disk access. This benefits disk-throttled workloads and can significantly (up to 10x) speed up operations including updates, upserts, vacuums, and checkpointing. We are working on a detailed blog examining how WAL fsyncs, buffer reads, and checkpoints dominate on slow I/O and are significantly reduced on NVMe. Stay tuned!

On the OLAP side, the Postgres service includes native CDC to ClickHouse and unified query capabilities through pg_clickhouse. Today, CDC is powered by ClickPipes/PeerDB under the hood, which is based on logical replication. We are working to make this faster and easier by supporting logical replication v2 for streaming in-progress transactions, building a new logical decoding plugin to address existing limitations of logical replication, working toward sub-second replication, and more.

Every Postgres instance comes packaged with the pg_clickhouse extension, which reduces the effort required to add ClickHouse-powered analytics to a Postgres application. It allows you to query ClickHouse directly from Postgres, enabling Postgres to serve both transactions and analytics. pg_clickhouse supports comprehensive query pushdown for analytics, and we plan to continuously expand this further ( https://ift.tt/9M4jfdE ). (A sketch of what this could look like from application code appears after this post.)

Vision

To sum it up, our vision is to provide a unified data stack that combines Postgres for transactions with ClickHouse for analytics, giving you best-in-class performance and scalability on an open-source foundation.

Get Started

We are actively working with users to onboard them to the Postgres service. Since this is a private preview, it is currently free of cost. If you're interested, please sign up here: https://ift.tt/IWcagbC

We'd love to hear your feedback on our thesis and anything else that comes to mind; it would be super helpful to us as we build this out!

January 22, 2026 at 11:51PM
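The post above says pg_clickhouse lets you query ClickHouse directly from Postgres with query pushdown. As a rough sketch of what that could look like from application code, here is a minimal psycopg2 example; the DSN, the `events` relation, and the assumption that pg_clickhouse exposes it as an ordinary Postgres table with the aggregate pushed down to ClickHouse are all hypothetical, not taken from the docs.

```python
import psycopg2

# Placeholder DSN for the managed Postgres instance.
conn = psycopg2.connect("postgresql://app:secret@localhost:5432/appdb")

with conn, conn.cursor() as cur:
    # Hypothetical: `events` is a ClickHouse-backed relation exposed by
    # pg_clickhouse, so this aggregate would be pushed down and executed
    # in ClickHouse while the app keeps speaking plain Postgres SQL.
    cur.execute("""
        SELECT date_trunc('day', created_at) AS day, count(*) AS n
        FROM events
        WHERE created_at > now() - interval '7 days'
        GROUP BY 1
        ORDER BY 1
    """)
    for day, n in cur.fetchall():
        print(day, n)
```

The appeal of this design is that the application never learns about a second database: transactions and analytics both arrive over the same Postgres connection.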

Monday, January 26, 2026

Show HN: I Created a Tool to Convert YouTube Videos into 2000 Word SEO Blog https://ift.tt/gTUvI4E

Show HN: I Created a Tool to Convert YouTube Videos into 2000 Word SEO Blog https://landkit.pro/youtube-to-blog January 25, 2026 at 11:16PM

Show HN: CertRadar – Find every certificate ever issued for your domain https://ift.tt/7Oz5jvw

Show HN: CertRadar – Find every certificate ever issued for your domain https://certradar.net/ January 25, 2026 at 11:21PM

Show HN: The independent guide to agent orchestrators https://ift.tt/a6OnejT

Show HN: The independent guide to agent orchestrators Hey HN! I built AgentMGMT.dev today to keep track of all those agent orchestration too...