Friday, February 6, 2026

Show HN: Playwright Best Practices AI Skill https://ift.tt/7tPOKIr

Show HN: Playwright Best Practices AI Skill

Hey folks, today we at Currents are releasing a brand new AI skill to help AI agents be genuinely good at writing tests, debugging them, or doing anything else Playwright-related. It's a very comprehensive skill, covering everyday topics like fixing flakiness, authentication, and writing fixtures, as well as more niche ones like testing Electron apps, PWAs, iframes and so forth. It should make your agent much better at writing, debugging and maintaining Playwright code.

For anyone who hasn't come across skills yet: they're a new, powerful feature that lets you make the AI agents in your editor/CLI (Cursor, Claude, Antigravity, etc.) experts in a given domain and better at performing specific tasks. (See https://ift.tt/HShbzxf )

You can install it by running: npx skills add https://ift.tt/dPquZHw...

The skill is open source and available under the MIT license at https://ift.tt/dPquZHw... -> check out the repo for full documentation and an overview of what it covers.

We're eager to hear community feedback and improve it :) Thanks! https://ift.tt/0RL9Use February 6, 2026 at 12:31AM
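
To give a concrete feel for the territory such a skill covers (this sketch is mine, not taken from the skill itself), here is a minimal Playwright fixture in TypeScript that logs in once per test and relies on web-first assertions rather than fixed sleeps, a common fix for flaky tests. The routes, labels and selectors are placeholders.

```typescript
// tests/todo.spec.ts
import { test as base, expect, type Page } from '@playwright/test';

// A small authenticated fixture: each test receives a page that is already logged in.
// The /login route and the field labels below are placeholders for this sketch.
const test = base.extend<{ userPage: Page }>({
  userPage: async ({ page }, use) => {
    await page.goto('/login');
    await page.getByLabel('Email').fill('user@example.com');
    await page.getByLabel('Password').fill('secret');
    await page.getByRole('button', { name: 'Sign in' }).click();
    // Web-first assertion: auto-waits for the dashboard instead of a fixed timeout.
    await expect(page.getByRole('heading', { name: 'Dashboard' })).toBeVisible();
    await use(page);
  },
});

test('adds an item without flaky sleeps', async ({ userPage }) => {
  await userPage.getByRole('textbox', { name: 'New item' }).fill('Buy milk');
  await userPage.getByRole('button', { name: 'Add' }).click();
  // expect(locator) retries until the element appears, so no page.waitForTimeout() is needed.
  await expect(userPage.getByText('Buy milk')).toBeVisible();
});
```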

Thursday, February 5, 2026

Show HN: Viberails – Easy AI Audit and Control https://ift.tt/xSVXN1P

Show HN: Viberails – Easy AI Audit and Control

Hello HN. I'm Maxime, founder at LimaCharlie ( https://limacharlie.io ), a hyperscaler for SecOps (access to the building blocks you need to build security operations, like AWS provides for IT). We've engineered a new product on our platform that solves a timely issue by acting as a guardrail between your AI and the world: Viberails ( https://ift.tt/nTS7ajO )

This won't be new to folks here, but we identified 4 challenges teams face right now with AI tools:

1. Auditing what the tools are doing.
2. Controlling tool calls (and their impact on the world).
3. Centralized management.
4. Easy access to the above.

To expand: audit logs are the bread and butter of security, but AI tooling hasn't really caught up yet. Being able to look back and ask "what actually happened" after the fact is extremely valuable during an incident and for compliance purposes.

Tool calls are how LLMs interact with the world, so we should be able to exercise basic controls over them: don't read credential files, don't send emails out, don't create SSH keys, etc. Being able to not only see those calls but also block them is key to preventing incidents.

As soon as you move beyond a single contributor on one box, the question becomes: how do I scale these processes by creating an authoritative config for the team? Having one spot with all the audit, detection and control policies becomes critical. It's the same story as snowflake servers.

Finally, there are plenty of companies with products that partially address this, but they fall into one of two buckets:

- They don't handle the "centralized" point above, meaning they just send to syslog and leave all the messy infra bits to you.
- They are locked behind "book a demo", sales teams, contracts and all the wasted energy that goes with that.

We built Viberails to address these problems. Here's what it is:

- Open-source client, written in Rust.
- Curl-to-bash install; share a URL with your team to join your Team, done. Linux, macOS and Windows support.
- Detects local AI tools, and you choose which ones to hook. We install hooks for each relevant platform; the hooks use the CLI tool. We support all the major tools (including OpenClaw).
- The CLI tool sends webhooks into your Team (a tenant, called an Organization in LC) in LimaCharlie. The tool-related hooks are blocking to allow for control.
- Blocking webhooks have around 50ms RTT.
- Your tenant in LC records the interaction for audit.
- We create an initial set of detection rules for you as examples. They do not block by default. You can create your own rules, no opaque black boxes.
- You can view the audit, the alerts, etc. in the cloud.
- You can set up outputs to send audits, blocking events and detections to all kinds of other platforms of your choosing. An easy mode for this is coming; right now it's done in the main LC UI rather than the simplified Viberails view.
- The detection/blocking rules support all kinds of operators and logic, with lots of customizability.
- All data is retained for 1 year unless you delete the tenant. Datacenters in the USA, Canada, Europe, the UK, Australia and India.
- The only limit on the community edition is a global ingestion throughput of 10kbps.

Try it: https://viberails.io
Repo: https://ift.tt/VSOUIoB

Essentially, we wanted to make a super-simplified solution for all kinds of devs and teams so they can get access to the basics of securing their AI tools. Thanks for reading - we're really excited to share this with the community!
Let us know if you have any questions or feedback in the comments. https://ift.tt/bpnGDKX February 5, 2026 at 12:46AM
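
For readers who want a mental model of the blocking-hook flow described above, here is a rough TypeScript sketch of the pattern only: the endpoint, payload shape and verdict field are invented for illustration, and the real client is the open-source Rust CLI talking to your LimaCharlie tenant.

```typescript
// Hypothetical illustration of a blocking pre-tool-call hook, not the actual
// Viberails CLI or LimaCharlie API. Endpoint and payload shape are made up.
type ToolCall = { tool: string; args: Record<string, unknown> };

async function checkToolCall(call: ToolCall): Promise<'allow' | 'deny'> {
  // The hook forwards the proposed call to the team's webhook; the rules
  // engine there decides before the call is actually executed.
  const res = await fetch('https://example-tenant.invalid/hooks/toolcall', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(call),
    signal: AbortSignal.timeout(500), // fail fast; the post cites ~50ms RTT for blocking hooks
  });
  if (!res.ok) return 'allow'; // fail-open here for illustration; a real deployment would pick a policy
  const { verdict } = (await res.json()) as { verdict: 'allow' | 'deny' };
  return verdict;
}

// Example wrapper: refuse to run a denied call, e.g. reading credential files.
async function guardedRun(call: ToolCall, run: () => Promise<void>): Promise<void> {
  if ((await checkToolCall(call)) === 'deny') {
    throw new Error(`Blocked by policy: ${call.tool}`);
  }
  await run();
}
```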

Show HN: EpsteIn – Search the Epstein files for your LinkedIn connections https://ift.tt/hvjHDOo

Show HN: EpsteIn – Search the Epstein files for your LinkedIn connections https://ift.tt/oQe2PJS February 5, 2026 at 12:54AM

Show HN: GitHub Browser Plugin for AI Contribution Blame in Pull Requests https://ift.tt/AHrbup5

Show HN: GitHub Browser Plugin for AI Contribution Blame in Pull Requests https://ift.tt/dzurGsS February 3, 2026 at 08:05PM

Wednesday, February 4, 2026

Show HN: Nomad Tracker – a local-first iOS app to track visas and tax residency https://ift.tt/yi6kD9b

Show HN: Nomad Tracker – a local-first iOS app to track visas and tax residency

Hi HN, I'm a full-stack developer (formerly iOS) and I just launched Nomad Tracker, a native iOS app to help digital nomads track physical presence across countries for visa limits and tax residency.

Key idea: everything runs on-device. No accounts, no cloud sync, no analytics.

Features:
- Calendar-based day tracking per country.
- Schengen 90/180 and other visa "runways".
- Fiscal residency day counts and alerts.
- Optional background location logging (battery-efficient, never overwrites manual data).
- Photo import using metadata only (no image access).
- On-device "Fiscal Oracle" using Apple's Foundation Models to ask questions about your own data.

I built this because other apps felt limiting and didn't do what I needed. This one is visual, user-focused, and designed to make tracking easy and clear. Happy to answer questions or discuss the technical tradeoffs. https://ift.tt/rgywEYe February 3, 2026 at 11:25PM
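
As a rough illustration of the Schengen 90/180 arithmetic such an app tracks (the app itself is native Swift and on-device; this is just a language-agnostic sketch in TypeScript), the core question is how many distinct presence days fall inside the rolling 180-day window ending on a given date, against the 90-day allowance.

```typescript
// Sketch of the rolling-window count behind the Schengen 90/180 rule: days of
// presence within the 180-day window ending on a reference date.
// Dates are ISO "YYYY-MM-DD" strings, interpreted as UTC.
const DAY_MS = 24 * 60 * 60 * 1000;

function daysUsedInWindow(presenceDays: string[], onDate: string, windowDays = 180): number {
  const end = Date.parse(onDate);
  const start = end - (windowDays - 1) * DAY_MS; // the window includes the reference date
  return new Set(
    presenceDays.filter((d) => {
      const t = Date.parse(d);
      return t >= start && t <= end;
    }),
  ).size; // Set deduplicates days logged more than once
}

// "Runway": how many of the 90 allowed days remain as of a given date.
const used = daysUsedInWindow(['2026-01-10', '2026-01-11', '2026-02-01'], '2026-02-03');
console.log(`Used ${used}/90 days in the last 180; ${90 - used} remaining`);
```

A real tracker also has to look forward, checking each future day's window to warn before an overstay, but this rolling count is the basic building block.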

Show HN: I built "AI Wattpad" to eval LLMs on fiction https://ift.tt/6pmLSo2

Show HN: I built "AI Wattpad" to eval LLMs on fiction

I've been a webfiction reader for years (too many hours on Royal Road), and I kept running into the same question: which LLMs actually write fiction that people want to keep reading? That's why I built Narrator ( https://ift.tt/0IocykP ) – a platform where LLMs generate serialized fiction and get ranked by real reader engagement.

Turns out this is surprisingly hard to answer. Creative writing isn't a single capability – it's a pipeline: brainstorming → writing → memory. You need to generate interesting premises, execute them with good prose, and maintain consistency across a long narrative. Most benchmarks test these in isolation, but readers experience them as a whole.

The current evaluation landscape is fragmented:

- Memory benchmarks like FictionLive's tests use MCQs to check if models remember plot details across long contexts. Useful, but memory is necessary for good fiction, not sufficient. A model can ace recall and still write boring stories.
- Author-side usage data from tools like Novelcrafter shows which models writers prefer as copilots. But that measures what's useful for human-AI collaboration, not what produces engaging standalone output. Authors and readers have different needs.
- LLM-as-a-judge is the most common approach for prose quality, but it's notoriously unreliable for creative work. Models have systematic biases (favoring verbose prose, certain structures), and "good writing" is genuinely subjective in ways that "correct code" isn't.

What's missing is a reader-side quantitative benchmark – something that measures whether real humans actually enjoy reading what these models produce. That's the gap Narrator fills: views, time spent reading, ratings, bookmarks, comments, return visits. Think of it as an "AI Wattpad" where the models are the authors.

I shared an early DSPy-based version here 5 months ago ( https://ift.tt/Z8rYaBN ). The big lesson: one-shot generation doesn't work for long-form fiction. Models lose plot threads, forget characters, and quality degrades across chapters.

The rewrite: from one-shot to a persistent agent loop. The current version runs each model through a writing harness that maintains state across chapters. Before generating, the agent reviews structured context: character sheets, plot outlines, unresolved threads, world-building notes. After generating, it updates these artifacts for the next chapter. Essentially each model gets a "writer's notebook" that persists across the whole story. This made a measurable difference – models that struggled with consistency in the one-shot version improved significantly with access to their own notes.

Granular filtering instead of a single score: we classify stories upfront by language, genre, tags, and content rating. Instead of one "creative writing" leaderboard, we can drill into specifics: which model writes the best Spanish Comedy? Which handles LitRPG stories with Male Leads the best? Which does well with romance versus horror? The answers aren't always what you'd expect from general benchmarks. Some models that rank mid-tier overall dominate specific niches.

A few features I'm proud of: story forking lets readers branch stories CYOA-style – if you don't like where the plot went, fork it and see how the same model handles the divergence. Creates natural A/B comparisons. Visual LitRPG was a personal itch to scratch. Instead of walls of [STR: 15 → 16] text, stats and skill trees render as actual UI elements.
Example: https://ift.tt/MzGxenb

What I'm looking for: more readers to build out the engagement data. Also curious if anyone else working on long-form LLM generation has found better patterns for maintaining consistency across chapters – the agent harness approach works but I'm sure there are improvements. https://ift.tt/0IocykP February 3, 2026 at 10:38PM
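
For anyone curious how such a harness hangs together structurally, here is a bare-bones sketch of the review, generate, update loop as I read it from the post; the helper functions are placeholders for the actual model calls, and none of this is Narrator's code.

```typescript
// Rough sketch of a "writer's notebook" agent loop: structured notes persist
// across chapters instead of feeding the model the entire raw history.
interface Notebook {
  characters: string[];   // character sheets
  outline: string[];      // plot beats so far
  openThreads: string[];  // unresolved plot threads
  worldNotes: string[];   // world-building facts to keep consistent
}

// Placeholder: in a real harness this would prompt the model with the notebook.
async function generateChapter(notebook: Notebook, index: number): Promise<string> {
  return `Chapter ${index + 1} (written with ${notebook.openThreads.length} open threads in view)`;
}

// Placeholder: in a real harness the model revises its own notes after writing.
async function updateNotebook(notebook: Notebook, chapter: string): Promise<Notebook> {
  return { ...notebook, outline: [...notebook.outline, `summary of: ${chapter}`] };
}

async function writeStory(chapters: number, notebook: Notebook): Promise<string[]> {
  const story: string[] = [];
  for (let i = 0; i < chapters; i++) {
    // The model reviews its structured notes before writing...
    const chapter = await generateChapter(notebook, i);
    story.push(chapter);
    // ...and updates them afterwards so the next chapter stays consistent.
    notebook = await updateNotebook(notebook, chapter);
  }
  return story;
}
```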

Tuesday, February 3, 2026

Show HN: Adboost – A browser extension that adds ads to every webpage https://ift.tt/jJBogqO

Show HN: Adboost – A browser extension that adds ads to every webpage https://ift.tt/M6yoCsR February 2, 2026 at 06:41PM
