Thursday, April 24, 2025

Show HN: Document agent example that can parse and chat over unstructured data https://ift.tt/ld40hJZ

Show HN: Document agent example that can parse and chat over unstructured data Hi all, Dapr maintainer here. We've added a new example that shows how you can build a conversational agent that uploads, parses, and understands complex documents while retaining long-term memory. The example also shows how the agent can upload the file to multiple storage providers. Would be great to get your feedback. https://ift.tt/VS1detq April 24, 2025 at 12:16AM
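
For context on the multi-provider upload: Dapr exposes storage providers as output bindings behind one HTTP call to the sidecar, so fanning a file out to several stores is a loop over binding names. Below is a minimal TypeScript sketch against the sidecar's /v1.0/bindings endpoint; the component names here are hypothetical, not the ones the example actually configures.

```typescript
// Sketch: fan one uploaded document out to several storage providers via
// Dapr output bindings. Component names are hypothetical and must match
// Dapr component YAMLs (e.g. bindings.aws.s3, bindings.azure.blobstorage).
const DAPR_PORT = process.env.DAPR_HTTP_PORT ?? "3500";

// Each binding type names its object key differently in request metadata.
const targets = [
  { binding: "s3-storage", metadata: (f: string) => ({ key: f }) },
  { binding: "azure-blob", metadata: (f: string) => ({ blobName: f }) },
];

async function storeEverywhere(fileName: string, content: string): Promise<void> {
  await Promise.all(
    targets.map(async ({ binding, metadata }) => {
      const res = await fetch(`http://localhost:${DAPR_PORT}/v1.0/bindings/${binding}`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          operation: "create",          // the standard "write" operation for bindings
          data: content,                // document text; binary usually goes base64
          metadata: metadata(fileName), // object key in the target store
        }),
      });
      if (!res.ok) throw new Error(`${binding} rejected upload: ${res.status}`);
    }),
  );
}
```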

Improving the Green Trips You Take on Muni: 29 Sunset Riders Seeing a Smoother Ride

Improving the Green Trips You Take on Muni: 29 Sunset Riders Seeing a Smoother Ride
By Brian Haagsman

New stops on the 29 Sunset now have enough space for its many student riders to get on and off comfortably. During San Francisco Climate Week – and every day – riding Muni is one of the most sustainable ways to get around. In San Francisco, transportation remains the leading source of greenhouse gas emissions, accounting for 44% of the city's total, while public transit accounts for only 2% (see the SFMTA Climate Roadmap). That's despite carrying half a million riders every day on Muni - and more on BART, ferries and other bus services in the city. We're proud to play a significant role in reducing...

Published April 23, 2025 at 05:30AM
https://ift.tt/Q6JzNbs

Show HN: Body Controlled 3D Dino Game https://ift.tt/zWeaf1X

Show HN: Body Controlled 3D Dino Game Hey HN, I am Niko. I've built this 3D Dino Game in the browser using tech like three.js and MoveNet (TensorFlow). Basically, it's a normal 3D dinosaur game with a twist: you need to actually perform actions in real life to avoid obstacles. Duck to crouch, jump to jump, raise your left hand to go left, raise your right hand to go right. The game uses your phone or laptop camera to track your body movements and perform in-game actions. PS: the game is 100% client-side and I don't record/track/use/save any of your data. Hope you find it worth playing (best played on PC). It's a 100% FREE browser game with no login! Please feel welcome to DM feedback or reply or anything! https://ift.tt/0hRqSv6 April 23, 2025 at 02:58PM
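
Under the hood, the MoveNet half of a setup like this usually follows one pattern: estimate keypoints from each camera frame, then apply simple geometric rules to turn them into game actions. Here is a minimal TypeScript sketch using the @tensorflow-models/pose-detection API; the thresholds and the exact rules this game uses are assumptions.

```typescript
import * as tf from "@tensorflow/tfjs-core";
import "@tensorflow/tfjs-backend-webgl";
import * as poseDetection from "@tensorflow-models/pose-detection";

type Action = "left" | "right" | "jump" | "duck" | "none";

await tf.setBackend("webgl");
await tf.ready();

// MoveNet SinglePose Lightning is fast enough for per-frame browser inference.
const detector = await poseDetection.createDetector(
  poseDetection.SupportedModels.MoveNet,
  { modelType: poseDetection.movenet.modelType.SINGLEPOSE_LIGHTNING },
);

let baselineNoseY: number | null = null; // standing-height reference

function classify(pose: poseDetection.Pose): Action {
  const kp = (name: string) =>
    pose.keypoints.find((k) => k.name === name && (k.score ?? 0) > 0.4);
  const nose = kp("nose");
  const leftWrist = kp("left_wrist");
  const rightWrist = kp("right_wrist");
  if (!nose) return "none";
  baselineNoseY ??= nose.y;

  // Image y grows downward; MoveNet names keypoints by the person's own
  // anatomical left/right, so no mirroring logic is needed here.
  if (leftWrist && leftWrist.y < nose.y) return "left";   // left hand raised
  if (rightWrist && rightWrist.y < nose.y) return "right"; // right hand raised
  if (nose.y < baselineNoseY - 60) return "jump"; // head well above baseline
  if (nose.y > baselineNoseY + 60) return "duck"; // head well below baseline
  return "none";
}

async function gameLoop(video: HTMLVideoElement) {
  const [pose] = await detector.estimatePoses(video);
  if (pose) console.log(classify(pose)); // feed into the three.js game instead
  requestAnimationFrame(() => gameLoop(video));
}
```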

Wednesday, April 23, 2025

Show HN: LMM for LLMs – A mental model for building LLM apps https://ift.tt/wdLCV5A

Show HN: LMM for LLMs – A mental model for building LLM apps I've been building agentic apps for some large Fortune 500 companies (T-Mobile, Twilio, etc.) and developed a mental model that serves as a practical guide for building them: separate the high-level, agent-specific logic from the low-level platform capabilities. I call it the L-MM: the Logical Mental Model for LLM applications. This mental model has not only been tremendously helpful in building agents but also in helping customers think about the development process - so when I am done with a consulting engagement they can move faster across the stack and enable engineers and platform teams to work concurrently without interference, boosting productivity. So what is the high-level logic vs. the low-level platform work?

High-Level Logic (Agent & Task Specific)

Tools and Environment - specific integrations and capabilities that allow agents to interact with external systems or APIs to perform real-world tasks. Examples include:
- Booking a table via the OpenTable API
- Scheduling calendar events via Google Calendar or Microsoft Outlook
- Retrieving and updating data from CRM platforms like Salesforce
- Using payment gateways to complete transactions

Role and Instructions - clearly defining an agent's persona, responsibilities, and explicit instructions is essential for predictable and coherent behavior. This includes:
- The "personality" of the agent (e.g., professional assistant)
- Explicit boundaries around task completion ("done criteria")
- Behavioral guidelines for handling unexpected inputs or situations

Low-Level Logic (Common Platform Capabilities)

Routing - efficiently coordinating tasks between multiple specialized agents, ensuring seamless hand-offs and effective delegation:
- Intelligent load balancing and dynamic agent selection based on task context
- Retries, failover strategies, and fallback mechanisms

Guardrails - centralized mechanisms to safeguard interactions and ensure reliability and safety:
- Filtering or moderating sensitive or harmful content
- Real-time compliance checks for industry-specific regulations (e.g., GDPR, HIPAA)
- Threshold-based alerts and automated corrective actions to prevent misuse

Access to LLMs - robust, centralized access to multiple LLMs for high availability and scalability (see the sketch after this list):
- Smart retry logic with exponential backoff
- Centralized rate limiting and quota management to optimize usage
- Handling diverse LLM backends transparently (OpenAI, Cohere, local open-source models, etc.)

Observability - comprehensive visibility into system performance and interactions using industry-standard practices:
- W3C Trace Context compatible distributed tracing for clear visibility across requests
- Detailed logging and metrics collection (latency, throughput, error rates, token usage)
- Easy integration with popular observability platforms like Grafana, Prometheus, Datadog, and OpenTelemetry

Why This Matters: by adopting this structured mental model, teams achieve a clear separation of concerns, improving collaboration, reducing complexity, and accelerating the development of scalable, reliable, and safe agentic applications. I'm actively working on addressing challenges in this domain. If you're navigating similar problems or have insights to share, let's discuss further - I'll leave some links about the stack too if folks want them. High-level framework - https://ift.tt/09XvwDL Low-level infrastructure - https://ift.tt/Cp0hcGI April 23, 2025 at 01:02AM
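
As one concrete illustration of a low-level capability, the retry logic under "Access to LLMs" can live entirely in the platform layer so agent code never sees transient provider failures. A minimal TypeScript sketch of exponential backoff with jitter; callLLM is a hypothetical stand-in for whatever provider client the platform wraps.

```typescript
// Sketch of centralized LLM access with exponential backoff + full jitter.
type LLMRequest = { model: string; prompt: string };
type LLMResponse = { text: string };

declare function callLLM(req: LLMRequest): Promise<LLMResponse>; // hypothetical

async function callWithRetry(
  req: LLMRequest,
  maxAttempts = 5,
  baseDelayMs = 250,
): Promise<LLMResponse> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await callLLM(req);
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err;
      // Full jitter: sleep a random slice of an exponentially growing window,
      // which avoids retry stampedes when many agents fail at once.
      const cap = baseDelayMs * 2 ** attempt;
      await new Promise((r) => setTimeout(r, Math.random() * cap));
    }
  }
  throw new Error("unreachable");
}
```

The point of the model shows up here: the agent's high-level logic just awaits a completion, while retries, rate limits, and provider selection stay in one shared place.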

Show HN:[Opensource] AIgr.id–Polycentric Infrastructure for Open and Plural AI https://ift.tt/TacRNsb

Show HN:[Opensource] AIgr.id–Polycentric Infrastructure for Open and Plural AI Hey HN! I'm Kanishka Nithin, founder of AIGr.id ( https://www.aigr.id ). We're building AIGr.id — a polycentric network of independent, modular AI systems that can coordinate, exchange data, and compose into higher-level intelligence — all within a decentralized and plural ecosystem. Sound like collective intelligence? In simpler terms: we're trying to make it possible for people to produce, remix, operate, distribute, and consume AI systems the way we use the internet — openly, collaboratively, and without needing to centralize everything into one mega-model owned by one mega-entity. An internet of intelligence, in other words.

Today's AI landscape is:
- Centralized and resource-heavy: systems demand vast funding, compute, and talent, excluding much of the world
- Controlled by a few powerful actors prioritizing profit over public good
- Limited in participation, deepening inequality in who benefits from AI
- Fragmented and siloed, with no open protocols for AI coordination

We believe it's time to reimagine AI as collective intelligence, as a shared commons — polycentric, collaborative, composable, inclusive, and guided by values beyond profit. What's different about our approach is that we're not trying to build "the one true model" — we're trying to make it easier for people to build, remix, run, and govern their own AI systems, together. We want a world where AGI doesn't have to be monolithic — where different models, agents, and collectives can evolve side by side, coordinate, and even argue if they need to. Plural, by design.

At the core of AIGr.id is OpenOS.AI, a distributed AI operating system. It is a full-stack AIOS that spans everything from low-level compute orchestration to higher-level cognition, coordination, governance, and economic policy. Think of it as a programmable substrate for building and running decentralized AI systems — across any infrastructure, in any topology. Developers can use shared protocols, primitives, and templates to compose AI systems — models, agents, cognitive workflows — and plug them into running grids. These grids can be public, private, federated, or even permissionless. Each grid can maintain its own sovereignty (values, rules, trust mechanisms) while remaining interoperable with others. It's designed for a world where many intelligences coexist, rather than one model ruling them all.

We're in beta and will be kicking off more extensive scale testing during our upcoming testnet phase. If this scratches an itch for you, or you just want to jam on open systems, we'd love your feedback. If you're interested in joining the testnet, you can join our Discord @ https://ift.tt/AN62LoF — we'd be excited to have you involved early. Docs, GitHub, and the paper are all linked at https://www.aigr.id . Curious what you think — critiques, weird use cases, edge cases, counterpoints — all welcome.

Our own background is what pushed us into this problem. Before this, we were a 4-person crew running one of the largest real-time AI inference workloads in India: around 500K inferences/sec across 80–90 models simultaneously, supporting 35+ public-sector use cases — mostly video analytics. We operated across federated and private infrastructure in real time, processing millions of frames per second, without relying on cloud providers or commercial frameworks. Our market was distorted by deprioritized infrastructure investment, and choosing to grow within our earnings meant the only way to survive was to be ruthlessly efficient: building frameworks that automated the end-to-end production, operation, distribution, and maintenance lifecycle of AI — everything at scale, reliably, with minimal or no human intervention — so the four of us could actually live our lives, too. So in a way, AIGr.id was born out of necessity. It's the system we wish we had — one that treats intelligence as something modular, networked, composable, orchestratable, shareable, and governable, in a collective way. https://www.aigr.id April 22, 2025 at 11:13PM

Show HN: I open-sourced my AI toy company that runs on ESP32 and OpenAI realtime https://ift.tt/HxXter1

Show HN: I open-sourced my AI toy company that runs on ESP32 and OpenAI realtime Hi HN! Last year the project I launched here got a lot of good feedback on creating speech-to-speech AI on the ESP32. Recently I revamped the whole stack, iterated on that feedback, and made our project fully open-source — all of the client, hardware, and firmware code. This GitHub repo turns an ESP32-S3 into a realtime AI speech companion using the OpenAI Realtime API, Arduino WebSockets, Deno Edge Functions, and a full-stack web interface. You can talk to your own custom AI character, and it responds instantly. I couldn't find a resource that helped set up a reliable, secure WebSocket (WSS) AI speech-to-speech service. While there are several useful Text-To-Speech (TTS) and Speech-To-Text (STT) repos out there, I believe none gets speech-to-speech right. OpenAI launched an embedded repo late last year which sets up WebRTC with ESP-IDF; however, it's not beginner friendly and doesn't have a server-side component for business logic. This repo is an attempt at solving the above pains and creating a great speech-to-speech experience on Arduino, with Secure WebSockets and edge servers (Deno/Supabase Edge Functions) for fast global connectivity and low latency. https://ift.tt/PV6Gkid April 22, 2025 at 07:40PM
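
For readers wondering what the edge-server side of a stack like this does, a common pattern is minting a short-lived Realtime session server-side so the device never holds the long-lived API key. A minimal Deno sketch in TypeScript; this is not the repo's actual code, and the model and voice names are assumptions.

```typescript
// Deno edge function sketch: exchange the server's OpenAI key for a
// short-lived Realtime session that the ESP32 can then use over WSS.
// Endpoint per OpenAI's Realtime docs; model/voice ids are assumptions.
Deno.serve(async (req: Request): Promise<Response> => {
  if (req.method !== "POST") return new Response("POST only", { status: 405 });

  const res = await fetch("https://api.openai.com/v1/realtime/sessions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${Deno.env.get("OPENAI_API_KEY")}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-realtime-preview", // assumed model id
      voice: "alloy",                   // assumed voice
    }),
  });
  if (!res.ok) return new Response(await res.text(), { status: res.status });

  // The response carries an ephemeral client_secret the device presents
  // when opening its own realtime connection; the real key never leaves here.
  const session = await res.json();
  return Response.json({ clientSecret: session.client_secret?.value });
});
```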

Tuesday, April 22, 2025

Show HN: Anti-Cluely – Detect virtual devices and cheating tools on exam systems https://ift.tt/onuTQWR

Show HN: Anti-Cluely – Detect virtual devices and cheating tools on exam systems Anti-Cluely is a lightweight tool designed to detect common...