Wednesday, November 20, 2024

Show HN: archgw: open-source, intelligent proxy for AI agents, built on Envoy https://ift.tt/U0sTV67

Show HN: archgw: open-source, intelligent proxy for AI agents, built on Envoy Hi HN! This is Adil, Salman, Co and Shuguang and we're excited to introduce archgw [1], an open source intelligent proxy for agents built on Envoy [2]. Arch moves the critical but crufty work around safety, observability, and routing of prompts outside business logic. Arch is a uniquely intelligent infrastructure primitive, engineered with purpose-built fast LLMs [3] for tasks like intent detection over multi-turn, parameter identification and extraction, triggering single/multiple function calls, and offers convenience features to auto dispatch LLM calls for summarization based on data from your APIs via system prompts configured in archgw. Today, the approach to build a smart production-ready agent is weaving together a large set of mono-functional opinionated libraries, adding extra layers like LLM-based preprocessing to determine things like relevance and safety of the user's prompt (e.g. applying governance and guardrails). Once past that stage, developers must extract relevant information from the user prompt to determine intent, extract parameters as necessary, package relevant tools calls to an LLM to trigger a backend API to execute particular domain-specific task. etc. After all that is done then only are developers ready to trigger an LLM call for summarization and must manage upstream error handling and retry logic themselves. Not to mention, if they want to experiment with multiple LLMs or move between LLM versions, they have to write crufty undifferentiated code. This entire experience is slow, error prone, cumbersome, and not specifically unique. Prior to building archgw, the team spent time building Envoy [2] at Lyft, API Gateway at AWS, specialized search and intent models at Microsoft Research and worked on safety at Meta. archgw was born out of the belief that several rules based mono-functional tools should be converged into a multi-functional infrastructure primitive designed for prompts and agents. We built archgw on the highly popular, battle-tested open source proxy Envoy and re-imagined it for prompts and agents. For this we had to build blazing fast LLMs [3] that can handle crufty, ahead-in-the-request-path type of work in handling and processing prompts that are sent to an agent, so that developers can focus on what matters most: building fast personalized agents without the unnecessary prompt engineering and systems integration work needed to get there. Here are some additional details about the open source project. arghw is written in rust, and the request path has three main parts: * Listener subsystem which handles downstream (ingress) and upstream (egress) request processing. * Prompt handler subsystem. This is where archgw makes decisions on the safety of the incoming request via its prompt_guard primitive and identifies where to forward the conversation to via its prompt_target primitive. * Model serving subsystem is the interface that hosts all the lightweight LLMs engineered in archgw and offers a framework for things like hallucination detection of our these models We loved building this open source project, and our belief is that this infra primitive would help developers build faster, safer and more personalized agents without all the manual prompt engineering and systems integration work needed to get there. We hope to invite other developers to use and improve Arch. Please give it a shot and leave feedback here, or at our discord channel [4] Also here is a quick demo of the project in action [5]. You can check out our public docs here at [6]. Our models are also available here [7]. [1] https://ift.tt/JqozSv9 [2] https://ift.tt/Wo0VJPR [3] https://ift.tt/4TInQR8... [4] https://ift.tt/SZFC9Xx... [5] https://www.youtube.com/watch?v=I4Lbhr-NNXk [6] https://ift.tt/DjRJ1T2 [7] https://ift.tt/yKTrX8V https://ift.tt/JqozSv9 November 20, 2024 at 12:56AM

Tuesday, November 19, 2024

Show HN: Venmo Unofficial API https://ift.tt/vzVNaqc

Show HN: Venmo Unofficial API https://ift.tt/bwDZlFJ November 19, 2024 at 04:34AM

Valencia Side-Running Bikeway Goes to SFMTA Board for Final Vote

Valencia Side-Running Bikeway Goes to SFMTA Board for Final Vote
By

Coming up with the final design required in-depth community outreach and conversations, like this feedback exercise conducted at a Valencia Street open house. On Nov. 19, we will present a final design for a side-running Valencia Street bikeway to the SFMTA Board of Directors. This follows ten months of sustained community outreach and design work. If approved, construction will begin in January 2025. This project aims to create a safer and more enjoyable experience for everyone traveling along the vibrant Valencia Street corridor between 15th and 23rd streets. You can share feedback either...



Published November 18, 2024 at 05:30AM
https://ift.tt/jW3itbm

Show HN: CSV Table – Proper GUI for View and Edit CSV, JSON https://ift.tt/U2csIXR

Show HN: CSV Table – Proper GUI for View and Edit CSV, JSON https://csvtable.com November 18, 2024 at 10:04PM

Show HN: FastGraphRAG – Better RAG using good old PageRank https://ift.tt/0MTG5Uk

Show HN: FastGraphRAG – Better RAG using good old PageRank Hey there HN! We’re Antonio, Luca, and Yuhang, and we’re excited to introduce Fast GraphRAG, an open-source RAG approach that leverages knowledge graphs and the 25 years old PageRank for better information retrieval and reasoning. Building a good RAG pipeline these days takes a lot of manual optimizations. Most engineers intuitively start from naive RAG: throw everything in a vector database and hope that semantic search is powerful enough. This can work for use cases where accuracy isn’t too important and hallucinations are tolerable, but it doesn’t work for more difficult queries that involve multi-hop reasoning or more advanced domain understanding. Also, it’s impossible to debug it. To address these limitations, many engineers find themselves adding extra layers like agent-based preprocessing, custom embeddings, reranking mechanisms, and hybrid search strategies. Much like the early days of machine learning when we manually crafted feature vectors to squeeze out marginal gains, building an effective RAG system often becomes an exercise in crafting engineering “hacks.” Earlier this year, Microsoft seeded the idea of using Knowledge Graphs for RAG and published GraphRAG - i.e. RAG with Knowledge Graphs. We believe that there is an incredible potential in this idea, but existing implementations are naive in the way they create and explore the graph. That’s why we developed Fast GraphRAG with a new algorithmic approach using good old PageRank. There are two main challenges when building a reliable RAG system: (1) Data Noise: Real-world data is often messy. Customer support tickets, chat logs, and other conversational data can include a lot of irrelevant information. If you push noisy data into a vector database, you’re likely to get noisy results. (2) Domain Specialization: For complex use cases, a RAG system must understand the domain-specific context. This requires creating representations that capture not just the words but the deeper relationships and structures within the data. Our solution builds on these insights by incorporating knowledge graphs into the RAG pipeline. Knowledge graphs store entities and their relationships, and can help structure data in a way that enables more accurate and context-aware information retrieval. 12 years ago Google announced the knowledge graph we all know about [1]. It was a pioneering move. Now we have LLMs, meaning that people can finally do RAG on their own data with tools that can be as powerful as Google’s original idea. Before we built this, Antonio was at Amazon, while Luca and Yuhang were finishing their PhDs at Oxford. We had been thinking about this problem for years and we always loved the parallel between pagerank and the human memory [2]. We believe that searching for memories is incredibly similar to searching the web. Here’s how it works: - Entity and Relationship Extraction: Fast GraphRAG uses LLMs to extract entities and their relationships from your data and stores them in a graph format [3]. - Query Processing: When you make a query, Fast GraphRAG starts by finding the most relevant entities using vector search, then runs a personalized PageRank algorithm to determine the most important “memories” or pieces of information related to the query [4]. - Incremental Updates: Unlike other graph-based RAG systems, Fast GraphRAG natively supports incremental data insertions. This means you can continuously add new data without reprocessing the entire graph. - Faster: These design choices make our algorithm faster and more affordable to run than other graph-based RAG systems because we eliminate the need for communities and clustering. Suppose you’re analyzing a book and want to focus on character interactions, locations, and significant events: from fast_graphrag import GraphRAG DOMAIN = "Analyze this story and identify the characters. Focus on how they interact with each other, the locations they explore, and their relationships." EXAMPLE_QUERIES = [ "What is the significance of Christmas Eve in A Christmas Carol?", "How does the setting of Victorian London contribute to the story's themes?", "Describe the chain of events that leads to Scrooge's transformation.", "How does Dickens use the different spirits (Past, Present, and Future) to guide Scrooge?", "Why does Dickens choose to divide the story into \"staves\" rather than chapters?" ] ENTITY_TYPES = ["Character", "Animal", "Place", "Object", "Activity", "Event"] grag = GraphRAG( working_dir="./book_example", domain=DOMAIN, example_queries="\n".join(EXAMPLE_QUERIES), entity_types=ENTITY_TYPES ) with open("./book.txt") as f: grag.insert(f.read()) print(grag.query("Who is Scrooge?").response) This code creates a domain-specific knowledge graph based on your data, example queries, and specified entity types. Then you can query it in plain English while it automatically handles all the data fetching, entity extractions, co-reference resolutions, memory elections, etc. When you add new data, locking and checkpointing is handled for you as well. This is the kind of infrastructure that GenAI apps need to handle large-scale real-world data. Our goal is to give you this infrastructure so that you can focus on what’s important: building great apps for your users without having to care about manually engineering a retrieval pipeline. In the managed service, we also have a suite of UI tools for you to explore and debug your knowledge graph. We have a free hosted solution with up to 100 monthly requests. When you’re ready to grow, we have paid plans that scale with you. And of course you can self host our open-source engine. Give us a spin today at https://circlemind.co and see our code at https://ift.tt/ProqKgb We’d love feedback :) [1] https://ift.tt/kWBLy4l... [2] Griffiths, T. L., Steyvers, M., & Firl, A. (2007). Google and the Mind: Predicting Fluency with PageRank. Psychological Science, 18(12), 1069–1076. https://ift.tt/WmbulxZ [3] Similarly to Microsoft’s GraphRAG: https://ift.tt/1FWg2dU [4] Similarly to OSU’s HippoRAG: https://ift.tt/LJEpDtT https://ift.tt/PJWEjY3 https://ift.tt/ProqKgb November 18, 2024 at 11:13PM

Monday, November 18, 2024

Show HN: Store and render ASCII diagrams in Obsidian https://ift.tt/Wkm0Bn8

Show HN: Store and render ASCII diagrams in Obsidian Obsidian plug-in that allows you to create and store ASCII diagrams in your notes. It can be used to visualise diagrams, flowcharts, complex tables, Gantt charts and more in technical documentation, that will be rendered as a nice SVG graphics. https://ift.tt/Sn1rJ09 November 12, 2024 at 07:33AM

Show HN: I made Picle (a.k.a. Catchphrase x Wordle x AI) https://ift.tt/g6rkwXu

Show HN: I made Picle (a.k.a. Catchphrase x Wordle x AI) Love to hear what you think! Thank you! https://picle.fi/ November 17, 2024 at 08:38PM

Show HN: Happy Coder – End-to-End Encrypted Mobile Client for Claude Code https://ift.tt/vt1BkI0

Show HN: Happy Coder – End-to-End Encrypted Mobile Client for Claude Code Hey all! Few weeks ago we realized AI models are now so good you d...