Thursday, August 24, 2023

Show HN: Dataherald AI – Natural Language to SQL Engine https://ift.tt/GF2zWJy

Hi HN community. We are excited to open source Dataherald’s natural-language-to-SQL engine today ( https://ift.tt/iv1Ghet ). This engine lets you set up an API on top of your structured database that can answer questions in plain English.

GPT-4-class LLMs have gotten remarkably good at writing SQL. However, out-of-the-box LLMs and existing frameworks did not work on our own structured data at the quality level we needed. For example, given the question “What was the average rent in Los Angeles in May 2023?”, a reasonable human would either assume the question is about Los Angeles, CA or confirm the state with the person asking in a follow-up. An LLM, however, translates it to:

select price from rent_prices where city='Los Angeles' AND month='05' AND year='2023'

This pulls data for both Los Angeles, CA and Los Angeles, TX without returning any column that distinguishes the two. You can read more about the challenges of enterprise-level text-to-SQL in this blog post I wrote on the topic: https://ift.tt/0KOQyzd...

Dataherald comes “batteries included.” It has best-in-class implementations of core components, including, but not limited to, a state-of-the-art NL-to-SQL agent and an LLM-based SQL-accuracy evaluator. The architecture is modular, so these components can be easily replaced. It is easy to set up and use with major data warehouses. A “Context Store” holds information (NL2SQL examples, schemas and table descriptions) that is fed into the LLM prompts, so the engine gets better with usage. And we even made it fast!

This version lets you easily connect to PG, Databricks, BigQuery or Snowflake and set up an API for semantic interactions with your structured data. You can then add business and data context that the engine uses for few-shot prompting.
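To make the Los Angeles pitfall concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The table layout (a `state` column alongside `city`) and the three rows are hypothetical, made up for illustration; they are not Dataherald code or real rent data. It contrasts the query quoted above with a per-state breakdown (the “confirm with the asker” reading) and a query that assumes California (the “reasonable default” reading).

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory toy table. Schema and rows are hypothetical, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE rent_prices (city TEXT, state TEXT, month TEXT, year TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT INTO rent_prices VALUES (?, ?, ?, ?, ?)",
        [
            ("Los Angeles", "CA", "05", "2023", 3000.0),
            ("Los Angeles", "CA", "05", "2023", 2800.0),
            ("Los Angeles", "TX", "05", "2023", 1000.0),
        ],
    )
    return conn


# The LLM's query: filters on city only, so CA and TX rows are mixed together
# and nothing in the result says which state each price came from.
NAIVE_SQL = """
SELECT price FROM rent_prices
WHERE city = 'Los Angeles' AND month = '05' AND year = '2023'
"""

# Surfaces the ambiguity instead of hiding it: one average per state.
GROUPED_SQL = """
SELECT state, AVG(price) FROM rent_prices
WHERE city = 'Los Angeles' AND month = '05' AND year = '2023'
GROUP BY state ORDER BY state
"""

# Resolves the ambiguity the way a human would by default: assume California.
DISAMBIGUATED_SQL = """
SELECT AVG(price) FROM rent_prices
WHERE city = 'Los Angeles' AND state = 'CA' AND month = '05' AND year = '2023'
"""

if __name__ == "__main__":
    conn = build_demo_db()
    print(conn.execute(NAIVE_SQL).fetchall())
    print(conn.execute(GROUPED_SQL).fetchall())
    print(conn.execute(DISAMBIGUATED_SQL).fetchall())
```

With these toy rows, the naive query returns three unlabeled prices, while the grouped query returns one average per state, which is the information a follow-up question to the user would need.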
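The Context Store idea can be illustrated with a small, hypothetical sketch of few-shot prompt assembly: stored question/SQL pairs and table descriptions are placed in the prompt ahead of the new question. The `ContextStore` class, `build_prompt` function and prompt layout below are assumptions for illustration, not Dataherald’s API, and ranking stored examples by word overlap is a simple stand-in for whatever retrieval the engine actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Example:
    question: str
    sql: str


@dataclass
class ContextStore:
    """Hypothetical stand-in: verified NL->SQL pairs plus free-text table notes."""
    examples: list[Example] = field(default_factory=list)
    table_descriptions: dict[str, str] = field(default_factory=dict)

    def add_example(self, question: str, sql: str) -> None:
        self.examples.append(Example(question, sql))

    def similar_examples(self, question: str, k: int = 2) -> list[Example]:
        # Rank stored examples by how many words they share with the new question.
        # sorted() is stable, so ties keep insertion order.
        words = set(question.lower().split())
        ranked = sorted(
            self.examples,
            key=lambda ex: len(words & set(ex.question.lower().split())),
            reverse=True,
        )
        return ranked[:k]


def build_prompt(store: ContextStore, question: str, k: int = 2) -> str:
    """Few-shot prompt: table notes, then the k closest examples, then the new question."""
    table_notes = "\n".join(f"-- {t}: {d}" for t, d in store.table_descriptions.items())
    shots = [f"Question: {ex.question}\nSQL: {ex.sql}" for ex in store.similar_examples(question, k)]
    return "\n\n".join([table_notes, *shots, f"Question: {question}\nSQL:"])
```

For example, if the store holds one verified Austin query that filters on `state` and an unrelated listings-count query, `build_prompt(store, "average rent in Los Angeles in May 2023", k=1)` selects the Austin pair because it shares the most words, so the model is shown a pattern that includes the state filter. Each verified pair added to the store changes what later prompts contain, which is one way an engine can “get better with usage.”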
The NL-to-SQL agent in this open source release was developed by our own Mohammadreza Pourreza, whose DIN-SQL algorithm currently tops the Spider ( https://ift.tt/KOI4kvm ) and BIRD ( https://ift.tt/KRl3osc ) NL-to-SQL benchmarks. In our own internal benchmarking, this agent outperformed the LangChain SQLAgent by anywhere from 12% to 250% (depending on the provided context), while being only ~15s slower on average. Needless to say, this is an early release and the codebase is under rapid development. We would love for you to try it out and give us your feedback! And if you are interested in contributing, we’d love to hear from you!

https://ift.tt/iv1Ghet

August 24, 2023 at 12:08AM

