Growing India News – world news, national news, people's news, entertainment, fashion, movies, tech, automobiles and much more.
Wednesday, March 20, 2024
Show HN: Cloud-native Stack for Ollama - Build locally and push to deploy https://ift.tt/cXvJZRB
https://ift.tt/Vg7mtSW March 19, 2024 at 11:36PM
Show HN: Real-time voice chat with AI, no transcription https://ift.tt/8YmDait
Hi HN -- voice chat with AI is very popular these days, especially with YC startups ( https://twitter.com/k7agar/status/1769078697661804795 ). Current approaches are all cascaded: audio -> transcription -> language model -> text synthesis. This is easy to get started with, but it adds a lot of complexity and has a few glaring limitations. Most notably, transcription is slow and lossy: any error propagates through the rest of the system, emotional affect is not captured, and it is often not robust to code-switching or accents.

Instead, what if we fed audio directly to the LLM? LLMs are really smart -- can they figure it out? This approach is faster (we skip the transcription decode) and less lossy/more robust, because the big language model should be smarter than a smaller transcription decoder. I've trained a model in just that fashion. For more architectural information and some training details, see the first post: https://tincans.ai/slm . For details about this model and some ideas for how to prompt it, see this post: https://tincans.ai/slm3 .

We trained this on a very limited budget, but the model can do some things that even GPT-4, Gemini, and Claude cannot, e.g. reasoning about long-context audio directly, without transcription. We also believe this is the first model in the world to conduct adversarial attacks and apply preference modeling in the speech domain.

The demo is unoptimized (unquantized bf16 weights, default Hugging Face inference, serverless speed bumps) but achieves 120ms time to first token with audio. You can basically think of it as Mistral 7B, so it'll be very fast and can run basically anywhere. I am especially optimistic about embedded usage -- not needing the transcription step means the resulting model is smaller and cheaper to use on the edge. Would love to hear your thoughts and how you would use it!
Weights are Apache-2 and on Hugging Face: https://ift.tt/Q4fCmPG... https://ift.tt/J0KwjMa March 20, 2024 at 12:37AM
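The cascaded-versus-direct distinction the post describes can be sketched roughly as follows. All function names here are hypothetical stand-ins for illustration, not the actual Tincans or Hugging Face API:

```python
# Sketch of the two architectures from the post above.
# Every function here is a hypothetical stub, not a real model call.

def transcribe(audio: bytes) -> str:
    """Stub ASR decoder. In the cascaded design this step is slow and
    lossy: any error made here propagates to every later stage, and
    prosody/emotion in the audio is discarded."""
    return "hello, how are you?"

def llm_reply_to_text(text: str) -> str:
    """Stub text-only LLM."""
    return f"reply({text})"

def cascaded_pipeline(audio: bytes) -> str:
    """Cascaded design: audio -> transcription -> language model
    (-> text synthesis). Easy to build, but bounded by the weakest link."""
    return llm_reply_to_text(transcribe(audio))

def speech_llm_reply(audio: bytes) -> str:
    """Direct design: project audio features straight into the LLM's
    embedding space, skipping the transcription decode entirely.
    Stubbed here; a real speech-language model conditions its
    generation on audio embeddings instead of transcribed text."""
    return "reply(<audio-conditioned generation>)"
```

The sketch only shows the data flow; the post's claimed latency win comes from `speech_llm_reply` dropping the transcription decode step on the critical path.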
Show HN: Krata Maps – open-source Figma-like GeoJSON editor https://ift.tt/7w1qdJi
https://krata.app March 19, 2024 at 11:58PM
Tuesday, March 19, 2024
Show HN: Arthas.ai – An open-source alternative to character.ai https://ift.tt/vBCo4Vl
https://ift.tt/xSh6nOa March 19, 2024 at 07:29AM
Show HN: Extend Zigbee sensor range with LoRaWAN https://ift.tt/Ebfr0uH
https://ift.tt/Wr0VJjZ March 18, 2024 at 02:36PM
Show HN: Pipedream now has 2000+ API integrations https://ift.tt/J6e8Wkf
https://ift.tt/X3YMd01 March 18, 2024 at 11:42PM
Monday, March 18, 2024
Show HN: Native implementation of with checkboxes https://ift.tt/e7PfQJ6