Wednesday, August 20, 2025

Show HN: Lemonade: Run LLMs Locally with GPU and NPU Acceleration https://ift.tt/KHtz9q4

Show HN: Lemonade: Run LLMs Locally with GPU and NPU Acceleration Lemonade is an open-source SDK and local LLM server focused on making it easy to run and experiment with large language models (LLMs) on your own PC, with special acceleration paths for NPUs (Ryzen™ AI) and GPUs (Strix Halo and Radeon™). Why? There are three qualities needed in a local LLM serving stack, and none of the market leaders (Ollama, LM Studio, or using llama.cpp by itself) deliver all three: 1. Use the best backend for the user’s hardware, even if it means integrating multiple inference engines (llama.cpp, ONNXRuntime, etc.) or custom builds (e.g., llama.cpp with ROCm betas). 2. Zero friction for both users and developers from onboarding to apps integration to high performance. 3. Commitment to open source principles and collaborating in the community. Lemonade Overview: Simple LLM serving: Lemonade is a drop-in local server that presents an OpenAI-compatible API, so any app or tool that talks to OpenAI’s endpoints will “just work” with Lemonade’s local models. Performance focus: Powered by llama.cpp (Vulkan and ROCm for GPUs) and ONNXRuntime (Ryzen AI for NPUs and iGPUs), Lemonade squeezes the best out of your PC, no extra code or hacks needed. Cross-platform: One-click installer for Windows (with GUI), pip/source install for Linux. Bring your own models: Supports GGUFs and ONNX. Use Gemma, Llama, Qwen, Phi and others out-of-the-box. Easily manage, pull, and swap models. Complete SDK: Python API for LLM generation, and CLI for benchmarking/testing. Open source: Apache 2.0 (core server and SDK), no feature gating, no enterprise “gotchas.” All server/API logic and performance code is fully open; some software the NPU depends on is proprietary, but we strive for as much openness as possible (see our GitHub for details). Active collabs with GGML, Hugging Face, and ROCm/TheRock. Get started: Windows? Download the latest GUI installer from https://ift.tt/XJmyiLf Linux? Install with pip or from source ( https://ift.tt/XJmyiLf ) Docs: https://ift.tt/hB6ZKyQ Discord for banter/support/feedback: https://ift.tt/zw4ZcMh How do you use it? Click on lemonade-server from the start menu Open http://localhost:8000 in your browser for a web ui with chat, settings, and model management. Point any OpenAI-compatible app (chatbots, coding assistants, GUIs, etc.) at http://localhost:8000/api/v1 Use the CLI to run/load/manage models, monitor usage, and tweak settings such as temperature, top-p and top-k. Integrate via the Python API for direct access in your own apps or research. Who is it for? Developers: Integrate LLMs into your apps with standardized APIs and zero device-specific code, using popular tools and frameworks. LLM Enthusiasts, plug-and-play with: Morphik AI (contextual RAG/PDF Q&A) Open WebUI (modern local chat interfaces) Continue.dev (VS Code AI coding copilot) …and many more integrations in progress! Privacy-focused users: No cloud calls, run everything locally, including advanced multi-modal models if your hardware supports it. Why does this matter? Every month, new on-device models (e.g., Qwen3 MOEs and Gemma 3) are getting closer to the capabilities of cloud LLMs. We predict a lot of LLM use will move local for cost reasons alone. Keeping your data and AI workflows on your own hardware is finally practical, fast, and private, no vendor lock-in, no ongoing API fees, and no sending your sensitive info to remote servers. Lemonade lowers friction for running these next-gen models, whether you want to experiment, build, or deploy at the edge. Would love your feedback! Are you running LLMs on AMD hardware? What’s missing, what’s broken, what would you like to see next? Any pain points from Ollama, LM Studio, or others you wish we solved? Share your stories, questions, or rant at us. Links: Download & Docs: https://ift.tt/XJmyiLf GitHub: https://ift.tt/cG6Pfuk Discord: https://ift.tt/zw4ZcMh Thanks HN! https://ift.tt/cG6Pfuk August 20, 2025 at 01:05AM

No comments:

Post a Comment

Show HN: Lemonade: Run LLMs Locally with GPU and NPU Acceleration https://ift.tt/KHtz9q4

Show HN: Lemonade: Run LLMs Locally with GPU and NPU Acceleration Lemonade is an open-source SDK and local LLM server focused on making it e...