Tuesday, January 7, 2025

Show HN: A 100-Line LLM Framework https://ift.tt/KtGhcDv

I've seen a lot of comments about how complex frameworks like LangChain can be. Over the holidays, I wanted to see how minimal an LLM framework could get if we stripped away everything non-essential. The result is an LLM framework in just 100 lines of code.

These 100 lines capture what I see as the core abstraction of most LLM frameworks: a nested directed graph that breaks down tasks into multiple LLM steps, with branching and recursion to enable agent-like decision-making. From there, you can layer on more advanced features like agents, RAG, and task decomposition.

I've intentionally avoided bundling vendor-specific wrappers (e.g., for OpenAI) into the framework. That kind of lock-in can be brittle and is easy to recreate on the fly: just feed the vendor's API docs into your favorite LLM to generate a new wrapper. With miniLLMFlow, you only get the fundamentals.

It also works nicely with coding assistants like ChatGPT, Claude, and Cursor.ai. Because the code is so minimal, you can quickly share the entire source code and documentation with an AI assistant, and it can help you build new workflows on the spot.

I'm adding more examples (including multi-agent setups) and would love feedback! If there's a feature or use case you'd like to see, please let me know.

GitHub: https://ift.tt/KcO1ZwE

January 6, 2025 at 09:20PM
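The post doesn't include code, so here is a minimal sketch of what a "nested directed graph with branching and recursion" can look like. Everything in it (the Node/Flow names, the action-string convention, the call_llm stub) is my own illustration, not miniLLMFlow's actual API:

```
# Hypothetical sketch of the core abstraction, NOT miniLLMFlow's real API.
# A Node is one step; its run() returns an action string that selects the
# outgoing edge. A Flow wraps a whole subgraph as a single Node, so graphs nest.

def call_llm(prompt: str) -> str:
    return "answer"  # stub: swap in any vendor wrapper you generate yourself

class Node:
    def __init__(self):
        self.successors = {}  # action string -> next Node

    def on(self, action, node):
        self.successors[action] = node
        return node

    def run(self, shared) -> str:
        raise NotImplementedError

class Flow(Node):
    def __init__(self, start):
        super().__init__()
        self.start = start

    def run(self, shared) -> str:
        node, action = self.start, "default"
        while node is not None:
            action = node.run(shared) or "default"
            node = node.successors.get(action)  # no matching edge -> stop
        return action

class Decide(Node):
    def run(self, shared):
        # Branching: the LLM's reply picks the next edge.
        return call_llm(f"search or answer? task: {shared['task']}")

class Answer(Node):
    def run(self, shared):
        shared["result"] = call_llm(shared["task"])

decide, answer = Decide(), Answer()
decide.on("answer", answer)
decide.on("search", decide)  # recursion: loop until the LLM decides to answer
Flow(decide).run({"task": "What is 2+2?"})
```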

Taken with Transportation Podcast: Thank You, Jeff Tumlin

Jeff Tumlin has left the SFMTA after five years with our agency. His last day was Dec. 31, 2024.

It's January 2025, and we have said goodbye to Director of Transportation Jeff Tumlin. Director Tumlin announced in mid-December that he would not renew his contract and instead would step down from his position at the end of the year. "Thank You, Jeff Tumlin" is the latest episode of our Taken with Transportation podcast. In it, we talk with our former director about his time at the agency.

Reflecting on the last five years

"I started this job on Dec. 15, 2019. Three months later, we were in...

Published January 06, 2025 at 05:30AM
https://ift.tt/ml9XKbV

Show HN: Skeet – A local-friendly command-line copilot that works with any LLM https://ift.tt/UbAdDzq

I've been using GitHub Copilot CLI, and while it's great, I found myself wanting something that could work with any LLM (including local models running through Ollama), so I built Skeet.

The key features that make it different:

- Works with any LLM provider through LiteLLM (OpenAI, Anthropic, local models, etc.)
- Automatically retries and adapts commands when they fail
- Can generate and execute Python scripts with dependencies (powered by uv) without virtual environment hassles

You can try simple tasks like:

```
skeet show me system information
skeet what is using port 8000
skeet --python "what's the current time on the ISS?"
```

Demo: https://ift.tt/KdkIzNj
Code: https://ift.tt/QdkX7wr

I built it for myself, and I've been really happy with the results. It's interesting to see how different models fare against one another on everyday tasks. If you're running a local model, I've had decent luck with ollama_chat/phi3:medium, but I'm curious what others use. Cheers!

January 6, 2025 at 10:53PM
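The provider-agnostic part is essentially what LiteLLM's completion() gives you: one call signature across OpenAI, Anthropic, Ollama, and others. Below is a rough sketch of how a tool like this might wire it up; the prompt, the retry loop, and the suggest_command/run_with_retries helpers are my own illustrative assumptions, not Skeet's actual code:

```
# Illustrative sketch only, NOT Skeet's implementation.
import subprocess
from litellm import completion  # pip install litellm

def suggest_command(task: str, model: str = "ollama_chat/phi3:medium") -> str:
    # Changing `model` (e.g. "gpt-4o" or "claude-3-5-sonnet-20240620") is all
    # it takes to switch providers; LiteLLM normalizes the API behind completion().
    resp = completion(
        model=model,
        messages=[{"role": "user",
                   "content": f"Reply with only a shell command that does: {task}"}],
    )
    return resp.choices[0].message.content.strip()

def run_with_retries(task: str, attempts: int = 3) -> None:
    feedback = task
    for _ in range(attempts):
        cmd = suggest_command(feedback)
        # A real tool should confirm with the user before executing anything.
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        if result.returncode == 0:
            print(result.stdout)
            return
        # Feed the failure back so the model can adapt its next attempt.
        feedback = f"{task}\nPrevious attempt `{cmd}` failed with: {result.stderr}"
```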

Monday, January 6, 2025

Show HN: Discuo – Anonymous discussions with infinite branching and 24h lifespan https://ift.tt/9bYrnTt

I built Discuo, a unique discussion platform that combines:

- Infinite thread branching: conversations evolve naturally in multiple directions
- 24h post lifespan: all content auto-deletes after 24 hours
- No account needed: just start posting or commenting instantly
- Complete anonymity: no tracking, no personal data collection
- Minimalist design: distraction-free, focused on pure discussion

Originally created for developers to share progress and discuss code, it evolved into a platform covering various topics while maintaining its minimalist essence.

https://discuo.com

January 1, 2025 at 10:23PM
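The post describes the model but no internals, so here is a hypothetical sketch of the data model those two headline features imply (branching replies plus a per-post 24-hour clock). The Post/publish/sweep names are my own illustration, not Discuo's implementation:

```
# Hypothetical data-model sketch; NOT Discuo's actual code.
import time, uuid
from dataclasses import dataclass, field

LIFESPAN = 24 * 3600  # every post auto-deletes 24 hours after creation

@dataclass
class Post:
    text: str
    parent_id: str | None = None  # None -> root post; otherwise a branch
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: float = field(default_factory=time.time)
    # Note: no author field at all -- anonymity by omission.

posts: dict[str, Post] = {}

def publish(text: str, parent_id: str | None = None) -> Post:
    post = Post(text, parent_id)
    posts[post.id] = post
    return post

def sweep() -> None:
    # Run periodically. Each post carries its own clock, so a reply can
    # briefly outlive its parent before expiring itself.
    cutoff = time.time() - LIFESPAN
    for pid in [p.id for p in posts.values() if p.created_at < cutoff]:
        del posts[pid]

# Infinite branching: any post can fork into many replies, and each reply
# can fork again.
root = publish("Shipped v0.2 of my parser today")
publish("What language is it written in?", parent_id=root.id)
publish("Any benchmarks yet?", parent_id=root.id)  # a second branch off the root
```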

Show HN: Does your food have gluten? https://ift.tt/NPyVxgo

Hey folks! A couple of months ago, I finally figured out I'm gluten intolerant after months of chasing random symptoms and getting nowhere; the wild goose chase started with Djokovic's Serve To Win book and ended with finding out I'm highly gluten sensitive. I had to rethink everything I ate. Grocery shopping turned into ingredient detective work, and eating out became a gamble. I quickly realized I needed something to make this easier and built GlutenAI. It's a super simple tool to check if something's gluten-free: type in a food, a product, or even a common recipe name, and it'll let you know if you're good to go or should steer clear. Would love to get y'all's feedback, and let me know what else you'd like to see here: https://ift.tt/xAoLa1X

January 6, 2025 at 12:58AM

Sunday, January 5, 2025

Show HN: Lightweight Llama3 Inference Engine – CUDA C https://ift.tt/46AO3Mg

Hey, I recently took inspiration from llama.cpp, ollama, and many other similar tools that enable local inference of LLMs, and I just finished building a Llama inference engine for the 8B model in CUDA C. I wanted to explore my newfound interest in CUDA programming and my passion for machine learning.

This project only makes use of the native CUDA runtime API and cuda_fp16. Inference takes place in fp16, so it requires around 17-18 GB of VRAM (~16 GB for model params and some more for intermediary caches). It doesn't use cuBLAS or any similar libraries, since I wanted to be exposed to the least amount of abstraction. Hence, it isn't as optimized as a cuBLAS implementation or other inference engines like the ones that inspired the project.

A brief overview of the implementation: everything is written in CUDA C. The engine reads a .safetensors file of the model that you can pull from HuggingFace. The kernels for normalizations, skip connections, RoPE, and activation functions (SiLU) are fairly straightforward. For GEMM, I got as far as implementing tiled matrix multiplication with vectorized retrieval for each thread. The GEMM kernel is also written in such a way that the second matrix is not required to be pre-transposed while still achieving coalesced memory access to HBM (a sketch of this idea follows below).

Feel free to have a look at the project repo and try it out if you're interested. If you like what you see, feel free to star the repo too! I'd appreciate any feedback, good or constructive.

https://ift.tt/iTkjXMQ

January 5, 2025 at 03:37AM
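For readers who haven't written a tiled GEMM before, here is a minimal sketch of the pattern described above: shared-memory tiles, with both input matrices row-major so the second matrix needs no pre-transpose while warp loads from HBM stay coalesced. The tile size and fp32 types are my assumptions for clarity (the project itself runs in fp16 via cuda_fp16), and it omits the per-thread vectorized retrieval the author mentions:

```
// Illustrative tiled GEMM sketch (C = A * B, both row-major), NOT the repo's
// actual kernel. Adjacent threadIdx.x values read adjacent addresses of both
// A and B, so each warp's global loads coalesce without pre-transposing B.
#define TILE 16

__global__ void gemm_tiled(const float* A, const float* B, float* C,
                           int M, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;  // row of C owned by this thread
    int col = blockIdx.x * TILE + threadIdx.x;  // column of C owned by this thread
    float acc = 0.0f;

    for (int t = 0; t < (K + TILE - 1) / TILE; ++t) {
        int a_col = t * TILE + threadIdx.x;
        int b_row = t * TILE + threadIdx.y;
        As[threadIdx.y][threadIdx.x] = (row < M && a_col < K) ? A[row * K + a_col] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (b_row < K && col < N) ? B[b_row * N + col] : 0.0f;
        __syncthreads();

        for (int k = 0; k < TILE; ++k)          // accumulate one tile's worth
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    if (row < M && col < N)
        C[row * N + col] = acc;
}
```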

Show HN: Signify – FOSS tool to generate Email signatures (HTML and PNG) https://ift.tt/w0vzgOx

Signify is a free and open-source tool inspired by eSigna (esigna.vercel.app). It lets you create professional email signatures with ease and export them as HTML or PNG. Written with SvelteKit.

https://ift.tt/15jFTx2

January 5, 2025 at 01:54AM

Show HN: IssuePay – Get paid for open-source contributions https://ift.tt/ujCNZEA

Hi HN! I'm Mario, and I'm about to launch IssuePay. Problem: Open-source contribu...