Growing India News, world news, nation news, our news, people's news, grow news, entertainment, fashion, movies, tech, automobile and many more..
Sunday, January 5, 2025
Show HN: Lightweight Llama3 Inference Engine – CUDA C https://ift.tt/46AO3Mg
Show HN: Lightweight Llama3 Inference Engine – CUDA C Hey, recently I took inspiration from llama.cpp, ollama, and many other similar tools that enable inference of LLMs locally, and I just finished building a Llama inference engine for the 8B model in CUDA C. I recently wanted to explore my newly founded interest in CUDA programming and my passion for machine learning. This project only makes use of the native CUDA runtime api and cuda_fp16. The inference takes place in fp16, so it requires around 17-18GB of VRAM (~16GB for model params and some more for intermediary caches). It doesn’t use cuBLAS or any similar libraries since I wanted to be exposed to the least amount of abstraction. Hence, it isn’t as optimized as a cuBLAS implementation or other inference engines like the ones that inspired the project. ## *A brief overview of the implementation* I used CUDA C. It reads a .safetensor file of the model that you can pull from HuggingFace. The actual kernels are fairly straightforward for normalizations, skip connections, RoPE, and activation functions (SiLU). For GEMM, I got as far as implementing tiled matrix multiplication with vectorized retrieval for each thread. The GEMM kernel is also written in such a way that the second matrix is not required to be pre-transposed while still achieving coalesced memory access to HBM. Feel free to have a look at the project repo and try it out if you’re interested. If you like what you see, feel free to star the repo too! I highly appreciate any feedback, good or constructive. https://ift.tt/iTkjXMQ January 5, 2025 at 03:37AM
Subscribe to:
Post Comments (Atom)
Show HN: Wsgrok – one of many ngrok alternatives https://ift.tt/6uGQzrl
Show HN: Wsgrok – one of many ngrok alternatives I built it for myself because ngrok didn't let me add one more domain unless I paid $12...
- 
Show HN: An AI logo generator that can also generate SVG logos Hey everyone, I've spent the past 2 weeks building an AI logo generator, ...
- 
Breaking #FoxNews Alert : Number of dead rises after devastating tornadoes, Kentucky governor announces — R Karthickeyan (@RKarthickeyan1)...
- 
Show HN: Snap Scope – Visualize Lens Focal Length Distribution from EXIF Data https://ift.tt/yrqHZtDShow HN: Snap Scope – Visualize Lens Focal Length Distribution from EXIF Data Hey HN, I built this tool because I wanted to understand which...
 
No comments:
Post a Comment