abhisheknair10 / llama3.cu
Lightweight Llama 3 8B Inference Engine in CUDA C
☆53 · Mar 21, 2025 · Updated 10 months ago
Alternatives and similar repositories for llama3.cu
Users who are interested in llama3.cu are comparing it to the libraries listed below:
- SwiftLet is a lightweight Python framework for running open-source Large Language Models (LLMs) locally using safetensors ☆28 · Aug 6, 2025 · Updated 6 months ago
- Yet another frontend for LLM, written using .NET and WinUI 3 ☆10 · Sep 14, 2025 · Updated 5 months ago
- ☆12 · May 30, 2025 · Updated 8 months ago
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization. ☆17 · Aug 30, 2024 · Updated last year
- ☆15 · Feb 1, 2025 · Updated last year
- Inference Llama 2 with a model compiled to native code by TorchInductor ☆14 · Feb 8, 2024 · Updated 2 years ago
- Deploying full-stack on-prem deep research agent that can be run entirely on a local machine for $0! ☆30 · Nov 8, 2025 · Updated 3 months ago
- The BAZAAR challenges LLMs to navigate the double-auction marketplace, where buyers and sellers must make strategic decisions with incomp… ☆35 · Jul 30, 2025 · Updated 6 months ago
- LibreTranslate C++ bindings ☆18 · Aug 27, 2021 · Updated 4 years ago
- Offline-first, desktop AI assistant tailored for educators, enabling them to generate questions directly from source materials. ☆23 · Aug 2, 2025 · Updated 6 months ago
- ☆63 · Jul 10, 2025 · Updated 7 months ago
- A Python-based chat application utilizing a Local LLM to generate complex thought chains for various use cases such as product developmen… ☆20 · Sep 20, 2024 · Updated last year
- ☆19 · Oct 2, 2023 · Updated 2 years ago
- Deploy Apollo HF space locally ☆40 · Dec 16, 2024 · Updated last year
- ☆23 · Dec 9, 2025 · Updated 2 months ago
- A simple speech-to-text and text-to-speech AI chatbot that can be run fully offline. ☆45 · Jan 28, 2024 · Updated 2 years ago
- Service for testing out the new Qwen2.5 omni model ☆63 · Apr 30, 2025 · Updated 9 months ago
- Light WebUI for lm.rs ☆24 · Oct 14, 2024 · Updated last year
- ☆30 · Aug 27, 2024 · Updated last year
- A hackable, simple, and research-friendly GRPO Training Framework with high speed weight synchronization in a multinode environment. ☆36 · Aug 27, 2025 · Updated 5 months ago
- A miniaturized version of the Kimi-K2 model optimized for deployment on single H100 GPUs. ☆36 · Jul 16, 2025 · Updated 7 months ago
- A simple slimmed down mono slam implementation ☆31 · Jul 7, 2025 · Updated 7 months ago
- ☆49 · Sep 8, 2025 · Updated 5 months ago
- Demo and other details can be found here ☆34 · Mar 10, 2025 · Updated 11 months ago
- Running Microsoft's BitNet inference framework via FastAPI, Uvicorn and Docker. ☆36 · Jul 2, 2025 · Updated 7 months ago
- ☆33 · Aug 29, 2022 · Updated 3 years ago
- Spotlight-like client for Ollama on Windows. ☆28 · May 18, 2024 · Updated last year
- Ambrogio is a dev agent who tackles tech debt. Starting with automatic unit tests and docstring. ☆14 · Mar 30, 2025 · Updated 10 months ago
- Like system requirements lab but for LLMs ☆31 · Jun 10, 2023 · Updated 2 years ago
- Financial Analysis and Algorithmic Trading Strategies in Python ☆11 · Feb 16, 2023 · Updated 3 years ago
- 🚀 FlexLLama - Lightweight self-hosted tool for running multiple llama.cpp server instances with OpenAI v1 API compatibility and multi-GP… ☆50 · Nov 26, 2025 · Updated 2 months ago
- ☆34 · Aug 28, 2024 · Updated last year
- Sculpt: Structuring unstructured data with LLMs ☆38 · Sep 22, 2025 · Updated 4 months ago
- V.I.S.O.R., my in-development AI-powered voice assistant with integrated memory! ☆36 · Nov 20, 2025 · Updated 2 months ago
- ☆91 · Dec 9, 2025 · Updated 2 months ago
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆150 · Jan 7, 2026 · Updated last month
- Experience the power of AI with this free AI voice generator demo. Utilizing Deepgram and Groq, we transform text into voice seamlessly. … ☆37 · Jun 12, 2024 · Updated last year
- Pinecone Explorer for MacOS ☆47 · Feb 6, 2026 · Updated last week
- HeadlessPivot ☆29 · Jan 29, 2026 · Updated 2 weeks ago