okuvshynov / llama_duo
asynchronous/distributed speculative evaluation for llama3
☆39 Updated 8 months ago
Alternatives and similar repositories for llama_duo:
Users who are interested in llama_duo are comparing it to the repositories listed below.
- Inference of Mamba models in pure C ☆187 Updated last year
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆132 Updated last week
- The Finite Field Assembly Programming Language ☆36 Updated 2 weeks ago
- Python bindings for ggml ☆140 Updated 7 months ago
- GGML implementation of BERT model with Python bindings and quantization. ☆56 Updated last year
- Experiments with BitNet inference on CPU ☆53 Updated last year
- LLM training in simple, raw C/CUDA ☆92 Updated 11 months ago
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆47 Updated last month
- High-Performance SGEMM on CUDA devices ☆90 Updated 3 months ago
- iterate quickly with llama.cpp hot reloading. use the llama.cpp bindings with bun.sh ☆48 Updated last year
- Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024 ☆179 Updated last year
- General purpose GPU compute framework built on Vulkan to support 1000s of cross vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). … ☆46 Updated 2 months ago
- extensible collectives library in triton ☆85 Updated 3 weeks ago
- TORCH_LOGS parser for PT2 ☆37 Updated this week
- Explore training for quantized models ☆17 Updated 3 months ago
- Thin wrapper around GGML to make life easier ☆24 Updated this week
- minimal C implementation of speculative decoding based on llama2.c ☆22 Updated 9 months ago
- First token cutoff sampling inference example ☆30 Updated last year
- GGUF parser in Python ☆26 Updated 8 months ago
- Train your own small bitnet model ☆67 Updated 6 months ago
- Tenstorrent's MLIR Based Compiler. We aim to enable developers to run AI on all configurations of Tenstorrent hardware, through an open-s… ☆43 Updated this week
- Port of Suno AI's Bark in C/C++ for fast inference ☆52 Updated last year
- ☆13 Updated 10 months ago
- Because it's there. ☆16 Updated 7 months ago
- moondream in zig. ☆63 Updated 2 weeks ago
- Profile your CoreML models directly from Python 🐍 ☆27 Updated 6 months ago
- ☆68 Updated last month
- Perplexity GPU Kernels ☆251 Updated this week
- Learning about CUDA by writing PTX code. ☆128 Updated last year
- ☆46 Updated 9 months ago