LaurentMazare / mamba.rs
☆122 · Updated 4 months ago
Related projects:
- Inference Llama 2 in one file of zero-dependency, zero-unsafe Rust ☆36 · Updated last year
- Efficient platform for inference and serving of local LLMs, including an OpenAI-compatible API server ☆229 · Updated 3 weeks ago
- High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datasets ☆59 · Updated last month
- Low-rank adaptation (LoRA) for Candle ☆124 · Updated 3 weeks ago
- 1.58-bit LLM on Apple Silicon using MLX ☆97 · Updated 4 months ago
- Tutorial for porting PyTorch Transformer models to Candle (Rust) ☆235 · Updated last month
- Rust client for the Hugging Face Hub, aiming for a minimal subset of the features of the `huggingface-hub` Python package ☆138 · Updated 3 weeks ago
- A high-performance constrained decoding engine based on context-free grammars, in Rust ☆30 · Updated this week
- Unofficial Rust bindings to Apple's MLX framework ☆52 · Updated last week
- Implementation of LLaVA using Candle ☆12 · Updated 3 months ago
- LLM orchestrator built in Rust ☆261 · Updated 6 months ago
- Tensor library with autograd using only Rust's standard library ☆61 · Updated 2 months ago
- Inference Llama 2 in one file of pure Rust 🦀 ☆227 · Updated last year
- Inference of Mamba models in pure C ☆176 · Updated 6 months ago
- Tensors and dynamic neural networks in Mojo ☆125 · Updated last month
- Cookbook for building Rust Candle models ☆71 · Updated 9 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆217 · Updated 2 months ago
- LLaMA 7B with CUDA acceleration implemented in Rust; minimal GPU memory needed ☆100 · Updated last year
- Fast parallel LLM inference for MLX ☆118 · Updated 2 months ago
- High-level, optionally asynchronous Rust bindings to llama.cpp ☆161 · Updated 3 months ago
- A collection of optimisers for use with Candle ☆29 · Updated last month
- Comprehensive analysis of the performance differences between QLoRA, LoRA, and full fine-tunes ☆81 · Updated last year