mlc-ai / package
☆14 · Updated last month
Alternatives and similar repositories for package
Users interested in package are comparing it to the libraries listed below:
- AMD-related optimizations for transformer models ☆97 · Updated 3 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (see the usage sketch after this list) ☆93 · Updated this week
- RWKV models and examples powered by candle. ☆24 · Updated 3 weeks ago
- Fused Qwen3 MoE layer for faster training, compatible with Transformers, LoRA, bnb 4-bit quant, Unsloth. Also possible to train LoRA over… ☆231 · Updated this week
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆225 · Updated 3 weeks ago
- Implementing the BitNet model in Rust ☆44 · Updated last year
- Implementation of the RWKV language model in pure WebGPU/Rust. ☆338 · Updated 3 weeks ago
- xllamacpp - a Python wrapper of llama.cpp ☆73 · Updated this week
- llama.cpp to PyTorch converter ☆37 · Updated last year
- ☆19 · Updated last month
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆74 · Updated last year
- Inference of Mamba and Mamba2 models in pure C ☆196 · Updated 2 weeks ago
- RWKV-7: Surpassing GPT ☆104 · Updated last year
- GPU benchmark ☆74 · Updated last year
- LLM inference in C/C++ ☆104 · Updated last week
- AirLLM 70B inference with a single 4GB GPU ☆17 · Updated 7 months ago
- Hackable and optimized Transformers building blocks, supporting a composable construction. ☆34 · Updated 3 weeks ago
- ☆41 · Updated 10 months ago
- ☆120 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆238 · Updated this week
- ☆172 · Updated last week
- Bamboo-7B Large Language Model ☆93 · Updated last year
- A streamlined, user-friendly JSON streaming preprocessor, crafted in Python. ☆115 · Updated last year
- Simple high-throughput inference library ☆155 · Updated 8 months ago
- WASM bindings for the Hugging Face tokenizers library ☆34 · Updated 3 years ago
- Estimating hardware and cloud costs of LLMs and transformer projects ☆20 · Updated 3 weeks ago
- Samples of good AI-generated CUDA kernels ☆99 · Updated 8 months ago
- ☆30 · Updated 8 months ago
- JAX bindings for the flash-attention3 kernels ☆20 · Updated last month
- QuIP quantization ☆61 · Updated last year
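For context on the vLLM entry above, here is a minimal sketch of vLLM's offline batch-inference API, assuming vLLM is installed locally and a GPU is available; the prompts and the small facebook/opt-125m model are placeholders taken from vLLM's quickstart, not from any repository listed here.

```python
# Minimal offline batch inference with vLLM (sketch, not a full serving setup).
from vllm import LLM, SamplingParams

# Placeholder prompts; replace with your own inputs.
prompts = [
    "Explain KV-cache paging in one sentence.",
    "What is speculative decoding?",
]

# Sampling settings for generation.
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

# Loads the model and pre-allocates paged KV-cache blocks on the GPU.
llm = LLM(model="facebook/opt-125m")

# Batches the prompts and returns one RequestOutput per prompt.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```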