Infini-AI-Lab / UMbreLLa
LLM Inference on consumer devices
☆81 · Updated this week
Alternatives and similar repositories for UMbreLLa:
Users who are interested in UMbreLLa are comparing it to the libraries listed below.
- ☆100 · Updated last month
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆190 · Updated 6 months ago
- KV cache compression for high-throughput LLM inference ☆109 · Updated this week
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆180 · Updated 2 months ago
- RWKV-7: Surpassing GPT ☆73 · Updated 2 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆113 · Updated last month
- An easy-to-understand framework for LLM samplers that rewind and revise generated tokens ☆120 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆66 · Updated this week
- Micro Llama is a small Llama-based model with 300M parameters, trained from scratch on a $500 budget ☆140 · Updated 10 months ago
- Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models ☆212 · Updated 9 months ago
- ☆122 · Updated 5 months ago
- ☆79 · Updated 2 months ago
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆221 · Updated last week
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs. ☆80 · Updated last week
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆241 · Updated 3 months ago
- ☆192 · Updated last week
- Testing LLM reasoning abilities with family relationship quizzes. ☆57 · Updated this week
- ☆48 · Updated 2 months ago
- 1.58-bit LLaMa model ☆80 · Updated 9 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 3 months ago
- Easily view and modify JSON datasets for large language models ☆69 · Updated 3 months ago
- PyTorch implementation of models from the Zamba2 series. ☆173 · Updated last week
- ☆110 · Updated 4 months ago
- ☆192 · Updated last month
- Inference of Mamba models in pure C ☆183 · Updated 11 months ago
- A pipeline parallel training script for LLMs. ☆121 · Updated this week
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆205 · Updated last month
- Fast parallel LLM inference for MLX ☆153 · Updated 6 months ago
- [ICLR2025] MagicPIG: LSH Sampling for Efficient LLM Generation ☆181 · Updated last month