yandex-research / swarm
Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient"
☆137 · Updated last year
Alternatives and similar repositories for swarm:
Users interested in swarm are comparing it to the libraries listed below.
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU Clusters ☆116 · Updated 2 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 4 months ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆266 · Updated last year
- ☆42 · Updated last year
- Docker image for NVIDIA GH200 machines, optimized for vLLM serving and HF Trainer fine-tuning ☆35 · Updated this week
- ☆92 · Updated 2 years ago
- ☆192 · Updated 2 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆215 · Updated 3 weeks ago
- ☆50 · Updated 3 months ago
- ☆100 · Updated 5 months ago
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆361 · Updated 11 months ago
- DeMo: Decoupled Momentum Optimization ☆181 · Updated 2 months ago
- ☆26 · Updated last year
- ☆44 · Updated 3 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆59 · Updated 4 months ago
- Experiments on speculative sampling with Llama models ☆124 · Updated last year
- Collection of kernels written in the Triton language ☆105 · Updated this week
- Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆81 · Updated last year
- Token Omission Via Attention ☆123 · Updated 4 months ago
- Explorations into some recent techniques surrounding speculative decoding ☆240 · Updated 2 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆196 · Updated 2 months ago
- ☆49 · Updated 11 months ago
- Code repository for the c-BTM paper ☆105 · Updated last year
- Fast Matrix Multiplications for Lookup-Table-Quantized LLMs ☆229 · Updated this week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆88 · Updated this week
- ☆181 · Updated this week
- RWKV-7: Surpassing GPT ☆79 · Updated 3 months ago
- PB-LLM: Partially Binarized Large Language Models ☆151 · Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆221 · Updated this week