PrimeIntellect-ai / OpenDiloco
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
☆523 · Updated 6 months ago
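For orientation, the low-communication scheme the title refers to (DiLoCo) lets each worker run many local optimizer steps and communicate only occasionally, applying an outer optimizer to the averaged pseudo-gradient (the parameter delta accumulated since the last synchronization). The sketch below is a minimal single-worker illustration of that pattern, not OpenDiLoCo's actual code; the model, data loader, step count, and learning rates are placeholders, and the cross-worker all-reduce is indicated only as a comment.

```python
import torch
import torch.nn.functional as F

def diloco_round(model, outer_opt, loader, inner_steps=500, inner_lr=1e-4):
    """One illustrative DiLoCo-style round: many local AdamW steps, then one
    outer step on the pseudo-gradient. `outer_opt` (e.g. SGD with Nesterov
    momentum over model.parameters()) is created once by the caller so its
    momentum persists across rounds. In a multi-worker run the pseudo-gradients
    would be all-reduced before the outer step -- the only communication
    needed per round."""
    # Snapshot of the synchronized parameters at the start of the round.
    start = [p.detach().clone() for p in model.parameters()]

    inner_opt = torch.optim.AdamW(model.parameters(), lr=inner_lr)
    data_iter = iter(loader)
    for _ in range(inner_steps):                 # inner loop: purely local
        x, y = next(data_iter)
        loss = F.cross_entropy(model(x), y)
        inner_opt.zero_grad()
        loss.backward()
        inner_opt.step()

    # Pseudo-gradient = start-of-round params minus locally updated params.
    for p, p0 in zip(model.parameters(), start):
        p.grad = p0 - p.detach()                 # all_reduce(p.grad) would go here
    with torch.no_grad():                        # rewind to the synchronized point...
        for p, p0 in zip(model.parameters(), start):
            p.copy_(p0)
    outer_opt.step()                             # ...then apply the outer update
    outer_opt.zero_grad()
```

A caller would build the outer optimizer once, e.g. `torch.optim.SGD(model.parameters(), lr=0.7, momentum=0.9, nesterov=True)` (Nesterov momentum on pseudo-gradients is the choice reported in the DiLoCo paper), and invoke `diloco_round` repeatedly.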
Alternatives and similar repositories for OpenDiloco
Users interested in OpenDiloco are comparing it to the libraries listed below:
- prime is a framework for efficient, globally distributed training of AI models over the internet. ☆786 · Updated 2 months ago
- Distributed Training Over-The-Internet ☆951 · Updated 2 months ago
- Decentralized RL Training at Scale ☆403 · Updated this week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… (a minimal sketch of the idea follows this list) ☆343 · Updated 7 months ago
- VPTQ, a flexible and extreme low-bit quantization algorithm ☆648 · Updated 3 months ago
- ☆216 · Updated 6 months ago
- DFloat11: Lossless LLM Compression for Efficient GPU Inference ☆504 · Updated this week
- Efficient LLM Inference over Long Sequences ☆387 · Updated last month
- Official implementation of Half-Quadratic Quantization (HQQ) ☆856 · Updated this week
- ☆556 · Updated 11 months ago
- Scalable and robust tree-based speculative decoding algorithm ☆354 · Updated 6 months ago
- [ICML 2024] CLLMs: Consistency Large Language Models ☆397 · Updated 8 months ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆277 · Updated last year
- Advanced Quantization Algorithm for LLMs and VLMs, with support for CPU, Intel GPU, CUDA and HPU. Seamlessly integrated with Torchao, Tra… ☆574 · Updated this week
- LLM KV cache compression made easy ☆570 · Updated this week
- Beyond Language Models: Byte Models are Digital World Simulators ☆326 · Updated last year
- Reference implementation of Megalodon 7B model ☆524 · Updated 2 months ago
- ☆549 · Updated 9 months ago
- Muon is Scalable for LLM Training ☆1,258 · Updated last week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆377 · Updated this week
- OLMoE: Open Mixture-of-Experts Language Models ☆830 · Updated 4 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated 9 months ago
- Pretraining and inference code for a large-scale depth-recurrent language model ☆810 · Updated 3 weeks ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆324 · Updated 3 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆512 · Updated this week
- GRadient-INformed MoE ☆264 · Updated 10 months ago
- ☆864 · Updated last year
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLMs' inference, approximate and dynamic sparse calculation of the attention… ☆1,086 · Updated this week
- noise_step: Training in 1.58b With No Gradient Memory ☆220 · Updated 7 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆198 · Updated last year
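To make the memory-layer item above concrete, here is a minimal, hypothetical sketch of a sparse key-value memory layer; the class and parameter names are illustrative and not taken from that repository. Each token reads only its top-k memory slots, so the extra parameters scale with the memory size while the activated compute per token scales with k. (Production designs such as product-key memories also factorize the keys so even the scoring step avoids touching every key; that refinement is omitted here, so this sketch still scores all keys.)

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMemoryLayer(nn.Module):
    """Illustrative trainable key-value memory: a query scores the memory keys,
    only the top-k entries are activated, and their values are mixed with
    softmax weights. Parameters grow with num_keys; activated compute per
    token grows only with topk."""

    def __init__(self, d_model, num_keys=4096, key_dim=128, topk=8):
        super().__init__()
        self.query = nn.Linear(d_model, key_dim)
        self.keys = nn.Parameter(torch.randn(num_keys, key_dim) * 0.02)
        self.values = nn.Embedding(num_keys, d_model)   # the "extra parameters"
        self.topk = topk

    def forward(self, x):                     # x: (batch, seq, d_model)
        q = self.query(x)                     # (batch, seq, key_dim)
        scores = q @ self.keys.t()            # (batch, seq, num_keys)
        top_scores, top_idx = scores.topk(self.topk, dim=-1)
        weights = F.softmax(top_scores, dim=-1)          # (batch, seq, topk)
        selected = self.values(top_idx)                  # (batch, seq, topk, d_model)
        return (weights.unsqueeze(-1) * selected).sum(dim=-2)
```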