NousResearch / DisTrO
Distributed Training Over-The-Internet
☆893 · Updated 4 months ago
Alternatives and similar repositories for DisTrO:
Users who are interested in DisTrO compare it to the libraries listed below:
- OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training ☆478 · Updated 2 months ago
- prime is a framework for efficient, globally distributed training of AI models over the internet. ☆689 · Updated this week
- Procedural reasoning datasets ☆541 · Updated this week
- noise_step: Training in 1.58b With No Gradient Memory ☆217 · Updated 3 months ago
- DeMo: Decoupled Momentum Optimization ☆185 · Updated 4 months ago
- GRadient-INformed MoE ☆261 · Updated 6 months ago
- Code to train and evaluate Neural Attention Memory Models to obtain universally applicable memory systems for transformers. ☆301 · Updated 5 months ago
- Minimalistic 4D-parallelism distributed training framework for educational purposes ☆970 · Updated 3 weeks ago
- A comprehensive repository of reasoning tasks for LLMs (and beyond) ☆426 · Updated 6 months ago
- Muon optimizer: >30% sample-efficiency gain with <3% wall-clock overhead ☆539 · Updated last week
- Pretraining code for a large-scale depth-recurrent language model ☆709 · Updated 3 weeks ago
- Official implementation of Half-Quadratic Quantization (HQQ) ☆774 · Updated this week
- VPTQ, A Flexible and Extreme low-bit quantization algorithm ☆622 · Updated this week
- ☆846 · Updated 6 months ago
- [ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling ☆857 · Updated last month
- A complete end-to-end pipeline for LLM interpretability with sparse autoencoders (SAEs) using Llama 3.2, written in pure PyTorch and full… ☆605 · Updated last week
- ☆205 · Updated 2 months ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆1,162 · Updated this week
- ☆704 · Updated 2 weeks ago
- A Self-adaptation Framework 🐙 that adapts LLMs for unseen tasks in real-time! ☆1,020 · Updated 2 months ago
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… ☆311 · Updated 3 months ago
- Recipes to scale inference-time compute of open models ☆1,048 · Updated last month
- Aidan Bench attempts to measure <big_model_smell> in LLMs. ☆288 · Updated 3 weeks ago
- Code for BLT research paper ☆1,439 · Updated this week
- Simple Python library/structure to ablate features in LLMs which are supported by TransformerLens ☆441 · Updated 9 months ago
- smol models are fun too ☆91 · Updated 4 months ago
- A library for making RepE control vectors ☆562 · Updated 2 months ago
- PyTorch implementation of models from the Zamba2 series. ☆178 · Updated 2 months ago
- Official PyTorch implementation for "Large Language Diffusion Models" ☆1,350 · Updated 3 weeks ago
- System 2 Reasoning Link Collection ☆818 · Updated 2 weeks ago