facebookresearch / moodist
moodist
☆22 Updated last month
Alternatives and similar repositories for moodist
Users that are interested in moodist are comparing it to the libraries listed below
- ☆56 Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆170 Updated 4 months ago
- Can Language Models Solve Olympiad Programming? ☆119 Updated 9 months ago
- ☆33 Updated 9 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆132 Updated 10 months ago
- ☆142 Updated last month
- Repository for the paper Stream of Search: Learning to Search in Language ☆151 Updated 9 months ago
- Universal Neurons in GPT2 Language Models ☆30 Updated last year
- ☆197 Updated 2 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆84 Updated last year
- EvaByte: Efficient Byte-level Language Models at Scale ☆110 Updated 6 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆193 Updated last year
- ☆114 Updated 2 weeks ago
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆56 Updated 3 months ago
- Understand and test language model architectures on synthetic tasks. ☆234 Updated last month
- 📄 Small Batch Size Training for Language Models ☆63 Updated 3 weeks ago
- ☆23 Updated 9 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆132 Updated last year
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025) ☆29 Updated last month
- RLP: Reinforcement as a Pretraining Objective ☆195 Updated 3 weeks ago
- Fluid Language Model Benchmarking ☆19 Updated last month
- AlgoTune is a NeurIPS 2025 benchmark made up of 154 math, physics, and computer science problems. The goal is write code that solves each… ☆66 Updated last week
- ☆86 Updated last year
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆270 Updated this week
- Memory Mosaics are networks of associative memories working in concert to achieve a prediction task. ☆48 Updated 9 months ago
- ☆53 Updated last year
- $100K or 100 Days: Trade-offs when Pre-Training with Academic Resources ☆147 Updated last month
- ☆27 Updated last month
- Open source replication of Anthropic's Crosscoders for Model Diffing ☆59 Updated last year
- Applying SAEs for fine-grained control ☆24 Updated 10 months ago