facebookresearch / moodist
moodist
☆23 Updated last week
Alternatives and similar repositories for moodist
Users who are interested in moodist are comparing it to the repositories listed below
- ☆56 Updated last year
- Official Repo for InSTA: Towards Internet-Scale Training For Agents ☆55 Updated 5 months ago
- ☆33 Updated 11 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆111 Updated 7 months ago
- ☆201 Updated 3 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆86 Updated last year
- ☆144 Updated 3 months ago
- Can Language Models Solve Olympiad Programming? ☆123 Updated 10 months ago
- Repository for the paper Stream of Search: Learning to Search in Language ☆151 Updated 10 months ago
- [ICLR 2025] "Training LMs on Synthetic Edit Sequences Improves Code Synthesis" (Piterbarg, Pinto, Fergus) ☆19 Updated 10 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆135 Updated 11 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆179 Updated 5 months ago
- Official implementation of Regularized Policy Gradient (RPG) (https://arxiv.org/abs/2505.17508) ☆54 Updated last month
- ☆53 Updated last year
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆196 Updated last year
- ☆74 Updated last month
- ☆121 Updated last month
- ☆89 Updated last year
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆132 Updated last year
- ☆109 Updated last year
- ☆77 Updated 2 months ago
- Memory Mosaics are networks of associative memories working in concert to achieve a prediction task. ☆55 Updated 10 months ago
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆67 Updated 7 months ago
- Sparse Autoencoder Training Library ☆55 Updated 7 months ago
- Fluid Language Model Benchmarking ☆22 Updated 2 months ago
- some common Huggingface transformers in maximal update parametrization (µP) ☆87 Updated 3 years ago
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025) ☆30 Updated 2 months ago
- ☆27 Updated 2 months ago
- Replicating O1 inference-time scaling laws ☆90 Updated last year
- ☆31 Updated 8 months ago