stanford-cs336 / assignment5-alignment (☆33, updated 2 weeks ago)
Alternatives and similar repositories for assignment5-alignment
Users interested in assignment5-alignment are comparing it to the repositories listed below.
- Physics of Language Models, Part 4 (☆204, updated last week)
- Replicating O1 inference-time scaling laws (☆89, updated 8 months ago)
- The evaluation framework for training-free sparse attention in LLMs (☆86, updated last month)
- The simplest implementation of recent sparse-attention patterns for efficient LLM inference (☆82, updated 3 weeks ago)
- A brief and partial summary of RLHF algorithms (☆131, updated 5 months ago)
- Simple and efficient PyTorch-native transformer training and inference (batched) (☆78, updated last year)
- Survey: a collection of AWESOME papers and resources on the latest research in Mixture of Experts (☆128, updated 11 months ago)
- Code for the NeurIPS 2024 Spotlight "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" (☆81, updated 9 months ago)
- RL Scaling and Test-Time Scaling (ICML'25) (☆109, updated 6 months ago)
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study (☆52, updated 8 months ago)
- Code and configs for "Asynchronous RLHF: Faster and More Efficient RL for Language Models" (☆59, updated 3 months ago)
- Evaluation of LLMs on the latest math competitions (☆155, updated 2 weeks ago)
- An extension of the nanoGPT repository for training small MoE models (☆164, updated 4 months ago)
- [ICML 2025] Predictive Data Selection: The Data That Predicts Is the Data That Teaches (☆53, updated 5 months ago)
- [ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples (☆103, updated last week)
- Repo for "Z1: Efficient Test-time Scaling with Code" (☆63, updated 3 months ago)
- A simple extension on vLLM to help you speed up reasoning models without training (☆172, updated 2 months ago)
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …] (☆61, updated 9 months ago)
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… (☆140, updated 11 months ago)
- [NeurIPS'24] Official code for *🎯DART-Math: Difficulty-Aware Rejection Tuning for Mathematical Problem-Solving* (☆111, updated 7 months ago)
- A minimal GRPO implementation from scratch (☆94, updated 4 months ago)
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling (☆161, updated 2 weeks ago)
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models (☆219, updated last month)
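
Several of the entries above (the minimal GRPO implementation, the asynchronous RLHF code, the RLHF algorithms summary) center on group-relative policy optimization. As a rough orientation only, here is a sketch of the group-normalized advantage at the core of GRPO; the function name and epsilon value are illustrative, not taken from any of the listed repositories:

```python
import statistics

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages as in GRPO: each sampled completion's
    reward is normalized by the mean and (population) std of its group,
    so no learned value function / critic is needed."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# For a group of 4 completions sampled for one prompt:
advantages = grpo_advantages([1.0, 0.0, 1.0, 0.0])
# advantages ≈ [1.0, -1.0, 1.0, -1.0]; above-average completions get
# positive advantage, below-average get negative, and they sum to ~0.
```

These per-completion advantages are then plugged into a clipped PPO-style policy-gradient objective over the token log-probabilities, which is the part the listed implementations differ on.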