stanford-cs336 / assignment5-alignment
☆28 · Updated last month
Alternatives and similar repositories for assignment5-alignment
Users who are interested in assignment5-alignment are comparing it to the repositories listed below.
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆128 · Updated 7 months ago
- An extension of the nanoGPT repository for training small MoE models. ☆160 · Updated 4 months ago
- Open source interpretability artefacts for R1. ☆154 · Updated 2 months ago
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM). ☆259 · Updated 3 weeks ago
- ☆96 · Updated 9 months ago
- Evaluation of LLMs on latest math competitions ☆140 · Updated 2 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆190 · Updated last year
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆177 · Updated 3 weeks ago
- A scalable asynchronous reinforcement learning implementation with in-flight weight updates. ☆129 · Updated this week
- A brief and partial summary of RLHF algorithms. ☆131 · Updated 4 months ago
- ☆41 · Updated 2 months ago
- ☆66 · Updated last year
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆77 · Updated last year
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆78 · Updated last month
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆147 · Updated 2 weeks ago
- 📖 This is a repository for organizing papers, codes, and other resources related to Latent Reasoning. ☆86 · Updated this week
- Replicating O1 inference-time scaling laws ☆89 · Updated 7 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆150 · Updated 3 months ago
- [COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models ☆114 · Updated 2 months ago
- RL Scaling and Test-Time Scaling (ICML'25) ☆108 · Updated 5 months ago
- ☆134 · Updated 3 months ago
- Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache ☆112 · Updated this week
- PyTorch building blocks for the OLMo ecosystem ☆258 · Updated last week
- Official repository for "Reinforcement Learning for Reasoning in Large Language Models with One Training Example" ☆323 · Updated this week
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆216 · Updated 3 weeks ago
- ☆198 · Updated 5 months ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) ☆108 · Updated 2 months ago
- ☆181 · Updated 2 months ago
- Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning" ☆318 · Updated 7 months ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆123 · Updated 7 months ago