neubig / minllama-assignment
☆100 Updated last year
Alternatives and similar repositories for minllama-assignment
Users that are interested in minllama-assignment are comparing it to the libraries listed below
- An assignment for building an NLP system from scratch. ☆27 Updated last year
- Advanced NLP, Spring 2025 https://cmu-l3.github.io/anlp-spring2025/ ☆71 Updated 10 months ago
- ☆190 Updated 2 years ago
- Notes and commented code for RLHF (PPO) ☆124 Updated last year
- ☆412 Updated last year
- minimal GRPO implementation from scratch ☆102 Updated 10 months ago
- Notes on Direct Preference Optimization ☆24 Updated last year
- Direct Preference Optimization from scratch in PyTorch ☆126 Updated 10 months ago
- Distributed training (multi-node) of a Transformer model ☆93 Updated last year
- ☆82 Updated last year
- ☆168 Updated 3 months ago
- NeurIPS 2024 tutorial on LLM Inference ☆47 Updated last year
- An extension of the nanoGPT repository for training small MOE models. ☆233 Updated 10 months ago
- A brief and partial summary of RLHF algorithms. ☆144 Updated 11 months ago
- ☆104 Updated 6 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI ☆114 Updated this week
- ☆140 Updated last year
- Minimalist BERT implementation assignment for CS11-711 ☆83 Updated 3 years ago
- ☆85 Updated 2 years ago
- Project 2 (Building Large Language Models) for Stanford CS324: Understanding and Developing Large Language Models (Winter 2022) ☆105 Updated 2 years ago
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆68 Updated 9 months ago
- Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts. ☆141 Updated last year
- Course Materials for Interpretability of Large Language Models (0368.4264) at Tel Aviv University ☆297 Updated 3 weeks ago
- Website ☆57 Updated 3 years ago
- Prune transformer layers ☆74 Updated last year
- Organize the Web: Constructing Domains Enhances Pre-Training Data Curation ☆76 Updated 9 months ago
- The official evaluation suite and dynamic data release for MixEval. ☆255 Updated last year
- NeurIPS Large Language Model Efficiency Challenge: 1 LLM + 1GPU + 1Day ☆260 Updated 2 years ago
- ☆112 Updated 7 months ago
- ☆160 Updated last year