ash-neupane / multi-token-pred
Train toy models using a multi-token prediction objective
☆13 Updated last year
Alternatives and similar repositories for multi-token-pred
Users that are interested in multi-token-pred are comparing it to the libraries listed below
- ☆72 Updated 6 months ago
- official implementation of ICLR'2025 paper: Rethinking Bradley-Terry Models in Preference-based Reward Modeling: Foundations, Theory, and… ☆70 Updated 9 months ago
- A Sober Look at Language Model Reasoning ☆92 Updated last month
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆151 Updated 6 months ago
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆57 Updated 11 months ago
- ☆34 Updated 7 months ago
- ☆78 Updated last year
- Reinforcing General Reasoning without Verifiers ☆93 Updated 6 months ago
- ☆17 Updated 5 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆124 Updated 9 months ago
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces… ☆81 Updated 11 months ago
- ☆33 Updated last month
- Reinforced Multi-LLM Agents training ☆65 Updated 7 months ago
- ☆49 Updated 9 months ago
- [ICLR 2025] MiniPLM: Knowledge Distillation for Pre-Training Language Models ☆71 Updated last year
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI ☆114 Updated 2 months ago
- ☆53 Updated 11 months ago
- ☆107 Updated last year
- [NeurIPS 2024] Official code of $\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$ ☆50 Updated last year
- ☆34 Updated last year
- [NAACL 25 main] Awesome LLM Causal Reasoning is a collection of LLM-based causal reasoning works, including papers, codes and datasets. ☆111 Updated 3 months ago
- exploring whether LLMs perform case-based or rule-based reasoning ☆30 Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆151 Updated 10 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆70 Updated 10 months ago
- ☆50 Updated 11 months ago
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization ☆51 Updated 5 months ago
- Repo of paper "Free Process Rewards without Process Labels" ☆168 Updated 9 months ago
- ☆51 Updated 5 months ago
- Official implementation of Bootstrapping Language Models via DPO Implicit Rewards ☆46 Updated 8 months ago
- PhyX: Does Your Model Have the "Wits" for Physical Reasoning? ☆49 Updated 2 weeks ago