Joshua-Ren / Learning_dynamics_LLM
☆106 · Updated this week
Alternatives and similar repositories for Learning_dynamics_LLM
Users interested in Learning_dynamics_LLM are comparing it to the libraries listed below.
- ☆59 · Updated last month
- ☆43 · Updated last month
- ☆165 · Updated last month
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ☆72 · Updated 6 months ago
- ☆97 · Updated 2 months ago
- Code for "A Sober Look at Progress in Language Model Reasoning" paper ☆45 · Updated last week
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆67 · Updated 3 months ago
- [AAAI 2025 oral] Evaluating Mathematical Reasoning Beyond Accuracy ☆61 · Updated 4 months ago
- [ICLR 2025 Workshop] "Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models" ☆18 · Updated 2 weeks ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ☆38 · Updated 10 months ago
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. ☆119 · Updated last month
- AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper) ☆214 · Updated 3 weeks ago
- Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective ☆29 · Updated 3 months ago
- A brief and partial summary of RLHF algorithms. ☆128 · Updated 2 months ago
- ☆168 · Updated last month
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ☆149 · Updated 2 months ago
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆79 · Updated this week
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆76 · Updated 8 months ago
- ☆45 · Updated 6 months ago
- Code for "Reasoning to Learn from Latent Thoughts" ☆94 · Updated last month
- Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process! ☆52 · Updated last month
- Curation of resources for LLM mathematical reasoning, most of which are screened by @tongyx361 to ensure high quality and accompanied wit… ☆125 · Updated 10 months ago
- AdaRFT: Efficient Reinforcement Finetuning via Adaptive Curriculum Learning ☆31 · Updated last month
- A comprehensive collection of process reward models. ☆76 · Updated last week
- Model merging is a highly efficient approach for long-to-short reasoning. ☆46 · Updated last month
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision". ☆52 · Updated 5 months ago
- A Survey on the Honesty of Large Language Models ☆57 · Updated 5 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆120 · Updated 8 months ago
- Repo of paper "Free Process Rewards without Process Labels" ☆147 · Updated 2 months ago
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆138 · Updated 3 months ago