MingLiiii / Layer_Gradient
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective
☆63Updated last month
Alternatives and similar repositories for Layer_Gradient:
Users who are interested in Layer_Gradient are comparing it to the repositories listed below.
- [NeurIPS 2024] The official implementation of paper: Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs.☆112Updated 3 weeks ago
- Co-LLM: Learning to Decode Collaboratively with Multiple Language Models☆111Updated 11 months ago
- ☆91Updated last month
- [ICLR 2025] SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction☆66Updated 3 weeks ago
- Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models☆79Updated last week
- [COLING'25] Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?☆72Updated 2 months ago
- ☆49Updated last month
- Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling☆99Updated 2 months ago
- Code for Paper: Teaching Language Models to Critique via Reinforcement Learning☆88Updated last month
- Repo of paper "Free Process Rewards without Process Labels"☆140Updated last month
- ☆85Updated 3 weeks ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆83Updated 2 weeks ago
- Code for "Reasoning to Learn from Latent Thoughts"☆85Updated 2 weeks ago
- In-Context Sharpness as Alerts: An Inner Representation Perspective for Hallucination Mitigation (ICML 2024)☆57Updated last year
- Official repository for paper: O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning☆66Updated last month
- Large Language Models Can Self-Improve in Long-context Reasoning☆67Updated 4 months ago
- Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering☆57Updated 4 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024)☆107Updated last year
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆89Updated last month
- The official implementation of "Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks"☆53Updated 11 months ago
- The official implementation of Self-Exploring Language Models (SELM)☆63Updated 10 months ago
- A curated list of awesome LLM Inference-Time Self-Improvement (ITSI, pronounced "itsy") papers from our recent survey: A Survey on Large …☆75Updated 3 months ago
- ☆105Updated 2 months ago
- Official repository for Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning [ICLR 2025]☆43Updated 2 months ago
- This is the implementation of LeCo☆32Updated 2 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆175Updated 3 weeks ago
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models☆40Updated 4 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI☆108Updated this week
- ☆59Updated 7 months ago
- [EMNLP Findings 2024 & ACL 2024 NLRSE Oral] Enhancing Mathematical Reasonin…☆49Updated 11 months ago