F2-Song / Weak-to-Strong-Decoding
The official implementation of "Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding"
☆23 · Updated 7 months ago
Alternatives and similar repositories for Weak-to-Strong-Decoding
Users who are interested in Weak-to-Strong-Decoding are comparing it to the repositories listed below.
- [COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing" ☆19 · Updated 10 months ago
- Reagent: Exploring Reasoning Reward Model for Agents ☆31 · Updated this week
- Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge ☆104 · Updated last week
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models ☆53 · Updated last year
- Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models ☆41 · Updated last year
- Code for "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" ☆57 · Updated last month
- Code for Heima ☆59 · Updated 9 months ago
- [EMNLP 2025] Distill Visual Chart Reasoning Ability from LLMs to MLLMs ☆59 · Updated 5 months ago
- [NeurIPS 2025] Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning" ☆30 · Updated 3 months ago
- The official implementation of Preference Data Reward-Augmentation. ☆18 · Updated 9 months ago
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression ☆132 · Updated 9 months ago
- JudgeLRM: Large Reasoning Models as a Judge ☆40 · Updated last week
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs ☆24 · Updated last year
- ☆47 · Updated 4 months ago
- A Recipe for Building LLM Reasoners to Solve Complex Instructions ☆29 · Updated 4 months ago
- [NeurIPS 2025@FoRLM] R1-Compress: Long Chain-of-Thought Compression via Chunk Compression and Search ☆17 · Updated 2 weeks ago
- This repo contains code for the paper "Both Text and Images Leaked! A Systematic Analysis of Data Contamination in Multimodal LLM" ☆17 · Updated 3 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 · Updated last year
- Improving Your Model Ranking on Chatbot Arena by Vote Rigging (ICML 2025) ☆26 · Updated 11 months ago
- The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions". ☆15 · Updated 5 months ago
- Code repository for the paper "The Inherent Limits of Pretrained LLMs: The Unexpected Convergence of Instruction Tuning and In-Context Le… ☆13 · Updated last year
- [ACL 2025 Findings] Implicit Reasoning in Transformers is Reasoning through Shortcuts ☆17 · Updated 10 months ago
- ☆21 · Updated 9 months ago
- ☆72 · Updated 7 months ago
- The official implementation of Cross-Task Experience Sharing (COPS) ☆29 · Updated last year
- Emergent Hierarchical Reasoning in LLMs/VLMs through Reinforcement Learning ☆60 · Updated 3 months ago
- ☆39 · Updated last month
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆63 · Updated last year
- ☆16 · Updated last year
- FastCuRL: Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient LLM Reasoning ☆56 · Updated 3 months ago