aeroplanepaper / GRPO-LEAD
☆11 · Updated this week
Alternatives and similar repositories for GRPO-LEAD:
Users who are interested in GRPO-LEAD are comparing it to the libraries listed below.
- Code for Merging Large Language Models ☆29 · Updated 8 months ago
- SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models https://arxiv.org/pdf/2411.02433 ☆25 · Updated 5 months ago
- ☆20 · Updated 2 months ago
- Code for "A Sober Look at Progress in Language Model Reasoning" paper ☆41 · Updated 2 weeks ago
- Mosaic IT: Enhancing Instruction Tuning with Data Mosaics ☆17 · Updated 2 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆20 · Updated 2 months ago
- ☆16 · Updated 9 months ago
- ☆15 · Updated 2 weeks ago
- ☆22 · Updated 10 months ago
- [NeurIPS 2024] Official code of β-DPO: Direct Preference Optimization with Dynamic β ☆43 · Updated 6 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆53 · Updated last week
- The code and data for the paper JiuZhang3.0 ☆43 · Updated 11 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆51 · Updated 2 years ago
- ☆35 · Updated last year
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆42 · Updated 6 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆58 · Updated 4 months ago
- Source code for "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts" (EMNLP 2023) ☆37 · Updated last year
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ☆37 · Updated 6 months ago
- Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025. ☆21 · Updated 2 months ago
- [ACL 2023] Code for paper “Tailoring Instructions to Student’s Learning Levels Boosts Knowledge Distillation” (https://arxiv.org/abs/2305.… ☆38 · Updated last year
- ☆14 · Updated last year
- ☆43 · Updated 3 weeks ago
- Codebase for Instruction Following without Instruction Tuning ☆34 · Updated 7 months ago
- An Easy-to-use Hallucination Detection Framework for LLMs. ☆58 · Updated last year
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ☆67 · Updated 2 months ago
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML 2023) ☆34 · Updated last year
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆58 · Updated last month
- The repository for our paper: Neighboring Perturbations of Knowledge Editing on Large Language Models ☆16 · Updated last year
- Mixture of Attention Heads ☆44 · Updated 2 years ago
- This is an official implementation of the Reward rAnked Fine-Tuning Algorithm (RAFT), also known as iterative best-of-n fine-tuning or re… ☆29 · Updated 7 months ago