dhcode-cpp / grpo-lossLinks
☆27Updated 3 months ago
Alternatives and similar repositories for grpo-loss
Users that are interested in grpo-loss are comparing it to the libraries listed below
Sorting:
- Official completion of “Training on the Benchmark Is Not All You Need”.☆34Updated 5 months ago
- ☆53Updated last week
- code for paper 《RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement》☆32Updated last year
- ☆46Updated this week
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train…☆57Updated last year
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme☆131Updated 2 months ago
- ☆38Updated 2 months ago
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆50Updated 3 weeks ago
- A simple implementation of ReasonGenRM.☆13Updated 2 months ago
- Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.☆35Updated 3 weeks ago
- ☆48Updated last year
- code for Scaling Laws of RoPE-based Extrapolation☆73Updated last year
- [ACL 2025] An official pytorch implement of the paper: Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement☆30Updated last month
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning".☆75Updated 3 weeks ago
- ☆36Updated 9 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models☆61Updated 6 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper☆32Updated last year
- A comrephensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward model…☆47Updated 2 weeks ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models☆78Updated last year
- [ICML'2024] Can AI Assistants Know What They Don't Know?☆81Updated last year
- [ACL-25] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs.☆63Updated 8 months ago
- Feeling confused about super alignment? Here is a reading list☆42Updated last year
- RLHF experiments on a single A100 40G GPU. Support PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, DeepSeek R1-Zero reproducing.☆64Updated 4 months ago
- Research without Re-search: Maximal Update Parametrization Yields Accurate Loss Prediction across Scales☆32Updated last year
- A Bilingual Role Evaluation Benchmark for Large Language Models☆41Updated last year
- ☆42Updated 2 months ago
- Implementations of online merging optimizers proposed by Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment☆75Updated last year
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding☆50Updated 6 months ago
- Pretrain、decay、SFT a CodeLLM from scratch 🧙♂️☆36Updated last year
- ☆44Updated 4 months ago