dhcode-cpp / grpo-loss
☆22 Updated last month
Alternatives and similar repositories for grpo-loss:
Users who are interested in grpo-loss are comparing it to the libraries listed below:
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆47 Updated 3 months ago
- Official completion of “Training on the Benchmark Is Not All You Need”. ☆31 Updated 3 months ago
- ☆36 Updated 7 months ago
- RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and DeepSeek R1-Zero reproduction. ☆54 Updated 2 months ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆102 Updated last year
- ☆46 Updated 10 months ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆58 Updated 4 months ago
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train… ☆56 Updated last year
- ☆32 Updated last week
- Feeling confused about super alignment? Here is a reading list ☆42 Updated last year
- ☆107 Updated 3 weeks ago
- Code for the paper "RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement" ☆31 Updated last year
- ☆38 Updated last month
- Automatic prompt optimization framework for multi-step agent tasks. ☆29 Updated 5 months ago
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems ☆72 Updated last month
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee… ☆39 Updated 8 months ago
- ☆98 Updated 6 months ago
- a-m-team's exploration in large language modeling ☆49 Updated 3 weeks ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆112 Updated 2 weeks ago
- 1.4B sLLM for Chinese and English - HammerLLM🔨 ☆44 Updated last year
- PreAct: Prediction Enhances Agent's Planning Ability (COLING 2025) ☆26 Updated 4 months ago
- ☆81 Updated last year
- Implementations of the online merging optimizers proposed in "Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment" ☆75 Updated 10 months ago
- Copies the MLP of Llama 3 8 times as 8 experts, creates a router with random initialization, and adds a load-balancing loss to construct an 8x8b Mo… ☆26 Updated 9 months ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆32 Updated 10 months ago
- ☆104 Updated last year
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆61 Updated 5 months ago
- [COLING 2025] ToolEyes: Fine-Grained Evaluation for Tool Learning Capabilities of Large Language Models in Real-world Scenarios ☆65 Updated 4 months ago
- The code and data for the paper JiuZhang3.0 ☆43 Updated 10 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆76 Updated last year