dhcode-cpp / grpo-loss
☆24 Updated 2 months ago
Alternatives and similar repositories for grpo-loss
Users who are interested in grpo-loss are comparing it to the repositories listed below
- ☆18 Updated last week
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆102 Updated last year
- Official completion of “Training on the Benchmark Is Not All You Need”. ☆31 Updated 4 months ago
- ☆36 Updated last month
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling ☆47 Updated 4 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆46 Updated 5 months ago
- A simple implementation of ReasonGenRM. ☆12 Updated 3 weeks ago
- [NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models ☆58 Updated 5 months ago
- ☆63 Updated last week
- [ICLR 2025] LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization ☆36 Updated 2 months ago
- Official codebase for "GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning". ☆72 Updated 3 weeks ago
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning. COLM 2024 Accepted Paper ☆32 Updated 11 months ago
- Feeling confused about super alignment? Here is a reading list ☆42 Updated last year
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee… ☆39 Updated 9 months ago
- This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train… ☆56 Updated last year
- The official repository of the Omni-MATH benchmark. ☆82 Updated 4 months ago
- [ICLR 2024] CLEX: Continuous Length Extrapolation for Large Language Models ☆77 Updated last year
- [NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs ☆39 Updated 5 months ago
- The code for creating the iGSM datasets in papers "Physics of Language Models Part 2.1, Grade-School Math and the Hidden Reasoning Proces… ☆47 Updated 4 months ago
- We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLMs. ☆62 Updated 6 months ago
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆122 Updated last month
- PreAct: Prediction Enhances Agent's Planning Ability (Coling2025) ☆26 Updated 5 months ago
- ☆42 Updated 2 months ago
- Reformatted Alignment ☆114 Updated 7 months ago
- The code and data for the paper JiuZhang3.0 ☆44 Updated 11 months ago
- Open-Pandora: On-the-fly Control Video Generation ☆34 Updated 5 months ago
- ☆50 Updated 3 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆42 Updated 10 months ago
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context ☆32 Updated 9 months ago
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆84 Updated 7 months ago