fzp0424 / MT-R1-Zero
[EMNLP'25] Code for paper "MT-R1-Zero: Advancing LLM-based Machine Translation via R1-Zero-like Reinforcement Learning"
☆60 Updated 6 months ago
Alternatives and similar repositories for MT-R1-Zero
Users who are interested in MT-R1-Zero are comparing it to the libraries listed below.
- [ACL 2025] We introduce ScaleQuest, a scalable, novel and cost-effective data synthesis method to unleash the reasoning capability of LLM… ☆68 Updated last year
- [ACL'24] Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning ☆180 Updated 4 months ago
- Scaling Preference Data Curation via Human-AI Synergy ☆122 Updated 4 months ago
- SimpleDeepSearcher: Deep Information Seeking via Web-Powered Reasoning Trajectory Synthesis ☆109 Updated 5 months ago
- ☆169 Updated 6 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations ☆136 Updated 6 months ago
- ☆49 Updated last year
- The demo, code and data of FollowRAG ☆75 Updated 4 months ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L… ☆51 Updated last year
- Adapt an LLM model to a Mixture-of-Experts model using Parameter Efficient finetuning (LoRA), injecting the LoRAs in the FFN. ☆63 Updated last week
- Fantastic Data Engineering for Large Language Models ☆91 Updated 10 months ago
- ☆118 Updated last year
- ☆39 Updated 3 months ago
- ☆157 Updated 3 weeks ago
- Extrapolating RLVR to General Domains without Verifiers ☆176 Updated 2 months ago
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression ☆115 Updated 6 months ago
- [ACL 2024] The official codebase for the paper "Self-Distillation Bridges Distribution Gap in Language Model Fine-tuning". ☆135 Updated last year
- Test-time preference optimization (ICML 2025). ☆168 Updated 5 months ago
- ☆59 Updated last year
- The official repository of the Omni-MATH benchmark. ☆88 Updated 10 months ago
- Pre-trained, Scalable, High-performance Reward Models via Policy Discriminative Learning. ☆158 Updated last month
- [SIGIR'24] The official implementation code of MOELoRA. ☆184 Updated last year
- Official completion of “Training on the Benchmark Is Not All You Need”. ☆37 Updated 10 months ago
- ☆50 Updated 3 months ago
- A highly capable 2.4B lightweight LLM using only 1T pre-training data with all details. ☆220 Updated 3 months ago
- Reformatted Alignment ☆112 Updated last year
- The implementation of paper "LLM Critics Help Catch Bugs in Mathematics: Towards a Better Mathematical Verifier with Natural Language Fee… ☆37 Updated last year
- ☆96 Updated 2 years ago
- [ICLR 2025] 🧬 RegMix: Data Mixture as Regression for Language Model Pre-training (Spotlight) ☆174 Updated 8 months ago
- ☆84 Updated last year