Qihoo360 / Light-IF
☆39 · Updated 2 months ago
Alternatives and similar repositories for Light-IF
Users that are interested in Light-IF are comparing it to the libraries listed below
- a-m-team's exploration in large language modeling ☆192 · Updated 5 months ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning ☆153 · Updated 11 months ago
- Custom reward development on verl ☆128 · Updated 6 months ago
- Train your GRPO with zero dataset and low resources; 8bit/4bit/LoRA/QLoRA supported, multi-GPU supported ... ☆79 · Updated 6 months ago
- ☆54 · Updated last year
- ☆47 · Updated 9 months ago
- ☆382 · Updated last month
- ☆146 · Updated last year
- Scaling Preference Data Curation via Human-AI Synergy ☆128 · Updated 4 months ago
- The related works and background techniques about OpenAI o1 ☆221 · Updated 10 months ago
- Fantastic Data Engineering for Large Language Models ☆92 · Updated 10 months ago
- A visualization tool for deeper understanding and easier debugging of RLHF training. ☆265 · Updated 9 months ago
- ☆162 · Updated 10 months ago
- Llama-3-SynE: A Significantly Enhanced Version of Llama-3 with Advanced Scientific Reasoning and Chinese Language Capabilities | Continued pre-training to improve … ☆34 · Updated 5 months ago
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (… ☆404 · Updated last week
- CLongEval: A Chinese Benchmark for Evaluating Long-Context Large Language Models ☆45 · Updated last year
- A One-Stop Reward Model Platform ☆90 · Updated this week
- ☆115 · Updated last year
- ☆309 · Updated 5 months ago
- 🔧 Tool-Star: Empowering LLM-brained Multi-Tool Reasoner via Reinforcement Learning ☆289 · Updated 3 weeks ago
- ☆76 · Updated last week
- RLHF experiments on a single A100 40G GPU. Supports PPO, GRPO, REINFORCE, RAFT, RLOO, ReMax, and DeepSeek R1-Zero reproduction. ☆74 · Updated 9 months ago
- How to train an LLM tokenizer ☆154 · Updated 2 years ago
- ☆142 · Updated 3 weeks ago
- Clustering and Ranking: Diversity-preserved Instruction Selection through Expert-aligned Quality Estimation ☆90 · Updated last year
- ☆119 · Updated last year
- Finetuning LLaMA with RLHF (Reinforcement Learning with Human Feedback) based on DeepSpeed Chat ☆115 · Updated 2 years ago
- InsTag: A Tool for Data Analysis in LLM Supervised Fine-tuning ☆282 · Updated 2 years ago
- CFBench: A Comprehensive Constraints-Following Benchmark for LLMs ☆44 · Updated last year
- MiroRL is an MCP-first reinforcement learning framework for deep research agents. ☆172 · Updated 2 months ago