thelongestusernameofall / 360-LLaMA-Factory
adds Sequence Parallelism into LLaMA-Factory
☆9Updated 5 months ago
Alternatives and similar repositories for 360-LLaMA-Factory
Users who are interested in 360-LLaMA-Factory are comparing it to the libraries listed below.
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"☆368Updated 4 months ago
- ☆210Updated last week
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models☆264Updated 8 months ago
- xVerify: Efficient Answer Verifier for Reasoning Model Evaluations☆106Updated last month
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆141Updated 5 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆123Updated this week
- A series of technical reports on Slow Thinking with LLM☆685Updated this week
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…☆370Updated 9 months ago
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains…☆221Updated this week
- Related works and background techniques on OpenAI o1☆221Updated 5 months ago
- ☆141Updated last year
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning☆531Updated last week
- ☆540Updated 5 months ago
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models.☆127Updated 2 months ago
- verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for paper "Group-in…☆270Updated this week
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨☆220Updated last year
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆215Updated 3 weeks ago
- Official Repository of "Learning to Reason under Off-Policy Guidance"☆212Updated this week
- ☆210Updated 2 weeks ago
- ☆202Updated 3 months ago
- ☆108Updated 6 months ago
- Chain of Thought (CoT) is so hot! So long! We need a short reasoning process!☆54Updated 2 months ago
- Real-time updated, fine-grained reading list on LLM-synthetic-data.🔥☆259Updated 4 months ago
- ☆330Updated 4 months ago
- A visualization tool for deeper understanding and easier debugging of RLHF training.☆203Updated 3 months ago
- A survey of long-context LLMs from four perspectives: architecture, infrastructure, training, and evaluation☆50Updated 2 months ago
- A journey to real multimodal R1! We are running large-scale experiments☆306Updated 3 weeks ago
- ☆151Updated last month
- Minimal-cost training of 0.5B R1-Zero☆734Updated 3 weeks ago
- ☆63Updated 6 months ago