thelongestusernameofall / 360-LLaMA-Factory
Adds sequence parallelism to LLaMA-Factory
☆9Updated 3 months ago
Alternatives and similar repositories for 360-LLaMA-Factory:
Users who are interested in 360-LLaMA-Factory are comparing it to the libraries listed below.
- ☆405Updated this week
- ☆518Updated 3 months ago
- [NAACL'24] Self-data filtering of LLM instruction-tuning data using a novel perplexity-based difficulty score, without using any other mo…☆361Updated 7 months ago
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"☆364Updated 3 months ago
- Paper list for Efficient Reasoning.☆403Updated this week
- [ACL2024] T-Eval: Evaluating Tool Utilization Capability of Large Language Models Step by Step☆267Updated last year
- ☆326Updated 2 months ago
- ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)☆619Updated 3 months ago
- A series of technical reports on Slow Thinking with LLM☆651Updated 2 weeks ago
- Real-time updated, fine-grained reading list on LLM-synthetic-data.🔥☆253Updated 3 months ago
- R1-searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning☆477Updated last week
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains…☆190Updated last week
- Related works and background techniques for OpenAI o1☆221Updated 3 months ago
- Codes and Data for Scaling Relationship on Learning Mathematical Reasoning with Large Language Models☆256Updated 7 months ago
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models.☆240Updated 5 months ago
- minimal-cost for training 0.5B R1-Zero☆706Updated this week
- Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]☆552Updated 4 months ago
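- Welcome to the "LLM-travel" repository! Explore the mysteries of large language models (LLMs) 🚀. Dedicated to deeply understanding, discussing, and implementing the techniques, principles, and applications related to large models.☆316Updated 9 months ago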
- [ACL2024 Findings] Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models☆346Updated last year
- ☆273Updated 9 months ago
- ☆673Updated last week
- Latest Advances on Long Chain-of-Thought Reasoning☆241Updated 2 weeks ago
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆133Updated 4 months ago
- Large Language Models (LLMs) of Code☆17Updated 2 years ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆107Updated this week
- Awesome RL-based LLM Reasoning☆450Updated 2 weeks ago
- ☆179Updated 2 weeks ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨☆205Updated last year
- AN O1 REPLICATION FOR CODING☆334Updated 4 months ago
- Awesome RL Reasoning Recipes ("Triple R")☆470Updated this week