Julia-LiuJ / NLFT
The official implementation of Natural Language Fine-Tuning
☆48Updated 4 months ago
Alternatives and similar repositories for NLFT
Users who are interested in NLFT are comparing it to the repositories listed below
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning"☆113Updated last week
- llm & rl☆115Updated this week
- Latest Advances on Long Chain-of-Thought Reasoning☆289Updated last month
- Generative AI Act II: Test Time Scaling Drives Cognition Engineering☆168Updated 3 weeks ago
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains…☆208Updated 2 weeks ago
- Using LLM to evaluate MMLU dataset.☆29Updated last year
- DPO training for Tongyi Qianwen (Qwen)☆47Updated 7 months ago
- This is a repo for showcasing using MCTS with LLMs to solve gsm8k problems☆76Updated last month
- OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning☆135Updated 4 months ago
- 🔥 How to efficiently and effectively compress the CoTs or directly generate concise CoTs during inference while maintaining the reasonin…☆41Updated 2 weeks ago
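- An efficient, fast arXiv paper crawler that downloads information on papers matching a specified time range, topic, and keywords to local storage, and translates their titles and abstracts into Chinese.☆100Updated 8 months ago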
- Paper list for Efficient Reasoning.☆425Updated this week
- ☆30Updated 9 months ago
- ☆95Updated 3 weeks ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆67Updated 3 months ago
- A research repo for experiments about Reinforcement Finetuning☆46Updated last month
- ☆80Updated 3 weeks ago
- ☆173Updated last month
- ☆153Updated last month
- Implementation for the research paper "Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision".☆52Updated 5 months ago
- A comprehensive collection of process reward models.☆76Updated last week
- This is the reading list for the survey "A Survey on the Optimization of LLM-based Agents". We will keep adding papers and improving the…☆93Updated 3 weeks ago
- 😎 A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond☆208Updated this week
- Using clash on Linux without sudo privileges☆101Updated 6 months ago
- Chain of Thoughts (CoT) is so hot! so long! We need short reasoning process!☆52Updated last month
- ☆377Updated 3 months ago
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment☆331Updated last year
- Reproduction of the complete process of DeepSeek-R1 on small-scale models, including Pre-training, SFT, and RL.☆25Updated 2 months ago
- Trains a LLaVA model with better Chinese support, and open-sources the training code and data.☆55Updated 8 months ago
- Full-parameter fine-tuning, LoRA fine-tuning, and QLoRA fine-tuning of Llama 3.☆195Updated 7 months ago