lsdefine / lsrl
Low ReSource Reinforcement Learning with CPU Offloading Training Support
☆81 Updated last month
Alternatives and similar repositories for lsrl
Users that are interested in lsrl are comparing it to the libraries listed below
- A series of technical reports on Slow Thinking with LLM ☆759 Updated 5 months ago
- A simple toolkit for benchmarking LLMs on mathematical reasoning tasks. 🧮✨ ☆273 Updated last year
- ☆48 Updated 11 months ago
- Awesome-Long2short-on-LRMs is a collection of state-of-the-art, novel, exciting long2short methods on large reasoning models. It contains… ☆258 Updated 5 months ago
- [TMLR 2025] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models ☆731 Updated 3 months ago
- [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning ☆512 Updated last year
- A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, Agent, and Beyond ☆342 Updated 2 weeks ago
- ☆57 Updated 8 months ago
- a-m-team's exploration in large language modeling ☆195 Updated 8 months ago
- [EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs ☆200 Updated 2 months ago
- LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment ☆395 Updated last year
- Related works and background techniques for OpenAI o1 ☆221 Updated last year
- A version of verl to support diverse tool use ☆860 Updated last month
- Official Repository of "Learning to Reason under Off-Policy Guidance" ☆406 Updated 4 months ago
- llm & rl ☆271 Updated 3 months ago
- ☆333 Updated 8 months ago
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆260 Updated 8 months ago
- ☆305 Updated 7 months ago
- ☆185 Updated 2 weeks ago
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆274 Updated last week
- A visualization tool for deeper understanding and easier debugging of RLHF training. ☆283 Updated 11 months ago
- Official code for the paper, "Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning" ☆153 Updated 3 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning. ☆414 Updated 6 months ago
- ☆554 Updated last year
- This is the repository of DEER, a Dynamic Early Exit in Reasoning method for Large Reasoning Language Models. ☆180 Updated 7 months ago
- Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs" ☆390 Updated last year
- A live reading list for LLM data synthesis (Updated to July, 2025). ☆449 Updated 5 months ago
- ☆214 Updated 11 months ago
- ☆328 Updated 8 months ago
- A tiny reproduction of DeepSeek-R1-Zero on two A100s. ☆83 Updated last year