liangyuwang / Tiny-DeepSpeed
Tiny-DeepSpeed, a minimalistic re-implementation of the DeepSpeed library
☆14Updated 3 weeks ago
Alternatives and similar repositories for Tiny-DeepSpeed
Users who are interested in Tiny-DeepSpeed are comparing it to the libraries listed below.
- Trinity-RFT is a general-purpose, flexible and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (…☆140Updated this week
- slime is an LLM post-training framework aiming for RL scaling.☆596Updated this week
- ☆64Updated 7 months ago
- PoC for "SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning" [arXiv '25]☆41Updated this week
- [ICLR 2025] PEARL: Parallel Speculative Decoding with Adaptive Draft Length☆93Updated 3 months ago
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"☆296Updated last week
- ☆110Updated last month
- ☆194Updated 3 months ago
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation☆306Updated 2 months ago
- ☆205Updated 4 months ago
- DeepSeek Native Sparse Attention PyTorch implementation☆73Updated 4 months ago
- Reproducing R1 for Code with Reliable Rewards☆237Updated 2 months ago
- ☆241Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning☆228Updated 2 months ago
- A flexible and efficient training framework for large-scale alignment tasks☆388Updated last week
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)☆285Updated 2 months ago
- A Comprehensive Survey on Long Context Language Modeling☆163Updated last week
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…☆74Updated 3 weeks ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning☆187Updated 3 months ago
- Official Repository of "Learning to Reason under Off-Policy Guidance"☆251Updated last month
- Efficient Mixture of Experts for LLM Paper List☆82Updated 7 months ago
- ☆140Updated 2 weeks ago
- [ACL 2025 main] FR-Spec: Frequency-Ranked Speculative Sampling☆36Updated this week
- A repo showcasing the use of MCTS with LLMs to solve GSM8K problems☆85Updated 3 months ago
- TokenSkip: Controllable Chain-of-Thought Compression in LLMs☆166Updated 2 weeks ago
- qwen-nsa☆68Updated 3 months ago
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning☆77Updated 5 months ago
- ☆65Updated 3 months ago
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs☆179Updated 3 weeks ago
- Research Code for preprint "Optimizing Test-Time Compute via Meta Reinforcement Finetuning".☆99Updated this week