NVIDIA-NeMo / RL
Scalable toolkit for efficient model reinforcement
☆438 · Updated this week
Alternatives and similar repositories for RL
Users interested in RL are comparing it to the libraries listed below.
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆253 · Updated this week
- Scalable toolkit for efficient model alignment ☆814 · Updated 3 weeks ago
- SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning ☆410 · Updated last week
- ☆193 · Updated 4 months ago
- VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework ☆353 · Updated last month
- LLM KV cache compression made easy ☆508 · Updated this week
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆219 · Updated 2 months ago
- Ring attention implementation with flash attention ☆782 · Updated last week
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆465 · Updated 4 months ago
- slime is an LLM post-training framework aimed at scaling RL. ☆328 · Updated this week
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ☆379 · Updated last week
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation ☆303 · Updated last month
- Efficient LLM Inference over Long Sequences ☆377 · Updated 2 weeks ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆209 · Updated this week
- A project to improve the skills of large language models ☆423 · Updated this week
- Explorations into some recent techniques surrounding speculative decoding ☆268 · Updated 5 months ago
- 🔥 A minimal training framework for scaling FLA models ☆170 · Updated last week
- ☆471 · Updated last week
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆210 · Updated 10 months ago
- Async pipelined version of Verl ☆100 · Updated 2 months ago
- Efficient Triton implementation of Native Sparse Attention ☆167 · Updated 3 weeks ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆295 · Updated 6 months ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long-Context Transformer Model Training and Inference ☆519 · Updated 3 weeks ago
- KernelBench: Can LLMs Write GPU Kernels? A benchmark with Torch -> CUDA problems ☆415 · Updated 2 weeks ago
- Triton-based implementation of Sparse Mixture of Experts ☆217 · Updated 6 months ago
- Megatron's multi-modal data loader ☆213 · Updated last week
- Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings) ☆277 · Updated 2 months ago
- Zero Bubble Pipeline Parallelism ☆398 · Updated last month
- A framework to study AI models in Reasoning, Alignment, and use of Memory (RAM) ☆255 · Updated this week
- Large Context Attention ☆716 · Updated 4 months ago