NVIDIA-NeMo / RL
Scalable toolkit for efficient model reinforcement
⭐ 478 · Updated this week
Alternatives and similar repositories for RL
Users that are interested in RL are comparing it to the libraries listed below
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ⭐ 255 · Updated this week
- Scalable toolkit for efficient model alignment ⭐ 820 · Updated this week
- SkyRL: A Modular Full-stack RL Library for LLMs ⭐ 574 · Updated this week
- Super-Efficient RLHF Training of LLMs with Parameter Reallocation ⭐ 305 · Updated 2 months ago
- A project to improve the skills of large language models ⭐ 456 · Updated this week
- LLM KV cache compression made easy ⭐ 535 · Updated this week
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ⭐ 224 · Updated this week
- Efficient LLM Inference over Long Sequences ⭐ 382 · Updated 2 weeks ago
- Ring attention implementation with flash attention ⭐ 800 · Updated last week
- 🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc. ⭐ 397 · Updated this week
- ⭐ 198 · Updated 4 months ago
- slime is an LLM post-training framework aiming for RL scaling ⭐ 553 · Updated this week
- Explorations into some recent techniques surrounding speculative decoding ⭐ 272 · Updated 6 months ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ⭐ 473 · Updated 5 months ago
- VeOmni: Scaling any-modality model training to any accelerator with a PyTorch-native training framework ⭐ 370 · Updated this week
- 🔥 A minimal training framework for scaling FLA models ⭐ 186 · Updated last month
- Triton-based implementation of Sparse Mixture of Experts ⭐ 224 · Updated 7 months ago
- Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" ⭐ 720 · Updated 3 months ago
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ⭐ 212 · Updated 10 months ago
- Megatron's multi-modal data loader ⭐ 217 · Updated this week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… (a toy sketch of this idea follows the list) ⭐ 341 · Updated 7 months ago
- ⭐ 485 · Updated this week
- PyTorch building blocks for the OLMo ecosystem ⭐ 258 · Updated this week
- KernelBench: Can LLMs Write GPU Kernels? A benchmark of Torch → CUDA problems ⭐ 468 · Updated this week
- Large Context Attention ⭐ 718 · Updated 5 months ago
- Parallel Scaling Law for Language Models: Beyond Parameter and Inference Time Scaling ⭐ 410 · Updated last month
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ⭐ 526 · Updated last month
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components ⭐ 205 · Updated this week
- Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs ⭐ 178 · Updated 3 weeks ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ⭐ 314 · Updated 2 months ago
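
The memory-layer entry above is the one item in this list that describes a mechanism rather than just naming a project: a large trainable key-value table is queried per token, so parameter count grows with the table size while per-token compute stays nearly flat. The sketch below is a minimal, hypothetical illustration of that idea in PyTorch, not the API of any repository listed here; the class name `ToyMemoryLayer` and all hyperparameters are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMemoryLayer(nn.Module):
    """Toy key-value memory layer: capacity scales with num_slots,
    but each token only mixes the values of its top-k matching keys."""

    def __init__(self, d_model: int, num_slots: int = 4096, k: int = 8):
        super().__init__()
        self.query_proj = nn.Linear(d_model, d_model)
        self.keys = nn.Parameter(torch.randn(num_slots, d_model) * 0.02)  # trainable keys
        self.values = nn.Embedding(num_slots, d_model)                    # trainable values
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        q = self.query_proj(x)                             # (B, S, D)
        # NOTE: this toy scores every slot; real memory layers use
        # product-key lookups so this step doesn't scale with num_slots.
        scores = q @ self.keys.t()                         # (B, S, num_slots)
        top_scores, top_idx = scores.topk(self.k, dim=-1)  # (B, S, k)
        weights = F.softmax(top_scores, dim=-1)            # mixture weights over k slots
        gathered = self.values(top_idx)                    # (B, S, k, D)
        out = (weights.unsqueeze(-1) * gathered).sum(-2)   # (B, S, D)
        return x + out                                     # residual add


if __name__ == "__main__":
    layer = ToyMemoryLayer(d_model=64)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
```

In this toy version the scoring step still touches every key; production memory layers replace it with a product-key lookup so that enlarging the table adds parameters without a matching FLOP increase, which is the trade-off the description above refers to.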