qhliu26 / Dive-into-Big-Model-Training
Dive into Big Model Training
☆114 · Updated 2 years ago
Alternatives and similar repositories for Dive-into-Big-Model-Training
Users interested in Dive-into-Big-Model-Training are comparing it to the libraries listed below.
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆214 · Updated last year
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆120 · Updated 9 months ago
- ☆110 · Updated last year
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core ☆64 · Updated this week
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning ☆97 · Updated this week
- ☆121 · Updated last year
- REST: Retrieval-Based Speculative Decoding, NAACL 2024 ☆207 · Updated 8 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆76 · Updated last year
- A minimal implementation of vllm ☆51 · Updated last year
- Triton-based implementation of Sparse Mixture of Experts ☆233 · Updated this week
- Implementation of FP8/INT8 Rollout for RL training without performance drop ☆155 · Updated this week
- PyTorch bindings for CUTLASS grouped GEMM ☆137 · Updated last month
- ☆85 · Updated 3 years ago
- ☆149 · Updated 2 years ago
- ☆154 · Updated 2 years ago
- Triton implementation of FlashAttention2 that adds Custom Masks ☆133 · Updated last year
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆120 · Updated last year
- A simple calculation for LLM MFU ☆44 · Updated 5 months ago
- Zero Bubble Pipeline Parallelism ☆421 · Updated 3 months ago
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components ☆208 · Updated last week
- A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, … ☆120 · Updated last year
- A collection of memory-efficient attention operators implemented in the Triton language ☆278 · Updated last year
- ☆123 · Updated 3 months ago
- Latency and Memory Analysis of Transformer Models for Training and Inference ☆449 · Updated 4 months ago
- PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline" ☆90 · Updated 2 years ago
- PyTorch bindings for CUTLASS grouped GEMM ☆110 · Updated 3 months ago
- A lightweight reinforcement learning framework that integrates seamlessly into your codebase, empowering developers to focus on algorithm… ☆61 · Updated this week
- Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding" ☆201 · Updated 6 months ago
- [NeurIPS'23] Speculative Decoding with Big Little Decoder ☆94 · Updated last year
- Explorations into some recent techniques surrounding speculative decoding ☆283 · Updated 8 months ago