📑 Dive into Big Model Training
☆114 · Updated 2 years ago
Alternatives and similar repositories for Dive-into-Big-Model-Training
Users who are interested in Dive-into-Big-Model-Training are comparing it to the libraries listed below.
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆213 · Updated 11 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆118 · Updated 8 months ago
- ☆108 · Updated 11 months ago
- ☆85 · Updated 3 years ago
- ☆120 · Updated last year
- Triton-based implementation of Sparse Mixture of Experts. ☆230 · Updated 8 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆75 · Updated last year
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆119 · Updated last year
- ☆147 · Updated 2 years ago
- Best practices for testing advanced Mixtral, DeepSeek, and Qwen series MoE models using Megatron Core MoE. ☆45 · Updated last week
- ☆154 · Updated 2 years ago
- Latency and Memory Analysis of Transformer Models for Training and Inference ☆443 · Updated 3 months ago
- REST: Retrieval-Based Speculative Decoding, NAACL 2024 ☆205 · Updated 8 months ago
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆109 · Updated 4 months ago
- A collection of memory-efficient attention operators implemented in the Triton language. ☆275 · Updated last year
- Zero Bubble Pipeline Parallelism ☆415 · Updated 3 months ago
- A minimal implementation of vLLM. ☆51 · Updated last year
- PyTorch bindings for CUTLASS grouped GEMM. ☆107 · Updated 2 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆134 · Updated 3 weeks ago
- Bridge Megatron-Core to Hugging Face/Reinforcement Learning ☆74 · Updated this week
- [KDD'22] Learned Token Pruning for Transformers ☆98 · Updated 2 years ago
- Scalable PaLM implementation in PyTorch ☆190 · Updated 2 years ago
- Triton implementation of FlashAttention2 that adds custom masks. ☆130 · Updated 11 months ago
- Explorations into some recent techniques surrounding speculative decoding ☆278 · Updated 7 months ago
- A simple calculation for LLM MFU (a worked sketch follows this list). ☆42 · Updated 5 months ago
- A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interest, … ☆117 · Updated last year
- The official implementation of the EMNLP 2023 paper LLM-FP4 ☆212 · Updated last year
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆207 · Updated this week
- PyTorch implementation of the paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline". ☆90 · Updated 2 years ago
- ☆123 · Updated 2 months ago
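
For the MFU entry above: a minimal sketch of the Model FLOPs Utilization calculation, using the common approximation of 6 FLOPs per parameter per token for training (as popularized by the PaLM paper). The function name and example numbers here are illustrative assumptions, not taken from that repository.

```python
# A minimal MFU (Model FLOPs Utilization) sketch. Assumes the standard
# 6N-FLOPs-per-token training estimate, which ignores attention FLOPs
# and therefore understates work for very long sequences.

def model_flops_utilization(
    num_params: float,          # trainable parameters, e.g. 7e9 for a 7B model
    tokens_per_second: float,   # observed training throughput across all GPUs
    num_gpus: int,
    peak_flops_per_gpu: float,  # e.g. 312e12 for A100 BF16 dense
) -> float:
    # Training costs roughly 6 FLOPs per parameter per token
    # (2 for the forward pass, 4 for the backward pass).
    achieved_flops = 6 * num_params * tokens_per_second
    peak_flops = num_gpus * peak_flops_per_gpu
    return achieved_flops / peak_flops

# Example: a 7B model training at 100k tokens/s on 32 A100s -> ~42% MFU.
print(f"MFU: {model_flops_utilization(7e9, 1e5, 32, 312e12):.1%}")
```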