qhliu26 / Dive-into-Big-Model-TrainingLinks

📑 Dive into Big Model Training

☆114

Alternatives and similar repositories for Dive-into-Big-Model-Training

Users that are interested in Dive-into-Big-Model-Training are comparing it to the libraries listed below

Sorting:

RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆213Updated 11 months ago
hpcaitech / TensorNVMe
A Python library transfers PyTorch tensors between CPU and NVMe
☆118Updated 8 months ago
stanford-futuredata / stk
☆108Updated 11 months ago
thu-pacman / FasterMoE
☆85Updated 3 years ago
anyscale / llm-continuous-batching-benchmarks
☆120Updated last year
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆230Updated 8 months ago
feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆75Updated last year
teelinsan / parallel-decoding
Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
☆119Updated last year
epfml / dynamic-sparse-flash-attention
☆147Updated 2 years ago
yanring / Megatron-MoE-ModelZoo
Best practices for testing advanced Mixtral, DeepSeek, and Qwen series MoE models using Megatron Core MoE.
☆45Updated last week
haochengxi / Train_Transformers_with_INT4
☆154Updated 2 years ago
cli99 / llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
☆443Updated 3 months ago
FasterDecoding / REST
REST: Retrieval-Based Speculative Decoding, NAACL 2024
☆205Updated 8 months ago
thunlp / Ouroboros
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
☆109Updated 4 months ago
FlagOpen / FlagAttention
A collection of memory efficient attention operators implemented in the Triton language.
☆275Updated last year
sail-sg / zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
☆415Updated 3 months ago
MDK8888 / vllmini
A minimal implementation of vllm.
☆51Updated last year
tgale96 / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆107Updated 2 months ago
fanshiqing / grouped_gemm
PyTorch bindings for CUTLASS grouped GEMM.
☆134Updated 3 weeks ago
ISEEKYAN / mbridge
Bridge Megatron-Core to Hugging Face/Reinforcement Learning
☆74Updated this week
kssteven418 / LTP
[KDD'22] Learned Token Pruning for Transformers
☆98Updated 2 years ago
hpcaitech / PaLM-colossalai
Scalable PaLM implementation of PyTorch
☆190Updated 2 years ago
alexzhang13 / flashattention2-custom-mask
Triton implementation of FlashAttention2 that adds Custom Masks.
☆130Updated 11 months ago
lucidrains / speculative-decoding
Explorations into some recent techniques surrounding speculative decoding
☆278Updated 7 months ago
CalvinXKY / mfu_calculation
A simple calculation for LLM MFU.
☆42Updated 5 months ago
Hsword / Hetu
A high-performance distributed deep learning system targeting large-scale and automated distributed training. If you have any interests, …
☆117Updated last year
nbasyl / LLM-FP4
The official implementation of the EMNLP 2023 paper LLM-FP4
☆212Updated last year
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆207Updated this week
zhengzangw / Sequence-Scheduling
PyTorch implementation of paper "Response Length Perception and Sequence Scheduling: An LLM-Empowered LLM Inference Pipeline".
☆90Updated 2 years ago
Dao-AILab / grouped-latent-attention
☆123Updated 2 months ago