character-ai / pipelining-sftLinks

Simple and efficient DeepSeek V3 SFT using pipeline parallel and expert parallel, with both FP8 and BF16 trainings

☆101

Alternatives and similar repositories for pipelining-sft

Users that are interested in pipelining-sft are comparing it to the libraries listed below

Sorting:

HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆218Updated this week
snowflakedb / ArcticTraining
ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs)
☆257Updated this week
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60Updated last year
tilde-research / MoMoE-impl
Memory optimized Mixture of Experts
☆69Updated 4 months ago
ServiceNow / Fast-LLM
Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research
☆265Updated this week
NVIDIA-NeMo / Automodel
Pytorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
☆194Updated this week
ServiceNow / PipelineRL
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
☆322Updated this week
facebookresearch / fastgen
Simple high-throughput inference library
☆150Updated 6 months ago
radixark / miles
☆344Updated this week
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆304Updated last month
SeunghyunSEO / optimized_hf_llama_class_for_training
☆48Updated last year
wdlctc / mini-s
☆53Updated last year
collinear-ai / spider
Streamline on-policy/off-policy distillation workflows in a few lines of code
☆67Updated this week
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated last year
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆46Updated last year
huggingface / kernels
Load compute kernels from the Hub
☆348Updated this week
google-deepmind / asyncdiloco
☆47Updated last year
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆100Updated 6 months ago
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆127Updated 2 years ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆107Updated 9 months ago
kevinwu23 / StanfordFineTuneBench
☆31Updated last year
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆249Updated 10 months ago
changjonathanc / flex-nano-vllm
FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.
☆320Updated last month
huggingface / kernel-builder
👷 Build compute kernels
☆192Updated this week
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆186Updated 10 months ago
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆111Updated 7 months ago
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆84Updated 2 weeks ago
facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆225Updated last week
withmartian / routerbench
The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System
☆151Updated last year
haizelabs / j1-micro
j1-micro (1.7B) & j1-nano (600M) are absurdly tiny but mighty reward models.
☆99Updated 4 months ago