abacusai / gh200-llm

Docker image for NVIDIA GH200 machines, optimized for vLLM serving and HF Trainer finetuning

☆50

Alternatives and similar repositories for gh200-llm

Users interested in gh200-llm are comparing it to the libraries listed below.

Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 · Updated 10 months ago
HazyResearch / train-tk
train with kittens!
☆62 · Updated 11 months ago
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆277 · Updated last year
yandex-research / swarm
Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient"
☆144 · Updated last year
FasterDecoding / BitDelta
☆201 · Updated 10 months ago
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60 · Updated last year
astramind-ai / BitMat
An efficient implementation of the method proposed in "The Era of 1-bit LLMs"
☆154 · Updated 11 months ago
cloneofsimo / min-fsdp
☆91 · Updated last year
joey00072 / ohara
Collection of autoregressive model implementations
☆86 · Updated 5 months ago
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆247 · Updated 8 months ago
srush / triton-autodiff
Experiment in using Tangent to autodiff Triton
☆80 · Updated last year
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆128 · Updated 11 months ago
minyoungg / LTE
☆69 · Updated last year
open-lm-engine / lm-engine
LM engine is a library for pretraining/finetuning LLMs
☆69 · Updated last week
AnswerDotAI / cold-compress
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of…
☆147 · Updated last year
PrimeIntellect-ai / pccl
PCCL (Prime Collective Communications Library) implements fault-tolerant collective communications over IP
☆126 · Updated last month
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆164 · Updated 3 months ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆201 · Updated last year
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆100 · Updated 3 months ago
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆126 · Updated 2 years ago
bloc97 / DeMo
DeMo: Decoupled Momentum Optimization
☆192 · Updated 10 months ago
google-deepmind / asyncdiloco
☆46 · Updated last year
huggingface / kernel-builder
👷 Build compute kernels
☆155 · Updated this week
mayank31398 / ladder-residual-inference
☆14 · Updated 2 months ago
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆97 · Updated 10 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆172 · Updated 8 months ago
kernelmachine / cbtm
Code repository for the c-BTM paper
☆107 · Updated 2 years ago
wdlctc / mini-s
☆52 · Updated 11 months ago
kyleliang919 / Super_Muon
☆64 · Updated 6 months ago
xrsrke / pipegoose
Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
☆87 · Updated last year