changjonathanc / flex-nano-vllm

FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.

☆303

Alternatives and similar repositories for flex-nano-vllm

Users who are interested in flex-nano-vllm are comparing it to the libraries listed below.

huggingface / picotron_tutorial
☆225 Updated last month
wolfecameron / nanoMoE
An extension of the nanoGPT repository for training small MOE models.
☆212 Updated 8 months ago
huggingface / kernels
Load compute kernels from the Hub
☆327 Updated last week
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆196 Updated 5 months ago
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆300 Updated 3 weeks ago
gpu-mode / profiling-cuda-in-torch
☆178 Updated last year
thinking-machines-lab / batch_invariant_ops
☆907 Updated 2 weeks ago
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆272 Updated 2 weeks ago
mingyin0312 / RLFromScratch
☆451 Updated 2 months ago
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆356 Updated 11 months ago
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆346 Updated 6 months ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆192 Updated last year
meta-pytorch / torchft
Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)
☆452 Updated last week
NVIDIA / Star-Attention
Efficient LLM Inference over Long Sequences
☆390 Updated 4 months ago
allenai / OLMo-core
PyTorch building blocks for the OLMo ecosystem
☆319 Updated this week
LambdaLabsML / distributed-training-guide
Best practices & guides on how to write distributed pytorch training code
☆536 Updated 3 weeks ago
microsoft / dion
Dion optimizer algorithm
☆384 Updated this week
meta-pytorch / torchforge
PyTorch-native post-training at scale
☆532 Updated this week
Quentin-Anthony / nanoMPI
Simple MPI implementation for prototyping or learning
☆288 Updated 3 months ago
tokenbender / avataRL
rl from zero pretrain, can it be done? yes.
☆280 Updated last month
huggingface / kernel-builder
👷 Build compute kernels
☆178 Updated this week
PrimeIntellect-ai / prime-rl
Async RL Training at Scale
☆770 Updated this week
brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆248 Updated 2 months ago
McGill-NLP / nano-aha-moment
Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs"
☆559 Updated last month
NVIDIA-NeMo / Automodel
Pytorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support
☆179 Updated this week
tilde-research / MoMoE-impl
Memory optimized Mixture of Experts
☆69 Updated 3 months ago
IST-DASLab / llmq
Quantized LLM training in pure CUDA/C++.
☆216 Updated this week
apple / ml-cross-entropy
☆547 Updated last month
ServiceNow / PipelineRL
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
☆301 Updated this week
shangshang-wang / Tina
Tina: Tiny Reasoning Models via LoRA
☆305 Updated last month