facebookresearch / fastgenLinks

Simple high-throughput inference library

☆150

Alternatives and similar repositories for fastgen

Users that are interested in fastgen are comparing it to the libraries listed below

Sorting:

BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆100Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated last year
tiiuae / onebitllms
Lightweight toolkit package to train and fine-tune 1.58bit Language models
☆100Updated 6 months ago
UmerHA / triton_util
Make triton easier
☆49Updated last year
character-ai / pipelining-sft
Simple and efficient DeepSeek V3 SFT using pipeline parallel and expert parallel, with both FP8 and BF16 trainings
☆98Updated 4 months ago
ScalingIntelligence / good-kernels
Samples of good AI generated CUDA kernels
☆92Updated 6 months ago
kroggen / mamba.c
Inference of Mamba models in pure C
☆194Updated last year
HazyResearch / cartridges
Storing long contexts in tiny caches with self-study
☆218Updated last month
leloykun / modded-nanogpt
NanoGPT (124M) quality in 2.67B tokens
☆28Updated 2 months ago
JoeLi12345 / nGPT
an open source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere)
☆108Updated 9 months ago
euclaise / supertrainer2000
☆50Updated last year
Michaelvll / llm-ie-benchmarks
A collection of reproducible inference engine benchmarks
☆38Updated 7 months ago
s-smits / grpo-optuna
Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna
☆59Updated last month
kyleliang919 / Super_Muon
☆66Updated 8 months ago
joey00072 / ohara
Collection of autoregressive model implementation
☆85Updated 7 months ago
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆249Updated 10 months ago
VatsaDev / NanoPoor
NanoGPT-speedrunning for the poor T4 enjoyers
☆73Updated 7 months ago
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆127Updated 2 years ago
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆123Updated 10 months ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆131Updated last month
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆127Updated last year
opendatahub-io / vllm-tgis-adapter
vLLM adapter for a TGIS-compatible gRPC server.
☆45Updated this week
apple / ml-recurrent-drafter
☆219Updated 10 months ago
lianakoleva / no-libtorch-compile
☆21Updated 9 months ago
huggingface / kernel-builder
👷 Build compute kernels
☆190Updated this week
PrimeIntellect-ai / pccl
PCCL (Prime Collective Communications Library) implements fault tolerant collective communications over IP
☆140Updated 2 months ago
google-deepmind / asyncdiloco
☆47Updated last year
Zyphra / Zyda_processing
☆39Updated last year
BlinkDL / fast.c
Prepare for DeekSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code.
☆73Updated 10 months ago
IST-DASLab / gptq-gguf-toolkit
Efficient non-uniform quantization with GPTQ for GGUF
☆53Updated 2 months ago