lm-sys / lm-sys.github.io

The source of the LMSYS website and blogs

☆77

Alternatives and similar repositories for lm-sys.github.io

Users interested in lm-sys.github.io are comparing it to the repositories listed below.


fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆89 · Updated last week
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆141 · Updated last year
FasterDecoding / REST
REST: Retrieval-Based Speculative Decoding, NAACL 2024
☆215 · Updated 4 months ago
sgl-project / genai-bench
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv…
☆263 · Updated this week
hao-ai-lab / Consistency_LLM
[ICML 2024] CLLMs: Consistency Large Language Models
☆410 · Updated last year
tilde-research / MoMoE-impl
Memory-optimized Mixture of Experts
☆73 · Updated 6 months ago
anyscale / llm-continuous-batching-benchmarks
☆125 · Updated last year
RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆222 · Updated last year
project-etalon / etalon
LLM Serving Performance Evaluation Harness
☆83 · Updated 11 months ago
jaymody / speculative-sampling
Simple implementation of Speculative Sampling in NumPy for GPT-2.
☆99 · Updated 2 years ago
Infini-AI-Lab / MagicPIG
[ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation
☆248 · Updated last year
ByteDance-Seed / ByteCheckpoint
ByteCheckpoint: A Unified Checkpointing Library for LFMs
☆268 · Updated last month
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆263 · Updated 4 months ago
IsaacRe / vllm-kvcompress
KV cache compression for high-throughput LLM inference
☆151 · Updated last year
thunlp / Ouroboros
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main)
☆113 · Updated 10 months ago
SqueezeAILab / KVQuant
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
☆404 · Updated last year
cli99 / llm-analysis
Latency and Memory Analysis of Transformer Models for Training and Inference
☆478 · Updated 9 months ago
intel / llm-on-ray
Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆131 · Updated 4 months ago
lm-sys / llm-decontaminator
Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"
☆316 · Updated 2 years ago
apple / ml-recurrent-drafter
☆219 · Updated last year
NVIDIA-NeMo / Automodel
PyTorch Distributed-native training library for LLMs/VLMs with out-of-the-box Hugging Face support
☆266 · Updated this week
deepseek-ai / LPLB
An early research stage expert-parallel load balancer for MoE models based on linear programming.
☆495 · Updated 2 months ago
radixark / miles
Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime.
☆830 · Updated this week
InternLM / turbomind
☆96 · Updated 10 months ago
imagination-research / sot
[ICLR 2024] Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation
☆184 · Updated last year
RLsys-Foundation / TritonForge
🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation…
☆116 · Updated 2 months ago
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆127 · Updated 2 years ago
Infini-AI-Lab / Sequoia
Scalable and robust tree-based speculative decoding algorithm
☆366 · Updated last year
NVIDIA / Star-Attention
Efficient LLM Inference over Long Sequences
☆394 · Updated 7 months ago
hao-ai-lab / Dynasor
[NeurIPS 2025] Simple extension to vLLM that helps you speed up reasoning models without training.
☆218 · Updated 8 months ago