linkedin / fmchisel
fmchisel: Efficient Compression and Training Algorithms for Foundation Models
☆76 · Updated last month
Alternatives and similar repositories for fmchisel
Users interested in fmchisel are comparing it to the libraries listed below.
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆345 · Updated this week
- A minimal implementation of vllm. ☆62 · Updated last year
- Cataloging released Triton kernels. ☆277 · Updated 3 months ago
- An early research stage expert-parallel load balancer for MoE models based on linear programming. ☆456 · Updated 3 weeks ago
- LLM Serving Performance Evaluation Harness ☆82 · Updated 9 months ago
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving. ☆236 · Updated 2 weeks ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆244 · Updated 7 months ago
- KV cache compression for high-throughput LLM inference ☆146 · Updated 10 months ago
- Efficient LLM Inference over Long Sequences ☆393 · Updated 5 months ago
- torchcomms: a modern PyTorch communications API ☆302 · Updated this week
- Perplexity GPU Kernels ☆536 · Updated last month
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆133 · Updated last year
- ☆564 · Updated this week
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆246 · Updated last year
- JAX backend for SGL ☆191 · Updated this week
- ☆44 · Updated 8 months ago
- [ICLR2025 Spotlight] MagicPIG: LSH Sampling for Efficient LLM Generation ☆243 · Updated 11 months ago
- LLM KV cache compression made easy ☆709 · Updated this week
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆149 · Updated this week
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆200 · Updated last year
- Fast low-bit matmul kernels in Triton ☆407 · Updated 3 weeks ago
- Autonomous GPU Kernel Generation via Deep Agents ☆179 · Updated this week
- ☆48 · Updated last year
- ByteCheckpoint: A Unified Checkpointing Library for LFMs ☆256 · Updated this week
- A minimal cache manager for PagedAttention, on top of llama3. ☆127 · Updated last year
- Applied AI experiments and examples for PyTorch ☆309 · Updated 3 months ago
- Ring-attention experiments ☆160 · Updated last year
- A low-latency & high-throughput serving engine for LLMs ☆454 · Updated last month
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆216 · Updated 2 weeks ago
- ☆79 · Updated last month