Transformer related optimization, including BERT, GPT
☆6,428 · Mar 27, 2024 · Updated 2 years ago
Alternatives and similar repositories for FasterTransformer
Users that are interested in FasterTransformer are comparing it to the libraries listed below.
- TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizat… · ☆13,941 · Updated this week
- Fast and memory-efficient exact attention · ☆24,221 · Updated this week
- Ongoing research training transformer models at scale · ☆16,838 · Updated this week
- A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H… · ☆3,408 · Updated this week
- Development repository for the Triton language and compiler · ☆19,525 · Updated this week
- CUDA Templates and Python DSLs for High-Performance Linear Algebra · ☆9,939 · Updated this week
- LightSeq: A High Performance Library for Sequence Processing and Generation · ☆3,300 · May 16, 2023 · Updated 3 years ago
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. · ☆42,586 · Updated this week
- FlashInfer: Kernel Library for LLM Serving · ☆5,867 · Updated this week
- NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source compone… · ☆13,102 · Updated this week
- Hackable and optimized Transformers building blocks, supporting a composable construction. · ☆10,509 · Jun 18, 2026 · Updated last week
- The Triton Inference Server provides an optimized cloud and edge inferencing solution. · ☆10,776 · Updated this week
- Accessible large language models via k-bit quantization for PyTorch. · ☆8,286 · Updated this week
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… · ☆4,141 · Updated this week
- Large Language Model Text Generation Inference · ☆10,862 · Mar 21, 2026 · Updated 3 months ago
- [MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration · ☆3,578 · Jul 17, 2025 · Updated 11 months ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. · ☆2,106 · Jun 30, 2025 · Updated 11 months ago
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. · ☆7,912 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆83,677 · Updated this week
- a fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc) on CPU and GPU. · ☆1,547 · Jul 18, 2025 · Updated 11 months ago
- SGLang is a high-performance serving framework for large language models and multimodal models. · ☆29,694 · Updated this week
- AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (N… · ☆4,721 · Apr 9, 2026 · Updated 2 months ago
- Open Machine Learning Compiler Framework · ☆13,492 · Updated this week
- PyTorch extensions for high performance and large scale training.