facebookresearch / bitsandbytes
Library for 8-bit optimizers and quantization routines.
☆714 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for bitsandbytes
- Fast Block Sparse Matrices for PyTorch ☆545 · Updated 3 years ago
- Prune a model while finetuning or training. ☆394 · Updated 2 years ago
- Flexible components pairing 🤗 Transformers with PyTorch Lightning ☆611 · Updated last year
- Slicing a PyTorch Tensor Into Parallel Shards ☆296 · Updated 3 years ago
- Code for the ALiBi method for transformer language models (ICLR 2022) ☆507 · Updated last year
- Memory Efficient Attention (O(sqrt(n))) for JAX and PyTorch ☆179 · Updated last year
- Implementation of a Transformer, but completely in Triton ☆249 · Updated 2 years ago
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ☆1,535 · Updated 9 months ago
- Recipes are a standard, well supported set of blueprints for machine learning engineers to rapidly train models using the latest research… ☆294 · Updated this week
- A library to inspect and extract intermediate layers of PyTorch models. ☆470 · Updated 2 years ago
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. ☆1,011 · Updated 7 months ago
- FastFormers - highly efficient transformer models for NLU ☆701 · Updated 10 months ago
- A GPU performance profiling tool for PyTorch models ☆495 · Updated 3 years ago
- Accelerate PyTorch models with ONNX Runtime ☆356 · Updated 2 months ago
- Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory" ☆360 · Updated last year
- DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight … ☆234 · Updated last year
- An efficient implementation of the popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/p… ☆433 · Updated 2 years ago
- Profiling and inspecting memory in PyTorch ☆1,020 · Updated 3 months ago
- Helps you write algorithms in PyTorch that adapt to the available (CUDA) memory ☆428 · Updated 2 months ago
- Implementation of https://arxiv.org/abs/1904.00962 ☆369 · Updated 3 years ago
- Long Range Arena for Benchmarking Efficient Transformers ☆729 · Updated 11 months ago
- Cockpit: A Practical Debugging Tool for Training Deep Neural Networks ☆473 · Updated 2 years ago
- Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 ☆1,659 · Updated 3 weeks ago
- Named tensors with first-class dimensions for PyTorch ☆322 · Updated last year
- [Prototype] Tools for the concurrent manipulation of variably sized Tensors. ☆253 · Updated 2 years ago
- Run Effective Large Batch Contrastive Learning Beyond GPU/TPU Memory Constraint ☆361 · Updated 7 months ago
- Parallelformers: An Efficient Model Parallelization Toolkit for Deployment ☆778 · Updated last year
- Understanding the Difficulty of Training Transformers ☆328 · Updated 2 years ago
- ☆365 · Updated last year
- Tutel MoE: An Optimized Mixture-of-Experts Implementation ☆735 · Updated this week