zipnn / zipnn
A Lossless Compression Library for AI pipelines
☆282 · Updated 3 months ago
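The core idea behind lossless compressors for model weights such as zipnn is byte grouping: same-position bytes of each floating-point parameter (notably the highly repetitive exponent bytes) are split into contiguous streams before entropy coding, which markedly improves the ratio of a generic codec. The sketch below illustrates that general technique with only the Python standard library; the function names are illustrative and are not zipnn's actual API.

```python
import struct
import zlib

def byte_group_compress(values):
    """Losslessly compress a list of float32 values.

    Same-position bytes of each 4-byte float are grouped into contiguous
    streams before zlib, since those streams are far more repetitive
    across model weights than the interleaved raw bytes.
    Illustrative sketch only -- not zipnn's implementation or API.
    """
    raw = struct.pack(f"<{len(values)}f", *values)
    # Transpose bytes: all 1st bytes, then all 2nd bytes, and so on.
    grouped = b"".join(raw[i::4] for i in range(4))
    return zlib.compress(grouped), len(values)

def byte_group_decompress(blob, n):
    """Invert byte_group_compress: ungroup the byte streams and unpack."""
    grouped = zlib.decompress(blob)
    raw = bytearray(4 * n)
    for i in range(4):
        raw[i::4] = grouped[i * n:(i + 1) * n]
    return list(struct.unpack(f"<{n}f", bytes(raw)))

# The round trip is exact for values representable in float32.
weights = [1.0, 2.5, -3.25, 0.0] * 256
blob, n = byte_group_compress(weights)
assert byte_group_decompress(blob, n) == weights
```

Because the transformation is a pure byte permutation, decompression recovers the original bit pattern exactly, which is what "lossless" means here.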
Alternatives and similar repositories for zipnn
Users interested in zipnn are comparing it to the libraries listed below.
- ☆258 · Updated this week
- Google TPU optimizations for transformers models ☆120 · Updated 9 months ago
- Simple high-throughput inference library ☆147 · Updated 5 months ago
- Inference server benchmarking tool ☆118 · Updated 3 weeks ago
- ☆107 · Updated last month
- Training-free post-training efficient sub-quadratic-complexity attention, implemented with OpenAI Triton ☆147 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated last year
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients ☆202 · Updated last year
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆154 · Updated last year
- Efficient non-uniform quantization with GPTQ for GGUF ☆52 · Updated last month
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆420 · Updated last week
- 👷 Build compute kernels ☆163 · Updated this week
- Scalable and Performant Data Loading ☆311 · Updated this week
- ☆15 · Updated last month
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆283 · Updated this week
- An implementation of the PSGD Kron second-order optimizer for PyTorch ☆96 · Updated 3 months ago
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆60 · Updated this week
- Module, Model, and Tensor Serialization/Deserialization ☆270 · Updated 2 months ago
- A collection of all available inference solutions for LLMs ☆91 · Updated 7 months ago
- ☆24 · Updated this week
- Datamodels for Hugging Face tokenizers ☆85 · Updated 3 weeks ago
- ☆443 · Updated last month
- ScalarLM: a unified training and inference stack ☆87 · Updated 3 weeks ago
- Benchmark suite for LLMs from Fireworks.ai ☆82 · Updated last week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆130 · Updated 10 months ago
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks" ☆59 · Updated 11 months ago
- Manage ML configuration with pydantic ☆16 · Updated 5 months ago
- Load compute kernels from the Hub ☆304 · Updated last week
- Storing long contexts in tiny caches with self-study ☆201 · Updated last week
- ☆136 · Updated 2 months ago