A Lossless Compression Library for AI pipelines
☆315 · Apr 11, 2026 · Updated last month
Alternatives and similar repositories for zipnn
Users that are interested in zipnn are comparing it to the libraries listed below.
- A Gateway for connecting application services in different domains, networks, and cloud infrastructures ☆23 · Feb 1, 2026 · Updated 3 months ago
- Top papers related to LLM-based agent evaluation ☆91 · Oct 21, 2025 · Updated 7 months ago
- Official PyTorch Implementation for the "Recovering the Pre-Fine-Tuning Weights of Generative Models" paper (ICML 2024). ☆86 · Apr 15, 2025 · Updated last year
- Zebin Ren and Animesh Trivedi. 2023. Performance Characterization of Modern Storage Stacks: POSIX I/O, libaio, SPDK, and io_uring. In Pro… ☆13 · Mar 30, 2023 · Updated 3 years ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆96 · Sep 4, 2024 · Updated last year
- Pytorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training ☆40 · May 4, 2026 · Updated 3 weeks ago
- This repository contains code for the MicroAdam paper. ☆21 · Dec 14, 2024 · Updated last year
- Achieve state of the art inference performance with modern accelerators on Kubernetes ☆3,241 · Updated this week
- An implementation of the Llama architecture, to instruct and delight ☆21 · May 31, 2025 · Updated 11 months ago
- ☆16 · May 14, 2025 · Updated last year
- some mixture of experts architecture implementations ☆27 · Mar 22, 2024 · Updated 2 years ago
- Official PyTorch implementation for "Classification-Regression for Chart Comprehension" ☆26 · Feb 5, 2025 · Updated last year
- ☆50 · Jan 18, 2024 · Updated 2 years ago
- ☆53 · Oct 29, 2024 · Updated last year
- Quality Controlled Paraphrase Generation (ACL 2022) ☆71 · Sep 17, 2025 · Updated 8 months ago
- Estimate resources needed to train LLMs ☆14 · Feb 10, 2026 · Updated 3 months ago
- Yet another frontend for LLM, written using .NET and WinUI 3 ☆11 · Sep 14, 2025 · Updated 8 months ago
- ☆63 · May 16, 2025 · Updated last year
- FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Data on GPUs ☆14 · Sep 26, 2023 · Updated 2 years ago
- A tiny FP8 multiplication unit written in Verilog. TinyTapeout 2 submission. ☆14 · Nov 23, 2022 · Updated 3 years ago
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving ☆340 · Jul 2, 2024 · Updated last year
- ☆47 · Feb 26, 2026 · Updated 2 months ago
- The driver for LMCache core to run in vLLM ☆66 · Feb 4, 2025 · Updated last year
- ☆22 · Apr 17, 2025 · Updated last year
- ☆18 · Sep 5, 2024 · Updated last year
- ☆15 · Oct 17, 2023 · Updated 2 years ago
- Universal Neurons in GPT2 Language Models ☆30 · May 28, 2024 · Updated last year
- KV cache compression for high-throughput LLM inference ☆157 · Feb 5, 2025 · Updated last year
- Generate interleaved text and image content in a structured format you can directly pass to downstream APIs. ☆29 · Oct 18, 2024 · Updated last year
- Codebase for ICML'24 paper: Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs ☆27 · Jun 25, 2024 · Updated last year
- llm-d Router: The intelligent entry point for inference requests ☆200 · Updated this week
- A sleek, customizable interface for managing LLMs with responsive design and easy agent personalization. ☆17 · Aug 30, 2024 · Updated last year
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆541 · Feb 10, 2025 · Updated last year
- Retrieval with Learned Similarities (http://arxiv.org/abs/2407.15462, WWW'25 Oral) ☆51 · Apr 23, 2025 · Updated last year
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆148 · Dec 4, 2024 · Updated last year
- EvaByte: Efficient Byte-level Language Models at Scale ☆118 · Apr 22, 2025 · Updated last year
- Models as a Service ☆75 · Oct 21, 2025 · Updated 7 months ago
- MLIR tools and dialect for GraphBLAS ☆18 · Mar 30, 2022 · Updated 4 years ago
- Spiker is a Python-based framework for designing and generating efficient FPGA hardware accelerators for spiking neural networks, coverin… ☆48 · Feb 6, 2025 · Updated last year