☆21Apr 27, 2026Updated 3 weeks ago
Alternatives and similar repositories for aquakv
Users that are interested in aquakv are comparing it to the libraries listed below.
- a libp2p-backed daemon wrapping the functionalities of go-libp2p for use in other languages☆11Feb 9, 2025Updated last year
- [ICML2025] KVTuner: Sensitivity-Aware Layer-wise Mixed Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference☆28Jan 27, 2026Updated 3 months ago
- Codebase for the Progressive Mixed-Precision Decoding paper.☆19Jul 15, 2025Updated 10 months ago
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"☆32Jun 5, 2025Updated 11 months ago
- [NeurIPS'2024] Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps☆102Jul 4, 2024Updated last year
- Code repo for "CritiPrefill: A Segment-wise Criticality-based Approach for Prefilling Acceleration in LLMs".☆17Sep 15, 2024Updated last year
- ☆21Oct 2, 2024Updated last year
- Official PyTorch implementation of CD-MOE☆12Mar 18, 2026Updated 2 months ago
- A rule-based tunnel for Android.☆52Updated this week
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference☆82Dec 18, 2025Updated 5 months ago
- Code for data-aware compression of DeepSeek models☆73Dec 11, 2025Updated 5 months ago
- Modded vLLM to run pipeline parallelism over public networks☆40May 20, 2025Updated last year
- ☆68Nov 4, 2024Updated last year
- PyTorch implementation of RWKV blocks☆32Jul 22, 2025Updated 10 months ago
- Official Pytorch implementation of "Neural Optimal Transport with General Cost Functionals" (ICLR 2024)☆24Aug 29, 2024Updated last year
- Receiver operating characteristic curve (ROC) computation code in C++☆11Jul 17, 2017Updated 8 years ago
- [ICLR2025]: OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt…☆92Apr 8, 2025Updated last year
- Source code of the paper "KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing"☆31Oct 24, 2024Updated last year
- ☆15Sep 15, 2022Updated 3 years ago
- [ACM MM2025]: MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization☆44Aug 13, 2025Updated 9 months ago
- UCT tree search + supervised learning for Atari games☆12Feb 14, 2017Updated 9 years ago
- An implementation of the Sequence to Sequence model using the Lasagne library (WIP)☆12Aug 11, 2016Updated 9 years ago
- [COLM 2025] Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"☆72Jul 8, 2025Updated 10 months ago
- [not maintained anymore] [for study purpose] A simple PyTorch implementation for "Global Vectors for Word Representation".☆17Nov 7, 2019Updated 6 years ago
- Memory-efficient transformer. Work in progress.☆19Sep 17, 2022Updated 3 years ago
- (CVPR 2025) Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis☆205Jul 13, 2025Updated 10 months ago
- Highly concurrent and fast content processing for Mighty Inference Server☆10Feb 6, 2023Updated 3 years ago
- New structural distributional shifts for evaluating graph models☆16Oct 25, 2023Updated 2 years ago
- Blog post: how to do deterministic policy gradient with gumbel softmax and why you should do it.☆12Jun 20, 2017Updated 8 years ago
- A Redis-compatible in-memory database server written in Rust with MLua-based Lua 5.1 scripting☆18Nov 28, 2025Updated 5 months ago
- Interpolate between embedding points with llm☆38Jul 17, 2024Updated last year
- ☆34May 14, 2025Updated last year
- Pytorch distributed backend extension with compression support☆17Mar 24, 2025Updated last year
- Voice to vector [Russian]☆15Feb 5, 2017Updated 9 years ago
- Quantized Attention on GPU☆44Nov 22, 2024Updated last year
- Demo of fine-tuning QA models for answering FAQ of cloud providers documentation☆11Mar 7, 2023Updated 3 years ago
- Boosting 4-bit inference kernels with 2:4 Sparsity☆96Sep 4, 2024Updated last year
- ☆14Dec 28, 2021Updated 4 years ago
- A C++ implementation of Network Simplex Algorithm☆11Nov 12, 2018Updated 7 years ago