bytedance / ABQ-LLM
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
☆218 · Updated 5 months ago
Alternatives and similar repositories for ABQ-LLM:
Users who are interested in ABQ-LLM are comparing it to the libraries listed below.
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction ☆83 · Updated 4 months ago
- Unified KV Cache Compression Methods for Auto-Regressive Models ☆956 · Updated 2 months ago
- [ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2 ☆181 · Updated last week
- Supports mixed-precision inference with vLLM ☆80 · Updated 2 months ago
- ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ☆153 · Updated 4 months ago
- [COLM 2024] TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding ☆242 · Updated 6 months ago
- Mixed-precision inference with TensorRT-LLM ☆79 · Updated 5 months ago
- A highly optimized LLM inference acceleration engine for Llama and its variants. ☆881 · Updated 2 weeks ago
- ☆107 · Updated 4 years ago
- [ICLR 2025] BitStack: Any-Size Compression of Large Language Models in Variable Memory Environments ☆36 · Updated last month
- SQuant [ICLR22] ☆131 · Updated 2 years ago
- Adds sequence parallelism to LLaMA-Factory ☆432 · Updated this week
- The framework to prune LLMs to any size and any config. ☆89 · Updated last year
- A toolkit that enhances PyTorch with specialized functions for low-bit quantized neural networks. ☆197 · Updated 9 months ago
- APOLLO: SGD-like Memory, AdamW-level Performance ☆194 · Updated 2 weeks ago
- Higher-performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GPRS+TensorRT-LLM+… ☆123 · Updated this week
- TVM Documentation in Simplified Chinese / TVM 中文文档 ☆817 · Updated last week
- JittorGeometric is a Jittor-based graph machine learning library. ☆152 · Updated this week
- QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs. ☆108 · Updated 2 weeks ago
- Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Supports dynamic batching and streaming mo… ☆158 · Updated this week
- Official PyTorch implementation of FlatQuant: Flatness Matters for LLM Quantization ☆112 · Updated this week
- ✨ A synthetic dataset generation framework that produces diverse coding questions and verifiable solutions, all in one framework ☆170 · Updated this week
- [NeurIPS2024] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging ☆133 · Updated last week
- An easy-to-use package for implementing SmoothQuant for LLMs ☆95 · Updated 10 months ago
- Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models ☆170 · Updated 4 months ago
- Trains an LLM from scratch with DeepSpeed through the pretraining and SFT stages, then verifies the LLM's ability to learn knowledge, understand language, and answer questions ☆147 · Updated 8 months ago
- ☆125 · Updated 3 weeks ago
- Official Implementation of "Accel-GNN: High-Performance GPU Accelerator Design for Graph Neural Networks" ☆49 · Updated last week
- ☆38 · Updated last week
- DeepRetrieval - Hacking 🔥Real Search Engines and Text/Data Retrievers with LLM + RL ☆196 · Updated this week