megvii-research / IntLLaMA
IntLLaMA: A fast and light quantization solution for LLaMA
☆18 Updated last year
Alternatives and similar repositories for IntLLaMA:
Users who are interested in IntLLaMA are comparing it to the libraries listed below.
- An object detection codebase based on MegEngine. ☆28 Updated 2 years ago
- PyTorch code for Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers ☆37 Updated 5 months ago
- ☆30 Updated 8 months ago
- An auxiliary project analyzing the characteristics of KV in DiT attention. ☆25 Updated 2 months ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆24 Updated 11 months ago
- ☆68 Updated this week
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆35 Updated 11 months ago
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆22 Updated 8 months ago
- torch_quantizer is an out-of-the-box quantization tool for PyTorch models on the CUDA backend, specially optimized for Diffusion Models. ☆21 Updated 10 months ago
- Triton implementation of bi-directional (non-causal) linear attention ☆41 Updated 2 weeks ago
- Patch convolution to avoid large GPU memory usage of Conv2D ☆84 Updated 3 weeks ago
- BESA is a differentiable weight pruning technique for large language models. ☆14 Updated 11 months ago
- ☆20 Updated 2 years ago
- ☆11 Updated last year
- Quantized Attention on GPU ☆34 Updated 2 months ago
- Benchmark tests supporting the TiledCUDA library. ☆15 Updated 3 months ago
- TVMScript kernel for deformable attention ☆24 Updated 3 years ago
- GPTQ inference TVM kernel ☆38 Updated 9 months ago
- [NeurIPS 2024] Search for Efficient LLMs ☆12 Updated last month
- Benchmarking Attention Mechanism in Vision Transformers. ☆17 Updated 2 years ago
- Page for the CVPR 2023 Tutorial - Efficient Neural Networks: From Algorithm Design to Practical Mobile Deployments ☆12 Updated last year
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ☆19 Updated 3 weeks ago
- [CVPR-2023] Towards Any Structural Pruning ☆16 Updated last year
- The official repo of continuous speculative decoding ☆24 Updated 3 months ago
- [TMLR] Official PyTorch implementation of the paper "Efficient Quantization-aware Training with Adaptive Coreset Selection" ☆30 Updated 6 months ago
- [ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning ☆19 Updated 2 years ago
- Odysseus: Playground of LLM Sequence Parallelism ☆64 Updated 8 months ago