[NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
☆30 · Dec 6, 2023 · Updated 2 years ago
Alternatives and similar repositories for ShiftAddViT
Users that are interested in ShiftAddViT are comparing it to the libraries listed below.
- [ICML 2022] ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks ☆15 · May 18, 2022 · Updated 4 years ago
- An open-source PyTorch library for developing energy-efficient multiplication-less models and applications. ☆14 · Feb 3, 2025 · Updated last year
- [ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning ☆20 · Jul 7, 2022 · Updated 3 years ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆35 · Jun 12, 2024 · Updated 2 years ago
- AFPQ code implementation ☆23 · Nov 6, 2023 · Updated 2 years ago
- [NeurIPS 2020] ShiftAddNet: A Hardware-Inspired Deep Network ☆74 · Nov 16, 2020 · Updated 5 years ago
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs ☆67 · Mar 25, 2025 · Updated last year
- ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization ☆114 · Oct 15, 2024 · Updated last year
- [TMLR] Official PyTorch implementation of paper "Quantization Variation: A New Perspective on Training Transformers with Low-Bit Precisio… ☆49 · Sep 27, 2024 · Updated last year
- ☆16 · Dec 9, 2023 · Updated 2 years ago
- BGEMM-CUDA is a CUDA-based low-bit GEMM kernel library for efficient neural network inference. It implements optimized binary and ternary… ☆20 · Aug 30, 2024 · Updated last year
- [NeurIPS 2024] "Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design", Ruisi Cai, Yeonju Ro, Geon-Woo … ☆16 · Dec 16, 2024 · Updated last year
- The code of SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models ☆23 · Mar 25, 2026 · Updated 3 months ago
- [ICML 2026] A framework to compare low-bit integer and floating-point formats ☆79 · May 6, 2026 · Updated last month
- ☆12 · Jun 4, 2024 · Updated 2 years ago
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. ☆21 · Jan 24, 2025 · Updated last year
- The official code for [ECCV 2020] "HALO: Hardware-aware Learning to Optimize" ☆10 · Mar 22, 2023 · Updated 3 years ago
- The official implementation of the DAC 2024 paper GQA-LUT ☆23 · Dec 20, 2024 · Updated last year
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆56 · Feb 28, 2023 · Updated 3 years ago
- [ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs. ☆139 · May 16, 2024 · Updated 2 years ago
- BinaryViT: Pushing Binary Vision Transformers Towards Convolutional Models ☆39 · Feb 4, 2024 · Updated 2 years ago
- ☆29 · May 24, 2024 · Updated 2 years ago
- First Latency-Aware Competitive LLM Agent Benchmark ☆29 · Jun 3, 2025 · Updated last year
- ☆25 · Dec 1, 2016 · Updated 9 years ago
- Model Quantization Benchmark ☆19 · Apr 17, 2026 · Updated 2 months ago
- ☆18 · Sep 25, 2025 · Updated 9 months ago
- The official implementation of the ICML 2023 paper OFQ-ViT ☆39 · Oct 3, 2023 · Updated 2 years ago
- Code repo for the paper BiT: Robustly Binarized Multi-distilled Transformer ☆115 · Jun 26, 2023 · Updated 3 years ago
- [ICASSP'20] DNN-Chip Predictor: An Analytical Performance Predictor for DNN Accelerators with Various Dataflows and Hardware Architecture… ☆25 · Oct 1, 2022 · Updated 3 years ago
- [HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design ☆132 · Jun 27, 2023 · Updated 3 years ago
- [CVPR 2025 Highlight] FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation ☆29 · Jun 16, 2025 · Updated last year
- Code for ViTAS: Vision Transformer Architecture Search ☆50 · Jul 22, 2021 · Updated 4 years ago
- [CVPR 2023] PD-Quant: Post-Training Quantization Based on Prediction Difference Metric ☆61 · Mar 23, 2023 · Updated 3 years ago
- The official implementation of the EMNLP 2023 paper LLM-FP4 ☆225 · Dec 15, 2023 · Updated 2 years ago
- ☆37 · Oct 10, 2024 · Updated last year
- [ICML 2021] "Double-Win Quant: Aggressively Winning Robustness of Quantized Deep Neural Networks via Random Precision Training and Inferen… ☆16 · Feb 13, 2022 · Updated 4 years ago
- ☆16 · Nov 14, 2022 · Updated 3 years ago
- Training Quantized Neural Networks with a Full-precision Auxiliary Module ☆13 · Jun 19, 2020 · Updated 6 years ago
- ☆10 · Apr 24, 2024 · Updated 2 years ago