SensiMix: Sensitivity-Aware 8-bit Index & 1-bit Value Mixed Precision Quantization for BERT Compression (PLOS One)
☆34Aug 22, 2025Updated 9 months ago
Alternatives and similar repositories for SensiMix
Users that are interested in SensiMix are comparing it to the libraries listed below.
- ☆33Dec 9, 2022Updated 3 years ago
- PET: Parameter-efficient Knowledge Distillation on Transformer (PLOS One)☆15Aug 22, 2025Updated 9 months ago
- Accurate Node Feature Estimation with Structured Variational Graph Autoencoder (KDD 2022)☆18Apr 6, 2023Updated 3 years ago
- ☆14Oct 6, 2023Updated 2 years ago
- An implementation of "Curved-Voxel Clustering for Accurate Segmentation of 3D LiDAR Point Clouds with Real-Time Performance" from IROS 20…☆242Jan 25, 2022Updated 4 years ago
- PyTorch implementation of our paper accepted by ECCV 2022: Fine-grained Data Distribution Alignment for Post-Training Quantization☆16Sep 13, 2022Updated 3 years ago
- ☆45Jun 25, 2025Updated 11 months ago
- Code for paper: Unraveling the Shift of Visual Information Flow in MLLMs: From Phased Interaction to Efficient Inference☆14Jun 7, 2025Updated last year
- Structured Pruning Adapters in PyTorch☆19Aug 30, 2023Updated 2 years ago
- Implementation of several news recommendation methods in PyTorch.☆34Jul 1, 2021Updated 4 years ago
- A PyTorch extension implementing a real XNOR-popcount (1-bit op) GEMM Linear layer, supporting both CPU and CUDA☆25Jun 6, 2023Updated 3 years ago
- Repository of code implemented after reading recommender system papers☆66Feb 26, 2023Updated 3 years ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…☆45Apr 18, 2025Updated last year
- ☆42Oct 31, 2024Updated last year
- [ICCV 2025] SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs☆85Jan 17, 2026Updated 4 months ago
- [ICML 2024] Official Implementation of SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks☆41Feb 4, 2025Updated last year
- ☆60Nov 15, 2024Updated last year
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"☆63Oct 9, 2025Updated 8 months ago
- PyTorch Transformer from scratch: BERT, ELECTRA pretraining, and Ko→En translation.☆54Jan 12, 2026Updated 4 months ago
- [CVPR 2025] DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models☆79Apr 16, 2026Updated last month
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization☆175Nov 26, 2025Updated 6 months ago
- Vision Language Models are Biased☆113Jan 26, 2026Updated 4 months ago
- [ICLR 2025] The official PyTorch implementation of "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Cont…☆72Sep 18, 2025Updated 8 months ago
- Prune transformer layers☆74May 30, 2024Updated 2 years ago
- Source code repository for the O'Reilly book <TinyML: TensorFlow Lite>☆64Dec 15, 2020Updated 5 years ago
- [ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models☆89May 20, 2025Updated last year
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs.☆89Oct 26, 2025Updated 7 months ago
- In progress.☆68Mar 26, 2024Updated 2 years ago
- [EMNLP 2025 main 🔥] Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More"☆120Oct 12, 2025Updated 7 months ago
- Matryoshka Multimodal Models☆123Jan 22, 2025Updated last year
- ☆106Jun 10, 2025Updated 11 months ago
- Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop]☆90Sep 13, 2024Updated last year
- Official code for paper: [CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster.☆114Jun 29, 2025Updated 11 months ago
- [ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”☆127Jan 14, 2025Updated last year
- [NeurIPS'25 Oral] Query-agnostic KV cache eviction: 3–4× reduction in memory and 2× decrease in latency (Qwen3/2.5, Gemma3, LLaMA3)☆218Feb 11, 2026Updated 3 months ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference☆123Mar 6, 2024Updated 2 years ago
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction☆147Mar 6, 2025Updated last year
- Official GPU implementation of the paper "PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance"☆133Nov 19, 2024Updated last year
- Using Teacher Assistants to Improve Knowledge Distillation: https://arxiv.org/pdf/1902.03393.pdf☆264Oct 3, 2019Updated 6 years ago