☆21 · Feb 5, 2024 · Updated 2 years ago
Alternatives and similar repositories for SqueezeLLM-gradients
Users interested in SqueezeLLM-gradients are comparing it to the libraries listed below.
- [ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs ☆19 · Jun 3, 2025 · Updated 9 months ago
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization ☆715 · Aug 13, 2024 · Updated last year
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆39 · Mar 11, 2024 · Updated 2 years ago
- [IJCAI 2023] CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization. ☆10 · Nov 3, 2023 · Updated 2 years ago
- [ACL 2024 Findings] Light-PEFT: Lightening Parameter-Efficient Fine-Tuning via Early Pruning ☆13 · Sep 2, 2024 · Updated last year
- [ACL 2025] Official code for "Learning to Reason from Feedback at Test-Time". ☆13 · May 16, 2025 · Updated 10 months ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization ☆416 · Aug 13, 2024 · Updated last year
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ☆23 · Mar 15, 2024 · Updated 2 years ago
- [ACL 2025] Squeezed Attention: Accelerating Long Prompt LLM Inference ☆58 · Nov 20, 2024 · Updated last year
- PyTorch implementation of our paper accepted by ICML 2023: "Bi-directional Masks for Efficient N:M Sparse Training" ☆13 · Jun 7, 2023 · Updated 2 years ago
- HFODetector is a Python package that is capable of detecting HFOs with STE / MNI / Hilbert detector. Detection speed is increased by u… ☆12 · Feb 16, 2025 · Updated last year
- ☆11 · May 24, 2024 · Updated last year
- Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization" ☆11 · Mar 31, 2024 · Updated last year
- PyTorch Quantization Framework For OCP MX Datatypes. ☆16 · May 30, 2025 · Updated 9 months ago
- The official implementation for the paper 'mmSampler: Efficient Frame Sampler for Multimodal Video Retrieval'. ☆11 · Aug 23, 2022 · Updated 3 years ago
- [NeurIPS 2025] Multipole Attention for Efficient Long Context Reasoning ☆22 · Dec 5, 2025 · Updated 3 months ago
- ☆81 · Jul 21, 2022 · Updated 3 years ago
- [ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation ☆30 · Updated this week
- Open Source Projects from Pallas Lab ☆21 · Oct 10, 2021 · Updated 4 years ago
- Artifact evaluation for HPCA'24 paper Lightening-Transformer: A Dynamically-operated Optically-interconnected Photonic Transformer Accele… ☆11 · Mar 3, 2024 · Updated 2 years ago
- PyTorch implementation of Language model compression with weighted low-rank factorization ☆13 · Jun 28, 2023 · Updated 2 years ago
- To mitigate position bias in LLMs, especially in long-context scenarios, we scale only one dimension of LLMs, reducing position bias and … ☆11 · Jun 18, 2024 · Updated last year
- ☆19 · Jan 3, 2025 · Updated last year
- Official repository for FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models ☆35 · Sep 19, 2025 · Updated 6 months ago
- OpenAI GPT model to build your personal assistant in IoT devices. Just like Alexa, Google Assistant, Siri, etc. but with your own skills,… ☆12 · Aug 7, 2023 · Updated 2 years ago
- ☆11 · Nov 14, 2023 · Updated 2 years ago
- O'Reilly Course, In-Memory Computing Essentials ☆10 · Oct 16, 2020 · Updated 5 years ago
- ☆11 · Apr 5, 2023 · Updated 2 years ago
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio… ☆45 · Apr 18, 2025 · Updated 11 months ago
- ☆43 · Nov 1, 2022 · Updated 3 years ago
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆396 · Feb 24, 2024 · Updated 2 years ago
- ☆20 · Nov 26, 2025 · Updated 4 months ago
- ☆19 · Feb 4, 2025 · Updated last year
- Code repository for ICLR 2025 paper "LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid" ☆26 · Mar 2, 2025 · Updated last year
- [ACL 2025] RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis ☆24 · Aug 8, 2025 · Updated 7 months ago
- This repo contains the source code for: Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs ☆44 · Aug 14, 2024 · Updated last year
- ☆10 · Nov 16, 2024 · Updated last year
- Sirius, an efficient correction mechanism, which significantly boosts Contextual Sparsity models on reasoning tasks while maintaining its… ☆21 · Sep 10, 2024 · Updated last year
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" adapted for Llama models ☆41 · Aug 4, 2023 · Updated 2 years ago