Repository for CPU Kernel Generation for LLM Inference
☆28Jul 13, 2023Updated 2 years ago
Alternatives and similar repositories for QIGen
Users who are interested in QIGen are comparing it to the libraries listed below.
- Prototype routines for GPU quantization written using PyTorch.☆21Updated this week
- Reorder-based post-training quantization for large language model☆199May 17, 2023Updated 2 years ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)☆145Sep 20, 2024Updated last year
- ☆14Jun 22, 2025Updated 9 months ago
- Official Code For Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM☆14Dec 27, 2023Updated 2 years ago
- An attempt to merge ESBN with Transformers, to endow Transformers with the ability to emergently bind symbols☆16Aug 3, 2021Updated 4 years ago
- [ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models☆55Aug 9, 2024Updated last year
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization☆715Aug 13, 2024Updated last year
- Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024☆185Apr 16, 2024Updated last year
- Apply Iprompt on GLM with innovative new methods. Currently support Chinese QA, English QA and Chinese poem generation.☆20Jun 16, 2022Updated 3 years ago
- ☆21Feb 11, 2022Updated 4 years ago
- This is an implementation of the audio source separation model as well as the evaluation metrics proposed in the paper "Weakly Informed A…☆11Nov 26, 2019Updated 6 years ago
- ☆162Sep 15, 2023Updated 2 years ago
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"☆397Feb 24, 2024Updated 2 years ago
- Awesome Quantization Paper lists with Codes☆10Feb 24, 2021Updated 5 years ago
- Code repo for the paper "LLM-QAT Data-Free Quantization Aware Training for Large Language Models"☆324Mar 4, 2025Updated last year
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry☆43Jan 15, 2024Updated 2 years ago
- [ICML 2024] BiLLM: Pushing the Limit of Post-Training Quantization for LLMs☆229Jan 11, 2025Updated last year
- python package of rocm-smi-lib☆24Dec 15, 2025Updated 3 months ago
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023)☆81Aug 30, 2023Updated 2 years ago
- ☆553Feb 8, 2026Updated 2 months ago
- A framework for few-shot evaluation of autoregressive language models.☆13Feb 14, 2024Updated 2 years ago
- A Python script to convert vobsub subtitles into srt format using tesseract for ocr☆10Sep 28, 2014Updated 11 years ago
- Implementations of Deep RL Algorithms in OpenAI Gym Environments☆15Dec 11, 2020Updated 5 years ago
- Hub for Open Source AGiXT Extensions, Chains, Prompts, and Agents.☆17Sep 27, 2023Updated 2 years ago
- The ROCdebug-agent is a library that can be loaded by ROCm Platform Runtime to provide some debugging functionality.☆32Updated this week
- Neural Network Quantization With Fractional Bit-widths☆11Feb 19, 2021Updated 5 years ago
- A simple and effective LLM pruning approach.☆860Aug 9, 2024Updated last year
- oneAPI - Data Parallel C++ course for students☆44Nov 4, 2024Updated last year
- ☆13Feb 18, 2024Updated 2 years ago
- This repository contains integer operators on GPUs for PyTorch.☆236Sep 29, 2023Updated 2 years ago
- JAX implementation of GPTQ quantization algorithm☆10Jul 19, 2023Updated 2 years ago
- [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.☆893Nov 26, 2025Updated 4 months ago
- An innovative library for efficient LLM inference via low-bit quantization☆351Aug 30, 2024Updated last year
- The official code for the "System Combination via Quality Estimation for Grammatical Error Correction" paper, published in EMNLP 2023.☆16Jan 24, 2026Updated 2 months ago
- Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning".☆130Jul 11, 2023Updated 2 years ago
- ☆19Aug 10, 2024Updated last year
- [ICML 2024] Official code release accompanying the paper "diff History for Neural Language Agents" (Piterbarg, Pinto, Fergus)☆20Aug 20, 2024Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".☆280Nov 3, 2023Updated 2 years ago