IntLLaMA: A fast and light quantization solution for LLaMA
☆18Jul 21, 2023Updated 2 years ago
Alternatives and similar repositories for IntLLaMA
Users that are interested in IntLLaMA are comparing it to the libraries listed below.
- An external memory allocator example for PyTorch.☆16Aug 10, 2025Updated 7 months ago
- A model compression and acceleration toolbox based on pytorch.☆331Jan 12, 2024Updated 2 years ago
- GPTQ inference TVM kernel☆40Apr 25, 2024Updated last year
- Express DLA implementation for FPGA, revised based on NVDLA.☆11Oct 17, 2019Updated 6 years ago
- ☕️ A vscode extension for netron, support *.pdmodel, *.nb, *.onnx, *.pb, *.h5, *.tflite, *.pth, *.pt, *.mnn, *.param, etc.☆14Jun 4, 2023Updated 2 years ago
- ☆11Apr 5, 2021Updated 4 years ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…☆29Feb 17, 2025Updated last year
- Open-sourced dataset of CoNR☆13Apr 18, 2023Updated 2 years ago
- PyTorch implementation of SSQL (Accepted to ECCV2022 oral presentation)☆73Mar 15, 2023Updated 3 years ago
- ☆65Apr 26, 2025Updated 11 months ago
- Persistent dense gemm for Hopper in `CuTeDSL`☆15Aug 9, 2025Updated 7 months ago
- Faster Pytorch bitsandbytes 4bit fp4 nn.Linear ops☆30Mar 16, 2024Updated 2 years ago
- Debug print operator for cudagraph debugging☆14Aug 2, 2024Updated last year
- A toolkit for developers to simplify the transformation of nn.Module instances. It now corresponds to Pytorch.fx.☆13Apr 7, 2023Updated 2 years ago
- Standalone Flash Attention v2 kernel without libtorch dependency☆112Sep 10, 2024Updated last year
- ☆13Mar 18, 2026Updated last week
- Reorder-based post-training quantization for large language model☆199May 17, 2023Updated 2 years ago
- An experimental project for paddle python IR.☆15Dec 4, 2023Updated 2 years ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer☆95Feb 20, 2026Updated last month
- CUDA 12.2 HMM demos☆20Jul 26, 2024Updated last year
- DeepSeek-V3.2-Exp DSA Warmup Lightning Indexer training operator based on tilelang☆44Nov 19, 2025Updated 4 months ago
- High Performance Int8 GEMM Kernels for SM80 and later GPUs.☆19Mar 11, 2025Updated last year
- [CVPR-2023] Towards Any Structural Pruning☆17Apr 27, 2023Updated 2 years ago
- Tutorials on extending and importing TVM with a CMake include dependency.☆15Oct 11, 2024Updated last year
- ☆11Jan 10, 2025Updated last year
- Code repo for the paper "LLM-QAT Data-Free Quantization Aware Training for Large Language Models"☆323Mar 4, 2025Updated last year
- High Performance FP8 GEMM Kernels for SM89 and later GPUs.☆20Jan 24, 2025Updated last year
- [NeurIPS 2022] "Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation", Ziyu Jiang*, Xuxi Chen*, Xueqin Huan…☆19Mar 14, 2023Updated 3 years ago
- Model Quantization Benchmark☆18Sep 30, 2025Updated 5 months ago
- ChineseOcr Lite Mnn, an ultra-lightweight Chinese OCR PC demo that uses MNN for inference☆27Mar 26, 2021Updated 4 years ago
- ☆17Jan 1, 2024Updated 2 years ago
- Domain-Specific Architecture Generator 2☆22Oct 2, 2022Updated 3 years ago
- ☆23Oct 7, 2021Updated 4 years ago
- This repository contains the source code for SoftCTC. The original paper can be found here: https://arxiv.org/abs/2212.02135☆19Mar 7, 2023Updated 3 years ago
- Official MegEngine Implementation of Real-Time Intermediate Flow Estimation for Video Frame Interpolation☆29Jul 14, 2022Updated 3 years ago
- ☆11Dec 26, 2025Updated 2 months ago
- Official repository for "IPDPS' 24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".☆20Feb 23, 2024Updated 2 years ago
- 🌱 Dreamer (DreamerGPT): instruction fine-tuning for Chinese large language models☆51Jun 17, 2023Updated 2 years ago
- High-speed GEMV kernels, up to 2.7x speedup over the PyTorch baseline.☆128Jul 13, 2024Updated last year