TLLM_QMM extracts the quantized-kernel implementation from NVIDIA's TensorRT-LLM, removing the NVInfer dependency and exposing an easy-to-use PyTorch module. We modified the dequantization and weight preprocessing to align with popular quantization algorithms such as AWQ and GPTQ, and combined them with the new FP8 quantization.
☆16 · Jul 5, 2024 · Updated last year
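The dequantization step mentioned above can be illustrated with a small, generic sketch of group-wise 4-bit weight dequantization in the style of AWQ and GPTQ checkpoints. In this scheme each group of `group_size` weights along the input dimension shares one scale and one integer zero-point, and eight 4-bit values are packed into each 32-bit word. This is plain Python for readability, not TLLM_QMM's API. The names `unpack_int4` and `dequantize_groupwise` are hypothetical, and the sequential low-nibble-first packing order is an assumption: real AWQ and GPTQ checkpoints may order nibbles differently, and GPU kernels typically fuse this step into the matrix multiply.

```python
from typing import List


def unpack_int4(packed: List[int]) -> List[int]:
    """Unpack 32-bit words into unsigned 4-bit values, lowest nibble first.

    Assumption: sequential nibble order. A checkpoint that interleaves
    nibbles would need an extra permutation step here.
    """
    values: List[int] = []
    for word in packed:
        word &= 0xFFFFFFFF  # treat each entry as an unsigned 32-bit word
        for i in range(8):  # 8 nibbles per 32-bit word
            values.append((word >> (4 * i)) & 0xF)
    return values


def dequantize_groupwise(
    q: List[int], scales: List[float], zeros: List[int], group_size: int
) -> List[float]:
    """Compute w = (q - zero) * scale, with one (scale, zero) pair per group."""
    if len(scales) != len(zeros) or len(q) != len(scales) * group_size:
        raise ValueError("expected one scale and one zero-point per group")
    weights: List[float] = []
    for idx, value in enumerate(q):
        g = idx // group_size  # index of the group this weight belongs to
        weights.append((value - zeros[g]) * scales[g])
    return weights


if __name__ == "__main__":
    # Two packed words holding the 4-bit values 0..15 for one output channel.
    q = unpack_int4([0x76543210, 0xFEDCBA98])
    print(dequantize_groupwise(q, scales=[0.5, 0.25], zeros=[8, 8], group_size=8))
```

Here the first group maps the codes 0 to 7 onto -4.0 through -0.5, and the second group maps 8 to 15 onto 0.0 through 1.75. Storing a per-group integer zero-point corresponds to asymmetric quantization; a symmetric 4-bit scheme would typically fix the zero-point at 8.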
Alternatives and similar repositories for TLLM_QMM
Users interested in TLLM_QMM are comparing it to the libraries listed below.
- Awesome Quantization Paper lists with Codes ☆10 · Feb 24, 2021 · Updated 5 years ago
- ☆26 · Feb 17, 2025 · Updated last year
- Möbius Transformation for Fast Inner Product Search on Graph ☆23 · Jun 3, 2021 · Updated 5 years ago
- ☆13 · Nov 28, 2014 · Updated 11 years ago
- ☆27 · Jan 8, 2024 · Updated 2 years ago
- PyTorch code for our paper "Progressive Binarization with Semi-Structured Pruning for LLMs" ☆13 · Mar 11, 2026 · Updated 3 months ago
- PyTorch implementation of "Deep Transferring Quantization" (ECCV2020) ☆18 · Jun 22, 2022 · Updated 3 years ago
- Low Precision Arithmetic Simulation in PyTorch - extension for posit and beyond ☆16 · Dec 9, 2025 · Updated 6 months ago
- A very efficient string-matching tool supporting forward/backward maximum-matching word segmentation and multi-pattern exact string matching ☆16 · Jul 29, 2023 · Updated 2 years ago
- 16 bit serial multiplier in SystemVerilog ☆13 · Oct 13, 2018 · Updated 7 years ago
- ☆12 · Mar 13, 2023 · Updated 3 years ago
- Wallace and Dadda tree multiplier generator in VHDL and Verilog ☆14 · Mar 14, 2026 · Updated 3 months ago
- AFP is a hardware-friendly quantization framework for DNNs, contributed by Fangxin Liu and Wenbo Zhao. ☆13 · Nov 8, 2021 · Updated 4 years ago
- A collection of LaTeX templates for papers, résumés, and the like ☆12 · Apr 17, 2022 · Updated 4 years ago
- Sample examples of how to call collective operation functions in multi-GPU environments. A simple example of using broadcast, reduce, all… ☆35 · Aug 28, 2023 · Updated 2 years ago
- Quantize PyTorch models; supports post-training quantization and quantization-aware training methods ☆15 · Jun 15, 2023 · Updated 2 years ago
- Adaptive floating-point based numerical format for resilient deep learning ☆14 · Apr 11, 2022 · Updated 4 years ago
- ☆13 · Oct 5, 2020 · Updated 5 years ago
- Implementation of BERT in pure C++ ☆37 · Apr 29, 2020 · Updated 6 years ago
- PyTorch to CoreML: Writing custom layers with Metal shaders - torch.nn.functional.grid_sample operation ☆16 · Jun 4, 2024 · Updated 2 years ago
- ☆10 · Jun 6, 2023 · Updated 3 years ago
- High performance NCCL plugin for Bagua. ☆15 · Sep 15, 2021 · Updated 4 years ago
- PyTorch implementation of RAPQ, IJCAI 2022 ☆23 · Jul 19, 2023 · Updated 2 years ago
- RocksDB replicated using Robust Distributed System Nucleus (rDSN) (Delta Learning) ☆16 · Sep 15, 2015 · Updated 10 years ago
- A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library ☆13 · Nov 24, 2023 · Updated 2 years ago
- A C++ hash map/table that uses SIMD (specifically Intel x86 SSE/AVX) ☆12 · Apr 30, 2019 · Updated 7 years ago
- Models and training scripts for "LSTMs for Keyword Spotting with ReRAM-based Compute-In-Memory Architectures" (ISCAS 2021). ☆17 · Mar 25, 2021 · Updated 5 years ago
- My own implementation of CNNs ☆14 · Oct 8, 2015 · Updated 10 years ago
- A C++-based RPC framework ☆12 · Oct 28, 2021 · Updated 4 years ago
- **curve_fit_utils** is a Python module containing useful tools for curve fitting ☆18 · Dec 23, 2017 · Updated 8 years ago
- Converts Chinese sentences into pinyin sequences; implemented in C++, very fast and easy to deploy. ☆21 · Jan 5, 2026 · Updated 5 months ago
- PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision", Tolstikhin et al. (2021) ☆31 · May 13, 2021 · Updated 5 years ago
- OneFlow Serving ☆20 · Apr 10, 2025 · Updated last year
- Solves dynamic multiobjective optimization problems using a multiobjective evolutionary algorithm. ☆21 · May 6, 2018 · Updated 8 years ago
- Foolbox implementation for the NeurIPS 2021 paper "Fast Minimum-norm Adversarial Attacks through Adaptive Norm Constraints". ☆25 · Mar 16, 2022 · Updated 4 years ago
- ☆22 · May 24, 2026 · Updated 3 weeks ago
- ☆17 · Feb 24, 2026 · Updated 3 months ago
- SRAM Design using OpenSource Applications ☆25 · Jul 16, 2021 · Updated 4 years ago
- Transformer tokenizers (e.g. BERT tokenizer) in C++ (WIP) ☆18 · Apr 7, 2022 · Updated 4 years ago