TLLM_QMM extracts the quantized kernel implementations from NVIDIA's TensorRT-LLM, removes the NVInfer dependency, and exposes them as an easy-to-use PyTorch module. We modified the dequantization and weight preprocessing to align with popular quantization algorithms such as AWQ and GPTQ, and combined them with new FP8 quantization.
☆16Jul 5, 2024Updated 2 years ago
Alternatives and similar repositories for TLLM_QMM
Users who are interested in TLLM_QMM are comparing it to the libraries listed below.
- ☆28Jul 1, 2025Updated last year
- Chinese corpus: a large number of manually annotated samples, very valuable!!!☆11Aug 15, 2019Updated 6 years ago
- Awesome Quantization Paper lists with Codes☆10Feb 24, 2021Updated 5 years ago
- ☆26Feb 17, 2025Updated last year
- ☆27Jan 8, 2024Updated 2 years ago
- An operator for managing Alluxio system on Kubernetes cluster☆13Jan 9, 2024Updated 2 years ago
- Deep Variational Information Bottleneck (DVIB) in PyTorch.☆10Apr 25, 2020Updated 6 years ago
- PyTorch implementation of "Deep Transferring Quantization" (ECCV2020)☆18Jun 22, 2022Updated 4 years ago
- TensorRT-in-Action is a GitHub repository that provides code examples for using TensorRT, with accompanying Jupyter Notebooks.☆15Jun 1, 2023Updated 3 years ago
- Low Precision Arithmetic Simulation in PyTorch - extension for posit and beyond☆16Dec 9, 2025Updated 6 months ago
- Let's rob the fat guys, and publish everything into feeds.☆11Jun 1, 2015Updated 11 years ago
- Fast metrics compatible with Prometheus, StatsD, and M3.☆14Sep 7, 2023Updated 2 years ago
- ☆12Mar 13, 2023Updated 3 years ago
- Wallace and Dadda tree multiplier generator in vhdl and verilog☆14Mar 14, 2026Updated 3 months ago
- AFP is a hardware-friendly quantization framework for DNNs, which is contributed by Fangxin Liu and Wenbo Zhao.☆13Nov 8, 2021Updated 4 years ago
- Google Docs–style collaboration via the use of operational transforms☆22Mar 28, 2015Updated 11 years ago
- Sample examples of how to call collective operation functions on multi-GPU environments. A simple example of using broadcast, reduce, all…☆35Aug 28, 2023Updated 2 years ago
- Adaptive floating-point based numerical format for resilient deep learning☆14Apr 11, 2022Updated 4 years ago
- ☆13Oct 5, 2020Updated 5 years ago
- implement bert in pure c++☆37Apr 29, 2020Updated 6 years ago
- Generate a Verilog Source file and testbench file for a given Moore FSM☆17Nov 18, 2012Updated 13 years ago
- The project of iOS and Android clients of SDUFE.☆12Dec 28, 2015Updated 10 years ago
- Routing framework based on Promise using CoffeeScript☆13Jun 28, 2015Updated 11 years ago
- ☆11Jun 6, 2023Updated 3 years ago
- High performance NCCL plugin for Bagua.☆15Sep 15, 2021Updated 4 years ago
- Pytorch implementation of RAPQ, IJCAI 2022☆23Jul 19, 2023Updated 2 years ago
- A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library☆13Nov 24, 2023Updated 2 years ago
- A c++ hash map/table which utilizes simd (specifically Intel x86 SSE/AVX)☆12Apr 30, 2019Updated 7 years ago
- A highly optimized LLM inference acceleration engine for Llama and its variants.☆905Mar 18, 2026Updated 3 months ago
- Models and training scripts for "LSTMs for Keyword Spotting with ReRAM-based Compute-In-Memory Architectures" (ISCAS 2021).☆17Mar 25, 2021Updated 5 years ago
- A C++-based RPC framework☆12Oct 28, 2021Updated 4 years ago
- **curve_fit_utils** is a Python module containing useful tools for curve fitting☆18Dec 23, 2017Updated 8 years ago
- Image to text using attention☆18Aug 30, 2017Updated 8 years ago
- Converting Chinese sentences into pinyin sequences, implemented in C++, very fast and easy to deploy.☆23Jan 5, 2026Updated 6 months ago
- ROS TensorRT Inference Nodes for DIGITS on the Jetson☆15Apr 6, 2019Updated 7 years ago
- PyTorch implementation of "MLP-Mixer: An all-MLP Architecture for Vision" Tolstikhin et al. (2021)☆31May 13, 2021Updated 5 years ago
- ☆22Jul 11, 2023Updated 2 years ago
- OneFlow Serving☆20Apr 10, 2025Updated last year
- unofficial memobird node sdk☆13May 27, 2017Updated 9 years ago