TLLM_QMM extracts the quantized kernel implementations from NVIDIA's TensorRT-LLM, removes the NVInfer dependency, and exposes an easy-to-use PyTorch module. We modified the dequantization and weight preprocessing to align with popular quantization algorithms such as AWQ and GPTQ, and combined them with new FP8 quantization.
☆16 Jul 5, 2024 Updated last year
Alternatives and similar repositories for TLLM_QMM
Users that are interested in TLLM_QMM are comparing it to the libraries listed below.
- ☆11 Jun 4, 2024 Updated last year
- Awesome Quantization Paper lists with Codes ☆10 Feb 24, 2021 Updated 5 years ago
- ☆26 Feb 17, 2025 Updated last year
- Möbius Transformation for Fast Inner Product Search on Graph ☆22 Jun 3, 2021 Updated 4 years ago
- Bagua tutorials. ☆13 Sep 4, 2022 Updated 3 years ago
- ☆27 Jan 8, 2024 Updated 2 years ago
- PyTorch code for our paper "Progressive Binarization with Semi-Structured Pruning for LLMs" ☆13 Mar 11, 2026 Updated last month
- Deep Variational Information Bottleneck (DVIB) in PyTorch. ☆10 Apr 25, 2020 Updated 5 years ago
- ☆10 Mar 6, 2016 Updated 10 years ago
- Convert the PyTorch MaskRCNN model using the coremltool ☆10 Feb 8, 2025 Updated last year
- PyTorch implementation of "Deep Transferring Quantization" (ECCV2020) ☆18 Jun 22, 2022 Updated 3 years ago
- TensorRT-in-Action is a GitHub repository that provides code examples for using TensorRT, with accompanying Jupyter Notebooks. ☆15 Jun 1, 2023 Updated 2 years ago
- Let's rob the fat guys, and publish everything into feeds. ☆11 Jun 1, 2015 Updated 10 years ago
- Fast metrics compatible with Prometheus, StatsD, and M3. ☆14 Sep 7, 2023 Updated 2 years ago
- 16 bit serial multiplier in SystemVerilog ☆13 Oct 13, 2018 Updated 7 years ago
- ☆14 Mar 21, 2020 Updated 6 years ago
- AFP is a hardware-friendly quantization framework for DNNs, which is contributed by Fangxin Liu and Wenbo Zhao. ☆13 Nov 8, 2021 Updated 4 years ago
- Google Docs–style collaboration via the use of operational transforms ☆22 Mar 28, 2015 Updated 11 years ago
- A collection of LaTeX templates for papers, résumés, and the like. ☆12 Apr 17, 2022 Updated 3 years ago
- Sample examples of how to call collective operation functions on multi-GPU environments. A simple example of using broadcast, reduce, all… ☆35 Aug 28, 2023 Updated 2 years ago
- Quantize pytorch model, support post-training quantization and quantization aware training methods ☆14 Jun 15, 2023 Updated 2 years ago
- Adaptive floating-point based numerical format for resilient deep learning ☆14 Apr 11, 2022 Updated 4 years ago
- implement bert in pure c++ ☆37 Apr 29, 2020 Updated 5 years ago
- ECE 5745 Tutorial 8: SRAM Generators ☆14 Mar 5, 2022 Updated 4 years ago
- PyTorch to CoreML: Writing custom layers with Metal shaders - torch.nn.functional.grid_sample operation ☆16 Jun 4, 2024 Updated last year
- Generate a Verilog Source file and testbench file for a given Moore FSM ☆17 Nov 18, 2012 Updated 13 years ago
- C library for handling proxy autoconfiguration (PAC) files. ☆18 Mar 22, 2018 Updated 8 years ago
- The project of iOS and Android clients of SDUFE. ☆13 Dec 28, 2015 Updated 10 years ago
- ☆10 Jun 6, 2023 Updated 2 years ago
- Pytorch implementation of RAPQ, IJCAI 2022 ☆23 Jul 19, 2023 Updated 2 years ago
- A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library ☆13 Nov 24, 2023 Updated 2 years ago
- A c++ hash map/table which utilizes simd (specifically Intel x86 SSE/AVX) ☆12 Apr 30, 2019 Updated 6 years ago
- A highly optimized LLM inference acceleration engine for Llama and its variants. ☆905 Mar 18, 2026 Updated 3 weeks ago
- Models and training scripts for "LSTMs for Keyword Spotting with ReRAM-based Compute-In-Memory Architectures" (ISCAS 2021). ☆17 Mar 25, 2021 Updated 5 years ago
- ☆13 Jan 31, 2016 Updated 10 years ago
- A C++-based RPC framework ☆12 Oct 28, 2021 Updated 4 years ago
- **curve_fit_utils** is a Python module containing useful tools for curve fitting ☆18 Dec 23, 2017 Updated 8 years ago
- Converting Chinese sentences into pinyin sequences, implemented in C++, very fast and easy to deploy. ☆21 Jan 5, 2026 Updated 3 months ago
- ROS TensorRT Inference Nodes for DIGITS on the Jetson ☆15 Apr 6, 2019 Updated 7 years ago