TLLM_QMM extracts the quantized kernel implementations from NVIDIA's TensorRT-LLM, removes the NVInfer dependency, and exposes an easy-to-use PyTorch module. We modified the dequantization and weight preprocessing to align with popular quantization algorithms such as AWQ and GPTQ, and combined them with new FP8 quantization.
☆16Jul 5, 2024Updated last year
Alternatives and similar repositories for TLLM_QMM
Users that are interested in TLLM_QMM are comparing it to the libraries listed below.
- ☆10Jun 4, 2024Updated last year
- Chinese corpus: a large number of manually annotated samples, very valuable!!!☆11Aug 15, 2019Updated 6 years ago
- Awesome Quantization Paper lists with Codes☆10Feb 24, 2021Updated 5 years ago
- ☆26Feb 17, 2025Updated last year
- Bagua tutorials.☆13Sep 4, 2022Updated 3 years ago
- ☆11Mar 18, 2019Updated 7 years ago
- ☆13Nov 28, 2014Updated 11 years ago
- ☆27Jan 8, 2024Updated 2 years ago
- ☆10Jun 28, 2019Updated 6 years ago
- An operator for managing Alluxio system on Kubernetes cluster☆13Jan 9, 2024Updated 2 years ago
- Deep Variational Information Bottleneck (DVIB) in PyTorch.☆10Apr 25, 2020Updated 5 years ago
- Convert the PyTorch MaskRCNN model using the coremltool☆10Feb 8, 2025Updated last year
- ☆24Nov 17, 2021Updated 4 years ago
- PyTorch implementation of "Deep Transferring Quantization" (ECCV2020)☆18Jun 22, 2022Updated 3 years ago
- TensorRT-in-Action is a GitHub repository that provides code examples for using TensorRT, with corresponding Jupyter Notebooks.☆15Jun 1, 2023Updated 2 years ago
- Low Precision Arithmetic Simulation in PyTorch - extension for posit and beyond☆16Dec 9, 2025Updated 3 months ago
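- A highly efficient string-matching tool that supports forward/backward maximum-matching word segmentation and exact multi-pattern string matching☆16Jul 29, 2023Updated 2 years ago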
- support for embedding graphviz graphs inside markdown documents☆27Jan 16, 2010Updated 16 years ago
- 16 bit serial multiplier in SystemVerilog☆13Oct 13, 2018Updated 7 years ago
- ☆12Mar 13, 2023Updated 3 years ago
- AFP is a hardware-friendly quantization framework for DNNs, which is contributed by Fangxin Liu and Wenbo Zhao.☆13Nov 8, 2021Updated 4 years ago
- The admin backend for an order-brushing (fake order) platform, mainly for managing merchants, employees, orders, tasks, and so on☆11May 8, 2019Updated 6 years ago
- ECE 5745 Tutorial 8: SRAM Generators☆14Mar 5, 2022Updated 4 years ago
- Quantize pytorch model, support post-training quantization and quantization aware training methods☆14Jun 15, 2023Updated 2 years ago
- Adaptive floating-point based numerical format for resilient deep learning☆14Apr 11, 2022Updated 3 years ago
- ☆13Oct 5, 2020Updated 5 years ago
- implement bert in pure c++☆37Apr 29, 2020Updated 5 years ago
- PyTorch to CoreML: Writing custom layers with Metal shaders - torch.nn.functional.grid_sample operation☆16Jun 4, 2024Updated last year
- Generate a Verilog Source file and testbench file for a given Moore FSM☆17Nov 18, 2012Updated 13 years ago
- A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library☆13Nov 24, 2023Updated 2 years ago
- A c++ hash map/table which utilizes simd (specifically Intel x86 SSE/AVX)☆11Apr 30, 2019Updated 6 years ago
- Models and training scripts for "LSTMs for Keyword Spotting with ReRAM-based Compute-In-Memory Architectures" (ISCAS 2021).☆17Mar 25, 2021Updated 5 years ago
- the completion of CNNs by myself☆14Oct 8, 2015Updated 10 years ago
- **curve_fit_utils** is a Python module containing useful tools for curve fitting☆18Dec 23, 2017Updated 8 years ago
- Image to text using attention☆18Aug 30, 2017Updated 8 years ago
- Converting Chinese sentences into pinyin sequences, implemented in C++, very fast and easy to deploy.☆20Jan 5, 2026Updated 2 months ago
- ROS TensorRT Inference Nodes for DIGITS on the Jetson☆15Apr 6, 2019Updated 6 years ago
- ☆22Jul 11, 2023Updated 2 years ago
- 2020 Language and Intelligence Technology Competition: relation extraction task (https://aistudio.baidu.com/aistudio/competition/detail/31?lang=zh_CN)☆24May 19, 2020Updated 5 years ago