TLLM_QMM extracts the quantized-kernel implementation from NVIDIA's TensorRT-LLM, removes the NVInfer dependency, and exposes the kernels as an easy-to-use PyTorch module. We modified the dequantization and weight preprocessing to align with popular quantization algorithms such as AWQ and GPTQ, and combined them with new FP8 quantization.
☆16 · Jul 5, 2024 · Updated last year
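AWQ and GPTQ both store weights as low-bit integers and keep one scale and one zero point per group of input channels. Approximate floating-point weights are recovered as w ≈ (q − z) · s. The sketch below shows that group-wise dequantization step in pure Python. It assumes 4-bit values are packed eight per 32-bit word, low nibble first. That packing order and the names `unpack_int4` and `dequantize_groupwise` are illustrative assumptions, not TLLM_QMM's API. Real AWQ/GPTQ checkpoints and TensorRT-LLM kernels use their own, often interleaved, layouts.

```python
from typing import List


def unpack_int4(packed: List[int]) -> List[int]:
    """Unpack each 32-bit word into eight unsigned 4-bit values, low nibble first.

    Assumption: simple sequential packing. Real AWQ/GPTQ formats may interleave nibbles.
    """
    values = []
    for word in packed:
        for i in range(8):
            values.append((word >> (4 * i)) & 0xF)
    return values


def dequantize_groupwise(
    q: List[int], scales: List[float], zeros: List[int], group_size: int
) -> List[float]:
    """Asymmetric group-wise dequantization: w = (q - zero) * scale.

    Each consecutive run of `group_size` values shares one (scale, zero) pair.
    """
    if len(scales) != len(zeros) or len(q) != len(scales) * group_size:
        raise ValueError("expected exactly one scale and one zero point per group")
    return [(v - zeros[i // group_size]) * scales[i // group_size] for i, v in enumerate(q)]


if __name__ == "__main__":
    q = unpack_int4([0x76543210])  # eight 4-bit values: 0..7
    # Two groups of four values each, with a zero point of 8 (the midpoint of 0..15)
    print(dequantize_groupwise(q, scales=[0.5, 0.25], zeros=[8, 8], group_size=4))
```

A production kernel fuses this arithmetic into the matrix multiply instead of materializing the floating-point weights. The list-based version only makes the per-group indexing explicit.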
Alternatives and similar repositories for TLLM_QMM
Users interested in TLLM_QMM are comparing it to the libraries listed below
- ☆26 · Feb 17, 2025 · Updated last year
- Sample examples of how to call collective operation functions in multi-GPU environments. A simple example of using broadcast, reduce, all… ☆35 · Aug 28, 2023 · Updated 2 years ago
- This example identifies a tenant by sys_id + organization_id and rewrites the tenant type of tenant_id in mybatis-plus ☆13 · Mar 3, 2020 · Updated 6 years ago
- Chinese corpus: a large number of manually annotated samples, very valuable!!! ☆11 · Aug 15, 2019 · Updated 6 years ago
- implement bert in pure c++ ☆37 · Apr 29, 2020 · Updated 5 years ago
- SmartBuf is a cross-language serialization and deserialization framework, and it has high performance and compression ratio like … ☆11 · Dec 5, 2023 · Updated 2 years ago
- Kibana Plugin to Associate custom CSS to Dashboards ☆11 · May 11, 2021 · Updated 4 years ago
- ☆12 · Mar 13, 2023 · Updated 2 years ago
- Bagua tutorials. ☆13 · Sep 4, 2022 · Updated 3 years ago
- A c++ hash map/table which utilizes simd (specifically Intel x86 SSE/AVX) ☆11 · Apr 30, 2019 · Updated 6 years ago
- The library to generate sample JSON data from JSON schema ☆11 · Mar 7, 2021 · Updated 4 years ago
- ☆10 · Jun 6, 2023 · Updated 2 years ago
- A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library ☆13 · Nov 24, 2023 · Updated 2 years ago
- Voyager is a C++ non-blocking network library which can run on Linux, Mac OS X, FreeBSD, etc. ☆12 · Sep 8, 2022 · Updated 3 years ago
- Awesome Quantization Paper lists with Codes ☆10 · Feb 24, 2021 · Updated 5 years ago
- 16 bit serial multiplier in SystemVerilog ☆13 · Oct 13, 2018 · Updated 7 years ago
- PyTorch code for our paper "Progressive Binarization with Semi-Structured Pruning for LLMs" ☆13 · Sep 28, 2025 · Updated 5 months ago
- A visual POJO editing solution, including a parser from POJO to JsonSchema. Automatically generate JsonSchema according to the class information of the POJO ☆13 · Jun 17, 2022 · Updated 3 years ago
- The extension project of MyBatis-Plus(MP) CRUD. ☆11 · Apr 2, 2024 · Updated last year
- ☆13 · Oct 5, 2020 · Updated 5 years ago
- OpenAI compatible API for open source LLMs ☆16 · Oct 30, 2023 · Updated 2 years ago
- The first open source triton inference engine for Stable Diffusion, specifically for sdxl ☆12 · Nov 27, 2023 · Updated 2 years ago
- Convert the PyTorch MaskRCNN model using the coremltool ☆10 · Feb 8, 2025 · Updated last year
- Special Node-RED node used in Distributed Node-RED ☆16 · May 14, 2019 · Updated 6 years ago
- CentOS docker images, built weekly with latest security updates ☆11 · Updated this week
- Wraps RocksDB inside a server that talks like Redis. ☆31 · Feb 20, 2014 · Updated 12 years ago
- No pain HTML parsing library. ☆12 · Apr 2, 2018 · Updated 7 years ago
- A C++-based RPC framework ☆12 · Oct 28, 2021 · Updated 4 years ago
- Deep Variational Information Bottleneck (DVIB) in PyTorch. ☆10 · Apr 25, 2020 · Updated 5 years ago
- Low Precision Arithmetic Simulation in PyTorch - extension for posit and beyond ☆16 · Dec 9, 2025 · Updated 2 months ago
- Java SPI framework ☆12 · Nov 26, 2017 · Updated 8 years ago
- FMO (Friendli Model Optimizer) ☆13 · Jan 8, 2025 · Updated last year
- Elasticsearch "ignore tf-idf" plugin ☆13 · Jan 1, 2019 · Updated 7 years ago
- FlexFlow Serve: Low-Latency, High-Performance LLM Serving ☆74 · Sep 15, 2025 · Updated 5 months ago
- ELK Timelion's data source for Grafana ☆15 · Dec 7, 2022 · Updated 3 years ago
- This repository contains some sentiment analysis models and sequence tagging models, including BiLSTM, TextCNN, BERT for both tasks. All … ☆13 · Feb 1, 2023 · Updated 3 years ago
- Agent and Subagent example in OpenCode and ClaudeCode ☆35 · Sep 11, 2025 · Updated 5 months ago
- sh - Super fast Alfred 3+ workflow to search through Chrome history 🕵️♀️ ☆13 · Nov 28, 2020 · Updated 5 years ago
- ☆13 · Nov 28, 2014 · Updated 11 years ago