High Performance Int8 GEMM Kernels for SM80 and later GPUs.
☆20Mar 11, 2025Updated 11 months ago
Alternatives and similar repositories for gemm-int8
Users who are interested in gemm-int8 are comparing it to the libraries listed below.
- High Performance FP8 GEMM Kernels for SM89 and later GPUs.☆20Jan 24, 2025Updated last year
- PyTorch Quantization Framework For OCP MX Datatypes.☆16May 30, 2025Updated 9 months ago
- IntLLaMA: A fast and light quantization solution for LLaMA☆18Jul 21, 2023Updated 2 years ago
- Faster Pytorch bitsandbytes 4bit fp4 nn.Linear ops☆30Mar 16, 2024Updated last year
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API☆35Sep 15, 2023Updated 2 years ago
- ☆31Jun 15, 2022Updated 3 years ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme☆113Dec 2, 2025Updated 3 months ago
- [CVPR 2025 Highlight] FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation☆26Jun 16, 2025Updated 8 months ago
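- A 2024-maintained (forked) version of yolov5+deepsort object detection and tracking that displays object classes and can train on your own dataset. Includes some test videos to try out, and provides recognition output in both txt and json formats. Suitable for recognition projects, road-surface recognition, intelligent transportation, graduation projects, and more.☆10Feb 28, 2024Updated 2 years ago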
- Decentralized, transparent, verifiable and anonymous voting app☆12Feb 20, 2023Updated 3 years ago
- Deploy Yolo series algorithms on Hisilicon platform hi3516, including yolov3, yolov5, yolox, etc☆11Mar 25, 2022Updated 3 years ago
- RISC-V emulator in Zig☆15Nov 4, 2023Updated 2 years ago
- [Qt5 Development and Examples (3rd Edition)][Lu Wenzhou][Program source code]☆10May 23, 2018Updated 7 years ago
- An object detection model for NMNIST larger video frame☆12Feb 24, 2022Updated 4 years ago
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization"☆117Oct 15, 2025Updated 4 months ago
- Express DLA implementation for FPGA, revised based on NVDLA.☆11Oct 17, 2019Updated 6 years ago
- YOLOv12 TensorRT end-to-end accelerated model inference and INT8 quantization implementation☆13Mar 5, 2025Updated 11 months ago
- Command-Line Argument Parser for C++20☆23Jan 1, 2026Updated 2 months ago
- ☆10Sep 7, 2022Updated 3 years ago
- ☆12Mar 24, 2021Updated 4 years ago
- Pedestrian detection python tool for Detectron framework☆11Nov 12, 2019Updated 6 years ago
- YOLOv1 implementation using PyTorch☆11Jan 18, 2023Updated 3 years ago
- Camouflage YOLO - (CAMOLO) trains adversarial patches to confuse the YOLO family of object detectors.☆12Oct 20, 2022Updated 3 years ago
- A tracking scheme developed by integrating six tracking methods, DeepSORT StrongSORT OSNet HybridSORT, OCSORT, and ByteTrack, using yolov…☆12Feb 22, 2024Updated 2 years ago
- A low-resource native app for sharing space with co-workers and friends.☆15Feb 20, 2025Updated last year
- Work in progress rust bindings to ggml☆12May 1, 2023Updated 2 years ago
- Lightweight behavior tree implementation in Rust☆11Jan 4, 2026Updated last month
- Official implementation of "Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent".☆21May 23, 2025Updated 9 months ago
- Tensor library for machine learning, inspired by ggml☆10Mar 17, 2024Updated last year
- ☆11Apr 3, 2023Updated 2 years ago
- EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation☆27Jul 30, 2025Updated 7 months ago
- The official code for [ECCV2020] "HALO: Hardware-aware Learning to Optimize"☆10Mar 22, 2023Updated 2 years ago
- A FantasyConsole compiled as WebAssembly and written in Zig☆14Feb 13, 2023Updated 3 years ago
- Training Quantized Neural Networks with a Full-precision Auxiliary Module☆13Jun 19, 2020Updated 5 years ago
- Automatic login for the Beihang University (BUAA) campus network gateway☆10Nov 8, 2021Updated 4 years ago
- ☆12Aug 18, 2023Updated 2 years ago
- YoloV6 for a bare Raspberry Pi using ncnn.☆11Jun 12, 2024Updated last year
- Codes for our paper "Exploring Bit-Slice Sparsity in Deep Neural Networks for Efficient ReRAM-Based Deployment" [NeurIPS'19 EMC2 workshop]…☆10Oct 12, 2020Updated 5 years ago
- YOLOv5s inference In C# and Training In Python☆10May 30, 2022Updated 3 years ago