NART (NART Is Not A RunTime) is a deep learning inference framework.
☆37 · Updated Mar 2, 2023
Alternatives and similar repositories for NART
Users interested in NART are comparing it to the libraries listed below.
- Offline quantization tools for deployment. ☆142 · Updated Dec 28, 2023
- ☆11 · Updated Jan 10, 2025
- Built upon Megatron-Deepspeed and HuggingFace Trainer, EasyLLM has reorganized the code logic with a focus on usability. While enhancing … ☆49 · Updated Sep 18, 2024
- Summary of system papers/frameworks/codes/tools on training or serving large models ☆57 · Updated Dec 17, 2023
- ☆18 · Updated Nov 29, 2023
- ☆21 · Updated Feb 11, 2022
- Model Quantization Benchmark ☆858 · Updated Apr 20, 2025
- A toolkit for developers to simplify the transformation of nn.Module instances. It now corresponds to torch.fx. ☆13 · Updated Apr 7, 2023
- A TensorFlow.keras implementation of Same, Same But Different - Recovering Neural Network Quantization Error Through Weight Factorizatio… ☆10 · Updated Dec 18, 2019
- A tool for model sparsification based on torch.fx ☆13 · Updated Jun 3, 2024
- ☆10 · Updated Aug 4, 2020
- Model Quantization Benchmark ☆18 · Updated Sep 30, 2025
- ☆13 · Updated Jun 16, 2024
- ☆38 · Updated Oct 12, 2024
- ONNX Command-Line Toolbox ☆35 · Updated Oct 11, 2024
- A Triton JIT runtime and FFI provider in C++ ☆31 · Updated this week
- High-performance FP8 GEMM kernels for SM89 and later GPUs. ☆20 · Updated Jan 24, 2025
- ☆37 · Updated Jun 1, 2022
- Source code of the paper "Robust Quantization: One Model to Rule Them All" ☆41 · Updated Mar 24, 2023
- The docs repository of Pulsar2, AXera's 2nd-generation SoC AI toolchain (e.g. AX650A, AX650N). ☆17 · Updated Feb 12, 2026
- This project is the official implementation of our accepted IEEE TPAMI paper Diverse Sample Generation: Pushing the Limit of Data-free Qu… ☆15 · Updated Feb 26, 2023
- [AAAI 2023] Efficient and Accurate Models towards Practical Deep Learning Baseline ☆13 · Updated Nov 29, 2022
- [TCAD 2021] Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA ☆17 · Updated Jul 7, 2022
- ☆37 · Updated Aug 5, 2022
- United Perception ☆436 · Updated Dec 5, 2022
- IntLLaMA: A fast and lightweight quantization solution for LLaMA ☆18 · Updated Jul 21, 2023
- PyTorch implementation of Near-Lossless Post-Training Quantization of Deep Neural Networks via a Piecewise Linear Approximation ☆23 · Updated Feb 17, 2020
- Inference of quantization-aware trained networks using TensorRT ☆83 · Updated Jan 27, 2023
- ☆23 · Updated Jan 3, 2024
- A framework to compare low-bit integer and floating-point formats ☆66 · Updated Feb 6, 2026
- [ICML 2025] The official PyTorch implementation of "OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniv… ☆27 · Updated Jun 16, 2025
- Official implementation of the EMNLP 2023 paper Outlier Suppression+: Accurate quantization of large language models by equivalent and opti… ☆50 · Updated Oct 21, 2023
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆178 · Updated Feb 19, 2026
- ☆25 · Updated Sep 19, 2025
- A primitive library for neural networks ☆1,366 · Updated Nov 24, 2024
- Improving Post Training Neural Quantization: Layer-wise Calibration and Integer Programming ☆98 · Updated Jun 10, 2021
- A model compression and acceleration toolbox based on PyTorch. ☆333 · Updated Jan 12, 2024
- ☆60 · Updated Nov 21, 2024
- CVFusion is an open-source deep learning compiler that fuses OpenCV operators. ☆33 · Updated Aug 31, 2022