tpoisonooo / llama.onnx
LLaMA/RWKV ONNX models, quantization, and test cases
☆366 · Jul 6, 2023 · Updated 2 years ago
Alternatives and similar repositories for llama.onnx
Users interested in llama.onnx are comparing it to the libraries listed below.
- Export LLaMA to ONNX ☆136 · Dec 28, 2024 · Updated last year
- ☆125 · Dec 15, 2023 · Updated 2 years ago
- llm-export can export LLM models to ONNX. ☆344 · Oct 24, 2025 · Updated 3 months ago
- 4-bit quantization of LLaMA using GPTQ ☆3,073 · Jul 13, 2024 · Updated last year
- A converter and basic tester for RWKV ONNX ☆43 · Jan 29, 2024 · Updated 2 years ago
- A tool for parsing, editing, optimizing, and profiling ONNX models ☆480 · Feb 10, 2026 · Updated last week
- A lightweight LLM inference framework ☆747 · Apr 7, 2024 · Updated last year
- A model compression and acceleration toolbox based on PyTorch ☆333 · Jan 12, 2024 · Updated 2 years ago
- mperf: an operator performance tuning toolbox for mobile/embedded platforms ☆192 · Aug 17, 2023 · Updated 2 years ago
- ONNX Optimizer ☆797 · Feb 4, 2026 · Updated last week
- A converter from MegEngine to other frameworks ☆69 · Apr 27, 2023 · Updated 2 years ago
- Row-major matmul optimization ☆701 · Aug 20, 2025 · Updated 5 months ago
- A primitive library for neural networks ☆1,368 · Nov 24, 2024 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆17 · Jun 3, 2024 · Updated last year
- Linux BSP apps & samples for AXPI (AX620A) ☆36 · Jun 21, 2023 · Updated 2 years ago
- MegCC: a deep learning model compiler with an ultra-lightweight runtime, high efficiency, and easy portability ☆487 · Oct 23, 2024 · Updated last year
- ☆17 · Jan 1, 2024 · Updated 2 years ago
- Transformer-related optimization, including BERT and GPT ☆6,392 · Mar 27, 2024 · Updated last year
- Reorder-based post-training quantization for large language models ☆198 · May 17, 2023 · Updated 2 years ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ☆1,609 · Jul 12, 2024 · Updated last year
- PPL Quantization Tool (PPQ), a powerful offline neural network quantization tool ☆1,781 · Mar 28, 2024 · Updated last year
- Universal cross-platform tokenizer bindings for HF and SentencePiece ☆452 · Jan 23, 2026 · Updated 3 weeks ago
- Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers" ☆2,256 · Mar 27, 2024 · Updated last year
- ONNX Command-Line Toolbox ☆35 · Oct 11, 2024 · Updated last year
- ☆28 · Jun 30, 2025 · Updated 7 months ago
- Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052 ☆477 · Mar 15, 2024 · Updated last year
- A general 2-8 bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ and easy export to ONNX/ONNX Runtime ☆184 · Apr 2, 2025 · Updated 10 months ago
- Simplify ONNX models larger than 2 GB ☆71 · Nov 30, 2024 · Updated last year
- Large Language Model ONNX Inference Framework ☆35 · Nov 25, 2025 · Updated 2 months ago
- ppl.cv, a high-performance image processing library of OpenPPL supporting various platforms ☆516 · Oct 30, 2024 · Updated last year
- An easy way to run, test, benchmark, and tune OpenCL kernel files ☆24 · Aug 25, 2023 · Updated 2 years ago
- IntLLaMA: a fast and light quantization solution for LLaMA ☆18 · Jul 21, 2023 · Updated 2 years ago
- ☆141 · Apr 23, 2024 · Updated last year
- A toolkit to help optimize large ONNX models ☆164 · Oct 26, 2025 · Updated 3 months ago
- Common utilities for ONNX converters ☆294 · Dec 16, 2025 · Updated 2 months ago
- ☆23 · Dec 8, 2022 · Updated 3 years ago
- Model Quantization Benchmark ☆858 · Apr 20, 2025 · Updated 9 months ago
- Simplify your ONNX model ☆4,294 · Jan 29, 2026 · Updated 2 weeks ago
- Another ChatGLM2 implementation for GPTQ quantization ☆54 · Oct 15, 2023 · Updated 2 years ago