Simplify ONNX models larger than 2 GB
☆71, updated Nov 30, 2024
Alternatives and similar repositories for onnxsim_large_model
Users that are interested in onnxsim_large_model are comparing it to the libraries listed below
- Export LLaMA to ONNX (☆135, updated Dec 28, 2024)
- Run ChatGLM2-6B on BM1684X (☆48, updated Mar 1, 2024)
- How to export Hugging Face's 🤗 NLP Transformers models to ONNX and use the exported model with the appropriate Transformers pipeline (☆25, updated Apr 19, 2022)
- RWKV6 in native PyTorch and Triton :) (☆11, updated Aug 4, 2024)
- (no description) (☆44, updated Jul 5, 2024)
- Sophgo AI chip driver and runtime library (☆23, updated Mar 11, 2026)
- Inference deployment of Llama 3 (☆10, updated Apr 21, 2024)
- LLaMA/RWKV ONNX models, quantization, and test cases (☆366, updated Jul 6, 2023)
- A gesture recognition module trained from scratch using PyTorch, deployed with ncnn and TensorRT (☆13, updated May 1, 2022)
- A toolkit to help optimize large ONNX models (☆165, updated Oct 26, 2025)
- A Whisper repo for TPU (☆11, updated Jun 4, 2024)
- A simple cycle-accurate DaDianNao simulator (☆13, updated Mar 27, 2019)
- Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference (☆45, updated Jun 11, 2025)
- An EXPERIMENTAL implementation of Stable Diffusion in .NET, ported from Python libraries by Hugging Face (☆15, updated Oct 30, 2023)
- Natural language control for Python CLI tools using locally trained SLMs (CPU inference) (☆30, updated Feb 21, 2026)
- (no description) (☆16, updated Nov 14, 2023)
- Local knowledge-base Q&A for Sophon BM1684X, based on LangChain and language models such as ChatGLM (☆14, updated Jun 5, 2024)
- ASIC simulation of a multi-ported memory module. It can offer an SRAM-based dual-port basic building block to support multiple read/write … (☆23, updated May 30, 2016)
- (no description) (☆13, updated Nov 25, 2022)
- Detection and tracking ROS node based on CenterPoint and a Kalman filter (☆23, updated Feb 24, 2024)
- Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores (☆72, updated Sep 8, 2024)
- Low-Rank Llama Custom Training (☆23, updated Mar 27, 2024)
- A simple C# demo for stable-diffusion.cpp using only safe code (☆16, updated Mar 25, 2024)
- A general 2- to 8-bit quantization toolbox with GPTQ/AWQ/HQQ/VPTQ and easy export to ONNX/ONNX Runtime (☆186, updated Mar 4, 2026)
- llm-export can export LLMs to ONNX (☆344, updated Oct 24, 2025)
- Stable Diffusion model v1.5 for TorchSharp (☆19, updated Aug 6, 2024)
- Official repo for the AAAI 2026 accepted paper "Rethinking the Spatio-Temporal Alignment of End-to-End 3D Perception" (☆29, updated Jan 13, 2026)
- LLM deployment project based on ONNX (☆49, updated Oct 9, 2024)
- My implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated (☆34, updated Aug 14, 2024)
- (no description) (☆31, updated Jul 2, 2025)
- (no description) (☆37, updated Feb 10, 2026)
- [NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer (☆30, updated Dec 6, 2023)
- Minimal C implementation of speculative decoding based on llama2.c (☆28, updated Jul 15, 2024)
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi… (☆23, updated Oct 1, 2025)
- (no description) (☆124, updated Dec 15, 2023)
- Contextual Position Encoding but with some custom CUDA kernels https://arxiv.org/abs/2405.18719 (☆22, updated Jun 5, 2024)
- Use YOLOv7 in Java for object detection and pose estimation (YOLOv7 & Java & object detection & pose recognition) (☆20, updated Apr 25, 2023)
- AI toolbox and pretrained models (☆43, updated Feb 7, 2024)
- Simple synthetic head generator (☆20, updated Dec 2, 2024)