luchangli03/onnxsim_large_model

Simplify large (>2 GB) ONNX models

☆72

Alternatives and similar repositories for onnxsim_large_model

Users that are interested in onnxsim_large_model are comparing it to the libraries listed below.

luchangli03 / export_llama_to_onnx
export llama to onnx
☆138 · Dec 28, 2024 · Updated last year
sophgo / ChatGLM2-TPU
run ChatGLM2-6B in BM1684X
☆49 · Mar 1, 2024 · Updated 2 years ago
chainyo / transformers-pipeline-onnx
How to export Hugging Face's 🤗 NLP Transformers models to ONNX and use the exported model with the appropriate Transformers pipeline.
☆25 · Apr 19, 2022 · Updated 4 years ago
zhiyuan1i / TorchRWKV
RWKV6 in native pytorch and triton:)
☆11 · Aug 4, 2024 · Updated last year
tpoisonooo / llama.onnx
LLaMa/RWKV onnx models, quantization and testcase
☆368 · Jul 6, 2023 · Updated 3 years ago
sophgo / sophon-pipeline
☆44 · Jul 5, 2024 · Updated 2 years ago
ZhangGe6 / GestureDet
A gesture recognition module trained from scratch using Pytorch, deployed with ncnn and TensorRT.
☆14 · May 1, 2022 · Updated 4 years ago
xlite-dev / qwen-image-fast
⚡️Qwen-Image 4.8x🎉 speedup with Hybrid Acceleration for low VRAM GPUs
☆17 · Oct 24, 2025 · Updated 8 months ago
reger-men / HPL_GPU
High-Performance Linpack Benchmark adopted version for GPU backend
☆12 · Sep 12, 2022 · Updated 3 years ago
htshinichi / onnx-yolov3
use yolov3 onnx model to implement object detection
☆11 · Apr 25, 2019 · Updated 7 years ago
IronySuzumiya / NiuDianNao
A simple cycle-accurate DaDianNao simulator
☆13 · Mar 27, 2019 · Updated 7 years ago
Bruce-Lee-LY / decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference.
☆47 · Jun 11, 2025 · Updated last year
EdVince / model_zoo
Recording models
☆12 · Sep 19, 2023 · Updated 2 years ago
JKay0327 / whisper-TPU_py
A whisper repo for TPU
☆11 · Jun 4, 2024 · Updated 2 years ago
gianni-rg / SharpDiffusion
An EXPERIMENTAL implementation of Stable Diffusion in .NET, ported from Python libraries by Huggingface
☆15 · Oct 30, 2023 · Updated 2 years ago
CalvinXKY / BasicCUDA
A tutorial for CUDA&PyTorch
☆475 · Mar 23, 2026 · Updated 3 months ago
VITA-Group / READ-ME
[NeurIPS2024] "Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design", Ruisi Cai, Yeonju Ro, Geon-Woo …
☆16 · Dec 16, 2024 · Updated last year
wangyifan2018 / ChatDoc-TPU
Local knowledge-base question answering for Sophon BM1684X, based on LangChain and language models such as ChatGLM
☆13 · Jun 5, 2024 · Updated 2 years ago
VITA-Group / R-Sparse
[ICLR'25] R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
☆21 · Apr 28, 2025 · Updated last year
shouxieai / bevfusion_02hero
☆17 · Nov 14, 2023 · Updated 2 years ago
yuunnn-w / RWKV_QQBot_BackEnd
A Python QQ robot backend based on the Shamrock framework, which is used to connect the RWKV large language model to QQ. …
☆23 · Mar 20, 2024 · Updated 2 years ago
wangzhaode / llm-export
llm-export can export llm model to onnx.
☆353 · May 8, 2026 · Updated 2 months ago
HarryWu99 / funny_cute
Some funny cute/cuteDSL code snippets
☆33 · Mar 2, 2026 · Updated 4 months ago
wejoncy / QLLM
A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily.
☆190 · Mar 23, 2026 · Updated 3 months ago
IntptrMax / StableDiffusionGGMLSharp
This is a simple C# demo for stable-diffusion.cpp with safe code only.
☆16 · Mar 25, 2024 · Updated 2 years ago
JYS997760473 / CenterPoint-ROS-Detection-and-Tracking
Detection and Tracking ROS node based on CenterPoint and Kalman Filter
☆24 · Feb 24, 2024 · Updated 2 years ago
Bruce-Lee-LY / cuda_hgemv
Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.
☆75 · Sep 8, 2024 · Updated last year
ant-research / M2-Miner
[ICLR 2026] M2-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining
☆55 · Apr 22, 2026 · Updated 2 months ago
syifan / hip_programming_examples
☆15 · Jul 15, 2023 · Updated 3 years ago
JJXiangJiaoJun / cutlass_gemv
GEMV implementation with CUTLASS
☆21 · Aug 21, 2025 · Updated 11 months ago
williamlzw / StableDiffusionTorchSharp
Stable Diffusion model v1.5 for TorchSharp
☆18 · Aug 6, 2024 · Updated last year
wangzhaode / onnx-llm
llm deploy project based onnx.
☆49 · Oct 9, 2024 · Updated last year
daquexian / faster-rwkv
☆126 · Dec 15, 2023 · Updated 2 years ago
yjybuaa / RGBDAerialTracking
☆10 · May 23, 2023 · Updated 3 years ago
hikvision-research / Unified-Normalization
Unified Normalization (ACM MM'22), by Qiming Yang, Kai Zhang, Chaoxiang Lan, Zhi Yang, Zheyang Li, Wenming Tan, Jun Xiao, and Shiliang P…
☆34 · Mar 16, 2023 · Updated 3 years ago
GATECH-EIC / ShiftAddViT
[NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer
☆30 · Dec 6, 2023 · Updated 2 years ago
pp1230 / LLMGPUMemEstimator
The GPU RAM Estimator provides a simple tool for estimating GPU memory usage during training and inference.
☆35 · Apr 9, 2024 · Updated 2 years ago
aws-samples / fine-tune-qwen2-vl-with-llama-factory
☆35 · Jul 2, 2025 · Updated last year
Qualcomm-AI-research / FP8-quantization
☆172 · Mar 9, 2023 · Updated 3 years ago