owenliang / mnist-onnx-runtime
MoE model with onnx runtime
☆37Updated 11 months ago
Alternatives and similar repositories for mnist-onnx-runtime:
Users that are interested in mnist-onnx-runtime are comparing it to the libraries listed below
- LLM Tokenizer with BPE algorithm☆31Updated 11 months ago
- run ChatGLM2-6B in BM1684X☆49Updated last year
- Hands-on LLM deployment: TensorRT-LLM, Triton Inference Server, vLLM☆26Updated last year
- Deploying LLMs offline on the NVIDIA Jetson platform marks the dawn of a new era in embodied intelligence, where devices can function ind…☆90Updated last year
- ☢️ TensorRT 2023 second round: Llama model inference acceleration and optimization based on TensorRT-LLM☆46Updated last year
- vLLM Documentation in Chinese Simplified☆61Updated this week
- ☆38Updated last month
- LLM inference service performance testing☆39Updated last year
- DeepSpeed tutorial & annotated examples & study notes (efficient training of large models)☆159Updated last year
- A simple deep learning framework inspired by Dezero and PyTorch☆29Updated 2 months ago
- ☆40Updated 8 months ago
- ☆90Updated last year
- NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing Tongyi Qianwen Qwen-7B with TensorRT-LLM☆42Updated last year
- ☆104Updated last year
- ☆41Updated 5 months ago
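- Companion data and code for the book "Natural Language Processing: Large Model Theory and Practice"☆61Updated 4 months ago
- Triton Documentation in Chinese Simplified☆66Updated last week
- Helps beginners get started quickly, use and get accustomed to the official documentation of the OpenMMLab open-source libraries, run experiments on their own, and freely choose deeper material to read.☆62Updated 2 years ago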
- qwen models finetuning☆97Updated last month
- Applications of large language models and multimodal models, mainly covering RAG, small models, agents, cross-modal search, OCR, and more☆164Updated 5 months ago
- qwen2 and llama3 cpp implementation☆44Updated 10 months ago
- ☆120Updated last year
- ☆68Updated 6 months ago
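- unify-easy-llm (ULM) aims to be a simple one-click training tool for large models, supporting different hardware such as Nvidia GPU and Ascend NPU as well as commonly used large models.☆55Updated 9 months ago
- A chart parser based on multimodal large models☆29Updated 3 weeks ago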
- DeepSpeed Tutorial☆95Updated 8 months ago
- Large language model deployment on the Ascend 310 chip☆17Updated 10 months ago
- Multimodal MM + Chat collection☆255Updated 2 months ago
- run chatglm3-6b in BM1684X☆38Updated last year
- Inference code for LLaMA models☆120Updated last year