AIFlowPlayer / LMDeploy-JetsonLinks

Deploying LLMs offline on the NVIDIA Jetson platform marks the dawn of a new era in embodied intelligence, where devices can function independently without continuous internet access.

☆98

Alternatives and similar repositories for LMDeploy-Jetson

Users that are interested in LMDeploy-Jetson are comparing it to the libraries listed below

Sorting:

yinghuo302 / ascend-llm
基于昇腾310芯片的大语言模型部署
☆20Updated last year
Tlntin / qwen-ascend-llm
☆47Updated 8 months ago
TRT2022 / trtllm-llama
☢️ TensorRT 2023复赛——基于TensorRT-LLM的Llama模型推断加速优化
☆49Updated last year
wangzhaode / llm-export
llm-export can export llm model to onnx.
☆299Updated 5 months ago
sesmfs / onnx_quant_tool
An onnx-based quantitation tool.
☆71Updated last year
ICT-ANS / StarLight
☆61Updated last year
DataXujing / TensorRT-LLM-ChatGLM3
大模型部署实战：TensorRT-LLM, Triton Inference Server, vLLM
☆26Updated last year
AXERA-TECH / OWLVIT-ONNX-AX650-CPP
☆22Updated last year
AI-Study-Han / Zero-Qwen-VL
训练一个对中文支持更好的LLaVA模型，并开源训练代码和数据。
☆64Updated 10 months ago
thb1314 / mmyolo_tensorrt
☆142Updated last year
shouxieai / tensorRT_quantization
该代码与B站上的视频 https://www.bilibili.com/video/BV18L41197Uz/?spm_id_from=333.788&vd_source=eefa4b6e337f16d87d87c2c357db8ca7 相关联。
☆69Updated last year
ModelTC / llmc
[EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V…
☆510Updated last week
AIFlowPlayer / InternDog
基于InternLM2大模型的离线具身智能导盲犬
☆101Updated last year
D-Robotics-AI-Lab / DOSOD
A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space
☆87Updated 6 months ago
FeiGeChuanShu / trt2023
NVIDIA TensorRT Hackathon 2023复赛选题：通义千问Qwen-7B用TensorRT-LLM模型搭建及优化
☆42Updated last year
RethinkFun / trian_ppo
☆86Updated 9 months ago
lucasjinreal / Namo-R1
A CPU Realtime VLM in 500M. Surpassed Moondream2 and SmolVLM. Training from scratch with ease.
☆220Updated 2 months ago
sophgo / LLM-TPU
Run generative AI models in sophgo BM1684X/BM1688
☆224Updated last week
TencentARC / mllm-npu
mllm-npu: training multimodal large language models on Ascend NPUs
☆90Updated 10 months ago
sophgo / ChatGLM2-TPU
run ChatGLM2-6B in BM1684X
☆49Updated last year
torchpipe / torchpipe
Serving Inside Pytorch
☆163Updated last week
luchangli03 / onnxsim_large_model
simplify >2GB large onnx model
☆59Updated 7 months ago
harleyszhang / lite_llama
A light llama-like llm inference framework based on the triton kernel.
☆134Updated this week
modelscope / dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …
☆259Updated last month
DeepLink-org / dlinfer
☆53Updated 2 weeks ago
inisis / OnnxLLM
Large Language Model Onnx Inference Framework
☆36Updated 6 months ago
hopef / llama3_chat
Llama3 Streaming Chat Sample
☆22Updated last year
bug-developer021 / YOLOV5_optimization_on_triton
Compare multiple optimization methods on triton to imporve model service performance
☆52Updated last year
wangzhaode / onnx-llm
llm deploy project based onnx.
☆42Updated 9 months ago
Phoenix8215 / learn-ONNX-from-scratch
一大波学习onnx的案例
☆19Updated 9 months ago