A lightweight large-model inference framework
☆21 · May 26, 2025 · Updated 9 months ago
Alternatives and similar repositories for lite_lang
Users that are interested in lite_lang are comparing it to the libraries listed below
- Inference deployment of the llama3 ☆11 · Apr 21, 2024 · Updated last year
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆11 · Jun 10, 2024 · Updated last year
- paper-read-notes ☆13 · Sep 26, 2024 · Updated last year
- Collected code snippets worth keeping ☆13 · Jun 6, 2023 · Updated 2 years ago
- Inference for GOT-OCR2.0 using mnn-llm ☆14 · Oct 2, 2024 · Updated last year
- Inference Llama 2 in one file of pure Cuda ☆17 · Aug 20, 2023 · Updated 2 years ago
- HunyuanDiT with TensorRT and libtorch ☆18 · May 22, 2024 · Updated last year
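- High-performance, high-accuracy license plate recognition for mainland China, Hong Kong and Macau, Taiwan, and South Korea (South Korea LPR); open source (ncnn port) ☆41 · Nov 5, 2025 · Updated 3 months ago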
- RISCV C and Triton AI-Benchmark ☆23 · Jan 28, 2026 · Updated last month
- ☆20 · Dec 29, 2023 · Updated 2 years ago
- Awesome code, projects, books, etc. related to CUDA ☆31 · Feb 3, 2026 · Updated 3 weeks ago
- ☆26 · Nov 21, 2024 · Updated last year
- A one-page-only CGraph-API-liked DAG project. ☆26 · Feb 11, 2025 · Updated last year
- Llama3 Streaming Chat Sample ☆22 · Apr 24, 2024 · Updated last year
- Optimize softmax in triton in many cases ☆23 · Sep 6, 2024 · Updated last year
- ☆26 · Aug 15, 2023 · Updated 2 years ago
- ☆30 · Nov 16, 2024 · Updated last year
- A Minimalistic Auto-Diff Optimization Framework for Teaching and Understanding Pytorch ☆26 · Updated this week
- ☆25 · Apr 16, 2022 · Updated 3 years ago
- Flash Attention in raw Cuda C beating PyTorch ☆37 · May 14, 2024 · Updated last year
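- yolov7-pose end2end TRT implementation ☆27 · Sep 8, 2022 · Updated 3 years ago
- CLIP inference implemented in C++. The model has a few small changes; the code for those changes and for exporting the model can be found in the readme. Model files are all in Releases, including the AX650 models. Newly added support for ChineseCLIP ☆31 · Jun 19, 2025 · Updated 8 months ago
- The "FastSAM_Awsome_Openvino" project shows how to deploy the FastSAM model efficiently with the OpenVINO framework, achieving impressive instance segmentation. It provides both C++ and Python implementations so developers can use the FastSAM model in different language environments… ☆36 · Dec 13, 2023 · Updated 2 years ago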
- Port of Funasr's Paraformer model in C/C++ ☆39 · Jun 19, 2024 · Updated last year
- A light llama-like llm inference framework based on the triton kernel. ☆172 · Jan 5, 2026 · Updated last month
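- A ROS learning path for autonomous vehicles. ROS learning for autonomous driving. ☆38 · Jul 18, 2021 · Updated 4 years ago
- Deploys LDC, a lightweight dense convolutional neural network for edge detection, with ONNXRuntime; includes both C++ and Python versions ☆11 · Apr 24, 2023 · Updated 2 years ago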
- A Multi-graph Multi-head Adaptive Temporal Graph Convolutional Network ☆11 · May 21, 2023 · Updated 2 years ago
- Foundation Model for Probabilistic Electricity Price Forecasting ☆18 · Sep 29, 2025 · Updated 5 months ago
- A learning project for getting newcomers started with a WASM JIT compiler ☆14 · Updated this week
- ☆11 · Sep 2, 2024 · Updated last year
- Large Language Model Onnx Inference Framework ☆35 · Nov 25, 2025 · Updated 3 months ago
- A tool convert TensorRT engine/plan to a fake onnx ☆41 · Nov 22, 2022 · Updated 3 years ago
- nerf ☆41 · Aug 1, 2022 · Updated 3 years ago
- Inference Llama 2 in C++ ☆43 · Apr 29, 2024 · Updated last year
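- A stripped-down flash-attention implemented with cutlass, intended for teaching ☆58 · Aug 12, 2024 · Updated last year
- Learning all kinds of things by following Tensorrt_pro ☆40 · Nov 25, 2022 · Updated 3 years ago
- Accelerating YOLOv8-Seg with TensorRT; a complete backend framework including an HTTP server, MySQL database, ffmpeg video streaming, and more. ☆88 · Oct 9, 2023 · Updated 2 years ago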
- ☆11 · Mar 13, 2024 · Updated last year