A lightweight inference framework for large models
☆23 · May 26, 2025 · Updated last year
Alternatives and similar repositories for lite_lang
Users that are interested in lite_lang are comparing it to the libraries listed below.
- A light llama-like llm inference framework based on the triton kernel. ☆187 · Jan 5, 2026 · Updated 5 months ago
- Flash Attention in ~100 lines of CUDA (forward pass only) ☆12 · Jun 10, 2024 · Updated 2 years ago
- paper-read-notes ☆13 · Sep 26, 2024 · Updated last year
- learn TensorRT from scratch🥰 ☆18 · Sep 29, 2024 · Updated last year
- A collection of code snippets worth keeping ☆13 · Jun 6, 2023 · Updated 3 years ago
- HunyuanDiT with TensorRT and libtorch ☆18 · May 22, 2024 · Updated 2 years ago
- Inference for GOT-OCR2.0 using mnn-llm ☆14 · Oct 2, 2024 · Updated last year
- RISCV C and Triton AI-Benchmark ☆25 · Jan 28, 2026 · Updated 4 months ago
- High-performance, high-accuracy license plate recognition for mainland China, Hong Kong and Macau, Taiwan, and South Korea (South Korea LPR); open source (ncnn port) ☆45 · Nov 5, 2025 · Updated 7 months ago
- ☆14 · Mar 8, 2025 · Updated last year
- segmentation algorithm yolact use tensorrt deploy ☆14 · May 7, 2022 · Updated 4 years ago
- Triton for OpenCL backend, and use mlir-translate to get source OpenCL code ☆27 · Aug 27, 2025 · Updated 9 months ago
- Implementation of a histogram equalization program using CUDA. Histogram equalization is a technique for adjusting image intensities to e… ☆13 · Jan 3, 2021 · Updated 5 years ago
- Flash Attention in raw Cuda C beating PyTorch ☆39 · May 14, 2024 · Updated 2 years ago
- Inference Llama 2 in one file of pure Cuda ☆17 · Aug 20, 2023 · Updated 2 years ago
- Nuclei AI Library Optimized For RISC-V Vector ☆15 · Oct 15, 2025 · Updated 7 months ago
- "Hardware, Software, and Compilers! Oh My!" tutorial files ☆16 · Jan 25, 2020 · Updated 6 years ago
- Optimize softmax in triton in many cases ☆24 · Sep 6, 2024 · Updated last year
- Improve the performance of atoi() ☆13 · Jan 23, 2016 · Updated 10 years ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆122 · Mar 13, 2024 · Updated 2 years ago
- Awesome code, projects, books, etc. related to CUDA ☆36 · Jun 2, 2026 · Updated last week
- A one-page-only CGraph-API-liked DAG project. ☆27 · Feb 11, 2025 · Updated last year
- ☆20 · Dec 29, 2023 · Updated 2 years ago
- Port of Funasr's Paraformer model in C/C++ ☆43 · Jun 19, 2024 · Updated last year
- A Minimalistic Auto-Diff Optimization Framework for Teaching and Understanding Pytorch ☆27 · Mar 12, 2026 · Updated 2 months ago
- Adding a new Cpu0 backend to LLVM 17.0.6 ☆12 · Apr 22, 2024 · Updated 2 years ago
- Translation of the SiFive All Aboard article series ☆11 · Nov 26, 2021 · Updated 4 years ago
- Graph model execution API for Candle ☆18 · Jul 27, 2025 · Updated 10 months ago
- ☆26 · Aug 15, 2023 · Updated 2 years ago
- A complete (FP optional), portable implementation of stdio including printf, scanf, etc. No malloc() or static buffers. ☆20 · Apr 16, 2025 · Updated last year
- A Rust-based, SenseVoiceSmall ☆33 · Apr 27, 2026 · Updated last month
- Penn CIS 5650 (GPU Programming and Architecture) Final Project ☆44 · Dec 11, 2023 · Updated 2 years ago
- Llama3 Streaming Chat Sample ☆22 · Apr 24, 2024 · Updated 2 years ago
- Running LLaMA 3 with Rust. ☆10 · May 21, 2024 · Updated 2 years ago
- ☆15 · Mar 30, 2024 · Updated 2 years ago
- A Rust crate offering similar functionality to the Python transformers package using Candle. ☆15 · Nov 19, 2024 · Updated last year
- CLIP inference implemented in C++. The model has minor modifications; the code for the changes and for exporting the model can be found in the README, and the model files, including the AX650 models, are in Releases. Now also supports ChineseCLIP. ☆31 · Jun 19, 2025 · Updated 11 months ago
- ☆17 · May 28, 2024 · Updated 2 years ago
- This project is intended to build and deploy an SNPE model on Qualcomm Devices, which are having unsupported layers which are not part of… ☆10 · Oct 4, 2021 · Updated 4 years ago