A lightweight inference framework for large models
☆21May 26, 2025Updated 9 months ago
Alternatives and similar repositories for lite_lang
Users that are interested in lite_lang are comparing it to the libraries listed below
- Inference deployment of the llama3☆10Apr 21, 2024Updated last year
- A light llama-like llm inference framework based on the triton kernel.☆174Jan 5, 2026Updated 2 months ago
- Flash Attention in ~100 lines of CUDA (forward pass only)☆10Jun 10, 2024Updated last year
- paper-read-notes☆13Sep 26, 2024Updated last year
- Extensions for the TG geometry library☆12Dec 3, 2024Updated last year
- learn TensorRT from scratch🥰☆17Sep 29, 2024Updated last year
- A collection of saved code snippets☆12Jun 6, 2023Updated 2 years ago
- HunyuanDiT with TensorRT and libtorch☆17May 22, 2024Updated last year
- Inference for GOT-OCR2.0 using mnn-llm☆13Oct 2, 2024Updated last year
- RISCV C and Triton AI-Benchmark☆22Jan 28, 2026Updated last month
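- High-performance, high-accuracy license plate recognition for mainland China, Hong Kong and Macau, Taiwan, and South Korea (South Korea LPR); open-source code (ncnn port)☆40Nov 5, 2025Updated 4 months ago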
- Deployment of the YOLACT segmentation algorithm with TensorRT☆13May 7, 2022Updated 3 years ago
- Triton for OpenCL backend, and use mlir-translate to get source OpenCL code☆25Aug 27, 2025Updated 6 months ago
- Implementation of a histogram equalization program using CUDA. Histogram equalization is a technique for adjusting image intensities to e…☆12Jan 3, 2021Updated 5 years ago
- Flash Attention in raw Cuda C beating PyTorch☆38May 14, 2024Updated last year
- Inference Llama 2 in one file of pure Cuda☆16Aug 20, 2023Updated 2 years ago
- "Hardware, Software, and Compilers! Oh My!" tutorial files☆16Jan 25, 2020Updated 6 years ago
- Optimize softmax in triton in many cases☆23Sep 6, 2024Updated last year
- Improve the performance of atoi()☆13Jan 23, 2016Updated 10 years ago
- ☆20Oct 5, 2025Updated 5 months ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks.☆118Mar 13, 2024Updated 2 years ago
- A learning project for getting newcomers started with a WASM JIT compiler☆14Feb 28, 2026Updated 3 weeks ago
- Port of Funasr's Paraformer model in C/C++☆39Jun 19, 2024Updated last year
- Awesome code, projects, books, etc. related to CUDA☆31Feb 3, 2026Updated last month
- A one-page-only, CGraph-API-like DAG project.☆25Feb 11, 2025Updated last year
- ☆20Dec 29, 2023Updated 2 years ago
- A Minimalistic Auto-Diff Optimization Framework for Teaching and Understanding Pytorch☆26Mar 12, 2026Updated last week
- Adding a new Cpu0 backend to LLVM 17.0.6☆12Apr 22, 2024Updated last year
- Graph model execution API for Candle☆17Jul 27, 2025Updated 7 months ago
- GitHub for AI4PD 2023 Workshop in Chile☆12Oct 12, 2023Updated 2 years ago
- ☆26Aug 15, 2023Updated 2 years ago
- A demo code for implementation of differentiable thermodynamic modeling in JAX.☆10Sep 18, 2021Updated 4 years ago
- make a LLVM Toy RISC-V backend step by step☆12Feb 28, 2024Updated 2 years ago
- Use time-splits for Materials Project entries for generative modeling benchmarking.☆12Mar 12, 2026Updated last week
- A complete (FP optional), portable implementation of stdio including printf, scanf, etc. No malloc() or static buffers.☆18Apr 16, 2025Updated 11 months ago
- An interface between the Materials Project software suite and the Schrodinger Python API, designed to allow for high-throughput execution…☆13Apr 8, 2024Updated last year
- An unofficial jax/haiku implementation of Crystal Graph Convolutional Neural Networks (CGCNN)☆10Dec 17, 2022Updated 3 years ago
- A Rust-based, SenseVoiceSmall☆27Mar 9, 2026Updated last week
- Penn CIS 5650 (GPU Programming and Architecture) Final Project☆43Dec 11, 2023Updated 2 years ago