A lightweight inference framework for large models
☆22May 26, 2025Updated 11 months ago
Alternatives and similar repositories for lite_lang
Users that are interested in lite_lang are comparing it to the libraries listed below.
- A light llama-like llm inference framework based on the triton kernel.☆180Jan 5, 2026Updated 3 months ago
- Flash Attention in ~100 lines of CUDA (forward pass only)☆12Jun 10, 2024Updated last year
- paper-read-notes☆13Sep 26, 2024Updated last year
- learn TensorRT from scratch🥰☆18Sep 29, 2024Updated last year
- A collection of code snippets I hoped to keep☆13Jun 6, 2023Updated 2 years ago
- HunyuanDiT with TensorRT and libtorch☆18May 22, 2024Updated last year
- Inference for GOT-OCR2.0 using mnn-llm☆14Oct 2, 2024Updated last year
- RISCV C and Triton AI-Benchmark☆24Jan 28, 2026Updated 3 months ago
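- High-performance, high-accuracy license plate recognition for mainland China, Hong Kong and Macau, Taiwan, and South Korea (South Korea LPR); open-source code (ported to ncnn)☆43Nov 5, 2025Updated 5 months ago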
- ☆14Mar 8, 2025Updated last year
- Deploying the YOLACT segmentation algorithm with TensorRT☆14May 7, 2022Updated 3 years ago
- Triton for an OpenCL backend, using mlir-translate to get the OpenCL source code☆27Aug 27, 2025Updated 8 months ago
- Implementation of a histogram equalization program using CUDA. Histogram equalization is a technique for adjusting image intensities to e…☆13Jan 3, 2021Updated 5 years ago
- Flash Attention in raw Cuda C beating PyTorch☆38May 14, 2024Updated last year
- Inference Llama 2 in one file of pure Cuda☆17Aug 20, 2023Updated 2 years ago
- Nuclei AI Library Optimized For RISC-V Vector☆15Oct 15, 2025Updated 6 months ago
- Optimize softmax in triton in many cases☆24Sep 6, 2024Updated last year
- Improve the performance of atoi()☆13Jan 23, 2016Updated 10 years ago
- Compare different hardware platforms via the Roofline Model for LLM inference tasks.☆119Mar 13, 2024Updated 2 years ago
- ☆20Oct 5, 2025Updated 6 months ago
- A learning project for getting newcomers started with a WASM JIT compiler☆14Feb 28, 2026Updated 2 months ago
- Awesome code, projects, books, etc. related to CUDA☆35Mar 30, 2026Updated last month
- ☆20Dec 29, 2023Updated 2 years ago
- Port of Funasr's Paraformer model in C/C++☆43Jun 19, 2024Updated last year
- A Minimalistic Auto-Diff Optimization Framework for Teaching and Understanding Pytorch☆27Mar 12, 2026Updated last month
- Translations of the Sifive All Aboard article series☆11Nov 26, 2021Updated 4 years ago
- Graph model execution API for Candle☆17Jul 27, 2025Updated 9 months ago
- ☆26Aug 15, 2023Updated 2 years ago
- A demo code for implementation of differentiable thermodynamic modeling in JAX.☆10Sep 18, 2021Updated 4 years ago
- make an LLVM Toy RISC-V backend step by step☆12Feb 28, 2024Updated 2 years ago
- An unofficial jax/haiku implementation of Crystal Graph Convolutional Neural Networks (CGCNN)☆10Dec 17, 2022Updated 3 years ago
- An interface between the Materials Project software suite and the Schrodinger Python API, designed to allow for high-throughput execution…☆13Apr 8, 2024Updated 2 years ago
- A Rust-based, SenseVoiceSmall☆32Apr 13, 2026Updated 2 weeks ago
- Penn CIS 5650 (GPU Programming and Architecture) Final Project☆44Dec 11, 2023Updated 2 years ago
- Llama3 Streaming Chat Sample☆22Apr 24, 2024Updated 2 years ago
- Running LLaMA 3 with Rust.☆10May 21, 2024Updated last year
- Spring 2024 - Data Science and Machine Learning in Chemical Engineering☆12Feb 14, 2024Updated 2 years ago
- ☆15Mar 30, 2024Updated 2 years ago
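- CLIP inference implemented in C++. The model has a few minor modifications; the code for the modifications and for exporting the model can be found in the README, and the model files, including the AX650 models, are all in Releases. Support for ChineseCLIP has been added.☆31Jun 19, 2025Updated 10 months ago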