sunkx109 / llama
Inference code for LLaMA models
☆128 · Updated Aug 13, 2023 (2 years ago)
Alternatives and similar repositories for llama
Users interested in llama are comparing it to the repositories listed below.
- Llama 2 inference ☆43 · Updated Nov 4, 2023 (2 years ago)
- A Docker image for One Student One Chip's debug exam ☆10 · Updated Sep 22, 2023 (2 years ago)
- Inference deployment of the Llama 3 model ☆11 · Updated Apr 21, 2024 (last year)
- The simplest online-softmax notebook for explaining Flash Attention (see the sketch after this list) ☆15 · Updated Jan 27, 2026 (2 weeks ago)
- Quantize yolov5 using pytorch_quantization 🚀🚀🚀 ☆14 · Updated Oct 24, 2023 (2 years ago)
- Tutorials for writing high-performance GPU operators in AI frameworks ☆136 · Updated Aug 12, 2023 (2 years ago)
- 🐱 ncnn int8 model quantization evaluation ☆14 · Updated Oct 10, 2022 (3 years ago)
- ☆119 · Updated Apr 2, 2025 (10 months ago)
- ☆117 · Updated Jan 10, 2026 (last month)
- Summary notes for Li Jianzhong's "Design Patterns" course ☆15 · Updated Jan 9, 2020 (6 years ago)
- Hardware-independent code for learning some NCCL mechanisms ☆25 · Updated Apr 19, 2024 (last year)
- A fork of Xiangshan for AI ☆36 · Updated Feb 6, 2026 (last week)
- ☆79 · Updated May 16, 2023 (2 years ago)
- Linux on RISC-V on FPGA (LOROF): RV64GC Sv39 Quad-Core Superscalar Out-of-Order Virtual Memory CPU ☆15 · Updated this week
- This repository contains all (Python 3) code and libraries required for the 2022-2023 Notre Dame Rocketry Team (NDRT) Apogee Control Syst… ☆10 · Updated Apr 30, 2023 (2 years ago)
- ☆141 · Updated Apr 23, 2024 (last year)
- Hands-on model tuning with TVM, profiled on a Mac M1, an x86 CPU, and a GTX-1080 GPU ☆49 · Updated Jun 15, 2023 (2 years ago)
- Build a mini Linux for your own RISC-V emulator! ☆24 · Updated Sep 11, 2024 (last year)
- An easy-to-use package for implementing SmoothQuant for LLMs (see the scaling sketch after this list) ☆110 · Updated Apr 7, 2025 (10 months ago)
- A beginner's tutorial on model compression ☆22 · Updated Jul 7, 2024 (last year)
- A parametric RTL code generator for an efficient integer MxM systolic-array implementation on Xilinx FPGAs ☆31 · Updated Aug 28, 2025 (5 months ago)
- Hands-on LLM deployment: TensorRT-LLM, Triton Inference Server, vLLM ☆27 · Updated Feb 26, 2024 (last year)
- Summary of the specs of GPUs commonly used for LLM training and inference ☆75 · Updated Aug 12, 2025 (6 months ago)
- Train an LLM from scratch on a single 24 GB GPU ☆56 · Updated Jul 9, 2025 (7 months ago)
- Some HPC projects for learning ☆26 · Updated Aug 28, 2024 (last year)
- How to learn PyTorch and OneFlow ☆482 · Updated Mar 22, 2024 (last year)
- ☆27 · Updated May 27, 2024 (last year)
- Code and notes for the six major CUDA parallel-computing patterns ☆61 · Updated Jul 30, 2020 (5 years ago)
- ☆12 · Updated Aug 12, 2022 (3 years ago)
- ☆1,047 · Updated Mar 13, 2024 (last year)
- Library for modelling the performance costs of different neural-network workloads on NPU devices ☆34 · Updated this week
- Cycle-accurate C++ & SystemC simulator for the RISC-V GPGPU Ventus ☆31 · Updated Dec 24, 2025 (last month)
- FP8 Flash Attention implemented on the Ada architecture using the CUTLASS repository ☆78 · Updated Aug 12, 2024 (last year)
- 🤖 FFPA: extends FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA ☆250 · Updated Feb 5, 2026 (last week)
- An exact algorithm for the maximum clique problem (MCP) which improves over state-of-the-art approaches in some cases by orders of magnit… ☆14 · Updated Nov 15, 2025 (2 months ago)
- ☆175 · Updated May 7, 2025 (9 months ago)
- A lightweight llama-like LLM inference framework based on Triton kernels ☆171 · Updated Jan 5, 2026 (last month)
- QQQ: an innovative, hardware-optimized W4A8 quantization solution for LLMs ☆154 · Updated Aug 21, 2025 (5 months ago)
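
The online-softmax notebook listed above explains the streaming recurrence that Flash Attention relies on: keep a running maximum and a running denominator, and rescale the denominator whenever the maximum grows. A minimal NumPy sketch of that recurrence (the names are illustrative, not taken from the notebook):

```python
import numpy as np

def online_softmax(x):
    """Single-pass softmax statistics: running max m and running
    denominator d, with d rescaled whenever m increases."""
    m, d = -np.inf, 0.0
    for xi in x:
        m_new = max(m, float(xi))
        d = d * np.exp(m - m_new) + np.exp(float(xi) - m_new)
        m = m_new
    # Second pass only to produce the full vector for checking;
    # Flash Attention instead folds m and d into its output accumulator.
    return np.exp(np.asarray(x) - m) / d

x = np.random.randn(16)
reference = np.exp(x - x.max()) / np.exp(x - x.max()).sum()
assert np.allclose(online_softmax(x), reference)
```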
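
The SmoothQuant entry names a specific technique: migrate activation outliers into the weights with a per-channel scale so both sides become easier to quantize. A minimal NumPy sketch of the core scaling rule, assuming the paper's default α = 0.5 (function and variable names are illustrative, not the package's API):

```python
import numpy as np

def smoothquant_scales(X, W, alpha=0.5):
    """Per-input-channel smoothing factors:
    s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha).
    Dividing activations and multiplying weights by s leaves X @ W
    unchanged while shrinking activation outliers."""
    act_max = np.abs(X).max(axis=0)   # per-channel activation range
    w_max = np.abs(W).max(axis=1)     # per-channel weight range
    return act_max ** alpha / w_max ** (1 - alpha)

# Toy check: smoothing is mathematically equivalent before quantization.
X, W = np.random.randn(4, 8), np.random.randn(8, 3)
s = smoothquant_scales(X, W)
assert np.allclose((X / s) @ (W * s[:, None]), X @ W)
```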