ironartisan / awesome-compression1

A beginner's introductory tutorial on model compression

☆22

Alternatives and similar repositories for awesome-compression1

Users that are interested in awesome-compression1 are comparing it to the libraries listed below

TRT2022 / trtllm-llama
☢️ TensorRT 2023 semifinal: Llama model inference acceleration and optimization based on TensorRT-LLM
☆50 Updated last year
FreedomIntelligence / FastLLM
Fast LLM training codebase with dynamic strategy selection [DeepSpeed + Megatron + FlashAttention + CudaFusionKernel + Compiler]
☆40 Updated last year
SmartFlowAI / LLM101n-CN
LLM101n: Let's build a Storyteller (Chinese edition)
☆132 Updated 11 months ago
ArtificialZeng / llama3_explained
The newest version of Llama 3, with source code explained line by line in Chinese
☆22 Updated last year
Rayrtfr / FasterTransformer
Transformer-related optimization, including BERT and GPT
☆17 Updated 2 years ago
SmartFlowAI / Hand-on-RAG
As the name suggests: a hand-built RAG
☆125 Updated last year
Oldpan / DeployIsAllYouNeed
☆121 Updated 2 years ago
chenweiphd / DeepSeek-MoE-ResourceMap
☆135 Updated 5 months ago
GuoYiFantastic / IMelodist
A large music model based on InternLM2-chat.
☆22 Updated 7 months ago
DataXujing / TensorRT-LLM-ChatGLM3
Large model deployment in practice: TensorRT-LLM, Triton Inference Server, vLLM
☆26 Updated last year
TRT2022 / ControlNet_TensorRT
Tianchi NVIDIA TensorRT Hackathon 2023: third-place solution in the preliminary round of the generative AI model optimization competition
☆49 Updated last year
KMnO4-zx / tiny-llm
☆23 Updated last month
ArtificialZeng / baichuan-speedup
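A pure C++ cross-platform LLM acceleration library, callable from Python; supports baichuan, glm, llama, and moss base models; runs smoothly on mobile; ChatGLM-6B-class models can reach 10000+ tokens/s on a single card
☆45 Updated last year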
zms1999 / SmartMoE
A MoE impl for PyTorch, [ATC'23] SmartMoE
☆66 Updated 2 years ago
morsoli / llmbenchmark
Comparison of LLM API performance metrics: an in-depth analysis of key metrics such as TTFT and TPS
☆18 Updated 10 months ago
SkyworkAI / Skywork-MoE
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models
☆136 Updated last year
chaoswork / llm_illustrated
Learn large models through illustrations
☆316 Updated last year
yanqiangmiffy / tree2retriever
Recursive Abstractive Processing for Tree-Organized Retrieval
☆10 Updated last year
openmlsys / openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
☆130 Updated last year
sophgo / ChatGLM2-TPU
Run ChatGLM2-6B on BM1684X
☆49 Updated last year
zai-org / GLM-Edge
GLM Series Edge Models
☆146 Updated last month
ArtificialZeng / transformers-Explained
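A walkthrough of the official transformers source code. In the era of large AI models, PyTorch and transformer are the new operating system, and everything else is software running on top of them.
☆17 Updated last year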
RapidAI / Open-Llama
The complete training code of the open-source high-performance Llama model, including the full process from pre-training to RLHF.
☆69 Updated 2 years ago
shreyansh26 / FlashAttention-PyTorch
Implementation of FlashAttention in PyTorch
☆159 Updated 6 months ago
dhcode-cpp / easy-dualpipe
Pipeline-Parallel Lecture: Simplest Dualpipe Implementation.
☆25 Updated last month
K024 / chatglm-q
Another ChatGLM2 implementation for GPTQ quantization
☆54 Updated last year
zzlgreat / smart_agent
☆105 Updated last year
dhcode-cpp / online-softmax
The simplest online-softmax notebook for explaining Flash Attention
☆13 Updated 7 months ago
jiahe7ay / infini-mini-transformer
This is a personal reimplementation of Google's Infini-transformer, utilizing a small 2b model. The project includes both model and train…
☆58 Updated last year
liguodongiot / unify-easy-llm
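unify-easy-llm (ULM) aims to build a simple, one-click large model training tool that supports different hardware such as Nvidia GPU and Ascend NPU, as well as commonly used large models.
☆56 Updated last year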