keith2018 / TinyGPT
Tiny C++11 GPT-2 inference implementation from scratch
☆57Updated 3 months ago
Alternatives and similar repositories for TinyGPT:
Users who are interested in TinyGPT are comparing it to the libraries listed below
- ☆124Updated last year
- ☢️ TensorRT 2023 second round: accelerating and optimizing Llama model inference with TensorRT-LLM☆46Updated last year
- A llama model inference framework implemented in CUDA C++☆48Updated 4 months ago
- ☆32Updated 8 months ago
- Transformer related optimization, including BERT, GPT☆17Updated last year
- qwen2 and llama3 cpp implementation☆43Updated 9 months ago
- Triton Documentation in Simplified Chinese☆62Updated 2 months ago
- Standalone Flash Attention v2 kernel without libtorch dependency☆108Updated 6 months ago
- ☆30Updated 6 months ago
- Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference.☆35Updated 3 weeks ago
- Efficient inference of large language models.☆146Updated 3 months ago
- A practical way of learning Swizzle☆16Updated last month
- ☆75Updated last week
- A layered, decoupled deep learning inference engine☆72Updated last month
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK☆60Updated this week
- ☆23Updated last month
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.☆35Updated last month
- GPTQ inference TVM kernel☆38Updated 11 months ago
- 📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA.☆157Updated last week
- A simple Transformer model implemented in C++. Attention Is All You Need.☆47Updated 4 years ago
- ☆19Updated 4 years ago
- Tutorials for writing high-performance GPU operators in AI frameworks.☆130Updated last year
- ☆78Updated last year
- ☆52Updated this week
- LLM deployment project based on ONNX.☆35Updated 5 months ago
- A tiny deep learning training framework implemented from scratch in C++ that follows PyTorch's API.☆47Updated last week
- ☆10Updated 3 weeks ago
- A simple forward-inference framework made by breaking down MNN (for study!)☆22Updated 4 years ago
- ☆61Updated 4 months ago
- GPT2 implementation in C++ using Ort☆26Updated 4 years ago