bigeagle / picoGPT
☆37Updated last year
Alternatives and similar repositories for picoGPT:
Users who are interested in picoGPT are comparing it to the repositories listed below
- Efficient inference of large language models.☆144Updated this week
- A demo of how to write a high-performance convolution that runs on Apple silicon☆52Updated 2 years ago
- my dotfiles..☆59Updated this week
- ☆11Updated 3 years ago
- Programming exercises for kids (no prior programming experience required)☆14Updated 4 months ago
- Inference framework for MoE layers based on TensorRT with Python binding☆41Updated 3 years ago
- ONNX Command-Line Toolbox☆35Updated last month
- Summary of system papers/frameworks/codes/tools on training or serving large models☆56Updated 11 months ago
- A library for syntactically rewriting Python programs, pronounced (sinner).☆70Updated 2 years ago
- Code release for book "Efficient Training in PyTorch"☆24Updated last month
- Make Triton easier☆41Updated 5 months ago
- GPTQ inference TVM kernel☆36Updated 7 months ago
- PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.☆10Updated 2 years ago
- TVMScript kernel for deformable attention☆24Updated 2 years ago
- ☆35Updated this week
- Standalone Flash Attention v2 kernel without libtorch dependency☆98Updated 2 months ago
- ☆18Updated last month
- An easy way to run, test, benchmark and tune OpenCL kernel files☆23Updated last year
- IntLLaMA: A fast and light quantization solution for LLaMA☆18Updated last year
- TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.☆157Updated last week
- What is learned in tiktoken?☆67Updated 6 months ago
- Implement Flash Attention using Cute.☆39Updated this week
- ☆123Updated 11 months ago
- FP64 equivalent GEMM via Int8 Tensor Cores using the Ozaki scheme☆46Updated 2 months ago
- An extension of TVMScript to write simple and high-performance GPU kernels with Tensor Cores.☆49Updated 4 months ago
- cpp syntactic sugar☆8Updated 2 months ago
- ☆36Updated 2 weeks ago
- My tests and experiments with some popular DL frameworks.☆11Updated last month
- High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline.☆90Updated 4 months ago