bigeagle / picoGPT
☆37Updated last year
Alternatives and similar repositories for picoGPT:
Users who are interested in picoGPT are comparing it to the repositories listed below
- Efficient inference of large language models.☆144Updated 2 months ago
- Programming exercises for kids (no prior programming experience required)☆14Updated 7 months ago
- A demo of how to write a high-performance convolution that runs on Apple silicon☆52Updated 3 years ago
- A library for syntactically rewriting Python programs, pronounced (sinner).☆70Updated 2 years ago
- GPTQ inference TVM kernel☆38Updated 9 months ago
- Inference framework for MoE layers based on TensorRT with Python binding☆41Updated 3 years ago
- my dotfiles..☆61Updated 2 months ago
- PSTensor provides a way to hack the memory management of tensors in TensorFlow and PyTorch by defining your own C++ Tensor Class.☆10Updated 3 years ago
- ☆22Updated 5 years ago
- ☆11Updated 3 years ago
- IntLLaMA: A fast and light quantization solution for LLaMA☆18Updated last year
- Odysseus: Playground of LLM Sequence Parallelism☆64Updated 8 months ago
- Benchmark scripts for TVM☆73Updated 2 years ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries.☆43Updated last week
- Standalone Flash Attention v2 kernel without libtorch dependency☆104Updated 5 months ago
- Make triton easier☆44Updated 8 months ago
- An extension of TVMScript to write simple and high-performance GPU kernels with tensorcore.☆51Updated 6 months ago
- ☆124Updated last year
- ONNX Command-Line Toolbox☆35Updated 4 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆16Updated 8 months ago
- torch.compile artifacts for common deep learning models; can be used as a learning resource for torch.compile☆16Updated last year
- ☆12Updated last year
- Framework to reduce autotune overhead to zero for well known deployments.☆61Updated 3 weeks ago
- A Suite for Parallel Inference of Diffusion Transformers (DiTs) on multi-GPU Clusters☆39Updated 6 months ago
- Longitudinal Evaluation of LLMs via Data Compression☆31Updated 8 months ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators☆18Updated 2 weeks ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.☆89Updated this week
- ☆21Updated last week
- An experimental ahead of time compiler for Relay.☆50Updated 4 years ago