Motsepe-Jr / AI-research-papers-pseudo-code
This repo covers pseudocode for AI research papers.
☆15Updated 2 years ago
Alternatives and similar repositories for AI-research-papers-pseudo-code
Users interested in AI-research-papers-pseudo-code are comparing it to the libraries listed below
- Simple implementation of Speculative Sampling in NumPy for GPT-2.☆96Updated 2 years ago
- a minimal cache manager for PagedAttention, on top of llama3.☆122Updated last year
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind (a minimal sketch of the algorithm appears after this list)☆100Updated last year
- A minimal implementation of vllm.☆58Updated last year
- ☆121Updated last year
- Applied AI experiments and examples for PyTorch☆296Updated last month
- Easy and Efficient Quantization for Transformers☆203Updated 3 months ago
- ring-attention experiments☆152Updated 11 months ago
- ☆220Updated 7 months ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of…☆146Updated last year
- Cataloging released Triton kernels.☆260Updated 2 weeks ago
- Triton implementation of Flash Attention 2.0☆39Updated 2 years ago
- Explorations into some recent techniques surrounding speculative decoding☆286Updated 9 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…☆268Updated 2 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.☆212Updated last week
- ☆240Updated this week
- This repository contains the experimental PyTorch native float8 training UX☆224Updated last year
- Triton-based implementation of Sparse Mixture of Experts.☆240Updated last month
- Prune transformer layers☆69Updated last year
- ☆119Updated last year
- Collection of kernels written in Triton language☆155Updated 5 months ago
- This repository contains an implementation of the LLaMA 2 (Large Language Model Meta AI) model, a Generative Pretrained Transformer (GPT)…☆71Updated last year
- Fast low-bit matmul kernels in Triton☆373Updated this week
- ☆172Updated last year
- PyTorch bindings for CUTLASS grouped GEMM.☆121Updated 3 months ago
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training☆216Updated last year
- ☆17Updated 2 years ago
- Mixed precision training from scratch with Tensors and CUDA☆27Updated last year
- Boosting 4-bit inference kernels with 2:4 Sparsity☆82Updated last year
- ☆149Updated 2 years ago
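
Several of the entries above (the NumPy speculative-sampling implementation, the DeepMind paper implementation, and the speculative-decoding explorations) revolve around the same draft-then-verify rejection-sampling loop. The sketch below is a minimal, self-contained NumPy illustration of that loop, assuming toy stand-in models: `draft_model` and `target_model` are hypothetical placeholder functions over a tiny vocabulary, not GPT-2 and not code from any of the listed repositories.

```python
import numpy as np

VOCAB = 8
rng = np.random.default_rng(0)

def draft_model(seq):
    # Cheap proposal distribution (hypothetical stand-in for a small LM).
    logits = np.cos(np.arange(VOCAB) + len(seq))
    return np.exp(logits) / np.exp(logits).sum()

def target_model(seq):
    # Expensive target distribution (hypothetical stand-in for a large LM).
    logits = np.sin(np.arange(VOCAB) * 0.7 + len(seq) * 0.3)
    return np.exp(logits) / np.exp(logits).sum()

def speculative_step(prefix, k=4):
    """One round of speculative sampling: draft k tokens, then accept/reject."""
    # 1) Draft k tokens autoregressively with the cheap model, keeping q(x).
    drafted, q_probs = [], []
    seq = list(prefix)
    for _ in range(k):
        q = draft_model(seq)
        x = rng.choice(VOCAB, p=q)
        drafted.append(x)
        q_probs.append(q)
        seq.append(x)

    # 2) Score every drafted position with the target model
    #    (a real implementation does this in a single batched forward pass).
    p_probs = [target_model(list(prefix) + drafted[:i]) for i in range(k + 1)]

    # 3) Accept each drafted token with probability min(1, p(x) / q(x)).
    accepted = []
    for i, x in enumerate(drafted):
        p, q = p_probs[i], q_probs[i]
        if rng.random() < min(1.0, p[x] / q[x]):
            accepted.append(x)
        else:
            # Rejected: resample from the residual distribution max(0, p - q)
            # and stop; later drafted tokens are discarded.
            residual = np.maximum(p - q, 0.0)
            residual /= residual.sum()
            accepted.append(rng.choice(VOCAB, p=residual))
            return accepted

    # 4) All drafts accepted: sample one bonus token from the target's
    #    distribution at the final position.
    accepted.append(rng.choice(VOCAB, p=p_probs[k]))
    return accepted

print(speculative_step([1, 2, 3]))
```

Each call returns between one and k+1 new tokens while preserving the target model's output distribution; the speedup comes from verifying the k drafted tokens with a single target-model pass instead of k sequential ones.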