Motsepe-Jr / AI-research-papers-pseudo-code
This repo covers pseudocode implementations of AI research papers.
☆14 · Updated last year
Alternatives and similar repositories for AI-research-papers-pseudo-code
Users interested in AI-research-papers-pseudo-code are comparing it to the libraries listed below.
- Simple implementation of Speculative Sampling in NumPy for GPT-2 (see the sampling sketch after this list). ☆95 · Updated last year
- Mixed precision training from scratch with Tensors and CUDA (see the loss-scaling sketch after this list). ☆24 · Updated last year
- ☆45 · Updated last year
- ☆108 · Updated last year
- Easy and Efficient Quantization for Transformers ☆199 · Updated 4 months ago
- [NeurIPS'23] Speculative Decoding with Big Little Decoder ☆92 · Updated last year
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆75 · Updated last year
- ☆93 · Updated last week
- ☆129 · Updated 3 months ago
- ☆158 · Updated last year
- Vocabulary Parallelism ☆19 · Updated 3 months ago
- A bunch of kernels that might make stuff slower 😉 ☆48 · Updated this week
- ring-attention experiments ☆145 · Updated 7 months ago
- Prune transformer layers (see the pruning sketch after this list) ☆69 · Updated last year
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆116 · Updated 6 months ago
- Applied AI experiments and examples for PyTorch ☆274 · Updated last week
- ☆252 · Updated last year
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆118 · Updated last year
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆134 · Updated 10 months ago
- ☆119 · Updated last year
- Ouroboros: Speculative Decoding with Large Model Enhanced Drafting (EMNLP 2024 main) ☆106 · Updated 2 months ago
- [NeurIPS 2024] The official implementation of "Kangaroo: Lossless Self-Speculative Decoding for Accelerating LLMs via Double Early Exiting" ☆57 · Updated 11 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆70 · Updated 11 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN ☆70 · Updated last year
- This repository contains the experimental PyTorch native float8 training UX ☆223 · Updated 10 months ago
- PyTorch bindings for CUTLASS grouped GEMM ☆94 · Updated last week
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM (see the cache-quantization sketch after this list) ☆163 · Updated 10 months ago
- Learn CUDA with PyTorch ☆25 · Updated this week
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind ☆96 · Updated last year
- ☆145 · Updated 2 years ago
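
Many of the entries above (the two NumPy speculative sampling implementations, Big Little Decoder, Kangaroo, Ouroboros) center on the same accept/reject rule from the speculative sampling papers. Below is a minimal NumPy sketch of one decoding step under that rule; the function and argument names are illustrative, not taken from any listed repo.

```python
import numpy as np

def speculative_step(p_target, q_draft, draft_tokens, rng):
    """One speculative sampling step (Leviathan et al. / Chen et al. rule).

    p_target:     (K+1, V) target-model probabilities at the K draft positions
                  plus one extra position for the bonus token.
    q_draft:      (K, V) draft-model probabilities that produced draft_tokens.
    draft_tokens: (K,) token ids proposed by the cheap draft model.
    Returns the tokens emitted this step; the output distribution provably
    matches sampling from the target model alone.
    """
    out = []
    for k, x in enumerate(draft_tokens):
        # Accept draft token x with probability min(1, p(x) / q(x)).
        if rng.random() * q_draft[k, x] < p_target[k, x]:
            out.append(int(x))
        else:
            # Rejected: resample from the renormalized residual max(p - q, 0).
            residual = np.maximum(p_target[k] - q_draft[k], 0.0)
            out.append(int(rng.choice(residual.size, p=residual / residual.sum())))
            return out
    # All K drafts accepted: emit one bonus token from the target distribution.
    out.append(int(rng.choice(p_target.shape[1], p=p_target[-1])))
    return out

# rng = np.random.default_rng(0); the caller runs the draft model K steps,
# scores all K+1 positions with the target model in one batch, then calls this.
```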
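
The mixed-precision and float8 entries build this machinery from scratch or push it below fp16; for orientation, here is the standard loss-scaling training step written with PyTorch's stock torch.amp utilities (model, optimizer, and loss_fn are placeholders).

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling state

def train_step(model, optimizer, loss_fn, x, y):
    optimizer.zero_grad(set_to_none=True)
    # Run the forward pass in fp16 where it is numerically safe.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()  # scale up to avoid fp16 gradient underflow
    scaler.step(optimizer)         # unscales gradients, then steps if finite
    scaler.update()                # grow/shrink the scale based on overflows
    return loss.item()
```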
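
The layer-pruning entry removes whole transformer blocks. The mechanical part is a one-liner in PyTorch, sketched below; the interesting part, scoring which blocks are redundant (e.g. by how little they change the hidden states), is what that repo is actually about.

```python
import torch.nn as nn

def drop_layers(layers: nn.ModuleList, drop: set) -> nn.ModuleList:
    """Return a new ModuleList without the blocks whose indices are in `drop`."""
    return nn.ModuleList(m for i, m in enumerate(layers) if i not in drop)

# e.g. model.transformer.h = drop_layers(model.transformer.h, {22, 23})
# (the attribute path is hypothetical; it varies by model implementation)
```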
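
Cold Compress, GEAR, and the quantization toolkits all trade cache or weight precision for memory. As a toy illustration of the core step, here is per-channel uniform quantization of a KV cache block in NumPy; GEAR's actual recipe additionally keeps a low-rank plus sparse residual to recover near-lossless quality, which this sketch omits.

```python
import numpy as np

def quantize_kv(cache, n_bits=8):
    """Uniformly quantize a (seq, heads, head_dim) float32 KV cache block."""
    qmax = 2 ** (n_bits - 1) - 1
    # One scale per (head, head_dim) channel, shared across the sequence axis.
    scale = np.abs(cache).max(axis=0, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero channels
    codes = np.clip(np.round(cache / scale), -qmax - 1, qmax).astype(np.int8)
    return codes, scale

def dequantize_kv(codes, scale):
    return codes.astype(np.float32) * scale

# Attention then runs on dequantize_kv(codes, scale); memory drops from
# 4 bytes to 1 byte per cache element at n_bits=8.
```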