Motsepe-Jr / AI-research-papers-pseudo-code
This repo covers pseudocode for AI research papers.
☆17 · Updated 2 years ago
Alternatives and similar repositories for AI-research-papers-pseudo-code
Users who are interested in AI-research-papers-pseudo-code are comparing it to the libraries listed below.
- Simple implementation of Speculative Sampling in NumPy for GPT-2 (a minimal sketch of the idea follows after this list). ☆99 · Updated 2 years ago
- a minimal cache manager for PagedAttention, on top of llama3. ☆135 · Updated last year
- Explorations into some recent techniques surrounding speculative decoding ☆299 · Updated last year
- Easy and Efficient Quantization for Transformers ☆205 · Updated 2 weeks ago
- A minimal implementation of vllm. ☆67 · Updated last year
- ☆232 · Updated 2 months ago
- ☆125 · Updated last year
- Cataloging released Triton kernels. ☆292 · Updated 5 months ago
- Applied AI experiments and examples for PyTorch ☆315 · Updated 5 months ago
- ☆17 · Updated 2 years ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆219 · Updated last week
- ring-attention experiments ☆165 · Updated last year
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆280 · Updated 2 months ago
- ☆124 · Updated last year
- Triton implementation of Flash Attention 2.0 ☆49 · Updated 2 years ago
- ☆177 · Updated 2 years ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆143 · Updated 8 months ago
- Vocabulary Parallelism ☆25 · Updated 11 months ago
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆222 · Updated last year
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆93 · Updated last year
- ☆131 · Updated 8 months ago
- Prune transformer layers ☆74 · Updated last year
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind ☆106 · Updated last year
- Mixed precision training from scratch with Tensors and CUDA ☆28 · Updated last year
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆148 · Updated last year
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆68 · Updated 9 months ago
- Fast low-bit matmul kernels in Triton ☆427 · Updated last week
- ☆286 · Updated last week
- Collection of kernels written in Triton language ☆178 · Updated 2 weeks ago
- Large scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)* ☆86 · Updated 2 years ago
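
Two of the entries above (the NumPy implementation for GPT-2 and the reproduction of DeepMind's "Accelerating Large Language Model Decoding with Speculative Sampling") revolve around the same accept/reject scheme. As a rough orientation, here is a minimal, self-contained NumPy sketch of that scheme; the toy models, the vocabulary size, and names such as `toy_model` and `speculative_step` are illustrative stand-ins and are not taken from any repository listed here.

```python
# Minimal sketch of speculative sampling in NumPy (illustrative only).
# The "models" are toy stand-ins: any callable mapping a token prefix to a
# probability distribution over the vocabulary would work the same way.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 16  # toy vocabulary size


def toy_model(seed):
    """Return a callable mapping a token prefix to a toy probability distribution."""
    def prob(prefix):
        h = hash((seed, tuple(prefix))) % (2 ** 32)
        logits = np.random.default_rng(h).normal(size=VOCAB)
        e = np.exp(logits - logits.max())
        return e / e.sum()
    return prob


draft_model = toy_model(seed=1)    # stand-in for the small, fast model
target_model = toy_model(seed=2)   # stand-in for the large, accurate model


def speculative_step(prefix, k=4):
    """One speculative decoding step: draft k tokens cheaply, verify with the target."""
    # 1. Draft k tokens autoregressively with the cheap model.
    drafted, draft_probs, ctx = [], [], list(prefix)
    for _ in range(k):
        q = draft_model(ctx)
        tok = int(rng.choice(VOCAB, p=q))
        drafted.append(tok)
        draft_probs.append(q)
        ctx.append(tok)

    # 2. The target model scores every drafted position; with a real
    #    transformer this is a single forward pass over the whole draft.
    target_probs = [target_model(list(prefix) + drafted[:i]) for i in range(k + 1)]

    # 3. Accept draft token x with probability min(1, p(x) / q(x)); on the
    #    first rejection, resample from the residual max(0, p - q) and stop.
    out = list(prefix)
    for i, tok in enumerate(drafted):
        p, q = target_probs[i], draft_probs[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
        else:
            residual = np.maximum(p - q, 0.0)
            out.append(int(rng.choice(VOCAB, p=residual / residual.sum())))
            return out

    # 4. Every draft token was accepted: take one bonus token from the target.
    out.append(int(rng.choice(VOCAB, p=target_probs[k])))
    return out


print(speculative_step(prefix=[3, 1, 4], k=4))
```

The key property of the scheme is that resampling from the residual max(0, p − q) on rejection keeps the overall output distribution identical to sampling from the target model alone, so the draft model only affects speed, not quality.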