Motsepe-Jr / AI-research-papers-pseudo-code
This repo collects pseudocode for AI research papers.
☆14 · Updated last year
Alternatives and similar repositories for AI-research-papers-pseudo-code
Users interested in AI-research-papers-pseudo-code are comparing it to the libraries listed below.
- Simple implementation of Speculative Sampling in NumPy for GPT-2. ☆95 · Updated last year (a minimal acceptance-rule sketch follows this list)
- ring-attention experiments ☆142 · Updated 7 months ago
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆75 · Updated last year
- ☆117 · Updated last year
- A bunch of kernels that might make stuff slower 😉 ☆40 · Updated this week
- 📑 Dive into Big Model Training ☆111 · Updated 2 years ago
- This repository contains the experimental PyTorch native float8 training UX ☆224 · Updated 9 months ago
- Applied AI experiments and examples for PyTorch ☆267 · Updated this week
- PyTorch bindings for CUTLASS grouped GEMM. ☆89 · Updated 2 weeks ago (a plain-loop reference for grouped GEMM follows this list)
- Collection of kernels written in Triton language ☆122 · Updated last month
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆73 · Updated 8 months ago (a 2:4 pruning sketch follows this list)
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆116 · Updated 5 months ago
- Cataloging released Triton kernels. ☆221 · Updated 4 months ago
- A minimal implementation of vLLM. ☆40 · Updated 9 months ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆70 · Updated 11 months ago (a softmax-N sketch follows this list)
- Code and Configs for Asynchronous RLHF: Faster and More Efficient RL for Language Models ☆51 · Updated 3 weeks ago
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆209 · Updated 8 months ago
- [NeurIPS '23] Speculative Decoding with Big Little Decoder ☆92 · Updated last year
- NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference ☆65 · Updated 5 months ago
- ☆106 · Updated 11 months ago
- Best practices for testing advanced Mixtral, DeepSeek, and Qwen series MoE models using Megatron Core MoE. ☆10 · Updated 3 weeks ago
- ☆104 · Updated 8 months ago
- Vocabulary Parallelism ☆19 · Updated 2 months ago
- Mixed precision training from scratch with Tensors and CUDA ☆22 · Updated last year (a loss-scaling sketch follows this list)
- ☆204 · Updated 3 weeks ago
- Latency and Memory Analysis of Transformer Models for Training and Inference ☆411 · Updated 3 weeks ago
- Odysseus: Playground of LLM Sequence Parallelism ☆69 · Updated 11 months ago
- Easy and Efficient Quantization for Transformers ☆197 · Updated 3 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆121 · Updated 4 months ago
- ☆155 · Updated last year
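
For readers comparing the speculative-sampling and speculative-decoding repos above, here is a minimal sketch of the acceptance rule those implementations revolve around, in NumPy. The distributions `p` (target model) and `q` (draft model) and the helper name `speculative_accept` are illustrative stand-ins, not code from any linked repository.

```python
import numpy as np

def speculative_accept(p, q, drafted_token, rng):
    """Accept or resample one token drafted by the small model.

    p: target-model probabilities over the vocab (shape [V])
    q: draft-model probabilities over the vocab (shape [V])
    drafted_token: index sampled from q
    """
    # Accept the draft with probability min(1, p[x] / q[x]).
    if rng.random() < min(1.0, p[drafted_token] / q[drafted_token]):
        return drafted_token
    # On rejection, resample from the residual distribution max(p - q, 0),
    # which keeps the overall output distributed exactly as p.
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p), p=residual)

rng = np.random.default_rng(0)
V = 8
p = rng.dirichlet(np.ones(V))   # toy target distribution
q = rng.dirichlet(np.ones(V))   # toy draft distribution
x = rng.choice(V, p=q)          # token proposed by the draft model
print(speculative_accept(p, q, x, rng))
```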
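The two CUTLASS grouped-GEMM binding repos above fuse many independent matmuls (e.g. one per MoE expert) into a single kernel launch. A plain PyTorch loop makes the semantics concrete; the function name and toy shapes below are assumptions for illustration, and a real grouped GEMM executes all groups in one fused kernel rather than a Python loop.

```python
import torch

def grouped_gemm_reference(xs, ws):
    """Reference semantics: xs[i] @ ws[i] per group; shapes may differ across groups."""
    return [x @ w for x, w in zip(xs, ws)]

xs = [torch.randn(m, 16) for m in (3, 5, 2)]   # per-group token counts
ws = [torch.randn(16, 32) for _ in range(3)]   # per-group (expert) weights
outs = grouped_gemm_reference(xs, ws)
print([tuple(o.shape) for o in outs])          # [(3, 32), (5, 32), (2, 32)]
```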
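The 2:4 sparsity entry relies on the semi-structured pattern that NVIDIA sparse tensor cores accelerate: at most two nonzeros in every contiguous group of four elements. The sketch below only enforces that pattern on a dense weight via magnitude pruning; it says nothing about the repo's fused 4-bit inference kernels, and `prune_2_to_4` is a hypothetical helper name.

```python
import torch

def prune_2_to_4(w: torch.Tensor) -> torch.Tensor:
    assert w.shape[-1] % 4 == 0, "last dim must be a multiple of 4"
    groups = w.reshape(-1, 4)
    # Keep the two largest-magnitude elements in each group of four.
    keep = groups.abs().topk(2, dim=-1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, keep, True)
    return (groups * mask).reshape(w.shape)

w = torch.randn(4, 8)
print(prune_2_to_4(w))  # exactly two nonzeros per group of four
```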
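SoftmaxN, referenced in the Flash Attention entry above, adds a constant `n` to the softmax denominator (equivalent to appending `n` zero logits), which lets an attention head assign less than full probability mass. A numerically stable PyTorch version is sketched below; the function name is illustrative, and the repo's actual CUDA/Triton kernels fuse this into the attention computation.

```python
import torch

def softmax_n(logits: torch.Tensor, n: float = 1.0, dim: int = -1) -> torch.Tensor:
    # softmax_n(x)_i = exp(x_i) / (n + sum_j exp(x_j)),
    # computed with a max-shift for stability; clamping the shift at 0
    # keeps the n * exp(-m) term from overflowing.
    m = logits.max(dim=dim, keepdim=True).values.clamp(min=0.0)
    e = torch.exp(logits - m)
    return e / (e.sum(dim=dim, keepdim=True) + n * torch.exp(-m))

x = torch.randn(2, 5)
print(softmax_n(x, n=1.0).sum(dim=-1))  # rows sum to strictly less than 1
```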
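Finally, the mixed-precision-from-scratch entry centers on loss scaling: multiply the loss before backward so small fp16 gradients don't underflow, then unscale before the optimizer step. The sketch below shows only those mechanics; for portability it keeps every tensor in fp32 (a real run would do the forward/backward in fp16 and skip steps whose gradients contain inf/NaN), and the toy model and static scale of 1024 are assumptions, not the linked repo's code.

```python
import torch

# Toy fp32 model standing in for fp32 master weights (assumption; the
# linked repo builds this machinery at the Tensor/CUDA level).
model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
scale = 1024.0  # static loss scale; production systems adjust it dynamically

x, y = torch.randn(32, 8), torch.randn(32, 1)
for step in range(3):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss * scale).backward()         # backward pass on the scaled loss
    for p in model.parameters():
        p.grad.div_(scale)            # unscale gradients before the update
    opt.step()                        # fp16 training would skip on inf/NaN grads
    print(step, loss.item())
```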