HuyNguyen-hust / flash-attn-101
☆21 Updated 11 months ago
Alternatives and similar repositories for flash-attn-101
Users who are interested in flash-attn-101 are comparing it to the libraries listed below.
- [ICLR 2025] CAMEx: Curvature-Aware Merging of Experts ☆22 Updated 6 months ago
- Pioneering in Vietnamese Multimodal Large Language Model ☆51 Updated 7 months ago
- VIT inference in triton because, why not? ☆31 Updated last year
- ☆71 Updated last year
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆127 Updated last year
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun ☆56 Updated 5 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆91 Updated 2 months ago
- LibMoE: A LIBRARY FOR COMPREHENSIVE BENCHMARKING MIXTURE OF EXPERTS IN LARGE LANGUAGE MODELS ☆40 Updated 2 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆174 Updated 2 months ago
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆174 Updated 5 months ago
- ☆240 Updated 2 months ago
- Memory-Efficient CUDA kernels for training ConvNets with PyTorch. ☆42 Updated 6 months ago
- Pre-training script for BART in JAX/Flax ☆38 Updated 3 years ago
- Code for studying the super weight in LLM ☆115 Updated 8 months ago
- ☆87 Updated last year
- Remasking Discrete Diffusion Models with Inference-Time Scaling ☆38 Updated 5 months ago
- Implementation of the proposed MaskBit from Bytedance AI ☆82 Updated 9 months ago
- ☆80 Updated 6 months ago
- Implementations of attention with the softpick function, naive and FlashAttention-2 ☆83 Updated 4 months ago
- Recaption large (Web)Datasets with vllm and save the artifacts. ☆52 Updated 9 months ago
- Official repository for FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models ☆23 Updated last month
- Official Pytorch Implementation of Paper "DarwinLM: Evolutionary Structured Pruning of Large Language Models" ☆17 Updated 6 months ago
- ☆34 Updated 5 months ago
- [NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions" ☆39 Updated last year
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning ☆124 Updated 2 weeks ago
- Muon fsdp 2 ☆42 Updated 3 weeks ago
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" ☆124 Updated last year
- Mixed precision training from scratch with Tensors and CUDA ☆25 Updated last year
- Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training". ☆132 Updated 3 weeks ago
- Implementation of Infini-Transformer in Pytorch ☆111 Updated 7 months ago