apple / ml-sigmoid-attention
☆307 · Apr 23, 2025 · Updated 9 months ago
Alternatives and similar repositories for ml-sigmoid-attention
Users interested in ml-sigmoid-attention are comparing it to the libraries listed below.
- Using FlexAttention to compute attention with different masking patterns ☆47 · Sep 22, 2024 · Updated last year
- The AdEMAMix Optimizer: Better, Faster, Older. ☆186 · Sep 12, 2024 · Updated last year
- [EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion from our EMNLP 2023 paper - Accelerating Toeplitz… ☆14 · Oct 17, 2023 · Updated 2 years ago
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆109 · Oct 11, 2025 · Updated 4 months ago
- ☆16 · Dec 19, 2024 · Updated last year
- Source-to-Source Debuggable Derivatives in Pure Python ☆15 · Jan 23, 2024 · Updated 2 years ago
- Experiment of using Tangent to autodiff triton ☆82 · Jan 22, 2024 · Updated 2 years ago
- ☆14 · Mar 20, 2025 · Updated 10 months ago
- ☆91 · Aug 18, 2024 · Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆248 · Jun 6, 2025 · Updated 8 months ago
- [NeurIPS 2024] Goldfish Loss: Mitigating Memorization in Generative LLMs ☆94 · Nov 17, 2024 · Updated last year
- Triton Implementation of HyperAttention Algorithm ☆48 · Dec 11, 2023 · Updated 2 years ago
- ☆19 · Dec 4, 2025 · Updated 2 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆157 · Apr 7, 2025 · Updated 10 months ago
- ☆58 · Jul 9, 2024 · Updated last year
- FlexAttention w/ FlashAttention3 Support ☆27 · Oct 5, 2024 · Updated last year
- ☆53 · May 20, 2024 · Updated last year
- ☆35 · Feb 26, 2024 · Updated last year
- ☆17 · Aug 1, 2025 · Updated 6 months ago
- Helpful tools and examples for working with flex-attention ☆1,127 · Updated this week
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆89 · Oct 30, 2024 · Updated last year
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆201 · Jul 17, 2024 · Updated last year
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆163 · Apr 13, 2025 · Updated 10 months ago
- 🚀 Efficient implementations of state-of-the-art linear attention models ☆4,379 · Updated this week
- Long Context Extension and Generalization in LLMs ☆62 · Sep 21, 2024 · Updated last year
- A State-Space Model with Rational Transfer Function Representation. ☆83 · May 17, 2024 · Updated last year
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆63 · Oct 3, 2025 · Updated 4 months ago
- Repo for "Monarch Mixer: A Simple Sub-Quadratic GEMM-Based Architecture" ☆562 · Dec 28, 2024 · Updated last year
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆357 · Feb 5, 2026 · Updated last week
- Code for BLT research paper ☆2,028 · Nov 3, 2025 · Updated 3 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆47 · Jun 22, 2024 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆132 · Dec 3, 2024 · Updated last year
- Schedule-Free Optimization in PyTorch ☆2,256 · May 21, 2025 · Updated 8 months ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆596 · Aug 12, 2025 · Updated 6 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆265 · Oct 3, 2025 · Updated 4 months ago
- Implementation of NAACL 2024 Outstanding Paper "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆152 · Mar 13, 2025 · Updated 11 months ago
- ☆220 · Jan 23, 2025 · Updated last year
- A fusion of a linear layer and a cross entropy loss, written for pytorch in triton. ☆75 · Aug 2, 2024 · Updated last year
- Unofficial implementation of ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech ☆19 · Feb 9, 2025 · Updated last year