NonvolatileMemory / flash_attn_gqa
Triton version of GQA flash attention, based on the tutorial
☆11Updated 10 months ago
Alternatives and similar repositories for flash_attn_gqa
Users who are interested in flash_attn_gqa are comparing it to the libraries listed below.
- ☆20Updated last year
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch☆30Updated this week
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆50Updated 2 weeks ago
- The official implementation for Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free☆44Updated last month
- Official implementation of AAAI 2025 paper "Augmenting Math Word Problems via Iterative Question Composing"(https://arxiv.org/abs/2401.09…☆20Updated 6 months ago
- ☆35Updated last year
- Codebase for Instruction Following without Instruction Tuning☆34Updated 9 months ago
- ☆14Updated 2 years ago
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training☆22Updated 10 months ago
- Official Implementation of ACL2023: Don't Parse, Choose Spans! Continuous and Discontinuous Constituency Parsing via Autoregressive Span …☆13Updated last year
- Code for "RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing"☆22Updated 3 months ago
- Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719☆22Updated last year
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"☆28Updated last year
- ☆19Updated 2 years ago
- LongHeads: Multi-Head Attention is Secretly a Long Context Processor☆29Updated last year
- A probabilistic model for contextual word representation. Accepted to ACL2023 Findings.☆23Updated last year
- Code for the ACL-2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts"☆47Updated 2 years ago
- Official code repo for paper "Great Memory, Shallow Reasoning: Limits of kNN-LMs"☆23Updated last month
- [EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens☆24Updated last year
- Align, a general text alignment function☆15Updated last year
- ☆19Updated last month
- MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following☆16Updated 7 months ago
- Long Context Extension and Generalization in LLMs☆57Updated 9 months ago
- ☆46Updated this week
- Use the tokenizer in parallel to achieve superior acceleration☆16Updated last year
- ☆55Updated 11 months ago
- ☆12Updated last year
- ☆20Updated 3 weeks ago
- The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling☆32Updated 3 months ago
- ☆30Updated 5 months ago