NonvolatileMemory / flash_attn_gqa
Triton version of GQA flash attention, based on the tutorial
☆11 Updated 7 months ago
Alternatives and similar repositories for flash_attn_gqa:
Users who are interested in flash_attn_gqa are comparing it to the libraries listed below.
- ☆14 Updated 2 years ago
- This repository is the official implementation of our EMNLP 2022 paper ELMER: A Non-Autoregressive Pre-trained Language Model for Efficie… ☆26 Updated 2 years ago
- Implementation of the model: "Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models" in PyTorch ☆30 Updated last week
- [NeurIPS 2024] Fast Best-of-N Decoding via Speculative Rejection ☆40 Updated 5 months ago
- ☆18 Updated 10 months ago
- ☆15 Updated last year
- ☆19 Updated 2 years ago
- ☆34 Updated last year
- Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?" ☆44 Updated this week
- Long Context Extension and Generalization in LLMs ☆50 Updated 6 months ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)": ☆36 Updated 11 months ago
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective ☆30 Updated last year
- ☆20 Updated last year
- Official Implementation of ACL2023: Don't Parse, Choose Spans! Continuous and Discontinuous Constituency Parsing via Autoregressive Span … ☆14 Updated last year
- Use the tokenizer in parallel to achieve superior acceleration ☆16 Updated last year
- [EMNLP 2023] Context Compression for Auto-regressive Transformers with Sentinel Tokens ☆24 Updated last year
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang ☆14 Updated last year
- Codebase for Instruction Following without Instruction Tuning ☆33 Updated 6 months ago
- Longitudinal Evaluation of LLMs via Data Compression ☆32 Updated 10 months ago
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training ☆20 Updated 7 months ago
- The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction. ☆44 Updated 5 months ago
- Contextual Position Encoding but with some custom CUDA Kernels https://arxiv.org/abs/2405.18719 ☆22 Updated 9 months ago
- ☆52 Updated 8 months ago
- ☆22 Updated last year
- Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings. ☆25 Updated last year
- Official Implementation for the ICLR2023 paper "Fuzzy Alignments in Directed Acyclic Graph for Non-autoregressive Machine Translation" ☆13 Updated 2 years ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 Updated last year
- ☆30 Updated last year
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆56 Updated 8 months ago
- [EMNLP 2021] MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations ☆30 Updated 2 years ago