FranxYao / Retrieval-Head-with-Flash-Attention
Efficient retrieval head analysis with Triton flash attention that supports top-K probability
☆12Updated 10 months ago
Alternatives and similar repositories for Retrieval-Head-with-Flash-Attention
Users who are interested in Retrieval-Head-with-Flash-Attention are comparing it to the libraries listed below
- Towards Systematic Measurement for Long Text Quality☆34Updated 8 months ago
- ☆11Updated 10 months ago
- ☆29Updated 4 months ago
- Resources for our ACL 2023 paper: Distilling Script Knowledge from Large Language Models for Constrained Language Planning☆36Updated last year
- Complexity Based Prompting for Multi-Step Reasoning☆17Updated 2 years ago
- ☆16Updated 2 months ago
- Official implementation of the paper "From Complex to Simple: Enhancing Multi-Constraint Complex Instruction Following Ability of Large L…☆48Updated 10 months ago
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective☆30Updated last year
- Revisiting Mid-training in the Era of RL Scaling☆37Updated 2 weeks ago
- ☆50Updated last year
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.☆61Updated 10 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments"☆58Updated last year
- [ICLR'24 spotlight] Tool-Augmented Reward Modeling☆47Updated 4 months ago
- The source code of "Merging Experts into One: Improving Computational Efficiency of Mixture of Experts (EMNLP 2023)"☆37Updated last year
- Official repository for MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models [NeurIPS 2024]☆66Updated 5 months ago
- ☆14Updated last year
- AbstainQA, ACL 2024☆25Updated 7 months ago
- Code for our EMNLP-2023 paper: "Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks"☆24Updated last year
- ☆35Updated last year
- "FiD-ICL: A Fusion-in-Decoder Approach for Efficient In-Context Learning" (ACL 2023)☆14Updated last year
- [NAACL 2025] The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language M…☆26Updated last year
- ☆14Updated last year
- Official implementation of AAAI 2025 paper "Augmenting Math Word Problems via Iterative Question Composing" (https://arxiv.org/abs/2401.09…☆20Updated 5 months ago
- Official Implementation for the paper "Integrative Decoding: Improving Factuality via Implicit Self-consistency"☆23Updated last month
- ☆31Updated last year
- One Network, Many Masks: Towards More Parameter-Efficient Transfer Learning☆39Updated last year
- ☆30Updated 8 months ago
- The code of “Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning”☆16Updated last year
- Repo for outstanding paper@ACL 2023 "Do PLMs Know and Understand Ontological Knowledge?"☆31Updated last year
- Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to EMNLP2023 Findings.☆25Updated last year