Efficient retrieval head analysis with triton flash attention that supports topK probability
☆13Jun 15, 2024Updated last year
Alternatives and similar repositories for Retrieval-Head-with-Flash-Attention
Users that are interested in Retrieval-Head-with-Flash-Attention are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- This repo is to demo the concept of lossless compression with Transformers as encoder and decoder.☆14May 2, 2024Updated last year
- ☆11Jun 15, 2019Updated 6 years ago
- ☆14Nov 29, 2023Updated 2 years ago
- [NAACL 2025] Source code for MMEvalPro, a more trustworthy and efficient benchmark for evaluating LMMs☆25Sep 26, 2024Updated last year
- [ICLR2025] Code and data for paper: Not All Heads Matter: A Head-Level KV Cache Compression Method with Integrated Retrieval and Reasonin…☆40Mar 10, 2025Updated last year
- Virtual machines for every use case on DigitalOcean • AdGet dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
- The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism☆30Jul 17, 2024Updated last year
- The code for the paper *The Sensitivity of Counterfactual Fairness to Unmeasured Confounding* @ UAI 2019☆14Apr 4, 2020Updated 5 years ago
- ACL24☆11Jun 7, 2024Updated last year
- ☆12Jun 30, 2024Updated last year
- Official implementation of the paper: "A deeper look at depth pruning of LLMs"☆15Jul 24, 2024Updated last year
- Dataset and baseline for Coling 2022 long paper (oral): "ConFiguRe: Exploring Discourse-level Chinese Figures of Speech"☆13Jul 27, 2023Updated 2 years ago
- LongAttn :Selecting Long-context Training Data via Token-level Attention☆15Jul 16, 2025Updated 8 months ago
- ☆47Nov 25, 2024Updated last year
- simple implementation of Expected Gradients and Integrated Gradients by pytorch☆12May 11, 2022Updated 3 years ago
- Wordpress hosting with auto-scaling on Cloudways • AdFully Managed hosting built for WordPress-powered businesses that need reliable, auto-scalable hosting. Cloudways SafeUpdates now available.
- [EMNLP'24] LongHeads: Multi-Head Attention is Secretly a Long Context Processor☆31Apr 8, 2024Updated last year
- AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation (EMNLP 2024 Findings)☆16Dec 30, 2024Updated last year
- Rationale-enhanced language models are better continual relation learners (EMNLP 2023 Main Conference)☆12Oct 11, 2023Updated 2 years ago
- ☆19May 3, 2025Updated 10 months ago
- fork of karparthy's nanogpt with custom datasets☆10Jul 25, 2023Updated 2 years ago
- Code and data for Distributional Correlation–Aware Knowledge Distillation for Stock Trading Volume Prediction (ECML-PKDD 22)☆15Sep 6, 2022Updated 3 years ago
- AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence☆10Mar 2, 2025Updated last year
- References for Papers at the Intersection of Causality and Fairness☆18Dec 3, 2018Updated 7 years ago
- Enhances Overleaf by allowing article searches and BibTeX retrieval from DBLP and Google Scholar | 通过允许从 DBLP 和 Google Scholar 进行文章搜索和获取 …☆46Apr 14, 2025Updated 11 months ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- A command-line font manager☆18Sep 29, 2016Updated 9 years ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment☆16Dec 19, 2024Updated last year
- Source code of “Reinforcement Learning with Token-level Feedback for Controllable Text Generation (NAACL 2024)☆17Dec 8, 2024Updated last year
- [NeurIPS25] RULE: Reinforcement UnLEarning Achieves Forge-retain Pareto Optimality☆20Oct 22, 2025Updated 5 months ago
- codes for "Self-Checker: Plug-and-Play Modules for Fact-Checking with Large Language Models"☆12Feb 10, 2025Updated last year
- [ICML'25] MELON: Provable Defense Against Indirect Prompt Injection Attacks in AI Agents☆24Jul 31, 2025Updated 7 months ago
- ☆12Oct 10, 2021Updated 4 years ago
- ☆17Mar 21, 2026Updated last week
- Repository of paper "Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis" (ACL 2025 Main)☆19Jul 19, 2025Updated 8 months ago
- Open source password manager - Proton Pass • AdSecurely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
- ☆15Apr 22, 2024Updated last year
- Variational Graph Convolutional Networks☆23Oct 16, 2020Updated 5 years ago
- [Findings of EMNLP22] From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models☆19Mar 16, 2023Updated 3 years ago
- PyTorch implementation of StableMask (ICML'24)☆15Jun 27, 2024Updated last year
- ☆14Dec 25, 2024Updated last year
- [ICLR 2025] No Preference Left Behind: Group Distributional Preference Optimization☆15Apr 21, 2025Updated 11 months ago
- Single Cell Pretrained Regulatory network INference from Transcripts☆11Sep 17, 2024Updated last year