THUDM/IndexCache

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

☆123

Alternatives and similar repositories for IndexCache

Users that are interested in IndexCache are comparing it to the libraries listed below.

NonvolatileMemory / GliDe_with_a_CaPE_ICML_24
official code for GliDe with a CaPE
☆22 · Aug 13, 2024 · Updated last year
Lyun0912-wu / LongAttn
LongAttn: Selecting Long-context Training Data via Token-level Attention
☆15 · Jul 16, 2025 · Updated 11 months ago
SoftwareEnabledFlash / SEF-API
SOFTWARE-ENABLED FLASH (SEF) Application Programming Interface (API)
☆23 · Dec 8, 2023 · Updated 2 years ago
Hambaobao / Marathon
Marathon: A Multiple-choice Long Context Evaluation Benchmark for Large Language Models.
☆10 · May 16, 2024 · Updated 2 years ago
bytedance / AffineQuant
Official implementation of the ICLR 2024 paper AffineQuant
☆30 · Mar 30, 2024 · Updated 2 years ago
THU-KEG / Xlore2.0
Xlore2.0 Code [BaiduExtractor, HudongExtractor, WikiExtractor, XloreData, XloreWeb]
☆12 · Apr 5, 2017 · Updated 9 years ago
safety-research / inverse-scaling-ttc
Inverse Scaling in Test-Time Compute
☆25 · Dec 3, 2025 · Updated 7 months ago
sail-sg / SimLayerKV
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆53 · Oct 18, 2024 · Updated last year
THU-KEG / Event-Level-Knowledge-Editing
☆12 · Apr 25, 2024 · Updated 2 years ago
yunzhusong / NAACL2022-REFLECT
Code for the paper: Improving Multi-Document Summarization through Referenced Flexible Extraction with Credit-Awareness
☆12 · Oct 22, 2023 · Updated 2 years ago
ShuyangCao / hibrids_summ
Code for ACL 2022 paper "HIBRIDS: Attention with Hierarchical Biases for Structure-aware Long Document Summarization".
☆13 · May 24, 2022 · Updated 4 years ago
GaryStack / Trustworthy-Evaluation
Repository of paper "Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis" (ACL 2025 Main)
☆19 · Jul 19, 2025 · Updated 11 months ago
zhangyuting725 / DPVP
Code for Modeling Dual Period-Varying Preferences for Takeaway Recommendation
☆12 · Dec 12, 2023 · Updated 2 years ago
VITA-Group / READ-ME
[NeurIPS2024] "Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design", Ruisi Cai, Yeonju Ro, Geon-Woo …
☆16 · Dec 16, 2024 · Updated last year
sanowl / CoRAG
Based on the paper "Chain-of-Retrieval Augmented Generation"
☆15 · Mar 29, 2025 · Updated last year
THU-KEG / DICE
DICE: Detecting In-distribution Data Contamination with LLM's Internal State
☆12 · Sep 21, 2024 · Updated last year
pyxis-roc / ptxparser
A parser for PTX 6.5
☆13 · Jun 19, 2023 · Updated 3 years ago
RainBowLuoCS / MMEvol
(ACL 2025) 🔥🔥🔥 Code for "Empowering Multimodal Large Language Models with Evol-Instruct"
☆21 · May 15, 2025 · Updated last year
Scotchman0 / XRDP-for-ubuntu
Automatically installs and configures XFCE, XRDP and variables for a one-script setup
☆14 · Apr 14, 2021 · Updated 5 years ago
mamuyang / MIFN
This is our implementation for the paper: Exploring Mixed Information Flow for Cross-domain Sequential Recommendations
☆12 · Aug 17, 2020 · Updated 5 years ago
stone-zeng / talks
A collection of my talks
☆12 · Jan 19, 2026 · Updated 5 months ago
matatonic / openedai-images
An OpenAI API compatible images server to generate or manipulate images.
☆18 · Feb 2, 2025 · Updated last year
Asap7772 / fewshot-preference-optimization
Few-Shot Preference Optimization (FSPO) personalizes LLMs by reframing reward modeling as a meta-learning problem, enabling rapid adaptat…
☆16 · Feb 27, 2025 · Updated last year
flashinfer-ai / cutlass-viz
☆65 · Apr 26, 2025 · Updated last year
THU-KEG / LongWriter-V
[ACM MM25] LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models
☆24 · Mar 29, 2025 · Updated last year
thunlp / LLM-generated-text-detection
☆13 · Nov 7, 2023 · Updated 2 years ago
ariasanovsky / ptx-parser
☆11 · Jun 9, 2023 · Updated 3 years ago
XuandongZhao / pf-decoding
[ICLR 2025] Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs
☆19 · Mar 20, 2025 · Updated last year
MexicanLemonade / LLM-Misinfo-QA
This repository contains data and code used for On the Risk of Misinformation Pollution with Large Language Models (EMNLP 2023 Findings).
☆17 · Dec 14, 2023 · Updated 2 years ago
yuntian-group / interactive-training
https://interactivetraining.ai/
☆18 · Oct 2, 2025 · Updated 9 months ago
llamajun / qwen.metal
Local inference for the Llama and Qwen (Tongyi Qianwen) large language models, implemented with Apple Metal
☆10 · Apr 26, 2024 · Updated 2 years ago
gl-ybnbxb / BoNBoN
☆18 · Jun 3, 2024 · Updated 2 years ago
kig / rdma-pipe
Utility programs to pipe data across an RDMA-capable network
☆19 · Mar 14, 2026 · Updated 3 months ago
mcrl / tccl
Thunder Research Group's Collective Communication Library
☆53 · Jul 8, 2025 · Updated last year
ModelTC / Outlier_Suppression_Plus
Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…
☆52 · Oct 21, 2023 · Updated 2 years ago
FlashSampling / FlashSampling
FlashSampling: Fast and Memory-Efficient Exact Sampling (https://huggingface.co/papers/2603.15854)
☆74 · Jun 15, 2026 · Updated 3 weeks ago
Junjie-Ye / RoTBench
[EMNLP 2024] RoTBench: A Multi-Level Benchmark for Evaluating the Robustness of Large Language Models in Tool Learning
☆15 · May 13, 2025 · Updated last year
SuDIS-ZJU / llm-inference-all-in-one
☆19 · Feb 18, 2025 · Updated last year
LMCache / demo
☆32 · Apr 17, 2025 · Updated last year