Self-reproduction code for the paper "Reducing Transformer Key-Value Cache Size with Cross-Layer Attention" (MIT CSAIL)
☆17May 24, 2024Updated 2 years ago
Alternatives and similar repositories for Cross-Layer-Attention
Users that are interested in Cross-Layer-Attention are comparing it to the libraries listed below.
- Beyond KV Caching: Shared Attention for Efficient LLMs☆20Jul 19, 2024Updated last year
- Official implementation of the WACV 2025 paper "3D Part Segmentation via Geometric Aggregation of 2D Visual Features"☆25Jun 8, 2025Updated last year
- ☆14Nov 18, 2025Updated 7 months ago
- "Ali Lingjie" Wentian Engine e-commerce search algorithm competition, 13/2771☆10Jul 31, 2022Updated 3 years ago
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…☆16Feb 4, 2025Updated last year
- Generative Modeling via Drifting in MLX☆43Feb 6, 2026Updated 4 months ago
- Can 3D Vision-Language Models Truly Understand Natural Language?☆20Mar 28, 2024Updated 2 years ago
- Inference Code for Paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models"☆74Jul 30, 2024Updated last year
- 🤔 When in Doubt: Improving Classification Performance with Alternating Normalization [Findings of EMNLP2021]☆15Oct 29, 2021Updated 4 years ago
- RBF Drivers for Blender☆10Oct 14, 2022Updated 3 years ago
- Minimal implementation of TokenFormer for inference and learning☆13Nov 6, 2024Updated last year
- ☆11Sep 18, 2020Updated 5 years ago
- Echo (回声): AI copywriting assistant☆10May 6, 2023Updated 3 years ago
- Code for "Learning Canonical Representations for Scene Graph to Image Generation", Herzig & Bar et al., ECCV2020☆30Nov 22, 2022Updated 3 years ago
- Multi-span Style Extraction for Generative Reading Comprehension☆10Apr 2, 2021Updated 5 years ago
- [CVPR 2025] Official implementation of the paper "Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters The…☆32Mar 12, 2026Updated 3 months ago
- LLaDA implementation☆19Jul 24, 2025Updated 10 months ago
- ChatTTS is a generative speech model for daily dialogue.☆14Oct 21, 2024Updated last year
- Contains the implementation of HyperFace: A deep multi task learning framework for facial recognition, landmark detection, pose and gende…☆14Apr 3, 2019Updated 7 years ago
- PyTorch implementation of MVP: a multi-stage vision-language pre-training framework☆11Apr 23, 2022Updated 4 years ago
- Implementation of NoProp-CT☆28May 2, 2025Updated last year
- ☆19May 11, 2024Updated 2 years ago
- [ECCV 2024] Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression☆53Sep 21, 2024Updated last year
- Paper reading: Jamba — Hybrid Transformer-Mamba LM (SSM → S4 → S6 → Jamba)☆15May 22, 2024Updated 2 years ago
- ☆20Nov 5, 2024Updated last year
- [TMLR 2022] Geometric Flow Network for 3D Point Cloud Semantic Segmentation☆42Jan 10, 2023Updated 3 years ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"☆18Mar 15, 2024Updated 2 years ago
- Converting Mixtral-8x7B to Mixtral-[1~7]x7B☆22Mar 4, 2024Updated 2 years ago
- ☆16Oct 13, 2020Updated 5 years ago
- TensorFlow implementation of the ICLR 2019 paper "Exemplar Guided Unsupervised Image-to-Image Translation with Semantic Consistency"☆28Jul 4, 2020Updated 5 years ago
- Collection of written-test and interview resources for NLP-related positions☆16Jun 17, 2021Updated 5 years ago
- ☆13Jul 30, 2024Updated last year
- The OBMO module embedded in PatchNet☆10Feb 21, 2024Updated 2 years ago
- [TNNLS 2022] Official pytorch implementation of "Tackling the Challenges in Scene Graph Generation with Local-to-Global Interactions"☆11Apr 19, 2022Updated 4 years ago
- ☆14Sep 22, 2025Updated 8 months ago
- OSWorld-Human: Benchmarking the Efficiency of Computer-Use Agents☆26May 17, 2026Updated last month
- Leveraging Ontological Schema Information in Embedding Models for Knowledge Graphs☆14Jun 16, 2015Updated 11 years ago
- Fast and memory-efficient exact attention ported to ROCm☆14Dec 1, 2023Updated 2 years ago
- ☆10Apr 6, 2026Updated 2 months ago