bojone / softtopk
differentiable top-k operator
☆21 · Updated 2 months ago
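As a rough illustration of what a differentiable top-k operator can look like, here is a minimal sketch of one common relaxation. It is an assumption and is not claimed to be the softtopk repository's method. The sketch finds a threshold λ such that Σ σ((x_i − λ)/τ) = k by bisection, then returns the weights σ((x_i − λ)/τ). These weights lie in (0, 1), sum to k, and approach the hard top-k mask as the temperature τ shrinks. The Jacobian follows from the implicit function theorem applied to the constraint Σ s_i = k. The names `soft_topk`, `soft_topk_jacobian`, `tau`, and `iters` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (never calls math.exp on a large positive value).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_topk(x: Sequence[float], k: int, tau: float = 0.1,
              iters: int = 100) -> Tuple[List[float], float]:
    """Hypothetical soft top-k: weights s_i = sigmoid((x_i - lam) / tau) with sum(s) = k."""
    n = len(x)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(x)")

    def mass(lam: float) -> float:
        # Total weight at threshold lam; strictly decreasing in lam.
        return sum(_sigmoid((xi - lam) / tau) for xi in x)

    # Bracket the root: mass(lo) is close to n (> k), mass(hi) is close to 0 (< k).
    lo = min(x) - 50.0 * tau
    hi = max(x) + 50.0 * tau
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mass(mid) > k:
            lo = mid  # threshold too low: too much mass
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [_sigmoid((xi - lam) / tau) for xi in x], lam


def soft_topk_jacobian(s: Sequence[float], tau: float) -> List[List[float]]:
    """ds_i/dx_j = w_i * delta_ij - w_i * w_j / sum(w), where w_i = s_i (1 - s_i) / tau.

    Derived by differentiating sum(s) = k, which gives dlam/dx_j = w_j / sum(w).
    Undefined (division by zero) if every s_i saturates to exactly 0 or 1.
    """
    w = [si * (1.0 - si) / tau for si in s]
    total = sum(w)
    n = len(s)
    return [[(w[i] if i == j else 0.0) - w[i] * w[j] / total for j in range(n)]
            for i in range(n)]
```

For example, `soft_topk([3.0, 1.0, 2.0, 0.5], 2, tau=0.1)` puts nearly all of the weight on the two largest entries. A larger `tau` spreads the weight out and gives smoother, less saturated gradients. Bisection was chosen because the constraint is monotone in λ, which makes it robust without tuning.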
Alternatives and similar repositories for softtopk:
Users who are interested in softtopk are comparing it to the libraries listed below.
- Triton implementation of bidirectional (non-causal) linear attention ☆44 · Updated last month
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 · Updated last year
- ☆30 · Updated 10 months ago
- ☆17 · Updated 2 months ago
- Keras implementation of Finite Scalar Quantization ☆71 · Updated last year
- Does VLM Classification Benefit from LLM Description Semantics? (AAAI 2025) ☆16 · Updated 2 months ago
- Benchmarking Attention Mechanism in Vision Transformers. ☆17 · Updated 2 years ago
- GIFT: Generative Interpretable Fine-Tuning ☆20 · Updated 5 months ago
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate" ☆76 · Updated last week
- A torch-based implementation of K-Means and K-Means++ ☆17 · Updated 4 years ago
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling. ☆34 · Updated 9 months ago
- Mixture of Attention Heads ☆43 · Updated 2 years ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers. ☆46 · Updated last year
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021) ☆18 · Updated last year
- This is a PyTorch implementation of the paper "ViP: A Differentially Private Foundation Model for Computer Vision" ☆36 · Updated last year
- Self-reproduction code for the paper "Reducing Transformer Key-Value Cache Size with Cross-Layer Attention" (MIT CSAIL) ☆12 · Updated 10 months ago
- Code implementation for paper "On the Efficacy of Small Self-Supervised Contrastive Models without Distillation Signals". ☆16 · Updated 3 years ago
- ☆22 · Updated last year
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆82 · Updated 2 years ago
- Paper List for In-context Learning 🌷 ☆20 · Updated 2 years ago
- An official PyTorch implementation of AAAI 2024 paper "Latent Space Editing in Transformer-based Flow Matching" ☆36 · Updated 11 months ago
- The official repository of paper "ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection" (N… ☆50 · Updated last year
- The repository for our paper: Neighboring Perturbations of Knowledge Editing on Large Language Models ☆16 · Updated 10 months ago
- Codebase for the paper "Elucidating the design space of language models for image generation" ☆45 · Updated 4 months ago
- Documents a demo and a series of notes for learning about diffusion models. ☆39 · Updated last year
- A small framework that mimics PyTorch using CuPy or NumPy ☆27 · Updated 3 years ago
- ☆102 · Updated last year
- The official repo for continuous speculative decoding ☆25 · Updated 4 months ago
- ☆23 · Updated 5 months ago
- Efficient Mixture of Experts for LLM Paper List ☆47 · Updated 3 months ago