☆3,295 Mar 17, 2026 Updated 2 months ago
Alternatives and similar repositories for Attention-Residuals
Users who are interested in Attention-Residuals are comparing it to the libraries listed below.
- [ICLR 2026 Oral & ICML 2026] Generative Universal Verifier as Multimodal Meta-Reasoner ☆61 May 29, 2026 Updated last week
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning ☆24 Sep 9, 2024 Updated last year
- ☆1,400 Nov 17, 2025 Updated 6 months ago
- (ICLR'26 + Netflix) Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning ☆51 May 23, 2026 Updated 2 weeks ago
- ☆24 Nov 29, 2024 Updated last year
- My attempt to improve the speed of the Newton-Schulz algorithm, starting from the Dion implementation. ☆38 Apr 30, 2026 Updated last month
- [ICCV 2025 Highlight] "Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis" ☆27 May 31, 2026 Updated last week
- [ICML 2026] 4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere ☆160 May 18, 2026 Updated 3 weeks ago
- [ICLR26] Official implementation of Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling ☆192 Jan 26, 2026 Updated 4 months ago
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models ☆4,444 Jan 14, 2026 Updated 4 months ago
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆2,123 Apr 3, 2025 Updated last year
- Simple MoE - Day 17 of 365 Days of Repos ☆19 Jun 2, 2026 Updated last week
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆962 Updated this week
- [NeurIPS 2024] DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering ☆12 Oct 22, 2024 Updated last year
- ☆17 Jul 24, 2025 Updated 10 months ago
- THEORY OF SPACE: a benchmark for evaluating whether foundation models can actively explore under partial observability efficiently to bui… ☆80 Feb 27, 2026 Updated 3 months ago
- [KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations [ICML… ☆177 Mar 29, 2026 Updated 2 months ago
- A sparse attention kernel supporting mix sparse patterns ☆518 Jan 18, 2026 Updated 4 months ago
- Fast and memory-efficient exact attention ☆24,037 Updated this week
- Accelerating MoE with IO and Tile-aware Optimizations ☆707 May 14, 2026 Updated 3 weeks ago
- ☆14 Mar 7, 2025 Updated last year
- [NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning ☆101 Apr 20, 2026 Updated last month
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" ☆1,004 Feb 5, 2026 Updated 4 months ago
- mHC-lite: You Don’t Need 20 Sinkhorn-Knopp Iterations ☆90 Jan 12, 2026 Updated 4 months ago
- [CVPR 2026] "E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training" official implementation. ☆292 May 30, 2026 Updated last week
- [ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling ☆479 Apr 16, 2026 Updated last month
- Ring attention implementation with flash attention ☆1,024 Sep 10, 2025 Updated 8 months ago
- A standalone CXL-enabled system simulator. ☆21 Apr 19, 2026 Updated last month
- Local AI runtime for training & running small LLMs directly on Apple Neural Engine (ANE). No CoreML. No Metal. Offline, on-device fine-tu… ☆97 Mar 6, 2026 Updated 3 months ago
- 🚀 Efficient implementations for emerging model architectures ☆5,182 Updated this week
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆391 Jul 10, 2025 Updated 10 months ago
- The official codes for Fast Monte Carlo Rendering via Multi-Resolution Sampling ☆15 Dec 2, 2021 Updated 4 years ago
- ☆86 Nov 4, 2025 Updated 7 months ago
- Official Code for Epona: Autoregressive Diffusion World Model for Autonomous Driving (ICCV 2025) ☆353 Jul 22, 2025 Updated 10 months ago
- Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming ☆745 Jun 2, 2026 Updated last week
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆14 Apr 9, 2025 Updated last year
- [CVPR'2025] URWKV: Unified RWKV Model with Multi-state Perspective for Low-light Image Restoration ☆37 Aug 6, 2025 Updated 10 months ago
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,260 Aug 27, 2025 Updated 9 months ago
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ☆27 Feb 21, 2025 Updated last year