☆3,315 · Mar 17, 2026 · Updated 3 months ago
Alternatives and similar repositories for Attention-Residuals
Users that are interested in Attention-Residuals are comparing it to the libraries listed below.
- [ICLR 2026 Oral & ICML 2026] Generative Universal Verifier as Multimodal Meta-Reasoner ☆62 · May 29, 2026 · Updated last month
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning ☆24 · Sep 9, 2024 · Updated last year
- ☆1,411 · Nov 17, 2025 · Updated 7 months ago
- (ICLR'26 + Netflix) Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning ☆52 · May 23, 2026 · Updated last month
- My attempt to improve the speed of the Newton-Schulz algorithm, starting from the Dion implementation. ☆38 · Apr 30, 2026 · Updated last month
- [ICCV 2025 Highlight] "Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis" ☆28 · May 31, 2026 · Updated last month
- [ICML 2026] 4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere ☆182 · May 18, 2026 · Updated last month
- [ICLR26] Official implementation of Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling ☆200 · Jan 26, 2026 · Updated 5 months ago
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models ☆4,474 · Jan 14, 2026 · Updated 5 months ago
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆2,129 · Apr 3, 2025 · Updated last year
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆966 · Jun 8, 2026 · Updated 3 weeks ago
- Official implementation of "From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction" ☆73 · Updated this week
- [NeurIPS 2024] DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering ☆12 · Oct 22, 2024 · Updated last year
- ☆17 · Jul 24, 2025 · Updated 11 months ago
- THEORY OF SPACE: a benchmark for evaluating whether foundation models can actively explore under partial observability efficiently to bui… ☆82 · Feb 27, 2026 · Updated 4 months ago
- [KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations [ICML… ☆185 · Mar 29, 2026 · Updated 3 months ago
- A sparse attention kernel supporting mix sparse patterns ☆528 · Jan 18, 2026 · Updated 5 months ago
- Fast and memory-efficient exact attention ☆24,221 · Jun 22, 2026 · Updated last week
- Accelerating MoE with IO and Tile-aware Optimizations ☆720 · Updated this week
- ☆14 · Mar 7, 2025 · Updated last year
- [NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning ☆102 · Apr 20, 2026 · Updated 2 months ago
- Data and Code for COLM 2025 Paper "MSRS: Evaluating Multi-Source Retrieval-Augmented Generation" ☆32 · Aug 29, 2025 · Updated 10 months ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" ☆1,006 · Feb 5, 2026 · Updated 4 months ago
- mHC-lite: You Don’t Need 20 Sinkhorn-Knopp Iterations ☆90 · Jan 12, 2026 · Updated 5 months ago
- [CVPR 2026] "E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training" official implementation. ☆296 · May 30, 2026 · Updated 3 weeks ago
- [ICLR 2026] OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling ☆482 · Apr 16, 2026 · Updated 2 months ago
- Ring attention implementation with flash attention ☆1,026 · Sep 10, 2025 · Updated 9 months ago
- ☆45 · Jan 30, 2026 · Updated 4 months ago
- A standalone CXL-enabled system simulator. ☆21 · Apr 19, 2026 · Updated 2 months ago
- 🚀 Efficient implementations for emerging model architectures ☆5,249 · Updated this week
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆397 · Jul 10, 2025 · Updated 11 months ago
- ☆86 · Nov 4, 2025 · Updated 7 months ago
- A CLI for managing AI skill packages ☆27 · Jan 18, 2026 · Updated 5 months ago
- Official Code for Epona: Autoregressive Diffusion World Model for Autonomous Driving (ICCV 2025) ☆364 · Jul 22, 2025 · Updated 11 months ago
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆14 · Apr 9, 2025 · Updated last year
- Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming ☆760 · Jun 23, 2026 · Updated last week
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,262 · Aug 27, 2025 · Updated 10 months ago
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ☆27 · Feb 21, 2025 · Updated last year
- [ICLR 2024] Efficient Streaming Language Models with Attention Sinks ☆7,233 · Jul 11, 2024 · Updated last year