☆3,225 · Mar 17, 2026 · Updated last month
Alternatives and similar repositories for Attention-Residuals
Users that are interested in Attention-Residuals are comparing it to the libraries listed below.
- [ICLR 2026 Oral] Generative Universal Verifier as Multimodal Meta-Reasoner ☆57 · Nov 14, 2025 · Updated 5 months ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning ☆24 · Sep 9, 2024 · Updated last year
- (ICLR'26 + Netflix) Rank-GRPO: Training LLM-based Conversational Recommender Systems with Reinforcement Learning ☆43 · Nov 17, 2025 · Updated 5 months ago
- [CVPR 2025 Highlight] FIMA-Q: Post-Training Quantization for Vision Transformers by Fisher Information Matrix Approximation ☆28 · Jun 16, 2025 · Updated 10 months ago
- 4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere ☆119 · Apr 16, 2026 · Updated 2 weeks ago
- My attempt to improve the speed of the Newton-Schulz algorithm, starting from the Dion implementation. ☆37 · Dec 5, 2025 · Updated 4 months ago
- [ICCV 2025 Highlight] "Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis" ☆27 · Mar 23, 2026 · Updated last month
- ☆1,377 · Nov 17, 2025 · Updated 5 months ago
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models ☆4,332 · Jan 14, 2026 · Updated 3 months ago
- MoBA: Mixture of Block Attention for Long-Context LLMs ☆2,100 · Apr 3, 2025 · Updated last year
- ☆14 · Jun 16, 2023 · Updated 2 years ago
- Checkpoint-engine is a simple middleware to update model weights in LLM inference engines ☆945 · Feb 28, 2026 · Updated 2 months ago
- [KernelGYM & Dr. Kernel] A distributed GPU environment and a collection of RL training methods to support RL for Kernel Generations ☆160 · Mar 29, 2026 · Updated last month
- Official implementation of "From Forecasting to Planning: Policy World Model for Collaborative State-Action Prediction" ☆65 · Nov 25, 2025 · Updated 5 months ago
- [NeurIPS 2024] DN-4DGS: Denoised Deformable Network with Temporal-Spatial Aggregation for Dynamic Scene Rendering ☆12 · Oct 22, 2024 · Updated last year
- ☆17 · Jul 24, 2025 · Updated 9 months ago
- THEORY OF SPACE: a benchmark for evaluating whether foundation models can actively explore under partial observability efficiently to bui… ☆73 · Feb 27, 2026 · Updated 2 months ago
- Code for the paper titled "Adaptive Cross-Layer Attention for Image Restoration" ☆14 · Nov 6, 2025 · Updated 5 months ago
- A sparse attention kernel supporting mix sparse patterns ☆503 · Jan 18, 2026 · Updated 3 months ago
- Accelerating MoE with IO and Tile-aware Optimizations ☆661 · Apr 22, 2026 · Updated last week
- mHC-lite: You Don’t Need 20 Sinkhorn-Knopp Iterations ☆80 · Jan 12, 2026 · Updated 3 months ago
- ☆14 · Mar 7, 2025 · Updated last year
- [NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning ☆94 · Apr 20, 2026 · Updated last week
- Fast and memory-efficient exact attention ☆23,563 · Updated this week
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" ☆989 · Feb 5, 2026 · Updated 2 months ago
- [CVPR 2026] "E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training" official implementation. ☆281 · Feb 24, 2026 · Updated 2 months ago
- SGLang Omni: High-Performance Multi-Stage Pipeline Framework for Omni Models ☆223 · Updated this week
- Ring attention implementation with flash attention ☆1,014 · Sep 10, 2025 · Updated 7 months ago
- A standalone CXL-enabled system simulator. ☆21 · Apr 19, 2026 · Updated last week
- 🚀 Efficient implementations for emerging model architectures ☆4,999 · Updated this week
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆381 · Jul 10, 2025 · Updated 9 months ago
- Official Code for Epona: Autoregressive Diffusion World Model for Autonomous Driving (ICCV 2025) ☆338 · Jul 22, 2025 · Updated 9 months ago
- The official code for Fast Monte Carlo Rendering via Multi-Resolution Sampling ☆15 · Dec 2, 2021 · Updated 4 years ago
- ☆82 · Nov 4, 2025 · Updated 5 months ago
- Helpful kernel tutorials, examples and SKILLs for tile-based GPU programming ☆707 · Updated this week
- A CLI for managing AI skill packages ☆27 · Jan 18, 2026 · Updated 3 months ago
- Moatless Testbeds allows you to create isolated testbed environments in a Kubernetes cluster where you can apply code changes through git… ☆14 · Apr 9, 2025 · Updated last year
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,246 · Aug 27, 2025 · Updated 8 months ago
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ☆26 · Feb 21, 2025 · Updated last year